Leaderboard¶
Tasks¶
Zero-shot¶
The first task is to create a zero-shot detection system that generalizes across geography and acquisition conditions. Selected datasets are held out from training entirely and used only for evaluation, so models are tested under conditions they have never seen. This is a challenging task because no local training data is available.
Random¶
The second task is to create the best global detector for individual trees given a set of training and test data. Datasets are split randomly, so the model sees training data from the same localities it is tested on. This is consistent with how most applied users engage with models: by fine-tuning backbone models with sample data from a desired locality.
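The difference between the random split and the zero-shot split described earlier can be sketched with a toy list of images. This is an illustration only, not the benchmark's split code: the source names and the helpers `random_split` and `zero_shot_split` are hypothetical.

```python
import random

# Toy records: (image_id, source). Source names are made up for illustration.
records = [
    (f"img_{i}", source)
    for i, source in enumerate(["site_a"] * 4 + ["site_b"] * 4 + ["site_c"] * 4)
]


def random_split(records, test_fraction=0.25, seed=0):
    """Shuffle images regardless of source, so a source can appear in both splits."""
    shuffled = records[:]
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def zero_shot_split(records, held_out_sources):
    """Hold out entire sources, so test localities are never seen in training."""
    train = [r for r in records if r[1] not in held_out_sources]
    test = [r for r in records if r[1] in held_out_sources]
    return train, test


rand_train, rand_test = random_split(records)
zs_train, zs_test = zero_shot_split(records, held_out_sources={"site_c"})

# Sources shared between train and test under each strategy.
shared_random = {s for _, s in rand_train} & {s for _, s in rand_test}
shared_zero_shot = {s for _, s in zs_train} & {s for _, s in zs_test}
print("random split shares sources:", sorted(shared_random))
print("zero-shot split shares sources:", sorted(shared_zero_shot))
```

Under the random split, every source in the test set also contributes training images; under the zero-shot split, the overlap is empty by construction.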
Cross-geometry¶
Off-the-shelf tools often limit users to a single annotation type: there are ‘point’ models, ‘box’ models, and ‘polygon’ models. To create truly global models for biological inference, we need models that can use all available data, not just one annotation geometry. In particular, polygon annotations are very time-consuming to create but are often desirable for downstream use cases. We opted against using polygons as a training source, for example polygons to points, as this is an unrealistic, or at least very uncommon, downstream use case.
Boxes to Polygons¶
Models are trained on all box sources and evaluated on all polygon sources. No local data from the test localities appears in the training set.
Points to Polygons¶
Models are trained on all point sources and evaluated on all polygon sources.
Points to Boxes¶
Models are trained on all point sources and evaluated on all box sources.
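One way to see why the cross-geometry tasks only run from simpler to richer geometries: converting a polygon to a box, or a box to a point, is a deterministic and lossy operation, while the reverse direction has to be learned. The sketch below uses made-up pixel coordinates, and the helper names are hypothetical; they are not part of the MillionTrees package.

```python
def polygon_to_box(polygon):
    """Axis-aligned bounding box (xmin, ymin, xmax, ymax) of a list of (x, y) vertices."""
    xs = [x for x, _ in polygon]
    ys = [y for _, y in polygon]
    return (min(xs), min(ys), max(xs), max(ys))


def box_to_point(box):
    """Center of the box as (x, y)."""
    xmin, ymin, xmax, ymax = box
    return ((xmin + xmax) / 2, (ymin + ymax) / 2)


# Made-up crown outline in pixel coordinates, for illustration only.
crown = [(10, 10), (30, 12), (34, 30), (12, 28)]

box = polygon_to_box(crown)
point = box_to_point(box)
print(box)    # (10, 10, 34, 30)
print(point)  # (22.0, 20.0)
```

Each step discards shape information, which is why recovering a box or polygon from a point is the harder and more useful direction to benchmark.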
Results¶
TreePoints¶
Random¶
| Model | Fine-tuned | Counting MAE | Mask-Aware Precision | Script |
|---|---|---|---|---|
| DeepForest | ✗ | 22.304 | 0.547 | |
| SAM3 | ✗ | 26.675 | 0.714 | |
| DeepForest | ✓ | 35.189 | 0.505 | |
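As a reading aid for the Counting MAE column, here is a minimal sketch that assumes the metric is the mean absolute difference between the number of predicted and annotated trees per image. The benchmark's own evaluation code may aggregate differently (for example, per source), and the function name and counts below are hypothetical.

```python
def counting_mae(predicted_counts, true_counts):
    """Mean absolute error between per-image predicted and annotated tree counts."""
    if len(predicted_counts) != len(true_counts):
        raise ValueError("Need one predicted count per annotated image.")
    if not true_counts:
        raise ValueError("Need at least one image.")
    errors = [abs(p - t) for p, t in zip(predicted_counts, true_counts)]
    return sum(errors) / len(errors)


# Made-up counts for three images, for illustration only.
predicted = [12, 40, 7]
annotated = [10, 45, 7]
mae = counting_mae(predicted, annotated)
print(f"Counting MAE: {mae:.3f}")
```

Lower values are better: a Counting MAE of 0 would mean every image received exactly the annotated number of trees.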
Zero-shot¶
| Model | Fine-tuned | Counting MAE | Mask-Aware Precision | Script |
|---|---|---|---|---|
| DeepForest | ✗ | 50.602 | 0.732 | |
| SAM3 | ✗ | 51.860 | 0.544 | |
| DeepForest | ✓ | 74.581 | 0.666 | |
Cross-geometry¶
Note: Cross-geometry splits are designed for predicting polygons from other annotation geometries. The 0.000 scores below reflect that this split is not applicable to point prediction.
| Model | Fine-tuned | Counting MAE | Script |
|---|---|---|---|
| DeepForest | ✗ | 0.000 | |
| SAM3 | ✗ | 0.000 | |

TreeBoxes¶
Random¶
| Model | Fine-tuned | Avg Recall | Mask-Aware Precision | Script |
|---|---|---|---|---|
| DeepForest | ✓ | 0.721 | 0.610 | |
| DeepForest | ✗ | 0.414 | 0.760 | |
| SAM3 | ✗ | 0.175 | 0.619 | |
Zero-shot¶
| Model | Fine-tuned | Avg Recall | Mask-Aware Precision | Script |
|---|---|---|---|---|
| DeepForest | ✓ | 0.460 | 0.900 | |
| DeepForest | ✗ | 0.416 | 0.959 | |
| SAM3 | ✗ | 0.201 | 0.810 | |
Cross-geometry¶
Note: Cross-geometry splits are designed for predicting polygons from other annotation geometries. The 0.000 scores below reflect that this split is not applicable to box prediction.
| Model | Fine-tuned | Avg Recall | Script |
|---|---|---|---|
| DeepForest | ✗ | 0.000 | |
| SAM3 | ✗ | 0.000 | |

TreePolygons¶
Random¶
| Model | Fine-tuned | Avg Mask Accuracy | Mask-Aware Precision | Script |
|---|---|---|---|---|
| DeepForest | ✓ | 0.232 | 0.872 | |
| SAM3 | ✗ | 0.223 | 0.681 | |
| DeepForest | ✗ | 0.087 | 0.005 | |
Zero-shot¶
| Model | Fine-tuned | Avg Mask Accuracy | Mask-Aware Precision | Script |
|---|---|---|---|---|
| DeepForest | ✓ | 0.146 | 0.758 | |
| SAM3 | ✗ | 0.180 | 0.719 | |
| DeepForest | ✗ | 0.108 | 0.000 | |
Cross-geometry¶
| Model | Fine-tuned | Avg Mask Accuracy | Mask-Aware Precision | Script |
|---|---|---|---|---|
| DeepForest | ✗ | 0.109 | 0.000 | |

Submissions¶
Submit to the leaderboard¶
Once you have trained a model and evaluated its performance, you can submit your results to the MillionTrees leaderboard. Here’s how:
1. Create a public repository with your code and model training scripts. Make sure to include:
   - Clear instructions for reproducing your results
   - A requirements file listing all dependencies
   - Training configuration files and parameters
   - Code for data preprocessing and augmentation
   - The model architecture definition
   - Evaluation code
2. Generate predictions on the test split:

       # Assumed import path; adjust to match your installed milliontrees version.
       from milliontrees.common.data_loaders import get_eval_loader

       # `dataset` and `model` come from your own training code.
       test_dataset = dataset.get_subset("test")  # Use the test split
       test_loader = get_eval_loader("standard", test_dataset, batch_size=16)

       predictions = []
       for metadata, images, _ in test_loader:
           pred = model(images)
           predictions.append(pred)

3. Submit a pull request to the MillionTrees repository with:
   - A link to your code repository
   - A description of your model and approach
   - Performance metrics on the test set
   - Example prediction visualizations
   - Instructions for reproducing your results
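When reporting performance metrics in a pull request, it may help to format them the way the results tables lay them out (model, fine-tuned flag, metrics to three decimals, script link). The helper below is a hypothetical convenience, not part of the MillionTrees package, and the model name and numbers are placeholders rather than real results.

```python
def leaderboard_row(model, fine_tuned, metrics, script=""):
    """Format one table row: model, fine-tuned mark, metrics to three decimals, script link."""
    mark = "✓" if fine_tuned else "✗"
    cells = [model, mark] + [f"{value:.3f}" for value in metrics] + [script]
    return "| " + " | ".join(cells) + " |"


# Placeholder values for illustration only; substitute your own test-set metrics.
row = leaderboard_row("MyModel", True, [18.25, 0.61], script="link-to-training-script")
print(row)  # | MyModel | ✓ | 18.250 | 0.610 | link-to-training-script |
```

Pass the metrics in the same order as the columns of the table you are targeting, for example Counting MAE then Mask-Aware Precision for TreePoints.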