Models and repository structure
Every leaderboard number is produced by a script in one of two top-level folders. The split is deliberate and maps directly onto the Fine-tuned column of the leaderboard:
| Folder | Purpose | Leaderboard column | Data used |
|---|---|---|---|
| | Models trained on the MillionTrees train split, then evaluated on the test split | Fine-tuned ✓ | train + test |
| | Pretrained released weights evaluated against the MillionTrees test split | Fine-tuned ✗ | test only |
There are no model scripts in docs/examples/. If you are looking for a runnable
template, see existing_models/external_segmentation_adapter.py.
training/: fine-tuned models (✓)
One folder per geometry, each with the same two entry points:
| Geometry | Model | Train | Evaluate a checkpoint |
|---|---|---|---|
| | DeepForest (RetinaNet) | | |
| | TreeFormer | | |
| | Mask R-CNN | | |
Common usage (works for random and zeroshot split schemes):
uv run python training/boxes/train.py --split-scheme random --root-dir "$MT_ROOT"
The point model needs the TreeFormer extra (the DeepForest treeformer-training branch, until it merges into weecology main):
uv sync --extra treeformer
uv run --extra treeformer python training/points/train.py --split-scheme random
Each run writes training/<geometry>/outputs/<split>/results_<split>.txt (plus a matching .json file),
which scripts/make_benchmark_table.py reads to regenerate the leaderboard tables.
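Before regenerating the tables, it can help to expand the path pattern above and see which result files are still missing. The following is a minimal sketch, not part of the repository: the function names are hypothetical, and the geometry folder names (`boxes`, `points`, `polygons`) and split names (`random`, `zeroshot`) are assumed from the commands and launchers mentioned on this page.

```python
from pathlib import Path

# Assumed geometry folders and split schemes; adjust to match the repository.
GEOMETRIES = ("boxes", "points", "polygons")
SPLITS = ("random", "zeroshot")


def expected_training_results(repo_root="."):
    """Return the results_<split>.txt paths a full set of training runs would write."""
    root = Path(repo_root)
    return [
        root / "training" / geometry / "outputs" / split / f"results_{split}.txt"
        for geometry in GEOMETRIES
        for split in SPLITS
    ]


def missing_training_results(repo_root="."):
    """Return the expected result files that do not exist yet."""
    return [path for path in expected_training_results(repo_root) if not path.exists()]


if __name__ == "__main__":
    # Run from the repository root to list runs that have not finished.
    for path in missing_training_results():
        print(f"missing: {path}")
```

The sketch only checks for the .txt files; the layout of the .json files is not described here, so it does not try to parse them.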
existing_models/: pretrained baselines (✗)
One folder per model, each containing eval_<geometry>.py for the geometries that
model natively predicts. Each model folder has its own pyproject.toml so its
dependencies stay isolated from the core package.
| Model | Folder | Geometries |
|---|---|---|
| DeepForest | | boxes |
| TreeFormer | | points |
| SAM3 | | boxes, points, polygons |
uv run python existing_models/deepforest/eval_boxes.py --split-scheme zeroshot --root-dir "$MT_ROOT"
Results are written to existing_models/<model>/outputs/<split>/results_<geometry>_<split>.txt.
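The result path pattern can be combined with the model and geometry listing in this section to locate a specific baseline result. This is a hedged sketch with a hypothetical helper name; only `existing_models/deepforest/` appears verbatim on this page, so the folder names `treeformer` and `sam3` are assumptions.

```python
from pathlib import Path

# Model folder -> geometries the model predicts natively.
# "treeformer" and "sam3" are assumed folder names, not confirmed by this page.
MODEL_GEOMETRIES = {
    "deepforest": ("boxes",),
    "treeformer": ("points",),
    "sam3": ("boxes", "points", "polygons"),
}


def pretrained_result_path(model, geometry, split, repo_root="."):
    """Build the results_<geometry>_<split>.txt path for one pretrained baseline."""
    if model not in MODEL_GEOMETRIES:
        raise ValueError(f"unknown model folder: {model!r}")
    if geometry not in MODEL_GEOMETRIES[model]:
        raise ValueError(f"{model!r} does not natively predict {geometry!r}")
    return (
        Path(repo_root)
        / "existing_models"
        / model
        / "outputs"
        / split
        / f"results_{geometry}_{split}.txt"
    )
```

For example, `pretrained_result_path("deepforest", "boxes", "zeroshot")` points at the file written by the command shown above, while asking for DeepForest points raises a `ValueError` because that combination has no evaluation script.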
existing_models/external_segmentation_adapter.py is a template showing how to convert
an arbitrary external model’s outputs into the MillionTrees evaluation format; copy it as
the starting point for a new existing_models/<your_model>/ entry.
Reproducing the leaderboard for a new dataset version
SLURM launchers fan out over geometry × split. To launch everything after packaging a new dataset version:
# 1. fine-tuned training jobs + pretrained eval jobs
bash slurm/submit_all.sh
# 2. once all jobs finish, regenerate the tables
uv run python scripts/make_benchmark_table.py --splits random zeroshot
slurm/submit_all.sh simply calls the two per-folder launchers, which you can also run
independently:
- training/slurm/submit_all_training.sh → train_boxes.sbatch, train_points.sbatch, train_polygons.sbatch
- existing_models/slurm/submit_all_eval.sh → eval_deepforest.sbatch, eval_treeformer.sbatch, eval_sam3.sbatch
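To make the geometry × split fan-out concrete, the sketch below enumerates the training commands that such a sweep would cover. It does not reproduce the .sbatch files, whose contents are not shown here. It assumes that every geometry has a `training/<geometry>/train.py` entry point accepting `--split-scheme` and `--root-dir` (as the boxes example does) and that only the points model needs the `treeformer` extra; `training_commands` is a hypothetical name.

```python
import shlex
from itertools import product

# Assumed geometry folders and split schemes.
GEOMETRIES = ("boxes", "points", "polygons")
SPLITS = ("random", "zeroshot")


def training_commands(root_dir):
    """Return one shell command per geometry x split combination."""
    commands = []
    for geometry, split in product(GEOMETRIES, SPLITS):
        cmd = ["uv", "run"]
        if geometry == "points":
            # The point model is the one that needs the TreeFormer extra.
            cmd += ["--extra", "treeformer"]
        cmd += [
            "python",
            f"training/{geometry}/train.py",
            "--split-scheme",
            split,
            "--root-dir",
            root_dir,
        ]
        # shlex.join quotes arguments so the line is safe to paste into a shell.
        commands.append(shlex.join(cmd))
    return commands


if __name__ == "__main__":
    for line in training_commands("/path/to/MillionTrees"):
        print(line)
```

Three geometries and two splits give six commands, which is the size of the training half of the sweep under these assumptions.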
For a dependency-chained run that automatically rebuilds the table once every job
finishes, use slurm/run_benchmark.sbatch instead.
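How run_benchmark.sbatch chains its steps is not shown on this page. As a general illustration of the SLURM mechanism, a dependent job is usually submitted with `sbatch --dependency=afterok:<id>[:<id>...]`, using job IDs captured from `sbatch --parsable`. The helper names below are hypothetical, and the job IDs in the usage note are placeholders.

```python
def parse_job_id(sbatch_output):
    """Extract the job ID from `sbatch --parsable` output.

    The output is '<jobid>' or '<jobid>;<cluster>'.
    """
    return sbatch_output.strip().split(";")[0]


def dependent_sbatch_command(script, upstream_job_ids):
    """Build an sbatch argument list that starts only after all upstream jobs succeed."""
    if not upstream_job_ids:
        raise ValueError("at least one upstream job ID is required")
    dependency = "afterok:" + ":".join(str(job_id) for job_id in upstream_job_ids)
    return ["sbatch", "--parsable", f"--dependency={dependency}", script]
```

For instance, `dependent_sbatch_command("slurm/visualize_finetuned.sbatch", ["101", "102", "103"])` returns an argument list that could be passed to `subprocess.run` so the visualization step waits for three training jobs. With `afterok`, the dependent job starts only if every listed job exits successfully.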
Leaderboard panel figures (fine-tuned)
The images embedded in leaderboard.md (leaderboard_predictions_*.png)
are not produced by submit_all.sh. They are regenerated from fine-tuned checkpoints
after training completes:
| Geometry | Model | Checkpoint path |
|---|---|---|
| TreePoints | TreeFormer | |
| TreeBoxes | DeepForest | |
| TreePolygons | Mask R-CNN | |
Each figure has two rows (the random and zeroshot fine-tuning tasks) and two columns (ground truth and the fine-tuned prediction for the same test image).
uv run --extra treeformer python scripts/create_finetuned_visualizations.py \
--root-dir "$MT_ROOT" \
--output-dir docs \
--panel-dir docs/figures/finetuned_panels
Outputs:
- docs/leaderboard_predictions_{points,boxes,polygons}.png and .svg (combined panels)
- docs/figures/finetuned_panels/<geometry>_<split>_{ground_truth,finetuned}.svg (one file per panel for manuscript layout)
On SLURM: sbatch slurm/visualize_finetuned.sbatch (included as a dependent step in
run_benchmark.sbatch after the three training array jobs).