Models and repository structure

Every leaderboard number is produced by a script in one of two top-level folders. The split is deliberate and maps directly onto the Fine-tuned column of the leaderboard:

| Folder | Purpose | Leaderboard column | Data used |
| --- | --- | --- | --- |
| training/ | Models trained on the MillionTrees train split, then evaluated on the test split | Fine-tuned ✓ | train + test |
| existing_models/ | Pretrained released weights evaluated against the MillionTrees test split | Fine-tuned ✗ | test only |

There are no model scripts in docs/examples/. If you are looking for a runnable template, see existing_models/external_segmentation_adapter.py.

training/ — fine-tuned models (✓)

One folder per geometry, each with the same two entry points:

| Geometry | Model | Train | Evaluate a checkpoint |
| --- | --- | --- | --- |
| training/boxes/ | DeepForest (RetinaNet) | train.py | eval.py |
| training/points/ | TreeFormer | train.py | eval.py |
| training/polygons/ | Mask R-CNN | train.py | eval.py |

Common usage (works for random and zeroshot split schemes):

uv run python training/boxes/train.py --split-scheme random --root-dir "$MT_ROOT"

The point model needs the treeformer extra (it pulls DeepForest from the treeformer-training branch until that branch is merged into weecology main):

uv sync --extra treeformer
uv run --extra treeformer python training/points/train.py --split-scheme random

Each run writes training/<geometry>/outputs/<split>/results_<split>.txt and a matching .json file. scripts/make_benchmark_table.py reads these files to regenerate the leaderboard tables.
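Before regenerating the tables, it can help to check that every geometry and split has written its results file. The sketch below expands the path pattern above. `expected_results` and `missing_results` are hypothetical helpers, not part of the repository, and the code assumes the folder names boxes, points, polygons and the split names random, zeroshot used elsewhere on this page.

```python
from itertools import product
from pathlib import Path

GEOMETRIES = ("boxes", "points", "polygons")  # folder names under training/
SPLITS = ("random", "zeroshot")               # values passed to --split-scheme


def expected_results(repo_root="."):
    """Return the results_<split>.txt path for every geometry x split pair."""
    root = Path(repo_root)
    return [
        root / "training" / geometry / "outputs" / split / f"results_{split}.txt"
        for geometry, split in product(GEOMETRIES, SPLITS)
    ]


def missing_results(repo_root="."):
    """List expected result files that have not been written yet."""
    return [path for path in expected_results(repo_root) if not path.is_file()]
```

Calling `missing_results()` from the repository root returns an empty list once all six training runs have finished.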

existing_models/ — pretrained baselines (✗)

One folder per model, each containing eval_<geometry>.py for the geometries that model natively predicts. Each model folder has its own pyproject.toml so its dependencies stay isolated from the core package.

| Model | Folder | Geometries |
| --- | --- | --- |
| DeepForest | existing_models/deepforest/ | boxes |
| TreeFormer | existing_models/treeformer/ | points |
| SAM3 | existing_models/sam3/ | boxes, points, polygons |

uv run python existing_models/deepforest/eval_boxes.py --split-scheme zeroshot --root-dir "$MT_ROOT"

Results are written to existing_models/<model>/outputs/<split>/results_<geometry>_<split>.txt.
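The pretrained results path includes the geometry as well as the split, because one model folder can cover several geometries. A minimal sketch of that naming rule follows. `pretrained_results_path` is a hypothetical helper, not a repository function, and the model-to-geometry mapping is copied from the table above.

```python
from pathlib import Path

# Model folder -> geometries listed for it in the table above.
MODEL_GEOMETRIES = {
    "deepforest": ("boxes",),
    "treeformer": ("points",),
    "sam3": ("boxes", "points", "polygons"),
}


def pretrained_results_path(model, geometry, split, repo_root="."):
    """Build existing_models/<model>/outputs/<split>/results_<geometry>_<split>.txt."""
    if geometry not in MODEL_GEOMETRIES[model]:
        raise ValueError(f"{model} is not listed with geometry {geometry!r}")
    return (
        Path(repo_root) / "existing_models" / model / "outputs" / split
        / f"results_{geometry}_{split}.txt"
    )
```

For example, `pretrained_results_path("sam3", "polygons", "zeroshot")` gives existing_models/sam3/outputs/zeroshot/results_polygons_zeroshot.txt.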

existing_models/external_segmentation_adapter.py is a template showing how to convert an arbitrary external model’s outputs into the MillionTrees evaluation format; copy it as the starting point for a new existing_models/<your_model>/ entry.

Reproducing the leaderboard for a new dataset version

SLURM launchers fan out over geometry × split. To launch everything after packaging a new dataset version:

# 1. fine-tuned training jobs + pretrained eval jobs
bash slurm/submit_all.sh

# 2. once all jobs finish, regenerate the tables
uv run python scripts/make_benchmark_table.py --splits random zeroshot

slurm/submit_all.sh simply calls the two per-folder launchers, which you can also run independently:

  • training/slurm/submit_all_training.sh → train_boxes.sbatch, train_points.sbatch, train_polygons.sbatch

  • existing_models/slurm/submit_all_eval.sh → eval_deepforest.sbatch, eval_treeformer.sbatch, eval_sam3.sbatch

For a dependency-chained run that automatically rebuilds the table once every job finishes, use slurm/run_benchmark.sbatch instead.
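For readers unfamiliar with SLURM dependency chaining, the general idea can be sketched as follows. This is a generic illustration and not the contents of slurm/run_benchmark.sbatch, which are not shown here. `sbatch_command` and `submit` are hypothetical helpers. The only SLURM features relied on are the standard sbatch options `--parsable` and `--dependency=afterok:<id>[:<id>...]`.

```python
import subprocess


def sbatch_command(script, after_ok=()):
    """Build an sbatch argv; --parsable makes sbatch print just the job ID."""
    argv = ["sbatch", "--parsable"]
    if after_ok:
        # afterok: start only after every listed job has finished successfully.
        argv.append("--dependency=afterok:" + ":".join(str(j) for j in after_ok))
    argv.append(script)
    return argv


def submit(script, after_ok=()):
    """Submit a job and return its ID as a string (requires sbatch on PATH)."""
    result = subprocess.run(
        sbatch_command(script, after_ok),
        check=True,
        capture_output=True,
        text=True,
    )
    # With --parsable the output is "<jobid>" or "<jobid>;<cluster>".
    return result.stdout.strip().split(";")[0]


# Usage on a cluster (script paths here are placeholders, not repository paths):
#   job_ids = [submit(path) for path in ("job_a.sbatch", "job_b.sbatch")]
#   submit("rebuild_table.sbatch", after_ok=job_ids)
```

The final job stays pending until every upstream job exits with code 0, so the table is never rebuilt from partial results.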

Leaderboard panel figures (fine-tuned)

The images embedded in leaderboard.md (leaderboard_predictions_*.png) are not produced by submit_all.sh. They are regenerated from fine-tuned checkpoints after training completes:

| Geometry | Model | Checkpoint path |
| --- | --- | --- |
| TreePoints | TreeFormer | training/points/outputs/<split>/checkpoints/ |
| TreeBoxes | DeepForest | training/boxes/outputs/<split>/checkpoints/ |
| TreePolygons | Mask R-CNN | training/polygons/outputs/<split>/checkpoints/ |

Each figure has two rows (one per fine-tuning task: random and zeroshot) and two columns (ground truth and the fine-tuned prediction on the same test image).

uv run --extra treeformer python scripts/create_finetuned_visualizations.py \
  --root-dir "$MT_ROOT" \
  --output-dir docs \
  --panel-dir docs/figures/finetuned_panels

Outputs:

  • docs/leaderboard_predictions_{points,boxes,polygons}.png and .svg (combined panels)

  • docs/figures/finetuned_panels/<geometry>_<split>_{ground_truth,finetuned}.svg (one file per panel for manuscript layout)

On SLURM: sbatch slurm/visualize_finetuned.sbatch (included as a dependent step in run_benchmark.sbatch after the three training array jobs).
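To confirm that a run produced everything listed under Outputs, the file names can be enumerated in Python. This is a sketch under assumptions: `expected_figures` is a hypothetical helper, and it assumes the <geometry> part of the panel file names uses points, boxes, polygons (as in the combined figure names) and that <split> is random or zeroshot.

```python
from itertools import product
from pathlib import Path

GEOMETRIES = ("points", "boxes", "polygons")
SPLITS = ("random", "zeroshot")
PANELS = ("ground_truth", "finetuned")


def expected_figures(output_dir="docs", panel_dir="docs/figures/finetuned_panels"):
    """Return combined figure paths followed by the per-panel SVG paths."""
    combined = [
        Path(output_dir) / f"leaderboard_predictions_{geometry}{ext}"
        for geometry, ext in product(GEOMETRIES, (".png", ".svg"))
    ]
    panels = [
        Path(panel_dir) / f"{geometry}_{split}_{panel}.svg"
        for geometry, split, panel in product(GEOMETRIES, SPLITS, PANELS)
    ]
    return combined + panels


def missing_figures(**dirs):
    """List expected figure files that do not exist on disk."""
    return [path for path in expected_figures(**dirs) if not path.is_file()]
```

Under these assumptions a complete run yields 6 combined files and 12 panel files.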