
Research

Benchmarks for orbital ML

Standard evaluation benchmarks for orbit prediction, conjunction analysis, and distributed space computing. Compare your models against baselines.

Why benchmarks matter

The orbital ML field lacks standard evaluation protocols. We're working to change that.

Reproducibility

Fixed test sets, documented metrics, and reference implementations. Results you can verify and build upon.

Fair comparison

Compare approaches on identical data splits with standardized evaluation. No cherry-picking or hidden advantages.

Clear baselines

Know exactly what you're improving upon. SGP4, threshold detection, and other standard approaches as reference points.

Launching Q2 2026

We're finalizing benchmark datasets and evaluation infrastructure. Contact us to be notified at launch or to discuss research collaboration.

Based on Public TLE Data

Orbital Intelligence Benchmarks

Benchmarks for satellite tracking, conjunction analysis, and space situational awareness. Built on publicly available TLE data.

01

OrbML-Predict

Benchmark for orbit prediction models. Given historical TLE observations, predict future orbital elements. Evaluated at 1-day, 7-day, and 30-day horizons across LEO, MEO, and GEO regimes. Test set held out with temporal split.

Task: Orbit prediction
Source: Space-Track TLEs

Metrics: MAE (km), RMSE (km), position error by orbital regime
Baseline: SGP4 propagator
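
For reference, a minimal sketch of scoring the SGP4 baseline with the position-error metrics above, assuming the open-source `sgp4` Python package; the released evaluation harness may differ in I/O details.

```python
# Minimal sketch: propagate a TLE with SGP4 and score position error in km.
# The `sgp4` package is assumed; TLE strings and epochs are placeholders.
import numpy as np
from sgp4.api import Satrec, jday

def sgp4_position_km(line1, line2, year, month, day, hour, minute, second):
    """Propagate one TLE to a UTC epoch; return the TEME position vector (km)."""
    sat = Satrec.twoline2rv(line1, line2)
    jd, fr = jday(year, month, day, hour, minute, second)
    err, r, _ = sat.sgp4(jd, fr)
    if err != 0:
        raise ValueError(f"SGP4 error code {err}")
    return np.array(r)

def mae_rmse_km(pred, true):
    """MAE and RMSE (km) over N predicted vs. reference position vectors."""
    d = np.linalg.norm(np.asarray(pred) - np.asarray(true), axis=1)
    return d.mean(), np.sqrt((d ** 2).mean())
```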

02

ConjunctionNet

Benchmark for collision probability prediction. Given two objects' orbital states and uncertainties, predict probability of collision and time of closest approach. Evaluated on conjunction events computed from the public catalog.

Task: Conjunction analysis
Source: Conjunction Events Dataset

Metrics: Brier Score, calibration, TCA error
Baseline: Alfano method, Monte Carlo
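
As an illustration of the metrics, a hedged sketch of the Brier score and a simple reliability (calibration) curve; the bin count and variable names are assumptions, not the official harness.

```python
import numpy as np

def brier_score(p_pred, y_true):
    """Mean squared error between predicted collision probability and the 0/1 outcome."""
    return float(np.mean((np.asarray(p_pred) - np.asarray(y_true)) ** 2))

def reliability_curve(p_pred, y_true, n_bins=10):
    """Per-bin (mean predicted probability, observed event rate) pairs.
    A well-calibrated model tracks the diagonal."""
    p, y = np.asarray(p_pred), np.asarray(y_true)
    bins = np.clip((p * n_bins).astype(int), 0, n_bins - 1)
    return [(p[bins == b].mean(), y[bins == b].mean())
            for b in range(n_bins) if (bins == b).any()]
```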

03

ManeuverDetect

Benchmark for satellite maneuver detection. Given a sequence of TLE observations, identify when a maneuver occurred and classify its type. Ground truth derived from TLE discontinuity analysis and public conjunction warnings.

Task: Maneuver detection
Source: Maneuver Detection Dataset

Metrics: F1 Score, detection latency, classification accuracy
Baseline: Threshold-based detection on orbital element changes
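
To make the baseline concrete, a toy threshold detector: convert each TLE's mean motion to a semi-major axis and flag jumps between consecutive epochs. The 50 m threshold is an illustrative assumption.

```python
import numpy as np

MU_EARTH = 398600.4418  # Earth's gravitational parameter, km^3/s^2

def semi_major_axis_km(mean_motion_rev_per_day):
    """Semi-major axis from TLE mean motion via a = (mu / n^2)^(1/3)."""
    n = np.asarray(mean_motion_rev_per_day) * 2 * np.pi / 86400.0  # rad/s
    return (MU_EARTH / n ** 2) ** (1.0 / 3.0)

def detect_maneuvers(mean_motion_rev_per_day, threshold_km=0.05):
    """Indices of TLEs whose semi-major axis jumped more than `threshold_km`
    from the previous epoch, a crude in-track burn signature."""
    a = semi_major_axis_km(mean_motion_rev_per_day)
    return np.nonzero(np.abs(np.diff(a)) > threshold_km)[0] + 1
```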

04

SatClass

Benchmark for satellite classification from orbital behavior. Given only orbital elements over time, classify satellite type and operational status. Labels from UCS Satellite Database and public catalogs.

Task: Classification
Source: Satellite Classification Dataset

Metrics: Accuracy, F1 per class, confusion matrix
Baseline: Rule-based classification on orbital parameters
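
A toy version of the rule-based baseline, assigning an orbital regime from semi-major axis and eccentricity alone; the altitude thresholds follow common regime definitions and are assumptions, not the benchmark's label scheme.

```python
R_EARTH_KM = 6378.137  # Earth's equatorial radius

def classify_regime(a_km: float, ecc: float) -> str:
    """Crude regime label from two orbital elements."""
    if ecc > 0.25:
        return "HEO"             # highly elliptical (e.g. Molniya-like)
    alt = a_km - R_EARTH_KM      # mean altitude above the equator
    if alt < 2000.0:
        return "LEO"
    if abs(alt - 35786.0) < 500.0:
        return "GEO"             # near the geostationary altitude
    if alt < 35786.0:
        return "MEO"
    return "OTHER"
```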

05

ReentryPredict

Benchmark for atmospheric reentry prediction. Given an object's decaying orbit, predict reentry time window. Evaluated on historical reentry events with known outcomes.

Task: Reentry prediction
Source: Historical reentry records

Metrics: Time window accuracy, window width
Baseline: Numerical propagation with NRLMSISE-00 atmosphere
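
Both metrics are simple to state in code; this sketch assumes predicted windows arrive as datetime pairs.

```python
from datetime import datetime

def window_hit(t_true: datetime, t_lo: datetime, t_hi: datetime) -> bool:
    """True if the observed reentry epoch falls inside the predicted window."""
    return t_lo <= t_true <= t_hi

def window_width_hours(t_lo: datetime, t_hi: datetime) -> float:
    """Width of the predicted window; narrower windows at equal hit rate win."""
    return (t_hi - t_lo).total_seconds() / 3600.0
```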

Based on Simulation Data

Distributed Compute Benchmarks

Benchmarks for federated learning, model partitioning, and coordination under space-like constraints. Built on our simulation experiments.

01

FedSpace

Benchmark for federated learning under bandwidth constraints and intermittent connectivity. Evaluate gradient compression and aggregation strategies on standard ML tasks with simulated Earth-space network conditions.

Task: Federated learning
Source: FL Experiment Logs

Metrics: Final accuracy, communication cost, convergence rate
Baseline: FedAvg with Top-K sparsification
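
A minimal NumPy sketch of the baseline, FedAvg over Top-K-sparsified client updates; flat update vectors, the value of k, and equal client weighting are placeholder assumptions.

```python
import numpy as np

def top_k_sparsify(update, k):
    """Keep the k largest-magnitude entries (what would be uplinked), zero the rest."""
    sparse = np.zeros_like(update)
    idx = np.argpartition(np.abs(update), -k)[-k:]
    sparse[idx] = update[idx]
    return sparse

def fedavg_round(client_updates, k):
    """One aggregation round: sparsify each client's update, then average."""
    return np.mean([top_k_sparsify(u, k) for u in client_updates], axis=0)
```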

02

PartitionBench

Benchmark for neural network partitioning across distributed infrastructure. Given a model architecture and infrastructure constraints (latency, bandwidth, compute), find optimal layer placement.

Task: Model splitting
Source: Partitioning Results Dataset

Metrics: End-to-end latency, bandwidth utilization, accuracy preservation
Baseline: Greedy partitioning by layer compute cost
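
The greedy baseline can be sketched as one pass over the layers, cutting to the next device whenever the running cost would exceed an even share of the total; the layer costs and device count here are illustrative.

```python
def greedy_partition(layer_costs, n_devices):
    """Return the device index assigned to each layer, in order."""
    target = sum(layer_costs) / n_devices  # even share of total compute
    assignment, device, load = [], 0, 0.0
    for cost in layer_costs:
        if load + cost > target and device < n_devices - 1:
            device, load = device + 1, 0.0  # cut here, move to the next device
        assignment.append(device)
        load += cost
    return assignment

# Example: six layers over two devices.
# greedy_partition([1, 1, 4, 1, 1, 1], 2) -> [0, 0, 1, 1, 1, 1]
```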

03

SyncSchedule

Benchmark for data synchronization scheduling with intermittent connectivity. Given limited bandwidth windows and data with varying freshness requirements, optimize what to sync when.

Task: Synchronization
Source: GS Visibility Dataset

Metrics: Data freshness, bandwidth utilization, priority adherence
Baseline: Priority queue with FIFO tie-breaking
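
The baseline is a textbook structure: a max-priority heap with a monotonically increasing sequence number as tie-breaker, so equal-priority items drain in arrival order. A minimal sketch:

```python
import heapq
import itertools

class SyncQueue:
    """Pop the highest-priority item first; equal priorities drain FIFO."""
    def __init__(self):
        self._heap = []
        self._seq = itertools.count()  # monotonic tie-breaker

    def push(self, priority: int, item):
        # Negate priority so larger values pop first from Python's min-heap.
        heapq.heappush(self._heap, (-priority, next(self._seq), item))

    def pop(self):
        return heapq.heappop(self._heap)[2]
```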

04

CompressionBench

Benchmark for gradient compression techniques. Evaluate compression methods on downstream model accuracy across different compression ratios and network architectures.

Task: Gradient compression
Source: Compression Benchmarks Dataset

Metrics: Compression ratio, reconstruction error, final model accuracy
Baseline: Top-K sparsification, random sparsification
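
The headline measurements reduce to a few lines; this sketch assumes flat gradient vectors and demonstrates random sparsification at a fixed 10x ratio.

```python
import numpy as np

def compression_ratio(n_dense, n_sent):
    """Dense elements per element actually transmitted."""
    return n_dense / n_sent

def relative_reconstruction_error(g, g_hat):
    """||g - g_hat|| / ||g||; lower is better at a fixed ratio."""
    return float(np.linalg.norm(g - g_hat) / np.linalg.norm(g))

# Example: random sparsification keeping 10% of entries.
rng = np.random.default_rng(0)
g = rng.standard_normal(10_000)
keep = rng.choice(g.size, g.size // 10, replace=False)
g_hat = np.zeros_like(g)
g_hat[keep] = g[keep]
print(compression_ratio(g.size, keep.size),
      relative_reconstruction_error(g, g_hat))
```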

Evaluation protocol

Rigorous evaluation ensures fair comparison across submissions.

Data splits

Fixed train/validation/test splits with temporal separation for time-series tasks. Test data is held out: you submit predictions, we evaluate.
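
For tasks with one epoch per observation, a leakage-free temporal split could look like the sketch below; the `epoch` column name and string cutoffs are assumptions, not the released schema.

```python
import pandas as pd

def temporal_split(df: pd.DataFrame, val_cutoff: str, test_cutoff: str):
    """Split on time so no test-period information reaches training."""
    epoch = pd.to_datetime(df["epoch"])
    train = df[epoch < val_cutoff]
    val = df[(epoch >= val_cutoff) & (epoch < test_cutoff)]
    test = df[epoch >= test_cutoff]
    return train, val, test

# e.g. temporal_split(df, "2024-01-01", "2024-07-01")
```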

Metrics

Domain-appropriate metrics for each task. Position error in kilometers, probability calibration, detection latency, classification accuracy.

Baselines

Reference implementations with published scores. SGP4 for orbit prediction, Alfano for conjunction analysis, threshold detection for maneuvers.

Reproducibility

Evaluation code provided. Run locally to validate before submission. Deterministic evaluation with fixed random seeds.

How to participate

01

Download the data

Get the training and validation sets from our data portal. Test sets are held out for fair evaluation.

02

Train your model

Use any approach: ML, physics-based, or hybrid. We evaluate results, not methods.

03

Submit predictions

Upload predictions for the test set inputs. We evaluate against held-out labels and return your scores.

04

Compare results

See how your model compares to baselines and other submissions. Optionally publish to the leaderboard.

Benchmark overview

Orbital Intelligence Benchmarks

| Benchmark | Task | Data Source | Primary Metric |
|---|---|---|---|
| OrbML-Predict | Orbit prediction | Space-Track TLEs | MAE @ 7 days |
| ConjunctionNet | Collision probability | Conjunction Events Dataset | Brier Score |
| ManeuverDetect | Maneuver detection | Maneuver Detection Dataset | F1 Score |
| SatClass | Classification | Satellite Classification Dataset | Accuracy |
| ReentryPredict | Reentry prediction | Historical reentries | Window accuracy |

Distributed Compute Benchmarks

| Benchmark | Task | Data Source | Primary Metric |
|---|---|---|---|
| FedSpace | Federated learning | FL Experiment Logs | Accuracy / comm. cost |
| PartitionBench | Model partitioning | Partitioning Results | End-to-end latency |
| SyncSchedule | Synchronization | GS Visibility Dataset | Data freshness |
| CompressionBench | Gradient compression | Compression Benchmarks | Compression ratio |

Want early access to benchmarks?

Contact us to be notified when benchmarks launch, or to discuss research collaboration.