Research
Benchmarks for orbital ML
Standard evaluation benchmarks for orbit prediction, conjunction analysis, and distributed space computing. Compare your models against baselines.
Why benchmarks matter
The orbital ML field lacks standard evaluation protocols. We're working to change that.
Reproducibility
Fixed test sets, documented metrics, and reference implementations. Results you can verify and build upon.
Fair comparison
Compare approaches on identical data splits with standardized evaluation. No cherry-picking or hidden advantages.
Clear baselines
Know exactly what you're improving upon. SGP4, threshold detection, and other standard approaches serve as reference points.
Launching Q2 2026
We're finalizing benchmark datasets and evaluation infrastructure. Contact us to be notified at launch or to discuss research collaboration.
Orbital Intelligence Benchmarks
Benchmarks for satellite tracking, conjunction analysis, and space situational awareness. Built on publicly available TLE data.
OrbML-Predict
Benchmark for orbit prediction models. Given historical TLE observations, predict future orbital elements. Evaluated at 1-day, 7-day, and 30-day horizons across LEO, MEO, and GEO regimes. The test set is held out using a temporal split.
Metrics: MAE (km), RMSE (km), position error by orbital regime
Baseline: SGP4 propagator
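As a rough illustration of the position-error metrics, the sketch below computes MAE and RMSE in kilometers and groups MAE by orbital regime. It assumes the error for each object is the Euclidean distance between predicted and reference position vectors at the evaluation horizon. The function names and sample coordinates are hypothetical and are not part of the benchmark's evaluation code.

```python
import math
from collections import defaultdict

def position_errors_km(predicted, reference):
    """Euclidean distance in km between each predicted and reference position."""
    return [math.dist(p, r) for p, r in zip(predicted, reference)]

def mae(errors):
    """Mean absolute error over a list of non-negative distances."""
    return sum(errors) / len(errors)

def rmse(errors):
    """Root mean squared error over a list of distances."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))

def mae_by_regime(errors, regimes):
    """Group errors by regime label (e.g. "LEO") and report MAE per group."""
    grouped = defaultdict(list)
    for error, regime in zip(errors, regimes):
        grouped[regime].append(error)
    return {regime: mae(values) for regime, values in grouped.items()}

# Made-up positions (x, y, z) in km, for illustration only.
predicted = [(7000.0, 0.0, 0.0), (0.0, 7003.0, 4.0), (42164.0, 0.0, 12.0)]
reference = [(7000.0, 3.0, 4.0), (0.0, 7000.0, 0.0), (42164.0, 5.0, 0.0)]
regimes = ["LEO", "LEO", "GEO"]

errors = position_errors_km(predicted, reference)
print("MAE (km):", mae(errors))
print("RMSE (km):", rmse(errors))
print("MAE by regime:", mae_by_regime(errors, regimes))
```

RMSE weights large misses more heavily than MAE, so reporting both helps separate a model that is slightly off everywhere from one that is occasionally far off.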
ConjunctionNet
Benchmark for collision probability prediction. Given two objects' orbital states and uncertainties, predict the probability of collision and the time of closest approach (TCA). Evaluated on conjunction events computed from the public catalog.
Metrics: Brier Score, calibration, TCA error
Baseline: Alfano method, Monte Carlo
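To make the probability metrics concrete, here is a minimal sketch of a Brier score and a simple reliability table. It assumes binary outcome labels (1 for an event counted as a collision or threshold breach, 0 otherwise). The bin edges and sample values are hypothetical, and the benchmark may define calibration differently.

```python
def brier_score(probs, outcomes):
    """Mean squared difference between predicted probability and the 0/1 outcome."""
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)

def reliability_table(probs, outcomes, edges):
    """For each non-empty bin [edges[i], edges[i+1]), return
    (mean predicted probability, observed event rate, count)."""
    rows = []
    for low, high in zip(edges[:-1], edges[1:]):
        members = [(p, o) for p, o in zip(probs, outcomes) if low <= p < high]
        if members:
            mean_p = sum(p for p, _ in members) / len(members)
            rate = sum(o for _, o in members) / len(members)
            rows.append((mean_p, rate, len(members)))
    return rows

# Made-up predictions and labels, for illustration only.
probs = [0.9, 0.1, 0.8, 0.3]
outcomes = [1, 0, 1, 0]

score = brier_score(probs, outcomes)
table = reliability_table(probs, outcomes, edges=[0.0, 0.5, 1.0001])
print("Brier score:", score)
print("Reliability rows:", table)
```

Real collision probabilities are often very small, so logarithmically spaced bin edges may be a better fit than the two coarse bins used here.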
ManeuverDetect
Benchmark for satellite maneuver detection. Given a sequence of TLE observations, identify when a maneuver occurred and classify its type. Ground truth is derived from TLE discontinuity analysis and public conjunction warnings.
Metrics: F1 Score, detection latency, classification accuracy
Baseline: Threshold-based detection on orbital element changes
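A threshold baseline of this kind could look like the sketch below, which flags an observation when the semi-major axis changes by more than a fixed amount since the previous observation. The choice of semi-major axis, the 1 km threshold, and the sample series are assumptions for illustration; the reference baseline may use other elements or thresholds.

```python
def detect_maneuvers(epochs, semi_major_axes_km, threshold_km=1.0):
    """Return the epochs at which the semi-major axis jumped by more than
    threshold_km relative to the previous observation."""
    flagged = []
    for i in range(1, len(epochs)):
        jump = abs(semi_major_axes_km[i] - semi_major_axes_km[i - 1])
        if jump > threshold_km:
            flagged.append(epochs[i])
    return flagged

# Made-up series: slow drag decay, then an orbit raise before day 3.
epochs_days = [0, 1, 2, 3, 4]
sma_km = [6878.0, 6877.9, 6877.8, 6880.3, 6880.2]

detections = detect_maneuvers(epochs_days, sma_km)
print("Maneuver flagged at day(s):", detections)
```

A fixed threshold is easy to reason about but trades false alarms from noisy TLE fits against missed low-thrust maneuvers, which is part of what makes it a useful reference point.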
SatClass
Benchmark for satellite classification from orbital behavior. Given only orbital elements over time, classify satellite type and operational status. Labels from UCS Satellite Database and public catalogs.
Metrics: Accuracy, F1 per class, confusion matrix
Baseline: Rule-based classification on orbital parameters
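For the classification metrics, a minimal sketch of confusion counts and per-class F1 follows. The class labels are hypothetical placeholders, not the benchmark's label set.

```python
from collections import Counter

def confusion_counts(y_true, y_pred):
    """Count occurrences of each (true label, predicted label) pair."""
    return Counter(zip(y_true, y_pred))

def f1_per_class(y_true, y_pred):
    """F1 = 2*TP / (2*TP + FP + FN), computed separately for each label."""
    labels = sorted(set(y_true) | set(y_pred))
    counts = confusion_counts(y_true, y_pred)
    scores = {}
    for label in labels:
        tp = counts[(label, label)]
        fp = sum(counts[(other, label)] for other in labels if other != label)
        fn = sum(counts[(label, other)] for other in labels if other != label)
        denom = 2 * tp + fp + fn
        scores[label] = 2 * tp / denom if denom else 0.0
    return scores

# Made-up labels, for illustration only.
y_true = ["comms", "comms", "imaging", "debris"]
y_pred = ["comms", "imaging", "imaging", "debris"]

f1_scores = f1_per_class(y_true, y_pred)
print(f1_scores)
```

Per-class F1 matters here because catalogs are typically imbalanced, and overall accuracy can hide poor performance on rare classes.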
ReentryPredict
Benchmark for atmospheric reentry prediction. Given an object's decaying orbit, predict reentry time window. Evaluated on historical reentry events with known outcomes.
Metrics: Time window accuracy, window width
Baseline: Numerical propagation with NRLMSISE-00 atmosphere
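The two reentry metrics pull in opposite directions, which the sketch below tries to show. It assumes "time window accuracy" means the fraction of events whose actual reentry time falls inside the predicted window, and "window width" means the mean window length. Both definitions, and the sample numbers, are assumptions rather than the benchmark's published definitions.

```python
def window_metrics(windows, actual_times):
    """Return (hit rate, mean window width).

    windows: list of (start, end) predictions, in hours on a shared time axis.
    actual_times: observed reentry times on the same axis.
    """
    hits = sum(
        1 for (start, end), actual in zip(windows, actual_times)
        if start <= actual <= end
    )
    widths = [end - start for start, end in windows]
    return hits / len(windows), sum(widths) / len(widths)

# Made-up predictions and outcomes, in hours from a common reference epoch.
windows = [(10.0, 14.0), (20.0, 22.0), (5.0, 9.0)]
actual_times = [12.0, 23.0, 9.0]

hit_rate, mean_width = window_metrics(windows, actual_times)
print("Window accuracy:", hit_rate)
print("Mean window width (h):", mean_width)
```

A model can always raise its hit rate by predicting wider windows, so the two numbers are only meaningful when read together.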
Distributed Compute Benchmarks
Benchmarks for federated learning, model partitioning, and coordination under space-like constraints. Built on our simulation experiments.
FedSpace
Benchmark for federated learning under bandwidth constraints and intermittent connectivity. Evaluate gradient compression and aggregation strategies on standard ML tasks with simulated Earth-space network conditions.
Metrics: Final accuracy, communication cost, convergence rate
Baseline: FedAvg with Top-K sparsification
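As a sketch of what a FedAvg-with-Top-K baseline involves, the code below sparsifies each client's update to its k largest-magnitude entries and then averages the sparse updates, weighting each client by its local sample count. It omits error feedback and any network simulation, and all names and values are hypothetical.

```python
def top_k_sparsify(update, k):
    """Keep the k largest-magnitude entries, returned as {index: value}."""
    ranked = sorted(range(len(update)), key=lambda i: abs(update[i]), reverse=True)
    return {i: update[i] for i in ranked[:k]}

def fedavg(sparse_updates, sample_counts, dim):
    """Weighted average of sparse client updates.

    Each client's weight is its share of the total sample count;
    entries a client did not send are treated as zero.
    """
    total = sum(sample_counts)
    aggregated = [0.0] * dim
    for update, count in zip(sparse_updates, sample_counts):
        for index, value in update.items():
            aggregated[index] += value * count / total
    return aggregated

# Two made-up client updates over a 4-parameter model.
client_a = [0.5, -0.1, 0.0, 2.0]
client_b = [-1.0, 0.2, 0.3, 0.1]

sparse = [top_k_sparsify(client_a, k=2), top_k_sparsify(client_b, k=2)]
aggregated = fedavg(sparse, sample_counts=[100, 300], dim=4)
print(aggregated)
```

Sending only k of the entries cuts uplink volume roughly in proportion, at the cost of dropping small updates, which is the accuracy versus communication trade-off the benchmark measures.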
PartitionBench
Benchmark for neural network partitioning across distributed infrastructure. Given a model architecture and infrastructure constraints (latency, bandwidth, compute), find optimal layer placement.
Metrics: End-to-end latency, bandwidth utilization, accuracy preservation
Baseline: Greedy partitioning by layer compute cost
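One plausible reading of a greedy baseline is sketched below: walk the layers in order and keep placing them on the current node until its compute budget would be exceeded, then move to the next node. This interpretation, the cost units, and the capacities are assumptions. The sketch ignores latency and bandwidth between nodes, which a real submission would need to account for.

```python
def greedy_partition(layer_costs, node_capacities):
    """Assign consecutive layers to nodes in order.

    Returns a list where placement[i] is the node index for layer i.
    Raises ValueError if the layers cannot fit in the given capacities.
    """
    placement = []
    node = 0
    used = 0.0
    for cost in layer_costs:
        # Advance to the next node whenever this layer would overflow the current one.
        while node < len(node_capacities) and used + cost > node_capacities[node]:
            node += 1
            used = 0.0
        if node == len(node_capacities):
            raise ValueError("layers do not fit within the given node capacities")
        placement.append(node)
        used += cost
    return placement

# Made-up per-layer compute costs and per-node budgets, in arbitrary units.
layer_costs = [4, 3, 5, 2, 6]
node_capacities = [8, 8, 8]

placement = greedy_partition(layer_costs, node_capacities)
print(placement)
```

Keeping layers contiguous means activations cross a link only at partition boundaries, which keeps the baseline simple even though it is unlikely to be optimal.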
SyncSchedule
Benchmark for data synchronization scheduling with intermittent connectivity. Given limited bandwidth windows and data with varying freshness requirements, optimize what to sync when.
Metrics: Data freshness, bandwidth utilization, priority adherence
Baseline: Priority queue with FIFO tie-breaking
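A priority queue with FIFO tie-breaking can be sketched with Python's `heapq` and a monotonically increasing counter, so that items of equal priority leave in insertion order. The class name, the convention that lower numbers mean higher priority, and the per-window bandwidth budget are assumptions made for illustration.

```python
import heapq
import itertools

class SyncQueue:
    """Priority queue where equal-priority items are served first in, first out."""

    def __init__(self):
        self._heap = []
        self._counter = itertools.count()

    def push(self, priority, item_id, size_mb):
        # Lower priority number is served first; the counter breaks ties in FIFO order.
        heapq.heappush(self._heap, (priority, next(self._counter), item_id, size_mb))

    def drain(self, budget_mb):
        """Send items in queue order until the next item no longer fits the window."""
        sent = []
        while self._heap and self._heap[0][3] <= budget_mb:
            _, _, item_id, size_mb = heapq.heappop(self._heap)
            budget_mb -= size_mb
            sent.append(item_id)
        return sent

queue = SyncQueue()
queue.push(1, "telemetry", 10)
queue.push(0, "alert", 1)
queue.push(1, "imagery", 50)
queue.push(1, "logs", 5)

first_window = queue.drain(budget_mb=20)
second_window = queue.drain(budget_mb=60)
print(first_window, second_window)
```

In this sketch a large item at the head of the queue blocks smaller items behind it, which is a natural weakness for submissions to improve on.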
CompressionBench
Benchmark for gradient compression techniques. Evaluate compression methods on downstream model accuracy across different compression ratios and network architectures.
Metrics: Compression ratio, reconstruction error, final model accuracy
Baseline: Top-K sparsification, random sparsification
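The two sparsification baselines and two of the metrics can be compared in a few lines. This sketch assumes compression ratio is the number of original entries divided by the number kept (ignoring index overhead) and reconstruction error is the L2 norm of the difference. The gradient values are made up.

```python
import math
import random

def top_k_sparsify(grad, k):
    """Zero all but the k largest-magnitude entries."""
    order = sorted(range(len(grad)), key=lambda i: abs(grad[i]), reverse=True)
    keep = set(order[:k])
    return [g if i in keep else 0.0 for i, g in enumerate(grad)]

def random_sparsify(grad, k, rng):
    """Zero all but k entries chosen uniformly at random."""
    keep = set(rng.sample(range(len(grad)), k))
    return [g if i in keep else 0.0 for i, g in enumerate(grad)]

def l2_error(original, compressed):
    """L2 norm of the element-wise difference."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(original, compressed)))

grad = [0.1, -3.0, 0.2, 4.0, -0.05, 0.0, 0.3, -0.1]
k = 2
rng = random.Random(0)  # fixed seed so the random baseline is repeatable

ratio = len(grad) / k
top_err = l2_error(grad, top_k_sparsify(grad, k))
rand_err = l2_error(grad, random_sparsify(grad, k, rng))
print("Compression ratio:", ratio)
print("Top-K error:", top_err, "Random error:", rand_err)
```

For a fixed k, keeping the largest-magnitude entries minimizes L2 reconstruction error, so Top-K never does worse than random selection on this metric. Final model accuracy is the metric where the comparison becomes less obvious.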
Evaluation protocol
Rigorous evaluation ensures fair comparison across submissions.
Data splits
Fixed train/validation/test splits with temporal separation for time-series tasks. Test data is held out: you submit predictions, and we evaluate them.
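A temporal split of this kind can be sketched as follows: every record before one cutoff goes to training, records between the two cutoffs go to validation, and everything after goes to test, so no future observation leaks into an earlier split. The cutoff dates and record fields are hypothetical.

```python
from datetime import date

def temporal_split(records, train_end, val_end):
    """Split records by epoch: train < train_end <= validation < val_end <= test."""
    train = [r for r in records if r["epoch"] < train_end]
    val = [r for r in records if train_end <= r["epoch"] < val_end]
    test = [r for r in records if r["epoch"] >= val_end]
    return train, val, test

# Made-up observation records, for illustration only.
records = [
    {"epoch": date(2024, 1, 10), "object_id": "A"},
    {"epoch": date(2024, 6, 30), "object_id": "B"},
    {"epoch": date(2024, 7, 1), "object_id": "A"},
    {"epoch": date(2024, 9, 15), "object_id": "B"},
    {"epoch": date(2024, 10, 1), "object_id": "A"},
    {"epoch": date(2024, 12, 20), "object_id": "B"},
]

train, val, test = temporal_split(
    records, train_end=date(2024, 7, 1), val_end=date(2024, 10, 1)
)
print(len(train), len(val), len(test))
```

Splitting by time rather than at random matters for orbital data because consecutive observations of the same object are strongly correlated.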
Metrics
Domain-appropriate metrics for each task. Position error in kilometers, probability calibration, detection latency, classification accuracy.
Baselines
Reference implementations with published scores. SGP4 for orbit prediction, Alfano for conjunction analysis, threshold detection for maneuvers.
Reproducibility
Evaluation code provided. Run locally to validate before submission. Deterministic evaluation with fixed random seeds.
How to participate
Download the data
Get the training and validation sets from our data portal. Test sets are held out for fair evaluation.
Train your model
Use any approach: ML, physics-based, or hybrid. We evaluate results, not methods.
Submit predictions
Upload predictions for the test set inputs. We evaluate against held-out labels and return your scores.
Compare results
See how your model compares to baselines and other submissions. Optionally publish to the leaderboard.
Benchmark overview
Orbital Intelligence Benchmarks
| Benchmark | Task | Data Source | Primary Metric |
|---|---|---|---|
| OrbML-Predict | Orbit prediction | Space-Track TLEs | MAE @ 7 days |
| ConjunctionNet | Collision probability | Conjunction Events Dataset | Brier Score |
| ManeuverDetect | Maneuver detection | Maneuver Detection Dataset | F1 Score |
| SatClass | Classification | Satellite Classification Dataset | Accuracy |
| ReentryPredict | Reentry prediction | Historical reentries | Window accuracy |
Distributed Compute Benchmarks
| Benchmark | Task | Data Source | Primary Metric |
|---|---|---|---|
| FedSpace | Federated learning | FL Experiment Logs | Accuracy / communication cost |
| PartitionBench | Model partitioning | Partitioning Results | End-to-end latency |
| SyncSchedule | Synchronization | GS Visibility Dataset | Data freshness |
| CompressionBench | Gradient compression | Compression Benchmarks | Compression ratio |
Want early access to benchmarks?
Contact us to be notified when benchmarks launch, or to discuss research collaboration.