Compute Isn't a Placement Problem Anymore
For fifty years, distributed systems have asked one question: where should this workload run? Orbital compute forces a different question, and it's going to change how we build infrastructure on Earth too.
The whole of modern distributed systems rests on a single question.
Where should this workload run?
Every piece of cloud infrastructure you’ve ever touched (Kubernetes, Borg, Mesos, Ray, Spark, Airflow, whatever flavor of scheduler you prefer) is an elaborate answer to that question. You describe a workload. The scheduler knows your machines. Somewhere in the middle, a placement decision gets made. The rest is plumbing.
This worked because the world cooperated. Machines stayed put. Power was constant. Network was, if you squinted, always there. A node that was available at 10:00:01 was overwhelmingly likely to still be available at 10:00:02. Placement was a spatial decision: this workload on that machine. Time entered the picture only as an afterthought: retries, timeouts, cron.
Then some of us started trying to run compute in orbit, and the model broke in a way that I don’t think the industry has fully registered yet.
What breaks
A satellite in low-Earth orbit goes around the planet every ninety minutes. For roughly thirty of those minutes, it’s in Earth’s shadow. No sun, no power, nothing in the solar panel budget. If you have a 340W GPU running inference, you need to decide in advance, with no graceful fallback, what to do with it when the eclipse hits.
A scheduler in orbit has to respect three constraint dimensions, each varying on a different time scale. Solar power follows the sun-eclipse cycle. Ground bandwidth is zero except during discrete pass windows measured in minutes. Thermal headroom responds to recent compute load. None of these are scalars you can provision against. They are functions of time, and the plan has to move with them.
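To make that concrete, here is a minimal sketch of a time-varying resource model. Every name in it is made up and every number is a ballpark placeholder, not anyone's flight model:

```python
from dataclasses import dataclass
from typing import Callable

# Illustrative sketch: each resource is a function of time, not a scalar.

@dataclass
class ConstraintProfile:
    solar_power_w: Callable[[float], float]       # follows the sun-eclipse cycle
    downlink_mbps: Callable[[float], float]       # zero outside pass windows
    thermal_headroom_w: Callable[[float], float]  # depends on recent load

ORBIT_S = 90 * 60      # ~90-minute orbit
ECLIPSE_S = 30 * 60    # ~30 minutes of it in shadow

def toy_solar_power(t: float) -> float:
    """Sunlit for roughly an hour, then eclipse: 850W drops to single digits."""
    return 850.0 if (t % ORBIT_S) < (ORBIT_S - ECLIPSE_S) else 8.0

def toy_downlink(t: float, passes=((40 * 60, 50 * 60),)) -> float:
    """Full rate only inside discrete pass windows; ~100 Mbps is roughly
    consistent with moving 7.5 GB over a ten-minute X-band pass."""
    return 100.0 if any(a <= (t % 86400) < b for a, b in passes) else 0.0
```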
You will also have line-of-sight to any given ground station for maybe ten to twelve minutes per pass, a few passes per day. An Earth observation satellite today produces about 1 TB per day. A typical X-band pass moves roughly 7.5 GB. Do the arithmetic: that’s roughly a 130:1 mismatch between what the sensor produces in a day and what a single pass can move, and the gap widens every time someone puts a better sensor in orbit.
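Spelled out, using the figures above (the passes-per-day count is an assumption for illustration):

```python
daily_production_gb = 1000.0   # ~1 TB of sensor data per day
per_pass_gb = 7.5              # typical X-band pass
passes_per_day = 4             # "a few passes per day" -- assumed for illustration

print(daily_production_gb / per_pass_gb)                     # ~133x: a day's data vs. one pass
print(daily_production_gb / (passes_per_day * per_pass_gb))  # ~33x: still far short with 4 passes
```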
The hardware itself has to be treated as unreliable. Cosmic rays flip bits. Bit flips aren’t rare events in LEO; they’re baseline environmental noise. Single-event upsets are treated as exceptional on Earth. In orbit, a machine that isn’t handling them continuously is a machine that’s about to corrupt something.
None of this is exotic. None of it is a surprise. It’s just that every scheduler ever written assumes the opposite.
Kubernetes doesn’t have an eclipse primitive. Ray has no concept of a ground station pass. PyTorch distributed doesn’t know what a thermal budget is. You cannot adapt these systems by tuning them. The assumptions are wrong in the bones.
The real shift
Here is the thing that took me the longest to see clearly, and that I now think is the single most important idea in this space.
Placement was never a spatial problem. It was a temporal problem that happened to be stable enough to look spatial.
The cloud scheduler picks a machine. The orbital planner picks a trajectory through time. The same workload description, two fundamentally different decisions. Everything downstream of the choice of primitive follows from this.
On Earth, resources don’t fluctuate much over the time horizon of a scheduling decision. Power is there. Bandwidth is there. Nodes stay nodes. So the entire body of distributed systems literature flattened time out of the equation and reasoned about space. “Which machine?” became the question, because “when” was boring.
In orbit, time isn’t boring. The satellite that has 850W of solar available at 10:00:01 has 8W at 10:18:00, because it just crossed the terminator. The bandwidth available to node 4 over the next hour is a discrete schedule of pass windows, each with its own link budget. The thermal headroom on node 7 is a function of what you asked it to do three minutes ago.
The old model asks: where should this run?
The new model asks: when and where can this run, given how compute, power, bandwidth, and topology will change over the next several hours?
That is not a tweak. It is a different kind of problem. It belongs to a different branch of computer science. In the classical scheduler model, resources are static and workloads are variable. In what we’re now building, resources are variable and workloads are planned against that variability. The thing you’re optimizing over isn’t a machine. It’s a trajectory.
We’ve been calling this Constraint-Aware Execution, or CAE, because that’s what it is: execution that treats the environmental constraint function as a first-class input rather than an inconvenience handled by retry logic. The name matters less than the shift. The shift is that compute, in this regime, is a planning problem that unfolds over time.
What this looks like in practice
In March we shipped the first API that actually does this. You hand it a satellite and a workload. It returns a plan. A real execution plan computed from real orbital mechanics, not a mockup.
SGP4 propagation on the satellite’s current TLE. Eclipse windows from a cylindrical shadow model. Ground station geometry for twelve real stations: Svalbard, Fairbanks, Awarua, Santiago, and eight more. Link budgets per pass. FEC overhead. Encryption overhead. The actual arithmetic of moving a workload through the space-ground boundary.
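For a feel of the geometry layer, here is a rough sketch of the eclipse check, using the open-source sgp4 Python package. It assumes you supply a current TLE and a sun unit vector from an ephemeris (Skyfield, for instance) in the same TEME frame; the function names are mine, not our API, and the production stack does considerably more than this:

```python
import numpy as np
from sgp4.api import Satrec, jday

R_EARTH_KM = 6378.137

def in_cylindrical_shadow(r_sat_km: np.ndarray, sun_unit: np.ndarray) -> bool:
    """Cylindrical shadow model: eclipsed when the satellite is on the
    anti-sun side of Earth and inside a cylinder of Earth's radius."""
    along_sun = float(np.dot(r_sat_km, sun_unit))
    if along_sun > 0:                        # sunlit side of the planet
        return False
    perp = r_sat_km - along_sun * sun_unit   # offset from the Earth-sun axis
    return float(np.linalg.norm(perp)) < R_EARTH_KM

def eclipse_samples(tle_line1: str, tle_line2: str, sun_unit: np.ndarray,
                    year: int, month: int, day: int, minutes: int = 90):
    """Propagate the TLE with SGP4 and flag each minute as eclipsed or not."""
    sat = Satrec.twoline2rv(tle_line1, tle_line2)
    flags = []
    for m in range(minutes):
        jd, fr = jday(year, month, day, 0, m, 0)
        err, r, _v = sat.sgp4(jd, fr)        # r is the TEME position in km
        if err == 0:
            flags.append(in_cylindrical_shadow(np.array(r), sun_unit))
    return flags
```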
Hand it the ISS and a split-learning workload. It tells you: extract features on-board, downlink 52.82 MB via Awarua at rate 3/4 FEC on the next pass, aggregate gradients on the ground, upload 7.55 MB of updated weights on the Fairbanks pass an orbit later. 99% delivery confidence. 14 MB of FEC overhead. Here is your cost. Here is your power envelope. Here is what happens if the next pass has poor link quality.
(Figure: the resulting CAE plan for the ISS split-learning workload. Feature extraction runs on-board during orbit 1, the output downlinks through Awarua on the first pass, gradients aggregate on the ground, and updated weights upload through Fairbanks on the next orbit. Eclipse windows and ground-station passes are computed from SGP4 propagation, not mocked.)
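In API terms, the returned plan has roughly this shape. The field names below are illustrative, not the literal response schema; the numbers are the ones from the plan above:

```python
# Roughly the shape of a plan -- field names are illustrative, not the real schema.
plan = {
    "satellite": "ISS (ZARYA)",
    "delivery_confidence": 0.99,
    "steps": [
        {"op": "extract_features", "where": "onboard", "window": "orbit 1"},
        {"op": "downlink", "station": "Awarua", "size_mb": 52.82,
         "fec": "rate 3/4", "fec_overhead_mb": 14},
        {"op": "aggregate_gradients", "where": "ground"},
        {"op": "uplink_weights", "station": "Fairbanks", "size_mb": 7.55,
         "window": "next orbit"},
    ],
    "fallback": "re-plan the downlink onto a later pass if link quality degrades",
}
```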
None of this comes from prompting a model. It comes from actually solving the scheduling problem. A dependency graph over time-varying resources, with a cost function that includes energy, thermal headroom, and data movement.
The reason I’m telling you this isn’t that the API is clever. It’s that once you sit down and build this thing, you start to see the shape of a very general problem, and the general problem is much bigger than orbit.
It’s not just orbit
The thing I did not expect when we started building CAE is how much of the design turned out to have nothing specific to space in it.
Eclipse maps to any periodic power interruption. That’s a drone on a battery with a charging schedule. That’s a solar microgrid in the outback. That’s a remote sensor on a 12V panel.
Ground station pass maps to any windowed connectivity. That’s a ship crossing out of satellite coverage. That’s a remote vehicle in a tunnel. That’s an autonomous system in a contested EM environment. That’s a container ship with intermittent uplink.
Thermal budget maps to any system where heat can’t be dissipated freely. That’s dense edge compute in a sealed enclosure. That’s anything in a vacuum or near-vacuum. That’s a car dashboard in summer.
Radiation maps to any hardware whose error profile is continuous rather than discrete. That’s automotive-grade silicon at altitude. That’s anything near a reactor. That’s a very long-running distributed system where tail reliability actually starts to matter.
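To make the mapping concrete: the feasibility check is the same function whether its inputs come from an eclipse model, a battery charge schedule, or a coverage map. A minimal sketch, with made-up names:

```python
from typing import Callable, Sequence

# Illustrative only: the same check covers a satellite in eclipse, a drone on a
# charging schedule, or a sensor behind a coverage gap. Only the inputs change.

def feasible(t: float,
             draw_w: float,
             needs_link: bool,
             power_w: Callable[[float], float],            # eclipse model or battery curve
             dissipation_w: Callable[[float], float],      # sealed box, vacuum, hot cabin
             link_windows: Sequence[tuple[float, float]],  # passes, coverage, tunnels
             ) -> bool:
    powered = power_w(t) >= draw_w
    cool = dissipation_w(t) >= draw_w
    linked = (not needs_link) or any(a <= t < b for a, b in link_windows)
    return powered and cool and linked
```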
If you look at the frontier of where serious engineering is happening right now (autonomous vehicles, robotics, distributed sensing, edge ML, anything with the word “field-deployed” in it) you find the same structural pattern. Resources that vary on human-relevant time scales, operating against workloads that used to assume they didn’t.
Orbit is where this pattern is most extreme and least escapable. You cannot paper over orbital mechanics. You cannot push a hotfix to a satellite in eclipse. The constraints are real and they bite immediately, which is why the software coming out of this problem space is going to be more rigorous than the software coming out of, say, on-prem Kubernetes. You don’t get to hand-wave orbit.
But the techniques that work in orbit are going to flow downhill. They always do. Fault-tolerant distributed systems research came out of places that had to fail gracefully: telephone switching, mainframe transaction processing, early web scale. The rest of us inherited the vocabulary later. Constraint-aware execution will follow the same path.
What the next decade of systems infrastructure looks like
If you buy the argument so far, a few things fall out of it.
The scheduler gets replaced by the planner. Not augmented. Replaced. A scheduler says “run this now, here.” A planner says “here is the execution trajectory that optimizes your cost function given what is knowable about resource availability over the next N hours, with explicit fallbacks for the unknowable part.” These are categorically different things. Kubernetes is a scheduler. CAE is a planner. The difference is not a feature flag.
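A caricature of the difference, with a toy power forecast standing in for the full constraint model. Neither function is an SDK; they only show the shape of the two decisions:

```python
from typing import Callable, Sequence

def schedule(free_cores: Sequence[float]) -> int:
    """Scheduler: pick the machine with the most headroom, right now."""
    return max(range(len(free_cores)), key=lambda i: free_cores[i])

def plan(duration_s: float, draw_w: float,
         power_w: Callable[[float], float],
         horizon_s: float, step_s: float = 60.0) -> list[tuple[float, float]]:
    """Planner: search the horizon for windows where the whole run fits
    inside the power forecast, and return them as candidate trajectories."""
    windows, t = [], 0.0
    steps = int(duration_s // step_s) + 1
    while t + duration_s <= horizon_s:
        if all(power_w(t + k * step_s) >= draw_w for k in range(steps)):
            windows.append((t, t + duration_s))
        t += step_s
    return windows
```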
Static resource models become temporal resource models. Every monitoring, capacity, and placement system built in the last decade assumes node capacity is a scalar per dimension: CPU cores, RAM bytes, network bytes/sec. In the new model, capacity is a function of time and environment. node.cpu isn’t 16 anymore. It’s cpu(t). Everything downstream of that has to change. Billing, SLOs, observability, autoscaling, all of it.
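A small illustration of what that does downstream: once capacity is cpu(t), "how much work can this node do in the next hour" becomes an integral over the forecast rather than cores times seconds. Toy numbers only:

```python
def capacity_cores(t: float) -> float:
    """Toy cpu(t): 16 cores while powered, 0 during a 30-minute outage."""
    return 0.0 if 1800 <= t < 3600 else 16.0

def core_seconds(capacity, start: float, end: float, step: float = 1.0) -> float:
    """Numerically integrate capacity(t) over [start, end)."""
    t, total = start, 0.0
    while t < end:
        total += capacity(t) * step
        t += step
    return total

print(16 * 3600)                              # scalar model: 57,600 core-seconds
print(core_seconds(capacity_cores, 0, 3600))  # temporal model: 28,800 core-seconds
```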
Runtime gets intimate with physics. The runtime layer on Earth has been able to pretend physics doesn’t exist. Electrons flow, packets arrive, and if they don’t, you retry. In orbit the runtime has to know why power is dropping, not just that it’s dropping. That move, from symptom-handling to constraint-modeling, is going to happen everywhere the constraint function gets visible enough to matter. Your next-generation edge runtime will have a thermal model, a battery curve, and a connectivity forecast baked in. This is how you get to systems that actually stay up in the field.
Simulation moves upstream of deployment. You cannot test an orbital system in production. The cost is too high and the iteration loop is too long. That forces a discipline terrestrial software has been able to skip for twenty years: high-fidelity simulation against real physics, as a first-class artifact in the development lifecycle. The orbital compute companies that survive will have simulation environments as sophisticated as their runtimes, because they have no choice. The terrestrial systems that come after will inherit this discipline, hopefully before they need it.
The claim
I’m going to say this plainly so it can’t be misread.
Constraint-Aware Execution is the next primitive in distributed systems. Placement was the primitive of the cloud era. Planning under time-varying constraints is the primitive of whatever we’re calling this next thing. Orbit is where it becomes unavoidable. It’s the forcing function making us build the software properly. But the shape of the problem is general, and the tools coming out of solving it will change how every field-deployed system gets built.
We’re the company working on this. That’s an obvious thing for me to write, so take it with the salt it deserves. But the work is real. The API is live. You can try it at console.rotastellar.com or hit the tracker and schedule a plan against a real satellite. The docs describe how each plan gets constructed, step by step. It is not a demo. It runs.
If you are building anything that will have to operate against resources that change faster than your scheduler can reason about them (in orbit, at the edge, on a battery, at sea, in a contested environment, or on any platform where the old placement assumptions quietly stop holding) the right thing to do is look at this class of problem now, before you have to. You don’t want to be porting a Kubernetes stack into an environment that was never going to tolerate it.
The cloud assumed the world cooperated. The next layer assumes it doesn’t. Everything changes from there.
If you want to talk about this, we’re easy to reach: request early access or contact us directly. Or just try the planner on a satellite you care about and see what falls out.