TL;DR
- Reserved Instances are a 1-year or 3-year commitment to a specific instance type in exchange for a discount, typically in the 40-60 % range against on-demand pricing.
- Spot (or preemptible) instances are sold from spare capacity at a 70-90 % discount, but the provider can reclaim them at very short notice: two minutes on AWS, and as little as 30 seconds on GCP and Azure.
- On-demand is the most flexible and most expensive; suitable for unpredictable workloads where commitment risk is unacceptable.
- The right mix is workload-shaped: steady baseline on reservations, bursty middle on on-demand, fault-tolerant training and batch inference on spot.
The Three Pricing Models
Every major public cloud, and most GPU clouds, offers three broad pricing models for compute: on-demand, committed (Reserved Instances and Savings Plans), and spot. The trade-off is always between commitment, discount, and risk of interruption.
| Model | Discount | Commitment | Interruption risk |
|---|---|---|---|
| On-demand | 0 % | None | None |
| Reserved (1-year) | ~40 % | 12 months | None |
| Reserved (3-year) | ~55-60 % | 36 months | None |
| Savings Plan | Similar to reserved | Compute spend, not SKU | None |
| Spot / preemptible | ~70-90 % | None | High — short notice reclaim |
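To make the table concrete, the sketch below turns the discount column into effective hourly rates. The discount values are illustrative midpoints of the ranges above, and the $10/hour on-demand price is a hypothetical figure, not a quoted rate from any provider.

```python
# Illustrative discounts: midpoints of the ranges in the table, not quoted prices.
DISCOUNTS = {
    "on_demand": 0.00,
    "reserved_1y": 0.40,
    "reserved_3y": 0.575,
    "spot": 0.80,
}


def effective_hourly_rate(on_demand_rate: float, model: str) -> float:
    """Return the hourly rate after applying the model's illustrative discount."""
    return on_demand_rate * (1.0 - DISCOUNTS[model])


if __name__ == "__main__":
    rate = 10.0  # hypothetical on-demand price in $/hour
    for model in DISCOUNTS:
        print(f"{model:12s} ${effective_hourly_rate(rate, model):.2f}/hour")
```

The rates ignore interruption and commitment risk, which is exactly what the last two columns of the table capture.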
Reserved Instances
A Reserved Instance (RI) is a commitment to pay for a specific instance type in a specific region for either one or three years. In exchange, the provider gives a substantial discount against on-demand pricing. RIs are useful for workloads with predictable steady-state demand — production inference services, baseline training capacity, always-on databases.
RIs are rigid: if your workload moves to a different instance type or region, the reservation either has to be exchanged (where supported) or sits unused. AWS Convertible RIs can be exchanged for a different configuration in return for a slightly lower discount; Standard RIs cannot be exchanged.
- 1-year RI — typically 30-45 % off on-demand depending on payment option (no upfront, partial, all upfront).
- 3-year RI — typically 50-65 % off on-demand; suitable for long-lived stable workloads.
- All-upfront RIs give the deepest discount but tie up the most capital.
- A reservation applies only to usage that matches its instance family, size, region, and tenancy.
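The matching rule in the last bullet can be sketched as a simple attribute comparison. This is a simplified model, not any provider's billing logic: `InstanceAttrs` and `match_reservations` are hypothetical names, the instance values are placeholders, and exact matching on all four attributes is an assumption (some providers relax the size match in certain cases).

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class InstanceAttrs:
    """The attributes a reservation must match (hypothetical model)."""
    family: str
    size: str
    region: str
    tenancy: str = "default"


def match_reservations(reservations, running):
    """Return (covered, on_demand, unused_reservations) instance counts.

    Assumes a reservation covers exactly one running instance with
    identical attributes.
    """
    reserved = Counter(reservations)
    in_use = Counter(running)
    # Counter intersection keeps the minimum count per attribute set.
    covered = sum((reserved & in_use).values())
    return covered, len(running) - covered, len(reservations) - covered


if __name__ == "__main__":
    a = InstanceAttrs("g5", "xlarge", "us-east-1")
    b = InstanceAttrs("g5", "2xlarge", "us-east-1")
    c = InstanceAttrs("g5", "xlarge", "eu-west-1")
    # Two reservations for `a`, one for `b`; the workload moved mostly to `c`.
    print(match_reservations([a, a, b], [a, c, c]))
```

In the example, two of the three reservations sit unused while two instances are billed on-demand, which is the rigidity cost described above.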
Savings Plans
AWS Savings Plans are a commitment to a level of compute spend per hour (in dollars), rather than to a specific instance. The discount is similar to RIs, but the commitment is portable: as long as you spend at least your committed dollars per hour on covered services, the discount applies regardless of which instance type or region you actually use.
Compute Savings Plans cover EC2, Fargate, and Lambda across all regions and instance families. EC2 Instance Savings Plans offer a deeper discount but lock you to an instance family and region. GCP Committed Use Discounts and Azure Reservations behave similarly to RIs; Azure also offers a savings plan for compute.
Savings Plans are usually the safer commitment for a growing or evolving workload. RIs squeeze out a little more discount but only if you are confident the underlying instance type will not change for the full term.
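A rough sketch of how an hourly spend commitment plays out on the bill is shown below. It assumes a single uniform discount across all usage, which is a simplification (real plans apply per-SKU rates), and the function name and figures are hypothetical.

```python
def hourly_bill(on_demand_equivalent: float, commitment: float, discount: float) -> float:
    """Cost of one hour under a spend commitment (simplified model).

    on_demand_equivalent: what the hour's usage would cost at on-demand rates.
    commitment: committed spend per hour, measured at discounted rates.
    discount: fractional discount, e.g. 0.4 for 40 % (must be below 1).
    """
    # Usage the commitment can absorb, expressed at on-demand rates.
    covered = commitment / (1.0 - discount)
    # Anything beyond the covered amount is billed at on-demand rates.
    overage = max(0.0, on_demand_equivalent - covered)
    # The commitment is owed in full even when usage falls short of it.
    return commitment + overage


if __name__ == "__main__":
    for usage in (4.0, 10.0, 15.0):  # hypothetical on-demand-equivalent $/hour
        print(usage, round(hourly_bill(usage, commitment=6.0, discount=0.4), 2))
```

With a $6/hour commitment at a 40 % discount, the plan absorbs $10/hour of on-demand-equivalent usage. A quiet hour still costs $6, and a $15 hour costs $11.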
Spot / Preemptible Instances
Spot capacity is unused server capacity that providers sell at a steep discount in exchange for the right to reclaim it. AWS gives a two-minute interruption notice; GCP Preemptible gives 30 seconds; Azure Spot gives 30 seconds.
Spot is exceptional value for workloads that can tolerate interruption — fault-tolerant distributed training with checkpointing, batch inference, CI/CD runners, large-scale data processing. It is a bad fit for stateful single-instance services or workloads that cannot recover quickly from a kill.
- Use checkpointing for training jobs so a reclaim costs minutes, not hours.
- Mix spot with a small on-demand or reserved baseline so service continues during a reclaim wave.
- Diversify across instance types and zones — spot capacity is per-pool.
- Set a maximum price to cap exposure if spot rates spike.
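The checkpointing advice above can be illustrated with a minimal resumable loop. This is a sketch under stated assumptions: `train` is a hypothetical function, the "training step" is a counter increment, the checkpoint is a small JSON file, and `interrupt_at` merely simulates a reclaim rather than reacting to a real provider notice.

```python
import json
import os
import tempfile


def train(total_steps, checkpoint_path, checkpoint_every, interrupt_at=None):
    """Run up to total_steps, resuming from checkpoint_path if it exists.

    Returns the step reached. interrupt_at simulates a spot reclaim.
    """
    step = 0
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as f:
            step = json.load(f)["step"]  # resume from the last saved step
    while step < total_steps:
        if interrupt_at is not None and step == interrupt_at:
            return step  # simulated reclaim: work since last checkpoint is lost
        step += 1  # stand-in for one real training step
        if step % checkpoint_every == 0:
            # Write to a temp file, then rename, so a reclaim in the middle
            # of a write cannot leave a corrupt checkpoint behind.
            tmp = checkpoint_path + ".tmp"
            with open(tmp, "w") as f:
                json.dump({"step": step}, f)
            os.replace(tmp, checkpoint_path)
    return step


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "ckpt.json")
    reached = train(100, path, checkpoint_every=10, interrupt_at=37)
    print("reclaimed at step", reached)
    print("finished at step", train(100, path, checkpoint_every=10))
```

In the demo the reclaim at step 37 loses only the seven steps since the checkpoint at step 30. A shorter checkpoint interval reduces the loss further at the price of more frequent writes.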
Workload-to-Pricing Mapping
| Workload | Recommended model |
|---|---|
| Production inference (steady QPS) | Reserved / Savings Plan baseline + on-demand burst. |
| Production inference (bursty) | Savings Plan for baseline + on-demand for spikes. |
| Large training run | Reserved or sovereign-capacity contract — interruption is expensive. |
| Hyperparameter sweep | Spot — many parallel jobs, fault tolerant by design. |
| Batch inference / nightly scoring | Spot with retry, or off-peak on-demand. |
| Interactive notebooks | On-demand — predictability matters more than price. |
| CI/CD GPU runners | Spot — short-lived, retry-safe. |
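The table reduces to two questions: can the workload survive a reclaim cheaply, and is its demand steady? The sketch below encodes that as a hypothetical rule of thumb. The `Workload` fields and `recommend` function are illustrative, and it returns only the primary model, ignoring the baseline-plus-burst combinations in the table.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    """Two traits that drive the mapping (hypothetical simplification)."""
    interruptible: bool  # can it retry or resume cheaply after a reclaim?
    steady: bool         # is demand predictable around the clock?


def recommend(workload: Workload) -> str:
    """Return the primary pricing model for a workload."""
    if workload.interruptible:
        return "spot"
    if workload.steady:
        return "reserved_or_savings_plan"
    return "on_demand"


if __name__ == "__main__":
    print(recommend(Workload(interruptible=True, steady=False)))   # sweep
    print(recommend(Workload(interruptible=False, steady=True)))   # inference
    print(recommend(Workload(interruptible=False, steady=False)))  # notebooks
```

A large training run is modelled here as not interruptible even if it checkpoints, since the table treats its interruptions as expensive.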
Commitment Strategy
A defensible commitment strategy starts from a clear understanding of baseline utilisation — the GPU-hours you know you will consume regardless of business variability. Cover the baseline with reservations or savings plans, the middle with on-demand, and the volatile top of the curve with spot where tolerable.
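One way to estimate the baseline is to take a low quantile of observed hourly usage and measure how well a commitment at that level would be utilised. The sketch below does this for a hypothetical usage series; `plan_commitment`, the quantile choice, and the numbers are all assumptions for illustration.

```python
from typing import Dict, Sequence


def plan_commitment(hourly_gpus: Sequence[int], quantile: float = 0.1) -> Dict[str, float]:
    """Pick a baseline GPU count at the given usage quantile.

    Returns the baseline, the fraction of committed GPU-hours actually used,
    and the GPU-hours above the baseline (left to on-demand or spot).
    """
    ordered = sorted(hourly_gpus)
    baseline = ordered[int(quantile * (len(ordered) - 1))]
    committed = baseline * len(hourly_gpus)
    # Committed capacity is used only up to the baseline in each hour.
    used = sum(min(u, baseline) for u in hourly_gpus)
    overflow = sum(max(0, u - baseline) for u in hourly_gpus)
    return {
        "baseline_gpus": baseline,
        "commitment_utilisation": used / committed if committed else 0.0,
        "overflow_gpu_hours": overflow,
    }


if __name__ == "__main__":
    usage = [4, 4, 6, 8, 12, 4, 5, 10]  # hypothetical GPUs in use per hour
    print(plan_commitment(usage, quantile=0.0))  # commit to the minimum
    print(plan_commitment(usage, quantile=0.5))  # commit to the median
```

Committing at the minimum keeps utilisation at 100 % but leaves more hours on costlier models; committing at the median covers more usage while leaving some committed capacity idle.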
Yobitel's reserved-capacity model is closer to a Savings Plan than a classic RI: customers commit to a compute envelope across H100, H200 and B200 capacity at a published discount tier, with capacity guaranteed under the sovereign-residency contract.
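Over-commitment is a classic FinOps failure mode. A reservation is billed whether or not it is used, so once utilisation falls below one minus the discount (for example, below 45 % at a 55 % discount), the commitment costs more than paying on-demand would have. Start conservative; lengthen and deepen commitments only after observing actual utilisation.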
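The cost of an under-used reservation can be checked with simple arithmetic. The sketch below assumes the reservation is billed for every hour of the term at the discounted rate, while on-demand is billed only for hours actually used; the function names and figures are hypothetical.

```python
def break_even_utilisation(discount: float) -> float:
    """Utilisation below which a reservation costs more than on-demand."""
    return 1.0 - discount


def compare_costs(on_demand_rate: float, discount: float,
                  utilisation: float, hours: float):
    """Return (reserved_cost, on_demand_cost) over the given number of hours."""
    # The reservation is paid for every hour, used or not.
    reserved_cost = on_demand_rate * (1.0 - discount) * hours
    # On-demand is paid only for the hours actually consumed.
    on_demand_cost = on_demand_rate * utilisation * hours
    return reserved_cost, on_demand_cost


if __name__ == "__main__":
    discount = 0.6  # illustrative 3-year discount
    print("break-even utilisation:", break_even_utilisation(discount))
    for utilisation in (0.3, 0.8):
        reserved, on_demand = compare_costs(10.0, discount, utilisation, 100)
        print(utilisation, round(reserved, 2), round(on_demand, 2))
```

At a 60 % discount the reservation pays off only above 40 % utilisation: at 30 % it costs more than on-demand, at 80 % it costs half as much.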