TL;DR
- Yobitel NeoCloud is the sovereign Tier III+ GPU cloud that sits underneath Yobibyte and is also available as a standalone capacity service — multi-tenant on-demand, dedicated single-tenant, and confidential-compute tenancies, with a per-second pricing model in USD.
- Sovereignty leads: UK NCSC OFFICIAL alignment in primary regions, EU Data Boundary regions for EU workloads, and US FedRAMP-equivalent partner regions for US-regulated workloads. Admission refuses to spill workloads across boundaries regardless of capacity or price.
- Accelerator coverage spans NVIDIA B300/B200/H200/H100/A100/L40S/L4 and AMD MI300X, with high-bandwidth AI fabric (NVLink within node, InfiniBand within rack, RoCEv2 across racks) and Tier III+ facility design at sub-1.1 PUE.
- Customer-facing surfaces: a console for reservation, spot, and on-demand provisioning; a per-region capacity catalogue; DCGM-emitted metrics into customer Prometheus; FOCUS 1.1 billing export to customer-owned object storage; the standard Yobibyte workspace boundary for tenancy.
- Yobitel operates the facility, the fleet, the fabric, and the platform end-to-end; the customer owns the data, the workload, and the consumption decisions. A 99.99% uptime SLA backs the production tier.
Overview
GPU capacity has become the gating constraint for production AI, and the constraint is not uniform. A UK trust cannot place patient data in a US region. A European bank cannot leave the EU Data Boundary. A US federal customer cannot consume capacity that has not cleared a moderate-baseline assessment. A FinOps team cannot defend a procurement decision that does not pass an open price test. Each of those constraints has historically forced a separate cloud relationship; the platform team's job has been to keep the pieces aligned by hand.
Yobitel NeoCloud is the sovereign Tier III+ GPU cloud Yobitel runs to solve the constraint set rather than the capacity question alone. Primary regions in the UK align to NCSC Cloud Security Principles and the OFFICIAL classification; EU regions sit inside the EU Data Boundary; US regions, delivered through FedRAMP-equivalent partner facilities, align to the moderate-baseline control set. A workload pinned to a sovereignty region never spills across the boundary, regardless of capacity or price — admission refuses non-eligible placement at the platform layer.
The capacity layer covers the SKUs production teams actually need. NVIDIA B300, B200, H200, H100, A100, L40S, and L4 cover the bulk of training and inference; AMD MI300X covers the open-accelerator path; AI fabric (NVLink within node, InfiniBand within rack, RoCEv2 across racks) supports up to 2,048-node distributed training; Tier III+ facility design at sub-1.1 PUE keeps the energy cost of compute defensible. The customer-facing surface is a console for reservation, spot, and on-demand consumption, a per-region capacity catalogue, and the standard Yobibyte workspace boundary for tenancy and identity.
Yobitel Communications, the UK-headquartered AI infrastructure company that operates NeoCloud, runs the facility, the fleet, the fabric, and the platform end-to-end. The customer owns the workload, the data, the encryption keys, and the consumption decisions. The 99.99% uptime SLA on the production tier carries financial credits; per-second billing is exported via the FOCUS 1.1 specification into customer-owned object storage; observability is delivered through DCGM and OpenTelemetry on stable metric names so the customer's existing Prometheus and Grafana setup just works.
Quick start
Provisioning capacity on NeoCloud is a console workflow today. Sign in to the Yobitel console with your corporate identity provider, open NeoCloud under the main navigation, and choose three things: the region, the accelerator, and the term. The console quote is live against current capacity and surfaces sovereignty eligibility next to every result.
The customer first picks a region from the per-region catalogue. Each region carries a sovereignty tag (UK NCSC OFFICIAL, EU Data Boundary, US FedRAMP-equivalent), a facility tier (Tier III+ is the production default), and a current capacity heat map by accelerator. Next, the customer picks an accelerator (B300, B200, H200, H100, A100, L40S, L4, or MI300X) and a tenancy model (`shared` multi-tenant, `dedicated` single-tenant, or `confidential` with TEE attestation). Finally, the customer picks a term — on-demand by the second, reservation by the month or year, or spot for fault-tolerant workloads — and a USD spend cap.
Once provisioned, the workload runs inside the customer's Yobibyte workspace (or a standalone NeoCloud workspace if the customer is consuming capacity without adopting Yobibyte). Identity federates from any OIDC IdP, telemetry flows into customer Prometheus through the DCGM exporter, and billing lands hourly in the customer's FOCUS 1.1 export bucket.
Spot capacity on NeoCloud is per-region rather than per-account; the spot floor moves with regional supply and demand. Set a spend cap on every spot reservation — when the floor moves above the cap, the reservation simply rolls off rather than running at an unexpected price.
Concepts
NeoCloud exposes a small set of customer-facing concepts that map to how platform and procurement teams think about sovereign GPU capacity. The mental model is region (and its sovereignty pin) at the top, with tenancy and term as the two key dimensions for any consumption decision underneath.
- Region — the geographic placement of capacity, bound to a sovereignty tag. UK regions align to NCSC Cloud Security Principles and the OFFICIAL classification; EU regions sit inside the EU Data Boundary; US regions, delivered through FedRAMP-equivalent partner facilities, align to the moderate-baseline control set.
- Tenancy — the isolation model for a workload. `shared` runs on multi-tenant nodes with cgroup and MIG isolation (the right default for most workloads). `dedicated` pins to single-tenant nodes for regulatory or noisy-neighbour reasons. `confidential` runs on NVIDIA confidential-compute mode with TEE attestation so encryption keys never leave the GPU.
- Reservation — committed capacity for a defined term (monthly, 1-year, or 3-year) at a discounted rate. Used for steady production inference and known-shape training reservations.
- Spot — opportunistic capacity available at a discount, reclaimable on short notice. Used for fault-tolerant fine-tunes, batch scoring, and shadow training. Per-region spot floors move with supply and demand.
- On-demand — uncommitted per-second capacity at the headline rate. Used for bursty workloads and pre-reservation prototyping.
- AI Fabric — the high-bandwidth interconnect that ties accelerators together. NVLink-fabric within a single node (typical for 8-GPU single-host workloads), InfiniBand-rack within one rack (typical for distributed training up to 64 nodes), and RoCEv2-DC across racks (for cluster sizes beyond a single rack). The customer picks the topology requirement; the platform satisfies it from the regional fabric.
- Spend Cap — a hard USD budget set per reservation, per workspace, or per organisation. When reached, on-demand consumption pauses and a P2 alert fires; reservations continue to bill until the term ends.
- Compliance Tag — the regulatory posture the workload is pinned to. NCSC OFFICIAL, EU Data Boundary, FedRAMP-equivalent, HIPAA-eligible, ISO 27001-certified, SOC 2 Type II attested. Admission refuses placement outside the tagged set.
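The AI Fabric bullet above describes three scopes keyed on cluster size. A minimal sketch of that choice follows, assuming the 64-node figure quoted for a single rack is the cut-over point and using the `reservation.fabric` enum values from the API reference. The function name `pick_fabric` and the `rack_limit` parameter are hypothetical, and the `infiniband-row` value is left out because the text does not say when it applies.

```python
def pick_fabric(nodes: int, rack_limit: int = 64) -> str:
    """Return the smallest fabric scope that can hold `nodes` 8-GPU nodes.

    rack_limit is an assumption taken from the "up to 64 nodes" figure
    quoted for InfiniBand within one rack; it is not a published limit.
    """
    if nodes < 1:
        raise ValueError("nodes must be at least 1")
    if nodes == 1:
        return "nvlink-node"      # single host: NVLink within the node
    if nodes <= rack_limit:
        return "infiniband-rack"  # fits inside one rack
    return "rocev2-dc"            # spans racks


print(pick_fabric(1), pick_fabric(64), pick_fabric(65))
```

In practice the customer states the topology requirement and the platform satisfies it, so a helper like this would only pre-fill a request.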
Reference: customer-facing API and console fields
Customers who prefer a declarative consumption flow can submit reservations through the NeoCloud API. The fields below are the full customer-facing surface; the underlying scheduler, admission engine, and fleet are operated by Yobitel and are not customer-tunable.
| Field | Type | Description |
|---|---|---|
| reservation.region | string | Sovereignty region (e.g. `uk-london`, `uk-manchester`, `eu-frankfurt`, `eu-paris`, `us-ashburn`). |
| reservation.accelerator | enum | `b300`, `b200`, `h200`, `h100-sxm5`, `h100-pcie`, `a100`, `l40s`, `l4`, `mi300x`. |
| reservation.count | integer | Number of accelerators. For distributed training, the count is expressed in nodes (8-GPU each). |
| reservation.tenancy | enum | `shared`, `dedicated`, or `confidential`. |
| reservation.term | enum | `on-demand`, `spot`, `reserved-monthly`, `reserved-1y`, `reserved-3y`. |
| reservation.fabric | enum | `nvlink-node`, `infiniband-rack`, `infiniband-row`, `rocev2-dc`, or `none` for single-accelerator workloads. |
| reservation.complianceTags | string[] | Required tags (e.g. `ncsc-official`, `eu-data-boundary`, `hipaa`). |
| reservation.spendCap | object | USD budget with `amount` and `window` (`hourly`, `daily`, or `monthly`). |
| reservation.startTime | datetime | Requested start. Reservations queue if the region does not have capacity at the requested time. |
| reservation.endTime | datetime | Requested end (optional for on-demand and reserved terms). |
| identity.oidcIssuer | string | Workspace OIDC issuer for identity federation. |
| identity.scimEndpoint | string | Optional SCIM 2.0 endpoint for user and group provisioning. |
| observability.prometheusScrape | boolean | Exposes the standard `yobitel_*` metric set on a workspace-scoped scrape endpoint. |
| observability.dcgmExporter | boolean | DCGM exporter enabled per accelerator. |
| observability.otelEndpoint | string | Customer-owned OTel collector endpoint. |
| billing.focusExportBucket | string | Customer-owned object-storage bucket for FOCUS 1.1 hourly export. |
| billing.focusKmsKey | string | Customer-managed KMS key the export is encrypted with at rest. |
| network.cidrAllowList | string[] | Optional inbound CIDR allow-list at the workspace gateway. |
| network.privateConnectivity | enum | `internet`, `aws-directconnect`, `azure-expressroute`, `gcp-interconnect`, or `dedicated-cross-connect`. |
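The enum fields in the table above lend themselves to a client-side check before a reservation is submitted. The sketch below is illustrative only: the wire format of the NeoCloud API is not described here, so the flat dictionary shape, the `validate_reservation` helper, and the example values are assumptions. Only the field names and allowed values come from the table.

```python
# Allowed values copied from the reference table above.
ALLOWED = {
    "accelerator": {"b300", "b200", "h200", "h100-sxm5", "h100-pcie",
                    "a100", "l40s", "l4", "mi300x"},
    "tenancy": {"shared", "dedicated", "confidential"},
    "term": {"on-demand", "spot", "reserved-monthly", "reserved-1y", "reserved-3y"},
    "fabric": {"nvlink-node", "infiniband-rack", "infiniband-row", "rocev2-dc", "none"},
}
SPEND_WINDOWS = {"hourly", "daily", "monthly"}


def validate_reservation(res: dict) -> list[str]:
    """Return a list of problems; an empty list means the enums check out."""
    errors = []
    for field, allowed in ALLOWED.items():
        if res.get(field) not in allowed:
            errors.append(f"reservation.{field}: {res.get(field)!r} not in {sorted(allowed)}")
    count = res.get("count")
    if isinstance(count, bool) or not isinstance(count, int) or count < 1:
        errors.append("reservation.count: must be a positive integer")
    if "spendCap" in res and res["spendCap"].get("window") not in SPEND_WINDOWS:
        errors.append("reservation.spendCap.window: must be hourly, daily, or monthly")
    return errors


# Hypothetical request for illustration; the values are not a recommendation.
example = {
    "region": "uk-london",
    "accelerator": "h100-sxm5",
    "count": 8,
    "tenancy": "dedicated",
    "term": "reserved-monthly",
    "fabric": "nvlink-node",
    "complianceTags": ["ncsc-official"],
    "spendCap": {"amount": 20000, "window": "monthly"},
}
print(validate_reservation(example))
```

A check like this only catches typos early; admission, capacity, and sovereignty decisions remain with the platform.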
Workload patterns
Three patterns cover most customers. The patterns differ on term type, tenancy, and topology; the customer-facing console and API are the same across all three.
- Pattern A — training-cluster reservation. A research team reserves a 64-node H100 cluster on InfiniBand fabric in `uk-london` for a 3-week training run, with confidential tenancy and an immutable FOCUS export to the team's S3 bucket. The reservation queue surfaces the start time once the rack is allocated; the team's workload runs inside its Yobibyte workspace.
- Pattern B — production inference tenancy. A SaaS product reserves a steady pool of 16 H200 cards on `shared` tenancy in `uk-london` and `eu-frankfurt`, with autoscaling spot capacity layered on top for traffic peaks. The Yobibyte workspace draws on the reservation first and bursts to spot when steady capacity is saturated.
- Pattern C — air-gapped sovereign deployment. A regulated UK customer takes a dedicated tenancy block in a Tier III+ enclave with no internet egress, NCSC OFFICIAL-SENSITIVE alignment, and on-premise audit export. The customer's workspace runs inside the enclave; Yobitel operates the facility under a NeoCloud Operations engagement.
Sizing and capacity tiers
Per-region capacity is grouped into tiers so customers can plan reservations against published bands. The tier list below covers the most common request shapes; capacity beyond these tiers is available through the account team and a reservation lead-time.
| Tier | Typical use | Reservation scale | Term |
|---|---|---|---|
| Pilot | Single-team prototype, evaluation | 1 - 8 accelerators | On-demand or monthly reservation |
| Production-small | Steady inference for a single product | 8 - 64 accelerators | Monthly or 1-year reservation |
| Production-medium | Multi-product platform, mixed inference and fine-tune | 64 - 256 accelerators | 1-year or 3-year reservation |
| Production-large | Org-wide AI platform, multi-region | 256 - 1,024 accelerators | 1-year or 3-year reservation with multi-region split |
| Training-cluster | Distributed training on InfiniBand fabric | 8 - 512 nodes (64 - 4,096 GPUs) | Reserved 2-12 week training block |
| Reserved-frontier | Frontier model training, multi-thousand-GPU clusters | 512 - 2,048 nodes (4,096 - 16,384 GPUs) | Reserved 3-month to 1-year training block; lead time required |
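For the two node-based tiers in the table above, the node-to-GPU arithmetic can be sketched as below. The helper `training_tier` is hypothetical. Because both bands list 512 nodes, this sketch assigns that boundary to the smaller tier, which is an assumption rather than a documented rule.

```python
GPUS_PER_NODE = 8  # the table counts 8 GPUs per node


def training_tier(nodes: int) -> tuple[str, int]:
    """Map a node count to (tier name, GPU count) for the training tiers."""
    if 8 <= nodes <= 512:
        tier = "Training-cluster"
    elif 512 < nodes <= 2048:
        tier = "Reserved-frontier"
    else:
        raise ValueError("outside the published training bands; contact the account team")
    return tier, nodes * GPUS_PER_NODE


print(training_tier(64))
print(training_tier(2048))
```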
Limits and quotas
Default limits keep pilot and onboarding workloads predictable; almost every ceiling is raisable through self-service or a support request once the customer moves past pilot. The hard ceilings exist where the underlying primitive imposes them (e.g. per-region capacity, fabric fan-out).
| Resource | Default | Enterprise ceiling | How to raise |
|---|---|---|---|
| GPUs per workspace | 32 | 16,384 | Self-service up to 256; ticket beyond. |
| Workspaces per organisation | 10 | 200 | Self-service. |
| Reservations per workspace | 20 | 500 | Self-service. |
| Concurrent regions per workspace | 3 | 12 | Self-service. |
| Spot reservations per workspace | 10 | 200 | Self-service; per-region spot floors apply. |
| Reservation lead time (pilot tier) | Immediate | Immediate | On-demand capacity is provisioned in seconds when available. |
| Reservation lead time (frontier tier) | 30 days | 30 - 90 days | Frontier reservations require lead time and account-team coordination. |
| Network egress to internet | 10 TB/month per workspace included; metered beyond | 1 PB/month | Self-service; FOCUS export shows egress per workspace. |
| Egress between NeoCloud regions | Unmetered | Unmetered | Hard floor; no inter-region egress charge. |
| Confidential-tenancy quota | 8 GPUs | 1,024 GPUs | Support request; confirms TEE attestation supply. |
| Spend cap precision | USD 1 | USD 1 | Hard floor. |
| FOCUS export retention | 13 months | 7 years | Configurable in console; customer storage policy governs. |
| DCGM scrape interval | 10 seconds | 1 second | Configurable per workspace. |
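As a worked example of the internet-egress row above, the sketch below estimates the metered charge once the included 10 TB per workspace is used up. It assumes the indicative $0.075 per GB internet-egress rate from the pricing table and decimal units (1 TB = 1,000 GB). The unit convention and the function name are assumptions.

```python
INCLUDED_TB = 10     # included per workspace per month (limits table)
RATE_PER_GB = 0.075  # indicative USD rate for egress to internet
GB_PER_TB = 1000     # assumption: decimal terabytes


def egress_overage_usd(used_tb: float) -> float:
    """Metered egress charge in USD for one workspace in one month."""
    over_gb = max(0.0, used_tb - INCLUDED_TB) * GB_PER_TB
    return round(over_gb * RATE_PER_GB, 2)


print(egress_overage_usd(12))  # 2 TB over the included quota
print(egress_overage_usd(5))   # inside the included quota
```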
Observability
NeoCloud emits three telemetry streams for every customer workload. DCGM (NVIDIA Data Center GPU Manager, the open-source telemetry tooling for NVIDIA GPUs) covers GPU SM occupancy, HBM usage, power draw, and NVLink throughput. Yobitel-namespaced metrics under `yobitel_neocloud_*` cover reservation state, fabric health, and per-region capacity. OpenTelemetry traces cover the workspace gateway and the platform control plane.
All three streams are scrape-compatible with customer Prometheus and OTel collectors. Most customers add the alert below within the first week — it catches the 'reservation is allocated but the fabric is degraded' failure mode that simple liveness probes miss, and pages on it before a distributed training run produces silent corruption.

```yaml
groups:
  - name: neocloud-fabric
    interval: 30s
    rules:
      - alert: FabricDegraded
        expr: |
          max by (workspace, reservation, fabric) (
            yobitel_neocloud_fabric_link_error_rate
          ) > 1e-9
        for: 5m
        labels: { severity: page }
        annotations:
          summary: "{{ $labels.reservation }} fabric error rate above 1e-9"
          runbook: https://docs.yobitel.com/neocloud/runbooks/fabric-degraded
      - alert: ReservationCapacityShort
        expr: |
          yobitel_neocloud_reservation_capacity_short_seconds > 0
        for: 10m
        labels: { severity: page }
        annotations:
          summary: "{{ $labels.reservation }} short of committed capacity"
      - alert: SovereigntyAdmissionDenied
        expr: |
          rate(yobitel_neocloud_admission_denied_total{reason="compliance"}[15m]) > 0
        for: 30m
        labels: { severity: ticket }
```

DCGM is the open NVIDIA standard for accelerator telemetry; the same exporter is supported on every major Kubernetes distribution. NeoCloud emits DCGM on the same metric names as a self-managed deployment, so customers who already have Prometheus and Grafana setups can keep their dashboards.
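The `for: 5m` clause on the FabricDegraded rule means a short blip does not page. The sketch below is a simplified model of that behaviour, not Prometheus itself: the condition has to hold at every evaluation for at least the hold duration, and one healthy evaluation resets the timer. The function name and sample values are illustrative, and details such as missing samples are ignored.

```python
def first_firing_time(samples, threshold=1e-9, hold_s=300):
    """Return the timestamp at which the alert would fire, or None.

    samples: (timestamp_seconds, error_rate) pairs in evaluation order,
    e.g. one pair every 30 s to match the rule group's interval.
    """
    active_since = None
    for ts, value in samples:
        if value > threshold:
            if active_since is None:
                active_since = ts      # alert becomes pending
            if ts - active_since >= hold_s:
                return ts              # pending long enough: firing
        else:
            active_since = None        # a healthy evaluation resets the timer
    return None


steady = [(t, 2e-9) for t in range(0, 601, 30)]
blip = [(t, 0.0 if t == 150 else 2e-9) for t in range(0, 601, 30)]
print(first_firing_time(steady))  # degraded throughout
print(first_firing_time(blip))    # one healthy evaluation at t=150
```

With a 30-second evaluation interval, the second series fires later because the healthy sample at 150 seconds restarts the five-minute window.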
Cost and FinOps
Pricing is per-second for GPUs on the on-demand and spot terms, per-month for reservations, and unmetered for inter-region egress between NeoCloud regions. The table below is indicative for mid-2026 UK regions; the account team confirms current rates and any multi-year or volume discounts before contract. All numbers are USD; the FOCUS 1.1 billing export carries the standard column set (BilledCost, EffectiveCost, ListCost, ChargePeriod*, ServiceCategory, SubAccountId, Tags) so the data drops directly into a Cloudability, Apptio, Vantage, or customer-built lakehouse pipeline.
| SKU | On-demand $/GPU/hr | 1-yr reserved $/GPU/hr | 3-yr reserved $/GPU/hr | Spot floor |
|---|---|---|---|---|
| NVIDIA B200 192GB | $6.00 | $4.50 | $3.60 | n/a |
| NVIDIA H200 141GB | $4.25 | $3.20 | $2.55 | $1.75 |
| NVIDIA H100 SXM5 80GB | $3.25 | $2.45 | $1.95 | $1.20 |
| NVIDIA H100 PCIe 80GB | $2.95 | $2.20 | $1.75 | $1.05 |
| NVIDIA A100 80GB | $2.25 | $1.70 | $1.40 | $0.80 |
| NVIDIA L40S 48GB | $1.20 | $0.90 | $0.70 | $0.45 |
| NVIDIA L4 24GB | $0.50 | $0.40 | $0.30 | $0.22 |
| AMD MI300X 192GB | $4.00 | $3.00 | $2.45 | n/a |
| Object storage (per GB-month) | $0.022 | — | — | — |
| Egress to internet (per GB) | $0.075 | — | — | — |
| Egress between NeoCloud regions (per GB) | $0.00 | — | — | — |
| Confidential-tenancy surcharge (per GPU/hr) | $0.40 | $0.30 | $0.24 | n/a |
| Dedicated-tenancy surcharge (per GPU/hr) | $0.30 | $0.22 | $0.18 | n/a |
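Because the billing export uses FOCUS 1.1 column names, a per-workspace roll-up needs little code. The sketch below assumes the export is delivered as CSV and that `SubAccountId` identifies the workspace. Both are assumptions, and the sample rows are made up for illustration. `BilledCost`, `ServiceCategory`, and `SubAccountId` are standard FOCUS columns.

```python
import csv
import io
from collections import defaultdict
from decimal import Decimal


def billed_cost_by_subaccount(focus_csv: str) -> dict[str, Decimal]:
    """Sum BilledCost per SubAccountId from FOCUS-formatted CSV text."""
    totals = defaultdict(Decimal)  # Decimal() starts at 0 and avoids float drift
    for row in csv.DictReader(io.StringIO(focus_csv)):
        totals[row["SubAccountId"]] += Decimal(row["BilledCost"])
    return dict(totals)


# Illustrative rows only; identifiers and amounts are hypothetical.
sample = """SubAccountId,ServiceCategory,BilledCost
ws-train,Compute,26.00
ws-train,Compute,13.00
ws-serve,Compute,4.25
"""
result = billed_cost_by_subaccount(sample)
print(result)
```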
Reservations are billed for the full term regardless of utilisation; on-demand consumption pauses when a spend cap is reached. Customers running steady-state inference often combine a smaller reservation footprint with on-demand burst to keep the reservation utilisation high while keeping headroom available.
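The trade-off between a larger reservation and a smaller one with on-demand burst can be checked with simple arithmetic. The sketch below uses the indicative H100 SXM5 rates from the table ($2.45 1-year reserved, $3.25 on-demand) and assumes a 730-hour month. The workload shape (12 GPUs steady, 4 extra GPUs for 100 hours a month) is hypothetical.

```python
HOURS_PER_MONTH = 730  # assumption: average hours in a month


def monthly_cost(reserved_gpus, burst_gpu_hours,
                 reserved_rate=2.45, on_demand_rate=3.25):
    """USD per month: the reservation bills for every hour, burst bills per use."""
    reserved = reserved_gpus * reserved_rate * HOURS_PER_MONTH
    burst = burst_gpu_hours * on_demand_rate
    return round(reserved + burst, 2)


reserve_for_peak = monthly_cost(16, 0)         # reserve all 16 GPUs
reserve_plus_burst = monthly_cost(12, 4 * 100)  # reserve 12, burst 4 GPUs x 100 h
print(reserve_for_peak, reserve_plus_burst)
```

Under these assumptions the smaller reservation with burst is cheaper. The gap narrows as burst hours grow, so the break-even point depends on how long peaks last.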
Security and compliance
Sovereignty is enforced at admission. A workload labelled `compliance=ncsc-official` is rejected if it targets a non-UK region; a workload labelled `compliance=eu-data-boundary` is rejected if it targets outside the EU; a workload labelled `compliance=fedramp-equiv` is rejected if it targets outside the US partner regions. NeoCloud does not silently spill workloads across compliance boundaries regardless of capacity or price.
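The admission rule described above can be sketched as a lookup from compliance tag to permitted regions. This is an illustration of the rule, not the platform's admission engine. The mapping by region-name prefix is an assumption inferred from the example region names in the API reference, and tags without a regional boundary in this sketch (for example `hipaa`) are passed through.

```python
# Assumed mapping: compliance tag -> region-name prefixes it permits.
BOUNDARY_PREFIXES = {
    "ncsc-official": ("uk-",),
    "eu-data-boundary": ("eu-",),
    "fedramp-equiv": ("us-",),
}


def admit(region: str, compliance_tags: list[str]) -> tuple[bool, str]:
    """Refuse placement when any tagged boundary excludes the target region."""
    for tag in compliance_tags:
        prefixes = BOUNDARY_PREFIXES.get(tag)
        if prefixes is None:
            continue  # no regional boundary modelled for this tag
        if not region.startswith(prefixes):
            return False, f"AdmissionDenied: complianceMismatch ({tag} vs {region})"
    return True, "admitted"


print(admit("uk-london", ["ncsc-official"]))
print(admit("eu-frankfurt", ["ncsc-official"]))
```

Capacity and price do not appear anywhere in the check, which mirrors the rule that workloads never spill across a boundary for either reason.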
UK frameworks lead the compliance posture — NCSC Cloud Security Principles, G-Cloud, OFFICIAL and OFFICIAL-SENSITIVE classifications, Cyber Essentials Plus — before EU frameworks (GDPR, EU Data Boundary, DORA-aligned where customers require it) and US frameworks (FedRAMP-equivalent, HIPAA BAA, HITRUST alignment where applicable). The full attestation set is available under NDA through the account team.
- NCSC Cloud Security Principles — controls mapped per principle for UK primary regions; OFFICIAL-tier audited annually.
- G-Cloud — listed under Cloud Hosting (Lot 1) and Cloud Software (Lot 2); orderable through the Crown Commercial Service framework.
- Cyber Essentials Plus — current certificate.
- ISO 27001:2022, ISO 27017, ISO 27018 — current certificates.
- SOC 2 Type II — annual third-party audit covering security, availability, confidentiality.
- GDPR / UK Data Protection Act 2018 — data processing agreement, sub-processor list, and EU SCCs available; data residency enforced at admission.
- EU Data Boundary — EU regions sit inside the boundary; admission refuses spill.
- DORA — operational-resilience evidence available for financial-services customers.
- FedRAMP-equivalent — moderate-baseline-aligned controls available via US partner regions.
- HIPAA — BAA available for healthcare workloads.
- Confidential tenancy — NVIDIA confidential-compute mode with TEE attestation; encryption keys never leave the GPU.
Migration and alternatives
NeoCloud is one option in a small set of credible sovereign GPU cloud providers. The comparison below is the honest read on when NeoCloud, a US-anchored neocloud (CoreWeave, Lambda), or a UK-historic sovereign cloud (UKCloud) fits best.
| Concern | Yobitel NeoCloud | CoreWeave | Lambda | UKCloud |
|---|---|---|---|---|
| UK NCSC OFFICIAL primary regions | Yes, multiple | No (US-anchored) | No (US-anchored) | Yes, historic |
| EU Data Boundary regions | Yes | Limited | No | No |
| US FedRAMP-equivalent regions | Yes via partner | Yes | Limited | No |
| NVIDIA B200/H200/H100 coverage | B300, B200, H200, H100 | H100, H200, GB200 | H100, H200 | Limited |
| AMD MI300X coverage | Yes | Limited | Yes | No |
| Confidential tenancy | NVIDIA TEE attestation | Limited | No | No |
| Per-second billing in USD | Yes | Yes (USD) | Yes (USD) | GBP-anchored |
| FOCUS 1.1 billing export | Yes | Proprietary | Proprietary | Proprietary |
| Sovereignty enforcement at admission | Yes | Region-only | Region-only | Yes |
| AI fabric (InfiniBand-rack, RoCEv2-DC) | Yes | Yes | Yes (NVLink-fabric, InfiniBand) | Limited |
| Spot floor | Yes (per-region floor) | Yes | Yes | No |
| Yobibyte managed surface available | Native | No | No | No |
Troubleshooting
The errors below are the most common during onboarding and the first weeks of production. The full runbook library is at docs.yobitel.com/neocloud/runbooks.
| Error | Cause | Fix |
|---|---|---|
| ReservationProvisioningFailed: capacity-short | Requested region does not currently have the requested SKU and term available. | Either accept the reservation queue (NeoCloud auto-provisions when capacity frees), pre-purchase via the Capacity tab, or move to a sibling region inside the same sovereignty boundary. |
| RegionCapacityUnavailable: uk-london-1 | Region is at capacity for the requested SKU. | Same fixes as above; the console surfaces nearest sibling regions inside the workspace's sovereignty boundary. |
| FabricDegraded: link error rate above threshold | An InfiniBand or RoCEv2 link in the reservation's fabric is experiencing elevated error rates. | NeoCloud's NOC auto-pages on this alert; for customer-side action, pause distributed training until the runbook clears (typically minutes) to avoid silent corruption in NCCL collectives. |
| AdmissionDenied: complianceMismatch | Workload labelled with a sovereignty tag that does not match the region the reservation targets. | Either move the workload to a region inside the workspace's sovereignty boundary, or remove the compliance label if the constraint no longer applies. |
| BillingExportEmpty: FOCUS bucket policy denies write | Customer-owned FOCUS export bucket policy does not allow the NeoCloud writer role to put objects. | Apply the bucket policy snippet in the workspace's Billing tab and wait one export window; exports retry hourly. |
| SpotReclaimWarning: 60-second notice | Spot capacity is being reclaimed by NeoCloud for a higher-priority reservation. | Workload receives a 60-second reclaim notice on the standard NeoCloud webhook; checkpoint state and exit. Spot reservations retry on the next eligible window. |
| IdentityFederationFailed: OIDC discovery 401 | Workspace OIDC issuer URL or audience misconfigured. | Re-enter issuer URL and audience in the workspace Identity tab; confirm the IdP's discovery document resolves over the public internet. |
| SpendCapExceeded: on-demand paused | Workspace USD spend cap reached on the configured window. | Raise the cap in the Billing tab or wait for the next budget window; reservations continue, on-demand resumes when the cap allows. |
| KmsDecryptDenied | Customer-managed KMS key policy is missing the NeoCloud data-plane role. | Add the role ARN shown in the workspace's Setup tab to the KMS key policy. |
| NetworkEgressOverrun | Workspace egress to internet exceeded the included quota for the month. | FOCUS export shows per-workspace egress; either raise the quota, route through dedicated cross-connect, or move egress-heavy workloads to a region with private connectivity. |
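For the SpotReclaimWarning row above, the customer-side pattern is to checkpoint and exit inside the 60-second notice. The sketch below shows one way to wire that into a training loop. The webhook payload and registration are not documented here, so `on_reclaim_notice` is a hypothetical hook that a webhook handler would call, and `run_step` and `save_checkpoint` stand in for the workload's own functions.

```python
import time


class ReclaimGuard:
    """Tracks whether a spot reclaim notice has arrived."""

    NOTICE_SECONDS = 60  # notice period quoted in the table above

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._deadline = None

    def on_reclaim_notice(self):
        # Hypothetical hook: call this from the webhook handler.
        if self._deadline is None:
            self._deadline = self._clock() + self.NOTICE_SECONDS

    @property
    def reclaiming(self):
        return self._deadline is not None

    def seconds_left(self):
        if self._deadline is None:
            return None
        return max(0.0, self._deadline - self._clock())


def train(steps, guard, run_step, save_checkpoint):
    """Run steps until done or until a reclaim notice; return the next step index."""
    for step in range(steps):
        if guard.reclaiming:
            save_checkpoint(step)  # persist state, then exit cleanly
            return step
        run_step(step)
    return steps
```

The loop checks the guard between steps, so a single step plus the checkpoint write has to fit inside the notice period for this pattern to be safe.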
Where NeoCloud fits in the Yobitel stack
NeoCloud is the sovereign capacity layer that sits underneath every Yobitel-operated surface. Yobibyte runs on NeoCloud — when a customer deploys an Inference inside a Yobibyte workspace, the capacity that backs the deployment is NeoCloud capacity (selected by the platform from the customer's sovereignty region). Omniscient Compute indexes NeoCloud capacity alongside every other provider in its vendor-neutral catalogue, with no preferential ranking. MediQuery and the rest of the Yobitel AI Applications suite run on Yobibyte and therefore consume NeoCloud capacity transparently.
Practically, a customer can adopt NeoCloud at three levels. A platform team that wants raw sovereign capacity adopts NeoCloud directly and runs its own workloads in the workspace boundary; pricing is per-second USD with reservation and spot options. A team that wants the managed inference surface adopts Yobibyte (which uses NeoCloud underneath) and never sees the capacity layer. A team building or operating its own sovereign GPU cloud adopts NeoCloud Operations (the consulting and operating engagement) to stand up a partner-branded neocloud that participates in the broader Omniscient Compute index.
The boundaries are deliberate. NeoCloud is the sovereign-capacity contract — facility, fleet, fabric, sovereignty enforcement, billing. Yobibyte is the managed-inference contract. NeoCloud Operations is the partner-build contract. A customer can buy any of the three independently; most customers buy two or three together because the boundaries map to how procurement is actually organised.