TL;DR
- Released by Ultralytics on 30 September 2024, YOLOv11 is the company's production-line successor to YOLOv8, keeping the same `ultralytics` PyPI package, CLI verbs and dataset YAML format so most upgrades are a one-string change.
- Five scaled variants (n / s / m / l / x) span 2.6M to 56.9M parameters, all reference-trained on COCO 2017, exportable to ONNX, TensorRT, OpenVINO, CoreML, TFLite, EdgeTPU, NCNN, PaddlePaddle and IMX500 from a single `model.export(...)` call.
- One model family covers five tasks — detection, instance segmentation, classification, pose estimation and oriented-bounding-box (OBB) — with consistent training, validation and inference ergonomics across all of them.
- Architecture refinements (C3k2 block replacing C2f, C2PSA spatial-attention module late in the backbone, slimmer decoupled head) deliver YOLOv8m-level mAP at roughly 22 percent fewer parameters and noticeably higher FPS on edge accelerators such as Jetson Orin and the L4.
- Dual-licensed AGPL-3.0 plus paid Ultralytics Enterprise Licence. Yobitel Edge AI ships YOLOv11 as the baseline detector on Jetson Orin / Thor reference appliances, and Yobibyte exposes YOLOv11 as a managed inference resource for both streaming and batch workloads.
Overview
YOLOv11 is the September 2024 release in the Ultralytics-maintained You Only Look Once line. It is not a research fork or a community variant: it is the production baseline that the `ultralytics` SDK and the Ultralytics Hub default to, and it ships with the same unified CLI that helped make YOLOv8 one of the most widely deployed open detectors. The model preserves the broad YOLO topology (backbone → neck → decoupled head) but tightens almost every block: C3k2 replaces C2f for cheaper feature mixing, C2PSA adds Cross-Stage Partial Position-Sensitive Attention late in the backbone for small-object focus, and the detection head uses depth-wise separable convolutions in its classification branch to lower head-side cost.
Operationally YOLOv11 behaves like a multi-task runtime, not a single model. The same five-variant family (n / s / m / l / x) supports detection, instance segmentation, classification, pose estimation and OBB — exposed as separate weight files (`yolo11n.pt`, `yolo11n-seg.pt`, `yolo11n-pose.pt`, etc.) with identical training, validation and prediction CLIs. Export targets include ONNX with dynamic shapes, TensorRT engines (FP16 and INT8), OpenVINO, CoreML, TFLite (with NNAPI / GPU delegate variants), TensorFlow SavedModel, EdgeTPU, NCNN, PaddlePaddle and Sony's IMX500 on-sensor format.
On Yobitel infrastructure YOLOv11 occupies two roles that the rest of this entry expands on. First, it is the default object detector on Yobitel Edge AI's Jetson Orin and forthcoming Jetson Thor appliances — preinstalled, calibrated, and benchmarked against customer footage before shipping. Second, Yobibyte offers YOLOv11 as a managed inference resource: customers point an RTSP feed or a batch image manifest at a workspace and Yobibyte routes the request to a YOLOv11 endpoint on Yobitel NeoCloud L4 / L40S capacity without the customer ever touching a Triton config. This entry helps you size, deploy and operate YOLOv11 in production — whether you are running raw upstream on your own cluster, shipping it on Yobitel Edge AI hardware, or consuming it through a Yobibyte managed endpoint.
Quick start
The example below installs Ultralytics, runs a brief fine-tune of YOLOv11s on a custom dataset, validates the result, and exports an FP16 TensorRT engine ready for Triton's `tensorrt` backend. The same CLI verbs work for every variant and every task; the only change for, say, instance segmentation is the weight string (`yolo11s-seg.pt`) and the dataset label format.
```bash
# 1. Install
pip install "ultralytics>=8.3.0"

# 2. Fine-tune YOLOv11s on a custom dataset (YOLO YAML)
yolo detect train \
    model=yolo11s.pt \
    data=fleet-cameras.yaml \
    epochs=100 \
    imgsz=640 \
    batch=32 \
    device=0 \
    project=runs/detect \
    name=fleet-v1

# 3. Validate
yolo detect val \
    model=runs/detect/fleet-v1/weights/best.pt \
    data=fleet-cameras.yaml

# 4. Export to TensorRT FP16 for production
yolo export \
    model=runs/detect/fleet-v1/weights/best.pt \
    format=engine \
    half=True \
    dynamic=True \
    workspace=4

# 5. Python equivalent (same operations via the SDK)
python - <<'PY'
from ultralytics import YOLO

model = YOLO("yolo11s.pt")  # start from the pretrained checkpoint
model.train(data="fleet-cameras.yaml", epochs=100, imgsz=640, batch=32, device=0)
metrics = model.val()  # validate the fine-tuned weights
model.export(format="engine", half=True, dynamic=True)  # FP16 TensorRT engine
PY
```
How it works
YOLOv11 inherits the three-stage YOLO topology (a CSPDarknet-style backbone, a PAN-FPN neck and an anchor-free decoupled head) but reworks the building blocks. The C3k2 block replaces the C2f block from v8 throughout the backbone and neck. It keeps C2f's split-and-concatenate CSP layout, and when the `c3k=True` flag is set its inner bottlenecks are swapped for nested C3k modules. C3k2 cuts FLOPs at comparable representational capacity, which accounts for most of v11's parameter savings.
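The size of that parameter saving can be checked with simple arithmetic. The sketch below compares the roughly 20.1M parameters Ultralytics publishes for YOLOv11m against an assumed figure of about 25.9M for YOLOv8m; the YOLOv8m number comes from Ultralytics' YOLOv8 model table rather than from this entry, so treat it as an assumption.

```python
def param_reduction(old_params_m: float, new_params_m: float) -> float:
    """Fractional reduction in parameter count when moving from old to new."""
    if old_params_m <= 0:
        raise ValueError("old_params_m must be positive")
    return (old_params_m - new_params_m) / old_params_m


# Assumed inputs, in millions of parameters (detect task, COCO, 640x640).
YOLOV8M_PARAMS_M = 25.9   # assumption: taken from the YOLOv8 model table
YOLOV11M_PARAMS_M = 20.1  # figure quoted for YOLOv11m

saving = param_reduction(YOLOV8M_PARAMS_M, YOLOV11M_PARAMS_M)
print(f"YOLOv11m uses {saving:.1%} fewer parameters than YOLOv8m")
```

Under those inputs the reduction lands at a little over 22 percent, which is consistent with the headline claim.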
Late in the backbone, the C2PSA (Cross-Stage Partial with Position-Sensitive Attention) block introduces lightweight spatial self-attention over the highest-level feature maps. This is the architectural change credited with lifting small-object detection, the historical YOLO weak spot, into competitive territory. The detection head retains the v8 anchor-free decoupled design, with Distribution Focal Loss for box-edge refinement and the Task-Aligned Assigner (from Task Alignment Learning, TAL) for label matching, and it uses depth-wise separable convolutions in the classification branch to lower head-side cost.
Loss is the standard YOLO-family composition: classification BCE, box CIoU, and DFL on box edges. The Task-Aligned Assigner combines the classification score with the predicted-box IoU into a single alignment score and dynamically chooses positive samples from it, which removes the anchor-tuning burden of the earlier anchor-based YOLO releases. Inference is a single forward pass per image plus a small NMS step on the head outputs; export to TensorRT can fold post-processing such as NMS into the engine where the toolchain supports it.
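A minimal sketch of the alignment score may make the assigner easier to picture. It assumes the common formulation t = s^alpha * u^beta, where s is the classification score and u is the IoU with the ground-truth box; the exponent values and the `top_k_positives` helper are illustrative assumptions, and the real assigner applies further filters (for example, requiring the anchor point to lie inside the ground-truth box).

```python
def alignment_score(cls_score: float, iou: float,
                    alpha: float = 0.5, beta: float = 6.0) -> float:
    """Joint classification x IoU score; alpha and beta are assumed values."""
    return (cls_score ** alpha) * (iou ** beta)


def top_k_positives(candidates, k: int = 10):
    """Pick the k best-aligned candidates for one ground-truth box.

    candidates: iterable of (anchor_id, cls_score, iou) tuples.
    Candidates with a zero alignment score are never selected.
    """
    scored = sorted(
        ((alignment_score(s, u), a) for a, s, u in candidates),
        reverse=True,
    )
    return [a for t, a in scored[:k] if t > 0]


# Hypothetical candidates: confident-but-misplaced, balanced, well-placed, no overlap.
candidates = [("a", 0.9, 0.3), ("b", 0.5, 0.8), ("c", 0.2, 0.9), ("d", 0.8, 0.0)]
print(top_k_positives(candidates, k=2))
```

With a large IoU exponent the score rewards localisation heavily, so a well-placed but less confident prediction can outrank a confident but poorly placed one.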
- C3k2 — CSP-style block with 2-conv branches; default building block in backbone and neck.
- C2PSA — Cross-Stage Partial Position-Sensitive Attention applied to high-level features; lifts small-object recall.
- Decoupled head — separate classification and regression branches per FPN level, anchor-free.
- Distribution Focal Loss — box edges as a discrete distribution then expectation; sub-pixel precision without quantisation.
- Task-Aligned Assigner — label matching driven by a joint classification x IoU score, no manual anchor design.
- Multi-task by default — detect / segment / classify / pose / OBB share the same backbone + neck and swap heads.
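The Distribution Focal Loss decoding step listed above can be sketched in a few lines. This is a simplified illustration, not the library's implementation: it assumes 16 bins per box edge and an anchor point given in pixels, and the helper names are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def dfl_expected_distance(logits):
    """Expected edge distance, in stride units, from per-bin logits."""
    probs = softmax(logits)
    return sum(index * p for index, p in enumerate(probs))


def decode_box(cx, cy, ltrb, stride):
    """Turn (left, top, right, bottom) distances into an xyxy box in pixels."""
    left, top, right, bottom = (d * stride for d in ltrb)
    return (cx - left, cy - top, cx + right, cy + bottom)


# Assumed 16 bins; probability mass split evenly between bins 3 and 4.
logits = [-100.0] * 16
logits[3] = logits[4] = 0.0
distance = dfl_expected_distance(logits)
print(round(distance, 3))
print(decode_box(320, 320, (distance, 2.0, distance, 2.0), stride=8))
```

Because the output is an expectation over bins rather than the arg-max bin, the decoded edge can fall between integer bin positions, which is where the sub-pixel precision comes from.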
Reference and specifications
The table below is the canonical reference for the five YOLOv11 detection variants and the most-used CLI knobs. Parameter counts are Ultralytics' published numbers for the detect task on COCO 2017 at 640x640. Per-variant FPS depends sharply on TensorRT version, precision and pre/post-processing implementation; the FPS figures in the sizing section are mid-range observations from Yobitel Edge AI lab runs.
| Variant | Params | COCO mAP50-95 | Primary use |
|---|---|---|---|
| YOLOv11n | ~2.6M | ~39.5 | Edge — Jetson Orin Nano, on-sensor, low-power CCTV |
| YOLOv11s | ~9.4M | ~47.0 | L4 multi-stream, Jetson Orin NX, drone fleets |
| YOLOv11m | ~20.1M | ~51.5 | Balanced accuracy / throughput on L40S, Jetson AGX Orin |
| YOLOv11l | ~25.3M | ~53.4 | High-accuracy production inference, retail / safety analytics |
| YOLOv11x | ~56.9M | ~54.7 | Offline batch labelling, research baselines, dataset bootstrapping |
Each detect variant has matching `-seg`, `-cls`, `-pose` and `-obb` checkpoints. The training and export CLI is identical across tasks — only the dataset YAML schema changes (mask polygons, class lists, keypoints, rotated boxes).
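The naming scheme is regular enough to generate. The helper below is a hypothetical convenience, not part of the `ultralytics` package; it only encodes the variant letters and task suffixes named in this entry.

```python
VARIANTS = ("n", "s", "m", "l", "x")
TASK_SUFFIX = {
    "detect": "",
    "segment": "-seg",
    "classify": "-cls",
    "pose": "-pose",
    "obb": "-obb",
}


def weight_name(variant: str, task: str = "detect") -> str:
    """Build a checkpoint file name such as 'yolo11s-seg.pt'."""
    if variant not in VARIANTS:
        raise ValueError(f"unknown variant {variant!r}; expected one of {VARIANTS}")
    if task not in TASK_SUFFIX:
        raise ValueError(f"unknown task {task!r}; expected one of {tuple(TASK_SUFFIX)}")
    return f"yolo11{variant}{TASK_SUFFIX[task]}.pt"


print(weight_name("s"))
print(weight_name("n", "segment"))
```

Validating the variant and task up front catches typos before a download or training job is launched.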
Workload patterns
Three deployment shapes cover the bulk of production YOLOv11 use: low-latency edge inference on a single device, multi-stream RTSP analytics behind one server GPU, and offline batch labelling for dataset bootstrapping. Each pattern targets a different precision, batch size and pre-processing path. These are also the three shapes Yobibyte automates for managed customers — the flags below are what a team running raw Ultralytics on their own infrastructure has to hand-tune; on Yobibyte the workspace SLO derives them.
- Pattern A: edge inference on Jetson Orin. Single device, FP16 TensorRT engine, batch=1 or batch=2 with NVDEC-decoded camera input.
- Pattern B: multi-stream server analytics on L4 / L40S. One detector serves many RTSP feeds via DeepStream or a custom Triton ensemble.
- Pattern C: batch labelling on L40S or H100. Throughput-optimised; latency is irrelevant; INT8 with calibration, or FP16 for higher quality.
```bash
# Pattern A: Jetson Orin edge inference (FP16 TensorRT engine)
yolo export model=yolo11s.pt format=engine half=True imgsz=640 device=0

# Pattern B: multi-stream RTSP analytics behind one L4 server
# (Triton ensemble: DALI preprocess -> YOLOv11s TensorRT -> NMS)
yolo export model=yolo11s.pt format=engine half=True imgsz=640 batch=16 dynamic=True

# Pattern C: batch labelling on L40S, INT8 calibrated
yolo export \
    model=yolo11m.pt \
    format=engine \
    int8=True \
    data=calibration-coco.yaml \
    imgsz=640 \
    batch=32

# Drive the labeller from Python
python - <<'PY'
from ultralytics import YOLO

# TensorRT engine; the task is passed explicitly for an exported model
model = YOLO("yolo11m.engine", task="detect")
results = model.predict(
    # Hypothetical local path: sync s3://bucket/unlabelled-archive/ here first,
    # since predict() is assumed to read local paths rather than s3:// URIs.
    source="/data/unlabelled-archive/",
    save_txt=True,
    save_conf=True,
    conf=0.25,
    iou=0.5,
    imgsz=640,
    stream=True,  # generator: results are produced lazily
)
for r in results:
    pass  # write labels to disk / queue
PY
```
On Yobibyte, the same three workloads are exposed as the `edge-detect`, `stream-analytics` and `batch-label` workspace types. The customer submits a model identifier and an SLO; the platform picks variant, precision and concurrency.
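For teams that write labels themselves rather than relying on `save_txt`, the label line is easy to build by hand. The sketch below assumes the usual YOLO text layout (class id, then centre x, centre y, width and height normalised to the image size, with an optional trailing confidence); the function name and the fixed six-decimal formatting are illustrative choices, not the library's exact output format.

```python
def to_yolo_line(cls_id, xyxy, img_w, img_h, conf=None):
    """Format one detection as a YOLO label line with normalised xywh."""
    x1, y1, x2, y2 = xyxy
    x_centre = (x1 + x2) / 2 / img_w
    y_centre = (y1 + y2) / 2 / img_h
    width = (x2 - x1) / img_w
    height = (y2 - y1) / img_h
    fields = [str(cls_id)] + [f"{v:.6f}" for v in (x_centre, y_centre, width, height)]
    if conf is not None:
        fields.append(f"{conf:.6f}")  # confidence goes last when present
    return " ".join(fields)


# A 320x240 box centred in a 640x480 frame, class 2, confidence 0.9.
line = to_yolo_line(2, (160, 120, 480, 360), img_w=640, img_h=480, conf=0.9)
print(line)
```

Keeping the confidence column makes it possible to filter pseudo-labels by score later without re-running inference.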
Sizing and capacity planning
YOLOv11 sizing is governed by three quantities — per-stream FPS budget, per-stream resolution and concurrent stream count — not by KV cache or parameter memory. The planning model below is for FP16 TensorRT engines on common Yobitel-deployable accelerators at 640x640 input. Real-world numbers vary with NVDEC headroom, post-processing implementation and TensorRT version; treat the table as a sizing anchor, not a contract.
Memory budgets at FP16 are small (under 200 MB for v11x, around 30 MB for v11n) so VRAM is rarely the limit — concurrent streams almost always saturate compute or PCIe first. For dense scenes (retail aisles, traffic junctions) NMS becomes the bottleneck at high stream counts; offload NMS to the engine via the `EfficientNMS_TRT` plugin where possible. On Jetson Orin and the forthcoming Thor, the relevant lever is power mode — 60W mode roughly doubles inference throughput over 15W mode at the cost of thermal headroom.
| Variant | Jetson Orin Nano | Jetson AGX Orin | L4 (1x stream) | L40S (1x stream) |
|---|---|---|---|---|
| YOLOv11n | ~110 FPS | ~330 FPS | ~480 FPS | ~1100 FPS |
| YOLOv11s | ~55 FPS | ~210 FPS | ~310 FPS | ~780 FPS |
| YOLOv11m | ~22 FPS | ~110 FPS | ~165 FPS | ~430 FPS |
| YOLOv11l | ~14 FPS | ~75 FPS | ~120 FPS | ~310 FPS |
| YOLOv11x | ~6 FPS | ~36 FPS | ~58 FPS | ~155 FPS |
On Yobitel NeoCloud, L4 reservations price comfortably under L40S for any workload that does not need the L40S's NVENC headroom or larger VRAM. As a rule of thumb, YOLOv11n/s on L4 is the right floor for multi-stream analytics; YOLOv11l/x on L40S is the right ceiling for offline labelling.
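The three sizing quantities reduce to a one-line estimate. The sketch below is a rough planning aid under stated assumptions: it treats the single-stream engine FPS as an aggregate frame budget, and the 15 FPS per-stream target and 0.7 headroom factor are hypothetical inputs rather than Yobitel guidance.

```python
import math


def max_streams(engine_fps: float, per_stream_fps: float, headroom: float = 0.7) -> int:
    """Estimate how many streams one GPU can serve.

    engine_fps: frames per second the engine sustains on this accelerator
    per_stream_fps: frame rate each camera stream must be analysed at
    headroom: fraction of engine_fps to plan against (assumed safety margin)
    """
    if per_stream_fps <= 0:
        raise ValueError("per_stream_fps must be positive")
    if not 0 < headroom <= 1:
        raise ValueError("headroom must be in (0, 1]")
    return math.floor(engine_fps * headroom / per_stream_fps)


# ~310 FPS for YOLOv11s on L4 and ~780 FPS on L40S, at 15 FPS per stream.
print(max_streams(310, 15))
print(max_streams(780, 15))
```

Batched multi-stream throughput usually differs from single-stream FPS, so the result is a starting point for a load test rather than a capacity guarantee.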
Limits and quotas
Raw Ultralytics imposes no hard quotas; everything is bounded by the host GPU and disk. The limits below are operational ceilings observed in production deployments and the corresponding ceilings enforced on Yobibyte managed YOLOv11 endpoints — they are intentionally generous, but they exist to keep one tenant from starving another.
| Limit | Raw upstream | Yobibyte managed default | Notes |
|---|---|---|---|
| Max input resolution | Limited by VRAM | 4K (3840 x 2160) | Higher resolutions billed as separate workspace tier. |
| Max concurrent streams per endpoint | GPU-bound | 32 | Lift via additional replicas; route via stream-aware load balancer. |
| Max batch size (export-time) | 1024 | 64 | Higher batches stall short-latency requests under continuous load. |
| Max classes per task | 10,000 (head capacity) | 1,000 | Beyond 1,000 classes, fine-grained accuracy degrades — switch to a two-stage detector + classifier. |
| Max keypoints (pose) | 256 | 133 (COCO-WholeBody) | Custom keypoint sets accepted on dedicated workspaces. |
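A pre-flight check against the managed defaults can catch oversized requests before they reach an endpoint. The sketch below hard-codes the managed-default column from the table above; the `EndpointRequest` type and `violations` function are hypothetical and do not correspond to a Yobibyte API.

```python
from dataclasses import dataclass

# Managed defaults copied from the limits table.
MANAGED_LIMITS = {
    "max_width": 3840,
    "max_height": 2160,
    "max_streams": 32,
    "max_batch": 64,
    "max_classes": 1000,
    "max_keypoints": 133,
}


@dataclass
class EndpointRequest:
    width: int
    height: int
    streams: int
    batch: int
    classes: int
    keypoints: int = 0  # only relevant for pose workloads


def violations(req: EndpointRequest, limits=MANAGED_LIMITS) -> list:
    """Return human-readable descriptions of every limit the request exceeds."""
    problems = []
    if req.width > limits["max_width"] or req.height > limits["max_height"]:
        problems.append(
            f"resolution {req.width}x{req.height} exceeds "
            f"{limits['max_width']}x{limits['max_height']}"
        )
    checks = [
        ("streams", req.streams, limits["max_streams"]),
        ("batch", req.batch, limits["max_batch"]),
        ("classes", req.classes, limits["max_classes"]),
        ("keypoints", req.keypoints, limits["max_keypoints"]),
    ]
    for name, value, ceiling in checks:
        if value > ceiling:
            problems.append(f"{name} {value} exceeds {ceiling}")
    return problems


ok = EndpointRequest(width=1920, height=1080, streams=16, batch=8, classes=80)
too_big = EndpointRequest(width=7680, height=4320, streams=40, batch=8, classes=80)
print(violations(ok))
print(violations(too_big))
```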
Observability
On raw deployments, the Ultralytics SDK emits per-batch loss components during training and per-call timings during prediction. For production serving behind Triton or DeepStream, the canonical observability surface is Prometheus metrics from the serving layer plus NVIDIA DCGM exporters for GPU telemetry. The minimum useful set:
- Per-stream FPS — `deepstream_fps` or Triton `nv_inference_request_success` per second per endpoint.
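- Per-call latency: Triton `nv_inference_request_duration_us`, a cumulative counter labelled by model and version; divide its rate by the request rate to get mean latency, or enable Triton's optional latency summary metrics for quantiles.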
- Pre-process / inference / post-process split — DALI / Triton timing components; helps tell engine bottleneck from CPU bottleneck.
- NMS time — instrument explicitly if NMS is on CPU; move to `EfficientNMS_TRT` if it exceeds 15 percent of total latency.
- Detection counts — sliding-window count of detections per class; spikes indicate camera drift or model degradation.
- Confidence histogram — distribution of `score` per detection; gradual leftward shift signals retraining is due.
On Yobibyte the same metrics surface in the workspace dashboard without extra instrumentation. On a self-hosted deployment, scrape the same series into your own Prometheus stack. Metric names are specific to each serving layer (Triton's `nv_inference_*` series are not shared with DeepStream), so relabel them per endpoint if you run both.
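Two of the checks above are simple enough to express as alert logic. The sketch below assumes per-stage timings in milliseconds and lists of detection scores are already being collected; the function names and the 0.05 drift tolerance are illustrative assumptions, while the 15 percent NMS threshold comes from the list above.

```python
from statistics import mean


def nms_share(preprocess_ms: float, inference_ms: float, nms_ms: float) -> float:
    """Fraction of end-to-end latency spent in NMS."""
    total = preprocess_ms + inference_ms + nms_ms
    if total <= 0:
        raise ValueError("total latency must be positive")
    return nms_ms / total


def nms_needs_offload(preprocess_ms, inference_ms, nms_ms, threshold=0.15) -> bool:
    """True when NMS exceeds the given share of total latency."""
    return nms_share(preprocess_ms, inference_ms, nms_ms) > threshold


def confidence_drifted(baseline_scores, recent_scores, max_drop=0.05) -> bool:
    """True when mean confidence fell by more than max_drop (assumed tolerance)."""
    return mean(baseline_scores) - mean(recent_scores) > max_drop


print(nms_needs_offload(2.0, 6.0, 2.0))
print(confidence_drifted([0.80, 0.82, 0.78], [0.70, 0.72, 0.68]))
```

Comparing means is the crudest possible drift signal; a production check would more likely compare full histograms over matched time windows.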
Cost and FinOps
Total cost of a YOLOv11 deployment is dominated by accelerator-hours, not by software. The ranges below are Yobitel NeoCloud on-demand reference prices for common YOLOv11 deployment SKUs; reserved pricing is materially lower at 12+ month commitment. All figures USD; treat as planning anchors, not committed quotes.
| SKU | Yobitel NeoCloud on-demand | Right-sized YOLOv11 deployment |
|---|---|---|
| Jetson Orin NX 16GB (Yobitel Edge AI appliance) | Capex unit + managed support | Single-camera or 2-4 camera edge box |
| Jetson AGX Orin 64GB (Edge AI appliance) | Capex unit + managed support | 8-16 camera edge gateway, on-prem inference |
| NVIDIA L4 (server class) | ~$0.85 / GPU / hour | 12-32 RTSP streams per GPU at YOLOv11s FP16 |
| NVIDIA L40S (server class) | ~$2.40 / GPU / hour | 32-64 streams per GPU or offline labelling at v11x |
| NVIDIA H100 SXM5 (overkill, included for sizing) | ~$3.80 / GPU / hour | Only justified if YOLOv11 shares the box with an LLM workload |
InferenceBench publishes weekly throughput-per-dollar updates for YOLOv11 across Yobitel and peer providers — use it to sanity-check whether a self-managed cluster beats a managed Yobibyte endpoint for your workload shape.
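The reference prices translate into a per-stream figure with one division. The sketch below uses the on-demand L4 price and the 12 to 32 streams-per-GPU range from the table above; the 730 hours-per-month constant is an assumption, and the results are planning numbers rather than quotes.

```python
HOURS_PER_MONTH = 730  # assumption: average hours in a calendar month


def cost_per_stream_hour(gpu_hourly_usd: float, streams_per_gpu: int) -> float:
    """USD per stream per hour when one GPU is shared evenly across streams."""
    if streams_per_gpu <= 0:
        raise ValueError("streams_per_gpu must be positive")
    return gpu_hourly_usd / streams_per_gpu


def cost_per_stream_month(gpu_hourly_usd: float, streams_per_gpu: int) -> float:
    """USD per stream per month for an always-on GPU."""
    return cost_per_stream_hour(gpu_hourly_usd, streams_per_gpu) * HOURS_PER_MONTH


# L4 at ~$0.85 / GPU / hour, densest and sparsest packing from the table.
l4_best = cost_per_stream_month(0.85, 32)
l4_worst = cost_per_stream_month(0.85, 12)
print(f"L4 per-stream monthly cost: ${l4_best:.2f} to ${l4_worst:.2f}")
```

The spread between the two packings is large, which is why stream density matters more to the bill than the choice between adjacent GPU SKUs.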
Security and compliance
Two material concerns dominate YOLOv11 production: licence compliance (AGPL-3.0 vs Enterprise) and data residency for sensitive video. The first is the most common pitfall when teams move from prototype to product. The second is what drives the choice between Yobitel UK Sovereign tenancy and a multi-region neocloud deployment.
- AGPL-3.0 applies to network use. A closed-source SaaS that calls YOLOv11 in the backend is in scope; the corresponding source must be released under AGPL unless an Ultralytics Enterprise Licence is purchased.
- Distribution as an on-device or appliance product also triggers AGPL — Yobitel Edge AI appliances ship under enterprise licence terms for this reason.
- Video data residency — for NCSC OFFICIAL workloads, pin inference to Yobitel's UK Sovereign region; for HIPAA / patient-facing video (MediQuery), use the dedicated MediQuery deployment which routes through Yobitel's HIPAA-aligned infrastructure.
- Model supply chain — pin a specific Ultralytics release (`ultralytics==8.3.x`) in production and verify checksums; the project ships often and silent regressions in NMS or export occasionally land.
If your product is closed-source SaaS and you cannot redistribute source under AGPL, you need either an Ultralytics Enterprise Licence or a permissively licensed alternative. RT-DETR is the obvious one, provided you use its upstream Apache 2.0 implementation rather than the port bundled in the AGPL-licensed `ultralytics` package. Yobibyte's managed YOLOv11 endpoint operates under appropriate licence terms, so the customer-facing surface (HTTP API) is not subject to AGPL.
Migration and alternatives
The realistic alternatives to YOLOv11 in 2026 cluster around licence trade-offs and architecture lineage. The comparison below is the standard decision matrix Yobitel solutions engineers walk customers through.
| Option | Licence | When to choose |
|---|---|---|
| YOLOv11 (this entry) | AGPL-3.0 / Ultralytics Enterprise | Default production detector; fastest training and tooling. |
| YOLOv8 | AGPL-3.0 / Ultralytics Enterprise | Migrate-later: production miles, broader downstream tooling support. |
| RT-DETR (upstream implementation) | Apache 2.0 | Closed-source SaaS where AGPL is unacceptable; NMS-free pipeline. |
| YOLOv9 / YOLOv10 | GPL-3.0 / AGPL-3.0 | Research baselines; less mature SDK and export tooling. |
| Yobibyte managed YOLOv11 | Yobitel Service Terms | Skip the runtime entirely; consume a hosted detection endpoint with SLA. |
If you are already on YOLOv8, the YOLOv11 upgrade is almost always worth doing on edge fleets (smaller models, higher FPS) and almost never urgent on server deployments where compute is cheap. Re-validate on your own holdout before swapping.
Troubleshooting
The failure modes below cover roughly 80 percent of YOLOv11 production tickets. Each has a clear remediation path; the ordering reflects observed frequency in Yobitel Managed Operations runbooks.
| Symptom | Likely cause | Remediation |
|---|---|---|
| Sudden mAP drop after export | INT8 calibration set unrepresentative | Recalibrate with a recent sample of production frames; or fall back to FP16. |
| Per-stream FPS lower than expected | Pre-processing on CPU | Move resize / normalise to DALI; verify NVDEC is fed via Triton or DeepStream. |
| NMS dominates latency | CPU NMS path with many candidates | Export with `EfficientNMS_TRT` plugin; lower `max_det` if scene density is genuinely high. |
| Training loss diverges after epoch 30+ | LR schedule too aggressive for small dataset | Switch to `optimizer=AdamW`, `lr0=0.001`, reduce `mosaic` augmentation. |
| Custom dataset accuracy plateaus low | Label noise or class imbalance | Use YOLOv11x at higher resolution as a pseudo-labeller; then re-train the production variant. |
| TensorRT engine fails at runtime | Engine built with a different TensorRT version or for a different GPU architecture (SM version) | Rebuild the engine on the deployment host; by default engines are not portable across TensorRT versions or GPU architectures. |
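The engine-portability failure in the last row can be caught before deployment if the build environment is recorded next to the engine file. The sketch below assumes such a record exists; the `EngineEnv` type, its fields and the version strings are hypothetical, and the check assumes default builds without TensorRT's optional version or hardware compatibility modes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EngineEnv:
    tensorrt_version: str      # hypothetical example: "10.3.0"
    compute_capability: tuple  # (major, minor), for example (8, 9) for L4


def rebuild_reasons(built: EngineEnv, host: EngineEnv) -> list:
    """List the mismatches that call for rebuilding the engine on the host."""
    reasons = []
    if built.compute_capability != host.compute_capability:
        reasons.append(
            f"GPU architecture differs: built for SM {built.compute_capability}, "
            f"host is SM {host.compute_capability}"
        )
    if built.tensorrt_version != host.tensorrt_version:
        reasons.append(
            f"TensorRT differs: built with {built.tensorrt_version}, "
            f"host has {host.tensorrt_version}"
        )
    return reasons


# Engine built on an L4 server (SM 8.9), deployed to a Jetson AGX Orin (SM 8.7).
built_on_l4 = EngineEnv("10.3.0", (8, 9))
orin_host = EngineEnv("10.3.0", (8, 7))
print(rebuild_reasons(built_on_l4, orin_host))
print(rebuild_reasons(built_on_l4, built_on_l4))
```

Running a check like this in the deployment pipeline turns an opaque runtime deserialisation error into an explicit rebuild step.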
Where it fits in the Yobitel stack
YOLOv11 is the default object detector across two Yobitel surfaces. On Yobitel Edge AI appliances (Jetson Orin NX and AGX, Jetson Thor when it ships), YOLOv11 ships preinstalled, pre-calibrated against the customer's reference footage, and benchmarked before delivery — customers receive a working detector on day one rather than a training problem. On Yobibyte, YOLOv11 is one of the managed inference resources customers can request from a workspace; the platform picks the variant, the precision and the placement (L4 vs L40S, single-stream vs multi-stream), and emits standards-based observability into the customer's dashboard.
Yobibyte's managed YOLOv11 endpoint is operated under enterprise licence terms, so customers consuming the HTTP API do not inherit AGPL obligations. For customers who need a permissive baseline, Yobitel solutions engineers will route to RT-DETR instead. InferenceBench tracks YOLOv11 throughput-per-dollar weekly across Yobitel and peer neoclouds — that data feeds the Omniscient Compute placement engine, which Yobibyte uses internally to decide where each inference replica lands.
- [Yobitel Edge AI](/products/yobibyte) — YOLOv11 baseline on Jetson Orin / Thor reference appliances.
- [Yobibyte](/products/yobibyte) — managed YOLOv11 detection endpoint for streaming and batch.
- [Yobitel NeoCloud](/services/neocloud) — L4 and L40S capacity for self-managed deployments.
- [InferenceBench](/products/inferencebench) — public throughput-per-dollar tracking.
References
- Ultralytics YOLOv11 Documentation · Ultralytics Docs
- Ultralytics YOLO GitHub · GitHub
- You Only Look Once: Unified, Real-Time Object Detection (Redmon et al., 2015) · arXiv
- TensorRT EfficientNMS plugin reference · NVIDIA TensorRT GitHub