CVAT on AWS - How Yobitel's pre-built AMI makes AI data annotation effortless at scale

Logesh Palaniyappan
3 days ago

Yobitel's CVAT AMI is a pre-configured Amazon Machine Image available on AWS Marketplace. It deploys a fully functional CVAT (v1.11.0) annotation environment on Ubuntu 24.04, with Docker, Python, automated workflow scripts, and native S3 integration via an EC2 IAM role pre-installed. Teams can subscribe on AWS Marketplace, launch an EC2 instance, and start annotating within minutes, with no per-seat licensing fees.

Setting up a scalable CVAT annotation environment from scratch requires 5 sequential steps:

Docker Compose installation
SSL certificate configuration
Nginx reverse proxy setup
IAM access key management
S3FS bucket mounting

Together, these steps typically account for days of dedicated DevOps work before a single image is annotated. Beyond the initial setup, ongoing infrastructure maintenance adds continuous operational overhead: version updates, SSL certificate renewal, storage configuration changes, and container health monitoring. The CVAT AMI eliminates this entirely. Data remains within your chosen AWS region and VPC, fully under your IAM policies and security group controls.

What is CVAT?

Computer Vision Annotation Tool is an open-source platform for labelling images and videos for machine learning and AI projects. It supports a wide range of annotation types, covered in full in the Annotation Types section below.

Datasets can be exported in multiple formats, including YOLO, COCO, and Pascal VOC. See the Dataset Export section for the full list. CVAT offers 3 deployment options, each with distinct differences in setup, data control, and scalability:

cvat.ai - Hosted platform
Self-Hosted - On your own infrastructure
AWS AMI - Pre-configured on AWS via Yobitel

The choice between these options is determined by data privacy requirements, infrastructure capacity, annotation volume, and deployment complexity.

What are your options when you want to use CVAT?

Of the 3 CVAT deployment options, 2 come with hard limitations that block production-scale annotation. The AMI is built specifically to remove those limitations inside your existing AWS environment. Use the comparison below to evaluate each deployment option against your team's technical and operational requirements.

Why was a CVAT AMI built?

CVAT is the most capable open-source annotation platform available, but getting it running in production has always been the real challenge. Yobitel's CVAT AMI solves exactly this problem. It packages a fully configured, production-ready CVAT environment as an Amazon Machine Image (AMI), available directly on the AWS Marketplace.

The entire setup process has just three steps:

Subscribe
Launch
Annotate

What does AMI mean?

An AMI (Amazon Machine Image) is a pre-built image of a fully configured system. When an EC2 instance boots from an AMI, every required component, including Docker, Python, PostgreSQL, Redis, Nginx, and S3 integration, is already installed and running. After launch, CVAT is accessible in a browser without an installation or configuration phase.

The gap between subscribing to the Marketplace and annotating your first image is measured in minutes, not days.

Your data never leaves your AWS account

This is one of the most critical advantages of the AMI model over hosted SaaS platforms.

Teams that cannot use hosted SaaS annotation platforms include:

Healthcare - Clinical imaging data is protected under HIPAA and equivalent regulations. Uploading it to a third-party SaaS platform creates compliance obligations that many organisations cannot meet.
Finance - Financial documents subject to GDPR or regional data protection laws cannot be processed on external vendor servers without significant legal review.
Defence and government - Projects with explicit data sovereignty requirements prohibit data from leaving controlled or nationally designated infrastructure.
Enterprise organisations - Internal InfoSec policies often block third-party data processing entirely, making SaaS platforms non-viable regardless of vendor assurances.

Yobitel's CVAT AMI resolves this directly. All annotations, exports, and raw images stay inside the S3 bucket in your own AWS account.

What comes pre-configured inside the Yobitel CVAT AMI?

The full CVAT service stack starts automatically on boot. Every component listed below is pre-configured, tested, and production-ready before the instance reaches your hands.

Docker and Docker Compose run all CVAT services in isolated containers.
Python ships with every required dependency pre-installed.
PostgreSQL is initialised and pre-migrated before the first user logs in.
The Redis task queue is configured and running.
Nginx reverse proxy routes all incoming traffic from the first request.

Superuser credentials are generated automatically on first boot and written to a secure file on the instance. Instructions for retrieving them are in the deployment steps below.

Automation scripts for environment setup, task creation, and user management are included in the image and accessible immediately after launch.

Annotation types

The AMI supports the complete range of CVAT annotation types for both image and video datasets.

AI-assisted annotation

Three AI-assisted labelling integrations ship pre-configured within the AMI. Each reduces manual annotation time on tasks where full manual labelling is the primary cost driver.

Segment Anything Model (SAM) generates a segmentation mask from a single click on any object within the image. On complex shapes, this reduces polygon annotation time by up to 70 per cent compared to manual tracing.
DEXTR (Deep Extreme Cut) produces a precise polygon boundary from four points that the annotator marks at the extreme edges of the target object. The model generates the boundary automatically, so no manual tracing of the polygon perimeter is required.
SORT and SiamMask provide object tracking for video annotation tasks. An annotator labels an object in the first frame of a video sequence. The tracker follows that object across subsequent frames without additional manual input. Annotators review the output and correct tracking drift where it occurs, rather than labelling each frame independently.

At production dataset volumes, these reductions translate directly to lower annotation costs and shorter time-to-training schedules.

Dataset export

Training annotation files export without intermediate conversion. Exported files are directly compatible with the training frameworks listed below.

Exported datasets do not require post-processing or reformatting before use in a training pipeline.

Native S3 integration

The Yobitel CVAT AMI supports native Amazon S3 integration through IAM role-based authentication. The CVAT application does not store access keys at any point. Instead, an IAM role attached to the EC2 instance governs bucket access.

To register an S3 bucket, navigate to the Cloud Storage section within CVAT. Enter the bucket name and AWS region. Upon registration, the bucket becomes available as a data source for all annotation tasks within the instance. CVAT reads image data directly from the registered bucket. No data transfers to the EC2 instance volume occur during annotation.

Dataset exports write directly to the registered S3 bucket upon completion. Raw images, annotation files, and exported datasets remain within the subscriber's S3 environment at every stage of the annotation lifecycle.
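As a rough illustration of what role-based access means in practice, the sketch below builds an IAM policy document scoped to a single bucket. The article does not list the exact permissions the AMI's role needs, so the action list, the function name build_bucket_policy, and the bucket name are assumptions for illustration only.

```python
import json


def build_bucket_policy(bucket_name: str) -> dict:
    """Return an IAM policy document scoped to a single S3 bucket.

    Assumption: the role needs read access for source images and
    write access for exported datasets. Adjust to your own requirements.
    """
    bucket_arn = f"arn:aws:s3:::{bucket_name}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Listing applies to the bucket itself.
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": bucket_arn,
            },
            {
                # Object reads and writes apply to keys inside the bucket.
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject"],
                "Resource": f"{bucket_arn}/*",
            },
        ],
    }


if __name__ == "__main__":
    # "example-annotation-bucket" is a placeholder name.
    print(json.dumps(build_bucket_policy("example-annotation-bucket"), indent=2))
```

A policy of this shape, attached to the instance role, is what lets the bucket be reached without any access keys stored in the application.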

Workflow automation

The Yobitel CVAT AMI includes a suite of pre-built automation scripts designed to reduce the operational overhead that annotation teams typically absorb when managing CVAT manually. These scripts cover the 4 core areas of annotation pipeline management and are available on the instance from the moment it launches.

Environment setup scripts are responsible for initialising and resetting the CVAT environment between projects. This ensures that each new annotation project starts from a consistent, controlled state without residual configuration or data from previous workflows carrying over into the new project.
Task creation scripts allow annotation tasks to be generated programmatically in bulk, drawing directly from image source lists or S3 prefixes stored in the connected AWS environment. This removes the bottleneck of manually creating tasks through the CVAT interface when processing datasets with thousands of images or frames, making large-scale annotation operations more efficient.
User management scripts handle the full lifecycle of annotator accounts, covering provisioning of new users, modification of existing permissions, and removal of accounts when no longer required. Teams managing large or frequently rotating annotator workforces can execute these operations programmatically at scale, without accessing the CVAT interface for every individual change.
Export automation scripts connect the dataset export directly to downstream pipeline stages within the AWS environment. Completed annotation outputs move into training, validation, or delivery workflows automatically. It ensures the annotation process integrates cleanly with the broader ML pipeline without requiring manual intervention at the export stage.
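To make the bulk task creation idea concrete, here is a minimal sketch of how a list of S3 object keys under one prefix could be grouped into fixed-size task batches. This is not the script that ships in the AMI. The names TaskSpec and plan_tasks, the image suffix list, and the naming scheme are all hypothetical.

```python
from dataclasses import dataclass
from typing import List

# Assumption: only these extensions count as annotatable images.
IMAGE_SUFFIXES = (".jpg", ".jpeg", ".png")


@dataclass
class TaskSpec:
    """A planned annotation task: a name plus the S3 keys it covers."""
    name: str
    keys: List[str]


def plan_tasks(keys: List[str], prefix: str, batch_size: int) -> List[TaskSpec]:
    """Group image keys under `prefix` into tasks of at most `batch_size` keys."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    selected = sorted(
        key for key in keys
        if key.startswith(prefix) and key.lower().endswith(IMAGE_SUFFIXES)
    )
    tasks = []
    for start in range(0, len(selected), batch_size):
        index = start // batch_size + 1
        name = f"{prefix.strip('/')}-{index:03d}"
        tasks.append(TaskSpec(name=name, keys=selected[start:start + batch_size]))
    return tasks


if __name__ == "__main__":
    example_keys = ["run1/a.jpg", "run1/b.png", "run1/c.jpg", "run1/notes.txt"]
    for task in plan_tasks(example_keys, "run1/", batch_size=2):
        print(task.name, task.keys)
```

Each TaskSpec could then be handed to whatever mechanism actually creates the task in CVAT. Keeping the planning step separate makes the batching easy to check before anything is created.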

Extensibility

The Yobitel CVAT AMI is built on the open-source CVAT codebase, and every layer of the platform is available for modification. Organisations are not restricted to the default configuration. The AMI is designed to serve as a foundation that teams can extend and adapt to meet requirements specific to their domain, infrastructure, or workflow.

Teams working with proprietary machine learning models can integrate those models directly into the CVAT interface as custom Nuclio serverless functions. This allows annotators to work with model-assisted annotation suggestions generated by in-house models running within the same AWS environment, rather than relying solely on the default AI assistants that ship with CVAT.
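For orientation, the sketch below shows the general shape of a Python handler for a Nuclio function, modelled on the pattern used by the serverless functions bundled with open-source CVAT. Treat it as an assumption to verify against the CVAT version on your instance. The detect_objects function is a hypothetical stand-in for an in-house model and returns a fixed box.

```python
import base64
import json


def detect_objects(image_bytes: bytes) -> list:
    """Hypothetical stand-in for an in-house model; returns one fixed box."""
    return [{
        "confidence": "0.90",
        "label": "example",
        "points": [10, 20, 110, 220],  # xtl, ytl, xbr, ybr in pixels
        "type": "rectangle",
    }]


def init_context(context):
    # Called once when the function starts; load the model here.
    context.user_data.model = detect_objects


def handler(context, event):
    # Assumption: the request body carries the frame as a base64 string
    # under the "image" key.
    image_bytes = base64.b64decode(event.body["image"])
    results = context.user_data.model(image_bytes)
    return context.Response(
        body=json.dumps(results),
        headers={},
        content_type="application/json",
        status_code=200,
    )
```

A real function would replace detect_objects with model inference and would also need a function configuration file describing the labels it can return. That configuration is outside the scope of this sketch.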

The annotation interface can be extended through custom plugins to support specialised workflows or domain-specific interaction patterns. Organisations operating in fields such as medical imaging, satellite imagery, or industrial inspection can adapt the interface to match the precise requirements of their annotation process rather than working around the constraints of a fixed default layout.

Label schemas, project templates, and workflow configurations are fully modifiable within the AMI. This allows organisations to define and enforce their own data taxonomy, labelling standards, and project structure across all annotation work conducted on the platform, ensuring consistency with internal data governance requirements.

For teams with advanced infrastructure needs, the Docker Compose configuration that underpins the AMI deployment can be modified to accommodate custom networking, storage, or service configurations. This gives infrastructure and DevOps teams full control over how the CVAT environment is structured within the AWS architecture, without being restricted to the default setup the AMI ships with.

How to deploy CVAT on AWS using Yobitel's AMI

The following procedure covers the complete deployment sequence from AWS Marketplace subscription to active annotation. No command-line infrastructure setup is required.

Technical usage manual

Subscribe and launch - Navigate to the Yobitel CVAT AMI listing on the AWS Marketplace and subscribe. Once the subscription is confirmed, select Launch through EC2 to proceed to the instance configuration page.

Configure the instance - On the Launch Instance page, provide the required configuration details, including the instance name, instance type, key pair, network settings, and storage. Select the instance type appropriate for your annotation workload and launch the instance.

Retrieve the public IP address - Once the instance reaches a running state, open the EC2 Dashboard in the AWS Management Console. Select the instance and copy its public IP address from the instance details panel.

Access the CVAT interface - Open a web browser and navigate to the following address, replacing the placeholder with the public IP address of your instance: http://<EC2_PUBLIC_IP>:8080

Wait for the server to initialise - On first access, the browser may display a "Cannot connect to the server" message. This is expected behaviour. The CVAT server requires a short initialisation period after the instance starts. Dismiss the message and refresh the page after a few moments. The login page will load once the server is ready.
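If you would rather script the wait than refresh the browser, a small polling loop is enough. The sketch below is one possible approach using only the Python standard library. The function names, the retry count, and the delay are assumptions rather than values from the AMI documentation.

```python
import time
import urllib.error
import urllib.request


def http_probe(url: str, timeout: float = 5.0) -> bool:
    """Return True if the URL answers with an HTTP status below 500."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status < 500
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status.
        return err.code < 500
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or DNS failure: not ready yet.
        return False


def wait_until_ready(url, probe=http_probe, attempts=30, delay=10.0,
                     sleep=time.sleep) -> bool:
    """Poll `url` until the probe succeeds or the attempts run out."""
    for attempt in range(attempts):
        if probe(url):
            return True
        if attempt < attempts - 1:
            sleep(delay)
    return False


if __name__ == "__main__":
    # Replace the placeholder with your instance's public IP address.
    ready = wait_until_ready("http://<EC2_PUBLIC_IP>:8080")
    print("CVAT is reachable" if ready else "CVAT did not respond in time")
```

The probe and sleep arguments are injectable so that the loop can be exercised without a live server.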

Retrieve superuser credentials - Open a terminal session connected to the instance and run the following command to retrieve the automatically generated superuser credentials:

sudo cat /opt/cvat/superuser.txt

The output will display the superuser username and password. Enter these credentials on the CVAT login page to sign in.

Begin annotating - After a successful login, the CVAT application will open in the browser. The platform is fully configured and ready for annotation work.

Create a project, task, and start annotating

CVAT organises annotation work within a three-level hierarchy: Project, Task, and Job.

A Project defines the label schema and serves as the parent container for all related annotation tasks. To create a project, navigate to the Projects section and select the option to add a new entry. Assign a project name, define the label set, and configure any required label attributes such as occlusion status or object orientation.

A Task represents a discrete batch of images or video assigned to a project. To create a task, navigate to the Tasks section and add a new entry. Assign a task name, link it to the parent project, and specify the data source. For datasets residing in S3, select the registered cloud storage bucket as the data source. Configure image quality and segment size parameters according to the workload requirements. Submit the task to make it available for annotation.

A Job is a subdivision of a task assigned to an individual annotator. Jobs are generated automatically upon task creation. Each job is assigned to an annotator through the task management interface.
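The relationship between a task's segment size and the jobs generated from it can be sketched as simple arithmetic. The example below assumes each job covers one contiguous run of frames and ignores any overlap setting. The Job class and split_into_jobs function are illustrative names, not part of CVAT.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Job:
    """A contiguous frame range; stop_frame is inclusive."""
    start_frame: int
    stop_frame: int


def split_into_jobs(frame_count: int, segment_size: int) -> List[Job]:
    """Split `frame_count` frames into jobs of at most `segment_size` frames."""
    if frame_count < 0 or segment_size <= 0:
        raise ValueError("frame_count must be >= 0 and segment_size > 0")
    jobs = []
    for start in range(0, frame_count, segment_size):
        stop = min(start + segment_size, frame_count) - 1
        jobs.append(Job(start_frame=start, stop_frame=stop))
    return jobs


if __name__ == "__main__":
    # 250 frames with a segment size of 100 gives jobs covering
    # frames 0-99, 100-199, and 200-249.
    for job in split_into_jobs(250, 100):
        print(job)
```

Under these assumptions, a smaller segment size produces more jobs, which lets more annotators work on the same task in parallel.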

Real-world annotation use cases

The practical value of running CVAT on AWS becomes clearest when you look at the kinds of datasets that power real production AI systems.

Autonomous vehicle perception

Perception systems for autonomous vehicles require annotation of camera, lidar, and radar data at scale. Pedestrians, vehicles, traffic signs, lane markings, and environmental structures all require labelling across tens of millions of frames. The full range of CVAT annotation types applies to this domain: bounding boxes for vehicle detection, polygons for lane boundary tracing, keypoints for pedestrian pose estimation, and video tracking for multi-frame object continuity.

Raw sensor data in autonomous vehicle programs typically resides in S3 from upstream ingestion pipelines. CVAT's native S3 integration allows annotators to work directly against this data, removing the need for intermediate transfer operations or local storage duplication.

Medical imaging and diagnostic AI

Radiology, pathology, and surgical AI systems require pixel-precise annotation of MRI scans, CT images, histopathology slides, and endoscopy videos. Segmentation masks and polygon annotations at this precision level are where SAM and DEXTR produce the largest reduction in annotation cost. It becomes feasible to trace complex organ boundaries and irregular pathological shapes using AI-assisted boundary generation.

Clinical imaging data is subject to strict regulatory requirements in most jurisdictions, making data isolation a hard requirement for this use case. As covered in the data privacy section above, the AMI keeps all data within the subscriber's own AWS environment, making it the appropriate deployment model here.

Drone and satellite imagery analysis

Remote sensing datasets for agriculture, infrastructure inspection, environmental monitoring, and urban planning require annotation of aerial imagery at sub-meter resolution across large geographic areas. Object density is high, and objects are often small relative to image dimensions, making annotation volume the primary cost driver.

The AMI's flat per-hour pricing scales annotation infrastructure cost with compute consumption rather than team size or task volume. Organisations processing large aerial datasets pay for the compute hours their workload requires.
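The pricing model can be expressed as a one-line calculation. In the sketch below, the hourly rate is a placeholder and not a Yobitel or AWS price. The point is only that team size does not appear anywhere in the formula.

```python
def compute_cost(hourly_rate: float, hours_per_day: float, days: int) -> float:
    """Estimate compute cost for a period of instance uptime.

    Note that the number of annotators is not an input.
    """
    return round(hourly_rate * hours_per_day * days, 2)


if __name__ == "__main__":
    # Placeholder figures: 0.50 per hour, 8 hours a day, 22 working days.
    print(compute_cost(hourly_rate=0.50, hours_per_day=8, days=22))  # 88.0
```

Check the Marketplace listing for the actual hourly rate of your chosen instance type before budgeting.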

Retail object detection and shelf analytics

Retail AI applications, including planogram compliance monitoring, inventory tracking, and out-of-stock detection, require dense bounding box annotation across large product catalogues and varied store environments. CVAT exports directly to YOLO format. The gap between a completed annotation task and a training-ready dataset is a single export operation with no intermediate conversion step.
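For readers unfamiliar with the YOLO label format, each line of a label file holds a class index followed by the box centre, width, and height, all normalised by the image dimensions. The sketch below converts a pixel-space box to that form. It illustrates the format only and is not CVAT's exporter. The function name to_yolo_line is hypothetical.

```python
def to_yolo_line(class_id: int, xtl: float, ytl: float, xbr: float, ybr: float,
                 img_w: int, img_h: int) -> str:
    """Convert a pixel box (top-left, bottom-right) to a YOLO label line."""
    x_center = (xtl + xbr) / 2 / img_w
    y_center = (ytl + ybr) / 2 / img_h
    width = (xbr - xtl) / img_w
    height = (ybr - ytl) / img_h
    return f"{class_id} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"


if __name__ == "__main__":
    # A 200x200 pixel box in a 1000x800 image.
    print(to_yolo_line(0, 100, 200, 300, 400, 1000, 800))
    # prints "0 0.200000 0.375000 0.200000 0.250000"
```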

Ideally, your team should never have to think about annotation setup again. Yobitel's CVAT AMI is built to meet that standard. It removes every infrastructure decision that stands between your team and productive annotation work. The only thing left to do is log in and start labelling.

FAQs

Is my annotation data secure on AWS?

Yes. All annotation data (images, labels, and exports) remains within your own AWS account. Nothing is transmitted to Yobitel or any third-party platform. S3 integration uses IAM role-based authentication, meaning no credentials are stored in CVAT.

Which EC2 instance type should I choose?

Yobitel lists 16 supported instance types in the Marketplace pricing table. The t3.large is recommended as the starting point for most teams, as it handles standard annotation workloads well. For large concurrent teams or AI-assisted annotation tasks (SAM, DEXTR), m5.xlarge or m5.2xlarge provides additional compute headroom. GPU instances are not required for annotation; they become relevant in model training, which happens downstream of CVAT.

Can I integrate custom models into the CVAT AMI?

Yes. The AMI's open-source CVAT base supports custom model integration for AI-assisted annotation. Teams can add custom serverless functions (via Nuclio) to expose their own models as annotation assistants within the CVAT interface, useful for domain-specific tasks where generic models like SAM do not perform optimally.