System Architecture
Understanding the architecture of MLPractitioners Computer Vision: how the three platforms work together to create a complete YOLO training ecosystem.
Architecture Overview
MLPractitioners Computer Vision is built on three core components that communicate through a unified API contract:
Client Apps
Android (Mobile) app for data collection and labeling. Exports to standardized JSON format.
API Contract
Unified data format and REST API specification that all platforms follow, ensuring complete interoperability.
Training Backend
Serverless Modal.com backend that receives labeled data and returns trained YOLO11 models.
Data Flow
- Image Capture: User captures images via device camera (Android)
- Labeling: User draws bounding boxes using center-to-corner interaction
- Local Storage: Labels saved to local JSON files using YOLO-normalized coordinates
- Export: App generates standardized JSON with base64-encoded images
- API Submit: JSON sent to training backend via REST API
- Training: Backend converts JSON to YOLO dataset and trains model
- Download: User downloads trained model in PyTorch, ONNX, or TFLite format
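The sketch below walks the export, submit, and download steps above from the client's point of view. It is a minimal illustration only: the payload fields, endpoint paths, and backend URL are assumptions, and the authoritative schema lives in the API Contract document.

```python
# Sketch of the export/submit flow above; field names and endpoints are assumptions.
import base64
import requests

BACKEND_URL = "https://example--yolo-trainer.modal.run"  # hypothetical Modal web endpoint

def encode_image(path: str) -> str:
    """Base64-encode an image with the data URI prefix the contract expects."""
    with open(path, "rb") as f:
        return "data:image/jpeg;base64," + base64.b64encode(f.read()).decode()

payload = {
    "project_name": "shelf-detector",
    "classes": ["bottle", "can"],
    "images": [
        {
            "image": encode_image("IMG_0001.jpg"),
            # YOLO format: class index plus normalized center x/y, width, height in [0, 1]
            "labels": [{"class_id": 0, "x": 0.52, "y": 0.41, "w": 0.18, "h": 0.33}],
        }
    ],
}

# Training runs asynchronously: submit the job, poll its status, then download the model.
job = requests.post(f"{BACKEND_URL}/train", json=payload, timeout=60).json()
status = requests.get(f"{BACKEND_URL}/jobs/{job['job_id']}", timeout=30).json()
if status.get("state") == "completed":
    model = requests.get(f"{BACKEND_URL}/jobs/{job['job_id']}/model?format=onnx").content
```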
Android App Architecture
Core Components
- MainActivity.kt: Main activity and coordinator
- CameraManager.kt: CameraX pipeline wrapper
- LabelingCanvasView.kt: Custom view for drawing
- ProjectStorage.kt: JSON persistence with Gson
- ExportManager.kt: API-compliant export generation
Camera Pipeline
The Android app builds its camera pipeline on CameraX:
- Direct RGBA format access (no YUV conversion)
- Zero-copy bitmap creation
- Unified preview and capture pipeline
- Automatic rotation handling
Backend Architecture
Modal.com Infrastructure
- ASGI App: FastAPI application served via Modal
- GPU Containers: Auto-scaling compute for training
- Volume Storage: Persistent storage for trained models
- Job Queue: In-memory job tracking (upgradeable to Redis)
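A minimal Modal scaffold matching this layout might look like the sketch below. The app name, volume name, GPU type, and function bodies are placeholders, not the actual deployment.

```python
# Sketch of the Modal layout described above; names, GPU type, and bodies are placeholders.
import modal
from fastapi import FastAPI

image = modal.Image.debian_slim().pip_install("fastapi[standard]", "ultralytics")
app = modal.App("yolo-training-backend")
models_volume = modal.Volume.from_name("trained-models", create_if_missing=True)

web_app = FastAPI()

@app.function(image=image)
@modal.asgi_app()
def api() -> FastAPI:
    # The FastAPI application served by Modal as an ASGI app.
    return web_app

@app.function(image=image, gpu="A10G", volumes={"/models": models_volume}, timeout=3600)
def train_job(dataset_json: dict) -> str:
    # Runs in an on-demand GPU container; writes the finished model to the Volume
    # and returns a path or identifier the API can hand back to the client.
    ...
```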
Training Pipeline
- Validation: Verify JSON format and data integrity
- Dataset Creation: Convert to YOLO text format
- Train/Val Split: 80/20 split by default
- Model Training: Ultralytics YOLO11 training
- Export: Convert to requested format (pt/onnx/tflite)
- Storage: Save to Modal Volume
- Cleanup: Remove temporary files
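The sketch below illustrates the dataset-creation, split, training, and export steps above using the Ultralytics API. The sample dictionary shape, directory layout, and dataset.yaml are assumptions about the implementation rather than the backend's actual code.

```python
# Sketch of dataset creation, the 80/20 split, training, and export.
# The sample dict shape, directory layout, and dataset.yaml are assumptions.
import random
from pathlib import Path

from ultralytics import YOLO

def write_yolo_dataset(samples: list[dict], root: Path, val_ratio: float = 0.2) -> None:
    """Write images and YOLO-format label files with a train/val split."""
    random.shuffle(samples)
    n_val = max(1, int(len(samples) * val_ratio))
    for i, sample in enumerate(samples):
        split = "val" if i < n_val else "train"
        (root / "images" / split).mkdir(parents=True, exist_ok=True)
        (root / "labels" / split).mkdir(parents=True, exist_ok=True)
        stem = f"img_{i:05d}"
        (root / "images" / split / f"{stem}.jpg").write_bytes(sample["image_bytes"])
        lines = [
            f"{box['class_id']} {box['x']} {box['y']} {box['w']} {box['h']}"
            for box in sample["labels"]
        ]
        (root / "labels" / split / f"{stem}.txt").write_text("\n".join(lines))

# Train YOLO11 via Ultralytics, then export to the requested format.
model = YOLO("yolo11n.pt")
model.train(data="dataset.yaml", epochs=50, imgsz=640)  # dataset.yaml points at the folders above
model.export(format="onnx")  # or "tflite"; the .pt weights come straight from training
```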
API Contract
All platforms adhere to a unified API specification. See the API Overview for complete details.
Key Principles
- YOLO Format: All coordinates in normalized [0, 1] range
- Base64 Images: Images encoded with data URI prefix
- Async Processing: Training jobs run asynchronously
- RESTful Design: Standard HTTP methods and status codes
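As an illustration of the coordinate convention, the helper below converts a pixel-space corner box to normalized YOLO values. The function name is illustrative and not part of the contract.

```python
# Illustration of the YOLO coordinate convention; the helper name is not part of the contract.
def to_yolo_box(x1: float, y1: float, x2: float, y2: float,
                img_w: int, img_h: int) -> tuple[float, float, float, float]:
    """Convert a pixel-space corner box to normalized YOLO (cx, cy, w, h), each in [0, 1]."""
    cx = (x1 + x2) / 2 / img_w
    cy = (y1 + y2) / 2 / img_h
    w = (x2 - x1) / img_w
    h = (y2 - y1) / img_h
    return cx, cy, w, h

# A 200x300 px box with its top-left corner at (100, 50) in a 640x480 image:
print(to_yolo_box(100, 50, 300, 350, 640, 480))  # ≈ (0.3125, 0.4167, 0.3125, 0.625)
```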
Further Reading: For implementation details, see the complete API Contract document and Backend Architecture guide.
Scalability Considerations
Client Apps
- Local-first design: works offline
- No backend required for labeling
- Export only when ready to train
Backend
- Serverless architecture scales automatically
- Handles concurrent training jobs
- GPU resources allocated on-demand
- No infrastructure management needed
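One way to get this behavior on Modal is to spawn the GPU training function per request and poll the resulting call ID, as sketched below. `web_app` and `train_job` refer to the earlier Modal sketch, and the response shapes are assumptions.

```python
# Sketch of on-demand, concurrent training jobs using Modal's spawn/poll pattern.
# `web_app` and `train_job` come from the Modal sketch above; response shapes are assumptions.
import modal

@web_app.post("/train")
async def submit_training(payload: dict) -> dict:
    # Each request spawns its own GPU container, so jobs run concurrently.
    call = train_job.spawn(payload)
    return {"job_id": call.object_id, "state": "queued"}

@web_app.get("/jobs/{job_id}")
async def job_status(job_id: str) -> dict:
    call = modal.FunctionCall.from_id(job_id)
    try:
        result = call.get(timeout=0)  # non-blocking; raises TimeoutError while still running
        return {"state": "completed", "result": result}
    except TimeoutError:
        return {"state": "running"}
```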
Security
Data Privacy
- Images stored locally until export
- No data sent to cloud until user initiates training
- Trained models stored in user's Modal account
API Security
- Input validation on all endpoints
- File size limits enforced
- Rate limiting available (optional)
- Authentication via API keys (optional)
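A minimal FastAPI sketch of these measures is shown below. The header name, size limit, and request model fields are assumptions rather than the project's actual configuration.

```python
# Sketch of optional API-key auth, a request size cap, and schema validation in FastAPI.
# The header name, size limit, and model fields are assumptions.
import os
from fastapi import FastAPI, Header, HTTPException, Request
from fastapi.responses import JSONResponse
from pydantic import BaseModel

app = FastAPI()
MAX_BODY_BYTES = 50 * 1024 * 1024            # hypothetical 50 MB request cap
API_KEY = os.environ.get("API_KEY")          # unset -> authentication stays optional

class TrainRequest(BaseModel):
    # Pydantic validates the payload on every request (input validation principle).
    project_name: str
    classes: list[str]
    images: list[dict]

@app.middleware("http")
async def limit_body_size(request: Request, call_next):
    # Enforce the file size limit before the body reaches the training endpoint.
    length = request.headers.get("content-length")
    if length and int(length) > MAX_BODY_BYTES:
        return JSONResponse(status_code=413, content={"detail": "Payload too large"})
    return await call_next(request)

@app.post("/train")
async def train(payload: TrainRequest, x_api_key: str | None = Header(default=None)) -> dict:
    # Optional API-key gate; deployments without a configured key skip the check.
    if API_KEY and x_api_key != API_KEY:
        raise HTTPException(status_code=401, detail="Invalid API key")
    return {"status": "accepted"}  # job submission logic elided
```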