System Architecture
Understanding the architecture of MLPractitioners Computer Vision: how the three platforms work together to create a complete YOLO training ecosystem.
Architecture Overview
MLPractitioners Computer Vision is built on three core components that communicate through a unified API contract:
Client Apps
Android (Mobile) app for data collection and labeling. Exports to standardized JSON format.
API Contract
Unified data format and REST API specification that all platforms follow, ensuring complete interoperability.
Training Backend
Serverless Modal.com backend that receives labeled data and returns trained YOLO11 models.
Data Flow
- Image Capture: User captures images via device camera (Android)
- Labeling: User draws bounding boxes using center-to-corner interaction
- Local Storage: Labels saved to local JSON files using YOLO-normalized coordinates
- Export: App generates standardized JSON with base64-encoded images
- API Submit: JSON sent to training backend via REST API
- Training: Backend converts JSON to YOLO dataset and trains model
- Download: User downloads trained model in PyTorch, ONNX, or TFLite format
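The sketch below walks the export, submit, and download steps above from the client's point of view. It is a minimal illustration only: the payload fields, endpoint paths, and backend URL are assumptions, and the authoritative schema lives in the API Contract document.

```python
# Sketch of the export/submit flow above; field names and endpoints are assumptions.
import base64
import requests

BACKEND_URL = "https://example--yolo-trainer.modal.run"  # hypothetical Modal web endpoint

def encode_image(path: str) -> str:
    """Base64-encode an image with the data URI prefix the contract expects."""
    with open(path, "rb") as f:
        return "data:image/jpeg;base64," + base64.b64encode(f.read()).decode()

payload = {
    "project_name": "shelf-detector",
    "classes": ["bottle", "can"],
    "images": [
        {
            "image": encode_image("IMG_0001.jpg"),
            # YOLO format: class index plus normalized center x/y, width, height in [0, 1]
            "labels": [{"class_id": 0, "x": 0.52, "y": 0.41, "w": 0.18, "h": 0.33}],
        }
    ],
}

# Training runs asynchronously: submit the job, poll its status, then download the model.
job = requests.post(f"{BACKEND_URL}/train", json=payload, timeout=60).json()
status = requests.get(f"{BACKEND_URL}/jobs/{job['job_id']}", timeout=30).json()
if status.get("state") == "completed":
    model = requests.get(f"{BACKEND_URL}/jobs/{job['job_id']}/model?format=onnx").content
```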
Android App Architecture
Core Components
- MainActivity.kt: Main activity and coordinator
- CameraManager.kt: CameraX pipeline wrapper
- LabelingCanvasView.kt: Custom view for drawing
- ProjectStorage.kt: JSON persistence with Gson
- ExportManager.kt: API-compliant export generation
Camera Pipeline
The Android app builds its camera pipeline on CameraX:
- Direct RGBA format access (no YUV conversion)
- Zero-copy bitmap creation
- Unified preview and capture pipeline
- Automatic rotation handling
Backend Architecture
Modal.com Infrastructure
- ASGI App: FastAPI application served via Modal
- GPU Containers: Auto-scaling compute for training
- Volume Storage: Persistent storage for trained models
- Job Queue: In-memory job tracking (upgradeable to Redis)
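A minimal Modal scaffold matching this layout might look like the sketch below. The app name, volume name, GPU type, and function bodies are placeholders, not the actual deployment.

```python
# Sketch of the Modal layout described above; names, GPU type, and bodies are placeholders.
import modal
from fastapi import FastAPI

image = modal.Image.debian_slim().pip_install("fastapi[standard]", "ultralytics")
app = modal.App("yolo-training-backend")
models_volume = modal.Volume.from_name("trained-models", create_if_missing=True)

web_app = FastAPI()

@app.function(image=image)
@modal.asgi_app()
def api() -> FastAPI:
    # The FastAPI application served by Modal as an ASGI app.
    return web_app

@app.function(image=image, gpu="A10G", volumes={"/models": models_volume}, timeout=3600)
def train_job(dataset_json: dict) -> str:
    # Runs in an on-demand GPU container; writes the finished model to the Volume
    # and returns a path or identifier the API can hand back to the client.
    ...
```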
Training Pipeline
- Validation: Verify JSON format and data integrity
- Dataset Creation: Convert to YOLO text format
- Train/Val Split: 80/20 split by default
- Model Training: Ultralytics YOLO11 training
- Export: Convert to requested format (pt/onnx/tflite)
- Storage: Save to Modal Volume
- Cleanup: Remove temporary files
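The sketch below illustrates the dataset-creation, split, training, and export steps above using the Ultralytics API. The sample dictionary shape, directory layout, and dataset.yaml are assumptions about the implementation rather than the backend's actual code.

```python
# Sketch of dataset creation, the 80/20 split, training, and export.
# The sample dict shape, directory layout, and dataset.yaml are assumptions.
import random
from pathlib import Path

from ultralytics import YOLO

def write_yolo_dataset(samples: list[dict], root: Path, val_ratio: float = 0.2) -> None:
    """Write images and YOLO-format label files with a train/val split."""
    random.shuffle(samples)
    n_val = max(1, int(len(samples) * val_ratio))
    for i, sample in enumerate(samples):
        split = "val" if i < n_val else "train"
        (root / "images" / split).mkdir(parents=True, exist_ok=True)
        (root / "labels" / split).mkdir(parents=True, exist_ok=True)
        stem = f"img_{i:05d}"
        (root / "images" / split / f"{stem}.jpg").write_bytes(sample["image_bytes"])
        lines = [
            f"{box['class_id']} {box['x']} {box['y']} {box['w']} {box['h']}"
            for box in sample["labels"]
        ]
        (root / "labels" / split / f"{stem}.txt").write_text("\n".join(lines))

# Train YOLO11 via Ultralytics, then export to the requested format.
model = YOLO("yolo11n.pt")
model.train(data="dataset.yaml", epochs=50, imgsz=640)  # dataset.yaml points at the folders above
model.export(format="onnx")  # or "tflite"; the .pt weights come straight from training
```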
API Contract
All platforms adhere to a unified API specification. See the API Overview for complete details.
Key Principles
- YOLO Format: All coordinates in normalized [0, 1] range
- Base64 Images: Images encoded with data URI prefix
- Async Processing: Training jobs run asynchronously
- RESTful Design: Standard HTTP methods and status codes
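As an illustration of the coordinate convention, the helper below converts a pixel-space corner box to normalized YOLO values. The function name is illustrative and not part of the contract.

```python
# Illustration of the YOLO coordinate convention; the helper name is not part of the contract.
def to_yolo_box(x1: float, y1: float, x2: float, y2: float,
                img_w: int, img_h: int) -> tuple[float, float, float, float]:
    """Convert a pixel-space corner box to normalized YOLO (cx, cy, w, h), each in [0, 1]."""
    cx = (x1 + x2) / 2 / img_w
    cy = (y1 + y2) / 2 / img_h
    w = (x2 - x1) / img_w
    h = (y2 - y1) / img_h
    return cx, cy, w, h

# A 200x300 px box with its top-left corner at (100, 50) in a 640x480 image:
print(to_yolo_box(100, 50, 300, 350, 640, 480))  # ≈ (0.3125, 0.4167, 0.3125, 0.625)
```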
Further Reading: For implementation details, see the complete API Contract document and Backend Architecture guide.
Scalability Considerations
Client Apps
- Local-first design: works offline
- No backend required for labeling
- Export only when ready to train
Backend
- Serverless architecture scales automatically
- Handles concurrent training jobs
- GPU resources allocated on-demand
- No infrastructure management needed
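One way to get this behavior on Modal is to spawn the GPU training function per request and poll the resulting call ID, as sketched below. `web_app` and `train_job` refer to the earlier Modal sketch, and the response shapes are assumptions.

```python
# Sketch of on-demand, concurrent training jobs using Modal's spawn/poll pattern.
# `web_app` and `train_job` come from the Modal sketch above; response shapes are assumptions.
import modal

@web_app.post("/train")
async def submit_training(payload: dict) -> dict:
    # Each request spawns its own GPU container, so jobs run concurrently.
    call = train_job.spawn(payload)
    return {"job_id": call.object_id, "state": "queued"}

@web_app.get("/jobs/{job_id}")
async def job_status(job_id: str) -> dict:
    call = modal.FunctionCall.from_id(job_id)
    try:
        result = call.get(timeout=0)  # non-blocking; raises TimeoutError while still running
        return {"state": "completed", "result": result}
    except TimeoutError:
        return {"state": "running"}
```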
Security
Data Privacy
- Images stored locally until export
- No data sent to cloud until user initiates training
- Trained models stored in user's Modal account
API Security
- Input validation on all endpoints
- File size limits enforced
- Rate limiting available (optional)
- Authentication via API keys (optional)
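A minimal FastAPI sketch of these measures is shown below. The header name, size limit, and request model fields are assumptions rather than the project's actual configuration.

```python
# Sketch of optional API-key auth, a request size cap, and schema validation in FastAPI.
# The header name, size limit, and model fields are assumptions.
import os
from fastapi import FastAPI, Header, HTTPException, Request
from fastapi.responses import JSONResponse
from pydantic import BaseModel

app = FastAPI()
MAX_BODY_BYTES = 50 * 1024 * 1024            # hypothetical 50 MB request cap
API_KEY = os.environ.get("API_KEY")          # unset -> authentication stays optional

class TrainRequest(BaseModel):
    # Pydantic validates the payload on every request (input validation principle).
    project_name: str
    classes: list[str]
    images: list[dict]

@app.middleware("http")
async def limit_body_size(request: Request, call_next):
    # Enforce the file size limit before the body reaches the training endpoint.
    length = request.headers.get("content-length")
    if length and int(length) > MAX_BODY_BYTES:
        return JSONResponse(status_code=413, content={"detail": "Payload too large"})
    return await call_next(request)

@app.post("/train")
async def train(payload: TrainRequest, x_api_key: str | None = Header(default=None)) -> dict:
    # Optional API-key gate; deployments without a configured key skip the check.
    if API_KEY and x_api_key != API_KEY:
        raise HTTPException(status_code=401, detail="Invalid API key")
    return {"status": "accepted"}  # job submission logic elided
```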