Bounding Box — Multi-Domain Object Detection

25,000+ bounding box annotations across 6 clients and multiple domains — pedestrians, vehicles, aerial views, and urban scenes. All annotated on self-hosted CVAT.

← Back to Case Studies Project Overview

The Challenge

Bounding box annotation is the foundation of object detection — every autonomous vehicle, security system, retail analytics platform, and drone perception pipeline starts here. But quality varies dramatically across providers. Tight bounding boxes with consistent class labels across thousands of images, multiple domains, and varying conditions is where most annotation teams fall apart.

TechAI Remote has delivered 25,000+ bounding box annotations across 6 different clients, in-house pilots, and self-training projects — spanning pedestrian detection in crowded street scenes, vehicle detection from aerial and ground-level views, and multi-class object detection in urban environments. Every annotation was produced on our self-hosted CVAT instance with full three-layer QA.

Project Management Keylian Namisi

QA Lead Ibrahim Ouma

Total Annotations 25,000+

Platform CVAT v2.58.0 (Self-Hosted)

Bounding box annotation metrics - 25000+ annotations across 6 clients

Project metrics: 25,000+ bounding box annotations, 6 clients, multiple domains, self-hosted CVAT, 98.5% QA accuracy

Technical Specifications

Domains & Object Classes

Annotation Domains

Bounding box work spanned multiple visual domains — each with its own challenges in scale, occlusion, density, and perspective. From street-level pedestrian crowds to overhead aerial vehicle detection, every domain requires different annotation strategies and edge case handling.

Object Classes

Class	Domain
Person	Street scenes, crosswalks, crowds
Vehicle	Aerial views, parking lots, roads
Pedestrian	Urban intersections, sidewalks
Cyclist	Road scenes, bike lanes
Traffic Sign	Street-level detection
Custom Classes	Per-client specifications

Infrastructure

Component	Detail
Platform	CVAT v2.58.0 (self-hosted)
Server	Hetzner dedicated server
Deployment	Docker with SSL
Data Security	Data never leaves our server
Access	NDA-protected, role-based

Export Formats

Format	Use Case
COCO JSON	Standard object detection training
YOLO	Real-time detection models
Pascal VOC	Legacy detection pipelines
CVAT JSON	Client CVAT import

Inside CVAT

Annotation in Production

Real screenshots from our self-hosted CVAT instance showing bounding box annotation across different domains. Each image shows the annotation canvas with labeled bounding boxes, the objects panel with class labels, and the color-coded overlay system for visual verification.

CVAT bounding box annotation - 52 person objects in crowded urban crosswalk scene

CVAT bounding box annotation - 16 person objects in street pedestrian scene

CVAT bounding box annotation - 22 vehicle objects in aerial parking lot view

Left: Dense urban scene — 52 person bounding boxes in a crowded crosswalk with heavy occlusion. Center: Street-level pedestrian detection — 16 persons with varying scale and partial visibility. Right: Aerial vehicle detection — 22 vehicles in a parking lot from overhead perspective.

Multi-Domain Coverage

Why Multi-Domain Matters

A single bounding box model trained on clean, well-lit images fails in the real world. Production object detection needs training data that spans perspectives, densities, lighting conditions, and object scales. Our 25,000+ annotations deliberately cover this range.

Street-Level Pedestrian Detection

Crowded urban crosswalks with 50+ persons per frame. Heavy occlusion where pedestrians overlap, varying scales from close-range to distant, and mixed lighting conditions. This is where most auto-labeling tools break down — distinguishing overlapping humans in dense crowds requires human spatial judgment.

Aerial & Overhead Views

Vehicle detection from drone and satellite perspectives. Objects appear at dramatically different scales and orientations compared to street-level views. Parking lots, intersections, and highway segments annotated from above — critical for surveillance, urban planning, and traffic monitoring applications.

Urban Mixed-Class Scenes

Street scenes containing multiple object classes simultaneously — persons, vehicles, cyclists, and infrastructure. Annotators must correctly classify each object while maintaining tight bounding box boundaries even when classes overlap in the frame.

Edge Cases Across Domains

Partial visibility at frame edges, extreme occlusion where less than 30% of the object is visible, scale variation within a single image (near objects 500+ pixels vs. distant objects under 20 pixels), and unusual poses or orientations. These are the annotations that separate production-grade datasets from hobbyist work.

Key insight: The hardest bounding boxes aren’t the obvious ones — they’re the partially occluded pedestrian behind a pole, the vehicle half-hidden by a tree in an aerial shot, and the cyclist at the edge of frame. Our annotators are trained to annotate what’s visible, classify correctly even with minimal visual evidence, and maintain consistency across thousands of images.

Our Process

How We Deliver Bounding Box Projects

Annotation Workflow

Client onboarding: Review annotation guidelines, class definitions, edge case rules, and export requirements
Task creation: Images uploaded to self-hosted CVAT, split into batches, assigned to trained annotators
Bounding box placement: Tight-fitting rectangles drawn around each target object with correct class label
Edge case handling: Occluded objects, truncated objects at frame edges, and ambiguous classes flagged and resolved per guidelines
Export & delivery: Annotations exported in client-specified format (COCO, YOLO, Pascal VOC, CVAT JSON)

Quality Assurance

Stage 1 — Self-check: Annotator reviews own work for missed objects, incorrect labels, and loose bounding boxes
Stage 2 — Peer review: Second annotator validates every box for tightness, class accuracy, and completeness
Stage 3 — QA Lead final: Ibrahim Ouma performs final validation with spot-checks on box tightness and class distribution
Consistency checks: Cross-image label consistency verified — same object type gets the same label every time
Completeness audit: Random sampling to ensure no objects are missed, especially in dense scenes
Overall QA target: 98.5% accuracy across all annotations

Target Industries

Who This Serves

Bounding box annotation is the entry point for most computer vision pipelines:

Autonomous vehicles: Pedestrian, vehicle, and cyclist detection for ADAS and self-driving perception
Security & surveillance: Person detection and tracking in CCTV and drone footage
Retail analytics: Customer counting, foot traffic analysis, and shelf monitoring
Aerial & satellite: Vehicle counting, infrastructure monitoring, and urban planning from overhead imagery
Robotics: Object detection for pick-and-place, navigation, and obstacle avoidance

Why choose managed annotation over auto-labeling? Auto-labeling tools achieve 85-92% accuracy on clean, well-lit images with standard objects. But real-world datasets include occlusion, unusual angles, rare objects, and edge cases where automated tools consistently fail. Our 25,000+ annotations across 6 clients prove we handle the full spectrum — from easy frames to the hardest edge cases that break production models.

Need Bounding Box Annotation?

Start with a free 500-image pilot. Multi-domain, multi-class, any export format. Same CVAT infrastructure, same QA pipeline, same 98.5% accuracy guarantee.

Request a Pilot → Book a Call →