Skeleton Keypoints — 17-Point COCO Human Pose Estimation
Human body pose annotation using the COCO 17-keypoint standard — nose, eyes, ears, shoulders, elbows, wrists, hips, knees, and ankles with visibility flags across diverse scenes.
The Challenge
Human pose estimation powers everything from fitness apps and sports analytics to autonomous vehicle pedestrian prediction and robotics. The COCO 17-keypoint format is the industry standard — 17 anatomical landmarks per person, each with x/y coordinates and a visibility flag (visible, occluded, or not in frame). Getting this right across crowded scenes with overlapping people, partial occlusion, and unusual poses is where human annotators remain essential.
This ongoing internal pilot trains our annotation team on skeleton keypoint placement across diverse scenes — single subjects, multi-person crowds, varying poses, and occlusion scenarios. Every new annotator cohort works through this project to build the spatial reasoning needed for production-grade pose annotation.
Technical Specifications
The COCO 17-Keypoint Standard
Keypoint Map
| # | Keypoint | Region |
|---|---|---|
| 1 | Nose | Head |
| 2–3 | Left/Right Eye | Head |
| 4–5 | Left/Right Ear | Head |
| 6–7 | Left/Right Shoulder | Upper Body |
| 8–9 | Left/Right Elbow | Arms |
| 10–11 | Left/Right Wrist | Arms |
| 12–13 | Left/Right Hip | Lower Body |
| 14–15 | Left/Right Knee | Legs |
| 16–17 | Left/Right Ankle | Legs |
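As a minimal sketch, the keypoint map above can be written as an ordered Python list. The names follow the usual COCO naming; the helper functions `keypoint_name` and `mirror_keypoint` are hypothetical, added here only for illustration, and use the 1-based numbering from the table.

```python
# COCO keypoint names in standard order.
# List index 0 corresponds to row 1 (Nose) in the table above.
COCO_KEYPOINTS = [
    "nose",
    "left_eye", "right_eye",
    "left_ear", "right_ear",
    "left_shoulder", "right_shoulder",
    "left_elbow", "right_elbow",
    "left_wrist", "right_wrist",
    "left_hip", "right_hip",
    "left_knee", "right_knee",
    "left_ankle", "right_ankle",
]


def keypoint_name(number: int) -> str:
    """Return the name for a 1-based keypoint number (hypothetical helper)."""
    if not 1 <= number <= len(COCO_KEYPOINTS):
        raise ValueError(f"keypoint number must be 1..17, got {number}")
    return COCO_KEYPOINTS[number - 1]


def mirror_keypoint(number: int) -> int:
    """Return the 1-based number of the left/right counterpart.

    The nose (1) has no counterpart and maps to itself. In the table,
    even numbers are left-side joints and odd numbers above 1 are right-side.
    """
    keypoint_name(number)  # reuse the range check
    if number == 1:
        return 1
    return number + 1 if number % 2 == 0 else number - 1
```

A left/right lookup like this is typically what a horizontal-flip augmentation or a left/right swap check would rely on.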
Visibility Flags
| Flag | Meaning |
|---|---|
| v=2 | Visible — keypoint clearly seen |
| v=1 | Occluded — keypoint hidden but position can be inferred |
| v=0 | Not labeled: keypoint outside the image boundary or otherwise not annotatable (COCO stores x = y = 0) |
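In COCO annotation files, each person's keypoints are stored as a flat list of 51 numbers (`x1, y1, v1, ..., x17, y17, v17`), and the `num_keypoints` field counts the entries with v > 0. The sketch below shows one way to unpack and sanity-check such a list; `Keypoint`, `parse_keypoints`, and `count_labeled` are illustrative names, not part of any library.

```python
from typing import List, NamedTuple, Sequence


class Keypoint(NamedTuple):
    x: float
    y: float
    v: int  # 0 = not labeled, 1 = labeled but occluded, 2 = labeled and visible


def parse_keypoints(flat: Sequence[float]) -> List[Keypoint]:
    """Unpack a flat COCO keypoint list into 17 (x, y, v) triplets."""
    if len(flat) != 17 * 3:
        raise ValueError(f"expected 51 values, got {len(flat)}")
    keypoints = [
        Keypoint(flat[i], flat[i + 1], int(flat[i + 2]))
        for i in range(0, len(flat), 3)
    ]
    for index, kp in enumerate(keypoints, start=1):
        if kp.v not in (0, 1, 2):
            raise ValueError(f"keypoint {index}: invalid visibility flag {kp.v}")
    return keypoints


def count_labeled(keypoints: Sequence[Keypoint]) -> int:
    """Count keypoints with v > 0, matching COCO's num_keypoints field."""
    return sum(1 for kp in keypoints if kp.v > 0)


# Example: visible nose, occluded right ankle, everything else unlabeled.
flat = [100, 50, 2] + [0, 0, 0] * 15 + [120, 300, 1]
person = parse_keypoints(flat)
labeled = count_labeled(person)
```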
Annotation Challenges
| Challenge | Our Approach |
|---|---|
| Multi-person scenes | Separate skeleton per individual |
| Partial occlusion | Infer position, mark v=1 |
| Unusual poses | Anatomical reasoning for joint placement |
| Scale variation | Consistent precision at all scales |
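One common way to make "precision at all scales" measurable is COCO's Object Keypoint Similarity (OKS), which divides each keypoint's pixel error by the person's area so that small and large figures are judged on the same footing. The sketch below is an illustration under assumptions, not a description of the QA pipeline on this page: the per-keypoint constants are the values published with the COCO evaluation code as best recalled here and should be checked against `pycocotools` before use, and the function name `oks` is hypothetical.

```python
import math
from typing import Sequence, Tuple

# Per-keypoint falloff constants (assumed from the COCO evaluation code;
# verify against pycocotools before relying on them).
COCO_SIGMAS = [
    0.026, 0.025, 0.025, 0.035, 0.035, 0.079, 0.079, 0.072, 0.072,
    0.062, 0.062, 0.107, 0.107, 0.087, 0.087, 0.089, 0.089,
]


def oks(
    reference: Sequence[Tuple[float, float, int]],
    candidate: Sequence[Tuple[float, float]],
    area: float,
) -> float:
    """Object Keypoint Similarity between a reference and a candidate skeleton.

    reference: 17 (x, y, v) triplets; keypoints with v == 0 are skipped.
    candidate: 17 (x, y) pairs to compare against the reference.
    area: object area in square pixels, used to normalize for scale.
    """
    total = 0.0
    labeled = 0
    for (rx, ry, v), (cx, cy), sigma in zip(reference, candidate, COCO_SIGMAS):
        if v == 0:
            continue
        squared_distance = (rx - cx) ** 2 + (ry - cy) ** 2
        k = 2 * sigma
        total += math.exp(-squared_distance / (2 * area * k * k))
        labeled += 1
    return total / labeled if labeled else 0.0
```

Because the error is divided by the area, doubling every coordinate and quadrupling the area leaves the score unchanged, which is the property that makes it usable across scales.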
Who This Serves
- Autonomous vehicles: Pedestrian pose prediction — understanding body orientation to predict crossing intent
- Sports analytics: Athlete tracking, form analysis, and performance metrics from video
- Fitness & health: Exercise form detection, physical therapy tracking, fall detection for elderly care
- Robotics: Human-robot interaction — understanding human pose for safe collaborative movement
- Security: Behavior analysis, suspicious activity detection from pose patterns
- Animation & VFX: Motion reference, pose-driven animation, and virtual character control
Production-ready on demand: Our annotators are trained on the COCO 17-keypoint standard with visibility flags, occlusion handling, and multi-person scene management. If your team needs pose estimation training data, we can start a pilot immediately on our self-hosted CVAT infrastructure.
Need Skeleton Keypoint Annotation?
COCO 17-point pose estimation across any scene type. Start with a free pilot — same CVAT infrastructure, same QA pipeline, same 98.5% accuracy guarantee.