Skeleton Keypoints — 17-Point COCO Human Pose Estimation

Human body pose annotation using the COCO 17-keypoint standard — nose, eyes, ears, shoulders, elbows, wrists, hips, knees, and ankles with visibility flags across diverse scenes.


Ongoing — Active Internal Pilot & Training Project

Project Overview

The Challenge

Human pose estimation powers everything from fitness apps and sports analytics to autonomous vehicle pedestrian prediction and robotics. The COCO 17-keypoint format is a widely used industry standard: 17 anatomical landmarks per person, each with x/y coordinates and a visibility flag (visible, occluded, or not labeled, for example because the joint falls outside the frame). Crowded scenes with overlapping people, partial occlusion, and unusual poses are where automated labeling struggles most and human annotators remain essential.

This ongoing internal pilot trains our annotation team on skeleton keypoint placement across diverse scenes — single subjects, multi-person crowds, varying poses, and occlusion scenarios. Every new annotator cohort works through this project to build the spatial reasoning needed for production-grade pose annotation.

Project Management: Keylian Namisi

QA Lead: Ibrahim Ouma

Status: Ongoing

Platform: CVAT v2.58.0 (Self-Hosted)

Technical Specifications

The COCO 17-Keypoint Standard

Keypoint Map

#	Keypoint	Region
1	Nose	Head
2–3	Left/Right Eye	Head
4–5	Left/Right Ear	Head
6–7	Left/Right Shoulder	Upper Body
8–9	Left/Right Elbow	Arms
10–11	Left/Right Wrist	Arms
12–13	Left/Right Hip	Lower Body
14–15	Left/Right Knee	Legs
16–17	Left/Right Ankle	Legs
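As an illustration of the map above, the keypoint order can be written as a plain list whose position (plus one) matches the "#" column. This is a minimal sketch: the snake_case names follow the convention used in COCO annotation files, while `keypoint_number` and `mirror_name` are hypothetical helpers introduced here, not part of CVAT or any COCO tooling.

```python
# Keypoint names in COCO order; list position + 1 matches the "#" column.
COCO_KEYPOINTS = [
    "nose",
    "left_eye", "right_eye",
    "left_ear", "right_ear",
    "left_shoulder", "right_shoulder",
    "left_elbow", "right_elbow",
    "left_wrist", "right_wrist",
    "left_hip", "right_hip",
    "left_knee", "right_knee",
    "left_ankle", "right_ankle",
]


def keypoint_number(name: str) -> int:
    """Return the 1-based keypoint number used in the table (hypothetical helper)."""
    return COCO_KEYPOINTS.index(name) + 1


def mirror_name(name: str) -> str:
    """Return the opposite-side keypoint name; unpaired points map to themselves."""
    if name.startswith("left_"):
        return "right_" + name[len("left_"):]
    if name.startswith("right_"):
        return "left_" + name[len("right_"):]
    return name


if __name__ == "__main__":
    for name in COCO_KEYPOINTS:
        print(keypoint_number(name), name, "<->", mirror_name(name))
```

A lookup like `mirror_name` can help a reviewer spot left/right swaps, which are easy to make when a subject faces away from the camera.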

Visibility Flags

Flag	Meaning
v=2	Visible: keypoint is labeled and clearly seen
v=1	Occluded: keypoint is labeled but hidden; position is inferred
v=0	Not labeled: keypoint is outside the image boundary or cannot be inferred; x and y are stored as 0
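In a COCO annotation file, each person's keypoints are stored as one flat list of 51 numbers (x, y, v repeated 17 times), alongside a `num_keypoints` count of points with v > 0. The sketch below shows one way to unpack and sanity-check that list; the function names are hypothetical and the checks are an assumption about what a QA pass might look for, not a description of our pipeline.

```python
from typing import List, Tuple

Keypoint = Tuple[float, float, int]  # (x, y, visibility flag)


def split_keypoints(flat: List[float]) -> List[Keypoint]:
    """Turn COCO's flat [x1, y1, v1, x2, y2, v2, ...] list into 17 triples."""
    if len(flat) != 17 * 3:
        raise ValueError(f"expected 51 values, got {len(flat)}")
    return [(flat[i], flat[i + 1], int(flat[i + 2])) for i in range(0, 51, 3)]


def count_labeled(keypoints: List[Keypoint]) -> int:
    """Number of labeled keypoints (v > 0), i.e. COCO's num_keypoints."""
    return sum(1 for _, _, v in keypoints if v > 0)


def find_flag_errors(keypoints: List[Keypoint]) -> List[str]:
    """Report flags outside {0, 1, 2} and v=0 points with non-zero coordinates."""
    errors = []
    for number, (x, y, v) in enumerate(keypoints, start=1):
        if v not in (0, 1, 2):
            errors.append(f"keypoint {number}: invalid flag {v}")
        elif v == 0 and (x != 0 or y != 0):
            errors.append(f"keypoint {number}: v=0 but coordinates are set")
    return errors


if __name__ == "__main__":
    flat = [0.0] * 51
    flat[0:3] = [320.0, 110.0, 2]    # nose, visible
    flat[15:18] = [290.0, 180.0, 1]  # left shoulder, occluded
    kps = split_keypoints(flat)
    print(count_labeled(kps), find_flag_errors(kps))
```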

Annotation Challenges

Challenge	Our Approach
Multi-person scenes	Separate skeleton per individual
Partial occlusion	Infer position, mark v=1
Unusual poses	Anatomical reasoning for joint placement
Scale variation	Consistent precision at all scales
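One way to make "consistent precision at all scales" measurable is COCO's Object Keypoint Similarity (OKS), which divides each keypoint's pixel error by the person's area, so the same pixel offset costs more on a small figure than on a large one. The sketch below assumes the per-keypoint constants published with the COCO evaluation code; the function name `oks` and its argument layout are illustrative, and nothing here is claimed to be the metric behind our own QA figures.

```python
import math
from typing import List, Tuple

# Per-keypoint falloff constants from the COCO keypoint evaluation code,
# in the same order as the keypoint map (nose first, right ankle last).
COCO_SIGMAS = [
    0.026, 0.025, 0.025, 0.035, 0.035, 0.079, 0.079, 0.072, 0.072,
    0.062, 0.062, 0.107, 0.107, 0.087, 0.087, 0.089, 0.089,
]


def oks(
    gt: List[Tuple[float, float, int]],
    pred: List[Tuple[float, float]],
    area: float,
) -> float:
    """Mean similarity over labeled ground-truth keypoints (v > 0).

    gt holds (x, y, v) triples, pred holds (x, y) pairs, and area is the
    person's segment or box area in square pixels.
    """
    total, labeled = 0.0, 0
    for (gx, gy, v), (px, py), sigma in zip(gt, pred, COCO_SIGMAS):
        if v == 0:
            continue  # unlabeled keypoints do not count toward the score
        d2 = (gx - px) ** 2 + (gy - py) ** 2
        k2 = (2.0 * sigma) ** 2
        total += math.exp(-d2 / (2.0 * area * k2))
        labeled += 1
    return total / labeled if labeled else 0.0


if __name__ == "__main__":
    gt = [(100.0, 100.0, 2)] * 17
    off_by_5px = [(105.0, 100.0)] * 17
    print("small person:", oks(gt, off_by_5px, area=5_000.0))
    print("large person:", oks(gt, off_by_5px, area=50_000.0))
```

Because the score is area-normalized, a 5-pixel miss on a distant pedestrian is penalized more heavily than the same miss on a close-up subject, which is why small figures typically need more zoom during annotation.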

Applications

Who This Serves

Autonomous vehicles: Pedestrian pose prediction, using body orientation to predict crossing intent
Sports analytics: Athlete tracking, form analysis, and performance metrics from video
Fitness & health: Exercise form detection, physical therapy tracking, fall detection for elderly care
Robotics: Human-robot interaction, using human pose for safe collaborative movement
Security: Behavior analysis, suspicious activity detection from pose patterns
Animation & VFX: Motion reference, pose-driven animation, and virtual character control

Production-ready on demand: Our annotators are trained on the COCO 17-keypoint standard with visibility flags, occlusion handling, and multi-person scene management. If your team needs pose estimation training data, we can start a pilot immediately on our self-hosted CVAT infrastructure.

Need Skeleton Keypoint Annotation?

COCO 17-point pose estimation across any scene type. Start with a free pilot — same CVAT infrastructure, same QA pipeline, same 98.5% accuracy guarantee.
