
10 Red Flags That Tell You an Annotation Vendor Will Waste Your Budget

By Keylian Namisi • March 31, 2025 • 9 min read

Picking the wrong annotation vendor is one of the most expensive mistakes an AI team can make — and it’s almost never obvious until you’re already committed. By then, you’ve signed a contract, loaded data, and burned three weeks. This checklist exists so you can catch the warning signs before that happens. Every red flag here comes from a pattern we’ve seen repeated across the industry. Take them seriously.

Why Vendor Selection Is Harder Than It Looks

Every annotation company has a professional website, impressive-sounding capability claims, and a sales rep who will tell you exactly what you want to hear. “98% accuracy.” “Scalable to any volume.” “Domain experts on staff.” These phrases are cheap to write and impossible to verify without doing the work yourself.

The problem is that most teams don’t do the work. They compare prices, look at a portfolio, get on one call, and sign. Then the first delivery comes in and the errors start surfacing — inconsistent labels, missed edge cases, wrong classifications that the model quietly learns from. By the time the damage shows up in model performance, weeks of engineering time have already been lost.

The red flags below are observable before you commit. Use this list during vendor evaluation, not after.

The 10 Red Flags

1. They offer a free pilot. This sounds backwards, but it matters. A vendor who gives away pilot work is signaling that pilots are a sales tactic, not a quality evaluation. Your pilot gets deprioritized, handed to junior annotators trying to win business, and doesn’t reflect the quality of a real paid project. Quality vendors charge for pilots (modestly, but they charge) because they’re committing real annotator time and QA resources to it. If a vendor insists the pilot is free, ask yourself who’s actually doing that work.
2. They can’t explain their QA process in specific terms. “We have rigorous quality control” means nothing. What does rigorous mean? How many review stages? What’s the reviewer-to-annotator ratio? How do they handle inter-annotator disagreement? How are errors fed back to annotators? If a vendor can’t answer these questions with specifics (named processes, measurable thresholds, concrete steps), they don’t have a real QA system. They have a talking point.
3. Their per-unit pricing is at or below the market floor. If a vendor is quoting bounding box annotation at $0.01–0.02 per object, you should immediately ask how. At those rates, annotators are almost certainly being paid well below a livable wage and pressured to rush. The math doesn’t work for quality work. Understand what quality annotation actually costs, including training time, QC overhead, and reasonable annotator pay, and treat pricing significantly below that as a warning sign, not a bargain.
4. They have no answer when you ask about edge case handling. Every real-world dataset has ambiguous cases: partial occlusions, unusual angles, objects that straddle category boundaries. Ask the vendor directly: when an annotator encounters a case the guidelines don’t cover, what happens? A quality vendor has an escalation process: annotators flag the case, a senior reviewer or QA lead makes a judgment call, the decision gets documented, and the guidelines get updated. A bad vendor shrugs and says annotators use their best judgment. That’s how you get 15% of your dataset inconsistently labeled.
5. They can’t tell you who is actually doing the annotation. Some vendors are marketplaces or brokers, not annotation teams. They take your project, post it to a crowdsourcing platform, and collect a margin on whatever comes back. There’s no consistent team, no training continuity, and no accountability. Ask directly: are your annotators employees or contractors? Are they on a dedicated team for my project or pulled from a general pool? A vendor who deflects or gives vague answers about workforce structure is almost certainly outsourcing further than you realize.
6. They don’t ask enough questions about your task before quoting. A vendor who sends you a price within hours of your first inquiry hasn’t thought seriously about your project. Real annotation projects have specific complexity: what are the label classes, how complex are the scenes, what export format do you need, what are your edge case handling rules, what model will this data train? A vendor who skips these questions either isn’t experienced enough to know they matter, or is quoting a generic rate and planning to cut corners when reality doesn’t match their assumptions.
7. Their accuracy claims aren’t tied to a methodology. “98.5% accuracy” is meaningless without context. Accuracy against what benchmark? Measured how: by inter-annotator agreement, client review, or automated validation? On what task type? Across which projects? Vendors who cite accuracy figures without explaining the methodology are using the number as marketing, not evidence. Ask them to walk you through how they measure accuracy on a project like yours. If they can’t, assume the number is made up.
8. They’ve never worked on tasks similar to yours. Annotation skill doesn’t transfer automatically between domains. A team experienced in 2D image classification for e-commerce is not automatically equipped for 3D LiDAR point cloud segmentation, edge-case video annotation for autonomous vehicles, or building polygon annotation for geospatial imagery. Domain-specific work requires domain-specific training. A vendor who claims they can do everything equally well is either lying or about to learn on your data.
9. They’re already slow to respond during the sales process. Pay attention to response times before you sign. Vendors are on their best behavior when they’re trying to win your business. If it already takes 24–48 hours to get an answer to a simple pre-sales question, expect that to get worse once you’re locked in. The first sign of trouble on a live project (data issues, timeline slippage, quality problems) will surface at the worst possible time. You need a vendor who responds fast. Test it before you sign.
10. They can’t provide references from projects structurally similar to yours. Case studies on a website are curated. References are real. Ask for two or three contacts from clients who ran projects comparable to yours in task complexity and volume. Ask those references specifically: Were there data quality issues? How were they handled? Were timelines met? Would you use them again for the same type of work? A vendor who hesitates on references, offers only testimonials, or provides references who can’t speak to similar work is hiding something.

The Questions to Ask on Every Vendor Call

Beyond the red flags above, here are the questions that separate serious vendors from ones who will cost you later. Ask these on your first call and listen for how readily and specifically the vendor answers:

On Quality

Walk me through your QA process step by step — who reviews, at what stage, and what happens when an error is caught?
What’s your inter-annotator agreement methodology, and what threshold do you require before delivery?
How do you handle annotation guidelines that don’t cover an edge case encountered during the project?

On Workforce

Are the annotators who will work on my project employees or contractors?
Will I have a dedicated team, or will annotators rotate in and out?
What training do your annotators receive before starting a project like mine?

On Process

What’s your rework policy if delivered data doesn’t meet agreed quality standards?
How do you handle mid-project scope changes or guideline updates?
What’s your escalation path if I raise a quality concern?

“The vendor who hesitates on any of these questions, deflects to marketing language, or promises to ‘follow up in writing’ is telling you something. Quality vendors answer these questions fluently because they’ve thought about them — and because they actually have answers.”

What a Good Vendor Evaluation Actually Looks Like

Once you’ve shortlisted two or three vendors who passed the red flag check and answered the questions above without deflecting, here’s how to make the final call:

Run a Structured Pilot

Give every shortlisted vendor the same set of 200–300 samples drawn from your actual production data. Include at least 20% edge cases: the ambiguous, partially occluded, or boundary-case examples your model will actually encounter in the real world. Pay for the pilot. Review every annotation in detail, not just a sample. Grade each vendor against the same criteria.
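
One way to assemble that shared pilot set is sketched below. It is a minimal illustration, not a prescribed tool: the function name `build_pilot_set`, the ID lists, and the default of 250 items are all assumptions, and it presumes you have already tagged which production samples count as edge cases. A fixed seed means every vendor receives exactly the same items.

```python
import random


def build_pilot_set(regular_ids, edge_case_ids, total=250, edge_percent=20, seed=0):
    """Return a reproducible pilot set with at least `edge_percent`% edge cases.

    All names and defaults here are hypothetical; adapt them to your data.
    """
    # Integer ceiling division, so the edge-case share never rounds below the target.
    n_edge = -(-total * edge_percent // 100)
    n_regular = total - n_edge
    if n_edge > len(edge_case_ids) or n_regular > len(regular_ids):
        raise ValueError("not enough items to build the requested pilot set")

    rng = random.Random(seed)  # fixed seed: the same set for every vendor
    chosen = rng.sample(edge_case_ids, n_edge) + rng.sample(regular_ids, n_regular)
    rng.shuffle(chosen)  # mix edge cases in so they are not grouped at the front
    return chosen


# Hypothetical sample IDs standing in for real production data.
regular = [f"img_{i:04d}" for i in range(1000)]
edge = [f"edge_{i:03d}" for i in range(100)]
pilot = build_pilot_set(regular, edge, total=250, edge_percent=20, seed=7)
print(len(pilot), sum(item.startswith("edge_") for item in pilot))
```

Keeping the seed and the ID lists under version control makes it possible to rebuild the identical set later if a vendor disputes the results.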

Measure Consistency, Not Just Accuracy

Have each vendor label 50 samples twice: the same images or frames annotated independently by two different annotators. Calculate inter-annotator agreement on those pairs. This tells you more about process maturity than any accuracy number the vendor reports about itself. A team with high internal agreement will produce consistent data at scale. A team with low agreement will produce chaos.
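
For class labels, one common agreement measure is Cohen's kappa, which corrects raw agreement for the agreement two annotators would reach by chance. The sketch below is a plain-Python illustration under the assumption that each duplicate sample carries a single categorical label from each annotator; the label values are invented for the example.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labeling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)

    # Observed agreement: fraction of items where both annotators match.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Chance agreement: product of each annotator's label frequencies, summed over classes.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)

    if expected == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (observed - expected) / (1 - expected)


# Invented labels for six duplicate samples.
annotator_1 = ["car", "car", "truck", "car", "bus", "truck"]
annotator_2 = ["car", "truck", "truck", "car", "bus", "car"]
print(round(cohens_kappa(annotator_1, annotator_2), 3))
```

Kappa applies to categorical labels only. For bounding boxes or polygons, agreement is usually assessed with an overlap measure such as intersection over union between the two annotators' shapes, which this sketch does not cover.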

Stress-Test Communication

During the pilot, deliberately send one ambiguous question about the guidelines. Time the response. Evaluate the quality of the answer — do they engage with the ambiguity and reason through it, or do they give a generic reply and proceed? Communication quality during the pilot predicts communication quality during production.

The rule: Never commit volume before validating quality on your actual data. A vendor who performs well on a curated demo dataset and poorly on your real data has told you exactly how production will go. The pilot is not a formality — it’s the only real signal you have.

A Note on Price

Price is a signal, not a decision criterion. The cheapest option is almost never the right option, and the most expensive option isn’t automatically better. What you’re looking for is a vendor whose pricing is consistent with the quality they claim to deliver.

If a vendor quotes rates significantly below what the work actually costs to do well — accounting for annotator training, QA overhead, team stability, and reasonable pay — someone is absorbing that gap. It’s either quality, or annotator welfare, or both. Either way, you pay the real cost eventually. It just shows up in your model performance and engineering hours instead of your annotation invoice.
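
A quick back-of-the-envelope check makes the gap concrete. The sketch below is illustrative only: the throughput of 120 boxes per hour, the 50% share of revenue reaching the annotator, and the $15 hourly target are assumptions chosen for the example, not figures from any vendor. Substitute numbers you can defend for your own task.

```python
def implied_hourly_pay(price_per_object, objects_per_hour, annotator_share):
    """Hourly pay an annotator could receive at a given per-object price."""
    return price_per_object * objects_per_hour * annotator_share


def required_objects_per_hour(price_per_object, target_hourly_pay, annotator_share):
    """Throughput needed for an annotator to reach a target hourly pay."""
    return target_hourly_pay / (price_per_object * annotator_share)


# All three inputs below are hypothetical assumptions.
PRICE = 0.02   # dollars per bounding box, the top of the quoted range
PACE = 120     # boxes per hour at a careful pace
SHARE = 0.5    # fraction of revenue paid to the annotator

print(f"Implied pay: ${implied_hourly_pay(PRICE, PACE, SHARE):.2f}/hour")
print(f"Boxes/hour needed for $15/hour: {required_objects_per_hour(PRICE, 15.0, SHARE):.0f}")
```

If the throughput needed to reach a reasonable wage is far above what careful work allows, the quoted price is being subsidized by speed, pay, or skipped QA.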

The vendors who are worth working with know what their work costs and price accordingly. They’re also comfortable explaining why. If a vendor can walk you through their cost structure and justify their rate, that’s a vendor who understands their own operation. That’s who you want annotating your training data.

How We Handle Vendor Evaluation at TechAI Remote

When a prospective client comes to us, we actively encourage them to run this kind of evaluation — including against us. We offer structured paid pilots on real client data, provide direct references from comparable projects, and walk through our QA process in full detail on the first call, not after the contract is signed.

Our annotators are a dedicated team of 140+ specialists, not a rotating pool. They’re trained on domain-specific tasks before touching client data. Our QA lead reviews and calibrates every project. We document edge case decisions and update guidelines mid-project when needed. And we measure inter-annotator agreement as a standard deliverable, not an optional extra.

We also won’t promise to do work we’re not equipped for. If a task is outside our core specialization in computer vision, 3D LiDAR, and video annotation, we’ll tell you that before you commit — not after the first delivery.

Bottom line: The annotation vendor you choose is deciding what your model learns. That decision deserves more than a price comparison and a 30-minute demo call. Use this checklist. Run the pilot. Ask the hard questions. The vendors who hold up under scrutiny are the ones worth working with.