Be wary of public benchmarks (daily search tip)


You may know ANN Benchmarks - it’s a leaderboard of vector search algorithms. It’s referenced a lot by companies when choosing a vector system.

But let’s look at ANN Benchmarks - it measures:

  • Recall
  • Latency

What does it NOT measure?

  • Impact of incremental updates on search latency
  • Sharding and replication
  • Reliability
  • Consistency / availability of updates
  • Filtering performance
  • Memory usage
  • Recall on YOUR embeddings

Depending on YOUR problem, you may choose an ANN Benchmarks loser if, say, you care about fast updates, performant filters, or a low memory footprint.
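To make "Recall on YOUR embeddings" concrete, here is a minimal sketch of how you might measure it yourself: compare an approximate search against exact brute-force results on a sample of your own queries. Everything here is illustrative. `toy_ann_search` is a hypothetical stand-in for whatever vector system you're evaluating, and the random vectors are placeholders for embeddings from your own model and corpus.

```python
import math
import random


def brute_force_top_k(query, vectors, k):
    """Exact top-k by cosine similarity: the ground truth to compare against."""
    def cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
        return dot / norm if norm else 0.0

    ranked = sorted(range(len(vectors)),
                    key=lambda i: cosine(query, vectors[i]),
                    reverse=True)
    return ranked[:k]


def recall_at_k(approx_ids, exact_ids):
    """Fraction of the exact top-k that the approximate search also returned."""
    if not exact_ids:
        return 0.0
    return len(set(approx_ids) & set(exact_ids)) / len(exact_ids)


def mean_recall(queries, vectors, ann_search, k=10):
    """Average recall@k of `ann_search` over a sample of queries."""
    recalls = [
        recall_at_k(ann_search(q, k), brute_force_top_k(q, vectors, k))
        for q in queries
    ]
    return sum(recalls) / len(recalls)


if __name__ == "__main__":
    rng = random.Random(0)
    # Placeholder data: swap in embeddings from YOUR model and YOUR corpus.
    vectors = [[rng.gauss(0, 1) for _ in range(16)] for _ in range(500)]
    queries = [[rng.gauss(0, 1) for _ in range(16)] for _ in range(20)]

    # Hypothetical "ANN" that only looks at the first half of the corpus,
    # so it misses some true neighbors. Replace with your real system's query call.
    def toy_ann_search(query, k):
        return brute_force_top_k(query, vectors[:250], k)

    print(f"recall@10: {mean_recall(queries, vectors, toy_ann_search):.2f}")
```

The same harness shape extends to the other items on the list: time `ann_search` while writes are in flight, or with a filter applied, and compare against the numbers you saw on the leaderboard.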

All to say, there’s nothing wrong with ANN Benchmarks, but YOUR problem is almost certainly far more multidimensional than just maximizing recall at a given latency.

When you choose a vector database - or really anything - don’t just look at the topline public benchmark. You need to discover your product requirements. Real production problems transcend a few easily benchmarkable metrics.

-Doug


Doug Turnbull

I share search tips, blog articles, and free events I'm hosting about the search and retrieval industry, vector databases, information retrieval, and more.
