How to actually choose a retrieval engine (daily search tip)

Published 3 months ago • 1 min read

How do teams choose vector databases / search engines?

People rack their brains choosing between Elasticsearch/OpenSearch/Solr/Vespa/Pinecone/Turbopuffer/Weaviate/…

First things first - DO NOT start with a feature matrix. Start with the simple question:

What is my team most comfortable with? That’s the default. If everyone can go deep in one system, don’t overcomplicate the decision. It might be good enough to stop here.

NEXT - consider the high-level characteristics of the project. Use these as veto points for the original choice.

Pace of development - do you see that the project is actively being maintained and improved? If not, consider something else.
Scale - what scale does the project target? Does it match what you need? At high scale, you want simple operations, executed predictably. At lower scale, you'll get richer features, but they won't perform predictably as you scale out. Choose the right fit.
Company capitalization - Who builds the technology? Will they still exist in a year? If not, who takes over the project?
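The "team default, then vetoes" order can be written down as a tiny procedure. This is a minimal sketch, not a scoring model: the `Candidate` fields, the 0-5 expertise scale, and the `engine_a`/`engine_b` names are hypothetical placeholders for judgments your team makes, not assessments of any real engine.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Candidate:
    # All fields are hypothetical stand-ins for judgments your team makes.
    name: str
    team_expertise: int        # how deep the team can go, e.g. 0-5
    actively_maintained: bool  # pace of development
    scale_fits: bool           # does the project's target scale match yours?
    vendor_outlook_ok: bool    # will the builder (or a successor) be around?


def vetoes(c: Candidate) -> List[str]:
    """Return the high-level characteristics that rule this candidate out."""
    reasons = []
    if not c.actively_maintained:
        reasons.append("pace of development")
    if not c.scale_fits:
        reasons.append("scale")
    if not c.vendor_outlook_ok:
        reasons.append("company capitalization")
    return reasons


def choose_engine(candidates: List[Candidate]) -> Optional[Candidate]:
    """Default to what the team knows best; move down the list only on a veto."""
    by_comfort = sorted(candidates, key=lambda c: c.team_expertise, reverse=True)
    for c in by_comfort:
        if not vetoes(c):
            return c
    return None  # every candidate was vetoed


# Usage with placeholder engines (not real assessments):
familiar = Candidate("engine_a", 5, actively_maintained=False, scale_fits=True, vendor_outlook_ok=True)
newer = Candidate("engine_b", 3, actively_maintained=True, scale_fits=True, vendor_outlook_ok=True)
print(choose_engine([familiar, newer]).name)  # engine_b
```

Note there is no feature scoring anywhere in it: familiarity sets the default, and the high-level characteristics only ever remove options.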

FINALLY - think about how you make it easy to migrate OFF the technology. Don't over-couple to one system / company. Avoid advanced features unless they're truly killer. Build code that isolates the dependency on the search backend behind a module boundary, so you can swap backends out as needed.
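One way to modularize the backend dependency is a thin interface that the rest of the application codes against. A minimal Python sketch, assuming a `typing.Protocol` interface; `SearchBackend`, `InMemoryBackend`, and `ProductSearch` are hypothetical names, and the toy backend's term-overlap scoring is only a stand-in for a real engine.

```python
from typing import Dict, List, Protocol, Set, Tuple


class SearchBackend(Protocol):
    """Hypothetical minimal contract the application depends on."""
    def index(self, doc_id: str, text: str) -> None: ...
    def search(self, query: str, k: int = 10) -> List[Tuple[str, float]]: ...


class InMemoryBackend:
    """Toy backend: scores a doc by how many distinct query terms it contains."""

    def __init__(self) -> None:
        self._docs: Dict[str, Set[str]] = {}

    def index(self, doc_id: str, text: str) -> None:
        self._docs[doc_id] = set(text.lower().split())

    def search(self, query: str, k: int = 10) -> List[Tuple[str, float]]:
        terms = set(query.lower().split())
        scored = [(doc_id, float(len(terms & toks))) for doc_id, toks in self._docs.items()]
        scored = [s for s in scored if s[1] > 0]   # drop non-matching docs
        scored.sort(key=lambda s: (-s[1], s[0]))   # best score first, ties by id
        return scored[:k]


class ProductSearch:
    """Application code: knows only the SearchBackend interface, not the engine."""

    def __init__(self, backend: SearchBackend) -> None:
        self.backend = backend

    def add(self, doc_id: str, text: str) -> None:
        self.backend.index(doc_id, text)

    def find(self, query: str, k: int = 10) -> List[str]:
        return [doc_id for doc_id, _ in self.backend.search(query, k)]


app = ProductSearch(InMemoryBackend())
app.add("1", "red high heels")
app.add("2", "blue running shoes")
app.add("3", "red running shoes")
print(app.find("red shoes"))  # ['3', '1', '2']
```

Swapping engines then means writing one new class with the same `index` and `search` methods (wrapping whichever client library that engine provides) rather than touching every caller. Keeping the interface small is the point: each engine-specific advanced feature pushed through it makes the next migration harder.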

-Doug

