At Haystack I spoke about autoresearch: code generation to optimize search rankers. Can we use it to improve on BM25?

This article represents my lab notes. My agent starts with a BM25 implementation, proposes changes, and accepts those that improve NDCG. We’ll zero in on the MSMarco passage retrieval dataset.

I won’t claim I’ve found a “better BM25”, but I’ve iterated towards a decent tuning regime, all while learning valuable lessons about how validation data can leak. Let’s walk through what happened.

More in my blog article: https://softwaredoug.com/blog/2026/05/17/autoresearching-a-better-msmarco-bm25

-Doug
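To make the propose-and-accept loop concrete, here is a minimal sketch in Python. It is not the agent from the talk: the "proposals" are just a hypothetical grid of BM25 `k1` and `b` values, the four-document corpus and its judgments are made up, and all function names are my own. It also bakes in one assumption about avoiding leakage: accept/reject decisions only ever see the training queries, and validation NDCG is computed once at the end.

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Classic BM25 score of each tokenized doc against the query tokens."""
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n
    df = Counter(t for d in docs for t in set(d))  # document frequency per term
    scores = []
    for d in docs:
        tf = Counter(d)
        norm = k1 * (1 - b + b * len(d) / avgdl)  # length normalization
        s = 0.0
        for t in query:
            if tf[t]:
                idf = math.log(1 + (n - df[t] + 0.5) / (df[t] + 0.5))
                s += idf * tf[t] * (k1 + 1) / (tf[t] + norm)
        scores.append(s)
    return scores

def ndcg_at_k(ranked_rels, k=10):
    """NDCG@k with linear gain; ranked_rels are labels in ranked order."""
    def dcg(rels):
        return sum(r / math.log2(i + 2) for i, r in enumerate(rels[:k]))
    ideal = dcg(sorted(ranked_rels, reverse=True))
    return dcg(ranked_rels) / ideal if ideal > 0 else 0.0

def mean_ndcg(params, judged_queries, docs, k=10):
    """Average NDCG@k over (query_tokens, {doc_index: label}) pairs."""
    total = 0.0
    for query, labels in judged_queries:
        scores = bm25_scores(query, docs, **params)
        order = sorted(range(len(docs)), key=lambda i: -scores[i])
        total += ndcg_at_k([labels.get(i, 0) for i in order], k)
    return total / len(judged_queries)

def accept_improvements(start, candidates, train, valid, docs):
    """Keep a candidate only if it beats the incumbent on the training split."""
    best, best_train = start, mean_ndcg(start, train, docs)
    for cand in candidates:
        score = mean_ndcg(cand, train, docs)
        if score > best_train:
            best, best_train = cand, score
    # Validation is scored once, after all accept/reject decisions are made.
    return best, best_train, mean_ndcg(best, valid, docs)

# Made-up toy corpus and judgments, purely for illustration.
docs = [s.split() for s in [
    "red shoes red high heels",
    "blue running shoes for trail running and road running",
    "president abe lincoln biography",
    "abe is an a/b testing tool",
]]
train = [(["red", "shoes"], {0: 1}), (["abe", "testing"], {3: 1})]
valid = [(["running", "shoes"], {1: 1})]
start = {"k1": 1.2, "b": 0.75}
candidates = [{"k1": k1, "b": b} for k1 in (0.6, 1.2, 2.0) for b in (0.3, 0.75)]

best, train_ndcg, valid_ndcg = accept_improvements(start, candidates, train, valid, docs)
print(best, round(train_ndcg, 3), round(valid_ndcg, 3))
```

If you rerun the loop and start picking winners by peeking at the validation number, validation quietly becomes a second training set. That is one way the leak can happen.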
I share search tips, blog articles, and free events I'm hosting about the search and retrieval industry, vector databases, information retrieval and more.
Agentic search gets interesting when agents do not know how to find the right answer. Oh, the agent might think it knows. It might confidently BS us. But the agent’s poor domain intuition steers it astray. Agents make false assumptions about what our users think is relevant. Our fashionista users think “red shoes” should return high heels. At one company where I worked, ABE wasn’t a president; it was an A/B testing tool. Agents need context to know these things - and context engineering...
Upcoming events in the next week or so: Show us your skills w/ Hugo Bowne-Anderson, Thursday May 28th - https://luma.com/ltpzpqgw Pray to the demo gods! I'll be joining Hugo Bowne-Anderson's "Show us your skills" event on Luma, where I'll highlight using a coding agent to optimize search rankers. Come hang out if you want to see how others in the industry leverage agentic AI to build in their domain. User search trends in 2026 Monday June 1 -...
Search events this week! Bag of Documents Model w/ Daniel Tunkelang Tuesday, 1PM ET - https://maven.com/p/7270ba/a-bag-of-documents-model-for-query-understanding-retrieval Tomorrow, Trey Grainger and I will host Daniel Tunkelang as he introduces his "Bag of Documents" technique for vector search. Bag of Documents is a form of pseudo-relevance feedback using document embeddings. Search week in Berlin June 7-12 2026 - https://berlinsearchweek.com/index.html A reminder in June: Berlin Buzzwords...