Remove term frequency from title fields (daily search tip)

Published 6 months ago • 1 min read

Users want to know whether a document is about the terms they searched for. Search tech news articles for “iPhone”, and relevance statistics like BM25 infer that more occurrences in the article body mean the article is more about iPhone.

But human authors write more than paragraphs. They add titles and create section headings. Repeated occurrences of a term here rarely matter. For an “iPhone” search, there’s no meaningful difference between:

iPhone for the iPhone expert
iPhone: the ultimate guide

So today’s tip: remove the influence of term frequency when searching title fields. You can:

Lower k1 to 0 to create binary relevance
Use an alternative similarity
Disable term frequency entirely for this field
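
To see why the first option works, here is a minimal Python sketch of the classic BM25 term score applied to the two example titles. It is an illustration, not any particular engine's implementation: the tokenizer, the helper names (tokenize, bm25_term_score, score_titles), the fixed idf of 1.0, and the defaults k1=1.2 and b=0.75 are all assumptions made for the demo.

```python
import re

# Hypothetical two-document "index" built from the example titles above.
TITLES = [
    "iPhone for the iPhone expert",
    "iPhone: the ultimate guide",
]


def tokenize(text):
    """Lowercase and split on non-alphanumerics (a stand-in for a real analyzer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def bm25_term_score(tf, doc_len, avg_doc_len, idf, k1=1.2, b=0.75):
    """Classic BM25 score for one term in one field."""
    if tf == 0:
        return 0.0  # no match; also avoids 0/0 when k1 == 0
    length_norm = 1.0 - b + b * (doc_len / avg_doc_len)
    return idf * (tf * (k1 + 1.0)) / (tf + k1 * length_norm)


def score_titles(titles, term, k1, b=0.75, idf=1.0):
    """Score every title for a single term. idf is fixed at an assumed value."""
    docs = [tokenize(t) for t in titles]
    avg_len = sum(len(d) for d in docs) / len(docs)
    return [
        bm25_term_score(d.count(term), len(d), avg_len, idf, k1, b)
        for d in docs
    ]


default_scores = score_titles(TITLES, "iphone", k1=1.2)
binary_scores = score_titles(TITLES, "iphone", k1=0.0)

print("k1=1.2:", default_scores)  # the title repeating "iPhone" scores higher
print("k1=0.0:", binary_scores)   # both titles score exactly idf
```

With k1 at 0 the term-frequency fraction collapses to 1 for any matching document, so every match scores just the idf. Field length normalization drops out too, because b only acts through k1.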

-Doug

