Consider pairwise evals instead of pointwise (daily search tip)

Published 11 days ago • 1 min read

If a pointwise eval asks “How relevant is this, from 1-5?”, a pairwise search eval asks “Which of these two results is more relevant: X or Y?”

Comparing two items at a time has some advantages:

Less chance of per-decision error - it's harder to get "this one is better than that one" wrong than to pick the right grade
More precise results - captures fine-grained differences that can't be squeezed into a 1-5 scale
Faster decisions - a single comparison can often be made more quickly than a graded judgment

However, there are two major downsides:

Pairwise evals take more time - instead of rating 10 items 1-5, you need to compare each of the 10 items against the 9 others (45 unique pairs) to get a complete picture
Pairwise evals need to be transformed into pointwise scores - to use traditional search metrics or ranking data, we need a single score per document
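To put a number on the first downside: rating n items pointwise takes n judgments, while covering every unordered pair takes n(n-1)/2. A small Python sketch of that arithmetic; the helper names are hypothetical, chosen only for illustration.

```python
from itertools import combinations

def pointwise_judgments(n_items: int) -> int:
    # One 1-5 rating per item
    return n_items

def pairwise_judgments(n_items: int) -> int:
    # Every unordered pair compared once: n * (n - 1) / 2
    return n_items * (n_items - 1) // 2

# Cross-check the formula by enumerating the pairs for 10 results
results = [f"doc{i}" for i in range(10)]
assert len(list(combinations(results, 2))) == pairwise_judgments(10)
print(pointwise_judgments(10), pairwise_judgments(10))  # prints "10 45"
```

The gap grows quadratically: 100 results need 100 pointwise ratings but 4,950 pairwise comparisons.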

Luckily these factors can be mitigated.

LLMs can handle many of the simpler evals / comparisons - such as my approach to LLM as a judge
A rating system like Elo - used in competitions such as chess - can turn 1 vs 1 matchups (like pairwise comparisons) into a pointwise rating
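A minimal Python sketch of the Elo idea, assuming each pairwise judgment is logged as a (winner, loser) pair of document ids. The name elo_ratings, the K-factor of 32, the 1500 starting rating and the 400-point scale are conventional chess-style defaults picked for illustration, not anything prescribed in this tip.

```python
from collections import defaultdict

def elo_ratings(comparisons, k=32.0, initial=1500.0):
    """Turn (winner, loser) judgments into one Elo rating per document.

    Assumption: comparisons come from a judge (human or LLM) answering
    "which of these two results is more relevant?"
    """
    ratings = defaultdict(lambda: initial)
    for winner, loser in comparisons:
        r_w, r_l = ratings[winner], ratings[loser]
        # Probability the winner was expected to win, given current ratings
        expected_w = 1.0 / (1.0 + 10 ** ((r_l - r_w) / 400.0))
        # Winner scored 1, loser 0; shift both by K * (actual - expected)
        delta = k * (1.0 - expected_w)
        ratings[winner] = r_w + delta
        ratings[loser] = r_l - delta
    return dict(ratings)

judgments = [("doc_a", "doc_b"), ("doc_a", "doc_c"), ("doc_b", "doc_c")]
ratings = elo_ratings(judgments)
ranked = sorted(ratings, key=ratings.get, reverse=True)
print(ranked)  # ['doc_a', 'doc_b', 'doc_c']
```

Each match moves points from loser to winner, so the total rating is conserved. Elo also updates one match at a time, which makes the final scores depend on the order comparisons are processed in.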

-Doug

Events · Consulting · Training (use code search-tips)

You're subscribed to Doug Turnbull's daily search tips where I share tips, blog articles, events, and more. You can always manage your profile:

Blog post on Bayesian BM25

Just sharing my post on Bayesian BM25 and other ways of normalizing BM25 scores. Enjoy! https://softwaredoug.com/blog/2026/03/06/probabilistic-bm25-utopia Do you have any thoughts on normalizing BM25 scores? -Doug

about 12 hours ago • 1 min read

Ugly hack to force BM25 to 0-1 (daily search tip)

It's convenient to have a lexical score normalized from 0-1. Sadly BM25 scores tend to be all over the place (0.5? 5.1? 12.51?). Fine for ranking. Annoying for other goals. That's why I wrote a post about one way to compute probabilities from BM25. In that post, I allude to one hack that forces BM25 to 0-1. Let's walk through it. A query term’s BM25 score is IDF * TF. Lucene’s TF is already normalized: Lucene drops the (k1 + 1) in the numerator of BM25, giving you TF = freq / (freq + k1 * (1 - b + b * dl / avgdl)), where dl is the document length and avgdl the average document length. Now we’ve got a TF term...
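The preview stops before the hack itself, so this only sketches the part it states: with the (k1 + 1) factor dropped, TF = freq / (freq + k1 * (1 - b + b * dl / avgdl)) stays in [0, 1), which caps each term's score at its IDF. A minimal Python sketch assuming Lucene's default k1=1.2, b=0.75 and the IDF form log(1 + (N - n + 0.5) / (n + 0.5)) used by Lucene's BM25Similarity; the function names are hypothetical, and this is not the post's hack.

```python
import math

def lucene_tf(freq: float, doc_len: float, avg_doc_len: float,
              k1: float = 1.2, b: float = 0.75) -> float:
    """BM25 term-frequency part without the (k1 + 1) numerator factor."""
    # Length normalization: longer-than-average docs get a bigger denominator
    length_norm = k1 * (1 - b + b * doc_len / avg_doc_len)
    # freq / (freq + positive constant): 0 at freq=0, approaches but never hits 1
    return freq / (freq + length_norm)

def bm25_idf(num_docs: int, doc_freq: int) -> float:
    """IDF as log(1 + (N - n + 0.5) / (n + 0.5))."""
    return math.log(1 + (num_docs - doc_freq + 0.5) / (doc_freq + 0.5))

def term_score(freq, doc_len, avg_doc_len, num_docs, doc_freq):
    # Per-term score = IDF * TF; since TF < 1, the score stays below IDF
    return bm25_idf(num_docs, doc_freq) * lucene_tf(freq, doc_len, avg_doc_len)

# TF climbs toward 1 but never reaches it, however often the term repeats
for f in (1, 5, 50, 5000):
    print(f, round(lucene_tf(f, doc_len=100, avg_doc_len=100), 4))
```

So TF alone is already 0-1; what remains unbounded is IDF, which varies per term and per corpus.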

1 day ago • 1 min read

Blog post - can BM25 be a probability?

Reviewing Bayesian BM25 - a new approach to creating calibrated BM25 probabilities for hybrid search. I compare it with naive approaches I've used to do similar things. Enjoy! https://softwaredoug.com/blog/2026/03/06/probabilistic-bm25-utopia -Doug

4 days ago • 1 min read