AlgoLab hates this one Weird Trick for Binary Search

Note: I thoroughly overhauled this article following some great suggestions by my friend Nicolas. Thank you Nicolas!

I’m taking the Algorithms Lab course at ETH this semester. Among its many algorithmic tricks, plenty of tasks require binary searching for a parameter - how hard could that be? Well, judging by quite some headaches: It can be trickier than you think! This post is about the binary-search implementation I use that just works^TM, but also why it works (and I think it’s the best), and discussing some potential footguns along the way.

General Form

Let’s formalise the problem a bit: We have some candidate range $S = [lo,\, hi]$ we are searching over. We also suppose we have some monotonic binary predicate $f:\; S \to \{true, false\}$. We further assume the following:

Non-decreasingness: $\forall x \in S. \forall x’ \geq x \in S. f(x) = true \implies f(x’) = true$. Note that if our predicate were non-increasing, we could easily substitute $\lnot f$ and search over that.
Feasibility: Some $x \in S$ satisfies the predicate and some other $x’ \in S$ does not. Note that this can simply be tested by evaluating $f(hi)$ and $f(lo)$. If the predicate holds for all (or none) of the elements in the range, the search is clearly trivial.

With these assumptions, if we write the values of the predicate out from $lo$ to $hi$ it will look something like so: $false, false, \dots, false, true, true, \dots, true$.

We will aim to find the two values that cover the change from false to true, i.e. the greatest value such that $f$ is false, and the least value such that $f$ is true. The neat thing about this is that this covers both possible use-cases you might have (greatest possible or least possible value) in just one implementation.

Implementation Derivation

The search range $S$ will be demarcated by the inclusive bounds $lo$ and $hi$. We therefore keep searching until $hi - lo = 1$, i.e. we have found our final two values.

while hi - lo > 1:

We find the centre of our current range as $\lfloor\frac{hi + lo}{2}\rfloor = \lfloor\frac{hi - lo}{2}\rfloor + lo$, where the latter is preferred to prevent overflow.

    mid = (hi - lo) // 2 + lo 

Now comes the interesting part: We evaluate $f$ at $mid$. What insights can we gain from monotonicity? Well, $f(mid) = true \implies f(x) = true$ for all $x \geq mid$. So if $f(mid)$ holds, it surely makes sense to shrink our search range to $[lo,\, mid]$. Similarly, $f(mid) = false \implies f(x) = false$ for all $x \leq mid$. Therefore, we can shrink our range to $[mid, \, hi]$. Thus:

    if f(mid):
        hi = mid
    else:
        lo = mid

Can we ever get stuck in an infinite loop? Let’s check. We get stuck if after an iteration $lo$ and $hi$ are unchanged. The loop conditions implies there is at least one element strictly between $lo$ and $hi$, so we always make progress!

You might think it’s a bit daft to spell something so simple as binary search out in such detail, but you would be surprised how easily you can complicate it. For example, if you search until the range only contains one element, you suddenly need to start thinking about rounding when computing $mid$ and so on, making it simple to accidentally end in an infinite loop or get an off-by-one error.

Full Implementation (C++)

TLDR: This is the binary search I use.

// Either check to ensure feasibility, or use sentinel values (like -1 and N)
long lo = ...; // Invariant: f(lo) == false
long hi = ...; // Invariant: f(hi) == true
while (hi - lo > 1) {
    long mid = (hi - lo) / 2 + lo;
    if (f(mid)) {
        hi = mid;
    }
    else {
        lo = mid;
    }
}
// lo = greatest value for which f is false
// hi = least value for which f is true