Binary search is one of the fundamental algorithm in the computer science. Binary search is used to quickly find a value in a sorted sequence (consider the sorted sequence to be an ordinary array). We will call the value we want to find as target value. Binary sort maintains a contiguous sequence of values where the target value is surely located. We will call this contiguous sequence as the search space. Initially whole sequence is the part of search space.
In each step the median of the search space is compared with the target value. If the median matches with the target value, we return the position of the target value in the sequence. If the median does not matches with the target value, then based on the comparison since the sequence is sorted, we can eliminate half of the sequence. After each consecutive step we half the search space until we find the target value or the search space is empty.
Consider an example, we have a sorted sequence and we are looking for 55, which is our target value. We want to find the location of out target value.
0 5 13 19 22 41 55 68 72 81 98
Initially the whole sequence is included in the search space. The index of the sequence is from 1 to 11. We compare our target value to median at index 6. The median 41 is less than 55, so we can eliminate half of the search space indexed from 1 to 6. Since the sequence was sorted, all elements before 41 are smaller than 41. So no element from index 1 to 6 will be equal to 55. Hence we are left with half of the search space which is
55 68 72 81 98
Similarly as we did now, we can chop of second half of the search space and are left with
Depending upon how median is chosen from even number of elements, binary search will find 55 in the next step or cut 68 to get a search space of one element. From both the ways we can conclude that 55 is located at index 7.
If 55 was not present in the sequence binary search would empty the search space entirely. Here is some code to check the condition:
binary_search(A, target): lo = 1, hi = size(A) while lo <= hi: mid = lo + (hi-lo)/2 if A[mid] == target: return mid //target found at mid else if A[mid] < target: lo = mid+1 else: hi = mid-1 // target was not found
Complexity of Binary Search
Since after every comparison binary search halves the search space, we can easily say that to find any target value in a sequence binary search would never talk more than (in big-oh notation) \(O(\log n)\) steps, where \(n\) is the size of the sequence. Logarithm is very slowly growing function. Consider searching a name in a phone directory. Binary search can find any name in around \(21\) comparison. And if you manage to sort all names in the world, then binary search can find any name in around \(35\) comparisons.
We always assume that we have random access to any elements of the sequence. For data structures like linked list binary search may not work, rather linear search works better in this case.
Binary Search in Standard Libraries
C++ Standard Template Library implements binary search in algorithms lower_bound, upper_bound, binary_search and equal_range, depending upon what you need to do. Java has a build-in Array.binary_search method for arrays and the .NET Framework has Array.BinarySearch .
Beyond arrays: the Discrete Binary Search
The binary search are not only limited for any tangible sequence. Sequence is just function which associate integers (indices) with the corresponding values. In fact some algorithm can be used on functions whose domain is the set of integer. The only difference is that the array look up is replaced by function evolution. We now need to find a \(x\) such that \(f(x)\) is equal to the target value. The search space is the sub-interval of the domain of the function while the target value is an element of the co-domain. In this case we not only need to do \(O(\log n)\) comparisons to find the target value, but we also do need to evaluate the function more than that many times. And also we are not restricted by practical quantities such as available memory.
The Main Theorem
When we encounter some problem that we think can be solved using binary search, we also need a way to prove that it will work. There is a theorem, known as main theorem, which can be really helpful in proving that binary search would work on given problem. It will allow us to solve more problems and help implementing binary search.
Consider a predicate \(p\) over some ordered set \(S\) (the search space). The search space consists of may candidate solutions to the problem. Here predicate is a function that returns boolean value true or false. We use the predicate to verify whether the candidate solution violates any constraints or not according to the given problem. The main theorem states that
“binary search can be used if and only if for all \(x\) in \(S\), \(p(x)\) implies \(p(y)\) for all \(y\gt x\).”
This property is used when we discard the second half of the search space. It is equivalent to saying that \(\sim p(x)\) implies \(\sim p(y)\) for all \(y\le x\) (the \(\sim\) symbol denotes the logical not operator), which is what we used when we discarded the first half of the search space.
Now we can say that if had a yes or no question (a predicate), getting a yes answer for some candidate solution \(x\) means that we should get a yes answer for all elements after \(x\). Similarly if we get a no answer for any \(x\), we would get a no answer for all elements before \(x\). As a consequence if we ask the question for all elements in the search space in order, we would get a series of no answer followed by series of yes answer. If the condition in the main theorem satisfies, the binary search finds the smallest legal solution, i.e., the smallest \(x\) for which \(p(x)\) is true. The vice versa is also true according to the predicate and the problem.
We can devise the solution based on binary search in two parts.
- In the first part we design a predicate which can be evaluated and for which using binary search makes some sense. We need to choose what algorithm should find. We can have the predicate find the first \(x\) for which \(p(x)\) is true or the last \(x\) for which \(p(x)\) is false.
- In the second part we prove that binary search can be applied to the predicate. This is where we use main theorem, verifying that the conditions in the theorem are satisfied. The proof doesn’t need to be completely mathematical. This can be often done by applying common sense in a sentence or two. When the domain of the predicate are the integer, its sufficient to prove that \(p(x)\) implies \(p(x+1)\) or that \(\sim p(x)\) implies \(\sim p(x+1)\), the rest can be shown by induction.
These two parts are most often interleaved. When we think a problem can be solved by binary search, we aim to design a predicate so that it satisfies the condition in the main theorem.
Now you might think why we should use this abstraction rather than simple looking binary search which we discussed earlier. There are many problems upon which binary search cannot be applied directly but we may design a predicate that can be evaluated to get a solution. The problem are not always necessarily straight forward. Consider we need to find an assignment which will cost us less than or equal to x. Here the target value is not defined, but we can evaluate a predicate “Is there an assignment which cost \(x\) or less?” and then apply binary search to find the smallest \(x\) which satisfies the predicate. This is called reducing the original problem to a decision (yes/ no) problem.
Going back to the example we saw in the beginning. We can state its problem as “Given a sorted array and a target value, return the index of the first element which is greater or equal to the target value”.
All the indices in the search space are candidate solution. And we want to find the index of target value from all the candidate indices. Now we can design the predicate for this, consider predicate “Is \(A[x]\) greater than or equal to target value?” If we ask this question for every candidate element, we would get a series of no answer followed by a series of yes answer. If we find the first \(x\) for which the predicate says yes, we would get exactly what we are looking for.
The condition in the main theorem is satisfied as the array is sorted in ascending order. If \(A[x]\) is greater than or equal to the target value than all the elements after \(A[x]\) are surely greater than or equal to given target value.
0 5 13 19 22 41 55 68 72 81 98
The indices of the elements in search space
1 2 3 4 5 6 7 8 9 10 11
If the target value is 55 then predicate would return
no no no no no no yes yes yes yes yes
The index 7 is the first \(x\) (where the target value is located) for which the predicate yields yes, so this is what our binary search will find.
Implementing the Discrete Algorithm
Some things should be taken care of before beginning the code. The search space is bound by upper bound and lower bound. This bound is a closed interval which should surely contain the first \(x\) for which \(p(x)\) is true. Then we direct all our code in maintaining this invariant, it tells how to properly move the bounds.
Another thing to think about is how wide should we set the bound. Since the search space should surely contain the first \(x\) for which \(p(x)\) is true, the bound can be as long as it doesn’t break the evaluation of the predicate. The execution time of binary search increases logarithmically with the bounds, so you can always set them higher.
Now here goes some code for applying binary search.
binary_search(lo, hi, p): while lo < hi: mid = lo + (hi-lo)/2 if p(mid) == true: hi = mid else: lo = mid+1 if p(lo) == false: complain // p(x) is false for all x in S! return lo // lo is the least x for which p(x) is true
Here either p(mid) can be true or false. If p(mid) is true, then according to our predicate, it is either greater than or equal to the target value. We can discard the second half of the search space, since the predicate is true for all elements in it by the main theorem (the sequence is sorted in ascending order). But we cannot discard mid itself as it may be the first element for which \(p\) is true. So we bring the upper bound to the mid . In a similar manner, if p(mid) is false, we can discard the first half of the search space, but this time we eliminate mid also as p(mid) is false . So we do not need it in our search space. We pull the lower bound to mid+1 .
It is good to test your code on a two-element set where the predicate is false for the first element and true for the second. This helps in finding a bug in your code, if any. And you may also test your code against corner cases that may arise in a problem.
Now you should try to solve some problems that can be solved using binary search. Just try to keep a few things in mind.
- Design a predicate which can be effectively evaluated and so that binary search can be applied.
- Decide on what you are looking for and code it such that the search space always contains the target value (if it exist).
- If the search space contains only of integers (indices), test your algorithm on a two element set to sure it doesn’t goes into an infinite loop.
- Verify that the lower and upper bound are not overly constrained, it’s better to relax them as long as it doesn’t break the predicate.
This article is contributed by Jyoti. If you like eLgo Academy and would like to contribute, you can mail your article to firstname.lastname@example.org. See your article appearing on the eLgo Academy page and help other Geeks. Please write comments if you find anything incorrect, or you want to share more information about the topic discussed above.