Divide and Conquer
==================

The divide and conquer method lets us search quickly through huge amounts of sorted data.

Guessing a Secret
-----------------

Consider the following little game:

* The computer picks a random integer in [0, 1000].
* It is up to the user to guess the number.

Suppose making a guess costs \$1 and you get \$100 for the right guess.
Would you play this game?

Suppose now the computer tells you after each incorrect guess:
``too low`` or ``too high``.
With the same cost and prize, would you now play this game?

An example of a session of the game with the ``too low`` and ``too high``
feedback is shown below.

::

    $ python findsecret.py
    Guess number in [0, 1000] : 500
    Your guess is too low.
    Guess number in [0, 1000] : 750
    Your guess is too high.
    Guess number in [0, 1000] : 625
    Your guess is too high.
    Guess number in [0, 1000] : 562
    Your guess is too high.
    Guess number in [0, 1000] : 531
    Your guess is too high.
    Guess number in [0, 1000] : 516
    Your guess is too high.
    Guess number in [0, 1000] : 508
    Your guess is too high.
    Guess number in [0, 1000] : 504
    Your guess is too low.
    Guess number in [0, 1000] : 506
    found 506 after 9 guesses

At the start, the search space is [0, 1000],
as shown in :numref:`fighalvingsearch`.

.. _fighalvingsearch:

.. figure:: ./fighalvingsearch.png
   :align: center

   Halving the search space in each step.

As shown in :numref:`fighalvingsearch`, each step cuts the search space in half.
Because :math:`2^{10} = 1024` exceeds the 1001 possible values,
at most 10 steps leave a single candidate.
In every step we recover one bit of the secret.

Binary Search
-------------

Let us generate a list of 10 random two digit numbers.
Consider the session below.

::

    >>> from random import randint
    >>> L = [randint(10, 99) for _ in range(10)]
    >>> L
    [32, 61, 50, 81, 30, 14, 53, 92, 22, 23]
    >>> 10 in L
    False
    >>> 81 in L
    True
    >>> L.index(81)
    3
    >>> L[3]
    81

The instruction ``L.index(n)`` raises a ``ValueError`` if ``n not in L``.
The input/output formulation of the search problem is below:

.. math::

   \begin{array}{rcl}
     Input & : & \mbox{\tt L} \mbox{ a list of numbers and some number } \mbox{\tt x}. \\
     Output & : & \mbox{index } \mbox{\tt k == -1}, \mbox{if } \mbox{\tt not x in L}
                  \mbox{ otherwise } \mbox{\tt L[k] == x}.
   \end{array}

The algorithm for a linear search executes the following steps:

1. For each index ``k``, consider the element ``L[k]``.
2. If ``L[k] == x``, then ``return k``.
3. Return ``-1`` at the end of the loop.

For the cost analysis, we first consider the best case:
we find ``x`` immediately if ``x`` occurs at the start of ``L``.
In the worst case, we have to traverse the entire list,
if ``x`` is at the end of ``L`` or does not occur in ``L``.
On average, if ``x`` is equally likely to be at any position,
we execute :math:`c \times n` steps, where :math:`n` is ``len(L)``,
for some constant :math:`c \approx 0.5`.
We say its cost is :math:`O(n)`.

The code below searches linearly in a sorted list.
To sort a list ``L``, do ``L.sort()``.

::

    def linear_search(numbers, nbr):
        """
        Returns -1 if nbr does not belong to numbers,
        else returns k for which numbers[k] == nbr.
        Items in the list numbers must be sorted in increasing order.
        """
        for i in range(len(numbers)):
            if numbers[i] == nbr:
                return i
            elif numbers[i] > nbr:
                # all later items are larger too, so stop early
                return -1
        return -1

The builtin ``in`` and ``index`` for lists do not exploit the order.
The problem statement to search in a sorted list is below.

.. math::

   \begin{array}{rcl}
     Input & : & \mbox{\tt L} \mbox{ is a list of numbers, ordered increasingly;}
                 \mbox{\tt x} \mbox{ is some number.} \\
     Output & : & \mbox{\tt True} \mbox{ if } \mbox{\tt x in L},
                  \mbox{\tt False} \mbox{ otherwise.}
   \end{array}

The divide and conquer rules for this search are the following:

* The base cases are ``len(L) == 0`` and ``len(L) == 1``.
* Let ``m = len(L)//2``. If ``L[m] == x``, then ``return True``.
* If ``x < L[m]``, then search in the first half of ``L``, that is ``L[:m]``.
* If ``x > L[m]``, then search in the second half of ``L``, that is ``L[m+1:]``.

Code for a binary search is in the function below.
::

    def binary_search(numbers, nbr):
        """
        Returns True if nbr is in the sorted numbers.
        Otherwise False is returned.
        """
        if len(numbers) == 0:
            return False
        elif len(numbers) == 1:
            return numbers[0] == nbr
        else:
            middle = len(numbers)//2
            if numbers[middle] == nbr:
                return True
            elif numbers[middle] > nbr:
                return binary_search(numbers[:middle], nbr)
            else:
                return binary_search(numbers[middle+1:], nbr)

We trace the search by

* accumulating the depth of the recursion,
* printing as many spaces as the depth,
* printing the remaining list to search in.

To test the search, we generate a list of random 2-digit numbers.

::

    Give lower bound : 10
    Give upper bound : 99
    How many numbers ? 10
    L = [10, 14, 14, 19, 20, 38, 53, 60, 66, 72]
    Give number to search for : 21
    find 21 in L = [10, 14, 14, 19, 20, 38, 53, 60, 66, 72]
     find 21 in L = [10, 14, 14, 19, 20]
      find 21 in L = [19, 20]
       find 21 in L = []
    21 does not occur in L

The builtin ``index`` does not exploit the order either.
Let us define an index search for sorted lists.

::

    def binary_index(numbers, nbr):
        """
        Applies binary search to find the position k
        of nbr in the sorted numbers.
        Returns -1 if not nbr in numbers, or else
        returns k for which numbers[k] == nbr.
        """

We apply the same divide and conquer as in ``binary_search()``,
with additional attention to the index calculation:
a position found in the second half must be shifted by ``middle + 1``.
Code to define the ``binary_index(numbers, nbr)`` function follows.

::

    def binary_index(numbers, nbr):
        # search for the index of nbr in a sorted list numbers
        if len(numbers) == 0:
            return -1
        elif len(numbers) == 1:
            return (0 if numbers[0] == nbr else -1)
        else:
            middle = len(numbers)//2
            if numbers[middle] == nbr:
                return middle
            elif numbers[middle] > nbr:
                # positions in the first half are unchanged
                return binary_index(numbers[:middle], nbr)
            else:
                k = binary_index(numbers[middle+1:], nbr)
                if k == -1:
                    return -1
                # shift the position in the second half back
                return k + middle + 1

Bisection Search
----------------

A problem related to binary search is bisection search,
which we can use to invert a function.
Consider, for example, a cumulative distribution function,
as shown in :numref:`figplotcdf`.

.. _figplotcdf:

.. figure:: ./figplotcdf.png
   :align: center

   Inverting a function: given $y = f(x)$, find $x$.

The problem statement of inverting a sampled function is below.

.. math::

   \begin{array}{rcl}
     Input & \mbox{1.} & \mbox{a sampled array } A \mbox{ of function values,} \\
           & \mbox{2.} & \mbox{a particular function value } y = f(x). \\
     Output & k: & A[k] \leq y \leq A[k+1].
   \end{array}

We work with arrays of numbers sorted in increasing order.
Consider the interactive Python session below.

::

    >>> from random import uniform as u
    >>> L = [u(-1, 1) for _ in range(10)]
    >>> L.sort()
    >>> from array import array
    >>> A = array('d', L)

A linear search is provided in the function below.

::

    def linear_search(arr, nbr):
        """
        Returns the index k in the array arr such that
        arr[k] <= nbr <= arr[k+1].
        Returns -1 if nbr < arr[0] and len(arr)-1 if nbr >= arr[-1].
        The array arr must be sorted in increasing order.
        """
        for i in range(len(arr)):
            if nbr < arr[i]:
                return i-1
        return len(arr)-1

The bisection search in an array uses a function
with the following prototype.

::

    def bisect_search(arr, nbr):
        """
        Returns the index k in the array arr such that
        arr[k] <= nbr <= arr[k+1] applying binary search.
        """

We have two base cases:

1. If ``len(arr) == 0``, then ``return -1``.
2. If ``len(arr) == 1``, then ``return 0`` if ``arr[0] <= nbr``,
   otherwise ``return -1``.

In the general case, define ``m = len(arr)//2``.

1. If ``nbr < arr[m]``, then search in ``arr[:m]``.
2. Otherwise ``nbr >= arr[m]``, and we search in ``arr[m+1:]``.
   Add ``m+1`` to the index returned by this second search.
   If the second search returns ``-1``, the result is ``m``,
   which is correct because ``arr[m] <= nbr``.

The code for the recursive function is below.

::

    def bisect_search(arr, nbr):
        """
        Returns the index k in the array arr such that
        arr[k] <= nbr <= arr[k+1] applying binary search.
        """
        if len(arr) == 0:
            return -1
        elif len(arr) == 1:
            if arr[0] <= nbr:
                return 0
            else:
                return -1
        else:
            middle = len(arr)//2
            if nbr < arr[middle]:
                return bisect_search(arr[:middle], nbr)
            else:
                # arr[middle] <= nbr, so the answer is at least middle
                k = bisect_search(arr[middle+1:], nbr)
                return k + middle + 1

Another application of the bisection search is the root finding problem.
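Before turning to root finding, it helps to check ``bisect_search`` against the standard library and apply it to the inversion problem stated above. Python's ``bisect`` module provides ``bisect_right(a, x)``, which returns the insertion point ``i`` such that all elements of ``a[:i]`` are ``<= x``. Therefore ``bisect_right(a, x) - 1`` is the largest ``k`` with ``a[k] <= x``, or ``-1`` if there is none, which is the index ``bisect_search`` computes. The sketch below restates ``bisect_search`` so that it runs on its own. It compares the function with ``bisect_right`` on random sorted arrays and then brackets the solution of :math:`f(x) = 0.75` for a sampled function. The sampled function (the logistic function, used as an example of a cumulative distribution function) and the helpers ``logistic`` and ``invert_sampled`` are hypothetical choices made for illustration.

```python
from array import array
from bisect import bisect_right
from math import exp
from random import uniform


def bisect_search(arr, nbr):
    """
    Returns the index k in the array arr such that
    arr[k] <= nbr <= arr[k+1] applying binary search.
    """
    if len(arr) == 0:
        return -1
    elif len(arr) == 1:
        return 0 if arr[0] <= nbr else -1
    else:
        middle = len(arr)//2
        if nbr < arr[middle]:
            return bisect_search(arr[:middle], nbr)
        else:
            k = bisect_search(arr[middle+1:], nbr)
            return k + middle + 1


def logistic(x):
    """Hypothetical example of a cumulative distribution function."""
    return 1.0/(1.0 + exp(-x))


def invert_sampled(xs, values, y):
    """
    Hypothetical helper: values[i] == f(xs[i]) for an increasing f.
    Returns (xs[k], xs[k+1]) with values[k] <= y <= values[k+1],
    or None if y < values[0] or y >= values[-1].
    """
    k = bisect_search(values, y)
    if k < 0 or k >= len(values) - 1:
        return None
    return (xs[k], xs[k+1])


# Cross-check: bisect_right(a, y) - 1 is the largest k with a[k] <= y,
# which is the index bisect_search returns (or -1 if there is none).
for trial in range(1000):
    data = array('d', sorted(uniform(-1, 1) for _ in range(10)))
    y = uniform(-1.2, 1.2)
    assert bisect_search(data, y) == bisect_right(data, y) - 1

# Sample the logistic function at x = -5.0, -4.9, ..., 5.0.
xs = [-5.0 + 0.1*i for i in range(101)]
values = array('d', [logistic(x) for x in xs])
# logistic(x) == 0.75 at x = log(3), about 1.0986, so the printed
# interval is close to (1.0, 1.1).
print(invert_sampled(xs, values, 0.75))
```

Each slice such as ``arr[:middle]`` copies part of the array, so the recursive version does extra work proportional to the length of the array. ``bisect_right`` locates the same position by updating indices, without copying.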
If *f* is a continuous function over :math:`[a,b]` with :math:`f(a) f(b) < 0`,
then :math:`f(r) = 0` for some :math:`r \in [a,b]`.
The key steps in the bisection method are the following.

1. Let :math:`m = \frac{a+b}{2}`.
2. If :math:`f(a) f(m) < 0`, then replace :math:`[a,b]` by :math:`[a,m]`,
   otherwise replace :math:`[a,b]` by :math:`[m,b]`.

Every step halves the interval and thus gains one bit of accuracy
in the approximate root *r* of *f*.

The function ``bisect()`` does one step of the bisection method:

::

    def bisect(fun, left, right):
        """
        If (left, right) contains a root of fun,
        then returns the half of (left, right)
        that still contains a root of fun.
        """
        midpoint = (left + right)/2
        if fun(left)*fun(midpoint) < 0:
            return (left, midpoint)
        else:
            return (midpoint, right)

The width ``right - left`` bounds the error on the root.
Let ``tol`` be the tolerance on this error.
If ``right - left < tol``, return ``(left, right)``;
otherwise, call ``bisect`` again.
This recursive bisection method is defined below.

::

    def bisectroot(fun, left, right, tol):
        """
        Continues bisecting until right - left
        is less than tol.
        """
        if right-left < tol:
            return (left, right)
        else:
            (left, right) = bisect(fun, left, right)
            return bisectroot(fun, left, right, tol)

As an example, consider the approximation of :math:`\sqrt{2}`.

::

    $ python bisection.py
    Give a function in x : x**2 - 2
    give left bound A : 1
    give right bound B : 2
    give the tolerance : 1.0e-12
    A = 1.4142135623724243
    B = 1.4142135623733338
    $

Exercises
---------

1. Write an iterative version of ``binary_search``.

2. Write an iterative version of ``bisectroot``.

3. The minimum of an unsorted list of numbers is the smaller of
   the minimum of its first half and the minimum of its second half.
   Write a function to compute the minimum this way.

4. Given a list of lexicographically sorted names,
   use divide and conquer to find the name that occurs
   most frequently in the list.

5. Develop bisection search to compute the binary representation
   of a number ``x``, starting at the most significant bit.
   Use ``math.log(x,2)`` to compute the total number of bits
   needed to represent ``x``.