Arrays and Matrices =================== For working with numerical data, arrays are more memory efficient. Searching in two dimensional tables occur frequently. We consider two common problems: finding extremal values and locating saddle points. Such searching algorithms apply the same double loop: for all rows, for all columns, do something. Later we will scan all words from a page of text on file: for all lines on the page, read all words on the line. We start with a generalization of our factorization in primes problem. The Sieve of Eratosthenes ------------------------- Let us consider the problem of counting the number of primes less than a given number. The statement with input and output specifications is below: .. math:: \begin{array}{rcl} input & : & \mbox{a natural number } n \\ output & : & \mbox{all prime numbers } \leq n \end{array} Running factorization in primes for all numbers :math:`\leq n` is too expensive. The method called the *Sieve of Eratosthenes* works as follows: 1. A Boolean table ``isprime[i]`` records prime numbers: if ``i`` is prime, then ``isprime[i] == True``, otherwise ``isprime[i] == False``. 2. All multiples of prime numbers are not prime: for all ``isprime[i]`` : ``isprime[i*k] = False``. Consider for example the computation of all primes less than 10. In the table below :math:`T` stands for ``True`` and :math:`F` stands for ``False``. .. math:: \begin{array}{cccccccccc} i & 2 & 3 & 4 & 5 & 6 & 7 & 8 & 9 & 10 \\ 0 & T & T & T & T & T & T & T & T & T \\ 2 & T & T & {\rm F} & T & {\rm F} & T & {\rm F} & T & {\rm F} \\ 3 & T & T & F & T & F & T & F & {\rm F} & F \\ \end{array} Initially, all entries in the table are ``True``. The algorithm uses a double loop: 1. the first loop runs for *i* from 2 to *n*; 2. the second loop runs only if *i* is prime, setting to ``False`` all multiples of *i*. In order to be more efficient, we may restrict the range of the first loop: ``while (i < n//i+1)``. Arrays in Python ---------------- Arrays are sequences of *fixed* length; filled with objects of the *same* type. Compared to lists (variable length, heterogeneous), arrays are more memory efficient, and allow for faster access. Available in Python via the module {\tt array}, the module {\tt array} exports the class {\tt array}. We can create arrays directly from lists: :: >>> from array import array >>> L = range(2, 7) >>> a = array('b', L) array('b', [2, 3, 4, 5, 6]) The ``'b'`` stands for signed integer, one byte. Selecting an element from an array is the same as selecting an element from a list. Also the slicing operation is the same: :: >>> a[1] 3 >>> a[1:3] array('b', [3, 4]) The current memory info is obtained as :: >>> a.buffer_info() (28773520, 5) The types are restricted to numerical types, summarized in :numref:`tabarraytypes`. .. _tabarraytypes: .. table:: Types of entries in an array. +-----------+--------------------+--------+ | Type code | C Type | #bytes | +===========+====================+========+ | ``'b'`` | signed integer | 1 | +-----------+--------------------+--------+ | ``'B'`` | unsigned integer | 1 | +-----------+--------------------+--------+ | ``'u'`` | Unicode character | 2 | +-----------+--------------------+--------+ | ``'h'`` | signed integer | 2 | +-----------+--------------------+--------+ | ``'H'`` | unsigned integer | 2 | +-----------+--------------------+--------+ | ``'i'`` | signed integer | 2 | +-----------+--------------------+--------+ | ``'I'`` | unsigned integer | 2 | +-----------+--------------------+--------+ | ``'l'`` | signed integer | 4 | +-----------+--------------------+--------+ | ``'L'`` | unsigned integer | 4 | +-----------+--------------------+--------+ | ``'q'`` | signed integer | 8 | +-----------+--------------------+--------+ | ``'Q'`` | unsigned integer | 8 | +-----------+--------------------+--------+ | ``'f'`` | floating point | 4 | +-----------+--------------------+--------+ | ``'d'`` | floating point | 8 | +-----------+--------------------+--------+ An array can be converted into a list. Consider for example the following session: :: >>> from array import array >>> L = range(2,7) >>> A = array('b', L) >>> A array('b', [2, 3, 4, 5, 6]) >>> type(A) With the method ``tolist``, we convert the array to a list: :: >>> K = A.tolist() >>> K [2, 3, 4, 5, 6] >>> type(K) Arrays can be converted to strings and strings can be converted back to arrays. The above Python session is continued: :: >>> s = A.tostring() >>> s b'\x02\x03\x04\x05\x06' In reverse, we can make an array from a string ``s``: :: >>> B = array('b') >>> B.fromstring(s) >>> B array('b', [2, 3, 4, 5, 6]) Now we return to our application, the sieving method to compute all primes less than a given number. Initially, all entries in the table are ``True``. The initialization is in de code below: :: def prime_sieve(nbr): """ Returns all primes less than nbr. """ isprime = array('B') for _ in range(nbr+1): isprime.append(1) The algorithm uses a double loop: 1. the first loop runs for *i* from 2 to *n*; 2. the second loop runs only if *i* is prime, setting to ``False`` all multiples of *i*. We extend the function ``prime_sieve``: :: def prime_sieve(nbr): """ Returns all primes less than nbr. """ isprime = array('B') for _ in range(nbr+1): isprime.append(1) i = 2 while i < nbr//i+1: if isprime[i] == 1: for j in range(i, nbr//i+1): isprime[i*j] = 0 i = i + 1 Why is ``nbr//i+1`` necessary? The last operation is the collection the primes from the sieve. The complete function is below: :: def prime_sieve(nbr): """ Returns all primes less than nbr. """ isprime = array('B') for _ in range(nbr+1): isprime.append(1) i = 2 while i < nbr//i+1: if isprime[i] == 1: for j in range(i, nbr//i+1): isprime[i*j] = 0 i = i + 1 primes = array('l') for i in range(2, nbr+1): if isprime[i] == 1: primes.append(i) return primes With the definition of ``main``, we can use the script ``sieve`` as a script and a module. :: def main(): """ Prompts the user for a natural number n and prints all primes less than or equal to n. """ nbr = int(input('give a positive number : ')) primes = prime_sieve(nbr) count = primes.buffer_info()[1] print('#primes = ', count) print(' primes = ', primes) if __name__ == "__main__": main() Why are the last two lines useful? :: >>> from sieve import prime_sieve >>> p = prime_sieve(100) Matrices -------- Python does not have a two dimensional array type. Instead we can store a matrix as a list of rows, where the data on each row is stored in an array. Consider for example a 5-by-5 matrix of random two digit numbers: :: >>> from array import array >>> from random import randint >>> A = [array('b',[randint(10, 99) ... for _ in range(5)]) ... for _ in range(5)] >>> for row in A: print(row) ... array('b', [23, 62, 85, 82, 38]) array('b', [68, 54, 18, 16, 37]) array('b', [91, 70, 88, 42, 56]) array('b', [42, 61, 90, 91, 41]) array('b', [40, 13, 19, 66, 54]) To select the row with index 2: :: >>> A[2] array('b', [91, 70, 88, 42, 56]) To select the element in column 3 or row 2: :: >>> A[2][3] 42 We can define a matrix from a function. For example, consider a 5-by-5 matrix where the :math:`(i,j)`-th element is :math:`i+j`: :: >>> from array import array >>> f = lambda x,y: x+y The function ``f`` defines the :math:`(i,j)`-th entry. We use ``f`` in a doubly nested list comprehension to define a list of rows: :: >>> L = [array('b',[f(i,j) for i in range(5)]) ... for j in range(5)] >>> for row in L: print(row) ... array('b', [0, 1, 2, 3, 4]) array('b', [1, 2, 3, 4, 5]) array('b', [2, 3, 4, 5, 6]) array('b', [3, 4, 5, 6, 7]) array('b', [4, 5, 6, 7, 8]) The problem we next consider is, given a matrix of numbers, find the row and column of the smallest element in the matrix. As a geometric test, we take points on a paraboloid. The simplest mathematical description of a paraboloid is the expression :math:`z = x^2 + y^2`, where the minimum is at the point :math:`(0,0,0)`. In our geometric setup, we want a * a 10-by-10 matrix of integer valued points; * where the minimum occurs at row 5, column 5. Therefore, we sample :math:`z = (x-5)^2 + (y-5)^2`, for *x* and *y* both ranging from 0 till 9. :: def test(): """ Tests the findmin on the values on a paraboloid. """ from array import array paraboloid = lambda x, y: (x-5)**2+(y-5)**2 data = [array('b', [paraboloid(i, j) \ for i in range(10)]) \ for j in range(10)] print('looking for a minimum in ') for row in data: print(row) The function ``test()`` prints the following: :: looking for a minimum in array('b', [50, 41, 34, 29, 26, 25, 26, 29, 34, 41]) array('b', [41, 32, 25, 20, 17, 16, 17, 20, 25, 32]) array('b', [34, 25, 18, 13, 10, 9, 10, 13, 18, 25]) array('b', [29, 20, 13, 8, 5, 4, 5, 8, 13, 20]) array('b', [26, 17, 10, 5, 2, 1, 2, 5, 10, 17]) array('b', [25, 16, 9, 4, 1, 0, 1, 4, 9, 16]) array('b', [26, 17, 10, 5, 2, 1, 2, 5, 10, 17]) array('b', [29, 20, 13, 8, 5, 4, 5, 8, 13, 20]) array('b', [34, 25, 18, 13, 10, 9, 10, 13, 18, 25]) array('b', [41, 32, 25, 20, 17, 16, 17, 20, 25, 32]) The input/output specification of our problem is stated as follows: .. math:: \begin{array}{rcl} input & : & \mbox{a matrix } {\tt A} \\ output & : & {\tt (i, j)}: {\tt A[i][j]} \mbox{ is the minimum} \end{array} The algorithm to solve this problem proceeds as follows: 1. initialize current minimum with ``A[0][0]``, and initialize its position to ``(0, 0)`` 2. go over all rows ``i`` and columns ``j`` of ``A`` 3. if ``A[i][j]`` is less than current minimum, assign ``A[i][j]`` to the current minimum, assign ``(i, j)`` to its position This algorithm is encode in the funciton ``findmin()`` below: :: def findmin(matrix): """ Returns the row and the column indices of the minimum element in the matrix. """ (row, col) = (0, 0) val = matrix[row][col] for i in range(0, len(matrix)): nbcols = matrix[i].buffer_info()[1] for j in range(0, nbcols): if matrix[i][j] < val: (row, col) = (i, j) val = matrix[row][col] return (row, col) The ``test()`` function is then extended with the following lines: :: def test(): # omitted lines (i, j) = findmin(data) print('minimum value %d occurs at (%d, %d)' \ % (data[i][j], i, j)) if __name__ == "__main__": test() To run the test, we type at the command prompt: :: $ python3 findmin.py looking for a minimum in array('b', [50, 41, 34, 29, 26, 25, 26, 29, 34, 41]) array('b', [41, 32, 25, 20, 17, 16, 17, 20, 25, 32]) array('b', [34, 25, 18, 13, 10, 9, 10, 13, 18, 25]) array('b', [29, 20, 13, 8, 5, 4, 5, 8, 13, 20]) array('b', [26, 17, 10, 5, 2, 1, 2, 5, 10, 17]) array('b', [25, 16, 9, 4, 1, 0, 1, 4, 9, 16]) array('b', [26, 17, 10, 5, 2, 1, 2, 5, 10, 17]) array('b', [29, 20, 13, 8, 5, 4, 5, 8, 13, 20]) array('b', [34, 25, 18, 13, 10, 9, 10, 13, 18, 25]) array('b', [41, 32, 25, 20, 17, 16, 17, 20, 25, 32]) minimum value 0 occurs at (5, 5) Our next, somewhat harder problem, involves the location of saddle points. The simplest mathematical description of a saddle is :math:`z = x^2 - y^2`, where :math:`(0,0,0)` is a saddle point. The geometric setup is similar as with the paraboloid, we sample :math:`z = -(x-5)^2 + (y-5)^2`, for *x* and *y* both ranging from 0 till 9. The test function starts below: :: def test(): """ Testing the location of saddle points. """ from array import array surface = lambda x, y: - (x-5)**2 + (y-5)**2 data = [array('b', [surface(i, j) \ for i in range(10)]) \ for j in range(10)] print('looking for a saddle in ') for row in data: print(row) Observe the minus sign in the definition of ``surface``. The test matrix, with some minor edits to display the column structure better, is the following: :: looking for a saddle in array('b', [ 0, 9, 16, 21, 24, 25, 24, 21, 16, 9]) array('b', [ -9, 0, 7, 12, 15, 16, 15, 12, 7, 0]) array('b', [-16, -7, 0, 5, 8, 9, 8, 5, 0, -7]) array('b', [-21, -12, -5, 0, 3, 4, 3, 0, -5, -12]) array('b', [-24, -15, -8, -3, 0, 1, 0, -3, -8, -15]) array('b', [-25, -16, -9, -4, -1, 0, -1, -4, -9, -16]) array('b', [-24, -15, -8, -3, 0, 1, 0, -3, -8, -15]) array('b', [-21, -12, -5, 0, 3, 4, 3, 0, -5, -12]) array('b', [-16, -7, 0, 5, 8, 9, 8, 5, 0, -7]) array('b', [ -9, 0, 7, 12, 15, 16, 15, 12, 7, 0]) Observe the occurrence of the zero entries. Now we give the numerical definition of a saddle point in a matrix. A saddle point is 1. maximal in its row; and 2. minimal in its column. The input/output specification of the problem statement is then .. math:: \begin{array}{lcl} input & : & \mbox{a matrix } {\tt A} \\ output & : & \mbox{list of } {\tt (i,j)} \mbox{ of saddles } {\tt A[i][j]} \end{array} The algorithm proceeds as follows: 1. for all rows ``i``, let ``A[i][maxcol]`` be maximal 2. check in column ``maxcol`` whether ``A[i][maxcol] <= A[k][maxcol]``, for all ``k != i`` The execution of the algorithm is traced below: :: looking for a saddle in array('b', [ 0, 9, 16, 21, 24, 25, 24, 21, 16, 9]) array('b', [ -9, 0, 7, 12, 15, 16, 15, 12, 7, 0]) array('b', [-16, -7, 0, 5, 8, 9, 8, 5, 0, -7]) array('b', [-21, -12, -5, 0, 3, 4, 3, 0, -5, -12]) array('b', [-24, -15, -8, -3, 0, 1, 0, -3, -8, -15]) array('b', [-25, -16, -9, -4, -1, 0, -1, -4, -9, -16]) array('b', [-24, -15, -8, -3, 0, 1, 0, -3, -8, -15]) array('b', [-21, -12, -5, 0, 3, 4, 3, 0, -5, -12]) array('b', [-16, -7, 0, 5, 8, 9, 8, 5, 0, -7]) array('b', [ -9, 0, 7, 12, 15, 16, 15, 12, 7, 0]) max 25.0 in row 0, column 5 smaller value in row 1 max 16.0 in row 1, column 5 smaller value in row 2 max 9.0 in row 2, column 5 smaller value in row 3 max 4.0 in row 3, column 5 smaller value in row 4 max 1.0 in row 4, column 5 smaller value in row 5 max 0.0 in row 5, column 5 saddle at (5,5) max 1.0 in row 6, column 5 smaller value in row 5 max 4.0 in row 7, column 5 smaller value in row 4 max 9.0 in row 8, column 5 smaller value in row 3 max 16.0 in row 9, column 5 smaller value in row 2 We go over all rows, looking for the largest element, as is done in the function ``saddle``. :: def saddle(matrix): """ Returns the coordinates of saddles: maximum in rows, minimum in columns. """ result = [] for i in range(0, len(matrix)): (maxval, maxcol) = (matrix[i][0], 0) nbcols = matrix[i].buffer_info()[1] for j in range(1, nbcols): if matrix[i][j] > maxval: (maxval, maxcol) = (matrix[i][j], j) prt = 'max %.1f in row %d, column %d' \ % (maxval, i, maxcol) We have a candidate saddle point in ``matrix[i][maxcol]}``. Now we check whether it is minimal in its column, in the ``i``-loop: The code for the function ``saddle`` continues then: :: # continuation of the body of the for i in range(0, len(matrix)) is_saddle = True for k in range(0, len(matrix)): if(k != i): if matrix[k][maxcol] < maxval: prt = prt + ' smaller value in row %d' % k is_saddle = False break if is_saddle: prt = prt + ' saddle at (%d,%d)' % (i, maxcol) result.append((i, maxcol)) print prt return result For working with numerical data, arrays are more memory efficient. Searching in two dimensional tables occur frequently. We consider two common problems: finding extremal values and locating saddle points. Such searching algorithms apply the same double loop: for all rows, for all columns, do something. Later we will scan all words from a page of text on file: for all lines on the page, read all words on the line. Exercises --------- 1. Give a Python function ``swap`` to reverse the order of the elements in an array. After calling ``swap`` on ``A = [2 9 8 3]``, we have ``A = [3 8 9 2]``. Do not create a new array inside ``swap``. 2. Write a Python function ``Duplicates`` whose input is an array ``A`` and output is a list of elements that occur at least twice in ``A``. 3. A plateau in an array is the longest sequence of the same elements that occur in the array. Write a Python function that returns the start and begin index of a plateau in a given array. 4. Extend the function ``saddle`` with a parameter ``verbose`` which takes the default value ``True``. If ``verbose``, then the strings ``prt`` are shown, otherwise the function ``saddle`` does not print anything. 5. For a two dimensional matrix of integer values, write a formatted print function: every entry is printed with the formatting string ``'\%3d'``. 6. Extend the ``findmin`` to find the smallest value in a 3-dimensional matrix. Think of a elements in the matrix as temperature measurements.