Arrays and Matrices

For working with numerical data, arrays are more memory efficient. Searching in two dimensional tables occur frequently. We consider two common problems: finding extremal values and locating saddle points.

Such searching algorithms apply the same double loop: for all rows, for all columns, do something. Later we will scan all words from a page of text on file: for all lines on the page, read all words on the line.

We start with a generalization of our factorization in primes problem.

The Sieve of Eratosthenes

Let us consider the problem of counting the number of primes less than a given number. The statement with input and output specifications is below:

\[\begin{split}\begin{array}{rcl} input & : & \mbox{a natural number } n \\ output & : & \mbox{all prime numbers } \leq n \end{array}\end{split}\]

Running factorization in primes for all numbers \(\leq n\) is too expensive.

The method called the Sieve of Eratosthenes works as follows:

  1. A Boolean table isprime[i] records prime numbers:

    if i is prime, then isprime[i] == True, otherwise isprime[i] == False.

  2. All multiples of prime numbers are not prime: for all isprime[i] : isprime[i*k] = False.

Consider for example the computation of all primes less than 10. In the table below \(T\) stands for True and \(F\) stands for False.

\[\begin{split}\begin{array}{cccccccccc} i & 2 & 3 & 4 & 5 & 6 & 7 & 8 & 9 & 10 \\ 0 & T & T & T & T & T & T & T & T & T \\ 2 & T & T & {\rm F} & T & {\rm F} & T & {\rm F} & T & {\rm F} \\ 3 & T & T & F & T & F & T & F & {\rm F} & F \\ \end{array}\end{split}\]

Initially, all entries in the table are True.

The algorithm uses a double loop:

  1. the first loop runs for i from 2 to n;
  2. the second loop runs only if i is prime, setting to False all multiples of i.

In order to be more efficient, we may restrict the range of the first loop: while (i < n//i+1).

Arrays in Python

Arrays are sequences of fixed length; filled with objects of the same type.

Compared to lists (variable length, heterogeneous), arrays are more memory efficient, and allow for faster access. Available in Python via the module {tt array}, the module {tt array} exports the class {tt array}.

We can create arrays directly from lists:

>>> from array import array
>>> L = range(2, 7)
>>> a = array('b', L)
array('b', [2, 3, 4, 5, 6])

The 'b' stands for signed integer, one byte.

Selecting an element from an array is the same as selecting an element from a list. Also the slicing operation is the same:

>>> a[1]
3
>>> a[1:3]
array('b', [3, 4])

The current memory info is obtained as

>>> a.buffer_info()
(28773520, 5)

The types are restricted to numerical types, summarized in Table 1.

Table 1 Types of entries in an array.
Type code C Type #bytes
'b' signed integer 1
'B' unsigned integer 1
'u' Unicode character 2
'h' signed integer 2
'H' unsigned integer 2
'i' signed integer 2
'I' unsigned integer 2
'l' signed integer 4
'L' unsigned integer 4
'q' signed integer 8
'Q' unsigned integer 8
'f' floating point 4
'd' floating point 8

An array can be converted into a list. Consider for example the following session:

>>> from array import array
>>> L = range(2,7)
>>> A = array('b', L)
>>> A
array('b', [2, 3, 4, 5, 6])
>>> type(A)
<class 'array.array'>

With the method tolist, we convert the array to a list:

>>> K = A.tolist()
>>> K
[2, 3, 4, 5, 6]
>>> type(K)
<class 'list'>

Arrays can be converted to strings and strings can be converted back to arrays. The above Python session is continued:

>>> s = A.tostring()
>>> s
b'\x02\x03\x04\x05\x06'

In reverse, we can make an array from a string s:

>>> B = array('b')
>>> B.fromstring(s)
>>> B
array('b', [2, 3, 4, 5, 6])

Now we return to our application, the sieving method to compute all primes less than a given number. Initially, all entries in the table are True. The initialization is in de code below:

def prime_sieve(nbr):
    """
    Returns all primes less than nbr.
    """
    isprime = array('B')
    for _ in range(nbr+1):
        isprime.append(1)

The algorithm uses a double loop:

  1. the first loop runs for i from 2 to n;
  2. the second loop runs only if i is prime, setting to False all multiples of i.

We extend the function prime_sieve:

def prime_sieve(nbr):
    """
    Returns all primes less than nbr.
    """
    isprime = array('B')
    for _ in range(nbr+1):
        isprime.append(1)
    i = 2
    while i < nbr//i+1:
        if isprime[i] == 1:
            for j in range(i, nbr//i+1):
                isprime[i*j] = 0
        i = i + 1

Why is nbr//i+1 necessary?

The last operation is the collection the primes from the sieve. The complete function is below:

def prime_sieve(nbr):
    """
    Returns all primes less than nbr.
    """
    isprime = array('B')
    for _ in range(nbr+1):
        isprime.append(1)
    i = 2
    while i < nbr//i+1:
        if isprime[i] == 1:
            for j in range(i, nbr//i+1):
                isprime[i*j] = 0
        i = i + 1
    primes = array('l')
    for i in range(2, nbr+1):
        if isprime[i] == 1:
            primes.append(i)
    return primes

With the definition of main, we can use the script sieve as a script and a module.

def main():
    """
    Prompts the user for a natural number n and
    prints all primes less than or equal to n.
    """
    nbr = int(input('give a positive number : '))
    primes = prime_sieve(nbr)
    count = primes.buffer_info()[1]
    print('#primes = ', count)
    print(' primes = ', primes)

if __name__ == "__main__":
    main()

Why are the last two lines useful?

>>> from sieve import prime_sieve
>>> p = prime_sieve(100)

Matrices

Python does not have a two dimensional array type. Instead we can store a matrix as a list of rows, where the data on each row is stored in an array. Consider for example a 5-by-5 matrix of random two digit numbers:

>>> from array import array
>>> from random import randint
>>> A = [array('b',[randint(10, 99)
... for _ in range(5)])
... for _ in range(5)]
>>> for row in A: print(row)
...
array('b', [23, 62, 85, 82, 38])
array('b', [68, 54, 18, 16, 37])
array('b', [91, 70, 88, 42, 56])
array('b', [42, 61, 90, 91, 41])
array('b', [40, 13, 19, 66, 54])

To select the row with index 2:

>>> A[2]
array('b', [91, 70, 88, 42, 56])

To select the element in column 3 or row 2:

>>> A[2][3]
42

We can define a matrix from a function. For example, consider a 5-by-5 matrix where the \((i,j)\)-th element is \(i+j\):

>>> from array import array
>>> f = lambda x,y: x+y

The function f defines the \((i,j)\)-th entry. We use f in a doubly nested list comprehension to define a list of rows:

>>> L = [array('b',[f(i,j) for i in range(5)])
... for j in range(5)]
>>> for row in L: print(row)
...
array('b', [0, 1, 2, 3, 4])
array('b', [1, 2, 3, 4, 5])
array('b', [2, 3, 4, 5, 6])
array('b', [3, 4, 5, 6, 7])
array('b', [4, 5, 6, 7, 8])

The problem we next consider is, given a matrix of numbers, find the row and column of the smallest element in the matrix. As a geometric test, we take points on a paraboloid. The simplest mathematical description of a paraboloid is the expression \(z = x^2 + y^2\), where the minimum is at the point \((0,0,0)\).

In our geometric setup, we want a * a 10-by-10 matrix of integer valued points; * where the minimum occurs at row 5, column 5. Therefore, we sample \(z = (x-5)^2 + (y-5)^2\), for x and y both ranging from 0 till 9.

def test():
    """
    Tests the findmin on the values on a paraboloid.
    """
    from array import array
    paraboloid = lambda x, y: (x-5)**2+(y-5)**2
    data = [array('b', [paraboloid(i, j) \
        for i in range(10)]) \
        for j in range(10)]
    print('looking for a minimum in ')
    for row in data:
        print(row)

The function test() prints the following:

looking for a minimum in
array('b', [50, 41, 34, 29, 26, 25, 26, 29, 34, 41])
array('b', [41, 32, 25, 20, 17, 16, 17, 20, 25, 32])
array('b', [34, 25, 18, 13, 10, 9, 10, 13, 18, 25])
array('b', [29, 20, 13, 8, 5, 4, 5, 8, 13, 20])
array('b', [26, 17, 10, 5, 2, 1, 2, 5, 10, 17])
array('b', [25, 16, 9, 4, 1, 0, 1, 4, 9, 16])
array('b', [26, 17, 10, 5, 2, 1, 2, 5, 10, 17])
array('b', [29, 20, 13, 8, 5, 4, 5, 8, 13, 20])
array('b', [34, 25, 18, 13, 10, 9, 10, 13, 18, 25])
array('b', [41, 32, 25, 20, 17, 16, 17, 20, 25, 32])

The input/output specification of our problem is stated as follows:

\[\begin{split}\begin{array}{rcl} input & : & \mbox{a matrix } {\tt A} \\ output & : & {\tt (i, j)}: {\tt A[i][j]} \mbox{ is the minimum} \end{array}\end{split}\]

The algorithm to solve this problem proceeds as follows:

  1. initialize current minimum with A[0][0], and initialize its position to (0, 0)
  2. go over all rows i and columns j of A
  3. if A[i][j] is less than current minimum, assign A[i][j] to the current minimum, assign (i, j) to its position

This algorithm is encode in the funciton findmin() below:

def findmin(matrix):
    """
    Returns the row and the column indices
    of the minimum element in the matrix.
    """
    (row, col) = (0, 0)
    val = matrix[row][col]
    for i in range(0, len(matrix)):
        nbcols = matrix[i].buffer_info()[1]
        for j in range(0, nbcols):
            if matrix[i][j] < val:
                (row, col) = (i, j)
            val = matrix[row][col]
    return (row, col)

The test() function is then extended with the following lines:

 def test():

     # omitted lines

     (i, j) = findmin(data)
     print('minimum value %d occurs at (%d, %d)' \
         % (data[i][j], i, j))

if __name__ == "__main__":
    test()

To run the test, we type at the command prompt:

$ python3 findmin.py
looking for a minimum in
array('b', [50, 41, 34, 29, 26, 25, 26, 29, 34, 41])
array('b', [41, 32, 25, 20, 17, 16, 17, 20, 25, 32])
array('b', [34, 25, 18, 13, 10, 9, 10, 13, 18, 25])
array('b', [29, 20, 13, 8, 5, 4, 5, 8, 13, 20])
array('b', [26, 17, 10, 5, 2, 1, 2, 5, 10, 17])
array('b', [25, 16, 9, 4, 1, 0, 1, 4, 9, 16])
array('b', [26, 17, 10, 5, 2, 1, 2, 5, 10, 17])
array('b', [29, 20, 13, 8, 5, 4, 5, 8, 13, 20])
array('b', [34, 25, 18, 13, 10, 9, 10, 13, 18, 25])
array('b', [41, 32, 25, 20, 17, 16, 17, 20, 25, 32])
minimum value 0 occurs at (5, 5)

Our next, somewhat harder problem, involves the location of saddle points. The simplest mathematical description of a saddle is \(z = x^2 - y^2\), where \((0,0,0)\) is a saddle point. The geometric setup is similar as with the paraboloid, we sample \(z = -(x-5)^2 + (y-5)^2\), for x and y both ranging from 0 till 9. The test function starts below:

def test():
    """
    Testing the location of saddle points.
    """
    from array import array
    surface = lambda x, y: - (x-5)**2 + (y-5)**2
    data = [array('b', [surface(i, j) \
        for i in range(10)]) \
        for j in range(10)]
    print('looking for a saddle in ')
    for row in data:
        print(row)

Observe the minus sign in the definition of surface. The test matrix, with some minor edits to display the column structure better, is the following:

looking for a saddle in
array('b', [  0,   9, 16, 21, 24, 25, 24, 21, 16,   9])
array('b', [ -9,   0,  7, 12, 15, 16, 15, 12,  7,   0])
array('b', [-16,  -7,  0,  5,  8,  9,  8,  5,  0,  -7])
array('b', [-21, -12, -5,  0,  3,  4,  3,  0, -5, -12])
array('b', [-24, -15, -8, -3,  0,  1,  0, -3, -8, -15])
array('b', [-25, -16, -9, -4, -1,  0, -1, -4, -9, -16])
array('b', [-24, -15, -8, -3,  0,  1,  0, -3, -8, -15])
array('b', [-21, -12, -5,  0,  3,  4,  3,  0, -5, -12])
array('b', [-16,  -7,  0,  5,  8,  9,  8,  5,  0,  -7])
array('b', [ -9,   0,  7, 12, 15, 16, 15, 12,  7,   0])

Observe the occurrence of the zero entries.

Now we give the numerical definition of a saddle point in a matrix. A saddle point is

  1. maximal in its row; and
  2. minimal in its column.

The input/output specification of the problem statement is then

\[\begin{split}\begin{array}{lcl} input & : & \mbox{a matrix } {\tt A} \\ output & : & \mbox{list of } {\tt (i,j)} \mbox{ of saddles } {\tt A[i][j]} \end{array}\end{split}\]

The algorithm proceeds as follows:

  1. for all rows i, let A[i][maxcol] be maximal
  2. check in column maxcol whether A[i][maxcol] <= A[k][maxcol], for all k != i

The execution of the algorithm is traced below:

looking for a saddle in
array('b', [  0,   9, 16, 21, 24, 25, 24, 21, 16,   9])
array('b', [ -9,   0,  7, 12, 15, 16, 15, 12,  7,   0])
array('b', [-16,  -7,  0,  5,  8,  9,  8,  5,  0,  -7])
array('b', [-21, -12, -5,  0,  3,  4,  3,  0, -5, -12])
array('b', [-24, -15, -8, -3,  0,  1,  0, -3, -8, -15])
array('b', [-25, -16, -9, -4, -1,  0, -1, -4, -9, -16])
array('b', [-24, -15, -8, -3,  0,  1,  0, -3, -8, -15])
array('b', [-21, -12, -5,  0,  3,  4,  3,  0, -5, -12])
array('b', [-16,  -7,  0,  5,  8,  9,  8,  5,  0,  -7])
array('b', [ -9,   0,  7, 12, 15, 16, 15, 12,  7,   0])
max 25.0 in row 0, column 5 smaller value in row 1
max 16.0 in row 1, column 5 smaller value in row 2
max 9.0 in row 2, column 5 smaller value in row 3
max 4.0 in row 3, column 5 smaller value in row 4
max 1.0 in row 4, column 5 smaller value in row 5
max 0.0 in row 5, column 5 saddle at (5,5)
max 1.0 in row 6, column 5 smaller value in row 5
max 4.0 in row 7, column 5 smaller value in row 4
max 9.0 in row 8, column 5 smaller value in row 3
max 16.0 in row 9, column 5 smaller value in row 2

We go over all rows, looking for the largest element, as is done in the function saddle.

def saddle(matrix):
    """
    Returns the coordinates of saddles:
    maximum in rows, minimum in columns.
    """
    result = []
    for i in range(0, len(matrix)):
        (maxval, maxcol) = (matrix[i][0], 0)
        nbcols = matrix[i].buffer_info()[1]
        for j in range(1, nbcols):
            if matrix[i][j] > maxval:
                (maxval, maxcol) = (matrix[i][j], j)
        prt = 'max %.1f in row %d, column %d' \
            % (maxval, i, maxcol)

We have a candidate saddle point in matrix[i][maxcol]}. Now we check whether it is minimal in its column, in the i-loop: The code for the function saddle continues then:

# continuation of the body of the for i in range(0, len(matrix))

    is_saddle = True
    for k in range(0, len(matrix)):
        if(k != i):
            if matrix[k][maxcol] < maxval:
                prt = prt + ' smaller value in row %d' % k
                is_saddle = False
                break
    if is_saddle:
        prt = prt + ' saddle at (%d,%d)' % (i, maxcol)
        result.append((i, maxcol))
    print prt
return result

For working with numerical data, arrays are more memory efficient. Searching in two dimensional tables occur frequently. We consider two common problems: finding extremal values and locating saddle points.

Such searching algorithms apply the same double loop: for all rows, for all columns, do something. Later we will scan all words from a page of text on file: for all lines on the page, read all words on the line.

Exercises

  1. Give a Python function swap to reverse the order of the elements in an array. After calling swap on A = [2 9 8 3], we have A = [3 8 9 2]. Do not create a new array inside swap.
  2. Write a Python function Duplicates whose input is an array A and output is a list of elements that occur at least twice in A.
  3. A plateau in an array is the longest sequence of the same elements that occur in the array. Write a Python function that returns the start and begin index of a plateau in a given array.
  4. Extend the function saddle with a parameter verbose which takes the default value True. If verbose, then the strings prt are shown, otherwise the function saddle does not print anything.
  5. For a two dimensional matrix of integer values, write a formatted print function: every entry is printed with the formatting string '\%3d'.
  6. Extend the findmin to find the smallest value in a 3-dimensional matrix. Think of a elements in the matrix as temperature measurements.