Arrays and Matrices
===================

For working with numerical data, arrays are more memory efficient.
Searching in two dimensional tables occur frequently.
We consider two common problems:
finding extremal values and locating saddle points.

Such searching algorithms apply the same double loop:
for all rows, for all columns, do something.
Later we will scan all words from a page of text on file:
for all lines on the page, read all words on the line.

We start with a generalization of our factorization in primes problem.

The Sieve of Eratosthenes
-------------------------

Let us consider the problem of counting the number of primes
less than a given number.
The statement with input and output specifications is below:

.. math::

   \begin{array}{rcl}
    input & : & \mbox{a natural number } n \\
   output & : & \mbox{all prime numbers } \leq n
   \end{array}

Running factorization in primes for all numbers :math:`\leq n`
is too expensive.

The method called the *Sieve of Eratosthenes* works as follows:

1. A Boolean table ``isprime[i]`` records prime numbers:

   if ``i`` is prime, then ``isprime[i] == True``,
   otherwise ``isprime[i] == False``.

2. All multiples of prime numbers are not prime:
   for all ``isprime[i]`` : ``isprime[i*k] = False``.

Consider for example the computation of all primes less than 10.
In the table below :math:`T` stands for ``True``
and :math:`F` stands for ``False``.

.. math::

   \begin{array}{cccccccccc}
   i & 2 & 3 & 4 & 5 & 6 & 7 & 8 & 9 & 10 \\
   0 & T & T & T & T & T & T & T & T & T \\
   2 & T & T & {\rm F} & T & {\rm F} & T & {\rm F} & T & {\rm F} \\ 
   3 & T & T & F & T & F & T & F & {\rm F} & F \\
   \end{array}

Initially, all entries in the table are ``True``.

The algorithm uses a double loop:

1. the first loop runs for *i* from 2 to *n*;

2. the second loop runs only if *i* is prime,
   setting to ``False`` all multiples of *i*.

In order to be more efficient,
we may restrict the range of the first loop:
``while (i < n//i+1)``.

Arrays in Python
----------------

Arrays are sequences of *fixed* length;
filled with objects of the *same* type.

Compared to lists (variable length, heterogeneous),
arrays are more memory efficient, and allow for faster access.
Available in Python via the module {\tt array},
the module {\tt array} exports the class {\tt array}.

We can create arrays directly from lists:
 
::

   >>> from array import array
   >>> L = range(2, 7)
   >>> a = array('b', L)
   array('b', [2, 3, 4, 5, 6])

The ``'b'`` stands for signed integer, one byte.

Selecting an element from an array is the same as 
selecting an element from a list.
Also the slicing operation is the same:

::

   >>> a[1]
   3
   >>> a[1:3]
   array('b', [3, 4])

The current memory info is obtained as

::

   >>> a.buffer_info()
   (28773520, 5)

The types are restricted to numerical types,
summarized in :numref:`tabarraytypes`.

.. _tabarraytypes:

.. table:: Types of entries in an array.

   +-----------+--------------------+--------+
   | Type code |  C Type            | #bytes |
   +===========+====================+========+
   | ``'b'``   |  signed integer    |    1   |
   +-----------+--------------------+--------+
   | ``'B'``   |  unsigned integer  |    1   |
   +-----------+--------------------+--------+
   | ``'u'``   |  Unicode character |    2   |
   +-----------+--------------------+--------+
   | ``'h'``   |  signed integer    |    2   |
   +-----------+--------------------+--------+
   | ``'H'``   |  unsigned integer  |    2   |
   +-----------+--------------------+--------+
   | ``'i'``   |  signed integer    |    2   |
   +-----------+--------------------+--------+
   | ``'I'``   |  unsigned integer  |    2   |
   +-----------+--------------------+--------+
   | ``'l'``   |  signed integer    |    4   |
   +-----------+--------------------+--------+
   | ``'L'``   |  unsigned integer  |    4   |
   +-----------+--------------------+--------+
   | ``'q'``   |  signed integer    |    8   |
   +-----------+--------------------+--------+
   | ``'Q'``   |  unsigned integer  |    8   |
   +-----------+--------------------+--------+
   | ``'f'``   |  floating point    |    4   |
   +-----------+--------------------+--------+
   | ``'d'``   |  floating point    |    8   |
   +-----------+--------------------+--------+

An array can be converted into a list.
Consider for example the following session:

::

   >>> from array import array
   >>> L = range(2,7)
   >>> A = array('b', L)
   >>> A
   array('b', [2, 3, 4, 5, 6])
   >>> type(A)
   <class 'array.array'>

With the method ``tolist``,
we convert the array to a list:

::

   >>> K = A.tolist()
   >>> K
   [2, 3, 4, 5, 6]
   >>> type(K)
   <class 'list'>

Arrays can be converted to strings and strings
can be converted back to arrays.
The above Python session is continued:

::

   >>> s = A.tostring()
   >>> s
   b'\x02\x03\x04\x05\x06'

In reverse, we can make an array from a string ``s``:

::

   >>> B = array('b')
   >>> B.fromstring(s)
   >>> B
   array('b', [2, 3, 4, 5, 6])


Now we return to our application,
the sieving method to compute all primes less than a given number.
Initially, all entries in the table are ``True``.
The initialization is in de code below:

::

   def prime_sieve(nbr):
       """
       Returns all primes less than nbr.
       """
       isprime = array('B')
       for _ in range(nbr+1):
           isprime.append(1)

The algorithm uses a double loop:

1. the first loop runs for *i* from 2 to *n*;

2. the second loop runs only if *i* is prime,
   setting to ``False`` all multiples of *i*.

We extend the function ``prime_sieve``:

::

   def prime_sieve(nbr):
       """
       Returns all primes less than nbr.
       """
       isprime = array('B')
       for _ in range(nbr+1):
           isprime.append(1)
       i = 2
       while i < nbr//i+1:
           if isprime[i] == 1:
               for j in range(i, nbr//i+1):
                   isprime[i*j] = 0
           i = i + 1

Why is ``nbr//i+1`` necessary?

The last operation is the collection the primes from
the sieve.  The complete function is below:

::

   def prime_sieve(nbr):
       """
       Returns all primes less than nbr.
       """
       isprime = array('B')
       for _ in range(nbr+1):
           isprime.append(1)
       i = 2
       while i < nbr//i+1:
           if isprime[i] == 1:
               for j in range(i, nbr//i+1):
                   isprime[i*j] = 0
           i = i + 1
       primes = array('l')
       for i in range(2, nbr+1):
           if isprime[i] == 1:
               primes.append(i)
       return primes

With the definition of ``main``, we can use the script
``sieve`` as a script and a module.

::

   def main():
       """
       Prompts the user for a natural number n and
       prints all primes less than or equal to n.
       """
       nbr = int(input('give a positive number : '))
       primes = prime_sieve(nbr)
       count = primes.buffer_info()[1]
       print('#primes = ', count)
       print(' primes = ', primes)

   if __name__ == "__main__":
       main()

Why are the last two lines useful?

::

   >>> from sieve import prime_sieve
   >>> p = prime_sieve(100)

Matrices
--------

Python does not have a two dimensional array type.
Instead we can store a matrix as a list of rows,
where the data on each row is stored in an array.
Consider for example a 5-by-5 matrix of random two digit numbers:

::

   >>> from array import array
   >>> from random import randint
   >>> A = [array('b',[randint(10, 99)
   ... for _ in range(5)])
   ... for _ in range(5)]
   >>> for row in A: print(row)
   ... 
   array('b', [23, 62, 85, 82, 38])
   array('b', [68, 54, 18, 16, 37])
   array('b', [91, 70, 88, 42, 56])
   array('b', [42, 61, 90, 91, 41])
   array('b', [40, 13, 19, 66, 54])

To select the row with index 2:

::

   >>> A[2]
   array('b', [91, 70, 88, 42, 56])

To select the element in column 3 or row 2:

::

   >>> A[2][3]
   42

We can define a matrix from a function.
For example, consider 
a 5-by-5 matrix where the :math:`(i,j)`-th element is :math:`i+j`:

::

   >>> from array import array
   >>> f = lambda x,y: x+y

The function ``f`` defines the :math:`(i,j)`-th entry.
We use ``f`` in a doubly nested list comprehension
to define a list of rows:

::

   >>> L = [array('b',[f(i,j) for i in range(5)])
   ... for j in range(5)]
   >>> for row in L: print(row)
   ... 
   array('b', [0, 1, 2, 3, 4])
   array('b', [1, 2, 3, 4, 5])
   array('b', [2, 3, 4, 5, 6])
   array('b', [3, 4, 5, 6, 7])
   array('b', [4, 5, 6, 7, 8])

The problem we next consider is, given a matrix of numbers,
find the row and column of the smallest element in the matrix.
As a geometric test, we take points on a paraboloid.
The simplest mathematical description of a paraboloid
is the expression :math:`z = x^2 + y^2`,
where the minimum is at the point :math:`(0,0,0)`.

In our geometric setup, we want a
* a 10-by-10 matrix of integer valued points;
* where the minimum occurs at row 5, column 5.
Therefore, we sample :math:`z = (x-5)^2 + (y-5)^2`,
for *x* and *y* both ranging from 0 till 9.

::

   def test():
       """
       Tests the findmin on the values on a paraboloid.
       """
       from array import array
       paraboloid = lambda x, y: (x-5)**2+(y-5)**2
       data = [array('b', [paraboloid(i, j) \
           for i in range(10)]) \
           for j in range(10)]
       print('looking for a minimum in ')
       for row in data:
           print(row)

The function ``test()`` prints the following:

::

   looking for a minimum in 
   array('b', [50, 41, 34, 29, 26, 25, 26, 29, 34, 41])
   array('b', [41, 32, 25, 20, 17, 16, 17, 20, 25, 32])
   array('b', [34, 25, 18, 13, 10, 9, 10, 13, 18, 25])
   array('b', [29, 20, 13, 8, 5, 4, 5, 8, 13, 20])
   array('b', [26, 17, 10, 5, 2, 1, 2, 5, 10, 17])
   array('b', [25, 16, 9, 4, 1, 0, 1, 4, 9, 16])
   array('b', [26, 17, 10, 5, 2, 1, 2, 5, 10, 17])
   array('b', [29, 20, 13, 8, 5, 4, 5, 8, 13, 20])
   array('b', [34, 25, 18, 13, 10, 9, 10, 13, 18, 25])
   array('b', [41, 32, 25, 20, 17, 16, 17, 20, 25, 32])

The input/output specification of our problem
is stated as follows:

.. math::

   \begin{array}{rcl}
    input & : & \mbox{a matrix } {\tt A} \\
   output & : & {\tt (i, j)}: {\tt A[i][j]} \mbox{ is the minimum}
   \end{array}

The algorithm to solve this problem proceeds as follows:

1. initialize current minimum with ``A[0][0]``, and
   initialize its position to ``(0, 0)``

2. go over all rows ``i`` and columns ``j`` of ``A``

3. if ``A[i][j]`` is less than current minimum,
   assign ``A[i][j]`` to the current minimum,
   assign ``(i, j)`` to its position

This algorithm is encode in the funciton ``findmin()`` below:

::

   def findmin(matrix):
       """
       Returns the row and the column indices
       of the minimum element in the matrix.
       """
       (row, col) = (0, 0)
       val = matrix[row][col]
       for i in range(0, len(matrix)):
           nbcols = matrix[i].buffer_info()[1]
           for j in range(0, nbcols):
               if matrix[i][j] < val:
                   (row, col) = (i, j)
               val = matrix[row][col]
       return (row, col)

The ``test()`` function is then extended with the following lines:

::

   def test():

       # omitted lines
 
       (i, j) = findmin(data)
       print('minimum value %d occurs at (%d, %d)' \
           % (data[i][j], i, j))

  if __name__ == "__main__":
      test()

To run the test, we type at the command prompt:

::

   $ python3 findmin.py
   looking for a minimum in 
   array('b', [50, 41, 34, 29, 26, 25, 26, 29, 34, 41])
   array('b', [41, 32, 25, 20, 17, 16, 17, 20, 25, 32])
   array('b', [34, 25, 18, 13, 10, 9, 10, 13, 18, 25])
   array('b', [29, 20, 13, 8, 5, 4, 5, 8, 13, 20])
   array('b', [26, 17, 10, 5, 2, 1, 2, 5, 10, 17])
   array('b', [25, 16, 9, 4, 1, 0, 1, 4, 9, 16])
   array('b', [26, 17, 10, 5, 2, 1, 2, 5, 10, 17])
   array('b', [29, 20, 13, 8, 5, 4, 5, 8, 13, 20])
   array('b', [34, 25, 18, 13, 10, 9, 10, 13, 18, 25])
   array('b', [41, 32, 25, 20, 17, 16, 17, 20, 25, 32])
   minimum value 0 occurs at (5, 5)

Our next, somewhat harder problem, involves the location
of saddle points.
The simplest mathematical description of a saddle is
:math:`z = x^2 - y^2`, where
:math:`(0,0,0)` is a saddle point.
The geometric setup is similar as with the paraboloid,
we sample :math:`z = -(x-5)^2 + (y-5)^2`,
for *x* and *y* both ranging from 0 till 9.
The test function starts below:

::

   def test():
       """
       Testing the location of saddle points.
       """
       from array import array
       surface = lambda x, y: - (x-5)**2 + (y-5)**2
       data = [array('b', [surface(i, j) \
           for i in range(10)]) \
           for j in range(10)]
       print('looking for a saddle in ')
       for row in data:
           print(row)

Observe the minus sign in the definition of ``surface``.
The test matrix, with some minor edits to display the
column structure better, is the following:

::

   looking for a saddle in 
   array('b', [  0,   9, 16, 21, 24, 25, 24, 21, 16,   9])
   array('b', [ -9,   0,  7, 12, 15, 16, 15, 12,  7,   0])
   array('b', [-16,  -7,  0,  5,  8,  9,  8,  5,  0,  -7])
   array('b', [-21, -12, -5,  0,  3,  4,  3,  0, -5, -12])
   array('b', [-24, -15, -8, -3,  0,  1,  0, -3, -8, -15])
   array('b', [-25, -16, -9, -4, -1,  0, -1, -4, -9, -16])
   array('b', [-24, -15, -8, -3,  0,  1,  0, -3, -8, -15])
   array('b', [-21, -12, -5,  0,  3,  4,  3,  0, -5, -12])
   array('b', [-16,  -7,  0,  5,  8,  9,  8,  5,  0,  -7])
   array('b', [ -9,   0,  7, 12, 15, 16, 15, 12,  7,   0])

Observe the occurrence of the zero entries.

Now we give the numerical definition of a saddle point in a matrix.
A saddle point is

1. maximal in its row; and

2. minimal in its column.

The input/output specification of the problem statement is then

.. math::

   \begin{array}{lcl}
   input & : & \mbox{a matrix } {\tt A} \\
   output & : & \mbox{list of } {\tt (i,j)} \mbox{ of saddles } {\tt A[i][j]}
   \end{array}

The algorithm proceeds as follows:

1. for all rows ``i``, let ``A[i][maxcol]`` be maximal

2. check in column ``maxcol`` whether 
   ``A[i][maxcol] <= A[k][maxcol]``, for all ``k != i``

The execution of the algorithm is traced below:

::

   looking for a saddle in 
   array('b', [  0,   9, 16, 21, 24, 25, 24, 21, 16,   9])
   array('b', [ -9,   0,  7, 12, 15, 16, 15, 12,  7,   0])
   array('b', [-16,  -7,  0,  5,  8,  9,  8,  5,  0,  -7])
   array('b', [-21, -12, -5,  0,  3,  4,  3,  0, -5, -12])
   array('b', [-24, -15, -8, -3,  0,  1,  0, -3, -8, -15])
   array('b', [-25, -16, -9, -4, -1,  0, -1, -4, -9, -16])
   array('b', [-24, -15, -8, -3,  0,  1,  0, -3, -8, -15])
   array('b', [-21, -12, -5,  0,  3,  4,  3,  0, -5, -12])
   array('b', [-16,  -7,  0,  5,  8,  9,  8,  5,  0,  -7])
   array('b', [ -9,   0,  7, 12, 15, 16, 15, 12,  7,   0])
   max 25.0 in row 0, column 5 smaller value in row 1
   max 16.0 in row 1, column 5 smaller value in row 2
   max 9.0 in row 2, column 5 smaller value in row 3
   max 4.0 in row 3, column 5 smaller value in row 4
   max 1.0 in row 4, column 5 smaller value in row 5
   max 0.0 in row 5, column 5 saddle at (5,5)
   max 1.0 in row 6, column 5 smaller value in row 5
   max 4.0 in row 7, column 5 smaller value in row 4
   max 9.0 in row 8, column 5 smaller value in row 3
   max 16.0 in row 9, column 5 smaller value in row 2

We go over all rows, looking for the largest element,
as is done in the function ``saddle``.

::

   def saddle(matrix):
       """
       Returns the coordinates of saddles:
       maximum in rows, minimum in columns.
       """
       result = []
       for i in range(0, len(matrix)):
           (maxval, maxcol) = (matrix[i][0], 0) 
           nbcols = matrix[i].buffer_info()[1]
           for j in range(1, nbcols):
               if matrix[i][j] > maxval:
                   (maxval, maxcol) = (matrix[i][j], j)
           prt = 'max %.1f in row %d, column %d' \
               % (maxval, i, maxcol)

We have a candidate saddle point in ``matrix[i][maxcol]}``.
Now we check whether it is minimal in its column, in the ``i``-loop:
The code for the function ``saddle`` continues then:

::

      # continuation of the body of the for i in range(0, len(matrix))

          is_saddle = True
          for k in range(0, len(matrix)):
              if(k != i):
                  if matrix[k][maxcol] < maxval:
                      prt = prt + ' smaller value in row %d' % k
                      is_saddle = False
                      break
          if is_saddle:
              prt = prt + ' saddle at (%d,%d)' % (i, maxcol)
              result.append((i, maxcol))
          print prt
      return result

For working with numerical data, arrays are more memory efficient.
Searching in two dimensional tables occur frequently.
We consider two common problems:
finding extremal values and locating saddle points.

Such searching algorithms apply the same double loop:
for all rows, for all columns, do something.
Later we will scan all words from a page of text on file:
for all lines on the page, read all words on the line.

Exercises
---------

1. Give a Python function ``swap`` to reverse the order
   of the elements in an array.  After calling ``swap``
   on ``A = [2 9 8 3]``, we have ``A = [3 8 9 2]``.
   Do not create a new array inside ``swap``.

2. Write a Python function ``Duplicates``
   whose input is an array ``A`` and output is
   a list of elements that occur at least twice in ``A``.

3. A plateau in an array is the longest sequence of the
   same elements that occur in the array.
   Write a Python function that returns the start 
   and begin index of a plateau in a given array.

4. Extend the function ``saddle`` with a parameter ``verbose``
   which takes the default value ``True``.
   If ``verbose``, then the strings ``prt`` are shown,
   otherwise the function ``saddle`` does not print anything.

5. For a two dimensional matrix of integer values,
   write a formatted print function: every entry is printed
   with the formatting string ``'\%3d'``.

6. Extend the ``findmin`` to find the smallest value
   in a 3-dimensional matrix.  Think of a elements in
   the matrix as temperature measurements.