Arrays and Matrices¶
For working with numerical data, arrays are more memory efficient. Searching in two dimensional tables occur frequently. We consider two common problems: finding extremal values and locating saddle points.
Such searching algorithms apply the same double loop: for all rows, for all columns, do something. Later we will scan all words from a page of text on file: for all lines on the page, read all words on the line.
We start with a generalization of our factorization in primes problem.
The Sieve of Eratosthenes¶
Let us consider the problem of counting the number of primes less than a given number. The statement with input and output specifications is below:
Running factorization in primes for all numbers \(\leq n\) is too expensive.
The method called the Sieve of Eratosthenes works as follows:
A Boolean table
isprime[i]
records prime numbers:if
i
is prime, thenisprime[i] == True
, otherwiseisprime[i] == False
.All multiples of prime numbers are not prime: for all
isprime[i]
:isprime[i*k] = False
.
Consider for example the computation of all primes less than 10.
In the table below \(T\) stands for True
and \(F\) stands for False
.
Initially, all entries in the table are True
.
The algorithm uses a double loop:
- the first loop runs for i from 2 to n;
- the second loop runs only if i is prime,
setting to
False
all multiples of i.
In order to be more efficient,
we may restrict the range of the first loop:
while (i < n//i+1)
.
Arrays in Python¶
Arrays are sequences of fixed length; filled with objects of the same type.
Compared to lists (variable length, heterogeneous), arrays are more memory efficient, and allow for faster access. Available in Python via the module {tt array}, the module {tt array} exports the class {tt array}.
We can create arrays directly from lists:
>>> from array import array
>>> L = range(2, 7)
>>> a = array('b', L)
array('b', [2, 3, 4, 5, 6])
The 'b'
stands for signed integer, one byte.
Selecting an element from an array is the same as selecting an element from a list. Also the slicing operation is the same:
>>> a[1]
3
>>> a[1:3]
array('b', [3, 4])
The current memory info is obtained as
>>> a.buffer_info()
(28773520, 5)
The types are restricted to numerical types, summarized in Table 1.
Type code | C Type | #bytes |
---|---|---|
'b' |
signed integer | 1 |
'B' |
unsigned integer | 1 |
'u' |
Unicode character | 2 |
'h' |
signed integer | 2 |
'H' |
unsigned integer | 2 |
'i' |
signed integer | 2 |
'I' |
unsigned integer | 2 |
'l' |
signed integer | 4 |
'L' |
unsigned integer | 4 |
'q' |
signed integer | 8 |
'Q' |
unsigned integer | 8 |
'f' |
floating point | 4 |
'd' |
floating point | 8 |
An array can be converted into a list. Consider for example the following session:
>>> from array import array
>>> L = range(2,7)
>>> A = array('b', L)
>>> A
array('b', [2, 3, 4, 5, 6])
>>> type(A)
<class 'array.array'>
With the method tolist
,
we convert the array to a list:
>>> K = A.tolist()
>>> K
[2, 3, 4, 5, 6]
>>> type(K)
<class 'list'>
Arrays can be converted to strings and strings can be converted back to arrays. The above Python session is continued:
>>> s = A.tostring()
>>> s
b'\x02\x03\x04\x05\x06'
In reverse, we can make an array from a string s
:
>>> B = array('b')
>>> B.fromstring(s)
>>> B
array('b', [2, 3, 4, 5, 6])
Now we return to our application,
the sieving method to compute all primes less than a given number.
Initially, all entries in the table are True
.
The initialization is in de code below:
def prime_sieve(nbr):
"""
Returns all primes less than nbr.
"""
isprime = array('B')
for _ in range(nbr+1):
isprime.append(1)
The algorithm uses a double loop:
- the first loop runs for i from 2 to n;
- the second loop runs only if i is prime,
setting to
False
all multiples of i.
We extend the function prime_sieve
:
def prime_sieve(nbr):
"""
Returns all primes less than nbr.
"""
isprime = array('B')
for _ in range(nbr+1):
isprime.append(1)
i = 2
while i < nbr//i+1:
if isprime[i] == 1:
for j in range(i, nbr//i+1):
isprime[i*j] = 0
i = i + 1
Why is nbr//i+1
necessary?
The last operation is the collection the primes from the sieve. The complete function is below:
def prime_sieve(nbr):
"""
Returns all primes less than nbr.
"""
isprime = array('B')
for _ in range(nbr+1):
isprime.append(1)
i = 2
while i < nbr//i+1:
if isprime[i] == 1:
for j in range(i, nbr//i+1):
isprime[i*j] = 0
i = i + 1
primes = array('l')
for i in range(2, nbr+1):
if isprime[i] == 1:
primes.append(i)
return primes
With the definition of main
, we can use the script
sieve
as a script and a module.
def main():
"""
Prompts the user for a natural number n and
prints all primes less than or equal to n.
"""
nbr = int(input('give a positive number : '))
primes = prime_sieve(nbr)
count = primes.buffer_info()[1]
print('#primes = ', count)
print(' primes = ', primes)
if __name__ == "__main__":
main()
Why are the last two lines useful?
>>> from sieve import prime_sieve
>>> p = prime_sieve(100)
Matrices¶
Python does not have a two dimensional array type. Instead we can store a matrix as a list of rows, where the data on each row is stored in an array. Consider for example a 5-by-5 matrix of random two digit numbers:
>>> from array import array
>>> from random import randint
>>> A = [array('b',[randint(10, 99)
... for _ in range(5)])
... for _ in range(5)]
>>> for row in A: print(row)
...
array('b', [23, 62, 85, 82, 38])
array('b', [68, 54, 18, 16, 37])
array('b', [91, 70, 88, 42, 56])
array('b', [42, 61, 90, 91, 41])
array('b', [40, 13, 19, 66, 54])
To select the row with index 2:
>>> A[2]
array('b', [91, 70, 88, 42, 56])
To select the element in column 3 or row 2:
>>> A[2][3]
42
We can define a matrix from a function. For example, consider a 5-by-5 matrix where the \((i,j)\)-th element is \(i+j\):
>>> from array import array
>>> f = lambda x,y: x+y
The function f
defines the \((i,j)\)-th entry.
We use f
in a doubly nested list comprehension
to define a list of rows:
>>> L = [array('b',[f(i,j) for i in range(5)])
... for j in range(5)]
>>> for row in L: print(row)
...
array('b', [0, 1, 2, 3, 4])
array('b', [1, 2, 3, 4, 5])
array('b', [2, 3, 4, 5, 6])
array('b', [3, 4, 5, 6, 7])
array('b', [4, 5, 6, 7, 8])
The problem we next consider is, given a matrix of numbers, find the row and column of the smallest element in the matrix. As a geometric test, we take points on a paraboloid. The simplest mathematical description of a paraboloid is the expression \(z = x^2 + y^2\), where the minimum is at the point \((0,0,0)\).
In our geometric setup, we want a * a 10-by-10 matrix of integer valued points; * where the minimum occurs at row 5, column 5. Therefore, we sample \(z = (x-5)^2 + (y-5)^2\), for x and y both ranging from 0 till 9.
def test():
"""
Tests the findmin on the values on a paraboloid.
"""
from array import array
paraboloid = lambda x, y: (x-5)**2+(y-5)**2
data = [array('b', [paraboloid(i, j) \
for i in range(10)]) \
for j in range(10)]
print('looking for a minimum in ')
for row in data:
print(row)
The function test()
prints the following:
looking for a minimum in
array('b', [50, 41, 34, 29, 26, 25, 26, 29, 34, 41])
array('b', [41, 32, 25, 20, 17, 16, 17, 20, 25, 32])
array('b', [34, 25, 18, 13, 10, 9, 10, 13, 18, 25])
array('b', [29, 20, 13, 8, 5, 4, 5, 8, 13, 20])
array('b', [26, 17, 10, 5, 2, 1, 2, 5, 10, 17])
array('b', [25, 16, 9, 4, 1, 0, 1, 4, 9, 16])
array('b', [26, 17, 10, 5, 2, 1, 2, 5, 10, 17])
array('b', [29, 20, 13, 8, 5, 4, 5, 8, 13, 20])
array('b', [34, 25, 18, 13, 10, 9, 10, 13, 18, 25])
array('b', [41, 32, 25, 20, 17, 16, 17, 20, 25, 32])
The input/output specification of our problem is stated as follows:
The algorithm to solve this problem proceeds as follows:
- initialize current minimum with
A[0][0]
, and initialize its position to(0, 0)
- go over all rows
i
and columnsj
ofA
- if
A[i][j]
is less than current minimum, assignA[i][j]
to the current minimum, assign(i, j)
to its position
This algorithm is encode in the funciton findmin()
below:
def findmin(matrix):
"""
Returns the row and the column indices
of the minimum element in the matrix.
"""
(row, col) = (0, 0)
val = matrix[row][col]
for i in range(0, len(matrix)):
nbcols = matrix[i].buffer_info()[1]
for j in range(0, nbcols):
if matrix[i][j] < val:
(row, col) = (i, j)
val = matrix[row][col]
return (row, col)
The test()
function is then extended with the following lines:
def test():
# omitted lines
(i, j) = findmin(data)
print('minimum value %d occurs at (%d, %d)' \
% (data[i][j], i, j))
if __name__ == "__main__":
test()
To run the test, we type at the command prompt:
$ python3 findmin.py
looking for a minimum in
array('b', [50, 41, 34, 29, 26, 25, 26, 29, 34, 41])
array('b', [41, 32, 25, 20, 17, 16, 17, 20, 25, 32])
array('b', [34, 25, 18, 13, 10, 9, 10, 13, 18, 25])
array('b', [29, 20, 13, 8, 5, 4, 5, 8, 13, 20])
array('b', [26, 17, 10, 5, 2, 1, 2, 5, 10, 17])
array('b', [25, 16, 9, 4, 1, 0, 1, 4, 9, 16])
array('b', [26, 17, 10, 5, 2, 1, 2, 5, 10, 17])
array('b', [29, 20, 13, 8, 5, 4, 5, 8, 13, 20])
array('b', [34, 25, 18, 13, 10, 9, 10, 13, 18, 25])
array('b', [41, 32, 25, 20, 17, 16, 17, 20, 25, 32])
minimum value 0 occurs at (5, 5)
Our next, somewhat harder problem, involves the location of saddle points. The simplest mathematical description of a saddle is \(z = x^2 - y^2\), where \((0,0,0)\) is a saddle point. The geometric setup is similar as with the paraboloid, we sample \(z = -(x-5)^2 + (y-5)^2\), for x and y both ranging from 0 till 9. The test function starts below:
def test():
"""
Testing the location of saddle points.
"""
from array import array
surface = lambda x, y: - (x-5)**2 + (y-5)**2
data = [array('b', [surface(i, j) \
for i in range(10)]) \
for j in range(10)]
print('looking for a saddle in ')
for row in data:
print(row)
Observe the minus sign in the definition of surface
.
The test matrix, with some minor edits to display the
column structure better, is the following:
looking for a saddle in
array('b', [ 0, 9, 16, 21, 24, 25, 24, 21, 16, 9])
array('b', [ -9, 0, 7, 12, 15, 16, 15, 12, 7, 0])
array('b', [-16, -7, 0, 5, 8, 9, 8, 5, 0, -7])
array('b', [-21, -12, -5, 0, 3, 4, 3, 0, -5, -12])
array('b', [-24, -15, -8, -3, 0, 1, 0, -3, -8, -15])
array('b', [-25, -16, -9, -4, -1, 0, -1, -4, -9, -16])
array('b', [-24, -15, -8, -3, 0, 1, 0, -3, -8, -15])
array('b', [-21, -12, -5, 0, 3, 4, 3, 0, -5, -12])
array('b', [-16, -7, 0, 5, 8, 9, 8, 5, 0, -7])
array('b', [ -9, 0, 7, 12, 15, 16, 15, 12, 7, 0])
Observe the occurrence of the zero entries.
Now we give the numerical definition of a saddle point in a matrix. A saddle point is
- maximal in its row; and
- minimal in its column.
The input/output specification of the problem statement is then
The algorithm proceeds as follows:
- for all rows
i
, letA[i][maxcol]
be maximal - check in column
maxcol
whetherA[i][maxcol] <= A[k][maxcol]
, for allk != i
The execution of the algorithm is traced below:
looking for a saddle in
array('b', [ 0, 9, 16, 21, 24, 25, 24, 21, 16, 9])
array('b', [ -9, 0, 7, 12, 15, 16, 15, 12, 7, 0])
array('b', [-16, -7, 0, 5, 8, 9, 8, 5, 0, -7])
array('b', [-21, -12, -5, 0, 3, 4, 3, 0, -5, -12])
array('b', [-24, -15, -8, -3, 0, 1, 0, -3, -8, -15])
array('b', [-25, -16, -9, -4, -1, 0, -1, -4, -9, -16])
array('b', [-24, -15, -8, -3, 0, 1, 0, -3, -8, -15])
array('b', [-21, -12, -5, 0, 3, 4, 3, 0, -5, -12])
array('b', [-16, -7, 0, 5, 8, 9, 8, 5, 0, -7])
array('b', [ -9, 0, 7, 12, 15, 16, 15, 12, 7, 0])
max 25.0 in row 0, column 5 smaller value in row 1
max 16.0 in row 1, column 5 smaller value in row 2
max 9.0 in row 2, column 5 smaller value in row 3
max 4.0 in row 3, column 5 smaller value in row 4
max 1.0 in row 4, column 5 smaller value in row 5
max 0.0 in row 5, column 5 saddle at (5,5)
max 1.0 in row 6, column 5 smaller value in row 5
max 4.0 in row 7, column 5 smaller value in row 4
max 9.0 in row 8, column 5 smaller value in row 3
max 16.0 in row 9, column 5 smaller value in row 2
We go over all rows, looking for the largest element,
as is done in the function saddle
.
def saddle(matrix):
"""
Returns the coordinates of saddles:
maximum in rows, minimum in columns.
"""
result = []
for i in range(0, len(matrix)):
(maxval, maxcol) = (matrix[i][0], 0)
nbcols = matrix[i].buffer_info()[1]
for j in range(1, nbcols):
if matrix[i][j] > maxval:
(maxval, maxcol) = (matrix[i][j], j)
prt = 'max %.1f in row %d, column %d' \
% (maxval, i, maxcol)
We have a candidate saddle point in matrix[i][maxcol]}
.
Now we check whether it is minimal in its column, in the i
-loop:
The code for the function saddle
continues then:
# continuation of the body of the for i in range(0, len(matrix))
is_saddle = True
for k in range(0, len(matrix)):
if(k != i):
if matrix[k][maxcol] < maxval:
prt = prt + ' smaller value in row %d' % k
is_saddle = False
break
if is_saddle:
prt = prt + ' saddle at (%d,%d)' % (i, maxcol)
result.append((i, maxcol))
print prt
return result
For working with numerical data, arrays are more memory efficient. Searching in two dimensional tables occur frequently. We consider two common problems: finding extremal values and locating saddle points.
Such searching algorithms apply the same double loop: for all rows, for all columns, do something. Later we will scan all words from a page of text on file: for all lines on the page, read all words on the line.
Exercises¶
- Give a Python function
swap
to reverse the order of the elements in an array. After callingswap
onA = [2 9 8 3]
, we haveA = [3 8 9 2]
. Do not create a new array insideswap
. - Write a Python function
Duplicates
whose input is an arrayA
and output is a list of elements that occur at least twice inA
. - A plateau in an array is the longest sequence of the same elements that occur in the array. Write a Python function that returns the start and begin index of a plateau in a given array.
- Extend the function
saddle
with a parameterverbose
which takes the default valueTrue
. Ifverbose
, then the stringsprt
are shown, otherwise the functionsaddle
does not print anything. - For a two dimensional matrix of integer values,
write a formatted print function: every entry is printed
with the formatting string
'\%3d'
. - Extend the
findmin
to find the smallest value in a 3-dimensional matrix. Think of a elements in the matrix as temperature measurements.