Arrays and Matrices¶
For working with numerical data, arrays are more memory efficient. Searching in two dimensional tables occur frequently. We consider two common problems: finding extremal values and locating saddle points.
Such searching algorithms apply the same double loop: for all rows, for all columns, do something. Later we will scan all words from a page of text on file: for all lines on the page, read all words on the line.
We start with a generalization of our factorization in primes problem.
The Sieve of Eratosthenes¶
Let us consider the problem of counting the number of primes less than a given number. The statement with input and output specifications is below:
Running factorization in primes for all numbers \(\leq n\) is too expensive.
The method called the Sieve of Eratosthenes works as follows:
A Boolean table
isprime[i]records prime numbers:if
iis prime, thenisprime[i] == True, otherwiseisprime[i] == False.All multiples of prime numbers are not prime: for all
isprime[i]:isprime[i*k] = False.
Consider for example the computation of all primes less than 10.
In the table below \(T\) stands for True
and \(F\) stands for False.
Initially, all entries in the table are True.
The algorithm uses a double loop:
- the first loop runs for i from 2 to n;
- the second loop runs only if i is prime,
setting to
Falseall multiples of i.
In order to be more efficient,
we may restrict the range of the first loop:
while (i < n//i+1).
Arrays in Python¶
Arrays are sequences of fixed length; filled with objects of the same type.
Compared to lists (variable length, heterogeneous), arrays are more memory efficient, and allow for faster access. Available in Python via the module {tt array}, the module {tt array} exports the class {tt array}.
We can create arrays directly from lists:
>>> from array import array
>>> L = range(2, 7)
>>> a = array('b', L)
array('b', [2, 3, 4, 5, 6])
The 'b' stands for signed integer, one byte.
Selecting an element from an array is the same as selecting an element from a list. Also the slicing operation is the same:
>>> a[1]
3
>>> a[1:3]
array('b', [3, 4])
The current memory info is obtained as
>>> a.buffer_info()
(28773520, 5)
The types are restricted to numerical types, summarized in Table 1.
| Type code | C Type | #bytes |
|---|---|---|
'b' |
signed integer | 1 |
'B' |
unsigned integer | 1 |
'u' |
Unicode character | 2 |
'h' |
signed integer | 2 |
'H' |
unsigned integer | 2 |
'i' |
signed integer | 2 |
'I' |
unsigned integer | 2 |
'l' |
signed integer | 4 |
'L' |
unsigned integer | 4 |
'q' |
signed integer | 8 |
'Q' |
unsigned integer | 8 |
'f' |
floating point | 4 |
'd' |
floating point | 8 |
An array can be converted into a list. Consider for example the following session:
>>> from array import array
>>> L = range(2,7)
>>> A = array('b', L)
>>> A
array('b', [2, 3, 4, 5, 6])
>>> type(A)
<class 'array.array'>
With the method tolist,
we convert the array to a list:
>>> K = A.tolist()
>>> K
[2, 3, 4, 5, 6]
>>> type(K)
<class 'list'>
Arrays can be converted to strings and strings can be converted back to arrays. The above Python session is continued:
>>> s = A.tostring()
>>> s
b'\x02\x03\x04\x05\x06'
In reverse, we can make an array from a string s:
>>> B = array('b')
>>> B.fromstring(s)
>>> B
array('b', [2, 3, 4, 5, 6])
Now we return to our application,
the sieving method to compute all primes less than a given number.
Initially, all entries in the table are True.
The initialization is in de code below:
def prime_sieve(nbr):
"""
Returns all primes less than nbr.
"""
isprime = array('B')
for _ in range(nbr+1):
isprime.append(1)
The algorithm uses a double loop:
- the first loop runs for i from 2 to n;
- the second loop runs only if i is prime,
setting to
Falseall multiples of i.
We extend the function prime_sieve:
def prime_sieve(nbr):
"""
Returns all primes less than nbr.
"""
isprime = array('B')
for _ in range(nbr+1):
isprime.append(1)
i = 2
while i < nbr//i+1:
if isprime[i] == 1:
for j in range(i, nbr//i+1):
isprime[i*j] = 0
i = i + 1
Why is nbr//i+1 necessary?
The last operation is the collection the primes from the sieve. The complete function is below:
def prime_sieve(nbr):
"""
Returns all primes less than nbr.
"""
isprime = array('B')
for _ in range(nbr+1):
isprime.append(1)
i = 2
while i < nbr//i+1:
if isprime[i] == 1:
for j in range(i, nbr//i+1):
isprime[i*j] = 0
i = i + 1
primes = array('l')
for i in range(2, nbr+1):
if isprime[i] == 1:
primes.append(i)
return primes
With the definition of main, we can use the script
sieve as a script and a module.
def main():
"""
Prompts the user for a natural number n and
prints all primes less than or equal to n.
"""
nbr = int(input('give a positive number : '))
primes = prime_sieve(nbr)
count = primes.buffer_info()[1]
print('#primes = ', count)
print(' primes = ', primes)
if __name__ == "__main__":
main()
Why are the last two lines useful?
>>> from sieve import prime_sieve
>>> p = prime_sieve(100)
Matrices¶
Python does not have a two dimensional array type. Instead we can store a matrix as a list of rows, where the data on each row is stored in an array. Consider for example a 5-by-5 matrix of random two digit numbers:
>>> from array import array
>>> from random import randint
>>> A = [array('b',[randint(10, 99)
... for _ in range(5)])
... for _ in range(5)]
>>> for row in A: print(row)
...
array('b', [23, 62, 85, 82, 38])
array('b', [68, 54, 18, 16, 37])
array('b', [91, 70, 88, 42, 56])
array('b', [42, 61, 90, 91, 41])
array('b', [40, 13, 19, 66, 54])
To select the row with index 2:
>>> A[2]
array('b', [91, 70, 88, 42, 56])
To select the element in column 3 or row 2:
>>> A[2][3]
42
We can define a matrix from a function. For example, consider a 5-by-5 matrix where the \((i,j)\)-th element is \(i+j\):
>>> from array import array
>>> f = lambda x,y: x+y
The function f defines the \((i,j)\)-th entry.
We use f in a doubly nested list comprehension
to define a list of rows:
>>> L = [array('b',[f(i,j) for i in range(5)])
... for j in range(5)]
>>> for row in L: print(row)
...
array('b', [0, 1, 2, 3, 4])
array('b', [1, 2, 3, 4, 5])
array('b', [2, 3, 4, 5, 6])
array('b', [3, 4, 5, 6, 7])
array('b', [4, 5, 6, 7, 8])
The problem we next consider is, given a matrix of numbers, find the row and column of the smallest element in the matrix. As a geometric test, we take points on a paraboloid. The simplest mathematical description of a paraboloid is the expression \(z = x^2 + y^2\), where the minimum is at the point \((0,0,0)\).
In our geometric setup, we want a * a 10-by-10 matrix of integer valued points; * where the minimum occurs at row 5, column 5. Therefore, we sample \(z = (x-5)^2 + (y-5)^2\), for x and y both ranging from 0 till 9.
def test():
"""
Tests the findmin on the values on a paraboloid.
"""
from array import array
paraboloid = lambda x, y: (x-5)**2+(y-5)**2
data = [array('b', [paraboloid(i, j) \
for i in range(10)]) \
for j in range(10)]
print('looking for a minimum in ')
for row in data:
print(row)
The function test() prints the following:
looking for a minimum in
array('b', [50, 41, 34, 29, 26, 25, 26, 29, 34, 41])
array('b', [41, 32, 25, 20, 17, 16, 17, 20, 25, 32])
array('b', [34, 25, 18, 13, 10, 9, 10, 13, 18, 25])
array('b', [29, 20, 13, 8, 5, 4, 5, 8, 13, 20])
array('b', [26, 17, 10, 5, 2, 1, 2, 5, 10, 17])
array('b', [25, 16, 9, 4, 1, 0, 1, 4, 9, 16])
array('b', [26, 17, 10, 5, 2, 1, 2, 5, 10, 17])
array('b', [29, 20, 13, 8, 5, 4, 5, 8, 13, 20])
array('b', [34, 25, 18, 13, 10, 9, 10, 13, 18, 25])
array('b', [41, 32, 25, 20, 17, 16, 17, 20, 25, 32])
The input/output specification of our problem is stated as follows:
The algorithm to solve this problem proceeds as follows:
- initialize current minimum with
A[0][0], and initialize its position to(0, 0) - go over all rows
iand columnsjofA - if
A[i][j]is less than current minimum, assignA[i][j]to the current minimum, assign(i, j)to its position
This algorithm is encode in the funciton findmin() below:
def findmin(matrix):
"""
Returns the row and the column indices
of the minimum element in the matrix.
"""
(row, col) = (0, 0)
val = matrix[row][col]
for i in range(0, len(matrix)):
nbcols = matrix[i].buffer_info()[1]
for j in range(0, nbcols):
if matrix[i][j] < val:
(row, col) = (i, j)
val = matrix[row][col]
return (row, col)
The test() function is then extended with the following lines:
def test():
# omitted lines
(i, j) = findmin(data)
print('minimum value %d occurs at (%d, %d)' \
% (data[i][j], i, j))
if __name__ == "__main__":
test()
To run the test, we type at the command prompt:
$ python3 findmin.py
looking for a minimum in
array('b', [50, 41, 34, 29, 26, 25, 26, 29, 34, 41])
array('b', [41, 32, 25, 20, 17, 16, 17, 20, 25, 32])
array('b', [34, 25, 18, 13, 10, 9, 10, 13, 18, 25])
array('b', [29, 20, 13, 8, 5, 4, 5, 8, 13, 20])
array('b', [26, 17, 10, 5, 2, 1, 2, 5, 10, 17])
array('b', [25, 16, 9, 4, 1, 0, 1, 4, 9, 16])
array('b', [26, 17, 10, 5, 2, 1, 2, 5, 10, 17])
array('b', [29, 20, 13, 8, 5, 4, 5, 8, 13, 20])
array('b', [34, 25, 18, 13, 10, 9, 10, 13, 18, 25])
array('b', [41, 32, 25, 20, 17, 16, 17, 20, 25, 32])
minimum value 0 occurs at (5, 5)
Our next, somewhat harder problem, involves the location of saddle points. The simplest mathematical description of a saddle is \(z = x^2 - y^2\), where \((0,0,0)\) is a saddle point. The geometric setup is similar as with the paraboloid, we sample \(z = -(x-5)^2 + (y-5)^2\), for x and y both ranging from 0 till 9. The test function starts below:
def test():
"""
Testing the location of saddle points.
"""
from array import array
surface = lambda x, y: - (x-5)**2 + (y-5)**2
data = [array('b', [surface(i, j) \
for i in range(10)]) \
for j in range(10)]
print('looking for a saddle in ')
for row in data:
print(row)
Observe the minus sign in the definition of surface.
The test matrix, with some minor edits to display the
column structure better, is the following:
looking for a saddle in
array('b', [ 0, 9, 16, 21, 24, 25, 24, 21, 16, 9])
array('b', [ -9, 0, 7, 12, 15, 16, 15, 12, 7, 0])
array('b', [-16, -7, 0, 5, 8, 9, 8, 5, 0, -7])
array('b', [-21, -12, -5, 0, 3, 4, 3, 0, -5, -12])
array('b', [-24, -15, -8, -3, 0, 1, 0, -3, -8, -15])
array('b', [-25, -16, -9, -4, -1, 0, -1, -4, -9, -16])
array('b', [-24, -15, -8, -3, 0, 1, 0, -3, -8, -15])
array('b', [-21, -12, -5, 0, 3, 4, 3, 0, -5, -12])
array('b', [-16, -7, 0, 5, 8, 9, 8, 5, 0, -7])
array('b', [ -9, 0, 7, 12, 15, 16, 15, 12, 7, 0])
Observe the occurrence of the zero entries.
Now we give the numerical definition of a saddle point in a matrix. A saddle point is
- maximal in its row; and
- minimal in its column.
The input/output specification of the problem statement is then
The algorithm proceeds as follows:
- for all rows
i, letA[i][maxcol]be maximal - check in column
maxcolwhetherA[i][maxcol] <= A[k][maxcol], for allk != i
The execution of the algorithm is traced below:
looking for a saddle in
array('b', [ 0, 9, 16, 21, 24, 25, 24, 21, 16, 9])
array('b', [ -9, 0, 7, 12, 15, 16, 15, 12, 7, 0])
array('b', [-16, -7, 0, 5, 8, 9, 8, 5, 0, -7])
array('b', [-21, -12, -5, 0, 3, 4, 3, 0, -5, -12])
array('b', [-24, -15, -8, -3, 0, 1, 0, -3, -8, -15])
array('b', [-25, -16, -9, -4, -1, 0, -1, -4, -9, -16])
array('b', [-24, -15, -8, -3, 0, 1, 0, -3, -8, -15])
array('b', [-21, -12, -5, 0, 3, 4, 3, 0, -5, -12])
array('b', [-16, -7, 0, 5, 8, 9, 8, 5, 0, -7])
array('b', [ -9, 0, 7, 12, 15, 16, 15, 12, 7, 0])
max 25.0 in row 0, column 5 smaller value in row 1
max 16.0 in row 1, column 5 smaller value in row 2
max 9.0 in row 2, column 5 smaller value in row 3
max 4.0 in row 3, column 5 smaller value in row 4
max 1.0 in row 4, column 5 smaller value in row 5
max 0.0 in row 5, column 5 saddle at (5,5)
max 1.0 in row 6, column 5 smaller value in row 5
max 4.0 in row 7, column 5 smaller value in row 4
max 9.0 in row 8, column 5 smaller value in row 3
max 16.0 in row 9, column 5 smaller value in row 2
We go over all rows, looking for the largest element,
as is done in the function saddle.
def saddle(matrix):
"""
Returns the coordinates of saddles:
maximum in rows, minimum in columns.
"""
result = []
for i in range(0, len(matrix)):
(maxval, maxcol) = (matrix[i][0], 0)
nbcols = matrix[i].buffer_info()[1]
for j in range(1, nbcols):
if matrix[i][j] > maxval:
(maxval, maxcol) = (matrix[i][j], j)
prt = 'max %.1f in row %d, column %d' \
% (maxval, i, maxcol)
We have a candidate saddle point in matrix[i][maxcol]}.
Now we check whether it is minimal in its column, in the i-loop:
The code for the function saddle continues then:
# continuation of the body of the for i in range(0, len(matrix))
is_saddle = True
for k in range(0, len(matrix)):
if(k != i):
if matrix[k][maxcol] < maxval:
prt = prt + ' smaller value in row %d' % k
is_saddle = False
break
if is_saddle:
prt = prt + ' saddle at (%d,%d)' % (i, maxcol)
result.append((i, maxcol))
print prt
return result
For working with numerical data, arrays are more memory efficient. Searching in two dimensional tables occur frequently. We consider two common problems: finding extremal values and locating saddle points.
Such searching algorithms apply the same double loop: for all rows, for all columns, do something. Later we will scan all words from a page of text on file: for all lines on the page, read all words on the line.
Exercises¶
- Give a Python function
swapto reverse the order of the elements in an array. After callingswaponA = [2 9 8 3], we haveA = [3 8 9 2]. Do not create a new array insideswap. - Write a Python function
Duplicateswhose input is an arrayAand output is a list of elements that occur at least twice inA. - A plateau in an array is the longest sequence of the same elements that occur in the array. Write a Python function that returns the start and begin index of a plateau in a given array.
- Extend the function
saddlewith a parameterverbosewhich takes the default valueTrue. Ifverbose, then the stringsprtare shown, otherwise the functionsaddledoes not print anything. - For a two dimensional matrix of integer values,
write a formatted print function: every entry is printed
with the formatting string
'\%3d'. - Extend the
findminto find the smallest value in a 3-dimensional matrix. Think of a elements in the matrix as temperature measurements.