The actual speed of an algorithm depends on exactly how it is converted to code and on details of the hardware that executes that code. Despite that, we can still obtain a rough estimate of the running time. Our goal here is to compare two different algorithms, and if the rough estimate is enough to show that one is much faster than the other, we are happy.
Consider this pseudocode, which finds the smallest element in a list L:

1   min_val = L[0]
2   for i in range(1, len(L)):
3       if L[i] < min_val:
4           min_val = L[i]
5   print(min_val)
To estimate the time, we decide for each line of code how long it takes to execute and how many times it is executed. For "basic" instructions we just say each one takes a constant amount of time. For the discussion, let n = len(L). First, line one takes time c_1 for some constant which depends on the implementation and execution environment; I can't say anything more specific than that. Line one is also executed once, so the total time "spent" on line one is just c_1. Line two consists of an increment of i and a test that i is less than len(L); together those take some constant amount of time, say c_2. Also, line two is executed n times, for a total time "spent" on line two of c_2 n. Similarly, lines 3 and 4 take constant times c_3 and c_4 and are executed n times. Therefore the total time is

c_1 + (c_2 + c_3 + c_4) n.

By the definition of big-O, this is O(n), so we say that finding the minimum element in a list runs in time O(n). We mean there is some constant c (which depends on the details of how the algorithm is implemented and executed) so that the time is at most c n.
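To see this linear behavior in practice, here is a quick sketch (my own illustration, not part of the analysis above) that times Python's built-in min, which does the same left-to-right scan as the pseudocode; doubling the length of the list should roughly double the time:

import time
import random

# Time the linear scan on lists of increasing size; the times should
# grow roughly in proportion to n.
for n in [10**6, 2 * 10**6, 4 * 10**6]:
    L = [random.randint(0, 10000) for _ in range(n)]
    start = time.time()
    min(L)  # built-in min does the same scan as the pseudocode above
    end = time.time()
    print(n, "%f seconds" % (end - start))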
Here is the selection sort from last time, in Python. Selection sort runs in time O(n^2), where n is the length of the list. The analysis is similar: each line takes some constant amount of time, but the body of the inner loop over j is run n(n-1)/2 times in total, which is O(n^2).
def selection_sort(L):
for i in range(0, len(L)):
# Find the smallest among i...len(L)
min_idx = i
for j in range(i+1,len(L)):
if L[j] < L[min_idx]:
min_idx = j
# Move smallest into position i
temp = L[min_idx]
L[min_idx] = L[i]
L[i] = temp
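As a sanity check on the n(n-1)/2 count, here is a sketch (the helper name selection_sort_count is mine, not from last time) that counts how many times the inner loop body is executed and checks it against the formula:

def selection_sort_count(L):
    # Same algorithm as above, but counts executions of the inner loop body.
    count = 0
    for i in range(0, len(L)):
        min_idx = i
        for j in range(i + 1, len(L)):
            count += 1  # one execution of the inner loop body
            if L[j] < L[min_idx]:
                min_idx = j
        L[i], L[min_idx] = L[min_idx], L[i]
    return count

import random
for n in [10, 100, 1000]:
    L = [random.randint(0, 10000) for _ in range(n)]
    print(n, selection_sort_count(L), n * (n - 1) // 2)  # the two counts match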
To see how fast this is in practice, we can time it on random lists of various sizes:

import time
import random

L = [random.randint(0, 10000) for i in range(10)]  # increase 10 to try bigger lists
start = time.time()
selection_sort(L)
end = time.time()
print("%f seconds" % (end - start))
Do the same timing with the same sizes, but with Python's built-in sorting algorithm L.sort(). Python uses an algorithm called Timsort. How big do you need to make the list before Python takes more than a second to sort it? For me it seems to be around 2 million.
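One way to set up that experiment is a sketch like this (adjust the sizes to your machine):

import time
import random

for n in [10**5, 10**6, 2 * 10**6]:
    L = [random.randint(0, 10000) for _ in range(n)]
    start = time.time()
    L.sort()  # Python's built-in Timsort
    end = time.time()
    print(n, "%f seconds" % (end - start))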
Selection sort is O(n^2) and Timsort is O(n log n). Explain using big-O notation why Timsort is much better than selection sort on large lists, no matter how the algorithms are actually implemented in code.
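To get a feel for the gap, ignoring the hidden constants, here is a back-of-the-envelope comparison of n^2 with n log2 n at n = 2 million:

import math

n = 2_000_000
print(n * n)             # about 4e12 steps for an O(n^2) algorithm
print(n * math.log2(n))  # about 4.2e7 steps for an O(n log n) algorithm
print(n / math.log2(n))  # the ratio: roughly a factor of 100,000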
For real timing of Python code, one should use the timeit module. (See here for some examples.) Convert the timing code to timeit and also explain what additional information the timeit module gives you over just using time.time().
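Here is one possible conversion (a sketch; note the L[:] copy, so each run sorts a fresh unsorted list instead of an already-sorted one):

import timeit
import random

L = [random.randint(0, 10000) for _ in range(1000)]

# timeit.repeat runs the statement `number` times per trial and returns a
# list of trial times, so you can see the run-to-run variation instead of a
# single measurement; timeit also turns off garbage collection by default.
times = timeit.repeat("selection_sort(L[:])", number=10, repeat=5,
                      globals=globals())
print(times)
print(min(times), "seconds for the fastest trial of 10 sorts")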
If you are bored, watch some of the videos from this page, or on YouTube or their website. You can find a video of a selection sort dance there, plus comparisons of some of the other sorting algorithms.