Dec 2

Lecture Overview

Bugs

One major task during program development is detecting bugs. As programmers, we have several options:

Test the software manually. For example, if it is a GUI, we can launch it, click the buttons, and see whether the program works; for a website, we can just browse the site. Or we could give the software to our users and expect them to report bugs. This works OK for small programs, but because manual testing is ad hoc, there is no guarantee that all our code or all the features are actually tested. Also, if a bug does occur, we usually get little useful information about where exactly it occurred. Which file? Which function?
Another option is to write a test suite. This is a collection of code, separate from our actual program, that calls functions and methods in our program code and checks the responses. The advantage is that each test can examine a single function or group of functions to make sure that it behaves correctly, so if a bug is present we can quickly identify the function(s) containing it. The downside is that writing test cases takes programming time, sometimes much more than the original program! SQLite is a good example; you should read the SQLite project's page on how SQLite is tested. SQLite has roughly 90,000 lines of code in the program itself and roughly 90 million lines of test code! The NASA Space Shuttle code is another example that shows the amount of effort required for comprehensive testing.

SQLite and the shuttle code are outliers; most projects do not write that much test code because of the cost in money and programmer time. In most programs, not all the code is tested: only some of the main paths through the code are exercised. The untested code will then contain bugs, which hopefully will be reported by the users and then fixed.
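As a rough illustration of what a small test suite looks like, here is a minimal sketch using Python's standard unittest module. The average function is a hypothetical stand-in for "our program code"; it is not from any real project. Each test checks one behavior, so a failure points directly at the function involved.

```python
import unittest


def average(values):
    """Hypothetical program code under test."""
    return sum(values) / len(values)


class AverageTests(unittest.TestCase):
    """Test code, kept separate from the program code above."""

    def test_typical_values(self):
        # Call the function and check the response.
        self.assertEqual(average([1, 2, 3]), 2)

    def test_empty_list_is_rejected(self):
        # An empty list has no average; we expect an exception here.
        with self.assertRaises(ZeroDivisionError):
            average([])


# Build the suite and run it programmatically.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(AverageTests)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

Note that even for this one-line function the test code is several times longer than the code it tests, which is the cost described above.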

Another option is to use a tool that automatically searches a program for bugs without running it. It detects bugs just by looking at the source code. As an example, consider a bug where you have a function foo that accepts two arguments, and somewhere else in your code you try calling foo with three arguments. We don't need to execute the program to detect that this will cause an error. The upside to such tools is that they check the entire program, but the downside is that not all bugs can be found in this way. Tools that do this are called linters: they check your program for "lint" (suspicious constructs) rather than for all bugs. Pylint is a widely used linting tool for Python.
One main problem with linters is that they cannot catch many bugs because they do not have enough information about the program, and so do not know whether a line of code will cause a bug or not. So a final approach is to have a tool that checks our program without running it, but that also takes additional information about our code, which we provide. The tradeoff is that this is extra work on our part, and while it catches more bugs than a linter, it can't catch all bugs. But it is still less work than writing a test suite. These tools are called type systems or type checkers, and they can either be built right into the programming language itself (by far the most common) or provided as a separate tool.
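To make the foo example from the linter discussion concrete, here is a toy sketch of a static check written with Python's standard ast module. It is not how Pylint is implemented; it only illustrates that the mistake is visible in the source text alone. The checker parses the code in SOURCE but never executes it. The name find_arity_errors is hypothetical, and the check deliberately ignores default values, keyword arguments, *args, and methods.

```python
import ast

# The program to be checked, held as text. It is parsed but never executed.
SOURCE = '''
def foo(a, b):
    return a + b

def main():
    return foo(1, 2, 3)
'''


def find_arity_errors(source):
    """Return (line, name, expected, got) for calls with the wrong number
    of positional arguments. Toy check: top-level functions only."""
    tree = ast.parse(source)
    # Record how many parameters each top-level function declares.
    arity = {
        node.name: len(node.args.args)
        for node in tree.body
        if isinstance(node, ast.FunctionDef)
    }
    errors = []
    for node in ast.walk(tree):
        # Only consider plain calls like foo(1, 2, 3) to a known function.
        if (isinstance(node, ast.Call)
                and isinstance(node.func, ast.Name)
                and node.func.id in arity
                and not node.keywords
                and not any(isinstance(a, ast.Starred) for a in node.args)):
            expected, got = arity[node.func.id], len(node.args)
            if got != expected:
                errors.append((node.lineno, node.func.id, expected, got))
    return errors


errors = find_arity_errors(SOURCE)
for line, name, expected, got in errors:
    print(f"line {line}: {name}() takes {expected} arguments but {got} were given")
```

The check can only reason about what is written in the source. If foo were chosen at run time (for example, looked up in a dictionary), the checker would not know which function is being called, which is exactly the missing-information problem described above.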

Type Systems

Python added optional type hints in Python 3.5 (which was just released a few months ago); before that, Python did not have a static type system. More than any other language feature, type systems vary significantly between languages, so what I am about to briefly sketch is the background on how we think about types. Since it is a very new feature in Python, there isn't a lot of documentation on the specifics of Python's type system, and the Python documentation is targeted at people already familiar with type systems.

Types

A type is a set of values that can be stored and manipulated by a computer program. (Side note: the word type is overused in computer science and programming, so in other contexts it could mean other things.) Some examples:

The type meters is the set of all integer-valued distances measured in meters.
The type kilometers is the set of all integer-valued distances measured in kilometers.
The type str is the set of all strings (sequences of characters).
The type Dict[str, meters] is the set of all Python dictionaries whose keys come from the str type/set and whose values come from the meters type/set. More precisely, it consists of all dictionaries such that for each entry (k, v) in the dictionary, k is an element of str and v is an element of meters.

Note that these types are mathematical sets which we can define how we like. That is, both meters and kilometers consist of integers, but we mathematically define them as different sets because 10 meters is not the same as 10 kilometers. We can then specify types on parameters, variables, and properties of objects. When we specify a type, what we are telling the type system is that for the code to be correct, the value of the parameter/variable/property must be an element of that type's set of values.
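One way to define such sets in Python is typing.NewType, sketched below. This is an assumption on my part about how the distance types could be defined; the names lengths and total are hypothetical. Both meters and kilometers wrap int, so they contain the same numbers, but a type checker treats them as different sets.

```python
from typing import Dict, NewType

# Assumed definitions: two distinct types that both wrap int.
meters = NewType("meters", int)
kilometers = NewType("kilometers", int)

# A variable annotation: every value in the dictionary must be
# an element of meters, and every key an element of str.
lengths: Dict[str, meters] = {
    "pool": meters(50),
    "track": meters(400),
}


def total(distances: Dict[str, meters]) -> meters:
    """Parameter and return annotations use the same type names."""
    return meters(sum(distances.values()))


ten_m = meters(10)
ten_km = kilometers(10)

# At run time both values are ordinary ints, so Python sees no difference.
# Only a static type checker tells the two sets apart.
print(ten_m == ten_km)       # True
print(type(ten_m) is int)    # True
```

The design point is that the distinction costs nothing when the program runs: it exists purely as information for the tool that checks the code.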

Here is an example. Say I have defined types meters, kilograms, and force (you can look in the Python documentation if you want to discover how to do that). For each parameter to the function, I specify a type by using a colon and then the type. The type of the return value is given by using an arrow -> and then the type name.

def gravitational_field(distFromSun: meters, mass: kilograms) -> force:
    return 6.67e-11 * mass / distFromSun**2

So for example, distFromSun is specified to have type meters. What I mean is that for the code to be correct, the distFromSun parameter must be an element of the meters set. The return type of the function is specified to be force. Now consider that I have a class for my planet which stores the coordinates x, y, and z from the sun in kilometers:

import math

class Planet:
    def __init__(self, name: str, x: kilometers, y: kilometers, z: kilometers):
        self.name = name
        self.x = x
        self.y = y
        self.z = z

    def dist_from_sun(self) -> kilometers:
        return math.sqrt(self.x**2 + self.y**2 + self.z**2)

Note that I specify name to have type str and x, y, and z to have type kilometers in the __init__ method. These get stored as attributes on the Planet object. Next, the dist_from_sun method is annotated so that its return value has type kilometers. Given this code, the following contains an error:

earth = Planet("Earth", x=1.266384612505139E+08, y=7.903780957444319E+07, z=-2.711615123040974E+04)
print(gravitational_field(earth.dist_from_sun(), 5.972e24))

Note that Python itself will not give me an error message: type hints are not checked when the program runs, so the code runs fine, but the gravitational field is not computed correctly, since I passed kilometers into a function which expects meters. With the type hints, however, a type checker will report an error, and it can do so without running the code. It will notice that the type of the return value from dist_from_sun (kilometers) does not match the type expected by gravitational_field (meters).
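Putting the pieces together, here is a runnable sketch of the whole example. The NewType definitions are my assumption about how the types could be defined (here over float, since the coordinates are not integers), and km_to_m is a hypothetical helper added to show the fix. A separate checker such as mypy is one tool that can analyze these hints; I expect it to reject the line marked below, while Python runs the line without complaint.

```python
import math
from typing import NewType

# Assumed type definitions; all four are plain floats at run time.
meters = NewType("meters", float)
kilometers = NewType("kilometers", float)
kilograms = NewType("kilograms", float)
force = NewType("force", float)


def gravitational_field(distFromSun: meters, mass: kilograms) -> force:
    return force(6.67e-11 * mass / distFromSun**2)


def km_to_m(d: kilometers) -> meters:
    """Hypothetical helper: the explicit conversion between the two sets."""
    return meters(d * 1000.0)


class Planet:
    def __init__(self, name: str, x: kilometers, y: kilometers, z: kilometers) -> None:
        self.name = name
        self.x = x
        self.y = y
        self.z = z

    def dist_from_sun(self) -> kilometers:
        return kilometers(math.sqrt(self.x**2 + self.y**2 + self.z**2))


earth = Planet(
    "Earth",
    x=kilometers(1.266384612505139E+08),
    y=kilometers(7.903780957444319E+07),
    z=kilometers(-2.711615123040974E+04),
)
mass = kilograms(5.972e24)

# Type error for a checker: kilometers passed where meters is expected.
# Python still runs this line and silently produces a wrong number.
wrong = gravitational_field(earth.dist_from_sun(), mass)

# Converting explicitly satisfies the checker and fixes the units.
right = gravitational_field(km_to_m(earth.dist_from_sun()), mass)

print(wrong / right)  # about one million: the unit mix-up squared
```

The two results differ by a factor of about 1,000,000 (the factor of 1000 between the units, squared), and nothing at run time hints that one of them is wrong.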

Exercises

No exercises