Oct 21

Lecture Overview

The tools we have developed so far (loops, lists, dictionaries, if statements, syscalls) are enough to implement any program. In fact, older programming languages like Fortran (from the 50s) and C (from the 70s) implement essentially these features, and there is a wealth of programs written in these languages. As we have seen, there are many ways of structuring a program (how many functions to use, whether data lives in a global variable or is passed as an argument, whether some task is split into modules or functions, etc.). One hard-won lesson over the years has been that the way you structure your program has a huge impact on the number of bugs, how difficult bugs are to find and fix, and how maintainable the program is. (Maintenance is the effort needed a year from now to keep the program operational and to add new features.)

Over the years there have therefore been many guidelines or recommendations for how to structure your program to get the fewest bugs and the most maintainable program. There are hundreds of these guidelines. Unfortunately, we have determined which are good and which are not so good mostly by observing projects fail. Also, we have yet to find a single system which is great with no downsides; each system has its good points and bad points.

Procedural Programming

I have mentioned top-down design several times. The idea is that you split your task into components, each component is split into subcomponents, and so on. For example, the task of writing Firefox might split into a component for network communication, a component to manage bookmarks, a component to draw the menu bar, etc. The bookmark component might be further split into a subcomponent to read and write the file format on disk, a subcomponent to allow the user to edit the bookmarks, and so on. Procedural programming is the design style where you use top-down design to divide your program into components. Each component then becomes a directory, sub-components become files and modules, and sub-sub-components become functions. Procedural programming is a great design for small to medium sized programs, and even works for large programs like the Linux kernel (which has 15+ million lines of code in over 3000 directories and 47,000 files).

The key goal during top-down design is low coupling between components. For example, the component of Firefox that draws the menu bar shouldn't know or depend on any of the details of how the bookmark component works. Instead, the menu bar should just ask the bookmark component for the bookmarks to draw on the screen. It doesn't need to know if the bookmarks are stored in a file or in a SQLite database or anything like that. At the function level, functions already provide low coupling because of local variables. Each function gets its own local variables and other functions can't see or interact with them. But at the level of the bookmark and menu-bar components, procedural programming gets no assistance from the programming language in keeping things separate. That is, Python will allow the menu-bar component to call any function within the bookmark component. It is up to us humans to specify in the documentation which functions are public (intended for other components to use) and which functions are private (intended only for use within the component, for example the function to load the bookmarks from SQLite).

Another example where procedural programming doesn't help us is the SVM classifiers from last time. The SVM classifier first runs on some training data and picks the best lines separating the training data (recall this image). The formulas for these lines must be stored somewhere so that when it comes time to classify future points we can use the formulas. One good way of storing these formulas is in a dictionary. For example, in two dimensions I could store a dictionary with keys "y-intercept", "slope", "min-x", and "max-x". These four properties give me the line plus where it starts and stops. The problem is that now any other code can see these properties and potentially use them directly. For a good design, I want low coupling, so other code shouldn't be accessing these directly. To obtain this low coupling, I should create a function classify which classifies a point. Other code should then just call this classify function and never work with the lines themselves directly. This allows me, for example, to change how I store the line (maybe instead of y-intercept and slope I want to just store two points on the line), and if I do change how the line is stored, only the classify function must be updated. All other code calls classify, so it is immune to changes in how the line is stored; the format for storing the line is private.
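
A rough sketch of this idea, using a single line and the four dictionary keys from above, might look as follows. The numbers are made up rather than produced by training, and the rule "a point is labelled by whether it lies above or below the line" is an assumption chosen to keep the example short; a real SVM classifier does more than this.

```python
# Private: only classify() should read or write this dictionary.
# The numbers are made up; in practice they would come from training.
_line = {"y-intercept": 1.0, "slope": 2.0, "min-x": 0.0, "max-x": 10.0}


def classify(x, y):
    """Public: return "above" or "below" for the point (x, y).

    Returns None if x is outside the stretch of the line that is stored.
    """
    if x < _line["min-x"] or x > _line["max-x"]:
        return None
    line_y = _line["slope"] * x + _line["y-intercept"]
    return "above" if y > line_y else "below"


print(classify(1.0, 5.0))   # the line is at y = 3 when x = 1, so "above"
print(classify(1.0, 0.0))   # "below"
print(classify(20.0, 0.0))  # x is outside [min-x, max-x], so None
```

Code that only ever calls classify(x, y) keeps working if _line is later changed to hold two points instead of a slope and intercept, because only the body of classify reads the dictionary.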

To summarize, when designing a component, some data and functions should be kept private, meaning that only other functions within the component ever access them. The component then has a few public functions which other components interact with. Such a design is certainly possible with procedural programming. What you do is just document which functions and data are private and which are public, and then when calling functions in another component you restrict yourself to the public functions. For small to medium sized programs, this is not too difficult. Thus for small to medium sized programs, top-down design and procedural programming are a great fit.

For large programs with many components and many programmers, it would be nice to have some programming language assistance so that if someone tries to access my private data they get an error message. For example, in Python all functions in a module are available if you import the module. It would be nice to be able to mark certain functions (and data) as private so that Python will give an error message if someone else tries to access one of our private functions. The approach I am about to describe provides some helpful features, but it is more complex, and that complexity comes at a cost.

Object Orientated Programming

Object Orientated Programming, which has roots in the 60s and became mainstream in the 80s and 90s, is one way of providing programming language features to assist with the low coupling between components. Many languages from that period and shortly after, such as C++, Java, C#, and Python, support it.

A word of caution. During the 90s and 2000s, there was a movement to make everything use object orientated programming because "obviously" object orientated programming is the best thing ever. There was quite a bit of marketing, persuasive articles, blog posts, and so on along these lines, and some of that is still around today (toned down somewhat from how it used to be). Things didn't turn out quite that rosy, and you should keep in mind that object orientated programming should not be crammed into every possible situation. Unfortunately, experience, or reading about other people's experience, seems to be the only way to develop a sense for when some design method (like object orientated programming) is a good fit and when it will just increase complexity and the bug count. I hope to partially address this with Project 3 (to come). One example of where object orientated programming is not a good fit is parallel computation and cloud computing, where it usually leads to a situation where bugs are much easier to create and very hard to find and eliminate. But applied in the correct situation, object orientated programming is a powerful design method for organizing code.

Classes and Objects

See the Python tutorial.

A Python object is a data value stored in memory that consists of two things: properties and methods.

An Example

class MyClass:
    x = 42  # This is an initial property with name x and value 42

    # This is an initial method.  The method will receive the object
    # as its first argument, which by convention is called self
    def foo(self):
        # self.x refers to the x property in the object pointed to by self
        print("The current x value is", self.x)

        self.x = self.x + 20

obj = MyClass() # Create a new object according to the template
print(obj.x)    # Reference the x property
obj.foo()       # Call the foo method.  Note the self parameter is set for us.
obj.foo()       # Call the method again, note the new value of x.
obj.x = 100     # Update the x property to 100
obj.foo()       # Call the method again.

obj2 = MyClass() # Create a second object

obj.x = 500      # Update the x values of both objects
obj2.x = 300

obj.foo()        # Call foo on both objects; each uses its own x
obj2.foo()

obj.z = "Hello, World"  # Create a new property z with value a string
print(obj.z)            # Reference the newly created property
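
To tie this back to the classifier discussion, here is a small sketch of how a class could bundle a line together with the function that uses it. The class name LineClassifier, its property names, and the above/below rule are all hypothetical choices for illustration, and the numbers are placeholders rather than trained values. It uses only the features shown in the example above: initial properties, a method taking self, and setting properties on an object after creating it.

```python
class LineClassifier:
    # Default properties; the values are placeholders, not trained results.
    slope = 0.0
    y_intercept = 0.0

    def classify(self, x, y):
        # self.slope and self.y_intercept belong to the object pointed to by self
        line_y = self.slope * x + self.y_intercept
        if y > line_y:
            return "above"
        return "below"


first = LineClassifier()
first.slope = 2.0
first.y_intercept = 1.0

second = LineClassifier()   # keeps the default line y = 0

print(first.classify(1.0, 2.0))   # line is at y = 3 when x = 1: "below"
print(second.classify(1.0, 2.0))  # line is at y = 0: "above"
```

Each object carries its own line, so the same point can be classified differently by first and second, and other code only needs to call classify.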

Exercises

No homework today, work on the project.