Sep 30

Lecture Overview

All data in memory is lost when a program exits, because the operating system reclaims that memory for other processes. Memory is also wiped when the computer shuts down. To overcome this, the hardware and operating system provide several ways to store data so that it survives both program exit and power-off. Programs store such persistent data in two main ways: files and databases.

Files

Files are accessed through system calls (syscalls) to the operating system. Files are organized into file systems, which the operating system implements.

Paths

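To access a file, we use a file path. Python's os.path module provides many functions to help with paths. There are two types of paths: absolute paths, which start at the root of the file system (or at a drive letter on Windows) and so name the same file no matter what the current directory is, and relative paths, which are interpreted starting from the program's current (working) directory.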

For example,

As mentioned, the os.path module has many functions for working with paths. In addition, os.chdir lets the program change its current directory. Most files should be accessed by relative paths (after setting the current directory appropriately, usually at program start). Accessing files by relative paths makes it easier for the same code to run in many locations, on many different computers and operating systems. Different people's computers may organize their directories differently, but relative paths let the program mostly ignore this: it only needs to set its current directory correctly.
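The relationship between the current directory and relative paths can be seen directly with os.getcwd, os.chdir, and os.path. This is a minimal sketch; the temporary directory stands in for whatever data directory a real program would switch to at startup, and the helper name resolve is hypothetical.

```python
import os
import tempfile


def resolve(relative_name):
    """Return the absolute path that a relative path refers to right now.

    A relative path is interpreted against the current directory, so the
    result changes whenever os.chdir changes that directory.
    """
    return os.path.abspath(relative_name)


# Stand-in for "the directory holding our data" (assumption: a real program
# would pick this once, at program start).
data_dir = tempfile.mkdtemp()
os.chdir(data_dir)

name = "illinois-weather.txt"        # a relative path
print(os.path.isabs(name))           # False: it does not start at the root
print(resolve(name))                 # the current directory joined with name
print(os.path.join("data", name))    # os.path.join uses the right separator
```

Because only the os.chdir call depends on the machine, the rest of the program can refer to "illinois-weather.txt" without knowing where the directory lives.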

Open/Close/Read/Write

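To access files, the operating system provides four syscalls: open (locate a file and prepare it for access), read (copy data from the file into memory), write (copy data from memory into the file), and close (tell the operating system the program is finished with the file).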
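Python exposes these syscalls almost directly through os.open, os.read, os.write, and os.close, which work with integer file descriptors and raw bytes. The sketch below is only for illustration (the file name is hypothetical and placed in a temporary directory); everyday code uses the built-in open() instead, which adds buffering and text decoding on top of these calls.

```python
import os
import tempfile

# Hypothetical file in a throwaway directory.
path = os.path.join(tempfile.mkdtemp(), "demo.txt")

# open: create the file (or truncate it) for writing; returns an int descriptor
fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC)
os.write(fd, b"hello, file\n")   # write: copy bytes from memory into the file
os.close(fd)                     # close: give the descriptor back to the OS

fd = os.open(path, os.O_RDONLY)  # open the same file again, read-only
data = os.read(fd, 1024)         # read: copy up to 1024 bytes into memory
os.close(fd)

print(data)
```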

Buffering

Because disks have seek time and latency, the operating system and Python together go to great lengths to make file access as fast as possible. One major way this happens is through a buffer, or cache: the operating system keeps copies of the file's contents in memory. Usually we don't need to care about this and can just let the OS and Python do their thing, but when writing we do need to be aware of it. When we write, our data does not appear in the file immediately. The operating system queues up our writes and only periodically writes them to disk, so it might take up to a few seconds before our writes actually reach the disk.
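When a program needs its data to leave the buffers sooner, it can ask for that explicitly. In this sketch (file name hypothetical), f.flush() hands Python's own buffer to the operating system, so other programs reading the file can see the data, and os.fsync() asks the operating system to push its cached copy out to the storage device. Closing the file, which the with block does automatically, also flushes Python's buffer.

```python
import os
import tempfile

# Hypothetical log file in a throwaway directory.
path = os.path.join(tempfile.mkdtemp(), "log.txt")

with open(path, "w") as f:
    f.write("first line\n")
    # Right now the text may still sit in Python's buffer, not in the file.
    f.flush()              # move Python's buffered data to the operating system
    os.fsync(f.fileno())   # ask the OS to write its cached copy to the disk
# Leaving the with block closes the file, which also flushes any remaining data.

with open(path) as f:
    contents = f.read()
print(contents)
```

Calling os.fsync after every write would slow a program down considerably, which is exactly the cost the buffering is designed to avoid, so it is normally reserved for data that must not be lost.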

Text File Access

See the tutorial. We open a file and then can read and write it before closing it.
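A minimal sketch of the usual pattern with the built-in open(), assuming a hypothetical notes file in a temporary directory: mode "w" writes (replacing any existing contents), "a" appends, and the default "r" reads. The with statement closes the file automatically; the last part shows the equivalent explicit close().

```python
import os
import tempfile

# Hypothetical file in a throwaway directory.
path = os.path.join(tempfile.mkdtemp(), "notes.txt")

# "w" creates the file, or empties an existing one, and opens it for writing text.
with open(path, "w", encoding="utf-8") as f:
    f.write("Chicago\n")
    f.write("Springfield\n")

# "a" adds to the end of the file instead of overwriting it.
with open(path, "a", encoding="utf-8") as f:
    f.write("Peoria\n")

# "r" (the default) reads; looping over the file yields one line at a time.
with open(path, encoding="utf-8") as f:
    for line in f:
        print(line.rstrip("\n"))

# Without "with", the file must be closed explicitly, even if an error occurs.
f = open(path, encoding="utf-8")
try:
    text = f.read()
finally:
    f.close()
```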

Exercises

NOAA publishes current weather data on its website here. First, go to this page for Illinois and copy and paste the data from the top of the page down to the first horizontal break into a new file on your computer named illinois-weather.txt or something similar. (The page lists a lot of data, one entry for each hour. Copy only up to the first break, which is the data for the most recent hour. Side note: in a couple of weeks we will learn how to get Python to download this data for us, but for now we are concentrating on files, so just copy and paste the data into a file.)

Now that we have this file, your task is to write a Python program that prompts the user for a city, loads the entire file into memory, and prints only the lines matching the city, or prints "No city found" if none match.

To load the entire file, you can use the following code.

with open("path/to/illinois-weather.txt") as f:
    lines = f.readlines()

After this, lines will be a list of strings, one per line of the file. Now use a loop, a string method, and the in operator to check whether each line matches the city the user entered.
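One possible shape for a solution, as a sketch rather than the only approach: the matching is split into a hypothetical helper, matching_lines, so it can be tried on a list of strings without a file. It lowercases both the line and the city (the string method) so the search ignores capitalization; whether the file uses upper or mixed case is an assumption this sidesteps. The path is the same placeholder as above.

```python
def matching_lines(lines, city):
    """Return every line that contains city, ignoring upper/lower case."""
    wanted = city.lower()
    found = []
    for line in lines:
        if wanted in line.lower():   # substring test with the in operator
            found.append(line)
    return found


def main():
    city = input("Enter a city: ").strip()
    with open("path/to/illinois-weather.txt") as f:
        lines = f.readlines()
    found = matching_lines(lines, city)
    if found:
        for line in found:
            print(line.rstrip("\n"))  # readlines() keeps each line's newline
    else:
        print("No city found")

# Call main() to run the program interactively.
```

Because in checks for a substring, a short name can also match longer ones (for example, a city whose name is the start of another city's name), and an empty input matches every line; both are worth keeping in mind when testing.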