All data in memory is lost when a program exits. (The operating system reclaims the memory to use for other processes.) Also, all data in memory is lost when the computer shuts down. To overcome this, the hardware and operating system provide several ways to store data so that it is preserved across both program exit and power-off. There are two main ways programs store data: files and databases.
Files are accessed through syscalls to the operating system. Files are organized into file systems, which are implemented by the operating system.
To access a file, we use a file path. Python has a number of functions to help with paths in the os.path module. There are two types of paths, absolute and relative. In both, the directory names are separated by / (Modern versions of Windows can use either / or \ to separate paths, while Linux and OS X use only /, so most of the time programs will use /.) An absolute path starts with / to signal the root. You should usually avoid absolute paths, since they are not as portable.
For example, /home/john/Downloads/python-3.5.0.tgz is an absolute path. The location is found by starting at the root and traversing the directories home, john, and Downloads. The filename is then python-3.5.0.tgz.
Downloads/python-3.5.0.tgz is a relative path. Say the current directory was /home/john. Then the file referenced by this relative path will be the same as the previous one, since the operating system will combine the current directory with the relative path.
somefile.txt is a relative path as well. If the current directory was /home/john, then the file referenced would be /home/john/somefile.txt.
As mentioned, the os.path module has a number of functions for working with paths. Also, chdir (os.chdir in Python) allows the program to change its current directory. Most files should be accessed by relative paths (after setting the current directory appropriately, usually at program start). Accessing files by relative paths makes it easier to run the code in many situations, on many different computers and operating systems. Different people's computers might have different directory organization, but by using relative paths this can be mostly ignored, since the program just needs to set its current directory correctly.
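The path handling described above can be sketched with a few os.path functions and os.chdir. This is only an illustration: the helper name resolve is made up for this example, and a temporary directory stands in for wherever a real program would keep its files.

```python
import os
import os.path
import tempfile


def resolve(current_dir, relative_path):
    """Combine a current directory with a relative path, much as the OS does."""
    return os.path.normpath(os.path.join(current_dir, relative_path))


# A relative path does not start at the root.
is_absolute = os.path.isabs("Downloads/python-3.5.0.tgz")

# The last component of a path is the filename.
filename = os.path.basename("/home/john/Downloads/python-3.5.0.tgz")

# Current directory + relative path = the file actually referenced.
full_path = resolve("/home/john", "somefile.txt")

# Set the current directory once, then use relative paths afterwards.
original_dir = os.getcwd()
with tempfile.TemporaryDirectory() as project_dir:
    os.chdir(project_dir)
    try:
        with open("somefile.txt", "w") as f:  # relative path
            f.write("hello\n")
        # The file landed inside the directory we changed into.
        created = os.path.exists(os.path.join(project_dir, "somefile.txt"))
    finally:
        os.chdir(original_dir)  # restore before the directory is deleted

print(is_absolute, filename, full_path, created)
```

In a real program the os.chdir call would typically happen once near the start, and everything after it would use relative paths.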
To access files, the operating system provides four syscalls:
Open: This appears in Python as the open function. To open a file, you provide a path and some options, and the operating system locates the file on disk, checks that you have permission, and does other housekeeping. It then returns what we call a handle to the file. This is a data value which we consider opaque, meaning we never look at the contents of the value. The data value is just there for us to use to reference the file in the next three syscalls.
Read: The read syscall takes a file handle, an offset within the file, and a length. It returns the contents of the file, as a list of bytes, starting at the given offset from the beginning of the file and with the given length. Working directly with files in this manner is somewhat painful, so Python provides many more convenient functions that internally use read. I will discuss these in a moment.
Write: The write syscall takes a file handle, an offset within the file, and a list of bytes to write. It then writes those bytes to the file at the given offset. Again, Python provides more convenient functions than working directly with the write syscall.
Close: Once a file handle is no longer needed by the program, it should be closed. The only argument is the file handle, and that file handle will be closed. The file handle is then invalid, and any read or write to it after close will result in an error.
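To make the four operations concrete, here is a rough sketch using the thin wrappers in Python's os module. One difference from the simplified description above: os.read and os.write work at the file's current position, so this sketch sets the offset with os.lseek first. The function name low_level_demo and the file name demo.bin are made up for this example.

```python
import os
import tempfile


def low_level_demo(path):
    # Open: returns a file handle. We treat it as opaque and only pass it on.
    fd = os.open(path, os.O_RDWR | os.O_CREAT)
    try:
        # Write: move to offset 0, then write some bytes there.
        os.lseek(fd, 0, os.SEEK_SET)
        os.write(fd, b"hello world")

        # Read: move to offset 6, then read 5 bytes.
        os.lseek(fd, 6, os.SEEK_SET)
        data = os.read(fd, 5)
    finally:
        # Close: the handle must not be used after this call.
        os.close(fd)
    return data


with tempfile.TemporaryDirectory() as tmp:
    result = low_level_demo(os.path.join(tmp, "demo.bin"))

print(result)  # b'world'
```

Most programs never call these functions directly and instead use the file objects returned by open.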
Because of seek time and latency, the operating system and Python together go to great lengths to make file access as fast as possible. One major way this happens is via a buffer or cache: the operating system keeps copies of the contents of the file in memory. Usually we don't need to care about this and can just let the OS and Python do their thing, but when writing we do need to be aware of it. When writing, our data does not appear immediately in the file. The operating system will queue up our writes and only periodically write the contents to disk, so it might be up to a few seconds before our writes actually reach the disk.
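If a program needs its writes to reach the disk sooner, one possible approach is to flush explicitly. In the sketch below, f.flush() hands Python's own buffer to the operating system and os.fsync() asks the operating system to write its cached copy to disk. The file name buffered.txt is made up for this example, and the size seen before the flush depends on buffering details, so treat it as typical rather than guaranteed.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "buffered.txt")

    with open(path, "w") as f:
        f.write("hello")
        # The data is usually still sitting in a buffer at this point,
        # so the file on disk often still looks empty.
        size_before = os.path.getsize(path)

        f.flush()             # Python's buffer -> operating system
        os.fsync(f.fileno())  # operating system's cache -> disk
        size_after = os.path.getsize(path)

print(size_before, size_after)
```

Closing the file (or leaving the with block) also flushes Python's buffer, which is one more reason to always close files.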
See the tutorial. We open a file, can then read and write it, and finally close it.
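As a small illustration of that open, read or write, close cycle, here is a sketch using Python's built-in open function. The file name notes.txt and its contents are made up for this example.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "notes.txt")

    # Open for writing, write two lines, then close.
    f = open(path, "w")
    f.write("first line\n")
    f.write("second line\n")
    f.close()

    # Open again for reading (the default mode), read everything, then close.
    f = open(path)
    contents = f.read()
    f.close()

# Using a file object after it has been closed raises an error.
try:
    f.read()
    raised = False
except ValueError:
    raised = True

print(contents)
print(raised)
```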
The NOAA publishes current weather data on their website here. First, go to this page for Illinois and copy and paste the data from the top of the page down to the first horizontal break into a new file on your computer called illinois-weather.txt or something like that. (The page lists a whole lot of data, one entry for each hour. Only copy up to the first break, which is the data for the most recent hour. As a side note, in a couple of weeks we will learn how to get Python to download this data for us, but for now we are concentrating on files, so just copy and paste the data into a file.)
Now that we have this file, your task is to write a Python program which prompts the user for a city, loads the entire file into memory, and prints out just the lines matching the city, or prints "No city found" if no line matches.
To load the entire file, you can use the following code.
    # Open the file, read every line into a list, and close it automatically.
    with open("path/to/illinois-weather.txt") as f:
        lines = f.readlines()
After this, lines will be a list of strings. Now use a loop, a string method, and the in operator to check whether each line matches the city the user entered.
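As a hint for the matching step only, here is one possible sketch of a loop that combines a string method with the in operator. The function name matching_lines is made up, and the sample lines are invented placeholders rather than real NOAA data, so the actual file layout may differ.

```python
def matching_lines(lines, city):
    """Return the lines that mention the city, ignoring upper/lower case."""
    matches = []
    for line in lines:
        # lower() is the string method; in checks for a substring.
        if city.lower() in line.lower():
            matches.append(line)
    return matches


# Invented placeholder lines standing in for the contents of the file.
sample = [
    "CHICAGO     example line one\n",
    "PEORIA      example line two\n",
    "SPRINGFIELD example line three\n",
]

found = matching_lines(sample, "peoria")
missing = matching_lines(sample, "Nowhere")

print(found)
print(missing)
```

Prompting the user and printing "No city found" when the result is empty are left for your program.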