Oct 2

Lecture Overview

Although the operating system supports it, we rarely read and write files at arbitrary positions. Nearly always, we read the entire file into some kind of data structure (list, dictionary, etc.) in memory. Our program then manipulates the data in memory for a while and finally writes the data out by creating a new file to replace the current one. This is much easier than trying to edit a file in place.
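
A minimal sketch of this read-everything, edit in memory, write-a-new-file pattern, assuming the data is a JSON list. The helper names load_data and save_data and the file name scores.json are hypothetical. Writing to a temporary file first and then calling os.replace means the old file is only swapped out after the new one has been completely written.

```python
import json
import os
import tempfile

def load_data(path):
    """Read the whole JSON file into memory (hypothetical helper)."""
    if not os.path.exists(path):
        return []  # no file yet: start with an empty list
    with open(path) as f:
        return json.load(f)

def save_data(path, data):
    """Write data to a new file, then swap it in for the old one (hypothetical helper)."""
    # Put the temporary file in the same directory as the target file
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w") as f:
        json.dump(data, f)
    # Replace the old file (if any) with the newly written one
    os.replace(tmp_path, path)

# Usage: load, change the data in memory, save
path = os.path.join(tempfile.mkdtemp(), "scores.json")
scores = load_data(path)   # [] on the first run
scores.append(42)          # all edits happen on the in-memory list
save_data(path, scores)
print(load_data(path))     # [42]
```

The program never edits bytes inside the existing file; it only ever reads a whole file or writes a whole new one.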

While it is possible to create ad-hoc file formats, it is better to use a pre-existing one. (A file format is the collection of rules for how data is stored in a file as a sequence of bytes.) Common file formats let separate programs exchange data, allow many tools to be written that can process them, and have Python modules that read and write them.
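
To make "a sequence of bytes" concrete, here is a small sketch using Python's standard json module: it encodes a dictionary following the JSON rules, turns the resulting text into bytes with UTF-8, and then decodes those bytes back. The record is just an illustrative value. Notice that the JSON rules spell Python's True as true.

```python
import json

record = {"title": "Dune", "avail": True}

# Apply the JSON rules to turn the dictionary into text, then into bytes
data = json.dumps(record).encode("utf-8")
print(data)  # b'{"title": "Dune", "avail": true}'

# Any program that knows the JSON rules can turn the bytes back into data
restored = json.loads(data.decode("utf-8"))
print(restored == record)  # True
```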

The three most common general data formats are

There are other formats, but the above three are by far the most common for storing general data. As an example we will use later, the USGS publishes earthquake data in JSON format. For instance, here is a JSON file containing all M4.5+ earthquakes from the past day.

In addition to the above general data formats, there are also formats designed for specific kinds of data. The most common are for images and video: PNG, for example, is an image format, and WebM is a video format.
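
Specific formats follow the same idea of fixed byte-level rules. For instance, the PNG specification requires every PNG file to begin with the same 8-byte signature. The sketch below, using a hypothetical helper named looks_like_png, checks only that signature; it does not validate the rest of the file, and the "fake.png" it writes is not a real image.

```python
import os
import tempfile

# The 8-byte signature that starts every PNG file (from the PNG specification)
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"

def looks_like_png(path):
    """Return True if the file at path starts with the PNG signature (hypothetical helper)."""
    with open(path, "rb") as f:  # "rb": read raw bytes, not text
        return f.read(8) == PNG_SIGNATURE

# Usage: write two small files and check them
folder = tempfile.mkdtemp()
png_path = os.path.join(folder, "fake.png")
txt_path = os.path.join(folder, "notes.txt")
with open(png_path, "wb") as f:
    f.write(PNG_SIGNATURE + b"rest of file")  # signature only, not a real image
with open(txt_path, "w") as f:
    f.write("just text")
print(looks_like_png(png_path))  # True
print(looks_like_png(txt_path))  # False
```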

With statement

We can use Python's with statement to make Python close a file automatically. The file is closed as soon as execution leaves the indented block, even if the block is left because of an exception.
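
A short sketch showing when the file gets closed, using a hypothetical file notes.txt in a temporary directory. The with block behaves roughly like opening the file and then calling f.close() in the finally part of a try/finally.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "notes.txt")  # hypothetical file name

with open(path, "w") as f:
    f.write("hello\n")
    print(f.closed)  # False: still inside the block
print(f.closed)      # True: closed when execution left the block

# The file is also closed if an exception leaves the block
try:
    with open(path) as g:
        text = g.read()
        raise ValueError("something went wrong")
except ValueError:
    pass
print(g.closed)      # True
print(text, end="")  # hello
```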

Example

As an example, consider the following problem: store a list of books and whether each one is checked out (as in a library). The data is stored in JSON format. The program loads all the books on startup and saves them on exit; every book operation works on this list of books in memory. Recall that a function receives a reference to the caller's list, not a copy, so when a function edits the books list, the caller's list changes too.
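
A minimal sketch of what "a reference, not a copy" means in practice, using two hypothetical helpers. Changing the list (or the dictionaries inside it) is visible to the caller, but assigning a brand-new list to the parameter name only changes the function's local name.

```python
def check_out_all(books):
    """Mark every book as checked out (hypothetical helper)."""
    for book in books:
        book["avail"] = False   # changes the caller's dictionaries

def replace_list(books):
    """Rebinding the parameter does NOT affect the caller (hypothetical helper)."""
    books = []                  # the local name now points at a new list
    return books

books = [{"avail": True, "num": 1, "title": "Dune"}]
check_out_all(books)
print(books[0]["avail"])  # False: the change is visible to the caller

replace_list(books)
print(len(books))         # 1: the caller's list is unchanged
```

This is why the library functions below modify books in place (append, or set book["avail"]) instead of building and assigning a new list.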

The code also uses docstrings. A docstring is a string literal that appears as the first statement of a function. It does not affect how the function runs; instead, it documents the function in a standard way that tools such as help() can display.
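
A small sketch, with hypothetical functions add and no_doc, showing that Python stores a docstring in the function's __doc__ attribute, and that a string which is not the first statement is not a docstring. Calling help(add) would display the same text along with the function's signature.

```python
def add(a, b):
    "returns the sum of a and b"
    return a + b

def no_doc():
    x = 1
    "this string is not the first statement, so it is not a docstring"
    return x

print(add.__doc__)     # returns the sum of a and b
print(no_doc.__doc__)  # None
print(add(2, 3))       # 5: the docstring does not change what the function does
```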

import os
import json

#Each book is a dictionary with the following entries
#    key "avail", value a boolean if the book is available
#    key "num", value a unique integer for the book
#    key "title", value a string which is the title

def show_menu():
    "shows the menu, prompts for a choice, and returns the choice as a string"

    print("Welcome to our library!  Choose")
    print("  0. leave the program and save books")
    print("  1. show all books in the collection")
    print("  2. add a new book to the collection")
    print("  3. check out a book of the library")
    print("  4. return a book to the library")
    c = input("Type 0, 1, 2, 3, or 4 : ")
    return c

def show_books(books):
    "shows the books currently in the collection"

    print("") # Print a blank line
    for book in books:
        s = str(book["num"]) + ' ' + book["title"]

        # Now the available
        if book["avail"]:
            s += ' available'
        else:
            s += ' checked out'
        print(s)

    print("") # Print a blank line

def add_book(books):
    "adds a book to the collection"

    # Find max used num
    maxnum = 0
    for book in books:
        if book["num"] > maxnum:
            maxnum = book["num"]

    title = input('Enter a title: ')
    
    # Add the book with key one larger than the max key
    books.append({"avail": True, "num": maxnum+1, "title":title})

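def checkout(books):
    "checks out a book"
    show_books(books)
    try:
        n = int(input('Enter book number: '))
    except ValueError:
        # A non-numeric entry would otherwise crash the program
        # before the books are saved
        print("Please enter a number")
        return
    for book in books:
        if book["num"] == n:
            if not book["avail"]:
                print("Book is already checked out")
            else:
                book["avail"] = False
            return
    print("No book with that number")
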
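def checkin(books):
    "return a book to the library"
    show_books(books)
    try:
        n = int(input('Enter book number: '))
    except ValueError:
        # A non-numeric entry would otherwise crash the program
        # before the books are saved
        print("Please enter a number")
        return
    for book in books:
        if book["num"] == n:
            if book["avail"]:
                print("Book is not checked out")
            else:
                book["avail"] = True
            return
    print("No book with that number")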

def main():
    # If the books.json file exists, load it.  Otherwise start with an
    # empty list of books
    if os.path.exists("books.json"):
        with open("books.json") as f:
            books = json.load(f)
    else:
        books = []

    while True:
        choice = show_menu()
        if choice == "0":
            break
        elif choice == "1":
            show_books(books)
        elif choice == "2":
            add_book(books)
        elif choice == "3":
            checkout(books)
        elif choice == "4":
            checkin(books)
        else:
            print("Unknown choice, please try again")

    # Note the "w" tells the operating system we want to write to the file.
    # The "w" also causes the file (if it exists) to be replaced.
    with open("books.json", "w") as f:
        json.dump(books, f)

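# Run the program only when this file is executed directly, not when imported
if __name__ == "__main__":
    main()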
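
To see what the program actually stores, the sketch below prints the JSON text that json.dump would write for a one-book list (the book is made up for illustration). Passing indent=2 to json.dumps or json.dump is an optional change that makes the file easier for people to read; the program above does not use it.

```python
import json

books = [{"avail": True, "num": 1, "title": "Dune"}]

# The same text json.dump(books, f) would write to books.json
print(json.dumps(books))
# [{"avail": true, "num": 1, "title": "Dune"}]

# With indent=2 each entry goes on its own line
print(json.dumps(books, indent=2))

# Loading the text back gives an equal list of dictionaries
print(json.loads(json.dumps(books)) == books)  # True
```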

Exercises

Add the ability to store the author to the library code above: