Working with MySQL

We introduce the processing of MySQL queries with Python scripts.

MySQL basics

If the MySQL is not started at boot time, then we have to start the daemon manually, as sudoer.

$ sudo mysqld_safe

Starting mysqld daemon with databases
from /usr/local/mysql/data

To shut the MySQL server down, use mysqladmin as sudo mysqladmin shutdown.

To create a database we use the create command with mysqladmin. For example: sudo mysqladmin create mydb. To delete the database we use the drop command with mysqladmin. For exmaple, to delete the database mydb, we do

$ sudo mysqladmin drop mydb
Dropping the database is potentially a very bad thing to do.
Any data stored in the database will be destroyed.

Do you really want to drop the 'mydb' database [y/N] y
Database "mydb" dropped

In the MySQL language, the client which makes requests to the database server is called the monitor mysql. If proper permissions are set, then running mysql should not be done as superuser.

User administration is done via the monitor mysql, running as sudo mysql or a mysql -u root. To certain users we can give specific privileges to certain databases and/or tables within a database.

First we have to create a user account in mysql. For example, for a user name on localhost:

mysql> create user 'name'@'localhost';

There is the possibility to set a password. To grant user with name all privileges on all databases:

mysql> GRANT ALL PRIVILEGES ON *.* TO 'name'@'localhost';

Replacing the wild card * by specific names of databases and/or tables allows to restrict the privileges to certain databases and/or tables.

After the user administration, mysql runs under a regular user account.

$ mysql
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 1
Server version: 5.5.29 MySQL Community Server (GPL)

Copyright (c) 2000, 2012, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql>

Leaving mysql, happens via the exit command.

mysql> exit
Bye

The command line usage of mysql is very useful to try single queries. Careful programming avoids mistakes by observing the following three rules.

  1. First try a command before placing it into a script.
  2. Have the script print the command.
  3. Eexecute one printed command as a test.

There are not many mysql commands we need to know to write useful programs. The main commands to work with tables are listed in Table 8.

Table 8 MySQL commands for tables
command syntax description
use <dbname> make database current
show tables show tables in current database
create table <name> <field(s)> create a table
explain <name> display data types of all fields
drop table <name> delete a table

To change a table, we use queries. The command syntax for the most important queries is listed in Table 9.

Table 9 MySQL commands for queries
command syntax description
select <field(s)> from <table> retrieve records
insert into <table> <values> insert records
delete from <table> delete records
update <table> set <values> update records

More involved queries have clauses, with added command syntax where <criteria> order by <field> ASC | DSC.

The mysql commands terminate with a semicolon ;.

Database for Python scripts

Consider the problem of managing the Python scripts we have studied in this course.

So far, there are over 70 Python scripts posted at the course web site. The scripts are listed chronologically grouped along the lectures ... but then, there are also the scripts for the projects and quizzes.

Our goal is to build a systematical catalog, so we can sort the scripts in several ways.

The first step is to create a database and grant access to its users. The name of the database is OurPyFiles A database is created by the administrator mysqladmin executed as superuser, as sudo mysqladmin create OurPyFiles

To grant access to users who are not sudoers, we run mysql as a sudoer.

$ sudo mysql
mysql> GRANT ALL ON OurPyFiles.* TO 'jan'@'localhost';
Query OK, 0 rows affected (0.05 sec)

mysql> Bye

With the * in OurPyFiles.* we grant access to all tables in the database OurPyFiles. As a test, we do

$ mysql
mysql> use OurPyFiles;
Database changed

Observe the absence of the sudo in front of the mysql. This confirms that the database OurPyFiles exists, even though it is empty, and that access is granted.

The database OurPyFiles will have one table: scripts. For every script, we have a type (L, P, or Q), a number, a date, and a file name.

mysql> use OurPyFiles
Database changed
mysql> create table scripts
    -> (t CHAR(1), n INT, d DATE, f CHAR(20));
Query OK, 0 rows affected (0.00 sec)

As a good practice, the command explain is the next command to run after the creation of a new table, to check whether the structure of the table corresponds with what we wanted.

mysql> explain scripts;
+-------+----------+------+-----+---------+-------+
| Field | Type     | Null | Key | Default | Extra |
+-------+----------+------+-----+---------+-------+
| t     | char(1)  | YES  |     | NULL    |       |
| n     | int(11)  | YES  |     | NULL    |       |
| d     | date     | YES  |     | NULL    |       |
| f     | char(20) | YES  |     | NULL    |       |
+-------+----------+------+-----+---------+-------+
4 rows in set (0.10 sec)

Some of the most common numeric data types are listed in Table 10.

Table 10 most common numeric types in MySQL
numeric types description
INT integer in \([-2^{31}, 2^{31} - 1]\)
SMALLINT 2-byte integer
FLOAT floating-point number

The data types for time are listed in Table 11.

Table 11 types for date and time
date and time description
YEAR year as yyyy
DATE date in the format yyyy-mm-dd
TIME time in the format hh:mm:ss
DATETIME yyyy-mm-dd hh:mm:ss
TIMESTAMP date expressed in seconds

We distinguish between strings of fixed length and strings of unlimited length, as listed in Table 12.

Table 12 strings for fixed and unlimited length
string type description
CHAR(size) string of length size
TEXT strings of unlimited length

Returning to our mysql session, we have created a table scripts in the database OurPyFiles.

mysql> use OurPyFiles;
mysql> show tables;
+----------------------+
| Tables_in_ourpyfiles |
+----------------------+
| scripts              |
+----------------------+
1 row in set (0.00 sec)
mysql> explain scripts;
+-------+----------+------+-----+---------+-------+
| Field | Type     | Null | Key | Default | Extra |
+-------+----------+------+-----+---------+-------+
| t     | char(1)  | YES  |     | NULL    |       |
| n     | int(11)  | YES  |     | NULL    |       |
| d     | date     | YES  |     | NULL    |       |
| f     | char(20) | YES  |     | NULL    |       |
+-------+----------+------+-----+---------+-------+
4 rows in set (0.00 sec)

Using MySQLdb or pymysql

The table will contain over 80 records. Entering the data manually is tedious and may lead to errors. Therefore, we use MySQLdb, or pymysql with Python 3.6

$ python3
Python 3.6.0 (v3.6.0:41df79263a11, Dec 22 2016, 17:23:13)
[GCC 4.2.1 (Apple Inc. build 5666) (dot 3)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import pymysql

There are two tasks:

  1. grab the headers of each .py file;
  2. insert header fields into table.

All Python scripts are in some directory on disk. With the os module we write utilities which are independent of the operating system. The commands we will use are

  • os.listdir(<directory>) returns a list of strings of names of files and directories in <directory>;
  • os.getcwd() returns the path name of the current working directory;
  • os.chdir(<path name>) changes the current working directory to the <path name>.

The script to grab the headers assumes that every script is documented in a uniform manner. For example, the script to grab the headers starts as

# L-25 MCS 275 Wed 8 Mar 2017 : grabpyhead.py
"""
Grabs the header line of all .py programs.
When the file starts with '#!', the header
line is not the first but the second line.
Every file has four fields:
type, number, date, and file name.
"""

The format conversion results in

('L', '25', 'Wed 8 Mar 2017', 'grabpyhead.py')

To select only the files with extension .py, the function below helps to make the distinction.

import os

def has_py_ext(name):
    """
    Returns True if the name ends in ".py",
    returns False otherwise.
    """
    try:
        return name[-3:] == '.py'
    except:
        return False

To recognize the lines starting with `` \#! the function start_path() is applied.

def start_path(line):
    """
    Returns True if the line appears to be
    the path of the python interpreter,
    returns False otherwise.
    """
    try:
        return line[0:2] == '#!'
    except:
        return False

To extract the information from the header, the function below splits a line into fields.

def split_fields(line):
    """
    Returns a tuple with the fields of the header.
    """
    L0 = line.split(' : ')
    fname = L0[1]
    L1 = L0[0].split(' MCS 275 ')
    fdate = L1[1]
    L2 = L1[0].split(' ')
    L3 = L2[1].split('-')
    ftype = L3[0]
    fnumb = L3[1]
    return (ftype, fnumb, fdate, fname)

The main function to enumerate all fields from the headers is below.

def enum_fields(d, fun):
    """
    Enumerates all fields in the header of
    the .py files in the directory d.
    For each field, the function f is called.
    Returns the number of .py files.
    """
    L = os.listdir(d)
    cnt = 0
    for filename in L:
        if has_py_ext(filename):
            cnt = cnt + 1
            file = open(filename, 'r')
            line = file.readline()
            if start_path(line):
                line = file.readline()
            try:
                fun(split_fields(line[:-1])) # omit \n
            except:
                print('exception occurred with file ', filename)
        file.close()
    return cnt

The enum_fields routine is an iterator. We use a callback function in an iterator. An iterator enumerates all items in a collection. Actions on the items are of no concern of the iterator, display items, select items, or ...

In good software design, we write iterator separately with simple callback function (e.g.: print) to test, and we make callback functions specific for applications.

The test callback function and its use in main() is shown below.

def print_fields(data):
    """
    Use as argument to test the enumerator.
    """
    print(data)

def main():
    """
    Prints the header of all .py files
    in the current directory.
    """
    nbr = enum_fields('.', print_fields)
    print('counted %d .py files' % nbr)

if __name__ == "__main__":
    main()

In using the pymysql API (or MySQLdb), there are five things we have to remember:

  1. import pymysql
  2. connect to the database: <connection> = pymysql.connect(db="<dbname>")
  3. create a cursor object: <cursor> = <connection>.cursor()
  4. execute mysql commands: <cursor>.execute(<command string>) returns number of rows in the result
  5. retrieve results of the queries:
    • <cursor>.fetchone() returns a single row
    • <cursor>.fetchall() returns all rows
    • <cursor>.rowcount returns the number of rows

To convert the headers into data tuples, suitables for insertion with SQL queries, we have to convert the format of the date, as done by the function my_data() defined below.

import os
import pymysql
from grabpyhead import enum_fields

def my_date(data):
    """
    Converts a string such as Wed 9 Mar 2016
    into the format 2016-03-09.
    """
    months = {"Jan":"01", "Feb":"02", "Mar":"03", \
       "Apr":"04", "May":"05", "Jun":"06", \
       "Jul":"07", "Aug":"08", "Sep":"09", \
       "Oct":"10", "Nov":"11", "Dec":"12"}
    vals = data.split(' ')
    day = '%02d' % int(vals[1])
    return vals[3] + '-' + months[vals[2]] + '-' + day

The my_data function is applied in the function which formulates the query.

def insert_data(cur, doit=False):
    """
    Data is inserted into the database,
    using the cursor cur.
    """
    def insert(data):
        """
        Uses the tuple data to insert into
        the table scripts.
        """
        cmd = 'insert into scripts values (' \
            + '\"' + data[0] + '\"' + ',' \
            + '\"' + data[1] + '\"' + ',' \
            + '\"' + my_date(data[2]) + '\"' + ',' \
            + '\"' + data[3] + '\"' + ');'
        cur.execute(cmd)

Before the query is executed, we test the format of the queries first by printing each query command.

def insert_data(cur, doit=False):
    """
    Data is inserted into the database,
    using the cursor cur.
    """
    def insert(data):
        ... code omitted ...
        cur.execute(cmd)

    if doit:
        nbr = enum_fields('.', insert)
    else:
        nbr = enum_fields('.', print)
    return nbr

The filling of the table scripts, as done by the script filldb.py is defined in the function main() below. The collection of scripts is in the folder ../MCS275py.

def main():
    """
    Prints the header of all .py files
    in the current directory.
    """
    pth = os.getcwd()
    os.chdir('../MCS275py')
    ourdb = pymysql.connect(db="OurPyFiles")
    crs = ourdb.cursor()
    ans = input("really do it ? (y/n) ")
    nbr = insert_data(crs, ans == 'y')
    print('inserted %d .py files' % nbr)
    ourdb.commit()
    os.chdir(pth)

We conclude with some example queries. To sort the records in the table scripts by date d, we can do:

mysql> select * from scripts order by d;

The * is a wild card, the returned table contains all fields of scripts.

To retrieve date and file name of all quiz scripts:

mysql> select d, f from scripts where t = "Q";

A script to view all rows is viewdbdata.py, with main function shown below.

import pymysql

def main():
    """
    Executes a simple query to the database.
    """
    db = pymysql.connect(db="OurPyFiles")
    c = db.cursor()
    q = 'select * from scripts order by d'
    lc = c.execute(q)
    print('found %d rows' % int(lc))
    while True:
         print(c.fetchone())
         ans = input('see more ? (y/n) ')
         if ans != 'y': break

For more on formulating queries in Python, see http://www.python.org/doc/topics.

Exercises

  1. Extend the script filldb.py so that it also recursively looks for .py files in all subdirectories.
  2. The database is not normalized because type, number, and dates are redundant. Use mysql to select from scripts to create a table typedates to store only data like "L-25" and "8 Mar 2017". Select from scripts to create a table typefiles to store "L-25" and "viewdbdata.py".
  3. Modify the Python script filldb.py so it fills the tables typedates and typefiles (defined in the previous exercise) while scanning the .py files.
  4. Extend the script viewdbdata.py asking for sorting directives. There are 32 possible orders.