Working with MySQL¶
We introduce the processing of MySQL
queries
with Python scripts.
MySQL basics¶
If the MySQL
is not started at boot time,
then we have to start the daemon manually, as sudoer.
$ sudo mysqld_safe
Starting mysqld daemon with databases
from /usr/local/mysql/data
To shut the MySQL server down, use mysqladmin
as sudo mysqladmin shutdown
.
To create a database we use the create
command with mysqladmin
.
For example: sudo mysqladmin create mydb
.
To delete the database we use the drop
command with mysqladmin
.
For exmaple, to delete the database mydb
, we do
$ sudo mysqladmin drop mydb
Dropping the database is potentially a very bad thing to do.
Any data stored in the database will be destroyed.
Do you really want to drop the 'mydb' database [y/N] y
Database "mydb" dropped
In the MySQL
language, the client which makes requests
to the database server is called the monitor mysql
.
If proper permissions are set, then running mysql
should not be done as superuser.
User administration is done via the monitor mysql
,
running as sudo mysql
or a mysql -u root
.
To certain users we can give specific privileges
to certain databases and/or tables within a database.
First we have to create a user account in mysql
.
For example, for a user name
on localhost:
mysql> create user 'name'@'localhost';
There is the possibility to set a password.
To grant user with name
all privileges on all databases:
mysql> GRANT ALL PRIVILEGES ON *.* TO 'name'@'localhost';
Replacing the wild card *
by specific names of databases
and/or tables allows to restrict the privileges to certain
databases and/or tables.
After the user administration, mysql
runs under a regular
user account.
$ mysql
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 1
Server version: 5.5.29 MySQL Community Server (GPL)
Copyright (c) 2000, 2012, Oracle and/or its affiliates. All rights reserved.
Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
mysql>
Leaving mysql, happens via the exit
command.
mysql> exit
Bye
The command line usage of mysql
is very useful to try single queries.
Careful programming avoids mistakes by
observing the following three rules.
- First try a command before placing it into a script.
- Have the script print the command.
- Eexecute one printed command as a test.
There are not many mysql
commands we need to know
to write useful programs. The main commands to work with tables
are listed in Table 8.
command syntax | description |
---|---|
use <dbname> |
make database current |
show tables | show tables in current database |
create table <name> <field(s)> |
create a table |
explain <name> |
display data types of all fields |
drop table <name> |
delete a table |
To change a table, we use queries. The command syntax for the most important queries is listed in Table 9.
command syntax | description |
---|---|
select <field(s)> from <table> |
retrieve records |
insert into <table> <values> |
insert records |
delete from <table> |
delete records |
update <table> set <values> |
update records |
More involved queries have clauses, with added command syntax
where <criteria>
order by <field>
ASC | DSC.
The mysql
commands terminate with a semicolon ;
.
Database for Python scripts¶
Consider the problem of managing the Python scripts we have studied in this course.
So far, there are over 70 Python scripts posted at the course web site. The scripts are listed chronologically grouped along the lectures ... but then, there are also the scripts for the projects and quizzes.
Our goal is to build a systematical catalog, so we can sort the scripts in several ways.
The first step is to create a database and grant access
to its users. The name of the database is OurPyFiles
A database is created by the administrator mysqladmin
executed as superuser, as
sudo mysqladmin create OurPyFiles
To grant access to users who are not sudoers,
we run mysql
as a sudoer.
$ sudo mysql
mysql> GRANT ALL ON OurPyFiles.* TO 'jan'@'localhost';
Query OK, 0 rows affected (0.05 sec)
mysql> Bye
With the *
in OurPyFiles.*
we grant access to all tables
in the database OurPyFiles
. As a test, we do
$ mysql
mysql> use OurPyFiles;
Database changed
Observe the absence of the sudo
in front of the mysql
.
This confirms that the database OurPyFiles
exists,
even though it is empty, and that access is granted.
The database OurPyFiles
will have one table: scripts
.
For every script, we have a type (L, P, or Q),
a number, a date, and a file name.
mysql> use OurPyFiles
Database changed
mysql> create table scripts
-> (t CHAR(1), n INT, d DATE, f CHAR(20));
Query OK, 0 rows affected (0.00 sec)
As a good practice, the command explain
is the next
command to run after the creation of a new table,
to check whether the structure of the table corresponds
with what we wanted.
mysql> explain scripts;
+-------+----------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+-------+----------+------+-----+---------+-------+
| t | char(1) | YES | | NULL | |
| n | int(11) | YES | | NULL | |
| d | date | YES | | NULL | |
| f | char(20) | YES | | NULL | |
+-------+----------+------+-----+---------+-------+
4 rows in set (0.10 sec)
Some of the most common numeric data types are listed in Table 10.
numeric types | description |
---|---|
INT | integer in \([-2^{31}, 2^{31} - 1]\) |
SMALLINT | 2-byte integer |
FLOAT | floating-point number |
The data types for time are listed in Table 11.
date and time | description |
---|---|
YEAR | year as yyyy |
DATE | date in the format yyyy-mm-dd |
TIME | time in the format hh:mm:ss |
DATETIME | yyyy-mm-dd hh:mm:ss |
TIMESTAMP | date expressed in seconds |
We distinguish between strings of fixed length and strings of unlimited length, as listed in Table 12.
string type | description |
---|---|
CHAR(size ) |
string of length size |
TEXT | strings of unlimited length |
Returning to our mysql
session,
we have created a table scripts
in the database OurPyFiles
.
mysql> use OurPyFiles;
mysql> show tables;
+----------------------+
| Tables_in_ourpyfiles |
+----------------------+
| scripts |
+----------------------+
1 row in set (0.00 sec)
mysql> explain scripts;
+-------+----------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+-------+----------+------+-----+---------+-------+
| t | char(1) | YES | | NULL | |
| n | int(11) | YES | | NULL | |
| d | date | YES | | NULL | |
| f | char(20) | YES | | NULL | |
+-------+----------+------+-----+---------+-------+
4 rows in set (0.00 sec)
Using MySQLdb or pymysql¶
The table will contain over 80 records.
Entering the data manually is tedious and may lead to errors.
Therefore, we use MySQLdb
, or pymysql
with Python 3.6
$ python3
Python 3.6.0 (v3.6.0:41df79263a11, Dec 22 2016, 17:23:13)
[GCC 4.2.1 (Apple Inc. build 5666) (dot 3)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import pymysql
There are two tasks:
- grab the headers of each .py file;
- insert header fields into table.
All Python scripts are in some directory on disk.
With the os
module we write utilities
which are independent of the operating system.
The commands we will use are
os.listdir(<directory>)
returns a list of strings of names of files and directories in<directory>
;os.getcwd()
returns the path name of the current working directory;os.chdir(<path name>)
changes the current working directory to the<path name>
.
The script to grab the headers assumes that every script is documented in a uniform manner. For example, the script to grab the headers starts as
# L-25 MCS 275 Wed 8 Mar 2017 : grabpyhead.py
"""
Grabs the header line of all .py programs.
When the file starts with '#!', the header
line is not the first but the second line.
Every file has four fields:
type, number, date, and file name.
"""
The format conversion results in
('L', '25', 'Wed 8 Mar 2017', 'grabpyhead.py')
To select only the files with extension .py
,
the function below helps to make the distinction.
import os
def has_py_ext(name):
"""
Returns True if the name ends in ".py",
returns False otherwise.
"""
try:
return name[-3:] == '.py'
except:
return False
To recognize the lines starting with `` \#!
the function start_path()
is applied.
def start_path(line):
"""
Returns True if the line appears to be
the path of the python interpreter,
returns False otherwise.
"""
try:
return line[0:2] == '#!'
except:
return False
To extract the information from the header, the function below splits a line into fields.
def split_fields(line):
"""
Returns a tuple with the fields of the header.
"""
L0 = line.split(' : ')
fname = L0[1]
L1 = L0[0].split(' MCS 275 ')
fdate = L1[1]
L2 = L1[0].split(' ')
L3 = L2[1].split('-')
ftype = L3[0]
fnumb = L3[1]
return (ftype, fnumb, fdate, fname)
The main function to enumerate all fields from the headers is below.
def enum_fields(d, fun):
"""
Enumerates all fields in the header of
the .py files in the directory d.
For each field, the function f is called.
Returns the number of .py files.
"""
L = os.listdir(d)
cnt = 0
for filename in L:
if has_py_ext(filename):
cnt = cnt + 1
file = open(filename, 'r')
line = file.readline()
if start_path(line):
line = file.readline()
try:
fun(split_fields(line[:-1])) # omit \n
except:
print('exception occurred with file ', filename)
file.close()
return cnt
The enum_fields
routine is an iterator.
We use a callback function in an iterator.
An iterator enumerates all items in a collection.
Actions on the items are of no concern of the iterator,
display items, select items, or ...
In good software design,
we write iterator separately with simple callback function
(e.g.: print
) to test, and
we make callback functions specific for applications.
The test callback function and its use in main()
is shown below.
def print_fields(data):
"""
Use as argument to test the enumerator.
"""
print(data)
def main():
"""
Prints the header of all .py files
in the current directory.
"""
nbr = enum_fields('.', print_fields)
print('counted %d .py files' % nbr)
if __name__ == "__main__":
main()
In using the pymysql
API (or MySQLdb
),
there are five things we have to remember:
import pymysql
- connect to the database:
<connection> = pymysql.connect(db="<dbname>")
- create a cursor object:
<cursor> = <connection>.cursor()
- execute
mysql
commands:<cursor>.execute(<command string>)
returns number of rows in the result - retrieve results of the queries:
<cursor>.fetchone()
returns a single row<cursor>.fetchall()
returns all rows<cursor>.rowcount
returns the number of rows
To convert the headers into data tuples,
suitables for insertion with SQL queries,
we have to convert the format of the date,
as done by the function my_data()
defined below.
import os
import pymysql
from grabpyhead import enum_fields
def my_date(data):
"""
Converts a string such as Wed 9 Mar 2016
into the format 2016-03-09.
"""
months = {"Jan":"01", "Feb":"02", "Mar":"03", \
"Apr":"04", "May":"05", "Jun":"06", \
"Jul":"07", "Aug":"08", "Sep":"09", \
"Oct":"10", "Nov":"11", "Dec":"12"}
vals = data.split(' ')
day = '%02d' % int(vals[1])
return vals[3] + '-' + months[vals[2]] + '-' + day
The my_data
function is applied in the function
which formulates the query.
def insert_data(cur, doit=False):
"""
Data is inserted into the database,
using the cursor cur.
"""
def insert(data):
"""
Uses the tuple data to insert into
the table scripts.
"""
cmd = 'insert into scripts values (' \
+ '\"' + data[0] + '\"' + ',' \
+ '\"' + data[1] + '\"' + ',' \
+ '\"' + my_date(data[2]) + '\"' + ',' \
+ '\"' + data[3] + '\"' + ');'
cur.execute(cmd)
Before the query is executed, we test the format of the queries first by printing each query command.
def insert_data(cur, doit=False):
"""
Data is inserted into the database,
using the cursor cur.
"""
def insert(data):
... code omitted ...
cur.execute(cmd)
if doit:
nbr = enum_fields('.', insert)
else:
nbr = enum_fields('.', print)
return nbr
The filling of the table scripts
, as done by the script filldb.py
is defined in the function main()
below.
The collection of scripts is in the folder ../MCS275py
.
def main():
"""
Prints the header of all .py files
in the current directory.
"""
pth = os.getcwd()
os.chdir('../MCS275py')
ourdb = pymysql.connect(db="OurPyFiles")
crs = ourdb.cursor()
ans = input("really do it ? (y/n) ")
nbr = insert_data(crs, ans == 'y')
print('inserted %d .py files' % nbr)
ourdb.commit()
os.chdir(pth)
We conclude with some example queries.
To sort the records in the table scripts
by date d
, we can do:
mysql> select * from scripts order by d;
The *
is a wild card,
the returned table contains all fields of scripts
.
To retrieve date and file name of all quiz scripts:
mysql> select d, f from scripts where t = "Q";
A script to view all rows is viewdbdata.py
,
with main function shown below.
import pymysql
def main():
"""
Executes a simple query to the database.
"""
db = pymysql.connect(db="OurPyFiles")
c = db.cursor()
q = 'select * from scripts order by d'
lc = c.execute(q)
print('found %d rows' % int(lc))
while True:
print(c.fetchone())
ans = input('see more ? (y/n) ')
if ans != 'y': break
For more on formulating queries in Python,
see http://www.python.org/doc/topics
.
Exercises¶
- Extend the script
filldb.py
so that it also recursively looks for.py
files in all subdirectories. - The database is not normalized because type, number,
and dates are redundant. Use
mysql
to select fromscripts
to create a tabletypedates
to store only data like"L-25"
and"8 Mar 2017"
. Select fromscripts
to create a tabletypefiles
to store"L-25"
and"viewdbdata.py"
. - Modify the Python script
filldb.py
so it fills the tablestypedates
andtypefiles
(defined in the previous exercise) while scanning the.py
files. - Extend the script
viewdbdata.py
asking for sorting directives. There are 32 possible orders.