Working with MySQL ================== We introduce the processing of ``MySQL`` queries with Python scripts. MySQL basics ------------ If the ``MySQL`` is not started at boot time, then we have to start the daemon manually, as sudoer. :: $ sudo mysqld_safe Starting mysqld daemon with databases from /usr/local/mysql/data To shut the MySQL server down, use ``mysqladmin`` as ``sudo mysqladmin shutdown``. To create a database we use the ``create`` command with ``mysqladmin``. For example: ``sudo mysqladmin create mydb``. To delete the database we use the ``drop`` command with ``mysqladmin``. For exmaple, to delete the database ``mydb``, we do :: $ sudo mysqladmin drop mydb Dropping the database is potentially a very bad thing to do. Any data stored in the database will be destroyed. Do you really want to drop the 'mydb' database [y/N] y Database "mydb" dropped In the ``MySQL`` language, the client which makes requests to the database server is called the monitor ``mysql``. If proper permissions are set, then running ``mysql`` should not be done as superuser. User administration is done via the monitor ``mysql``, running as ``sudo mysql`` or a ``mysql -u root``. To certain users we can give specific privileges to certain databases and/or tables within a database. First we have to create a user account in ``mysql``. For example, for a user ``name`` on localhost: :: mysql> create user 'name'@'localhost'; There is the possibility to set a password. To grant user with ``name`` all privileges on all databases: :: mysql> GRANT ALL PRIVILEGES ON *.* TO 'name'@'localhost'; Replacing the wild card ``*`` by specific names of databases and/or tables allows to restrict the privileges to certain databases and/or tables. After the user administration, ``mysql`` runs under a regular user account. :: $ mysql Welcome to the MySQL monitor. Commands end with ; or \g. Your MySQL connection id is 1 Server version: 5.5.29 MySQL Community Server (GPL) Copyright (c) 2000, 2012, Oracle and/or its affiliates. All rights reserved. Oracle is a registered trademark of Oracle Corporation and/or its affiliates. Other names may be trademarks of their respective owners. Type 'help;' or '\h' for help. Type '\c' to clear the current input statement. mysql> Leaving mysql, happens via the ``exit`` command. :: mysql> exit Bye The command line usage of ``mysql`` is very useful to try single queries. Careful programming avoids mistakes by observing the following three rules. 1. First try a command before placing it into a script. 2. Have the script print the command. 3. Eexecute one printed command as a test. There are not many ``mysql`` commands we need to know to write useful programs. The main commands to work with tables are listed in :numref:`tabmysqltablecmds`. .. _tabmysqltablecmds: .. table:: MySQL commands for tables +----------------------------------------+----------------------------------+ | command syntax | description | +========================================+==================================+ | **use** ```` | make database current | +----------------------------------------+----------------------------------+ | **show tables** | show tables in current database | +----------------------------------------+----------------------------------+ | **create table** `` `` | create a table | +----------------------------------------+----------------------------------+ | **explain** ```` | display data types of all fields | +----------------------------------------+----------------------------------+ | **drop table** ```` | delete a table | +----------------------------------------+----------------------------------+ To change a table, we use *queries*. The command syntax for the most important queries is listed in :numref:`tabmysqlquerycmds`. .. _tabmysqlquerycmds: .. table:: MySQL commands for queries +------------------------------------------------+------------------+ | command syntax | description | +================================================+==================+ | **select** ```` **from** ```` | retrieve records | +------------------------------------------------+------------------+ | **insert into** ``
`` | insert records | +------------------------------------------------+------------------+ | **delete from** ``
`` | delete records | +------------------------------------------------+------------------+ | **update** ``
`` **set** ```` | update records | +------------------------------------------------+------------------+ More involved queries have clauses, with added command syntax **where** ```` **order by** ```` **ASC** | **DSC**. The ``mysql`` commands terminate with a semicolon ``;``. Database for Python scripts --------------------------- Consider the problem of managing the Python scripts we have studied in this course. So far, there are over 70 Python scripts posted at the course web site. The scripts are listed chronologically grouped along the lectures ... but then, there are also the scripts for the projects and quizzes. Our goal is to build a systematical catalog, so we can sort the scripts in several ways. The first step is to create a database and grant access to its users. The name of the database is ``OurPyFiles`` A database is created by the administrator ``mysqladmin`` executed as superuser, as ``sudo mysqladmin create OurPyFiles`` To grant access to users who are not sudoers, we run ``mysql`` as a sudoer. :: $ sudo mysql mysql> GRANT ALL ON OurPyFiles.* TO 'jan'@'localhost'; Query OK, 0 rows affected (0.05 sec) mysql> Bye With the ``*`` in ``OurPyFiles.*`` we grant access to all tables in the database ``OurPyFiles``. As a test, we do :: $ mysql mysql> use OurPyFiles; Database changed Observe the absence of the ``sudo`` in front of the ``mysql``. This confirms that the database ``OurPyFiles`` exists, even though it is empty, and that access is granted. The database ``OurPyFiles`` will have one table: ``scripts``. For every script, we have a type (L, P, or Q), a number, a date, and a file name. :: mysql> use OurPyFiles Database changed mysql> create table scripts -> (t CHAR(1), n INT, d DATE, f CHAR(20)); Query OK, 0 rows affected (0.00 sec) As a good practice, the command ``explain`` is the next command to run after the creation of a new table, to check whether the structure of the table corresponds with what we wanted. :: mysql> explain scripts; +-------+----------+------+-----+---------+-------+ | Field | Type | Null | Key | Default | Extra | +-------+----------+------+-----+---------+-------+ | t | char(1) | YES | | NULL | | | n | int(11) | YES | | NULL | | | d | date | YES | | NULL | | | f | char(20) | YES | | NULL | | +-------+----------+------+-----+---------+-------+ 4 rows in set (0.10 sec) Some of the most common numeric data types are listed in :numref:`tabmysqlnumerictypes`. .. _tabmysqlnumerictypes: .. table:: most common numeric types in MySQL +---------------+------------------------------------------+ | numeric types | description | +===============+==========================================+ | INT | integer in :math:`[-2^{31}, 2^{31} - 1]` | +---------------+------------------------------------------+ | SMALLINT | 2-byte integer | +---------------+------------------------------------------+ | FLOAT | floating-point number | +---------------+------------------------------------------+ The data types for time are listed in :numref:`tabmysqltimetypes`. .. _tabmysqltimetypes: .. table:: types for date and time +----------------+-----------------------------------+ | date and time | description | +================+===================================+ | YEAR | year as ``yyyy`` | +----------------+-----------------------------------+ | DATE | date in the format ``yyyy-mm-dd`` | +----------------+-----------------------------------+ | TIME | time in the format ``hh:mm:ss`` | +----------------+-----------------------------------+ | DATETIME | ``yyyy-mm-dd hh:mm:ss`` | +----------------+-----------------------------------+ | TIMESTAMP | date expressed in seconds | +----------------+-----------------------------------+ We distinguish between strings of fixed length and strings of unlimited length, as listed in :numref:`tabmysqlstringtypes`. .. _tabmysqlstringtypes: .. table:: strings for fixed and unlimited length +----------------+-----------------------------+ | string type | description | +================+=============================+ | CHAR(``size``) | string of length ``size`` | +----------------+-----------------------------+ | TEXT | strings of unlimited length | +----------------+-----------------------------+ Returning to our ``mysql`` session, we have created a table ``scripts`` in the database ``OurPyFiles``. :: mysql> use OurPyFiles; mysql> show tables; +----------------------+ | Tables_in_ourpyfiles | +----------------------+ | scripts | +----------------------+ 1 row in set (0.00 sec) :: mysql> explain scripts; +-------+----------+------+-----+---------+-------+ | Field | Type | Null | Key | Default | Extra | +-------+----------+------+-----+---------+-------+ | t | char(1) | YES | | NULL | | | n | int(11) | YES | | NULL | | | d | date | YES | | NULL | | | f | char(20) | YES | | NULL | | +-------+----------+------+-----+---------+-------+ 4 rows in set (0.00 sec) Using MySQLdb or pymysql ------------------------ The table will contain over 80 records. Entering the data manually is tedious and may lead to errors. Therefore, we use ``MySQLdb``, or ``pymysql`` with Python 3.6 :: $ python3 Python 3.6.0 (v3.6.0:41df79263a11, Dec 22 2016, 17:23:13) [GCC 4.2.1 (Apple Inc. build 5666) (dot 3)] on darwin Type "help", "copyright", "credits" or "license" for more information. >>> import pymysql There are two tasks: 1. grab the headers of each .py file; 2. insert header fields into table. All Python scripts are in some directory on disk. With the ``os`` module we write utilities which are independent of the operating system. The commands we will use are * ``os.listdir()`` returns a list of strings of names of files and directories in ````; * ``os.getcwd()`` returns the path name of the current working directory; * ``os.chdir()`` changes the current working directory to the ````. The script to grab the headers assumes that every script is documented in a uniform manner. For example, the script to grab the headers starts as :: # L-25 MCS 275 Wed 8 Mar 2017 : grabpyhead.py """ Grabs the header line of all .py programs. When the file starts with '#!', the header line is not the first but the second line. Every file has four fields: type, number, date, and file name. """ The format conversion results in :: ('L', '25', 'Wed 8 Mar 2017', 'grabpyhead.py') To select only the files with extension ``.py``, the function below helps to make the distinction. :: import os def has_py_ext(name): """ Returns True if the name ends in ".py", returns False otherwise. """ try: return name[-3:] == '.py' except: return False To recognize the lines starting with `` ``\#!`` the function ``start_path()`` is applied. :: def start_path(line): """ Returns True if the line appears to be the path of the python interpreter, returns False otherwise. """ try: return line[0:2] == '#!' except: return False To extract the information from the header, the function below splits a line into fields. :: def split_fields(line): """ Returns a tuple with the fields of the header. """ L0 = line.split(' : ') fname = L0[1] L1 = L0[0].split(' MCS 275 ') fdate = L1[1] L2 = L1[0].split(' ') L3 = L2[1].split('-') ftype = L3[0] fnumb = L3[1] return (ftype, fnumb, fdate, fname) The main function to enumerate all fields from the headers is below. :: def enum_fields(d, fun): """ Enumerates all fields in the header of the .py files in the directory d. For each field, the function f is called. Returns the number of .py files. """ L = os.listdir(d) cnt = 0 for filename in L: if has_py_ext(filename): cnt = cnt + 1 file = open(filename, 'r') line = file.readline() if start_path(line): line = file.readline() try: fun(split_fields(line[:-1])) # omit \n except: print('exception occurred with file ', filename) file.close() return cnt The ``enum_fields`` routine is an *iterator*. We use a callback function in an iterator. An iterator enumerates all items in a collection. Actions on the items are of no concern of the iterator, display items, select items, or ... In good software design, we write iterator separately with simple callback function (e.g.: ``print``) to test, and we make callback functions specific for applications. The test callback function and its use in ``main()`` is shown below. :: def print_fields(data): """ Use as argument to test the enumerator. """ print(data) def main(): """ Prints the header of all .py files in the current directory. """ nbr = enum_fields('.', print_fields) print('counted %d .py files' % nbr) if __name__ == "__main__": main() In using the ``pymysql`` API (or ``MySQLdb``), there are five things we have to remember: 1. ``import pymysql`` 2. connect to the database: `` = pymysql.connect(db="")`` 3. create a cursor object: `` = .cursor()`` 4. execute ``mysql`` commands: ``.execute()`` returns number of rows in the result 5. retrieve results of the queries: * ``.fetchone()`` returns a single row * ``.fetchall()`` returns all rows * ``.rowcount`` returns the number of rows To convert the headers into data tuples, suitables for insertion with SQL queries, we have to convert the format of the date, as done by the function ``my_data()`` defined below. :: import os import pymysql from grabpyhead import enum_fields def my_date(data): """ Converts a string such as Wed 9 Mar 2016 into the format 2016-03-09. """ months = {"Jan":"01", "Feb":"02", "Mar":"03", \ "Apr":"04", "May":"05", "Jun":"06", \ "Jul":"07", "Aug":"08", "Sep":"09", \ "Oct":"10", "Nov":"11", "Dec":"12"} vals = data.split(' ') day = '%02d' % int(vals[1]) return vals[3] + '-' + months[vals[2]] + '-' + day The ``my_data`` function is applied in the function which formulates the query. :: def insert_data(cur, doit=False): """ Data is inserted into the database, using the cursor cur. """ def insert(data): """ Uses the tuple data to insert into the table scripts. """ cmd = 'insert into scripts values (' \ + '\"' + data[0] + '\"' + ',' \ + '\"' + data[1] + '\"' + ',' \ + '\"' + my_date(data[2]) + '\"' + ',' \ + '\"' + data[3] + '\"' + ');' cur.execute(cmd) Before the query is executed, we test the format of the queries first by printing each query command. :: def insert_data(cur, doit=False): """ Data is inserted into the database, using the cursor cur. """ def insert(data): ... code omitted ... cur.execute(cmd) if doit: nbr = enum_fields('.', insert) else: nbr = enum_fields('.', print) return nbr The filling of the table ``scripts``, as done by the script ``filldb.py`` is defined in the function ``main()`` below. The collection of scripts is in the folder ``../MCS275py``. :: def main(): """ Prints the header of all .py files in the current directory. """ pth = os.getcwd() os.chdir('../MCS275py') ourdb = pymysql.connect(db="OurPyFiles") crs = ourdb.cursor() ans = input("really do it ? (y/n) ") nbr = insert_data(crs, ans == 'y') print('inserted %d .py files' % nbr) ourdb.commit() os.chdir(pth) We conclude with some example queries. To sort the records in the table ``scripts`` by date ``d``, we can do: :: mysql> select * from scripts order by d; The ``*`` is a wild card, the returned table contains all fields of ``scripts``. To retrieve date and file name of all quiz scripts: :: mysql> select d, f from scripts where t = "Q"; A script to view all rows is ``viewdbdata.py``, with main function shown below. :: import pymysql def main(): """ Executes a simple query to the database. """ db = pymysql.connect(db="OurPyFiles") c = db.cursor() q = 'select * from scripts order by d' lc = c.execute(q) print('found %d rows' % int(lc)) while True: print(c.fetchone()) ans = input('see more ? (y/n) ') if ans != 'y': break For more on formulating queries in Python, see ``http://www.python.org/doc/topics``. Exercises --------- 1. Extend the script ``filldb.py`` so that it also recursively looks for ``.py`` files in all subdirectories. 2. The database is not normalized because type, number, and dates are redundant. Use ``mysql`` to select from ``scripts`` to create a table ``typedates`` to store only data like ``"L-25"`` and ``"8 Mar 2017"``. Select from ``scripts`` to create a table ``typefiles`` to store ``"L-25"`` and ``"viewdbdata.py"``. 3. Modify the Python script ``filldb.py`` so it fills the tables ``typedates`` and ``typefiles`` (defined in the previous exercise) while scanning the ``.py`` files. 4. Extend the script ``viewdbdata.py`` asking for sorting directives. There are 32 possible orders.