Working with MySQL
==================

We introduce the processing of ``MySQL`` queries
with Python scripts.

MySQL basics
------------

If the ``MySQL`` is not started at boot time,
then we have to start the daemon manually, as sudoer.

::

   $ sudo mysqld_safe

   Starting mysqld daemon with databases 
   from /usr/local/mysql/data

To shut the MySQL server down, use ``mysqladmin`` 
as ``sudo mysqladmin shutdown``.

To create a database we use the ``create``
command with ``mysqladmin``.
For example: ``sudo mysqladmin create mydb``.
To delete the database we use the ``drop``
command with ``mysqladmin``.
For exmaple, to delete the database ``mydb``, we do

::

   $ sudo mysqladmin drop mydb
   Dropping the database is potentially a very bad thing to do.
   Any data stored in the database will be destroyed.

   Do you really want to drop the 'mydb' database [y/N] y
   Database "mydb" dropped

In the ``MySQL`` language, the client which makes requests
to the database server is called the monitor ``mysql``.
If proper permissions are set, then running ``mysql``
should not be done as superuser.

User administration is done via the monitor ``mysql``,
running as ``sudo mysql`` or a ``mysql -u root``.
To certain users we can give specific privileges
to certain databases and/or tables within a database.

First we have to create a user account in ``mysql``.
For example, for a user ``name`` on localhost:

::

   mysql> create user 'name'@'localhost';

There is the possibility to set a password.
To grant user with ``name`` all privileges on all databases:

::

   mysql> GRANT ALL PRIVILEGES ON *.* TO 'name'@'localhost';

Replacing the wild card ``*`` by specific names of databases
and/or tables allows to restrict the privileges to certain
databases and/or tables.

After the user administration, ``mysql`` runs under a regular
user account.

::

   $ mysql
   Welcome to the MySQL monitor.  Commands end with ; or \g.
   Your MySQL connection id is 1
   Server version: 5.5.29 MySQL Community Server (GPL)

   Copyright (c) 2000, 2012, Oracle and/or its affiliates. All rights reserved.

   Oracle is a registered trademark of Oracle Corporation and/or its
   affiliates. Other names may be trademarks of their respective
   owners.

   Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

   mysql>

Leaving mysql, happens via the ``exit`` command.

::

   mysql> exit
   Bye

The command line usage of ``mysql``
is very useful to try single queries.
Careful programming avoids mistakes by
observing the following three rules.

1. First try a command before placing it into a script.

2. Have the script print the command.

3. Eexecute one printed command as a test.

There are not many ``mysql`` commands we need to know
to write useful programs.  The main commands to work with tables
are listed in :numref:`tabmysqltablecmds`.

.. _tabmysqltablecmds:

.. table:: MySQL commands for tables

   +----------------------------------------+----------------------------------+
   | command syntax                         | description                      |
   +========================================+==================================+
   | **use** ``<dbname>``                   | make database current            |
   +----------------------------------------+----------------------------------+
   | **show tables**                        | show tables in current database  |
   +----------------------------------------+----------------------------------+
   | **create table** ``<name> <field(s)>`` | create a table                   |
   +----------------------------------------+----------------------------------+
   | **explain** ``<name>``                 | display data types of all fields |
   +----------------------------------------+----------------------------------+
   | **drop table** ``<name>``              | delete a table                   |
   +----------------------------------------+----------------------------------+

To change a table, we use *queries*.
The command syntax for the most important queries 
is listed in :numref:`tabmysqlquerycmds`.

.. _tabmysqlquerycmds:

.. table:: MySQL commands for queries

   +------------------------------------------------+------------------+
   | command syntax                                 | description      |
   +================================================+==================+
   | **select** ``<field(s)>`` **from** ``<table>`` | retrieve records |
   +------------------------------------------------+------------------+
   | **insert into** ``<table> <values>``           | insert records   |
   +------------------------------------------------+------------------+
   | **delete from** ``<table>``                    | delete records   |
   +------------------------------------------------+------------------+
   | **update** ``<table>`` **set** ``<values>``    | update records   |
   +------------------------------------------------+------------------+

More involved queries have clauses, with added command syntax
**where** ``<criteria>`` **order by** ``<field>`` **ASC** | **DSC**.

The ``mysql`` commands terminate with a semicolon ``;``.

Database for Python scripts
---------------------------

Consider the problem of managing the Python scripts
we have studied in this course.

So far, there are over 70 Python scripts
posted at the course web site.
The scripts are listed chronologically
grouped along the lectures ...
but then, there are also the scripts 
for the projects and quizzes.

Our goal is to build a systematical catalog,
so we can sort the scripts in several ways.

The first step is to create a database and grant access
to its users.  The name of the database is ``OurPyFiles``
A database is created by the administrator ``mysqladmin``
executed as superuser, as
``sudo mysqladmin create OurPyFiles``

To grant access to users who are not sudoers,
we run ``mysql`` as a sudoer.

::

   $ sudo mysql
   mysql> GRANT ALL ON OurPyFiles.* TO 'jan'@'localhost';
   Query OK, 0 rows affected (0.05 sec)

   mysql> Bye

With the ``*`` in ``OurPyFiles.*`` we grant access to all tables
in the database ``OurPyFiles``.  As a test, we do

::

   $ mysql
   mysql> use OurPyFiles;
   Database changed

Observe the absence of the ``sudo`` in front of the ``mysql``.
This confirms that the database ``OurPyFiles`` exists,
even though it is empty, and that access is granted.

The database ``OurPyFiles`` will have one table: ``scripts``.
For every script, we have a type (L, P, or Q),
a number, a date, and a file name.

::

   mysql> use OurPyFiles
   Database changed
   mysql> create table scripts
       -> (t CHAR(1), n INT, d DATE, f CHAR(20));
   Query OK, 0 rows affected (0.00 sec)

As a good practice, the command ``explain`` is the next
command to run after the creation of a new table,
to check whether the structure of the table corresponds
with what we wanted.

::

   mysql> explain scripts;
   +-------+----------+------+-----+---------+-------+
   | Field | Type     | Null | Key | Default | Extra |
   +-------+----------+------+-----+---------+-------+
   | t     | char(1)  | YES  |     | NULL    |       |
   | n     | int(11)  | YES  |     | NULL    |       |
   | d     | date     | YES  |     | NULL    |       |
   | f     | char(20) | YES  |     | NULL    |       |
   +-------+----------+------+-----+---------+-------+
   4 rows in set (0.10 sec)

Some of the most common numeric data types
are listed in :numref:`tabmysqlnumerictypes`.

.. _tabmysqlnumerictypes:

.. table:: most common numeric types in MySQL

   +---------------+------------------------------------------+
   | numeric types | description                              |
   +===============+==========================================+
   | INT           | integer in :math:`[-2^{31}, 2^{31} - 1]` |
   +---------------+------------------------------------------+
   | SMALLINT      | 2-byte integer                           |
   +---------------+------------------------------------------+
   | FLOAT         | floating-point number                    |
   +---------------+------------------------------------------+

The data types for time are listed in :numref:`tabmysqltimetypes`.

.. _tabmysqltimetypes:

.. table:: types for date and time

   +----------------+-----------------------------------+
   | date and time  | description                       |
   +================+===================================+
   | YEAR           | year as ``yyyy``                  |
   +----------------+-----------------------------------+
   | DATE           | date in the format ``yyyy-mm-dd`` |
   +----------------+-----------------------------------+
   | TIME           | time in the format ``hh:mm:ss``   |
   +----------------+-----------------------------------+
   | DATETIME       | ``yyyy-mm-dd hh:mm:ss``           |
   +----------------+-----------------------------------+
   | TIMESTAMP      | date expressed in seconds         |
   +----------------+-----------------------------------+

We distinguish between strings of fixed length
and strings of unlimited length,
as listed in :numref:`tabmysqlstringtypes`.

.. _tabmysqlstringtypes:

.. table:: strings for fixed and unlimited length
   
   +----------------+-----------------------------+
   | string type    | description                 |
   +================+=============================+
   | CHAR(``size``) | string of length ``size``   |
   +----------------+-----------------------------+
   | TEXT           | strings of unlimited length |
   +----------------+-----------------------------+

Returning to our ``mysql`` session,
we have created a table ``scripts`` in the database ``OurPyFiles``.

::

   mysql> use OurPyFiles;
   mysql> show tables;
   +----------------------+
   | Tables_in_ourpyfiles |
   +----------------------+
   | scripts              | 
   +----------------------+
   1 row in set (0.00 sec)


::

   mysql> explain scripts;
   +-------+----------+------+-----+---------+-------+
   | Field | Type     | Null | Key | Default | Extra |
   +-------+----------+------+-----+---------+-------+
   | t     | char(1)  | YES  |     | NULL    |       | 
   | n     | int(11)  | YES  |     | NULL    |       | 
   | d     | date     | YES  |     | NULL    |       | 
   | f     | char(20) | YES  |     | NULL    |       | 
   +-------+----------+------+-----+---------+-------+
   4 rows in set (0.00 sec)

Using MySQLdb or pymysql
------------------------

The table will contain over 80 records.
Entering the data manually is tedious and may lead to errors.
Therefore, we use ``MySQLdb``, or ``pymysql`` with Python 3.6

::

   $ python3
   Python 3.6.0 (v3.6.0:41df79263a11, Dec 22 2016, 17:23:13) 
   [GCC 4.2.1 (Apple Inc. build 5666) (dot 3)] on darwin
   Type "help", "copyright", "credits" or "license" for more information.
   >>> import pymysql

There are two tasks:

1. grab the headers of each .py file;

2. insert header fields into table.

All Python scripts are in some directory on disk.
With the ``os`` module we write utilities
which are independent of the operating system.
The commands we will use are

* ``os.listdir(<directory>)`` returns a list of strings of names 
  of files and directories in ``<directory>``;

* ``os.getcwd()`` returns the path name
  of the current working directory;

* ``os.chdir(<path name>)`` changes the current working directory 
  to the ``<path name>``.

The script to grab the headers assumes that
every script is documented in a uniform manner.
For example, the script to grab the headers starts as

::

   # L-25 MCS 275 Wed 8 Mar 2017 : grabpyhead.py
   """ 
   Grabs the header line of all .py programs.
   When the file starts with '#!', the header
   line is not the first but the second line.
   Every file has four fields:
   type, number, date, and file name.
   """

The format conversion results in

::

   ('L', '25', 'Wed 8 Mar 2017', 'grabpyhead.py')

To select only the files with extension ``.py``,
the function below helps to make the distinction.

::

   import os

   def has_py_ext(name):
       """
       Returns True if the name ends in ".py",
       returns False otherwise.
       """
       try:
           return name[-3:] == '.py'
       except:
           return False

To recognize the lines starting with `` ``\#!``
the function ``start_path()`` is applied.

::

   def start_path(line):
       """
       Returns True if the line appears to be
       the path of the python interpreter,
       returns False otherwise.
       """
       try:
           return line[0:2] == '#!'
       except:
           return False

To extract the information from the header,
the function below splits a line into fields.

::

   def split_fields(line):
       """
       Returns a tuple with the fields of the header.
       """
       L0 = line.split(' : ')
       fname = L0[1]
       L1 = L0[0].split(' MCS 275 ')
       fdate = L1[1]
       L2 = L1[0].split(' ')
       L3 = L2[1].split('-')
       ftype = L3[0]
       fnumb = L3[1]
       return (ftype, fnumb, fdate, fname)

The main function to enumerate all fields
from the headers is below.

::

   def enum_fields(d, fun):
       """
       Enumerates all fields in the header of
       the .py files in the directory d.
       For each field, the function f is called.
       Returns the number of .py files.
       """
       L = os.listdir(d)
       cnt = 0
       for filename in L:
           if has_py_ext(filename):
               cnt = cnt + 1
               file = open(filename, 'r')
               line = file.readline()
               if start_path(line):
                   line = file.readline()
               try:
                   fun(split_fields(line[:-1])) # omit \n
               except:
                   print('exception occurred with file ', filename)
           file.close()
       return cnt

The ``enum_fields`` routine is an *iterator*.
We use a callback function in an iterator.
An iterator enumerates all items in a collection.
Actions on the items are of no concern of the iterator,
display items, select items, or ...

In good software design,
we write iterator separately with simple callback function
(e.g.: ``print``) to test, and
we make callback functions specific for applications.

The test callback function and its use in ``main()`` is shown below.

::

   def print_fields(data):
       """
       Use as argument to test the enumerator.
       """
       print(data)

   def main():
       """
       Prints the header of all .py files
       in the current directory.
       """
       nbr = enum_fields('.', print_fields)
       print('counted %d .py files' % nbr)

   if __name__ == "__main__":
       main()

In using the ``pymysql`` API (or ``MySQLdb``),
there are five things we have to remember:

1. ``import pymysql``

2. connect to the database:
   ``<connection> = pymysql.connect(db="<dbname>")``

3. create a cursor object: ``<cursor> = <connection>.cursor()``

4. execute ``mysql`` commands:
   ``<cursor>.execute(<command string>)``
   returns number of rows in the result

5. retrieve results of the queries:

   * ``<cursor>.fetchone()`` returns a single row

   * ``<cursor>.fetchall()`` returns all rows

   * ``<cursor>.rowcount`` returns the number of rows

To convert the headers into data tuples,
suitables for insertion with SQL queries,
we have to convert the format of the date,
as done by the function ``my_data()`` defined below.

::

   import os
   import pymysql
   from grabpyhead import enum_fields

   def my_date(data):
       """
       Converts a string such as Wed 9 Mar 2016
       into the format 2016-03-09.
       """
       months = {"Jan":"01", "Feb":"02", "Mar":"03", \
          "Apr":"04", "May":"05", "Jun":"06", \
          "Jul":"07", "Aug":"08", "Sep":"09", \
          "Oct":"10", "Nov":"11", "Dec":"12"}
       vals = data.split(' ')
       day = '%02d' % int(vals[1])
       return vals[3] + '-' + months[vals[2]] + '-' + day

The ``my_data`` function is applied in the function
which formulates the query.

::

   def insert_data(cur, doit=False):
       """
       Data is inserted into the database,
       using the cursor cur.
       """
       def insert(data):
           """
           Uses the tuple data to insert into
           the table scripts.
           """
           cmd = 'insert into scripts values (' \
               + '\"' + data[0] + '\"' + ',' \
               + '\"' + data[1] + '\"' + ',' \
               + '\"' + my_date(data[2]) + '\"' + ',' \
               + '\"' + data[3] + '\"' + ');'
           cur.execute(cmd)

Before the query is executed, we test the format of the queries
first by printing each query command.

::

   def insert_data(cur, doit=False):
       """
       Data is inserted into the database,
       using the cursor cur.
       """
       def insert(data):
           ... code omitted ...
           cur.execute(cmd)

       if doit:
           nbr = enum_fields('.', insert)
       else:
           nbr = enum_fields('.', print)
       return nbr

The filling of the table ``scripts``, as done by the script ``filldb.py``
is defined in the function ``main()`` below.
The collection of scripts is in the folder ``../MCS275py``.

::

   def main():
       """
       Prints the header of all .py files
       in the current directory.
       """
       pth = os.getcwd()
       os.chdir('../MCS275py')
       ourdb = pymysql.connect(db="OurPyFiles")
       crs = ourdb.cursor()
       ans = input("really do it ? (y/n) ")
       nbr = insert_data(crs, ans == 'y')
       print('inserted %d .py files' % nbr)
       ourdb.commit()
       os.chdir(pth)

We conclude with some example queries.
To sort the records in the table ``scripts``
by date ``d``, we can do:

::

   mysql> select * from scripts order by d;

The ``*`` is a wild card,
the returned table contains all fields of ``scripts``.

To retrieve date and file name of all quiz scripts:

::

   mysql> select d, f from scripts where t = "Q";

A script to view all rows is ``viewdbdata.py``,
with main function shown below.

::

   import pymysql

   def main():
       """
       Executes a simple query to the database.
       """
       db = pymysql.connect(db="OurPyFiles")
       c = db.cursor() 
       q = 'select * from scripts order by d'
       lc = c.execute(q)
       print('found %d rows' % int(lc))
       while True:
            print(c.fetchone())
            ans = input('see more ? (y/n) ')
            if ans != 'y': break

For more on formulating queries in Python,
see ``http://www.python.org/doc/topics``.

Exercises
---------

1. Extend the script ``filldb.py`` so that it also
   recursively looks for ``.py`` files in all subdirectories.

2. The database is not normalized because type, number,
   and dates are redundant.   Use ``mysql`` to
   select from ``scripts`` to create a table ``typedates``
   to store only data like ``"L-25"`` and ``"8 Mar 2017"``.
   Select from ``scripts`` to create a table
   ``typefiles`` to store ``"L-25"`` and ``"viewdbdata.py"``.

3. Modify the Python script ``filldb.py`` so it fills
   the tables ``typedates`` and ``typefiles`` (defined
   in the previous exercise) while scanning the ``.py`` files.

4. Extend the script ``viewdbdata.py`` asking for
   sorting directives.  There are 32 possible orders.