MCS 548 - Mathematical Theory of Artificial Intelligence
University of Illinois - Chicago
Fall 2014

The topics of MCS 548 vary somewhat each time it is offered. This semester, the focus is on the foundations of machine learning theory. Example topics include inductive inference, query learning, PAC learning and VC theory, Occam's razor, online learning, boosting, support vector machines, bandit algorithms, statistical queries, and Rademacher complexity.

This course is represented on the computer science prelim.

Basic Information

Syllabus: pdf
Time and Location: T-R 11:00am - 12:15pm, Behavioral Sciences Building (BSB) 137
Instructor Contact Information: Lev Reyzin, SEO 713, (312)-413-9576,
Required Textbook: Shai Shalev-Shwartz and Shai Ben-David. Understanding Machine Learning: From Theory to Algorithms
Online Textbook: Mehryar Mohri, Afshin Rostamizadeh, and Ameet Talwalkar. Foundations of Machine Learning (available online via UIC library)
Office Hours: T 3:00PM-4:00PM, F 11:00AM-12:00PM


The in-class final exam will be held on Thursday, December 4, 2014, from 11:00am to 12:15pm.

Projects and Presentations

choosing a topic
presentation schedule
final project reports, due 12/08/14 by 11:00am

Problem Sets

problem set 1: pdf, due 10/2/14 (originally 9/30/14)
problem set 2: pdf, due 10/28/14
problem set 3: pdf, due 12/02/14

Lectures and Readings

Note: lectures will have material not covered in the readings.

Lecture 1 (8/26/14)
covered material: intro to the course, preview of learning models, beginning inductive inference
reading: section 7 of Computing Machinery and Intelligence by Turing (1950)

Lecture 2 (8/28/14)
covered material: positive and negative results for learning in limit from text and informant
reading: Language Identification in the Limit by Gold (1967)
optional reading: 13.4.1 of Mohri et al.

Lecture 3 (9/2/14)
covered material: membership and equivalence queries, L* algorithm for exact learning of regular languages
reading: 13.3.3 of Mohri et al.
optional reading: Learning Regular Sets from Queries and Counterexamples by Angluin (1987)

Lecture 4 (9/4/14)
covered material: introduction to probably approximately correct (PAC) learning
reading: 3.1 of Shalev-Shwartz & Ben-David; and A Theory of the Learnable by Valiant (1984)
optional reading: Learning from Noisy Examples by Angluin and Laird (1988)

Lecture 5 (9/9/14)
covered material: efficient MQ+EQ --> efficient PAC + MQ, PAC learning axis-aligned rectangles, variants of PAC
reading: 3.2 of Shalev-Shwartz & Ben-David; and 2.1 and 13.3.2 of Mohri et al.
optional reading: Queries and Concept Learning by Angluin (1988)
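To make the axis-aligned rectangle example concrete, here is a minimal sketch of the usual "tightest fit" learner, assuming the realizable setting in the plane with labels in {0,1} (1 meaning inside the target rectangle). The names tightest_fit and predict are hypothetical, chosen for illustration.

```python
from typing import List, Optional, Tuple

Point = Tuple[float, float]
Rect = Tuple[float, float, float, float]  # (x_min, x_max, y_min, y_max)


def tightest_fit(sample: List[Tuple[Point, int]]) -> Optional[Rect]:
    """Smallest axis-aligned rectangle containing every positive example.

    Returns None when there are no positive examples; the hypothesis then
    labels every point 0.
    """
    positives = [p for p, label in sample if label == 1]
    if not positives:
        return None
    xs = [x for x, _ in positives]
    ys = [y for _, y in positives]
    return (min(xs), max(xs), min(ys), max(ys))


def predict(rect: Optional[Rect], p: Point) -> int:
    """Label a point 1 iff it lies in the closed hypothesis rectangle."""
    if rect is None:
        return 0
    x_min, x_max, y_min, y_max = rect
    return int(x_min <= p[0] <= x_max and y_min <= p[1] <= y_max)
```

Because the learned rectangle always sits inside the target, it can only make false-negative errors; the standard analysis bounds the probability mass of four boundary strips and gives a sample size of m >= (4/epsilon) ln(4/delta).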

Lecture 6 (9/11/14)
covered material: Hoeffding's inequality, PAC sample complexity for finite classes (non-realizable bound), VC dimension
reading: 2.3.1 of Shalev-Shwartz & Ben-David; and 2.3 and 3.2 of Mohri et al.
optional reading: Occam's Razor by Blumer et al. (1986)
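The finite-class bound from Lecture 6 follows from Hoeffding's inequality plus a union bound: for one fixed h, P(|L_S(h) - L_D(h)| > epsilon) <= 2 exp(-2 m epsilon^2), and requiring 2|H| exp(-2 m epsilon^2) <= delta gives m >= ln(2|H|/delta) / (2 epsilon^2). A minimal sketch of that arithmetic (the function name is hypothetical):

```python
import math


def uniform_convergence_samples(h_size: int, eps: float, delta: float) -> int:
    """Sample size after which, with probability >= 1 - delta, every h in a
    finite class of size h_size has |empirical error - true error| <= eps.

    Solves 2 * h_size * exp(-2 * m * eps**2) <= delta for m (Hoeffding + union bound).
    """
    return math.ceil(math.log(2 * h_size / delta) / (2 * eps ** 2))
```

For example, |H| = 1000, epsilon = 0.1, delta = 0.05 gives m = 530. Since every empirical error is then within epsilon of its true error, ERM returns a hypothesis within 2 epsilon of the best in H.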

Lecture 7 (9/16/14)
covered material: Sauer-Shelah lemma, VC bounds for PAC, sample complexity for realizable PAC
reading: 6.1-6.5 of Shalev-Shwartz & Ben-David; and 2.2, 3.3, and 3.4 of Mohri et al.
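The Sauer-Shelah lemma says that a class of VC dimension d can realize at most sum_{i=0}^{d} C(m, i) labelings of any m points, which is at most (e m / d)^d for m >= d >= 1. A short sketch comparing both quantities (function names are hypothetical):

```python
from math import comb, e


def sauer_shelah_bound(m: int, d: int) -> int:
    """Upper bound on the growth function: sum of C(m, i) for i = 0..d.

    math.comb(m, i) is 0 when i > m, so for m <= d this equals 2**m.
    """
    return sum(comb(m, i) for i in range(d + 1))


def relaxed_bound(m: int, d: int) -> float:
    """The looser polynomial bound (e*m/d)**d, valid for m >= d >= 1."""
    return (e * m / d) ** d
```

For d = 3, sauer_shelah_bound(10, 3) is 176 (out of 2^10 = 1024 possible labelings), and the count grows only polynomially in m, which is what drives the VC generalization bounds.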

Lecture 8 (9/18/14)
covered material: weak learning, boosting and equivalence to strong learning
reading: 10.1-10.2 of Shalev-Shwartz & Ben-David; or 6.1-6.2.1 and 6.3.1 of Mohri et al.
other: problem set 1 assigned
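As a sketch of boosting, the code below runs AdaBoost with one-dimensional decision stumps as the weak learner, assuming labels in {-1, +1}. The stump representation and function names are hypothetical. It also tracks the product of 2 sqrt(eps_t (1 - eps_t)), which upper-bounds AdaBoost's training error.

```python
import math
from typing import List, Sequence, Tuple

Stump = Tuple[float, int]  # (threshold, sign): predict sign if x > threshold, else -sign


def stump_predict(stump: Stump, x: float) -> int:
    theta, s = stump
    return s if x > theta else -s


def best_stump(xs: Sequence[float], ys: Sequence[int], w: Sequence[float]) -> Tuple[Stump, float]:
    """Weak learner: exhaustively pick the stump with the smallest weighted error."""
    cuts = sorted(set(xs))
    thresholds = [cuts[0] - 1.0] + cuts  # the first threshold gives a constant classifier
    best: Stump = (thresholds[0], 1)
    best_err = float("inf")
    for theta in thresholds:
        for s in (1, -1):
            err = sum(wi for xi, yi, wi in zip(xs, ys, w)
                      if stump_predict((theta, s), xi) != yi)
            if err < best_err:
                best, best_err = (theta, s), err
    return best, best_err


def adaboost(xs: Sequence[float], ys: Sequence[int], rounds: int):
    """Return the weighted ensemble and the bound prod_t 2*sqrt(eps_t*(1-eps_t))."""
    m = len(xs)
    w = [1.0 / m] * m
    ensemble: List[Tuple[float, Stump]] = []
    bound = 1.0
    for _ in range(rounds):
        stump, eps = best_stump(xs, ys, w)
        eps = min(max(eps, 1e-10), 1 - 1e-10)  # guard against log(0) if a stump is perfect
        alpha = 0.5 * math.log((1 - eps) / eps)
        ensemble.append((alpha, stump))
        bound *= 2 * math.sqrt(eps * (1 - eps))
        # Up-weight misclassified points, down-weight correct ones, then renormalize.
        w = [wi * math.exp(-alpha * yi * stump_predict(stump, xi))
             for xi, yi, wi in zip(xs, ys, w)]
        z = sum(w)
        w = [wi / z for wi in w]
    return ensemble, bound


def ensemble_predict(ensemble: List[Tuple[float, Stump]], x: float) -> int:
    score = sum(alpha * stump_predict(stump, x) for alpha, stump in ensemble)
    return 1 if score >= 0 else -1
```

Because the stump class contains both signs, the best weighted error is never above 1/2, so each factor in the bound is at most 1 and is strictly below 1 whenever the weak learner has a positive edge.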

Lecture 9 (9/23/14)
covered material: game-theoretic view of boosting via the minimax theorem, relationship between edges and margins
reading: 6.3.4 of Mohri et al.
optional reading: The Boosting Approach to Machine Learning: An Overview by Schapire (2003)

Lecture 10 (9/30/14)
covered material: boosting margin bound, intro to support vector machines and primal optimization for separable case
reading: 6.3.2 of Mohri et al.; and 4.2.1 of Mohri et al. or up to 15.1.1 of Shalev-Shwartz & Ben-David
optional reading: How Boosting the Margin Can Also Boost Classifier Complexity by Reyzin and Schapire (2006)

Lecture 11 (10/2/14)
covered material: Lagrange duality (and KKT theorem), support vectors, and dual optimization problem for SVM
reading: 4.2.2, 4.2.3, and B.3 of Mohri et al.

Lecture 12 (10/7/14)
covered material: non-separable SVM primal and dual formulations, hinge and square losses, SVM margin bounds
reading: 15.2 of Shalev-Shwartz & Ben-David; 4.3 and Thm 4.2 and Cor 4.1 of Mohri et al.
optional reading: Support-Vector Networks by Cortes and Vapnik (1995)

Lecture 13 (10/9/14)
covered material: SVM leave-one-out bound; reproducing kernel Hilbert space and kernelized SVM, polynomial and Gaussian kernels
reading: 16.1-16.2 of Shalev-Shwartz & Ben-David; or 5.1-5.3 of Mohri et al.
optional reading: SVMs and Kernel Methods: The New Generation of Learning Machines by Cristianini and Schölkopf (2002)

Lecture 14 (10/14/14)
covered material: parity functions, the statistical query (SQ) model, relationship to PAC
reading: Efficient Noise-Tolerant Learning from Statistical Queries by Kearns (1998)

Lecture 15 (10/16/14)
covered material: SQ learning implies noisy PAC, SQ dimension, SQ lower bounds, and correlational SQ
reading: section 3 of Characterizing SQ Learning: Simplified Notions and Proofs by Szörényi (2009)
optional reading: Weakly Learning DNF and Characterizing SQ Learning using Fourier Analysis by Blum et al. (1994)

Lecture 16 (10/21/14)
covered material: intro to online learning, halving algorithm, weighted majority algorithm
reading: 21.1 of Shalev-Shwartz & Ben-David; or 7.1.2-7.2.2 of Mohri et al.
optional reading: The Weighted Majority Algorithm by Littlestone and Warmuth (1994)

Lecture 17 (10/23/14)
covered material: randomized weighted majority, doubling trick, online-to-batch conversions
reading: 21.2 of Shalev-Shwartz & Ben-David; or 7.2.3 of Mohri et al.
other: begin browsing COLT/ALT and STOC/FOCS/SODA/ITCS for project ideas

Lecture 18 (10/28/14)
covered material: exponential weighted majority, perceptron: mistake bound, dual, and kernelization
reading: 7.2.4 of Mohri et al.; and Shalev-Shwartz & Ben-David or 7.3.1 of Mohri et al.
optional reading: The Perceptron: A Probabilistic Model for Information Storage & Organization in the Brain by Rosenblatt (1958)
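A minimal sketch of the online perceptron without a bias term, assuming labels in {-1, +1} and cycling over the data until an epoch has no mistakes. The function name and epoch cap are hypothetical. If some unit vector separates the data with margin gamma and every example has norm at most R, the total number of mistakes is at most (R/gamma)^2.

```python
from typing import List, Sequence, Tuple


def perceptron(xs: Sequence[Sequence[float]], ys: Sequence[int],
               epochs: int = 100) -> Tuple[List[float], int]:
    """On each mistake (y * <w, x> <= 0) update w <- w + y * x.

    Returns the final weight vector and the total number of mistakes.
    """
    w = [0.0] * len(xs[0])
    mistakes = 0
    for _ in range(epochs):
        clean = True
        for x, y in zip(xs, ys):
            margin = y * sum(wi * xi for wi, xi in zip(w, x))
            if margin <= 0:  # mistake, or point on the decision boundary
                w = [wi + y * xi for wi, xi in zip(w, x)]
                mistakes += 1
                clean = False
        if clean:
            break
    return w, mistakes
```

Since w is always a sum of y_i x_i over the mistakes, predictions only need inner products <x_i, x>, which is what allows replacing them with a kernel in the dual form.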

Lecture 19 (11/4/14)
covered material: the winnow algorithm, feature-efficient learning, application to disjunctions
reading: 7.3.2 of Mohri et al.
other: handout on choosing a topic for projects and presentations
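A minimal sketch of a promotion/demotion variant of Winnow, assuming Boolean features in {0,1}^n, labels in {0,1}, threshold n, and update factor 2. The function name and these parameter choices are assumptions; the constants in the mistake bound depend on the variant. On a false negative, the weights of the active features are multiplied by alpha; on a false positive, they are divided by alpha.

```python
from typing import List, Sequence, Tuple


def winnow(examples: Sequence[Sequence[int]], labels: Sequence[int],
           alpha: float = 2.0) -> Tuple[int, List[float]]:
    """Predict 1 iff sum_i w_i x_i >= n. Returns (mistakes, final weights)."""
    n = len(examples[0])
    w = [1.0] * n
    mistakes = 0
    for x, y in zip(examples, labels):
        guess = 1 if sum(wi * xi for wi, xi in zip(w, x)) >= n else 0
        if guess != y:
            mistakes += 1
            # Promote active features on false negatives, demote them on false positives.
            factor = alpha if y == 1 else 1.0 / alpha
            w = [wi * factor if xi else wi for wi, xi in zip(w, x)]
    return mistakes, w
```

For a monotone disjunction, a false positive has every relevant feature off, so relevant weights are never demoted. Each relevant weight needs only about log2 n promotions to reach the threshold, which is why the mistake count scales with k log n for a k-literal disjunction rather than with n.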

Lecture 20 (11/11/14)
covered material: the bandit setting, explore-exploit tradeoff, importance sampling, begin EXP3
reading: sections 1-3 of The Nonstochastic Multiarmed Bandit Problem by Auer et al. (2002)
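A minimal sketch of EXP3 with rewards in [0, 1], following the update in Auer et al. as summarized here: mix the exponential weights with uniform exploration, observe only the pulled arm's reward x, form the importance-weighted estimate x / p_arm, and multiply that arm's weight by exp(gamma * estimate / K). The reward-callback interface and function name are hypothetical.

```python
import math
import random
from typing import Callable, List


def exp3(reward: Callable[[int, int], float], k: int, horizon: int,
         gamma: float, seed: int = 0) -> List[int]:
    """reward(t, arm) in [0, 1] is revealed only for the pulled arm.

    Returns the sequence of pulled arms.
    """
    rng = random.Random(seed)
    w = [1.0] * k
    pulls = []
    for t in range(horizon):
        total = sum(w)
        p = [(1 - gamma) * wi / total + gamma / k for wi in w]  # uniform exploration mix
        arm = rng.choices(range(k), weights=p)[0]
        x_hat = reward(t, arm) / p[arm]  # unbiased estimate of the pulled arm's reward
        w[arm] *= math.exp(gamma * x_hat / k)
        s = sum(w)
        w = [wi / s for wi in w]  # rescale to avoid overflow; leaves p unchanged
        pulls.append(arm)
    return pulls
```

Dividing by p_arm makes the estimate unbiased for every arm (its expectation over the draw equals the true reward), and the gamma/K exploration term keeps p_arm bounded away from 0 so the estimates stay bounded.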

Lecture 21 (11/13/14)
covered material: finish analysis of EXP3, discussion of EXP4 and high probability variants
reading: sections 1-2 of Contextual Bandit Algorithms with Supervised Learning Guarantees by Beygelzimer et al. (2011)
optional reading: Taming the Monster: A Fast and Simple Algorithm for Contextual Bandits by Agarwal et al. (2014)

Lecture 22 (11/18/14)
covered material: Rademacher complexity, generalization bounds, Massart's lemma
reading: 3.1 and 3.2 of Mohri et al.; or 26.1 of Shalev-Shwartz & Ben-David
optional reading: Rademacher Penalties and Structural Risk Minimization by Koltchinskii (2001)
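For a finite set A of vectors in R^m, the empirical Rademacher complexity is E_sigma[max_{a in A} (1/m) sum_i sigma_i a_i], and Massart's lemma bounds it by r sqrt(2 ln|A|) / m with r = max_{a in A} ||a||_2. The sketch below computes the expectation exactly by enumerating all 2^m sign vectors (feasible only for small m) and compares it with Massart's bound. The function names are hypothetical.

```python
import itertools
import math
from typing import Sequence


def empirical_rademacher(vectors: Sequence[Sequence[float]]) -> float:
    """Exact E_sigma[max_a (1/m) sum_i sigma_i a_i] by enumerating sigma in {-1,1}^m."""
    m = len(vectors[0])
    total = 0.0
    for sigma in itertools.product((-1, 1), repeat=m):
        total += max(sum(s * a for s, a in zip(sigma, v)) for v in vectors) / m
    return total / 2 ** m


def massart_bound(vectors: Sequence[Sequence[float]]) -> float:
    """Massart's lemma: r * sqrt(2 ln |A|) / m, where r is the largest Euclidean norm in A."""
    m = len(vectors[0])
    r = max(math.sqrt(sum(a * a for a in v)) for v in vectors)
    return r * math.sqrt(2 * math.log(len(vectors))) / m
```

For instance, the five labelings that threshold classifiers induce on four sorted points have exact complexity 0.59375, below the Massart bound of about 0.897.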

Lectures 23 - 25 (11/20/14, 11/25/14, 12/2/14)
covered material: student presentations on advanced topics (see the schedule)
other: problem set 3 given out on 11/20/14 and project guidelines given out on 11/25/14