MCS 548 - Mathematical Theory of Artificial Intelligence
University of Illinois - Chicago
Fall 2018

This class will focus on the foundations of machine learning theory. Example topics include inductive inference, query learning, PAC learning and VC theory, Occam's razor, online learning, boosting, support vector machines, bandit algorithms, statistical queries, Rademacher complexity, and neural networks.

This course is represented on the computer science prelim.

Basic Information

Syllabus: pdf
Time and Location: M-W-F 1:00pm - 1:50pm, Lincoln Hall (LH) 104
Instructor Contact Information: Lev Reyzin, SEO 418, (312)-413-3745
Required Textbook: Mehryar Mohri, Afshin Rostamizadeh, and Ameet Talwalkar. Foundations of Machine Learning (available online via UIC library)
Optional Textbook: Shai Shalev-Shwartz and Shai Ben-David. Understanding Machine Learning: From Theory to Algorithms
Office Hours: M 10:00-10:50am, W 11:00-11:50am

Exam Dates

Final Exam: Monday December 10, 1:00-3:00pm in LH 104

Projects and Presentations

project topic selection due 11/5/18; final reports due 12/14/18
presentation schedule

Problem Sets

problem set 1 due 10/5/18
problem set 2 due 10/31/18
problem set 3 due 11/21/18

Lectures and Readings

Note: lectures will have material not covered in the readings.

Lecture 1 (8/27/18)
covered material: intro to the course, preview of learning models
reading: section 7 of Computing Machinery and Intelligence by Turing (1950)

Lecture 2 (8/29/18)
covered material: positive and negative results for learning in the limit from text and informant
reading: Language Identification in the Limit by Gold (1967)
optional reading: Theorem 13.5 from 13.4.1 of Mohri et al.

Lecture 3 (8/31/18)
covered material: efficient exact learning, membership and equivalence queries, the L* algorithm for exact learning of regular languages
reading: 13.3.3 Mohri et al.
next: continuing with L*

Lecture 4 (9/5/18)
covered material: continuing L* algorithm, relating EQ and MQ with learning in the limit
reading: 13.1 and 13.2 of Mohri et al.
optional reading: Learning Regular Sets from Queries and Counterexamples by Angluin (1987)
next: PAC (section 2.1)

Lecture 5 (9/7/18)
covered material: PAC learning of axis-aligned rectangles; PAC guarantees for finite realizable case
reading: 2.1 and 2.2 of Mohri et al.; A Theory of the Learnable by Valiant (1984)
optional reading: Occam's Razor by Blumer et al. (1986)
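
The finite realizable guarantee from 2.2 is easy to compute. A minimal Python sketch (the function name and example numbers below are illustrative, not from the text):

```python
import math

def pac_sample_bound(h_size, epsilon, delta):
    """Sample size sufficient for PAC learning a finite hypothesis class
    in the realizable case: m >= (1/epsilon) * (ln|H| + ln(1/delta))
    guarantees error <= epsilon with probability >= 1 - delta."""
    return math.ceil((math.log(h_size) + math.log(1 / delta)) / epsilon)

# e.g. a class of 2^20 hypotheses with epsilon = delta = 0.1
print(pac_sample_bound(2 ** 20, 0.1, 0.1))  # → 162
```

Note the sample size grows only logarithmically in |H|, which is why even very large finite classes are PAC learnable from modest data.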

Lecture 6 (9/10/18)
covered material: PAC guarantees for inconsistent and agnostic learning; reduction from query learning to PAC + MQ
reading: 2.3, 2.4.1, and 13.3.2 of Mohri et al.

Lecture 7 (9/14/18)
covered material: introduction to Rademacher complexity
reading: beginning of 3.1 of Mohri et al.

Lecture 8 (9/17/18)
covered material: Rademacher generalization bounds
reading: finish 3.1 of Mohri et al.
optional reading: Rademacher Penalties and Structural Risk Minimization by Koltchinskii (2001)

Lecture 9 (9/21/18)
covered material: the growth function, VC-dimension, Sauer's lemma, VC generalization bounds
reading: 3.2 and 3.3 of Mohri et al.
other: problem set 1 assigned
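
Sauer's lemma is simple to evaluate numerically. A small sketch (the helper name is illustrative):

```python
from math import comb

def sauer_bound(m, d):
    """Sauer's lemma: a hypothesis class of VC-dimension d realizes at
    most sum_{i=0}^{d} C(m, i) labelings of any m points."""
    return sum(comb(m, i) for i in range(min(m, d) + 1))

# for m <= d the bound is the full 2^m; beyond that it grows like m^d
print(sauer_bound(3, 3))   # → 8
print(sauer_bound(10, 3))  # → 176
```

The polynomial (rather than exponential) growth of the bound for m > d is exactly what drives the VC generalization bounds covered in lecture.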

Lecture 10 (9/24/18)
covered material: weak learning, boosting and equivalence to strong learning
reading: 6.1 and 6.2 of Mohri et al.
other: slides from lecture
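
As a companion to the lecture, a self-contained AdaBoost sketch over a pool of threshold stumps (the toy data and stump pool below are made up for illustration):

```python
import math

def adaboost(examples, labels, hypotheses, rounds):
    """AdaBoost: maintain a distribution over examples, repeatedly pick
    the hypothesis with lowest weighted error, and output a weighted
    majority vote of the chosen weak hypotheses."""
    m = len(examples)
    weights = [1.0 / m] * m
    ensemble = []  # (alpha, h) pairs
    for _ in range(rounds):
        # weak learner: minimize weighted training error over the pool
        def weighted_error(h):
            return sum(w for w, x, y in zip(weights, examples, labels)
                       if h(x) != y)
        h = min(hypotheses, key=weighted_error)
        err = weighted_error(h)
        if err == 0 or err >= 0.5:
            break
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, h))
        # upweight mistakes, downweight correct examples, renormalize
        weights = [w * math.exp(-alpha * y * h(x))
                   for w, x, y in zip(weights, examples, labels)]
        z = sum(weights)
        weights = [w / z for w in weights]
    return lambda x: 1 if sum(a * g(x) for a, g in ensemble) >= 0 else -1

# toy data on the line: no single stump is consistent, but a vote is
xs = [0, 1, 2, 3, 4, 5]
ys = [1, 1, -1, -1, 1, 1]
stumps = [(lambda t, s: (lambda x: s if x <= t else -s))(t, s)
          for t in range(6) for s in (1, -1)]
f = adaboost(xs, ys, stumps, rounds=10)
print([f(x) for x in xs])  # → [1, 1, -1, -1, 1, 1]
```

Each round's reweighting makes the previously chosen stump exactly 50% accurate on the new distribution, forcing the weak learner to find something new.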

Lecture 11 (9/26/18)
covered material: margins and the margin bound for boosting
reading: 6.3.1 and 6.3.2 of Mohri et al.
optional reading: How Boosting the Margin Can Also Boost Classifier Complexity by Reyzin and Schapire (2006)

Lecture 12 (9/28/18)
covered material: the statistical query (SQ) model, relationship to PAC and noisy PAC
reading: Efficient Noise-Tolerant Learning from Statistical Queries by Kearns (1998)
other: slides from lecture

Lecture 13 (10/1/18)
covered material: correlational and honest SQs, SQ dimension and associated lower bounds
reading: section 3 of Characterizing SQ Learning: Simplified Notions and Proofs by Szörényi (2009)
optional reading: Weakly Learning DNF and Characterizing SQ Learning using Fourier Analysis by Blum et al. (1994)

Lecture 14 (10/3/18)
covered material: relationship between SQ, the noisy parity problem (LPN), and learning
optional reading: Noise-Tolerant Learning, the Parity Problem, and the Statistical Query Model by Blum, Kalai, and Wasserman (2003)

Lecture 15 (10/5/18)
covered material: applications of statistical queries to optimization, evolvability, and differential privacy
optional reading: Statistical Algorithms and a Lower Bound for Detecting Planted Cliques by Feldman et al. (2013)

Lecture 16 (10/8/18)
covered material: intro to support vector machines, the primal optimization problem for the separable case
reading: 4.1 and 4.2.1 of Mohri et al.

Lecture 17 (10/10/18)
covered material: Lagrange duality (and KKT theorem), support vectors, and dual optimization problem for SVM
reading: 4.2.2, 4.2.3, and B.3 of Mohri et al.

Lecture 18 (10/12/18)
covered material: leave-one-out bounds, non-separable SVM primal and dual, hinge and squared hinge losses
reading: 4.3 of Mohri et al.
optional reading: Support-Vector Networks by Cortes and Vapnik (1995)

Lecture 19 (10/15/18)
covered material: SVM margin bounds, intro to kernelized SVM
reading: 4.4 of Mohri et al. (Rademacher bounds optional)

Lecture 20 (10/17/18)
covered material: PDS kernels, polynomial and Gaussian kernels, reproducing kernel Hilbert spaces, Representer theorem
reading: relevant parts of 5.1 - 5.3 of Mohri et al.
optional reading: 5.3.3 of Mohri et al.
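
The two standard PDS kernels from this material are easy to write down. A short sketch (the points and parameters below are illustrative):

```python
import math

def gaussian_kernel(x, y, sigma=1.0):
    """K(x, y) = exp(-||x - y||^2 / (2 sigma^2)), a PDS kernel."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2 * sigma ** 2))

def polynomial_kernel(x, y, c=1.0, d=2):
    """K(x, y) = (<x, y> + c)^d, PDS for c >= 0 and integer d >= 1."""
    return (sum(a * b for a, b in zip(x, y)) + c) ** d

# Gram matrix of three points: symmetric, with ones on the diagonal
pts = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0)]
K = [[gaussian_kernel(p, q) for q in pts] for p in pts]
```

Positive definiteness of the Gram matrix K is what guarantees these functions correspond to inner products in some RKHS.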

Lecture 21 (10/19/18)
covered material: introduction to online learning, halving algorithm
reading: 7.1 of Mohri et al.
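
A minimal sketch of the halving algorithm over a finite class of threshold functions (the class and the example stream below are made up):

```python
def halving(hypotheses, stream):
    """Halving algorithm: predict by majority vote over the version
    space, then discard every hypothesis inconsistent with the revealed
    label.  Each mistake at least halves the version space, so the
    number of mistakes is at most log2 |H| in the realizable case."""
    version_space = list(hypotheses)
    mistakes = 0
    for x, y in stream:
        ones = sum(1 for h in version_space if h(x) == 1)
        pred = 1 if 2 * ones >= len(version_space) else -1
        if pred != y:
            mistakes += 1
        version_space = [h for h in version_space if h(x) == y]
    return mistakes

# H = thresholds on {0,...,7}; target is "x >= 5", so at most log2 9 mistakes
hyps = [(lambda t: (lambda x: 1 if x >= t else -1))(t) for t in range(9)]
stream = [(x, 1 if x >= 5 else -1) for x in (4, 6, 5, 2)]
print(halving(hyps, stream))  # → 1
```

On the first example (x = 4) the majority errs, and the version space drops from nine hypotheses to four, illustrating the halving guarantee.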

Lecture 22 (10/22/18)
covered material: weighted majority algorithm, mistake bounds
reading: 7.2.1 and 7.2.2 of Mohri et al.
optional reading: The Weighted Majority Algorithm by Littlestone and Warmuth (1994)

Lecture 23 (10/24/18)
covered material: randomized weighted majority
reading: 7.2.3 of Mohri et al.
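
Randomized weighted majority in a few lines (the expert pool and rounds below are illustrative, and the stated bound is only rough):

```python
import random

def randomized_weighted_majority(expert_advice, outcomes, eta=0.5):
    """Randomized weighted majority: follow a randomly chosen expert,
    sampled in proportion to its weight, and multiply the weight of
    each expert that errs by (1 - eta).  Expected mistakes are roughly
    (1 + eta) * (best expert's mistakes) + ln(n) / eta."""
    n = len(expert_advice[0])
    weights = [1.0] * n
    mistakes = 0
    for advice, outcome in zip(expert_advice, outcomes):
        chosen = random.choices(range(n), weights=weights)[0]
        if advice[chosen] != outcome:
            mistakes += 1
        # penalize every expert that was wrong this round
        weights = [w * (1 - eta) if a != outcome else w
                   for w, a in zip(weights, advice)]
    return mistakes, weights

# three experts: always right, always wrong, always predicts 0
outcomes = [t % 2 for t in range(20)]
advice = [(o, 1 - o, 0) for o in outcomes]
m, w = randomized_weighted_majority(advice, outcomes)
```

Randomizing over experts (rather than taking a deterministic vote) is what improves the leading constant from 2 toward 1 in the mistake bound.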

Lecture 24 (10/26/18)
covered material: exponentially weighted average; perceptron: mistake bound, dual form, and kernelization; relation to stochastic gradient descent
reading: 7.2.4 and 7.3.1 of Mohri et al.
optional reading: The Perceptron: A Probabilistic Model for Information Storage and Organization in the Brain by Rosenblatt (1958)
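
A perceptron sketch matching the mistake-bound view from lecture (the 2D toy data is made up; a constant feature plays the role of the bias):

```python
def perceptron(data, labels, epochs=100):
    """Perceptron: on each example with y * <w, x> <= 0, update
    w <- w + y * x.  On separable data with margin gamma and radius R,
    the total number of mistakes is at most (R / gamma)^2."""
    w = [0.0] * len(data[0])
    mistakes = 0
    for _ in range(epochs):
        clean = True
        for x, y in zip(data, labels):
            if y * sum(wi * xi for wi, xi in zip(w, x)) <= 0:
                w = [wi + y * xi for wi, xi in zip(w, x)]
                mistakes += 1
                clean = False
        if clean:  # a full pass with no mistakes: data is separated
            break
    return w, mistakes

X = [(1.0, 2.0, 1.0), (2.0, 3.0, 1.0), (-1.0, -1.5, 1.0), (-2.0, -1.0, 1.0)]
Y = [1, 1, -1, -1]
w, m = perceptron(X, Y)
```

Because the update only ever adds multiples of examples, the final w is a linear combination of training points, which is the dual view that makes kernelization immediate.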

Lecture 25 (10/29/18)
covered material: the Winnow algorithm, feature-efficient learning
reading: 7.3.2 of Mohri et al.
optional reading: Learning Quickly When Irrelevant Attributes Abound: A New Linear-threshold Algorithm by Littlestone (1988)
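
A Winnow sketch for monotone disjunctions (the toy data is made up; the target here is x1 ∨ x3):

```python
def winnow(data, labels, epochs=100):
    """Winnow: multiplicative weight updates with threshold n.  For a
    target disjunction of k out of n variables the mistake bound is
    O(k log n), versus Omega(n) for additive perceptron-style updates,
    which is the sense in which Winnow is feature-efficient."""
    n = len(data[0])
    w = [1.0] * n
    mistakes = 0
    for _ in range(epochs):
        clean = True
        for x, y in zip(data, labels):
            pred = 1 if sum(wi * xi for wi, xi in zip(w, x)) >= n else 0
            if pred != y:
                mistakes += 1
                clean = False
                if y == 1:  # promotion: double the active weights
                    w = [wi * 2 if xi else wi for wi, xi in zip(w, x)]
                else:       # demotion: halve the active weights
                    w = [wi / 2 if xi else wi for wi, xi in zip(w, x)]
        if clean:
            break
    return w, mistakes

# labels consistent with the disjunction x1 OR x3 over {0,1}^5
X = [(1, 0, 0, 0, 0), (0, 0, 1, 0, 0), (0, 1, 0, 1, 0),
     (0, 0, 0, 0, 1), (1, 0, 1, 0, 0), (0, 0, 0, 0, 0)]
Y = [1, 1, 0, 0, 1, 0]
w, m = winnow(X, Y)
```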

Lecture 26 (10/31/18)
covered material: Littlestone dimension, introduction to the bandit setting
reading: 2.1.2 from these lecture notes by Daniel Hsu

Lecture 27 (11/2/18) (talk by Avi Wigderson)
covered material: singularity of symbolic matrices and alternating minimization for nonconvex optimization
other: at 3pm in SEO 636, please note the unusual time/place

Lecture 28 (11/2/18)
covered material: ε-greedy bandit strategies, importance sampling, EXP3
reading: these lecture notes by Jacob Abernethy
optional reading: sections 1-3 of The Nonstochastic Multiarmed Bandit Problem by Auer et al. (2002)
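
An EXP3 sketch tying together the pieces from lecture: exponential weights, uniform exploration, and importance-weighted reward estimates (the bandit instance below is made up):

```python
import math
import random

def exp3(reward_fn, n_arms, horizon, gamma=0.1):
    """EXP3: mix exponential weights with uniform exploration; dividing
    the observed reward by the pull probability (importance sampling)
    keeps the reward estimate unbiased for every arm, even though only
    the pulled arm's reward is observed."""
    weights = [1.0] * n_arms
    total_reward = 0.0
    for t in range(horizon):
        total_w = sum(weights)
        probs = [(1 - gamma) * w / total_w + gamma / n_arms
                 for w in weights]
        arm = random.choices(range(n_arms), weights=probs)[0]
        reward = reward_fn(arm, t)  # assumed to lie in [0, 1]
        total_reward += reward
        estimate = reward / probs[arm]  # importance-weighted estimate
        weights[arm] *= math.exp(gamma * estimate / n_arms)
    return total_reward, weights

# deterministic toy bandit: only arm 0 ever pays
total, weights = exp3(lambda arm, t: 1.0 if arm == 0 else 0.0,
                      n_arms=5, horizon=300)
```

The γ/n_arms exploration floor caps how large the importance weights 1/p can get, which controls the variance of the estimates.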

Lecture 29 (11/5/18)
covered material: explore-exploit tradeoff, EXP3.P for multiarmed bandits and EXP4 for multiarmed bandits with expert advice
reading: sections 4-7 of The Nonstochastic Multiarmed Bandit Problem by Auer et al. (2002)
optional reading: Contextual Bandit Algorithms with Supervised Learning Guarantees by Beygelzimer et al. (2011)

Lecture 30 (11/5/18) (talk by Steve Hanneke)
covered material: principles of active learning
optional reading: Steve Hanneke's doctoral dissertation
other: at 3pm in SEO 427, please note the unusual time/place

Lecture 31 (11/7/18)
covered material: introduction to regression
reading: 10.1 of Mohri et al.

Lecture 32 (11/9/18)
covered material: pseudo-dimension, Rademacher and dimension regression bounds
reading: 10.2 of Mohri et al.
other: problem set 3 assigned

Lecture 33 (11/12/18)
covered material: least squares and linear regression, Lasso, and Rademacher bounds for regularized regressors
reading: 10.3.1 of Mohri et al.
optional reading: Regression Shrinkage and Selection via the Lasso by Tibshirani (1996)

Lecture 34 (11/14/18)
covered material: multiclass prediction, graph and Natarajan dimension, reductions to binary
reading: 8.1, 8.4.1, and 8.4.2 of Mohri et al.
optional reading: Multiclass Learnability and the ERM Principle by Daniely et al. (2011)

Lecture 35 (11/16/18)
covered material: bias-variance decomposition for mean squared error
reading: Cynthia Rudin's lecture notes

Lecture 36 (11/19/18)
covered material: odds, log-odds, and logistic regression
reading: 6.2.3 of Mohri et al. and these lecture notes
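
A gradient-descent sketch of logistic regression connecting log-odds to the sigmoid (the toy data and step size are illustrative):

```python
import math

def logistic_regression(data, labels, lr=0.1, epochs=500):
    """Logistic regression: model the log-odds log(p / (1 - p)) as
    <w, x>, so p(y=1 | x) = 1 / (1 + exp(-<w, x>)); minimize the
    log-loss by batch gradient descent, whose per-example gradient
    is (p - y) * x."""
    d = len(data[0])
    w = [0.0] * d
    for _ in range(epochs):
        grad = [0.0] * d
        for x, y in zip(data, labels):
            z = sum(wi * xi for wi, xi in zip(w, x))
            p = 1.0 / (1.0 + math.exp(-z))
            for j in range(d):
                grad[j] += (p - y) * x[j]
        w = [wi - lr * g / len(data) for wi, g in zip(w, grad)]
    return w

# 1D toy data with a constant bias feature; labels in {0, 1}
X = [(x, 1.0) for x in (-2.0, -1.5, -1.0, 1.0, 1.5, 2.0)]
Y = [0, 0, 0, 1, 1, 1]
w = logistic_regression(X, Y)
```

The decision boundary p = 1/2 corresponds exactly to log-odds zero, i.e. the hyperplane <w, x> = 0.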

Lecture 37 (11/26/18)
covered material: fundamentals of neural networks
reading: lecture notes by Shalev-Shwartz

Lecture 38 (11/26/18)
covered material: deep learning
reading: Representation Benefits of Deep Feedforward Networks by Telgarsky (2015)
optional reading: Size-Independent Sample Complexity of Neural Networks by Golowich et al. (2018)

Lectures 39 - 43 (11/28/18, 11/30/18, 12/3/18, 12/5/18, 12/7/18)
covered material: student presentations on advanced topics (see schedule)