MCS 548 - Mathematical Theory of Artificial Intelligence
University of Illinois - Chicago
Fall 2020


This course will introduce some of the central topics in computational learning theory, a field that approaches the question of whether machines can learn from the perspective of theoretical computer science. We will study well-defined mathematical models of learning in which it is possible to give precise and rigorous analyses of learning problems and algorithms. A major focus of the course will be the computational efficiency of learning in these models. We will develop some provably efficient algorithms and explain why such provably efficient algorithms are unlikely to exist in other models.

Example topics include inductive inference, query learning, PAC learning and VC theory, Occam's razor, online learning, boosting, support vector machines, bandit algorithms, statistical queries, Rademacher complexity, and neural networks.

This course is represented on the computer science prelim.

Basic Information

Syllabus: pdf
Time and Location: M-W-F 1:00pm - 1:50pm, online
Instructor Contact Information: Lev Reyzin, SEO 418, (312)-413-3745
Required Textbook: Mehryar Mohri, Afshin Rostamizadeh, and Ameet Talwalkar. Foundations of Machine Learning, second edition (available free online)
Optional Textbook: Shai Shalev-Shwartz and Shai Ben-David. Understanding Machine Learning: From Theory to Algorithms (available free online)
Office Hours: T 9:00-9:50am, F 2:00-2:50pm, online

Presentations

schedule

Problem Sets

problem set 1 due 9/25/20
problem set 2 due 10/19/20
problem set 3 due 11/18/20 (extended from 11/16/20)

Lectures and Readings

Note: lectures will have material not covered in the readings.

Lecture 1 (8/24/20)
covered material: intro to the course, preview of learning models
reading: section 7 of Computing Machinery and Intelligence by Turing (1950)

Lecture 2 (8/26/20)
covered material: introduction to PAC learning
reading: A Theory of the Learnable by Valiant (1984)

Lecture 3 (8/28/20)
covered material: PAC learning of axis-aligned rectangles
reading: 2.1 of Mohri et al.

Lecture 4 (8/31/20)
covered material: PAC guarantee for the finite realizable case
reading: 2.2 of Mohri et al.
optional reading: Occam's Razor by Blumer et al. (1986)

Lecture 5 (9/2/20)
covered material: PAC learning without a perfect predictor
reading: 2.3 of Mohri et al.
optional reading: begin chapters 2 and 3 of Shalev-Shwartz and Ben-David for a different perspective on PAC learning

Lecture 6 (9/4/20)
covered material: agnostic PAC learning bounds
reading: 2.4 and 2.5 of Mohri et al.
optional reading: continue chapters 2 and 3 of Shalev-Shwartz and Ben-David for a different perspective on PAC learning

Lecture 7 (9/9/20)
covered material: McDiarmid's inequality and its relationship to Hoeffding's bound
reading: D.1, D.7 of Mohri et al.

Lecture 8 (9/11/20)
covered material: Rademacher generalization bounds
reading: 3.1 of Mohri et al.
optional reading: Rademacher Penalties and Structural Risk Minimization by Koltchinskii (2001)

Lecture 9 (9/14/20)
covered material: the growth function, the maximal inequality, Massart's lemma
reading: 3.2, D.10 of Mohri et al.

Lecture 10 (9/16/20)
covered material: VC-dimension, Sauer's lemma, VC generalization bounds
reading: 3.3, 3.4 of Mohri et al.

Lecture 11 (9/18/20)
covered material: intro to weak learning, boosting the confidence
reading: 7.1 of Mohri et al.

Lecture 12 (9/21/20)
covered material: the boosting framework, AdaBoost
reading: 7.2 of Mohri et al.
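The AdaBoost update from 7.2 fits in a few lines. Below is a hedged sketch, not the textbook's pseudocode verbatim: the `weak_learn` callback and its signature are hypothetical, and labels are assumed to be in {-1, +1}.

```python
import numpy as np

def adaboost(X, y, weak_learn, T):
    """AdaBoost sketch: y in {-1,+1}; weak_learn(X, y, w) is a hypothetical
    callback returning a classifier h with weighted error < 1/2 under w."""
    n = len(y)
    w = np.full(n, 1.0 / n)                  # uniform initial distribution
    hypotheses, alphas = [], []
    for _ in range(T):
        h = weak_learn(X, y, w)
        pred = np.array([h(x) for x in X])
        eps = max(w[pred != y].sum(), 1e-10)  # weighted error (clipped to avoid log(inf))
        alpha = 0.5 * np.log((1 - eps) / eps)
        w *= np.exp(-alpha * y * pred)        # up-weight examples h gets wrong
        w /= w.sum()                          # renormalize to a distribution
        hypotheses.append(h)
        alphas.append(alpha)
    # final hypothesis: sign of the alpha-weighted vote
    return lambda x: np.sign(sum(a * h(x) for a, h in zip(alphas, hypotheses)))
```

The final hypothesis is a weighted vote of the weak hypotheses; the proof covered in Lecture 13 shows its training error drops exponentially in T when each weak learner beats random guessing.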

Lecture 13 (9/23/20)
covered material: proof that AdaBoost converts weak learners to strong learners
reading: 7.3.1 of Mohri et al.

Lecture 14 (9/25/20)
covered material: margins and the margin bound for boosting
reading: 7.3.2 to 7.3.4 of Mohri et al.
optional reading: chapter 10 of Shalev-Shwartz and Ben-David for a different perspective on boosting

Lecture 15 (9/28/20)
covered material: went over problem set 1 (asynchronous)

Lecture 16 (9/30/20)
covered material: introduction to statistical queries (SQ) and PAC under classification noise
reading: sections 1 and 2 of this survey

Lecture 17 (10/2/20)
covered material: statistical query dimension (SQ-DIM) as a lower bound for SQ learning
reading: section 3.1 of this survey

Lecture 18 (10/5/20)
covered material: non-SQ learnable classes, SQ upper bounds, Gaussian elimination for parities
reading: sections 3.2 and 3.3 of this survey

Lecture 19 (10/7/20)
covered material: intro to support vector machines (SVM), the primal optimization for separable case
reading: 5.1 of Mohri et al.

Lecture 20 (10/9/20)
covered material: duality in constrained optimization
reading: B.3 of Mohri et al.

Lecture 21 (10/12/20)
covered material: primal and dual optimization problems for SVM
reading: 5.2 of Mohri et al.

Lecture 22 (10/14/20)
covered material: talk at the Simons Institute by Michael Kearns on "The Ethical Algorithm"
optional reading: this article by Kearns and Roth (UPenn)

Lecture 23 (10/16/20)
covered material: non-separable SVM primal and dual
reading: 5.3 of Mohri et al.
optional reading: Support-Vector Networks by Cortes and Vapnik (1995)

Lecture 24 (10/19/20)
covered material: intro to adaptive data analysis (asynchronous lecture)
optional reading: section 4.3 of this survey

Lecture 25 (10/21/20)
covered material: SVM margin bounds, hinge loss as an upper bound to 0-1 loss
reading: 5.4 of Mohri et al. (note these proofs are longer than what I presented in lecture)

Lecture 26 (10/23/20)
covered material: kernelization, PDS kernels, polynomial and Gaussian kernels, the representer theorem
reading: 6.1 and 6.2.1 of Mohri et al.
optional reading: 6.2.2 and 6.3 of Mohri et al.

Lecture 27 (10/26/20)
covered material: introduction to online learning, the notion of regret
reading: 8.1 of Mohri et al.

Lecture 28 (10/28/20)
covered material: mistake bounds, online learning in the finite and realizable case, the Halving algorithm
reading: 8.2.1 of Mohri et al.
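The Halving algorithm's mistake bound is easy to see in code: each mistake eliminates at least half of the version space, so there can be at most log2(|H|) mistakes in the realizable case. A minimal sketch (function and variable names hypothetical):

```python
def halving(hypotheses, stream):
    """Halving algorithm sketch: predict the majority vote of all hypotheses
    still consistent with the data; each mistake at least halves the version
    space.  `hypotheses` is a list of functions x -> {0,1}; `stream` yields
    (x, y) pairs with y given by some target hypothesis in the list."""
    version_space = list(hypotheses)
    mistakes = 0
    for x, y in stream:
        votes = [h(x) for h in version_space]
        pred = 1 if 2 * sum(votes) > len(votes) else 0  # majority vote
        if pred != y:
            mistakes += 1
        # keep only hypotheses consistent with the revealed label
        version_space = [h for h in version_space if h(x) == y]
    return mistakes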

Lecture 29 (10/30/20)
covered material: the Weighted Majority (WM) algorithm, lower bound against deterministic online learning algorithms
reading: 8.2.2 of Mohri et al.
optional reading: The Weighted Majority Algorithm by Littlestone and Warmuth (1994)

Lecture 30 (11/2/20)
covered material: the Randomized Weighted Majority (RWM) algorithm
reading: 8.2.3 of Mohri et al.
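A sketch of the RWM update: follow expert i with probability proportional to its weight, and multiply the weight of every expert that errs by beta. As an illustrative choice (not from the text), this version tracks the expected 0/1 loss directly instead of sampling an expert:

```python
import numpy as np

def rwm_expected_loss(losses, beta=0.5):
    """Randomized Weighted Majority sketch.  `losses` is a T x n array of
    0/1 entries (row t = the experts' losses in round t).  Returns the
    learner's total expected loss."""
    n = losses.shape[1]
    w = np.ones(n)
    expected = 0.0
    for round_losses in losses:
        p = w / w.sum()               # probability of following each expert
        expected += p @ round_losses  # expected loss this round
        w *= beta ** round_losses     # penalize experts that erred
    return expected
```

With a perfect expert, the expected number of mistakes stays below (ln n)/(1 - beta), which the test below checks on a toy instance.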

Lecture 31 (11/4/20)
covered material: online-to-batch conversion, the Exponentially Weighted Average (EWA) algorithm
reading: 8.2.3 and 8.4 of Mohri et al.

Lecture 32 (11/6/20)
covered material: the Perceptron: mistake bound, dual form, and kernelization
reading: 8.3.1 of Mohri et al.
optional reading: The Perceptron: A Probabilistic Model for Information Storage & Organization in the Brain by Rosenblatt (1958)
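The primal Perceptron update is short enough to sketch. This hedged illustration (names hypothetical) assumes labels in {-1, +1} and cycles over the data until a mistake-free pass:

```python
import numpy as np

def perceptron(X, y, max_epochs=100):
    """Perceptron sketch: on each mistake, update w <- w + y_i x_i.
    For linearly separable data the number of mistakes is at most
    (R / gamma)^2, where R = max ||x_i|| and gamma is the margin,
    so the loop terminates."""
    w = np.zeros(X.shape[1])
    for _ in range(max_epochs):
        mistakes = 0
        for xi, yi in zip(X, y):
            if yi * (w @ xi) <= 0:  # mistake (or exactly on the boundary)
                w += yi * xi
                mistakes += 1
        if mistakes == 0:           # a clean pass: all points classified correctly
            break
    return w
```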

Lecture 33 (11/9/20)
covered material: online learning with infinite function classes, tree shattering
reading: 21.1 up to 21.1.1 of Shalev-Shwartz and Ben-David

Lecture 34 (11/11/20)
covered material: Littlestone dimension (L-dim), Standard Optimal Algorithm (SOA), non-realizable case
reading: 21.1.1 of Shalev-Shwartz and Ben-David

Lecture 35 (11/13/20)
covered material: introduction to bandit learning, epsilon-first and epsilon-greedy algorithms
reading: 1.1 and 1.2 of this book by Alex Slivkins (MSR)
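An epsilon-greedy sketch run against a simulated Bernoulli bandit. The simulator and all names here are hypothetical illustration; the algorithm itself only ever sees the realized rewards:

```python
import numpy as np

def epsilon_greedy(means, T, eps=0.1, rng=None):
    """Epsilon-greedy sketch: with probability eps pull a uniformly random
    arm (explore), otherwise pull the arm with the best empirical mean
    (exploit).  `means` holds each arm's true Bernoulli mean and is used
    only to simulate rewards.  Returns each arm's pull count."""
    rng = rng or np.random.default_rng(0)
    k = len(means)
    counts = np.zeros(k)
    sums = np.zeros(k)
    for t in range(T):
        if rng.random() < eps or counts.min() == 0:
            arm = rng.integers(k)                   # explore (or no data yet)
        else:
            arm = int(np.argmax(sums / counts))     # exploit empirical best
        reward = float(rng.random() < means[arm])   # simulated Bernoulli reward
        counts[arm] += 1
        sums[arm] += reward
    return counts
```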

Lecture 36 (11/16/20)
covered material: importance weighting, EXP3 for adversarial multi-armed bandits, ensuring exploration to reduce variance
reading: these notes by Nicholas Harvey (UBC)
optional reading: The Nonstochastic Multiarmed Bandit Problem by Auer et al. (2002)
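A hedged sketch of EXP3 showing the two ingredients from lecture: importance-weighted loss estimates (dividing the observed loss by the probability of pulling that arm makes the estimate unbiased for the full loss vector), and a uniform exploration mixture that keeps every probability bounded away from zero so the estimates' variance stays controlled. Parameter names are hypothetical:

```python
import numpy as np

def exp3(n_arms, T, get_loss, eta=0.1, gamma=0.05, rng=None):
    """EXP3 sketch for adversarial bandits.  `get_loss(t, arm)` reveals
    only the pulled arm's loss in [0, 1].  Returns the vector of
    cumulative importance-weighted loss estimates."""
    rng = rng or np.random.default_rng(0)
    L = np.zeros(n_arms)                   # cumulative loss estimates
    for t in range(T):
        q = np.exp(-eta * (L - L.min()))   # exponential weights (shifted for stability)
        q /= q.sum()
        p = (1 - gamma) * q + gamma / n_arms  # mix in uniform exploration
        arm = rng.choice(n_arms, p=p)
        loss = get_loss(t, arm)               # only this arm's loss is observed
        L[arm] += loss / p[arm]               # unbiased importance-weighted estimate
    return L
```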

Lecture 37 (11/18/20)
covered material: intro to regression, Rademacher and pseudo-dimension bounds
reading: 11.1 and 11.2 of Mohri et al.

Lecture 38 (11/20/20)
covered material: least squares, linear regression, regularization, kernel ridge regression, Lasso
reading: 11.3 of Mohri et al.
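The ridge regression estimator from 11.3 has a one-line closed form. A sketch assuming a plain design matrix X and target vector y (no kernel, no intercept handling):

```python
import numpy as np

def ridge_regression(X, y, lam=1.0):
    """Ridge regression sketch: minimize ||Xw - y||^2 + lam * ||w||^2.
    The objective is strongly convex for lam > 0, and setting its
    gradient to zero gives w = (X^T X + lam I)^{-1} X^T y."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)
```

Solving the linear system directly (rather than forming the inverse) is the standard numerically stable choice.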

Lecture 39 (11/23/20)
covered material: bias-variance decomposition for mean squared error
reading: these notes by Cynthia Rudin (Duke)
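For a test point x with y = f(x) + ε, where E[ε] = 0 and Var(ε) = σ², and a predictor f̂ learned from a random training sample, the decomposition from lecture reads:

```latex
\mathbb{E}\left[(\hat{f}(x) - y)^2\right]
  = \underbrace{\left(\mathbb{E}[\hat{f}(x)] - f(x)\right)^2}_{\text{bias}^2}
  + \underbrace{\mathbb{E}\left[\left(\hat{f}(x) - \mathbb{E}[\hat{f}(x)]\right)^2\right]}_{\text{variance}}
  + \underbrace{\sigma^2}_{\text{noise}}
```

The expectations are over the draw of the training sample and the noise; the cross terms vanish because ε is mean-zero and independent of the training sample.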

Lecture 40 (11/25/20)
covered material: other models, neural networks, deep learning
reading: these lecture notes by Shalev-Shwartz (Hebrew University)

Lectures 41-43 (11/30/20, 12/2/20, 12/4/20)
covered material: student presentations of research papers
other: see schedule