MCS 548 - Mathematical Theory of Artificial Intelligence
University of Illinois - Chicago
Fall 2025


This course will introduce some of the central topics in computational learning theory, a field that approaches the question of whether machines can learn from the perspective of theoretical computer science. We will study well-defined, rigorous mathematical models of learning in which it is possible to give precise analyses of learning problems and algorithms. A major focus of the course will be the computational efficiency of learning in these models. We will develop some provably efficient algorithms and explain why such provable guarantees are unlikely for other models.

Example topics include inductive inference, query learning, PAC learning and VC theory, Occam's razor, online learning, boosting, support vector machines, bandit algorithms, statistical queries, Rademacher complexity, and neural networks.

This course is represented on the computer science prelim.

Basic Information

Syllabus: pdf
Time and Location: M-W-F 1:00pm - 1:50pm, Computer Design Research and Learning Center (CDRLC) 1405
Instructor Contact Information: Lev Reyzin, SEO 417, (312)-413-3745
Primary Textbook: Mehryar Mohri, Afshin Rostamizadeh, and Ameet Talwalkar. Foundations of Machine Learning, second edition (available free online)
Secondary Textbook: Shai Shalev-Shwartz and Shai Ben-David. Understanding Machine Learning: From Theory to Algorithms (available free online)
Office Hours: TBD
Piazza site: please sign up via this link

Presentations

choosing a project
choosing a paper to present
schedule

Problem Sets

problem set 1 due 9/29/25
problem set 2 due 10/24/25
problem set 3 due 12/1/25

Lectures and Readings

Note: lectures will have material not covered in the readings.

lecture 1 (8/25/25)
covered material: intro to the course, preview of learning models
reading: section 7 of Computing Machinery and Intelligence by Turing (1950)

lecture 2 (8/27/25)
covered material: introduction to PAC learning
reading: A Theory of the Learnable by Valiant (1984)

lecture 3 (8/29/25)
covered material: PAC learning for axis-aligned rectangles and the finite realizable case
reading: 2.1 and 2.2 of Mohri et al.
optional reading: Occam's Razor by Blumer et al. (1986)

lecture 4 (9/3/25)
covered material: Hoeffding bounds
reading: 2.3 and D.1 of Mohri et al.
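
A quick numerical sanity check of the bound (an illustrative sketch, not from the text): the two-sided Hoeffding inequality says the empirical mean of n i.i.d. [0, 1]-valued draws deviates from its expectation by more than eps with probability at most 2*exp(-2*n*eps^2).

```python
import math
import random

def hoeffding_bound(n, eps):
    """Two-sided Hoeffding bound for the mean of n i.i.d. [0, 1] variables."""
    return 2 * math.exp(-2 * n * eps * eps)

def deviation_frequency(n, eps, p=0.5, trials=2000, seed=0):
    """Empirical frequency with which the mean of n Bernoulli(p) draws
    lands more than eps away from p."""
    rng = random.Random(seed)
    bad = 0
    for _ in range(trials):
        mean = sum(rng.random() < p for _ in range(n)) / n
        if abs(mean - p) > eps:
            bad += 1
    return bad / trials
```

For n = 100 and eps = 0.1 the bound is 2e^{-2}, about 0.27, while the simulated frequency is far smaller; Hoeffding is loose but distribution-free.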

lecture 5 (9/5/25)
covered material: PAC learning without a perfect predictor
reading: 2.4 of Mohri et al.
optional reading: begin chapters 2 and 3 of Shalev-Shwartz and Ben-David for a different perspective on PAC learning

lecture 6 (9/8/25)
covered material: McDiarmid's inequality, Rademacher generalization bounds
reading: 3.1, D.7 of Mohri et al.
optional reading: Rademacher Penalties and Structural Risk Minimization by Koltchinskii (2001)

lecture 7 (9/10/25)
covered material: the growth function, Massart's lemma
reading: 3.2 of Mohri et al.

lecture 8 (9/12/25)
covered material: VC-dimension, Sauer's lemma, VC generalization bounds
reading: 3.3, 3.4 of Mohri et al.

lecture 9 (9/15/25)
covered material: intro to support vector machines (SVM), the primal and dual quadratic programs
reading: 5.1, 5.2 of Mohri et al.

lecture 10 (9/17/25)
covered material: leave-one-out error, non-separable SVM primal and dual
reading: 5.3 of Mohri et al.
optional reading: Support-Vector Networks by Cortes and Vapnik (1995)

lecture 11 (9/19/25)
covered material: SVM margin bounds, hinge loss as an upper bound to 0-1 loss
reading: 5.4 of Mohri et al. (note these proofs are longer than what I presented in lecture)

lecture 12 (9/22/25)
covered material: kernelization, PDS kernels, polynomial and Gaussian kernels, the representer theorem
reading: 6.1 and 6.2.1 of Mohri et al.
optional reading: 6.2.2 and 6.3 of Mohri et al.

lecture 13 (9/24/25)
covered material: weak learning, boosting the confidence, AdaBoost
reading: 7.1 of Mohri et al.
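
A minimal AdaBoost sketch over a finite pool of weak hypotheses (illustrative only; the function names and the stump pool in the example are my own, and the textbook's presentation is more general):

```python
import math

def adaboost(points, labels, weak_hyps, rounds):
    """AdaBoost: maintain a distribution over examples, repeatedly pick the
    weak hypothesis with smallest weighted error, and reweight so that
    misclassified examples get more mass.  labels are in {-1, +1}.
    Returns a list of (alpha, h) pairs."""
    m = len(points)
    dist = [1.0 / m] * m
    ensemble = []
    for _ in range(rounds):
        def weighted_error(h):
            return sum(d for d, x, y in zip(dist, points, labels) if h(x) != y)
        h = min(weak_hyps, key=weighted_error)
        err = weighted_error(h)
        if err == 0:                      # a perfect weak hypothesis: stop early
            ensemble.append((1.0, h))
            break
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, h))
        # exponential reweighting, then renormalize
        dist = [d * math.exp(-alpha * y * h(x))
                for d, x, y in zip(dist, points, labels)]
        z = sum(dist)
        dist = [d / z for d in dist]
    return ensemble

def boosted_predict(ensemble, x):
    """Sign of the alpha-weighted vote."""
    return 1 if sum(a * h(x) for a, h in ensemble) >= 0 else -1
```

On 1-D data labeled +1 outside an interval, no single threshold stump is consistent, but a few boosting rounds already yield a zero-error weighted vote.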

lecture 14 (9/26/25)
covered material: the strength of weak learnability
reading: 7.2 of Mohri et al.
optional reading: A Decision-Theoretic Generalization of On-Line Learning and an Application to Boosting by Freund and Schapire (1997)

lecture 15 (9/29/25)
covered material: margins and the margin bound for boosting
reading: 7.3 of Mohri et al.
optional reading: chapter 10 of Shalev-Shwartz and Ben-David for a different perspective on boosting

lecture 16 (10/1/25)
covered material: game theoretic view of boosting, introduction to online learning, halving algorithm
reading: 8.1 of Mohri et al.

lecture 17 (10/3/25)
covered material: weighted majority algorithm, mistake bounds
reading: 8.2.1 and 8.2.2 of Mohri et al.
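
A sketch of deterministic Weighted Majority with penalty parameter beta (names illustrative, not from the text):

```python
def weighted_majority(experts, stream, beta=0.5):
    """Deterministic Weighted Majority: predict with the weighted vote over
    experts, then multiply the weight of every expert that erred by beta.
    experts: callables x -> {0, 1}; stream: iterable of (x, y) pairs.
    Returns the number of mistakes the algorithm makes."""
    weights = [1.0] * len(experts)
    mistakes = 0
    for x, y in stream:
        vote_one = sum(w for w, e in zip(weights, experts) if e(x) == 1)
        vote_zero = sum(weights) - vote_one
        prediction = 1 if vote_one >= vote_zero else 0
        if prediction != y:
            mistakes += 1
        weights = [w * beta if e(x) != y else w
                   for w, e in zip(weights, experts)]
    return mistakes
```

The mistake bound from lecture says the algorithm makes at most (ln N + m ln(1/beta)) / ln(2/(1+beta)) mistakes when the best of N experts makes m.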

lecture 18 (10/6/25)
covered material: randomized weighted majority, online to batch conversion
reading: 8.2.3 and 8.4 of Mohri et al.

lecture 19 (10/8/25)
covered material: the Perceptron: mistake bound, dual, and kernelization
reading: 8.3.1 of Mohri et al.
optional reading: The Perceptron: A Probabilistic Model for Information Storage and Organization in the Brain by Rosenblatt (1958)
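
A compact Perceptron sketch with a bias term (illustrative; on linearly separable data the mistake bound from lecture guarantees it stops updating):

```python
def perceptron(examples, epochs=10):
    """Perceptron: cycle through the data, and on each mistake update
    w <- w + y*x and b <- b + y.
    examples: list of (x_vector, y) pairs with y in {-1, +1}.
    Returns (weights, bias, total mistakes)."""
    d = len(examples[0][0])
    w = [0.0] * d
    b = 0.0
    mistakes = 0
    for _ in range(epochs):
        clean_pass = True
        for x, y in examples:
            if y * (sum(wi * xi for wi, xi in zip(w, x)) + b) <= 0:
                w = [wi + y * xi for wi, xi in zip(w, x)]
                b += y
                mistakes += 1
                clean_pass = False
        if clean_pass:      # no mistakes in a full pass: converged
            break
    return w, b, mistakes
```

On the toy data below the Perceptron bound (R/gamma)^2 caps the updates at 5, and the learned separator classifies every example with positive margin.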

lecture 20 (10/10/25)
covered material: Littlestone dimension (L-dim), Standard Optimal Algorithm (SOA), non-realizable case
reading: 21.1.1 of Shalev-Shwartz and Ben-David

lecture 21 (10/13/25)
covered material: introduction to bandit learning, epsilon-first and epsilon-greedy algorithms
reading: 1.1 and 1.2 of this book by Alex Slivkins (MSR)
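
A sketch of epsilon-greedy on Bernoulli arms (the arm means and parameters below are made up for illustration):

```python
import random

def epsilon_greedy(arm_means, horizon, eps=0.1, seed=0):
    """Epsilon-greedy: with probability eps pull a uniformly random arm
    (explore), otherwise pull the arm with the best empirical mean (exploit).
    Rewards are Bernoulli(arm_means[arm]).
    Returns (empirical means, pull counts) per arm."""
    rng = random.Random(seed)
    k = len(arm_means)
    counts = [0] * k
    sums = [0.0] * k
    for _ in range(horizon):
        if rng.random() < eps or 0 in counts:
            arm = rng.randrange(k)          # explore (forced until each arm is tried)
        else:
            means = [s / c for s, c in zip(sums, counts)]
            arm = means.index(max(means))   # exploit
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
    return [s / c if c else 0.0 for s, c in zip(sums, counts)], counts
```

Over a long horizon the better arm absorbs almost all of the pulls, while the constant eps keeps exploration (and hence regret) growing linearly, which motivates the more refined algorithms in the reading.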

lecture 22 (10/15/25)
covered material: importance weighting, EXP3 for adversarial multiarmed bandits, ensuring exploration to reduce variance
reading: these notes by Nicholas Harvey (UBC)
optional reading: The Nonstochastic Multiarmed Bandit Problem by Auer et al. (2002)
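
A minimal EXP3 sketch (run here against stochastic arms for simplicity; parameter names are illustrative). The key step from lecture is the importance-weighted estimate reward / p(arm), which keeps reward estimates unbiased even though only the pulled arm is observed:

```python
import math
import random

def exp3(arm_means, horizon, gamma=0.1, seed=0):
    """EXP3: sample arms from exponential weights mixed with gamma-uniform
    exploration; update only the pulled arm's weight using the unbiased
    importance-weighted estimate reward / probs[arm].
    Returns the final sampling distribution over arms."""
    rng = random.Random(seed)
    k = len(arm_means)
    weights = [1.0] * k
    probs = [1.0 / k] * k
    for _ in range(horizon):
        total = sum(weights)
        probs = [(1 - gamma) * w / total + gamma / k for w in weights]
        # sample an arm from probs
        r, arm, acc = rng.random(), k - 1, 0.0
        for i, p in enumerate(probs):
            acc += p
            if r < acc:
                arm = i
                break
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0
        estimate = reward / probs[arm]     # unbiased estimate of the arm's reward
        weights[arm] *= math.exp(gamma * estimate / k)
    return probs
```

The gamma/k exploration floor bounds each estimate by k/gamma, which is exactly what controls the variance of the update.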

lecture 23 (10/17/25)
covered material: introduction to statistical queries (SQ) and PAC under classification noise
reading: sections 1 and 2 of this survey

lecture 24 (10/20/25)
covered material: statistical query dimension (SQ-dim), SQ upper bounds
reading: section 3 of this survey

lecture 25 (10/22/25)
covered material: applications of statistical queries to optimization and evolvability
optional reading: Statistical Algorithms and a Lower Bound for Detecting Planted Cliques by Feldman et al. (2017)

lecture 26 (10/24/25)
covered material: statistical queries and differential privacy, some other applications of SQ
reading: section 4 of this survey

lecture 27 (10/27/25)
covered material: intro to regression, Rademacher and pseudo-dimension bounds
reading: 11.1 and 11.2 of Mohri et al.

lecture 28 (10/29/25)
covered material: regularization, kernel ridge regression, Lasso
reading: 11.3 of Mohri et al.

lecture 29 (10/31/25)
covered material: multiclass prediction, graph and Natarajan dimension, reductions to binary
reading: 8.1, 8.4.1, and 8.4.2 of Mohri et al.
optional reading: Multiclass Learnability and the ERM Principle by Daniely et al. (2011)

lecture 30 (11/3/25)
covered material: bias-variance decomposition for mean squared error
reading: these notes by Cynthia Rudin (Duke)

lecture 31 (11/5/25)
covered material: random projections and the Johnson-Lindenstrauss lemma
reading: 15.4 of Mohri et al.
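
A quick empirical illustration of the lemma (a sketch; the dimensions in the example are chosen arbitrarily): project points through a random Gaussian matrix scaled by 1/sqrt(k) and compare pairwise distances before and after.

```python
import math
import random

def random_projection(points, k, seed=0):
    """Map d-dimensional points to k dimensions via a random matrix with
    i.i.d. N(0, 1/k) entries; Johnson-Lindenstrauss says pairwise distances
    are preserved to within 1 +/- eps w.h.p. once k = Omega(log(n) / eps^2)."""
    rng = random.Random(seed)
    d = len(points[0])
    proj = [[rng.gauss(0, 1) / math.sqrt(k) for _ in range(d)]
            for _ in range(k)]
    return [[sum(row[j] * p[j] for j in range(d)) for row in proj] for p in points]

def dist(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))
```

Note that the projection is data-oblivious: unlike PCA, the matrix is drawn without looking at the points.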

lecture 32 (11/7/25)
covered material: singular value decomposition and principal component analysis
reading: 15.1 of Mohri et al.

lecture 33 (11/10/25)
covered material: neural networks, deep learning
reading: these lecture notes by Shalev-Shwartz (Hebrew University)

lecture 34 (11/12/25)
covered material: large language models (LLMs)
optional reading: these lecture notes by Sachin Kumar (UW/AI2)

lecture 35 (11/14/25)
covered material: clustering, k-means, hardness
reading: 22.2 of Shalev-Shwartz and Ben-David
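
Minimizing the k-means objective is NP-hard, but Lloyd's algorithm, the local-search heuristic often itself called "k-means", is a natural companion to the lecture (a sketch; names illustrative):

```python
import random

def lloyd(points, k, rounds=20, seed=0):
    """Lloyd's algorithm: alternately assign each point to its nearest center
    and move each center to its cluster's mean, stopping at a fixed point.
    This finds a local optimum of the k-means objective, not necessarily the
    global one.  points: list of equal-length tuples."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)         # initialize at k distinct data points
    clusters = [[] for _ in range(k)]
    for _ in range(rounds):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k),
                          key=lambda i: sum((a - b) ** 2
                                            for a, b in zip(p, centers[i])))
            clusters[nearest].append(p)
        new_centers = [tuple(sum(coord) / len(cl) for coord in zip(*cl))
                       if cl else centers[i]     # keep an empty cluster's old center
                       for i, cl in enumerate(clusters)]
        if new_centers == centers:
            break
        centers = new_centers
    return centers, clusters
```

Each iteration weakly decreases the k-means cost, so the algorithm always terminates; the hardness discussed in lecture concerns the quality of the optimum, not convergence.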

lectures 36-38 (11/17/25, 11/19/25, 11/21/25)
covered material: student presentations

lecture 39 (11/24/25)
covered material: adaptive data analysis
optional reading: Sampling Without Compromising Accuracy in Adaptive Data Analysis by Fish et al. (2025)

lectures 40-42 (12/1/25, 12/3/25, 12/5/25)
covered material: student presentations