MCS 548 - Mathematical Theory of Artificial Intelligence
University of Illinois - Chicago
Fall 2016


This class will focus on the foundations of machine learning theory. Example topics include inductive inference, query learning, PAC learning and VC theory, Occam's razor, online learning, boosting, support vector machines, bandit algorithms, statistical queries, Rademacher complexity, and neural networks.

This course is represented on the computer science prelim.

Basic Information

Syllabus: pdf
Time and Location: M-W-F 1:00pm - 1:50pm, Taft Hall (TH) 308
Instructor Contact Information: Lev Reyzin, SEO 418, (312)-413-3745
Required Textbook: Mehryar Mohri, Afshin Rostamizadeh, and Ameet Talwalkar. Foundations of Machine Learning (available online via UIC library)
Optional Textbook: Shai Shalev-Shwartz and Shai Ben-David. Understanding Machine Learning: From Theory to Algorithms
Office Hours: T, F 10:00am-10:50am

Exam Dates

Final Exam: Monday December 5, 1:00-3:00pm in TH 308

Projects and Presentations

choosing a project: topic due 11/7/16; final project reports due 12/9/16 by 10 a.m.
presentation schedule
presenting a paper: paper and date selection due 11/14/16

Problem Sets

problem set 1 due 9/30/16
problem set 2 due 10/21/16
problem set 3 due 11/28/16 (extended from 11/23/16)

Lectures and Readings

Note: lectures will have material not covered in the readings.

Lecture 1 (8/22/16)
covered material: intro to the course, preview of learning models, beginning inductive inference
reading: section 7 of Computing Machinery and Intelligence by Turing (1950)

Lecture 2 (8/24/16)
covered material: positive and negative results for learning in the limit from text and informant
reading: Language Identification in the Limit by Gold (1967)
optional reading: Theorem 13.5 from 13.4.1 of Mohri et al.

Lecture 3 (8/26/16)
covered material: efficient exact learning, membership and equivalence queries, relationship to learning in the limit
reading: 13.1 and 13.2 of Mohri et al.

Lecture 4 (8/29/16)
covered material: L* algorithm for exact learning of regular languages with membership and equivalence queries
reading: begin 13.3.3 of Mohri et al.
optional reading: Learning Regular Sets from Queries and Counterexamples by Angluin (1987)

Lecture 5 (8/31/16)
covered material: finishing the L* algorithm, introduction to PAC learning
reading: finish 13.3.3 and begin 2.1 of Mohri et al.

Lecture 6 (9/2/16)
covered material: PAC learning of axis-aligned rectangles
reading: finish 2.1 of Mohri et al.; A Theory of the Learnable by Valiant (1984)
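
The tightest-fit strategy from this lecture is short enough to sketch. Below is a minimal Python sketch (assuming numpy and a hypothetical sample of labeled points in the plane): the learner returns the smallest axis-aligned rectangle containing the positive examples, so it can only err on the region between its rectangle and the target's.

    import numpy as np

    def tightest_fit_rectangle(X, y):
        """Smallest axis-aligned rectangle enclosing the positives.

        X: (m, 2) array of points; y: (m,) array of +/-1 labels.
        Errors occur only inside the target rectangle but outside the
        learned one, a region whose probability mass shrinks with m.
        """
        pos = X[y == 1]
        if len(pos) == 0:
            return None                  # no positives seen: predict all-negative
        lo, hi = pos.min(axis=0), pos.max(axis=0)
        return lo[0], lo[1], hi[0], hi[1]

    def rect_predict(rect, X):
        if rect is None:
            return -np.ones(len(X))
        x1, y1, x2, y2 = rect
        inside = ((X[:, 0] >= x1) & (X[:, 0] <= x2) &
                  (X[:, 1] >= y1) & (X[:, 1] <= y2))
        return np.where(inside, 1, -1)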

Lecture 7 (9/7/16)
covered material: PAC guarantees for finite hypothesis realizable case
reading: 2.2 of Mohri et al.
optional reading: Occam's Razor by Blumer et al. (1986)
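
For quick reference, the realizable-case guarantee from 2.2: for a finite hypothesis class H, any hypothesis consistent with

    m \ge \frac{1}{\epsilon}\left(\ln|H| + \ln\frac{1}{\delta}\right)

i.i.d. examples has generalization error at most \epsilon with probability at least 1 - \delta; this is the quantitative form of Occam's razor.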

Lecture 8 (9/9/16)
covered material: PAC guarantees for inconsistent and agnostic learning; reduction from query learning to PAC + MQ
reading: 2.3, 2.4.1, and 13.3.2 of Mohri et al.
optional reading: Queries and Concept Learning by Angluin (1988)

Lecture 9 (9/12/16)
covered material: the growth function, VC-dimension, Sauer's lemma
reading: beginning 3.2 and 3.3 of Mohri et al.
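
The statement of Sauer's lemma, for quick reference: if H has VC-dimension d, then its growth function satisfies

    \Pi_H(m) \le \sum_{i=0}^{d} \binom{m}{i} \le \left(\frac{em}{d}\right)^d \quad (m \ge d),

so once the VC-dimension is finite, the growth function is only polynomial in m.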

Lecture 10 (9/14/16)
covered material: introduction to Rademacher complexity
reading: beginning of 3.1

Lecture 11 (9/16/16)
covered material: Massart's lemma, Rademacher and VC generalization bounds
reading: finish 3.1 and 3.2
optional reading: Rademacher Penalties and Structural Risk Minimization by Koltchinskii (2001)
other: problem set 1 assigned
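
Since the empirical Rademacher complexity is just an expectation over random sign vectors, it can be approximated by Monte Carlo for a finite class. A minimal sketch, assuming numpy and a hypothetical (k, m) array whose rows are the hypotheses' values on the fixed sample:

    import numpy as np

    def empirical_rademacher(values, n_draws=10000, seed=0):
        """Monte Carlo estimate of E_sigma[ sup_h (1/m) sum_i sigma_i h(x_i) ].

        values: (k, m) array; row j holds h_j(x_1), ..., h_j(x_m).
        """
        rng = np.random.default_rng(seed)
        k, m = values.shape
        total = 0.0
        for _ in range(n_draws):
            sigma = rng.choice([-1.0, 1.0], size=m)   # Rademacher variables
            total += np.max(values @ sigma) / m       # sup over the finite class
        return total / n_draws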

Lecture 12 (9/19/16)
covered material: weak learning, boosting and equivalence to strong learning
reading: 6.1 and 6.2 of Mohri et al.
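
A compact sketch of AdaBoost with decision stumps as the weak learner, assuming numpy and labels in {-1, +1} (the exhaustive stump search is just a stand-in for any weak learner):

    import numpy as np

    def stump_predict(X, feat, thresh, sign):
        return sign * np.where(X[:, feat] <= thresh, 1.0, -1.0)

    def best_stump(X, y, w):
        """Weak learner: the stump with minimum weighted error under w."""
        best_err, best = 1.0, None
        for feat in range(X.shape[1]):
            for thresh in np.unique(X[:, feat]):
                for sign in (1.0, -1.0):
                    err = np.sum(w * (stump_predict(X, feat, thresh, sign) != y))
                    if err < best_err:
                        best_err, best = err, (feat, thresh, sign)
        return best_err, best

    def adaboost(X, y, T=50):
        """AdaBoost: reweight the sample toward mistakes; combine by weighted vote."""
        m = len(y)
        w = np.full(m, 1.0 / m)
        ensemble = []
        for _ in range(T):
            err, stump = best_stump(X, y, w)
            err = max(err, 1e-12)                   # guard against a perfect stump
            alpha = 0.5 * np.log((1 - err) / err)
            pred = stump_predict(X, *stump)
            w *= np.exp(-alpha * y * pred)          # up-weight misclassified points
            w /= w.sum()
            ensemble.append((alpha, stump))
        return ensemble

    def boosted_predict(ensemble, X):
        return np.sign(sum(a * stump_predict(X, *s) for a, s in ensemble))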

Lecture 13 (9/21/16)
covered material: empirical risk minimization and Fisher consistency (guest lecture by Brian Ziebart)
optional reading: notes on consistency

Lecture 14 (9/23/16)
covered material: margins and the margin bound for boosting
reading: 6.3.1 and 6.3.2 of Mohri et al.
optional reading: How Boosting the Margin Can Also Boost Classifier Complexity by Reyzin and Schapire (2006)

Lecture 15 (9/26/16)
covered material: parity functions, the statistical query (SQ) model, relationship to PAC
reading: Efficient Noise-Tolerant Learning from Statistical Queries by Kearns (1998)

Lecture 16 (9/28/16)
covered material: SQ learning implies noisy PAC, SQ dimension, and correlational SQ (CSQ)
reading: section 3 of Characterizing SQ Learning: Simplified Notions and Proofs by Szörényi (2009)

Lecture 17 (9/30/16)
covered material: decomposing an SQ into a CSQ and unlabeled distribution queries, (C)SQ lower bounds
optional reading: Weakly Learning DNF and Characterizing Statistical Query Learning Using Fourier Analysis by Blum et al. (1994)

Lecture 18 (10/3/16)
covered material: intro to support vector machines, the primal optimization for separable case
reading: 4.2.1 of Mohri et al.
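
The separable-case primal from 4.2.1, for reference:

    \min_{w, b} \ \frac{1}{2}\|w\|^2 \quad \text{subject to} \quad y_i (w \cdot x_i + b) \ge 1, \ i = 1, \dots, m,

a convex quadratic program whose solution maximizes the geometric margin 1/\|w\|.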

Lecture 19 (10/5/16)
covered material: Lagrange duality (and KKT theorem), support vectors, and dual optimization problem for SVM
reading: 4.2.2, 4.2.3, and B.3 of Mohri et al.

Lecture 20 (10/7/16)
covered material: non-separable SVM primal and dual formulations, hinge and squared hinge losses
reading: 4.3 of Mohri et al.
optional reading: Support-Vector Networks by Cortes and Vapnik (1995)
other: problem set 2 assigned

Lecture 21 (10/10/16)
covered material: SVM leave-one-out bounds and margin bounds
reading: 4.4 of Mohri et al.

Lecture 22 (10/12/16)
covered material: Mercer's theorem, kernel Hilbert space and kernelized SVM, polynomial, Gaussian, and sigmoidal kernels
reading: 5.1 and 5.2 of Mohri et al.

Lecture 23 (10/14/16)
covered material: overview of RKHS and Representer theorems, introduction to online learning, halving algorithm
reading: 5.3.1, 5.3.2, and 7.1 of Mohri et al.
optional reading: 5.3.3 of Mohri et al.
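
The halving algorithm fits in a few lines; a sketch over a hypothetical finite class of callables (the target must lie in the class):

    def halving(hypotheses, stream):
        """Predict with the majority vote of the version space; every
        mistake eliminates at least half of it, so mistakes <= log2|H|.

        hypotheses: list of callables x -> {-1, +1} containing the target.
        stream: iterable of (x, true_label) pairs.
        """
        version_space = list(hypotheses)
        mistakes = 0
        for x, label in stream:
            votes = sum(h(x) for h in version_space)
            if (1 if votes >= 0 else -1) != label:
                mistakes += 1
            # keep only hypotheses consistent with the revealed label
            version_space = [h for h in version_space if h(x) == label]
        return mistakes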

Lecture 24 (10/17/16)
covered material: weighted majority and randomized weighted majority algorithms, mistake bounds
reading: 7.2.1 to 7.2.3 of Mohri et al.
optional reading: The Weighted Majority Algorithm by Littlestone and Warmuth (1994)
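
A minimal sketch of randomized weighted majority, assuming numpy and a hypothetical (T, N) array of expert predictions:

    import numpy as np

    def randomized_weighted_majority(expert_preds, labels, beta=0.5, seed=0):
        """Littlestone-Warmuth RWM: follow an expert drawn in proportion
        to its weight, then shrink the weights of erring experts by beta.
        For fixed beta, the expected mistakes are O(m* + log N), where
        m* is the best expert's mistake count.
        """
        rng = np.random.default_rng(seed)
        T, N = expert_preds.shape
        w = np.ones(N)
        mistakes = 0
        for t in range(T):
            i = rng.choice(N, p=w / w.sum())          # sample an expert
            if expert_preds[t, i] != labels[t]:
                mistakes += 1
            w[expert_preds[t] != labels[t]] *= beta   # penalize erring experts
        return mistakes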

Lecture 25 (10/19/16)
covered material: exponentially weighted majority; perceptron: mistake bound, dual form, and kernelization; relation to stochastic gradient descent
reading: 7.2.4 and 7.3.1 of Mohri et al.
optional reading: The Perceptron: A Probabilistic Model for Information Storage and Organization in the Brain by Rosenblatt (1958)
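
The perceptron update is one line; a sketch assuming numpy and labels in {-1, +1}:

    import numpy as np

    def perceptron(X, y, epochs=1):
        """Rosenblatt's perceptron: add y_i * x_i on every mistake.
        If the data are separable with margin gamma and ||x_i|| <= R,
        the total number of mistakes is at most (R / gamma)^2 (Novikoff).
        """
        w = np.zeros(X.shape[1])
        mistakes = 0
        for _ in range(epochs):
            for x_i, y_i in zip(X, y):
                if y_i * (w @ x_i) <= 0:   # mistake (or on the boundary)
                    w += y_i * x_i
                    mistakes += 1
        return w, mistakes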

Lecture 26 (10/21/16)
covered material: the Winnow algorithm, feature-efficient learning, application to disjunctions
reading: 7.3.1 of Mohri et al.
optional reading: Learning Quickly When Irrelevant Attributes Abound: A New Linear-threshold Algorithm by Littlestone (1988)
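
A sketch of Winnow on Boolean features, assuming numpy, inputs in {0,1}^n, and labels in {0,1}:

    import numpy as np

    def winnow(X, y, alpha=2.0):
        """Littlestone's Winnow: a linear threshold over {0,1}^n with
        multiplicative updates. For a k-literal monotone disjunction it
        makes O(k log n) mistakes -- hence "feature-efficient".
        """
        n = X.shape[1]
        w = np.ones(n)
        mistakes = 0
        for x_t, y_t in zip(X, y):
            pred = 1 if w @ x_t >= n else 0    # threshold at n
            if pred != y_t:
                mistakes += 1
                if y_t == 1:
                    w[x_t == 1] *= alpha       # promote active features
                else:
                    w[x_t == 1] /= alpha       # demote active features
        return w, mistakes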

Lecture 27 (10/24/16)
covered material: the bandit setting, explore-exploit tradeoff, ε-greedy and ε-first, importance sampling, begin EXP3
reading: sections 1-3 of The Nonstochastic Multiarmed Bandit Problem by Auer et al. (2002)

Lecture 28 (10/26/16)
covered material: EXP3 and EXP3.P for multiarmed bandits and EXP4 for multiarmed bandits with expert advice
reading: sections 4-7 of The Nonstochastic Multiarmed Bandit Problem by Auer et al. (2002)
optional reading: Contextual Bandit Algorithms with Supervised Learning Guarantees by Beygelzimer et al. (2011)
other: project guidelines announced
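
A minimal EXP3 sketch, assuming numpy; reward_fn is a hypothetical stand-in for the environment, returning rewards in [0, 1]:

    import numpy as np

    def exp3(K, T, reward_fn, gamma=0.1, seed=0):
        """EXP3 (Auer et al.): exponential weights over arms with
        importance-weighted reward estimates, mixed with uniform
        exploration at rate gamma. Only the pulled arm's reward is seen.
        """
        rng = np.random.default_rng(seed)
        w = np.ones(K)
        total_reward = 0.0
        for t in range(T):
            p = (1 - gamma) * w / w.sum() + gamma / K
            arm = rng.choice(K, p=p)
            x = reward_fn(t, arm)
            total_reward += x
            x_hat = x / p[arm]                   # unbiased estimate of the reward
            w[arm] *= np.exp(gamma * x_hat / K)  # exponential weight update
        return total_reward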

Lecture 29 (10/28/16)
covered material: oracle-based contextual bandits, online-to-batch conversion, tree-shattering and Littlestone dimension
reading: 7.4 of Mohri et al. and these lecture notes by Shai Shalev-Shwartz
optional reading: Taming the Monster: A Fast and Simple Algorithm for Contextual Bandits by Agarwal et al. (2014)

Lecture 30 (10/31/16)
covered material: introduction to regression, pseudo-dimension, γ-fat-shattering dimension, and Rademacher regression bounds
reading: 10.1 and 10.2 of Mohri et al.

Lecture 31 (11/2/16)
covered material: least squares and linear regression, Lasso, and Rademacher bounds for regularized regressors
reading: 10.3.1 of Mohri et al.
optional reading: Regression Shrinkage and Selection via the Lasso by Tibshirani (1996)

Lecture 32 (11/4/16)
covered material: Ridge and kernelized Ridge, Widrow-Hoff, and online Lasso
reading: 10.3.2, 10.3.4, and 10.3.6 of Mohri et al.
optional reading: Ridge Regression: Biased Estimation for Nonorthogonal Problems by Hoerl and Kennard (1970)
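
Ridge regression has a closed form worth writing out; a minimal sketch assuming numpy (kernelized ridge instead solves (K + lam*I) alpha = y for dual variables alpha):

    import numpy as np

    def ridge(X, y, lam=1.0):
        """Ridge regression: the L2 penalty regularizes the normal
        equations, giving w = (X^T X + lam * I)^{-1} X^T y."""
        n = X.shape[1]
        return np.linalg.solve(X.T @ X + lam * np.eye(n), X.T @ y)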

Lecture 33 (11/7/16)
covered material: bias-variance decomposition for mean squared error
reading: Cynthia Rudin's lecture notes
other: problem set 3 assigned
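
For reference, the decomposition covered in lecture: for y = f(x) + \epsilon with E[\epsilon] = 0 and Var(\epsilon) = \sigma^2, and \hat{f} trained on a random sample S,

    \mathbb{E}_{S,\epsilon}\big[(\hat{f}(x) - y)^2\big]
      = \big(\mathbb{E}_S[\hat{f}(x)] - f(x)\big)^2
      + \mathbb{E}_S\big[(\hat{f}(x) - \mathbb{E}_S[\hat{f}(x)])^2\big]
      + \sigma^2,

i.e., squared bias plus variance plus irreducible noise.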

Lecture 34 (11/9/16)
covered material: odds, log-odds, and logistic regression
reading: 6.2.3 of Mohri et al. and these lecture notes
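
The log-odds model in one line: logistic regression posits

    \log \frac{\Pr[y = 1 \mid x]}{\Pr[y = 0 \mid x]} = w \cdot x
    \quad\Longleftrightarrow\quad
    \Pr[y = 1 \mid x] = \frac{1}{1 + e^{-w \cdot x}},

and w is fit by maximizing the (concave) log-likelihood.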

Lecture 35 (11/11/16)
covered material: covariate shift and active learning (guest lecture by Brian Ziebart)
optional reading: Shift-Pessimistic Active Learning using Robust Bias-Aware Prediction by Liu et al. (2015)

Lecture 36 (11/14/16)
covered material: multiclass prediction, decision trees, and one-vs-all and all pairs reductions to binary
reading: 8.1, 8.4.1, and 8.4.2 of Mohri et al.

Lecture 37 (11/16/16)
covered material: the multivector construction, multiclass batch perceptron, Natarajan and Graph dimension
reading: Multiclass Learnability and the ERM Principle by Daniely et al. (2011)

Lecture 38 (11/18/16)
covered material: multiclass SVM and multiclass boosting
reading: 8.3 of Mohri et al.
optional reading: 8.2 of Mohri et al.

Lecture 39 (11/21/16)
covered material: neural networks and deep learning
reading: lecture notes by Shalev-Shwartz
optional reading: Representation Benefits of Deep Feedforward Networks by Telgarsky (2015)

Lectures 40 - 43 (11/23/16, 11/28/16, 11/30/16, 12/2/16)
covered material: student presentations on advanced topics (see schedule)