EECS 496 - Computational Learning Theory
Northwestern University
Winter 2019

This course will introduce some of the central topics in computational learning theory, a field that approaches the question of whether machines can learn from the perspective of theoretical computer science. We will study well-defined mathematical models of learning in which it is possible to give precise and rigorous analyses of learning problems and algorithms. A major focus of the course will be the computational efficiency of learning in these models. We will develop some provably efficient algorithms and explain why such provable guarantees are unlikely for other problems.

We will only cover topics that permit a mathematically rigorous analysis, so mathematical maturity is absolutely necessary. In particular, familiarity with basic probability (such as linearity of expectation) and basic linear algebra will be assumed. Also, the emphasis of the course will be on proofs, so if you are in this course, you should enjoy proofs and math.

Example topics include inductive inference, query learning, PAC learning and VC theory, Occam's razor, online learning, boosting, support vector machines, bandit algorithms, statistical queries, Rademacher complexity, and neural networks.

Basic Information

Instructor: Lev Reyzin
Syllabus: pdf
Time and Location: Tu,Th 9:30-10:50am, Tech MG28
Required Textbook: Mehryar Mohri, Afshin Rostamizadeh, and Ameet Talwalkar. Foundations of Machine Learning, 2nd edition
Optional Textbook: Shai Shalev-Shwartz and Shai Ben-David. Understanding Machine Learning: From Theory to Algorithms
Office Hours: Tu 11:00-11:50am, F 10:00-10:50am

Exam Dates

Final exam: Tuesday, 03/19/2019 at 12:00-2:00pm

Problem Sets

problem set 1 due 2/12/19
problem set 2 due 2/28/19
problem set 3 due 3/15/19

Lectures and Readings

Note: lectures will have material not covered in the readings.

Lecture 1 (1/8/19)
covered material: intro to the course, preview of learning models, inductive inference from text and informant
reading: Language Identification in the Limit by Gold (1967)
optional reading: section 7 of Computing Machinery and Intelligence by Turing (1950)

Lecture 2 (1/10/19)
covered material: efficient exact learning with MQs and EQs, L* algorithm for learning regular languages, implications for learning in the limit
reading: ch. 16.1, 16.2, 16.3.3, 16.4 up to (but not including) 16.4.1
optional reading: ch. 16.4.1

Lecture 3 (1/15/19)
covered material: intro to PAC learning, relating PAC + MQ to query learning
reading: intro to ch. 2, 16.3.2

Lecture 4 (1/17/19)
covered material: PAC learning of axis-aligned rectangles; PAC guarantees for the finite realizable case
reading: ch. 2.1, 2.2, and A Theory of the Learnable by Valiant (1984)
optional reading: Occam's Razor by Blumer et al. (1986)
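
The rectangle learner from this lecture can be sketched in a few lines: return the tightest axis-aligned rectangle around the positive examples, which is always contained in the target and so errs only one-sidedly. This is an illustrative sketch, not course code; the function names and toy data are my own.

```python
import random

def learn_rectangle(samples):
    """Return the tightest axis-aligned rectangle enclosing the positives.

    samples: list of ((x, y), label) pairs with boolean labels.
    Returns (xmin, xmax, ymin, ymax), or None if no positive point was seen.
    """
    positives = [p for p, label in samples if label]
    if not positives:
        return None  # hypothesis: label everything negative
    xs = [x for x, _ in positives]
    ys = [y for _, y in positives]
    return (min(xs), max(xs), min(ys), max(ys))

def predict(rect, point):
    """True iff point lies inside rect (None acts as the empty rectangle)."""
    if rect is None:
        return False
    xmin, xmax, ymin, ymax = rect
    x, y = point
    return xmin <= x <= xmax and ymin <= y <= ymax

# Draw a uniform sample labeled by a hidden target rectangle.
target = (0.2, 0.8, 0.3, 0.7)
rng = random.Random(0)
sample = []
for _ in range(500):
    p = (rng.random(), rng.random())
    sample.append((p, predict(target, p)))
hypothesis = learn_rectangle(sample)
```

Since the hypothesis sits inside the target, its error mass is confined to four thin strips along the target's sides, which is what drives the PAC sample-complexity argument.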

Lecture 5 (1/22/19)
covered material: learning in non-realizable case, introduction to Rademacher complexity, useful inequalities
reading: ch. 2.3, 3.1
optional reading: ch. 2.4
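
Empirical Rademacher complexity, measuring how well a hypothesis class can correlate with random signs on a fixed sample, is easy to estimate by Monte Carlo. The sketch below is illustrative only (the names and toy classes are mine, not from the text): a class that shatters the sample has complexity exactly 1, while a two-function class has complexity strictly between 0 and 1.

```python
import itertools
import random

def empirical_rademacher(hyp_values, trials=2000, seed=0):
    """Monte Carlo estimate of empirical Rademacher complexity.

    hyp_values: list of tuples, each giving one hypothesis's +/-1
    predictions (h(x_1), ..., h(x_m)) on the fixed sample.
    Estimates E_sigma[ sup_h (1/m) sum_i sigma_i h(x_i) ].
    """
    rng = random.Random(seed)
    m = len(hyp_values[0])
    total = 0.0
    for _ in range(trials):
        sigma = [rng.choice((-1, 1)) for _ in range(m)]
        total += max(sum(s * v for s, v in zip(sigma, vals)) / m
                     for vals in hyp_values)
    return total / trials

# A class realizing all 8 sign patterns on 3 points shatters the sample,
# so some hypothesis matches every sigma exactly and the sup is always 1.
shattering = list(itertools.product((-1, 1), repeat=3))
full = empirical_rademacher(shattering)

# The class {h, -h} on 10 points: the sup is |sum sigma_i| / m.
pair = [(1,) * 10, (-1,) * 10]
small = empirical_rademacher(pair)
```

The contrast illustrates the lecture's point: richer classes (here, one that shatters the sample) have higher Rademacher complexity and hence weaker generalization guarantees.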

Lecture 6 (1/24/19)
covered material: Rademacher generalization bounds, growth function, VC dimension
reading: ch. 3.2
optional reading: Rademacher Penalties and Structural Risk Minimization by Koltchinskii (2001)

Lecture 7 (1/29/19)
covered material: VC generalization bounds
reading: ch. 3.3

Lecture 8 (2/5/19)
covered material: weak learning, boosting and equivalence to strong learning
reading: ch. 7.1 and 7.2
other: slides from lecture
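
The boosting loop from this lecture fits in a short sketch: maintain a distribution over examples, pick the weak hypothesis with smallest weighted error, and reweight so mistakes get more mass. This is an illustrative AdaBoost sketch with decision stumps on a toy 1-D sample of my own devising, not code from the course.

```python
import math

def adaboost(points, labels, stumps, rounds):
    """AdaBoost: each round, choose the stump with smallest weighted error,
    give it vote alpha = (1/2) ln((1 - eps) / eps), and reweight examples
    so that misclassified ones get more weight."""
    m = len(points)
    dist = [1.0 / m] * m  # uniform initial distribution over examples
    ensemble = []
    for _ in range(rounds):
        h, eps = min(((h, sum(dist[i] for i in range(m)
                              if h(points[i]) != labels[i]))
                      for h in stumps), key=lambda pair: pair[1])
        if eps <= 0 or eps >= 0.5:
            break  # perfect or useless weak hypothesis; stop boosting
        alpha = 0.5 * math.log((1 - eps) / eps)
        ensemble.append((alpha, h))
        dist = [dist[i] * math.exp(-alpha * labels[i] * h(points[i]))
                for i in range(m)]
        z = sum(dist)
        dist = [d / z for d in dist]  # renormalize
    return ensemble

def classify(ensemble, x):
    """Weighted-majority vote of the boosted stumps."""
    return 1 if sum(a * h(x) for a, h in ensemble) >= 0 else -1

# Labels +,+,-,-,+,+: no single threshold stump is correct on all points,
# but a weighted vote of boosted stumps reaches zero training error.
xs = [0.1, 0.2, 0.3, 0.6, 0.7, 0.9]
ys = [1, 1, -1, -1, 1, 1]
stumps = [(lambda x, t=t, s=s: s * (1 if x > t else -1))
          for t in (0.05, 0.15, 0.25, 0.45, 0.65, 0.8) for s in (1, -1)]
ensemble = adaboost(xs, ys, stumps, rounds=40)
train_errors = sum(classify(ensemble, x) != y for x, y in zip(xs, ys))
```

This is exactly the weak-to-strong phenomenon from the lecture: every stump has training error at least 1/6 here, yet the boosted combination is perfect on the sample.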

Lecture 9 (2/7/19)
covered material: margins and the margin bound for boosting
reading: ch. 7.4 and 7.6
other: continued slides from lecture
optional reading: How Boosting the Margin Can Also Boost Classifier Complexity by Reyzin and Schapire (2006)

Lecture 10 (2/12/19)
covered material: the statistical query (SQ) model, relationship to PAC and noisy PAC, SQ variants, SQ dimension
reading: Efficient Noise-Tolerant Learning from Statistical Queries by Kearns (1998)
other: slides from lecture

Lecture 11 (2/14/19)
covered material: correlational and honest SQs, SQ dimension and lower bounds, noisy parity, and learning
reading: section 3 of Characterizing SQ Learning: Simplified Notions and Proofs by Szörényi (2009)
other: continued slides from lecture
optional reading: Weakly Learning DNF and Characterizing SQ Learning using Fourier Analysis by Blum et al. (1994)
optional reading: Noise-Tolerant Learning, the Parity Problem, and the Statistical Query Model by Blum et al. (2000)

Lecture 12 (2/19/19)
covered material: applications of statistical queries to optimization, evolvability, and differential privacy
other: continued slides from lecture
optional reading: Statistical Algorithms and a Lower Bound for Detecting Planted Cliques by Feldman et al. (2013)

Lecture 13 (2/26/19)
covered material: intro to support vector machines, the primal optimization for separable case
reading: ch. 5.1 and 5.2.1

Lecture 14 (2/28/19)
covered material: Lagrange duality (and the KKT theorem), support vectors, and the dual optimization problem (and kernels)
reading: ch. 5.2.2 - 5.2.4

Lecture 15 (3/5/19)
covered material: leave-one-out bounds, non-separable SVM, hinge loss, margins
reading: ch. 5.3 and Thm. 2.4 from these notes on the margin bound
optional reading: Support-Vector Networks by Cortes and Vapnik (1995)

Lecture 16 (3/7/19)
covered material: introduction to online learning, halving algorithm, weighted majority (WM), randomized WM
reading: ch. 8.1 - 8.2.3
optional reading: The Weighted Majority Algorithm by Littlestone and Warmuth (1994)
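
Randomized weighted majority from this lecture is only a few lines: follow expert i with probability proportional to its weight, and shrink the weight of each expert that errs. The sketch below is illustrative (expert data and names are mine); it computes the expected number of mistakes exactly rather than sampling, since that expectation is what the regret bound controls.

```python
def rwm_expected_mistakes(expert_preds, outcomes, eta):
    """Randomized weighted majority: follow expert i with probability
    w_i / sum(w), and multiply w_i by (1 - eta) whenever expert i errs.
    Returns the expected number of mistakes over the whole sequence."""
    n = len(expert_preds)
    w = [1.0] * n
    expected = 0.0
    for t, y in enumerate(outcomes):
        total = sum(w)
        wrong_mass = sum(w[i] for i in range(n) if expert_preds[i][t] != y)
        expected += wrong_mass / total  # P(the followed expert is wrong)
        w = [w[i] * (1 - eta) if expert_preds[i][t] != y else w[i]
             for i in range(n)]
    return expected

outcomes = [1, 0, 1, 1, 0, 1, 0, 0]
experts = [
    outcomes[:],  # a perfect expert
    [1] * 8,      # always predicts 1
    [0] * 8,      # always predicts 0
]
exp_mistakes = rwm_expected_mistakes(experts, outcomes, eta=0.5)
```

With a perfect expert in the pool, the standard analysis bounds the expected mistakes by (ln n)/eta, here ln(3)/0.5, roughly 2.2, regardless of the sequence length.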

Lecture 17 (3/12/19)
covered material: Littlestone dimension, perceptron: mistake bound and dual, EXP3
reading: ch. 8.2.4 and 8.3.1 and these lecture notes by Jacob Abernethy
optional reading: The Perceptron: A Probabilistic Model for Information Storage and Organization in the Brain by Rosenblatt (1958)
more optional reading: On Littlestone dimension: 2.1.2 from these lecture notes by Daniel Hsu
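
The perceptron's mistake-bound guarantee from this lecture is easy to see in code: on each mistake, add y*x to the weight vector; for linearly separable data the number of mistakes is at most (R/gamma)^2, where R bounds the example norms and gamma is the margin. An illustrative sketch with toy data of my own (a constant 1 is appended to each point to fold in the bias term):

```python
def perceptron(points, labels, max_passes=100):
    """Perceptron: predict sign(w . x); on each mistake add y*x to w.
    Cycles through the data until a pass makes no mistakes."""
    w = [0.0] * len(points[0])
    mistakes = 0
    for _ in range(max_passes):
        clean_pass = True
        for x, y in zip(points, labels):
            if y * sum(wi * xi for wi, xi in zip(w, x)) <= 0:
                w = [wi + y * xi for wi, xi in zip(w, x)]  # mistake update
                mistakes += 1
                clean_pass = False
        if clean_pass:
            break  # all points classified with strictly positive margin
    return w, mistakes

# Separable toy data: (1, 1, -0.5) separates it with margin gamma = 1
# after normalization, and R = 3, so at most (R/gamma)^2 = 9 mistakes.
points = [(1, 1, 1), (2, 2, 1), (-1, -1, 1), (-2, -1, 1)]
labels = [1, 1, -1, -1]
w, mistakes = perceptron(points, labels)
```

Note the bound is on total mistakes, not passes: once the mistake budget is spent, every subsequent pass is clean and the loop terminates.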

Lecture 18 (3/14/19)
covered material: introduction to regression, bias-variance decomposition for mean squared error, logistic regression
reading: Cynthia Rudin's lecture notes and ch. 13.7