MCS 549 - Mathematical Foundations of Data Science
University of Illinois at Chicago
Fall 2024


This course covers the mathematical foundations of modern data science from a theoretical computer science perspective. Topics will include random graphs, random walks, Markov chains, streaming algorithms, clustering, singular value decomposition, and random projections.

Basic Information

Syllabus: pdf
Time and Location: M-W-F 9:00-9:50 am, 206 Lincoln Hall (LH)
Instructor: Lev Reyzin, SEO 417
Online Textbook: Avrim Blum, John Hopcroft, and Ravi Kannan, Mathematical Foundations of Data Science
Office Hours: T 11:00-11:50am (online), F 10:00-10:50am (in-person)
Piazza site: please sign up via this link

Presentations

instructions
topics and times

Problem Sets

problem set 1 due 10/4/24 10/9/24
problem set 2 due 11/1/24 11/4/24
problem set 3 due 12/6/24

Lectures and Readings

Lecture 1 (8/26/24)
covered material: intro to the course, preview of the material
reading: chapter 1

Lecture 2 (8/28/24)
covered material: some concentration inequalitiess
reading: chapters 2.1 - 2.2

Lecture 3 (8/30/24)
covered material: properties of high dimensions, properties of the unit ball
reading: chapters 2.3 - 2.4

Lecture 4 (9/4/24)
covered material: sampling from a high-dimensional sphere, Gaussian annulus theorem
reading: chapters 2.5 - 2.6

Lecture 5 (9/6/24)
covered material: random projections and the Johnson-Lindesnstrauss lemma
reading: chapter 2.7

Lecture 6 (9/9/24)
coveredmaterial: singular value decomposition (SVD), best-fit subspaces, and optimality of greedy algorithm
reading: chapters 3.1 - 3.4

Lecture 7 (9/11/24)
covered material: computing SVD, power iteration
reading: chapters 3.7 and 3.8

Lecture 8 (9/13/24)
covered material: centering data, separation Gaussians with SVD
reading: chapter 3.9

Lecture 9 (9/16/24)
covered material: intro to Markov chains, stationary distribution, Fundamental Theorem of Markov Chains
reading: chapter 4 intro, 4.1

Lecture 10 (9/18/24)
covered material: the Web as a Markov chain (online lecture by Leskovec)
reading: chapter 4.8

Lecture 11 (9/20/24)
covered material: Markov chain Monte Carlo (MCMC), Metropolis-Hasting
reading: chapter 4.2

Lecture 12 (9/23/24)
covered material: Gibbs sampleing, volume estimation of convex bodies
reading: chapter 4.3

Lecture 13 (9/25/24)
covered material: bounding mixing time with normalized conductance
reading: begin chapter 4.4

Lecture 14 (9/27/24)
covered material: electrical networks and random walks, probabilistic interpretation of voltage
reading: begin chapter 4.5

Lecture 15 (9/30/24)
covered material: probabilistic interpretation of current and effective resistence
reading: finish chapter 4.5

Lecture 16 (10/2/24)
covered material: hitting and commute times on unweighted graphs
reading: chapter 4.6

Lecture 17 (10/4/24)
covered material: cover time, random walks on lattices
reading: chapter 4.7

Lecture 18 (10/7/24)
covered material: introduction to random graphs
reading: chapter 8.1.1

Lecture 19 (10/9/24)
covered material: counting triangles
reading: chapter 8.1.2

Lecture 20 (10/11/24)
covered material proof of mixing time, probability flows
reading: finish chapter 4.4

Lecture 21 (10/14/24)
covered material: first and second moment methods, sharp thresholds, first moment for diameter
reading: chapter 8.2

Lecture 22 (10/16/24)
covered material: sharp threshold for diameter, Hamiltonian cycles example
reading: finish chapter 8.2

Lecture 23 (10/18/24)
covered material: increasing graph properties, replicators, other growth models, Molloy-Reed condition
reading: chapters 8.5, 8.8, and 8.9

Lecture 24 (10/21/24)
covered material: intro to machine learning, PAC learning
reading: chapter 5.1

Lecture 25 (10/23/24)
covered material: Occam's razor theorem, VC dimension
reading: chapters 5.5 and 5.6

Lecture 26 (10/25/24)
covered material: neural networks and large language models
reading: chapter 5.8

Lecture 27 (10/28/24)
covered material: Support Vector Machines
reading: chapter 5.3

Lecture 28 (10/30/24)
covered material: the Perceptron algorithm, Kernel functions
reading: chapters 5.2 and 5.10

Lecture 29 (11/1/24)
covered material: weak learning, boosting the accuracy, AdaBoost
reading: chapter 5.11

Lecture 30 (11/4/24)
covered material: statistical queries
optional reading: sections 1 - 2 of this survey

Lecture 31 (11/6/24)
covered material: statistical dimension and learning
optional reading: section 3 of this survey

Lecture 32 (11/8/24)
covered material: talk at the Simons Institute by Michael Kearns on "The Ethical Algorithm"
optional reading: this article by Kearns and Roth (UPenn)

Lecture 33 (11/11/24)
covered material: applications of statistical queries
reading: section 4 of this survey

Lecture 34 (11/13/24)
covered material: intro to streaming algorithms, estimating the number of distinct elements in a stream
reading: chapter 6.1

Lecture 35 (11/15/24)
covered material: pairwise independent hash functions
optional reading: notes from Ronitt Rubenfeld (MIT)

Lecture 36 (11/18/23)
covered material majority element, and estimating second frequency moment of a stream
reading: chapter 6.2

Lecture 37 (11/20/24)
covered material: introduction to clustering, clustering objectives, k-means for Gaussian MLE
reading: chapter 7.1

Lecture 38 (11/22/24)
covered material: Lloyd's algorithm, 2-approximation for k-center
reading: chapters 7.2 and 7.3

Lecture 39 (11/25/24)
covered material: adaptive data analysis
reading: this paper

Lectures 40 - 42 (12/2/24, 12/4/24, 12/6/24)
covered material: student presentations on research papers
notes: see schedule