MCS 549 - Mathematical Foundations of Data Science
University of Illinois at Chicago
Fall 2024


This course covers the mathematical foundations of modern data science from a theoretical computer science perspective. Topics will include random graphs, random walks, Markov chains, streaming algorithms, clustering, singular value decomposition, and random projections.

Basic Information

Syllabus: pdf
Time and Location: M-W-F 9:00-9:50 am, 206 Lincoln Hall (LH)
Instructor: Lev Reyzin, SEO 417
Online Textbook: Avrim Blum, John Hopcroft, and Ravi Kannan, Foundations of Data Science
Office Hours: T 11:00-11:50am (online), F 10:00-10:50am (in-person)
Piazza site: please sign up via this link

Presentations

Problem Sets

problem set 1 due 10/9/24 (extended from 10/4/24)
problem set 2 due 11/1/24

Lectures and Readings

Lecture 1 (8/26/24)
covered material: intro to the course, preview of the material
reading: chapter 1

Lecture 2 (8/28/24)
covered material: some concentration inequalities (sketch below)
reading: chapters 2.1 - 2.2
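
An illustrative NumPy sketch (not part of the course materials; the sample size, trial count, and eps are arbitrary choices) comparing an empirical tail probability for fair coin flips against the two-sided Hoeffding/Chernoff bound 2*exp(-2*eps^2*n):

import numpy as np

rng = np.random.default_rng(0)
n, trials, eps = 10_000, 1_000, 0.01

# Each row is one experiment of n fair coin flips in {0, 1}.
flips = rng.integers(0, 2, size=(trials, n))
deviation = np.abs(flips.mean(axis=1) - 0.5)

empirical = (deviation >= eps).mean()
bound = 2 * np.exp(-2 * eps**2 * n)   # two-sided Hoeffding/Chernoff tail bound
print(f"empirical tail probability: {empirical:.3f}, bound: {bound:.3f}")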

Lecture 3 (8/30/24)
covered material: properties of high dimensions, properties of the unit ball
reading: chapters 2.3 - 2.4

Lecture 4 (9/4/24)
covered material: sampling from a high-dimensional sphere, Gaussian annulus theorem (sketch below)
reading: chapters 2.5 - 2.6
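
A minimal sketch (my own illustration; d and the sample count are arbitrary) of the two facts from these sections: normalizing i.i.d. Gaussian vectors gives uniform samples on the unit sphere, and the norm of a d-dimensional spherical Gaussian concentrates around sqrt(d):

import numpy as np

rng = np.random.default_rng(0)
d, n = 1_000, 5_000

# Uniform samples on the unit sphere: normalize i.i.d. Gaussian vectors.
g = rng.standard_normal((n, d))
on_sphere = g / np.linalg.norm(g, axis=1, keepdims=True)

# Gaussian annulus: the norms of the unnormalized Gaussians concentrate near sqrt(d).
norms = np.linalg.norm(g, axis=1)
print(f"mean norm {norms.mean():.2f} vs sqrt(d) = {np.sqrt(d):.2f}, std {norms.std():.2f}")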

Lecture 5 (9/6/24)
covered material: random projections and the Johnson-Lindenstrauss lemma (sketch below)
reading: chapter 2.7
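
A hedged sketch of a Gaussian random projection in the spirit of the Johnson-Lindenstrauss lemma (not the course's code; n, d, and the target dimension k are illustrative choices):

import numpy as np

rng = np.random.default_rng(0)
n, d, k = 200, 10_000, 600   # k on the order of log(n) / eps^2

X = rng.standard_normal((n, d))

# Gaussian random projection, scaled so squared lengths are preserved in expectation.
R = rng.standard_normal((d, k)) / np.sqrt(k)
Y = X @ R

# Check the distortion of pairwise distances for some random pairs of points.
i = rng.integers(0, n, 100)
j = rng.integers(0, n, 100)
mask = i != j
orig = np.linalg.norm(X[i[mask]] - X[j[mask]], axis=1)
proj = np.linalg.norm(Y[i[mask]] - Y[j[mask]], axis=1)
print("max relative distortion:", np.abs(proj / orig - 1).max())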

Lecture 6 (9/9/24)
covered material: singular value decomposition (SVD), best-fit subspaces, and optimality of the greedy algorithm
reading: chapters 3.1 - 3.4

Lecture 7 (9/11/24)
covered material: computing SVD, power iteration (sketch below)
reading: chapters 3.7 and 3.8
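
An illustrative implementation of power iteration for the top singular vector (a sketch that assumes a gap between the top two singular values; the iteration count and test matrix are arbitrary):

import numpy as np

def top_singular_vector(A, iters=200, seed=0):
    # Power iteration on B = A^T A: repeated multiplication drives a random
    # start vector toward the top right singular vector of A.
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(A.shape[1])
    v /= np.linalg.norm(v)
    B = A.T @ A
    for _ in range(iters):
        v = B @ v
        v /= np.linalg.norm(v)
    return v, np.linalg.norm(A @ v)   # v and the corresponding singular value

A = np.random.default_rng(1).standard_normal((50, 30))
v, sigma = top_singular_vector(A)
print(sigma, np.linalg.svd(A, compute_uv=False)[0])   # the two values should agree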

Lecture 8 (9/13/24)
covered material: centering data, separating Gaussians with SVD (sketch below)
reading: chapter 3.9
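
A small sketch (my own toy example; the dimension, sample size, and separation are arbitrary) of centering data drawn from a mixture of two spherical Gaussians and projecting it onto the top singular direction, which separates the two components:

import numpy as np

rng = np.random.default_rng(0)
d, n, sep = 100, 400, 4.0

# Mixture of two spherical Gaussians in R^d whose means differ by sep in one coordinate.
mu = np.zeros(d)
mu[0] = sep
labels = rng.integers(0, 2, n)
X = rng.standard_normal((n, d)) + np.outer(labels, mu)

# Center the data, then project it onto the top right singular vector.
Xc = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
proj = Xc @ Vt[0]

# The two components land on opposite sides of zero along this direction.
print(proj[labels == 0].mean(), proj[labels == 1].mean())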

Lecture 9 (9/16/24)
covered material: intro to Markov chains, stationary distribution, Fundamental Theorem of Markov Chains (sketch below)
reading: chapter 4 intro, 4.1
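
A minimal illustration (the 3-state chain is my own toy example) of the Fundamental Theorem in action: for a connected, aperiodic finite chain, iterating pi <- pi P converges to the unique stationary distribution:

import numpy as np

# A toy 3-state chain; rows of the transition matrix P sum to 1,
# and the chain is connected and aperiodic.
P = np.array([[0.5, 0.3, 0.2],
              [0.2, 0.6, 0.2],
              [0.1, 0.4, 0.5]])

pi = np.ones(3) / 3
for _ in range(500):
    pi = pi @ P
print(pi, pi @ P)   # pi should satisfy pi = pi P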

Lecture 10 (9/18/24)
covered material: the Web as a Markov chain (online lecture by Leskovec) (sketch below)
reading: chapter 4.8
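
A sketch of the random-surfer view of the Web as a Markov chain (the 4-page link structure and the 0.85 damping value are my own illustrative choices, not from the lecture):

import numpy as np

# Toy 4-page "web" as an adjacency list; every page here has at least one outlink.
links = {0: [1, 2], 1: [2], 2: [0], 3: [2]}
n, damping = 4, 0.85

P = np.zeros((n, n))
for page, outs in links.items():
    P[page, outs] = 1.0 / len(outs)

# Random surfer: with probability 0.85 follow a random outlink,
# otherwise restart at a uniformly random page.
pi = np.ones(n) / n
for _ in range(200):
    pi = damping * (pi @ P) + (1 - damping) / n
print(pi)   # PageRank vector; entries sum to 1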

Lecture 11 (9/20/24)
covered material: Markov chain Monte Carlo (MCMC), Metropolis-Hastings (sketch below)
reading: chapter 4.2
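
An illustrative random-walk Metropolis sampler (a sketch, not the course's code; the Gaussian-mixture target, step size, and chain length are arbitrary choices):

import numpy as np

def metropolis_hastings(log_p, x0=0.0, steps=50_000, step_size=1.0, seed=0):
    # Random-walk Metropolis: symmetric Gaussian proposals,
    # accepted with probability min(1, p(y) / p(x)).
    rng = np.random.default_rng(seed)
    x, samples = x0, []
    for _ in range(steps):
        y = x + step_size * rng.standard_normal()
        if np.log(rng.random()) < log_p(y) - log_p(x):
            x = y
        samples.append(x)
    return np.array(samples)

# Target: an (unnormalized) equal mixture of two unit-variance Gaussians at -2 and 2.
log_p = lambda x: np.log(0.5 * np.exp(-0.5 * (x - 2)**2) + 0.5 * np.exp(-0.5 * (x + 2)**2))
samples = metropolis_hastings(log_p)
print(samples.mean(), samples.std())   # roughly 0 and sqrt(5) ~ 2.24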

Lecture 12 (9/23/24)
covered material: Gibbs sampling, volume estimation of convex bodies (sketch below)
reading: chapter 4.3
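
A minimal Gibbs-sampling sketch on a toy target, a standard bivariate Gaussian with correlation rho (my own example; the application to volume estimation of convex bodies is not reproduced here):

import numpy as np

def gibbs_bivariate_gaussian(rho=0.8, steps=20_000, seed=0):
    # Gibbs sampling: repeatedly resample each coordinate from its conditional
    # given the other; for this target, x | y ~ N(rho * y, 1 - rho^2) and vice versa.
    rng = np.random.default_rng(seed)
    x = y = 0.0
    out = []
    sd = np.sqrt(1 - rho**2)
    for _ in range(steps):
        x = rho * y + sd * rng.standard_normal()
        y = rho * x + sd * rng.standard_normal()
        out.append((x, y))
    return np.array(out)

s = gibbs_bivariate_gaussian()
print(np.corrcoef(s[:, 0], s[:, 1])[0, 1])   # should be close to 0.8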

Lecture 13 (9/25/24)
covered material: bounding mixing time with normalized conductance
reading: begin chapter 4.4

Lecture 14 (9/27/24)
covered material: electrical networks and random walks, probabilistic interpretation of voltage
reading: begin chapter 4.5

Lecture 15 (9/30/24)
covered material: probabilistic interpretation of current and effective resistance
reading: finish chapter 4.5

Lecture 16 (10/2/24)
covered material: hitting and commute times on unweighted graphs (sketch below)
reading: chapter 4.6
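
A small simulation sketch (parameters are arbitrary) checking the identity commute time = 2 * m * R_eff on the path graph, where the effective resistance between the endpoints is n - 1:

import numpy as np

def commute_time_path(n=10, trials=2_000, seed=0):
    # Simulate a random walk on a path with n vertices and estimate the commute
    # time between its endpoints; theory gives 2 * m * R_eff = 2 * (n - 1)^2.
    rng = np.random.default_rng(seed)
    total = 0
    for _ in range(trials):
        v, steps = 0, 0
        for target in (n - 1, 0):        # walk 0 -> n-1, then back to 0
            while v != target:
                if 0 < v < n - 1:
                    v += rng.choice((-1, 1))
                else:
                    v += 1 if v == 0 else -1
                steps += 1
        total += steps
    return total / trials

print(commute_time_path(), 2 * (10 - 1) ** 2)   # estimate vs. exact value 162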

Lecture 17 (10/4/24)
covered material: cover time, random walks on lattices
reading: chapter 4.7

Lecture 18 (10/7/24)
covered material: introduction to random graphs
reading: chapter 8.1.1

Lecture 19 (10/9/24)
covered material: counting triangles (sketch below)
reading: chapter 8.1.2
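
An illustrative sketch (n and p are arbitrary) sampling G(n, p) and comparing the observed triangle count with its expectation (n choose 3) * p^3:

import numpy as np

rng = np.random.default_rng(0)
n, p = 60, 0.1

# Sample G(n, p): include each of the C(n, 2) possible edges independently with probability p.
upper = np.triu(rng.random((n, n)) < p, k=1)
A = (upper | upper.T).astype(int)

# trace(A^3) counts each triangle 6 times (3 starting vertices, 2 directions).
triangles = np.trace(np.linalg.matrix_power(A, 3)) // 6
expected = n * (n - 1) * (n - 2) / 6 * p**3
print(f"triangles found: {triangles}, expected about {expected:.1f}")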

Lecture 20 (10/11/24)
covered material: proof of mixing time, probability flows
reading: finish chapter 4.4

Lecture 21 (10/14/24)
covered material: first and second moment methods, sharp thresholds, first moment for diameter
reading: chapter 8.2

Lecture 22 (10/16/24)
covered material: sharp threshold for diameter, Hamiltonian cycles example
reading: finish chapter 8.2

Lecture 23 (10/18/24)
covered material: increasing graph properties, replicators, other growth models, Molloy-Reed condition
reading: chapters 8.5, 8.8, and 8.9

Lecture 24 (10/21/24)
covered material: intro to machine learning, PAC learning
reading: chapter 5.1

Lecture 25 (10/23/24)
covered material: Occam's razor theorem, VC dimension
reading: chapters 5.5 and 5.6

Lecture 26 (10/25/24)
covered material: neural networks and large language models
reading: chapter 5.8