MCS 549 - Mathematical Foundations of Data Science
University of Illinois at Chicago
Fall 2021


This course covers the mathematical foundations of modern data science from a theoretical computer science perspective. Topics will include random graphs, small world phenomena, random walks, Markov chains, streaming algorithms, clustering, graphical models, singular value decomposition, and random projections.

Basic Information

Syllabus: pdf
Time and Location: M-W-F 12:00PM-12:50PM, 215 Lincoln Hall and online
Instructor Contact Information: Lev Reyzin, SEO 417
Online Textbook: Avrim Blum, John Hopcroft, and Ravi Kannan, Mathematical Foundations of Data Science
Office Hours: M 10:00AM-10:50AM in SEO 417, R 8:30AM-9:20AM online
Piazza site: link

Problem Sets

problem set 1, due 10/4/21

Lectures and Readings

Lecture 1 (8/23/21)
covered material: intro to the course, preview of the material
reading: chapter 1

Lecture 2 (8/25/21)
covered material: some concentration inequalities, intro to geometry in high dimensions
reading: chapters 2.1 - 2.3

Lecture 3 (8/27/21)
covered material: properties of the unit ball, sampling from the unit ball
reading: chapters 2.4 - 2.5
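
example sketch: one common way to sample uniformly from the unit ball in d dimensions is to draw a spherical Gaussian vector, normalize it to get a uniformly random direction, and then scale it by a radius distributed as U^(1/d) for U uniform on [0, 1]. The function name sample_unit_ball, the seed, and the choices d = 50 and 1000 samples below are illustrative assumptions, not taken from the lecture.

```python
import math
import random


def sample_unit_ball(d, rng):
    """Return one point drawn uniformly from the d-dimensional unit ball."""
    # A spherical Gaussian is rotationally symmetric, so normalizing it
    # gives a uniformly random direction on the unit sphere.
    g = [rng.gauss(0.0, 1.0) for _ in range(d)]
    norm = math.sqrt(sum(x * x for x in g))
    # The volume inside radius r scales as r**d, so the radius is
    # distributed as U**(1/d) for U uniform on [0, 1].
    r = rng.random() ** (1.0 / d)
    return [r * x / norm for x in g]


rng = random.Random(0)  # fixed seed, chosen only for reproducibility
points = [sample_unit_ball(50, rng) for _ in range(1000)]
norms = [math.sqrt(sum(x * x for x in p)) for p in points]
```

Inspecting norms for a large d should show most samples lying close to the surface of the ball, which is the high-dimensional behavior discussed in chapter 2.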

Lecture 4 (8/30/21)
covered material: Gaussian annulus theorem, fitting a spherical Gaussian to data
reading: chapters 2.6, 2.9

Lecture 5 (9/1/21)
covered material: random projection theorem, Johnson-Lindenstrauss lemma
reading: chapter 2.7

Lecture 6 (9/3/21)
covered material: singular value decomposition (SVD), best-fit subspaces, and optimality of the greedy algorithm
reading: chapters 3.1 - 3.3

Lecture 7 (9/8/21)
covered material: power iteration, SVD for clustering mixtures of Gaussians
reading: chapter 3.7, begin 3.9
optional reading: chapter 2.8
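
example sketch: power iteration can be illustrated by repeatedly applying A^T A to a random starting vector and renormalizing, which converges to the top right singular vector when the top two singular values differ. The function name top_singular_pair, the iteration count, the seed, and the 2x2 test matrix are assumptions made for illustration only.

```python
import math
import random


def top_singular_pair(A, iters=100, seed=0):
    """Estimate the top singular value and right singular vector of A."""
    n, d = len(A), len(A[0])
    rng = random.Random(seed)
    # A random Gaussian start has a nonzero component along the top
    # singular vector with probability 1.
    x = [rng.gauss(0.0, 1.0) for _ in range(d)]
    for _ in range(iters):
        # Compute y = A^T (A x) without forming A^T A explicitly.
        Ax = [sum(A[i][j] * x[j] for j in range(d)) for i in range(n)]
        y = [sum(A[i][j] * Ax[i] for i in range(n)) for j in range(d)]
        norm = math.sqrt(sum(t * t for t in y))
        x = [t / norm for t in y]
    # For a unit vector x, |A x| estimates the top singular value.
    Ax = [sum(A[i][j] * x[j] for j in range(d)) for i in range(n)]
    sigma = math.sqrt(sum(t * t for t in Ax))
    return sigma, x


# For this matrix, A^T A = [[5, 3], [3, 5]] has eigenvalues 8 and 2,
# so the top singular value is sqrt(8).
A = [[2.0, 2.0], [-1.0, 1.0]]
sigma, v = top_singular_pair(A)
```

The convergence rate depends on the ratio of the second to the first eigenvalue of A^T A (here 2/8), so matrices with nearly equal top singular values need many more iterations.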

Lecture 8 (9/10/21)
covered material: centering data, PCA, and other applications of SVD
reading: finish chapter 3.9
optional reading: Genes mirror geography within Europe

Lecture 9 (9/13/21)
covered material: intro to Markov chains, stationary distribution, Fundamental Theorem of Markov Chains
reading: chapter 4 intro, 4.1

Lecture 10 (9/15/21)
covered material: Markov chain Monte Carlo (MCMC), Metropolis-Hastings
reading: chapter 4.2
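
example sketch: the Metropolis-Hastings idea can be checked on a small example by building the transition matrix of the chain explicitly and verifying that the target distribution is stationary. The sketch below assumes a cycle on n >= 3 states as the underlying graph (so every state has degree 2) and unnormalized target weights; the name metropolis_matrix and the weights [1, 2, 3, 4] are hypothetical choices for illustration.

```python
def metropolis_matrix(weights):
    """Transition matrix of a Metropolis-Hastings chain on a cycle.

    weights[i] is the unnormalized target probability of state i.
    Assumes at least 3 states, so the two neighbors are distinct.
    """
    n = len(weights)
    P = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in ((i - 1) % n, (i + 1) % n):
            # Propose each neighbor with probability 1/2, then accept
            # with probability min(1, p_j / p_i).
            P[i][j] += 0.5 * min(1.0, weights[j] / weights[i])
        # Rejected proposals leave the chain at state i.
        P[i][i] = 1.0 - sum(P[i])
    return P


weights = [1.0, 2.0, 3.0, 4.0]
total = sum(weights)
pi = [w / total for w in weights]
P = metropolis_matrix(weights)

# One step of the chain applied to pi; stationarity means pi_next == pi.
pi_next = [sum(pi[i] * P[i][j] for i in range(len(pi))) for j in range(len(pi))]
```

Stationarity here follows from detailed balance: pi[i] * P[i][j] equals 0.5 * min(pi[i], pi[j]) for neighboring states, which is symmetric in i and j.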

Lecture 11 (9/17/21)
covered material: Gibbs sampling, MCMC for efficient sampling, volume estimation of convex bodies
reading: chapter 4.3

Lecture 12 (9/20/21)
covered material: convergence of random walks on undirected graphs, normalized conductance
reading: chapter 4.4.1

Lecture 13 (9/22/21)
covered material: bounding mixing time with normalized conductance
reading: begin chapter 4.4

Lecture 14 (9/24/21)
covered material: using probability flows
reading: continue chapter 4.4

Lecture 15 (9/27/21)
covered material: conductance of grids, Markov chain for sampling from Gaussian
reading: finish chapter 4.4

Lecture 16 (9/29/21)
covered material: Markov chains as electrical networks, probabilistic interpretation of voltage
reading: begin chapter 4.5

Lecture 17 (10/1/21)
covered material: probabilistic interpretation of current and of effective resistance / conductance
reading: finish chapter 4.5

Lecture 18 (10/3/21)
covered material: hitting and cover times
reading: chapters 4.6.1, 4.6.3

Lecture 19 (10/5/21)
covered material: commute times via effective resistances, random walks in Euclidean space
reading: chapters 4.6.2, 4.7

Lecture 20 (10/7/21), *optional* asynchronous lecture by Ravi Kannan
covered material: details on clustering Gaussians and SVD for MAX-CUT
optional reading: chapters 3.9.3 and 3.9.6 in detail