MCS 549 - Mathematical Foundations of Data Science
University of Illinois at Chicago
Fall 2021

This course covers the mathematical foundations of modern data science from a theoretical computer science perspective. Topics will include random graphs, small world phenomena, random walks, Markov chains, streaming algorithms, clustering, graphical models, singular value decomposition, and random projections.

Basic Information

Syllabus: pdf
Time and Location: M-W-F 12:00PM-12:50PM, 215 Lincoln Hall and online
Instructor Contact Information: Lev Reyzin, SEO 417
Online Textbook: Avrim Blum, John Hopcroft, and Ravi Kannan, Mathematical Foundations of Data Science
Office Hours: M 10:00AM-10:50AM in SEO 417, R 8:30AM-9:20AM online
Piazza site: link


topics and times

Problem Sets

problem set 1, due 10/4/21
problem set 2, due 11/1/21
problem set 3, due 12/1/21

Lectures and Readings

Lecture 1 (8/23/21)
covered material: intro to the course, preview of the material
reading: chapter 1

Lecture 2 (8/25/21)
covered material: some concentration inequalities, intro to geometry in high dimensions
reading: chapters 2.1 - 2.3

Lecture 3 (8/27/21)
covered material: properties of the unit ball, sampling from the unit ball
reading: chapters 2.4 - 2.5

Lecture 4 (8/30/21)
covered material: Gaussian annulus theorem, fitting a spherical Gaussian to data
reading: chapters 2.6, 2.9

Lecture 5 (9/1/21)
covered material: random projection theorem, Johnson-Lindenstrauss lemma
reading: chapter 2.7

Lecture 6 (9/3/21)
covered material: singular value decomposition (SVD), best-fit subspaces, and optimality of the greedy algorithm
reading: chapters 3.1 - 3.3

Lecture 7 (9/8/21)
covered material: power iteration, SVD for clustering mixtures of Gaussians
reading: chapter 3.7, begin 3.9
optional reading: chapter 2.8

Lecture 8 (9/10/21)
covered material: centering data, PCA, and other applications of SVD
reading: finish chapter 3.9
optional reading: Genes mirror geography within Europe

Lecture 9 (9/13/21)
covered material: intro to Markov chains, stationary distribution, Fundamental Theorem of Markov Chains
reading: chapter 4 intro, 4.1

Lecture 10 (9/15/21)
covered material: Markov chain Monte Carlo (MCMC), Metropolis-Hastings
reading: chapter 4.2

Lecture 11 (9/17/21)
covered material: Gibbs sampling, MCMC for efficient sampling, volume estimation of convex bodies
reading: chapter 4.3

Lecture 12 (9/20/21)
covered material: convergence of random walks on undirected graphs, normalized conductance
reading: chapter 4.4.1

Lecture 13 (9/22/21)
covered material: bounding mixing time with normalized conductance
reading: begin chapter 4.4

Lecture 14 (9/24/21)
covered material: using probability flows
reading: continue chapter 4.4

Lecture 15 (9/27/21)
covered material: conductance of grids, Markov chain for sampling from Gaussian
reading: finish chapter 4.4

Lecture 16 (9/29/21)
covered material: Markov chains as electrical networks, probabilistic interpretation of voltage
reading: begin chapter 4.5

Lecture 17 (10/1/21)
covered material: probabilistic interpretation of current and of effective resistance / conductance
reading: finish chapter 4.5

Lecture 18 (10/4/21)
covered material: hitting and cover times
reading: chapters 4.6.1, 4.6.3

Lecture 19 (10/6/21)
covered material: commute times via effective resistances, random walks in Euclidean space
reading: chapters 4.6.2, 4.7

Lecture 20 (10/8/21), *optional* asynchronous lecture by Ravi Kannan
covered material: details on clustering Gaussians and SVD for MAX-CUT
optional reading: chapters 3.9.3 and 3.9.6 in detail

Lecture 21 (10/11/21), asynchronous lecture by Jure Leskovec
covered material: the Web as a Markov chain
reading: chapter 4.8

Lecture 22 (10/13/21)
covered material: introduction to random graphs, counting triangles
reading: chapter 8.1

Lecture 23 (10/15/21)
covered material: first and second moment methods for showing phase transitions
reading: begin chapter 8.2

Lecture 24 (10/18/21)
covered material: isolated vertices and sharp thresholds
reading: continue chapter 8.2

Lecture 25 (10/20/21)
covered material: sharp threshold for diameter 2
reading: finish chapter 8.2

Lecture 26 (10/22/21)
covered material: phase transitions for increasing graph properties, Molloy-Reed condition for non-uniform degrees
reading: chapters 8.5, 8.8

Lecture 27 (10/25/21)
covered material: growth models with and without preferential attachment
reading: chapter 8.9

Lecture 28 (10/27/21)
covered material: intro to streaming algorithms, lower bounds on estimating the number of distinct elements in a stream
reading: chapter 6.1

Lecture 29 (10/29/21)
covered material: pairwise independent hash functions
optional reading: notes from Ronitt Rubinfeld (MIT)

Lecture 30 (11/1/21)
covered material: pairwise independent random variables, limited independence
optional reading: notes from Alistair Sinclair (Berkeley)

Lecture 31 (11/3/21)
covered material: estimating unique occurrences and estimating the second frequency moment of a stream
reading: chapter 6.2

Lecture 32 (11/5/21)
covered material: first frequency moment with even fewer bits, large frequency moments, sketches for resemblance
reading: chapter 6.4

Lecture 33 (11/8/21)
covered material: introduction to clustering, clustering objectives
reading: chapter 7.1

Lecture 34 (11/10/21)
covered material: dynamic program for 1 dimension, Lloyd's and Ward's algorithms, 2-approximation for k-center
reading: chapters 7.2 and 7.3

Lecture 35 (11/12/21)
covered material: spectral and hierarchical clustering
reading: chapters 7.4 and 7.5.1

Lecture 36 (11/15/21)
covered material: impossibility theorems for clustering and social choice
reading: chapter 10.1

Lecture 37 (11/17/21)
covered material: intro to machine learning, PAC learning and sample complexity
reading: chapter 5.1

Lecture 39 (11/22/21)
covered material: uniform convergence, Occam's Razor, VC-dimension
reading: chapters 5.4 and 5.5

Lecture 40 (11/24/21)
covered material: weak vs. strong learning, boosting, and the AdaBoost algorithm
reading: chapter 5.11

Lectures 38, 41 - 43 (11/19/21, 11/29/21, 12/1/21, 12/3/21)
covered material: student presentations on research papers (see schedule)