MCS 549 - Mathematical Foundations of Data Science
University of Illinois at Chicago
Fall 2022


This course covers the mathematical foundations of modern data science from a theoretical computer science perspective. Topics will include random graphs, random walks, Markov chains, streaming algorithms, clustering, singular value decomposition, and random projections.

Basic Information

Syllabus: pdf
Time and Location: M-W-F 12:00-12:50PM, 302 Addams Hall (AH)
Instructor Contact Information: Lev Reyzin, SEO 417
Online Textbook: Avrim Blum, John Hopcroft, and Ravi Kannan, Foundations of Data Science
Office Hours: T 9:00-9:50AM (online), F 10:00-10:50AM (SEO 417)
Piazza site: link

Presentations

instructions
topics and times

Problem Sets

problem set 1, due 10/3/22
problem set 2, due 11/2/22
problem set 3, due 12/1/22

Lectures and Readings

Lecture 1 (8/22/22)
covered material: intro to the course, preview of the material
reading: chapter 1

Lecture 2 (8/24/22)
covered material: some concentration inequalities, intro to geometry in high dimensions
reading: chapters 2.1 - 2.3

Lecture 3 (8/26/22)
covered material: properties of the unit ball, sampling from the unit ball
reading: chapters 2.4 - 2.5
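
For illustration only (not course material), a minimal Python/numpy sketch of the sampling method from chapter 2.5: normalize a spherical Gaussian draw to get a uniform point on the unit sphere, then scale its radius by U^(1/d) to land uniformly in the ball.

    import numpy as np

    def sample_unit_ball(d, rng=np.random.default_rng(0)):
        """Draw one point uniformly at random from the d-dimensional unit ball."""
        x = rng.standard_normal(d)      # spherical Gaussian ...
        x /= np.linalg.norm(x)          # ... normalized: uniform on the unit sphere
        r = rng.uniform() ** (1.0 / d)  # radius with density proportional to r^(d-1)
        return r * x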

Lecture 4 (8/29/22)
covered material: Gaussian annulus theorem, separating Gaussians, fitting a spherical Gaussian to data
reading: chapters 2.6, 2.8, and 2.9

Lecture 5 (8/31/22)
covered material: random projection theorem, Johnson-Lindenstrauss lemma
reading: chapter 2.7
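
As a hedged illustration of the lemma above (assuming numpy; not course code): projecting through a random Gaussian matrix scaled by 1/sqrt(k) preserves squared lengths in expectation, and pairwise distances up to a 1 +/- epsilon factor once k is on the order of (log n)/epsilon^2.

    import numpy as np

    def random_projection(X, k, rng=np.random.default_rng(0)):
        """Map the rows of X from d dimensions down to k dimensions."""
        d = X.shape[1]
        R = rng.standard_normal((d, k)) / np.sqrt(k)  # Gaussian entries, JL scaling
        return X @ R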

Lecture 6 (9/2/22)
covered material: singular value decomposition (SVD), best-fit subspaces, and optimality of the greedy algorithm
reading: chapters 3.1 - 3.4
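
For the SVD material above, an illustrative numpy sketch of the best rank-k approximation, which is what the greedy best-fit-subspace argument in chapter 3 characterizes.

    import numpy as np

    def best_rank_k(A, k):
        """Best rank-k approximation of A (Frobenius and spectral norm) via the SVD."""
        U, s, Vt = np.linalg.svd(A, full_matrices=False)
        return (U[:, :k] * s[:k]) @ Vt[:k, :]  # keep only the top k singular triples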

Lecture 7 (9/7/22)
covered material: power iteration, SVD for clustering mixtures of Gaussians
reading: chapter 3.7, begin chapter 3.9
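
A minimal sketch of the power iteration discussed above (illustration only; the random start and iteration count are arbitrary choices): iterating A^T A drives a random vector toward the top right singular vector.

    import numpy as np

    def top_singular_vector(A, iters=100, rng=np.random.default_rng(0)):
        """Approximate the top right singular vector of A by power iteration on A^T A."""
        x = rng.standard_normal(A.shape[1])
        for _ in range(iters):
            x = A.T @ (A @ x)       # one power-iteration step
            x /= np.linalg.norm(x)  # renormalize to keep the vector bounded
        return x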

Lecture 8 (9/9/22)
covered material: centering data
reading: finish chapter 3.9

Lecture 9 (9/12/22)
covered material: introduction to random graphs, counting triangles
reading: chapter 8.1
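
An illustrative sketch (assuming numpy) of generating G(n, p) and counting its triangles via trace(A^3)/6; the expected count C(n,3) p^3 is the first-moment calculation from chapter 8.1.

    import numpy as np

    def gnp_adjacency(n, p, rng=np.random.default_rng(0)):
        """Adjacency matrix of an Erdos-Renyi G(n, p) random graph."""
        A = np.triu(rng.random((n, n)) < p, 1)  # independent coin flips above the diagonal
        return (A | A.T).astype(int)

    def count_triangles(A):
        """Each triangle is counted 6 times on the diagonal of A^3."""
        return int(np.trace(A @ A @ A) // 6)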

Lecture 10 (9/14/22)
covered material: first and second moment methods for showing phase transitions
reading: begin chapter 8.2

Lecture 11 (9/16/22)
covered material: sharp threshold for diameter, Hamiltonian cycles example
reading: finish chapter 8.2

Lecture 12 (9/19/22)
covered material: isolated vertices, increasing graph properties
reading: chapter 8.5

Lecture 13 (9/21/22)
covered material: Molloy-Reed condition for non-uniform degrees
reading: chapter 8.8

Lecture 14 (9/23/22)
covered material: growth models with and without preferential attachment
reading: chapter 8.9
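
A toy sketch of a preferential-attachment growth process (one edge per arriving vertex, a simplification chosen here for illustration): each new vertex attaches to an existing vertex with probability proportional to its degree.

    import random

    def preferential_attachment(n, seed=0):
        """Grow a graph where new vertices attach in proportion to degree."""
        random.seed(seed)
        edges = [(0, 1)]
        ends = [0, 1]                     # vertex v appears deg(v) times in this list
        for v in range(2, n):
            target = random.choice(ends)  # degree-proportional choice of neighbor
            edges.append((v, target))
            ends.extend([v, target])
        return edges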

Lecture 15 (9/26/22)
covered material: intro to Markov chains, stationary distribution, Fundamental Theorem of Markov Chains
reading: chapter 4 intro, 4.1
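
A small numpy illustration of the Fundamental Theorem above: repeatedly applying a connected, aperiodic transition matrix to any starting distribution converges to the unique stationary distribution. The lazy walk on a 3-cycle below is just an example.

    import numpy as np

    def stationary_distribution(P, iters=1000):
        """Approximate pi satisfying pi = pi P by repeated multiplication (rows of P sum to 1)."""
        pi = np.full(P.shape[0], 1.0 / P.shape[0])
        for _ in range(iters):
            pi = pi @ P
        return pi

    P = np.array([[0.5, 0.25, 0.25],
                  [0.25, 0.5, 0.25],
                  [0.25, 0.25, 0.5]])    # lazy random walk on a triangle
    print(stationary_distribution(P))    # approximately [1/3, 1/3, 1/3]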

Lecture 16 (9/28/22)
covered material: Markov chain Monte Carlo (MCMC), Metropolis-Hastings, Gibbs sampling
reading: chapter 4.2
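
An illustrative Metropolis-Hastings sketch with a symmetric Gaussian proposal (the target density, step size, and chain length below are arbitrary choices, not course code): propose a move, accept with probability min(1, f(y)/f(x)), otherwise stay put.

    import numpy as np

    def metropolis_hastings(f, x0, steps, step=1.0, rng=np.random.default_rng(0)):
        """Sample from a density proportional to f using a Gaussian random-walk proposal."""
        x, samples = x0, []
        for _ in range(steps):
            y = x + step * rng.standard_normal()       # symmetric proposal
            if rng.uniform() < min(1.0, f(y) / f(x)):  # Metropolis acceptance rule
                x = y
            samples.append(x)
        return samples

    chain = metropolis_hastings(lambda t: np.exp(-t * t / 2), x0=0.0, steps=10000)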

Lecture 17 (9/30/22)
covered material: MCMC for efficient sampling, volume estimation of convex bodies
reading: chapter 4.3

Lecture 18 (10/3/22)
covered material: mixing time of random walks on undirected graphs, normalized conductance
reading: chapter 4.4.1

Lecture 19 (10/5/22)
covered material: bounding mixing time with normalized conductance
reading: begin chapter 4.4

Lecture 20 (10/7/22)
covered material: using probability flows
reading: finish chapter 4.4

Lecture 21 (10/10/22)
covered material: conductance of various graphs, Markov chains as electrical networks
reading: begin chapter 4.5

Lecture 22 (10/12/22)
covered material: probabilistic interpretation of voltage and current
reading: finish chapter 4.5

Lecture 23 (10/14/22)
covered material: probabilistic interpretation of current and of effective resistance
reading: chapter 4.7

Lecture 24 (10/17/22)
covered material: hitting, commute, and cover times
reading: chapter 4.6

Lecture 25 (10/19/22)
covered material: the Web as a Markov chain
reading: chapter 4.8
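
A short sketch of the random-surfer chain discussed above, as power iteration with restarts; the restart probability r = 0.15 and the uniform jump from dangling pages are illustrative assumptions, not the book's exact presentation.

    import numpy as np

    def pagerank(A, r=0.15, iters=100):
        """Stationary distribution of: with prob. r jump uniformly, else follow a random out-link."""
        n = A.shape[0]
        out = A.sum(axis=1, keepdims=True)
        P = np.where(out > 0, A / np.maximum(out, 1), 1.0 / n)  # dangling pages jump uniformly
        pi = np.full(n, 1.0 / n)
        for _ in range(iters):
            pi = r / n + (1 - r) * (pi @ P)
        return pi / pi.sum()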

Lecture 26 (10/21/22)
covered material: intro to machine learning, PAC learning and Occam's razor theorem
reading: chapters 5.1 and 5.4

Lecture 27 (10/24/22)
covered material: VC dimension, agnostic learning
reading: chapters 5.5 and 5.6

Lecture 28 (10/26/22)
covered material: weak vs. strong learning, boosting, and the AdaBoost algorithm
reading: chapter 5.11
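
A compact, unoptimized sketch of AdaBoost with threshold stumps as the weak learners (the brute-force stump search and the round count are illustrative choices): each round reweights the examples to emphasize mistakes and adds the best stump with weight alpha = (1/2) ln((1 - err)/err).

    import numpy as np

    def adaboost_stumps(X, y, rounds=20):
        """AdaBoost on labels y in {-1, +1}; returns a list of weighted stumps."""
        n, d = X.shape
        w = np.full(n, 1.0 / n)             # distribution over the examples
        ensemble = []                       # entries: (alpha, feature, threshold, sign)
        for _ in range(rounds):
            best = None
            for j in range(d):              # brute-force search for the lowest-error stump
                for t in np.unique(X[:, j]):
                    for s in (+1, -1):
                        pred = s * np.where(X[:, j] <= t, 1, -1)
                        err = w[pred != y].sum()
                        if best is None or err < best[0]:
                            best = (err, j, t, s)
            err, j, t, s = best
            alpha = 0.5 * np.log((1 - err) / max(err, 1e-12))
            pred = s * np.where(X[:, j] <= t, 1, -1)
            w *= np.exp(-alpha * y * pred)  # up-weight the examples this stump got wrong
            w /= w.sum()
            ensemble.append((alpha, j, t, s))
        return ensemble

    def predict(ensemble, X):
        score = sum(a * s * np.where(X[:, j] <= t, 1, -1) for a, j, t, s in ensemble)
        return np.sign(score)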

Lecture 29 (10/28/22)
covered material: support vector machines
optional reading: chapters 5.1 and 5.2

Lecture 30 (10/31/22)
covered material: intro to streaming algorithms, estimating the number of distinct elements in a stream
reading: chapter 6.1
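
An illustrative sketch of the minimum-hash-value estimator from chapter 6.1 (the simple linear hash below stands in for the idealized hash function in the analysis): with d distinct elements, the minimum of their hash values is around m/(d + 1), so m/min - 1 estimates d.

    import random

    def estimate_distinct(stream, m=2**31 - 1, seed=0):
        """Estimate the number of distinct elements from the minimum hash value seen."""
        random.seed(seed)
        a, b = random.randrange(1, m), random.randrange(m)
        h = lambda x: (a * hash(x) + b) % m + 1  # stand-in hash into {1, ..., m}
        min_hash = m
        for item in stream:
            min_hash = min(min_hash, h(item))
        return m / min_hash - 1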

Lecture 31 (11/2/22)
covered material: pairwise independent hash functions
optional reading: notes from Ronitt Rubinfeld (MIT)
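
A sketch of the standard construction of a pairwise independent family, h(x) = ((ax + b) mod p) mod m with random a, b; the prime p = 2^61 - 1 and the table size below are just convenient illustrative choices (the final mod m makes the family only approximately uniform).

    import random

    P = 2**61 - 1   # a large prime; inputs are assumed to be integers below P

    def pairwise_independent_hash(m, seed=None):
        """Return a hash function from a pairwise independent family into {0, ..., m-1}."""
        rng = random.Random(seed)
        a, b = rng.randrange(1, P), rng.randrange(P)
        return lambda x: ((a * x + b) % P) % m

    h = pairwise_independent_hash(m=1024, seed=0)
    print(h(17), h(42))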

Lecture 32 (11/4/22)
covered material: estimating unique occurrences, and estimating the second frequency moment of a stream
reading: chapter 6.2
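
An illustrative AMS-style sketch for the second frequency moment F2 = sum_i f_i^2 (the per-item random signs stored in a dictionary stand in for the 4-wise independent hash the streaming analysis actually requires): E[Z^2] = F2 when Z is the signed sum over the stream.

    import random

    def ams_f2_estimate(stream, trials=100, seed=0):
        """Average Z^2 over independent trials, where Z sums a random +/-1 sign per item."""
        rng = random.Random(seed)
        estimates = []
        for _ in range(trials):
            signs, z = {}, 0
            for item in stream:
                if item not in signs:
                    signs[item] = rng.choice((-1, 1))
                z += signs[item]
            estimates.append(z * z)   # E[Z^2] equals F2
        return sum(estimates) / trials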

Lecture 33 (11/7/22)
covered material: matrix multiplication using sampling, sketches for resemblance
reading: chapters 6.3.1, 6.4
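
An illustrative MinHash sketch for estimating resemblance (Jaccard similarity); Python's built-in hash, salted per coordinate, stands in for the random permutations used in the analysis, and 100 hash functions is an arbitrary choice.

    import random

    def minhash_signature(items, num_hashes=100, seed=0):
        """For each salted hash function, record the minimum hash value over the set."""
        rng = random.Random(seed)
        salts = [rng.random() for _ in range(num_hashes)]
        return [min(hash((salt, x)) for x in items) for salt in salts]

    def estimated_resemblance(sig_a, sig_b):
        """The fraction of agreeing coordinates estimates |A intersect B| / |A union B|."""
        agree = sum(1 for a, b in zip(sig_a, sig_b) if a == b)
        return agree / len(sig_a)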

Lecture 34 (11/9/22)
covered material: introduction to clustering, clustering objectives, k-means for Gaussian MLE
reading: chapter 7.1

Lecture 35 (11/11/22)
covered material: Lloyd's algorithm, 2-approximation for k-center
reading: chapters 7.2 and 7.3
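
A short sketch of the farthest-first (Gonzalez) 2-approximation for k-center mentioned above (assuming numpy; starting from the first data point is an arbitrary choice): repeatedly add the point farthest from the centers picked so far.

    import numpy as np

    def greedy_k_center(X, k):
        """Indices of k centers whose maximum point-to-center distance is within 2x of optimal."""
        centers = [0]                              # arbitrary first center
        dist = np.linalg.norm(X - X[0], axis=1)    # distance to nearest chosen center
        for _ in range(k - 1):
            nxt = int(np.argmax(dist))             # farthest point becomes the next center
            centers.append(nxt)
            dist = np.minimum(dist, np.linalg.norm(X - X[nxt], axis=1))
        return centers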

Lectures 36 - 38 (11/14/22, 11/16/22, 11/18/22)
covered material: student presentations on research papers (see schedule)

Lecture 39 (11/21/22)
covered material: talk at the Simons Institute by Michael Kearns on "The Ethical Algorithm"
optional reading: this article by Kearns and Roth (UPenn)

Lecture 40 (11/23/22)
covered material: talk on differential privacy, adaptive data analysis, and free speedups via sampling
optional reference: this article by Fish, Reyzin, and Rubinstein

Lectures 41 - 43 (11/28/22, 11/30/22, 12/2/22)
covered material: student presentations on research papers (see schedule)