MCS 549 - Mathematical Foundations of Data Science
University of Illinois at Chicago
Fall 2023

This course covers the mathematical foundations of modern data science from a theoretical computer science perspective. Topics will include random graphs, random walks, Markov chains, streaming algorithms, clustering, singular value decomposition, and random projections.

Basic Information

Syllabus: pdf
Time and Location: M-W-F 12:00-12:50PM, 304 Burnham Hall (BH)
Instructor: Lev Reyzin, SEO 417
Online Textbook: Avrim Blum, John Hopcroft, and Ravi Kannan, Mathematical Foundations of Data Science
Office Hours: T 11:00-11:50am (online), F 11:00-11:50am (in-person)
Piazza site: please sign up via this link


topics and times

Problem Sets

problem set 1 due 10/6/23
problem set 2 due 11/3/23
problem set 3 due 12/1/23

Lectures and Readings

Lecture 1 (8/21/23)
covered material: intro to the course, preview of the material
reading: chapter 1

Lecture 2 (8/23/23)
covered material: some concentration inequalities, intro to geometry in high dimensions
reading: chapters 2.1 - 2.3

Lecture 3 (8/25/23)
covered material: properties of the unit ball, sampling from the unit ball
reading: chapters 2.4 - 2.5

Lecture 4 (8/28/23)
covered material: Gaussian annulus theorem, random projection theorem, Johnson-Lindenstrauss lemma
reading: chapters 2.6 - 2.9

Lecture 5 (8/30/23)
covered material: singular value decomposition (SVD), best-fit subspaces, and optimality of the greedy algorithm
reading: chapters 3.1 - 3.4

Lecture 6 (9/1/23)
covered material: power iteration, SVD for clustering mixtures of Gaussians, centering data
reading: chapters 3.7 and 3.9

Lecture 7 (9/6/23)
covered material: intro to Markov chains, stationary distribution, Fundamental Theorem of Markov Chains
reading: chapter 4 intro, 4.1

Lecture 8 (9/8/23)
covered material: Markov chain Monte Carlo (MCMC), Metropolis-Hastings, Gibbs sampling
reading: chapter 4.2

Lecture 9 (9/11/23)
covered material: the Web as a Markov chain (online lecture by Leskovec)
reading: chapter 4.8

Lecture 10 (9/13/23)
covered material: talk at the Simons Institute by Michael Kearns on "The Ethical Algorithm"
optional reading: this article by Kearns and Roth (UPenn)

Lecture 11 (9/15/23)
covered material: MCMC for efficient sampling, volume estimation of convex bodies
reading: chapter 4.3

Lecture 12 (9/18/23)
covered material: electrical networks and random walks, probabilistic interpretation of voltage
reading: begin chapter 4.5

Lecture 13 (9/20/23)
covered material: probabilistic interpretation of current and effective resistance
reading: finish chapter 4.5

Lecture 14 (9/22/23)
covered material: hitting time, mixing time, normalized conductance
reading: chapter 4.4.1, begin 4.6.1

Lecture 15 (9/25/23)
covered material: bounding mixing time with normalized conductance
reading: begin chapter 4.4

Lecture 16 (9/27/23)
covered material: commute and cover times
reading: chapters 4.6.2 and 4.6.3

Lecture 17 (9/29/23)
covered material: escape probability and effective resistance
reading: finish chapter 4.5

Lecture 18 (10/2/23)
covered material: introduction to random graphs, counting triangles
reading: chapter 8.1

Lecture 19 (10/4/23)
covered material: first and second moment methods for showing phase transitions
reading: begin chapter 8.2

Lecture 20 (10/6/23)
covered material: sharp threshold for diameter, Hamiltonian cycles example, isolated vertices
reading: finish chapter 8.2

Lecture 21 (10/9/23)
covered material: probability flows (continuation from lecture 14)
reading: finish chapter 4.4

Lecture 22 (10/11/23)
covered material: increasing graph properties, replicators
reading: chapter 8.5

Lecture 23 (10/13/23)
covered material: Molloy-Reed condition, other growth models
reading: chapters 8.8 and 8.9

Lecture 24 (10/16/23)
covered material: intro to machine learning, PAC learning and Occam's razor theorem
reading: chapter 5.1

Lecture 25 (10/18/23)
covered material: VC dimension
reading: chapter 5.5

Lecture 26 (10/20/23)
covered material: support vector machines, kernel functions
reading: chapter 5.3

Lecture 27 (10/23/23)
covered material: the Perceptron algorithm
reading: chapters 5.2 and 5.10

Lecture 28 (10/25/23)
covered material: agnostic learning
reading: chapter 5.4

Lecture 29 (10/27/23)
covered material: weak learning, boosting the accuracy, AdaBoost
reading: chapter 5.11

Lecture 30 (10/30/23)
covered material: statistical queries for learning
optional reading: sections 1 - 3 of this survey

Lecture 31 (11/1/23)
covered material: applications of statistical queries
reading: section 4 of this survey

Lecture 32 (11/3/23)
covered material: boosting and margins
optional reading: see this paper

Lecture 33 (11/6/23)
covered material: intro to streaming algorithms, estimating the number of distinct elements in a stream
reading: chapter 6.1

Lecture 34 (11/8/23)
covered material: pairwise independent hash functions
optional reading: notes from Ronitt Rubinfeld (MIT)

Lecture 35 (11/10/23)
covered material: majority element, estimating the second frequency moment of a stream
reading: chapter 6.2

Lecture 36 (11/13/23)
covered material: matrix multiplication using sampling, sketches for resemblance
reading: chapters 6.3.1 and 6.4

Lecture 37 (11/15/23)
covered material: introduction to clustering, clustering objectives, k-means for Gaussian MLE
reading: chapter 7.1

Lecture 38 (11/17/23)
covered material: Lloyd's algorithm, 2-approximation for k-center
reading: chapters 7.2 and 7.3

Lecture 39 (11/20/23)
covered material: spectral clustering, kernel methods, impossibility theorem
reading: chapters 7.5.1 and 7.8

Lecture 40 (11/22/23)
covered material: neural networks and large language models
reading: chapter 5.8

Lectures 41 - 43 (11/27/23, 11/29/23, 12/1/23)
covered material: student presentations on research papers (see schedule)