MCS 549 - Mathematical Foundations of Data Science
University of Illinois - Chicago
Fall 2019


This course covers the mathematical foundations of modern data science from a theoretical computer science perspective. Topics will include random graphs, small world phenomena, random walks, Markov chains, streaming algorithms, clustering, graphical models, singular value decomposition, and random projections.

Basic Information

Syllabus: pdf
Time and Location: M-W-F 1:00PM-1:50PM, 219 Taft Hall (TH)
Instructor Contact Information: Lev Reyzin, SEO 418, (312)-413-3745,
Online Textbook: Avrim Blum, John Hopcroft, and Ravi Kannan, Mathematical Foundations of Data Science
Office Hours: W 10:00-10:50 AM, F 11:00-11:50 AM

Presentations

instructions
topics and times

Problem Sets

problem set 1, due 10/4/19
problem set 2, due 11/1/19
problem set 3, due 11/22/19

Lectures and Readings

Lecture 1 (8/26/19)
covered material: intro to the course, preview of the material, some basic probability
reading: chapters 1, 2.1, 2.2

Lecture 2 (8/28/19)
covered material: some concentration inequalities, geometry in high dimensions
reading: chapters 2.3 - 2.5

Lecture 3 (8/30/19)
covered material: Gaussian annulus theorem, random projection theorem, Johnson-Lindenstrauss lemma
reading: chapters 2.6 - 2.7

Lecture 4 (9/4/19)
covered material: singular value decomposition (SVD), best-fit subspaces, and optimality of greedy algorithm
reading: chapters 3.1 - 3.6

Lecture 5 (9/6/19)
covered material: principal component analysis (PCA), SVD for clustering mixtures of Gaussians
reading: chapters 2.8, 3.9.2 - 3.9.3
optional reading: chapters 3.9.4 - 3.9.5

Lecture 6 (9/9/19)
covered material: power iteration for fast computation of SVD
reading: chapter 3.7 (including 3.7.1)

Lecture 7 (9/11/19)
covered material: SVD for an additive approximation algorithm for max-cut
reading: chapter 3.9.6

Lecture 8 (9/13/19)
covered material: intro to Markov chains, stationary distribution, Fundamental Theorem of Markov Chains
reading: intro to chapter 4, chapter 4.1

Lecture 9 (9/16/19)
covered material: Markov chain Monte Carlo (MCMC), Metropolis-Hasting, Gibbs Sampling
reading: chapter 4.2 (including 4.2.1 - 4.2.2)

Lecture 10 (9/18/19)
covered material: MCMC for efficient sampling and volume estimation of convex bodies in high dimension
reading: chapter 4.3

Lecture 11 (9/20/19)
covered material: convergence of random walks on undirected graphs, normalized conductance
reading: begin chapter 4.4

Lecture 12 (9/23/19)
covered material: bounding mixing time with normalized conductance, probability flows
reading: finish chapter 4.4

Lecture 13 (9/25/19)
covered material: analyzing random walks via electrical networks, probabilistic interpretation of voltage
reading: begin chapter 4.5

Lecture 14 (9/27/19)
covered material: probabilistic interpretation of current and of effective resistance / conductance
reading: finish chapter 4.5

Lecture 15 (9/30/19)
covered material: Gibbs measures and Glauber dynamics (guest lecture by Will Perkins)
optional reading: chapter 3 of Friedli and Vilenik

Lecture 16 (10/2/19)
covered material: hitting and commute times via effective resistence, cover time
reading: chapter 4.6

Lecture 17 (10/4/19)
covered material: random walks on Euclidean space, the Web as a Markov chain
reading: chapters 4.7 and 4.8

Lecture 18 (10/7/19)
covered material: introduction to random graphs, counting numbers of triangles
reading: chapter 8.1

Lecture 19 (10/9/19)
covered material: talk by Vishesh Jain on invertibility of random matrices at 4:15pm in SEO 612

Lecture 20 (10/11/19)
covered material: first and second moment methods for showing phase transitions
reading: begin chapter 8.2

Lecture 21 (10/14/19)
covered material: graph diameter 2 and sharp thresholds
reading: finish chapter 8.2

Lecture 22 (10/16/19)
covered material: phase transitions for any increasing graph property, Molloy-Reed condition for non-uniform models
reading: chapters 8.5 and 8.8

Lecture 23 (10/18/19)
covered material: growth models with and without preferential attachment
reading: chapter 8.9

Lecture 24 (10/21/19)
covered material: intro to streaming algorithms, estimating number of distinct elements in stream
reading chapter 6.1

Lecture 25 (10/23/19)
covered material: limited independence, counting occurances, and estimating second frequency moment of a stream
reading: chapter 6.2

Lecture 26 (10/25/19)
covered material: speeding up matrix multiplication by sampling, sketching to compute resemblance
reading: chapters 6.3.1 and 6.4

Lecture 27 (10/28/19)
covered material: introduction to clustering, clustering objectives
reading: chapters 7.1

Lecture 28 (10/30/19)
covered material: dynamic program for 1 dimension, Lloyd's and Ward's algorithms, 2-approximation for k-center
reading: chapters 7.2 and 7.3

Lecture 29 (11/1/19)
covered material: spectral clustering
reading: chapters 7.4 and 7.5.1 - 7.5.1

Lecture 30 (11/4/19)
covereed material: single linkage and maximizing minimum separation in polynomial time
reading: chapter 7.7.1

Lecture 31 (11/6/19)
covered material: introduction topic models, non-negative matrix factorization (NMF)
reading: chapters 9.1 and 9.3

Lecture 32 (11/8/19)
covered material: idealized topic model, NMF with anchor words, brief overview of LDA
reading: chapters 9.2 and 9.4
optional reading: chapter 9.5

Lecture 33 (11/11/19)
covered material: PAC learning and sample complexity
reading: chapters 5.1

Lecture 34 (11/13/19)
covered material: uniform convergence and Occam's Razor
reading: chapter 5.4

Lecture 35 (11/15/19)
covered material: infinite classes and VC-dimension
reading: chapter 5.5

Lecture 36 (11/18/19)
covered material: linear separators and support-vector machines
reading: chapter 5.2 and 5.3

Lecture 37 (11/20/19)
covered material: weak vs. strong learning, boosting, and the AdaBoost algorithm
reading: chapter 5.11

Lecture 38 (11/22/19)
covered material: margins theory of boosting
optional reading: my slides

Lectures 39 - 43 (11/25/19, 11/27/19, 12/2/19, 12/4/19, and 12/6/19)
covered material: student presentations on research papers (see schedule)