## MCS 549 - Mathematical Foundations of Data Science

University of Illinois - Chicago, Fall 2019

This course covers the mathematical foundations of modern data science from a theoretical computer science perspective. Topics will include random graphs, small world phenomena, random walks, Markov chains, streaming algorithms, clustering, graphical models, singular value decomposition, and random projections.

### Basic Information

 Syllabus: pdf
 Time and Location: M-W-F 1:00PM-1:50PM, 219 Taft Hall (TH)
 Instructor Contact Information: Lev Reyzin, SEO 418, (312)-413-3745
 Online Textbook: Avrim Blum, John Hopcroft, and Ravi Kannan, Mathematical Foundations of Data Science
 Office Hours: W 10:00-10:50 AM, F 11:00-11:50 AM

instructions
topics and times

### Problem Sets

problem set 1, due 10/4/19
problem set 2, due 11/1/19
problem set 3, due 11/22/19

### Lectures

Lecture 1 (8/26/19)
covered material: intro to the course, preview of the material, some basic probability

Lecture 2 (8/28/19)
covered material: some concentration inequalities, geometry in high dimensions

Lecture 3 (8/30/19)
covered material: Gaussian annulus theorem, random projection theorem, Johnson-Lindenstrauss lemma

Lecture 4 (9/4/19)
covered material: singular value decomposition (SVD), best-fit subspaces, and optimality of the greedy algorithm

Lecture 5 (9/6/19)
covered material: principal component analysis (PCA), SVD for clustering mixtures of Gaussians
reading: chapters 2.8, 3.9.2 - 3.9.3
optional reading: chapters 3.9.4 - 3.9.5

Lecture 6 (9/9/19)
covered material: power iteration for fast computation of SVD

Lecture 7 (9/11/19)
covered material: SVD for an additive approximation algorithm for max-cut

Lecture 8 (9/13/19)
covered material: intro to Markov chains, stationary distribution, Fundamental Theorem of Markov Chains
reading: intro to chapter 4, chapter 4.1

Lecture 9 (9/16/19)
covered material: Markov chain Monte Carlo (MCMC), Metropolis-Hastings, Gibbs sampling
reading: chapter 4.2 (including 4.2.1 - 4.2.2)

Lecture 10 (9/18/19)
covered material: MCMC for efficient sampling and volume estimation of convex bodies in high dimension

Lecture 11 (9/20/19)
covered material: convergence of random walks on undirected graphs, normalized conductance

Lecture 12 (9/23/19)
covered material: bounding mixing time with normalized conductance, probability flows

Lecture 13 (9/25/19)
covered material: analyzing random walks via electrical networks, probabilistic interpretation of voltage

Lecture 14 (9/27/19)
covered material: probabilistic interpretation of current and of effective resistance / conductance

Lecture 15 (9/30/19)
covered material: Gibbs measures and Glauber dynamics (guest lecture by Will Perkins)
optional reading: chapter 3 of Friedli and Velenik

Lecture 16 (10/2/19)
covered material: hitting and commute times via effective resistance, cover time

Lecture 17 (10/4/19)
covered material: random walks on Euclidean space, the Web as a Markov chain

Lecture 18 (10/7/19)
covered material: introduction to random graphs, counting numbers of triangles

Lecture 19 (10/9/19)
covered material: talk by Vishesh Jain on invertibility of random matrices at 4:15pm in SEO 612

Lecture 20 (10/11/19)
covered material: first and second moment methods for showing phase transitions

Lecture 21 (10/14/19)
covered material: graph diameter 2 and sharp thresholds

Lecture 22 (10/16/19)
covered material: phase transitions for any increasing graph property, Molloy-Reed condition for non-uniform models

Lecture 23 (10/18/19)
covered material: growth models with and without preferential attachment

Lecture 24 (10/21/19)
covered material: intro to streaming algorithms, estimating the number of distinct elements in a stream

Lecture 25 (10/23/19)
covered material: limited independence, counting occurrences, and estimating the second frequency moment of a stream

Lecture 26 (10/25/19)
covered material: speeding up matrix multiplication by sampling, sketching to compute resemblance

Lecture 27 (10/28/19)
covered material: introduction to clustering, clustering objectives

Lecture 28 (10/30/19)
covered material: dynamic program for 1 dimension, Lloyd's and Ward's algorithms, 2-approximation for k-center

Lecture 29 (11/1/19)
covered material: spectral clustering
reading: chapters 7.4 and 7.5.1 - 7.5.1

Lecture 30 (11/4/19)
covered material: single linkage and maximizing minimum separation in polynomial time

Lecture 31 (11/6/19)
covered material: introduction to topic models, non-negative matrix factorization (NMF)

Lecture 32 (11/8/19)
covered material: idealized topic model, NMF with anchor words, brief overview of LDA

Lecture 33 (11/11/19)
covered material: PAC learning and sample complexity

Lecture 34 (11/13/19)
covered material: uniform convergence and Occam's Razor

Lecture 35 (11/15/19)
covered material: infinite classes and VC-dimension

Lecture 36 (11/18/19)
covered material: linear separators and support-vector machines