## MCS 549 - Mathematical Foundations of Data Science University of Illinois at Chicago Fall 2022

This course covers the mathematical foundations of modern data science from a theoretical computer science perspective. Topics will include random graphs, random walks, Markov chains, streaming algorithms, clustering, singular value decomposition, and random projections.

### Basic Information

 Syllabus: pdf
 Time and Location: M-W-F 12:00-12:50PM, 302 Addams Hall (AH)
 Instructor Contact Information: Lev Reyzin, SEO 417
 Online Textbook: Avrim Blum, John Hopcroft, and Ravi Kannan, Mathematical Foundations of Data Science
 Office Hours: T 9:00-9:50AM (online), F 10:00-10:50AM (SEO 417) Piazza site: link

instructions
topics and times

### Problem Sets

problem set 1, due 10/3/22
problem set 2, due 11/2/22
problem set 3, due 12/1/22

Lecture 1 (8/22/22)
covered material: intro to the course, preview of the material

Lecture 2 (8/24/22)
covered material: some concentration inequalities, intro to geometry in high dimensions

Lecture 3 (8/26/22)
covered material: properties of the unit ball, sampling from the unit ball

Lecture 4 (8/29/22)
covered material: Gaussian annulus theorem, separating Gaussians, fitting a spherical Gaussian to data
reading: chapters 2.6, 2.8, and 2.9

Lecture 5 (8/31/22)
covered material: random projection theorem, Johnson-Lindenstrauss lemma

Lecture 6 (9/2/22)
coveredmaterial: singular value decomposition (SVD), best-fit subspaces, and optimality of greedy algorithm

Lecture 7 (9/7/22)
covered material: power iteration, SVD for clustering mixtures of Gaussians
reading: chapter 3.7, begin chapter 3.9

Lecture 8 (9/9/22)
covered material: centering data

Lecture 9 (9/12/22)
covered material: introduction to random graphs, counting triangles

Lecture 10 (9/14/22)
covered material: first and second moment methods for showing phase transitions

Lecture 11 (9/16/22)
covered material: sharp threshold for diameter, Hamiltonian cycles example

Lecture 12 (9/19/22)
covered material: isolated vertices, increasing graph properties

Lecture 13 (9/21/22)
covered material: Molloy-Reed condition for non-uniform degrees

Lecture 14 (9/23/22)
covered material: growth models with and without preferential attachment

Lecture 15 (9/26/22)
covered material: intro to Markov chains, stationary distribution, Fundamental Theorem of Markov Chains

Lecture 16 (9/28/22)
covered material: Markov chain Monte Carlo (MCMC), Metropolis-Hasting, Gibbs sampling

Lecture 17 (9/30/22)
covered material: MCMC for efficient sampling, volume estimation of convex bodies

Lecture 18 (10/3/22)
covered material: mixing time of random walks on undirected graphs, normalized conductance

Lecture 19 (10/5/22)
covered material: bounding mixing time with normalized conductance

lecture 20 (10/7/22)
covered material: using probability flows

Lecture 21 (10/10/22)
covered material: conductance of various graphs, Markov chains as electrical networks

Lecture 22 (10/12/22)
covered material: probabilistic interpretation of voltage and current

Lecture 23 (10/14/22)
covered material: probabilistic interpretation of current and of effective resistance

Lecture 24 (10/17/22)
covered material: hitting, commute, and cover times

Lecture 25 (10/19/22)
covered material: the Web as a Markov chain

Lecture 26 (10/21/22)
covered material: intro to machine learning, PAC learning and Occam's razor theorem

Lecture 27 (10/24/22)
covered material: VC dimension, agnostic learning

Lecture 28 (10/26/22)
covered material: weak vs. strong learning, boosting, and the AdaBoost algorithm

Lecture 29 (10/28/22)
covered material: support vector machines
optional reading: chapters 5.1 and 5.2

Lecture 30 (10/31/22)
covered material: intro to streaming algorithms, estimating the number of distinct elements in a stream

Lecture 31 (11/2/22)
covered material: pairwise independent hash functions
optional reading: notes from Ronitt Rubenfeld (MIT)

Lecture 32 (11/4/22)
covered material estimating unique occurances, and estimating second frequency moment of a stream

Lecture 33 (11/7/22)
covered material: matrix multiplication using sampling, sketches for resemblence

Lecture 34 (11/9/22)
covered material: introduction to clustering, clustering objectives, k-means for Gaussian MLE

Lecture 35 (11/11/22)
covered material: Lloyd's algorithm, 2-approximation for k-center

Lectures 36 - 38 (11/14/22, 11/16/22, 11/18/22)
covered material: student presentations on research papers (see schedule)

Lecture 39 (11/21/22)
covered material: talk at the Simons Institute by Michael Kearns on "The Ethical Algorithm"