## MCS 549 - Mathematical Foundations of Data Science

University of Illinois at Chicago, Fall 2023

This course covers the mathematical foundations of modern data science from a theoretical computer science perspective. Topics will include random graphs, random walks, Markov chains, streaming algorithms, clustering, singular value decomposition, and random projections.

### Basic Information

 Syllabus: pdf
 Time and Location: M-W-F 12:00-12:50PM, 304 Burnham Hall (BH)
 Instructor: Lev Reyzin, SEO 417
 Online Textbook: Avrim Blum, John Hopcroft, and Ravi Kannan, Mathematical Foundations of Data Science

instructions
topics and times

### Problem Sets

problem set 1 due 10/6/23
problem set 2 due 11/3/23
problem set 3 due 12/1/23

Lecture 1 (8/21/23)
covered material: intro to the course, preview of the material

Lecture 2 (8/23/23)
covered material: some concentration inequalities, intro to geometry in high dimensions

Lecture 3 (8/25/23)
covered material: properties of the unit ball, sampling from the unit ball

Lecture 4 (8/28/23)
covered material: Gaussian annulus theorem, random projection theorem, Johnson-Lindenstrauss lemma

Lecture 5 (8/30/23)
covered material: singular value decomposition (SVD), best-fit subspaces, and optimality of the greedy algorithm

Lecture 6 (9/1/23)
covered material: power iteration, SVD for clustering mixtures of Gaussians, centering data
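
For readers who want something concrete, here is a minimal sketch of power iteration for the top right singular vector, written in plain Python. It is an illustration only, not the lecture's presentation: the function name `top_singular_vector`, the iteration count, and the small test matrix are all assumptions made for this example.

```python
import math
import random

def top_singular_vector(A, iters=200, seed=0):
    """Approximate the top right singular vector of A by repeatedly
    applying A^T A to a random start vector and renormalizing.
    (Hypothetical helper for illustration.)"""
    rng = random.Random(seed)
    d = len(A[0])
    v = [rng.gauss(0, 1) for _ in range(d)]
    for _ in range(iters):
        # Compute A v, then A^T (A v).
        Av = [sum(a * x for a, x in zip(row, v)) for row in A]
        w = [sum(A[i][j] * Av[i] for i in range(len(A))) for j in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # The top singular value is |A v| for the unit vector v.
    Av = [sum(a * x for a, x in zip(row, v)) for row in A]
    sigma = math.sqrt(sum(x * x for x in Av))
    return v, sigma

# Toy matrix (an assumption): singular values 3 and 1.
A = [[3.0, 0.0], [0.0, 1.0]]
v, sigma = top_singular_vector(A)
```

Convergence speed depends on the gap between the two largest singular values, so the fixed iteration count here is only reasonable for well-separated toy inputs.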

Lecture 7 (9/6/23)
covered material: intro to Markov chains, stationary distribution, Fundamental Theorem of Markov Chains

Lecture 8 (9/8/23)
covered material: Markov chain Monte Carlo (MCMC), Metropolis-Hastings, Gibbs sampling

Lecture 9 (9/11/23)
covered material: the Web as a Markov chain (online lecture by Leskovec)

Lecture 10 (9/13/23)
covered material: talk at the Simons Institute by Michael Kearns on "The Ethical Algorithm"

Lecture 11 (9/15/23)
covered material: MCMC for efficient sampling, volume estimation of convex bodies

Lecture 12 (9/18/23)
covered material: electrical networks and random walks, probabilistic interpretation of voltage

Lecture 13 (9/20/23)
covered material: probabilistic interpretation of current and effective resistance

Lecture 14 (9/22/23)
covered material: hitting time, mixing time, normalized conductance

Lecture 15 (9/25/23)
covered material: bounding mixing time with normalized conductance

Lecture 16 (9/27/23)
covered material: commute and cover times

Lecture 17 (9/29/23)
covered material: escape probability and effective resistance

Lecture 18 (10/2/23)
covered material: introduction to random graphs, counting triangles

Lecture 19 (10/4/23)
covered material: first and second moment methods for showing phase transitions

Lecture 20 (10/6/23)
covered material: sharp threshold for diameter, Hamiltonian cycles example, isolated vertices

Lecture 21 (10/9/23)
covered material: probability flows (continuation from lecture 14)
reading: finish chapter 4.4

Lecture 22 (10/11/23)
covered material: increasing graph properties, replicators

Lecture 23 (10/13/23)
covered material: Molloy-Reed condition, other growth models

Lecture 24 (10/16/23)
covered material: intro to machine learning, PAC learning and Occam's razor theorem

Lecture 25 (10/18/23)
covered material: VC dimension

Lecture 26 (10/20/23)
covered material: Support Vector Machines, Kernel functions

Lecture 27 (10/23/23)
covered material: the Perceptron algorithm
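
As a small illustration of the Perceptron update rule, the sketch below assumes labels in {+1, -1} and a separator through the origin. The function name `perceptron`, the cap `max_passes`, and the toy data are assumptions for this example, not material from the lecture.

```python
def perceptron(points, labels, max_passes=1000):
    """Run the Perceptron on linearly separable data.
    On each mistake (y * <w, x> <= 0), update w <- w + y * x.
    (Hypothetical helper for illustration.)"""
    w = [0.0] * len(points[0])
    for _ in range(max_passes):
        mistakes = 0
        for x, y in zip(points, labels):
            margin = y * sum(wi * xi for wi, xi in zip(w, x))
            if margin <= 0:
                w = [wi + y * xi for wi, xi in zip(w, x)]
                mistakes += 1
        if mistakes == 0:
            break  # a full pass with no mistakes: every point is classified correctly
    return w

# Toy separable data (an assumption).
points = [(2.0, 1.0), (1.0, 3.0), (-1.0, -2.0), (-2.0, -0.5)]
labels = [1, 1, -1, -1]
w = perceptron(points, labels)
```

On data that is not linearly separable the loop simply stops after `max_passes` passes, and the returned vector carries no guarantee.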

Lecture 28 (10/25/23)
covered material: agnostic learning

Lecture 29 (10/27/23)
covered material: weak learning, boosting the accuracy, AdaBoost

Lecture 30 (10/30/23)
covered material: statistical queries for learning
optional reading: sections 1 - 3 of this survey

Lecture 31 (11/1/23)
covered material: applications of statistical queries
reading: section 4 of this survey

Lecture 32 (11/3/23)
covered material: boosting and margins

Lecture 33 (11/6/23)
covered material: intro to streaming algorithms, estimating the number of distinct elements in a stream

Lecture 34 (11/8/23)
covered material: pairwise independent hash functions
optional reading: notes from Ronitt Rubinfeld (MIT)

Lecture 35 (11/10/23)
covered material: majority element, estimating the second frequency moment of a stream
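
One standard way to find a majority element in a single pass is to keep one candidate and one counter (often called the Boyer-Moore majority vote). The sketch below is an illustration under that assumption and is not necessarily the exact variant presented in lecture. The name `majority_candidate` is hypothetical.

```python
def majority_candidate(stream):
    """Return a candidate for the majority element using O(1) extra memory.
    If some element occurs in more than half of the stream, it is returned;
    otherwise the result is arbitrary and needs a second pass to verify."""
    candidate, count = None, 0
    for x in stream:
        if count == 0:
            candidate, count = x, 1
        elif x == candidate:
            count += 1
        else:
            count -= 1
    return candidate

# Toy stream (an assumption): 1 occurs 4 times out of 7.
result = majority_candidate([1, 2, 1, 3, 1, 1, 2])
```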

Lecture 36 (11/13/23)
covered material: matrix multiplication using sampling, sketches for resemblance

Lecture 37 (11/15/23)
covered material: introduction to clustering, clustering objectives, k-means for Gaussian MLE

Lecture 38 (11/17/23)
covered material: Lloyd's algorithm, 2-approximation for k-center
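
A common route to a 2-approximation for k-center is farthest-first traversal: repeatedly add the point farthest from the centers chosen so far. The sketch below assumes that is the algorithm in question and uses Euclidean distance. The name `farthest_first_centers` and the toy points are assumptions for this example.

```python
import math

def farthest_first_centers(points, k):
    """Pick k centers greedily: start from an arbitrary point, then
    repeatedly add the point farthest from the current centers.
    Returns the centers and the resulting k-center radius.
    (Hypothetical helper for illustration.)"""
    centers = [points[0]]
    # dist[j] = distance from points[j] to its nearest chosen center
    dist = [math.dist(p, centers[0]) for p in points]
    while len(centers) < k:
        i = max(range(len(points)), key=lambda j: dist[j])
        centers.append(points[i])
        dist = [min(d, math.dist(p, points[i])) for d, p in zip(dist, points)]
    return centers, max(dist)

# Toy data (an assumption): two well-separated pairs of points.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
centers, radius = farthest_first_centers(points, 2)
```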