## MCS 590 - Mathematical Foundations of Data Science

University of Illinois - Chicago, Fall 2017

MCS 590 is a course covering special topics in computer science. This semester, the topic is "foundations of data science." The course will cover topics such as random graphs, small world phenomena, random walks, Markov chains, streaming algorithms, clustering, graphical models, and belief propagation. Techniques such as SVD and random projections will also be discussed.

### Basic Information

 Syllabus: pdf
 Time and Location: M-W-F 11:00AM-11:50AM, 212 Stevenson Hall (SH)
 Instructor Contact Information: Lev Reyzin, SEO 418, (312)-413-3745
 Online Textbook: Avrim Blum, John Hopcroft, and Ravi Kannan, Mathematical Foundations of Data Science
 Office Hours: T 2:30-3:20pm, F 1:00-1:50pm

instructions
topics and times

### Problem Sets

problem set 1, due 9/29/17
problem set 2, due 11/3/17
problem set 3, due 12/8/17

### Lectures

Lecture 1 (8/28/17)
covered material: intro to the course, preview of the material, law of large numbers

Lecture 2 (8/30/17)
covered material: some concentration inequalities, geometry in high dimensions

Lecture 3 (9/1/17)
covered material: Gaussian annulus theorem, random projection theorem, Johnson-Lindenstrauss lemma

Lecture 4 (9/6/17)
covered material: singular value decomposition and PCA, best-fit subspaces, optimality of the greedy algorithm

Lecture 5 (9/8/17)
covered material: singular vectors vs eigenvectors, power iteration, SVD for clustering mixtures of Gaussians

Lecture 6 (9/11/17)
covered material: random walks and Markov chains, fundamental theorem of Markov chains
reading: intro to chapter 4, 4.1

Lecture 7 (9/13/17)
covered material: Markov chain Monte Carlo (MCMC), Metropolis-Hastings

Lecture 8 (9/15/17)
covered material: Gibbs sampling, efficient volume estimation for convex bodies in high dimension

Lecture 9 (9/18/17)
covered material: convergence of random walks on undirected graphs, normalized conductance
other: problem set 1 assigned

Lecture 10 (9/20/17)
covered material: begin proof of conductance theorem for convergence of random walks

Lecture 11 (9/22/17)
covered material: finish proof of conductance theorem, electrical networks, harmonic functions

Lecture 12 (9/25/17)
covered material: probabilistic interpretations of voltage, current, and effective resistance

Lecture 13 (9/27/17)
covered material: hitting and commute times via voltages and currents

Lecture 14 (9/29/17)
covered material: cover times, random walks on Euclidean space, the Web as a Markov chain

Lecture 15 (10/2/17)
covered material: PAC learning and sample complexity

Lecture 16 (10/4/17)
covered material: uniform convergence and Occam's Razor

Lecture 17 (10/6/17)
covered material: infinite classes and VC-dimension

Lecture 18 (10/9/17)
covered material: linear separators and support-vector machines

Lecture 19 (10/11/17)
covered material: weak vs. strong learning, boosting, and the AdaBoost algorithm

Lecture 20 (10/13/17) - guest lecture by Ben Fish
covered material: SQ learning, SQ dimension and correlational SQs, lower bounds for SQ learning
reading: intro to chapter 11 of Dwork and Roth (2014)

Lecture 21 (10/16/17) - guest lecture by Ben Fish
covered material: differential privacy, standard mechanisms and accuracy, post-processing
reading: chapters 3.1 to 3.4 of Dwork and Roth (2014)

Lecture 22 (10/18/17) - guest lecture by Ben Fish
covered material: composition in differential privacy and counting queries, differentially private learning
reading: chapters 3.5 and 11.1 of Dwork and Roth (2014)

Lecture 23 (10/20/17)
covered material: online learning, weighted majority and perceptron
other: problem set 2 assigned

Lecture 24 (10/23/17)
covered material: intro to streaming algorithms, estimating the number of distinct elements in a stream, universal hash functions
reading: chapter 6.1, begin chapter 6.2 (6.2.1, 6.2.2)

Lecture 25 (10/25/17)
covered material: estimating the second moment of a stream, discussion of limited independence
reading: finish chapter 6.2 (6.2.3 and 6.2.4)

Lecture 26 (10/27/17)
covered material: introduction to random graphs, counting the number of triangles

Lecture 27 (10/30/17)
covered material: first and second moment methods, applications to random graphs

Lecture 28 (11/1/17)
covered material: the planted clique problem
reading: first half of my slides

Lecture 29 (11/3/17)
covered material: statistical algorithms and lower bounds for planted cliques
reading: second half of my slides

Lecture 30 (11/6/17)
covered material: growth models with and without preferential attachment

Lecture 31 (11/8/17)
covered material: introduction to clustering, clustering objectives, 2-approximation for k-center

Lecture 32 (11/10/17)
covered material: k-means clustering, maximum likelihood, dynamic program for 1 dimension, Lloyd's algorithm

Lecture 33 (11/13/17)
covered material: hierarchical clustering and kernel methods

Lecture 34 (11/15/17)
covered material: stability assumptions for clustering