MCS 590 - Mathematical Foundations of Data Science
University of Illinois - Chicago
Fall 2017


MCS 590 is a course covering special topics in computer science. This semester, the topic is "foundations of data science." The course will cover topics such as random graphs, small-world phenomena, random walks, Markov chains, streaming algorithms, clustering, graphical models, and belief propagation. Techniques such as SVD and random projections will also be discussed.

Basic Information

Syllabus: pdf
Time and Location: M-W-F 11:00AM-11:50AM, 212 Stevenson Hall (SH)
Instructor Contact Information: Lev Reyzin, SEO 418, (312)-413-3745
Online Textbook: Avrim Blum, John Hopcroft, and Ravi Kannan, Mathematical Foundations of Data Science
Office Hours: T 2:30-3:20pm, F 1:00-1:50pm

Presentations

instructions
topics and times

Problem Sets

problem set 1, due 9/29/17
problem set 2, due 11/3/17
problem set 3, due 12/8/17

Lectures and Readings

Lecture 1 (8/28/17)
covered material: intro to the course, preview of the material, law of large numbers
reading: chapters 1, 2.1, 2.2

Lecture 2 (8/30/17)
covered material: some concentration inequalities, geometry in high dimensions
reading: chapters 2.3 - 2.5

Lecture 3 (9/1/17)
covered material: Gaussian annulus theorem, random projection theorem, Johnson-Lindenstrauss lemma
reading: chapters 2.6 - 2.7

Lecture 4 (9/6/17)
covered material: singular value decomposition and PCA, best-fit subspaces, optimality of the greedy algorithm
reading: chapters 3.1 - 3.6

Lecture 5 (9/8/17)
covered material: singular vectors vs eigenvectors, power iteration, SVD for clustering mixtures of Gaussians
reading: chapters 3.7 - 3.9

Lecture 6 (9/11/17)
covered material: random walks and Markov chains, fundamental theorem of Markov chains
reading: intro to chapter 4, 4.1

Lecture 7 (9/13/17)
covered material: Markov chain Monte Carlo (MCMC), Metropolis-Hastings
reading: chapter 4.2

Lecture 8 (9/15/17)
covered material: Gibbs sampling, efficient volume estimation for convex bodies in high dimension
reading: chapter 4.3

Lecture 9 (9/18/17)
covered material: convergence of random walks on undirected graphs, normalized conductance
reading: begin chapter 4.4
other: problem set 1 assigned

Lecture 10 (9/20/17)
covered material: begin proof of conductance theorem for convergence of random walks
reading: finish chapter 4.4

Lecture 11 (9/22/17)
covered material: finish proof of conductance theorem, electrical networks, harmonic functions
reading: begin chapter 4.5

Lecture 12 (9/25/17)
covered material: probabilistic interpretations of voltage, current, and effective resistance
reading: finish chapter 4.5

Lecture 13 (9/27/17)
covered material: hitting and commute times via voltages and currents
reading: chapter 4.6

Lecture 14 (9/29/17)
covered material: cover times, random walks on Euclidean space, the Web as a Markov chain
reading: chapters 4.7 and 4.8

Lecture 15 (10/2/17)
covered material: PAC learning and sample complexity
reading: chapters 5.1 and 5.2

Lecture 16 (10/4/17)
covered material: uniform convergence and Occam's Razor
reading: chapter 5.3

Lecture 17 (10/6/17)
covered material: infinite classes and VC-dimension
reading: chapter 5.9

Lecture 18 (10/9/17)
covered material: linear separators and support-vector machines
reading: chapter 5.8

Lecture 19 (10/11/17)
covered material: weak vs. strong learning, boosting, and the AdaBoost algorithm
reading: chapter 5.10

Lecture 20 (10/13/17) - guest lecture by Ben Fish
covered material: SQ learning, SQ dimension and correlational SQs, lower bounds for SQ learning
reading: intro to chapter 11 of Dwork and Roth (2014)

Lecture 21 (10/16/17) - guest lecture by Ben Fish
covered material: differential privacy, standard mechanisms and accuracy, post-processing
reading: chapters 3.1 to 3.4 of Dwork and Roth (2014)

Lecture 22 (10/18/17) - guest lecture by Ben Fish
covered material: composition in differential privacy and counting queries, differentially private learning
reading: chapters 3.5 and 11.1 of Dwork and Roth (2014)

Lecture 23 (10/20/17)
covered material: online learning, weighted majority and perceptron
reading: chapter 5.5
other: problem set 2 assigned

Lecture 24 (10/23/17)
covered material: intro to streaming algorithms, estimating the number of distinct elements in a stream, universal hash functions
reading: chapter 6.1, begin chapter 6.2 (6.2.1, 6.2.2)

Lecture 25 (10/25/17)
covered material: estimating the second moment of a stream, discussion of limited independence
reading: finish chapter 6.2 (6.2.3 and 6.2.4)

Lecture 26 (10/27/17)
covered material: introduction to random graphs, counting the number of triangles
reading: chapter 8.1

Lecture 27 (10/30/17)
covered material: first and second moment methods, applications to random graphs
reading: beginning of chapter 8.2

Lecture 28 (11/1/17)
covered material: the planted clique problem
reading: first half of my slides

Lecture 29 (11/3/17)
covered material: statistical algorithms and lower bounds for planted cliques
reading: second half of my slides

Lecture 30 (11/6/17)
covered material: growth models with and without preferential attachment
reading: chapter 8.9

Lecture 31 (11/8/17)
covered material: introduction to clustering, clustering objectives, 2-approximation for k-center
reading: chapters 7.1, 7.3

Lecture 32 (11/10/17)
covered material: k-means clustering, maximum likelihood, dynamic program for 1 dimension, Lloyd's algorithm
reading: chapter 7.2

Lecture 33 (11/13/17)
covered material: hierarchical clustering and kernel methods
reading: chapters 7.7.1 and 7.8

Lecture 34 (11/15/17)
covered material: stability assumptions for clustering
reading: chapter 7.6

Lecture 35 (11/17/17)
covered material: SVD for additive approximation to max-cut (clustering)
reading: chapter 3.9.5

Lecture 36 (11/20/17)
covered material: Kleinberg's impossibility theorem, axiomatizing clustering
reading: this blog post by R.J. Lipton
other: problem set 3 assigned

Lectures 37 - 43 (11/22/17, 11/27/17, 11/29/17, 12/1/17, 12/4/17, 12/6/17, and 12/8/17)
covered material: student presentations on research papers (see schedule)