MCS 590 - Foundations of Data Science
University of Illinois - Chicago
Spring 2015


MCS 590 is a course covering special topics in computer science. This semester, the topic is "foundations of data science." The course will cover topics such as: random graphs, small world phenomena, random walks, Markov chains, streaming algorithms, clustering, graphical models, and belief propogation. Techniques such as SVD and random projections will also be discussed.

Basic Information

Syllabus: pdf
Time and Location: MWF 12:00PM-12:50PM, Lincoln Hall 320
Instructor Contact Information: Lev Reyzin, SEO 418, (312)-413-9576,
Online Textbook: John Hopcroft and Ravi Kannan Mathematical Foundations of Data Science
Office Hours: T 3:00PM - 4:00PM, W 2:00PM-3:00PM

Presentations

instructions
topics and times

Problem Sets

problem set 1 due 2/13/15
problem set 2 due 3/18/15 3/20/15
problem set 3 due 4/24/14

Lectures and Readings

Lecture 1 (1/12/15)
covered material: intro to the course, preview of the material
reading: 1 of Hopcroft & Kannan

Lecture 2 (1/14/15)
covered material: tail inequalities and their proofs
reading: 2.2 and 2.7-2.8 of Hopcroft & Kannan

Lecture 3 (1/16/15)
covered material: intro to Erdös-Rényi random graphs, vertex degree distributions and their properties
reading: intro to 4.1 and 4.1.1 of Hopcroft & Kannan

Lecture 4 (1/21/15)
covered material: counting triangles in Erdös-Rényi random graphs
reading: 4.1.2 of Hopcroft & Kannan

Lecture 5 (1/23/15)
covered material: first and second moment methods, diameter sharp threshold in Erdös-Rényi random graphs
reading: 4.2 of Hopcroft & Kannan

Lecture 6 (1/26/15)
covered material: thresholds in Erdös-Rényi random graph, Molley Reed conditions for more general models
reading: 4.3 and 4.8 of Hopcroft & Kannan

Lecture 7 (1/28/15)
covered material: 0-1 law for first order logic for random graphs (guest lecture by György Turán)
reading: Probabilities on Finite Models by Fagin (1976)
optional reading 4.6 of Hopcroft & Kannan (a general theorem for phase transitions)

Lecture 8 (1/30/15)
covered material: growth models with and without preferential attachment
reading: 4.9.1 (first subsection) and 4.9.2 of Hopcroft & Kannan

Lecture 9 (2/2/15)
covered material: community detection, stochastic block model, and planted cliques
reading: Tim Roughgarden's lecture notes
optional reading: Statistical Algorithms and a Lower Bound for Planted Clique by Feldman et al. (2013)

Lecture 10 (2/4/15)
covered mateiral: models for overlapping communities, intro to spectral ideas
reading: Community-Affiliation Graph Model for Overlapping Network Community Detection by Yang and Leskovec (2012)
other: problem set 1 given out

Lecture 11 (2/6/15)
covered material: structured prediction for control (guest lecture by Brian Ziebart)
optional reading: A Reduction of Imitation Learning and Structured Prediction to No-Regret Online Learning by Ross et al. (2011)

Lecture 12 (2/9/15)
covered material: introduction to random walks and Markov chains, long term probability distributions
reading: intro to 5 of Hopcroft & Kannan

Lecture 13 (2/11/15)
covered material: the fundamental theorem of Markov chains, condition for time-reversible Markov chains
reading: 5.1 of Hopcroft & Kannan

Lecture 14 (2/13/15)
covered material: electrical networks and random walks
reading: 5.2 of Hopcroft & Kannan

Lecture 15 (2/16/15)
covered material: hitting and commute times for random walks on unweighted undirected graphs
reading: 5.3 of Hopcroft & Kannan

Lecture 16 (2/18/15)
covered material: cover times for random walks, the Web as a Markov chain, page rank
reading: begin 5.5 of Hopcroft & Kannan

Lecture 17 (2/20/15)
covered material: return time, expected hitting time, and personalized page rank
reading: finish 5.5 of Hopcroft & Kannan

Lecture 18 (2/23/15)
covered material: Markov chain Monte Carlo (MCMC), Metropolis-Hastings, Gibbs sampling
reading: 5.6 of Hopcroft & Kannan

Lecture 19 (2/25/15)
covered material: approximating volumes of convex bodies with MCMC
reading: 5.7 of Hopcroft & Kannan

Lecture 20 (2/27/15)
covered material: convergence of random walks on undirected graphs, intro to steaming algorithms
reading: 5.8 and intro to 7 of Hopcroft & Kannan

Lecture 21 (3/2/15)
covered material: estimating number of distinct elements in stream, universal hash functions
reading 7.1.1 of Hopcroft & Kannan

Lecture 22 (3/4/15)
covered material: counting occurrences of elements, approximating frequency of elements
reading 7.1.2 and 7.1.3 of Hopcroft & Kannan

Lecture 23 (3/6/15)
covered material: estimating the second moment of a stream, discussion of limited independence
reading: 7.1.4 of Hopcroft & Kannan
other: problem set 2 given out

Lecture 24 (3/9/15)
covered material: unique countable random graph, Ehrenfeucht-Fraisse games (guest lecture by György Turán)
reading: chapter 3 of Elements of Finite Model Theory by Libkin (2004)

Lecture 25 (3/11/15)
covered material: sketches of documents, sampling intersections, and matrix fingerprinting
reading: 7.3 of Hopcroft & Kannan and section 4.1 of Learning and Verifying Graphs... by Reyzin and Srivastava (2006)

Lecture 25 (3/13/15)
covered material: introduction to clustering, various clustering objectives and choosing an objective
reading: 8.1 of Hopcroft & Kannan

Lecture 26 (3/18/15)
covered material: Lloyd's algorithm for k-means, proof of convergence
reading: 8.2 of Hopcroft & Kannan

Lecture 27 (3/20/15)
covered material: 2-approximation for k-center and 2-approximation for max-cut clustering
reading: 8.3 of Hopcroft & Kannan and Barrington's lecture notes

Lecture 28 (3/30/15)
covered material: recursive heirarchical clustering based on sparse cuts
reading: 8.5 of Hopcroft & Kannan

Lecture 29 (4/1/15)
covered material: percolation clustering, axioms for clustering, Kleinberg's impossibility theorem, satisfiable axioms
reading: 8.10 (last subsection) and 8.11 of Hopcroft & Kannan

Lecture 30 (4/3/15)
covered material: topic models and nonnegative matrix factorization
reading: 9.1 of Hopcroft & Kannan

Lecture 31 (4/6/15)
covered material: Hidden Markov models, identifying most likely sequence of states via the Viterbi algorithm
reading: 9.2 of Hopcroft & Kannan

Lecture 32 (4/8/15)
covered material: introduction to graphical models, Bayesian networks, and factor graphs
reading: 9.3-9.6 of Hopcroft & Kannan

Lecture 33 (4/10/15)
covered material: message passing and marginalization on tree fractor graphs
reading: 9.7 of Hopcroft & Kannan

Lecture 34 (4/13/15)
covered material: singular value decomposition, best-fit subspaces, and optimality of greedy algorithm for computing SVD
reading: 3.1-3.3 of Hopcroft & Kannan

Lecture 35 (4/15/15)
covered material: SVD vs PCA, application to collaborative filtering, and power iteration
reading: 3.5, 3.6.1, and 3.7 of Hopcroft & Kannan

Lecture 36 (4/17/15)
covered material: alternating least squares for collaborative filtering, random projections and Johnson-Linderstrauss lemma
reading: 2.9 of Hopcroft & Kannan and Matrix Factorization Techniques for Recommender Systems by Koren et al. (2009)
other: problem set 3 given out

Lecture 37 (4/20/15)
covered material: SVD approach to clustering a mixture of spherical Gassuans
reading: 2.6 and 3.6.2 of Hopcroft & Kannan

Lecture 38 (4/22/15)
covered material: an additive polynomial time approximation scheme (PTAS) for max-cut using SVD
reading: 3.6.4 of Hopcroft & Kannan

Lecture 39 (4/24/15)
student presentations: nonnegative matrix factorization (Bowen Dong) and learning automata from random walks (Wei Xing)
references: Arora et al. (2012) and Freund et al. (1993)

Lecture 40 (4/27/15)
student presentations: finding a planted clique (Mojgan Mirzaie) and a spectral algorithm for learning HMMs (Yi Huang)
references: Alon et al. (2009) and Hsu et al. (2012)

Lecture 41 (4/29/15)
student presentations: deterministic matrix sketching (Anqi Liu) and stochastic models for the web (Mano Vikash Janardhanan)
references: Liberty (2013) and Kumar et al. (2000)

Lecture 42 (5/1/15)
student presentations: finding overlapping communities (Ben Fish) and testing of clustering (Adam Lelkes)
references: Arora et al. (2012) and Alon et al. (2003)