**Course Announcement**

**Time:** Monday, Wednesday, Friday, 10:00 a.m. - 10:50 a.m.

**Location:** Taft Hall 219

**Instructor:** Jie Yang

**Office:** SEO 513

**Phone:** (312) 413-3748

**E-Mail:** jyang06 AT math DOT uic DOT edu

**Office Hours:** Monday, Wednesday, Friday, 11:00 a.m. - 12:00 p.m.

**Textbook:** Trevor Hastie, Robert Tibshirani, Jerome Friedman, *The Elements of Statistical Learning: Data Mining, Inference, and Prediction*, 2nd edition, Springer, 2009.

**Reference Books:**

- Bradley Efron, Trevor Hastie, *Computer Age Statistical Inference: Algorithms, Evidence and Data Science*, 2016.

- J. D. Gibbons, S. Chakraborti, *Nonparametric Statistical Inference*, 5th edition, 2011.

- Sharon L. Lohr, *Sampling: Design and Analysis*, 2nd edition, 2010.

- Gareth James, Daniela Witten, Trevor Hastie, Robert Tibshirani, *An Introduction to Statistical Learning: with Applications in R*, Springer, 2013.

- Sanford Weisberg, *Applied Linear Regression*, 3rd edition, Wiley, 2005 (companion R package).

- David J. Hand, Heikki Mannila, Padhraic Smyth, *Principles of Data Mining*, MIT Press, 2001.

**Course Contents:** Modern techniques for statistical learning, including ridge regression, the LASSO, least angle regression (LAR), principal components regression, and partial least squares; model assessment and selection techniques, including AIC, BIC, and cross-validation; classification and clustering techniques, including linear discriminant analysis, logistic regression, neural networks, support vector machines, nearest neighbors, and K-means; sampling and survey techniques; and applied nonparametric tests.

**Prerequisite:** Grade of C or better in STAT 411 or STAT 481.

**Homework:** Due every Wednesday before class. Half of each homework grade is based on completeness and half on the correctness of one selected problem.

**Exams:** Friday, October 12, and Friday, November 16, 10:00 a.m. - 10:50 a.m.

**Project:** Students are required to work in groups of at most three on course projects and to submit their final reports before 10:00 a.m. on Friday, December 7. Projects may come from the optional problems assigned by the instructor or be proposed by the students themselves, subject to the instructor's approval.

**Grading:** Homework 20%, two exams 25% each, project 30%.
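As an unofficial illustration, the weights under **Grading**, the homework rule (half completeness, half correctness of one problem), and the cutoffs listed under **Grading Scale** can be combined as in the Python sketch below. The function names are hypothetical, and the sketch assumes that each cutoff is an inclusive lower bound (e.g. 90% or above earns an A) and that totals below 60% count as an F; the syllabus states neither assumption explicitly.

```python
# Unofficial, hypothetical grade calculator illustrating the syllabus weights.
# All scores are percentages in [0, 100].

WEIGHTS = {"homework": 0.20, "exam1": 0.25, "exam2": 0.25, "project": 0.30}

# Assumption: each cutoff is an inclusive lower bound, and anything below the
# D cutoff is treated as an F (the syllabus does not say this explicitly).
CUTOFFS = [(90.0, "A"), (80.0, "B"), (70.0, "C"), (60.0, "D")]


def homework_score(completeness: float, correctness: float) -> float:
    """One assignment: half for completeness, half for the one graded problem."""
    return 0.5 * completeness + 0.5 * correctness


def course_total(scores: dict[str, float]) -> float:
    """Weighted course percentage from the four component percentages."""
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


def letter_grade(total: float) -> str:
    """Map a course percentage to a letter using the (assumed) cutoffs."""
    for cutoff, letter in CUTOFFS:
        if total >= cutoff:
            return letter
    return "F"


if __name__ == "__main__":
    hw = homework_score(completeness=100.0, correctness=80.0)  # 90.0
    total = course_total({"homework": hw, "exam1": 85.0, "exam2": 78.0, "project": 92.0})
    print(f"total = {total:.2f}, grade = {letter_grade(total)}")
```

In this example the homework contributes 0.20 × 90 = 18 points, the exams 0.25 × 85 + 0.25 × 78 = 40.75 points, and the project 0.30 × 92 = 27.6 points, for a total of 86.35%.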

**Grading Scale:** 90% A, 80% B, 70% C, 60% D

**Format of All Exams:** Exams are based mainly on the homework and the examples discussed in class. The last class session before each exam is a review session; please prepare any questions you may have. *No makeup exam will be given without a valid excuse.*

**Course Syllabus**

| WEEK | SECTIONS | BRIEF DESCRIPTION |
|---|---|---|
| 08/27 - 08/31 | Chapter 1; 3.2; 3.2 | Introduction to Statistical Learning; Linear Regression Models and Least Squares |
| 09/03 - 09/07 | Holiday; 3.4; 3.4 | Ridge Regression; Lasso |
| 09/10 - 09/14 | 3.4; 3.5; 3.5 | Least Angle Regression; Principal Components Regression; Partial Least Squares |
| 09/17 - 09/21 | 7.5; 7.7; 7.10 | AIC; BIC; Cross-Validation |
| 09/24 - 09/28 | Sampling: Chapter 1; Chapter 2; Chapter 2 | Introduction to Sampling; Simple Random Sampling |
| 10/01 - 10/05 | Sampling: Chapter 3; Chapter 3; Chapter 5 | Stratified Sampling; Cluster Sampling |
| 10/08 - 10/12 | Sampling: Chapter 5; Review; Exam-1 | Cluster Sampling |
| 10/15 - 10/19 | NSI: 3.2; 3.3; 3.4, 3.5 | Nonparametric Statistical Inference: Tests of Randomness |
| 10/22 - 10/26 | NSI: 4.2; 4.3; 4.5, 4.6 | Nonparametric Statistical Inference: Tests of Goodness of Fit |
| 10/29 - 11/02 | NSI: 5.7; 6.3, 6.6; 8.2 | Nonparametric Statistical Inference: Wilcoxon Signed-Rank Test; Kolmogorov-Smirnov Two-Sample Test, Mann-Whitney U Test; Wilcoxon Rank-Sum Test |
| 11/05 - 11/09 | 4.1; 4.3; 4.3 | Introduction to Classification; Linear Discriminant Analysis |
| 11/12 - 11/16 | 4.4; Review; Exam-2 | Logistic Regression |
| 11/19 - 11/23 | 4.4; 11.3; Holiday | Logistic Regression; Neural Networks |
| 11/26 - 11/30 | 12.2; 12.3; 12.3 | Support Vector Classifier; Support Vector Machines and Kernels |
| 12/03 - 12/07 | 13.3; 14.3; 14.3 | k-Nearest-Neighbor Classifiers; Cluster Analysis; K-means |

**Homework**

- Homework #1, due 09/10/2018; Optional #1 (available), Optional #2 (reserved)

- Homework #2, due 09/19/2018; Optional #3 (available), Optional #4 (available)

- Homework #3, due 09/26/2018; Optional #5 (available)

- Homework #4, due 10/03/2018; Optional #6 (available)

- Homework #5, due 10/10/2018

- Homework #6, due 10/24/2018

- Homework #7, due 10/31/2018

- Homework #8, due 11/07/2018

- Homework #9, due 11/14/2018

- Homework #10, due 12/03/2018; Optional #7 (available), Optional #8 (available)

**Using R**

- Download **R** for free: one of the software environments most widely used by statisticians

- Learn R in 15 Minutes

- Use R to Compute Numerical Integrals

- Downloadable books on R:
  - *An Introduction to R*, by William N. Venables, David M. Smith and the R Development Core Team
  - *Using R for Data Analysis and Graphics - Introduction, Code and Commentary*, by John H. Maindonald
  - *Practical Regression and ANOVA using R*, by Julian J. Faraway

**More R Books in Different Languages ...**

- R code for the course:
  - Introduction to R
  - Useful R tips
  - R code for §3.2.1 in the ESL book: linear models for the prostate cancer data
  - R code for §3.4 in the ESL book: ridge regression, LASSO, LARS
  - R code for §3.5 in the ESL book: PCA, PCR, PLS
  - R code for §7.5 and §7.7 in the ESL book: AIC, BIC
  - R code for §4.3 and §4.4 in the ESL book: LDA, QDA, and logistic regression for the vowel recognition data
  - R code for multinomial logistic regression with the trauma clinical trial data
  - R code for §11.3 and §11.4 in the ESL book: neural networks
  - R code for §12.2 and §12.3 in the ESL book: support vector classifiers and support vector machines
  - R code for §13.3 in the ESL book: k-nearest-neighbor classifiers

**Relevant Course Materials**
