**Course Announcement**

**Time:** Monday, Wednesday, Friday at 9:00 a.m. - 9:50 a.m.

**Location (in person and on campus):** Taft Hall (826 S Halsted St, Chicago, IL 60607), Room 117

**Instructor:** Jie Yang

**Office:** SEO 513

**Phone:** (312) 413-3748

**E-Mail:** jyang06 AT uic DOT edu

**Office Hours:** Monday, Wednesday, Friday at 10:00 a.m. - 11:00 a.m. at UIC Zoom or by appointment

**Textbook (required):** Trevor Hastie, Robert Tibshirani, Jerome Friedman, *The Elements of Statistical Learning: Data Mining, Inference, and Prediction*, 2nd edition, Springer, 2017.

**Reference Books (optional):**

- Bradley Efron, Trevor Hastie, *Computer Age Statistical Inference: Algorithms, Evidence and Data Science*, 2016.

- J. D. Gibbons, S. Chakraborti, *Nonparametric Statistical Inference*, 5th edition, 2011.

- Sharon L. Lohr, *Sampling: Design and Analysis*, 2nd edition, 2010.

- Gareth James, Daniela Witten, Trevor Hastie, and Robert Tibshirani, *An Introduction to Statistical Learning: with Applications in R*, 2nd edition, Springer, 2023.

- Sanford Weisberg, *Applied Linear Regression*, 4th edition, Wiley, 2013.

- Max Bramer, *Principles of Data Mining*, 4th edition, Springer, 2020.

**Course Description:** STAT 485. Intermediate Statistical Techniques for Machine Learning and Big Data. Modern techniques for statistical learning including linear models, subset selection, partial least squares; LDA; logistic regression; model selection; sampling theory with applications to big data analysis; applied nonparametric inference.

**Course Credits:** 3 hours for undergraduates or 4 hours for graduate students.

**Prerequisite:** STAT 385 and STAT 411. Recommended background: STAT 481.

**Course Goals:** Understand modern statistical techniques for machine learning, data mining, and big data; formulate and implement statistical learning techniques by using standard statistical software; apply statistical learning techniques to real-life applications and research projects.

**Learning Objectives:** Understand and implement linear models, subset selection, and partial least squares; implement linear discriminant analysis, logistic regression, and model selection; understand sampling theory and fundamental principles; apply sampling techniques to big data analysis; apply nonparametric inference to real applications.

**Attendance/Participation Policy:** Students are expected to attend the lectures and participate in the discussions. Attendance will be taken at least six times during the course. Students who are present may receive half a credit point each time, up to three extra credit points in total, added to the final grade on a 100-point scale. Students who actively participate in the discussion may receive half a credit point or one credit point each time, up to five credit points in total. If for any reason you cannot be in class on time, please send me an email at your earliest convenience. If you need special accommodations due to disabilities, please contact the Disability Resource Center for a Letter of Accommodation (LOA).

**Assignments, Due Dates, and Deadlines:** Homework will be assigned on a weekly basis. Turn in your homework every Wednesday before class via UIC Blackboard. Half of the grade is for completeness; the other half is for the correctness of one selected problem.

**Policy for Missed or Late Homework:** Students may request an extension of up to two days for each homework. Late homework submitted without a prior request, or more than two days late, will not be accepted. The lowest homework score will be dropped when computing the final grade.

**Exam:** March 1st, 2024 (Friday), 9:00 a.m. - 9:50 a.m.

**Project:** Students are required to work in groups on course projects and submit their final reports before April 26th, 2024 (Friday), 9:00 a.m. Each group should consist of at most three students. The projects may come from the optional problems assigned by the instructor or be proposed by the students themselves upon the approval of the instructor.

**Grading:** Homework 30%, Exam 30%, Project 40%

**Grading Scale:** 90% A, 80% B, 70% C, 60% D

**Course Syllabus**

| WEEK | SECTIONS | BRIEF DESCRIPTION |
|---|---|---|
| 01/08 - 01/12 | Chapter 1; 3.2; 3.2 | Introduction to Statistical Learning; Linear Regression Models and Least Squares |
| 01/15 - 01/19 | Holiday; 3.3; 3.3 | Subset Selection |
| 01/22 - 01/26 | 3.4; 3.4; 3.4 | Ridge Regression; Lasso; Least Angle Regression |
| 01/29 - 02/02 | 3.5; 3.5; 3.8 | Principal Components Regression; Partial Least Squares; Grouped Lasso |
| 02/05 - 02/09 | 7.5; 7.7; 7.10 | AIC; BIC; Cross-Validation |
| 02/12 - 02/16 | Sampling: Chapter 1; Chapter 2; Chapter 3 | Introduction to Sampling; Simple Random Sampling; Stratified Sampling |
| 02/19 - 02/23 | Sampling: Chapter 5; Sampling: Lecture Notes | Cluster Sampling; Subsampling Techniques for Big Data Analysis |
| 02/26 - 03/01 | Sampling: Lecture Notes; Review; Exam | Subsampling Techniques for Big Data Analysis |
| 03/04 - 03/08 | NSI: 3.2; 3.3; 3.4, 3.5 | Nonparametric Statistical Inference: Tests of Randomness |
| 03/11 - 03/15 | NSI: 4.2; 4.3; 4.5, 4.6 | Nonparametric Statistical Inference: Tests of Goodness of Fit |
| 03/25 - 03/29 | NSI: 5.7; 6.3, 6.6; 8.2 | Nonparametric Statistical Inference: Wilcoxon Signed-Rank Test; Kolmogorov-Smirnov Two-Sample Test, Mann-Whitney U Test; Wilcoxon Rank-Sum Test |
| 04/01 - 04/05 | 4.1; 4.3; 4.3 | Introduction for Classification; Linear Discriminant Analysis |
| 04/08 - 04/12 | 4.4; 4.4; 4.4 | Logistic Regression |
| 04/15 - 04/19 | 11.3; 11.3; 12.2 | Neural Networks; Support Vector Classifier |
| 04/22 - 04/26 | 12.3; 13.3; 14.3 | Support Vector Machines and Kernels; k-Nearest-Neighbor Classifiers; K-means |

**Homework**

- Homework #1, due 01/17/2024

- Homework #2, due 01/31/2024

- Homework #3, due 02/07/2024

- Homework #4, due 02/14/2024

- Homework #5, due 02/21/2024

- Homework #6, due 02/28/2024

- Homework #7, due 03/13/2024

- Homework #8, due 03/27/2024

- Homework #9, due 04/03/2024

- Homework #10, due 04/17/2024

**Using R (Required and free to download)**

- Download **R** for Free -- the most popular software used by statisticians

- Learn R in 15 Minutes

- Use R to Compute Numerical Integrals

- RStudio -- a convenient set of integrated tools for R, including programming, plotting, and workspace management

- Downloadable Books on R:
  - *An Introduction to R*, by William N. Venables, David M. Smith and the R Development Core Team
  - *Using R for Data Analysis and Graphics - Introduction, Code and Commentary*, by John H. Maindonald
  - **More R Books in Different Languages ...**

- R Code for the Course:
  - Introduction to R
  - Useful R Tips
  - R code for §3.2.1 in the ESL book, including linear models for prostate cancer data
  - R code for §3.3 in the ESL book, including subset selection
  - R code for §3.4 in the ESL book, including Ridge, LASSO, Lars
  - R code for §3.5 in the ESL book, including PCA, PCR, PLS
  - R code for §3.8.4 in the ESL book, including Grouped Lasso
  - R code for §7.5 and §7.7 in the ESL book, including AIC, BIC
  - R code for §4.3 and §4.4 in the ESL book, including LDA, QDA, Logistic regression for vowel recognition data
  - R code for Multinomial logistic regression with trauma clinical trial data
  - R code for §11.3 and §11.4 in the ESL book, including Neural Network
  - R code for §12.2 and §12.3 in the ESL book, including Support vector classifier and support vector machines
  - R code for §13.3 in the ESL book, including k-Nearest-neighbor classifier

**Relevant Course Materials**

## Community Agreement/Classroom Conduct Policy

- Be present by turning off cell phones and removing yourself from other distractions.

- Be respectful of the learning space and community. For example, no side conversations or unnecessary disruptions.

- Use preferred names and gender pronouns.

- Assume goodwill in all interactions, even in disagreement.

- Facilitate dialogue and value the free and safe exchange of ideas.

- Try not to make assumptions, have an open mind, seek to understand, and not judge.

- Approach discussion, challenges, and different perspectives as an opportunity to think out loud, learn something new, and understand the concepts or experiences that guide other people's thinking.

- Debate the concepts, not the person.

- Be gracious and open to change when your ideas, arguments, or positions do not work or are proven wrong.

- Be willing to work together and share helpful study strategies.

- Be mindful of one another's privacy, and do not invite outsiders into our classroom.

## Disability Accommodations Statement

UIC is committed to full inclusion and participation of people with disabilities in all aspects of university life. Students who face or anticipate disability-related barriers while at UIC should connect with the Disability Resource Center (DRC) at drc.uic.edu, drc@uic.edu, or at (312) 413-2183 to create a plan for reasonable accommodations. In order to receive accommodations, students must disclose disability to the DRC, complete an interactive registration process with the DRC, and provide their course instructor with a Letter of Accommodation (LOA). Course instructors in receipt of an LOA will work with the student and the DRC to implement approved accommodations.
