Lectures: Mondays and Wednesdays, 11:30 AM - 12:45 PM, White Hall 205
Instructor: Ruoxuan Xiong, Psychology and Interdisciplinary Sciences Building 581, ruoxuan.xiong@emory.edu
Office Hours: Mondays 3:00 PM - 4:00 PM, Psychology and Interdisciplinary Sciences Building 581
This course introduces students to the field of machine learning, a critical toolset for analyzing and interpreting complex data sets across diverse domains, including biology, finance, marketing, and astrophysics. Students will explore foundational modeling and prediction techniques widely used in machine learning, artificial intelligence, and data science, with a focus on both practical applications and the statistical principles underlying these methods.
Topics covered include:
Supervised Learning (Regression and Classification): Linear regression, logistic regression, K-nearest neighbors, linear and quadratic discriminant analysis, regularization methods (Ridge, LASSO, and Elastic Net), and tree-based methods.
Model Evaluation and Resampling: Bias-variance tradeoffs, cross-validation, and bootstrapping.
Unsupervised Learning: Principal components analysis and clustering, including the k-means algorithm.
Advanced Topics: Neural networks, transformers, diffusion models, and foundation models.
By the end of this course, students will gain both theoretical knowledge and practical skills to apply machine learning techniques to real-world problems.
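As a small taste of the workflow the course builds toward, the sketch below fits a model on training data and evaluates it on held-out test data, which is the core supervised-learning loop. It is a toy illustration in plain Python (the synthetic line y = 2x + 1 and the 80/20 split are assumptions for the example, not course material):

```python
# Toy supervised-learning workflow: fit on a training set, then report
# the mean-squared error on a held-out test set.
import random

random.seed(0)

# Synthetic data from the (assumed) line y = 2x + 1 plus Gaussian noise.
data = [(x, 2 * x + 1 + random.gauss(0, 0.5))
        for x in (i / 10 for i in range(100))]
random.shuffle(data)
train, test = data[:80], data[80:]  # 80/20 train/test split

# Least-squares simple linear regression: slope = cov(x, y) / var(x).
xs, ys = zip(*train)
mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
slope = (sum((x - mx) * (y - my) for x, y in train)
         / sum((x - mx) ** 2 for x in xs))
intercept = my - slope * mx

# Test MSE measures out-of-sample performance (a Week 2 topic).
mse = sum((y - (slope * x + intercept)) ** 2 for x, y in test) / len(test)
print(round(slope, 2), round(mse, 2))
```

The fitted slope lands near the true value 2, and the test MSE is near the noise variance, which is the behavior the bias-variance material in Weeks 2-3 makes precise.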
Week 1, W Jan 15: Introduction
Logistics; introduction to supervised machine learning (regression, tree-based methods, and neural networks) and unsupervised machine learning (clustering and principal components analysis).
Week 2, W Jan 22: Preliminaries
Parametric and nonparametric methods, training and test mean-squared error.
Week 3, M Jan 27: Bias-Variance, W Jan 29: KNN
Bias-variance decomposition.
K-nearest neighbors regression and classification.
Week 4, M Feb 3: Classification, W Feb 5: LDA and QDA
The classification problem, logistic regression, and generative vs. discriminative methods.
Linear discriminant analysis and quadratic discriminant analysis.
Week 5, M Feb 10: Cross-Validation, W Feb 12: Cross-Validation
Leave-one-out cross-validation and k-fold cross-validation.
Lab session.
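A minimal plain-Python sketch of the k-fold idea from Week 5: split the data into k folds, hold each fold out in turn, fit on the remaining folds, and average the held-out errors. To keep the sketch short, the "model" here is just the sample-mean predictor (an assumption for illustration; any model could be plugged in):

```python
# k-fold cross-validation of the sample-mean predictor on toy data.
import random

random.seed(1)
data = [random.gauss(5.0, 2.0) for _ in range(100)]  # assumed toy sample

def k_fold_mse(values, k=5):
    folds = [values[i::k] for i in range(k)]  # k roughly equal folds
    errors = []
    for i in range(k):
        held_out = folds[i]
        train = [v for j, f in enumerate(folds) if j != i for v in f]
        mean = sum(train) / len(train)        # "fit" the mean predictor
        errors.append(sum((v - mean) ** 2 for v in held_out) / len(held_out))
    return sum(errors) / k                    # CV estimate of the test MSE

cv_mse = k_fold_mse(data)
print(round(cv_mse, 2))  # roughly the noise variance of the data
```

Setting k equal to the sample size recovers leave-one-out cross-validation as a special case.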
Week 6, M Feb 17: Bootstrap, W Feb 19: Subset Selection
Bootstrap.
Best subset selection.
Week 7, M Feb 24: Subset Selection and Regularization, W Feb 26: Regularization
Forward and backward selection.
Lasso, ridge regression, and elastic net.
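The shrinkage effect behind the Week 7 regularization methods can be seen in a one-line formula: for centered data with a single predictor, the ridge coefficient has the closed form sum(x*y) / (sum(x^2) + lambda), so larger penalties shrink the coefficient toward zero. A plain-Python sketch on assumed synthetic data (true slope 3):

```python
# Ridge shrinkage in one dimension: the fitted slope decreases
# monotonically as the penalty lam grows.
import random

random.seed(2)
xs = [random.gauss(0, 1) for _ in range(200)]
ys = [3 * x + random.gauss(0, 1) for x in xs]  # assumed true slope = 3

def ridge_slope(xs, ys, lam):
    # Closed-form ridge estimate for centered, single-predictor data.
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + lam)

slopes = [ridge_slope(xs, ys, lam) for lam in (0.0, 10.0, 100.0, 1000.0)]
print([round(s, 2) for s in slopes])  # shrinks toward 0 as lam grows
```

With lam = 0 this reduces to ordinary least squares; the lasso replaces the squared-coefficient penalty with an absolute-value penalty, which can shrink coefficients exactly to zero.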
Week 8, M Mar 3: Regression Tree, W Mar 5: Classification Tree
Regression tree.
Classification tree.
Week 9, Spring Break
Week 10, M Mar 17: Project Proposal Presentation, W Mar 19: Bagging
Bagging.
Week 11, M Mar 24: Random Forest and Boosting, W Mar 26: Boosting
Random forest.
Boosting, Adaboost, and XGBoost.
Lab session.
Week 12, M Mar 31: Principal Component Analysis, W Apr 2: Principal Component Analysis
Principal component analysis.
Lab session.
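The Week 12 topic admits a compact two-variable sketch: center the data, form the 2x2 covariance matrix, and take its leading eigenvector as the first principal component. Plain Python, using the closed-form eigendecomposition of a symmetric 2x2 matrix on assumed correlated toy data:

```python
# First principal component of two correlated variables, by hand.
import math
import random

random.seed(3)
# Assumed toy data: the second coordinate nearly equals the first.
pts = [(x, x + random.gauss(0, 0.3))
       for x in (random.gauss(0, 1) for _ in range(300))]

n = len(pts)
mx = sum(p[0] for p in pts) / n
my = sum(p[1] for p in pts) / n
a = sum((p[0] - mx) ** 2 for p in pts) / n           # var(x)
c = sum((p[1] - my) ** 2 for p in pts) / n           # var(y)
b = sum((p[0] - mx) * (p[1] - my) for p in pts) / n  # cov(x, y)

# Leading eigenvalue/eigenvector of [[a, b], [b, c]] in closed form.
lam = (a + c) / 2 + math.sqrt(((a - c) / 2) ** 2 + b ** 2)
v = (b, lam - a)                 # unnormalized leading eigenvector
norm = math.hypot(*v)
pc1 = (v[0] / norm, v[1] / norm)
print(tuple(round(x, 2) for x in pc1))  # points near the diagonal direction
```

Because the two coordinates move together, the first component lines up with the diagonal and captures most of the variance, which is exactly the dimension-reduction story PCA formalizes.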
Week 13, M Apr 7: Midterm Review, W Apr 9: Midterm
Midterm review.
Week 14, M Apr 14: K-Means Clustering and Neural Networks, W Apr 16: Foundation Models
K-means clustering.
Neural networks.
Foundation models.
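The k-means algorithm from Week 14 alternates two steps until the centroids stop moving: assign each point to its nearest centroid, then recompute each centroid as the mean of its cluster. A plain-Python sketch on assumed one-dimensional toy data with two well-separated clusters:

```python
# k-means on 1-D data: alternate assignment and centroid-update steps.
import random

random.seed(4)
# Assumed toy data: two clusters centered near 0 and 10.
points = ([random.gauss(0, 1) for _ in range(50)]
          + [random.gauss(10, 1) for _ in range(50)])

def k_means(points, centroids, n_iter=10):
    for _ in range(n_iter):
        clusters = [[] for _ in centroids]
        for p in points:  # assignment step: nearest centroid wins
            i = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
            clusters[i].append(p)
        # Update step: each centroid becomes its cluster's mean.
        centroids = [sum(c) / len(c) for c in clusters]
    return centroids

centers = k_means(points, [1.0, 9.0])
print([round(c, 1) for c in centers])  # near the true cluster centers 0 and 10
```

In practice the result depends on the initial centroids, so implementations typically rerun the algorithm from several random starts and keep the lowest-cost solution.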
Week 15, M Apr 21: Diffusion Models, W Apr 23: Final Project Presentation
Diffusion models.
Lab session.
Week 16, M Apr 28: Final Project Presentation
See here for the syllabus.
The goal of the course project is to give you hands-on project experience in machine learning. By the end of the project, we hope you will have applied ML to a real-world problem or explored a research frontier in machine learning. You have two options for completing the project. The first is to pick a dataset that interests you and apply the techniques we cover this semester to analyze it. The second is to replicate a research paper and explore possible extensions or improvements of the paper.
There is a project proposal presentation on Mar 17, 2025. Each group needs to prepare a five-minute presentation that introduces all the group members (up to four students) and your group's topic.
There are final project presentations on Apr 23, 2025 and Apr 28, 2025. Each group needs to prepare a ten-minute presentation that covers the motivation, setup, and results of the project. Before the final project presentation, we ask that you set up a publicly available GitHub repository for your work, along with detailed documentation on how to use the code repository and what you have found so far.
We expect that when each group presents, the other groups will provide critical feedback, which will count toward the participation grade for this course.
Finally, by May 7, 2025, refine the GitHub repository and its accompanying documentation.
See here for the full course project instructions and a sample list of datasets.
You are responsible for keeping up with all announcements made in class and for all changes in the schedule that are posted on the Canvas website.
The grade will be based on the following:
Homework: 30%
Exam (take-home; choose any 24-hour window): 30%
Course project report (submitted on GitHub): 20%
Course project presentations (one proposal and one final presentation): 15%
Participation: 5%
An Introduction to Statistical Learning (ISL). Gareth James, Daniela Witten, Trevor Hastie, and Robert Tibshirani.
The Elements of Statistical Learning (ESL). Trevor Hastie, Robert Tibshirani, and Jerome Friedman.