Chapter 13 Ensemble Methods
While this chapter is currently incomplete, the following resources will be useful for navigating the Graduate Student quiz in the Fall 2020 semester of STAT 432. Note that these resources do not necessarily follow all conventions of STAT 432, so notation and nomenclature may differ slightly.
This chapter introduces ensemble methods, which combine several models fit to the same data into a single model that may perform better than any of the individual models.
The following are old notes from STAT 432. Like the resources below, these notes deviate somewhat from the conventions established throughout the course this semester.
13.1 Bagging
Bagging is a combination of the words bootstrap and aggregation, and it refers to the process of fitting many models to bootstrap resamples of the data and then aggregating the predictions from those models. Much of the reason we introduced the bootstrap earlier was for its use in creating ensemble methods. Bagging is often associated with decision trees, but you could use any method you'd like!
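As a rough illustration, and not code from the original notes, the sketch below bags regression trees "by hand" using the `rpart` package. The dataset (`mtcars`) and the number of bootstrap resamples are arbitrary choices made only for this example.

```r
# a minimal sketch of bagging regression trees by hand with rpart;
# mtcars and n_trees = 100 are illustrative choices
library(rpart)

set.seed(42)
n_trees <- 100
n <- nrow(mtcars)

# fit one tree to each bootstrap resample of the data
bagged_trees <- lapply(seq_len(n_trees), function(i) {
  idx <- sample(n, replace = TRUE)
  rpart(mpg ~ ., data = mtcars[idx, ])
})

# aggregate: average the predictions across all trees
preds <- sapply(bagged_trees, predict, newdata = mtcars)
bagged_pred <- rowMeans(preds)
head(bagged_pred)
```

In practice a package such as `ipred` or `randomForest` (with `mtry` set to the full number of predictors) would handle the resampling and aggregation for you; the loop above is only meant to make the two steps explicit.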
13.2 Random Forest
A random forest is a method that combines decision trees, bagging, and a little extra randomness: at each split, only a random subset of the predictors is considered. This added randomness helps de-correlate the trees, overcoming the correlation among the predictions of the models in the ensemble.
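A minimal sketch using the `randomForest` package is given below; the dataset and tuning values (`ntree`, `mtry`) are illustrative, not recommendations.

```r
# a minimal random forest sketch; mtry is the number of predictors
# sampled at each split, ntree the number of bagged trees
library(randomForest)

set.seed(42)
rf <- randomForest(mpg ~ ., data = mtcars,
                   ntree = 500, mtry = 3, importance = TRUE)
rf

# predictions from the aggregated ensemble
predict(rf, newdata = head(mtcars))
```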
13.2.1 Reading
13.3 Boosting
Like bagging, boosting uses many models, but instead of fitting many models in parallel to resampled data, it fits models in sequence, with each new model attempting to correct the errors made by the current ensemble. Many methods can be boosted, but most often we boost decision trees.
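The sketch below, which is not from the original notes, illustrates the sequential idea for regression with squared-error loss: each shallow tree is fit to the residuals of the current ensemble. The dataset, learning rate, and tree depth are arbitrary choices for illustration.

```r
# a by-hand sketch of boosted regression trees with squared-error loss:
# each shallow tree is fit to the residuals of the current ensemble
library(rpart)

set.seed(42)
n_trees <- 100
learn_rate <- 0.1

boost_data <- mtcars
pred <- rep(mean(mtcars$mpg), nrow(mtcars))  # start from the overall mean
trees <- vector("list", n_trees)

for (b in seq_len(n_trees)) {
  boost_data$resid <- mtcars$mpg - pred          # residuals of current ensemble
  trees[[b]] <- rpart(resid ~ . - mpg, data = boost_data,
                      control = rpart.control(maxdepth = 2))
  pred <- pred + learn_rate * predict(trees[[b]], newdata = boost_data)
}

# training error shrinks as trees are added in sequence
mean((mtcars$mpg - pred)^2)
```

Packages such as `gbm` and `xgboost` implement (much more carefully tuned) versions of this idea.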
13.3.1 Reading
13.3.2 Video
- Gradient Boost Part 1: Regression Main Ideas
- Gradient Boost Part 2: Regression Details
- Gradient Boost Part 3: Classification
- Gradient Boost Part 4: Classification Details
- XGBoost Part 1: Regression
- XGBoost Part 2: Classification
- XGBoost Part 3: Mathematical Details
- XGBoost Part 4: Crazy Cool Optimizations