Chapter 13 Ensemble Methods

While this chapter is currently incomplete, the following resources will be useful for navigating the graduate student quiz in the Fall 2020 semester of STAT 432. Note that these resources do not necessarily follow all conventions of STAT 432, so notation and nomenclature may differ slightly.

This chapter introduces ensemble methods, which combine several models fit to the same data into a single model that may perform better than any one of them alone.

The following are old notes from STAT 432. Like the resources below, these notes deviate somewhat from the conventions established throughout the course this semester.

13.1 Bagging

Bagging is a portmanteau of *bootstrap* and *aggregation*: we fit many models to bootstrap resamples of the data, then aggregate the predictions from those models (averaging for regression, majority vote for classification). Much of the reason we introduced the bootstrap earlier was for its use in creating ensemble methods. Bagging is most often paired with decision trees, but you can use any base learner you'd like!
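The procedure can be sketched in a few lines of Python. This is a hypothetical toy example (the data, the seed, and the degree-3 polynomial base learner are all stand-ins, chosen here only so the sketch is self-contained; in practice the base learner would typically be a decision tree):

```python
import numpy as np

rng = np.random.default_rng(432)  # arbitrary seed for reproducibility

# toy regression data (hypothetical)
n = 100
x = np.linspace(0, 1, n)
y = np.sin(2 * np.pi * x) + rng.normal(scale=0.3, size=n)

def fit_predict(x_tr, y_tr, x_new):
    # base learner: a degree-3 polynomial fit, standing in for a tree
    coef = np.polyfit(x_tr, y_tr, deg=3)
    return np.polyval(coef, x_new)

B = 50                       # number of bootstrap resamples
preds = np.empty((B, n))
for b in range(B):
    # bootstrap: sample n row indices *with replacement*
    idx = rng.integers(0, n, size=n)
    preds[b] = fit_predict(x[idx], y[idx], x)

# aggregation: average the B sets of predictions
bagged = preds.mean(axis=0)
```

Averaging many models fit to perturbed versions of the data tends to reduce variance, which is why bagging helps most with high-variance, low-bias learners such as deep decision trees.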

13.2 Random Forest

A random forest is a method that combines decision trees, bagging, and a little bit of extra randomness. Bagged trees tend to be highly correlated with one another, because the same strong predictors dominate the splits in every tree. A random forest adds randomness by considering only a random subset of the features at each split, which helps decorrelate the trees and makes the aggregated predictions more effective at reducing variance.
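A minimal sketch of the two sources of randomness, with a one-split "stump" as the base learner. Note one deliberate simplification: a real random forest re-draws the feature subset at *every split* of a deep tree, whereas this sketch draws one subset per (single-split) tree. The data, seed, and quantile-based split candidates are all hypothetical choices made so the example runs on its own:

```python
import numpy as np

rng = np.random.default_rng(432)  # arbitrary seed

# toy classification data (hypothetical): features 0 and 1 are informative
n, p = 200, 5
X = rng.normal(size=(n, p))
y = X[:, 0] + X[:, 1] > 0  # boolean labels

def fit_stump(X_tr, y_tr, feats):
    # base learner: best single split, searched only over the feature
    # subset `feats` (this is the "extra randomness" of a random forest)
    best_acc, best = -1.0, None
    for j in feats:
        for t in np.quantile(X_tr[:, j], [0.25, 0.5, 0.75]):
            pred = X_tr[:, j] > t
            for flip in (False, True):          # allow either side to be "True"
                acc = ((pred ^ flip) == y_tr).mean()
                if acc > best_acc:
                    best_acc, best = acc, (j, t, flip)
    return best

def forest_predict(trees, X_new):
    # aggregate by majority vote across all trees
    votes = np.stack([(X_new[:, j] > t) ^ flip for j, t, flip in trees])
    return votes.mean(axis=0) > 0.5

B, m = 100, 2  # number of trees; features tried per split (m is roughly sqrt(p))
trees = []
for _ in range(B):
    idx = rng.integers(0, n, size=n)               # bagging: bootstrap resample
    feats = rng.choice(p, size=m, replace=False)   # random feature subset
    trees.append(fit_stump(X[idx], y[idx], feats))

acc = (forest_predict(trees, X) == y).mean()
```

Because each tree only sees `m` of the `p` features, a single dominant predictor cannot appear at the top of every tree, so the trees disagree more and their average benefits more from aggregation.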