Chapter 9 Regression

BLUF: Use regression, one of the two supervised learning tasks (the other being classification), to make predictions of new observations of numeric response variables. Start by randomly splitting the data (which includes both the response and the features) into a test set and a training set. Do not use the test data for anything other than supplying a final assessment of how well a chosen model performs at the prediction task. That is, never use the test data to make any modeling decisions. Use the training data however you please, but it is recommended to further split this data into an estimation set and a validation set. Use the estimation set to train candidate models, for example, to learn the model parameters of a parametric model. Do not evaluate models on the data used to train them (the estimation data), as doing so will mask overfitting by complex (flexible) models. Use the lm() function to train linear models. Use the knnreg() function from the caret package to train k-nearest neighbors models. Use the rpart() function from the rpart package to train decision tree models. Use the validation set to evaluate models that have been trained using the estimation data, for example, to select the values of the tuning parameters that often control non-parametric models. Evaluate with numeric metrics such as root-mean-square error (RMSE) or graphical summaries such as actual versus predicted plots. Although this ignores some practical and statistical considerations (which will be discussed later), the model that achieves the lowest RMSE on the validation data will be deemed the “best” model. After finding this model, refit it to the entire training dataset. Report the RMSE of this refit model on the test data as a final quantification of performance.

  • TODO: add ISL readings
  • TODO: <www.stat420.org>
  • TODO: add “why least squares?” readings

9.2 Modeling

9.2.1 Linear Models

  • TODO: assume form of mean relationship. linear combination
  • TODO: how to go from y = b0 + b1x1 + … + eps to lm(y ~ stuff)
  • TODO: least squares, least squares is least squares (difference in assumptions)
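
To preview the lm() syntax: the formula on the right of ~ mirrors the assumed mean relationship. A minimal sketch with simulated data (the data and coefficient values are illustrative, not from any real analysis):

```r
# simulate data from y = 1 + 2 * x1 - 3 * x2 + eps
set.seed(42)
n  = 100
x1 = runif(n)
x2 = runif(n)
y  = 1 + 2 * x1 - 3 * x2 + rnorm(n, sd = 0.25)
sim_data = data.frame(y, x1, x2)

# lm(y ~ x1 + x2) encodes the mean relationship b0 + b1 * x1 + b2 * x2
mod = lm(y ~ x1 + x2, data = sim_data)
coef(mod)                                         # estimates of b0, b1, b2
predict(mod, newdata = data.frame(x1 = 0.5, x2 = 0.5))
```

Note that the error term eps does not appear in the formula; lm() estimates the coefficients by least squares.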

9.2.2 k-Nearest Neighbors

  • TODO: caret::knnreg()
  • TODO: for now, don’t worry about scaling, factors, etc.
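
A minimal knnreg() sketch (simulated data for illustration; assumes the caret package is installed):

```r
library(caret)  # provides knnreg()

# simulate a nonlinear relationship
set.seed(42)
sim_data = data.frame(x = runif(100))
sim_data$y = sin(2 * pi * sim_data$x) + rnorm(100, sd = 0.1)

# k is the tuning parameter: smaller k gives a more flexible fit
knn_mod = knnreg(y ~ x, data = sim_data, k = 5)
predict(knn_mod, newdata = data.frame(x = 0.25))
```

Note that predict() expects a data frame for newdata, even for a single point.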

9.3 Procedure

  • TODO: Look at data
  • TODO: Pick candidate models
  • TODO: Tune / train models
  • TODO: Pick “best” model
    • based on validation RMSE (note the issues with this)
  • TODO: Use best model / report test metrics

9.4 Data Splitting

\[ \mathcal{D} = \{ (x_i, y_i) \in \mathbb{R}^p \times \mathbb{R}, \ i = 1, 2, \ldots, n \} \]

\[ \mathcal{D} = \mathcal{D}_{\texttt{trn}} \cup \mathcal{D}_{\texttt{tst}} \]

\[ \mathcal{D}_{\texttt{trn}} = \mathcal{D}_{\texttt{est}} \cup \mathcal{D}_{\texttt{val}} \]
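
In R, these two splits can be performed with random row indices. A sketch (the 80/20 proportions and the placeholder full_data are illustrative choices, not fixed rules):

```r
set.seed(42)
full_data = data.frame(x = runif(100), y = runif(100))  # stand-in for D

# D = D_trn U D_tst
trn_idx  = sample(nrow(full_data), size = 0.8 * nrow(full_data))
trn_data = full_data[trn_idx, ]
tst_data = full_data[-trn_idx, ]

# D_trn = D_est U D_val
est_idx  = sample(nrow(trn_data), size = 0.8 * nrow(trn_data))
est_data = trn_data[est_idx, ]
val_data = trn_data[-est_idx, ]
```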

9.5 Metrics

  • TODO: RMSE

\[ \text{rmse}\left(\hat{f}_{\texttt{set}}, \mathcal{D}_{\texttt{set}} \right) = \sqrt{\frac{1}{n_{\texttt{set}}}\displaystyle\sum_{i \in {\texttt{set}}}^{}\left(y_i - \hat{f}_{\texttt{set}}({x}_i)\right)^2} \]

\[ \text{RMSE}_{\texttt{trn}} = \text{rmse}\left(\hat{f}_{\texttt{est}}, \mathcal{D}_{\texttt{est}}\right) = \sqrt{\frac{1}{n_{\texttt{est}}}\displaystyle\sum_{i \in {\texttt{est}}}^{}\left(y_i - \hat{f}_{\texttt{est}}({x}_i)\right)^2} \]

\[ \text{RMSE}_{\texttt{val}} = \text{rmse}\left(\hat{f}_{\texttt{est}}, \mathcal{D}_{\texttt{val}}\right) = \sqrt{\frac{1}{n_{\texttt{val}}}\displaystyle\sum_{i \in {\texttt{val}}}^{}\left(y_i - \hat{f}_{\texttt{est}}({x}_i)\right)^2} \]

\[ \text{RMSE}_{\texttt{tst}} = \text{rmse}\left(\hat{f}_{\texttt{trn}}, \mathcal{D}_{\texttt{tst}}\right) = \sqrt{\frac{1}{n_{\texttt{tst}}}\displaystyle\sum_{i \in {\texttt{tst}}}^{}\left(y_i - \hat{f}_{\texttt{trn}}({x}_i)\right)^2} \]
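
A direct translation of the rmse definition into an R helper function:

```r
# rmse as defined above: square root of the mean squared prediction error
calc_rmse = function(actual, predicted) {
  sqrt(mean((actual - predicted) ^ 2))
}

calc_rmse(actual = c(1, 2, 3), predicted = c(1.5, 2.0, 2.5))  # sqrt(1/6)
```

The same helper computes the training, validation, or test RMSE; only the fitted model and the data supplied change.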

9.6 Model Complexity

  • TODO: what determines the complexity of the above models?
    • lm: terms, xforms, interactions
    • knn: k (also terms, xforms, interactions)
    • tree: cp (with rpart, also others that we’ll keep mostly hidden) (also terms, xforms, interactions)

9.7 Overfitting

  • TODO: too complex
  • TODO: usual picture with training and validation error
  • TODO: define for the purposes of this course
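
The usual picture can be generated numerically: as flexibility increases (k decreases for knn), estimation RMSE keeps falling while validation RMSE eventually rises again. A sketch (simulated data; assumes the caret package is installed):

```r
library(caret)

set.seed(42)
sim_data = data.frame(x = runif(300))
sim_data$y = sin(2 * pi * sim_data$x) + rnorm(300, sd = 0.2)

# split into estimation and validation sets
est_idx  = sample(nrow(sim_data), size = 200)
est_data = sim_data[est_idx, ]
val_data = sim_data[-est_idx, ]

calc_rmse = function(actual, predicted) {
  sqrt(mean((actual - predicted) ^ 2))
}

# fit knn for several k and record both RMSEs; k = 1 is most flexible
k_vals  = c(1, 5, 10, 25, 50, 100)
results = sapply(k_vals, function(k) {
  mod = knnreg(y ~ x, data = est_data, k = k)
  c(est = calc_rmse(est_data$y, predict(mod, est_data)),
    val = calc_rmse(val_data$y, predict(mod, val_data)))
})
colnames(results) = k_vals
results  # est RMSE is lowest at k = 1, which nonetheless overfits
```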

9.8 Multiple Features

  • TODO: more features = more complex
  • TODO: how do the three models add additional features?

9.9 Example Analysis

  • TODO: Diamonds analysis

  • TODO: model.matrix()
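
A preview of what model.matrix() does (the toy data here is illustrative, not the diamonds data):

```r
# model.matrix() turns a formula plus a data frame into a numeric design
# matrix, expanding factor variables into dummy (indicator) columns
example_data = data.frame(
  y     = c(10, 20, 30, 40),
  carat = c(0.5, 0.7, 1.0, 1.2),
  cut   = factor(c("Fair", "Good", "Fair", "Good"))
)
X = model.matrix(y ~ carat + cut, data = example_data)
colnames(X)  # "(Intercept)" "carat" "cutGood"
```

This is useful for methods such as k-nearest neighbors that need purely numeric inputs.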

9.10 MISC TODOS