Preface


Welcome to Basics of Statistical Learning! What a boring title! The title was chosen to mirror the University of Illinois course STAT 432 - Basics of Statistical Learning. That title was chosen to meet certain University course naming conventions, hence the boring title. A more appropriate title would be “Machine Learning from the Perspective of a Statistician who uses R,” which is more descriptive, but still a boring title. Anyway, this book will often be referred to as BSL.1


Caveat Emptor

This “book” is under active development. Literally every element of the book is subject to change, at any moment. This text, BSL, is the successor to R4SL, a work that began as a supplement to Introduction to Statistical Learning but was never finished. (In some sense, this book is just a fresh start due to the author wanting to change the presentation of the material. The author is seriously worried that he will encounter the second-system effect.)

Because this book is written with a course in mind that is actively being taught, the text will sometimes, out of convenience, speak directly to the students of that course. Thus, be aware that any reference to a “course” is a reference to STAT 432 @ UIUC.

A PDF version is maintained for use offline; however, given the pace of development, it should be used only if absolutely necessary. During development, formatting in the PDF version will largely be ignored.

Since this book is under active development, you may encounter errors ranging from typos, to broken code, to poorly explained topics. If you do, please let us know! Better yet, fix the issue yourself! If you are familiar with R Markdown and GitHub, pull requests are highly encouraged! This process is partially automated by the edit button in the top-left corner of the HTML version. If your suggestion or fix becomes part of the book, you will be added to the list at the end of this chapter. Upon request, we’ll also link to your GitHub account or personal website. If you’re not familiar with version control systems, feel free to email the author, dalpiaz2 AT illinois DOT edu. (But also consider using this opportunity to learn a bit about version control!) See additional details in the Acknowledgements section.

While development is taking place, you may see “TODO” scattered throughout the text. These are mostly notes for internal use, but give the reader some idea of what development is still to come. For additional details on the development process, please see the README file on GitHub as well as the Issues page.


Who?

This book is targeted at advanced undergraduate or first-year MS students in Statistics who have no prior machine learning experience. While both will be discussed in great detail, previous experience with both statistical modeling and R is assumed.


Organization

Note: This is somewhat speculative.

01 - A Machine Learning Preview

  1. Introduction
  2. Regression (Powerlifting)
  3. Classification (Handwriting)
  4. Clustering (NBA Players)

02 - Some Machine Learning Foundations

  1. Introduction
  2. Probability (A Quick Tour)
  3. Statistics and Estimation (A Quick Tour)
  4. Density Estimation (?)

03 - A Tour of Machine Learning

  1. Introduction
    • Data Splitting
    • Generalization to Unseen Data
  2. Regression
  3. Bias, Variance, Loss, Risk
  4. Classification
  5. Resampling
  6. Recap and Overview of Supervised Learning
  7. Regularization
  8. Ensemble Learning
  9. Practical Issues
  10. Clustering (Unsupervised Learning)
    • Recap the unsupervised learning that was done throughout.
    • Introduce clustering specifically.

04 - Mathematics (Miscellaneous chapters further exploring mathematical details)

  1. Introduction
  2. Bayes’ Theorem
  3. Multivariate Normal

05 - Computing (Miscellaneous chapters further exploring computing details)

  1. Introduction
  2. Getting up to speed with R (Currently “Computing” in the main narrative.)
  3. purrr::map()
  4. Simulation? Bootstrap?

06 - Analysis (Examples of using the material with “real world” datasets)

07 - Appendix (Additional Readings and References)

  • Misc papers, blogs, tutorials, etc.
  • Misc videos

Acknowledgements

The following is a (likely incomplete) list of helpful contributors. This book was also influenced by the helpful contributors to R4SL.

Your name could be here! Please see the CONTRIBUTING document on GitHub for details on interacting with this project. Pull requests encouraged!

Looking for ways to contribute?

  • You’ll notice that a lot of the plotting code is not displayed in the text, but is available in the source. Currently, that code is written to accomplish a task, without much thought about the best way to accomplish it. Try refactoring some of this code.
  • Fix typos. Since the book is actively being developed, typos are getting added all the time.
  • Suggest edits. Good feedback can be just as helpful as actually contributing code changes.

TODO: Standing on the shoulders of giants. High level acknowledgements.



  1. Just an example footnote, please ignore.