Statistics in Plain English for Machine Learning

Author: Jason Brownlee

There is an ocean of books on statistics; where do you start?

A big challenge in choosing a beginner book on statistics is that many books suffer from one of two common problems.

It may be a mathematical textbook filled with derivations, special cases, and proofs for each statistical method, with little sense of the intuition behind the method or how to use it. Or it may be a playbook for a proprietary or ancient statistical package with little relevance to the libraries and problems you face.

In this post, you will discover the book “Statistics in Plain English” for learning about statistical methods without getting too bogged down in theory or implementation details.

After reading this post, you will know:

  • That the book is intended to provide a clear presentation of statistical methods for practitioners.
  • The contents of the book focus on the foundations, Gaussian distribution, and parametric statistical hypothesis tests.
  • A careful reading list can be used to learn about the specific methods relevant to machine learning practitioners.

Let’s get started.

Overview

  1. Statistics in Plain English
  2. Contents of the Book
  3. Reading List for Machine Learning

Statistics in Plain English

Statistics in Plain English provides an introduction to statistics for students who may be taking a statistics class as part of another degree program in the social sciences.

It was written by Timothy Urdan, who is a researcher and professor of Psychology. It is a popular book because of the accessibility of the writing and is currently in its fourth edition. I have the third edition, so any quotes and the table of contents reference that version.

It is not a textbook nor an exercise book, but something in between. Tim modestly states the purpose of the book as follows:

The purpose of this book is to make it a little easier to understand statistics.

His intention is for the book to act as a complement to a denser textbook on statistics. Again, I think this is modest; he says it because the book does not dive into the mathematical rigor (derivations and proofs) behind the methods and instead focuses on their application and intuition (i.e. what you care about as a practitioner).

I do think that the book is more than suitable as a first step into statistics.

Each chapter introduces a statistic (sometimes more than one) using a consistent template with three parts, as follows:

  1. A short description of the statistic.
  2. A longer description of the equation and details of the statistics.
  3. A worked example for using the statistic.

The book is not long, at less than 200 pages. It also uses a large form factor (11 x 5.5 inches), which gives the ideas and examples a lot of space on the page.

If you have the time and are really new to the field of statistics, it is worth reading cover to cover. Seriously. Even if you’re familiar with the topic, it’s a great read.

Contents of the Book

I recommend studying the table of contents.

It is useful for two reasons:

  • To get an idea of the breadth in topics for introductory statistics.
  • To get an idea of what topics might interest you or be relevant to your projects.

The full 15-chapter table of contents from the third edition of the book is as follows:

  • Chapter 1: Introduction to Social Science Research Principles and Terminology
  • Chapter 2: Measures of Central Tendency
  • Chapter 3: Measures of Variability
  • Chapter 4: The Normal Distribution
  • Chapter 5: Standardization and z Scores
  • Chapter 6: Standard Errors
  • Chapter 7: Statistical Significance, Effect Size, and Confidence Intervals
  • Chapter 8: Correlation
  • Chapter 9: t Tests
  • Chapter 10: One-Way Analysis of Variance
  • Chapter 11: Factorial Analysis of Variance
  • Chapter 12: Repeated-Measures Analysis of Variance
  • Chapter 13: Regression
  • Chapter 14: The Chi-Square Test of Independence
  • Chapter 15: Factor Analysis and Reliability Analysis: Data Reduction Techniques

The presentation provides a clear separation of the topics.

It allows you to pick and choose the topics or chapters that interest you the most and dive in, without having to read prior chapters.

The book is organized such that the more basic statistics and statistical concepts are in the earlier chapters whereas the more complex concepts appear later in the book. However, it is not necessary to read one chapter before understanding the next. Rather, each chapter in the book was written to stand on its own.

A review of the table of contents highlights two things:

  • The book has a strong focus on the Gaussian distribution, which is reasonable given the importance of this distribution in both probability and statistics.
  • The book also has a large focus on statistical hypothesis tests, specifically parametric tests, which aligns with the focus on the Gaussian distribution.

This chosen focus will handle most of the statistical methods required when working with social science experimental data, at least in the beginning. There are a few holes though for the machine learning practitioner. For example:

  • The book does not have much on estimation methods, a little on confidence intervals, but nothing on prediction intervals and tolerance intervals.
  • The book also does not cover resampling methods (bootstrap, k-fold cross-validation and more).
  • The whole area of nonparametric statistical methods is also skipped.

Nevertheless, these topics can be looked up in more targeted books.
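
To give a flavor of one of the gaps listed above, here is a minimal sketch of a bootstrap confidence interval for a mean, using only the Python standard library. It is not from the book. The function name bootstrap_ci, the scores, the number of resamples, and the seed are all made up for illustration, and the percentile method shown is just one simple way to form the interval.

```python
import random
from statistics import mean


def bootstrap_ci(sample, n_resamples=2000, alpha=0.05, seed=1):
    """Percentile bootstrap confidence interval for the mean of a sample."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    n = len(sample)
    # Resample with replacement and record the mean of each resample.
    means = sorted(
        mean(rng.choices(sample, k=n)) for _ in range(n_resamples)
    )
    # Take the alpha/2 and 1 - alpha/2 percentiles of the resampled means.
    lower_index = int(round((alpha / 2) * n_resamples))
    upper_index = int(round((1 - alpha / 2) * n_resamples)) - 1
    return means[lower_index], means[upper_index]


# Hypothetical accuracy scores from repeated evaluations of one model.
scores = [0.81, 0.79, 0.84, 0.80, 0.83, 0.78, 0.82, 0.85, 0.80, 0.81]

lower, upper = bootstrap_ci(scores)
print(f"mean={mean(scores):.3f}, 95% CI=({lower:.3f}, {upper:.3f})")
```

The appeal of the approach is that it makes no assumption that the scores are Gaussian, which is exactly the kind of method the book leaves out.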

Reading List for Machine Learning

It’s a great book and I do recommend it if you are new to statistics and you’re looking for a clear presentation of the foundations that you really do need to know in applied machine learning.

As I mentioned above, it is not a long read and well worth reading cover to cover.

With that being said, not all chapters are relevant or directly useful to you as a machine learning practitioner.

Below is a breakdown or suggested reading list of the book for machine learning practitioners.

I think you need to have some understanding of foundational statistics no matter what. I would recommend reading the first few chapters in order to get this grounding, at least:

  • Chapter 1: Introduction to Social Science Research Principles and Terminology
  • Chapter 2: Measures of Central Tendency
  • Chapter 3: Measures of Variability
  • Chapter 4: The Normal Distribution

To beef up your skills in understanding your training data and in data preparation, I would recommend the following three chapters:

  • Chapter 5: Standardization and z Scores
  • Chapter 8: Correlation
  • Chapter 14: The Chi-Square Test of Independence
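
As a rough illustration of the kind of calculation the first two of these chapters cover, the sketch below standardizes a variable to z scores and computes a Pearson correlation from paired z scores. It uses only the Python standard library. The data and the function names z_scores and pearson_r are hypothetical and not taken from the book.

```python
from statistics import mean, stdev


def z_scores(values):
    """Standardize values to have mean 0 and sample standard deviation 1."""
    mu = mean(values)
    sigma = stdev(values)  # sample standard deviation (n - 1 denominator)
    return [(v - mu) / sigma for v in values]


def pearson_r(x, y):
    """Pearson correlation as the sum of paired z-score products over n - 1."""
    zx, zy = z_scores(x), z_scores(y)
    return sum(a * b for a, b in zip(zx, zy)) / (len(x) - 1)


# Hypothetical toy data: one input variable and one output variable.
hours = [1.0, 2.0, 3.0, 4.0, 5.0]
score = [52.0, 57.0, 61.0, 68.0, 72.0]

standardized = z_scores(hours)
r = pearson_r(hours, score)
print([round(z, 3) for z in standardized])
print(round(r, 3))
```

In practice you would likely reach for a library routine instead, but writing it out once makes the link between standardization and correlation clear.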

For evaluating and comparing machine learning models and model parameters, you can use statistical hypothesis tests. To get started in this area, I would recommend the following two chapters:

  • Chapter 7: Statistical Significance, Effect Size, and Confidence Intervals
  • Chapter 9: t Tests
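
To make this concrete, here is a minimal sketch of a paired t test comparing two models evaluated on the same 10 folds, using only the Python standard library. The scores and the function name paired_t_statistic are invented for illustration. The critical value of 2.262 is the standard two-tailed table value for alpha = 0.05 with 9 degrees of freedom. Keep in mind that scores from k-fold cross-validation are not fully independent, so treat the result of a test like this as a guide and not a guarantee.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t_statistic(a, b):
    """Return the t statistic and degrees of freedom for paired samples."""
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    standard_error = stdev(diffs) / sqrt(n)
    return mean(diffs) / standard_error, n - 1


# Hypothetical accuracy scores for two models on the same 10 folds.
model_a = [0.81, 0.79, 0.84, 0.80, 0.83, 0.78, 0.82, 0.85, 0.80, 0.81]
model_b = [0.78, 0.77, 0.80, 0.79, 0.80, 0.76, 0.79, 0.81, 0.78, 0.79]

t_stat, df = paired_t_statistic(model_a, model_b)

# Two-tailed critical value from a t table for alpha = 0.05 and df = 9.
T_CRITICAL = 2.262
significant = abs(t_stat) > T_CRITICAL
print(f"t={t_stat:.2f}, df={df}, significant={significant}")
```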

You could probably skip the other chapters.

The chapter on linear regression (Chapter 13) might be of interest if you use the method and want a deeper understanding of how and why it works.

Do you agree with this reading plan?
Let me know in the comments below.

Summary

In this post, you discovered the book “Statistics in Plain English” for learning about statistical methods without getting too bogged down in theory (proofs and derivations) or implementation details (pages of code and commands for proprietary statistical packages).

Specifically, you learned:

  • That the book is intended to provide a clear presentation of statistical methods for practitioners.
  • The contents of the book focus on the foundations, Gaussian distribution, and parametric statistical hypothesis tests.
  • A careful reading list can be used to learn about the specific methods relevant to machine learning practitioners.

Do you have this book or have you read it?
What do you think of it? Share your thoughts below.

Are you thinking of getting this book?
Why or why not?

The post Statistics in Plain English for Machine Learning appeared first on Machine Learning Mastery.
