Volume 5, Number 1, September 2004

 

Review of "Teaching Statistics Using Baseball" by Jim Albert
 
 
Jerry Reiter
Institute of Statistics and Decision Sciences
Duke University
Durham, NC 27708 USA
 
 
 
 

Many mathematicians, operations researchers, and statisticians became interested in their disciplines via the statistics of baseball. Personally, as a youngster I pored over the statistics of my Little League career (I was a generous official scorer when it came to my own batting average) and the statistics of my favorite players. As a more seasoned baseball fan, and a statistician in academia, I appreciate even more the statistical nature of the game, especially the current trend among baseball managers towards using mathematical models to dictate strategy and team composition.

Professor Albert's Teaching Statistics Using Baseball is a terrific book for instructors and students who love numbers and baseball. It introduces both elementary statistical concepts, like graphical displays and simple probability, and advanced topics, like Markov chains and Bayesian inference. The book's strength is the examples. Professor Albert first poses baseball questions and then uses statistical techniques to answer them. In contrast, many introductory statistics textbooks conjure questions to illustrate the statistical techniques, an approach I find less interesting and less realistic than Professor Albert's approach.

As Professor Albert states, the book is best viewed as a supplement to standard introductory statistics texts. It is not suitable as the primary textbook for a Statistics 101 course, although it could be the main textbook for a course purely on baseball statistics (which Professor Albert has taught). Some standard introductory statistics concepts are not explained in much detail, for example the normal curve and correlation, and others are omitted entirely, for example significance tests. I do not see these as deficiencies, since the book is not meant to replace a main statistics text.

For instructors, the book has several potential uses. First, it contains a wealth of examples that can be used in the classroom. The examples are well-documented, with sufficient background information to enliven the stories. Professor Albert typically digs deep into analyses, for example explaining unusual trends in graphical displays by referring to events during the season (e.g., player injuries). He also starts each example with a list of the statistical methods used in that example, which I found useful when thinking about my own courses. Second, the book has clear introductions to Bayesian statistics and Markov chains. When reading the section on Bayesian inference (there is no frequentist inference in this book), I could not help but wonder why we don't teach Bayesian statistics to introductory students. It seems easier to follow than significance testing and confidence intervals! Lastly, the book contains data sets, as well as pointers to more data on the web, that can be distributed to students doing projects.

The book has nine chapters and two appendices. Chapters 1 through 4 cover descriptive statistics and graphical displays. Chapters 5 and 6 cover probability. Chapters 7 and 8 cover statistical inference, with a focus on Bayesian inference and simulation. Chapter 9 covers Markov chains. The two appendices include an introduction to the rules of baseball and a list of resources for readers who want to collect baseball data.

Chapter 1 is an introductory chapter. It contains a preview of the types of questions that will be addressed in the rest of the book. It also reviews common baseball statistics, such as on base percentage and runs created. The review assumes only a basic knowledge of baseball. The chapter concludes with a description of the types of baseball data that will be analyzed in the book.

Chapter 2 covers graphical displays for one variable, including dot plots, histograms, and stem-and-leaf plots. There are detailed explanations of how to make dot plots and stem-and-leaf plots. The examples in this chapter concern league-wide trends, such as the numbers of home runs in different years, and individual players' performances. There are a large number of exercises in the chapter; in fact, there are more pages of exercises than there are of examples. Many of the exercises require similar methods of solution.

Chapter 3 covers comparisons of two variables, including side-by-side box plots, scatter plots, and standardization (comparing z-scores). Some concepts are used but not motivated with intuition, for example the use of 1.5 times the interquartile range in box plots and the use of the normal curve for calculating probabilities. This is consistent with the goal of the book as a supplement; readers can consult other texts for detailed explanation and intuition. Data analyses include comparisons of several hitters and pitchers, and comparisons of statistics from different seasons.

Chapter 4 focuses on scatter plots, correlation, and simple and (briefly) multiple regression. The chapter has excellent examples of the regression effect, showing how players' batting averages tend to regress towards the overall average in successive seasons. I found especially interesting the use of regression to justify some of the offensive statistics currently in vogue among baseball statisticians. I do have one quibble with the regression examples in this chapter: although Professor Albert describes residuals, he does not discuss their use for model checking.

Chapter 5 introduces elementary probability through the use of simulation. It describes several tabletop games that mimic outcomes of plate appearances, which leads to a frequency notion of probability. The games are built on independent events, an assumption that is reasonable for many baseball contexts. It does, however, make it difficult to use the games to teach conditional probability.

Chapter 6 presents applications of the binomial and negative binomial distributions. In keeping with the spirit of the book, the chapter does not discuss formally the theory of probability distributions. Instead, Professor Albert walks readers through the model-building process for his two main examples (number of hits per game and number of runs scored in an inning), including checks of predicted values versus the actual data. These model-checking sections are excellent guides for students learning statistics.

Chapter 7 contains a brief introduction to statistical inference. The chapter presents a Bayesian perspective, using point mass prior distributions to derive posterior distributions for players' batting averages. It is possible from these posterior distributions to do testing and interval estimation. The chapter does not discuss significance testing or confidence intervals, although the normal approximation to the binomial distribution is used for posterior intervals. I found this presentation of Bayesian statistics to be very appealing, and I plan to use it in my introductory statistics classes.
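As a rough illustration of the kind of calculation described here (a minimal sketch, not taken from the book), the Python code below places prior probability on a handful of candidate batting averages and updates it with binomial data. The candidate averages, the equal prior weights, and the 30-hits-in-100-at-bats data are all hypothetical, and reading "point mass prior" as a prior on a finite set of values is an assumption.

```python
from math import comb


def posterior(prior, hits, at_bats):
    """Update a discrete prior over batting averages with binomial data.

    prior maps each candidate true average p to its prior probability.
    """
    # Prior weight times the binomial likelihood of the observed hits.
    unnormalized = {
        p: weight * comb(at_bats, hits) * p**hits * (1 - p) ** (at_bats - hits)
        for p, weight in prior.items()
    }
    total = sum(unnormalized.values())
    # Normalize so the posterior probabilities sum to one.
    return {p: value / total for p, value in unnormalized.items()}


# Hypothetical prior: equal weight on five candidate true averages.
prior = {0.200: 0.2, 0.250: 0.2, 0.300: 0.2, 0.350: 0.2, 0.400: 0.2}

# Hypothetical data: 30 hits in 100 at-bats.
post = posterior(prior, hits=30, at_bats=100)
most_probable = max(post, key=post.get)
```

Summing the posterior probabilities over a set of candidate averages gives the probability that the true average lies in that set, which is the sense in which testing and interval estimation follow from the posterior.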

Chapter 8 continues on the theme of statistical inference, using simulation-based methods of inference. The chapter centers on two topics: interpreting situational data (e.g., batting average for day games versus for night games), and streakiness in baseball. Professor Albert proposes models for batting averages, and simulates data from those models using analogies to the tabletop games in Chapter 5. The simulated data are compared against the genuine data to check model fit. This approach finds that many situational statistics and apparent streakiness can be explained by chance variation.
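One simple way to carry out a check of this kind can be sketched in Python as follows. This is an illustration under stated assumptions, not the book's procedure: it assumes a constant hit probability with independent at-bats, and the function names, the .270 average, the 500 at-bats, and the observed streak of 20 are all hypothetical.

```python
import random


def longest_run_of_outs(outcomes):
    """Length of the longest run of consecutive hitless at-bats."""
    longest = current = 0
    for hit in outcomes:
        current = 0 if hit else current + 1
        longest = max(longest, current)
    return longest


def simulate_longest_runs(p_hit, at_bats, n_seasons, seed=0):
    """Simulate seasons of independent at-bats with a constant hit probability."""
    rng = random.Random(seed)
    return [
        longest_run_of_outs([rng.random() < p_hit for _ in range(at_bats)])
        for _ in range(n_seasons)
    ]


def tail_fraction(simulated, observed):
    """Fraction of simulated seasons with a slump at least as long as observed."""
    return sum(s >= observed for s in simulated) / len(simulated)


# Hypothetical player: a .270 hitter with 500 at-bats and an observed
# hitless streak of 20 at-bats.
simulated = simulate_longest_runs(p_hit=0.27, at_bats=500, n_seasons=1000)
fraction = tail_fraction(simulated, observed=20)
```

If a sizable fraction of the simulated seasons contain a slump as long as the observed one, the slump is consistent with chance variation under the constant-probability model.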

Chapter 9 concludes with an application of Markov chains in baseball. The idea is to conceive of the 24 different game situations (3 possible numbers of outs, namely 0, 1, or 2, by 8 combinations of runners on base) as a state space for a Markov chain. By using data from an entire season to estimate transition probabilities, one can simulate innings. This is an extremely flexible approach, and it provides quantities like the expected number of runs scored in an inning, which in turn allows one to evaluate baseball strategies like the sacrifice bunt. I believe that students studying probability will enjoy reading the real-life applications of Markov chains in this chapter.
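The state space and the simulation of innings can be sketched in Python as follows. This is a toy illustration, not the book's model: instead of transition probabilities estimated from a season of data, it assumes every plate appearance is either an out (with a made-up probability of 0.7) or a single that advances each runner exactly one base. The names STATES, P_OUT, step, simulate_inning, and expected_runs are hypothetical.

```python
import random
from itertools import product

# The 24 states: outs in {0, 1, 2} crossed with the 8 patterns of
# occupied bases (first, second, third).
STATES = [
    (outs, bases)
    for outs in range(3)
    for bases in product((False, True), repeat=3)
]

P_OUT = 0.7  # hypothetical probability that a plate appearance is an out


def step(state, rng):
    """One plate appearance: return (next_state, runs_scored).

    next_state is None once the third out ends the inning.
    """
    outs, (first, second, third) = state
    if rng.random() < P_OUT:
        outs += 1
        if outs == 3:
            return None, 0
        return (outs, (first, second, third)), 0
    # Toy assumption: a single moves every runner up exactly one base,
    # so only a runner on third scores.
    runs = 1 if third else 0
    return (outs, (True, first, second)), runs


def simulate_inning(rng):
    """Play one inning from no outs and bases empty; return the runs scored."""
    state = (0, (False, False, False))
    runs = 0
    while state is not None:
        state, scored = step(state, rng)
        runs += scored
    return runs


def expected_runs(n_innings=20000, seed=0):
    """Monte Carlo estimate of the expected runs per inning."""
    rng = random.Random(seed)
    return sum(simulate_inning(rng) for _ in range(n_innings)) / n_innings
```

Replacing the toy step function with transitions estimated from real play-by-play data, and starting simulate_inning from other states, is what would allow comparisons such as the expected runs before and after a sacrifice bunt.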

To sum up, Professor Albert's Teaching Statistics Using Baseball is an excellent resource for instructors and students of beginning statistics. I thoroughly enjoyed reading it, and I will use some of the examples in my lectures. I highly recommend it to anyone who loves numbers and sports.


To reference this paper, please use: 
Reiter J. (2004), "Review of 'Teaching Statistics Using Baseball' by Jim Albert," INFORMS Transactions on Education, Vol. 5, No. 1, http://ite.pubs.informs.org/Vol5No1/Reiter/