Volume 5, Number 1, September 2004

 

Comparing the Impact of Star Rookies Carmelo Anthony and Lebron James: An Example on Simulating Team Performances in the NBA League
 
 
Salwa Ammar & Ronald Wright
Department of Business Administration
Le Moyne College
Syracuse, New York
 
 
 
   
 

Abstract

This paper describes a simulation exercise designed for introductory quantitative method classes both at the MBA and undergraduate level. The exercise tracks the performances of teams in the National Basketball Association (NBA) during the season of 2004. It is designed as a spreadsheet model and is developed in stages throughout the academic semester. The example is a significant illustration of the use of sports as a vehicle for teaching OR topics, specifically simulation. It also incorporates many spreadsheet modeling skills such as the use of Excel functions and Data Tables. The model provided a good mix of the ingredients for an effective simulation example including a variety of answers, challenges and surprises.

1. Introduction

For years, sports have been used in teaching Statistics (Lock, 1997 and Nettleton, 1998). More recently, sports have been used as a motivating context for introducing simulation (Ammar and Wright, 2001). The probabilistic nature of the competitive outcomes, the desire to predict these outcomes, and the variety among the relationships between the outcomes, provide a rich array of applications for simulation models. Excel spreadsheets give instructors the ability to move away from 'toy' examples and introduce, in a manageable format, real and meaningful illustrations (Evans, 2000). The most effective sports related examples capitalize on current events of general interest that stimulate curiosity and inquisitiveness beyond those who may be considered diehard fans.

This paper describes an exercise that predicts the impact of two rookies in the National Basketball Association (NBA). One of the rookies, LeBron James, was the number one draft in the NBA for 2003 and generated much media attention. The second rookie, Carmelo Anthony, led his college team for Syracuse University to its first national championship in basketball. The championship was by far the most important sporting result for the central New York region in recent years. Thus following the immediate progress of this local 'hero' guaranteed for us the broad student interest. Student interest and curiosity were essential in sustaining their enthusiasm as we explored various modeling tools and concepts for the simulation model. Although specific to these rookies and their draft teams, the exercise can easily be generalized and updated to predict performances of any teams in the NBA.

In this paper we demonstrate the details of the simulation model using Carmelo Anthony's team, the Denver Nuggets. In the previous year and prior to his draft the Nuggets won only 17 out of the season's 82 games. By the all-star game of 2004 and with the help of Anthony, the Nuggets had already won 31 games and seem to be well on their way to the playoffs. The example described in this paper joins the league at the point of the all star game (after 55 games) and includes a Monte-Carlo simulation for the performance of the Nuggets in the remaining games of the season. In determining the playoff chances of the Nuggets, the simulation includes the performances of the team's nearest competitors. Similar assessments are also performed for the Cleveland Cavaliers, the team that drafted LeBron James.

The exercise is designed to help achieve several objectives. To run a successful simulation the first step tends to focus on defining the relevant probabilities. The example introduces a method for estimating probabilities that is intuitively acceptable and follows the rules of probability. In this paper the process of estimating the probabilities is simple (by design) in order to avoid the need for extensive discussion and coverage of advance topics in probability theory. Another objective of this example is to introduce students to the simulation capabilities of Excel including the use of data tables to replicate observations. Also, this example introduces the concept of simulating events that are related. For example simulating the outcome of the Seattle at Denver game determines the outcome of the Denver home game as well as the outcome of the Seattle away game. Another important objective of the example is to demonstrate for students how simulation can be used to provide answers to questions beyond the specific simulated events. Simulating the outcomes of games for the season allows us to explore the chances of the team returning for the playoffs. The final part of this paper focuses on the process of assessing and updating the estimated probabilities as the season progresses and games are won and lost.

This example is used in our management science class to introduce concepts and basic skills in spreadsheet simulation. This is a core junior level class required of students majoring in business and accounting. Students entering this class are expected to have completed the introductory statistics requirement. The simulation is done entirely in Excel. Large numbers of trials are executed using data tables and other useful functions in Excel. The use of add-ins such as Crystal Ball and @Risk is introduced at the higher (or senior) level simulation class and are not needed for this particular exercise.

2. Denver Simulation Model

The model involves an attempt to simulate the remaining 27 games for the Denver Nuggets. Once the outcome of these games is simulated the next step is to assess the Nuggets' chances of making the playoffs. This process is designed in several stages. The first stage is that of estimating the winning probabilities (percentage) by team in the league. It is important to recognize that these probabilities vary for each team depending on whether the game is played at home or away. The second stage is to simulate the number of Denver wins for the remaining scheduled season based on the estimated probabilities. Finally in assessing Denver's chances of reaching the playoffs, the model examines Denver's nearest (slightly above or slightly below) competitors by simulating each of their performances and comparing Denver's performance.

2.1. Estimating Winning Probabilities

The probabilities could be estimated by using information on teams' performances in the previously played games. In the first 55 games of the season, Denver's winning percent was .582. (Note: percent is typically referred to on sport pages as a number between 0 and 1. For convenience we maintain this convention for the data in this paper). We could use .582 as the probability of winning each game. However, in the NBA there is a significant difference between winning percents for home games and for games on the road. Table 1 shows Denver's home and road winning percentages as well as the average of the entire league. It also shows the same percentages for two select teams (for purpose of illustration).

Table 1. Select Winning Percentages

 

Winning Percentage

 

Home

Road

League Average

.62

.38

Denver

.72

.42

New York

.55

.38

Indiana

.78

.67

Since Denver has won 72% of its home games we could begin by assigning a probability of .72 to Denver winning a future home game. However, the quality of the opponent would raise or lower that probability for a particular game. For example it might be reasonable to assign a probability of .72 to Denver beating New York at home since New York's road winning percentage matches that of the league. That is New York is an average road team. If the probability that Denver beats New York at home is .72, then the probability that New York wins that game would be 1 - .72 or .28. This is lower than New York's road average, indicating that Denver is a better than average home team. In fact the extent to which New York's probability of winning is reduced (.38 to .28) of course matches the extent to which Denver is a better than average home team (.72 - .62). A possible generalization follows.

Consider two teams playing a particular game, team A, the home team and team B, the road team.

Let Hper= proportion Team A wins when playing at home.

Let Rper= proportion Team B wins when playing on the road.

Let LHper= average proportion all league teams win when playing at home.

Let LRper= average proportion all league teams win when playing on the road.

We can then estimate the probability that the road team wins as (team B over Team A):

Rper- (Hper- LHper). (1)

In our example, the estimated probability that New York beats Denver on the road is:

.38 - (.72 - .62) = .28 .

Similarly we can estimate the probability that the home teams wins as:

Hper- (Rper- LRper). (2)

In our example, the estimated probability that Denver beats New York at home is:

.72 – (.38 - .38) = .72.

Also, if Denver is playing Indiana at home the probability of Denver winning is:

.72 – (.67 - .38) = .43, a lower value since Indiana is a better than average road team.

It is important to check that the estimates for probabilities of complementary events add to one. If we add the probability that the road team wins (as calculated in (1)) to the probability that the home team wins (as calculated in (2)) we get:

Rper- (Hper- LHper) + Hper- (Rper- LRper) = LHper+ LRper= 1.

This relationship suggests that equations (1) and (2) provide possible estimates for winning probabilities. These estimates take into account the relative position of any particular team in the league.

2.2. Modeling the Number of Denver Wins

Figure 1 shows Denver's simulation sheet . It includes Denver's schedule and the appropriate home or road records of each opponent. Average league results are also included. Denver's schedule and the league standings at the time of this example were downloaded from a popular sports site. Denver's win percentages and the league win percentages were included. Using equations (1) and (2) (and IF statements for home or road), the probability of Denver winning each game is calculated.


Figure 1. Denver Simulation

The outcome for each game is simulated by using RAND, the Excel random number function for a uniform distribution between 0 and 1. If the random number is less than the probability of Denver winning that game a "win" is recorded. Otherwise "lose" is entered in the simulation column. (Results in highlighted cells are explained in the following section.) A COUNTIF statement is used to count the number of simulated wins. The total number of wins is the simulated number of wins plus the actual wins at the time of the example.

Each time a recalculation is done, new random numbers are generated and new simulation results recorded. The results of multiple simulations can be recorded using Excel's Data Table (Evans and Olson, 2002). For a 1000 run simulation we found the average number of wins to be 46 with a minimum of 39 and a maximum of 55. Table 2 shows the number of times a range of wins occurred in the 1000 runs.

Table 2. Range of Simulated Total Season Wins

Simulated Season Wins

Occurrences out of 1000

at least 39

1000

at least 40

997

at least 41

994

at least 42

978

at least 43

944

at least 44

861

at least 45

751

at least 46

612

at least 47

474

at least 48

324

at least 49

179

at least 50

92

At this point we have some sense of how many games the Denver Nuggets might win for the season. In more than 90 percent of the runs Denver wins at least 43 games. Is this enough to make the playoffs? In the previous year it took 44 wins to make the playoffs in the Western Conference while 38 would have been enough in the East. To better assess Denver's chances for the playoffs we may need to know how the number of Denver wins compared to the competing teams.

2.3. Modeling the Top Performer from Four Teams

At the time of this exercise, Denver was in eighth place in the West, the final playoff spot. Teams in spots 9, 10, and 11 ( Seattle , Utah and Portland ) could be regarded as threats to Denver's position. Our next step was to run similar simulations for these three teams and compare the number of wins (, Note: when running simulations only one spreadsheet should be open at a time.). Running a simulation for a new team merely requires entering a new schedule (a lookup functions will check records and recalculate probabilities). However we need to take into account instances in which each of these teams plays one another. For example Denver plays Utah at Utah . Although the probabilities that Denver wins and Utah loses add to one, we don't want to use two different random numbers to simulate the outcome of a single game. Here we choose to randomly generate outcomes for home games and use the results to determine those of the road games. For example, for the Denver at Utah game, the outcome on the Denver sheet is linked to the outcome on the Utah sheet. If a "lose" shows up for Utah , a "win" will be entered for Denver and vice versa. (See the lower highlighted cell in Figure 1.) The spreadsheet contains simulation sheets for each of the four teams. Each sheet duplicates the basic structure of the Denver sheet described above.

When the season for all four teams is simulated, an IF statement is used to record a "yes" if Denver's win total equals the maximum of the four win totals and a "no" otherwise (ignoring ties). As shown in Figure 2, a Data Table is used to perform a 1000 replications and a COUNTIF statement is used to count the percent of "yes" results.


Figure 2. First of Four

In one simulation of 1000 runs we observed that Denver's wins exceeded the number of wins of the other three teams 98% of the time. Our confidence that Denver will make the playoffs has increased.

2.4. Modeling the Top Three Performers of Seven Teams

Can we be even more confident? Currently Denver is actually tied with Memphis and Houston for the 6th, 7th, and 8thplaces. Hence there is a chance that one of these teams might falter, improving Denver's odds. Our third simulation includes the 6ththrough 12thplace teams and attempts to determine whether Denver would place in the top three of the last seven teams. We are assuming the current top five teams will be in the playoffs and only three spots are yet to be determined.

We can count the wins for each of the seven teams as we did before in the four teams simulation. In seven teams.xls the sheets for the individual teams (other than Denver ) are hidden to simplify the readers interaction with the spreadsheet. The sheets are not protected and can be easily unhiden to show a structure identical to those in the four teams' sheets. Once the seven team performance is simulated the RANK function is used to determine the rank of Denver within these seven teams. A Data Table is then used to simulate 1000 replications with the rank of Denver as the table output. Figure 3 includes these seven team calculations.


Figure 3. Top Three of Seven

Also, Table 3 shows the summary of the simulated results. Since the top three teams will qualify for the playoffs (along with the assumed first five) Denver misses the playoffs only 1% of the time (actually 9 out of 1000). By all three models Denver's and Carmelo Anthony's chances of making the playoffs seem very high.

Table 3. Denver’s Ranks as a Percent of 1000 Runs

Denver's Rank

Percent of Runs

1

34%

2

33%

3

32%

4

1%

5

0%

6

0%

7

0%

3. The Other Guy

Syracuse and Denver fans must admit that there is a second super rookie this year by the name of LeBron James. What are his chances of leading the Cleveland Cavaliers to the playoffs? At the time of the exercise the Cavaliers were in 11thplace in the East but talking confidently of rising to the 8thand final spot. The 8ththrough 11thspots were held by Boston , Miami , Philadelphia and Cleveland , in that order. A new spreadsheet was created for these four teams and a thousand endings to the seasons were simulated. It was assumed that only the top team would make the playoffs from these four teams (the seven higher teams are substantially above Boston , the current 8thand last qualifier). Table 4 contains the percent of trials (out of 1000) in which each of the four teams gained that last spot.

Table 4. Percent of Trials each Team Made the Playoffs

Teams

Percent of Trials Team Made Playoffs

Boston

26%

Miami

58%

Philadelphia

12%

Cleveland

4%

As modeled, Cleveland actually has very little chance of making the playoffs. Also the current 8thplace team, Boston , does not have the best shot. Miami is the favorite to gain the eighth position. Some of the students observe that this is consistent with Boston's poor play after all their recent trades and the resignation of their coach. Of course, our model knows nothing of that. What it does know, as summarized in Table 5, is that Miami has a better home record than the other teams and plays more home games down the stretch than the other three teams.

Table 5. Home Game Advantages

Teams

Home Win PCT.

Home Games / Remaining Games

Boston

0.39

13/27

Miami

0.60

16/27

Philadelphia

0.48

14/28

Cleveland

0.54

15/29

The students' observations however allow for an appropriate discussion about what our models do and do not take into account. If all teams play the remaining games at the level they played the first part of the season our results are likely to be a very fair representation. However, this analysis was performed just before the deadline for teams to make trades. As the students point out, the Boston team playing after the all-star break is a very different one from the one that played the first half of the season. This is also the case for other teams including Cleveland . Our model ignores all this and assumes the teams will play in a manner consistent with the way they played the first fifty some games. Obviously if the probabilities are based on historical data and the future is sharply different form the past than the results are less reliable. We could attempt to make the results better by reducing the probabilities that Boston will win games (at least based on their last 10 games) and we could increase the probabilities that Cleveland will win based on the improved team and perhaps the maturing of LeBron James. Unfortunately these changed probabilities might represent biased preferences.

What we can do is ascertain how much better Cleveland will have to play to have a reasonable chance at the playoffs. To do this we increase the probability of winning both home and road games until Cleveland makes the playoffs more than half the time. These incremental results are shown in Table 6. As the table shows, Cleveland's performance has to improve to well above the league average in order to have a reasonable chance. Of course a 4% chance is still a chance, so no fan should give up yet. And that too is a lesson from simulations. A lot of things can happen.

Table 6. Cleveland's Chances

Cleveland's Chance at Making Playoffs Assuming Different Win Percents

 

home win %

road win %

overall win %

chance

Cleveland current

0.54

0.22

0.377

4%

at league ave.

0.62

0.38

0.491

27%

above league ave.

0.65

0.41

0.528

38%

even higher

0.69

0.44

0.566

53%

4. Comparison of Model Prediction with Actual Outcomes

One of the many values of simulating sporting events is that we can compare our model predictions to actual outcomes within a relatively brief period of time. In our case Denver began losing some key games (including a game lost due to acknowledged referee error). Students started wondering about the validity of our model. This created the opportunity to discuss the nature of probability and the extent to which a simulation gives a range of possible outcomes any of which could occur (and others as well). Our simulation did include events in which Denver did not make it. We began periodically entering the actual results to date, recalculating probabilities, and re-simulating the rest of the season. The first recalculation dropped the percent of times Denver qualified to around 70%. As Denver started losing more games than predicted (on average), the recalculated lowered probabilities of winning produced lower likelihoods of making the playoffs. Figure 4 contains a chart showing those changing odds over the last part of the season.


Figure 4. Predicted probabilities that Denver makes the playoff as season progresses.

Eventually the regular season ended. To our relief Denver did make the playoffs, Cleveland did not, and Miami did, all as predicted. As we were prepared to argue (if Denver didn't make it), no single outcome says a great deal about the validity of a simulation model. In this exercise however we were actually simulating the outcome of over 150 games (in the Western Conference alone) and then keeping track of the playoff outcome. One way to evaluate the model is to compare the number of games won by each team with the expected number of wins based on the estimated probabilities. Furthermore, rather than comparing point estimates we can look at a distribution of predicted number of wins. This distribution can be estimated by using the average probability of each team winning its remaining games as the probability for a binomial random variable with the number of games as the number of trials. Figure 5 contains a 95% confidence interval for the predicted number of wins for each team along with the actual number of wins (marker). In every case the actual number of wins falls within the estimated distribution. It is true that Denver's number of wins was below the expected value and Portland's and Utah's wins were above the expected average. However, more often that not, results are going to be either above of below the average. In reality Portland and Utah did compete with Denver to the very end of the season for that final spot.


Figure 5. Actual # of wins versus a predicted 95% confidence interval.

5. Conclusions

This exercise has proved to be a very useful experience in the classroom. It is important to note that it was designed and used only in one semester (Spring of 2004). The effectiveness or the impact on student learning has not been measured in any formal way. Nonetheless, the anecdotal evidence points to a very useful approach in introducing simulation and its various components. Students were very enthusiastic about the model and its results. The basketball example coupled with the local interest in the lead player (Anthony) contributed greatly to students' continued interest and desire to explore the model further. Beyond the basic simulation model we were able to maintain a flexible agenda for the exercise. The four teams and seven teams' analyses were a direct result of further probing initiated by the students. Also all subsequent analyses including tracking and updating Denver's chances, developing confidence intervals for the simulation results, and evaluating the Cleveland team, were instigated by students' inquiries. We were able on several occasions to demonstrate the limitations of the model as well as our ability to interpret the results. Finally, where students were inclined to explain the outcome using factors not included in the model we were able to clearly point that out.

Overall, this exercise allows the instructor to reinforce important aspects of simulation modeling and modeling in general. Specifically issues related to probabilistic modeling, validity of simulation models, and what if analyses can all be address with this example.

In most simulation models the assumed probability distributions are estimated based on historical data. Whether the future is in fact well represented by this historical data is always a concern. This exercise allows students to fully understand this in a familiar context. The exercise also provides an opportunity to evaluate the quality of these estimated probabilities after a significant amount of actual data becomes available. The model can be used to reinforce the fact that the probabilities used in defining simulation models can have considerable impact on the results and validity of the model. As the probabilities of winning changed over the season, the likelihood of Denver making the playoffs changed considerably.

There is also room for meaningful student discussion about the extent to which these models fail to describe the real world exactly but still give useful information. For example, we could try to improve the calculation of probabilities to include the impact of a team's schedule in determining their record for the first part of the season. Did some teams play a weaker early schedule? If we attempted to use the existing data to ascertain this would we be using smaller and smaller samples to determine our probabilities and would we face diminishing returns for our efforts? Would it make any difference if we included all 29 teams in our model?

What-if analysis is an important part of any modeling exercise. We were able to demonstrate the usefulness of varying the probabilities used in the model to see if any reasonable variation in the probabilities really gave Cleveland a good chance of making the playoffs.

With this exercise students can see the value of Monte-Carlo simulation in a way that is fully transparent and in a context they understand. At the same time, and just as important, they get to experience using data tables and other useful Excel functions. Some students will get excited about the basketball results, some about the power of simulations, and some about what they can do with Excel. Hopefully we have improved the chances that some students will get excited about something.

References

Ammar, A and Wright, R. (2001), "What Chance Does the USA Have of Going to the World Cup?: An Example of Spreadsheet Monte-Carlo Simulation using Visual Basic," Proceedings of Decision Sciences Institute National Meeting.

Evans, J. (2000), "Spreadsheets as a Tool for Teaching Simulations," INFORMS Transactions on Education ,

Evans, J. and Olson D. (2002), Introduction to Simulation and Risk Analysis , 2nd Edition, Prentice Hall, New Jersey.

Lock, R. (1997), "NFL Scores and Point Spreads," Journal of Statistics Education, Vol. 5.

Nettleton, D. (1998), "Investigating Home Court Advantage," Journal of Statistics Education, Vol. 6.


To download a printable version (pdf) of this paper, click here. To download the Adobe Acrobat reader for viewing and printing pdf files, click here.
To reference this paper, please use: 
Ammar, S. and Wright, R. (2004), "Comparing the Impact of Star Rookies Carmelo Anthony and Lebron James: An Example on Simulating Team Performances in the NBA League," INFORMS Transactions on Education, Vol. 5, No 1,  http://ite.pubs.informs.org/Vol5No1/AmmarWright/