Abstract
This paper describes a simulation exercise designed for introductory quantitative
method classes both at the MBA and undergraduate level. The exercise tracks the performances
of teams in the National Basketball Association (NBA) during the season of 2004.
It is designed as a spreadsheet model and is developed in stages throughout the academic
semester. The example is a significant illustration of the use of sports as a vehicle
for teaching OR topics, specifically simulation. It also incorporates many spreadsheet
modeling skills such as the use of Excel functions and Data Tables. The model provided
a good mix of the ingredients for an effective simulation example including a variety
of answers, challenges and surprises.
For years, sports have been used in teaching Statistics (Lock, 1997 and
Nettleton, 1998). More recently, sports have been used
as a motivating context for introducing simulation (Ammar and Wright, 2001).
The probabilistic nature of the competitive outcomes, the desire to predict these
outcomes, and the variety among the relationships between the outcomes, provide a
rich array of applications for simulation models. Excel spreadsheets give instructors
the ability to move away from 'toy' examples and introduce, in a manageable format,
real and meaningful illustrations (Evans, 2000). The most effective sports related
examples capitalize on current events of general interest that stimulate curiosity
and inquisitiveness beyond those who may be considered diehard fans.
This paper describes an exercise that predicts the impact of two rookies in the
National Basketball Association (NBA). One of the rookies, LeBron James, was the
number one draft in the NBA for 2003 and generated much media attention. The second
rookie, Carmelo Anthony, led his college team for Syracuse University to its first
national championship in basketball. The championship was by far the most important
sporting result for the central New York region in recent years. Thus following the
immediate progress of this local 'hero' guaranteed for us the broad student interest.
Student interest and curiosity were essential in sustaining their enthusiasm as we
explored various modeling tools and concepts for the simulation model. Although specific
to these rookies and their draft teams, the exercise can easily be generalized and
updated to predict performances of any teams in the NBA.
In this paper we demonstrate the details of the simulation model using Carmelo Anthony's
team, the Denver Nuggets. In the previous year and prior to his draft the Nuggets
won only 17 out of the season's 82 games. By the all-star game of 2004 and with the
help of Anthony, the Nuggets had already won 31 games and seem to be well on their
way to the playoffs. The example described in this paper joins the league at the
point of the all star game (after 55 games) and includes a Monte-Carlo simulation
for the performance of the Nuggets in the remaining games of the season. In determining
the playoff chances of the Nuggets, the simulation includes the performances of the
team's nearest competitors. Similar assessments are also performed for the Cleveland
Cavaliers, the team that drafted LeBron James.
The exercise is designed to help achieve several objectives. To run a successful
simulation the first step tends to focus on defining the relevant probabilities.
The example introduces a method for estimating probabilities that is intuitively
acceptable and follows the rules of probability. In this paper the process of estimating
the probabilities is simple (by design) in order to avoid the need for extensive
discussion and coverage of advance topics in probability theory. Another objective
of this example is to introduce students to the simulation capabilities of Excel
including the use of data tables to replicate observations. Also, this example introduces
the concept of simulating events that are related. For example simulating the outcome
of the Seattle at Denver game determines the outcome of the Denver home game as well
as the outcome of the Seattle away game. Another important objective of the example
is to demonstrate for students how simulation can be used to provide answers to questions
beyond the specific simulated events. Simulating the outcomes of games for the season
allows us to explore the chances of the team returning for the playoffs. The final
part of this paper focuses on the process of assessing and updating the estimated
probabilities as the season progresses and games are won and lost.
This example is used in our management science class to introduce concepts and basic
skills in spreadsheet simulation. This is a core junior level class required of students
majoring in business and accounting. Students entering this class are expected to have
completed the introductory statistics requirement. The simulation is done entirely
in Excel. Large numbers of trials are executed using data tables and other useful functions
in Excel. The use of add-ins such as Crystal Ball and @Risk is introduced at the higher
(or senior) level simulation class and are not needed for this particular exercise.
The model involves an attempt to simulate the remaining 27 games for the Denver
Nuggets. Once the outcome of these games is simulated the next step is to assess
the Nuggets' chances of making the playoffs. This process is designed in several
stages. The first stage is that of estimating the winning probabilities (percentage)
by team in the league. It is important to recognize that these probabilities vary
for each team depending on whether the game is played at home or away. The second
stage is to simulate the number of Denver wins for the remaining scheduled season
based on the estimated probabilities. Finally in assessing Denver's chances of reaching
the playoffs, the model examines Denver's nearest (slightly above or slightly below)
competitors by simulating each of their performances and comparing Denver's performance.
The probabilities could be estimated by using information on teams' performances in the previously
played games. In the first 55 games of the season, Denver's winning percent was .582. (Note:
percent is typically referred to on sport pages as a number between 0 and 1. For convenience
we maintain this convention for the data in this paper). We could use .582 as the probability
of winning each game. However, in the NBA there is a significant difference between winning percents
for home games and for games on the road. Table 1 shows Denver's home and road winning percentages
as well as the average of the entire league. It also shows the same percentages for two select
teams (for purpose of illustration).
Table 1. Select Winning Percentages
|
Winning Percentage |
|
Home |
Road |
League Average |
.62 |
.38 |
Denver |
.72 |
.42 |
New York |
.55 |
.38 |
Indiana |
.78 |
.67 |
Since Denver has won 72% of its home games we could begin by assigning a probability
of .72 to Denver winning a future home game. However, the quality of the opponent would raise
or lower that probability for a particular game. For example it might be reasonable to assign
a probability of .72 to Denver beating New York at home since New York's road winning percentage
matches that of the league. That is New York is an average road team. If the probability that
Denver beats New York at home is .72, then the probability that New York wins that game would
be 1 - .72 or .28. This is lower than New York's road average, indicating that Denver is a better
than average home team. In fact the extent to which New York's probability of winning is reduced
(.38 to .28) of course matches the extent to which Denver is a better than average home team
(.72 - .62). A possible generalization follows.
Consider two teams playing a particular game, team A, the home team and team B, the road team.
Let Hper= proportion Team A wins when playing at home.
Let Rper= proportion Team B wins when playing on the road.
Let LHper= average proportion all league teams win when playing at home.
Let LRper= average proportion all league teams win when playing on the road.
We can then estimate the probability that the road team wins as (team B over Team A):
Rper- (Hper- LHper). (1)
In our example, the estimated probability that New York beats Denver on the road is:
.38 - (.72 - .62) = .28 .
Similarly we can estimate the probability that the home teams wins as:
Hper- (Rper- LRper). (2)
In our example, the estimated probability that Denver beats New York at home is:
.72 – (.38 - .38) = .72.
Also, if Denver is playing Indiana at home the probability of Denver winning is:
.72 – (.67 - .38) = .43, a lower value since Indiana is a better than average road team.
It is important to check that the estimates for probabilities of complementary events add to
one. If we add the probability that the road team wins (as calculated in (1)) to the probability
that the home team wins (as calculated in (2)) we get:
Rper- (Hper- LHper) + Hper- (Rper- LRper) = LHper+ LRper= 1.
This relationship suggests that equations (1) and (2) provide possible estimates for winning
probabilities. These estimates take into account the relative position of any particular team
in the league.
Figure 1 shows Denver's simulation sheet . It includes Denver's schedule and the
appropriate home or road records of each opponent. Average league results are also included.
Denver's schedule and the league standings at the time of this example were downloaded from
a popular sports site. Denver's win percentages and the league win percentages were included.
Using equations (1) and (2) (and IF statements for home or road), the probability of Denver winning
each game is calculated.
 Figure 1. Denver Simulation
The outcome for each game is simulated by using RAND, the Excel random number function for a
uniform distribution between 0 and 1. If the random number is less than the probability of Denver
winning that game a "win" is recorded. Otherwise "lose" is entered in the simulation column.
(Results in highlighted cells are explained in the following section.) A COUNTIF statement is
used to count the number of simulated wins. The total number of wins is the simulated number
of wins plus the actual wins at the time of the example.
Each time a recalculation is done, new random numbers are generated and new simulation results
recorded. The results of multiple simulations can be recorded using Excel's Data
Table (Evans and Olson, 2002). For a 1000 run simulation we found the average number of wins
to be 46 with a minimum of 39 and a maximum of 55. Table 2 shows the number of times a range
of wins occurred in the 1000 runs.
Table 2. Range of Simulated Total Season Wins
Simulated Season Wins |
Occurrences out of 1000 |
at least 39 |
1000 |
at least 40 |
997 |
at least 41 |
994 |
at least 42 |
978 |
at least 43 |
944 |
at least 44 |
861 |
at least 45 |
751 |
at least 46 |
612 |
at least 47 |
474 |
at least 48 |
324 |
at least 49 |
179 |
at least 50 |
92 |
At this
point we have some sense of how many games the Denver Nuggets might win for the season.
In more than 90 percent of the runs Denver wins at least 43 games. Is this enough
to make the playoffs? In the previous year it took 44 wins to make the playoffs in
the Western Conference while 38 would have been enough in the East. To better assess
Denver's chances for the playoffs we may need to know how the number of Denver wins
compared to the competing teams.
At the time of this exercise, Denver was in eighth place in the West, the final
playoff spot. Teams in spots 9, 10, and 11 ( Seattle , Utah and Portland ) could
be regarded as threats to Denver's position. Our next step was to run similar simulations
for these three teams and compare the number of wins ( , Note: when
running simulations only one spreadsheet should be open at a time.). Running a simulation
for a new team merely requires entering a new schedule (a lookup functions will check
records and recalculate probabilities). However we need to take into account instances
in which each of these teams plays one another. For example Denver plays Utah at
Utah . Although the probabilities that Denver wins and Utah loses add to one, we
don't want to use two different random numbers to simulate the outcome of a single
game. Here we choose to randomly generate outcomes for home games and use the results
to determine those of the road games. For example, for the Denver at Utah game, the
outcome on the Denver sheet is linked to the outcome on the Utah sheet. If a "lose" shows
up for Utah , a "win" will be entered for Denver and vice versa. (See the lower highlighted
cell in Figure 1.) The spreadsheet contains simulation sheets for
each of the four teams. Each sheet duplicates the basic structure of the Denver sheet
described above.
When the season for all four teams is simulated, an IF statement is used to record a "yes" if
Denver's win total equals the maximum of the four win totals and a "no" otherwise (ignoring
ties). As shown in Figure 2, a Data Table is used to perform a 1000 replications and a COUNTIF
statement is used to count the percent of "yes" results.
 Figure 2. First of Four
In one simulation of 1000 runs we observed that Denver's wins exceeded the number of wins of
the other three teams 98% of the time. Our confidence that Denver will make the playoffs has
increased.
Can we be even more confident? Currently Denver is actually tied with Memphis and Houston for
the 6th, 7th, and 8thplaces. Hence there is a chance that one of these teams might falter,
improving Denver's odds. Our third simulation includes the 6ththrough 12thplace teams and
attempts to determine whether Denver would place in the top three of the last seven teams . We are assuming the current top five teams will be in the playoffs and only three
spots are yet to be determined.
We can count the wins for each of the seven teams as we did before in the four teams
simulation. In seven teams.xls the sheets for the individual teams (other than Denver ) are hidden
to simplify the readers interaction with the spreadsheet. The sheets are not protected and can
be easily unhiden to show a structure identical to those in the four teams' sheets. Once the seven
team performance is simulated the RANK function is used to determine the rank of Denver within
these seven teams. A Data Table is then used to simulate 1000 replications with the rank of Denver
as the table output. Figure 3 includes these seven team calculations.
 Figure 3. Top Three of Seven
Also, Table 3 shows the summary of the simulated results. Since the top three teams will qualify
for the playoffs (along with the assumed first five) Denver misses the playoffs only 1% of the
time (actually 9 out of 1000). By all three models Denver's and Carmelo Anthony's chances of
making the playoffs seem very high.
Table 3. Denver’s Ranks as a Percent of 1000 Runs
Denver's Rank |
Percent of Runs |
1 |
34% |
2 |
33% |
3 |
32% |
4 |
1% |
5 |
0% |
6 |
0% |
7 |
0% |
Syracuse and Denver fans must admit that there is a second super rookie this year by the name
of LeBron James. What are his chances of leading the Cleveland Cavaliers to the playoffs? At
the time of the exercise the Cavaliers were in 11thplace in the East but talking confidently
of rising to the 8thand final spot. The 8ththrough 11thspots were held by Boston , Miami
, Philadelphia and Cleveland , in that order. A new spreadsheet was created for these four teams
and a thousand endings to the seasons were simulated. It was assumed that only the top team
would make the playoffs from these four teams (the seven higher teams are substantially above
Boston , the current 8thand last qualifier). Table 4 contains the percent of trials (out of
1000) in which each of the four teams gained that last spot.
Table 4. Percent of Trials each Team Made the Playoffs
Teams |
Percent of Trials Team Made Playoffs |
Boston |
26% |
Miami |
58% |
Philadelphia |
12% |
Cleveland |
4% |
As
modeled, Cleveland actually has very little chance of making the playoffs. Also
the current 8thplace team, Boston , does not have the best shot. Miami is the
favorite to gain the eighth position. Some of the students observe that this is
consistent with Boston's poor play after all their recent trades and the resignation
of their coach. Of course, our model knows nothing of that. What it does know, as
summarized in Table 5, is that Miami has a better home record than the other teams
and plays more home games down the stretch than the other three teams.
Table 5. Home Game Advantages
Teams |
Home Win PCT. |
Home Games / Remaining Games |
Boston |
0.39 |
13/27 |
Miami |
0.60 |
16/27 |
Philadelphia |
0.48 |
14/28 |
Cleveland |
0.54 |
15/29 |
The students' observations
however allow for an appropriate discussion about what our models do and do not
take into account. If all teams play the remaining games at the level they played
the first part of the season our results are likely to be a very fair representation.
However, this analysis was performed just before the deadline for teams to make
trades. As the students point out, the Boston team playing after the all-star break
is a very different one from the one that played the first half of the season. This
is also the case for other teams including Cleveland . Our model ignores all this
and assumes the teams will play in a manner consistent with the way they played
the first fifty some games. Obviously if the probabilities are based on historical
data and the future is sharply different form the past than the results are less
reliable. We could attempt to make the results better by reducing the probabilities
that Boston will win games (at least based on their last 10 games) and we could
increase the probabilities that Cleveland will win based on the improved team and
perhaps the maturing of LeBron James. Unfortunately these changed probabilities
might represent biased preferences.
What we can do is ascertain how much better Cleveland will have to play to have
a reasonable chance at the playoffs. To do this we increase the probability of winning
both home and road games until Cleveland makes the playoffs more than half the time.
These incremental results are shown in Table 6. As the table shows, Cleveland's
performance has to improve to well above the league average in order to have a reasonable
chance. Of course a 4% chance is still a chance, so no fan should give up yet. And
that too is a lesson from simulations. A lot of things can happen.
Table 6. Cleveland's Chances
Cleveland's Chance
at Making Playoffs Assuming Different Win Percents |
|
home win % |
road win % |
overall win % |
chance |
Cleveland current |
0.54 |
0.22 |
0.377 |
4% |
at league ave. |
0.62 |
0.38 |
0.491 |
27% |
above league ave. |
0.65 |
0.41 |
0.528 |
38% |
even higher |
0.69 |
0.44 |
0.566 |
53% |
One of the many values of simulating sporting events is that we can compare our model predictions
to actual outcomes within a relatively brief period of time. In our case Denver began losing
some key games (including a game lost due to acknowledged referee error). Students started wondering
about the validity of our model. This created the opportunity to discuss the nature of probability
and the extent to which a simulation gives a range of possible outcomes any of which could occur
(and others as well). Our simulation did include events in which Denver did not make it. We
began periodically entering the actual results to date, recalculating probabilities, and re-simulating
the rest of the season. The first recalculation dropped the percent of times Denver qualified
to around 70%. As Denver started losing more games than predicted (on average), the recalculated
lowered probabilities of winning produced lower likelihoods of making the playoffs. Figure 4
contains a chart showing those changing odds over the last part of the season.

Figure 4. Predicted probabilities that Denver makes the playoff as season progresses.
Eventually the regular season ended. To our relief Denver did make the playoffs, Cleveland
did not, and Miami did, all as predicted. As we were prepared to argue (if Denver didn't make
it), no single outcome says a great deal about the validity of a simulation model. In this exercise
however we were actually simulating the outcome of over 150 games (in the Western Conference
alone) and then keeping track of the playoff outcome. One way to evaluate the model is to compare
the number of games won by each team with the expected number of wins based on the estimated
probabilities. Furthermore, rather than comparing point estimates we can look at a distribution
of predicted number of wins. This distribution can be estimated by using the average probability
of each team winning its remaining games as the probability for a binomial random variable with
the number of games as the number of trials. Figure 5 contains a 95% confidence interval for
the predicted number of wins for each team along with the actual number of wins (marker). In
every case the actual number of wins falls within the estimated distribution. It is true that
Denver's number of wins was below the expected value and Portland's and Utah's wins were
above the expected average. However, more often that not, results are going to be either above
of below the average. In reality Portland and Utah did compete with Denver to the very end of
the season for that final spot.
 Figure 5. Actual # of wins versus a predicted 95% confidence interval.
This exercise has proved to be a very useful experience in the classroom. It is important to
note that it was designed and used only in one semester (Spring of 2004). The effectiveness
or the impact on student learning has not been measured in any formal way. Nonetheless, the
anecdotal evidence points to a very useful approach in introducing simulation and its various
components. Students were very enthusiastic about the model and its results. The basketball
example coupled with the local interest in the lead player (Anthony) contributed greatly to
students' continued interest and desire to explore the model further. Beyond the basic simulation
model we were able to maintain a flexible agenda for the exercise. The four teams and seven
teams' analyses were a direct result of further probing initiated by the students. Also all
subsequent analyses including tracking and updating Denver's chances, developing confidence
intervals for the simulation results, and evaluating the Cleveland team, were instigated by
students' inquiries. We were able on several occasions to demonstrate the limitations of the
model as well as our ability to interpret the results. Finally, where students were inclined
to explain the outcome using factors not included in the model we were able to clearly point
that out.
Overall, this exercise allows the instructor to reinforce important aspects of simulation modeling
and modeling in general. Specifically issues related to probabilistic modeling, validity of
simulation models, and what if analyses can all be address with this example.
In most simulation models the assumed probability distributions are estimated based on historical
data. Whether the future is in fact well represented by this historical data is always a concern.
This exercise allows students to fully understand this in a familiar context. The exercise also
provides an opportunity to evaluate the quality of these estimated probabilities after a significant
amount of actual data becomes available. The model can be used to reinforce the fact that the
probabilities used in defining simulation models can have considerable impact on the results
and validity of the model. As the probabilities of winning changed over the season, the likelihood
of Denver making the playoffs changed considerably.
There is also room for meaningful student discussion about the extent to which these models
fail to describe the real world exactly but still give useful information. For example, we could
try to improve the calculation of probabilities to include the impact of a team's schedule in
determining their record for the first part of the season. Did some teams play a weaker early
schedule? If we attempted to use the existing data to ascertain this would we be using smaller
and smaller samples to determine our probabilities and would we face diminishing returns for
our efforts? Would it make any difference if we included all 29 teams in our model?
What-if analysis is an important part of any modeling exercise. We were able to demonstrate
the usefulness of varying the probabilities used in the model to see if any reasonable variation
in the probabilities really gave Cleveland a good chance of making the playoffs.
With this exercise students can see the value of Monte-Carlo simulation in a way that is fully
transparent and in a context they understand. At the same time, and just as important, they
get to experience using data tables and other useful Excel functions. Some students will get
excited about the basketball results, some about the power of simulations, and some about what
they can do with Excel. Hopefully we have improved the chances that some students will get excited
about something.
Ammar, A and Wright, R. (2001), "What Chance Does the USA Have of Going to
the World Cup?: An Example of Spreadsheet Monte-Carlo Simulation using Visual Basic," Proceedings
of Decision Sciences Institute National Meeting.
Evans, J. (2000), "Spreadsheets as a Tool for Teaching Simulations," INFORMS
Transactions on Education , 
Evans, J. and Olson D. (2002), Introduction to Simulation and Risk Analysis ,
2nd Edition, Prentice Hall, New Jersey.
Lock, R. (1997), "NFL Scores and Point Spreads," Journal of Statistics Education, Vol. 5.
Nettleton, D. (1998), "Investigating Home Court Advantage," Journal of Statistics Education, Vol. 6.
 |
To download a printable version (pdf) of this paper, click here. To download the Adobe Acrobat reader for viewing and printing pdf files, click here. |
 |
To
reference this paper, please use:
Ammar, S. and Wright, R. (2004), "Comparing the Impact of Star Rookies Carmelo Anthony and Lebron James: An Example on Simulating Team Performances in the NBA League," INFORMS Transactions on Education, Vol. 5, No 1,
http://ite.pubs.informs.org/Vol5No1/AmmarWright/
|
|