Volume 3, Number 3, May 2003


The Conduct of an Effective Simulation Study
Richard W. Conway and John O. McClain
Johnson Graduate School of Management
Cornell University


Although the software tools to develop simulation models have improved dramatically, relatively little has been written about how to most effectively exercise those models. This paper provides warnings and suggestions to improve simulation studies. Although based primarily on the authors' experience with discrete-event models in manufacturing situations, many of their observations are more widely applicable.

Educators are facing a new challenge – increasing demand for including simulation in curricula, as complete courses or as a topic in existing courses. Simulation is no longer "for specialists only". Graphical simulation packages such as Extend and Arena, and spreadsheet-based versions such as Crystal Ball and @Risk have put this tool into the hands of end-users, and into many more courses in business and engineering. (For a review, see Swain, 2001.) Applications abound. A quick visit to the web sites of some of the software suppliers reveals some of the volume and variety of situations where simulation has been used, including Six-sigma training, real options analysis, financial planning, probabilistic oil and gas reserve estimates, cost estimation for the Health Insurance Portability and Accountability Act, process operating costs in mining, and credit risk of corporate restructuring. The authors have used simulation in manufacturing and health systems, both in application and in research.

These packages make it possible to construct models of reasonably complex systems with relative ease. No longer do the time, effort and expertise required make the technique economically viable only for major projects. In education, students can begin modeling almost immediately, with only minimal instruction about the package they are using. In industry, the person with the problem can conduct the study without having to engage the services of a "professional".

However, this ease of use can be deceptive – it is now easier than ever to produce sophisticated-appearing results that are entirely misleading. We saw one example in which the simulation was used to estimate how long customers would have to wait. The model did a good job of representing the real situation, but the conduct of the study led to incorrect results. Specifically, when the rate of customer arrivals was increased to investigate a worst-case scenario (a very good idea) the average waiting time did not increase substantially (wrong). The problem was that the run-length (how long the simulation was observed) was too short to see what would eventually happen after a large increase in the customer load. A longer run-length showed that the customer wait was enormous in the worst-case scenario, and in several other scenarios as well. If the first results had been used to design the actual service system, the outcome would have been a customer-satisfaction disaster.
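The effect is easy to reproduce. The Python sketch below is not the model from that study. It assumes a single server, exponential interarrival and service times, and a worst-case arrival rate 20% above the service rate, all of which are hypothetical choices, as are the name average_wait and its parameters. Each customer's delay follows from the previous one by the Lindley recursion.

```python
import random

def average_wait(arrival_rate, service_rate, n_customers, seed=1):
    """Mean delay in queue over the first n_customers of a single-server FIFO system."""
    rng = random.Random(seed)
    wait = 0.0    # the first customer finds the system empty
    total = 0.0
    for _ in range(n_customers):
        total += wait
        service = rng.expovariate(service_rate)
        gap = rng.expovariate(arrival_rate)      # time until the next arrival
        wait = max(0.0, wait + service - gap)    # Lindley recursion
    return total / n_customers

# Assumed worst case: customers arrive 20% faster than they can be served.
short_run = average_wait(arrival_rate=1.2, service_rate=1.0, n_customers=200)
long_run = average_wait(arrival_rate=1.2, service_rate=1.0, n_customers=20000)
print(f"average wait over   200 customers: {short_run:8.1f}")
print(f"average wait over 20000 customers: {long_run:8.1f}")
```

Because the server is overloaded, delay keeps growing for as long as the run continues, so the short run reports a much smaller average wait than the long one.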

Effectively constructing a simulation model, and using it to learn something interesting and useful about a real system, is a challenging task. It requires a combination of competence, ingenuity and experience that is neither intuitively obvious nor trivially learned. Using this tool with inadequate training incurs a risk not unlike that of placing a sharp tool in the hands of inexperienced users. There is some prospect of accomplishing useful work, but also danger of doing harm. The title of the first simulation text (Tocher, 1963) is The Art of Simulation, which nicely reflects one of our main points – simulation packages are tools that require care and skill in their application. Fortunately, end-user simulation programs can facilitate teaching and learning because of the relative ease in constructing, running and changing models, and the ready access to both animation and statistical summaries to see what is actually happening.

However, the danger remains of producing sophisticated-looking results that are misleading or incorrect. The purpose of this paper is to suggest ways to avoid some of the more common hazards, and to increase the effectiveness of a simulation study by focusing on what is important. We have necessarily simplified the issues, but hope to provoke users and educators to think about them. None of these issues is itself new. Each has been addressed in at least one of the references, but none of the prior work really focuses on the conduct of a simulation study, and none pulls all these issues together in one place.

1. Strategy, Tactics and Mechanics

The success of a simulation study depends upon decisions at three levels—strategy, tactics and mechanics. Strategic issues are unquestionably the most important—if you study the wrong questions, it doesn’t matter how you conduct the study. Tactics is next—if you conduct the study poorly, it matters little how you model the elements of the system. That leaves mechanics at the bottom of the list, but user’s guides in general concentrate on the detailed mechanics of modeling. This emphasis suggests (incorrectly) that the user should focus immediate attention on the details of the model, a mistake that leads to wasted time and effort and often leads to incorrect conclusions and misses opportunities for real improvement.

One strategic issue is to clarify the goals of the study. Fundamentally, modeling should be undertaken to guide you in making changes to an existing system or in designing a new system. Hence, the dominant virtues of a model are the ease with which it can be changed, and how well it predicts the effects of changes on performance of the real system. A study guided by this goal from the outset can produce useful results far sooner, and in greater profusion, than one that begins by constructing a detailed model of the existing system. Furthermore, simulation models constructed under this goal have a longer useful life because they are designed to be changed. Relevant points are discussed in Sections 3 and 4.

The overriding goal of simulation modeling is to obtain insight into how the underlying process works, because this offers the best hope of stimulating ideas for change. Graphics and animation are valuable tools in this regard. It is simply incorrect to say (as some critics do) that including graphics and animation is a fad and a gimmick, at best useful for debugging the model. For example, statistical output of a simulation run may indicate that a particular machine is idle more often than it should be, but animation can help in figuring out why it is idle and how to prevent it. A study guided by the goal of obtaining insight has lasting impact because of the knowledge gained about the drivers of performance in the real system.

2. Design of Experiments

Design of experiments is a substantial and well-developed topic in the field of statistics. However, we often do not use the formal procedures of experimental design in our simulation studies, and this section will explain our point of view on this issue.

Experimental design was developed for applications such as agriculture, biology, and psychology. When we do a simulation, the conditions in our environment are very different:

  1. We exercise complete control over our experimental media: We can perfectly reproduce an "observation".

  2. We can experiment sequentially, examining each observation before planning the next, since we do not have a long germination or gestation period.

  3. Variability is often an order of magnitude smaller than the effects we seek to measure. This is often true in manufacturing, but perhaps less so in service industries.

  4. The cost of our observations is relatively small.

In spite of these differences, experimental design can make our investigations more efficient, but the power and flexibility of simulation can be used to overcome many of the problems experimental design was created to solve.

The topic in experimental design most relevant to simulation is the exploration of response surfaces. In agriculture, for example, the yield Y of a variety of sweet corn is a function of the values assigned to a set of k control variables (e.g. temperature, fertilizer, moisture) as well as certain uncontrolled random phenomena. Ignoring variability, the values of Y represent a surface in (k + 1)-dimensional space, and the objective is to explore that surface and to discover the values of the control variables that result in an optimum value of Y. This is a long and arduous task when the data collection involves planting, watering, fertilizing, and waiting for the crop to mature.

Simulation studies have it much better. In a model of a doctor’s office, Y might be the average time that a patient waits, and the control variables might describe the scheduling procedure used (time between appointments, number of patients per appointment slot, etc.). The simulation model gives us an experimental means of exploring the response surface. Each run of the model uses a particular set of values of the control variables and yields a single value of Y, which is the elevation of one point on the response surface. Our experiment consists of crawling over this surface in the dark, seeking the lowest elevation (lowest average waiting time) or the highest point (highest profit). Variability means that the surface is somewhat spongy, and one must take care to distinguish between lumps in a swamp and true uphill progress.

The shortcoming in this paradigm is that the objective of a simulation study may very well be how to change the surface, rather than just finding the best spot on the existing surface. That is, our experiments are only partly concerned with changes in parameters; changes in structure are at least as important.

It is nonetheless useful to keep in mind that a simulation is fundamentally a device to obtain measurements for a specific set of values for the control variables. A useful way to think about the process, and a useful way to present results, is to change the control variables one at a time.
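One way to organize such one-at-a-time runs is sketched below in Python. The doctor's office model is a deliberately crude stand-in: a single doctor, punctual patients, and consultation times drawn uniformly between 2 and 5 minutes are all assumptions, as are the names mean_wait and one_at_a_time and the baseline values. Every run reuses the same random number seed, so differences between rows reflect the control variable rather than the luck of the draw.

```python
import random

def mean_wait(interval, per_slot, n_patients=4000, seed=7):
    """Average patient wait (minutes) for one doctor under block scheduling."""
    rng = random.Random(seed)
    doctor_free = 0.0
    total_wait = 0.0
    for i in range(n_patients):
        arrival = (i // per_slot) * interval         # patients are assumed punctual
        start = max(arrival, doctor_free)
        total_wait += start - arrival
        doctor_free = start + rng.uniform(2.0, 5.0)  # assumed consultation time
    return total_wait / n_patients

def one_at_a_time(model, baseline, sweeps):
    """Vary each control variable in turn, holding the others at baseline."""
    rows = []
    for name, values in sweeps.items():
        for value in values:
            settings = dict(baseline)
            settings[name] = value
            rows.append((name, value, model(**settings)))
    return rows

baseline = {"interval": 15.0, "per_slot": 4}
sweeps = {"interval": [15.0, 16.0, 18.0], "per_slot": [2, 3, 4]}
results = one_at_a_time(mean_wait, baseline, sweeps)
for name, value, y in results:
    print(f"{name:>9} = {value:>5}: average wait {y:5.2f} min")
```

Each printed row is one measured point on the response surface, which is also a convenient layout for presenting results.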

Much of the methodology literature in simulation is based on this underlying view. It is concerned with reducing the length of the run required to achieve a given quality (i.e. confidence interval) of the measurement of Y or, conversely, with improving the quality of measurement of Y for a given length of run. However, this may not be a crucial issue for practitioners because extreme precision is often not necessary for locating designs that lead to high profits.

3. Scale of the Model: Start Small

The simulation world suffers from a curious malady. For some reason many people automatically assume that, to be useful, a simulation model of a process has to be a "complete" model—that is, a model with a one-to-one relationship between the elements of the model and those of the real system. Consider a factory consisting of two hundred machines and producing five hundred different products. Must a model of that factory have 200 machines and 500 products? Probably not. A simulation model of less than "full dimension" is usually adequately realistic to achieve the goals.

Many fields routinely deal with scale models and partitioned models, particularly in the early stages of a design study. Naval architects perform tank tests on small-scale hull models, even though hydrodynamic phenomena are highly non-linear and prediction of full-scale behavior based on model performance is very complicated and potentially risky. Aircraft designers exploring a proposed airframe start by simulating airflow over separate subsystems—wing-sections, weapons pods, etc. The point is that in many fields full-scale and complete-system modeling is not automatically the first step, and frequently not used at all.

The situation is essentially the same for computer simulation. It is usually not appropriate to begin with a complete-system model, and often not necessary at all. There are obvious risks associated with studying a scale model, but there are also real risks associated with "over-modeling". A full-scale model takes longer to build and longer to run, and the results are often much more difficult to understand and use—the nature of the forest may be obscured by the detail of the trees.

Even problems that eventually will have to be subjected to full-scale modeling should initially be attacked in sections, with scaled-down dimensions, and with many other radical simplifications. It is often the case that you don’t really know what the problem is, or where the problem is, until after you have done some preliminary modeling. Animation and graphical capabilities are uniquely valuable for such initial studies. Even if it is later necessary to construct a larger, more detailed model, the total cost of the two-phase study may well be less than that of an all-at-once, full-dimensional, frontal assault.

In general, modeling is an example of what computer scientists describe as a "worse than linear" process. The work involved is more than proportional to the size of the model. When you double the size of a proposed model, you usually more-than-double (maybe even quadruple) the overall cost, which includes constructing the model, conducting experiments, analyzing the results and presenting them in a useful manner.

You should try to use the smallest model with the least amount of detail that will provide the required information. Avoid the temptation to build a full-scale model with as much detail as your modeling tool will permit. For example, just because you can model machine breakdown, this does not mean that every model you build has to reflect the reality of breakdowns. Although presumably all machines have some probability of breakdown, in many studies this may not be a relevant or significant part of the problem and should therefore be omitted from the model. Apply the KISS principle: "Keep It Small and Simple".

3.1 Partitioning

There are two fundamental ways to keep a model small: partitioning and scaling. Partitioning simply means studying sections of the operation separately, rather than trying to do it all at once. There are both parallel and serial partitioning strategies.

Serial partitioning means studying successive stages in a process separately. Look for "natural fault-lines" along which to partition the model into stages. Presumably, everyone recognizes the obvious partitions by geography or organizational boundaries. But there are also partitioning opportunities within what is apparently an integrated process. In a factory, a large storage area that can be assumed to always have some material, but never be filled to capacity, is a potential partition point. Processes upstream of that point do not affect those downstream, since their only link is this non-constraining supply/storage area. Another common example is to partition at a process that has been identified as a "bottleneck".

Parallel partitioning is simply recognition that many systems consist of replications of a basic production unit operating in parallel. The fueling islands of a truck stop are an example. Unless these parallel units interact in a significant way, it may be adequate to build a model with only one unit. For example, suppose you are studying the performance of an automatic storage and retrieval system for a warehousing operation. If each storage bay has its own transport system, you may wish to study different bin-assignment algorithms with a model of one bay.

3.2 Scaling

Scaling is more subtle than partitioning, and often takes more ingenuity and courage. It is the use of a smaller or less-detailed model when possible. But remember the goals of obtaining guidance and obtaining insight. For example, suppose you are responsible for redesigning an assembly line that produces batches of different products. Your concerns include how much inventory storage will be allowed, how to deal with defective units and equipment failures, and how large the production batches should be before stopping to change all of the equipment to produce the next item. Even if the real factory will consist of a hundred stations and will produce several hundred products, we would nevertheless start with a model of, say, a ten-station line producing five different products. We would experiment extensively with this model, and try to understand its behavior thoroughly, before even considering building a bigger model.

There are three possible outcomes of this preliminary study:

  1. The problem proves too difficult to understand, let alone find improvements. If this is true on a small model, it is unlikely that you will do better on a larger model. The bad news is that you will have to find another approach, but the good news is that you have just saved a ton of money by making this determination quickly with a small modeling investment.

  2. You come up with a modification so ingenious that it will have obvious benefit for lines of any length, or any mix. What are you waiting for? Begin implementation. Not only have you saved money by avoiding a full-scale modeling effort, but also your new design will be implemented sooner and its effect will hit the bottom line of your financial statement sooner.

  3. Your experiments on the scale model result in a proposal that needs to be tested on a larger or more detailed model. Although you have not avoided the larger model, you may now begin its construction with much better understanding of what is important and how the model is going to be used.

Scaling and partitioning are applicable in many different ways, depending on the application. However, whenever you find an opportunity, using these methods will reduce your effort in constructing the model and the time required to run it.

4. Tests of Validity

Obviously, a model must be "valid" in order to be useful. But it is surprisingly difficult to come up with a clear and precise definition of validity. The classical definition of validation is "reproducing performance of the real system." If you adopt this definition you are doomed to full-scale modeling and large-scale data collection.

Surprisingly, this type of validity check doesn’t really confirm validity. First, to produce results that are similar to those of the target system, you will probably have to "adjust" the structure of the model, the way in which it is run, or even the data describing the real system. When you are finished, there may be a real question as to whether the model faithfully reflects the system.

Second, and more importantly, the results of such a validity check, while comforting, may be largely irrelevant. The purpose of the simulation study is to predict behavior of systems that do not now exist. Knowing that the model has been adjusted to mirror the current operation provides no guarantee that it can accurately predict the performance of the operation as it might someday exist. This prediction is still a matter of judgment and faith. The validity check ritual that presumably strengthens that faith also consumes scarce resources—time and cost—that might be better invested elsewhere.

A fundamental misunderstanding is at the heart of this matter. The value of a model lies in what it does for the modeler, and not in what the model itself can do. The model should help us understand how a system works and yield insight as to what the critical aspects are. Much of this "education" occurs in the process of constructing the model. In some cases it is almost irrelevant whether the model can reproduce actual performance.

The usual way a model performs this magic is by showing direction—whether performance gets better or worse—when testing a variety of proposed changes. By doing this repeatedly, you get a sense of the shape of the "response surface" of system performance and some idea of how sharp the peaks and precipices are in that surface. These contrasts depend upon differences in performance between two versions of the model; the credibility of such differences is not tremendously enhanced by spending a great deal of time trying to reproduce actual history.

For example, a doctor’s office is considering changes to its "block scheduling" appointment scheduling system, going from "four patients per 15 minute block" to "eight patients per 30 minute block." It is far more important to understand whether this change will increase or decrease the average waiting time of patients than it is to verify that the model accurately reproduces the current average waiting time, plus-or-minus 30 seconds.
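A rough Python sketch of such a directional comparison follows. It assumes one doctor, punctual patients, and consultation times uniform between 2 and 5 minutes; none of these figures come from a real office, and the function name mean_wait is illustrative. Both schedules are run against the same stream of consultation times for several seeds, and only the sign of the change is examined.

```python
import random

def mean_wait(per_block, block_minutes, n_patients=4000, seed=0):
    """Average patient wait (minutes) when per_block patients share each block."""
    rng = random.Random(seed)
    doctor_free = 0.0
    total_wait = 0.0
    for i in range(n_patients):
        arrival = (i // per_block) * block_minutes   # whole block arrives together
        start = max(arrival, doctor_free)
        total_wait += start - arrival
        doctor_free = start + rng.uniform(2.0, 5.0)  # assumed consultation time
    return total_wait / n_patients

changes = []
for seed in range(5):
    current = mean_wait(4, 15.0, seed=seed)
    proposed = mean_wait(8, 30.0, seed=seed)
    changes.append(proposed - current)
    print(f"seed {seed}: 4 per 15 min = {current:5.2f}, 8 per 30 min = {proposed:5.2f}")

if all(change > 0 for change in changes):
    print("direction: the proposed schedule lengthens the average wait")
elif all(change < 0 for change in changes):
    print("direction: the proposed schedule shortens the average wait")
else:
    print("direction: unclear at this run length")
```

Under these assumptions the proposed schedule lengthens the average wait in every replication, a conclusion that does not depend on matching the office's current average to within seconds.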

5. Using "Real" Data

There are many types of data. In this discussion "real data" means empirical or observational data, usually (but not always) collected specifically for the purpose of modeling. "Hypothetical data" may also be used in a simulation by choosing one of the available theoretical probability distributions and varying its parameters. This approach is often useful in the exploratory stages of a study. "Judgmental data" is often in the form of a guess about a mean or range of data, such as the range of times expected to repair a machine, or a forecast of customer arrival patterns. Deciding which kind of data to use will strongly influence the duration and expense of an investigation.

It is not always possible to base a simulation study on "real data", and not always necessary even when it is possible. It is, however, always painful, slow and costly to obtain and use such data. Therefore, you should resist the temptation, or the instructions, to collect real data with all the ingenuity you can muster. The collection process is much more difficult than you can imagine, and the result is much less valuable than you might think. Even in the real (that is, non-academic) world, there is life without real data. We offer the following assertions and suggestions.

Avoid collecting real data if you can. It is often possible to obtain useful and practical results from a simulation study with no real data whatever. For example, waiting lines in airports are often organized into a single queue, with customers going to the first available agent. The superiority of this arrangement, compared to having a queue in front of each agent, is easily tested with a simple model. If this superiority holds up regardless of the precise customer load, the speed of the agents, or even how much variability these exhibit, the design decision can be made without actual data. (However, using the model for decisions about the size of the waiting area and the number of agents would require data collection.)
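A simple model of this kind might look like the Python sketch below. Everything in it is hypothetical data: three agents, exponential interarrival and service times with a mean service time of one time unit, and customers who join the shortest of the separate lines and never switch. The name compare_lines is illustrative. Both arrangements are fed the same customers.

```python
import heapq
import random
from collections import deque

def compare_lines(load, n_agents=3, n_customers=30000, seed=3):
    """Average waits (single shared line, one line per agent) at a given load."""
    rng = random.Random(seed)
    arrival_rate = load * n_agents               # mean service time is 1 time unit
    free_at = [0.0] * n_agents                   # shared line: heap of agent-free times
    lines = [deque() for _ in range(n_agents)]   # separate lines: departure times
    now = 0.0
    wait_single = wait_separate = 0.0
    for _ in range(n_customers):
        now += rng.expovariate(arrival_rate)
        service = rng.expovariate(1.0)
        # Shared line: the customer takes the first agent to become free.
        start = max(now, free_at[0])
        wait_single += start - now
        heapq.heapreplace(free_at, start + service)
        # Separate lines: join the shortest line and stay in it.
        for line in lines:
            while line and line[0] <= now:
                line.popleft()                   # drop customers who have left
        line = min(lines, key=len)
        start = line[-1] if line else now
        wait_separate += start - now
        line.append(start + service)
    return wait_single / n_customers, wait_separate / n_customers

results = [(load, *compare_lines(load)) for load in (0.5, 0.7, 0.85)]
for load, single, separate in results:
    print(f"load {load:.2f}: single line {single:5.2f}, separate lines {separate:5.2f}")
```

Repeating the comparison at several loads, as here, is what allows the conclusion to be drawn without knowing the actual load.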

Postpone data collection until after you have built a model. This strategy allows you to use the model to determine what data are necessary and what degree of quality is required. The following examples concern a production line in a manufacturing plant. The model, like the factory, has 20 work stations arranged in series. In the first version of the model, the processing times at each station were "educated guesses."

Example 1: A series of experiments revealed that station 3 was not a critical factor—varying its mean processing time from 3 minutes to 10 minutes did not affect the overall throughput of the line. It was clear that the actual value (in the factory) was in that range, so no data collection was needed to estimate it more precisely.

Example 2: At station 4 the processing time varies quite a bit, so a probability distribution was incorporated in the model and some experiments were run to determine how important the variability is. The first two runs used a uniform distribution of processing times with a mean of 7. Processing times varied between 1 and 13 in the first run, and between 5 and 9 in the second. It was found that this difference strongly affects overall throughput of the line, so it was deemed worthwhile to spend more effort to estimate variability accurately for this process. However, if the effect of this change had been relatively small, you might just ask an experienced operator for a range.
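The two runs in Example 2 can be imitated with the Python sketch below, which is not the authors' model. It assumes that the other 19 stations take a constant 7 minutes, that two buffer spaces sit in front of each station, and that a finished job stays on its station until there is room downstream. These are all invented for illustration, as is the name throughput. Both runs use the same random number stream.

```python
import random

def throughput(low, high, n_stations=20, varied=3, buffer=2, n_jobs=5000, seed=11):
    """Jobs per minute from a serial line with finite buffers between stations."""
    rng = random.Random(seed)
    # leave[j][i] = time at which job i leaves station j
    leave = [[0.0] * n_jobs for _ in range(n_stations)]
    for i in range(n_jobs):
        for j in range(n_stations):
            # Station 4 (index 3) is random; the others are a constant guess.
            service = rng.uniform(low, high) if j == varied else 7.0
            arrived = leave[j - 1][i] if j > 0 else 0.0   # first station never starves
            freed = leave[j][i - 1] if i > 0 else 0.0     # previous job has moved on
            done = max(arrived, freed) + service
            # A finished job stays put until the next station has room for it.
            if j + 1 < n_stations and i - buffer - 1 >= 0:
                done = max(done, leave[j + 1][i - buffer - 1])
            leave[j][i] = done
    return n_jobs / leave[-1][-1]

wide = throughput(1.0, 13.0)
narrow = throughput(5.0, 9.0)
print(f"times from 1 to 13: {60 * wide:.2f} jobs per hour")
print(f"times from 5 to  9: {60 * narrow:.2f} jobs per hour")
```

The size of the gap between the two figures, relative to what matters in the study, indicates how much effort the variability estimate for this station deserves.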

Example 3: A similar approach can be taken to the shape of the distribution. Make runs with two or three distributions with the same mean and standard deviation of processing time. In the factory example, the differences were very small so it was concluded that, for station 4, data collection would focus on accurately measuring the mean and standard deviation, but the precise shape of the distribution was not needed.
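Runs like these require candidate distributions whose mean and standard deviation agree. The Python sketch below derives matching parameters for three shapes (uniform, symmetric triangular, and gamma) using the standard variance formulas, then checks them by sampling. The choice of shapes, the mean of 7, the standard deviation of 2, and the name matched_samplers are assumptions made for illustration.

```python
import math
import random
import statistics

def matched_samplers(mean, sd, rng):
    """Samplers with different shapes but the same mean and standard deviation."""
    half_uniform = sd * math.sqrt(3.0)       # uniform: variance = half_width**2 / 3
    half_triangular = sd * math.sqrt(6.0)    # symmetric triangular: half_width**2 / 6
    shape = (mean / sd) ** 2                 # gamma: mean = shape * scale
    scale = sd ** 2 / mean                   #        variance = shape * scale**2
    return {
        "uniform": lambda: rng.uniform(mean - half_uniform, mean + half_uniform),
        "triangular": lambda: rng.triangular(mean - half_triangular,
                                             mean + half_triangular),
        "gamma": lambda: rng.gammavariate(shape, scale),
    }

rng = random.Random(2003)
summary = {}
for name, draw in matched_samplers(mean=7.0, sd=2.0, rng=rng).items():
    sample = [draw() for _ in range(20000)]
    summary[name] = (statistics.mean(sample), statistics.stdev(sample))
    print(f"{name:>10}: mean {summary[name][0]:.2f}, sd {summary[name][1]:.2f}")
```

Each sampler can then be substituted into the model in turn. If the results barely move, the shape of the distribution does not need to be measured.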

Collect as little data as you can get away with, from the easiest source available. One low-cost data source is your workforce—verbal estimates from people in the actual system are often accurate enough, as illustrated in Example 1 above. If variability is important, you can also ask for a range of values, or even an estimate of how frequently those extreme values happen. A second source is data that have already been collected for another purpose. Setting out to collect new data should be a last resort.

Use common sense. For example, are there situations where random variation exists in the real system but may safely be ignored in the model? In the production line example, station 6 involved both machine-paced and manual operations. Both of these vary, but the modelers chose to represent the time for the machine-paced operation as a constant because its variability was negligible compared to that of the manual operations.

Understand that the collection of real data may have more to do with giving the impression of reality than with enhancing the utility of the results. As you will see below, collecting and using real data may provide very little information. However, you may need it to convince people that your results are believable.

5.1 Collecting "Real" Data

Sometimes you must collect data. It is difficult to believe how hard data collection can be if you have not had the misfortune of trying to do it. The following describes experiences in a manufacturing context, but the issues are the same in many situations.

You begin with the hope that someone else has already collected the appropriate data. "Processing time represents money. Surely the cost accountants will have the data we need." But immediately you run into a problem, because "distribution" and "variance" have different meanings to accountants. Furthermore their data have been selected, adjusted, aggregated and otherwise massaged to serve purposes of their own. You could easily end up with data that are undeniably "real", but essentially worthless for your purposes. Borrowing "production control" data will prove to be only slightly less frustrating.

So finally, in desperation, you get a stopwatch and go out to collect data of your own (assuming that this is an existing process and not a prospective one). You should eventually be able to accumulate a pile of data that are about as real as data can be. But is the operator you observed representative of all operators? Did work habits change because you were watching? Did the ambient temperature have an effect? Did the operator encounter typical conditions while being measured? In other words, you now have a real sample from which you can collect information for your model, but it may not be typical or useful. Nonetheless, you now have some real data, and naive listeners can sometimes be encouraged to believe that your simulation results are more valid because of their presence.

In general, whenever you are driven to collect "raw" data—either by direct observation or by extraction from existing databases—you will find that it must be "edited" before it can be used. Editing is some combination of selection, filtering, or otherwise "cleaning it up". It is a highly subjective process in which you impose your conception of reasonableness upon what was ostensibly real. "Reality of the data" is not an absolute property; it is a question of degree of editing or abstraction. Once you understand this you tend to be a bit more liberal in your data collection practices.

The alternative to real data is presumably "artificial" data, generated from "standard" distributions. But even these distributions need parameters (e.g. mean and standard deviation) that reflect properties of the real system. Hence the difference between edited real data and artificial data from distributions with realistic parameters is more one of degree than of kind. The proper way to pose the question is "What degree of reality do you need in your data?"

5.2 Sensitivity Analysis to Reduce Data Collection

One useful way to reduce your data collection task is to invert the sense of the problem. For example, suppose your task is to determine whether one or two repairmen are required to service a certain group of machines. The answer depends on the mean-time-between-failures, so the obvious way to proceed is to collect data to estimate that mean, and then run your model to learn the best number of repairmen.

Alternatively, before you collect any data on time-between-failures, use your model to determine the transition value between the one-repairman region and the two-repairman region. That is, using your model, vary the mean-time-between-failures until you find where the answer changes from one repairman to two. Armed with this information, go back and collect some data from the real system, but now your task is to determine whether the mean-time-between-failures is greater than or less than the transition value. A much smaller sample might be sufficient to answer this question than to estimate the mean to some arbitrary degree of precision.
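A search for the transition value might be organized as in the following Python sketch. The model is a bare-bones machine-repair simulation, and every number in it is invented for illustration: ten machines, exponential failure and repair times, a mean repair time of 2 hours, a wage of 20 per repairman-hour, and a downtime cost of 100 per machine-hour. The function names are likewise illustrative.

```python
import heapq
import random

FAIL, FIXED = 0, 1

def average_down(mtbf, repairmen, n_machines=10, mean_repair=2.0,
                 horizon=20000.0, seed=5):
    """Time-average number of machines that are waiting for or under repair."""
    rng = random.Random(seed)
    events = [(rng.expovariate(1.0 / mtbf), FAIL) for _ in range(n_machines)]
    heapq.heapify(events)
    clock = 0.0
    down = 0          # machines waiting for or undergoing repair
    busy = 0          # repairmen currently working
    down_time = 0.0   # accumulated machine-hours of downtime
    while True:
        time, kind = heapq.heappop(events)
        if time >= horizon:
            down_time += down * (horizon - clock)
            break
        down_time += down * (time - clock)
        clock = time
        if kind == FAIL:
            down += 1
        else:
            down -= 1
            busy -= 1
            # The repaired machine runs until its next failure.
            heapq.heappush(events, (clock + rng.expovariate(1.0 / mtbf), FAIL))
        if busy < repairmen and down > busy:   # idle repairman, waiting machine
            busy += 1
            heapq.heappush(events, (clock + rng.expovariate(1.0 / mean_repair), FIXED))
    return down_time / horizon

def hourly_cost(mtbf, repairmen, wage=20.0, downtime_cost=100.0):
    return wage * repairmen + downtime_cost * average_down(mtbf, repairmen)

def best_crew(mtbf):
    return min((1, 2), key=lambda crew: hourly_cost(mtbf, crew))

for mtbf in (20, 40, 60, 80, 100, 120):
    print(f"MTBF {mtbf:3d} h: cheapest crew size = {best_crew(mtbf)}")
```

The transition value lies between the last grid point at which two repairmen are cheaper and the first at which one is. A finer grid around that interval narrows it further, and data collection then only has to place the real mean on one side or the other.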

And there is more good news about using the model in this manner. Typically, when a parameter is near a break-even point, there is very little loss if you make the wrong decision. That is just common sense. Suppose, in the above example, that the transition value for mean-time-between-failures was 37.7 hours. Above that value (failures are less frequent), one repairman is optimal, and below it two are, but near that value both solutions give approximately the same cost performance. The one-repairman solution will have a lower personnel cost but a higher cost of lost production because of machines that are waiting for repair. The opposite will be true for the two-repairman solution, but if the time-between-failures is near the transition value, the net cost will be nearly the same for both solutions.

This situation is important to recognize because it gives more freedom to the decision maker—if the two choices cost about the same, then one can use some other criterion to choose between them. For example, you might choose the two-repairman solution to provide better system reliability—they can cover for each other. Or, you might choose the one-repairman solution to avoid the need for duplicate sets of tools. The point is that the model can help you establish whether or not the system is near a break-even point, and how sensitive the system is to the choices you are about to make.

5.3 Beware of the Exponential Distribution

Finally, be wary of using the exponential distribution. It is often used for theoretical work because it makes the math easier, but real situations where it applies are very limited because of the very high degree of randomness that it imposes. Typically used to describe time between events, the exponential distribution is highly skewed, giving more weight to very long times than is typically found in practice. For this distribution the standard deviation cannot be lowered—it always equals the mean, which is a much higher level of variation than most distributions you will encounter in practice. And because reducing variation often is a primary target for improving system performance, the exponential distribution is often not well-suited for modeling.
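Both properties can be checked numerically. The Python sketch below draws exponential samples with three arbitrary means, reports the ratio of standard deviation to mean, and counts how often a value exceeds three times the mean. For comparison it also computes the same ratio for a uniform distribution on 5 to 9. The sample size and the means are illustrative choices.

```python
import math
import random
import statistics

rng = random.Random(42)
results = []
for mean in (2.0, 7.0, 30.0):
    sample = [rng.expovariate(1.0 / mean) for _ in range(20000)]
    cv = statistics.stdev(sample) / statistics.mean(sample)
    long_share = sum(x > 3 * mean for x in sample) / len(sample)
    results.append((mean, cv, long_share))
    print(f"mean {mean:4.1f}: sd/mean = {cv:.2f}, "
          f"share above 3 x mean = {long_share:.3f}")

# A uniform distribution on 5 to 9 has standard deviation (9 - 5) / sqrt(12).
uniform_cv = (9 - 5) / math.sqrt(12) / 7
print(f"uniform on 5 to 9: sd/mean = {uniform_cv:.2f}")
```

Whatever mean is chosen, the exponential ratio stays near 1 and about 5% of the values (the theoretical share is exp(-3)) exceed three times the mean, whereas the uniform example has a ratio of about 0.16.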

True, it is a fundamental distribution in nature—if you are studying traffic accidents or other emergencies, equipment failures, or time between telephone calls, it is reasonable to assume that times are exponentially distributed. However, in many situations we pay people to make sure that times are not exponentially distributed. Randomness of patient arrivals is reduced by imposing a scheduling system. The same is true for arrival of airplanes, taxis and raw materials. Workers reduce randomness by speeding up and slowing down in response to variations in the amount of work, or sometimes by helping one another. These active efforts to reduce variation guarantee that the exponential distribution is the wrong one to use.

6. Analysis of Results

The funniest part of many simulation studies is the solemn announcement that the result is "statistically significant at the 5% level". There are a few situations where such reassurance is appropriate, but more situations where it is pretentious and confusing. For example, suppose you have conducted a careful comparison of designs X and Y, and have used extremely long simulation runs to establish that X is marginally better than Y. One way (common to academics) to report these results is "X is significantly better than Y at the 5% level." However, while this statement conveys that we have discovered a "real" difference, it does not convey whether or not that difference is of any practical importance.

A more practical conclusion might be "X and Y have nearly identical cost performance under a wide variety of common situations, and X is better only under conditions that are unlikely to be encountered." The point is to differentiate between "statistical significance" and "practical significance". Both are important. Statistical significance means "You can believe it" and practical significance means "It matters."

The purpose of establishing statistical significance is to assure your readers that you have not been deceived by variability. That is, if someone else did the same experiments, would they get the same results? As an absolute minimum, you must replicate your principal runs—repeat the runs using a different seed in the random number generator. This will give you some idea of the inherent variability of your measurement process. Unless the difference between alternative X and alternative Y is large compared to the variation when alternative X is run with different random numbers, you probably haven’t learned anything worth reporting.
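A replication check of this kind can be sketched in a few lines. The function simulate_mean_wait below is a hypothetical stand-in for one run of a model, and the two mean service times standing for alternatives X and Y, the run length, and the ten seeds are all arbitrary choices made for illustration.

```python
import random
import statistics


def simulate_mean_wait(mean_service, seed, n_customers=20_000):
    """Hypothetical stand-in for one simulation run of a single-server queue.

    Returns the average wait in queue for the given seed.
    """
    rng = random.Random(seed)
    wait = 0.0
    total = 0.0
    for _ in range(n_customers):
        total += wait
        service = rng.gammavariate(4.0, mean_service / 4.0)
        interarrival = rng.gammavariate(2.0, 0.5)  # mean 1.0 between arrivals
        wait = max(0.0, wait + service - interarrival)
    return total / n_customers


def replicate(mean_service, seeds):
    """Repeat the run once per seed and return the list of results."""
    return [simulate_mean_wait(mean_service, seed) for seed in seeds]


seeds = list(range(1, 11))        # ten replications; the same seeds for both
runs_x = replicate(0.80, seeds)   # alternative X (hypothetical)
runs_y = replicate(0.85, seeds)   # alternative Y (hypothetical)

difference = statistics.mean(runs_y) - statistics.mean(runs_x)
spread_x = statistics.stdev(runs_x)  # variation of X across seeds alone

print(f"mean(Y) - mean(X)             = {difference:.3f}")
print(f"std dev of X across seeds     = {spread_x:.3f}")
print(f"difference / seed-to-seed dev = {difference / spread_x:.1f}")
```

If the last ratio is small, the apparent difference between X and Y is no larger than what a change of random numbers alone produces.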

In general, the details of a statistical hypothesis test have little place in a simulation study. Suppose, for example, that you are comparing two variations, A and B, of some model. Assuming that the structural difference between A and B is non-trivial and worth testing, you know a priori that there is some difference in behavior between the two systems: perhaps minuscule, but nonetheless real. If, therefore, you fail to reject the hypothesis of "no difference" between Ra and Rb, the results obtained from A and B, you are simply reporting that either the experimental conditions were inappropriate or the run lengths were inadequate to detect the difference. On the other hand, if you reject the hypothesis of "no difference", there is either a big difference or you have a marvelously precise measurement tool, and the reader must guess which is the case.
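The dependence of "significance" on run length can be made concrete with a short calculation. This is a sketch under simplifying assumptions that are not from the paper: each design yields n independent observations with a common standard deviation, and a normal approximation applies. Real simulation output is usually autocorrelated, so the numbers are illustrative only, and the difference of 0.01 and standard deviation of 1.0 are hypothetical.

```python
import math


def z_statistic(difference, std_dev, n_per_design):
    """z for the difference of two sample means, each based on
    n_per_design independent observations with the same std_dev."""
    standard_error = std_dev * math.sqrt(2.0 / n_per_design)
    return difference / standard_error


def two_sided_p_value(z):
    """Two-sided p-value under the standard normal distribution."""
    return math.erfc(abs(z) / math.sqrt(2.0))


DIFFERENCE = 0.01  # hypothetical true difference between A and B
STD_DEV = 1.0      # hypothetical standard deviation of one observation

for n in (100, 10_000, 1_000_000):
    z = z_statistic(DIFFERENCE, STD_DEV, n)
    print(f"n = {n:>9,}: z = {z:6.2f}, p = {two_sided_p_value(z):.3g}")
```

The true difference is the same on every line. Only the run length changes, and with it the verdict of the test, which is why a bare statement of significance leaves the reader guessing.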

You may avoid this embarrassment by making sure that your runs are of sufficient duration to assure that any differences large enough to be of practical interest are also, without question, statistically significant. Here are some practical observations:

  • If you need a formal test of statistical significance to establish the credibility of a simulation result, you are doing research and not engineering.
  • The inherent approximation in a simulation model is generally at least as great as the statistical variability, so if you need formal statistics to prove your result, don’t believe it.
  • If you can’t see it with the naked eye and need statistics to believe it, forget it.

7. Concluding Remarks

Conceiving, designing, constructing and using a simulation model is a process that mixes art and science. Judgment is an essential element, and its skillful application can make or break a project. Effective use of judgment should be a central element in learning how to use simulation. But the views expressed in this paper often conflict with the usual ways of teaching and practicing simulation. Our theme is that the overriding goal of simulation modeling is to obtain insight into how the underlying process works. This paper illustrates how to apply that theme to the various aspects of modeling.

Simple models are more valuable than most people realize. Keeping the model as simple as possible not only keeps the time and expense under control but also increases the likelihood of useful, understandable results. Sometimes called "Quick-and-Dirty Analysis," this is one of the most important guidelines for using simulation in decision making. It has been made easier and more widely available by the advent of simulation packages with animation and other graphical tools.

When more detail is required, be as frugal as possible. Refining a model to make precise predictions is expensive and there is no guarantee of success. As detail increases, complexity explodes. This increases the time and expense of model development and data collection. Perhaps even worse is the fact that results are more difficult to understand because there are too many variables. As a consequence, many expensive simulation studies are valuable only in the most basic of their results, and most of the report gathers dust.

A model can be used in its own design. At every step of a project one can learn more about the process being modeled, and use that knowledge to guide the next step. In many cases this approach will bring a project to an early and successful end by identifying the most important changes and revealing why they should work. In a few situations, continual refinement leads to a large and complex model before useful results are obtained. However, even in these rare cases, useful discoveries inevitably arise during model development, and they often have immediate impact on operations.

Using "real data" does not really validate a model. It can be expensive, frustrating and misleading. "Real data" is seldom real, but rather it is edited and adapted to the situation. Furthermore, the "validated" model has usually been modified to fit that data, raising the question of whether it remains suited to predicting what will happen in different situations.

A good model can predict the direction of a proposed change, and distinguish between large and small effects. That is the most important information that a decision-maker needs. The "Quick-and-Dirty" simulation packages reduce the effort required to construct a model and often help to produce low-cost results that are easy to understand. However, these tools cannot make a comparable reduction in the skill required to design an effective model or conduct an effective study. It is hoped that the ideas presented in this paper will point out some of the pitfalls as well as useful tricks of the trade, so that simulation tools may reach their full potential in the hands of the end-user.


Swain, James J. (2001), "Power Tools for Visualization and Decision-Making," OR/MS Today, Vol. 28, No. 1.

Tocher, K. D. (1963), The Art of Simulation, English Universities Press, London.

To reference this paper, please use:
Conway, R. W. and J. O. McClain (2003), "The Conduct of an Effective Simulation Study," INFORMS Transactions on Education, Vol. 3, No. 3, http://ite.pubs.informs.org/Vol3No3/ConwayMcClain/


Suggested Readings

Conway, R. (1963), "Some Tactical Problems in Digital Simulation," Management Science, Vol. 10, pp. 47-61.

Conway, R., W. L. Maxwell, J. O. McClain and S. L. Worona (1990), User’s Guide to XCELL+ Factory Modeling System, The Scientific Press, South San Francisco.

Law, A. and W. D. Kelton (2000), Simulation Modeling and Analysis, McGraw-Hill Book Co., New York.

Pidd, M. (1999), "Just Modeling Through: A Rough Guide to Modeling in OR," Interfaces, Vol. 29, No. 2, pp. 118-132.

Pidd, M. (1998), Computer Simulation in Management Science, John Wiley & Sons, Chichester.

Pritsker, A. (1995), Introduction to Simulation and SLAM II, John Wiley and Sons, New York.

Powell, S. G. (1995), "The Teacher’s Forum: Six Key Modeling Heuristics," Interfaces, Vol. 25, No. 4, pp. 114-125.

Rivett, B. H. P. (1994), The Craft of Decision Modelling, John Wiley and Sons Ltd, Chichester, England.

Robinson, S. L. (1994), Successful Simulation: A Practical Approach to Simulation Projects, McGraw-Hill, Maidenhead.

Schruben, L. W. (1995), Graphical Simulation Modeling and Analysis Using Sigma for Windows, Boyd and Fraser Publishing Co., Danvers, MA.

Shannon, R. E. (1995), Systems Simulation, The Art and Science, Prentice-Hall, Englewood Cliffs, NJ.