Abstract
This paper has two purposes: to help explain dynamic programs (DP) to students who have no background in procedural languages, and to quantitatively motivate good spreadsheet designs that can be developed and modified easily. Teachers and practitioners could benefit from using spreadsheets to solve the common problems that are suited to DP. We give examples of several important DPs, implemented in different ways with a spreadsheet.
We further analyze these DPs for their constructive complexity, which is the number of keystrokes required to write a spreadsheet for a given computational task, as a function of the input data. We show that a given DP can be written in several ways, with varying constructive complexity. The implementation can drastically affect the difficulty of writing and modifying a spreadsheet. The different examples demonstrate the value of good spreadsheet design.
Students in management science learn operations research techniques, such as linear programming, PERT/CPM methods, and dynamic programming (DP). Teaching is based on specific softwares that are mainly used in the academic world. Specialists in linear programming can use the Lindo package; others dealing with project management can use Microsoft Project and so on. But in contrast to linear programming and project management, no generic tools are available to solve DPs. So DP is often viewed as an algorithmic approach to combinatorial problems, rather than a tool. For these reasons, DP is not extensively studied by students learning management science, even though the underlying principles are foundations of optimization theory.
Spreadsheets are widely accepted in the business world and are not dedicated to specific techniques. Spreadsheets can be extended easily with add-ins. The spreadsheet is also accepted as a good modeling tool, as suggested by numerous textbooks. While spreadsheets are not particularly suitable for solving large dynamic programs, especially for hard problems with an exponential number of states, spreadsheets can be helpful in teaching DP to students. As we shall see, complex problems can be implemented in spreadsheets using just a few formulas. In this paper, we show that spreadsheets can reduce the gap between theoretical foundations of DP techniques and practical applications needed in any courses.
We identify the likely teaching environment for this material to be the lecturer in operations research or quantitative methods, with students in business who do not have a background in procedural languages. Engineering students may appreciate viewing the recursion graphically, but are likely to have done some programming, and therefore will not benefit as much. Anyone, though, can use our spreadsheets as templates. The lecturer can put Excel's handy sensitivity analysis tools to good use. So this paper is supplementary teaching material to the lecturer who wishes to give students a sense of the recursive nature of DPs, without resorting to a procedural language.
With our focus on the graphical layout, we do not use Visual Basic (VB). We want to test the ability of the spreadsheet to solve these models, and using VB would in fact be using a procedural language. VB, however, can be used to good effect, and writing a DP in VB is easy for someone with skill in programming. As an aside, we implemented a simple Gilmore-Gomory algorithm as a macro in Excel (Knapsack by VB.xls ). Surprisingly, the VB code took about 50 times longer than our simplest pure spreadsheet version. The code is in the Appendix.
Other people have used VB to good use with Excel. In fact, preceding these, Ho (1987) gave Lotus 1-2-3 macros to solve some classic DP problems (and LP problems). Jensen (2002) has a nice DP tutorial with Excel using VB heavily, along with many other excellent materials. Sarmiento (2003) uses VB templates elegantly to solve DPs.
In contrast to the VB approach, Neuwirth (1995) uses the spreadsheet to get insight into recursive relations for combinatorial formulas, and as a graphical tool to represent combinatorial relations between objects. Our paper is in the same spirit - we hope to get insight into the underlying algorithms, and how they work, using the graphical layout of the spreadsheet. We intend to use the graphical environment apart from procedural code, and we want to demonstrate good spreadsheet design, so VB is a digression. By exposing the DP in Excel, the student can visually see the number of cells required, and get an idea of the computational complexity and its recursive nature. In contrast, a tool like Solver hides the complexity intentionally, using the worksheet only to hold the input and output. So students do not always appreciate what is going on behind the dialog boxes. Furthermore, our analysis of constructive complexity has application beyond dynamic programs, but would not apply in the same way if we were to use VB.
We briefly survey a sample of textbooks with material on DP. These books may be divided into three sets, spanning a range from theoretical to practical.
The first set, at one end of the range, has the textbooks that are primarily about DP. These vary in level, with some appropriate for undergraduate applied courses, and others better suited for graduates. Except where noted, these books are theoretical, and lack any hints about how these DPs would be implemented. Beautifully written and typeset, Bather (2000) has plenty of simple numerical examples, and exercises (with partial answers in the back). The text is more theoretical in the latter half, and is aimed more at math students, with little about implementation. Beckmann (1968) ties DP clearly to LP. The text would be suitable for undergraduate math or economic, though it lacks exercises. Bellman & Dreyfus (1962) remains one of the most beautiful texts in spite of its age. Flowcharts show the logic. The text has lots of numeric examples. While it has no exercises, it is clearly intended for practical use, with sections titled, "Description of the Computational Procedure." The book frequently touches on issues of computer implementation. Cooper & Cooper (1981) is quite mathematical, and would be appropriate for graduate math or O.R. students. Dreyfus & Law (1977) give plenty of exercises and bless the reader with their answers in the back of the book. They give very little about how to actually implement DPs. The words "computer" and "memory" do not appear in the index. Nevertheless, the text is clear with nice direct language.
Gluss (1972) discusses computation time in some detail and shows a flow chart. The text is a bit heavy on equations, light on diagrams, and very light on applications. Hadley (1970) reads like the older classic texts, and discusses computing issues to some extent. His comments on DP are relevant to the problems of putting DPs in spreadsheets. "The reader ... will note that no two problems yielded precisely the same form of recurrence relations. This is a characteristic of dynamic programming. The recurrence relations obtained to solve the problem are strongly dependent on the particular structure of the problem being solved. For this reason, for example, it is not possible to develop a general computer code for solving all dynamic programming problems, as we were able to do for linear problems." Nemhauser (1966), in this reviewer's mind, is one of a few with a firm grip on the phrase "classic DP textbook". Though almost 40 years old, the writing is clear and direct. The flow charts are the closest thing to reading a computer program, but without the difficulties of the syntax. Puterman (1978), not a text, is a broad collection of useful papers, including applications. The print suffers a bit from being typewritten rather than typeset. Sniedovich (1992) addresses head-on the sometimes confusing question, "What is dynamic programming?" He uses nice direct language, but the text lacks exercises. The book would serve as a useful reference, especially for a theoretical course in DP. Bertsekas (2001) wrote two beautifully written comprehensive volumes, one on finite horizons, and the second for infinite horizons. The texts are appropriate for graduate students. Denardo (2003), the most recent of the texts, is shorter than the Bersekas work, and would be more accessible to undergraduates.
The second set of books with material on DP is the general OR textbooks. Wagner (1975), while an older text now, is beautifully written. The exercises are "pencil pushing," in Wagner's words, but he points out that these problems should be solved by a computer. He covers DP at some length, over several chapters. The material is still appropriate for undergraduate work, and highly approachable, it is just 30 years old. Winston (2003) has almost 50 pages on deterministic DP, the most thorough of the modern texts seen. He shows network diagrams and table calculations. His final section briefly covers DP with Excel. We use his knapsack example here in this paper. Winston has a separate chapter on probabilistic DP. Hillier, Hillier, and Lieberman (2000) have less material than Winston, and nothing on DP in spreadsheets, but still good basic coverage. Lawrence & Pasternack (1998) put a small section on DP in the companion CD-ROM, rather than in the printed text, but include specialized DP software. In general, the OR textbooks have good basic sections on DP. They cover the classic models in clear direct fashion. It still seems, however, that we have not moved much beyond Wagner's "pencil pushing" for DP, except for smatterings of spreadsheets and the occasional special bit of software.
The third set of texts with material about DP is the computer science texts. We found these to be quite appealing, in some ways, because the texts go right to the heart of the matter. How, exactly, do we get a computer to solve these problems? We mention two texts here. Weiss (1993) has only a short section on DP, and misses the applications that would be considered standard in O.R. However, he discusses the critical issue of using a table rather than recursion, and shows code in C. He goes into some detail about the number of operations required for a given implementation. Sedgewick (1988) similarly has a modest section on DP, discusses implementation issues in detail, giving Pascal-like pseudocode to show the algorithms. We expect almost any basic text on algorithms will have something on DP, and will emphasise implementation over DP theory. So the DP literature has quite a range of material, from theoretical without touching on implementation, to practical with little DP theory.
For the typical O.R. undergraduates, the textbooks do seem to have good discussions, and could use more only in implementation. Given the difficulties with creating a general program for DP, as Hadley pointed out, this is understandable. Hence, for implementation, we must look to the computer science textbooks − or spreadsheets.
The next sections present well-known DPs and several spreadsheet implementations leading to different constructive complexities and different pedagogical presentations. The spreadsheets are linked to the images in each section. The sections are as follows:
- constructive complexity, illustrated with the well-known Critical Path Method,
- general suggestions about designing a spreadsheet DP, illustrated with the classic knapsack problem,
- the batch scheduling problem - minimizing makespan for a single batching machine,
- minimizing the weighted number of late jobs on a single machine,
- minimizing the makespan on parallel unrelated machines.
A point of notation: the top of each spreadsheet shows the word "Formula" in a cell with a gray background. This indicates that every cell with a formula has that light gray background format. All other cells are constant.
Operations researchers evaluate the efficiency of an algorithm by its complexity using the big O() notation. This efficiency is the number of operations required to solve a problem as a function of the input, without considering technical details about computer technologies. Let I be the input to the algorithm and f(I) be the time to run the algorithm with input I. Then an algorithm is:
- a polynomial time algorithm if f(I) is a polynomial in the size of the input;
- a pseudo-polynomial time algorithm if f(I) depends on the size of the input (for instance the maximum value of input data);
- an exponential algorithm if f(I) is an exponentional function in the size of the input.
Most of Excel's built-in functions solve problems that can be calculated in polynomial time. Thus, the number of cells required to implement a DP in a spreadsheet leads directly to its computational complexity. Due to their computational complexity, even polynomial DPs can take up a lot of space in a spreadsheet. In addition to computational complexity, the spreadsheet writer should be concerned with finding a good design for the spreadsheet, to minimize the cost of writing and revising the spreadsheet if the problem parameters change. Similarly, lecturers should teach good spreadsheet design. These design issues have little to do with computational complexity. That is why we provide a new metric to qualify spreadsheets.
A spreadsheet that requires a few keystrokes to create and modify is generally considered a better design than a spreadsheet that solves the same problem, but that requires more keystrokes to create and modify. We therefore introduce the concept of constructive complexity. Constructive complexity is the number of keystrokes required to implement a given computational problem in a spreadsheet, as a function of the input data (i.e., the instance). The measure of constructive complexity will be based as for computational complexity on the "Big O" notation, to capture the asymptotic behavior of a function. To prove constructive complexity, a necessary condition is to give the keystrokes to accomplish the given task. Thus, constructive complexity quantifies some aspects of good spreadsheet design. Constructive complexity for a given task depends on the computer-user interface, the spreadsheet's features, and the writer's cleverness.
The referees have made several suggestions to improve our spreadsheets. Others may be able to improve these further. We welcome such suggestions. The point is that constructive complexity helps frame which design is better, and why. The lecturer should make this point to students, and can motivate this notion with our examples. A reasonable person may be able to conceive of extreme cases where an O(1) spreadsheet is harder to write than a spreadsheet of higher complexity. But the point is to get students to think about it.
For data entry, we can usually enter data in the format we like in the fewest keystrokes, then use formulas to convert the data to the format or layout that the spreadsheet requires. If we can create this conversion in O(1) keystrokes, then the data entry is effectively sunk. If we cannot do the conversion in O(1) keystrokes, then data entry affects the constructive complexity of the spreadsheet. A spreadsheet writer could require that data be entered more than once, whether from poor design or perhaps for data verification. So data entry may not always require a fixed number of keystrokes. Except for such strange cases, data entry will almost always require the same order of magnitude of input operations on any input device. We therefore will treat data entry as a sunk cost. We also note that a rectangular selection and fill are O(1) operations.
Seal (2001) gave a spreadsheet implementation for CPM and PERT. Ragsdale (2003) greatly improves on Seal's already good work. With CPM, we use recursion to find the longest path through a network of nodes and arcs. In this section, we briefly analyze Ragsdale's two spreadsheets for their constructive complexity.
Figure 1a shows Ragsdale's "Standard" spreadsheet example for CPM. We have modified it only in the formatting and some of the text. A glance at the formulas in Figure 1b for EST and LFT shows that these formulas are not consistent. Each one must be edited by hand, a cell for each node, with a reference for each predecessor. As Ragsdale points out, "numerous manual changes to formulas are required if project activities are added or deleted, or if predecessor relations among the activities are altered". Constructive complexity quantifies the difficulty. The constructive complexity of this spreadsheet is O(n2), where n is the number of activities, because each of the O(n2) arcs requires an edit by hand. Addition of a new node requires insertion of a row, and then filling in the row of formulas. Each EST and LFT formula cannot be copied, and must be written by hand in O(n) time, meaning that each formula could require an edit for all n nodes. If we think of the spreadsheet writer as the processor, the number of arcs and nodes as the input data, and writing the spreadsheet as an algorithm, the spreadsheet-writing algorithm would run in O(n2) time.
_CPM_spreadsheet_sm.png)
Figure 1a. Ragsdale's "Standard" spreadsheet example for CPM. Here and throughout this paper, formulas are gray, like the cell with the label "Formula."
Enlarge
Click here to download
_CPM_spreadsheet_formulas_sm.png)
Figure 1b. Formulas displayed for Ragsdale's O(n2) CPM spreadsheet in Figure 1a.
Enlarge
Figure 1c below shows Ragsdale's "New" CPM spreadsheet (which also has a nice Gantt chart which we do not show). Again, we have modified it only in the formatting and some text. This spreadsheet can be written and modified in O(1) time, meaning that the time required to make the spreadsheet is constant, regardless of the size of the model. Formulas do not need updating if the predecessors change, and addition or deletion of an activity requires only addition or deletion of the corresponding row. Thus, an analysis of the constructive complexity quantifies how much easier the spreadsheet is to write.
Interestingly, this version uses circular references. For example, cell G6 depends on H6, while H6 depends on G6. Even so, the spreadsheet is correct. The latest start time for a task depends on the latest finish for previous tasks. The ISERR() function returns TRUE if its argument is any error (except #N/A), else FALSE. The FIND(c, string) returns an error if character c is not in string. Note that Excel formulas inside curly brackets {} are array formulas, to be entered with Ctrl-Shift-Enter. Even if the student does not understand the details of these formulas, the overall effect of the stronger design should be obvious.
_CPM_spreadsheet_sm.png)
Enlarge
Click here to download
Key Cell Formulas
| Cell |
Formula |
Copied to |
| E6 |
{=MAX(IF(ISERR(FIND($A$6:$A$15,C6)),0,$F$6:$F$15))} |
E7:E15 |
| F6 |
=E6+D6 |
F7:F15 |
| G6 |
=H6-D6 |
G7:G15 |
| H6 |
{=MIN(IF(ISERR(FIND(A6,$C$6:$C$15)),MAX($F$6:$F$15),$G$6:$G$15))} |
H7:H15 |
| I6 |
=G6-E6 |
I7:I15 |
Figure 1c. Ragsdale's O(1) CPM spreadsheet.
In the next section, we discuss general principles of design for a spreadsheet DP, with constructive complexity in mind.
The basic principle of dynamic programming is that an optimal policy for a general optimization problem also leads to an optimal policy for every subproblem that belongs to it. This allows a complex problem to be decomposed into stages. At each stage, a decision is taken and its return is evaluated. The result is then used by subsequent stages. All stages (and decisions) are then dependent and can be graphically represented as a decision tree. Each stage takes as input the values of the state variables, which leads to new values of these variables as output (i.e., after the decision making process). These computations allow calculation of the optimal value of the objective function, but not the optimal policy. To obtain the optimal policy, a primal retrieval must be designed to establish which decision at a given stage is associated with the optimal value of the objective function. Thus, solving a dynamic program is a sequential process, even for the primal retrieval. The DP can be visualized as a graph. Each stage and state combination is a node, and each decision is an arc from one node to another node.
To see how this network structure works, consider the general integer knapsack problem, a classic example of DP recursion. For now, we will omit the primal retrieval to focus on the mechanics of the recursion.
Max 16 x1+ 22 x 2+ 12 x3 + 8 x4,
5 x 1+ 7 x2 + 4 x3 + 3 x4 £ 14,
x1, x2, x3, x4 non-negative and integer.
Let n be the number of items, four in this case, and b be the capacity of the knapsack, 14 in this case. The dynamic programming network is shown in Figure 2. Arcs starting from every node define a decision. The state variable is the available knapsack capacity as a node and corresponding profits are represented as node labels.

Figure 2. A DP network for the general integer knapsack problem.
Enlarge
Textbook DP examples usually show one table per stage. Our spreadsheets will use only one or two tables for enumerating all stages and values of state variables. Generally, a spreadsheet DP can have two alternate layouts:
- a one-dimensional table, where each arc is listed in a column or row, and stages are in a separate table,
- a two-dimensional table, where each arc is a cell in the table, with stages listed in a column at the right, or a row at the bottom.
The layout can be simplified if a dimension can be embedded within an Excel formula, as we shall do to reduce the constructive complexity. We shall see that spreadsheet DPs are sometimes conveniently written with matrix transposition, circular references, database lookups, and array functions, all of which are advanced tools. These functions may intimidate students, but can serve to motivate the use of such functions. Certainly the advanced functions do not need to be presented first, and the spreadsheet may be easier to develop if it starts simple, and is then improved.
As a general approach for implementing DP in a spreadsheet, we recommend first finding a very simple spreadsheet for a small example. The spreadsheet formula dependencies represent dependencies among nodes in a DP network. We recommend that the spreadsheet be simplified to the fewest number of cells reasonably possible. Following this, the spreadsheet writer can look for alternate formulas that decrease the constructive complexity.
- First, define the DP network of the dynamic program.
- Second, implement dependencies among nodes in the DP network as dependent formulas in a spreadsheet.
- Third, simplify the spreadsheet as much as possible.
- Last, refine the model to decrease its constructive complexity.
This four step method exploits the graphical layout of DPs as dependencies among cells in a spreadsheet. Recursion formulas are not difficult to implement. We found that the most difficult part was design of the primal retrieval. Many times, we used complicated formulas (such as arrays and lookups) only to improve the constructive complexity relative to a simple version.
We continue our knapsack problem example. Based on the DP network, a first very simplified knapsack implementation is in Figure 3a. The object sizes are given in the Capacity and Profit table in A6:B20. For example, an object of size 4 has profit of $12, as in cell B10.
Notice the relationship between this spreadsheet and the network diagram in Figure 2. Cells in the table correspond to arcs. The "Max Profit" in row 22 corresponds to the objective value at each stage. The implementation is a two-dimensional table, where stages are in columns and arcs are enumerated in the table's interior cells. To emphasize the network structure, consider Figure 3c, which shows the spreadsheet precedence structure as a graph (omitting the dependencies on constant data in column B). The cell precedence structure is exactly the DP network diagram. This network diagram cannot be seen with a procedural language.
Unfortunately, this implementation is quite annoying to write, as every formula in the table is different. Figure 3b shows that the first term of each formula in the table would be copied diagonally, while the second term would be copied across. The users is therefore forced to write each formula by hand.
_knapsack_sm.png)
Figure 3a, O(nb) knapsack spreadsheet.
Enlarge
Click here to download
_knapsack_formulas_sm.png)
Figure 3b, O(nb) knapsack spreadsheet formulas. Note that some terms recur diagonally, while other terms recur across. This spreadsheet is very difficult to write and modify.
Enlarge

Figure 3c, Graph of the cell precedences for the O(nb) knapsack spreadsheet formulas (omitting the dependencies on constant data in column B). The cell precedence structure is exactly the DP network diagram.
Enlarge
So we have a simple spreadsheet, but one that is bothersome to write and modify, due to its high constructive complexity. The difficult bit is to find a formula that will look up the object profits for the diagonal, while transposing the look up from the "Max profit" in row 22. Our spreadsheet in Figure 3d will do just that, with VLOOKUP() and TRANSPOSE().
_knapsack_sm.png)
Enlarge
Click here to download
Key Cell Formulas
| Cell |
Formula |
Copied to |
| D6 |
=VLOOKUP(D$5-$A6,$A$6:$B$20,2)+$S6 |
E6:Q6, F7:Q7, G8:Q8, … Q19:Q19 |
| S6:S20 |
{=TRANSPOSE(C22:Q22)} |
Single array formula. |
| D22 |
=MAX(D6:D20) |
E22:Q22 |
Figure 3d. A spreadsheet of O( b) constructive complexity, to solve the general integer knapsack problem.
This knapsack implementation has three distinct formulas: first, the formulas in the table D6:Q19; second, the MAX() formulas in row 22; and third, row 22 transposed with the formula TRANSPOSE(C22:Q22), entered as an array into S6:S20. This version of the knapsack algorithm requires transposition.
So how does this spreadsheet work? It lists all possible arcs in the dynamic program. For example, consider the selected cell, I10, in Figure 3d. The row label for cell I10 is 4, meaning that we have already packed 4 units. That is, we have recursively filled up to 4 optimally. The column label for cell I10 is 6, meaning that we are considering all options to fill an additional 2 units. What can we add? Nothing, since we have no object of size 2 to add. This could be shown in the network as an arc from node 4 to node 6, but since we have no object of size 2, we have not drawn an arc. However, this spreadsheet formulation still requires the calculation for an arc from node 4 to node 6. Thus, we already have an indication that this is not the best possible formulation. The largest of the values into node 6 (column I) is $16, which is the greatest profit we can get with a capacity of 6. This comes from one object of size 5 or two objects of size 3. This value of $16 is noted in row 22 in cell I22, then transposed to column S in cell S10, to be used in later calculations.
Note that the cells in row 6 for Capacity of 0 match the network arcs coming out of node 0. We have omitted some arcs, since they are unnecessary, as we shall see with better formulations. Each arc in the network corresponds to a cell in D6:Q19, but the network would be unnecessarily crowded if we put in all possible (b2 − b)/2 arcs.
In Figure 3d, we must copy formulas carefully for each level of capacity. If the formulas are copied to the entire rectangle D6:Q20, we have a circular reference. For example, D7 would flow to D22, which would flow to S7, which would flow back to D7. Because we must do a copy operation for each level of capacity, 1,..., b, this implementation is therefore of constructive complexity O(b), quite different from the computational complexity of O(b2).
However, we can reduce the constructive complexity to O(1), by simply adding an IF() statement to the arc formulas, as shown in Figure 4, which also shows the primal retrieval and optimal solution in rows 23 and 24. We can copy a single formula to the entire table, C6:Q20.
So how do we find the optimal solution (Figure 4)? Row 22 shows the maximum profit possible at each stage, based on the incoming arcs in cells C6:Q20. But these maxima may not be optimal. For example, cell P22 shows a maximum profit of $40, which could be had with five items of value $8, but $40 is not the optimal profit, because we can use the remaining capacity to obtain a profit of $44. Row 23 determines the optimal profits for each stage. The formula in cell D23 uses SUMIF() to return the profit of the optimal item. This was suggested by a referee and improves on an earlier version. Row 24 shows the item values that we should pack: we can optimally pack four items with value $8 and one item with value $12, for a total profit of $44.
_knapsack_sm.png)
Enlarge
Click here to download
Key Cell Formulas
| Cell |
Formula |
Copied to |
| C6 |
=IF(C$5<=$A6,0,VLOOKUP(C$5-$A6,$A$6:$B$20,2)+$S6) |
C6:Q20 |
| S6:S20 |
{=TRANSPOSE(C22:Q22)} |
Single array formula. |
| D22 |
=MAX(D6:D20) |
E22:Q22 |
| D23 |
=E23-SUMIF($B$6:$B$20,E23-C$22,$B$6:$B$20) |
E23:Q23 |
| R23 |
=Q22 |
- |
| D24 |
=E23-D23 |
E24:Q24 |
Figure 4. A spreadsheet of O(1) constructive complexity, to solve the general integer knapsack problem.
Our next implementation can be found in Winston (2003). The primal retrieval is original. This implementation avoids the need for transposition. In fact, the fundamental algorithm is quite different from the previous versions, being the following O(nb) recursion of Gilmore & Gomory (1966).
For the knapsack problem, max cx, ax ≤b, where a is the vector of sizes, x is the vector of decision variables and b is the scalar capacity, the Gilmore & Gomory recursion is:
for i = 0,…, b, f(i) = maxj=1,…,n, i ≥ aj {cj + f(i − aj)}.
The algorithm itself contains no transposition, so neither does the graphical spreadsheet implementation in Figure 5.
_Gilmore_Gomory_knapsack_sm.png)
Enlarge
Click here to download
Key Cell Formulas
| Cell |
Formula |
Copied to |
| B11 |
=IF($A11<B$6,0,B$7+INDEX($F$10:$F10,1+$A11-B$6)) |
B12:E24 |
| F11 |
=MAX(B12:E11) |
F12:F24 |
| G11 |
=G12-SUMIF($B$7:$E$7,G12-F10,$B$7:$E$7) |
G12:G24 |
| G25 |
=F24 |
- |
| H11 |
=G12-G11 |
H12:H24 |
Figure 5. A spreadsheet that implements the Gilmore and Gomory knapsack recursion.
So how does this spreadsheet work? It lists arcs in the dynamic program, but fewer than in Figures 3 and 4. Those earlier examples have an arc from each possible capacity to every greater possible capacity. In fact, we need only the arcs associated with each possible capacity and each item. For example, consider the selected cell, D15, in Figure 5. The row label for cell D15 is 5, meaning that we have 5 units of capacity available. The column label for cell D15 is x3, item 3, with size 4 and value $12. Observe that we have recursively already packed the capacity of 4 optimally, with a value of $8, as found in cell F14. The formula in D15 first checks to make sure that item 3 will fit in size 5. It will, so the formula looks up an item of size 1, discovers it has value $0, which means we can still add this item of size 4, and get a value of $12. This value of 12 would be noted in the Max Profit column F, but we actually did better with item 1 of size 5 and value $16.
As before, "Max profit" shows the best way to pack each value of capacity - if that were the only capacity available, while "Optimal profits" shows the true optimal way given the actual total capacity available. The "Optimal items" shows the value of the optimal items to pack, again four items of value $8 and one item of value $12.
Let us analyze the constructive complexity. This implementation requires four distinct formulas. After entering the data in rows 5, 6, and 7, the values 0, …,14 in column A are entered, then the formulas in columns B through E are entered, all O(1) operations. Next, the MAX() formulas are entered into column F, another O(1) operation. Finally, the primal retrieval is added in columns G and H. The spreadsheet has 5 distinct formulas. An instance with b=1,000 and n=200 would also have only 5 distinct formulas. Therefore, this spreadsheet is of O(1) constructive complexity. Within the knapsack capacity, changing the size or profit of items is trivial. To increase the capacity, we simply select the last three rows (excluding the optimal objective) and fill downwards (overwriting the objective). We then add the objective again in column H, pointing to the last cell in column G. To add another item requires inserting a column, perhaps between E and F, and then checking the MAX() formula. Thus, this model can be revised in O(1) time.
The thoughtful reader may observe that the data entry in the previous model, Figure 4, appears to be O(b), while the data entry in this model, Figure 5, is O(n). However, we have assumed that the rectangular fill in Figure 4, A6:B20, could be done in O(1) time. Then, if B6:B20 in Figure 4 were initialized with zeroes, the data entry would be only O(n), which is sunk.
Our next example knapsack, Figure 6, was suggested by an anonymous referee, and is a significant improvement from an earlier draft. The primal retrieval is original. The spreadsheet has just one distinct formula for the recursion, and this formula is reasonably short. Primal retrieval requires two more formulas. The spreadsheet needs only 3b total formulas. As such, it may prove useful to the practitioner. The one distinct formula relies on lookup and array functions. The user enters item profits in column B, for each possible size in column A. Changing the profits or sizes of items up to the capacity is trivial. Compared with the previous example, the dimension of items has been collapsed into an array formula.
The formula in column D uses SUMIF() to test for the optimal item. This was suggested by a referee, and is an improvement over our previous version. The SUMIF() returns the profit of the optimal item. We leave the explanation of the other formulas to the reader, since they are quite simple. The formulas follow directly from the recursion algebra.
_knapsack_sm.png)
Enlarge
Click here to download
Key Cell Formulas
| Cell |
Formula |
Copied to |
| C8 |
{=MAX(LOOKUP(A8-A$7:A7,A$7:A8,B$7:B8)+C$7:C7)} |
C9:C21 |
| D8 |
=D9-SUMIF($B$7:$B$21,D9-C7,$B$7:$B$21) |
D9:D20 |
| D21 |
=C21 |
- |
| E8 |
=D8-D7 |
E9:E21 |
Figure 6. A spreadsheet with only one distinct formula, to solve the general integer knapsack problem.
Modifying this spreadsheet is straightforward, but we need to keep in mind how a dynamic program works. To add capacity, insert rows between rows 20 and 21. Starting with rows 17, 18, and 19, use the fill handle to fill data and formulas downwards. Cell D20 points to the last row, so this needs to be overwritten. Thus, the spreadsheet can be written or revised in O(1) time.
We see that this implementation uses fewer cells than earlier versions, O(b) cells, rather than O(b2) or O(nb) cells. If we wish to try to solve a large knapsack problem in Excel, we are more likely to succeed with the better implementation. We were able to solve a knapsack with capacity 1,000, with random profits, in just a few seconds.
The last knapsack DP implementation lists the arcs as a database. The database lookup function, DMAX() serves in place of the MAX() function used above. In his book on Lotus 1-2-3, LP, and DP, Ho (1987) observes a connection between DPs and databases. "In dynamic programming, you do not actually write down a problem in the process of formulating it. Instead, you identify aspects of the problem which will serve as the key components in a DP model. In this sense, a DP model looks more like a database than a clear-cut optimization problem."
This spreadsheet works very much as the previous ones, with a different layout and different lookup functions. It enumerates all possible arcs in the dynamic program as a one-dimension list. Consider the selected cell, C16, in Figure 7. The formula finds the value $12 for an item of size 4 − 0, then adds this value to the best current objective $0 for a capacity of 0. This value of $12 is recorded in the Max Profit row 8, cell F8, for use in later calculation.

Enlarge
Click here to download
Key Cell Formulas
| Cell |
Formula |
Copied to |
| B8 |
=DMAX($B$12:$C$117,$C$12,B5:B6) |
C8:P8 |
| C9 |
=D9-SUMIF($B$7:$P$7,D9-B8,$B$7:$P$7) |
D9:P9 |
| C10 |
=C9-B9 |
C11:P10 |
| A14 |
=IF(B13=$P$6,A13+1,A13) |
A15:A117 |
| B14 |
=IF(A14=A13,B13+1,A14+1) |
B15:B117 |
| C13 |
=HLOOKUP(-A13+B13,$B$6:$P$7,2)+HLOOKUP(A13,$B$6:$P$8,3) |
C14:C117 |
Figure 7. A O(1) spreadsheet that lists the DP arcs as a database.
This spreadsheet first requires an O(1) operation to create the table in rows 5 to 8. The spreadsheet has four distinct formulas, all of which are O(1) operations, as shown in Figure 7. This spreadsheet therefore has constructive complexity of O(1). The user enters the object profits in row 7 for each possible size in row 6. Like the first knapsack implementation, changing the size or profit of items below the capacity is trivial. The user needs only to change the Profit data in row 7. Adding t more units of capacity requires inserting t columns in the summary table at the top, and then copying nt +t2/2−t/2 rows downwards, both of which are O(1) operations. The range in each database formula in row 8 needs to be corrected, but this could be avoided if rows were inserted at the bottom of the database, and anyway this is also an O(1) operation.
If the node label formulas in A13:B117 are converted to values, then the database can be sorted various ways. The modeler may want to think about dominated arcs, could graph the data in various ways, and could export the arcs to a linear programming solver such as AMPL or Lingo, perhaps with another column for the arc cost. Such a formulation could be used to solve the one-dimensional cutting stock problem using the formulation of Dyckhoff (1981) or Carvalho (1998).
In this problem, a batch processing machine can process a set of jobs simultaneously. The jobs are processed together to form a batch. The processing time of a batch is the maximum of the processing times of jobs belonging to it. We assume that a job cannot be preempted. If the number of jobs that can be processed simultaneously in a batch is greater than or equal to the number of jobs, then the problem can be solved in polynomial time (Lee, 1999) by using DP. More precisely, the problem can be solved in O(n2) time, where n is the number of jobs.
Let ri be the release date of job i and pi be its processing time. If job i is released before job j, and has a smaller processing time than j, then i can be scheduled in the same batch as j without increasing the makespan. Therefore, we need to consider only the case where jobs that come later have smaller processing times, and we can suppose for the implementation that r1≤ r2≤ ... ≤ rn and p1> p2> ...> pn.
The batch scheduling problem follows a standard forward recursion, where the optimal makespan of the k first jobs in the optimal schedule is f(k):
f(0)=0, f(k)=min1 ≤ i ≤ k(max(f(k−i), rk)+pk−i+1) for 1 ≤k ≤ 0.
Figure 8a presents the DP network.
Figure 8a. DP network for scheduling a batching machine.
Applying this refinement method gave a first version based on O(n) constructive complexity. Formulas are simple since only MAX and MIN functions have been used. Cell H6 equals =MAX(H4,C11)+D5. Cell G7 equals =MAX(G4,C11)+D5 and is copied to H7. Cell F8 equals =MAX(F4,C11)+D5, and it is used to fill cells G8:H8, and so on for cells E9, D10. As previously, column f(k) contains the minimum value for the D6:D10, and the fill handle is used through cells E11:H11. The constructive complexity is O(n) since n+1 formulas have been entered. This version is presented in Figure 9, without the primal retrieval which we leave for later.

Click here to download
Key cell formulas
| Cell |
Formula |
Copied to |
| H6 |
=MAX(H4,C11)+D5 |
|
| G7 |
=MAX(G4,C11)+D5 |
H7 |
| F8 |
=MAX(F4,C11)+D5 |
G8:H8 |
| E9 |
=MAX(E4,C11)+D5 |
F9:G9 |
| D10 |
=MAX(D4,C11)+D5 |
E10:G10 |
| D11 |
=MIN(D6:D10) |
E11:G11 |
Figure 8b. Second version of scheduling a batching machine.
To improve the constructive complexity, we give a refined implementation of the spreadsheet DP in Figure 8c. In Figure 8c, cell C10 is copied by using the fill handle in all cells belonging to the range C10:G14. If k−i<0, then the infinity value (set for this example to 1000) is used as an upper bound of the objective function. The formula in C15 calculates the minimum value of C10:C14, and is copied through cells D15:G15. Thus, the constructive complexity is O(1) since only two different formulas have been entered. The second part of the spreadsheet is dedicated to the primal retrieval, implemented in C17:G21. Rows 17 and 18 perform a backward recursion. For instance, the value 4 in F17 indicates that 4 jobs belong to the last batch, and the remaining jobs in a batch are counted backwards in row 18. Row 17 defines an array formula that uses the intermediate computations performed in range C10:G15 of the forward recursion table to detect the number of jobs in a batch:
Cell C17 is defined by {=IF(D18>0,0,MAX(IF(C15=C10:C14,$B$10:$B$14,0)))} and copied upwards from D17 to G17.
Cell C18 is defined by =IF(C17=0,IF(D18=0,0,D18-1),C17-1) and is copied upwards from D18 to G18.
Start and finish times of jobs are calculated in C20:G21, using array formulas.

Enlarge
Click here to download
Key cell formulas
| Cell |
Formula |
Copied to |
| C10 |
=IF(C$7<$B10,$B$6, MAX(INDEX($B$15:$G$15,1,C$7-$B10+1),C$8)
+INDEX($C$9:$G$9,1,C$7-$B10+1)) |
C10:G14 |
| C15 |
=MIN(C10:C14) |
D15:G15 |
| C17 |
{=IF(D18>0,0, MAX(IF(C15=C10:C14,$B$10:$B$14,0)))} |
D17:G17 |
| C18 |
=IF(C17=0,IF(D18=0,0,D18-1),C17-1) |
D18:G18 |
| D19 |
=IF(C17=0,C19,C19+1) |
E19:G19 |
| C20 |
{=MAX(IF(C19=$C$19:$G$19,$C$15:$G$15,0))} |
E20:G20 |
| C21 |
{=C20-MAX(IF(C19=$C$19:$G$19,$C$9:$G$9,0))} |
F19:F22 |
Figure 8c. A refine version of scheduling a batching machine.
A careful audit of this spreadsheet reveals several circular references. For instance, C15 depends on C10 and vice versa due to the index function in $C$15:$G$15. Nevertheless, this circular reference is not detected while entering the formula in the cells. This is the same kind of ‘hidden' circular reference in the CPM spreadsheet presented in Figure 2. Thus, a spreadsheet can contain circular references and still be correct.
A reverse engineering of the first version shows that the underlying recursion formula has been simplified:
f(0)=0, and
f(k)=min0 ≤i≤k−1 (max(f(i),rk)+pi+1) for 1 ≤ k ≤ n.
As a consequence, reasoning about organization of Excel's computations in the spreadsheet leads the student to a better understanding of recursion formulas by analysing computation dependencies in the DP network. We have shown through this example a slight simplification of the original recursion formulas proposed in Lee (1999) (but with the same computational complexity).
Suppose jobs have to be scheduled on a single machine without preemption. A job i cannot start before time 0 and must be processed for pi units of time (if possible) before its deadline di. For such a problem, meeting deadlines is usually achieved by scheduling jobs in non-decreasing order of their deadlines. This rule, called Earliest Due Date (EDD), minimizes the maximum lateness. Minimizing the number of late jobs is achieved by applying EDD, but if one deadline is not met for the current scheduled job, then the longest job already scheduled is moved to the end of the schedule. When jobs have different levels of importance (modeled as an integer wi), then the objective function becomes the weighted number of late jobs. There exist pseudo-polynomial time algorithms to solve the problem. The presented DP is due to Lawler (1969) and runs in O(n ∑pj) time, thus in pseudo-polynomial time.
The problem is solved by using recursion formulas for calculating the minimum weighted number of late jobs in the interval [0,t) while considering the j first jobs. Let Uj(t)=1 if job j is completed by its deadline in the interval [0,t), and Uj(t)=0 if job j is late or not scheduled in the interval [0,t). Let fj(t) denote the maximum criterion value for j jobs, subject to the constraint that the total processing time of the on-time jobs is at most t. Then the recursion formula can be based on the following performance measure:
fj(t)=min ∑i=1..j wiUi(t)
Stages are associated with jobs and the state variable is time t. The method assumes that jobs are indexed in non-decreasing order of their due dates. Then recursion formulas are:
| f0(t) = 0, |
for j=0,...,n and t ≥ 0, |
| fj(t) = ∞, |
for j=0,...,n and t < 0, |
| fj(t) = min(fj − 1(t − pj) , fj − 1(t) + wj), |
for j=1,...,n and 0 ≤ t ≤ dj, |
| fj(t) = fj(dj), |
for j=1,...,n and t > dj. |
To start, we need an upper bound T of the makespan. A simple upper bound is T=∑i=1..n pj. The optimal value is computed by fn(T). The implementation leads to the network diagram in Figure 10a, and the spreadsheet in Figure 10b.
The interpretation of the network is as follows. Each vertical column of nodes is a stage, corresponding to a job. Each column of nodes is a state for time t=0,...,22. The arc from (stage, state)=(j−1, t−pj) to (stage, state) = (j, t) has length pj and cost 0. What decision action is this? This arc means "Start job j at time t, to get it done on time." The arc from node (j−1, t) to node (j, t) has length 0 and cost wj. This means "Let job j−1 be late. Put job j at the end of the schedule." The label shown on each node (j, t) is the total lateness penalty weight fj(t) for decisions made up to that point. For example, in the worst case, the straight horizontal lines from left to right at time 0, 0 to 10 to 60 to 65 to 74, means all jobs are late, and we incur a total penalty of 74.

Enlarge
Figure 10a. DP network diagram for minimizing the weighted number of late jobs on a single machine. The optimal path is shown in the dotted lines. Jobs 1, 2, and 4 are on time. Job 3 is late and incurs a penalty of 5.

Enlarge
Click here to download
Key cell formulas
| Cell |
Formula |
Copied to |
| C9 |
=IF($A9<=C$8, IF($A9<C$7,C$6+B9,
MIN(C$6+B9,INDEX(B$9:B9,$A9+1-C$7,1))), INDEX(C$9:C9,C$8+1,1)) |
C9:F32 |
| C34 |
=C6+INDEX(B9:B31,C37+1,1) |
D34:F34 |
| C35 |
=INDEX(C9:C31,C37+1,1) |
D35:F35 |
| C36 |
=C35<>C34 |
D36:F36 |
| C37 |
=MIN(D38,C8) |
D37:F37 |
| C38 |
=C37-IF(C36,C7,0) |
D37:F37 |
| G38 |
=F8 |
− |
| G39 |
=SUM(C7:F7) |
− |
| C39 |
=IF(C36,C37,MAX(D39:$F39)+C7) |
C39:F39 |
Figure 10b. Spreadsheet DP for minimizing the weighted number of late jobs on a single machine.
The forward recursion is implemented with only one distinct formula. Thus, the constructive complexity is O(1). The main difficulty is finding values from previous iterations of the DP, using the INDEX function, in cell C9. We consider two cases. Case 1: the deadline of the first job is greater than or equal to the current date d1 < t. Case 2: d1 ≥ t.
If d1 < t then the objective value is f(d1), where d1 is the deadline of the first job (f1(t) = f1(d1)). This is calculated by using an index function that seeks a value in the table:
INDEX(C$9:C9,C$8+1,1)
If d1 ≥t, then we have to calculate: fj(t) = min(fj − 1(t − p) , fj − 1(t) + wj). This is implemented by the nested IF function (IF($A9<C$7) that checks if t − pi is negative or not. If it is negative then fj(t) = ∞ otherwise it can be evaluated. The first term in the min() statement is calculated as follows: C$6+B9 and calculates fj − 1(t) + wj. The second term corresponds to: INDEX(B$9:B9,$A9+1-C$7,1) and thus calculates fj − 1(t − p ).
With these two cases, we obtain the following formula:
IF($A9<C$7,C$6+B9,MIN(C$6+B9,INDEX(B$9:B9,$A9+1-C$7,1)))
Thus, the complete formula in cell C9 is:
=IF($A9<=C$8, IF($A9<C$7,C$6+B9, MIN(C$6+B9, INDEX(B$9:B9,$A9+1-C$7,1))), INDEX(C$9:C9,C$8+1,1))
The optimal value is found in the last cell of the range that stores fn(T).
Backward recursion allows us to calculate a schedule of jobs completed by their deadlines. Thus, it determines starting times of jobs, their completion times and the total weighted number of jobs completed by their deadlines. If a job has its starting time equal to its completion time in backward recursion, then such a job is not completed by its deadline and will be scheduled after all on-time jobs (the corresponding starting time is not computed in the backward recursion). This is achieved by using all fj(t) values computed in the forward recursion. Jobs are considered in the same order as in the forward recursion, from index 1 to n. The computation starts at time t = dn (defined in cell G38). Then job j can finish by time min(t ,dj). If the job j is late fj(t) = fj−1 (t) + wj , otherwise job i started at time t − pj. The spreadsheet implementation of the basic recursion scheme leads to cells C34:F38. The cells in row 37 store possible completion times of jobs. If the job is completed by its deadline, then row 38 stores related starting times of jobs.
We now describe formulas in the range C34:F38. We only deal with the last column of this range since the formulas are copied to adjacent cells in each row. The first time period to consider by the backward recursion is in cell G38 (the latest deadline):
F37 =MIN(G38,F8)
Then we need the values fj(t) calculated as follows:
F35 =INDEX(F9:F31,F37+1,1)
and fj−1(t) + wj calculated as follows:
F34 =F6+INDEX(E9:E31,F37+1,1)
The last job is on time if F34<>F35 (in cell F36). Lastly, row tb stores the next job completion time to consider for the next job (i.e., t or t − pj according to the status of job j). This is achieved by calculating the starting time of the current job using the following formula in cell F38:
F38 = F37-IF(F36,F7,0)
Note that four formulas have been defined and copied to adjacent cells. This number of formulas is not related to the number of jobs. Thus, the constructive complexity of the backward recursion is O(1).
Minimizing the makespan on identical parallel machines requires an exponential time algorithm, if the number of machines is greater than 2. But for unrelated parallel machines, minimizing the makespan is hard even for two-machine problems. We consider a problem with n jobs and m unrelated machines. The processing time of job j on machine k is denoted pjk. We present two different DPs to solve this problem, directly in their refined versions.
We first present a backward DP to solve the problem, due to concepts presented in Rothkopf (1966). Let tk be a state variable defining the time at which machine k becomes idle. A state ej of the DP is defined by a vector (t1,…,tm). The jobs are considered one by one according to their indices. Let fj(t1, ..., tm) be the optimal makespan for the first j jobs assuming that machines become idle respectively at times t1,…, tm. This optimal value can be defined by the following recursion equations:
Initialization: f0(t1,..., tm) = 0 if t1 = ... = tm , f0(t1,..., tm) = ∞ otherwise.
Recursion: fj(t1,..., tm) = min1 ≤ k ≤ m (max(tk, fj − 1(t1 , ... , tk − 1 , tk − pjk , ..., tm )))
We assume that if tk − pjk < 0 then fj − 1( ) returns infinity.
To bound the number of states, we need an upper bound T on the makespan. T can be found by a heuristic algorithm or by constructing a schedule by hand. The number of different states is then defined by the Cartesian product of {0,…,T}×…× {0,…,T} m times multiplied by the number of stages n+1 (including one for the initalization).
Without loss of generality, we next deal with a two-machine scheduling problem. We consider four jobs and an upper bound of the makespan equal to 9. The spreadsheet is presented in Figure 11. Since 500 rows have been required, most of them are hidden.
The first step is to define states of the DP. Stage number j considers the j first jobs. A state variable defines the time at which the corresponding machine completes a job. Two state variables are necessary since we consider two parallel machines. Thus, for every stage, all values of the state variables must be enumerated from 0 to the upper bound T of the makespan. Our case study requires 500 rows to enumerate all stages and states.
We enumerate stages and states with one formula (cell C14) that is based on a simple number conversion. The state number in column B is encoded in base 10. Then, it is converted in base T and the formula extracts the hundreds place for the stage number (column C), the tens place for t1 (column D) and the units place for t2 (column E). Let x be a decimal number denoting a state number. Defining the values of state variables can be obtained by converting x in a binary number. The arithmetic calculations are: x mod 2 is the unit place, ë x /Tû mod T is the tens place, and lastly for the hundreds place: ë x /T2 û mod T. This principle leads to a general layout for enumerating the state space in a DP using the index i as the position of digits (values 2,1 and 0 in cells C13:E13): ë x /Ti û mod T. This formula is defined using Excel's built-in functions:
C14=MOD(ROUNDDOWN($B14/($C$9^C$13),0),$C$9)
Columns F and G use VLOOKUP() statements to report processing times in the main table. Columns H to J define the DP recursion formula. We need to detail separately the initialization stage and the recursion formula implementations.
The initialization stage is defined by the formula: f0(t1,..., tm) = 0 if t1 = ... = tm , f0(t1,..., tm) = ∞ otherwise. We define the corresponding function in cells H14 (copied to all cells belonging to the initialization stage), where D14, E14 are stage numbers and C10 the makespan upper bound:
H14=IF(AND($D14=0,$E14=0),0,$C$10)
A max function is computed for each machine (respectively in columns H and J for machines 1 and 2). Then, the minimum of H and I is the objective function. In column H we need to compute the max function corresponding to m=1 (the first machine):
max(t1, fj − 1(t1 − pj1, t2))
t1 is in cell D114 and the processing time pj1 is in cell F114. If t1 − pj1 <0 then the returned value is the upper bound located in cell C10, otherwise the max function is evaluated. In this case, we need to define the decimal number corresponding to the values of the state variables. This decimal number is then used to calculate the row corresponding to fj − 1(t1 − pj1, t2 ) in the array J14:J513 by using the INDEX function. The decimal number is computed as the sum of hundreds digit (power 2 that is in cell C13), tens place and units place, plus one since the first row corresponds to the decimal number 0:
(C114-1)*$C$9^C$13 + (D114-F114)*$C$9 + E114+1
The complete formula in cell H114 is:
H114=IF(D114-F114<0,$C$10, MAX(D114, INDEX($J$14:$J$513,
(C114-1)*$C$9^C$13 + (D114-F114)*$C$9 + E114+1)))
Now, for column I, the same principle is used to calculate the max function correspond to m=2 (the second machine):
max(fj − 1( t1 ,t2 − pj2), t2)
t2 is in cell E114 and the processing time pj2 in cell G114. If t2 − pj2 < 0 then the returned value is the upper bound located in cell C10, otherwise the max function is evaluated. In this case, the decimal number corresponding to the values of the state variables is computed as for the first machine. This decimal number is then used to calculate the row corresponding to fj − 1( t1 ,t2 − pj2 ) in the array J14:J513 by using the INDEX function. The decimal number is computed as the sum of hundreds digit (power 2 that is in cell C13), tens place and units place , plus one since the first row corresponds to the decimal number 0:
(C114-1)*$C$9^C$13 + D114*$C$9 + E114-G114+1
The complete formula in cell I114 is:
=IF(E114-G114<0,$C$10, MAX(E114,INDEX($J$14:$J$513,
(C114-1)*$C$9^C$13 + D114*$C$9 + E114-G114+1)))
The objective function in cell J114 is:
J114=MIN(H14,I14)

Enlarge
Click here to download
Key cell formulas
| Cell |
Formula |
Copied to |
| C14 |
=MOD(ROUNDDOWN($B14/($C$9^C$13),0),$C$9) |
C14:E513 |
| F14 |
=IF($C14=0,0,VLOOKUP($C14,$B$4:$D$7,F$13+1,FALSE)) |
F14:G513 |
| H14 |
=IF(AND($D14=0,$E14=0),0,$C$10) |
H14:I113 |
| H114 |
=IF(D114-F114<0,$C$10, MAX(D114, INDEX($J$14:$J$513,
(C114-1)*$C$9^C$13 + (D114-F114)*$C$9 + E114+1))) |
H115:H513 |
| I114 |
=IF(E114-G114<0,$C$10, MAX(E114,INDEX($J$14:$J$513,
(C114-1)*$C$9^C$13 + D114*$C$9 + E114-G114+1))) |
I115:I513 |
| J14 |
=MIN(H14,I14) |
J15:J513 |
| D515 |
{=MIN(IF(C14:C513=4,J14:J513,$C$10))} |
− |
Figure 11. General structure of the backward DP.
The makespan is the minimum cost found in states belonging to stage n, where n is the number of jobs to be scheduled. The makespan is calculated by an array formula as follows:
{=MIN(IF(C14:C513=4,J14:J513,$C$10))}
Columns F and G store the processing times of the job associated with the current stage. This will simplify the formula by avoiding VLOOKUP() formulas in subsequent columns. The recursion is implemented in columns H:I.
For the example with four jobs and two machines in the spreadsheet of Figure 11, the optimal makespan is 8 and the corresponding optimal schedule is presented in Figure 12.
Figure 12. Gantt chart of the solution calculated in the spreadsheet DP.
Now we turn to the constructive complexity. Exactly one formula is defined in every column of the table except for columns H and G where two are needed (one for the initialization stage and another for remaining stages). Thus, the constructive complexity is related to the number of machines (m), but not to the number of jobs. Thus, the constructive complexity is O(m). This spreadsheet can easily be extended to deal with more machines and a larger upper bound. Nevertheless, even if the constructive complexity is low, some formulas are tricky. They have to be entered carefully to avoid errors, and are difficult to audit.
With this last example, we see the limit of implementing DPs in a spreadsheet. If the value T is large enough (for our case study with 4 jobs and 2 machines, the maximum number for T is 115), Excel would not have enough rows to enumerate all possible states. Of course, we can use several sheets or design a cyclic pattern to continue the table in subsequent columns of the sheet. But such an implementation would not be a natural spreadsheet implementation. The most natural implementation uses a classical programming language that handles large arrays of data.
We now present a forward DP due to Horowitz (1976), that enumerates jobs in the order of their index. In each stage, exactly one job is assigned to a machine. Thus, job 1 has to be assigned to a machine. This leads to m possible assignments that will be modeled as m different states in the first stage. In the next stage, job 2 is considered while considering every state created in stage 1, leading to m2 new states. And so on for the remaining jobs. Figure 13 presents the state enumeration for a 2-machine scheduling problem. Every level l in the tree is associated with the possible assignment for job l−1 (level 1 is associated with the root of the tree).
Figure 13. State space with two machines. The first number on each label is the job number. The second number is the time used on machine 1. The third number is the time used on machine 2. An arc upwards corresponds to assigning the job to machine 1. An arc downwards corresponds to assigning the job to machine 2.
The spreadsheet implementation is presented in Figure 14. The columns From and To in the spreadsheet refer to node labels defined in Figure 13. The enumeration of states is based on the following formulas, copied downwards:
Machine, cell A15=1+MOD(A14,$C$4)
From node, cell B15=B14+(A14>A15)
To node, cell C15=C14+1
The job associated with the current state is defined by the index of the job considered in the previous state (i.e., in column From) plus one. This is achieved by using a VLOOKUP statement. The formula in cell D15 that will be copied downwards is:
D15=VLOOKUP(B15,$C$14:$D15,2)+1
Columns E and F calculate the current makespan of each machine. For a given node, the current machine is defined in column A. The makespan on a machine is defined by the makespan on the same machine defined in the predecessor node, plus the processing time of the current job if the machine is the current one. This leads to the following formula in cell E15 (to copy downwards and to column F):
E15=VLOOKUP($B15,$C$14:E15,E$13+2)
+IF($A15=E$13,VLOOKUP($D15,$A$5:$C$8,E$13+1,FALSE),0)
Lastly, the makespan of the state in row 15 is defined as MAX(E15,F15) and then copied downwards. The optimal makespan is calculated by a DMIN function as in the backward DP. To detect if the current state is an optimal one it is sufficient to calculate cell H15:
H15=AND(D15=$A$8,G15=$G$10)
Thus, the constructive complexity is O(1), whereas the computational complexity becomes O(mn).
 Enlarge
Click here to download
Key cell formulas
| Cell |
Formula |
Copied to |
| G9 |
=$A$8 |
− |
| G10 |
=DMIN(D13:G45,COLUMNS(D13:G13),G8:G9) |
− |
| A15 |
=1+MOD(A14,$C$4) |
A16:A45 |
| B15 |
=B14+(A14>A15) |
B16:B45 |
| C15 |
=C14+1 |
C16:C45 |
| D15 |
=VLOOKUP(B15,$C$14:$D15,2)+1 |
D16:D45 |
| E15 |
=VLOOKUP($B15,$C$14:E15,E$13+2)
+IF($A15=E$13,VLOOKUP($D15,$A$5:$C$8,E$13+1,FALSE),0) |
E15:F45 |
| G15 |
=MAX(E15:F15) |
G16:G45 |
| H15 |
=AND(D15=$A$8,G15=$G$10) |
H16:H45 |
Figure 14. Spreadsheet implementation of the forward DP.
We now show that the DP presented in Figure 14 can easily be extended. To insert a new machine:
1) Move C4:C8 one column to the right.
2) Copy D4:D8 back to C4:C8. Fix the data and labels.
3) Insert a column between E and F.
4) Copy E12:E44 to F12.
5) Fix the column labels in F13:G13.
To insert a new job:
1) Insert a row between rows 7 and 8.
2) Add the data and correct the job numbers in column A of the "Process time" table.
3) Insert new rows near the bottom, and copy the row of formulas downwards.
4) Make sure the "Jobs" column reaches the number of jobs + 1, else go back to step 3.
We can easily delete machines or jobs by deleting rows or columns and correcting the labels. But if we increase the number of jobs or machines, the spreadsheet gets huge. Still, these modifications on a job or a machine require O(1) constructive complexity, because the number of keystrokes required to change the spreadsheet is independent of the size of the model.
This spreadsheet provides a relatively easy way to do primal retrieval, visually: the user has to locate the leaf corresponding to the optimal value and then consider the path from that node to the root in Figure 13. There is only one path from the root to every leaf in the decision tree and it contains all taken decisions.
In this paper, we have demonstrated a variety of dynamic programming structures in Excel. These examples will be of benefit to teachers and students in operations research, and to the practitioner. We have also studied each model using the concept of constructive complexity, which allows a quantitative approach to studying the structure of a spreadsheet. We found that reducing the constructive complexity of a spreadsheet is usually helped with advanced spreadsheet features, such as lookups, array functions, and circular references. We hope that operations researchers can analyze their own spreadsheets in similar ways, to produce better spreadsheets, and to teach good spreadsheet design.
We would like to thank the referees for their efforts in helping us improve the spreadsheets and the text of this paper. Their comments and suggestions were invaluable, and helped the presentation considerably.
Bather, J., (2000), Decision theory: An Introduction to Dynamic Programming and Sequential Decisions, Wiley, Chichester, W. Sussex, England; New York.
Beckmann, Martin J., (1968), Dynamic programming of economic decisions, Springer-Verlag, Berlin & New York.
Bellman, R. and S.E. Dreyfus, (1962), Applied dynamic programming, Princeton University Press, Princeton, N.J.
Bertsekas, D.P., (2001), Dynamic Programming and Optimal Control, 2nd Edition (Volumes 1 and 2), Athena Scientific.
de Carvalho, J.M. Valério (1998), "Exact Solution of Cutting Stock Problems Using Column Generation and Branch-and-Bound," International Transactions in Operations Research, Vol. 5, No. 1, pp. 35-44.
Cooper, L. and M.W. Cooper, (1981), Introduction to dynamic programming, 1st ed., Pergamon Press, Oxford.
Denardo, E.V., (2003), Dynamic Programming: Models and Applications, Dover Publications.
Dreyfus, S.E. and M.L. Averill, (1977), The art and theory of dynamic programming, Academic Press, New York, NY.
Dyckhoff, H. (1981), "A New Linear Programming Approach to the Cutting Stock Problem," Operations Research, Vol. 29, No. 6, Nov-Dec.
Gilmore, P.C. and R.E. Gomory (1966), "The Theory and Computation of Knapsack Functions," Operations Research, Vol. 14, No. 1, pp. 1045-1074.
Gluss, B., (1972), An elementary introduction to dynamic programming : a state equation approach, Allyn and Bacon, Boston.
Hadley, G., (1970), Nonlinear and dynamic programming, Addison-Wesley, Reading, Mass.
Hillier, F. S., M. S. Hillier and G. J. Lieberman, (2000)Introduction to Management Science, Irwin-McGraw-Hill, Boston, MA.
Ho, J.K., (1987), Linear and dynamic programming with Lotus 1-2-3, Release 2, Management Information Source, Portland, OR.
Horowitz E. and S. Sahni (1976), "Exact and approximate algorithms for scheduling non-identical processors", Journal of Association of Computing Machinery, Vol. 23, No. 2, pp. 317-327.
Jensen, P.A., and J.F. Bard, (2002), Operations Research Models and Methods, John Wiley and Sons, 2002.
Lawler, E.L., and J.M. Moore (1969), "A functional equation and its application to resource allocation and sequencing problems," Management Science Vol. 16, No. 1, pp. 77-84.
Lawrence, J.A., and B.A. Pasternack, (1998), Applied Management Science, chapter 16, John Wiley and Sons, New York, NY.
Lee C.Y. and R. Uzsoy (1999), "Minimizing makespan on a single batch processing machine with dynamic job arrivals," International Journal of Production Research, Vol. 37, No. 1, pp. 219-236.
Nemhauser, G.L., (1966), Introduction to dynamic programming, Wiley, New York.
Neuwirth, E. (1995), "Spreadsheet structures as a model for proving combinatorial identities," Journal of Computers in Mathematics and Science Education, Vol. 14, No. 3, pp. 419-434.
Puterman, M.L., ed., (1978), Dynamic programming and its applications, Proceedings of the International Conference on Dynamic Programming and its Applications (University of British Columbia, Vancouver, British Columbia, Canada, April 14-16, 1977), Academic Press, New York.
Ragsdale C. T., (2003), "A New Approach to Implementing Project Networks in Spreadsheets," INFORMS Transactions on Education, Vol. 3, No. 3, .
Rothkopf M.H., (1966), "Scheduling independent tasks on parallel processors," Management Science, Vol. 12, No. 5, pp. 437-447.
Sarmiento, A., (2003), "Dynamic Programming Solver," shareware, .
Seal, K.C. (2001), "A generalized PERT/CPM implementation in a spreadsheet," INFORMS Transactions on Education, Vol. 2, No. 1, .
Sedgewick, R., (1988), Algorithms, 2nd ed., Addison-Wesley, Reading, Mass.
Sniedovich, M., (1992), Dynamic programming, M. Dekker, New York, N.Y.
Wagner, H.M., (1975), Principles of operations research : with applications to managerial decisions, 2d ed., Prentice-Hall, Englewood Cliffs, N.J.
Weiss, M.A., (1993), Data Structures and Algorithm Analysis, Benjamin/Cummings Publishing, Redwood City, CA.
Winston, W., (2003), Operations Research: Applications and Algorithms, 4th edition, Duxbury Press, Belmont, CA.
|