Chapter 14: Balancing the Models

 

“All men are liable to error; and most men are, in many points, by passion or interest, under temptation to it.”

— John Locke
Essay Concerning Human Understanding, 1690

IN THIS CHAPTER, YOU WILL LEARN:

  1. How to balance a DFD against the data dictionary;
  2. How to balance a DFD against a process specification;
  3. How to balance a process specification against the data dictionary;
  4. How to balance an ERD against the DFD and process specification;
  5. How to balance an ERD against the data dictionary; and
  6. How to balance a DFD against the state-transition diagram.

In the past five chapters, we have examined several important modeling tools for structured analysis:

  • Dataflow diagram
  • Data dictionary
  • Process specification
  • Entity-relationship diagram
  • State-transition diagram

Each of these tools, as we have seen, focuses on one critical aspect of the system being modeled. It is important to keep this in mind, for it means that the person reading the model is also focusing on one critical aspect, that is, the aspect to which his or her attention is being drawn by the modeling tool itself. Because the underlying system has so many different dimensions of complexity, we want each tool to hold the reader’s attention on one dimension at a time: the dataflow diagram on the system functions, without the distraction of data relationships; the entity-relationship diagram on the data relationships, without the distraction of functional characteristics; and the state-transition diagram on the timing characteristics of the system, without the distraction of functions or data.

But there comes a time for pulling all the modeling tools together, and that is what this chapter is all about. The situation faced by the systems modeler is somewhat analogous to the ancient fable of the three blind wise men in India who stumbled up against an elephant. As Figure 14.1 illustrates, they came to three different opinions about the “reality” they were dealing with after touching different parts of the elephant:

Figure 14.1: Three blind men touching an elephant

 

  • One blind man touched the sharp end of one of the elephant’s long tusks. “Aha,” he said, “what we have here is a bull. I can feel its horns.”
  • The second blind man touched the bristly hide of the elephant. “Without a doubt,” he said, “this is a ... what? A porcupine? Yes, indeed — a porcupine!”
  • The third blind man felt one of the elephant’s thick legs and said, “This must be a tree that we’re dealing with.”

Similarly, when modeling three different aspects of a system (functions, data, and timing), as well as modeling detailed characteristics of the system in a data dictionary and set of process specifications, it is easy to develop several different inconsistent interpretations of that one reality. This is a particularly serious danger on large projects, where various people and various special interest groups are likely to be involved. It is also a danger whenever the project team (and/or the user community) involves people with very different backgrounds.

There is another reason for focusing on model consistency: whatever errors exist will eventually be found, but they become increasingly difficult and expensive to correct later in the project. Indeed, any errors that are introduced into the requirements model during the systems analysis phase are likely to be propagated and magnified during the design and implementation phases of the project. This is a particularly serious danger on large projects where the systems analysis is often done by different people (or even different companies!) than the design and implementation. Thus, [Martin, 1984] points out that 50% of the errors that are detected in a system and 75% of the cost of error removal are associated with errors in the systems analysis phase. And studies in [Boehm, 1981] have shown that the cost of correcting an error goes up exponentially in later stages of a project; it is ten times cheaper to fix a systems analysis error during the systems analysis phase of the project than it is to fix the same error during the design phase.

Some of these errors are, of course, simple errors in each individual model (e.g., a dataflow diagram with an infinite sink). And some of the errors can be characterized as wrong interpretations of what the user really wanted. But many of the more difficult and insidious errors are intermodel errors, that is, inconsistencies between one model and another. A structured specification in which all the modeling tools have been cross-checked against each other for consistency is said to be balanced.

The most common balancing error involves a missing definition: something defined (or described) in one model is not appropriately defined in another model. We will see several examples of this in the following sections (e.g., a data store shown on the DFD but not defined in the data dictionary, or an object in the ERD not shown as a corresponding data store on the DFD). The second common type of error is one of inconsistency: the same “reality” is described in different, contradictory ways in two different models.

We will examine several major aspects of balancing:

  • Balancing the dataflow diagram against the data dictionary.
  • Balancing the dataflow diagram against the process specifications.
  • Balancing the process specifications against the data dictionary.
  • Balancing the ERD against the DFD and process specifications.
  • Balancing the ERD against the data dictionary.
  • Balancing the DFD against the state-transition diagram.

As we will see, the balancing rules are all very straightforward; they require very little intelligence or creativity to carry out. But they must be carried out, and carried out diligently.

14.1 BALANCING THE DFD AGAINST THE DD

The rules for balancing the dataflow diagram against the data dictionary are as follows:

  • Every dataflow (i.e., an arrow on the DFD) and every data store must be defined in the data dictionary. If it is missing in the data dictionary, the dataflow or data store is considered to be undefined.
  • Conversely, every data element and every data store defined in the data dictionary must appear someplace on a DFD. If it does not appear, the offending data element or data store is a “ghost” — something defined but not “used” in the system. This can happen if the data elements are defined to correspond with an early version of the DFD; the danger is that the DFD may be changed (i.e., a dataflow or data store may be deleted) without a corresponding change to the data dictionary.

This means, of course, that the systems analyst must painstakingly review both the DFDs and the data dictionary to ensure that they are balanced. It doesn’t matter which model is examined first, though most analysts begin with the DFD to ensure that all the elements are defined in the data dictionary. Like all the other balancing activities in this chapter, it is a tedious chore and one that is well suited to automated support.
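
To suggest how little machinery such automated support needs, here is a minimal sketch of the two rules above in Python. It assumes the model has been reduced to plain sets of names; the flow, store, and dictionary names are hypothetical, and the sketch ignores the composition of dictionary entries (a component of a composite dataflow would be reported as a ghost here).

```python
# Hypothetical, simplified model: names only, no composition.
dfd_flows = {"ORDER", "INVOICE", "PAYMENT"}      # arrows on the DFD
dfd_stores = {"CUSTOMERS", "ORDERS"}             # data stores on the DFD
data_dictionary = {"ORDER", "INVOICE", "CUSTOMERS", "ORDERS", "SHIPPING-NOTE"}


def balance_dfd_against_dd(flows, stores, dictionary):
    """Return (undefined, ghosts) for the two rules of Section 14.1."""
    on_dfd = set(flows) | set(stores)
    undefined = on_dfd - set(dictionary)   # on the DFD, missing from the DD
    ghosts = set(dictionary) - on_dfd      # in the DD, never shown on a DFD
    return undefined, ghosts


undefined, ghosts = balance_dfd_against_dd(dfd_flows, dfd_stores, data_dictionary)
print("Undefined:", sorted(undefined))
print("Ghosts:", sorted(ghosts))
```

In this made-up model, PAYMENT is drawn on the DFD but never defined, and SHIPPING-NOTE is defined but never drawn.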

14.2 BALANCING THE DFD AGAINST THE PROCESS SPECIFICATION

Here are the rules for balancing the DFD against the process specifications:

  • Every bubble in the DFD must be associated with a lower-level DFD or a process specification, but not both. Thus, if the DFD shows a bubble that is identified as 1.4.2, then there must either be a corresponding figure identified as Figure 1.4.2 whose bubbles are identified as 1.4.2.1, 1.4.2.2, and so on, or the structured specification must contain a process specification for bubble 1.4.2. If both exist, the model is unnecessarily (and dangerously) redundant.
  • Every process specification must have an associated bottom-level bubble in the DFD. Since the process specification does require a lot of work, one would think it highly unlikely that there would be “tramp” process specifications floating around a system. But it can happen: the process specification may have been written for a preliminary version of the DFD, after which a revision process might eliminate some of the DFD bubbles.
  • Inputs and outputs must match. The DFD will show incoming and outgoing flows for each bubble, as well as connections to stores. These should be evident in the process specification, too: thus, we should expect to see a READ statement (or GET, or INPUT, or ACCEPT, or some other similar verb) corresponding to each incoming dataflow and a WRITE (or PUT, or DISPLAY, etc.) for each outgoing dataflow.

Note that these comments apply specifically to processing bubbles. For the control bubbles in a DFD, there are correspondences between the bubbles and associated state-transition diagrams, as discussed in Section 14.6.
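
The three rules for processing bubbles can be sketched in the same spirit. The Python below is only an illustration under stated assumptions: bubbles are identified by their numbers, the sets of bubbles with a lower-level figure and with a process specification are given directly, and the reads and writes of a specification are assumed to have been extracted already. All names and numbers are hypothetical.

```python
# Hypothetical model: bubble numbers, the bubbles that have a lower-level
# DFD figure, and the bubbles that have a process specification.
bubbles = {"1", "1.1", "1.2", "2"}
has_lower_level_dfd = {"1"}
has_process_spec = {"1", "1.1", "2", "3.7"}


def balance_dfd_against_specs(bubbles, lower, specs):
    """Return (neither, both, tramp_specs) for the first two rules."""
    neither = bubbles - lower - specs      # no figure and no specification
    both = bubbles & lower & specs         # redundant: figure and specification
    tramp_specs = specs - bubbles          # specification with no bubble
    return neither, both, tramp_specs


def io_mismatches(dfd_inputs, dfd_outputs, spec_reads, spec_writes):
    """Names present on only one side of the input/output comparison."""
    return (dfd_inputs ^ spec_reads) | (dfd_outputs ^ spec_writes)


neither, both, tramp_specs = balance_dfd_against_specs(
    bubbles, has_lower_level_dfd, has_process_spec
)
print(sorted(neither), sorted(both), sorted(tramp_specs))
print(sorted(io_mismatches({"ORDER"}, {"INVOICE"}, {"ORDER"}, {"INVOICE", "RECEIPT"})))
```

Here bubble 1.2 has neither a figure nor a specification, bubble 1 has both, specification 3.7 has no bubble, and the specification checked by io_mismatches writes a RECEIPT that the DFD does not show.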

14.3 BALANCING THE PROCESS SPECS AGAINST THE DFD AND DD

The rules for balancing the process specifications against the dataflow diagram and data dictionary can be described as follows; each data reference in the process specification (typically a noun) must satisfy one of the following rules:

  • It matches the name of a dataflow or data store connected to the bubble described by the process specification, or
  • It is a local term, explicitly defined in the process specification, or
  • It appears as a component in a data dictionary entry for a dataflow or data store connected to the bubble. Thus, the data elements X and Y appear in the process specification shown in Figure 14.2, but do not appear as a connected dataflow in the DFD shown in Figure 14.3. However, the data dictionary, a fragment of which is shown in Figure 14.4, indicates that X and Y are components of Z; and in Figure 14.3 we see that Z is indeed a dataflow connected to the bubble, so we conclude that the model is balanced [1].

 

PROCESS SPECIFICATION 3.5: COMPUTE WIDGET FACTOR

* P AND Q ARE LOCAL TERMS USED FOR INTERMEDIATE RESULTS *

P = 3.14156 * X

Q = 2.78128 * Y - 13

WIDGET-FACTOR = P * Q + 2

Figure 14.2: A process specification component of a system model

 

Figure 14.3: A DFD component of a system model

 

X = * horizontal component of frammis factor *
* units: centimeters; range: 0 - 100 *

Y = * vertical component of frammis factor *
* units: centimeters; range: 0 - 10 *

Z = * frammis factor, as defined by Dr. Frammis *
X + Y

Figure 14.4: A data dictionary component of a system model
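
The check walked through in Figures 14.2 through 14.4 can be sketched in Python as follows. The local terms and the composition of Z come from the figures; treating WIDGET-FACTOR as an outgoing dataflow of bubble 3.5 is an assumption, since only Z is named in the text as a connected flow. The sketch looks only one level into the data dictionary, which is enough for this example.

```python
# Data taken from Figures 14.2 and 14.4. The set of connected flows is partly
# an assumption: the text confirms Z, while WIDGET-FACTOR is assumed to be
# the output of bubble 3.5.
spec_references = {"P", "Q", "X", "Y", "WIDGET-FACTOR"}
local_terms = {"P", "Q"}
connected = {"Z", "WIDGET-FACTOR"}
dd_components = {"Z": {"X", "Y"}}    # Z = X + Y


def unbalanced_references(references, local_names, connected, components):
    """Return the references that satisfy none of the three rules."""
    reachable = set(connected)
    for name in connected:
        # Rule 3: direct components of a connected flow or store are allowed.
        reachable |= components.get(name, set())
    return {r for r in references if r not in local_names and r not in reachable}


print(sorted(unbalanced_references(spec_references, local_terms, connected, dd_components)))
```

With the data of the figures, no reference is left over, which agrees with the conclusion that this fragment of the model is balanced.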

 

14.4 BALANCING THE DATA DICTIONARY AGAINST THE DFD AND PROCESS SPECIFICATIONS

From the discussion above, it can be seen that the data dictionary is consistent with the rest of the model if it obeys the following rule:

  • Each entry in the data dictionary must be referenced by a process specification, or a DFD, or another data dictionary entry.

This assumes, of course, that we are modeling the essential behavior of a system. A complex, exhaustive data dictionary of an existing implementation of a system may contain some data elements that are no longer used.

One could also argue that the data dictionary might be planned in such a way that it permits future expansion; that is, it contains elements that are not needed today, but might be useful in the future. A good example of this is a data dictionary that contains elements that may be useful for ad hoc inquiry. The project team, perhaps in concert with the user, can determine whether this kind of unbalanced model is indeed an appropriate thing to do. However, it is important to at least be aware of the occurrence of such deliberate decisions.
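
A sketch of this rule, including a place to record deliberate exceptions, might look like the following Python. The dictionary contents other than X, Y, and Z are hypothetical, as is the idea of keeping the deliberate extras in an explicit list; the point is only that such decisions are written down rather than discovered later.

```python
# Each entry maps a name to the names of its direct components.
dictionary = {
    "Z": {"X", "Y"},
    "X": set(),
    "Y": set(),
    "OLD-CODE": set(),     # hypothetical leftover entry
    "REGION": set(),       # hypothetical entry kept for ad hoc inquiry
}
used_on_dfd = {"Z"}
used_in_specs = {"X", "Y"}
deliberate_extras = {"REGION"}   # recorded, agreed-upon exceptions


def unreferenced_entries(dictionary, on_dfd, in_specs, deliberate=frozenset()):
    """Entries referenced by no DFD, no specification, and no other entry."""
    referenced = set(on_dfd) | set(in_specs)
    for components in dictionary.values():
        referenced |= components
    return set(dictionary) - referenced - set(deliberate)


print(sorted(unreferenced_entries(dictionary, used_on_dfd, used_in_specs, deliberate_extras)))
```

Only OLD-CODE is reported; REGION is unreferenced too, but it has been declared as a deliberate exception.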

14.5 BALANCING THE ERD AGAINST THE DFD AND PROCESS SPECIFICATIONS

The entity-relationship diagram, as we saw in Chapter 12, presented a very different view of a system than did the dataflow diagram. However, there are some relationships that must hold in order for the overall system model to be complete, correct, and consistent:

  • Every store on the DFD must correspond to an object type, or a relationship, or a combination of an object type and relationship (i.e., an associative object type) on the ERD. If there is a store on the DFD that does not appear on the ERD, something is wrong; and if there is an object or relationship on the ERD that does not appear on the DFD, something is wrong.
  • Object names on the ERD and data store names on the DFD must match. As we saw in Chapters 9 and 12, the convention in this book is to use the plural form (e.g., CUSTOMERS) on the DFD and the singular form on the ERD.
  • The data dictionary entries must apply to both the DFD model and the ERD model. Thus the data dictionary entry for the above example should include definitions for both the object on the ERD and the store on the DFD. This would imply a data dictionary definition such as the following:
    CUSTOMERS = {CUSTOMER}
    CUSTOMER = name + address + phone-number + ...

The data dictionary entries for the singular form (e.g., CUSTOMER) must provide the meaning and composition of a single instance of the set of objects referred to (in the singular) in the ERD and (in the plural) in the data store of a DFD. The data dictionary entries for the plural form (e.g., CUSTOMERS) provide the meaning and the composition of the set of instances.
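
The naming convention makes the store-to-object correspondence easy to check by machine. The Python sketch below assumes that the plural is formed simply by appending an S, which is an oversimplification, and it leaves relationships and associative object types out of the comparison. The SUPPLIER and INVOICES names are hypothetical.

```python
erd_objects = {"CUSTOMER", "ORDER", "SUPPLIER"}
dfd_stores = {"CUSTOMERS", "ORDERS", "INVOICES"}


def plural(name):
    # Assumption: naive pluralization by appending "S".
    return name + "S"


def balance_erd_against_dfd(objects, stores):
    """Return (stores_without_object, objects_without_store)."""
    expected_stores = {plural(obj) for obj in objects}
    stores_without_object = set(stores) - expected_stores
    objects_without_store = {obj for obj in objects if plural(obj) not in stores}
    return stores_without_object, objects_without_store


stores_without_object, objects_without_store = balance_erd_against_dfd(
    erd_objects, dfd_stores
)
print(sorted(stores_without_object), sorted(objects_without_store))
```

In this example the INVOICES store has no object type behind it, and the SUPPLIER object type has no store.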

Similarly, there are rules for ensuring that the ERD is consistent with the process specification portion of the function-oriented model (keep in mind that the process specifications are the detailed components of the model whose graphical “incarnation” is the DFD). The rules are that the combined set of all process specifications must, in their entirety:

  • Create and delete instances of each object type and relationship shown in the ERD. This can be understood by looking at the DFD shown in Figure 14.5: as we know, the CUSTOMERS store corresponds to the CUSTOMER object. Something must be capable of creating and deleting instances of a customer, which means that some bubble within the DFD must have a dataflow connected to the CUSTOMERS store. But the actual work of writing to the store (i.e., creating or deleting an instance of the related CUSTOMER object in the ERD) must take place inside the bubble, which means that it must be documented by the process specification associated with the bubble.
  • Set values for each data element attributed to each instance of each object type, and use (or read) the values of each data element [2].

Figure 14.5: Creating and deleting ERD instances
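
One way to picture this coverage rule is as a table of which process specification performs which operation on which object type. The Python sketch below assumes such a table has already been extracted from the specifications; apart from ENTER-NEW-CUSTOMER and CUSTOMER, the names are hypothetical. For brevity it tracks setting and reading per object type rather than per data element, which is coarser than the rule as stated.

```python
# Hypothetical summary: process specification -> set of (object type, operation).
spec_operations = {
    "ENTER-NEW-CUSTOMER": {("CUSTOMER", "create"), ("CUSTOMER", "set")},
    "PRODUCE-INVOICE": {("CUSTOMER", "read"), ("ORDER", "read")},
    "RECORD-ORDER": {("ORDER", "create"), ("ORDER", "set")},
}
erd_objects = {"CUSTOMER", "ORDER"}
REQUIRED = {"create", "delete", "set", "read"}


def missing_operations(objects, operations):
    """Map each object type to the required operations no specification performs."""
    performed = set()
    for ops in operations.values():
        performed |= ops
    missing = {}
    for obj in objects:
        done = {op for (name, op) in performed if name == obj}
        if REQUIRED - done:
            missing[obj] = sorted(REQUIRED - done)
    return missing


print(missing_operations(erd_objects, spec_operations))
```

In this made-up model nothing ever deletes a customer or an order, so both object types are reported.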

 

14.6 BALANCING THE DFD AGAINST THE STATE-TRANSITION DIAGRAM

The state-transition diagram can be considered balanced against the dataflow diagram if the following rules are met:

  • Every control bubble in the DFD has associated with it a state-transition diagram as its process specification. Similarly, every state-transition diagram in the overall system model must be associated with a control process (bubble) in the DFD.
  • Every condition in the state-transition diagram must correspond to an incoming control flow into the control process associated with the state-transition diagram. Similarly, every incoming control flow on the control bubble must be associated with an appropriate condition on the corresponding state-transition diagram.
  • Every action in the state-transition diagram must correspond to an outgoing control flow in the control process associated with the state-transition diagram. Similarly, every outgoing control flow on the control bubble must be associated with an appropriate action on the corresponding state-transition diagram.

These correspondences are illustrated in Figure 14.6.

Figure 14.6: Correspondences between the DFD and STD
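
The last two rules amount to a two-way comparison of names, which can be sketched in Python as follows. The sketch assumes that one control bubble and its state-transition diagram have each been reduced to sets of names; the control flow, condition, and action names are hypothetical.

```python
# Hypothetical control bubble and its state-transition diagram.
control_bubble = {
    "incoming_control_flows": {"START", "STOP", "RESET"},
    "outgoing_control_flows": {"ENABLE-MOTOR", "DISABLE-MOTOR"},
}
std = {
    "conditions": {"START", "STOP"},
    "actions": {"ENABLE-MOTOR", "DISABLE-MOTOR", "SOUND-ALARM"},
}


def balance_std_against_dfd(bubble, diagram):
    """Compare conditions with incoming flows and actions with outgoing flows."""
    incoming = bubble["incoming_control_flows"]
    outgoing = bubble["outgoing_control_flows"]
    return {
        "conditions_without_flow": diagram["conditions"] - incoming,
        "flows_without_condition": incoming - diagram["conditions"],
        "actions_without_flow": diagram["actions"] - outgoing,
        "flows_without_action": outgoing - diagram["actions"],
    }


report = balance_std_against_dfd(control_bubble, std)
for problem, names in report.items():
    print(problem, sorted(names))
```

Here the RESET control flow has no matching condition, and the SOUND-ALARM action has no outgoing control flow.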

 

14.7 SUMMARY

Note that all the balancing rules in this chapter have been presented as if you were going to personally examine all the components of a system model to spot potential errors and inconsistencies. This would imply that you should lay out, on the floor or on a very large bulletin board, all the DFDs, process specifications, ERDs, STDs, and data dictionary, and then walk from one to the other, carefully checking that everything is in place.

As this edition of the book is being prepared, that is precisely what you would have to do in most systems development organizations around the world. The balancing rules we have presented in this chapter can be automated, and there are already a number of relatively inexpensive PC-based workstation tools that will carry out some or all of the error-checking mechanically. Unfortunately, they are not widely deployed and used in systems development organizations.

We have seen exactly the same phenomenon in a number of other fields. One could argue that the proliferation of cheap word-processing systems has obviated the need for learning script handwriting; indeed, one might argue that the availability of spelling checkers has even obviated the need for learning how to spell. And the universal availability of pocket calculators has obviated the need to learn how to do long division. And the ubiquitous presence of automatic-shift cars has obviated the need to learn how to drive stick-shift cars.

Indeed, I can’t think of any compelling reason for teaching someone in North America how to drive a stick-shift car as we enter the 21st century. Nor can I think of any reason for emphasizing the art of calligraphy and handwriting (except, perhaps, as an art form) in an age where word-processing systems are about to be replaced by voice-recognition systems. But I can appreciate the need for learning the basic principles of long division, even if one is supremely confident that one will never be without a pocket calculator; if nothing else, as Joshua Schwartz of Harvard University points out, it helps us to know whether the answer we have produced with our calculator has the decimal point in the right place.

Perhaps one could even argue the merits of learning script handwriting when home computers are still not as widespread as televisions and telephones, and when only a small percentage of U.S. schools are prepared to teach the mechanical skills of typing. Script handwriting is technologically obsolete, and it is painful for computer-literate parents (not to mention computer-literate children!) to be forced to learn this ancient, primitive communication skill; but it is probably still a necessary skill in today’s society. After all, it was only a few years ago that most parents stopped teaching their children how to replace the spark plugs, change the oil, and fix a flat tire on their automobile.

Similarly, I am convinced that a professional systems analyst needs to understand the principles of balancing presented in this chapter. As a systems analyst, you may have no alternative but to carry out these error-checking rules manually unless proper software engineering tools are used within your organization. The manual error-checking process will normally be validated in a walkthrough environment; walkthroughs are discussed in Appendix D.

REFERENCES

  1. James Martin, An Information Systems Manifesto. Englewood Cliffs, N.J.: Prentice-Hall, 1984.
  2. Barry Boehm, Software Engineering Economics. Englewood Cliffs, N.J.: Prentice-Hall, 1981.

QUESTIONS AND EXERCISES

  1. Why is it important to balance the models of a system specification? What are the dangers of an unbalanced specification?
  2. Why is it important to find errors in a system model as early as possible?
  3. What percentage of the cost of error removal is associated with the systems analysis phase of a project?
  4. What are the two most common forms of balancing errors?
  5. What parts of the system model must the DFD be balanced against?
  6. What parts of the system model must the ERD be balanced against?
  7. What parts of the system model must the STD be balanced against?
  8. What parts of the system model must the data dictionary be balanced against?
  9. What parts of the system model must the process specification be balanced against?
  10. Are there any other components of the system model that must be balanced?
  11. What are the rules for balancing the DFD against the data dictionary?
  12. Under what conditions could an item be defined in the data dictionary without appearing somewhere on a DFD?
  13. What are the rules for balancing the DFD against the process specifications?
  14. What would happen if a process specification were written for a nonprimitive (or nonatomic) bubble in the DFD?
  15. Should there be a process specification for control processes in the DFD? If so, should it take the same form as a process specification for a normal process?
  16. What are the rules for balancing the process specification against the DFD and data dictionary?
  17. What are “tramp data”?
  18. Under what conditions is it acceptable for a term (or data reference) in the process specification to not be defined in the data dictionary?
  19. What are the rules for balancing the data dictionary against the DFD and process specification?
  20. Under what conditions is it possible that the project team might deliberately put items into the data dictionary that are not in the DFD?
  21. What are the rules for balancing the ERD against the DFD?
  22. What is the convention for matching names in the ERD with stores in the DFD?
  23. What are the rules for balancing the ERD against the process specification?
  24. What are the rules for balancing the STD against the DFD?
  25. Under what conditions is it valid not to have an STD in a system model?
  26. How should the balancing rules presented in this chapter be carried out in a typical systems development project? Who should be responsible for seeing that it gets done?
  27. If you have an automated systems analysis workstation, is it necessary to learn the balancing rules presented in this chapter? Why or why not?
  28. If the system models have been balanced, can we be confident that they are correct? Why or why not?
  29. Point out three balancing errors in the following system model.
  30. Should the STD be balanced against the ERD? Why or why not?

FOOTNOTES

  1. [1] However, it may be worth doing some further checking at this point: if X is the only component of Z that is used in the process specification, we could seriously question why Z was shown as an input in the first place. That is, the remaining components of Z may be “tramp data” that “float” through the bubble without being used. This often reflects a model of an arbitrary implementation of a system, rather than a model of the essential behavior of the system.
  2. [2] Note that the situation may be somewhat more complicated: the bubble shown on the DFD may not be a bottom-level bubble. Thus, it is possible that the bubble labeled ENTER-NEW-CUSTOMER in Figure 14.5 may be described by a lower-level dataflow diagram, not by a process specification. If this is the case, then one of the lower-level bubbles (perhaps not one level, but several levels below) will be a primitive and will access the store directly. Recall from Chapter 9 that our convention on the DFD is that the store is shown at the highest level where it is an interface between two bubbles, and it is repeated in every lower-level diagram.