P.6. Seven Rules for Sound Documentation

ADVICE

The following seven rules apply to any software documentation, including software architecture documentation:

  1. Write documentation from the reader's point of view.
  2. Avoid unnecessary repetition.
  3. Avoid ambiguity.
  4. Use a standard organization.
  5. Record rationale.
  6. Keep documentation current but not too current.
  7. Review documentation for fitness of purpose.

Architecture documentation is much like the documentation we write in other facets of our software development projects. As such, it obeys the same fundamental rules for what distinguishes good, usable documentation from poor, ignored documentation.

P.6.1 Rule 1: Write Documentation from the Reader's Point of View

Seemingly obvious but surprisingly seldom considered, this rule offers the following advantages.

In the realm of software documentation, documents written for the writer often take one of two forms: stream of consciousness or stream of execution. Stream-of-consciousness writing captures thoughts in the order in which they occurred to the writer and lacks an organization that is helpful to a reader. Avoid stream-of-consciousness writing by making sure that you know what question(s) are being answered by each section of a document; that is, architect your documentation.

Stream-of-execution writing captures thoughts in the order in which they occur during the execution of a software program. For certain kinds of software documentation, this is entirely appropriate, but it should never be given as the whole story.

P.6.2 Rule 2: Avoid Unnecessary Repetition

I have made this letter rather long only because I have not had time to make it shorter.

Blaise Pascal, French mathematician, physicist, and moralist, 1623-1662

Each kind of information should be recorded in exactly one place. This makes documentation easier to use and much easier to change as it evolves. It also avoids confusion; information that is repeated is likely to be in a slightly different form, and now the reader must wonder: Was the difference intentional? If so, what is the meaning of the difference?

Now, expressing the same idea in different forms is often useful for achieving a thorough understanding. However, it should be a goal that information never, or almost never, be repeated verbatim unless the cost to the reader of keeping related information separate is high. Locality of information reference is important; unnecessary page flipping leads to reader dissatisfaction. Also, two different views might have repetitive information for clarity or to make different points. If keeping the information separate proves too high a cost to the reader, repeat the information.

Clarity is our only defense against the embarrassment felt on completion of a large project when it is discovered that the wrong problem has been solved.

C. A. R. Hoare (1987, p. 85)

P.6.3 Rule 3: Avoid Ambiguity

A primary reason architecture is useful is that it suppresses or defers the plethora of details that are necessary to resolve before bringing a system to the field. The architecture is therefore ambiguous, one might argue, with respect to these suppressed details. Even though an architecture may be brought to fruition by any number of elaborations/implementations, as long as those implementations comply with the architecture, they are all correct. Unplanned ambiguity occurs when documentation can be interpreted in more than one way and at least one of those ways is incorrect. The documentation should be sufficient to avoid multiple interpretations.

A well-defined notation with precise semantics goes a long way toward eliminating whole classes of linguistic ambiguity from a document. This is one area where architecture description languages help a great deal, but using a formal language isn't always necessary. Simply adopting a set of notational conventions and then avoiding unplanned repetition, especially the "almost-alike" repetition mentioned previously, will help eliminate whole classes of ambiguity. But if you do adopt a notation, then the following corollary applies:

3a. Explain Your Notation

ADVICE

We have several things to say about box-and-line diagrams masquerading as architecture documentation.

  • Don't be guilty of drawing one and claiming that it's anything more than a start at an architectural description.
  • If you draw one yourself, make sure that you explain precisely what the boxes and lines mean.
  • If you see one, ask its author what the boxes mean and what, precisely, the arrows connote. The result is usually illuminating, even if the only thing illuminated is the author's confusion.

The ubiquitous box-and-line diagrams that people always draw on whiteboards are one of the greatest sources of ambiguity in architecture documentation. Although not a bad starting point, these diagrams are certainly not good architecture documentation. For one thing, the behavior of the elements, a crucial part of the architecture, is not defined. Furthermore, most such diagrams suffer from ambiguity. Are the boxes supposed to be modules, objects, classes, processes, functions, procedures, processors, or something else? Do the arrows mean submodule, inheritance, synchronization, exclusion, calls, uses, data flow, processor migration, or something else?

Make it as easy as possible for your reader to determine the meaning of the notation. If you're using a standard visual language defined elsewhere, refer readers to the source of the language's semantics. (Even if the language is standard or widely used, different versions often exist. Let your reader know, by citation, which one you're using.) For a home-grown notation, include a key to the symbology. This is good practice because it compels you to understand what the pieces of your system are and how they relate to one another; it is also courteous to your readers.
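One way to hold yourself to this rule is to treat the key as data and check a diagram against it. The following is a minimal Python sketch, not taken from any particular tool: the names KEY, Edge, and undefined_styles, and the meanings assigned to each line style, are assumptions made for illustration. It reports every line style that a diagram uses but the key does not explain.

```python
from dataclasses import dataclass

# Hypothetical key for a home-grown notation: line style -> meaning.
KEY = {
    "solid": "data flows from source to target",
    "dotted": "control passes from source to target",
}


@dataclass(frozen=True)
class Edge:
    source: str
    target: str
    style: str  # the line style drawn in the diagram


def undefined_styles(edges, key):
    """Return the sorted line styles used by edges but missing from the key."""
    return sorted({edge.style for edge in edges} - set(key))


diagram = [
    Edge("C1", "C2", "solid"),
    Edge("C2", "C3", "dashed"),  # drawn in the diagram, never explained
]

print(undefined_styles(diagram, KEY))  # ['dashed']
```

An empty result means that every line style in the diagram has an entry in the key; it says nothing about whether the stated meanings are precise enough.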

P.6.4 Rule 4: Use a Standard Organization

Establish a standard, planned organization scheme; make your documents adhere to it; and ensure that readers know about it. A standard organization offers many benefits.

Two corollaries follow:

  1. Organize documentation for ease of reference. Software documentation may be read from cover to cover at most once, probably never. But a document is likely to be referenced hundreds or thousands of times.
  2. Mark as TBD what you don't yet know rather than leaving it blank. Many times, we can't fill in a document completely because we don't yet know the information or because decisions have not been made. In that case, mark the document accordingly rather than leave the section blank. If the section is blank, the reader will wonder whether the information is coming or whether a mistake was made.
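The second corollary lends itself to a simple automated check. The sketch below assumes, purely for illustration, that a document is held as a mapping from section title to section text; the function name blank_sections and the section titles are hypothetical. It lists the sections that were left blank, as opposed to those explicitly marked TBD.

```python
def blank_sections(sections):
    """Return the titles of sections whose body is empty or only whitespace."""
    return [title for title, body in sections.items() if not body.strip()]


# Hypothetical document: section title -> section text.
doc = {
    "Context diagram": "See Figure 1.",
    "Variability guide": "TBD: awaiting the scoping decision.",
    "Rationale": "",
}

print(blank_sections(doc))  # ['Rationale']
```

A section marked TBD passes the check because it tells the reader that the information is still coming; only the silent gap is flagged.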

P.6.5 Rule 5: Record Rationale

When you document the results of decisions, record the alternatives you rejected and state why. Later, when those decisions come under scrutiny or pressure to change, you will find yourself revisiting the same arguments and wondering why you didn't take another path. Recording your rationale will save you enormous time in the long run, although it requires discipline to record your rationale in the heat of the moment.
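A lightweight record structure can lower the cost of capturing rationale in the heat of the moment. The following Python sketch is one possible shape, not a prescribed template: the class Decision, its field names, and the example decision are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Decision:
    title: str
    chosen: str
    reason: str
    # Rejected alternative -> why it was rejected.
    rejected: dict = field(default_factory=dict)

    def render(self):
        """Format the decision and its rejected alternatives as plain text."""
        lines = [
            f"Decision: {self.title}",
            f"Chosen: {self.chosen}",
            f"Why: {self.reason}",
        ]
        for alternative, why in self.rejected.items():
            lines.append(f"Rejected: {alternative} ({why})")
        return "\n".join(lines)


# Hypothetical example decision.
record = Decision(
    title="Communication between C1 and C2",
    chosen="procedure call",
    reason="both components run in the same process",
    rejected={"message queue": "adds latency with no decoupling benefit here"},
)

print(record.render())
```

Keeping the rejected alternatives next to the chosen one means that the argument can be reread, rather than reconstructed, when the decision later comes under pressure.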

P.6.6 Rule 6: Keep Documentation Current But Not Too Current

Documentation that is incomplete or out-of-date does not reflect truth, does not obey its own rules for form and internal consistency, and is not used. Documentation that is kept current and accurate is used. Why? Because questions about the software can be most easily and most efficiently answered by referring to the appropriate document. Documentation that is somehow inadequate to answer the question needs to be fixed. Updating it and then referring the questioner to it will deliver a strong message that the documentation is the final, authoritative source for information.

During the design process, on the other hand, decisions are made and reconsidered with great frequency. Revising documentation to reflect decisions that will not persist is an unnecessary expense.

Your development plan should specify particular points at which the documentation is brought up-to-date or the process for keeping the documentation current. Design decisions need not be recorded the instant they are made; rather, the document should be subject to version control and have a release strategy, just as every other artifact being produced does.

P.6.7 Rule 7: Review Documentation for Fitness of Purpose

Only the intended users of a document will be able to tell you whether it contains the right information presented in the right way. Enlist their aid. Before a document is released, have it reviewed by representatives of the community or communities for which it was written.

PERSPECTIVES

Quivering at Arrows

Many architectural diagrams with an informal notation use arrows to indicate a directional relationship among architectural elements. Although this might seem like a good and innocuous way to clarify a design by adding visual semantic detail, it creates a great source of confusion in many cases. What do the arrows mean? Do they indicate direction of data flow? Visibility of services or data? Control flow? Invocation? Any of these might make sense, and people use arrows to mean all these things and more, often using multiple interpretations in the same diagram.

Consider an architectural snippet in which a single arrow is drawn from one element, Component 1, to another, Component 2.

Suppose that Component 1 (C1) invokes Component 2 (C2) via a simple procedure call. What might the arrow mean? It might mean that C1 calls C2. It might mean that C1 passes data to C2 via its parameters. It might mean that C1 obtains a return result from C2. It might mean that C1 causes C2 to come into existence or be loaded into a memory space. It might mean that C2 cannot execute until C1 does. It might mean that C1 cannot execute until C2 terminates. All these interpretations are valid under the assumption that C1 invokes C2.

Alternatively, suppose that we know that C1 invokes C2 and we want to show a data flow between the two. We could use the preceding figure, but if C2 returns a value to C1, shouldn't an arrow go both ways? Or should a single arrow have two arrowheads? These two options are not interchangeable. A double-headed arrow typically denotes a symmetric relationship between two elements, whereas two single-headed arrows suggest two asymmetric relationships at work. In either case, the diagram will lose the information that C1 initiated the interaction. Suppose that C2 also invokes C1. Would we need to put two double-headed arrows between C1 and C2?

The same questions would apply if we wanted to show control flow. How should we depict the fact that C2 returns control to C1 after its execution has completed?

Of course, the situation is even worse if the relationship is a more complex form of interaction, possibly involving multiple procedure calls, complex protocols, rules for handling exceptions and timeouts, and callbacks. To avoid confusion, follow this advice.

When arrows represent nontrivial interactions, document the behavior, using some form of behavior or protocol specification. Within a diagram, give each kind of interaction its own visual convention: for example, a dotted line might indicate a control relationship and a solid line a data transfer relationship. Different arrowhead shapes can likewise help make distinctions. By the same token, keep each convention consistent: a procedure call-based interaction, for example, should use the same kind of connecting line throughout the architectural documentation.
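If diagrams are generated rather than drawn by hand, the convention can be stated once and applied everywhere. The following Python sketch emits Graphviz DOT text using the dotted-for-control and solid-for-data convention from the example above; the names STYLE and to_dot, and the choice of DOT as the output format, are assumptions made for illustration.

```python
# Hypothetical convention: exactly one line style per kind of interaction.
STYLE = {"control": "dotted", "data": "solid"}


def to_dot(interactions):
    """Render (source, target, kind) triples as Graphviz DOT text."""
    lines = ["digraph architecture {"]
    for source, target, kind in interactions:
        # An unknown kind raises KeyError, so no interaction can be drawn
        # without first being given a declared convention.
        lines.append(
            f'  "{source}" -> "{target}" [style={STYLE[kind]}, label="{kind}"];'
        )
    lines.append("}")
    return "\n".join(lines)


print(to_dot([("C1", "C2", "control"), ("C1", "C2", "data")]))
```

Because every edge takes its style from the single STYLE table, like interactions are drawn alike throughout, and the table itself can double as the key to the notation.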

Although arrows are often used to indicate interactions, often one can avoid confusion by not using them where they are likely to be misinterpreted. For example, one can use lines without arrowheads. Sometimes, physical placement, as in a layered diagram, can convey the same information.

D.G.

ADVICE

Explain what semantic and notational conventions you are using.

ADVICE

Use different visual conventions to distinguish between semantically distinct types of interaction within a diagram.

ADVICE

Use the same visual conventions for like interactions throughout.

ADVICE

Don't feel compelled to use arrows.
