[Book notes] A Philosophy of Software Design

Preface

The most fundamental problem in computer science is problem decomposition: how to take a complex problem and divide it up into pieces that can be solved independently.

Chapter 1. Introduction (It’s All About Complexity)

Complexity will still increase over time, in spite of our best efforts, but simpler designs allow us to build larger and more powerful systems before complexity becomes overwhelming.
The first approach is to eliminate complexity by making code simpler and more obvious. For example, complexity can be reduced by eliminating special cases or using identifiers in a consistent fashion.
The second approach to complexity is to encapsulate it, so that programmers can work on a system without being exposed to all of its complexity at once. This approach is called modular design. In modular design, a software system is divided up into modules, such as classes in an object-oriented language. The modules are designed to be relatively independent of each other, so that a programmer can work on one module without having to understand the details of other modules.
Because of these issues, most software development projects today use an incremental approach such as agile development, in which the initial design focuses on a small subset of the overall functionality. This subset is designed, implemented, and then evaluated.
In contrast, major design changes are much more challenging for physical systems: for example, it would not be practical to change the number of towers supporting a bridge in the middle of construction.
The first is to describe the nature of software complexity: what does “complexity” mean, why does it matter, and how can you recognize when a program has unnecessary complexity?
The book’s second, and more challenging, goal is to present techniques you can use during the software development process to minimize complexity.
When you read other people’s code, think about whether it conforms to the concepts discussed here and how that relates to the complexity of the code. It’s easier to see design problems in someone else’s code than your own.

Chapter 2. The Nature of Complexity

Complexity is anything related to the structure of a software system that makes it hard to understand and modify the system. Complexity can take many forms. For example, it might be hard to understand how a piece of code works; it might take a lot of effort to implement a small improvement, or it might not be clear which parts of the system must be modified to make the improvement; it might be difficult to fix one bug without introducing another. If a software system is hard to understand and modify, then it is complicated; if it is easy to understand and modify, then it is simple.
The overall complexity of a system (C) is determined by the complexity of each part p (c_p) weighted by the fraction of time developers spend working on that part (t_p).
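Written as a formula, the sentence above is a weighted sum over the parts of the system, with $c_p$ the complexity of part $p$ and $t_p$ the fraction of developer time spent on it:

```latex
C = \sum_{p} c_p \, t_p
```

One consequence is that complexity isolated in a part developers rarely touch (small $t_p$) contributes little to the overall total.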
Change amplification: The first symptom of complexity is that a seemingly simple change requires code modifications in many different places.
Cognitive load: The second symptom of complexity is cognitive load, which refers to how much a developer needs to know in order to complete a task.
Cognitive load arises in many ways, such as APIs with many methods, global variables, inconsistencies, and dependencies between modules.
Sometimes an approach that requires more lines of code is actually simpler, because it reduces cognitive load.
Unknown unknowns: The third symptom of complexity is that it is not obvious which pieces of code must be modified to complete a task, or what information a developer must have to carry out the task successfully.
Of the three manifestations of complexity, unknown unknowns are the worst. An unknown unknown means that there is something you need to know, but there is no way for you to find out what it is, or even whether there is an issue.
In an obvious system, a developer can quickly understand how the existing code works and what is required to make a change.
A simple example is a variable name that is so generic that it doesn’t carry much useful information (e.g., time). Or, the documentation for a variable might not specify its units, so the only way to find out is to scan code for places where the variable is used.
The need for extensive documentation is often a red flag that the design isn’t quite right. The best way to reduce obscurity is by simplifying the system design.

Chapter 3. Working Code Isn’t Enough (Strategic vs. Tactical Programming)

If you program tactically, each programming task will contribute a few of these complexities. Each of them probably seems like a reasonable compromise in order to finish the current task quickly. However, the complexities accumulate rapidly, especially if everyone is programming tactically.
Almost every software development organization has at least one developer who takes tactical programming to the extreme: a tactical tornado. The tactical tornado is a prolific programmer who pumps out code far faster than others but works in a totally tactical fashion.
The first step towards becoming a good software designer is to realize that working code isn’t enough.
Strategic programming requires an investment mindset. Rather than taking the fastest path to finish your current project, you must invest time to improve the design of the system. These investments will slow you down a bit in the short term, but they will speed you up in the long term, as illustrated in Figure 3.1.
Try to imagine a few ways in which the system might need to be changed in the future and make sure that will be easy with your design.
I suggest spending about 10–20% of your total development time on investments.
Eventually, Facebook changed its motto to “Move fast with solid infrastructure” to encourage its engineers to invest more in good design. It remains to be seen whether Facebook can successfully clean up the problems that accumulated over years of tactical programming.

Chapter 4. Modules Should Be Deep

There will be dependencies between the modules: if one module changes, other modules may need to change to match. For example, the arguments for a method create a dependency between the method and any code that invokes the method.
The goal of modular design is to minimize the dependencies between modules.
Each interface also includes informal elements. These are not specified in a way that can be understood or enforced by the programming language. The informal parts of an interface include its high-level behavior, such as the fact that a function deletes the file named by one of its arguments. If there are constraints on the usage of a class (perhaps one method must be called before another), these are also part of the class’s interface.
For most interfaces the informal aspects are larger and more complex than the formal aspects.
The term abstraction is closely related to the idea of modular design. An abstraction is a simplified view of an entity, which omits unimportant details. Abstractions are useful because they make it easier for us to think about and manipulate complex things.
An abstraction that omits important details is a false abstraction: it might appear simple, but in reality it isn’t. The key to designing abstractions is to understand what is important, and to look for designs that minimize the amount of information that is important.
The best modules are deep: they have a lot of functionality hidden behind a simple interface. A deep module is a good abstraction because only a small fraction of its internal complexity is visible to its users.
Another example of a deep module is the garbage collector in a language such as Go or Java. This module has no interface at all; it works invisibly behind the scenes to reclaim unused memory.
A shallow module is one whose interface is complicated relative to the functionality it provides.
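To make the idea concrete, here is a minimal sketch of a shallow method. The class and its names are illustrative assumptions, not code from a real library.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical class, used only to illustrate a shallow method.
class AttributeStore {
    private final Map<String, String> data = new HashMap<>();

    // Shallow: callers must learn a new method name, yet the method hides
    // nothing. It is no simpler than calling data.put(attribute, null).
    void addNullValueForAttribute(String attribute) {
        data.put(attribute, null);
    }

    // Reports whether the attribute has been recorded, even with a null value.
    boolean hasAttribute(String attribute) {
        return data.containsKey(attribute);
    }
}
```

The method adds an interface element to learn without hiding any functionality, so its cost is arguably higher than its benefit.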
Unfortunately, the value of deep classes is not widely appreciated today. The conventional wisdom in programming is that classes should be small, not deep.
Classitis may result in classes that are individually simple, but it increases the complexity of the overall system.
If an interface has many features, but most developers only need to be aware of a few of them, the effective complexity of that interface is just the complexity of the commonly used features.

Chapter 5. Information Hiding (and Leakage)

Information hiding reduces complexity in two ways. First, it simplifies the interface to a module. The interface reflects a simpler, more abstract view of the module’s functionality and hides the details; this reduces the cognitive load on developers who use the module. For instance, a developer using a B-tree class need not worry about the ideal fanout for nodes in the tree or how to keep the tree balanced. Second, information hiding makes it easier to evolve the system. If a piece of information is hidden, there are no dependencies on that information outside the module containing the information, so a design change related to that information will affect only the one module. For example, if the TCP protocol changes (to introduce a new mechanism for congestion control, for instance), the protocol’s implementation will have to be modified, but no changes should be needed in higher-level code that uses TCP to send and receive data.
Note: hiding variables and methods in a class by declaring them private isn’t the same thing as information hiding. Private elements can help with information hiding, since they make it impossible for the items to be accessed directly from outside the class. However, information about the private items can still be exposed through public methods such as getter and setter methods. When this happens the nature and usage of the variables are just as exposed as if the variables were public.
However, partial information hiding also has value.
The opposite of information hiding is information leakage. Information leakage occurs when a design decision is reflected in multiple modules. This creates a dependency between the modules: any change to that design decision will require changes to all of the involved modules.
Information leakage is one of the most important red flags in software design.
If the affected classes are relatively small and closely tied to the leaked information, it may make sense to merge them into a single class.
However, most design decisions manifest themselves at several different times over the life of the application; as a result, temporal decomposition often results in information leakage.
When designing modules, focus on the knowledge that’s needed to perform each task, not the order in which tasks occur.
In temporal decomposition, execution order is reflected in the code structure: operations that happen at different times are in different methods or classes.
If the same knowledge is used at different points in execution, it gets encoded in multiple places, resulting in information leakage.
One team used two different classes for receiving HTTP requests; as a result, both classes needed to understand most of the structure of HTTP requests, and parsing code was duplicated in both classes.
Because the classes shared so much information, it would have been better to merge them into a single class that handles both request reading and parsing.
Here is a better interface for retrieving parameter values:

public String getParameter(String name) { ... }
public int getIntParameter(String name) { ... }

getParameter returns a parameter value as a string. It provides a slightly deeper interface than getParams above.
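As a rough sketch of how these two methods might hide the internal representation, consider the class below. The class name, the constructor, and the Map-based storage are assumptions made for illustration; only the two method signatures come from the notes.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical request class; the Map-based storage is an assumption.
class HttpRequest {
    // Callers never see this map, so its type can change freely.
    private final Map<String, String> params = new HashMap<>();

    HttpRequest(Map<String, String> parsedParams) {
        params.putAll(parsedParams);
    }

    // Returns the value of the named parameter, or null if it is absent.
    public String getParameter(String name) {
        return params.get(name);
    }

    // Returns the named parameter converted to an int. Throws
    // NumberFormatException if the parameter is absent or not a number.
    public int getIntParameter(String name) {
        return Integer.parseInt(params.get(name));
    }
}
```

Because callers ask for one value at a time, they never learn how parameters are stored, and the conversion to int is done once inside the class rather than by every caller.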
If the API for a commonly used feature forces users to learn about other features that are rarely used, this increases the cognitive load on users who don’t need the rarely used features.
As a software designer, your goal should be to minimize the amount of information needed outside a module; for example, if a module can automatically adjust its configuration, that is better than exposing configuration parameters.
When decomposing a system into modules, try not to be influenced by the order in which operations will occur at runtime; that will lead you down the path of temporal decomposition, which will result in information leakage and shallow modules.

Chapter 6. General-Purpose Modules are Deeper

I now think that over-specialization may be the single greatest cause of complexity in software. Conversely, code that is more general-purpose is simpler, cleaner, and easier to understand.
In my experience, the sweet spot is to implement new modules in a somewhat general-purpose fashion. The phrase “somewhat general-purpose” means that the module’s functionality should reflect your current needs, but its interface should not. Instead, the interface should be general enough to support multiple uses.
The original student implementation leaked specialized user-interface details such as the behavior of the backspace key down into the implementation of the text class. The improved text API pushed all of the specialization upwards into the user interface code, leaving only general-purpose code in the text class.
Each of these categories can be implemented without any understanding of the other categories. The History class does not know what kind of actions are being undone; it could be used in a variety of applications. Each action class understands only a single kind of action, and neither the History class nor the action classes needs to be aware of the policy for grouping actions.
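The separation described above could look roughly like the following sketch. The method names and the AppendAction class are assumptions, and the policy for grouping actions is left out entirely.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical sketch: History knows nothing about what an action does.
interface Action {
    void redo();
    void undo();
}

class History {
    private final Deque<Action> undoStack = new ArrayDeque<>();
    private final Deque<Action> redoStack = new ArrayDeque<>();

    // Records an action that has already been performed.
    void addAction(Action action) {
        undoStack.push(action);
        redoStack.clear();
    }

    // Reverts the most recent action, if there is one.
    void undo() {
        if (undoStack.isEmpty()) return;
        Action action = undoStack.pop();
        action.undo();
        redoStack.push(action);
    }

    // Reapplies the most recently undone action, if there is one.
    void redo() {
        if (redoStack.isEmpty()) return;
        Action action = redoStack.pop();
        action.redo();
        undoStack.push(action);
    }
}

// One concrete action: appending text to a buffer.
class AppendAction implements Action {
    private final StringBuilder buffer;
    private final String text;

    AppendAction(StringBuilder buffer, String text) {
        this.buffer = buffer;
        this.text = text;
    }

    public void redo() { buffer.append(text); }

    public void undo() { buffer.setLength(buffer.length() - text.length()); }
}
```

History only calls undo and redo through the interface, so the same class could serve any application that supplies its own Action implementations.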
The best way to do this is by designing the normal case in a way that automatically handles the edge conditions without any extra code.
The selection handling code can be simplified by eliminating the “no selection” special case, so that the selection always exists. When there is no selection visible on the screen, it can be represented internally with an empty selection, whose starting and ending positions are the same.
With this approach, the selection management code can be written without any checks for “no selection”. When copying the selection, if the selection is empty then 0 bytes will be inserted at the new location.
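A minimal sketch of the empty-selection idea, assuming a hypothetical Selection class over a plain String (the names and the String-based text model are not from the original):

```java
// Hypothetical selection type: "no selection" is simply start == end.
class Selection {
    private final int start;
    private final int end;

    Selection(int start, int end) {
        this.start = start;
        this.end = end;
    }

    boolean isEmpty() { return start == end; }

    // Returns the selected text; an empty selection yields "".
    String copyFrom(String text) {
        return text.substring(start, end);
    }

    // Returns text with the selection removed; nothing is removed
    // when the selection is empty.
    String deleteFrom(String text) {
        return text.substring(0, start) + text.substring(end);
    }
}
```

Neither copyFrom nor deleteFrom checks isEmpty: the normal-case code already does the right thing when start equals end.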

Chapter 7. Different Layer, Different Abstraction

If a system contains adjacent layers with similar abstractions, this is a red flag that suggests a problem with the class decomposition.
A pass-through method is one that does little except invoke another method, whose signature is similar or identical to that of the calling method. For example, a student project implementing a GUI text editor contained a class consisting almost entirely of pass-through methods.
A pass-through method is one that does nothing except pass its arguments to another method, usually with the same API as the pass-through method. This typically indicates that there is not a clean division of responsibility between the classes.
Pass-through methods make classes shallower: they increase the interface complexity of the class, which adds complexity, but they don’t increase the total functionality of the system.
Pass-through methods indicate that there is confusion over the division of responsibility between classes.
Having methods with the same signature is not always bad. The important thing is that each new method should contribute significant functionality. Pass-through methods are bad because they contribute no new functionality.
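For illustration, a pass-through method might look like the sketch below. Both classes and their methods are hypothetical stand-ins rather than the student project's actual code.

```java
// Hypothetical classes illustrating a pass-through method.
class TextArea {
    private final StringBuilder text = new StringBuilder();

    void insertString(String s, int offset) { text.insert(offset, s); }

    String contents() { return text.toString(); }
}

class TextDocument {
    private final TextArea textArea = new TextArea();

    // Pass-through: same signature as TextArea.insertString and no added
    // functionality. It widens TextDocument's interface for nothing.
    void insertString(String s, int offset) {
        textArea.insertString(s, offset);
    }

    String contents() { return textArea.contents(); }
}
```

Possible remedies include exposing TextArea to callers directly, moving the functionality into one of the two classes, or merging them.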
When several methods provide different implementations of the same interface, it reduces cognitive load. Once you have worked with one of these methods, it’s easier to work with the others, since you don’t need to learn a new interface. Methods like this are usually in the same layer and they don’t invoke each other.
The motivation for decorators is to separate special-purpose extensions of a class from a more generic core. However, decorator classes tend to be shallow: they introduce a large amount of boilerplate for a small amount of new functionality. Decorator classes often contain many pass-through methods. It’s easy to overuse the decorator pattern, creating a new class for every small new feature. This results in an explosion of shallow classes, such as the Java I/O example.
Could you add the new functionality directly to the underlying class, rather than creating a decorator class? This makes sense if the new functionality is relatively general-purpose, or if it is logically related to the underlying class, or if most uses of the underlying class will also use the new functionality.
Another form of API duplication across layers is a pass-through variable, which is a variable that is passed down through a long chain of methods.
Eliminating pass-through variables can be challenging. One approach is to see if there is already an object shared between the topmost and bottommost methods.
The solution I use most often is to introduce a context object as in Figure 7.2(d). A context stores all of the application’s global state (anything that would otherwise be a pass-through variable or global variable).
The context object unifies the handling of all system-global information and eliminates the need for pass-through variables. If a new variable needs to be added, it can be added to the context object; no existing code is affected except for the constructor and destructor for the context.
Contexts are far from an ideal solution. The variables stored in a context have most of the disadvantages of global variables.
Contexts may also create thread-safety issues; the best way to avoid problems is for variables in a context to be immutable. Unfortunately, I haven’t found a better solution than contexts.
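A small sketch of a context object with immutable fields follows. The field names (a certificate path and a timeout) and the Connection class are assumptions chosen for illustration.

```java
// Hypothetical context object; the field names are assumptions.
final class Context {
    private final String certificatePath;
    private final int timeoutMillis;

    Context(String certificatePath, int timeoutMillis) {
        this.certificatePath = certificatePath;
        this.timeoutMillis = timeoutMillis;
    }

    String certificatePath() { return certificatePath; }

    int timeoutMillis() { return timeoutMillis; }
}

// A low-level class keeps a reference to the context instead of
// receiving each value through a chain of method arguments.
class Connection {
    private final Context context;

    Connection(Context context) {
        this.context = context;
    }

    String describe() {
        return "cert=" + context.certificatePath()
                + " timeout=" + context.timeoutMillis() + "ms";
    }
}
```

Because every field is final and set once in the constructor, a Context built this way can be shared between threads without locking.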

Chapter 8. Pull Complexity Downwards

Another way of expressing this idea is that it is more important for a module to have a simple interface than a simple implementation.
As a developer, it’s tempting to behave in the opposite fashion: solve the easy problems and punt the hard ones to someone else.
Approaches like these will make your life easier in the short term, but they amplify complexity, so that many people must deal with a problem, rather than just one person.
For example, if a class throws an exception, every caller of the class will have to deal with it. If a class exports configuration parameters, every system administrator in every installation will have to learn how to set them.
However, configuration parameters also provide an easy excuse to avoid dealing with important issues and pass them on to someone else. In many cases, it’s difficult or impossible for users or administrators to determine the right values for the parameters.
Before exporting a configuration parameter, ask yourself: “will users (or higher-level modules) be able to determine a better value than we can determine here?” When you do create configuration parameters, see if you can provide reasonable defaults, so users will only need to provide values under exceptional conditions.
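One way to avoid exporting a parameter is to have the module compute the value itself. The sketch below assumes a hypothetical RetryTimer that derives a retry interval from observed response times; the doubling rule and the 1000 ms starting value are arbitrary choices for illustration.

```java
// Hypothetical timer that picks its own retry interval instead of
// asking an administrator to configure one.
class RetryTimer {
    // Assumed starting value, used only until a response has been observed.
    private static final long INITIAL_INTERVAL_MILLIS = 1000;

    private long totalMillis = 0;
    private int samples = 0;

    // Records how long one successful request took.
    void recordResponseTime(long millis) {
        totalMillis += millis;
        samples++;
    }

    // Retry interval: twice the average observed response time, or the
    // initial value before any response has been observed.
    long retryIntervalMillis() {
        if (samples == 0) return INITIAL_INTERVAL_MILLIS;
        return 2 * totalMillis / samples;
    }
}
```

The module has measurements its users do not, so in this sketch it is better placed than they are to choose the value.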

Chapter 9. Better Together Or Better Apart

Some complexity comes just from the number of components: the more components, the harder to keep track of them all and the harder to find a desired component within the large collection.
They share information; for example, both pieces of code might depend on the syntax of a particular type of document.
They are used together: anyone using one of the pieces of code is likely to use the other as well.
This approach is most effective if the repeated code snippet is long and the replacement method has a simple signature. If the snippet is only one or two lines long, there may not be much benefit in replacing it with a method call.
The cursor implementation also got simpler because the cursor position was represented directly, rather than indirectly through a selection and a boolean.
In combination, the cursor position should be determined by the position, and whether it is selected.
The logging methods were highly dependent on their invocations: someone reading the invocation would most likely flip over to the logging method to make sure that the right information was being logged.
However, length by itself is rarely a good reason for splitting up a method.
In general, developers tend to break up methods too much. You shouldn’t break up a method unless it makes the overall system simpler.
Methods containing hundreds of lines of code are fine if they have a simple signature and are easy to read. These methods are deep (lots of functionality, simple interface), which is good.
When designing methods, the most important goal is to provide clean abstractions. Each method should do one thing and do it completely.
If you make a split of this form and then find yourself flipping back and forth between the parent and child to understand how they work together, that is a red flag (“Conjoined Methods”) indicating that the split was probably a bad idea.
If the caller has to invoke each of the separate methods, passing state back and forth between them, then splitting is not a good idea.
If you can’t understand the implementation of one method without also understanding the implementation of another, that’s a red flag.
I agree that shorter functions are generally easier to understand than longer ones. However, once a function gets down to a few dozen lines, further reductions in size are unlikely to have much impact on readability. A more important issue is: does breaking up a function reduce the overall complexity of the system?

Chapter 10. Define Errors Out Of Existence

Large systems have to deal with many exceptional conditions, particularly if they are distributed or need to be fault-tolerant. Exception handling can account for a significant fraction of all the code in a system.
The first approach is to move forward and complete the work in progress in spite of the exception. For example, if a network packet is lost, it can be resent; if data is corrupted, perhaps it can be recovered from a redundant copy.
The second approach is to abort the operation in progress and report the exception upwards. However, aborting can be complicated because the exception may have occurred at a point where system state is inconsistent (a data structure might have been partially initialized).
To prevent an unending cascade of exceptions, the developer must eventually find a way to handle exceptions without introducing more exceptions.
Bugs can go undetected for a long time, and when the exception handling code is finally needed, there’s a good chance that it won’t work (one of my favorite sayings: “code that hasn’t been executed doesn’t work”).
A recent study found that more than 90% of catastrophic failures in distributed data-intensive systems were caused by incorrect error handling.
The exceptions thrown by a class are part of its interface; classes with lots of exceptions have complex interfaces, and they are shallower than classes with fewer exceptions.
The best way to reduce the complexity damage caused by exception handling is to reduce the number of places where exceptions have to be handled.
The Java substring method would be easier to use if it performed this adjustment automatically, so that it implemented the following API: “returns the characters of the string (if any) with index greater than or equal to beginIndex and less than endIndex.” This is a simple and natural API, and it defines the IndexOutOfBoundsException exception out of existence.
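The proposed API can be sketched as a small wrapper around String.substring. The LenientStrings class name is an assumption; the clamping logic is one straightforward way to implement the quoted contract.

```java
// Hypothetical helper implementing the forgiving substring contract.
final class LenientStrings {
    private LenientStrings() {}

    // Returns the characters of s (if any) with index greater than or
    // equal to beginIndex and less than endIndex. Out-of-range indexes
    // are clamped to the string's bounds instead of throwing.
    static String substring(String s, int beginIndex, int endIndex) {
        int begin = Math.max(0, Math.min(beginIndex, s.length()));
        int end = Math.max(begin, Math.min(endIndex, s.length()));
        return s.substring(begin, end);
    }
}
```

With this version, a range that lies partly or wholly outside the string yields the overlapping characters or an empty string, so callers need no bounds checks and no exception handler.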
Overall, the best way to reduce bugs is to make software simpler.
The second technique for reducing the number of places where exceptions must be handled is exception masking. With this approach, an exceptional condition is detected and handled at a low level in the system, so that higher levels of software need not be aware of the condition.
The idea behind exception aggregation is to handle many exceptions with a single piece of code; rather than writing distinct handlers for many individual exceptions, handle them all in one place with a single handler.
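Exception aggregation could be sketched as a dispatcher that catches errors from every handler in one place. The Dispatcher class, the string-based handlers, and the error format are all assumptions made for this example.

```java
import java.util.Map;
import java.util.function.Function;

// Hypothetical dispatcher: handlers throw freely, and one catch block
// turns any failure into an error response.
class Dispatcher {
    private final Map<String, Function<String, String>> handlers;

    Dispatcher(Map<String, Function<String, String>> handlers) {
        this.handlers = handlers;
    }

    // Runs the handler registered for url. This is the single place
    // where exceptions from any handler are dealt with.
    String dispatch(String url, String arg) {
        try {
            Function<String, String> handler = handlers.get(url);
            if (handler == null) {
                throw new IllegalArgumentException("no handler for " + url);
            }
            return handler.apply(arg);
        } catch (RuntimeException e) {
            return "error: " + e.getMessage();
        }
    }
}
```

Individual handlers contain no error-handling code at all; a missing handler and a malformed argument both flow to the same catch block.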
The fourth technique for reducing complexity related to exception handling is to crash the application. In most applications there will be certain errors that are not worth trying to handle.
Applications using the module had no way to find out if messages were lost or a peer server failed; without this information, it was impossible to build robust applications. In this case, it is essential for the module to expose the exceptions, even though they add complexity to the module’s interface.

Chapter 11. Design it Twice

Rather than picking the first idea that comes to mind, consider several possibilities.
Even if you are certain that there is only one reasonable approach, consider a second design anyway, no matter how bad you think it will be.
If you want to get really great results, you have to consider a second possibility, or perhaps a third, no matter how smart you are. The design of large software systems falls in this category: no one is good enough to get it right on their first try.

Chapter 12. Why Write Comments? The Four Excuses

The process of writing comments, if done correctly, will actually improve a system’s design. Conversely, a good software design loses much of its value if it is poorly documented.
Good comments can make a big difference in the overall quality of software; it isn’t hard to write good comments; and (this may be hard to believe) writing comments can actually be fun.
In order to understand the behavior of the top-level method, readers will probably need to understand the behaviors of the nested methods. For large systems it isn’t practical for users to read the code to learn the behavior.
If users must read the code of a method in order to use it, then there is no abstraction: all of the complexity of the method is exposed.
Comments allow us to capture the additional information that callers need, thereby completing the simplified view while hiding implementation details.
If you allow documentation to be de-prioritized, you’ll end up with no documentation.
The overall idea behind comments is to capture information that was in the mind of the designer but couldn’t be represented in the code.
Documentation can reduce cognitive load by providing developers with the information they need to make changes and by making it easy for developers to ignore information that is irrelevant.

Chapter 13. Comments Should Describe Things that Aren’t Obvious from the Code

The guiding principle for comments is that comments should describe things that aren’t obvious from the code.
Developers should be able to understand the abstraction provided by a module without reading any code other than its externally visible declarations.
The most common reason is that the comments repeat the code: all of the information in the comment can easily be deduced from the code next to the comment.
A first step towards writing good comments is to use different words in the comment from those in the name of the entity being described. Pick words for the comment that provide additional information about the meaning of the entity, rather than just repeating its name.
Comments augment the code by providing information at a different level of detail.
The most common problem with comments for variables is that the comments are too vague. Here are two examples of comments that aren’t precise enough:
In the first example, it’s not clear what “current” means. In the second example, it’s not clear that the keys in the TreeMap are line widths and values are occurrence counts.
When documenting a variable, think nouns, not verbs. In other words, focus on what the variable represents, not how it is manipulated.
If you want code that presents good abstractions, you must document those abstractions with comments.
If interface comments must also describe the implementation, then the class or method is shallow. This means that the act of writing comments can provide clues about the quality of a design; Chapter 15 will return to this idea.
If there are any preconditions that must be satisfied before a method is invoked, these must be described (perhaps some other method must be invoked first; for a binary search method, the list being searched must be sorted).
For each piece of information given below, ask yourself whether a developer needs to know that information in order to use the class (my answers to the questions are at the end of the chapter):
The main goal of implementation comments is to help readers understand what the code is doing (not how it does it).
The goal of comments is to ensure that the structure and behavior of the system is obvious to readers, so they can quickly find the information they need and make modifications to the system with confidence that they will work.
When following the rule that comments should describe things that aren’t obvious from the code, “obvious” is from the perspective of someone reading your code for the first time (not you).

Chapter 14. Choosing Names

If it’s hard to find a simple name for a variable or method that creates a clear image of the underlying object, that’s a hint that the underlying object may not have a clean design.

Chapter 15. Write The Comments First

The best time to write comments is at the beginning of the process, as you write the code. Writing the comments first makes documentation part of the design process.
If a method or variable requires a long comment, it is a red flag that you don’t have a good abstraction.
Writing the comments first will mean that the abstractions will be more stable before you start writing code. This will probably save time during coding. In contrast, if you write the code first, the abstractions will probably evolve as you code, which will require more code revisions than the comments-first approach.

Chapter 16. Modifying Existing Code

The design of a mature system is determined more by changes made during the system’s evolution than by any initial conception.
A typical mindset is “what is the smallest possible change I can make that does what I need?” Sometimes developers justify this because they are not comfortable with the code being modified; they worry that larger changes carry a greater risk of introducing new bugs.
Each one of these minimal changes introduces a few special cases, dependencies, or other forms of complexity. As a result, the system design gets just a bit worse, and the problems accumulate with each step in the system’s evolution.
Ideally, when you have finished with each change, the system will have the structure it would have had if you had designed it from the start with that change in mind.
Ask yourself “Is this the best I can possibly do to create a clean system design, given my current constraints?”
If information is already documented someplace outside your program, don’t repeat the documentation inside the program; just reference the external documentation.

Chapter 17. Consistency

Don’t change existing conventions. Resist the urge to “improve” on existing conventions. Having a “better idea” is not a sufficient excuse to introduce inconsistencies.

Chapter 18. Code Should Be Obvious

Event-driven programming makes it hard to follow the flow of control. The event handler functions are never invoked directly; they are invoked indirectly by the event module, typically using a function pointer or interface.
To compensate for this obscurity, use the interface comment for each handler function to indicate when it is invoked.
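A hypothetical sketch of such an interface comment is shown here. The ConnectionListener and EventLoop names are invented for illustration and do not come from the book.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical handler interface for connection events.
interface ConnectionListener {
    /**
     * Invoked by EventLoop.fireClosed, on the thread that calls it,
     * once for each registered listener after the named connection
     * has been closed.
     */
    void onClosed(String connectionName);
}

// Hypothetical event module that invokes handlers indirectly.
class EventLoop {
    private final List<ConnectionListener> listeners = new ArrayList<>();

    void register(ConnectionListener listener) {
        listeners.add(listener);
    }

    // Notifies every registered listener that the connection has closed.
    void fireClosed(String connectionName) {
        for (ConnectionListener listener : listeners) {
            listener.onClosed(connectionName);
        }
    }
}
```

A reader who lands on onClosed cannot find its caller by searching for direct invocations of an implementation, so the comment states who calls it and when.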
If you need a container, define a new class or structure that is specialized for the particular use.
Software should be designed for ease of reading, not ease of writing.
If code is nonobvious, that usually means there is important information about the code that the reader does not have.

Chapter 19. Software Trends

For instance, it may be possible to use small helper classes to implement the shared functionality. Rather than inheriting functions from a parent, the original classes can each build upon the features of the helper classes.
One of the risks of agile development is that it can lead to tactical programming. Agile development tends to focus developers on features, not abstractions, and it encourages developers to put off design decisions in order to produce working software as soon as possible.
Tests, particularly unit tests, play an important role in software design because they facilitate refactoring. As a result, developers avoid refactoring in systems without good test suites; they try to minimize the number of code changes for each new feature or bug fix, which means that complexity accumulates and design mistakes don’t get corrected.
Although I am a strong advocate of unit testing, I am not a fan of test-driven development. The problem with test-driven development is that it focuses attention on getting specific features working, rather than finding the best design.
Don’t try to force a problem into a design pattern when a custom approach will be cleaner.
Although it may make sense to use getters and setters if you must expose instance variables, it’s better not to expose instance variables in the first place. Exposed instance variables mean that part of the class’s implementation is visible externally, which violates the idea of information hiding and increases the complexity of the class’s interface.
