Keep your source code SIMPLE
As software developers we are fortunate to have many useful best practices for productive and fun coding like the SOLID principles, GRASP patterns, or STUPID anti-patterns. These principles are timeless and apply to many forms of software development, no matter which programming paradigm or language you use.
Here is one more set of best practices that I have found useful for large-scale and long-term software development and maintenance. They are aptly named the SIMPLE principles:
- Strong-enough data: avoid stringly-typed data structures
- Immutable where possible: avoid unexpected mutability
- Misuse-proof APIs: make it impossible to use your APIs the wrong way
- Pure logic: separate processing from side-effects
- Lean components: avoid large and complex pieces of code
- Expressive errors: provide helpful error messages
Before we get started, let’s remember Uncle Bob’s wise words about such principles:
They are not rules, laws, nor perfect truths. They are statements on the order of “an apple a day keeps the doctor away”. They give a name to a concept so that we can talk and reason about it. They provide a place to hang the feelings we have about good and bad code. They attempt to categorize those feelings into concrete advice.
In other words, the purpose of these principles is to build up a healthy intuition for finding and cleaning up smelly code. There are legitimate reasons not to follow any of these rules in certain situations.
Another important aspect to keep in mind is that the SIMPLE principles make the overall development process — maintaining, debugging, refactoring, and adding features with limited knowledge of how the entire system works — simpler, at the expense of making your codebase less simple. I’d argue that this is often a worthwhile tradeoff.
Strong-enough data types
Even with a strong static type system, you can end up with lots of different meanings for basic types, which makes it easy to mix them up. Joel Spolsky provides convincing examples of this problem in his excellent piece Making Wrong Code Look Wrong: in the Excel codebase, there are at least a few dozen possible meanings for an integer. It could be:
- a row or column number
- a horizontal or vertical coordinate relative to the layout or window
- the difference between two horizontal or vertical coordinates (a width or height)
- a count of bytes (an offset)
Only certain kinds of integers should be combined. For example, adding a width relative to the layout to a horizontal offset relative to the layout makes sense. Adding a horizontal offset to a vertical offset, or mixing offsets relative to the window and the layout, is most likely a bug. Having so many meanings for integers makes your type system weaker than you might think. The Excel team, being limited to what C++ can do, addresses this by encoding the domain-specific meaning of variables in their names via Hungarian Notation. If you use a more flexible type system, you can instead define dedicated domain-specific types. In Go, this looks like:
type LayoutHeight int
You want to define dedicated new types that require an explicit conversion, not type aliases or typedefs (in Go, an alias would be written type LayoutHeight = int and would be freely interchangeable with int). But don’t over-engineer your type system: do this only when the benefit of the additional consistency exceeds the cost of the increased code complexity.
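TypeScript has no direct equivalent of Go’s defined types: its type aliases are structural, so two aliases of number are interchangeable. A common community workaround, sketched here under the assumption that a compile-time-only check is enough, is a “branded” type: an intersection with a phantom property that never exists at runtime. All type and helper names below are hypothetical.

```typescript
// Hypothetical "branded" number types. Plain aliases such as
// `type LayoutX = number` would be interchangeable and catch no mix-ups;
// the phantom __brand property (never set at runtime) makes each type
// distinct to the compiler.
type LayoutX = number & { readonly __brand: "LayoutX" }
type LayoutWidth = number & { readonly __brand: "LayoutWidth" }
type WindowX = number & { readonly __brand: "WindowX" }

// The sanctioned way to create a value of each type is an explicit cast.
const layoutX = (n: number) => n as LayoutX
const layoutWidth = (n: number) => n as LayoutWidth
const windowX = (n: number) => n as WindowX

// Adding a layout-relative width to a layout-relative x coordinate is valid.
function rightEdge(x: LayoutX, width: LayoutWidth): LayoutX {
  return layoutX(x + width)
}

console.log(rightEdge(layoutX(10), layoutWidth(25))) // 35

// Mixing coordinate systems is rejected at compile time:
// rightEdge(windowX(10), layoutWidth(25))
// compile error: WindowX is not assignable to LayoutX
```

The brand costs nothing at runtime, but like the Go version it adds ceremony at every conversion point, so the same cost-benefit caveat applies.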
Immutable where possible
Shared mutable data is a hotspot for bugs and concurrency problems. The best solution to this problem is fully immutable data. But that’s not always possible or might come with undesirable tradeoffs. In those situations it can be helpful to allow mutability but control which mutations are allowed.
As an example, consider a StringBuilder component that allows building up a longer string out of smaller pieces. Using a fully mutable variable here would allow users to accidentally change already accumulated text. This can lead to bugs that are hard to track down. So let’s only allow appending new text. Here is an example implementation in TypeScript:
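class StringBuilder {
  // readonly: the array itself can never be swapped out for another one
  private readonly pieces: Array<string> = []

  // adds text to the end; already accumulated text cannot be changed
  append(text: string) {
    this.pieces.push(text)
  }

  // returns the text accumulated so far
  toString() {
    return this.pieces.join("")
  }
}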
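For comparison, here is a minimal sketch of the fully immutable alternative (the class name ImmutableStringBuilder is hypothetical): every append returns a new builder and leaves the old one untouched. This rules out accidental changes entirely, but it copies all pieces on every call, which is one of the undesirable tradeoffs that can make controlled mutability the better choice.

```typescript
// Hypothetical fully immutable builder: append never modifies this instance.
class ImmutableStringBuilder {
  private constructor(private readonly pieces: ReadonlyArray<string>) {}

  static empty(): ImmutableStringBuilder {
    return new ImmutableStringBuilder([])
  }

  // Copies the existing pieces into a new array: O(n) per append.
  append(text: string): ImmutableStringBuilder {
    return new ImmutableStringBuilder([...this.pieces, text])
  }

  toString(): string {
    return this.pieces.join("")
  }
}

const hello = ImmutableStringBuilder.empty().append("Hello")
const greeting = hello.append(", world")
console.log(hello.toString())    // Hello
console.log(greeting.toString()) // Hello, world
```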
Misuse-proof APIs
Poorly designed APIs make it easy to use them the wrong way and lead to unhappy users and pressure on customer support. Here is a hypothetical API to download a file that is easy to use wrong:
client = new http.Client()
client.setHeader("foo", "bar")
client.setCredentials(myCreds)
client.request("https://acme.corp/info.txt")
if (client.success()) {
return client.receivedText("utf8")
}
Several things are easy to get wrong here:
- This class requires calling its methods in a particular order. For example, before starting the download, the user must call setHeader and setCredentials to configure this client. You’d better not forget this!
- It’s not clear what happens if you call setCredentials twice. Does it use both values? If not, which of the two credentials does it use?
- Some state, like the received text or whether the request was a success, is visible to the user right away but only meaningful after the download has completed. Examining this state earlier is invalid, but this API allows it.
- We could start a second download using the same client. Does this work? Do both requests run in parallel? Does it abort one of them? If so, which one? Does it reuse the credentials and header values from the first call? If yes, how does one unset one of them for the second call?
- What are the other possible text encodings besides "utf8"?
Good APIs make it natural to use them the right way by exposing only controls that perform valid operations. Rather than throwing errors or exceptions at users when they misuse an API, it is better to prevent these issues from happening in the first place. Here is an example of a more “idiot-proof” file download API:
response = http.downloadFile({
url: "https://acme.corp/info.txt",
headers: { foo: "bar" },
credentials: myCreds
})
if (response.success()) {
return response.receivedText(encoding.UTF8)
}
This new API exposes just one function to perform a download. It is obvious which configuration it uses for this download: the one it receives as arguments. It is also obvious how to perform another download: call the downloadFile function again with whatever configuration you want, and you get a new Response containing the outcome of that download. Users only see responses once they can do something with them, that is, when the download is complete. All fields in the response are meaningful and populated. Possible text encodings are type-checked enums or constants.
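The following is a minimal TypeScript sketch of such an API, not a real library. TextEncoding, DownloadOptions, DownloadResponse, Transport, and fakeTransport are hypothetical names, and downloadFile takes an extra transport parameter (absent from the API above) only so that the example runs without a network. Because downloadFile returns a Promise, callers cannot reach the response before the download has finished.

```typescript
// Hypothetical set of supported encodings; the values are TextDecoder labels.
enum TextEncoding {
  UTF8 = "utf-8",
  UTF16LE = "utf-16le",
}

// All configuration for one download, passed in a single immutable object.
interface DownloadOptions {
  readonly url: string
  readonly headers?: Readonly<Record<string, string>>
  readonly credentials?: { readonly user: string; readonly password: string }
}

// Created only after a download has finished, so every field is populated.
class DownloadResponse {
  constructor(
    readonly status: number,
    private readonly body: Uint8Array,
  ) {}

  success(): boolean {
    return this.status >= 200 && this.status < 300
  }

  receivedText(encoding: TextEncoding): string {
    return new TextDecoder(encoding).decode(this.body)
  }
}

// Hypothetical stand-in for the network layer.
type Transport = (options: DownloadOptions) => Promise<{ status: number; body: Uint8Array }>

// The single entry point: one call, one configuration, one response.
async function downloadFile(options: DownloadOptions, transport: Transport): Promise<DownloadResponse> {
  const { status, body } = await transport(options)
  return new DownloadResponse(status, body)
}

// A fake transport that always "downloads" the text hello.
const fakeTransport: Transport = async () => ({
  status: 200,
  body: new TextEncoder().encode("hello"),
})

downloadFile({ url: "https://acme.corp/info.txt", headers: { foo: "bar" } }, fakeTransport).then((response) => {
  if (response.success()) {
    console.log(response.receivedText(TextEncoding.UTF8)) // hello
  }
})
```

A production version might call the standard fetch function inside downloadFile instead of taking a transport argument; the shape of the API the caller sees would stay the same.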
Pure logic
Pure functions are functions that only use their arguments to determine their result and have no side effects, that is, they don’t interact with external variables, databases, files, or the network.
You want to make most of your logic pure. It requires fewer tests since it always produces the same output for the same input, no matter what state the rest of your system is in. Tests for it require less setup because it doesn’t interact with external dependencies. You can cache and reuse the results of pure logic and run it concurrently.
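To illustrate the caching point, here is a minimal memoization sketch (memoize and countPrimesBelow are hypothetical names). Wrapping a function in a cache is only safe because a pure function’s result depends on nothing but its argument.

```typescript
// Wraps a pure single-argument function with a cache of previous results.
function memoize<A, R>(fn: (arg: A) => R): (arg: A) => R {
  const cache = new Map<A, R>()
  return (arg: A) => {
    if (!cache.has(arg)) {
      cache.set(arg, fn(arg))
    }
    return cache.get(arg) as R
  }
}

// Pure: the same n always yields the same count, and nothing outside is touched.
function countPrimesBelow(n: number): number {
  let count = 0
  for (let i = 2; i < n; i++) {
    let isPrime = true
    for (let d = 2; d * d <= i; d++) {
      if (i % d === 0) {
        isPrime = false
        break
      }
    }
    if (isPrime) count++
  }
  return count
}

const cachedCountPrimesBelow = memoize(countPrimesBelow)
console.log(cachedCountPrimesBelow(100)) // 25 (computed)
console.log(cachedCountPrimesBelow(100)) // 25 (served from the cache)
```

Memoizing an impure function, for example one that reads a file, would silently keep returning stale results after the file changes.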
A pragmatic approach to integrating pure logic into larger systems is Gary Bernhardt’s functional core, imperative shell architecture.
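In that architecture, a pure core makes all the decisions and a thin imperative shell around it performs the I/O. Here is a minimal sketch with hypothetical names, using an in-memory Map as a stand-in for a database: decideWithdrawal is the functional core, while withdraw is the shell that reads state, calls the core, and carries out the resulting writes and logging.

```typescript
// Functional core: pure decision logic, testable without any setup.
interface Account {
  readonly id: string
  readonly balance: number
}

type Decision =
  | { kind: "approve"; newBalance: number }
  | { kind: "reject"; reason: string }

function decideWithdrawal(account: Account, amount: number): Decision {
  if (amount <= 0) return { kind: "reject", reason: "amount must be positive" }
  if (amount > account.balance) return { kind: "reject", reason: "insufficient funds" }
  return { kind: "approve", newBalance: account.balance - amount }
}

// Imperative shell: all side effects live here. The Map is a hypothetical
// stand-in for a real database.
const db = new Map<string, Account>([["acc-1", { id: "acc-1", balance: 100 }]])

function withdraw(accountId: string, amount: number): void {
  const account = db.get(accountId) // side effect: read
  if (account === undefined) {
    console.log(`unknown account ${accountId}`)
    return
  }
  const decision = decideWithdrawal(account, amount) // pure
  if (decision.kind === "approve") {
    db.set(accountId, { ...account, balance: decision.newBalance }) // side effect: write
    console.log(`withdrew ${amount}, new balance ${decision.newBalance}`)
  } else {
    console.log(`rejected: ${decision.reason}`)
  }
}

withdraw("acc-1", 30)  // withdrew 30, new balance 70
withdraw("acc-1", 500) // rejected: insufficient funds
```

Only the small shell needs integration tests; every business rule can be tested by calling decideWithdrawal with plain values.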
Lean components
A maintainable codebase consists of files that don’t exceed 100 lines of actual code (ignoring comments and data), give or take depending on how chatty your programming language is. Bigger files likely do too many things, which makes it harder to work on them. Functions shouldn’t exceed 20 lines of actual code. But don’t take this principle too far and end up with ravioli code, where each function contains only one or two lines and the application logic is spread out too much.
Expressive error messages
Besides misuse-proof APIs, error messages are another opportunity to help guide your users towards calling your APIs correctly. Here is an example of a not particularly helpful error message:
Error: ENOENT, no such file or directory '~/foorc'
at server.js:13:9
at cli.js:27:11
Googling reveals that ENOENT is the POSIX error code for “no such file or directory”. To the user, it might not be clear why the application tries to access this file, why it couldn’t access it if it’s actually there, or what to do to make this error go away. The stack trace shows where inside your code this problem happened, but that isn’t helpful to users who want to use your software, not change it.
Your component, on the other hand, knows how it is supposed to be used, and it has just run into the problem the user caused. It has a lot of information about what went wrong and what should have been done instead. A more helpful error message could look something like this:
Error: cannot determine the application configuration:
problem reading the global application configuration file:
file ~/foorc exists but cannot open it.

Possible solutions:
* make this file readable by the FOO user
* provide a different configuration file via the --config parameter
* provide --no-config to use the default configuration values

More information at http://acme.corp/foo/configuration
This error message gives the user helpful insights into the nature of the problem, points to possible solutions, and links to the respective parts of the documentation for more background.
Making error messages useful can add a significant amount of complexity: you need to enrich errors with context as they bubble up the call stack. This extra complexity is a worthwhile investment because it makes your software easier to use, which leads to happier (and thereby more) users and less time and money spent on customer support.
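Here is a minimal sketch of that enrichment, assuming a hypothetical AppError class and renderError helper (neither is a standard API). Each layer catches the lower-level error and wraps it with what it was trying to do, and the outermost layer also attaches suggestions. The file-reading function is a stand-in that always fails.

```typescript
// Hypothetical error type: what this layer was trying to do, the
// lower-level error that stopped it, and optional suggestions for the user.
class AppError extends Error {
  constructor(
    readonly context: string,
    readonly underlying: unknown,
    readonly suggestions: readonly string[] = [],
  ) {
    super(context)
    this.name = "AppError"
    // Keeps instanceof working when compiling to older (ES5) targets.
    Object.setPrototypeOf(this, AppError.prototype)
  }
}

// Renders the chain from the outermost context down to the root cause,
// followed by all suggestions collected along the way.
function renderError(error: unknown): string {
  const lines: string[] = []
  const suggestions: string[] = []
  let current: unknown = error
  while (current instanceof AppError) {
    lines.push(current.context + ":")
    suggestions.push(...current.suggestions)
    current = current.underlying
  }
  lines.push(current instanceof Error ? current.message : String(current))
  if (suggestions.length > 0) {
    lines.push("", "Possible solutions:", ...suggestions.map((s) => "* " + s))
  }
  return lines.join("\n")
}

// Lowest layer: a stand-in for a file system call that fails.
function readConfigFile(path: string): string {
  throw new Error(`file ${path} exists but cannot open it`)
}

// Middle layer: adds what it was doing when the failure happened.
function loadGlobalConfig(): string {
  try {
    return readConfigFile("~/foorc")
  } catch (e) {
    throw new AppError("problem reading the global application configuration file", e)
  }
}

// Top layer: adds the overall goal and remedies the user can act on.
function determineConfig(): string {
  try {
    return loadGlobalConfig()
  } catch (e) {
    throw new AppError("cannot determine the application configuration", e, [
      "make this file readable by the FOO user",
      "provide a different configuration file via the --config parameter",
      "provide --no-config to use the default configuration values",
    ])
  }
}

try {
  determineConfig()
} catch (e) {
  console.log("Error: " + renderError(e))
}
```

Since ES2022, JavaScript also offers a standard slot for such chains via new Error(message, { cause }); a custom class like the one above is mainly useful for carrying extra data such as suggestions.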
Wrapping up
The SIMPLE principles bring some good ideas from functional programming to the architecture of any codebase and hopefully make creating high-quality software solutions simpler and more fun. Happy hacking!