The Skeptical Methodologist

Software, Rants and Management

To Abstract; or not Too Abstract

“All problems in computer science can be solved by another level of indirection.”

David Wheeler

“…except for the problem of too many layers of indirection.”

Kevlin Henney

When we abstract in software, we often attempt to build a new “thing” with underlying parts of old “things”, because we think it will be easier to reason using this new “thing”.

For instance, if we’re sitting down mucking with spark plugs, wheels, and crankshafts, sometimes it’s easier to just call the whole thing a car and deal with it on that level.

An abstraction like this is a little mental machine or tool. We can use it to think in higher level terms and ‘chunk’ complexity away. But there is a cost to this abstraction – it’s one more thing someone new to the code has to learn.

The Cost of Abstraction

Abstraction has costs. Let’s take two five-line bits of duplicated code.

foo x = y(); //line 1
x.baz1(); //line 2
x.baz2(); //line 3
x.baz3(); //line 4
x.baz4(); //line 5

//then, somewhere else in our code…
foo x = y(); //line 6
x.baz1(); //line 7
x.baz2(); //line 8
x.baz3(); //line 9
x.baz4(); //line 10

I’m going to put this all into a common function.

foo bar() { //line 1
    foo x = y(); //line 2
    x.baz1(); //line 3
    x.baz2(); //line 4
    x.baz3(); //line 5
    x.baz4(); //line 6
    return x; } //line 7

foo x = bar(); //line 8

//then, somewhere else in our code…
foo x = bar(); //line 9

Now, by line count, which is one measure of complexity, I’ve made the problem less complex as I’ve saved a single line. Notice that if the duplicated code were any smaller, the raw line count overhead of refactoring it out would have actually broken even or made things worse.
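To see where that break-even point falls, here is a rough count. It assumes the layout used above: one line for the function signature, one for the return with its closing brace, and one line per call site. The symbols n (duplicated lines per copy) and k (number of copies) are shorthand for this sketch only, not a standard metric.

```latex
\begin{align*}
L_{\text{inline}} &= k\,n \\
L_{\text{refactored}} &= (n + 2) + k \\
L_{\text{inline}} - L_{\text{refactored}} &= (k - 1)(n - 1) - 3
\end{align*}
```

With n = 5 and k = 2 the saving is 1 line (10 versus 9), matching the example. With n = 4 it is 0, and with n = 3 the refactored version comes out one line longer.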

But I’ve added a bit of complexity, which I’ll call distance complexity. Size complexity is the raw number of lines. Distance complexity is how far apart related pieces of code sit: whether what you need to read is local or somewhere else. Functions like the above deliberately introduce some distance so that code can be shared.

Think about it this way: if more complex code is harder to understand, you can use these heuristics. Size metrics, like the raw number of lines of code, measure how much code is in one tab of your IDE. Distance metrics, like how many functions are called from a given line of code, measure how many tabs of your IDE you’d need open.

Viewed this way, inline code is simpler in that you’d need fewer tabs open to read it.

Good Design Helps

Of course, there are design principles in play that reduce complexity beyond the question of whether and when to refactor duplication. Good naming, documentation, strong types and testing all make a function call less complex to a reader.

double do_it(double arg1, double arg2, double arg3, double arg4);

is more complex than

//Calculates the square root of the sum of squares, or Euclidean distance
Distance euclidian_distance(Point start, Point end);

But why? Let’s think about this for a second. Let’s break programming into the things I need to know about this problem specifically to solve it (we’ll call them accidents), and the things I need to know about the problem generally to solve it (we’ll call these essentials).

In the case of the first function above, I need to know, specifically, that arg1 and arg2 are the x and y coordinates of one point, arg3 and arg4 are the x and y coordinates of another point, and the returned double is the distance between them. That’s specific or accidental knowledge.

In both cases above, I need to know how to calculate the Euclidean distance. That’s the general or essential knowledge.

When I read the first function, I have to decipher it first. I have to figure out what each variable means, and what on earth the function is doing. When I read the second function, all of that knowledge is already encoded for me in the types, documentation, variable names, and tests. Thus, it takes me less time to understand the second than the first, and that’s why good design principles such as naming and documentation result in less complex code.
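As a rough sketch of how the two declarations might be filled in, here are both side by side. The Point and Distance types are hypothetical (the post only names them), and do_it is assumed to take its arguments in x1, y1, x2, y2 order as described above.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical types: they exist only to carry meaning that bare
// doubles cannot.
struct Point {
    double x;
    double y;
};

struct Distance {
    double value;  // same units as the Point coordinates
};

// Calculates the square root of the sum of squares, or Euclidean distance.
Distance euclidian_distance(Point start, Point end) {
    const double dx = end.x - start.x;
    const double dy = end.y - start.y;
    return Distance{std::hypot(dx, dy)};
}

// Same arithmetic, but the reader has to work out which argument is which.
// Assumed order: arg1, arg2 = first point's x, y; arg3, arg4 = second's.
double do_it(double arg1, double arg2, double arg3, double arg4) {
    return std::hypot(arg3 - arg1, arg4 - arg2);
}

// Usage sketch: both versions agree on a 3-4-5 triangle.
void usage_example() {
    const Distance d = euclidian_distance(Point{0.0, 0.0}, Point{3.0, 4.0});
    assert(std::fabs(d.value - 5.0) < 1e-9);
    assert(std::fabs(do_it(0.0, 0.0, 3.0, 4.0) - 5.0) < 1e-9);
}
```

The essential knowledge (how to compute the distance) is identical in both bodies. Only the accidental knowledge moves: in the second version it lives in the types and names instead of in the reader’s head.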

Accidental Essence

So we’ve seen one technique for reducing accidental complexity – encoding. This is basically like taking good notes, for reference, and then referring to them over and over while trying to decipher a text. It makes it easier because you have a guide. In our case above, the text has already been ‘pre-translated’ using our encoding so you can read it directly.

But what if you do that long enough – have a simple reference, and some sort of esoteric text to translate, like a cryptographic code? Eventually, you’ll just remember what’s in the reference and not have to look at it as much. This is the second approach.

It works like loading a cache – each time you see an unfamiliar symbol, you look it up. The more times you look up any particular symbol, the more likely you are to remember it next time, allowing you to skip the lookup step.

This is the transference of accidental knowledge to essential knowledge. You may see code in your system that’s duplicated so much that you begin to think in terms of it. These bits of duplicated code no longer represent accidental complexity, because you’ve memorized them. You see them and you think “oh, we’re summing a list again”. You’ve created a function, in your head, that helps you understand that bit of code. You’ve ‘chunked’ these accidents into one essence: how to sum a list.

Refactoring this common code out into functions is just letting the language do it for you.
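Here is a small sketch of that step, using the list-summing example. The order_total functions and the int element type are made up for illustration; only the name sum_list comes from this post.

```cpp
#include <cassert>
#include <numeric>
#include <vector>

// The chunk, given a name so the language remembers it for the reader.
int sum_list(const std::vector<int>& values) {
    return std::accumulate(values.begin(), values.end(), 0);
}

// Before: the reader has to recognize the loop as "summing a list"
// every time it appears.
int order_total_before(const std::vector<int>& prices) {
    int total = 0;
    for (int price : prices) {
        total += price;
    }
    return total;
}

// After: the call site states the intent directly.
int order_total_after(const std::vector<int>& prices) {
    return sum_list(prices);
}

// Usage sketch: both versions compute the same total.
void usage_example() {
    const std::vector<int> prices{1, 2, 3};
    assert(order_total_before(prices) == 6);
    assert(order_total_after(prices) == 6);
}
```

The loop in order_total_before is all on the screen. The call in order_total_after is shorter, but only reads faster once sum_list is a chunk the reader already knows.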

Back To Cost

Again, with this metaphor, what are our costs? Well, it’s the cost to memorize what summing a list does. Certainly, if you have a function named ‘sum_list’, that’s easier to memorize, or load into the cache, than a more poorly named function.

But ultimately, you’re switching from ‘encoding’ to ‘chunking’ in terms of your complexity management scheme when you refactor redundant code. When the code is spread out, inlined, it’s all right there. You don’t need to refer to anything else to figure out what the code does; it’s all on your screen, hopefully well encoded for you.

When you refactor it out into a function, you’re creating a new chunk. New readers of your code won’t know what that chunk means, and they’re going to have to go read it to find out. That’s the cost of abstraction, right there: the time it takes them to load your chunk into their cache, because it’s not on the page right in front of them anymore.

As stated before, good design can shrink this chunking time, just as it can speed up encoding/decoding. But it’s not free to learn a new system.

Simple Vs. Complex: It depends!

So do we keep code relatively simple – perhaps good names, but mostly inlined, because we don’t want to burden our readers with chunking time on our functions and classes? Or do we push for abstraction, and take the benefits of chunking over time at some up front cost?

Of course, it depends. If your code is going to be read frequently by newer developers, they will pay the chunking penalty a lot, and not be able to take advantage of its long-term speed. They’ll be memorizing things over and over, and then never using them.

If your code is read frequently by senior developers, they’ll pay the chunking penalty once, then be able to reap the rewards over and over, speeding up their understanding and productivity in your code base.

Actually, it doesn’t depend at all. Abstract!

The problem with the ‘it depends’ above is that we compare a shop with mostly new developers – maybe they move on to other projects frequently or there’s a high turnover rate – with a shop of mostly senior developers, those who have been around awhile.

But the productivity benefits of seniority (not years of experience in general, but years of experience on this project) are MONUMENTAL. A developer who’s been in the weeds on a project for two years is going to be many, many times more productive than someone just starting.

Managers should always manage to lower turnover rates, and should always be trying to keep people growing their knowledge base, rather than scrapping it over and over and starting over.

With this in mind, you should also always emphasize the benefits of good abstraction. Now there are times to abstract and times to not abstract, and the benefits of documentation, naming, and the rest still help lower the “cache loading” time on any new code. But if you find someone arguing we need to keep things simple so that “anyone can work on it”, you’re working in a shop that’s specializing towards a newer developer – not you. You’re working in a shop okay with turnover, and that’s not where you want to be.


September 19, 2016
