The Skeptical Methodologist

Software, Rants and Management

When do I abstract?

“Premature Flexibilization is the Root of Whatever Evil is Left” takes a look at what I’ve also heard called ‘premature abstraction’, which, in today’s high-level languages, is probably on par with premature optimization as the cause of the most headaches in maintenance code.

It’s also a really funny title.

First, a note about culture.  I work mostly with EE’s (electrical engineers) who cut their teeth on Fortran and, after all these years, still swear by it.  I’ve mentioned the odd distrust between EE’s and computer science majors: engineers think scientists don’t know how to get real work done, while the scientists don’t believe engineers understand any of the complexity of what they build.  Ever find code that attempts to match a string by going through each character one by one against a switch statement?  That’s an engineer at work.  Ever work with an AbstractBuilderFactoryObserver that’s templated on five types?  That’s a scientist at work.  So as not to confuse these ‘scientists’ with real scientists, I’m going to call them designers.  Engineers thrive on seeing things work; designers thrive on designing them.  A designer’s work is done when she’s figured out how it should work; there’s no thrill in actually building it.  An engineer’s work is done when it’s built and he can test it.  He wants to see the capabilities of the thing when he’s done.

You can see that each of the major sins discussed so far, premature optimization and premature abstraction, also tends to be committed more by one community than the other.  Code that’s prematurely optimized tends to come from the engineer’s world, while prematurely abstracted code tends to come from the designer’s.  Each finds different reasons to defend what they should know to be wrong.  Frequently I’ll hear an engineer friend tell me that he has to code in such and such a way because he’s targeting an embedded system.  I’ll ask him for profiling evidence, or try to reason with him that the temporary variable he’s avoiding doesn’t appear to be in a tight loop.  I’ll point out that the compiler’s optimizer will catch what he’s trying to do by hand.  To no avail.  For the most part, he doesn’t trust compiler optimizations.
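For what it’s worth, the disputed ‘optimization’ usually looks something like this (a made-up Java sketch, not anyone’s actual code):

```java
class Billing {
    // Readable version: names the intermediate result.
    static double billWithTax(double subtotal, double taxRate) {
        double tax = subtotal * taxRate;   // the "wasteful" temporary
        return subtotal + tax;
    }

    // Hand-"optimized" version: avoids the temporary by inlining it.
    static double billWithTaxTerse(double subtotal, double taxRate) {
        return subtotal + subtotal * taxRate;
    }
}
```

Any optimizing compiler or JIT will keep that intermediate in a register either way; outside a tight loop the difference isn’t even measurable.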

The designer will claim that she’s building with maintainability in mind, and that the alternative is cowboy coding.  She’s being rigorous, planning for change.  You might ask, “What are the use cases for this abstraction?”, which usually yields something like “well, maybe we want to target a completely different OS.”  That’s true, maybe we DO want to target a different OS, but that’s not a change most projects take lightly.  More often than not, it’s some business rule that’s going to change.  The abstraction is going to cost time and energy for every developer who touches it, and most likely never pay off.

We all know (or we should) when to optimize.  At the design phase, we should be considering and analyzing use cases, choosing efficient data structures and algorithms.  We should draw on our experience in the field to predict certain hot spots, but not rely on this experience too much (in our rapidly changing field, experience has a pretty short shelf life).  We should be aware of free optimizations, or those that don’t sacrifice readability, when coding.  We should design with performance in mind, but realize that performance mostly comes from reducing work, rather than working more cleverly than normal.  When we finally have something built, we should stress test it to find hot spots, and then we should begin working on our ‘clever’ approaches.

But we’re rarely taught when to abstract.  This is because, for our happy-go-lucky computer scientists coming out of their Java schools, abstraction is still pretty fresh in everyone’s mind.  It took us about fifty years of being burned by prematurely optimized code to realize we should code first for readability and optimize later.  It is only now, with our gigantic frameworks of frameworks and systems of systems, that we’re realizing we’re creating huge spider-web messes of code.  Just look at the Eclipse framework – quite a marvel of design.  Anyone can write a plug-in and it WILL work, and you CAN extend or change virtually any part of it, but I’d claim that 90% of the parts of the code that CAN change DON’T in any extension.  Yet each part of the code that is modifiable exacts a cost on the maintenance of the system.

So when should we abstract?  When should we decide to use polymorphism or some other means to put a ‘point of inflection’ in our code, a point where we can extend it?  I suspect we can derive some simple rules (indeed, many already have, such as KISS and YAGNI) based more or less on the lessons of premature optimization.

First, in the design phase, for optimization, we take two things into account – we pick good data structures and algorithms that fit the expected usage, and we let experience be our guide (to an extent).  How would this apply to abstraction?  To me, it says that our first pass of abstraction, the one that takes place during the design phase, should attempt to model the points of inflection that exist in the domain model itself.  If the domain model actually contains the classic ‘animal’ base class, with cats and dogs as subtypes, then that polymorphism certainly belongs in the code as well.  How can experience guide us?  This is where patterns come in, and a designer who’s solved similar problems before may be able to offer some insight as to what he or she expects to change in the future.  BUT, just as experience has a short shelf life in optimization, it does in abstraction as well.  Design pattern Hell is the result of searching a little too hard for these, and ultimately, experience tells us how to build the last system, not the next one.
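As a minimal sketch of that first kind of abstraction (class names are just illustrative), the domain really does distinguish cats from dogs, so the hierarchy earns its keep:

```java
// The polymorphism mirrors a real distinction in the domain model.
abstract class Animal {
    abstract String speak();
}

class Cat extends Animal {
    @Override
    String speak() { return "meow"; }
}

class Dog extends Animal {
    @Override
    String speak() { return "woof"; }
}
```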

The second point to learn from optimization is the “free optimizations”.  These include compiler optimizations, and little tips like “pass big structures by reference” and so on.  These are the idioms of the language, in addition to being the more efficient way to do things.  What “free abstractions” can we expect from our languages?  First and foremost are the abstractions the language ecosystem already gives us – libraries and frameworks that are already built.  Prefer these to home-rolled solutions, as they’ve already been abstracted in many useful ways without any work on your part.  Second are the “best practices” and idioms that improve abstraction with very little cost to readability or understandability.  This includes popular techniques like dependency injection and preferring to pass by interface rather than by concrete type.
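A minimal sketch of those two idioms together, constructor injection and programming to an interface rather than a concrete type (the class and names here are hypothetical):

```java
import java.util.List;

// Roster depends on the List interface and lets the caller inject the
// concrete collection, so swapping an ArrayList for a LinkedList (or a
// stub in a test) costs nothing.
class Roster {
    private final List<String> names;

    Roster(List<String> names) {   // constructor injection
        this.names = names;
    }

    boolean contains(String name) {
        return names.contains(name);
    }
}
```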

The third point is that we should design with performance in mind, but the best performance boost is work avoided rather than work done cleverly.  In abstraction terms, this means that small, elegant solutions tend, ironically, to be more extensible than large abstract ones.  The best example of this might be the object hierarchy mentioned above, our ‘animal’ base class and ‘cat’ and ‘dog’ concrete classes.  To be extensible, we might need a ‘multithreaded_cat’ and a ‘multithreaded_dog’ class as well, or maybe even a ‘distributed_cat’ and…  But all of this misses the original point.  We can avoid all this work in the first place by ensuring neither cat nor dog makes any assumptions about the threading scheme.  That way, one hierarchy describes threading and one describes our domain, and they never need to meet.  We’ve avoided a whole lot of abstraction (an exponential explosion of classes) by simply designing elegantly in the first place.
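In code, the trick is to let the threading scheme wrap the domain objects from the outside instead of being baked into each subclass (again just a sketch, reusing the hypothetical Animal hierarchy from above):

```java
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// The domain hierarchy stays ignorant of threading; concurrency is
// applied from outside.  No multithreaded_cat, no distributed_dog --
// any Animal can be run on the pool as-is.
class Kennel {
    static void speakAll(List<Animal> animals) {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        for (Animal a : animals) {
            pool.submit(() -> System.out.println(a.speak()));
        }
        pool.shutdown();
    }
}
```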

Finally, to optimize, we measure and then offer up some more finely tuned ‘clever’ code where we find our hot spots.  Similarly, in abstraction, we must refactor at our points of repetition.  You can get some abstraction done at design time, but much of your abstraction is only going to become apparent as you code.  You should be on constant guard for repetitive code, as this is a sure sign that the code should be encapsulated elsewhere.  Like the optimizations that sacrifice readability, the abstractions found purely by refactoring tend to be the ones most at risk of ‘leaking’, that is, sacrificing understandability for a point of inflection.  But so long as we have proof that the code is used in multiple places, the abstraction will save us work in the long run.  It is these sorts of abstractions that can only be ‘discovered’, rather than designed in, because we are so poor at predicting what code is going to repeat itself as we develop our system.
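A trivial example of the kind of abstraction that gets discovered this way rather than designed in (hypothetical code): a null-or-blank check that kept reappearing in handler after handler, and only earned a home of its own once the repetition was undeniable.

```java
class Handlers {
    // Extracted only after the same check had been copy-pasted into
    // several handlers -- proof that the abstraction pays for itself.
    static String requireNonBlank(String value, String field) {
        if (value == null || value.trim().isEmpty()) {
            throw new IllegalArgumentException(field + " is required");
        }
        return value;
    }

    static void createUser(String name) {
        requireNonBlank(name, "name");
        // ... create the user ...
    }

    static void renameUser(String newName) {
        requireNonBlank(newName, "newName");
        // ... rename the user ...
    }
}
```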

Optimization and abstraction are both important parts of performant, maintainable systems, but we poor humans have shown ourselves to be poor predictors of where the tools of optimization and abstraction are best leveraged.  But we needn’t take another fifty years to learn how best to abstract.

July 3, 2009 - Posted by | Uncategorized
