The Skeptical Methodologist

Software, Rants and Management

Software Engineering Is Dead

Long live Software Engineering!

The exalted Father of many of the processes we know and love/hate, Tom DeMarco, recently wrote an article describing his second thoughts on many of the prescriptions from his book, Controlling Software Projects: Management, Measurement and Estimation.

Much of what he says should ring true to those of us in the trenches.  Attempting to code directly to software metrics is a fool’s errand.  Not only do current methods of collecting metrics carry a high cost relative to their true value, they also invite a game of metric cat-and-mouse: software increasingly fits what the metrics say ‘looks good’ but loses the very qualities those metrics are meant to capture once developers begin coding ‘to’ the metric.  It’s basically teaching to the test!

It’s refreshing to see such an intellectual giant in our field so humbly admit his past faults – surely many more could learn from him.  We’ve had far too many methodologies come and go while their authors pretend that the few good ideas encased within them are worth all the cruft that’s built up over the years.  Just as many software projects need a complete gut job or rewrite, some methodologies could use the same treatment.

Let’s not throw out the baby with the bathwater, though.  I’m convinced that nearly every ‘fad’ that’s appeared in the field of software HAS had something valuable to teach us.  It’s all about keeping the good and dismissing the bad, separating the diamonds from the rough.  The Waterfall method as a whole has been shown to produce projects drastically over schedule and over budget, but it did teach us that many of our most expensive errors come from misunderstood requirements and use cases.  Likewise, coding to metrics in and of themselves will get us nowhere.  It’s like writing until Word’s language analyzer reports a high ‘reading level’ for my paper: Word can’t analyze the content or tell whether I used the English language correctly.  But, all other things being equal, these metrics, when combined with sound human judgement, can make our jobs easier.

If I’m brought onto a failing project as a firefighter, I want to know where the flames most likely are.  I can slowly and methodically scan the code with my own eyes – most likely spending weeks chasing false positives or style issues (even the most disciplined of ‘real’ developers will confuse style with content every now and then) – or I can use a host of automated metric-gathering tools to give me hints on where to start.

A few object-oriented metrics, like the various definitions of coupling and cohesion, would probably give me a good clue where to start refactoring for maintainability.  Some good old basic metrics like line count per function or cyclomatic complexity may hint at where I ought to start tearing apart unreadable or stagnant code.  Simple metrics like test coverage and test count give me an eyeball figure for how brittle I should expect the code to be.
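
To give a feel for how cheap this kind of triage can be, here’s a minimal sketch in Python using nothing but the standard library’s ast module.  The complexity figure is a crude approximation of cyclomatic complexity (one plus the branch points), not a replacement for a real metrics tool, but it’s exactly the kind of hint generator I mean:

    import ast
    import sys

    # Rough per-function metrics: line count and an approximate
    # cyclomatic complexity (1 + number of branching constructs).
    BRANCH_NODES = (ast.If, ast.For, ast.While, ast.Try,
                    ast.With, ast.BoolOp, ast.ExceptHandler)

    def function_metrics(source):
        """Yield (name, line_count, complexity) for each function."""
        tree = ast.parse(source)
        for node in ast.walk(tree):
            if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
                lines = node.end_lineno - node.lineno + 1
                complexity = 1 + sum(isinstance(n, BRANCH_NODES)
                                     for n in ast.walk(node))
                yield node.name, lines, complexity

    if __name__ == "__main__":
        source = open(sys.argv[1]).read()
        # Worst offenders first: a starting point, not a verdict.
        for name, lines, cc in sorted(function_metrics(source),
                                      key=lambda m: -m[2]):
            print(f"{name}: {lines} lines, complexity ~{cc}")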

The point is, metrics in and of themselves say nothing about software quality.  Interpreted by a skillful developer, though, they make that developer that much more useful and productive.  As I’ve mentioned before, Amdahl’s law says as much about software projects as it does about software itself: any given product has an optimal number of parallel ‘threads’ of production, and any developers added beyond that number will actually slow the project down through communication overhead.  So, assuming we already have the optimal number of developers, the only way to speed up a project is to make each individual developer faster, not to add more.
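
To make that concrete, here’s a toy model – my own back-of-the-envelope, not DeMarco’s – of Amdahl’s law with a Brooks-style communication penalty bolted on.  The constants are invented purely for illustration; the point is the shape of the curve, which rises to an optimum and then falls:

    # Team speedup: Amdahl's law plus a communication penalty.
    # parallel_fraction and comm_cost are made up for illustration.
    def team_speedup(n, parallel_fraction=0.9, comm_cost=0.01):
        serial = 1 - parallel_fraction
        # n developers share the parallelizable work, but every
        # pair of developers adds a communication path (Brooks).
        overhead = comm_cost * n * (n - 1) / 2
        return 1 / (serial + parallel_fraction / n + overhead)

    for n in (1, 2, 4, 8, 16):
        print(n, round(team_speedup(n), 2))
    # Peaks around 4 developers, then adding more makes it WORSE.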

But this is the rub, isn’t it?  In fact, hasn’t this always been the rub?  No one ever said metrics were all you needed; it was a misinterpretation by PHBs (pointy-haired bosses) that we could automate and more easily manage software development with these metrics.  A hammer lets a craftsman do far more work than his hands alone, but it also lets someone unskilled do far more damage.  While the point that software is a people problem, not an engineering problem, is beyond the scope of this post, perhaps DeMarco’s later focus on the people problem, and now his de-emphasis of the ‘engineering’ problem, will help the community lurch (ever so slowly) toward really understanding how to build software.


July 19, 2009

When do I abstract?

“Premature Flexibilization is the Root of whatever evil is left” takes a look at what I’ve also heard called ‘premature abstraction’, which, in today’s high-level languages, is probably on par with premature optimization as the cause of the most headaches in maintenance code.

It’s also a really funny title.

First, a note about culture.  I work mostly with EEs (electrical engineers) who cut their teeth on Fortran and, after all these years, still swear by it.  I’ve mentioned the odd distrust between EEs and computer science majors: engineers think scientists don’t know how to get real work done, while the scientists don’t believe engineers understand any of the complexity of what they build.  Ever find code that tries to match a string by walking it character by character through a switch statement?  That’s an engineer at work.  Ever work with an AbstractBuilderFactoryObserver templated on five types?  That’s a scientist at work.  So as not to confuse these ‘scientists’ with real scientists, I’m going to call them designers.  Engineers thrive on seeing things work; designers thrive on building them.  A designer’s work is done when she’s figured out how it should work – there’s no thrill for her in actually building it.  An engineer’s work is done when it’s built and he can test it.  He wants to see what the thing can do when he’s done.

You can see that each of the major sins discussed so far, premature optimization and premature abstraction, tends to be committed more by one community than the other.  Code that’s prematurely optimized tends to come from the engineer’s world, while prematurely abstracted code tends to come from the designer’s.  Each finds different reasons to defend what they should know to be wrong.  Frequently I’ll hear an engineer friend tell me that he has to code in such-and-such a way because he’s targeting an embedded system.  I’ll ask him for profiling evidence, or try to reason with him that the temporary variable he’s avoiding doesn’t appear to be in a tight loop.  I’ll point out that the compiler’s optimizer will catch what he’s trying to do by hand.  To no avail.  For the most part, he doesn’t trust compiler optimizations.

The designer will claim that she’s building with maintainability in mind, and that the alternative is cowboy coding.  She’s being rigorous, planning for change.  Ask her, “What are the use cases for this abstraction?” and the answer is usually something like “well, maybe we want to target a completely different OS.”  True, maybe we DO want to target a different OS, but that’s not a change most projects take lightly.  More often than not, it’s some business rule that’s going to change.  Hers is an abstraction that will cost time and energy for every developer who touches it, and will most likely never pay off.

We all know (or should know) when to optimize.  At the design phase, we should be considering and analyzing use cases and choosing efficient data structures and algorithms.  We should draw on our experience in the field to predict certain hot spots, but not rely on that experience too much (in our rapidly changing field, experience has a pretty short shelf life).  While coding, we should take the free optimizations, the ones that don’t sacrifice readability.  We should design with performance in mind, but realize that performance mostly comes from reducing work rather than doing work more cleverly.  Only when we finally have something built should we stress test it to find the hot spots, and only then begin working on our ‘clever’ approaches.
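
For what it’s worth, in Python the “measure first” step can be as simple as this sketch using the standard library’s cProfile; the workload function here is a hypothetical stand-in for your real entry point:

    import cProfile
    import pstats

    # Hypothetical workload standing in for your real entry point.
    def workload():
        total = 0
        for i in range(100_000):
            total += sum(range(i % 100))
        return total

    # Profile first; spend cleverness only where the data points.
    profiler = cProfile.Profile()
    profiler.enable()
    workload()
    profiler.disable()

    stats = pstats.Stats(profiler)
    stats.sort_stats("cumulative").print_stats(10)  # top ten hot spots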

But we are rarely taught when to abstract.  That’s because, for the happy-go-lucky computer scientists coming out of their Java schools, abstraction is still fresh in everyone’s mind.  It took us about fifty years of being burned by prematurely optimized code to realize we should code first for readability and optimize later.  Only now, with our gigantic frameworks of frameworks and systems of systems, are we realizing we’re creating huge spider-web messes of code.  Just look at the Eclipse framework, quite a marvel of design: anyone can write a plug-in and it WILL work, and you CAN extend or change virtually any part of it, but I’d claim that 90% of the parts that CAN change DON’T in any extension.  Yet every modifiable part of the code exacts a cost on the maintenance of the system.

So when should we abstract?  When should we decide to use polymorphism or some other means to put a ‘point of inflection’ in our code, a point where we can extend it?  I suspect we can derive some simple rules (indeed, many already have, such as KISS and YAGNI) based more or less on the lessons of premature optimization.

First, in the design phase of optimization, we take two things into account: we pick good data structures and algorithms that fit the expected usage, and we let experience be our guide (to an extent).  How does this apply to abstraction?  To me, it says that our first pass of abstraction, the one that takes place during the design phase, should attempt to model the points of inflection that exist in the domain itself.  If the domain model actually contains the classic ‘animal’ base class, with cats and dogs as subtypes, then that polymorphism certainly belongs in the code as well.  And how can experience guide us?  This is where patterns come in: a designer who’s solved similar problems before may be able to offer some insight into what he or she expects to change in the future.  BUT, just as experience has a short shelf life in optimization, it does in abstraction too.  Design-pattern Hell is the result of searching a little too hard for these, and ultimately, experience tells us how to build the last system, not the next one.
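
As a minimal sketch of what I mean, the polymorphism below earns its place because the domain itself distinguishes cats from dogs, not because we guessed at some future flexibility:

    from abc import ABC, abstractmethod

    # The hierarchy mirrors a distinction that actually exists
    # in the domain model, so it belongs in the code.
    class Animal(ABC):
        @abstractmethod
        def speak(self) -> str: ...

    class Cat(Animal):
        def speak(self) -> str:
            return "meow"

    class Dog(Animal):
        def speak(self) -> str:
            return "woof"

    def greet(animal: Animal) -> str:
        # Callers depend on the abstraction, not the subtype.
        return f"The animal says {animal.speak()}"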

The second point to learn from optimization is the “free optimizations”.  These include compiler optimizations and little tips like “pass big structures by reference”: the idioms of the language that also happen to be the more efficient way to do things.  What “free abstractions” can we expect from our languages?  First and foremost, the abstractions the language can already give us: libraries and frameworks that are already built.  Prefer these to home-rolled solutions, as they’ve already been abstracted in many useful ways without any work on your part.  Second are the “best practices” and idioms that improve abstraction at very little cost to readability or understandability.  These include popular techniques like dependency injection and passing by interface rather than by concrete type.
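
Here’s a small sketch of both idioms in Python – the names (MessageStore, Mailer, and friends) are hypothetical, but the shape, depend on an interface and inject the concrete type, is the point:

    from abc import ABC, abstractmethod

    # A "free abstraction": callers see only the interface.
    class MessageStore(ABC):
        @abstractmethod
        def save(self, message: str) -> None: ...

    class FileStore(MessageStore):
        def __init__(self, path: str):
            self.path = path
        def save(self, message: str) -> None:
            with open(self.path, "a") as f:
                f.write(message + "\n")

    class Mailer:
        # The dependency is injected, so a test can pass a fake
        # store without Mailer ever knowing the concrete type.
        def __init__(self, store: MessageStore):
            self.store = store
        def send(self, message: str) -> None:
            self.store.save(message)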

The third point is that while we should design with performance in mind, the best performance boost is work avoided rather than work done cleverly.  In abstraction terms, this means that small, elegant solutions tend, ironically, to be more extensible than large abstract ones.  The best example might be the object hierarchy mentioned above, our ‘animal’ base class with ‘cat’ and ‘dog’ concrete classes.  To be extensible, we might need a ‘multithreaded_cat’ and a ‘multithreaded_dog’ class as well, or maybe even a ‘distributed_cat’ and…  But all of this misses the original point.  We can avoid the work entirely by ensuring that neither cat nor dog makes any assumptions about the threading scheme.  That way, one hierarchy describes threading and one describes our domain, and they never need to meet.  We’ve avoided a whole lot of abstraction (an exponential explosion of classes) simply by designing elegantly in the first place.
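
A sketch of the escape hatch: instead of a multithreaded_anything hierarchy, one small wrapper handles locking for any animal, and the two concerns never meet.

    import threading

    class Cat:
        def speak(self) -> str:
            return "meow"

    # One wrapper composes with ANY animal, so we never need
    # MultithreadedCat, MultithreadedDog, DistributedCat, ...
    class ThreadSafe:
        def __init__(self, animal):
            self._animal = animal
            self._lock = threading.Lock()

        def speak(self) -> str:
            with self._lock:
                return self._animal.speak()

    safe_cat = ThreadSafe(Cat())
    print(safe_cat.speak())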

Finally, to optimize, we measure, then offer up more finely tuned ‘clever’ code where we find our hot spots.  Similarly, in abstraction, we must refactor at our points of repetition.  You can get some abstraction done at design time, but much of it only becomes apparent as you code.  Be on constant guard for repetitive code: it is a sure sign that the logic should be encapsulated elsewhere.  And just as the optimizations found by profiling are the ones that tend to sacrifice readability, the abstractions found purely by refactoring tend to be the ones most at risk of ‘leaking’, of sacrificing understandability for a point of inflection.  But so long as we have proof that the code is used in multiple places, the abstraction will save us work in the long run.  These are the abstractions that can only be ‘discovered’ rather than designed in, because we are so poor at predicting what code will repeat itself as we develop a system.
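
A sketch of what ‘discovering’ an abstraction looks like in practice (the names are hypothetical): the same validate-then-log dance shows up at a third call site, and only then do we extract it.

    # Before: repeated at every call site.
    #
    #     if user.age < 0:
    #         log.error("bad age"); raise ValueError("bad age")
    #     if order.total < 0:
    #         log.error("bad total"); raise ValueError("bad total")
    #
    # After: the repetition is proven, so we pull it out.
    import logging

    log = logging.getLogger(__name__)

    def require_non_negative(value: float, label: str) -> float:
        """Discovered abstraction: fail loudly on negative values."""
        if value < 0:
            log.error("bad %s: %r", label, value)
            raise ValueError(f"{label} must be non-negative, got {value}")
        return value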

Optimization and abstraction are both important parts of building performant, maintainable systems, but we poor humans have shown ourselves to be poor predictors of where those tools are best leveraged.  We needn’t take another fifty years to learn how best to abstract.

July 3, 2009