The Skeptical Methodologist

Software, Rants and Philosophy

Quality First

I’ll often find myself muttering that the code I am having to fix lacks documentation. I can’t figure out what does what – it’s overly complex. To boot, it’s probably unreadable.

I take away from this experience that I should write more readable code, that I should write documentation, and that I should keep things from getting too complex. But this may actually be confirmation bias leading me to the wrong decisions, or at best, a case of mistaking correlation for causation.

What happened here was something broke, and I had to go into that code and fix what broke. When I get into the broken code, I find it poorly documented, hard to read, and overly complex. Since I don’t want to feel those pains in the future, I resolve to change my behaviors so that the next time something breaks, it’s easier for me to debug.

The next time something breaks…

Instead of focusing on writing code that’s well documented, simple and readable, why don’t I instead focus first and foremost on code that doesn’t break in the first place? No one complains about the code they don’t have to fix, no matter how unreadable, complex or poorly documented it is.

Indeed, if I adopt the habits of readable code, simple designs, and thorough documentation and find my life getting better, it’s far more likely that those practices didn’t just make debugging easier, they made debugging more rare. Code built with those habits tends to be less defect prone in the first place.

The Takeaway

The amount of effort we can put into our designs is not limitless, and ultimately, time spent on one thing often comes at the cost of time spent on another. Some practices, like writing readable code, are relatively cheap, whereas others, like exhaustive documentation, can be expensive.

It often makes more sense to spend time you might have spent documenting on making the thing you’re documenting less error prone to begin with. If it never breaks, the documents you slave over may never get read. Likewise, if a certain software technique – such as functional programming in a strictly procedural shop – gets your work flagged as unreadable, it may still be worth it due to the lower defect density.

November 22, 2015

Competence vs “Leadership Qualities”

The HBR recently posted an interview with some social scientists at Erasmus and Stanford about which is most predictive of team success – leaders with actual expertise, leaders without expertise, or more democratic groups without leaders.

The outcomes are not surprising. Often teams do not actually nominate experts as leaders, and instead nominate people who are taller, louder, or male-er. Teams that *did* nominate experts tended to do better on the task. And while the expert-led teams beat democratic or leaderless teams, democratic teams beat teams that had nominated an incompetent leader.

I think this confirms that without objective measures, people tend to pick the wrong man for the job (and it’s almost always a man they pick). Moreover, expertise does matter in leaders – authoritarian leaders who are incompetent will lead teams astray compared to those who are competent. But I think the comparison between authoritarian or traditional hierarchies and democratic teams has some nuance that this interview doesn’t properly identify.

The key is understanding the task the students were asked to accomplish. This was a traditional ‘team building’ exercise in which teams were told they had survived an airplane crash over the sea and had to identify items that would help them survive on a deserted island. Their choices were compared to those of actual experts, and each team’s decision-making prowess was judged on how well the two lined up.

The research showed that in many cases, experts with actual knowledge of survival were not always chosen by the group to lead the process. In many cases, people who simply claimed the loudest they knew the answers were chosen instead.

But more importantly, the research glossed over an important point. Why was a team deciding these options in the first place? We assume we form teams because two heads are better than one, but is that actually the case in this exercise? Indeed, given that the researchers are comparing the team’s choices to other *individual experts*, they seem to be implying this is a task that individuals could do as competently as a team, maybe even more competently. So naturally, teams that rendered teamwork negligible – i.e., teams who nominated an authoritarian expert – did little more than try their best to emulate an individual. Those teams did the best, when they nominated someone competent.

Of course teams whose best strategy for the task at hand is to act like an individual will beat teams set up explicitly to make that strategy harder (flat or democratic teams) – provided someone competent is at the helm.

But what about tasks where acting like an individual is not the best strategy? What if, instead of merely choosing the items for survival, teams were tasked with designing a survival kit around those items, developing a marketing plan for it, and then writing a program to teach a robot hand to wield some of the items? These open-ended meta-tasks, which generally consist of several drastically different types of sub-tasks, are much more common in the real world.

I’d wager that in *these* cases, democratic leadership would be more successful. This is because for each subtask, expertise shifts from person to person. In fact, this is the real selling point of heterarchy – the expert for *this* job is often not the same as the expert for the *last* job. I’d wager, in addition, that not-so-democratic teams whose leaders took it upon themselves to nurture expertise and mediate the team’s decision making process – actively fighting the cognitive biases that lead teams to follow the male-est of their members – would do even better than the purely democratic teams.

The takeaway from this theory would be that yes, competence in a leader matters. But so do emotional intelligence and rationality, in terms of the leader’s ability to mediate team disputes and not be pulled into the same politics as the rest of the team. And so does a focus on growth and talent management – if you need more than one expert, then a good leader coaches those experts.

November 5, 2015

The 10x Myth

There’s a myth going around software development circles, and it needs some deconstructing. It’s probably one of the best indicators of how much further the industry has to go in regards to sexism, since it’s a patently masculine myth, evoking images of great Greek heroes slaughtering thousands of men as they move forward.

The Myth of the 10x Developer

Now, this isn’t a myth because it hasn’t been researched. There is ample research on programmer productivity, at least from the 80s, and if it is still to be believed, we should assume that there is at least some difference in programming ability between developers.

The real myth comes from the interpretation of these results, and that’s where the testosterone-fueled, neck-bearded bias comes in. There are the results of these studies, and then there is how the studies are understood by so-called “Rock Star” developers, who always assume they are one of the 10x’ers and take that as their justification for why they shouldn’t be forced to get along with anyone.


The results are thus:

There is roughly an order of magnitude difference in productivity (measured as time to get code working for a toy problem) between the best and the worst programmers, with causes unknown.

However, here’s how it’s commonly repeated. See if you can spot the difference:

There is roughly an order of magnitude difference in general productivity between the best and average programmers, and it’s due entirely to innate talent.

So, let’s take this apart one by one.

We’ve solved this problem before…

The first flaw is somewhat methodological; however, I don’t think the researchers ever claimed that their toy problem measured generalized productivity, so it’s also a flaw in how the general population of brogrammers has read the result. Think of it this way: if I took a random sample of programmers and gave them a test, even if their skills were all roughly the same, what kind of result would I see? I’d see some programmers doing better than others, because they’ve solved that or a similar problem before. I can compare a person who’s never written an SMTP server to someone who has, ask them both to write one, and witness a miraculous 10x or more productivity advantage for the programmer who has built one before. Imagine that!

This is similar to what you might call the halo effect. Rock stars are identified by their ability to solve their specialized problem very well – perhaps they’ve built a few Rails apps from start to finish. They’re going to be great at that. But throw them at writing a compiler and watch them flounder.

Distribution of wealth…

The second issue is confusing the average programmer with the worst programmer. Let me give an example of why this matters. If I reported to you that the best programmers are 10x better than the worst, that’s one bit of information, but not enough to really say anything about the average (i.e., the majority of programmers). If I then said that average programmers are 9x better than the worst, now we know something about the average, and about the distribution. Unfortunately, we have no idea which way the distribution of these results is skewed, at least as it’s commonly reported.
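To make that concrete, here’s a toy calculation – the numbers are invented purely for illustration, not drawn from any study. The same 10x best-to-worst gap is consistent with the best being barely better than average, or with the best towering over everyone:

#include <stdio.h>

int main(void) {
    /* Hypothetical productivity scores in arbitrary units - invented numbers, not study data. */
    double best = 10.0, worst = 1.0;
    double average_a = 9.0;   /* Scenario A: the worst drag the tail down; the average sits near the best. */
    double average_b = 1.5;   /* Scenario B: the best tower over everyone; the average sits near the worst. */

    printf("Both scenarios: best/worst   = %.1fx\n", best / worst);      /* 10.0x */
    printf("Scenario A:     best/average = %.1fx\n", best / average_a);  /* ~1.1x */
    printf("Scenario B:     best/average = %.1fx\n", best / average_b);  /* ~6.7x */
    return 0;
}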

First, it’s outright false to say that the best developers are 10x better than the average, even though that’s often what’s reported. Second, we don’t know whether the productivity difference is due to the best being that much better than everyone else, or the worst being that much worse than everyone else. The issue here is that, due to all the manliness in our industry, we of course all assumed it must be the former and not the latter. Because we’re all magically that 10x’er, and that’s why everyone else is a jerk.

We point to this myth over and over again to justify why we don’t get along with others. It’s nearly always used to justify mistreatment of our colleagues – they’re all idiots, I’m brilliant, and that’s why I shouldn’t change and they should. I’m a 10x programmer. But clearly, the best you can claim if the distribution has a fat tail to the right (i.e., the worst are much worse than the average) is that you’re only a little bit more productive than the average, and dogonit, you ought to pay attention more to what your colleagues say because they’re all nice people and would it kill you to shower?

Cause and Effect…

The last thing the studies say nothing about, at least as they’re repeated in their mythological form, is the why. There’s an implied why, an implied cause for this difference, but it’s hardly ever stated. These genius gods-among-men programmers are so productive because they wield the magic of Zeus. What they do cannot be replicated, repeated, or taught to anyone. They’re entirely packaged up, unable to be distributed or copied. If you want the 10x programmer, you have to accept his ego, his arrogance, his complete lack of communication or emotional skills, and his tendency to shit all over everyone else. And yes, it’s almost always a he.

The issue here is that we have no idea what makes the best programmers 10x more productive than the worst programmers. Sheer number of years of experience doesn’t seem to play a role, but is that because there are far too many enterprises where we can disappear and never have to code again? A year at a large enterprise curating UML documents is not the same as a year getting your own Rails site up for customers to use. What tools or techniques did these 10x programmers use that the worst ones didn’t? Were they more skilled with the debugger? Did they adopt more structured coding conventions (this was the era of structured code)? Did they test their code any differently?

There are many questions we can ask, and none of them do the productivity myths answer. It is almost always supposed to be magic – something innate to the neck beard itself that grants the brogrammer who dons it the power to develop and deploy only the best code, and to disregard everyone else’s opinions, even though those people may have something to add (or something he could learn!).


There are 10x programmers – or at least there were, in the 80s – who were roughly ten times more productive than the worst programmers. But given how bad some programmers can be, it’s probably safer to say that you should make sure you don’t hire the worst (or at least that you train up the worst of your crew) rather than always trying to hire the best. Average programmers are, on average, pretty good in my experience. That tells me the tail is fat to the right, not to the left. Moreover, invest in methods and tools that are shown to increase productivity: iterative methods, testing and peer review, static analysis tools, and training in your methods of source control and deployment. It isn’t magic – there is a way to turn average into great, and we can figure out what that way is if we use the methods of SCIENCE!

Finally, don’t fall for this machismo myth that there are the great men and then there is everyone else. It keeps far too many ‘good’ average developers who don’t fit our implied mold of the great programmer – apparently an asshole white guy – out of organizations that sorely need them.

August 26, 2015

When Counting from 100 to 1, Interview Candidates will do Precisely as Well as You Think They Should

Can you write a program that prints 100 to 1? Apparently, some are claiming such a program can be as valuable as Fizz Buzz in determining the value of interview candidates. Some people can’t solve this incredibly simple problem…

Wait, bait and switch time. I only told you about the easy part of the problem, not the hard part – now that you’ve already clicked on my article, I’ll go ahead and fill you in on the ‘tricky’ constraint that any solution you write must start with:

for(int i = 0; …

This isn’t a programming challenge; it is now a brain teaser. Why? Because you’ve taken away the obvious answer and, for no really good reason, added an additional constraint. Brain teasers aren’t bad, they’re just tests of insight, not expertise. And insight is notoriously difficult to summon when you’re under pressure in an interview.
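For what it’s worth, the puzzle isn’t hard once the trick clicks – here’s one possible solution (my own sketch, not the original article’s answer), assuming the constraint only means the loop header must begin that way:

#include <stdio.h>

int main(void) {
    /* Keep the mandated "for(int i = 0; ..." opening and count down by printing 100 - i. */
    for (int i = 0; i < 100; i++) {
        printf("%d\n", 100 - i);
    }
    return 0;
}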

The main issue I have with this line of thinking isn’t the reemergence of brain teasers, it’s the author’s implication that programmers need to knuckle down on the hard practice of programming and put their egos aside. It seems far more likely that the author needs to knuckle down on the hard practice of industrial psychology and put his or her ego (I couldn’t check gender since the page was failing to load under the traffic) aside.

Despite the warnings that 22 is not a large enough sample size to get any significant result out of, the author goes ahead and does it anyway. If the rest of their book is written with such rigor and you’re interested, I advise you to buy my own book I put together in a few weeks after learning the graph function in Excel.

But the ‘hard’ statistics isn’t even the worst part of drawing conclusions from this ‘study’ – the ‘soft’ part is where the author utterly failed.

One data point that I obtained for the book (but didn’t quite include in the book because it was too programmer centric) was based on 22 job interviews for programming positions I conducted for one of my clients over a period of two months.

The author claims two questions were asked to test the hypothesis of whether or not what they very scientifically call ‘whining’ can predict what they’re claiming to be programming ability. Did you see the flaw?

Unless I’m reading the blog wrong – and I could be – the author him or herself asked both questions, with the hypothesis in mind, most likely in the order implied: whining first, then programming ability. This removes what, you know, experts in statistics and survey design would call ‘blinding’. It means the author’s own implicit bias going into each interview could well have skewed the result. To sum up, the author could have badgered, during the programming test, every candidate who had whined for more than a few minutes – or could have stayed silent. With the study designed as it was, we’d never know the difference.

What’s a much better conclusion to draw from this statistically insignificant result? Candidates are going to do precisely as well as you think they ought to. Specifically, they’ll do exactly as well as you want them to on brain-teaser-type problems that require insight. This is why you need structured, repeatable tests that measure, insofar as possible, expertise rather than insight. Insight is important, but it is practically impossible to measure under the pressure of an interview, when the candidate is analyzing your every subconscious twitch to see whether they’re getting the job or not.

April 10, 2015

Engagement Versus Alignment

Engagement, or how interested employees are in their company, and alignment, or how well the efforts of disparate employees and teams are aligned with a single goal, are both important cultural metrics to track.

An unengaged work force is less productive, has higher turnover, and makes for a much less fun place to work. A misaligned work force can quickly become disengaged, as it is at higher risk of infighting, wasted effort, and missed opportunities for team synergies.

“Don’t work on that, work on this.”

An issue arises, though, when decisions increase engagement at the cost of alignment, or vice versa. For instance, a micromanaging leader may second-guess their subordinates’ ideas or work. In some cases this is an attempt to improve alignment, as the lead does not necessarily agree that the work is in alignment with the team’s goals.

Obviously an employee is going to be more engaged when she’s working on her own idea and she believes that idea will help the company. So this micromanagement may have increased alignment, but at the cost of engagement. In terms of overall organizational productivity, this may end up being a wash: the rise in team productivity due to increased alignment may come at the cost of individual productivity due to lowered engagement.

But if we take a step back, there’s a third idea we’re not taking into consideration, and that is correctness. That is to say, alignment is the measure of how well the team is focused on the same goal. Correctness is some measure of how well that tactical goal achieves the overall strategic goal of sustainable profits for the company.

In many cases of micromanagement, the lead believes he better understands what’s wrong and what needs to be fixed. He believes he better understands the correct course of action, and thus the job becomes getting his subordinates to focus on that course of action (alignment) with all their potential (engagement). But what if this assumption is false?

Emergent Correctness

In the creative economy, knowledge is often much more highly distributed throughout the company than in more top-down organizations. On the assembly line, the foreman often has much more experience, and often more education, than the front line worker, so the foreman supervising the front line worker in terms of correctness makes sense. But in the software startup, the front line worker often knows more about how any particular piece is architected, what new technologies solve lingering problems, and what problems they actually face. The foreman or tech lead’s role is to focus instead on the two remaining metrics, engagement and alignment. The correct goal emerges from an organization that is highly engaged and highly aligned.

What are birds? We just don’t know

For example, take a flock of birds. No one bird, or set of birds, is in control of the flock. The flock itself, though, looks incredibly organized – both aligned and engaged. The flock behaves correctly – in this case, it flies south for the winter or moves toward food – due to the shared burdens on all the birds in the flock.

Management’s duty in this emergently correct culture then becomes ensuring that lines of communication between front line workers are open, so that ideas are shared and implicitly voted on by whatever interests people more. This might mean removing organizational barriers, such as a lower level employee not feeling comfortable talking to a higher level one; emotional barriers, if employees don’t naturally get along; or political barriers, if employees start cutting lines of communication to protect their own fiefdoms.

Their duty is also to increase the forms of engagement important in these emergently correct cultures – engagement in the company as a team, identification with the company as a team, and excitement about the future. This includes letting ideas that might be more interesting than immediately applicable fly for a while, since the costs of shutting them down early are just too great.

Emergence isn’t perfect

The main argument against emergent correctness of decisions is that it is rarely perfectly correct. Often, indeed, in hindsight we can see exactly where the company made mistakes. We take this, in turn, as an argument for stronger hierarchical control. Indeed, this appears to be why, over time and with size, companies become more and more hierarchical. Turning over control to the experts on the front line always sounds good in theory, but we know they will make mistakes – we just don’t know, in advance, what those mistakes will be. It is said that it is often better to fail traditionally than to succeed nontraditionally. This tendency drives control freaks to argue – not only with each failure, but with each success – that they again be given more hierarchical control to ‘prevent the mistakes’ we just made.

This is an organizational fallacy, as we never seem to consider the opportunity cost of increased hierarchical control. In emergent organizations, the trade-off between alignment and engagement never occurs; the addition of hierarchical control is the de facto addition of this trade-off. It is, quite literally, the argument (using perfect hindsight as evidence) that if we had given up engagement in some key areas and gained alignment, we would have been done faster or with higher quality. The core fallacy is that you can only ever see those opportunities behind you, never in front of you – so it is never worth chasing those trade-offs.

Engagement and alignment are both important for an organization. In many cases, it appears there is a trade-off to be made to improve the overall correctness of our actions. But this is almost always a fool’s errand – we can only identify those trades in hindsight. Emergent control isn’t perfect, but that can’t be an argument for inferior forms of control in an organization where the front line, on average, really does know best.

January 10, 2015

