The Skeptical Methodologist

Software, Rants and Philosophy

Jakob Heuser is a no hire

Jakob Heuser is refusing to do the code challenge, and because of that, he’s a no hire.

We use code challenges at my job because measuring technical ability is notoriously difficult, and an at-home code challenge is by far the best way to measure it.

The take-home aspect gives candidates the time and breathing room they need to think through the problem, something in-person code challenges deny them. Additionally, the extra time means the challenge's difficulty can be ramped up. Seeing whether or not you can code Fizz Buzz isn't going to be as good a judge of ability as actually taking a small feature from requirements to testing.

Most of the people who've refused our code challenge have been candidates we were already on the fence about technically. I've always taken a refusal to do a code challenge as a sign that the candidate was uncomfortable with its difficulty, and as a self-selection out.

Additionally, many people who aren't necessarily looking, but who avoid jobs with code challenges as 'beneath them', are typically people who haven't done real development in years and are afraid of being found out. Thus, in addition to helping us judge technical ability, the challenge also helps us screen out developers so arrogant as to believe a position on our team is their birthright, one that shouldn't be verified by any objective tool.

While Jakob worries a code challenge might cause strong candidates to walk away to companies that don't require one, that has rarely been the case here. We have had only one candidate we believed was strong walk away without completing the challenge. In our experience, selling the company better up front keeps most developers interested enough to complete it.

What about other approaches, like GitHub repos? Jakob seems to argue that code challenges require too much time from candidates, time they'd rather spend on their hobbies or families, but then claims candidates should be graded on their open source contributions. Who has time to contribute to open source but can't find a few hours to work through a code challenge?

Finally, what about candidates who game the system, especially at larger companies, and turn in an immaculate downloaded version of the challenge? Plagiarism is incredibly easy to detect these days – try googling a few lines from the candidate's solution and see if Stack Overflow comes up. Better yet, run a live code review and see how well the candidate can describe their solution – but you really should have been doing that anyway.

To sum up, code challenges are probably the best tool we have for finding and hiring great technical talent. But if you aren't that technically talented – or think having to produce actual proof of your skills is beneath you – by all means, go apply at LinkedIn.

December 7, 2015

Quality First

I’ll often find myself muttering that the code I am having to fix lacks documentation. I can’t figure out what does what – it’s overly complex. To boot, it’s probably unreadable.

I take away from this experience that I should write more readable code, that I should write documentation and keep things from getting too complex. But this may actually be confirmation bias leading me to the wrong decisions, or at best a case of correlation not equaling causation.

What happened here was something broke, and I had to go into that code and fix what broke. When I get into the broken code, I find it poorly documented, hard to read, and overly complex. Since I don’t want to feel those pains in the future, I resolve to change my behaviors so that the next time something breaks, it’s easier for me to debug.

The next time something breaks…

Instead of focusing on writing code that’s well documented, simple and readable, why don’t I instead focus first and foremost on code that doesn’t break in the first place? No one complains about the code they don’t have to fix, no matter how unreadable, complex or poorly documented it is.

Indeed, if I adopt the habits of readable code, simple designs, and thorough documentation and find my life getting better, it’s far more likely that those practices didn’t just make debugging easier, they made debugging more rare. Code built with those habits tends to be less defect prone in the first place.

The Takeaway

The amount of effort we can put into our designs is not limitless, and ultimately, time spent doing one thing often costs us time we could have spent doing another. Some practices, like writing readable code, are relatively cheap, whereas others, like exhaustive documentation, can be expensive.

It often makes more sense to spend the time you might have spent documenting on making the thing you're documenting less error prone to begin with. If it never breaks, the documents you slave over may never be read. Likewise, if a certain software technique – such as functional programming in a strictly procedural shop – gets your work flagged as unreadable, it may still be worth it for the lower defect density.

November 22, 2015

Competence vs “Leadership Qualities”

The HBR recently posted an interview with some social scientists at Erasmus and Stanford about which is most predictive of team success: leaders with actual expertise, leaders without expertise, or more democratic groups without leaders.

The outcomes are not surprising. Teams often do not nominate actual experts as leaders, and instead nominate people who are taller, louder, or male-er. Teams that *did* nominate experts tended to do better on the task. And while the expert-led teams beat democratic or leaderless teams, democratic teams beat teams that had nominated an incompetent leader.

I think this confirms that, without objective measures, people tend to pick the wrong man for the job (and it's almost always a man they pick). Moreover, expertise does matter in leaders: authoritarian leaders who are incompetent will lead teams astray compared to those who are competent. But I think the comparison between authoritarian or traditional hierarchies and democratic teams has some nuance that this interview doesn't properly identify.

The key is understanding the task the students were asked to accomplish. It was a traditional 'team building' exercise in which teams were told they had survived an airplane crash at sea and had to identify items that would help them survive on a deserted island. Their choices were compared to those of actual survival experts, and each team's decision-making prowess was judged on how well they lined up.

The research showed that experts with actual knowledge of survival were often not chosen by the group to lead the process. In many cases, people who simply claimed the loudest that they knew the answers were chosen instead.

But more importantly, the research glossed over an important point: why was a team deciding these options in the first place? We assume we form teams because two heads are better than one, but is that actually the case in this exercise? Indeed, given that the researchers compared each team's choices to those of *individual experts*, they seem to be implying this is a task an individual could do as competently as a team, maybe even more competently. So naturally, teams that rendered teamwork negligible, i.e., teams that nominated an authoritarian expert, did little more than try their best to emulate an individual. Those teams did best when they nominated someone competent.

Comparing teams who’s best strategy for the task at hand is to act like an individual to teams set up explicitly to make this strategy harder (flat or democratic teams) will of course do better when someone competent is at the helm.

But what about tasks where acting like an individual is not the best strategy? What if, instead of merely choosing the items for survival, teams were tasked with designing a survival kit around those items along with a successful marketing plan, and then with writing a computer program to teach a robot hand to wield some of the items? These open-ended meta-tasks, which generally consist of one or more drastically different types of sub-task, are much more common in the real world.

I’d wager in *these* cases, democratic leadership would be more successful. This is because for each subtask, expertise shifts from person to person. In fact, this is the really selling point of heterarchy – the expert for *this* job is often not the same as the expert for the *last* job. Indeed, I’d wager in addition that not-so-democratic teams who’s leaders took it upon themselves to nurture expertise and mediate the team’s decision making process to actively fight against cognitive biases that lead teams to generally follow the male-est of their members would do even better than the pure democratic teams.

The takeaway from this theory would be that yes, competence in a leader matters. But so do emotional intelligence and rationality, in terms of the leader's ability to mediate team disputes without being pulled into the same politics as the rest of the team. And so does a focus on growth and talent management: if you need more than one expert, a good leader coaches those experts.

November 5, 2015

The 10x Myth

There’s a myth abound in software development circles, and it needs some deconstructing. It’s probably one of the best indicators of how much further the industry has to go in regards to sexism, since it’s a patently masculine myth, evoking images of great Greek Heroes slaughtering thousands of men as they move forward.

The Myth of the 10x Developer

Now, this isn’t a myth because it hasn’t been researched. There is ample amounts of research on programmer productivity, at least from the 80’s, and if it is still to be believed we should assume that there is at least some difference in  programming ability between developers.

The real myth comes from the interpretation of these results, and that's where the testosterone-fueled, neck-bearded bias comes in. There are the results of these studies, and then there is how the studies are understood by so-called "Rock Star" developers, who always assume they are one of the 10x'ers and use that as justification for why they shouldn't be forced to get along with anyone.


The results are thus:

There is roughly an order of magnitude difference in productivity (measured as time to get code working for a toy problem) between the best and the worst programmers, with causes unknown.

However, here’s how it’s commonly repeated. See if you can spot the difference:

There is roughly an order of magnitude difference in general productivity between the best and average programmers, and it’s due entirely to innate talent.

So, let’s take this apart one by one.

We’ve solved this problem before…

The first flaw is somewhat methodological; however, I don't think the researchers ever claimed that their toy problem measured generalized productivity, so it's also a flaw in how the general population of brogrammers have read the result. Think of it this way: if I took a random sample of programmers and gave them a test, even if their skills were all roughly the same, what kind of result would I see? I'd see some programmers doing better than others, because they've solved that or a similar problem before. I can take a person who's never written an SMTP server and someone who has, ask them both to write one, and witness a miraculous 10x or more productivity benefit for the programmer who has built one before. Imagine that!

This is similar to what you might call the halo effect. Rock stars are identified by their ability to solve their specialized problem very well – perhaps they’ve built a few Rails apps from start to finish. They’re going to be great at that. But throw them at writing a compiler and watch them flounder.

Distribution of wealth…

The second issue is the confusion of the average for the worst programmer. Let me give an example of why this might be an issue. If I reported to you that the best programmers are 10x better than the worst, that's one bit of information, but not enough to say anything about the average (i.e., the majority of programmers). If I then said that average programmers are 9x better than the worst, now we know something about the average, and about the distribution. Unfortunately, we have no idea which way the distribution of these results is skewed, at least as it's commonly reported.

First, it’s outright false to say that the best developers are 10x better than the average, even though that’s often what’s reported. Second, we don’t know if the productivity difference at hand is due to the best being that much better than everyone else or the worst being that much worse than everyone else. The issue here is that due to all the manliness in our industry, we of course all assumed it must be the former, and not the latter. Because we’re all magically that 10x’er, and that is why everyone else is jerks.

We point to this myth over and over again to justify why we don't get along with others. It's nearly always used to justify mistreatment of our colleagues: they're all idiots, I'm brilliant, and that's why I shouldn't change and they should. I'm a 10x programmer. But if the distribution has a fat tail to the left (i.e., the worst are much worse than the average), the best you can claim is that you're only a little bit more productive than the average, and doggone it, you ought to pay more attention to what your colleagues say, because they're all nice people, and would it kill you to shower?

Cause and Effect…

The last issue, at least as the studies are repeated in their mythological form, is that they say nothing about why. There's an implied why, an implied cause for this difference, but it's hardly ever stated. These genius gods-among-men programmers are so productive because they wield the magic of Zeus. What they do cannot be replicated, repeated, or taught to anyone. They're entirely packaged up, unable to be distributed or copied. If you want the 10x programmer, you have to accept his ego, his arrogance, his complete lack of communication or emotional skills, and his tendency to shit all over everyone else. And yes, it's almost always a he.

The issue here is that we have no idea what makes the best programmers 10x more productive than the worst. Sheer number of years of experience doesn't seem to play a role, but is that because there are far too many enterprises where we can disappear and never have to code again? A year at a large enterprise curating UML documents is not the same as a year getting your own Rails site up for customers to use. What tools or techniques did these 10x programmers use that the worst ones didn't? Were they more skilled with the debugger? Did they adopt more structured coding conventions (this was in the era of structured code)? Did they test their code any differently?

There are many questions we can ask, and none of them do the productivity myths answer. It is almost always supposed to be magic, something innate to the neck beard itself that grants the brogrammer who dons it the power to develop and deploy only the best code, and to disregard the opinion of everyone else who may have something to add (or learn!).


There are 10x programmers – or at least there were, in the 80's – who were roughly ten times more productive than the worst programmers. But given how bad some programmers can be, it's probably safer to say that you should ensure you don't hire the worst for your crew (or at least train them up) rather than always trying to hire the best. Average programmers are, on average, pretty good in my experience. That tells me the tail is fat to the left, not to the right. Moreover, invest in methods and tools that are shown to increase productivity: iterative methods, testing and peer review, static analysis tools, and training in your methods of source control and deployment. It isn't magic: there is a way to turn average into great, and we can figure out what that way is if we use the methods of SCIENCE!

Finally, don’t fall for this machoismo myth that there are the great men and then there is everyone else. It keeps far too many ‘good’ average developers who don’t fit our implied mold of the great programmer – apparently an asshole white guy – out of organizations that sorely need them.

August 26, 2015

When Counting from 100 to 1, Interview Candidates will do Precisely as Well as You Think They Should

Can you write a program that prints 100 to 1? Apparently, some are claiming such a program can be as valuable as Fizz Buzz in determining the value of interview candidates. Some people can’t solve this incredibly simple problem…

Wait, bait-and-switch time. I only told you about the easy part of the problem, not the hard part. Now that you've already clicked on my article, I'll go ahead and fill you in on the 'tricky' constraint: any solution you write must start with:

for(int i = 0; …

This isn’t a programming challenge, it is now a brain teaser. Why? Because you’ve taken away the obvious answer and for no really good reason added an additional constraint. Brain teasers aren’t bad, they’re just tests of insight, not expertise. And insight is notoriously difficult to reach when you’re under pressure in an interview.

The main issue I have with this line of thinking isn't the reemergence of brain teasers; it's the author's implication that programmers need to knuckle down on the hard practice of programming and put their egos aside. It seems far more likely that the author needs to knuckle down on the hard practice of industrial psychology and put his or her ego aside (I couldn't gender-check, since the page was failing to load due to traffic).

Despite the warnings that 22 is not a large enough sample size to get any significant result out of, the author goes ahead and draws conclusions anyway. If the rest of their book is written with such rigor and you're interested, I advise you instead to buy my own book, which I put together in a few weeks after learning the graph function in Excel.

But the ‘hard’ statistics isn’t even the worst part of drawing conclusions from this ‘study’ – the ‘soft’ part is where the author utterly failed.

One data point that I obtained for the book (but didn’t quite include in the book because it was too programmer centric) was based on 22 job interviews for programming positions I conducted for one of my clients over a period of two months.

The author claims two questions were asked to test the hypothesis of whether what they very scientifically call 'whining' can predict what they claim to be programming ability. Did you see the flaw?

Unless I’m reading the blog wrong – and I could be – the author him or herself asked both questions, with hypothesis in mind, most likely in the order implied: whining then programming ability. This removes what, you know, experts in statistics and survey design would call ‘blinding’. It means the author’s own implicit bias going into each interview could possibly skew the result. To sum up, the author could very well have badgered every candidate during the programming test that whined more than a few minutes or they could have stayed silent. With the study designed as it was, we wouldn’t know the difference.

What’s a much better conclusion from this statistically insignificant result? Candidates are going to do precisely as well as you think they ought to. Specifically, they’ll do exactly as well as you want them to on brain teaser type problems that require insight. This is why you need structured, repeatable tests that measure insofar as possible expertise, not insight. Insight is important, but practically impossible to measure under the pressure of an interview when the candidate is going to be analyzing your every subconscious twitch to see whether they’re getting the job or not.

April 10, 2015
