The Skeptical Methodologist

Software, Rants and Management

SYWTLTC: Novice Chapter 2: Effective Hacking

Go here if you want the prolog and table of contents to the SYWTLTC series!

The Novice section of SYWTLTC is intentionally pretty sparse – Chapter 1 gives you all the tools you need to get started in Code Combat .

However, there are often meta-lessons to be learned even as early as Code Combat. We’ll go over one of those today.

Hacking?

So I’m using this term loosely. But often, we hear the term ‘hack’ in the context of programming meaning when someone doesn’t fully understand what they’re doing and they’re just trying to ‘make it work’.

The usual work flow here is to make a sometimes not-so-educated guess about what might be wrong, change that thing, then run your program and see if it works.

Senior coders tend to have many tools in their toolbelt, however, we never completely drop hacking as a means to understand things. There will always be time when you will have code you inherit, a library you don’t understand, or even code you wrote yourself that you no longer remember how it works – there will be times like these that all you can do is ‘fiddle with it’ until it does what you want it to do.

Still, there are tips for more rigorous hacking

1. Scientific Hacking

Change one variable at a time

I don’t mean actual variables in a program, although that may be the case as well. What I mean here that’s scientifically inspired is that we try to isolate only one ‘theory’ of why it’s not working at a time.

If it may be X, Y or Z, you don’t change X, Y or Z all at the same time. Change one and see if it worked, back that change out, change the next and see if it worked, and so on.

This may feel like you’re going slower, but you’re actually going faster. This is because your ability to mentally understand what’s changing in the system goes out the window after a certain (very small) level of complexity. So you may be able to “change all the things!” once or twice on toy programs you’re working on, and things will appear to work.

But in larger programs, many bad things happen when you do this.

  1. Your program can suddenly appear to work. But it’s all in appearances.
  2. You may fix your thing and break something else.
  3. You may not even fix your thing, break something else, and not understand what you changed well enough to unbreak it.

Three is usually the most common.

There is actually an advanced way to change “all the things” though, and you’ll need to combine it with tip 7 at the bottom – always leave breadcrumbs (i.e., lots of git commits for every change you can back out, or comments in code that you can back out, ways to easily undo what you’ve done.)

Backwards Science

This is to basically do science in reverse – change all the things. Then run your program – does it work? If so, undo half of the things. Does it still work? Then you know it wasn’t that half. Undo half of what’s remaining – does it still work? If it doesn’t, you’ll want to turn those fixes back on and turn the other half off.

This is akin to a the ‘binary search’ algorithm which you may become more familiar with later. And it’s a good compliment to the traditional ‘turn one on, leave all off’ technique described above. This is because some issues may be interactions of multiple fixes, i.e., you may need to make more than one fix to the code to get things in ship-shape. The turn-it-all-on and then binary search downwards can find this easier than the turn-one-on-at-a-time approach. The turn-one-on-at-a-time approach, though, usually is faster since it requires less work to set up and back out.

Keep a Journal

You can do this in a documentation tool, in comments in the code, or just in a paper spiral at your desk. Often it’s good to write down what you’re doing, and what the results were, again in a scientific manner. Each change you make to the code is a little ‘experiment’, and you need to write down what you did and what the results were for each experiment.

This helps with number 7 below – keeping a journal complements other techniques of ‘backing things out’. It also prevents trying the same experiment twice – which may happen if you’re struggling with a bug for months at  a time. When you start forgetting what you’ve already tried, that’s when you truly begin to spin in place and become completely unproductive.

Finally, a journal can help with hypothesis-generation. As I stated above, each fix is an experiment. Your minds ability to come up with a hypothesis for any given event is nearly infinite (given enough time). But you’ll come up with better hypothesis the more information you have.

A hypothesis is valuable insofar as it explains the given data. Your initial bug is one data point – the program currently does X when it should do Y. Many hypothesis can fit this, and your job is to methodically step through them one by one until you find the one that’s correct.

However, each time you do an experiment, you narrow the solution space. If your program prints “Hello WOrld!” when it should print “Hello World!” and you perform an experiment to lowercase all O’s in the program and it fails… your real problem just got constrained. Now your problem is:

  1. Program prints “Hello WOrld!” when it should print “Hello World!” AND
  2. When lower casing all O’s at line 13, the program continued to malfunction.

A journal helps keep these thoughts all in order and allows each of your experiments to gather more data.

2. 90% of Programming is Knowing What to Google

Most of coding is research.

cfxphz8w4aa711p

But what to google and what sites to go to first is something you learn over time. This series will have a particular module dedicated to research, but until then, understand that if you have to search for something on the internet, that doesn’t mean you aren’t coding right.

Most of coding is googling for APIs, code snippets, blog opinions about tool X versus tool Y, and looking for others who have had your same problem and fixed it.

3. Don’t Grind

There’s a lot of times when you’ll be struggling with making your program work and you’ll chose to … struggle more.

Bayesian reasoning is a kind of statistical reasoning that says “What should we expect given what we’ve seen?” It says: take all the data into account, including new data, and what should we expect in data going forward?

In other words, given that you’ve already struggled for 3 days with this bug, how likely are you going to solve it by struggling for 3 more days?

Not very likely.

This is called grinding. And it’s a technique that may leave you with the answer, after maybe thirty more days, or may drive you to completely change your result (which is bad – if you wanted to design it in a certain way, it’s probably because that certain way was good. Changing to another way means you’re sacrificing quality because you couldn’t make it work.)

Or it may leave you quitting your job. I’ve seen all three happen.

When you find yourself grinding, your hypothesis generating engine slows down, and you have trouble coming up with new ideas for why your bug is occurring. You either rehash old ones – which is a waste of time if you’ve kept a journal – or you come up with increasingly bizarre theories on why your program may not be working, which isn’t the best use of your time.

The best thing to do when you realize you’re grinding is give up and work on something else. Your subconscious will be busy grinding away at the problem for you, and you’ll be greeted with an especially good idea right when you’re falling asleep, or when you’re showering, or otherwise occupied. These are the insightful ideas that have lots of promise, whereas the bizzare ideas you come up with staring at the code are almost always bad.

Walk away, play a game, read a book, talk about your problem with someone else, or talk about anything but your problem with someone else. Insight will strike.

4. Play

When you’re coding, you’re not always stuck on something. Sometimes, things are going just fine, swimmingly actually. This is when you should try to make your own problems to get stuck on.

If you’re trying some tutorial and you can get a button to show up on your screen where you want it – what happens when you move it? What happens when you set certain things to negative numbers? What happens when you try and push it off the screen?

These are experiments, like the above, but rather than experiments trying to prove or disprove a theory about how something is causing a failure, they’re still adding data to “how buttons work” or “how strings work” or just about anything else. They’re a form of play – exploration for its own sake – and they’re incredibly valuable forms of “hacking”.

Again, as with tip 7, leave yourself a way to back out. But rather than trying things to fix your program, you’re more or less trying ways to break it – or maybe not. You’re just trying things on a completely fine program, and checking with what you think will happen with what actually does.

Along with tip 5 below, playing is the best way to get the most out of something you’ve already done – if you already implemented some widget, what are a few ways to change it that you don’t know what they do? That ensures you get the most out of every project and exercise.

5. Make it Work; Then, Make it Pretty

Before we get further into this tip, let me make one thing clear –

You are not done with your code until it works and is pretty.

There’s nothing more demoralizing than sitting in a peer review with some recalcitrant coder who refuses to change what they have done because “it works, doesn’t it?”.

Working code is the bare minimum of what you’re expected to produce.

However, when trying to prioritize what to do first – getting things working is often the hardest part. Finding one solution to your problem is hard – there’s an infinite variety of solutions, but a much larger infinity of non-solutions. It’s ‘sparse’.

However, once you do have a working solution, it’s usually far easier to make slow incremental changes to that working solution to make it more pretty.

What I’m not saying here is that you should code a large project together with no regard to making things readable, and then do it later as an after thought. What I am saying is that sometimes, you’ll get stuck – it’s these times when it’s okay to get some sawdust in various places, so long as you can follow tip 6 below and keep it isolated to a certain area.

The fact is, writing a test for each solution is going to be cumbersome if 99 of your potential solutions don’t work and the 100th does. Sometimes you get the benefit of a single test telling you whether or not your solution works at all – this is when you’re lucky. But when you’re designing a new feature and you don’t know how it should work yet – you want to play in the design space and see what feels right – letting things get slightly dirty in isolated parts of code is fine, so long as you follow through and get them cleaned up before any peer review.

6. The Surgical Curtain

In surgery, surgeons often lay down cloth around the incision site to block out everything except the area that they’re going to be working with. This is to more or less shrink the problem size and focus all attenion only on the surgical area.

Similarly, when trying to ‘hack’, you want to shrink the problem by as much as possible, and only work on the area that is problematic.

Remember in scientific hacking, we talked about ‘reverse science’, where you change everything to see if your issue is still there?

There’s a similar technique to shrink the problem space, where you try and turn off (by removing or commeting out code) large swaths at a time and seeing if the problem is still there. As you turn things off and the problem remains, that means you can be confident (not sure, but confident) that your problem is not in that area of code.

Often you can shrink things down into a small toy program where your problem lies, and it becomes much easier and faster to try different experiments out on it.

This is one benefit of well factored / well designed code, it’s usually easy to isolate parts of the code and write small ‘unit’ tests around where your issue lies, rather than having to run your entire program to see if it works or not. The curtain is easy to lay down in well designed code.

If your stuck, and there are lines you can comment out while not affecting your problem, do so – this reduces changes of accidentally breaking other things, introducing interactions, and keeping the problem small enough that you can keep it all in your head.

7. Leave, and use, Breadcrumbs

Finally, leave yourself a way out.

Hacking can often mean many changes to your code – if you’re making them methodically as illustrated in tip 1, you also need a methodical way of backing them all out. This is what source control is often used for – try an experiment and commit it to the repo. If that experiment doesn’t fix your problem, roll back the commit and your code will be as it was before you did anything.

Often, even with the best rigor, we find the code base to be an unintelligable mess after some hacking around. It’s best at some times to start all the way over, and leaving yourself breadcrumbs allows you to do that.

You really don’t want to find yourself trying to fix a problem where you have a code base that is so heavily hacked that it’s unrecognizable compared to how you found it. It means that you’ll basically have to debug your way out, which is never fun.

Get in, change only what you need to in a methodical fashion, and get out, leaving the code as clean as possible.

Leaving breadcrumbs like git commits that are very granular also allows you to easily back out scaffolding code like print statements and other things that help you debug.

Finally, backing out fixes that don’t work is incredibly important. If we write for readability first, and performance second (which you should), then you should assume that the code base is as readable as it can be. Any change you make that’s not a refactoring to make it more so must by default make it less so. In other words, any change you make that’s not explicitly made to improve readablility is most likely harming it. No change should be left in that doesn’t do something – like fix a bug. If it doesn’t fix your bug, you need to take it out.

There are often thousands of ways to code something. Your fix may not have fixed your original problem, but it may not have also introduced new ones, meaning you could potentially leave it in and the code base would work as it always did. Don’t do this – you’ve harmed readability by letting in code that had no reason to be there. Back that code out and start anew on a new experiment.

Conclusion

Hopefully these tips and early insights into how coders code are good to have. I know a few things like knowing that people spend most of their time debugging and googling have helped people feel like they aren’t utter failures when they’re working through code combat.

It’s okay to hack, it’s okay to research.

But it’s also good to practice hacking and researching the right way, so that you can speed yourself up and be ready for some more robust tools to put in your toolchest.

November 1, 2016 - Posted by | Uncategorized

1 Comment »

  1. […] SYWTLTC: Novice Chapter 2: Effective Hacking […]

    Pingback by So You Want To Learn to Code: Prologue « The Skeptical Methodologist | November 14, 2016 | Reply


Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: