The Skeptical Methodologist

Software, Rants and Management

SYWTLTC: (AB) Chapter 3.3: Assertions

“There are no Black Swans”

How do you ensure the above in a computer program?

Right now, we have two methods to ensure certain quality prepositions, tests: and linting. Tests, if you recall, set up a scenario and check that things work exactly like you expect. To ensure that there are no black swans in our program, our tests would end up looking like…

def test_scenario_1():
    go_to_north_america()
    for swan in get_all_swans():
        assert swan.color != "black"

def test_scenario_2():
    go_to_south_america()
    for swan in get_all_swans():
        assert swan.color != "black"

Testing this way can become onerous. You have to think of all possible situations (or places) where black swans may occur. These tests can also take a long time to run. And what do you get in the end? Well, it turns out there was an enclave of black swans living on the dark side of the Moon that you didn’t anticipate. For all that testing, you didn’t ensure there were no black swans.

What about linting? Linting, unlike testing, can guarantee the absence of a certain subset of errors in our code. But these are largely stylistic errors – linting cannot understand our code’s logic, it only processes it as text.

Syntax vs Logic Errors

There are two kinds of errors we might run into while programming – syntax errors and logic errors. Syntax errors are errors in the way the program is actually encoded. They happen when we’ve literally failed to write a program. They’re usually easy to find, and linters expand the universe of things we can consider syntax errors.

Logic errors, however, are harder to find. For instance, can you think of a linter that would catch the error below:

#This program prints "hello coit"
print("Hello Eric!")

What linter could catch the error above – that is, the comments are out of line with the behavior, and it leaves us wondering what exactly the program is supposed to do. Maybe the comments are wrong, maybe the program is wrong. Maybe both are wrong. A linter that would be able to spot the error above would have to know what the requirements are (what the program was supposed to do) as well as be able to parse and understand English to recognize the comments are out of line with the program.

Suffice it to say, such linters don’t exist. And many attempts to create programs that can understand requirements and English have been made – perhaps in the future, we’ll get programs smart enough to find the error above. But we don’t have them yet.

Back to Black Swans

Our black swans are encoded into the logic of our program. Testing can’t find them, and linters can’t rule them out.

We have two remaining arrows in our quiver – assertions and types. Types we’ll get to in the next module, and they are the only thing that can actually guarantee we don’t have black swans. For now, though, let’s look at assertions.

What’s an assertion

Assertions and the ‘assert’ keyword are things that should seem familiar to you. You use the assert keyword in your unit tests.

assert x == 1, "X should equal 1"

In Python, the above assertion implements the following behavior: it checks to see if the variable one equals the number 1. If it does, then we keep running our program. If it doesn’t, the program halts and the message “X should equal 1” is printed. That’s it. Our program is toast.

In our tests, our assert keyword is actually rewritten by the test framework to also keep track of some test code stuff. However, it works more or less the same – check something, if it’s false, report an error and crash.

Assertions and Black Swans

How do assertions help us catch black swans?

Well, in the above code we were told that they found an enclave of black swans on the dark side of the moon. Who’s they? Our clients, unfortunately. And they’re pissed because we said there were no black swans. That’s the reason they went with us rather than their competitor.

Assertions help us make promises like this. If instead of all the testing above, which again, ultimately didn’t turn up any black swans, we wrote code like this:

def handle_swan(swan):
    assert swan.color != "black", "We can't handle black swans!"
    ...do other stuff...

Then we’d guarantee not so much that we won’t have black swans, but instead, the program will do something predictable and safe if we do. That is, crash in a safe way, and let us know why. This ends up being incredibly valuable.

Some real life examples

Let’s say you’re writing a script that parses a text file. You glance at the file and it looks like the format is pretty regular, so you write some stuff and assume the rest of the file looks the same.

This program reads in some data does some transformations on it and writes it back out to the same file. It’s a big file, and it has to be fast, so you can’t keep a lot of stuff in memory.
Only someone accidentally put an error half way through the file. The next day, after your script runs, you realize you deleted half the file because you were off by one character in your parsing.

Oops.

In this case, you could have asserted that things lined up in the file like you expect. If they ever deviated, you’d crash the program, immediately, and leave the rest of the file unchanged. Then the next day you can debug what’s going on, and pat yourself on the back for not accidentally deleting your project.

Let’s say you’re writing some embedded code for an X-Ray machine. Your machine takes X-Rays of small children and finds things that might be cancer. To stay safe, the machine has to administer a dose dependent on the size and weight of the child.

However, that idiot Josh made some changes last minute and turns out, if you set the machine up just right, it thinks the kid is about the size of three elephants.

Ooops, you just killed a kid.

You could have asserted that the dosage never exceeds a certain amount, and now little Sally’s parents wouldn’t be casket shopping.

These two examples illustrate two strengths of assertions, discussed below.

First, Assertions Document Your Assumptions

When you assert in your text file that you should find the letter ‘c’ about 4 characters into each line, you’re basically saying to the reader “If C isn’t 4 letters in, then something is dreadfully wrong. I have no idea what’s going on and I should stop writing to this file”.
This ends up being a very valuable tool for two reasons.

First, any assumptions you are making in your code are now assumptions that every other reader and maintainer of the code (i.e., you in six months) now are aware of. This is invaluable communication, as many assumptions like these are so often either never written down, or written down in comments.

Comments are better than nothing, but comments aren’t executable. Assertions are. Assertions will crash your program if the assumptions change, and require you to rewrite bits. That’s okay and is often the desired behavior – wouldn’t you want to know when your assumptions need to be updated?

Second, it allows you to make more assumptions. Often you try and code for corner cases, errors that may or may not ever happen, or weird things that you’re not sure are impossible and thus want to handle. This makes code really complex. You could be lazy and just assume none of these things ever happen – but when they do, your stuff will break in unexpected ways and be very hard to debug.

Or you can just assert that they don’t happen. Then you’ve let the future maintainer know you didn’t handle that corner case, you fail in a known good way if it does happen, and it allows your code to cleanly do the thing it should do, and assert that all the other stuff never happens.

Basically, if you’re reading code and you are thinking “this should never happen” or “this is impossible” – then assert it. If you’re thinking “this must be the case”, then assert it! Some people say “that’s impossible to do, that assertion will never trigger and its a waste of time” – that’s precisely why you write the assert! Because you believe it’s impossible, you believe the assertion will never fire, so you should check that belief and write the assert.

You’ll be surprised how often it fires.

Second, Fail Fast and Fail Often

By littering your code with assertions, you can begin to adopt a ‘fail fast and often’ design mentality.

Often, when trying to make our code robust to violations of our assumptions, we try and think of every possible thing and handle it. This is called defensive coding. This makes the code very complicated, and make the ‘essential’ nature of what you’re doing buried under lines and lines of error handling.

Erlang is a programming language invented in the late ‘80’s by Ericsson (the mobile phone company). They wanted something that would keep their phone switches up with better reliability than what they had been using. When phone switches go down, people can’t talk. So they started trying to figure out how to make their switches never go down, or at least have very high uptime.

The standard advice of the day was defensive coding – make sure the switches never go down in the first place. Ericson, through Erlang, actually tried a different approach – instead of trying to limit failure, they just made sure their programs were really good and really fast at coming back up.

Erlang programs don’t try to handle errors. They just crash as soon as they can with something informative and then restart. This has lead to switches that have very high uptime, because while they can and do crash all the time, there’s always a backup running and the programs themselves start in milliseconds.

What we can learn from this is that code becomes greatly simplified if instead of trying to handle errors, we just crash in a known safe way when we encounter them. Assertions allow us to do that. Assertions tell us two things. First, they tell us when code fails, what went wrong. They give us something much more informative than code normally does when it fails – instead of a backtrace where we have to start theorizing what might

Assertions tell us two things. First, they tell us when code fails, what went wrong. They give us something much more informative than code normally does when it fails – instead of a backtrace that ultimately is where the program (already in an error state) finally did something that the OS killed it for, a backtrace where we have to start theorizing what might have lead the program to its bad state, we get a crash and a message of what exactly violated some assumption exactly where it first occurred.

The second thing they tell us is that if our program doesn’t crash, then all of our assumptions were true for that run of the program! This gives you confidence that your program works. So often, programs appear to work, but later we ask ‘how could this have ever worked?’ Assertions get rid of those sorts of errors. If the program worked, it worked in precisely the way you intended it to work.

Not when there’s nothing left to add, but rather when there’s nothing left to take away

Art, Antoine de Saint Exupéry tells us, isn’t done when there’s ‘nothing left to add’, but rather when there’s ‘nothing left to take away’. This is important with software design too.

Often people are enticed by languages that make it easy to do ‘a lot of things’. They’re considered powerful languages – it’s easy to do anything in them, and with very little work. C is considered powerful for this reason.But all that power can actually be very limiting. In contrast to the French Poet quoted above, the great Philosopher, Spiderman, has taught us that “With Great Power comes Great Responsibility”. Sometimes this responsibility is too much.

How do we make our languages less powerful? How do we make our programs capable of less rather than more? With assertions. Assertions tell us that the program is now incapable of a whole way of working. If we assert X > 0, that means the program is incapable of doing anything if X is less than or equal to 0.

Theoretically, when we’re done, we have a program that’s capable of only one thing, and that one thing is what it was designed to do.

Assertion Density

If poetry and comics don’t convince you, perhaps science will. This study, done by Microsoft in 2006, shows that as assertion density went up (assertions per 1000 lines of code), defect density went down (defects per 1000 lines of code).

It also showed that many of the defects that were eventually found in the bug database for these projects were found via the use of assertions.

Debugging can take up the lion’s share of your time. Hopefully, you’ve already found through code combat or the simple code exercises in the chapters so far that your initial coding doesn’t take too long. What takes a long time is when something doesn’t go as planned and you have to figure out what.

Every bug is different, and so often your mentor might be powerless to help you. Instead, she probably has to sit down, step through your code, and try and replicate the error herself. How to write a for loop? She knows off the top of her head. Why your for loop doesn’t work? That she doesn’t know, and she won’t until she spends a while debugging it.  Debugging takes a long time, and it’s not fun.

Assertions reduce the number of bugs you generate and reduce the amount of time it takes to debug the ones you do generate. They’re very valuable.

Best Practices

Code contracts as an idiom. More on this later…

We’ll get into ‘design by contract’ later, but a quick introduction is in order to help you think of ways to introduce assertions into your code.

First, the precondition. Preconditions are things that should be true of your program’s state at the ‘beginning’ of a function. If I have a function that takes in two arguments and one needs to be larger than the other, I can assert that with a precondition. Preconditions most closely model ‘assumptions’ in code.

def foo(x, y):
    assert x > y, "X should be larger than y!"
    ...rest of foo...

Second, the postcondition is similar, but makes promises about the return value of functions rather than arguments of a function. For example, maybe foo has to return an integer larger than 10. Post-conditions most closely model “promises” you can make to other parts of code.

def foo(x, y):
    assert x > y, "X should be larger than y!"
    ....other parts of foo...
    assert return_value > 10, "Return of foo needs to be larger than 10!"
    return return_value

Try to think in terms of preconditions and postconditions – what should you assume of the arguments of every function you write? Better yet, what can you assume to make writing the function easier? Assert it!

What should you promise? Can you promise more? If so, do it!

Sanity Checks

Another pattern for adding assertions is the ‘sanity check’. This idea weakens the idea of something that ‘must be’ true, or ‘should be’ true to something that ‘really ought to be true, I think’.

If you’re a bathtub, it really ought to be the case that the temperature can’t be set to above scalding. If you’re a microwave, nothing really ought to be in there for 99 hours. These kinds of assertions may end up firing more often than others and need some ‘tailoring’ to work. But they also can serve as great ‘canaries in the coal mine’ – they’ll almost always fire first, before anything is ‘technically wrong’, and give you a situation that really ought to be looked at to see if it’s a problem.

The downside of sanity checks is often we might have tests that test corner conditions to ensure things work – these corner case tests are more likely to trigger sanity checks. It’s a design trade-off on whether you want to ban some corner conditions outright or make sure you work properly through them.

No side effects

Side effects are when code actually ‘does’ something, like print to the screen, write to a file, or send a message over the network. They should never occur in assertions!

This is because, in some languages, assertions can be turned off. Moreover, it usually is more readable when assertions are simply true or false, or functions that can return true or false. Assertions shouldn’t ‘do’ anything in and of themselves.A common idiom is when a function returns a value saying whether it worked or not. The wrong way to check the return value would be to assert directly.

A common idiom is when a function returns a value saying whether it worked or not. The wrong way to check the return value would be to assert directly.

assert foo_prints_to_screen(1,2), "Foo should return true!"

The right way is to pull off the return value and assert just on that.

val = foo_prints_to_screen(1,2)
assert val, "Foo should return True!"

If debugging, someone should be able to skip your assertions or even comment them out and not have any code no longer work because you no longer read correctly from a file or something.

A special kind of function that does nothing and just returns true or false is called a predicate and is a-ok to put in an assertion. Many times these predicate helpers make code more readable.

Nothing that takes a long time

Likewise, assertions are just supposed to represent quick promises about the code. They shouldn’t take too long themselves, as that would screw up performance numbers between when assertions are on and when they are off.

So only check variables or run predicates that you believe run quickly. I wouldn’t invert a thousand matrices to check an assertion. Using assertions to check long-running behavior is probably something better done as a test.

Consider writing predicate library helpers

Alluded to above, predicates can make assertions easier to sprinkle throughout your code as well as more readable. Don’t shy away from writing and using functions that are only used in assertions.

For example, comparing two floating point numbers directly like 3.14 and 3.15 is notoriously dangerous to do. The Numpy numerical computing library for Python has a function to do so – compare two floating point numbers and return true or false. This reads very well inside assertions.

def foo(x, y):
    assert numpy.isclose(x * 3.14, y * 3.15), "X and Y should be close after modifiers"
    ...other code...

Don’t be afraid to use other’s predicates or write your own!

What about exceptions or ‘known failures’?

What if your program is ‘supposed’ to tell the user their input was bad and ask them for more input? What if, on disconnect from the server, it’s not supposed to crash but reconnect? What if something happens that you can actually recover from?

We will talk about this – error and exception handling – later. For now, just assert that these things don’t happen.

Why? Shouldn’t we learn how to ‘do it right’ first?

There are a few reasons.

First, error handling is notoriously difficult to do. There are a few patterns I’ll introduce that already rely on a good understanding of assertions.

Second, you will handle preciously few errors in your code well. This goes with the above but extends it. Not only is error handling hard, and unless you test your error handling code it’s almost certainly wrong, but there’re just too many possible errors to handle.

In C, the printf can fail. Hardly anyone knows this. And even fewer know what it means when it fails or how to recover from it. Instead, they write code as if printf didn’t fail. What asserts allow you to do is find a middle ground. They allow you to say “I don’t know how to handle this error, but I don’t want it to happen, and I’m not going to pretend it never happens.”

Error handling code, if taken to the extreme, can dominate your code base and make it incredibly difficult to read. It’s surprisingly tricky to get right. And most errors won’t be handled anyway, even if you try. You’ll produce higher quality code if you learn to assert as much as you can and convert those assertions to error handling code on a case by case basis.

We’ll discuss that later.

Relationship with TDD

Finally, assertion density has a very synergistic relationship with test driven development. Likewise, as we’ll get into later when we focus more on design, test driven design is also very synergistic with design by contract!

There are two things to think about when working with assertions and tests.
First, each assertion is only worth the number of times it’s executed. If I have 100 assertions in my code, and 10 in my test, but I only have one test, then I’ll execute 110 assertions over my code.

However, if I have 100 assertions in my code, and have 20 split between two tests… That means the 100 assertions in my code are exercised twice, with potentially different values. That results in 100 + 10 for test 1 and  100 + 10 for test 2 = 220 assertions exercised.

Thus, the higher your assertion density, the more valuable writing another test is, and the higher number of tests you have, the more valuable it is to add assertions in the code. They go hand in hand!

Second, related to the above, assertions help squash the ‘exponential integration test’ problem.

Unit tests are the tests you’re most familiar with writing. They call a single function with known inputs and check that the outputs are what you expect.

Integration tests are tests which call code you wrote which call other code you wrote. It tests that all the code that you or your team has written works together, or integrates.
The problem is maybe you worked on unit 1, and your colleague has worked on unit 2. We then have to write one integration test that tests the connection between unit 1 and 2.

That’s pretty easy.

But when a third unit is written, now we have to write two more tests – one to test unit 1 and 3, and another to test unit 2 and 3. When a fourth unit is added, three more tests need to be added – units 1 and 4, 2 and 4 and 3 and 4. We can actually test a lot more than this for every unit added to the system.

This quickly gets out of hand, ensuring that you can’t really have complete ‘integration coverage’ of tests like you can have line coverage from unit tests. There’s no way you can write a test for all possible combinations of units for any system of any level of complexity.

There’s a third level of testing we’ll call system testing – that’s when you test everything together. You don’t mock, stub, or fake out anything. So you test all units, 1, 2, 3 and 4. System level testing is pretty easy to set up, but hard to increase coverage on. Usually, I’d advise you write one or two high-level system tests, and then the rest as units to up your coverage.

A funny thing happens, though, when you have system tests and assertions. Let’s say unit 1 has a few preconditions and postconditions, unit 2 as the same, and so on. Our unit tests all exercise these preconditions and post conditions just fine. But every system test we write suddenly benefits from all of the assertions we have through our code. So we get a lot more assertions checked (as we stated above).

But there’s more! All of these assertions are along unit ‘boundaries’. That is to say, we’re writing assertions at the beginning and ends of all of our units (the ‘preconditions’ and ‘postconditions’, which is precisely where they talk to each other (often called the boundary). This is what integration tests are supposed to check.

To sum up, assertions basically provide a basic level of integration testing ‘built in’.

To apply what I just said, if you find yourself struggling to figure out how to test something – often this happens when your code is complex enough that you’re trying to set up an integration level test of a few moving parts and it’s getting hairy – think about some set of assertions you can embed in the code instead. Often assertions are much easier to add than integration tests are to write!

Don’t type check… too much

We’re going to get into types (which eliminate black swans) in the next module. You can do some dynamic type checking with python using assertions by insisting that variables are certain types, or inherit from certain types. These are often good ideas for assertions at this stage, but don’t go overboard.

One of the strengths of python is that its code can be called on a lot of different ‘stuff’. In other words, if you have code that reverses a string, and you call it on a list, you probably have reversed your list. This is a good thing and is called ‘polymorphism’.If you had done an assertion in there that the code only works on strings, you would have made the code needlessly dedicated to strings.

If you had done an assertion in there that the code only works on strings, you would have made the code needlessly dedicated to strings.

While we haven’t gotten into object orientation yet, or typeful programming, suffice it to say you should only assert on things that are ‘reversible’ – basically, things on which your algorithm would work. So beware making things too concrete.

Moreover, it’s considered ‘pythonic’ to ask forgiveness rather than permission. That means code should try (from typing perspective) to get as far as it can before crashing. This is not the same as failing fast on a value perspective.

(Types are ints, strings, floats, lists, dictionaries or the ‘category of container’ of a variable, while values are what the value actually is. For example, you can check the type of something in python with the ‘type’ function:

>>> x =  1
>>> type(x)
<type 'int'>

In the above, x has a type of int and a value of 1. Asserting on values is usually better than types, though some type-checking is fine!)

Don’t test your assertions

Writing a test to make sure assertions fire is a waste of time. It’s also impossible in some languages. In python, assertions are ‘exceptions’, which can be ‘caught’ and handled. In C++, assertions crash the program. Testing assertions is a great way to double the amount of work you have to do. Plus, assertions deeply embedded in some nasty complex code are hard to test, despite being some of the most valuable assertions!

From a psychological perspective, you’re going to find yourself eventually finding reasons not to test your code because it’s too complex. Testing can be, at times, painful. If you make assertions that painful, you’ll find excuses not to assert. Make assertions as easy as possible.

(TDD, by the way, also makes tests easier to write by doing them first. Unless a developer is particularly skilled in writing testable code, testing after development is done can be painful indeed.)

Instead, assertion ‘quality’ is most easily measured through peer review and even static analysis tools. There’s not a linter written right now for this, but it’d be trivial to write a linter that checked that there were assertions at the beginning and end of functions, as well as reporting the assertion density.

Note how interesting that is! We said above that linters only work on syntax issues – not logic. Assertions can work on logic. But how do we know if we’ve asserted well? That’s a syntax issue! We’ve ‘bootstrapped’ ourselves by combining tools together to make a formerly very hard problem only slightly hard.

Code Challenge

Read this for more examples of how to use assertions in python.

For some ideas on how to do useful type checks in python, check out the third option on the first answer here and the whole of the first answer here.

While not required, this is sometimes a useful syntax construct to keep assertions as ‘one-liners’.

Clone this repo. It has the beginnings of some simple statistics functions. Use the following development process:

  1. Pick one of the functions (mean, median, range or standard deviation) and write a test for it.
  2. Ensure your test fails, and your code is pylint clean.
  3. Commit.
  4. Think of assertions you can run as a precondition for your function. Add those.
  5. Ensure your test still fails, and your code is pylint clean.
  6. Commit.
  7. Implement the function.
  8. Ensure your test passes, the assertion doesn’t fire, and your code is pylint clean.
  9. Commit
  10. Pick another function and go back to step 1 until you’re done with all four.
  11. Open up a pull request and ask your mentor for peer review

Definitions of the functions can be found here as well as in the code.

Mean
Median
Standard Deviation
Range

For Mentors

Check out the ‘answer_key’ branch of the repo above to see some examples of good assertions. For students, don’t look at that branch – you’ll have more or less wasted your time reading this if you just cheat and look at the branch. You’ve already gotten this far, might as well give it a shot without looking at the answers right?

Also, don’t read ahead unless you’re a mentor – again, it just ruins things for you.

  • Did your mentee use the Python statistics module? If so, ask them to reimplement the algorithms from requirements. Let them know that they did it absolutely right and in the real world, you would always use libraries. Commend them for doing research. Let them know that this challenge is also to practice implementing an algorithm from a definition, and that’s why we’re doing it that way.
  • If they did not use the statistics module, ask why not? Good research at the beginning of any project to see what’s already been solved is valuable.
  • Did your mentee implement the population standard deviation or the sample standard deviation? Whichever one they did, ask them to reimplement using the other way. Talk about how errors and defects can be in the requirements themselves – in this case, the requirements weren’t detailed enough for them to know which one to use.
  • Discuss whether the tests and assertions helped in their initial development, and whether it helped in any ensuing refactors? Did the tests and assertions help them understand the problem, as well as shrink the problem space? Did they help ensure that everything still worked after refactors?

December 19, 2016 Posted by | Uncategorized | Leave a comment

Peer Review Teams

Peer reviews accomplish a number of things:

  • They are one of the most cost-effective means of ensuring quality
  • They spread general system knowledge across a team
  • They spread best practices

The ideal peer review size in terms of ensuring quality alone is one person. The marginal benefit of adding more than one person to a review, at least in terms of quality, is low. However, there are effective ways to increase peer review team size if you have different reviewers focus on different things.

You ensure that reviewers have different foci by ensuring they come from different perspectives. I’ve seen three general patterns in perspective.

The Architect

The Architect, in this peer review role, will be a senior or very senior engineer who’s had her hands in a lot of parts of the system. They’ll tend to be a jill of all trades, master of none, and be most interested in larger issues.

They’ll best satisfy the “most cost-effective means of ensuring quality”.

Architects will focus on:

  • Whether or not the code’s external API matches other project’s expectations
  • Whether or not the code abides by a coding and design standard, for instance covering requirements such as
    • Readability
    • Testability
    • Maintainability

Architects also, to a degree, help ‘spread general system knowledge across a team’. They can bring other projects knowledge into a peer review, as well as export project specific knowledge out of this project into other peer reviews.

The Junior Developer

Junior-Senior pairs, which are one pairing pattern that seems to work very well (and which I have begun calling ‘Haseltine Pairs’), also provide another perspective in peer review.

They best satisfy “spreading general knowledge about the system” in peer views.

Specifically, having a junior peer review code will

  • Help them better understand the larger project
  • If they’re part of a Haseltine Pair on the project, help them understand how to debug, test, and document the project

Also important is the Junior Developer’s influence on the process itself. Having juniors on a review, and ensuring they’re confident in asking ‘dumb’ questions, helps code become more maintainable and readable. The Senior now knows she’s writing for an ‘audience’, the junior dev, who may or may not understand everything.

It’s important to remember that we’re all juniors when we first set foot in legacy code, even our own if it has been a few months! Having code written for a junior in mind very much helps keeps things maintainable.

The Language Lawyer

The final person on a great peer review would be what’s been called the ‘Language Lawer’. Language lawyers are your subject matter experts in the programming languages and technologies being used.

An important part of a good peer review is access to supporting documentation such as designs and requirements that help the reviewer understand “what problem is this trying to solve” and “what are the broad swaths of the solution I should be looking for?” This can be done via some supporting documentation, however, ‘lore’ (or informal, baked in knowledge in a team that isn’t documented) is much stronger with the two people listed above. The architect knows the broader problem the code tries to solve, whereas the embedded junior / Haseltine junior has a rough idea of how the design is supposed to work.

What’s left for the language lawyer?

Well, the language lawyer ends up being effective precisely because he doesn’t need formal knowledge of the project to have an impact. Language lawyers are good ‘in the small’ – they know about the APIs, the libraries, the language rules, and idiomatic code. Thus, many of their comments won’t be on why a certain design was used versus something else, but rather, why the actual code was written the way it was.

The Language Lawyer best solves the “spreading best practices” problem by using Peer Reviews as a way to help the author become a better programmer.

They’ll look for:

  • Nitty gritty bugs such as null pointer dereferences
  • Small performance gains that don’t hurt readability
  • Idiomatic code structures and improvements to code’s modularity that takes advantage of a specific language

They also help improve quality, especially in error-prone languages.

Putting it all together

Small teams don’t always have many people to draw on to build the ideal peer review. And even in large teams, you don’t always have the make-up required to get your junior, your architect and your lawyer all in a peer review.

But working with this pattern can help you make quick decisions about who’s needed. If someone is a great programmer in Ruby, then the value added by a Haseltine Junior or Architect is higher than that of a language lawyer.

If you’re most big picture senior person is pushing out some code, getting the big picture perspective probably isn’t going to get you as far as ensuring a language expert is involved.

And if someone’s spitting out code without help, getting a dedicated peer reviewer to play the part of a junior to ask the ‘dumb’ questions is going to help more than an architect or lawyer.

If you can build your ideal team, that’s great. If you can’t, look for what weaknesses remain and cover those to get the most out of your peer reviews.

December 19, 2016 Posted by | Uncategorized | Leave a comment

SYWTLTC: (AB) Chapter 3.2 Quality : Static Analysis

Go here if you want the prolog and table of contents to the SYWTLTC series!

Per the 5 pillars of quality, up next is static analysis. As always, treat all links as required reading unless stated otherwise.

What is Static Analysis?

Static analysis is a broad term used to categorize all tools that you can run on code to tell you whether it’s correct or not. It’s a program that you run on your program which tells you whether you’re making mistakes.

It’s called a static analyzer because it doesn’t run your code to figure out what’s wrong – it analyzes it “statically”, or unchanging.

What kinds of Static Analysis are there?

There are three broad categories of static analysis tools out there. Linters, static analyzers proper, and model checkers / theorem provers.

The most prevalent, and the one you’ll be using from here on out, is called a linter. Linters “remove lint” from programs. They operate primarily on the text of the program itself, looking for simple stylistic mistakes. Think of them like spellcheckers. They more or less look at your program line by line and give you warnings if for example, you use a variable name that is hard to understand, or if you switch between spaces and tabs.

The other categories (static analyzers, model checkers, theorem provers) can all eliminate harder and harder bugs to suss out, but require substantially more work. Python is a ‘dynamic language’, which means the entire program isn’t really defined until it’s running, and so ‘static’ analysis of the code itself tends to have too many unknowns to be worthwhile.

We’ll be investigating Pylint in particular which is primarily a linter but also does some more difficult checks as well by attempting to interpret your python without actually running it.

Benefits of Static Analysis

There are a number of benefits of static analysis as well as drawbacks, but most important to note is that many of these benefits and drawbacks tend to be complementary to testing, peer review, types / contracts and design (our other pillars). Static analysis in and of itself is of limited power, but combined with the other pillars of quality can be very powerful.

Consistent style

First off, stylistic checking ensures a code base has a single, consistent style. This helps maintainers as they can expect certain patterns in the whitespace, variable names and other parts of the code to read it more easily.

It also helps peer reviewers since, again, a single way to use whitespace, variable names, and other stylistic concerns make code easier to read than many different styles.

Absolute removal of certain kinds of bugs

Some bugs, such as variable misspellings, which would end up crashing your program at runtime can be absolutely eliminated from your code base.

This is in contrast to testing. Testing can only show that the one path through the code that the test executes does not fail in any way that the test doesn’t expect. In other words, you can never really prove your program works via tests alone, since each test only proves that that one, single scenario worked.

Linters can prove that your program is free of certain kinds of bugs, completely and absolutely.

Very low cost in terms of time; quick turn around

Compared to contracts, peer reviews or tests, linting takes nearly no time at all to run. Tests take a lot of time to write, and later, to maintain. Peer reviews can involve multiple person hours as other developers look at your code.

Linting takes, usually, on the order of seconds. This is great for two reasons.

First, it means that the level of effort you have to get a clean lint is minimal compared to testing. You can squash a lot of bugs very quickly with linting, a lot more than you would via testing.

Second, it means you can lint often. In the previous chapter, I showed you how to automatically run your tests as files change. This is a great productivity tool as you can find out if you broke a test very early.

Trying to make sure that “bad thing” gets feedback ASAP is a key to learning, and it’s also a key to fixing “bad thing” fast. The mistake you just made is still fresh in your mind, so getting feedback on it means you don’t have to go looking for the bug – it’s right there, right where you were already working.

Tests and testing still require some care – tests can easily take minutes or hours, which means you have to start splitting up what tests run when. Usually, we like ‘unit’ tests to be our fast tests, the ones we can run automatically on changes, whereas other tests we may run nightly.

Linters, however, are super fast. They can be run faster than unit tests even. Many linters are actually built into text editors and IDEs so that when you save your file, the linter automatically runs and tells you what errors it has found (again, like spellcheck).

For static languages like C++ or Java, it’s often said that just getting your program to compile is like one big test. We don’t get that luxury in python – however, we can get most of it back by linting early and often. A clean lint is like a version of a test that runs quickly and eradicates many kinds of errors.

Can’t test your tests

Speaking of testing, it’s hard to test your tests, and not always value added to do so. TDD ensures you do some minimal testing of your tests – this is why you make sure the test fails first and then passes when you do code changes. All too often have I written tests after the fact, then when a bug crept in, I realize that the way I wrote my test would have never found it because I screwed up writing the test.

How do you ensure your test code is high quality then? With the other pillars – static analysis in particular. Ensuring your test code has a clean lint gives some assurance that your tests are maintainable and readable, as well as free of certain kinds of errors. This, in turn, makes your tests more easy to peer review for other issues.

It’s a virtuous cycle!

Drawbacks

There are some downsides to expect from linting.

High false positive rate

Linters are going to find a lot of issues that just aren’t that important. Whitespace issues may hurt readability, but they’ll never crash a program. Variable names are nice to get consistent, but the interpreter doesn’t care.

Most of what you’ll be fixing will be things that may have never ended up crashing your program.

Fixing them, though, is often very simple. And you’ll get into a habit of breathing a sigh of relief when the linter runs and finds no issues in your code. You’ll become more confident as a coder, and be much more willing to take risks.

Types of bugs found usually aren’t that nefarious

Along with the above, the worst bugs are often those hardest to catch via linters. If you’re handling credit cards, making sure you debit the right account isn’t going to be something a linter can help you with. Making sure you don’t leak personally identifiable information is something linters would struggle to help you with too.

Often the bugs found are simpler readability and maintenance errors, as well as some actual defects that are pretty quick to learn how to avoid. On the other hand, linters prepare the code for people who can find those bugs in peer review and can give more assurance to test code that it’s correctly exercising your credit card and PII functionality.

Hard to do in dynamic languages

One final drawback is that linting is hard to do in dynamic languages, as discussed above. This means things that some languages can spot via static analysis alone like resource leaks (you grabbed memory from the operating system and forgot to give it back) aren’t going to be things Pylint can find, though.

On the other hand, linters end up being of about equivalent power to the compiler in dynamic languages – which is a great first step towards ensuring your program works. If another program reads it and says “I don’t see anything obviously wrong with this”, that’s some assurance.

Smelly Code

Despite the drawbacks mentioned above, often we go along with fixing all the false positives as you don’t really know whether or not something is wrong until you try to fix it.

Code with lots of Pylint errors can be said to be ‘smelly’ code – we don’t know something is wrong for sure, but we need to check it out. Check out the write up here, and then skim a few of the code smells classified on C2.

Often you might fix one or two Pylint errors and three more will pop up. This is a sign that there’s actually a fundamental design flaw that leads the code to be brittle and hard to understand – even if on the surface it just seems like a few small warnings from Pylint.

If we keep the code squeaky clean, we’ll avoid any smell.

Pylint

Pylint is pretty much the industry standard linter for python. It does a lot of stylistic checking based on what’s considered true idiomatic python (called PEP8) as well as some deeper analysis.

Go download it here.

Challenge

We’re going to loosely follow it’s tutorial, which involves the Ceaser cipher which you’ll want to read up on.

First, fork this repo. Then create a branch in your forked repo where we’re going to do some work.

Then, go ahead and clean up the ceaser_script.py file using Pylint according to this tutorial.

 

When it’s clean, commit.

That is a workflow you might use if you inherit some code and want to clean it up – often running a linter on inherited code is a good way to both improve its readability as well as get familiar with it.

Next, we’ll work on a workflow that combines both linting and test driven development. In the future, you’ll be required to use the workflow practiced below!

The next step will be a little more difficult – create a new file, ceaser.py and ceaser_test.py – we’re going to refactor or change the script you worked on before to be more reusable.

 

  1. Write a test for a function you haven’t written yet in ceaser_test.py, the function will have the following signature:encode(message, offset) so you can call it like this, encode("beware the ides of march", 3) and get a message with the Ceaser cipher offset 3. (You’ll probably have to create a test message by hand)

  2. Ensure this tests fails. You may have to put an empty function in ceaser.py that does nothing.

  3. Ensure pylint is clean.
  4. Commit
  5. Using ceaser_script.py, copy and paste some of the functionality into your encode function in ceaser.py

  6. Debug it until your test passes.
  7. Ensure pylint is clean.
  8. Commit
  9. Write a test for a function you haven’t written yet in ceaser_test.py, the function will have the following signature:decode(encoded_message, offset) so you can call it like this decode("jewlrp ajk ippf kl aqjrk", 9) and get a decoded, English message using the Ceasar cipher. (Again, you’ll have to create a message by hand, the above was just random letters I made up, it’s not an actual message.)

  10. Ensure pylint is clean.
  11. Commit
  12. Using ceaser_script.py, copy and paste some of the functionality into your decode function.

  13. Debug until your test passes.
  14. Ensure pylint is clean.
  15. Commit
  16. Open a pull request on your branch.

The above illustrates a pattern – in Test Driven Development with Static Analysis, every commit should either be adding a test or code. Every commit has 10/10 on pylint and 100% coverage.

When a test fails and you can’t figure out why, then break out the debugger. Also, often running the debugger the first time you want to walk through your code can also be a good practice.

Hook up Pylint to your Text Editor

Fixing things as soon as they happen creates a tight feedback loop that both makes you more productive and accelerates learning. It’s easiest to see during testing.

If you make a change to your code, and your tests fail, you know what you just changed. All the context is still in your head and you’re much more quickly able to debug code and get the test passing again. Moreover, you know that the changes you made in the code ended up affecting tests that you may have not predicted. You learned something about the code.

Compare that to making a lot of code changes, then days later, running the tests. A few fail. You have no idea what changes are tied to which failures. You can try taking a debugging approach, and you can look at your git diffs to see what’s changed, but this is a much more complex problem than above. You’ve already moved on, mentally, to other things. Debugging the same issues could take two to ten times longer.

The lesson? Debug as close to possible to when you added the bug.

Linters work like fast unit tests – something that can run in the background of your editor and let you know about issues as soon as possible. Again, since they work like a compiler for a dynamic language, they’re a single large global test for things like misspellings, syntax errors and other things you’d otherwise have to wait for your tests to run. Catching them as soon as possible speeds you up, and allows you to focus your testing efforts on things the linter can’t catch, like actual logic errors rather than merely running the code looking for syntax problems.

Go ahead and use the instructions below to hook up Pylint to your editor of choice:

Pylint for Sublime

Pylint for Vim

Pylint for PyCharm

Hook up Pylint to Git

Another approach is to have git automatically reject any commit that doesn’t have a 10/10 out of Pylint.

When running from inside a text editor, Pylint decorates the current file. If you make changes to that file, and Pylint gives you a clean bill of health, that doesn’t mean that your changes didn’t suddenly break other files.

For example, you may rename a function, and forget to rename other places it is used. Pylint would flag your current file as clean, but other files where that function as being used as having errors.

Putting a Pylint check-on-commit allows you to do a whole project Pylint at the last moment to prevent adding any erroneous code to the repo.

Check out this repo and add it to your own fork of the Ceaser project above.

Some additional resources can be found here.

What about false positives?

For the duration of these chapters, we’ll treat every pylint error as a real error. You’ll be expected to fix everyone, whether you agree or not unless your mentor explicitly tells you to ignore them.

That being said, in the real world, often you have to make compromises. For that purpose, there are configuration files to turn off families of checks, suppression files to suppress warnings line by line, as well as in line suppressions. No task is tied to these, but go ahead and skim these links so you have a cursory understanding of how to squelch a Pylint error.

Example of inline suppressions

How to add suppressions to the config file

Example config file

To move on…

When you’re done, you’ll need to provide your mentor with the following…

  • show your mentor a 100% coverage report
  • show your mentor a 10/10 Pylint report
  • open a pull request on your code, and clean up any comments your mentor has.
  • show your mentor that you have pylint installed in your text editor
  • show your mentor that you have a pylint hook in your git repo

For Mentors…

  • In addition to the above, check out each commit and ensure that each one is pylint clean.

November 11, 2016 Posted by | Uncategorized | 1 Comment

SYWTLTC: Novice Chapter 2: Effective Hacking

Go here if you want the prolog and table of contents to the SYWTLTC series!

The Novice section of SYWTLTC is intentionally pretty sparse – Chapter 1 gives you all the tools you need to get started in Code Combat .

However, there are often meta-lessons to be learned even as early as Code Combat. We’ll go over one of those today.

Hacking?

So I’m using this term loosely. But often, we hear the term ‘hack’ in the context of programming meaning when someone doesn’t fully understand what they’re doing and they’re just trying to ‘make it work’.

The usual work flow here is to make a sometimes not-so-educated guess about what might be wrong, change that thing, then run your program and see if it works.

Senior coders tend to have many tools in their toolbelt, however, we never completely drop hacking as a means to understand things. There will always be time when you will have code you inherit, a library you don’t understand, or even code you wrote yourself that you no longer remember how it works – there will be times like these that all you can do is ‘fiddle with it’ until it does what you want it to do.

Still, there are tips for more rigorous hacking

1. Scientific Hacking

Change one variable at a time

I don’t mean actual variables in a program, although that may be the case as well. What I mean here that’s scientifically inspired is that we try to isolate only one ‘theory’ of why it’s not working at a time.

If it may be X, Y or Z, you don’t change X, Y or Z all at the same time. Change one and see if it worked, back that change out, change the next and see if it worked, and so on.

This may feel like you’re going slower, but you’re actually going faster. This is because your ability to mentally understand what’s changing in the system goes out the window after a certain (very small) level of complexity. So you may be able to “change all the things!” once or twice on toy programs you’re working on, and things will appear to work.

But in larger programs, many bad things happen when you do this.

  1. Your program can suddenly appear to work. But it’s all in appearances.
  2. You may fix your thing and break something else.
  3. You may not even fix your thing, break something else, and not understand what you changed well enough to unbreak it.

Three is usually the most common.

There is actually an advanced way to change “all the things” though, and you’ll need to combine it with tip 7 at the bottom – always leave breadcrumbs (i.e., lots of git commits for every change you can back out, or comments in code that you can back out, ways to easily undo what you’ve done.)

Backwards Science

This is to basically do science in reverse – change all the things. Then run your program – does it work? If so, undo half of the things. Does it still work? Then you know it wasn’t that half. Undo half of what’s remaining – does it still work? If it doesn’t, you’ll want to turn those fixes back on and turn the other half off.

This is akin to a the ‘binary search’ algorithm which you may become more familiar with later. And it’s a good compliment to the traditional ‘turn one on, leave all off’ technique described above. This is because some issues may be interactions of multiple fixes, i.e., you may need to make more than one fix to the code to get things in ship-shape. The turn-it-all-on and then binary search downwards can find this easier than the turn-one-on-at-a-time approach. The turn-one-on-at-a-time approach, though, usually is faster since it requires less work to set up and back out.

Keep a Journal

You can do this in a documentation tool, in comments in the code, or just in a paper spiral at your desk. Often it’s good to write down what you’re doing, and what the results were, again in a scientific manner. Each change you make to the code is a little ‘experiment’, and you need to write down what you did and what the results were for each experiment.

This helps with number 7 below – keeping a journal complements other techniques of ‘backing things out’. It also prevents trying the same experiment twice – which may happen if you’re struggling with a bug for months at  a time. When you start forgetting what you’ve already tried, that’s when you truly begin to spin in place and become completely unproductive.

Finally, a journal can help with hypothesis-generation. As I stated above, each fix is an experiment. Your minds ability to come up with a hypothesis for any given event is nearly infinite (given enough time). But you’ll come up with better hypothesis the more information you have.

A hypothesis is valuable insofar as it explains the given data. Your initial bug is one data point – the program currently does X when it should do Y. Many hypothesis can fit this, and your job is to methodically step through them one by one until you find the one that’s correct.

However, each time you do an experiment, you narrow the solution space. If your program prints “Hello WOrld!” when it should print “Hello World!” and you perform an experiment to lowercase all O’s in the program and it fails… your real problem just got constrained. Now your problem is:

  1. Program prints “Hello WOrld!” when it should print “Hello World!” AND
  2. When lower casing all O’s at line 13, the program continued to malfunction.

A journal helps keep these thoughts all in order and allows each of your experiments to gather more data.

2. 90% of Programming is Knowing What to Google

Most of coding is research.

cfxphz8w4aa711p

But what to google and what sites to go to first is something you learn over time. This series will have a particular module dedicated to research, but until then, understand that if you have to search for something on the internet, that doesn’t mean you aren’t coding right.

Most of coding is googling for APIs, code snippets, blog opinions about tool X versus tool Y, and looking for others who have had your same problem and fixed it.

3. Don’t Grind

There’s a lot of times when you’ll be struggling with making your program work and you’ll chose to … struggle more.

Bayesian reasoning is a kind of statistical reasoning that says “What should we expect given what we’ve seen?” It says: take all the data into account, including new data, and what should we expect in data going forward?

In other words, given that you’ve already struggled for 3 days with this bug, how likely are you going to solve it by struggling for 3 more days?

Not very likely.

This is called grinding. And it’s a technique that may leave you with the answer, after maybe thirty more days, or may drive you to completely change your result (which is bad – if you wanted to design it in a certain way, it’s probably because that certain way was good. Changing to another way means you’re sacrificing quality because you couldn’t make it work.)

Or it may leave you quitting your job. I’ve seen all three happen.

When you find yourself grinding, your hypothesis generating engine slows down, and you have trouble coming up with new ideas for why your bug is occurring. You either rehash old ones – which is a waste of time if you’ve kept a journal – or you come up with increasingly bizarre theories on why your program may not be working, which isn’t the best use of your time.

The best thing to do when you realize you’re grinding is give up and work on something else. Your subconscious will be busy grinding away at the problem for you, and you’ll be greeted with an especially good idea right when you’re falling asleep, or when you’re showering, or otherwise occupied. These are the insightful ideas that have lots of promise, whereas the bizzare ideas you come up with staring at the code are almost always bad.

Walk away, play a game, read a book, talk about your problem with someone else, or talk about anything but your problem with someone else. Insight will strike.

4. Play

When you’re coding, you’re not always stuck on something. Sometimes, things are going just fine, swimmingly actually. This is when you should try to make your own problems to get stuck on.

If you’re trying some tutorial and you can get a button to show up on your screen where you want it – what happens when you move it? What happens when you set certain things to negative numbers? What happens when you try and push it off the screen?

These are experiments, like the above, but rather than experiments trying to prove or disprove a theory about how something is causing a failure, they’re still adding data to “how buttons work” or “how strings work” or just about anything else. They’re a form of play – exploration for its own sake – and they’re incredibly valuable forms of “hacking”.

Again, as with tip 7, leave yourself a way to back out. But rather than trying things to fix your program, you’re more or less trying ways to break it – or maybe not. You’re just trying things on a completely fine program, and checking with what you think will happen with what actually does.

Along with tip 5 below, playing is the best way to get the most out of something you’ve already done – if you already implemented some widget, what are a few ways to change it that you don’t know what they do? That ensures you get the most out of every project and exercise.

5. Make it Work; Then, Make it Pretty

Before we get further into this tip, let me make one thing clear –

You are not done with your code until it works and is pretty.

There’s nothing more demoralizing than sitting in a peer review with some recalcitrant coder who refuses to change what they have done because “it works, doesn’t it?”.

Working code is the bare minimum of what you’re expected to produce.

However, when trying to prioritize what to do first – getting things working is often the hardest part. Finding one solution to your problem is hard – there’s an infinite variety of solutions, but a much larger infinity of non-solutions. It’s ‘sparse’.

However, once you do have a working solution, it’s usually far easier to make slow incremental changes to that working solution to make it more pretty.

What I’m not saying here is that you should code a large project together with no regard to making things readable, and then do it later as an after thought. What I am saying is that sometimes, you’ll get stuck – it’s these times when it’s okay to get some sawdust in various places, so long as you can follow tip 6 below and keep it isolated to a certain area.

The fact is, writing a test for each solution is going to be cumbersome if 99 of your potential solutions don’t work and the 100th does. Sometimes you get the benefit of a single test telling you whether or not your solution works at all – this is when you’re lucky. But when you’re designing a new feature and you don’t know how it should work yet – you want to play in the design space and see what feels right – letting things get slightly dirty in isolated parts of code is fine, so long as you follow through and get them cleaned up before any peer review.

6. The Surgical Curtain

In surgery, surgeons often lay down cloth around the incision site to block out everything except the area that they’re going to be working with. This is to more or less shrink the problem size and focus all attenion only on the surgical area.

Similarly, when trying to ‘hack’, you want to shrink the problem by as much as possible, and only work on the area that is problematic.

Remember in scientific hacking, we talked about ‘reverse science’, where you change everything to see if your issue is still there?

There’s a similar technique to shrink the problem space, where you try and turn off (by removing or commeting out code) large swaths at a time and seeing if the problem is still there. As you turn things off and the problem remains, that means you can be confident (not sure, but confident) that your problem is not in that area of code.

Often you can shrink things down into a small toy program where your problem lies, and it becomes much easier and faster to try different experiments out on it.

This is one benefit of well factored / well designed code, it’s usually easy to isolate parts of the code and write small ‘unit’ tests around where your issue lies, rather than having to run your entire program to see if it works or not. The curtain is easy to lay down in well designed code.

If your stuck, and there are lines you can comment out while not affecting your problem, do so – this reduces changes of accidentally breaking other things, introducing interactions, and keeping the problem small enough that you can keep it all in your head.

7. Leave, and use, Breadcrumbs

Finally, leave yourself a way out.

Hacking can often mean many changes to your code – if you’re making them methodically as illustrated in tip 1, you also need a methodical way of backing them all out. This is what source control is often used for – try an experiment and commit it to the repo. If that experiment doesn’t fix your problem, roll back the commit and your code will be as it was before you did anything.

Often, even with the best rigor, we find the code base to be an unintelligable mess after some hacking around. It’s best at some times to start all the way over, and leaving yourself breadcrumbs allows you to do that.

You really don’t want to find yourself trying to fix a problem where you have a code base that is so heavily hacked that it’s unrecognizable compared to how you found it. It means that you’ll basically have to debug your way out, which is never fun.

Get in, change only what you need to in a methodical fashion, and get out, leaving the code as clean as possible.

Leaving breadcrumbs like git commits that are very granular also allows you to easily back out scaffolding code like print statements and other things that help you debug.

Finally, backing out fixes that don’t work is incredibly important. If we write for readability first, and performance second (which you should), then you should assume that the code base is as readable as it can be. Any change you make that’s not a refactoring to make it more so must by default make it less so. In other words, any change you make that’s not explicitly made to improve readablility is most likely harming it. No change should be left in that doesn’t do something – like fix a bug. If it doesn’t fix your bug, you need to take it out.

There are often thousands of ways to code something. Your fix may not have fixed your original problem, but it may not have also introduced new ones, meaning you could potentially leave it in and the code base would work as it always did. Don’t do this – you’ve harmed readability by letting in code that had no reason to be there. Back that code out and start anew on a new experiment.

Conclusion

Hopefully these tips and early insights into how coders code are good to have. I know a few things like knowing that people spend most of their time debugging and googling have helped people feel like they aren’t utter failures when they’re working through code combat.

It’s okay to hack, it’s okay to research.

But it’s also good to practice hacking and researching the right way, so that you can speed yourself up and be ready for some more robust tools to put in your toolchest.

November 1, 2016 Posted by | Uncategorized | 1 Comment

SYWTLTC: (AB) Chapter 3.1 Quality: Test Driven Development

Go here if you want the prolog and table of contents to the SYWTLTC series!

In this and future chapters, treat all links as required reading.

Five Pillars of Quality

I claim that there are five pillars of quality in software.

  1. Testing and Dynamic Analysis
  2. Static Analysis
  3. Peer Review
  4. Contracts and Types
  5. Design

You can find people who will swear by one, or even two. Some folks will say all you need is tests, others will say that types are the only way to prove there are no bugs in your code.

Each of these pillars has strengths and weaknesses, but they all tend to be very complimentary. That means they work best as a team.

But first…

Why Quality? And Why so Early?

We all want to be ‘good’ at what we do, we all want to produce ‘quality’ work, sure. But I’ve got to get this script out by tomorrow, so we can keep all the nice and pretty stuff like testing for tomorrow, I have real work to do today!

This is very myopic thinking!

Ultimately, we are looking how to be productive coders. Quality is part of productivity, it’s part of being fast – it is not the opposite of being fast! The fastest coders out there code quality, and the reason is that the number one thing that’s going to slow you down is rework. After all, we never seem to have time to do things ‘right’, but we always seem to have time to ‘do them over’. The previous sentence should bother your logic center, as clearly – we have time to do things right if we have time to do them over.

And if we do them right in the first place, we don’t have to redo them later, and we go much faster.

Software maintenance costs us about 60% of our total effort we put into software, the rest going to requirements, design and coding. That means that our coding time – the part you think you’re ‘speeding up’ by not doing quality work – accounts for a very small part of our overall efforts. You may be penny wise but pound foolish.

What is maintenance? It’s anything you do to code after it’s already written. That’s going to be the lion’s share of your work – and you probably have already noticed that you’ve done a lot of maintenance. You attempt to write out some code to solve a problem, and it doesn’t work exactly right. Everything you do after that point is maintenance. When you hack on your program to attempt to make it work, that’s maintenance.

It’s most of what you do as a developer.

When we forego quality, we rack up what’s called ‘Technical Debt‘. We call it debt like credit card debt because, from the time we take it on to the time we pay it off, we have to pay ‘interest’ in terms of effort. We go slower and slower in future projects, spending more and more time hacking through our low-quality code to get things done.

Debt is generally a bad thing until you know how to deal with it. So for now, it’s best to learn how to never go into debt, as well as if you find yourself in debt, how to dig yourself out.

Why so early, though? Why do you need to learn about quality, now? You barely know how to do string manipulation or arithmetic in Python. Why are you having to worry about getting things perfect now?

There are two main reasons.

What did I JUST SAY about productivity?

Do you want to learn how to code faster? I just said that quality is the same thing as productivity because it prevents rework. Why wouldn’t you learn quality as soon as possible so that you can blow through the rest of these lessons as fast as possible?

Learn Good Habits Early

You’re going to run into a lot of coders who refuse to test. Who review peer review. Who think static analysis is a waste of time. They never really bit the bullet to learn the right way to do things, and they don’t like how testing, peer review, and other processes make them feel dumb.

We end up justifying a lot of stuff to ourselves to avoid psychological pain. Tests are painful – they’re going to tell you where you screwed up. But they don’t have to be – if you learn good testing discipline early, you’ll realize that tests are just a part of the process, not a tool designed to make you feel dumb. You’ll realize that everyone makes mistakes – a lot of mistakes – and the earlier we catch them, the better for everyone. Macho coders who refuse tests, reviews and other help aren’t all that good, they just don’t want to be ‘exposed’ as an imposter.

Learn these things now so you never really learn how to ‘code’ without them – an early realization of their value means you never have to think “I can either implement this thing my Boss wants by tomorrow, or learn to test.” You’ll already know how to test.

You’ve Already Been Doing It

Code Combat is a test driven system – you had to keep trying out your code on the right to see if you passed the challenge on the left. Each state is a test, and it had certain requirements to move on to the next stage. That’s the exact same thing as a test.

Of course, Code Combat also gave you a very nice visual debugger too – you got to see where your character was. So keep in mind, testing and using the debugger go hand in hand.

On to Testing!

Early on, I advised learning how to do what’s traditionally called unit testing. Traditional unit testing is when you write some code, and then write tests that exercise that code. We’re going to be doing the opposite, though, and focus on Test-Driven-Development, or TDD.

What is TDD? What are its benefits?

TDD is when you write the tests first, then you write code that passes that test.

The benefits of TDD are as follows:

  • In the traditional approach, you can’t guarantee that you’ll write code that’s easily testable. As you get more experience coding and testing, you’ll realize that some code is hard to test, while another code seems easier to test. If you start with the tests first, you’ll almost always, instinctively, write code that’s easy to test.
  • You reduce pressure from management. If you write code first, bad managers might be tempted to ask you to deploy what you have, and explain “we can always test later.” If you start with the tests first, you never can be pressured to deliver before things are quality and get into technical debt.
  • TDD is also sometimes called ‘Test Driven Design’. This is because sometimes starting with the tests helps us think about our code as already complete and well designed – how would you like to interact with the module you’re writing? If you test against that design, then you’ll be forced to code to that design. Too often, if we try and build prototypes first, we end up testing whatever design we get that works. We don’t think about how we want our code to look from the outside and make sure it looks like that. With TDD, we get those benefits.

How do I do TDD in Python?

I’m glad you asked!

First, watch this video.

Then, read this article.

We’ll be using py.test from here on out, but it’s useful to see other testing frameworks like nosetest and unittest to see similarities.

How do I know when I’m done, or if I’ve done a good job?

Testing and TDD have a great initial metric of ‘goodness of testing’ called coverage. Coverage is a measure of how many lines of your program were executed by your tests.

Generally speaking, higher coverage is better, and if you can get 100% that’s great, although some functions are intrinsically harder to test. There are many kinds of bugs that can sneak through 100% test coverage. Coverage is a good first metric to watch, though.

Coverage has been shown to improve perceptions of quality by nearly 10%.

You can get a plugin for py.test that adds coverage reporting here. Read through the overview to see how to use it, specifically how to generate an HTML report! When you generate an HTML report, you should be able to open the index.html file it generates (in a directory it makes) in your browser to see a very nice, colorful coverage report built for you automatically.

Once you’re done with the videos, reading and tools above, you’re ready for this module’s code challenge.

Code Challenge

You need to fork this repo on GitHub.

I’ve started a simple calculator module and tests, with the “calculator_add” function, and “test_calculator_add” test.

I want you to fully implement the simple arithmetic for a calculator:

  • Add “calculator_subtract”, “calculator_multiply” and “calculator_divide” functions.
  • Add appropriate tests.
  • Do it all TDD style

For the purposes of this exercise, you should be able to reach 100% code coverage easily.

To ensure you’re following TDD, please use the following Github workflow:

  1. Fork the source repo
  2. Create a branch where you’ll do your work
  3. Add a test THAT FAILS, commit your work.
  4. Add code that makes the test pass, commit your work.
  5. Go to 3 until all functions are added.
  6. When you’re done, open a pull request on your branch (how to open a pull request)

What you are doing, and what TDD emphasizes, is also known as unit testing. There are many other forms of testing, but no one seems to agree on hard and fast definitions so we’re gonna skip over them for now.

The important thing to note is that unit tests and tests you write need to be reasonably fast. This is so you can run them again and again and again – even after every file change. A good habit to get into is every time you save your file, you should run the test.

There’s actually a helper function in py.test that will automatically rerun your tests for you as you make changes to a file. It’s a great idea to use this plugin and keep at least two windows open, one for your test runs constantly and one for your text editing. (You may even want three, one text editor, one test window, and finally, a window with IPython running where you can interactively prototype your program.)

Finally, there is a py.test option to drop into the debugger on test failure. Try it out.

Mentors

For this challenge, please confirm that your mentee has built the above-mentioned functionality, that they can generate an HTML coverage report and that they’ve reached 100% test coverage. You can confirm this in person.

You’ll also need to confirm they know how to use the looponfail feature and the debug on fail feature of py.test.

You need to also review the commit history in their repo to make sure they’re following TDD.

Onward and Upward!

First, the requirements imposed above – that you’ll need 100% test coverage, or as high as you can get it, and that you’ll commit after every test and after every passing of a test, are requirements of all future challenges. Your mentor will double check that you are keeping test coverage high and working in a TDD fashion!

October 21, 2016 Posted by | Uncategorized | 2 Comments