The Skeptical Methodologist

Software, Rants and Management

SYWTLTC: (AB) Chapter 3.5 Type Checking

This is the final chapter in our software quality series.

We’re going to draw on object-oriented techniques below, so make sure you’re this far along in Python Programming:

  1. Programming Basics (ch 1, ch 2)
  2. Beginner Data Structures (ch 5, ch 11)
  3. Structured Programming (ch 7, ch 8)
  4. Procedural Programming (ch 6)
  5. Algorithms and Recursion (ch 13)
  6. Object Oriented Programming and Design (ch 4, ch 10, ch 9, ch 12)
  7. Numerical computing (ch 3)

Let’s recap.

We’ve talked about testing – that’s a tool you write yourself that verifies parts of your program.

We’ve talked about linting – that’s a tool someone else has written that will analyze your code for common errors.

We’ve talked about contracts and assertions – or the idea that our functions can and should make promises to each other, and if those promises are violated, our program should crash. This, in a way, embeds a program inside your program so that it can check itself.

Finally, we’ve talked about peer review and collaboration – this is the only non-programmatic means we’ve introduced to ensure quality. Have another programmer read over your code.

There are interesting crossovers and reinforcements.

  • You shouldn’t test your tests, so all you have are linters and peer review on test code.
  • Littering assertions and contracts through your code means every test you run checks many more things, so they build on each other.
  • Assertions can document assumptions, making peer reviews easier.
  • Linting can leave code more consistent, making peer reviews easier.
  • Much more…

There’s one more technique, not as popular in Python but dominant in other languages, and this one technique, when applied well, can prove the absence of certain kinds of errors.

Eye of Newt, Hair of dog…

What would happen if we combined our linter with our contracts? In other words, what if we could have something check that our code never violated our contracts?

x = 3
assert x > 4

We want something that will flag the assertion above as false without running the code. Something that can reason about our code statically, and discover errors automatically.

Enter the Type Checker

First, a little random history.

Back in the early 1900s, a philosophy known as Logical Positivism was in its prime. Logical Positivism claimed that logical sentences – sentences constructed via a specific mathematical process – made either true or false claims about the world. Sentences that violated the mathematical formulation were determined to be gibberish.

It was an attempt to place the entirety of human knowledge on the basis of mathematics. And at the very center was a branch of mathematics called Set Theory.

Sets are more or less just lists or collections of things. The set of sheep, for example, or the set of prime numbers. Primarily used for number theoretic questions, Set Theory – with enough hammering – could in and of itself define basic arithmetic (called Peano arithmetic), and thus begin to define the rest of mathematics.

There was a problem though – a huge hole in the middle of set theory that leads to a paradox. Anything can be a set, after all. What about this – is this a set?

“The set of all sets that do not contain themselves”

A set is a collection – certainly, it can be a collection of collections. So that checks out, and nothing in Set Theory says the sets can’t be self-referential – either containing or not containing themselves. So that checks out too. Seems like it’s a set.

Let’s call the above set X. And we’ll ask a very easy question that blows up all of Set Theory – does X contain itself?

If X is inside X – X contains itself – then by definition it can’t be, because X holds only sets that do not contain themselves.

Of course, if X is not inside X, i.e., X does not contain itself… then it does contain itself since it’s the set of all things that do not contain themselves!

Logical Positivists wanted all logical sentences to be True or False – not paradoxical. Thus began the great quest to figure out how to strengthen set theory to once again be sound enough to be a foundation for all mathematics, and thus all human knowledge.

Two interesting things came out of this. One was a weird offshoot called Type Theory. The other was the Incompleteness Theorem, which more or less said the whole quest was doomed from the start. No matter what you do to Set Theory, if it’s powerful enough to express Peano arithmetic, it will either contradict itself or contain true statements it can never prove.

You can think about that second part for a while and the futility of ever organizing human knowledge or thinking that mathematics was a sound and complete system of reasoning. We’re going to talk about the first more minor blip.

What type of thing is a type?

Type Theory tried to categorize ‘things’ that can be in sets into different ‘types’ which can’t be compared. A set is one type, a set of sets is another type, and a set of sets containing itself would be a violation of type theory – since from one angle you’re talking about a set, and from another, you’re talking about a set of sets.

You know what? Let’s go back to the Incompleteness Theorem, because it shows up again here of all places. A few years after Gödel published the Incompleteness Theorem, another smart dude named Alan Turing came up with his Halting Problem.

See, he wanted to know if a program could be written to determine whether another arbitrary program would ever finish running. The reasoning he used is tricky, but similar enough to the ‘sets that don’t contain themselves’ mind bender above. Basically, he proved that it was impossible. No program can determine, for every arbitrary program, whether it eventually stops or goes into an infinite loop.

Your linter will never detect whether your program crashes or not. It can detect certain kinds of crashes, based on patterns. But it can’t rule everything out. It’s mathematically impossible.

In fact, a lot of problems in computer science have been determined impossible by proxy. If you can show that by solving your problem X, you could solve the halting problem, you know that solving X must be impossible.

The idealized debugger is one of those programs. We’d love a program that could inspect ours, find all the bugs and fix them, without any human intervention. Unfortunately, one problem the idealized debugger could fix is infinite loops, and thus it’d solve the halting problem. The idealized debugger is impossible.

Back to types. To make sure types weren’t so powerful that they allowed paradoxes, and so could properly constrain set theory, mathematicians ended up with a type system that a computer could implement. In other words, by reasoning about types, a computer can prove the absence of type errors, and only type errors.

How did they do this? Types aren’t as powerful as sets. They’re restrained. You cannot implement Peano Arithmetic in types, and from a computability standpoint, strict type systems aren’t ‘Turing complete’. They’re a constrained form of programming that you can put on top of your less constrained program to borrow its safety when you need it.

What’s a type error? A type is a ‘category’ of thing. So in the case of Python, trying to add a number to a string is a type error. Numbers can’t be added to strings. Taking the square root of a string is similar – square roots expect a type of number, and so, it’s impossible.
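Here’s a rough sketch of those two examples in Python. The helper name try_operation is made up for illustration; the point is just that both operations are rejected with a TypeError.

```python
import math


def try_operation(operation):
    """Run a zero-argument callable and report the error type it raises, if any."""
    try:
        operation()
    except TypeError as error:
        return type(error).__name__
    return "no error"


# Adding a number to a string is rejected.
add_result = try_operation(lambda: 3 + "three")

# So is taking the square root of a string.
sqrt_result = try_operation(lambda: math.sqrt("nine"))

print(add_result, sqrt_result)
```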

Dynamic versus Static Typing

Python is a dynamic language though – in an effort to allow a little more power, Python doesn’t type check until runtime. This makes type checking only as powerful as assertions/contracts. We know assertions and contracts are great to have in code, but they cannot guarantee the lack of certain kinds of errors – they only help to debug them when they happen in the wild.
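To make “not until runtime” concrete, here’s a small sketch with a type error hiding in a rarely used branch. The function and its verbose flag are invented for this example.

```python
def describe(count, verbose=False):
    """Return a short description; the verbose branch hides a type error."""
    if verbose:
        # Bug: str + int. Python only notices when this line actually runs.
        return "count is " + count
    return "count is {0}".format(count)


# The common path works, so the bug can sit unnoticed for a long time.
quiet = describe(3)

# The rare path fails, but only once something finally calls it.
try:
    describe(3, verbose=True)
    failed_at_runtime = False
except TypeError:
    failed_at_runtime = True

print(quiet, failed_at_runtime)
```

Unless a test happens to exercise the verbose branch, nothing complains until a user gets there first.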

There are other kinds of languages out there that use a ‘static’ type system – this type system is enforced and checked before the program is even run. Like a linter, the type checker is a program that runs over the code itself. It is more powerful than a linter, though: instead of merely looking for patterns in the code, the type checker analyzes the code and builds a model of it in memory. It then ‘proves’ various things mathematically about that model, such as the absence of type errors.

There’s a further categorization – so-called strongly typed languages versus weakly typed languages. This is basically a measure of how much you’re allowed to break the type system. Languages like C are statically typed, but also weakly typed. There is a type checker, but you can break it at any time by doing what’s called a void pointer cast. You have a string? You can trick C into thinking it’s a number pretty easily.

Python is a dynamic and reasonably strongly typed language. It won’t allow you to break the type system, but it doesn’t catch type errors until runtime.

Haskell has a static and very strong type system. Haskell has many ways to reason about types such that if you model your program via types well, and it compiles, you have ruled out a lot of bugs.

A New Fighter has Entered The Arena!

So if Python is dynamically typed… why talk about type checking?

Because the above is no longer the whole story. There are now type checkers for Python. We’re going to look at one of them, MyPy.


How to Annotate

The first thing you’d want to do on code that isn’t type annotated, or on new code, is add the types. Python already supports type annotations, but the interpreter just ignores them at runtime. So you can add annotations to your existing code without changing how it behaves.

The best bang for your buck will be annotating function signatures. I’ll tell you why in just a bit.
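As a minimal sketch, annotated function signatures look something like this. The two functions are made up for illustration; the annotation syntax and the typing module are standard Python.

```python
from typing import List


def average_age(ages: List[int]) -> float:
    """Return the mean of a non-empty list of ages."""
    assert ages, "need at least one age"
    return sum(ages) / len(ages)


def greeting(first_name: str, age: int) -> str:
    """Build a one-line greeting from a name and an age."""
    return "{0} is {1} years old.".format(first_name, age)


print(average_age([10, 20]))
print(greeting("Ada", 36))
```

Each parameter gets a type after the colon, and the return type goes after the arrow.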

How to Run

Run mypy like a linter. Instructions are here.
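Here’s a tiny file you could try it on, with a deliberate mistake in it. The file name example.py and both functions are assumptions made for this sketch.

```python
def double(count: int) -> int:
    """Return twice the given count."""
    return count * 2


def broken_caller() -> int:
    # Deliberate mistake: a str is passed where the signature promises an int.
    # Python runs this happily and repeats the string instead of doubling a number.
    return double("21")


print(double(21))
print(broken_caller())
```

Saved as example.py, running mypy example.py (or python -m mypy example.py) should complain about the argument passed to double, even though plain Python runs the file without any error at all. The exact wording of the message depends on your MyPy version.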

How to Add New Types

We’re going to be doing basic Object Oriented programming in the code challenge, which you should be familiar with from Python Programming.

Check out how MyPy automatically turns all of your classes into types, respecting inheritance here.
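A small sketch of the idea, using made-up classes: any class you define can be used as an annotation, and a subclass is accepted wherever its parent is expected.

```python
class Animal:
    def __init__(self, name: str) -> None:
        self.name = name

    def speak(self) -> str:
        return "{0} makes a sound.".format(self.name)


class Dog(Animal):
    def speak(self) -> str:
        return "{0} barks.".format(self.name)


class Rock:
    pass


def introduce(animal: Animal) -> str:
    # Any Animal, including subclasses such as Dog, is accepted here.
    return animal.speak()


print(introduce(Animal("Generic")))
print(introduce(Dog("Rex")))
# introduce(Rock())  # a type checker should reject this: Rock is not an Animal
```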

Type Checking Versus Other Methods

Certainly, you can see how type checking is related to linting and assertions. It’s basically a combination of the two, solving a certain kind of problem.

It’s more powerful than a linter as it actually uses a proof engine to reason about your code and more powerful than assertions as it can rule things out statically, rather than just triggering dynamically.

It cannot find all the little things a linter can though, so the two should be combined. And most of your program needs to reason dynamically – not statically. This means that types cannot model your entire system, and you should fall back to assertions when you have to.

Types serve as a powerful form of documentation, enhancing peer reviews. They make code easier to reason about by assigning each variable a certain type. They also clean up variable names, as

def func(string_first_name):

is always going to seem less readable than this:

def func(first_name: str):

How does type checking compare to testing? This is where things get interesting.

Unit Versus Integration Tests

There are two large classes of tests – what’s called unit, and what’s called integration. There are other kinds of tests, but these types of tests are most often written.

Unit tests are supposed to test a small bit of code in isolation, quickly. Dependencies like a database or file reading are ‘mocked out’ using special code that pretends to be a database or a file.
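For a feel of what ‘mocking out’ looks like, here’s a minimal sketch using the standard library’s unittest.mock. The function under test and the fetch_users method are hypothetical; only Mock itself is a real library class.

```python
from unittest.mock import Mock


def count_active_users(database) -> int:
    """Count the active rows returned by a (hypothetical) fetch_users() call."""
    users = database.fetch_users()
    return sum(1 for user in users if user["active"])


def test_count_active_users() -> None:
    # The Mock pretends to be the database, so no real database is needed.
    fake_database = Mock()
    fake_database.fetch_users.return_value = [
        {"name": "ann", "active": True},
        {"name": "bob", "active": False},
    ]
    assert count_active_users(fake_database) == 1
    fake_database.fetch_users.assert_called_once_with()


test_count_active_users()
```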

Integration tests put multiple pieces of code together, as well as third-party dependencies like databases. They tend to be slower and exercise much more code. They are also often more difficult to write.

Unit tests often chase ‘coverage’ – trying to get each line of your code run by at least one test. When attempting to increase coverage, unit tests are usually the easiest thing to spin up and write more of. A coverage of 70% is pretty good, 100% is the highest you can go.

There’s a goal in mind.

Integration tests try to test integration points, which can get hairy. Let’s say you have three components you use (a website, a database, and a script). We’ll call them X, Y, and Z. You’d need to write…

…an integration test of X to Y…

…an integration test of X to Z…

…an integration test of Y to Z…

…and an integration test of X to Y and Z.

Four integration tests to test 3 components. Conversely, if you had three well-factored components and needed to unit test them, you’d only need to write… three unit tests. Unit tests scale linearly with the number of components you want to test, while integration tests scale with the size of the power set of all components (every possible grouping), which grows exponentially.

With 4 components you’d have 11 integration tests you’d need to write, but only 4 unit tests.
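You can check that arithmetic yourself. This sketch assumes, as the counts above do, that every group of two or more components needs its own integration test; the function names are made up.

```python
from itertools import combinations


def integration_test_count(component_count: int) -> int:
    """Count every group of two or more components: 2**n - n - 1."""
    return 2 ** component_count - component_count - 1


def integration_groups(components):
    """List those groups explicitly, smallest groups first."""
    return [
        group
        for size in range(2, len(components) + 1)
        for group in combinations(components, size)
    ]


for group in integration_groups(["X", "Y", "Z"]):
    print(group)
print(integration_test_count(3), integration_test_count(4))
```

The formula is all subsets (2**n), minus the n single components, minus the empty set.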

It gets out of hand quickly, and often, no one writes that many integration tests. Unit tests are easier to write. So there’s this ‘black hole of integration’. Most people write a few integration tests – usually never enough.

Types fill the Black Hole of Integration

Types, especially types on function signatures, are promises along a ‘boundary’. Function signatures are often the integration points between components. If you used a database, what you’d really do is use a database library, and call a function in it.

That function is where you want to put your types. This function – the gateway to the database – is the ‘boundary’.

Each integration point can be decorated easily with types. If you have component X and Y and Z, it’s a linear effort to add types to component X, then Y, then Z. You do not need to add types just for X talking to Y, or X talking to Z. It’s like unit tests.

The type checker can then generate all of your integration checks for you, ensuring that whenever X talks to Y, they’re talking in the same language. They’re using the same types.

Type checking can turn the overwhelming integration test problem into something that’s pretty easy to manage. Don’t test integration points, type check them.
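Here’s a rough sketch of a typed boundary. Everything in it is invented for illustration: UserStore stands in for a database library, with a plain dict playing the database.

```python
from typing import Dict, Optional


class UserId:
    def __init__(self, value: int) -> None:
        assert value > 0, "user ids are positive"
        self.value = value


class UserName:
    def __init__(self, value: str) -> None:
        assert value, "user names are not empty"
        self.value = value


class UserStore:
    """Stand-in for a database library; a dict plays the database."""

    def __init__(self) -> None:
        self._rows: Dict[int, str] = {}

    def save(self, user_id: UserId, name: UserName) -> None:
        self._rows[user_id.value] = name.value

    def load(self, user_id: UserId) -> Optional[UserName]:
        raw = self._rows.get(user_id.value)
        return UserName(raw) if raw is not None else None


def register(store: UserStore, user_id: UserId, name: UserName) -> None:
    # The calling side only has to agree with the signature of save().
    store.save(user_id, name)


store = UserStore()
register(store, UserId(1), UserName("ann"))
```

The signatures of save and load are the boundary. A caller that swaps the arguments, or passes a bare int, should be flagged by the type checker without anyone writing a test for that particular pairing.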

Typeful Programming

Often you’ll see detractors of type checking argue that the number of times they’ve confused a ‘float’ type for a ‘string’ type is next to none. It’s dumb to have a check for something that never happens.

And they’re right – the built-in types of the language rarely conflict. Simply decorating your code with ‘string’ and ‘integer’ and the rest isn’t going to suddenly discover a lot of bugs, nor is it going to reduce the risk of introducing new ones.

The power of types in programming is realizing they’re a tool that you can use too. Integers and Strings are what the programming language designers wanted – you can create your own types and use the type checker to enforce it.

What types do you want? This is important from a design perspective in object-oriented programming, which more or less asks – “Pretend you already have the objects (types) you need to solve the problem, then write that program.”

If your program talks in terms of temperature, you’d better not have floats running around. You should have a Fahrenheit type and a Celsius type. Those types can be implemented-in-terms-of floats but should be represented in your code as fully fledged types.

This makes it impossible to do silly things like adding a zip code to a temperature, and possible to do useful things like automatic conversions between temperature types.
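A minimal sketch of what those two types might look like, with floats on the inside. The method names and the is_freezing function are my own choices for illustration.

```python
class Celsius:
    def __init__(self, degrees: float) -> None:
        self.degrees = degrees

    def to_fahrenheit(self) -> "Fahrenheit":
        return Fahrenheit(self.degrees * 9 / 5 + 32)


class Fahrenheit:
    def __init__(self, degrees: float) -> None:
        self.degrees = degrees

    def to_celsius(self) -> Celsius:
        return Celsius((self.degrees - 32) * 5 / 9)


def is_freezing(temperature: Celsius) -> bool:
    # Only Celsius is accepted; a Fahrenheit value has to be converted first.
    return temperature.degrees <= 0


boiling = Celsius(100).to_fahrenheit()
print(boiling.degrees)
print(is_freezing(Fahrenheit(32).to_celsius()))
```

Passing a Fahrenheit, a zip code, or a bare float to is_freezing is the kind of mix-up a type checker should catch.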

A heuristic here, especially since it takes years of trying to get a good intuition around object-oriented design, is to look for ‘primitive’ types and get rid of them. If you’re passing an ‘int’ or a ‘string’, ask yourself: are you really passing an int or a string? Or are you passing a count of vegetables and a name of a vegetable? If you have those things – and you don’t have a type defined in your code named ‘Vegetable’ – add it and refactor!
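As a sketch of that refactor, here’s a before and after. The Vegetable class, its fields, and both describe functions are assumptions made up for this example.

```python
class Vegetable:
    """A named vegetable and how many of it we have."""

    def __init__(self, name: str, count: int) -> None:
        assert name, "a vegetable needs a name"
        assert count >= 0, "a count cannot be negative"
        self.name = name
        self.count = count

    def add(self, more: int) -> "Vegetable":
        return Vegetable(self.name, self.count + more)


# Before: two loose primitives; nothing says they belong together.
def describe_raw(name: str, count: int) -> str:
    return "{0} x {1}".format(count, name)


# After: one value that carries its own meaning and its own checks.
def describe(vegetable: Vegetable) -> str:
    return "{0} x {1}".format(vegetable.count, vegetable.name)


carrots = Vegetable("carrot", 3).add(2)
print(describe(carrots))
```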

Let’s take the following program for example:

age = int(input("Please enter your age: "))
print("You are {0} years old.".format(age))

The age variable above is an integer. But is it really? No. It’s AN AGE!

Ages are represented by numbers, but only certain numbers. And they’re a concept we can think a lot about.

Consider adding the following class to the above program:

class Age(object):
    def __init__(self, raw_age: int) -> None:
        assert raw_age < 124, \
            "You can't be older than the oldest person alive!"
        assert raw_age >= 0, \
            "You can't be less than 0!"
        self._raw_age = raw_age

    @staticmethod
    def input_age() -> "Age":
        # Read the raw number from the user and wrap it in an Age.
        return Age(int(input("Please enter your age: ")))

    def print_age(self) -> None:
        assert self._raw_age is not None
        print("You are {0} years old.".format(self._raw_age))

The above program is a little longer – typeful programming like the above requires a little more overhead in small programs. But you see we’ve modeled a concept – Age – in our program, and it makes the program easier to reason about. We’ve now got ideas like enforcing a range on age. Human ages don’t go to a billion, and if one age in your program was at a billion, that probably means there’s a bug somewhere.

In large programs, typeful programming is actually far shorter. This is because you’ve built a large dictionary of ideas and concepts to build more advanced concepts from. The number of assertions and tests you need to write will shrink because you’ll be reusing all the assertions and tests you’ve written on all your small classes/types. And you’ll need far less defensive coding and integration tests since you can use the type checker to enforce most of the integration points.


Live Coding Sessions to Watch

Remember, when watching coding sessions don’t just look at the main content being covered, but watch what tools the coder uses. Look for typing techniques, where they lay their screens out, what plugins they might use on their editors, and so forth. Research things you find interesting, and try to incorporate them into your workflow!

The two below are a bit more ‘produced’ than I usually prefer, but keep in mind that real programmers make mistakes all the time, have to look things up, and so on.

Below is a quick live session that uses MyPy

To round this out, here’s another coding session out of left field – an introduction to the Web Framework “Flask”. This is a 7-part series if you’re interested, but for now, just get through the first introduction.

Code Reading / Review

For the reading, let’s look at MyPy itself – you’ll be looking at its options loader here.

Practice doing code reviews: what comments would you leave for this code? Think about the following questions to discuss with your mentor:

  1. Where did comments, style, and docstrings help or hinder your reading of the code?
  2. How much of the code could you loosely understand? Why or why not?
  3. How much did types help you understand the code?
  4. What did you like about the code? How might you replicate that in your own code?
  5. What did you not like about the code? How would you avoid those practices in your own code?

Code Challenge

You’ll be refactoring some old code that handles geometry lookups, or ‘geo-fencing’. It’s a prototype to see if a cell phone’s latitude and longitude falls within some boundary. The problem is, there’s a bug in it, and no one can figure out what it is.

Your first mission is to add at least two new classes/types: Latitude and Longitude. After creating these classes and removing the old floating point numbers that represented latitude and longitude, see if that along with MyPy doesn’t help you figure out where the bug is.

Finally, with the bug fixed, push test coverage up, clean up pylint, add assertions and prep the code for review.


When exploring code you inherited, use some of the other tools in your toolbox. We like to get tests in place first, but often it’s hard to figure out what the code is even doing, much less how to test it.

Try running the debugger on the code and stepping through it line by line – see what variables are changing. What are those variables supposed to be?

Feel free to add comments as you go – inherited code is not always well documented. You can make notes and annotations as you attempt to understand the code.

Feel free to change variable names or make formatting changes too – it may be best to get some pylint flags cleaned up first that deal with style. This can make ugly inherited code easier on the eyes.

Add assertions where they make sense – if you think the code works in a certain way, is there a way you can assert that? For instance “I think X and Y should be equal here… maybe I should add an assert that they are.” Assertions are your assumptions about code – if you add them to the code itself, you allow the program to check your assumptions.
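For example, say you inherited a little helper like the one below (it’s invented for this sketch, not part of the challenge code). Writing your guesses down as assertions lets every run check them for you.

```python
def split_bill(total_cents, people):
    """Inherited helper: split a bill into per-person shares, in cents."""
    share, remainder = divmod(total_cents, people)
    shares = [share + 1] * remainder + [share] * (people - remainder)
    # Assumption written down as code: no money is lost or invented.
    assert sum(shares) == total_cents, "shares should add up to the total"
    # Assumption: everyone gets exactly one share.
    assert len(shares) == people, "one share per person"
    return shares


print(split_bill(1000, 3))
```

If one of those assertions ever fires, either the code has a bug or your understanding of it was wrong. Both are worth knowing.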

As you understand the code more and more, you can take some larger steps to try to refactor it out into something testable.

The real lesson here is not to leave code this ugly for others to inherit! Remember to leave tests for others, maintain a sense of style with the linter, and over-document! I’m giving you ugly code on purpose to help remind you why good-looking code matters. It’s a huge productivity benefit during maintenance, and maintenance is 90% of where you’ll spend your time coding.

For Mentors (And Coders Too)

Mentors – Ask your mentee about the live coding sessions and code readings. What questions did they have? What did they find interesting?

Review Checklist

  • Are types documented and MyPy clean?
  • Is test coverage at 100%?
  • Is it pylint clean 10/10?
  • Does the code use assertions?
  • Are the pylint docstring checks clean?
  • Is the documentation readable?
  • Does the code use good names?
  • Does the code make good use of white space?
  • Does the code have consistent and idiomatic style?
  • Does the code include comments?
  • Does the code use git hooks for pylint, pylint docs, and git commit lints?
  • Does the Readme explain what the code does and how to install and test the code?
  • Can the coder give a ‘guided tour’ using the debugger through one of their test cases?

May 9, 2017