The Skeptical Methodologist

Software, Rants and Management

Resolving the Python TCO Problem

The BDFL of Python has recently stirred up a hornet’s nest by fully explaining why the so-called ‘tail call optimization’ will never be in Python.  Lovers of functional languages have responded, and I think we’re all the wiser for both points of view.

Guido’s post mentions the accusation that Python is becoming the “Basic of the Future”, a label that might be more of a badge of honor than a derogatory statement.  This is because Python was originally conceived as a teaching language, and you can see that much of Guido’s design work strives to make Python as easy to learn and understand as possible.  His biggest gripe about TCO is that it screws up stack traces, and that it is only useful in a handful of circumstances anyway.

In a way, this can be expected.  After all, part of the Zen of Python is that “Explicit is better than Implicit”.  Behind the scenes optimizations are a good example of the implicit things the BDFL is looking to prevent.

So let’s make TCO no longer an optimization.  Let’s simply talk about tail calls, and let the optimization be a side effect.  Much of what Joe Marshall talks about concerns the power of tail calls, mainly via continuation passing, which makes certain ideas very easy to implement.  Perhaps the reason Guido sees little use for the continuation-passing style constructs Joe brings up is that it is a newer idea, and quite frequently, new ideas are completely useless until we find a use for them.  And then they explode into a world in which we can’t remember living without them.
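To make ‘continuation passing’ concrete, here’s a minimal sketch of my own (the function names are hypothetical, not from either post): instead of returning a value directly, each function hands its result to the next function in line, the ‘continuation’.

```python
# Continuation-passing style: each function takes an extra argument k
# (the continuation) and passes its result to k instead of returning it.
def add_cps(a, b, k):
    return k(a + b)

def square_cps(x, k):
    return k(x * x)

# Compute (2 + 3) squared by chaining continuations: the result of the
# addition flows into the squaring step via a lambda.
def add_then_square_cps(a, b, k):
    return add_cps(a, b, lambda s: square_cps(s, k))

result = add_then_square_cps(2, 3, lambda v: v)
print(result)  # 25
```

Every call in this style is a tail call, which is exactly why continuation passing benefits so much from tail calls being cheap.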

I’d propose a change to the Python language that makes tail calls explicit, backwards compatible and, in my opinion, Pythonic.  That is to say, it follows the same pattern by which other interesting function/control-flow mechanisms have been implemented.  The following example is nothing fancy, because what I’m after is pretty obvious and simple.  Let’s compare a normal recursive function call in our extended Python to a tail call version.

def func(recursiveFunc):
    return recursiveFunc()

That’s our current vanilla implementation.  Nothing really to see here, except that it’s inefficient yet easy to debug.  Now the tail call version:

def func(recursiveFunc):
    continue recursiveFunc()

The use of continuations is very explicit, which fits the Zen of Python.  But how else is it really Pythonic?  Compare a tail call function to a Python generator, a feature added to the language more recently:

def sillyIterator(lst):
    for elem in lst:
        yield elem

The yield keyword was introduced for generators (and later extended to allow for co-routines) as a different mechanism for returning a value from a function with interesting control flow.  The language identifies functions that yield values instead of returning them and behind the scenes turns those into generators.  Is it implicit?  Not really – you can see from the code itself that it’s a generator.  It yields.  Likewise, the new use of the ‘continue’ keyword would signify that the function is using a ‘continuation’.

Does this work?  Can we ‘abuse’ (if you want to call it that!) the ‘continue’ keyword like this?  I’m no expert, but in current Python, I believe continue, like break, is only ever used as a standalone statement in loops.  It takes no ‘arguments’, and thus, if we added it to the language as I’m proposing, it would be easy for the parser to tell when we’re continuing in a loop versus continuing from a function (a continuation).  If we’re in a loop and we simply continue, we go back to the top of the loop.  If we continue with an argument, then we’re using a function as a continuation.

Can there be ambiguous uses?  First of all, using recursion in a loop probably doesn’t happen as much as you’d think, because people who think recursively hate loops and people who use loops hate recursion (I keed).  But secondly, no, there is no ambiguity.  A continuation only makes sense with a function as its argument.  There can be no ‘continue’ statement in a recursive call that passes nothing – that doesn’t make any sense.  There is a clear delineation between using continue to return to the top of your loop and using it for continuation passing.  If you want a recursive call to simply return nothing, then use a return statement with no arguments.  This mixing of continues and returns parallels how generators mix yields with returns.
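You can already see this kind of mix in today’s generators, where a bare return simply ends the iteration.  A small illustrative sketch of my own:

```python
def first_two(lst):
    """Yield at most the first two elements of lst."""
    for i, elem in enumerate(lst):
        if i >= 2:
            return  # a bare return inside a generator just stops iteration
        yield elem

print(list(first_two([1, 2, 3, 4])))  # [1, 2]
```

The generator freely mixes yield (produce a value) with return (stop), just as a function in the proposal would mix continue (hand off to a continuation) with return (produce a value).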

What if I write a loop and I mean to call a function as a continuation but I leave the function off?  Then, yes, this will use the loop-style continue – though outside a loop, a bare continue is already a syntax error, so the mistake can’t escape a loop unnoticed.  The opposite mistake, accidentally adding an argument to a continue statement you meant as a loop-style continue, would also be an error, and an easier one to catch, since only a few things can be used as a continuation (fully evaluated callables, to be specific).  We can’t design all potential bugs out of the language, but the potential for ‘continue’ accidents is pretty small.  Mixing intricate loop logic (continue is pretty rare as it is, and only shows up in more complex loops) with intricate functional logic (continuation passing, like any other tool, has specific uses) either means you’re in over your head and doing it wrong anyway, or doing something potentially awesome, in which case you won’t make any errors.

Is this really abuse?  I’d argue no.  Both the concept and the name flow from the word – ‘continue’ obviously looks very similar to ‘continuation’, which is what we’re doing, and the idea of ‘continuing’ on to a different function makes intuitive sense, as opposed to returning that function’s evaluated value, signified with the ‘return’ statement.  More importantly, it does not differ that much from the way the keyword is already used.

Just as many claim most recursive calls can be turned into loops, it seems poetically just that loops and recursion share other concepts too.  In this case, continuing in a loop means returning to the top of the loop, while continuing in recursion means going on to the next function.  If you were to use a ‘trampoline’ similar to what Guido has suggested in his post, continue would mean practically the same thing and have the same effect.  Putting the continue statement in a loop that simulates a trampoline does the same thing as a continuation in a function – we pass control on to the next function in line in either case.  Kinda.  If you squint at it funny.
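For illustration, a trampoline along the lines Guido describes can be sketched in current Python (this is my own rough approximation, not his code): each function, instead of recursing, hands back the next call to make, and a driver loop ‘continues’ on to it.

```python
def trampoline(fn, *args):
    """Keep calling whatever the function hands back until we get a value."""
    result = fn(*args)
    while callable(result):
        result = result()  # 'continue' on to the next function in line
    return result

def countdown(n):
    if n == 0:
        return "done"
    # Instead of recursing (and growing the stack), hand back the next call.
    return lambda: countdown(n - 1)

print(trampoline(countdown, 100000))  # "done", with no RecursionError
```

The while loop in the trampoline and the proposed continue-in-a-function really do play the same role: both discard the current frame and move control to the next function in the chain.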

Just as some have claimed continuations are the ‘goto’ of functional languages, it’s ironic that the continue statement started out its own life as a replacement for ‘goto’ in loops.  They’re parallel constructs, and it’s altogether fitting to use the same keyword for both.

Python is a learning language, but learning procedural code is not where we’re headed.  In the future, students and amateurs are going to need to learn functional style as well as imperative style, and as such, Python should be their introduction to both.  It already handles the paradigm quite well.  The explicit ‘self’ argument to methods is quite functional (in my opinion), as it lets methods be passed around like first-class functions without being tied to an object.  Generator expressions, too, are rather functional, yet these things are easy to learn and very Pythonic.  Why not continuations too?  But explicit – that’s better than implicit – and in a style that fits the way Python has already expanded toward different ways of returning from functions.  With the keyword, it suddenly becomes OK to optimize behind the scenes, because the user knows it’s a tail call.  The interpreter, too, could annotate the call stack to make it easier to debug.  After all, the yield statement makes debugging slightly more difficult, but it’s still in the language.

Don’t make it an optimization, make it a feature, and settle this whole debate.


April 29, 2009 Posted by | Uncategorized | 25 Comments

Combine Testing Strategies for Increased Quality with Less Work

Testing and correctness techniques build on each other, and rarely replace each other.  I was reminded of this fact during an argument over the merits of unit testing versus Design by Contract.  The argument was that invariants and preconditions being checked in code should be moved out into unit tests and exercised explicitly.

There are multiple problems with this.  The best way to describe the overarching theme of wrongness here, though, is the “black swan” problem.  That is, you can never prove that there are no black swans without individually looking at every single swan out there (let’s just pretend that’s impossible).  You can, however, prove that there is at least one white swan just by finding one.  Unit tests, especially TDD-style unit tests, check for white swans.  They put together certain scenarios where the output is a well-known value, and then ensure that the program, as implemented, outputs that well-known value.  Unit tests should not be looking for black swans because, by their nature, it’s incredibly hard to write enough tests to cover ALL the possible black swans.  Invariants, for instance, are black swan sorts of problems.

If I claim that “after I do x, y and z, my state should move from U to V”, then I’m describing a white swan.  I only have to do it once to know that, in this situation, it was true.  But if I say something like “I should never be in state U”, then I’m talking about a black swan, and that’s something that cannot be tested very thoroughly.  Sure, maybe I don’t go into state U given the stimulus of my test, but what’s to say I never go into state U?  I need a different strategy.

One of these strategies is type checking, and it does a very good job of rooting out black swans, but only for certain well-formed ideas.  I can prove that I never get a Float when I expect a String, but many things can’t be proven this way.

Another strategy is invariant enforcement via Design-by-Contract-style assertions.  These will not prove that there are no black swans; instead they are a sort of ‘brute force’ approach.  I said earlier that it’s nigh impossible to find all swans and make sure they aren’t black – but if I already have a hundred or so tests, or perhaps many hours of real-life use of my application, certainly I should check that any swans I find THERE are not black.  These assertions give me some evidence, at least, that my lakes are relatively free of black swans.
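As a sketch of what I mean (the Account class and its names are hypothetical, chosen only for illustration), the invariant lives in the code itself, so every test and every hour of real use exercises it for free:

```python
class Account:
    def __init__(self, balance):
        self.balance = balance
        self._check_invariant()

    def _check_invariant(self):
        # The 'no black swans' claim: we should NEVER be in this state.
        assert self.balance >= 0, "invariant violated: negative balance"

    def withdraw(self, amount):
        # Precondition, Design-by-Contract style.
        assert amount > 0, "precondition violated: amount must be positive"
        self.balance -= amount
        # Any test or real user that drives withdraw() checks this too.
        self._check_invariant()
```

No single assertion proves the invariant holds everywhere, but every swan the tests happen to visit gets inspected.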

Long story short, Design by Contract is a good way to do verification (“Did we build the thing right?”) and TDD-style unit tests are a good way to do validation (“Did we build the right thing?”).  But more importantly, put together, they are more powerful than either one alone.  TDD looks for white swans, while DBC asserts the non-existence of black ones.  TDD has already done a lot of work to find certain white swans – why not make sure, in the code, that they aren’t black swans while you’re out?  And why not, to reduce code duplication and increase the self-documentation of the code, put the assertion that you haven’t found a black swan where the black swan might appear, rather than in a test somewhere else that checks for its existence?
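Put concretely, in a small sketch of my own (the function and its names are hypothetical): the unit test pins down one white swan, and the in-code assertion checks for black swans on every call the tests happen to make.

```python
def normalize(values):
    """Scale a list of numbers so they sum to 1."""
    total = sum(values)
    result = [v / total for v in values]
    # The black-swan check: this should hold on EVERY call, whether
    # driven by a test or by real use.
    assert abs(sum(result) - 1.0) < 1e-9, "invariant: weights must sum to 1"
    return result

def test_normalize():
    # The white-swan check: this one known input yields this known output...
    assert normalize([1, 3]) == [0.25, 0.75]
    # ...and, as a side effect, the in-code assert verified the invariant.

test_normalize()
```

The test adds quantity of coverage; the assertion inside `normalize` adds quality to every path that coverage touches.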

In other words, your test coverage is driven by your unit tests.  The more unit tests you have, the higher your test coverage (hopefully).  But coverage does not only have to ensure that you found all the white swans you anticipated, but that you found no black ones as well.  Another test enhances the quantity of your coverage, but strategies like DBC and other assertions enhance the quality.  The two strategies remain separate, though, since they have different concerns.

April 20, 2009 Posted by | Uncategorized | Leave a comment