The Skeptical Methodologist

Software, Rants and Management

My Five Things

It’s the cool thing to do, after all.

First, Jesse pointed out (actually, a friend of his pointed out) that if you can’t name five things you don’t like about your favorite language, then you still don’t understand it.  I think this is a little on the liberal side, though, as most reasonable people can figure out five things they don’t like about C++ before they understand even half of it.  Of course, in that case, understanding even half of C++ is about as far as most people get 😉

Then came Zed’s own contribution to the list.  Both of these guys are very active in their communities and contribute a lot of code via libraries and projects they work on.  Neither of them is a ‘dirty little schemer‘.  Yet now that they’ve been seen disagreeing with the politburo, the wagons have been circled and they must be torn apart with ad hominem attacks until everyone’s convinced Python’s perfect again.

As Jesse and Zed both did, let me explain: I love the Python programming language.  It reads like pseudo-code and is incredibly expressive.  Its ease of incorporating C or C++ makes nearly any performance complaint moot, and the vast number of libraries out there gives it a gigantic code base to work from.

But, again, as Jesse said, if I thought Python was perfect, I’d be fooling myself.  In fact, I think it’d be an interesting challenge to hear Guido’s own list of five things he wishes he could fix in Python.  GVR is hard to read; sometimes he’s incredibly reasonable, sometimes he’s a little irrational.

But this post is not about politics, on with the list!

1. Static Default Args

This gets brought up time and time again.  The default arguments you provide to a function are evaluated only once, when the def statement runs, and are not re-evaluated at each call.  This means you get funny behavior like this:

  >>> def func(arg=[]):
  ...     arg.append(1)
  ...     return arg
  ...
  >>> func()
  [1]
  >>> func()
  [1, 1]

I’ve got two problems with this.  First of all, as is the classic way to shoot down ideas the community has already ‘decided’ upon (other than the ‘code it yourself’ comment), what exactly is the use case for this behavior?  What idioms does it allow that I’ve yet to see?  Indeed, ‘idiomatic’ Python seems to work around it, by using None as the only real default argument.  First you check for None; then, if the arg is in fact None, you set it to its real default.  This is added boilerplate and, quite honestly, seems to be the default (forgive the pun) rather than the exception.
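For the record, here is a minimal sketch of that None-sentinel idiom (the function name is invented for illustration):

```python
def append_one(arg=None):
    # Boilerplate: the *real* default is built at call time.
    if arg is None:
        arg = []
    arg.append(1)
    return arg

print(append_one())  # [1]
print(append_one())  # [1] -- a fresh list on every call
```

Every function with a mutable default ends up carrying this same check, which is exactly the boilerplate complaint above.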

Another argument for the status quo is that it’s ‘more efficient’.  I don’t get this line of reasoning at all.  Virtually every debate between ‘efficiency’ and ‘readability’ in Python ends up on the side of readability.  That’s why Python is Python!  Why efficiency gets a free ride in this most awkward of cases is beyond me.  Moreover, it’s not more efficient at all, as far as I can tell.  The replacement, checking for None and then recomputing the value at runtime, is exactly the same efficiency hit as the proposed dynamic default args!  Most people are already taking the efficiency hit.

Indeed, it looks like this is the rare case where Python has optimized for efficiency of a niche use of the language, which to be blunt, is not at all Pythonic.

2.  String Concatenation

Most people new to a programming language try to get a feel for the way to do things.  When they want to put two lists together, they experiment in the Python shell, try simply +’ing them together, and hey, it works.  Doing the same thing to two strings has the same effect.  So obviously, if you need to add many strings together, you add them like you’d add anything else together.  Wrong.

Instead, the “Pythonic” way to do things is to put all of your strings in a list and do the following:

  >>> "".join(myStringList)

Does that at all seem obvious at first?  The joining of many strings is a method on another string?  In its defense, the string being ‘joined’ on is the separator string.  But even that seems odd, since the separator really isn’t the ‘object’, in object-oriented terms, that I’m worried about here.  The object is a list of strings!  The reason given by the Python community is that "".join() is faster, and therefore preferable for large-scale string processing.  While I might balk a bit at again optimizing for efficiency in Python, where everything else is optimized for consistency and readability, at least with string processing there’s a major use case for it.  Python is used for string crunching a lot, and the "".join() construct has some specialized C routines to do it fast.
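To make the comparison concrete, here is a small sketch of both spellings (the variable names are mine):

```python
words = ["to", "be", "or", "not"]

# What newcomers reach for first: repeated +, which copies the
# growing partial result on every step (quadratic in total length).
sentence = words[0]
for w in words[1:]:
    sentence = sentence + " " + w

# The Pythonic spelling: the *separator* string owns the join method.
assert sentence == " ".join(words)
```

Both produce the same string; only the join version runs in a single pass through specialized C code.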

This brings me to a slightly tangential point, but one that really applies here.  Python is notoriously a ‘smart language, dumb compiler’, compared to languages like C++, which are ‘dumb language, smart compiler’.  That codifies, in other words, the constant striving for readability and intelligence in language design.  But it also implies that there’s no major drive to optimize the Python compiler.  Scripting languages are slow, and are going to continue to be slow for a while.  Besides, Python is mostly used for things that aren’t processor-bound anyway, and the FFI is good enough to drop into C whenever you need.  Why optimize?

But there is a particular kind of optimization that enhances readability as well as speeding up execution, called strength reduction.  The idea is, when the compiler detects a construct that can be stated in an equivalent yet more efficient way, it replaces it behind the scenes.  That way you can still divide by two, because that’s how a human would read your algorithm, yet the compiler will go ahead and turn that into a much faster bit shift.
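The divide-by-two case illustrates the equivalence such a compiler would exploit (to be clear, CPython does not actually perform this particular substitution; this just sketches the idea):

```python
def halve(n):
    # What the human writes: divide by two.
    return n // 2

# What a strength-reducing compiler could emit instead: a bit shift.
# For Python ints the two agree, negatives included, because both
# use floor semantics.
for n in (10, 7, 0, -7, -10):
    assert halve(n) == n >> 1
```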

Similarly, appending multiple strings together is a pretty common operation in Python.  I still believe that the most obvious way to do this is to just repeatedly add them together.  Strength reduction would detect this repeated addition and change it to the more efficient C routine – or hell, even an assembly routine.  GVR has been skeptical of optimizations in the past because he’s afraid they will change behavior from what users expect, but in this case, behavior is entirely conserved.  Only the runtime changes.  In fact, from an optimization standpoint, strength reduction is more Pythonic, since there’s far too much anecdotal evidence that users’ original expectation is to use repeated addition rather than some special method on an empty string.  The ‘optimized’ approach, using a library function to drop into C, is not expected and is actually pretty difficult to find the first time.

3. Non-recursive Reload

Working with the interactive shell is awesome for someone coming from mostly compiled languages.  Getting immediate feedback allows for easy exploratory development.  At first, I was entranced by the idea of having a large code file, making incremental changes in an IDE, then reloading it in the shell and testing various procedures on it.  That is, until I realized reload does not do what one expects it to do.  When you import a file, Python reads the module, imports all the things that file imports, defines things in that module’s namespace, and so on.  But when you reload a file, the imports – since they’ve already taken place and are in the global namespace – are skipped.  You only reload the things defined in that file.

If I’ve imported Module X, and it’s imported Module Y, and I change Module Y, simply reloading X won’t do me a bit of good.  X will still point to the old version of Y.  Instead, I need to reload Y… oops, no, that doesn’t work either.  First I need to import Y, and then reload it.  That doesn’t make any sense at all.  How come reload tells me it’s not loaded, yet import assumes it is?  Importing a module that has already been imported by some other module just copies the namespaces over, I’d imagine, and reload seems to absolutely depend on those namespaces.  Either way, it’s a lot more complicated than it seems at first.
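This behavior is easy to reproduce.  A sketch (the module names mod_x and mod_y are invented for the demo; in Python 3, reload moved from a builtin to importlib):

```python
import importlib
import os
import sys
import tempfile

sys.dont_write_bytecode = True          # always re-read source
tmp = tempfile.mkdtemp()
sys.path.insert(0, tmp)

def write_module(name, body):
    with open(os.path.join(tmp, name + ".py"), "w") as f:
        f.write(body)

write_module("mod_y", "VALUE = 1\n")
write_module("mod_x", "import mod_y\n")

import mod_x
assert mod_x.mod_y.VALUE == 1

write_module("mod_y", "VALUE = 2\n")    # edit Y on disk

importlib.reload(mod_x)                 # reload X...
print(mod_x.mod_y.VALUE)                # still 1: the nested import is skipped

importlib.reload(mod_x.mod_y)           # reloading Y itself is what works
print(mod_x.mod_y.VALUE)                # now 2
```

Reloading X re-runs its `import mod_y` line, but since mod_y is already in the module cache, the old module object is reused; only reloading Y directly re-executes its source.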

Upon reloading a file, imports need to behave differently, and files really ought to be re-imported.  I’m sorry if this is a performance hit to you, but you really shouldn’t be using reload in a library anyway.  Reload should only be used in interactive development, in my opinion.  Python’s reload-and-import process is similar to other languages’ make process.  Make is highly recursive; indeed, that’s often a problem, since you have to nest things so oddly in esoteric Makefiles.  But Python’s reload doesn’t even have the functionality to go out and see whether or not it needs to re-import libraries that might have changed.  The quick fix is to always re-import libraries; the good fix would be to check their load times against their files’ modification dates, like a Makefile does, to look for changes.

4. Love/Hate relationship with Functional Programming

GVR seems to maintain that Python is not a Functional Programming language, and that Functional Programming is not Pythonic.  This is despite the ‘Pythonic’ list and generator comprehensions (generators are a form of ‘lazy evaluation’), closures, first-class functions and currying.  What, pray tell, would make Python a Functional Programming language if not these things?
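All of those features are live in the language today; a quick sketch (the helper names are mine):

```python
from functools import partial

# Generator comprehension: lazily evaluated.
evens = (n for n in range(10) if n % 2 == 0)
assert list(evens) == [0, 2, 4, 6, 8]

# Closure: the inner function closes over `k`.
def make_adder(k):
    def add_k(x):
        return x + k
    return add_k
assert make_adder(3)(4) == 7

# Currying (of a sort) via functools.partial, with functions
# passed around as first-class values.
def add(a, b):
    return a + b
add5 = partial(add, 5)
assert add5(2) == 7
```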

But this odd schizophrenia doesn’t really bother me.  It’s the casualties that get to me, namely, and yes, this one is debated again and again, neutered lambdas.  Why aren’t lambdas allowed closures like normal functions?  Because Python isn’t a functional programming language.  This seems like an odd excuse given all the other support for FP Python has, and it seems altogether too convenient to use whenever the Python community wants to.  Closures are ok, because those are ‘Pythonic’.  Lambdas, no, because Python isn’t an FP language.  I feel like I’m stuck in 1984.

GVR has made some valuable, and some less valuable, critiques of FP.  One of the places where FP seems to muck up ‘Pythonicness’ was in ‘reduce’, which is still in the language, it’s just not built-in anymore.  Folds are a major part of many FP languages, but it appears for now they’re a little too terse for most people to understand.

One of the criticisms of lambdas, or anonymous functions in general, is that they get abused and grow to a size at which they should be a named function, to add some level of self-documentation.  This is fine, and I agree with it.  Lambdas in Python probably ought to remain one-liners.  But why they don’t support closures doesn’t seem at all to be covered by this line of reasoning; indeed, they really ought to support closures or just be taken out of the language altogether.  I’ve refrained from using them because I’ve been bitten too many times by odd errors.  Why would such an odd construct be allowed?  In fact, lambdas DO capture variables, they just don’t close over them.  It’s one of the weirdest scoping errors I’ve seen in a modern language.
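For what it’s worth, Python lambdas do carry closures today; the class of error described above is most likely the late-binding gotcha, sketched here together with the conventional default-argument fix:

```python
# The classic scoping surprise: every lambda shares the *same* `i`,
# looked up when the lambda is called, not when it is defined.
callbacks = [lambda: i for i in range(3)]
print([f() for f in callbacks])        # [2, 2, 2]

# The conventional fix leans on default arguments to bind early --
# ironically, the one case where eager evaluation of defaults helps.
callbacks = [lambda i=i: i for i in range(3)]
print([f() for f in callbacks])        # [0, 1, 2]
```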

5. No Encapsulation

There are obviously many differences between the way Python does OO and other languages.  It’s incredibly unpythonic, for example, to write getters and setters.  Hell, that’s just bad OO altogether.  Python provides properties for those sorts of operations.  But one thing that Python could sorely use is some explicit encapsulation protection like private, public and protected.

I can hear the Politburo now harping away at how such things are not at all Pythonic, how a person should be able to modify and change whatever he or she wants whenever he or she wants it.  I agree.

But private, protected and public, as well as any other encapsulation protection mechanisms you’d like to add, bring a large level of self-documentation to an API.  It is one more layer of encoding designer intent.  The best case I can give you is using the dir() command on any sufficiently large object you get from a library, or even a module.  You get a few screenfuls of nonsense and implementation-defined stuff that doesn’t concern you, and then you have to go hunting through this giant list of operations for the one you want.  How would Python incorporate encapsulation?  It could be as little as a keyword argument.  Pass ‘SeeImplementation=True’ to the dir() command and you get everything; leave it off, and you get only that which the designer said was public.  When I’m doing exploratory programming, I really don’t want to explore implementation minutiae that were never meant for the user anyway.  I want to see what tools were explicitly given to me.  It helps with the information overload that becomes all too common when working with large APIs.

Much like with FP, Python implicitly agrees with this statement, that defining and communicating whether something is implementation or interface is important, via the _underscoreFirst coding convention.  But in a language that strictly enforces a whitespace coding convention, at least compared to languages like C, it seems odd to suddenly fall back on just a ‘general and friendly agreement between the whole of Python programmers’ that putting an _inFront of your variable or function means it’s an implementation detail and not to be trifled with.  If it’s conventional enough, why not enforce it?  Why not allow some explicit introspection, rather than hand-coding some hack that looks for a ‘_’ at the front of a variable to provide your own dir and other functions?
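The hack in question is only a few lines; public_dir is a name I’ve invented for illustration:

```python
def public_dir(obj):
    # The hand-rolled filter the underscore convention forces on you:
    # hide anything whose name starts with '_'.
    return [name for name in dir(obj) if not name.startswith("_")]

class Queue:
    def __init__(self):
        self._items = []          # implementation detail
    def push(self, item):
        self._items.append(item)
    def pop(self):
        return self._items.pop(0)

print(public_dir(Queue()))        # ['pop', 'push']
```

A language-level distinction would make this filter, and every ad-hoc copy of it, unnecessary.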

Things that are not broken, and I agree with. (In which I try and gain the forgiveness of the Politburo for my heresies before I’m to be shot)

Explicit self.  I actually love this, and it’s changed the way I code in C++, where I now scope everything with a this-> pointer to be clear.  It also makes methods and normal functions nearly the same thing, allowing you to more easily pass around methods just like you would functions (but this is NOT FP, remember, it’s Pythonic 😉 ).

Not allowing implicit TCO.  TCO, despite its name, is way more than an optimization.  As said before, it opens up an entirely new model of computation.  That’s hardly something I’d like to happen behind the scenes from a sufficiently smart compiler.  This is why I was an advocate of an explicit TCO via a keyword, although since I got branded a dirty little schemer (despite not knowing the language!), that idea got shot down by our good friends at the Kremlin.

Whitespacing.  You know how to get rid of all arguments about coding conventions?  By choosing one arbitrarily and enforcing it as a part of the language.  This and this alone has focused the Python community’s effort on much more useful things.  It also allows almost anyone to read any Python code and figure out its structure without getting a headache.  Readability does count.

No brackets.  Same as above – you don’t need explicit brackets if your white space shows the structure for you.

No doubt none of this will go anywhere.  But I thought, given the, er… ‘open-mindedness’ of my favorite programming language’s designer community, a little extra jabbing couldn’t hurt.  For my sake, let’s hope not 🙂

May 31, 2009 Posted by | Uncategorized | 1 Comment

The Essence Of Design

Just some notes for today.

Despite all the methodologies over the years, there are a few key topics that always re-present themselves.

  • Iterative Development / Rapid Prototyping

The faster you can take a solution to a problem from just someone’s vague idea to actually testing it against requirements and customer expectations, the faster you can narrow in on a satisfactory system.

  • Conceptual Integrity

The more clearly a solution has come from one unified idea or mind, the easier it is to extend and maintain that solution, so long as extensions and maintenance don’t interfere with the unified conceptual integrity that allowed their existence in the first place.

May 29, 2009 Posted by | Uncategorized | Leave a comment

Dirty Little Schemers

A post or two ago, I advocated a potential solution for adding explicit tail-call optimization (TCO) to Python.  In this forum, it seemed like a reasonably popular idea.  Hell, even reddit seemed to think it was somewhat useful, barring some grumpy old men who, to their credit, learned me a thing or two about TCO and continuations.  I proposed this to the Python ideas forum and it died.  I didn’t see why it could be so popular outside of Python, yet so dull inside.  Who was reading my blog, Schemers?

Really, the entire debate was a culture clash between Schemers and Pythonistas which I walked into blindly.

After reading the BDFL’s posts, and the counterposts, it seemed a foregone conclusion that TCO was ‘useful’, just that it had problems being ‘pythonic’.  That is, easy to learn and consistent with the rest of the Python mental model (Ha!  Take that, all of you who want a definition of Pythonic! (What’s that, you say, I’ve just moved the goal posts?  What’s the Python mental model?  Why, anything that’s Pythonic, of course!)).  In a stroke of genius (or luck), an idea occurred to me: many of Guido’s criticisms could be sidestepped if we made TCO not so much an ‘optimization’ as an explicit part of the language.  I proposed an overloaded use of the ‘continue’ keyword, which, after many debates on the color of the bike shed, would probably be better stated as ‘return from’ (to mimic ‘yield from’).  By making the syntax explicit, and checked, many of the perceived drawbacks would disappear.  The community offered solutions to the traceback problem and other details.

After submitting this idea to the Python ideas forum, I was confronted with a new argument – that TCO is simply not a useful thing to have.  “There is absolutely no use case for it in Python.”  I mentioned the litany, that I had learned from others, of message passing, true unbounded recursion, state machines, even navigating websites.

First though, an admission.  In the practicality-versus-academic political leaning of languages, I lean academic.  Whenever I see a new idea in programming, I want to try it and understand it even if I don’t have any use for it.  I have a love and passion for the stuff.  Thus, I came at the TCO debate assuming that if it could be put in the language cleanly, and there was at least SOME use for it, then the proposal should be taken seriously.  I mean, I still haven’t found a common use for coroutine generators, but that doesn’t mean there isn’t one.

In my opinion, there are three ways to solve problems in software.  There is breaking a problem down into smaller pieces (componentization), and there is identifying similar patterns in the pieces and building them back up into a single construct (abstraction).  There is also a third way, one which is less practiced and closer to the ‘refactoring’ ideal, and is derived from how mathematicians solve problems: you transform the problem you have into an equivalent one whose solution you already know.  I felt like TCO was an example, at least in Python, of this final approach.  The transform/refactor approach to problem solving requires a large ‘bag of tricks’ to work with, since it is limited by the methods of transformation you know and the solutions you already have.  TCO was one of these solutions looking for a problem, I thought, and in time we’d forget how we ever got along without it.

I now want to talk about culture.  I gave a pretty bad defense of TCO because, to be quite honest, I don’t know much about it.  It’s a new idea to me, I thought I could add something to the community as an outsider to the debate via the simple explicit keyword solution.  I didn’t realize at the time that it wasn’t a debate over whether TCO should be in Python, rather it was, as frequently breaks out in our community, a religious war between programmers.  When I approached the Python community with my idea, some were skeptical, some were interested, but some were downright hostile.  To them, I was yet another dirty little Schemer in their midst trying to ruin their ‘perfect’ programming language.  I was to be told I didn’t understand the Python way of doing things, that in Python, you didn’t need the solutions TCO provided.  And after the BDFL advocated the use of the ‘elegant’ trampoline solution in his blog, that became dogma.  The trampoline was now to be the standard ‘Pythonic’ solution to the problems solved by TCO.
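For reference, the trampoline idiom looks roughly like this (a sketch; the helper names are mine):

```python
def trampoline(f, *args):
    # A tail call is returned as a zero-argument thunk instead of
    # being made directly, so the Python call stack never grows.
    result = f(*args)
    while callable(result):
        result = result()
    return result

def countdown(n):
    if n == 0:
        return "done"
    return lambda: countdown(n - 1)   # the 'tail call', as a thunk

print(trampoline(countdown, 100000))  # no RecursionError
```

The recursion depth is bounded by the while loop, not the interpreter stack, which is what lets it stand in for TCO.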

There was only one problem – I don’t know Scheme.  Not a bit of it.  I’m only loosely familiar with Lisp-like languages in the first place.  I understood the basics of TCO, and understood the arguments of those who argued for it, but they were not themselves Pythonistas, but Schemers.  I listened at the table of the Schemers, and brought back what I had learned to my home, Python, and was treated like a leper.  There is a large underlying hostility here that I suppose I was too naive to believe existed.

Fast-forward (or rewind) to the recent Ruby on Rails uproar.  Two separate incidents raise a bit of a question of just how ‘mature’ the Rails community is.  Take everything I say with a grain of salt (read the disclaimer on the right!), as I’m very likely talking about something I’m completely ignorant of.

But if we didn’t do that, very few people would say anything.

Anyway, first there was the porn thing, then Robert C. Martin using terminology like “masculine” to describe languages like C++ (“a man’s language”) and “feminine” or, worse, “insipid” to describe languages like Java.  Ironically, and hilariously to his counterpoint, he was claiming to bring professionalism to software.  Here’s a hint, Uncle Bob: if you want developers to be a profession, perhaps we should stop insulting entire genders like a bunch of cigar-smoking misogynists from the 40’s.  But that’s beside the point.  (I need another snifter of brandy, damnit!  And a sammich!)

After critiques of these sorts of talks took place, the Ruby community for the most part reacted like a ten-year-old child, claiming that it was really the fault of the people who were offended for complaining in the first place.  Good way to take constructive criticism.  This was all a microcosm and a side show for what really bugged me, and what has bugged me about many of the isolated programmer cultures that develop.  The Rails community continued to more or less believe they had invented programming.  This was the underlying theme of the entire debate.  When outside criticism erupted, of anything, the knee-jerk reaction was to distrust outsiders, circle the wagons, and shoot first.  After all, they don’t really know programming like we do.  There was no openness to criticism, which is indicative of little sharing of information at all.  When a new idea hits Ruby on Rails, it is assumed to have been invented there by those in the community (read: Test Driven Development).  For the most part, they are the youngest members of our overarching programmer community, and should be expected to lash out like this.  The problem is, it isn’t just them and never was.  As I showed above, even more established communities like the Schemers and the Pythonistas can jealously guard their ideas and distrust anything new.

In the case of the Schemers and the Pythonistas, it was the old saw that Scheme is an ‘academic’ language and Python is a ‘pragmatic’ language.  Which would be untrue, if it even meant anything.  What the advocates of the language are actually saying is that ‘only academics’ use Scheme, but real programmers (pragmatic types, we) use Python.  Real programmers don’t have time to debate theory and angels on the heads of pins.  We’ve got deadlines and customers.  What could a “schemer” possibly teach us?  I believe this debate originally started with, and is perpetuated primarily by, the amateur programmer versus the computer science major, basically the two ways into our field.

There is no ‘real’ split here, though.  For example, take Haskell, one of the most ‘academic’ languages and C++, never a more ‘pragmatic’ language (although, pragmatic in another sense, i.e., runtime efficiency).  Haskell has an idea called Type Classes that, oddly enough, do the exact same thing C++ ‘concepts’ do.  I doubt either of these two communities openly talk about how similar such ‘academic’ and ‘pragmatic’ languages are, because both secretly want to take credit for a structural typing idea that is sure to have been around since the 60’s (citation needed).  The point is, I can’t imagine a larger divide between communities, and yet there really is no large divide there.  The devil, quite literally as a force of hate pulling us apart, is in the details.

Let me finish by clearing up a few things, to anyone who will listen.  Just because you haven’t read SICP, you aren’t an idiot.  But just because you have read SICP, you aren’t necessarily a genius.  There are a whole bunch of good ideas and concepts in that book, like many of the books in our shared ‘canon’ (I’m working through TAOCP, I swear!).  While I disagree with Uncle Bob that we might all one day be ‘professionals’ (or that we should be), we can certainly all be friends.  We are united by the fact that we all love and have a passion for this stuff; we should not let that passion for our particular solution, implementation or language divide us.  We have too much to learn from each other.  Python isn’t a child’s language (or, perhaps it is, but in the sense that even a child can learn it).  Haskell isn’t too dense (because, ironically, it has been taught pretty easily to children too!).  Scheme isn’t just academic, C++ isn’t THAT ugly.  And Rubyists aren’t that immature.  Ok, well, some of them are, but we love them anyway 🙂  When it comes to programming, when it comes to what we do, what we’ve chosen to create, I’d rather have a friend by my side than a ‘professional’ any day, even if that friend is a dirty little schemer.

May 20, 2009 Posted by | Uncategorized | 3 Comments

The Essence Of Complexity

You’re reviewing source code. It is a relatively small ‘for’ loop in a procedural language, and it uses the ‘continue’ and ‘break’ statements. Suddenly one of your colleagues makes a critique: “Continue and break are tantamount to gotos, and therefore are hard to understand. You can simplify this loop by adding a few flags and only returning in one place.”

The author is notably flustered. “Adding in those variables will make this loop less efficient! Plus it will add needless lines of code and could introduce errors if we don’t check the flags right,” he responds.

“Your solution is too complex!” The original critic might say.

“But it’s elegant!” The author defends.
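To make the disagreement concrete, here is a sketch of the two styles (the task, finding the first non-negative multiple of 7 before a cutoff, is invented for illustration):

```python
def find(items):
    # continue/break version: control flow stated where it happens.
    for n in items:
        if n < 0:
            continue              # skip negatives
        if n > 100:
            break                 # give up past the cutoff
        if n % 7 == 0:
            return n
    return None

def find_with_flags(items):
    # The critic's version: one exit point, extra state to track.
    result = None
    done = False
    for n in items:
        if not done:
            if n > 100:
                done = True
            elif n >= 0 and n % 7 == 0:
                result = n
                done = True
    return result

assert find([-7, 3, 14, 200]) == find_with_flags([-7, 3, 14, 200]) == 14
```

The two are behaviorally equivalent; the argument is entirely about which one is harder to hold in your head.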

What is complexity? When is a solution complex, and when is it simple? When is it understandable? In the context of yesterday’s conversation, when does a language feature – continuations – make a language too complex?

By complex, we usually mean ‘hard to understand’. But as we’ll see, this is hardly an objective measure. We’ve seen one example of two different ideas of complexity in the argument above, but there are others. Some of these have been settled, so we’ll look at what Python has considered complex.

Nowadays, if you code up a huge for loop that operates on each item in a collection and builds a new collection, it’s generally recognized that a list comprehension is less complex. Why? A comprehension is also less complex if you’re building a new collection from an old one, but removing some members based on some predicate.

On the other hand, using another functional construct, ‘reduce’ (fold), is considered MORE complex than the equivalent for loop that operates over the collection, except in certain cases.
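A quick illustration of both judgments (the data and names are invented):

```python
from functools import reduce

data = [3, 1, 4, 1, 5, 9]

# The loop version...
squares = []
for n in data:
    if n % 2 == 1:
        squares.append(n * n)

# ...and the comprehension, generally read as less complex.
assert squares == [n * n for n in data if n % 2 == 1]

# reduce (a fold), by contrast, is usually read as MORE complex
# than the loop it replaces.
total = reduce(lambda acc, n: acc + n, data, 0)
assert total == sum(data)
```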

Another example would be the general notion that simple logic and shallow nesting is almost always less complex than intricate logic and heavy nesting. Should you find these things in your code, you’re supposed to break them up into smaller chunks. Why would breaking something up somehow make it less complex – you still have to check all your flags and do all your loops? In fact, isn’t it even more complex now that everything is spread out all over the place?

So breaking things down, even pulling things away from their context is, counter-intuitively, a way to make a program LESS complex. What gives? You’re giving a person LESS information about the problem, yet they understand the whole problem better?

The fewer things we have to think about, the less complex they are! The less we have to keep in our head, the less complex it all is!

Think of your mind as having, like a computer, RAM and a hard drive. What happens when you fill up RAM? The OS steps in, carves out a place on the hard drive, and swaps out some virtual memory. We, like computers, only have so much RAM we can use as our working memory. The more we have to hit our hard drive, the longer it takes us to do something. Plus, unlike computers, we’re hardly infallible when it comes to remembering things. We may do a lookup of a certain problem on our hard drives today and come up with one answer, and then tomorrow come up with another. Uh oh, programming bug!

How do things like list comprehensions and loop control statements help us not hit our hard drives when we’re thinking about a problem? How does breaking a problem up keep things all in working memory?

The second statement should be obvious – the smaller our focus, the fewer things we have to keep in our heads. We divide one large problem up into multiple small ones, plus one more. The small ones are the modules we use to decompose the problem, while the plus one problem is the far simpler ‘big picture’ of how these small modules work together to solve our original problem.

The first statement, though, reduces complexity in another way. Think of breaking a problem up as ‘optimizing’ the problem for our mind’s cache. We think much quicker that way, and can keep everything in fast, working memory. There is another option we can take when we try and shrink a problem to fit in memory and that’s compression.

We like to zip up files because transmitting over the Internet is slow, so we want to transmit as little information as possible to the other side. Compression takes data and eliminates as much redundancy as possible to shrink it to its smallest ‘essential’ size. This identification of ‘essential’ also applies to our problem solving and programming. We can compress ideas by identifying them by their ‘essential’ structure, and ignoring all the extra details. This is the process of ‘abstraction’.

You see, ‘continue’ and ‘break’ abstract away a very common loop control structure. That means they allow a more complex loop to be stated in a simpler way. Instead of keeping the details of exactly how the loop will continue or break in our heads, we ‘compress’ that information away and simply mark those facts with the smaller, ‘essential’ nature of the problem.

List comprehensions do the same thing, they boil a big problem – a loop over a collection – into a smaller one that only consists of the ‘essential’ complexity of the problem.

But what about the critic in our example? Surely he knows that continues and breaks reduce complexity via abstraction. Er, don’t they?

Not necessarily. Compression in your mind, just like on a computer, takes time. It’s called ‘learning’. We see patterns, over and over again, and only after a while do we begin to replace the pattern with its ‘essential’ nature (the parts that change). Only after we’ve seen something many times do we begin to think about that thing as an object in itself, and then begin to try and shrink that object into its most compact form.

The critic probably was not familiar with the use of continue and break, and thus saw the continue and break solution as inherently more complex. To him, continue and break didn’t boil down to small complex problems, but instead made the original problem even bigger. Now, in addition to understanding what the loop DOES, the reader has to understand these esoteric control flow keywords. That’s even more stuff to keep in your head!

And now comes the rub. I call this the Teacher’s Fallacy: you tend to project your level of competence onto other people. In other words, if you understand a subject, you frequently over-estimate how much others know. Indeed, we’ve all been there, talking to a professor or local guru: we ask a question, and they start answering with jargon about three levels above it. They don’t realize they’re being incomprehensible. They don’t know what patterns you have or have not abstracted, and the problem is, once you’ve abstracted a pattern, you become blind to it. After all, how else is your mind going to keep all that information from clouding your RAM?

Similarly, if you do not understand something, you tend to project that level of misunderstanding onto others. The developer skeptical of ‘continue’ and ‘break’ sincerely believed that everyone else in the room also found their use as complex as he (or she) did. He (or she) most likely believed that the author of the code was simply trying to be ‘clever’ and obfuscate things in an effort to show how smart they were.

So, when is a potential language feature too complex to add to the language? People who aren’t familiar with the feature, but ARE familiar with the language, are likely to greatly over-estimate the complexity of the proposed feature. After all, we all get complacent with our level of mastery over things. Too often, we see all knowledge as a single vast mountain, rather than a mountain range. In other words, we think “Well, I understand so much about programming, and this concept is complex even to me; therefore, it must be VERY complex to someone new to the language!” In actuality, you’re not looking at a new steep cliff near the top of your own mountain of knowledge, but rather a shallow grade at the bottom of someone else’s. You’re on equal footing with many beginners, but are so used to the thin air solving your own advanced problems that you don’t recognize where you are.

Indeed, if continuations and functional programming were, in fact, more difficult to understand than imperative or procedural programming, you’d see a trend of students taking far longer to master the former than they do the latter. But we don’t. There have been projects to teach Haskell, generally conceived to be one of the more obtuse academic languages out there, to high schoolers. And they were successful.

Functional programming seems hard because we’re already imperative programmers. Good ones too! We forgot how hard it originally was to wrap our minds around objects, around function calls, around variables and references versus values. When we approach functional programming, we generally do so at the level of an amateur (at least, those of us new to it), but we still believe ourselves to be experts. It will take your average smart imperative programmer about as much time as it would someone completely new to programming to ‘get’ Lisp, Scheme or Haskell. It’s starting all over again.

So, we’ve established that it’s very hard to judge the complexity of a new idea, because we’re so used to ‘getting’ it so quickly. We tend to over-estimate how hard it might be for someone else to grasp, since it’s so hard for us, and we’re, after all, the experts.

For someone arguing that continuations are too complex to put in a learning language like Python, the onus is really on them to show that learning a stack-based function call mechanism is somehow intrinsically simpler, for a person completely new to programming, than a simple continuation-based flow. Until then, I’d argue that it is simple enough for anyone to grasp. Even we expert imperative programmers 🙂
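Python has no first-class continuations, but the contrast can be sketched in plain Python by writing the same computation twice: once in ordinary stack-returning style, and once in continuation-passing style, where ‘what to do next’ is an explicit argument instead of an implicit return address. All names here are illustrative, not from the post.

```python
# Ordinary, stack-based style: each call returns a value to its caller.
def add(a, b):
    return a + b

def square(x):
    return x * x

def pythagoras(a, b):
    return add(square(a), square(b))

# Continuation-passing style: each function receives its continuation k
# (the 'rest of the computation') and never returns a value up the stack.
def add_k(a, b, k):
    k(a + b)

def square_k(x, k):
    k(x * x)

def pythagoras_k(a, b, k):
    square_k(a, lambda a2:
        square_k(b, lambda b2:
            add_k(a2, b2, k)))

result = []
pythagoras_k(3, 4, result.append)
assert pythagoras(3, 4) == result[0] == 25
```

Neither version is obviously more ‘natural’ to someone who has never programmed; the stack-based one only feels simpler because it is the one we learned first.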

Another major question looms, though: whatever the complexity is, is a feature or solution useful enough to justify it? If, in our original story, we lived in an alternate dimension where loops were written so differently that ‘continue’ and ‘break’ only made sense in one or two cases throughout all code, then it would not be worth introducing those control flow mechanisms.

Learning is hard, recognizing new patterns takes time. I don’t want to learn a new language construct if I’m never going to use it. Then, even if it only adds a little complexity, it’s entirely needless complexity. Even easy things should be justified.

There are two kinds of abstractions, though. The first kind of abstraction is ‘evolved’. It comes from repeated exposure to a pattern until we shrink that pattern into its essential form. Once we have shrunk a few patterns, there emerge new patterns – patterns of patterns, and we continue to shrink entire systems of patterns down into their essential form. This process continues, and these abstractions build off of each other.

The second kind of abstraction is different. It comes out of the blue at us, from something completely unrelated. A flash of insight, maybe. These abstractions are ‘orthogonal’. They come at us from right angles, and are usually taught rather than learned through our environment. They are also used in a different way. As you recall, evolved abstractions come from obvious patterns – in other words, the use of them precedes their existence. We have a problem, and we find a solution, and now we can solve that problem again and again. Orthogonal abstractions go the other way, presenting us with a solution. We frequently see the problem they solve as contrived or unrelated to what we need.

This type of abstraction arises frequently in mathematics. We frequently prove theorems well before we find applications for them. Math is in constant supply of solutions looking for problems, and the most frequent way to apply these new solutions is the transformation. Instead of seeing a pattern in our world and reducing it to its essential form, we see only raw data. No patterns. But we can transform this raw data, this accidental complexity, into a form we already have a solution for. Mathematicians frequently make progress by stepping back and restating the problem.

We take a problem we don’t know how to solve, or one that has a messy solution, and we turn it into another problem that has an easy solution. But the only way we can do this is if we steadily expand our solutions, our ‘bag of tricks’, through means other than evolution alone. Evolution is never going to give you a solution you don’t already need, and as such, your problems will always grow linearly with your solutions. Orthogonal abstractions are the only way to really grow your bag of tricks.

Continuations have a few uses that have been stated multiple times. But they may still seem like they solve a problem that’s almost as easily solved in some other way. Are we just tacking something onto the language that doesn’t NEED to be there? The Zen of Python says there should be one, and preferably only one, obvious way to do things. This helps the learner of the language, because when they discover a new feature, they should immediately be able to find applications for it. If there is only one way to do something, and a feature is in the language, then its abilities aren’t replicated by anything else, and when you run into a problem it solves, you’ll need it and only it.

But just as we said the imperative expert will misjudge the complexity of learning functional style for the newcomer, I think we’ve shown that we all are going to underestimate the usefulness of orthogonal abstractions. They’re solutions looking for problems – why should we tack that onto the language? In fact, such things may even be risky. A solution with no problem today may, tomorrow, turn out to solve the same problem as something we already have. It could be merely needless today, but tomorrow it could be downright redundant. That is not Pythonic.

There is a debate in economics over whether demand drives the economy or supply. Demand siders say we should give people what they want. They are clearly indicating what problems they’d like solved, and we evolve solutions to those problems from stuff we already have. Supply siders say that if you build it, they will come. People frequently don’t know what they’re missing, or what they need. Take the introduction of the personal computer, for example, the archetype for a completely orthogonal solution. No one demanded a computer, no one even knew what they were. Many were skeptical of their usefulness, yet today, obviously, they are ubiquitous.

In terms of continuations, an extended continue syntax, as an orthogonal abstraction, could very well solve problems we don’t even know we have. Stackless Python has shown the potential of a Python with no stack, while continuations may be able to work that power into a language with the safety of a stack. Native thread advocates constantly bemoan the GIL, yet continuations may have the potential to make efficient, expressive green threads a reality in Python, with very little legwork. We will learn to solve many problems today by transforming them into problems continuations solve far more simply. They are coming at us at right angles, and we never know where that will lead.
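Python lacks true continuations, but generators already capture a slice of that power: each `yield` suspends a computation mid-flight and hands control elsewhere. A toy round-robin scheduler (all names here are hypothetical, and this is a sketch, not how Stackless or any real green-thread library is implemented) hints at what cheap cooperative threads look like:

```python
from collections import deque

def scheduler(tasks):
    """Round-robin over generator-based 'green threads': each yield
    is a point where a task hands control back to the scheduler."""
    queue = deque(tasks)
    trace = []
    while queue:
        task = queue.popleft()
        try:
            trace.append(next(task))  # run the task up to its next yield
            queue.append(task)        # not finished: back of the line
        except StopIteration:
            pass                      # task finished: drop it
    return trace

def worker(name, steps):
    for i in range(steps):
        yield f"{name}:{i}"

print(scheduler([worker("a", 2), worker("b", 3)]))
# → ['a:0', 'b:0', 'a:1', 'b:1', 'b:2']
```

The two workers interleave without any native threads, and without the GIL ever becoming an issue, because the switching points are explicit in the code itself.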

May 1, 2009 Posted by | Uncategorized | 2 Comments