The Skeptical Methodologist

Software, Rants and Management

My Five Things

It’s the cool thing to do, after all.

First, Jesse pointed out (actually, a friend of his pointed out) that if you can’t name five things you don’t like about your favorite language, then you still don’t understand it.  I think this is a little on the liberal side, though, as most reasonable people can figure out five things they don’t like about C++ before they understand even half of it.  Of course, in that case, understanding even half of C++ is about as far as most people get 😉

Then came Zed's own contribution to the list.  Both of these guys are very active in their communities and contribute a lot of code via the libraries and projects they work on.  Neither of them is a 'dirty little schemer'.  Yet now that they've been seen disagreeing with the politburo, the wagons have been circled, and they must be torn apart with ad hominem attacks until everyone's convinced Python's perfect again.

As Jesse and Zed both tried to explain, I will as well.  I love the Python programming language.  It reads like pseudo-code and is incredibly expressive.  Its ease of incorporating C or C++ makes nearly any performance complaint moot, and the vast number of libraries out there gives it a gigantic code base to work from.

But, again, as Jesse said, if I thought Python was perfect, I'd be fooling myself.  In fact, I think it'd be an interesting challenge to hear Guido's own list of five things he wishes he could fix in Python.  GVR is hard to read: sometimes he's incredibly reasonable, sometimes a little irrational.

But this post is not about politics.  On with the list!

1. Static Default Args

This gets brought up time and time again.  The arguments you provide as defaults to functions are evaluated once, at function-definition time, and are not re-evaluated at runtime.  This means you get funny behavior like this:

  >>> def func(arg=[]):
  ...     arg.append(1)
  ...     return arg
  ...
  >>> func()
  [1]
  >>> func()
  [1, 1]

I've got two problems with this.  First of all, as is the classic way to shoot down ideas the community has already 'decided' upon (other than the 'code it yourself' comment): what exactly is the use case for this behavior?  What idioms does it allow that I've yet to see?  Indeed, 'idiomatic' Python seems to be to work around it, by using None as the only real default argument.  First you check for None, then, if the arg is in fact None, you set it to its real default.  This is added boilerplate and, quite honestly, seems to be the default (forgive the pun) rather than the exception.
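For reference, here's the idiom spelled out, a minimal sketch of the boilerplate I mean:

  def func(arg=None):
      # The 'idiomatic' workaround: None as a sentinel, with the real
      # default recomputed fresh on every call.
      if arg is None:
          arg = []
      arg.append(1)
      return arg

Every call now gets its own fresh list, at the cost of a couple of lines of ritual per argument.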

Another argument for the status quo is that it's 'more efficient'.  I don't get this line of reasoning at all.  Virtually every debate between 'efficiency' and 'readability' in Python ends up on the side of readability.  That's why Python is Python!  Why efficiency gets a free ride in this most awkward of cases is beyond me.  Moreover, it's not more efficient at all, as far as I can tell.  The replacement, checking for None and then recomputing the value at runtime, incurs exactly the same efficiency hit as the proposed dynamic default args would!  Most people are already taking the efficiency hit.

Indeed, this looks like the rare case where Python has optimized for the efficiency of a niche use of the language, which, to be blunt, is not at all Pythonic.

2.  String Concatenation

Most people new to a programming language are going to get a feel for the way to do things.  When they want to put two lists together, they experiment in the Python shell, try simply +'ing them together, and hey, it works.  Doing the same thing to two strings has the same effect.  So obviously, if you need to add many strings together, you add them up like you'd add many of anything else together.  Wrong.

Instead, the “Pythonic” way to do things is to put all of your strings in a list and do the following:

  >> "".join(myStringList)

Does that seem at all obvious at first?  The joining of many strings is a method on another string?  In its defense, the string being 'joined' on is the separator string.  But even that seems odd, since the separator really isn't the 'object', in object-oriented terms, that I'm worried about here.  The object is a list of strings!  The reason given by the Python community is that "".join() is faster, and therefore preferable for large-scale string processing.  While I might balk a bit at again optimizing for efficiency in Python, where everything else is optimized for consistency and readability, at least with string processing there's a major use case for it.  Python is used for string crunching a lot, and the "".join() construct has some specialized C routines to do it fast.
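To make the contrast concrete, here's a quick sketch of both spellings (the list and variable names are mine):

  parts = ["spam", "eggs", "ham"]

  # What newcomers reach for first: repeated addition.
  result = ""
  for p in parts:
      result = result + p    # builds a brand-new string on every pass

  # The 'Pythonic' spelling: the separator string joins the list.
  "".join(parts)             # 'spameggsham'
  ", ".join(parts)           # 'spam, eggs, ham' -- the joined-on string is the separator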

This brings me to a slightly tangential point, but one that really applies here.  Python is notoriously 'smart language, dumb compiler', compared to languages like C++, which are 'dumb language, smart compiler'.  That codifies, in other words, the constant striving for readability and intelligence in language design.  But it also implies that there's no major drive to optimize the Python compiler.  Scripting languages are slow, and are going to continue to be slow for a while.  Besides, Python is mostly used for things that aren't processor-bound anyway, and the FFI is good enough to drop into C whenever you need.  Why optimize?

But there is a particular kind of optimization that enhances readability as well as execution time, called strength reduction.  The idea is that when the compiler detects a construct that can be stated in an equivalent yet more efficient way, it replaces it behind the scenes.  That way you can still divide by two, because that's how a human would read your algorithm, yet the compiler will go ahead and turn that into a much faster bit shift.
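A toy example of the idea, assuming non-negative integers so the two forms really are equivalent:

  n = 42
  half = n // 2    # what the human writes, because it reads like the algorithm
  half = n >> 1    # what a strength-reducing compiler might emit behind the scenes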

Similarly, appending multiple strings together is a pretty common operation in Python.  I still believe that the most obvious way to do this is to just repeatedly add them together.  A strength-reduction optimization would detect this repeated addition and change it to the more efficient C routine, or hell, even an assembly routine.  GVR has been skeptical of optimizations in the past because he's afraid they will change behavior from what users expect, but in this case, behavior is entirely conserved.  Only the runtime changes.  In fact, from an optimization standpoint, strength reduction is more Pythonic, since there's far too much anecdotal evidence that users' original expectation is to use repeated addition rather than some special method on an empty string.  The 'optimized' approach of using a library function to drop into C is not expected, and is actually pretty difficult to find the first time.

3. Non-recursive Reload

Working with the interactive shell is awesome for someone coming from mostly compiled languages.  Getting immediate feedback allows for easy exploratory development.  At first, I was entranced by the idea of having a large code file, making incremental changes in an IDE, then reloading it in the shell and testing various procedures on it.  That is, until I realized reload does not do what one expects it to do.  When you import a file, Python reads the module, imports all the things that file imports, defines things in that module's namespace, and so on.  But when you reload a file, the imports, since they've already taken place and are sitting in the module cache, are skipped.  You only reload the things defined in that file.  If I've imported module X, and it's imported module Y, and I change module Y, simply reloading X won't do me a bit of good.  X will still point to the old version of Y.  Instead, I need to reload Y... oops, no, that doesn't work either.  First I need to import Y, and then reload it.  That doesn't make any sense at all.  How come reload tells me Y isn't loaded, yet import assumes it is?  Importing an already-imported module just binds the cached module into the current namespace, and reload depends on exactly that binding.  Either way, it's a lot more complicated than it seems at first.
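Here's the dance, condensed from a shell session; assume a module x.py that does 'import y' internally, and that y.py has just been edited on disk:

  >>> import x         # loads x, which loads y as a side effect
  >>> reload(x)        # re-runs x.py, but its 'import y' just hits
  <module 'x' from 'x.py'>    # the cached copy in sys.modules
  >>> reload(y)        # fails: y was never bound in *this* namespace
  NameError: name 'y' is not defined
  >>> import y         # merely binds the already-cached module...
  >>> reload(y)        # ...which reload can now finally re-read from disk
  <module 'y' from 'y.py'>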

Upon reloading a file, imports need to behave differently, and imported files really ought to be re-imported.  I'm sorry if this is a performance hit for you, but you really shouldn't be using reload in a library anyway.  Reload should only be used in interactive development, in my opinion.  Python's reload-and-import process is similar to other languages' Make processes.  Make is highly recursive; indeed, that's often a problem, since you have to nest things so oddly in esoteric Makefiles.  But Python's reload doesn't even have the functionality to go out and see whether it needs to re-import libraries that might have changed.  The quick fix is to always re-import libraries; the good fix would be to compare load times against current file-modification dates, as a Makefile would, to look for changes.
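As a sketch of that 'quick fix', here's a hypothetical deep_reload helper; it assumes Python 2, where reload is a builtin (Python 3 later moved it to importlib):

  import types

  def deep_reload(module, _seen=None):
      # Hypothetical helper: recursively reload every module this module
      # references, then the module itself.  No modification-date checks,
      # so this is the brute-force 'always re-import' fix, not the
      # Make-style one.
      if _seen is None:
          _seen = set()
      if module.__name__ in _seen:
          return module
      _seen.add(module.__name__)
      for value in list(vars(module).values()):
          if isinstance(value, types.ModuleType):
              deep_reload(value, _seen)
      return reload(module)   # builtin in Python 2; importlib.reload in 3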

4. Love/Hate relationship with Functional Programming

GVR seems to maintain that Python is not a functional programming language, and that functional programming is not Pythonic.  This is despite the 'Pythonic' list and generator comprehensions (generators are a form of lazy evaluation), closures, first-class functions, and currying.  What, pray tell, would make Python a functional programming language if not these things?
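All four are sitting right there in the language; a quick sketch:

  from functools import partial

  def adder(n):
      def add(x):                # a closure: add closes over n
          return x + n
      return add

  add3 = adder(3)                # first-class functions: pass them around
  squares = (i * i for i in range(10))    # generator expression: lazy evaluation
  add5 = partial(lambda x, y: x + y, 5)   # currying, near enough, via partial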

But this odd schizophrenia doesn't really bother me.  It's the casualties that get to me, namely, and yes, this one is debated again and again, neutered lambdas.  Why are lambdas restricted to a single expression, and why is their scoping allowed to bite people the way it does?  Because Python isn't a functional programming language.  This seems like an odd excuse given all the other FP support Python has, and it seems altogether too convenient to trot out whenever the Python community wants to.  Closures are OK, because those are 'Pythonic'.  Lambdas, no, because Python isn't an FP language.  I feel like I'm stuck in 1984.

GVR has made some valuable, and some less valuable, critiques of FP.  One of the places where FP seemed to muck up 'Pythonicness' was reduce, which is still in the language; it's just no longer a built-in, having been banished to the functools module.  Folds are a major part of many FP languages, but it appears that, for now, they're a little too terse for most people to understand.

One of the criticisms of lambdas, or anonymous functions in general, is that they get abused and grow to a size at which they should be a named function, to add some level of self-documentation.  This is fine, and I agree with it.  Lambdas in Python probably ought to remain one-liners.  But that line of reasoning doesn't at all cover their scoping behavior, and indeed, that behavior ought to be made unsurprising or lambdas taken out of the language altogether.  I've refrained from using them because I've been bitten too many times by odd errors.  And here's the weird part: lambdas DO close over variables, they just close over the variable itself rather than the value it held when the lambda was defined.  It's one of the weirdest scoping gotchas I've seen in a modern language.
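The classic bite, for the record (a condensed shell session):

  >>> fns = [lambda: i for i in range(3)]
  >>> [f() for f in fns]
  [2, 2, 2]                  # every lambda sees the *final* value of i
  >>> fns = [lambda i=i: i for i in range(3)]
  >>> [f() for f in fns]
  [0, 1, 2]                  # the default-arg trick snapshots each value

Note the irony: the standard workaround leans on the static default args from complaint number one.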

5. No Encapsulation

There are obviously many differences between the way Python does OO and the way other languages do.  It's incredibly unpythonic, for example, to write getters and setters.  Hell, that's just bad OO altogether.  Python provides properties for those sorts of operations.  But one thing Python could sorely use is some explicit encapsulation protection like private, public, and protected.

I can hear the Politburo now harping away at how such things are not at all Pythonic, how a person should be able to modify and change whatever he or she wants whenever he or she wants it.  I agree.

But private, protected, and public, as well as any other encapsulation-protection mechanisms you'd like to add, bring a large measure of self-documentation to an API.  They are one more way of encoding designer intent.  The best case I can give you is running the dir() command on any sufficiently large object you get from a library, or even on a module.  You get a few screenfuls of nonsense and implementation-defined stuff that doesn't concern you, and then you have to go hunting through this giant list of operations for the one you want.  How would Python incorporate encapsulation?  It could be nothing more than a keyword argument: pass SeeImplementation=True to the dir() command and you get everything; leave it off, and you get only that which the designer said was public.  When I'm doing exploratory programming, I really don't want to explore implementation minutiae that were never meant for the user anyway.  I want to see what tools were explicitly given to me.  It helps with the information overload that becomes all too common when working with large APIs.

Much like with FP, Python implicitly agrees with this point, that defining and communicating whether something is implementation or interface is important, via the _underscoreFirst coding convention.  But in a language that strictly enforces a whitespace coding convention, at least compared to languages like C, it seems odd to suddenly fall back on a mere 'general and friendly agreement among the whole of Python programmers' that putting a _ in front of your variable or function means it's an implementation detail and not to be trifled with.  If it's conventional enough, why not enforce it?  Why not allow some explicit introspection, rather than hand-coding some hack that looks for a '_' at the front of a name to provide your own dir() and other functions?
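Here's exactly the kind of hand-coded hack I mean, a hypothetical public_dir that leans on nothing but the underscore convention:

  def public_dir(obj):
      # Hypothetical stand-in for a dir() that respects encapsulation:
      # show only the names the author marked public (no leading '_').
      return [name for name in dir(obj) if not name.startswith('_')]

It works, but every project has to rediscover and rewrite it, which is precisely the problem.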

Things that are not broken, and that I agree with. (In which I try to gain the forgiveness of the Politburo for my heresies before I'm to be shot.)

Explicit self.  I actually love this, and it's changed the way I code in C++: I now scope everything with a this-> pointer to be clear.  It also makes methods and normal functions nearly the same thing, allowing you to pass around methods just like you would functions (but this is NOT FP, remember, it's Pythonic 😉).
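A small sketch of what I mean by methods and functions converging (the class and names are mine):

  class Greeter(object):
      def greet(self, name):
          return "hello, " + name

  g = Greeter()
  f = Greeter.greet      # pulled off the class: call it like a plain
  f(g, "world")          # function, passing self explicitly
  g.greet("world")       # or let the instance fill self in for you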

Not allowing implicit TCO.  Tail-call optimization, despite its name, is way more than an optimization.  As I've said before, it opens up an entirely new model of computation.  That's hardly something I'd like to happen behind the scenes courtesy of a sufficiently smart compiler.  This is why I advocated explicit TCO via a keyword, although since I got branded a dirty little schemer (despite not knowing the language!), that idea got shot down by our good friends at the Kremlin.

Whitespacing.  You know how to get rid of all arguments about coding conventions?  By choosing one arbitrarily and enforcing it as a part of the language.  This, and this alone, has focused the Python community's effort on much more useful things.  It also allows almost anyone to read any Python code and figure out its structure without getting a headache.  Readability does count.

No brackets.  Same as above: you don't need explicit brackets if your whitespace shows the structure for you.

No doubt none of this will go anywhere.  But I thought, given the, er… ‘open-mindedness’ of my favorite programming language’s designer community, a little extra jabbing couldn’t hurt.  For my sake, let’s hope not 🙂


May 31, 2009 - Posted by | Uncategorized

1 Comment »

  1. I feel it on 1 and 2. The default arg thing is so completely mystifying and mysterious, and impossible to debug the first time you see it. Sure it’s been written about ad infinitum, but it *still* bites me once in a while.

    The join string concat thing is a wart, straight-up. Having the compiler handle it is a fine solution.

    And seriously, why isn’t dir() smarter?

    Comment by Gregg Lind | September 28, 2009 | Reply

