Jacob Kaplan-Moss

Snakes on the Web

I wrote this post in 2009, more than 14 years ago. It may be very out of date, partially or totally incorrect. I may even no longer agree with this, or might approach things differently if I wrote this post today. I rarely edit posts after writing them, but if I have there'll be a note at the bottom about what I changed and why. If something in this post is actively harmful or dangerous please get in touch and I'll fix it.

A talk given at PyCon Argentina and PyCon Brazil, 2009.

Web development sucks.

It’s true: web development, at its worst, is difficult, repetitive, and boring. The tools we have suck. At best, they make web development slightly less painful, but we’re a long way from making web development awesome.

The history of web development tools is a history of trying to solve this problem. It’s a history of asking, “how can we make this suck less?” It’s important to understand this history, because we can look at past trends and use them to predict the future.

That’s exactly what I plan to do. I want to answer three questions:

  • What sucks, now, about web development?
  • How will we fix it?
  • Can we fix it with Python?

To do so, I’ll start with ancient [1] history.

A brief, opinionated history of web development

In the beginning, TBL invented the Web, and it was good.

I like to think of this time as the “Stone Age” of web development, an age characterized by clumsy, difficult tools. We wrote HTML, by hand, usually in a text editor. The concept of dynamic web pages didn’t exist: if you wanted to publish a thousand stories online, you wrote a thousand .html documents.

This sucked. Obviously.

And it led to an obvious question: what if we could generate HTML programmatically, like from a database or something?

And thus was CGI born, and the Bronze Age of the web began. We had tools, finally, to automate some of our repetitive tasks. We could handle user input! Connect web pages to databases! Flow content through templates!

But CGI sucks. It’s horribly slow, for one. Worse, it encourages incredibly bad development style. Either you have dozens (hundreds!) of .cgi programs, with all sorts of repetitive code… or you end up with a single, gigantic, monolithic go.cgi. CGI usually proves horrifically hard to maintain. The fact that most CGI programs of this era were written in Perl – or, worse, C – doesn’t help.

So, again, smart people started asking questions. At first, we asked, “how can we make CGI suck less?” Note that this isn’t a very big question – it’s not really a rethinking of how web development should work – but it still led to a leap forward: the first generation of application servers.

I’m talking here about things like mod_perl, mod_python, and especially PHP. PHP, which is essentially “CGI done right,” quickly became overwhelmingly dominant, and the Iron Age of the web began [2].

Now, I mentioned before that the questions that led to PHP weren’t very good questions, and so the leap forward wasn’t all that dramatic. The Iron Age is very similar to the Bronze Age: the tools all look mostly the same, they just use slightly better tech. That is, PHP suffers from most of the same problems as CGI, but to a lesser extent.

The biggest problem, at least to my mind, with the first few ages of web development is that the mindset is essentially page-oriented. All we did in these early years was trade page.html for page.cgi for page.php. We still represented web sites as a collection of pages, written in a string of improving languages.

So the real revolution came when we started to question this basic assumption. “What if,” we asked, “we could think of these things as applications, not pages?”

This question led directly to the creation of Django, and to the Industrial Revolution: Web Frameworks.

Now, technical revolutions happen organically. Take the printing press: though Gutenberg is credited with its invention, in fact the press was simultaneously discovered by at least two other inventors [3] in Europe around the same time. Indeed, it’s actually inaccurate to credit any of these men with building the first press: press-printed material dating back to the 7th century has been found in China and Korea [4].

Like the printing press, then, frameworks existed long before the current crop (WebObjects is just one example). Like the Industrial Revolution, the Framework Revolution happened in many places, and in many different ways. I don’t want for a minute to pretend that Django was the first framework – or among the first – nor was Django born in a vacuum. Django is, like the other frameworks of our time, a product of the age and of these questions about web development.

The Industrial Age: Web Frameworks

Now we find ourselves in the Industrial Age, the Age of the Framework. Since I’m talking about “frameworks” quite a bit, I think it’s worth a bit of time to clarify what I mean. As I see it, the main characteristics of a modern web framework are:

  • They operate at a high conceptual level. Instead of thinking about HTTP, HTML, and web “pages”, frameworks allow us to think and to operate at the level of web application. This means less code, and it also allows us to be much more ambitious about what we’re building.

  • Frameworks provide much larger building blocks.

    I like to use a construction analogy: traditionally, most houses are stick-built: you just nail together a whole bunch of lumber, one stick at a time, until the house finally appears. The raw materials are simple: wood, nails, glue, shingles, bricks. You’ve got all sorts of flexibility, but construction takes a long, long time.

    Today there’s another option: factory-built homes. Here, the house is built in huge sections in a factory, mostly automatically. Each room, pre-built, is loaded on a truck and then a crane puts the house together on-site. Architecture is more constrained – you have to put the house together from the array of room options supported by the factory – but you can literally be moving into the house 30 days after signing off on the final blueprints.

  • Frameworks encourage rapid development. It’s no coincidence that the Age of the Framework is also the Age of Agile. Agile, XP, Scrum, etc. – frameworks are at their best when used in a rapid-iteration style.

  • Good frameworks are open source. I don’t think I need to justify this point to this particular crowd, so suffice to say that it’s no accident that there aren’t any proprietary frameworks with any real following to speak of [5].

  • Finally, good frameworks make development fun. Business folk like to think this is a silly requirement. It’s not. The best thing about the web framework world is our sense of fun: fun motivates, leads to experimentation, and hence to innovation.

What’s next?

I’ve described where we are now… so what’s next?

The best way to predict the future of web development, I think, is to keep asking ourselves the question that led to all the past advances: what sucks, and how can we fix it?

So: what sucks about web development?

Inter-operability

Modern web frameworks suck at inter-op.

Frameworks are good. But frameworks inevitably lead to lock-in. Lock-in is bad.

It’s important to realize that the most important kind of inter-operability is with the user’s code, and frankly web frameworks often suck here. A basic truth of software is that as it grows and matures it becomes more and more domain-specific, and less and less generic. I’ll talk more about this below; the important part for now is to realize that general frameworks should be able to cede control to domain-specific replacements as the stack grows. For the most part, frameworks don’t.

Of course, most people think of inter-op in terms of inter-operability between multiple frameworks. Nobody’s doing very well here, but unfortunately the Python web world’s worse than average. There’s a great deal of fragmentation in the Python web community, and frankly Django’s not helping. That’s a bug in the Django community, and there are similar bugs all over the Python web world. We need to fix these.

WSGI is helping here; WSGI’s the best thing ever to happen to Python web development. We can’t rest on our laurels, however: WSGI’s got some serious problems. They’re off-topic here, so I’ll simply point you to James Bennett’s Let’s talk about WSGI and say, “ditto.”
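For anyone who hasn’t seen WSGI up close, here’s a rough sketch of the interface as PEP 333 defines it: an application is just a callable that takes environ and start_response and returns an iterable of bytes. The names hello_app and call_app are made up for illustration, and call_app is only a tiny stand-in for a real server.

```python
from wsgiref.util import setup_testing_defaults


def hello_app(environ, start_response):
    """A complete WSGI application: a callable taking environ and start_response."""
    body = ("Hello from %s" % environ.get("PATH_INFO", "/")).encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]


def call_app(app, path="/"):
    """Hypothetical helper: invoke a WSGI app once, roughly as a server would."""
    environ = {}
    setup_testing_defaults(environ)  # fills in the keys the WSGI spec requires
    environ["PATH_INFO"] = path
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["status"] = status
        captured["headers"] = headers

    body = b"".join(app(environ, start_response))
    return captured["status"], body


if __name__ == "__main__":
    print(call_app(hello_app, "/snakes"))
```

Note how little the interface says: a dictionary goes in and bytes come out. That’s why any server can host any framework, and also why it can’t carry much beyond the request itself.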

I should also mention Rack. Rack, in a nutshell, came about when the Ruby world, facing problems similar to those we’d faced in Pythonland, created a WSGI-inspired web gateway tool. It’s been a resounding success: Rails 3 is being rewritten on top of Rack. Rack is frankly a bit better than WSGI; we Pythonistas should be embarrassed by that.

The big problem, though – the elephant in the room – is that gateways suck, too. Gateways aren’t APIs. There’s a limit – and it’s a low one – to the level of inter-op you can obtain when the only interface you have is a gateway. Even if we improve WSGI – and we should – it’ll only take us so far.

Even worse, tools like WSGI and Rack do nothing to help inter-language inter-operability. I’d really like to write parts of my application in Python, parts in Clojure, parts in Ruby, and even parts in Perl. Things like web proxies, SOA, ROA, and language VMs help, here, but since gateways aren’t APIs there’s only so far we can go.

This is going to be a hard problem to fix, even if we only focus on Python. We’ve got a bunch of disparate communities, all comprised of volunteers. Very few people have overlapping knowledge, few know how to navigate multiple community standards, and fewer still have the impetus to work on inter-op. Nearly nobody is thinking about multi-language inter-op.

But this stuff is incredibly important. If Django fizzles, I’ll be sad. But if Python fails as a web language I’ll be devastated.

Rich web applications

I’m extremely excited about HTML 5. In fact, I think it could be the best thing to ever happen to web frameworks. If web apps can truly replace desktop apps then frameworks are going to be the place to be, and Python could kick some serious ass here.

Right now, though, the current crop of tools suck at creating rich applications. The current state-of-the-art is pitiful. The two approaches I’ve seen seem to be either building parallel MVC layers on the client and the server and then mashing them together somehow, or else inventing a tightly coupled back-end-with-generated-front-end framework like GWT or SproutCore. Neither approach makes me all that happy.

For example, take a look at 280Slides. It’s an amazing piece of web tech – the browser truly disappears; it’s hard to tell that you’re not in a native desktop app. It’s amazing.

However, the developers believed that 280Slides would be literally impossible to write using any of the current web tools. They not only built their own framework, Cappuccino; they actually invented a new language, Objective-J! If this is a trend, it’s worrying.

Handling complexity (a.k.a. the deployment problem)

It’s a well-recognized fact that web applications are getting more and more complex, and the list of things you need to successfully develop, deploy, and scale a web app is getting longer and longer.

It turns out that writing the app is now the easy part; managing the rest of the stack you need for successful deployment can be nearly impossible. In other words, we’re all ops people now.

Some time ago, Leonard Lin collected this list of all of this “other stuff” you need to worry about after developing your app:

  • API Metering

  • Backups & Snapshots

  • Counters

  • Cloud/Cluster Management Tools

    • Instrumentation/Monitoring (Ganglia, Nagios)
    • Failover
    • Node addition/removal and hashing
    • Autoscaling for cloud resources
  • CSRF/XSS Protection

  • Data Retention/Archival

  • Deployment Tools

    • Multiple Devs, Staging, Prod
    • Data model upgrades
    • Rolling deployments
    • Multiple versions (selective beta)
    • Bucket Testing
    • Rollbacks
    • CDN Management
  • Distributed File Storage

  • Distributed Log storage, analysis

  • Graphing

  • HTTP Caching

  • Input/Output Filtering

  • Memory Caching

  • Non-relational Key Stores

  • Rate Limiting

  • Relational Storage

  • Queues

  • Real-time messaging (XMPP)

  • Search

    • Ranging
    • Geo
  • Sharding

  • Smart Caching

    • dirty-table management

Yes, a modern web developer really needs to understand this stuff. Yikes.

The good news is that there’s open source software to fill all of these needs. The bad news is that they’re all immature, disparate pieces with no connections to each other. Getting even half of this stuff up, running, and integrated is a monumental task.

There’s a huge opportunity here for Python. Python’s historically been used as a “glue” language, though recently we’ve tried to de-emphasize that aspect. It’s nothing to be ashamed of: Python’s a very good glue language.

Python could easily be the glue that keeps this huge stack from toppling over.

Scale

Internet usage is growing explosively. Worldwide it’s doubled twice since 2000… and global penetration is only about 25% [6]. This number’s just going to keep going up.

Meanwhile, web sites are getting a lot more complex. Think back to 2000 – could you have even imagined a site like 280Slides then?

Meanwhile, traffic is growing. The average user is spending more and more time on the web, and think about what’s going to happen as the mobile web explodes.

We’re going to have to learn to deal with more and more and more traffic. And frameworks suck at scaling.

Frameworks are very good at generic tasks. They’re meant to be: they abstract away common difficulties. But as applications grow in scale, they need to get more and more domain-specific to be able to deal with scale. There’s a direct correlation between the size of the site and how specific it is.

This usually breaks down as follows:

  • You develop your first little toy app using Framework X. (In the Django world this seems to be a blogging app – it seems like at least 75% of Django developers have built their own blog app.) This usually goes great.
  • Happily successful, you develop a product with the framework, and launch it. This usually goes well, too – sites at the initial-launch stage are still very similar to each other, and frameworks are great here.
  • As your site grows, you start to feel a bit of pain, and need to replace some bits of the framework with domain-specific bits. This usually isn’t too bad: most frameworks, Django included, are modular enough that you can easily swap out the more common non-scalable bits.
  • Then one day you become Twitter, and all hell breaks loose. You end up having to essentially ditch the framework and re-write everything, from scratch, in very very specific ways, just to deal with the crazy, mind-boggling amounts of traffic you’ve got.

Frameworks work incredibly well to get you off the ground quickly… and then usually fail miserably when faced with the specific needs of big sites.

This is an impossible situation for framework developers: by optimizing for a quick start, by focusing on common needs, we’re essentially guaranteeing future failure. Remember the “Rails doesn’t scale” pseudo-controversy last year? I guarantee it’s only a matter of time until there’s an angry “Django FAIL” moment.

Frameworks ought to gracefully fade away as you replace them, bit by bit, with domain-specific code. (This is what I meant, above, when I said that inter-op with the user’s code matters most: inter-op is also a scaling issue.) Right now, they don’t.
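One way to picture “fading away” with today’s tools is a WSGI dispatcher that hands specific URL prefixes to hand-written, domain-specific code and lets the framework handle everything else. This is only a sketch under assumptions: framework_app, fast_timeline_app, and PrefixDispatcher are hypothetical names, and the “framework” is faked with a plain WSGI callable.

```python
def framework_app(environ, start_response):
    """Stand-in for a full framework's WSGI entry point (hypothetical)."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"handled by the framework"]


def fast_timeline_app(environ, start_response):
    """Stand-in for a hand-tuned, domain-specific replacement (hypothetical)."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"handled by custom code"]


class PrefixDispatcher:
    """Route requests by URL prefix; fall back to the framework otherwise."""

    def __init__(self, default_app, overrides=None):
        self.default_app = default_app
        # Longest prefixes first so the most specific override wins.
        self.overrides = sorted((overrides or {}).items(),
                                key=lambda item: len(item[0]), reverse=True)

    def __call__(self, environ, start_response):
        path = environ.get("PATH_INFO", "/")
        for prefix, app in self.overrides:
            # Match the prefix itself or anything below it, but not "/timelines".
            if path == prefix or path.startswith(prefix.rstrip("/") + "/"):
                return app(environ, start_response)
        return self.default_app(environ, start_response)


# Start with everything in the framework, then peel off hot paths one by one.
application = PrefixDispatcher(framework_app, {"/timeline": fast_timeline_app})
```

This only works at the granularity of whole requests, which is exactly the limit of a gateway: the custom code gets no help from the framework’s models, sessions, or templates unless it reaches into them itself.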

Concurrency

Of course, if you’re talking about scale, then you need to talk about concurrency. That’s right, I’m gonna go there. I’m gonna talk about the GIL. Don’t worry, though, I won’t dwell or complain.

First let’s look at some processors, shall we?

Today, right now, you can buy a top-of-the-line Intel Nehalem for about $2,000. It’s got 2 hardware threads per core, and it’s available in an 8-core configuration. This means 16 hardware threads on a single slot, so you can easily build a box with 64 hardware threads (4 CPUs, 8 cores per CPU, 2 threads per core).

Of course, if you want to get really serious you could buy something with Sun’s UltraSPARC T2 (a.k.a. Niagara). This chip has 8 cores, 8 threads per core, and you get two of ’em in a single box, so that’s a whopping 128 hardware threads per machine. Yes, the future of this machine is in doubt [7], but Sun’s been on the leading edge of concurrency for quite some time. It’s only a matter of time until Intel and AMD catch up.

Obviously concurrency is going to be a Very Big Deal in the future. It already is.

Much of my thinking about this comes to me from Ted Leung. I look up to Ted, and I’m sad to tell you that Ted says we’re screwed. I’m afraid that I’m starting to agree. To some extent the “shared-nothing” architecture of most web applications means that we can just StartServers 128 to deal with 128 threads, but as applications grow you’ll usually need to start giving up on “shared nothing.”

Most languages can really only saturate a single core, and if you can only use a single core you’re in a lot of trouble.
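In Python terms, the StartServers 128 approach amounts to running independent processes that share no state, each with its own interpreter and its own GIL. Here’s a rough sketch using the standard library’s multiprocessing module; render_page is a hypothetical stand-in for a CPU-bound request handler, and the worker count is arbitrary.

```python
import multiprocessing
import os


def render_page(page_id):
    """Hypothetical CPU-bound 'request'; each call runs in some worker process."""
    checksum = sum(i * i for i in range(10000)) % 97
    return page_id, os.getpid(), checksum


def serve_all(page_ids, workers=4):
    """Fan requests out to independent processes that share no state."""
    with multiprocessing.Pool(processes=workers) as pool:
        return pool.map(render_page, page_ids)


if __name__ == "__main__":
    for page_id, pid, checksum in serve_all(range(8)):
        print("page %d rendered by process %d (checksum %d)" % (page_id, pid, checksum))
```

This keeps every core busy only as long as the requests really are independent. The moment two workers need to see the same in-memory data, you’re back to databases, caches, or message passing between processes.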

Now, there’s lots of exciting work going on in the concurrency world today. Cool shit like Actors, STM, persistent data structures, dataflow, tuple spaces, and more. Ted’s A survey of concurrency constructs is a great introduction to these terms if you’ve not heard ’em yet.

Unfortunately, nearly all of this awesome work is going on in relatively obscure languages like Scala, Erlang, Clojure, or Haskell. There’s almost no forward motion in the Python community. Yes, I know all about Twisted, Kamaelia, Eventlet, etc.; these are all just twists on threading or IO-based concurrency; there’s very little that’s really new going on in Python.

And though it’s sometimes considered taboo to say it, we have to be honest: this is partially the GIL’s fault. It’s not clear to me whether the GIL would preclude, say, STM, but it almost doesn’t matter: the existence of the GIL basically sends anyone interested in concurrency running for greener pastures.

I have hopes for Unladen Swallow: the prospect of removing the GIL from Python is a promising first step. However, really all we get from that are better threads, and threads suck as a concurrency mechanism. I want my Actors, dammit!
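For anyone who hasn’t met Actors: the core idea is that each actor owns its state privately and reacts to messages from a mailbox one at a time, so that state needs no locks. The sketch below fakes this with a thread and a queue.Queue. The Actor and Counter classes are invented for illustration, and because they’re built on threads they still run under the GIL, so this shows the programming model, not a fix.

```python
import queue
import threading


class Actor:
    """Toy actor: private state, one mailbox, messages handled one at a time."""

    _STOP = object()

    def __init__(self):
        self._mailbox = queue.Queue()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def send(self, message):
        """Asynchronously deliver a message; the sender never touches our state."""
        self._mailbox.put(message)

    def stop(self):
        """Ask the actor to finish queued messages, then wait for it to exit."""
        self._mailbox.put(self._STOP)
        self._thread.join()

    def _run(self):
        while True:
            message = self._mailbox.get()
            if message is self._STOP:
                break
            self.receive(message)

    def receive(self, message):
        raise NotImplementedError


class Counter(Actor):
    """Counts hits per URL; only the actor's own thread mutates self.hits."""

    def __init__(self):
        self.hits = {}  # set up state before the mailbox thread starts
        super().__init__()

    def receive(self, message):
        self.hits[message] = self.hits.get(message, 0) + 1


if __name__ == "__main__":
    counter = Counter()
    for url in ["/", "/about", "/"]:
        counter.send(url)
    counter.stop()
    print(counter.hits)
```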

This is where us web guys really need your help. We operate at a higher level of abstraction so much of the time that we’re simply not qualified to figure out how to make concurrency better in Python. At least, I’m not. I frankly barely understand threads after a decade of using ’em, and there’s no way I’ll be the one to implement STM in Python.

Halp!

In the year 2020

By way of conclusion, I want you to try to imagine web development circa 2020. That’s no arbitrary year: it’s also Last Call for HTML 5, so it makes sense to think about what the web’s going to be like when HTML 5 is mature. When we’re finally developing the types of apps we’re just starting to dream of today.

I’m not sure I’ll be using Django in 2020. I hope I will, of course, but it may be that Django simply can’t adapt in the next Age of web development.

However, if I’m not still using Python in 2020 I’m going to be seriously pissed off.

Joel tells us that good software takes ten years, so I think we need to start right now. How can we work to make Python the language of choice for the developers of 2020?

First, we need better inter-op. A better WSGI – WSGI 2? – will help, but we need more communication and more APIs that work between frameworks.

The Django community needs to do a better job here, and I’m taking responsibility for that. Keep complaining to me about Django’s lack of inter-op, and I’ll keep working to fix it.

But more than that we need real leaders here. Someone who can show us a way forward, and keep an eye on the bigger picture, not just focus on a single framework.

A Python web-inter-op BDFL, perhaps?

We’ve got to get out in front of HTML 5. There’s a huge opportunity for Python to be the backend language of choice for HTML 5 web applications. We need to start thinking about this now.

We’ve already made a great transition from thinking about “web pages” to thinking about “web applications.” It’s time for a new transition, for us to start thinking about a holistic “web site,” and all its associated tech. Again, there’s a huge opportunity for Python: it could be what binds our stacks together and makes deployment pleasant again.

I dream about a complete stack deployment framework, all tied together with Python, probably built around WSGI 2.

We need to be thinking about scale from day one. This means being incredibly skeptical of our own work, and continually asking ourselves where it’s going to fail. We need to plan for the day that our framework will be phased out.

And holy crap we need better concurrency.

Thank you.

[1]Well, by web standards, at least.
[2]Much of the web, unfortunately, hasn’t progressed much beyond this point. PHP is still by leaps and bounds the most popular and widely-used web technology. The future may be here, but it’s certainly not evenly distributed yet.
[3]At least two others, Procopius Waldfoghel of France and Laurens Janszoon Coster of the Netherlands, may have been working on their own presses around the same time as Gutenberg, and those are only the ones we know about today.
[4]Wooden movable type in China dates to the 10th century, and there’s good evidence that metal-type printing presses were used in Korea as early as the 13th century. If you want to know more, Wikipedia’s History of typography in East Asia is as good a place to start as any.
[5]Except maybe for ASP.NET MVC. It’s hard to know how popular Microsoft technologies are because there’s no real community to speak of, and because Microsoft tends to lie about penetration.
[6]See http://www.internetworldstats.com/stats.htm.
[7]You have to wonder if Sun would have failed if we’d been able to write software that made us actually need this many cores…