Beautiful Data Polling

O’Reilly have released a new book in their “Beautiful…” series called “Beautiful Data.”  There’s a very comprehensive review on Slashdot which I highly recommend. The description of chapter eight caught my eye:

Chapter Eight is about social data APIs and pushes gnip heavily as the de facto social endpoint aggregator for programmers. The chapter mentions WebHooks as an up and coming HTTP Post event transmission project but doesn’t offer much more than a wake up call for programmers. The traditional polling has dominated web APIs and has lead to fragile points of failure. This chapter is a much needed call for sanity in the insane world of HTTP transactional polling. Unfortunately, the community seems to be so in love with the simplicity of polling that they use it for everything, even when a slightly more complicated eventing model would save them a large percentage of transactions.

The link “fragile points of failure” is worth following as it leads to a robust slashdot discussion on Twitter APIs and polling versus push for the web.

I think for a long time, the “web” as we know it has suffered from the lack of the Event/Listener paradigm. This is a pretty simple design concept that I’m going to refer to as the Observer [wikipedia.org]. Let’s say I want to know what Stephen Hawking is tweeting about and I want to know 24/7. Now if you have to make more than one call, something is wrong. That one call should be a notification to Twitter who I am, where you can contact me and what I want to keep tabs on–be it a keyword or user. So all I should ever have to do is tell Twitter I want to know everything from Stephen Hawking and everything with #stephenhawking or whatever and from that point on, it will try to submit that message to me via any number of technologies. Simple pub/sub [wikipedia.org] message queues could be implemented here to alleviate my need to continually go to Twitter and say: “Has Stephen Hawking said anything new yet? *millisecond pause* Has Stephen Hawking said anything new yet? *millisecond pause* …” ad infinitum.

And yet…

That’s not easy to do on a large scale. A persistent connection has to be in place between publisher and subscriber. Twitter would have to have a huge number of low-traffic connections open. (Hopefully only one per subscriber, not one per publisher/subscriber combination.) Then, on the server side, they’d have to have a routing system to track who’s following what, invert that information, and blast out a message to all followers whenever there was an update. This is all quite feasible, but it’s quite different from the classic HTTP model.

It’s been done before, though. Remember Push technology [wikipedia.org]? That’s what this is. PointCast sent their final news/stock push message [cnet.com] in February 2000. There’s more support for “push” in HTML5, incidentally.

Ahhh yes, I remember PointCast well. One of the early darlings of the dot-com era. This reply points at some new hope:

For messaging architectures (like, say, the internet), the pattern is usually described as “Publish/Subscribe”. All serious messaging protocols support it (XMPP, AMQP, etc.) and some are dedicated to it (PubSubHubbub). The basic problem with using it the whole way to the client is that many clients are run in environments where it is impractical to run a server which makes recieving inbound connections difficult.

There are fairly good solutions to that, mostly involving using a proxy for the client somewhere that can run a server which holds messages, and then having the client call the proxy (rather than the message sources) to get all the pending messages together.

Keep watching.

56 Architecture Case Studies

The recent brouhaha about Twitter scalability has highlighted the growth of the latest spectator sport in the blogosphere – “armchair architect”. Everyone’s a Monday morning expert on which language/database/framework is/isn’t the secret to extreme scalability. The real secret is the architecture and organizational maturity. Here are some case studies to prove it:

A Conversation with Werner Vogels where Werner talks about using service-orientation to scale out massively distributed services which power the Amazon e-commerce platform. This is one of my favourites because it covers organizational as well as technical aspects of scalability. One of the unique attributes of Amazon is that service-orientation pervades everything – even their organizational structure. Developers are responsible for running their own services.Werner characterises the adoption of services as a challenging and major learning experience, but it has become one of their main strategic advantages. Key lessons learned:

  • service-orientation is an excellent technique to achieve isolation and high levels of ownership and control
  • prohibiting direct database access allows scaling and reliability improvements without affecting clients
  • a single unified service access mechanism supports service aggregation, routing & tracking
  • service orientation improves development and operational processes leading to more agility
  • giving developers operational responsibility enhances the quality of the services.

The eBay Architecture (PDF) covers the evolution of eBay from 1998 to 2006. It’s a great example of how continuous reinvention is needed to keep up with rapidly growing scaling requirements. Frank Sommers writes a good summary and discussion where he argues that organizational capability is just as important as technical architecture for scalability. Another key ingredient of the eBay story is the ability to discard “conventional wisdom” when required. This is covered in an interview with Dan Pritchett, revealing some of the “rules” that eBay bends in order to scale.

  • eBay.com doesn’t use transactions – mainly for scalability and availability reasons
  • different data is treated in different ways – so best effort suffices for some
  • references the CAP theorem – consistency, availability, partitioning – pick any two
  • many have arrived at the same idea – and transactions are the first to go

Scalable Web Architectures is a great presentation on scalable web architectures in general and Flickr in particular. Also check out Cal Henderson’s list of his other presentations.

Architectures You’ve Always Wondered About provides slides from QCon London 2009 presentations. Case studies about eBay, Second Life, Yahoo!, Linked-In and Orbitz.

Avoiding the Fail Whale is a video in which Robert Scoble interviews architects from FriendFeed, Technorati and iLike.

Improving Running Components at Twitter – Evan Weaver describes how Twitter learned to scale by moving to a messaging architecture.

Real Life Architectures at High Scalability – provides a huge collection of pointers to architecture case studies from around the web.