Entries Tagged 'architecture'

Amazon Shareholder Letter

This is a few months old, but Werner Vogels has republished the Amazon.com Shareholder Letter. It summarises how the quintessential 21st-century company views its competitive advantage:

  1. Service Oriented Architecture
  2. Distributed state management
  3. Decision management

Regarding Service Orientation:

Our technologies are almost exclusively implemented as services: bits of logic that encapsulate the data they operate on and provide hardened interfaces as the only way to access their functionality. This approach reduces side effects and allows services to evolve at their own pace without impacting the other components of the overall system. Service-oriented architecture — or SOA — is the fundamental building abstraction for Amazon technologies. Thanks to a thoughtful and far-sighted team of engineers and architects, this approach was applied at Amazon long before SOA became a buzzword in the industry. Our e-commerce platform is composed of a federation of hundreds of software services that work in concert to deliver functionality ranging from recommendations to order fulfillment to inventory tracking. For example, to construct a product detail page for a customer visiting Amazon.com, our software calls on between 200 and 300 services to present a highly personalized experience for that customer.

Any Lessons from the AWS Outage?

You’ve probably heard that Amazon AWS had some problems recently. A question on Stack Overflow pointed to a detailed summary of the problem posted on the AWS message board.

Obviously every distributed system is different and every outage is unique, so it is difficult to generalise. Some takeaways I have are:

  1. Outages happen to even the best guys on the block…so you better plan for yours.
  2. Building distributed systems is hard…so you need experience and experienced friends.
  3. Manual changes are a common cause…not said explicitly in the AWS writeup, but strongly implied.
  4. Outages are often “emergent” phenomena whereby a simple error causes many systems to interact in a way which grows exponentially. The AWS writeup refers to this as a “storm” and I have witnessed similar “storms” in large distributed systems. The degree of coupling and simple aspects like backoff parameters can make the difference between a disturbance that grows exponentially or decays exponentially. Think of the Tacoma Narrows bridge – perhaps the analogy is a stretch, but tuning of a few simple parameters can avoid destructive resonances.
  5. One of the responses pointed to the Netflix Chaos Monkey as being vindicated by the outage. The “Lean” guys have taught us that if something is difficult (like testing or deployment) then you should do it often until it ain’t difficult any more. Perhaps system failure/resilience is the next frontier for this approach.
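
To make the point about backoff parameters in item 4 concrete, here is a minimal sketch of capped exponential backoff with random jitter. It is not taken from the AWS writeup; the function name and the default values for base, factor and cap are assumptions chosen purely for illustration. The idea is that each retry waits longer than the last, and the jitter stops many clients from retrying in lockstep.

```python
import random


def backoff_delays(base=0.5, factor=2.0, cap=30.0, attempts=6, rng=None):
    """Return the wait (in seconds) before each retry.

    The ceiling grows as base * factor**attempt, capped at `cap`.
    Each delay is drawn uniformly between 0 and that ceiling ("full jitter").
    All parameter values here are illustrative assumptions.
    """
    rng = rng or random.Random()
    delays = []
    for attempt in range(attempts):
        ceiling = min(cap, base * factor ** attempt)
        delays.append(rng.uniform(0, ceiling))
    return delays


# Example: one client's retry schedule with a fixed seed for repeatability.
schedule = backoff_delays(rng=random.Random(42))
```

With a factor above 1 the retry traffic from a failing dependency thins out over time; with a factor of 1 and no jitter, every client hammers the recovering service at the same fixed interval, which is one way a small disturbance can feed on itself.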

Who Cares About Technology?

Steve Jones reminds us that the business doesn’t care about technology – so stop harping about it and using it as an excuse for underperformance.

I totally agree that this is a key reason behind the endemic business/IT culture divide that is the root of many problems.

However, this poses an obvious question: who does care about the technology? The trick is not to over-engineer, but to engineer to just the right level to deliver business value now and into the future.

Somebody has to care about the technology (products, tools, methodologies), because otherwise you lose control and foster a legacy of technical debt which ultimately erodes business value.

I guess this is an axiom of Enterprise Architecture – that lack of governance leads to chaos and inefficiencies. Some would argue with this assertion, but I have never seen a counter-example. And of course the inverse statement is not necessarily true either.

So if the business doesn’t care about technology then who does? And if that is “nobody” then what happens?

Coupling as Inertia

In systems architecture, there are rarely any right answers, mostly just trade-offs between one solution and another. In such cases it helps to bear in mind some fundamental principles as a guideline. One principle I often use is cost vs. benefit. Another useful principle is to minimize coupling between systems. Coupling is pervasive and leads to a kind of inertia in enterprise systems. Newton taught us that inertia resists change, and if there is one thing that enterprises struggle most with, it’s change.

Beautiful Data Polling

O’Reilly have released a new book in their “Beautiful…” series called “Beautiful Data.”  There’s a very comprehensive review on Slashdot which I highly recommend. The description of chapter eight caught my eye:

Chapter Eight is about social data APIs and pushes gnip heavily as the de facto social endpoint aggregator for programmers. The chapter mentions WebHooks as an up and coming HTTP Post event transmission project but doesn’t offer much more than a wake up call for programmers. The traditional polling has dominated web APIs and has led to fragile points of failure. This chapter is a much needed call for sanity in the insane world of HTTP transactional polling. Unfortunately, the community seems to be so in love with the simplicity of polling that they use it for everything, even when a slightly more complicated eventing model would save them a large percentage of transactions.

The link “fragile points of failure” is worth following, as it leads to a robust Slashdot discussion on Twitter APIs and polling versus push for the web.

I think for a long time, the “web” as we know it has suffered from the lack of the Event/Listener paradigm. This is a pretty simple design concept that I’m going to refer to as the Observer [wikipedia.org]. Let’s say I want to know what Stephen Hawking is tweeting about and I want to know 24/7. Now if you have to make more than one call, something is wrong. That one call should be a notification to Twitter who I am, where you can contact me and what I want to keep tabs on–be it a keyword or user. So all I should ever have to do is tell Twitter I want to know everything from Stephen Hawking and everything with #stephenhawking or whatever and from that point on, it will try to submit that message to me via any number of technologies. Simple pub/sub [wikipedia.org] message queues could be implemented here to alleviate my need to continually go to Twitter and say: “Has Stephen Hawking said anything new yet? *millisecond pause* Has Stephen Hawking said anything new yet? *millisecond pause* …” ad infinitum.
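
The "one call" idea in that comment can be sketched in a few lines. This is only an in-process illustration of the Observer / pub-sub pattern, not how Twitter or any real API works; the `Feed` class and its method names are hypothetical.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List

# A callback receives the topic and the message that was published to it.
Callback = Callable[[str, str], None]


class Feed:
    """Hypothetical publisher that notifies registered observers."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[str, List[Callback]] = defaultdict(list)

    def subscribe(self, topic: str, callback: Callback) -> None:
        # The single call: say what you want and where to deliver it.
        self._subscribers[topic].append(callback)

    def publish(self, topic: str, message: str) -> int:
        # Push the message to every observer of this topic.
        callbacks = self._subscribers.get(topic, [])
        for callback in callbacks:
            callback(topic, message)
        return len(callbacks)


received = []
feed = Feed()
feed.subscribe("#stephenhawking", lambda topic, msg: received.append((topic, msg)))
feed.publish("#stephenhawking", "new post")   # delivered to the subscriber
feed.publish("#other", "nobody is listening")  # no subscribers, nothing delivered
```

The subscriber never asks "anything new yet?"; it registers once and is called only when there is something to deliver.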

And yet…

That’s not easy to do on a large scale. A persistent connection has to be in place between publisher and subscriber. Twitter would have to have a huge number of low-traffic connections open. (Hopefully only one per subscriber, not one per publisher/subscriber combination.) Then, on the server side, they’d have to have a routing system to track who’s following what, invert that information, and blast out a message to all followers whenever there was an update. This is all quite feasible, but it’s quite different from the classic HTTP model.
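
The "track who’s following what, invert that information, and blast out a message" step might look roughly like this. It is a toy sketch with made-up names (`invert_follows`, `fan_out`) and in-memory dictionaries; a real service would need persistent storage and live connections, which are not modelled here.

```python
from collections import defaultdict


def invert_follows(follows):
    """Turn {subscriber: {publishers}} into {publisher: {subscribers}}."""
    followers = defaultdict(set)
    for subscriber, publishers in follows.items():
        for publisher in publishers:
            followers[publisher].add(subscriber)
    return dict(followers)


def fan_out(followers, publisher, message):
    """Return one (subscriber, message) delivery per follower.

    Subscribers are sorted only to make the result order predictable.
    """
    return [(s, message) for s in sorted(followers.get(publisher, ()))]


# Who follows whom, as each subscriber declared it (illustrative data).
follows = {"alice": {"hawking", "bob"}, "carol": {"hawking"}}
followers = invert_follows(follows)
deliveries = fan_out(followers, "hawking", "new tweet")
```

The inversion is done once per change in the follow graph, so each update only needs a lookup by publisher rather than a scan of every subscriber.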

It’s been done before, though. Remember Push technology [wikipedia.org]? That’s what this is. PointCast sent their final news/stock push message [cnet.com] in February 2000. There’s more support for “push” in HTML5, incidentally.

Ahhh yes, I remember PointCast well. One of the early darlings of the dot-com era. This reply points at some new hope:

For messaging architectures (like, say, the internet), the pattern is usually described as “Publish/Subscribe”. All serious messaging protocols support it (XMPP, AMQP, etc.) and some are dedicated to it (PubSubHubbub). The basic problem with using it the whole way to the client is that many clients are run in environments where it is impractical to run a server, which makes receiving inbound connections difficult.

There are fairly good solutions to that, mostly involving using a proxy for the client somewhere that can run a server which holds messages, and then having the client call the proxy (rather than the message sources) to get all the pending messages together.
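
A rough sketch of that proxy idea: message sources push to a mailbox that is always reachable, and the client makes a single call to collect everything pending. The `MailboxProxy` class and its methods are hypothetical names for illustration, and the network transport is left out entirely.

```python
from collections import defaultdict, deque


class MailboxProxy:
    """Hypothetical proxy that holds pushed messages until the client collects them."""

    def __init__(self):
        self._pending = defaultdict(deque)

    def deliver(self, client_id, message):
        # Push side: called by message sources, which can always reach the proxy.
        self._pending[client_id].append(message)

    def drain(self, client_id):
        # Pull side: the client fetches and clears its pending messages,
        # in arrival order, with one call to one place.
        queue = self._pending.pop(client_id, deque())
        return list(queue)


proxy = MailboxProxy()
proxy.deliver("laptop", "update from source A")
proxy.deliver("laptop", "update from source B")
batch = proxy.drain("laptop")  # both messages, oldest first
```

The client still polls, but it polls one endpoint for everything instead of polling every source separately.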

Keep watching.