Push versus Pull

From OSCON via O’Reilly Radar, here’s a good case study of an architectural decision driven by the system requirements rather than the usual religious considerations that pollute the blogosphere.

FriendFeed needed update info from Flickr, but a REST-based “pull” approach is highly inefficient in this case. Instead the solution architects opted for a “push” approach using XMPP as the message transport. This is a really good presentation because it goes into the architectural choices and implications of “push” versus “pull”.

I characterize this as “pull vs push” rather than “REST vs XMPP” (or “REST vs *” or “why REST is crap”) because fundamentally it comes down to the best choice of how to synchronize changes between systems. You make this choice based on the usage characteristics of the different systems, the likely traffic volumes this will result in and the consequential resource impacts. Having made the choice between push and pull you then choose the appropriate message transport.

The web doesn’t do a lot of “push” and consequently there is not a lot of discussion about push and REST. Dare Obasanjo characterises it nicely:

Polling is a good idea for RSS/Atom for a few reasons

  • there are a thousands to hundreds of thousands clients that might be interested in a resource so the server keeping track of subscriptions is prohibitively expensive
  • a lot of these end points aren’t persistently connected (i.e. your desktop RSS reader isn’t always running)
  • RSS/Atom publishing is as simple as plopping a file in the right directory and letting IIS or Apache work its magic

The situation between FriendFeed and Flickr is almost the exact opposite. Instead of thousands of clients interested in document, we have one subscriber interested in thousands of documents. Both end points are always on or are at least expected to be. The cost of developing a publish-subscribe model is one that both sides can afford.

Inside the firewall, the situation is often more akin to that between FriendFeed and Flickr. This is why messaging is more common inside the firewall than outside - not because either REST or messaging is universally superior, but because the system requirements are different and often favour a push approach rather than pull.
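To make the traffic argument concrete, here is a rough back-of-the-envelope sketch in Python. The function names and all of the figures are illustrative assumptions, not numbers from the FriendFeed presentation.

```python
def polling_requests(num_resources: int, polls_per_hour: float, hours: float) -> int:
    """Requests one subscriber makes when it polls every resource on a fixed schedule."""
    return round(num_resources * polls_per_hour * hours)


def push_messages(num_resources: int, updates_per_hour: float, hours: float) -> int:
    """Messages the publisher sends when it only pushes actual changes."""
    return round(num_resources * updates_per_hour * hours)


# Hypothetical figures: 10,000 feeds, each polled 4 times an hour,
# each actually changing about once a day, measured over 24 hours.
pull = polling_requests(10_000, 4, 24)
push = push_messages(10_000, 1 / 24, 24)
print(f"pull: {pull} requests, push: {push} messages")
```

With one subscriber watching many rarely-changing resources, almost every poll returns nothing new, while push traffic tracks the real update rate. Reverse the assumptions (many occasional clients, one resource) and the balance tips back toward polling.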

While you’re over at Dare’s excellent blog, be sure to also check out his discussion of push versus pull in the context of scaling Twitter and MS Exchange. These are important considerations for designers of federated systems such as federated databases or federated messaging systems. The example of FriendFeed to Flickr could be considered as the first incremental step toward a federation.

Waiting for the great leap forward

One of the original and fundamental tenets of the SOAP standard was that the SOAP message is independent of the underlying transport. Ostensibly you could use SOAP over HTTP, JMS, email, FTP etc. but the reality is that a standard binding has only ever existed for SOAP over HTTP. To paraphrase Henry Ford - “you can have any SOAP transport you like, as long as it’s HTTP”.

While HTTP is undoubtedly a good choice for SOAP - given its ubiquity - there is at least one other transport which demands attention. This is the JMS transport which is widely used inside the firewall of many organizations. In every company that I work with, the SOA infrastructure relies heavily on JMS transports inside the firewall, with HTTP transports outside the firewall or to selected service end-points such as web pages. Of course my experience has significant selection effects, but nevertheless JMS is an important transport in many SOAs. Testament to this is that every major web-services product vendor (save Microsoft) supports SOAP over JMS (and even Microsoft now has SOAP over MSMQ as an important part of WCF).

The fly in the ointment is that there has never been a standardized binding for SOAP over JMS and as a result there is little interoperability between SOAP/JMS solutions provided by different platform vendors. If you happen to have any combination of different web-service platforms in your organization, then they cannot easily communicate with each other using SOAP over JMS without performing some unnatural acts.

Some of the issues that need to be considered with a SOAP binding to JMS are:

  • How do you represent the message content - text or binary? Most vendors have chosen a text message representation, but that has problems with multi-byte encodings, so other vendors have gone with a byte message representation.
  • What headers do you define and what should their names be? How do you use the standard JMS headers? Different vendors have different naming conventions and semantics.
  • In the WSDL description, how do you represent the connection details to the JMS provider?
  • How do your service endpoints manage the different message exchange patterns that are available with message-oriented transports?

Each of the vendors went their own way on many of these issues and as far as interoperability was concerned they basically ceded the field to HTTP. They made life difficult for large organizations with heterogeneous platforms and in my opinion didn’t do themselves any favours on the way. (Actually SOAP-encoding interoperability was so broken for a while that no one noticed the JMS issues…so maybe it wasn’t so bad).

Subsequently it was great to see some of the vendors get together a couple of years ago to agree on a standard SOAP binding for JMS that addresses most of the important considerations. The result was a Member Submission to W3C in September last year. My understanding is that this submission was previously circulated through most of the vendor community so hopefully it has general agreement on the technical details.

This has now taken its first steps to standardization with the initiation of a SOAP-JMS Binding Working Group who aim to publish a recommendation by April next year. Hopefully vendor support of the binding will be hot on its trail.

Note that the standard binding won’t address the fact that different JMS implementations do not interoperate. For example, a TIBCO JMS client will not be able to talk to a Websphere JMS provider because JMS is an API standard, not a wire-protocol standard. What the SOAP/JMS binding standard does mean is that once you have settled on a standard JMS provider for your services, you could define your service description in standard WSDL and your service provider (say Websphere or TIBCO or WSO2) and your service consumer (say TIBCO or WebLogic or Axis) would be able to communicate directly using SOAP over JMS “out of the box”.

It’s been eight years (almost to the day) since SOAP 1.1 came out with the HTTP binding. Wouldn’t it be great if a standard JMS binding could be achieved within the decade! It’s been a very long wait. The JMS binding should have happened a lot sooner and I can’t say the “wait has been worth it” but it does fill an important hole in the Web-Services standards.

So what do we do in the meantime? You can eschew JMS altogether and stick with HTTP, but that requires another lot of hard work. You can stick with one and only one service platform, but that is difficult in large heterogeneous organizations - which is where SOA is supposed to provide maximum benefit. Or you can continue to do what many SOA implementers have done and deal with SOAP directly at the JMS layer - effectively using SOAP as plain-old-XML over JMS. I wrote more about this approach recently.

Another thing you can do in the meantime is ask your vendor when they will support the new SOAP/JMS binding.

Pragmatic Web Services

In a recent ThoughtWorks podcast, Jim Webber introduced himself as a “MESTian”. This was a new term for me, so I had to investigate. MEST is a message-centric approach to SOA which resonates strongly with my own views on how services ought to be implemented. The MEST approach is a pragmatic approach to SOA to which I think/hope Web Services are evolving naturally. Therefore I agree with Neil Ward-Dutton that we don’t really need to coin a new term (MEST). This is really just Web Services “done properly”.

My views on this are a product of my past experience with MOM-based distributed computing. My earlier description of an ESB is based on a MOM approach and probably differs from the common perception of a “black box” ESB. The MEST approach would be very natural to people with an MQ, Rendezvous or JMS background…which is probably the minority of current SOA practitioners.

About 10 years ago, MOM messages were exchanged between systems using proprietary message representations such as COBOL Copybook or AE-Message formats. Enterprise concerns such as scalability, reliability and fault-tolerance were dealt with using techniques at the messaging level. MOM quality-of-service dealt with guaranteed message delivery and message ordering (in normal cases). Where possible, message endpoints were implemented in a stateless manner to allow for easy failover and load-balancing. This general approach is still valid today…only the message representation has changed.

When XML became more mature and accepted, MOM messages started to be implemented with XML payloads. Even after SOAP became a standard, my experience is that it wasn’t rapidly adopted by the MOM community. Proprietary XML message schemas ruled for a couple of years and SOAP had its initial application in RPC over HTTP implementations. But the great thing about SOAP is that it is a nice generic message envelope that is acceptable to everyone. Put your meta-data into the SOAP header and the payload into the SOAP body. If you didn’t have it, you would have to invent it - and many did. Hence, as a pragmatic approach SOAP was adopted as a generalized envelope over - now - JMS. Add the correct JMS headers and you have SOAP Document Literal Encoding over JMS. Additional standards like WS-Addressing and WS-Security are additional sets of meta-data in the SOAP header with meaning to the endpoints and intermediaries in the message journey. WSDL is simply a way of representing the contract between message producer and consumer. I think this is a relatively natural progression from proprietary MOM to more open mechanisms for message exchange which are compliant with the core Web Services standards.
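As a rough illustration of “meta-data in the header, payload in the body”, here is a minimal Python sketch using only the standard library. The SOAP 1.1 envelope namespace is the real one; the `urn:example:*` namespaces, the `CorrelationId` header and the `Order` payload are hypothetical names invented for the example, and no JMS provider is involved - the bytes returned are simply what you might place in a message body.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
META_NS = "urn:example:meta"     # hypothetical namespace for enterprise meta-data
ORDER_NS = "urn:example:orders"  # hypothetical namespace for the business payload


def build_envelope(correlation_id: str, order_id: str) -> bytes:
    """Producer side: meta-data goes in the Header, the business document in the Body."""
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    header = ET.SubElement(envelope, f"{{{SOAP_NS}}}Header")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")

    ET.SubElement(header, f"{{{META_NS}}}CorrelationId").text = correlation_id
    order = ET.SubElement(body, f"{{{ORDER_NS}}}Order")
    ET.SubElement(order, f"{{{ORDER_NS}}}OrderId").text = order_id

    # Serialize to UTF-8 bytes, as you might for a bytes-style message payload.
    return ET.tostring(envelope, encoding="utf-8")


def read_order_id(raw: bytes) -> str:
    """Consumer side: parse the envelope and pull the payload out of the Body."""
    root = ET.fromstring(raw)
    body = root.find(f"{{{SOAP_NS}}}Body")
    return body.find(f".//{{{ORDER_NS}}}OrderId").text


raw_message = build_envelope("abc-123", "42")
print(read_order_id(raw_message))
```

Intermediaries can read or add header entries without touching the body, which is what makes the envelope useful independently of the transport carrying it.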

Contrast this with the original RPC approach to Web Services. SOAP RPC Encoding was the original standard, buried within code generation tools which attempted to hide complexity from the developer. Unfortunately this resulted in Web Services which lacked interoperability and created tight couplings between provider and consumer. Moreover, the attempt to shield developers from the distributed nature of their services and the underlying transports - all very necessary concerns - led to huge problems with meeting the enterprise requirements for services. This is the experience of most Web Services developers and it is no wonder that Web Services have such a bad reputation. Subsequently, Web Services - SOAP in particular - have moved to more interoperable approaches through WS-I. But a lot of damage has been done, and the persistent tendency to ignore the distributed nature of Web Services continues to cause problems in terms of unrealistic expectations.

So I like the MEST approach and find that it resonates well with the “pragmatic” approach to Web Services via the adoption of SOAP and other WS-* standards by the MOM community. I can summarize this “pragmatic” approach as:

  • Transport Independence is a myth. Use the transports for their strengths - JMS for reliability and HTTP for ubiquity.
  • Understand the distributed nature of Web Services and use the long history of best practices from distributed computing and Message Oriented Middleware (MOM).
  • Understand the standards and how they fit together. Most importantly, know where the holes are.
  • Use the standards where they make sense. Augment them with your own enterprise standards and best practices where necessary.

The result will be better confidence and ownership of your SOA infrastructure. You will rule the standards and your tool vendors rather than the other way around. As an added bonus, you get asynchronous services as a natural part of your SOA - an area where the WS-* standards struggle right now.

The benefits of an ESB

In my last post on this topic I talked about the concept of an ESB. Here I talk about why you would want one.

There are plenty of whitepapers, analyst reports and vendor statements about the features and functions of the various ESB products. In my experience, the key advantages of using an ESB are less about features and functions and more about how you use it.

Standardization

One of the primary advantages of an ESB is that it gives you a standardized platform for integration. When everyone is using the same tools you can develop enterprise-wide frameworks, patterns and best practices for building re-usable services. Without a unifying platform, you get a divergence of integration methods which leads to inconsistency and higher cost of management and change. So an ESB platform helps with design-time governance. Note that this is not the same as standardization in the sense of using web-services standards. The important thing is that you use the ESB to support your own enterprise standards. These may be based on external standards - but that may be of secondary importance.

Loose Coupling

The bus architecture of an ESB encourages you toward a loosely coupled architecture.

  • Physically decoupled by making use of message passing mechanisms (e.g. JMS) versus direct network connections (e.g. HTTP).
  • Semantically decoupled by use of message transformation so that services are exposed in a system-neutral format which reduces application lock-in and reduces the cost of change.
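Semantic decoupling can be sketched as a pair of transformations into one system-neutral message. This is only an illustration in Python: the two source formats, their field names and the canonical shape are all hypothetical.

```python
def crm_to_canonical(record: dict) -> dict:
    """Map a hypothetical CRM-specific record to the system-neutral customer message."""
    return {
        "customerId": str(record["CUST_NO"]),
        "fullName": f'{record["FNAME"]} {record["LNAME"]}'.strip(),
    }


def billing_to_canonical(record: dict) -> dict:
    """Map a hypothetical billing-system record to the same canonical message."""
    return {
        "customerId": str(record["account"]["id"]),
        "fullName": record["account"]["holder"],
    }


# Two different back ends, one message shape on the bus.
from_crm = crm_to_canonical({"CUST_NO": 1001, "FNAME": "Ada", "LNAME": "Lovelace"})
from_billing = billing_to_canonical({"account": {"id": "1001", "holder": "Ada Lovelace"}})
print(from_crm == from_billing)
```

Consumers depend only on the canonical message, so replacing the CRM means rewriting one transformation rather than every consumer.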

Scalability and Reliability

Physical loose-coupling provides scalability advantages such as high-availability, fault-tolerance and load balancing. The messaging layer in the ESB directs messages between service endpoints to the appropriate instance of the endpoint. For example, in the event of a service provider failure, messages will be redirected to a backup provider - thus supporting high availability. In the case of load balancing, messages are distributed between redundant providers (or consumers) to handle high volumes of message traffic. You could say that physical loose-coupling supports change at the “micro” level where short term changes in the system topology can be compensated for via real-time message redirection.
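The redirection behaviour can be sketched in a few lines of Python. This is an in-process toy, not how any particular messaging product works: the `Dispatcher` class, the provider functions and the use of `ConnectionError` to signal a failed provider are assumptions made for the example.

```python
from typing import Callable, List

Provider = Callable[[str], str]


class Dispatcher:
    """Round-robin between redundant providers, skipping any that fail."""

    def __init__(self, providers: List[Provider]) -> None:
        self._providers = providers
        self._next = 0  # index of the provider that should receive the next message

    def send(self, message: str) -> str:
        # Try each provider at most once, starting from the round-robin position.
        for attempt in range(len(self._providers)):
            index = (self._next + attempt) % len(self._providers)
            try:
                reply = self._providers[index](message)
            except ConnectionError:
                continue  # provider is down: redirect the message to the next one
            self._next = (index + 1) % len(self._providers)
            return reply
        raise RuntimeError("no provider available")


def provider_a(message: str) -> str:
    return f"A handled {message}"


def provider_down(message: str) -> str:
    raise ConnectionError("provider unavailable")


def provider_b(message: str) -> str:
    return f"B handled {message}"


dispatcher = Dispatcher([provider_a, provider_down, provider_b])
replies = [dispatcher.send(m) for m in ("m1", "m2", "m3")]
print(replies)
```

The sender never learns which instance served it or that one instance was down, which is the point of physical decoupling.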

Routing and mediation

Message routing supports scalability and fault tolerance. An ESB can also be used to support business-level routing and mediation. For example content-based routing allows services to be invoked based on the content of a service request. A business example would be routing of a customer enquiry to the branch where that customer account is located. A technical example would be the routing of a service request based on the version of the service being invoked.
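A content-based router for the branch example might look roughly like the following Python sketch. The `ContentBasedRouter` class, the `branch` field and the branch names are hypothetical; a real ESB would typically evaluate routing rules against the message payload or headers.

```python
from typing import Callable, Dict

Handler = Callable[[dict], str]


class ContentBasedRouter:
    """Pick a destination by inspecting one field of the message content."""

    def __init__(self, key: str) -> None:
        self._key = key
        self._routes: Dict[str, Handler] = {}

    def add_route(self, value: str, handler: Handler) -> None:
        self._routes[value] = handler

    def route(self, message: dict) -> str:
        value = message[self._key]
        handler = self._routes.get(value)
        if handler is None:
            raise LookupError(f"no route for {self._key}={value!r}")
        return handler(message)


router = ContentBasedRouter(key="branch")
router.add_route("sydney", lambda m: f"Sydney branch answers enquiry {m['enquiryId']}")
router.add_route("melbourne", lambda m: f"Melbourne branch answers enquiry {m['enquiryId']}")

result = router.route({"branch": "melbourne", "enquiryId": 7})
print(result)
```

Routing on a `version` field instead of `branch` would give the technical example: the same mechanism, a different key.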

Complex message exchange patterns

Traditional HTTP-based services support only one-to-one request-reply MEPs. An ESB supports more complex MEPs such as asynchronous one-way messaging and publishing to multiple subscribers using topic-based messaging. Asynchronous publish and subscribe mechanisms support new ways of intermediating service consumers and subscribers - such as auditing and service monitoring - which are extremely useful for runtime management and governance of your services. Beyond mere governance, higher level business functions such as complex event processing (CEP) and business activity monitoring (BAM) are supported by this ability to “listen in” to service traffic on the ESB.
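The “listen in” idea is easiest to see with a toy topic-based bus. The Python sketch below is synchronous and in-process, so it only illustrates the fan-out pattern, not the asynchronous delivery a real message broker provides; the `TopicBus` class and the topic name are hypothetical.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List, Tuple

Subscriber = Callable[[str, str], None]


class TopicBus:
    """Deliver each published message to every subscriber of its topic."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[str, List[Subscriber]] = defaultdict(list)

    def subscribe(self, topic: str, subscriber: Subscriber) -> None:
        self._subscribers[topic].append(subscriber)

    def publish(self, topic: str, message: str) -> int:
        subscribers = list(self._subscribers.get(topic, []))
        for subscriber in subscribers:
            subscriber(topic, message)
        return len(subscribers)  # how many subscribers received the message


fulfilment: List[str] = []
audit_log: List[Tuple[str, str]] = []

bus = TopicBus()
# The business subscriber that actually processes orders.
bus.subscribe("orders.created", lambda topic, msg: fulfilment.append(msg))
# An auditor added later; the publisher and the first subscriber are unchanged.
bus.subscribe("orders.created", lambda topic, msg: audit_log.append((topic, msg)))

delivered = bus.publish("orders.created", "order 42")
print(delivered, fulfilment, audit_log)
```

Monitoring, CEP or BAM components attach the same way as the auditor here: as one more subscriber that the publisher never needs to know about.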

The benefits of an ESB that I’ve described above stem largely from the architecture of an ESB and in particular from the use of a message bus as the primary underlying transport. But it is important to understand that these benefits don’t automatically come “out of the box”. Your solution architecture (and your architects) must recognise and utilise the architectural principles underlying the ESB.

What is an ESB and why do I need one?

A question I often get is “what is an ESB and why do I need one?” This question is motivated by a number of concerns: non-technical people have heard the term but don’t understand the concept, semi-technical people are trying to figure out conflicting vendor definitions, and technical people are confused by the debate between different service enablement approaches - RESTful versus WS-* versus middleware-supported hybrids.

The Elevator Pitch

A service bus provides a uniform and consistent platform to allow service providers and service consumers to interoperate. An ESB provides benefits such as:

  • standardization
  • loose coupling
  • resilience and high availability
  • monitoring and intermediation

Hardware

I’m not sure of the provenance of the word “bus” as it is applied in the technical domain (I’m sure there is some interesting etymology there) but you can confidently trace it back to the concept of a computer hardware bus. The idea of a hardware bus (or backplane) is that hardware components - such as sound-cards, video cards, floating point accelerators, tape-drives, barcode scanners etc - can all slot into and interoperate through a shared infrastructure. By supporting a standard hardware interface and a standard software protocol, the hardware bus abstracts the details of each individual hardware component. The key features of the hardware bus are:

  • standardized hardware connectivity to the backplane
  • standardized software protocol between each component and the backplane
  • hardware components can operate independently without having to know details about each other
  • a single infrastructure replaces multiple point-to-point connections between components (i.e. does away with a lot of ad hoc soldering).
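The last point is simple arithmetic, sketched here in Python. It assumes the worst case in which every component needs to talk to every other component; real systems usually sit somewhere in between.

```python
def point_to_point_links(components: int) -> int:
    """Links needed if every component is wired directly to every other one."""
    return components * (components - 1) // 2


def bus_links(components: int) -> int:
    """Links needed if every component connects once to a shared bus."""
    return components


for n in (5, 10, 50):
    print(n, point_to_point_links(n), bus_links(n))
```

The point-to-point count grows with the square of the number of components, while the bus count grows linearly.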

Software

Networked systems arrived in the seventies and grew out of control in the eighties. Early network infrastructures such as Unix sockets were hard-wired point-to-point affairs with little or no abstraction of the two programs that were working together.

The idea of a software bus is that software components can work together - yet independently - via a standardized message passing mechanism that would abstract away the need to create individual network connections between components. The software bus would take care of routing messages to the required location and also take care of all that hard stuff like quality-of-service, reliability and scalability. This is equivalent to standardizing the “hardware connectivity” in the hardware bus. TIBCO’s predecessor - Teknekron - articulated the concept of the software bus in the early nineties.

The Service Bus

So the hardware bus standardizes hardware connectivity and the software bus standardizes software connectivity. The Service Bus has refined the concept of the software bus by taking a more service-oriented approach and adding support for the XML stack underlying web services and transport connectivity (e.g. bridging HTTP to JMS).

So why do you need an ESB? More on that anon…