Professional JMS

A Distributed System in the Pub/Sub Domain

In the last example, the messages in the PTP domain corresponded to tasks and it was (effectively) the responsibility of the JMS message server to ensure that each of these tasks was executed exactly once. In contrast, messages in the Pub/Sub domain often contain information that should be distributed to as many interested clients as necessary.

A classic example, and probably one of the most common applications of Pub/Sub messaging, is the distribution of financial data. In this section we will develop an example of a stock price distribution system for a financial trading room. This example draws on elements of several real financial information systems, but is simplified to serve as an architectural example without getting bogged down in too many details.

The initial system will be very skeletal in that it only distributes raw stock prices from market data feeds to traders' desktops as well as middle and back office systems that also require access to live data. We will then demonstrate how the architecture makes it easy to merge in additional services.

The skeletal system is shown in the figure below. The system is fed with price data from several real-time market data services. These may be available via dial-up modem, leased line, satellite feed, Internet, or other means. How each feed is delivered is not important. We will assume that for each feed there is an interface program that is capable of publishing the data to our JMS-compliant message server. Let's assume that there are three feeds, and that each covers one of the three major financial regions: North America, Europe, and Asia.
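To make the publishing side concrete, here is a minimal sketch of what such a feed interface might look like using the JMS Pub/Sub API. The JNDI names ("TopicConnectionFactory", "TechnologyPrices") and the MapMessage layout are illustrative assumptions rather than fixed parts of the design:

    import javax.jms.*;
    import javax.naming.InitialContext;

    // Hypothetical feed interface: publishes price quotes to one industry topic.
    public class FeedPublisher {
        private final TopicSession session;
        private final TopicPublisher publisher;

        public FeedPublisher(String topicName) throws Exception {
            InitialContext ctx = new InitialContext();
            TopicConnectionFactory factory =
                (TopicConnectionFactory) ctx.lookup("TopicConnectionFactory"); // assumed JNDI name
            TopicConnection connection = factory.createTopicConnection();      // kept open; closing omitted
            session = connection.createTopicSession(false, Session.AUTO_ACKNOWLEDGE);
            Topic topic = (Topic) ctx.lookup(topicName);                        // e.g. "TechnologyPrices"
            publisher = session.createPublisher(topic);
        }

        // Publish one stock price as a MapMessage.
        public void publishPrice(String symbol, double price) throws JMSException {
            MapMessage msg = session.createMapMessage();
            msg.setString("symbol", symbol);
            msg.setDouble("price", price);
            publisher.publish(msg);
        }
    }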

In our trading room, activity is divided up by industry, so the feed interfaces will publish each price to one of the topics dedicated to technology, automotive, or pharmaceutical stock. The diagram also shows a number of subscribers that are consuming the price data. There are two trader desktop systems; each one is only subscribed to the topic containing prices for the industry in which that trader is interested. There are other institutional systems that subscribe to the raw prices. These systems need data from all industries, so they subscribe to all of the topics:

Let us have a look at what we get from the basic system:

Most traders who are watching stock prices are also interested in the volatility of those prices, so we will integrate a volatility calculator into the system. Volatility is a measure of the degree to which a price fluctuates over time, and is the result of a simple statistical calculation on the past stock prices. Our volatility calculator thus needs to subscribe to stock prices as input to the calculation. After performing the calculation, it will publish the results to new topics dedicated to volatility data. When these functions are implemented, the volatility calculator is fully integrated into the system. Other applications can access the calculated volatility data by subscribing to one of the volatility topics.
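A sketch of this subscribe-calculate-publish pattern follows. The statistical calculation is reduced to a placeholder, and the TopicSession is assumed to belong to a connection that has already been started:

    import javax.jms.*;

    // Hypothetical volatility engine: consumes raw prices, republishes a derived value.
    public class VolatilityCalculator implements MessageListener {
        private final TopicSession session;
        private final TopicPublisher volatilityPublisher;

        public VolatilityCalculator(TopicSession session, Topic priceTopic, Topic volatilityTopic)
                throws JMSException {
            this.session = session;
            session.createSubscriber(priceTopic).setMessageListener(this);
            volatilityPublisher = session.createPublisher(volatilityTopic);
        }

        public void onMessage(Message message) {
            try {
                MapMessage price = (MapMessage) message;
                double volatility = calculate(price.getString("symbol"), price.getDouble("price"));
                MapMessage out = session.createMapMessage();
                out.setString("symbol", price.getString("symbol"));
                out.setDouble("volatility", volatility);
                volatilityPublisher.publish(out);
            } catch (JMSException e) {
                // a real engine would log the error and recover
            }
        }

        private double calculate(String symbol, double price) {
            return 0.0; // placeholder for the statistical calculation on past prices
        }
    }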

Now that we have access to volatility data, we would like to calculate theoretical stock option prices. The option price calculation requires stock price and volatility as input. (It actually requires interest rates and a few other things as well. Providing this additional data requires a hefty bit of infrastructure, so for the purposes of the example we will just assume that it is available by some other means.) The option price calculator is integrated into the system in the same way as the volatility calculator. It subscribes to stock prices and volatilities, and publishes option prices on a new set of topics.

Next we would like to make the calculation engines (for volatility and option prices) redundant in order to increase the reliability of the system. There are a few different possible approaches to this. The most basic approach is just to start a second instance of each engine. Now each time a new stock price is published, two identical volatility updates are published. If one of the volatility calculators fails, then the other still provides a complete set of data. If the subscribers of the calculated data are not adversely affected by receiving redundant updates, then we have already achieved high availability, but not in a particularly efficient way.

Consider this: for each stock price update, there are two identical volatility updates. If the option price calculator calculates a new price each time it receives an update for one of its input values, then each instance will produce three option price updates for every new stock price: one triggered by the stock price itself and two by the duplicate volatilities. Two redundant option price calculators will therefore produce a total of six updates, most of them redundant, for each new stock price. This effect will tend to snowball, so we need to take a more sophisticated approach.

Added Sophistication

One approach is to build a bit more intelligence into the update logic of the calculation engines. If the option calculator caches the input values of each calculation, then it can be programmed to calculate a new price only if a volatility update contains a different value from that used in the last calculation. It might also delay the actual calculation so that several updates of input data arriving within a short time period trigger only one new price calculation. This curbs the snowball effect with minimal effort. The redundant calculators, however, will still produce redundant updates, so if this is not acceptable, we need to move on to the next technique.
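One possible shape for that check is sketched below; it assumes the calculator keeps the last value used for each input in a simple map (the class and method names are illustrative):

    import java.util.HashMap;
    import java.util.Map;

    // Hypothetical input cache: recalculate only when an input value actually changes.
    public class InputCache {
        private final Map lastInputs = new HashMap(); // input key -> Double

        // Returns true if the value differs from the one used in the last calculation.
        public synchronized boolean hasChanged(String key, double newValue) {
            Double previous = (Double) lastInputs.get(key);
            if (previous != null && previous.doubleValue() == newValue) {
                return false; // redundant update; skip the recalculation
            }
            lastInputs.put(key, new Double(newValue));
            return true;
        }
    }

The delayed-calculation variant would simply schedule the recalculation a short time after the first changed input arrives, so that any further updates landing inside that window are absorbed into a single new price.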

The next level of sophistication requires each redundant calculator to subscribe to the topic to which it publishes its results. In this way it has a means to detect the results published by other engines. Each instance of the calculation engine should pause for a small, random delay before actually performing the calculation. If, during this pause, it receives a message from another calculation engine, then it knows that the task has already been performed and it can abort the calculation.

In this scenario, the other calculation engine's random delay was shorter. This scheme still permits redundant calculations, but they will occur only rarely. It is quite effective in providing both increased reliability and load balancing among the different parallel instances of the calculation engines, although it is a bit more difficult to implement than in the PTP case.
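The fragment below sketches the delay-then-check idea for a single calculation at a time; a real engine would track pending calculations per item, but the structure is the same. The listener on the engine's own output topic is assumed to call onPeerResult() whenever a result from another instance arrives:

    import java.util.Random;

    // Hypothetical duplicate suppression for redundant calculation engines.
    public class RedundantCalculation {
        private final Random random = new Random();
        private volatile boolean peerAlreadyPublished;

        // Invoked by the subscriber on this engine's own output topic.
        public void onPeerResult() {
            peerAlreadyPublished = true;
        }

        public void run(Runnable calculation) throws InterruptedException {
            peerAlreadyPublished = false;
            Thread.sleep(random.nextInt(200)); // small random delay, here up to 200 ms
            if (!peerAlreadyPublished) {
                calculation.run(); // nobody beat us to it; calculate and publish
            }
        }
    }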

When a new subscriber is started, it would like to have the most recent price for each stock immediately, rather than wait an unknown amount of time until the stock price changes before seeing the current state of the market. This could be solved with durable subscribers, but durable subscribers, as defined by JMS, will deliver all the price changes that transpired while the subscriber was offline. In most cases, only the current prices are relevant.

To solve this problem, we will add a new service into the system: the MRV (Most Recent Value) service. The MRV service subscribes to all topics and stores the most recent value of each unique item in a database. It then listens on a special destination for requests for the MRVs of specific items. It returns the appropriate value from its database to the destination specified in the JMSReplyTo header of the request. This request destination is actually best implemented as a queue, since each request is a single task that should be executed by exactly one consumer.
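A request handler for such an MRV service might look roughly like the sketch below. It assumes the request is a TextMessage carrying the item's symbol and that an MrvStore interface wraps the database lookup; both are illustrative assumptions:

    import javax.jms.*;

    // Hypothetical MRV request handler: replies with the most recent value of the
    // requested item to the destination named in the request's JMSReplyTo header.
    public class MrvRequestHandler implements MessageListener {
        private final QueueSession session;
        private final MrvStore store;

        public MrvRequestHandler(QueueSession session, MrvStore store) {
            this.session = session;
            this.store = store;
        }

        public void onMessage(Message request) {
            try {
                String symbol = ((TextMessage) request).getText();
                MapMessage reply = session.createMapMessage();
                reply.setString("symbol", symbol);
                reply.setDouble("price", store.mostRecentValue(symbol));
                reply.setJMSCorrelationID(request.getJMSMessageID());
                QueueSender sender = session.createSender((Queue) request.getJMSReplyTo());
                sender.send(reply);
                sender.close();
            } catch (JMSException e) {
                // a real service would log the error and possibly send an error reply
            }
        }
    }

    // Assumed wrapper around the MRV database.
    interface MrvStore {
        double mostRecentValue(String symbol);
    }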

We will add one more useful service to the system. It may be desirable to have the history of all prices and calculations stored in a database for future reference. This can be accomplished by adding a subscriber that listens to all topics and writes every message into a database. If the JMS and JDBC providers both support distributed transactions, these can be employed to ensure that every message is correctly copied to the database in spite of failures.
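A bare-bones version of that archiving subscriber is sketched below. For brevity it uses a plain JDBC connection and assumes the price message layout used earlier; with XA-capable providers, the message receipt and the database insert would instead be enlisted in a single distributed transaction:

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import javax.jms.*;

    // Hypothetical archiver: copies each incoming message into a history table.
    public class HistoryArchiver implements MessageListener {
        private final Connection jdbc; // assumed open JDBC connection

        public HistoryArchiver(Connection jdbc) {
            this.jdbc = jdbc;
        }

        public void onMessage(Message message) {
            try {
                MapMessage m = (MapMessage) message;
                PreparedStatement stmt = jdbc.prepareStatement(
                    "INSERT INTO price_history (symbol, value, received) VALUES (?, ?, ?)");
                stmt.setString(1, m.getString("symbol"));
                stmt.setDouble(2, m.getDouble("price"));
                stmt.setTimestamp(3, new java.sql.Timestamp(System.currentTimeMillis()));
                stmt.executeUpdate();
                stmt.close();
            } catch (Exception e) {
                // a real archiver would roll back and retry; silently losing history is not acceptable
            }
        }
    }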

The complete system is depicted in this second figure:

Here are some of the noteworthy aspects of the final system:

High availability and load balancing of the individual services are more difficult to implement than in the PTP domain, but nevertheless can be done in the robust fashion described above. This is still quite advantageous when you consider what it takes to implement these features without leveraging JMS: your application would need to include a process that knows about all of the calculation engines and can distribute tasks to them. This process should not itself be a single point of failure, so there needs to be a second one that listens to heartbeats from the first one, takes over if it dies, and steps back if the heartbeats resume. This is complicated stuff if you have to implement it yourself. JMS can make your life easier if you use it right.

JMS in Application Integration

When talking of integration, it is imperative that organizations understand both business processes and data. They must select which processes and data elements require integration. This may involve data-level integration, application interface-level integration, method-level integration, and user-interface-level integration. In any business organization, internal and external integration are related. Unless businesses have some kind of common integration infrastructure that creates and maintains the interfaces between different systems, they will find that Enterprise Application Integration (EAI) is very labor-intensive. Another issue to consider is web technologies in the context of enterprise integration, since the Internet drives market forces today.

Therefore, in addition to EAI, it is also imperative that we talk about Internet Application Integration (IAI). In the first place, many clients are seeking integration in the context of the Internet. This is because the client's e-commerce retail web site has to be integrated with backend systems using a more flexible integration infrastructure rather than hard-coded or even paper-based links. This could use EAI technology, as the web site server could reside within the enterprise. However, a lot of e-business activity is not retail but business-to-business (B2B). This means that the operational systems of different corporations must be linked together, which creates a whole new dynamic in which XML, integration with application servers, and Java become more important than traditional EAI. Accordingly, vendors like IBM and BEA Systems have strategies that address both the EAI and the IAI segments.

With EAI the focus is on integrating a set of applications, whether built or bought, inside an enterprise in order to automate an overall business process for that enterprise. With IAI the focus is on integrating applications, whether built or bought, across multiple enterprises in order to automate multi-enterprise business processes, where the Internet provides the communications backbone. Specific examples are trading groups or associations, virtual companies (in which the components and assembly of a final product are totally outsourced and the virtual company handles distribution, marketing, and finance), and integrated supply chains.

If you are building an interface between two areas of the company that represent parts of the business you might outsource, externalize to partners, or even offer as part of your service to other companies, then it needs to incorporate open, Internet-based technology. If you don't, the chances are you will have to rebuild the interface in the future when you need to open it up. It's always a trade-off, but the thought should be there.

As its name suggests, the distinguishing feature of IAI technology is that it integrates more tightly with Internet technology, particularly Java and application servers. A lot of vendors are working on versions of their message brokers that would run as a task inside a Java application server. There are some good reasons why they would want to do this. Application servers are becoming the focus of a lot of investment because they are highly reliable, transactional, and scalable, which are precisely the capabilities you want when you build a message broker. You don't need to build these capabilities into your message broker when it can run on top of an application server. This is important from the customer's point of view too, because if they are building a distributed computing environment as well as an integration infrastructure, both environments need to be integrated with directory, security, and management infrastructure, all of which are complex problems from an installation and management perspective. Why not solve these problems once rather than twice?

Traditionally, there have been a lot of issues surrounding messaging APIs. Many organizations ended up building their own API on top of the original proprietary messaging system API; clearly, there was some dissatisfaction with the APIs provided by the messaging vendors. Also, the International Middleware Association (IMWA), formerly the Message Oriented Middleware Association (MOMA) and the standards governing body for MOM, never made it its mission to devise a single, industry-standard messaging API.

It is here that JMS could be used as such a messaging API standard. Similarly, with its integration of messaging and message brokers with Java application servers and other Internet-based technology, IAI is an enabler for multi-enterprise e-business processes and will form the majority of integration projects in the future.

JMS is very tightly integrated with Java, meaning that if you are a Java programmer who needs to talk to IBM MQSeries, for example, it will be much easier and much more productive to use JMS than the MQSeries API for Java. Each messaging vendor will implement JMS, which means that a Java developer can use JMS with either TIBCO Rendezvous or IBM MQSeries, or both.
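The portability comes from the fact that the only provider-specific artifact is an administered object obtained from JNDI; the application code itself never names the provider. A minimal illustration (the factory name and the JNDI configuration are assumptions):

    import javax.jms.*;
    import javax.naming.InitialContext;

    public class ProviderNeutralClient {
        // Switching JMS providers means changing the JNDI configuration, not this code.
        public static TopicConnection connect(String factoryName) throws Exception {
            InitialContext ctx = new InitialContext(); // provider details come from jndi.properties
            TopicConnectionFactory factory = (TopicConnectionFactory) ctx.lookup(factoryName);
            return factory.createTopicConnection();
        }
    }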

JavaMail

When first exposed to JMS, many people do not see a big difference between JMS and e-mail. E-mail is, of course, a means for sending and receiving messages. E-mail also has a huge established infrastructure. This infrastructure is ubiquitous, reasonably reliable, and usually someone else's responsibility to maintain.

E-mail is intended for transmitting messages from humans to humans, but by using an API such as JavaMail, it is possible to use e-mail for inter-application messaging. Not only is it possible, but it has certainly been done numerous times in the history of distributed programming. In some of these cases e-mail was the best tool for the job, but in other cases JMS may well have been a better choice but may not yet have been available, or not well enough understood.

Consider the following example. A company has established, but conventional, retail channels, business processes, and back-office IT systems. This company needs to add the Internet as a new retail channel as soon as possible (or in a matter of weeks, whichever is sooner). The company is not big enough to justify the expenses associated with maintaining a highly available (say 99.9% or better) web site on its own premises, so it outsources this to an Application Service Provider (ASP). The backend systems must remain on premises for security reasons, but a failure of these systems should not impair the functionality of the web site (a backend failure is costly, but customers do not need to know about it). Thus, the web front end must be able to transmit merchandise orders to the backend system.

Both systems are behind restrictive firewalls, but with enough effort, a secure TCP connection could be tunneled from the web system to the backend. To fulfill the requirements, the web system would need to be able to queue orders and continue processing if the back end is not available. This is possible, but involves re-inventing the (messaging) wheel (remember the time constraints). In this case, encrypted e-mail via JavaMail is used to solve the connectivity problem. The ASP provides access to an SMTP server at no extra cost, and there are almost no firewall issues.
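As a rough sketch of that approach, the web front end might hand each order to the ASP's SMTP server via JavaMail along the following lines. The host name and addresses are placeholders, and both the encryption and the backend's mailbox-polling loop are omitted:

    import java.util.Properties;
    import javax.mail.Message;
    import javax.mail.Session;
    import javax.mail.Transport;
    import javax.mail.internet.InternetAddress;
    import javax.mail.internet.MimeMessage;

    // Hypothetical order sender: serializes an order into the message body and
    // hands it to the ASP's SMTP server for delivery to the backend's mailbox.
    public class OrderMailer {
        public static void sendOrder(String orderAsText) throws Exception {
            Properties props = new Properties();
            props.put("mail.smtp.host", "smtp.asp.example.com"); // assumed SMTP host

            Session session = Session.getInstance(props);
            MimeMessage msg = new MimeMessage(session);
            msg.setFrom(new InternetAddress("webshop@example.com"));      // placeholder addresses
            msg.setRecipient(Message.RecipientType.TO,
                             new InternetAddress("orders@backend.example.com"));
            msg.setSubject("New merchandise order");
            msg.setText(orderAsText); // the order itself, e.g. as XML or delimited text
            Transport.send(msg);
        }
    }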

Although e-mail proved to be a quick and effective solution in the example above, there are some shortcomings:

In addition to the points mentioned above, there are other shortcomings of using e-mail for inter-application messaging that were not relevant to the example:

In summary, despite the JavaMail API, e-mail is intended for the exchange of messages between humans, while JMS was designed for inter-application messaging. That both JavaMail and JMS are required components of the Java 2 Platform, Enterprise Edition (as of version 1.3) underscores that they are meant to fulfill different needs. Although the difference between them may not seem dramatic at first glance, the devil is in the details.

JMS provides many features that support the use of messaging in automated, transactional applications. These features would need to be implemented at the application level if JavaMail were to be used in this context. On the other hand, the ability to use JavaMail to cheaply and quickly leverage a huge existing global infrastructure should not be overlooked. Perhaps one day we will see JMS-to-e-mail gateways, or JMS implementations that use e-mail as their underlying transport mechanism.
