

Service oriented infrastructure (Part 2)

Service-oriented infrastructure: proof of concept demonstrator

P Deans and R Wiseman



Organisations are increasingly aware of the benefits they can gain by introducing Service-Oriented Architectures (SOAs), principally when it comes to flexibility, agility and reuse. Such benefits are not available by default, however. To secure them, an SOA must be underpinned by a Service-Oriented Infrastructure (SOI) in which both the networks and IT systems are monitored and intelligently controlled to match their performance to the user's expectations.
This paper looks at BT's research in this area and describes the proof of concept demonstrator we have constructed to demonstrate SOI's potential. Based on a real-life example, it features an SOA in which demand for the services on offer varies over time and shows how the underlying SOI can be adjusted automatically to keep overall performance within pre-defined limits. The result is used to explain how BT's SOI can support a current or potential customer's SOA strategy, keeping its business running smoothly as the demand on business processes and the services that support them varies.


1. Introduction


This paper details the creation of a proof of concept demonstrator which both validates our SOA (Service-Oriented Architecture) research and provides a means for line-of-business stakeholders to present our research to their potential customers. The demonstrator combines an SOA infrastructure based on Enterprise Service Bus (ESB) technology with virtualisation, event correlation and event dashboarding, and was created to better understand and demonstrate the viability of BT's approach to Service-Oriented Infrastructure (SOI). It also showcases how technologies like SOA and SOI benefit both those deploying them, such as operators like BT, and those who make use of them, such as BT's customers. Overall, the three key SOI benefits are flexibility, visibility and automated control.

1.1 Flexibility
Modern businesses need networked IT infrastructures that are highly flexible.

First, they need to be able to change their size and structure quickly in response to market pressures, when they complete mergers and acquisitions, and on a host of other occasions. Typically, such changes will require amendments to their IT infrastructures, and these must be completed quickly if their commitments to both existing and newly-acquired customers are to be honoured. Should the amendments take too long to complete, initiatives conceived to create competitive advantage can all too easily end up weakening a firm's market position. Second, businesses need to be able to respond quickly as demand for their products and services varies. Significant increases in demand can cause particular problems. Even if a policy of deliberate over-provisioning has been adopted, there is no guarantee an IT infrastructure will be able to cope.
SOI responds to these needs.

1.2 Visibility
Business and IT managers are often forced to make decisions based on data that has taken time to percolate through their organisation, and is therefore somewhat out of date. To improve the quality of their decision making, they need real-time access to information about their organisation's business processes. Consider the problems of operating an online store around the Christmas period, for example. Demand will be significantly higher than at other times, and this will push stock management, resource management and IT infrastructure to the limit. Clearly, managers will find it easier to cope if they have access to real-time information and, by inference, up-to-date knowledge.

The increased levels of real-time visibility possible using SOI also benefit businesses at times of change, ensuring that decisions made are relevant to the shifting conditions.

1.3 Automated control
Business processes can be performed manually or automated by machine. Automation reduces the need for repetitive tasks to be performed manually, thereby reducing cost in terms of both time and human error.
Other benefits of automation emerge when the IT and network infrastructure that supports business processes is considered. Businesses can react faster if their infrastructure's capability and performance are automatically adjusted in response to changes in demand.
For example, additional (virtual) processing resource could be provided automatically – and almost immediately – to support a sales process that's experiencing a temporary increase in demand. In contrast, it would take days, weeks or even months to provide the additional capacity using traditional 'manual' approaches based on the purchase, build and deployment of extra servers and software.
Gartner endorses this automated approach. 'Exploit technologies like virtualization to lubricate the gears of IT, permitting quick shifts', said Carl Claunch, a vice president at the company. 'Apply automation; it not only helps cut down rising labour costs, but it accelerates responses to events and delivers consistent, repeatable actions.' [1]

1.4 Service-Oriented Architecture (SOA)
We see SOI as an essential enabler for the Service-Oriented Architectures (SOAs) now being deployed by organisations all over the world [2].

In an SOA, functional software components exist as individual services. These are available over a network through standardised software interfaces. A major benefit is that the way services are implemented can be changed at any time. As long as their interfaces remain unchanged, there is no impact on the other services and systems that use them. By promoting reuse, such 'loose coupling' enables cost reduction.
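To make this loose coupling concrete, the hypothetical Java sketch below shows a consumer that depends only on a service interface; the implementation behind it can be replaced (for example, by a remote ESB-backed version) without any change to the callers. The names and types are illustrative and are not taken from the demonstrator.

// Hypothetical service contract: consumers compile against this interface only.
public interface ProductQueryService {
    /** Returns a description and price for the given product ID. */
    String queryProduct(String productId);
}

// One possible implementation; it can be swapped for another (e.g., a remote,
// ESB-backed version) without affecting consumers, provided the interface is unchanged.
class LocalCatalogueQueryService implements ProductQueryService {
    @Override
    public String queryProduct(String productId) {
        // A real service would query a catalogue database here.
        return "Product " + productId + ": broadband package";
    }
}

// A consumer of the service: it is wired to an implementation, not hard-coded to one.
class SalesChannel {
    private final ProductQueryService queryService;

    SalesChannel(ProductQueryService queryService) {
        this.queryService = queryService;
    }

    void handleCustomerQuery(String productId) {
        System.out.println(queryService.queryProduct(productId));
    }
}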

Many companies have prioritised their investments around SOA in the hope that reusability and other attributes such as flexibility, agility and control are attained by default. SOA certainly offers abstracted ownership of processes, potentially allowing multiple users (or even organisations) to share processes. However, there are no guarantees that an increase in demand can be handled. Despite being service-oriented, SOA by itself provides neither agility nor performance guarantees.

The same principle applies to virtualised services. Applications are distributed or load-balanced across a number of machines, enhancing flexibility and reuse. However, because of the abstracted ownership and multiple servers, control becomes harder to achieve. Similarly, the task of billing customers becomes more complex if they share use of infrastructure and processes. To benefit fully from an SOA, organisations therefore also need a flexible, agile and controllable ICT infrastructure that's ready to support it – one that can adapt as demand for services comes and goes.

2. SOI concept demonstrator


BT's SOI has been created to guide businesses through SOA adoption. The high quality infrastructure is equipped with systems that ensure the levels of agility, flexibility and control organisations need to respond to the unpredictable demands typical during periods of growth and change.

Our concept demonstrator was designed both to demonstrate SOI's viability and to showcase the key technologies and associated benefits for business customers.

2.1 Key requirements
It was important for us to be able to convince both IT managers and business managers of SOI's key benefits. IT managers often obtain their budgets from business managers, while business managers often set IT-based policies on the advice of IT managers.

We decided that a demonstrator that emphasised the levels of flexibility, automated control and visibility possible through SOI would be the most powerful sales tool.

2.1.1 Requirement 1: Flexibility
The demonstrator must show how enterprises can become more agile by investing in SOI and the flexible infrastructure it offers.

The feature of SOI that is most relevant is its ability to automatically optimise a system in response to demand changes. The demonstrator highlights this capability by automatically recognising a negative situation and correcting it. It shows how an organisation can maintain control of its operations in the face of sudden changes in traffic.

2.1.2 Requirement 2: Automated control
By tying the performance of business processes to a Service Level Agreement (SLA), organisations can ensure their operations perform within defined parameters (such as sales process response time and sales per hour). Real-time information helps managers monitor such SLAs. The demonstrator must show both what happens when automatic system optimisation does occur – that is, performance is kept in line with the related SLA because resources were brought online to meet the additional demand – and what happens when it does not.

2.1.3 Requirement 3: Visibility
SOI can provide managers with real-time information from the business processes across their enterprise. It can improve the 'currency' of their understanding of their organisation's performance.

The ability to see how a system is performing against an SLA before, during and after system optimisations, and thus whether a business is efficiently meeting its legal, contractual and physical obligations, therefore ranked highly as a requirement for the concept demonstrator.

2.2 Key architectural requirements
The main architectural requirement was strict adherence to three well-known principles of SOA [3], namely:

- service reuse – benefits are: accelerated implementation of new business functions and changes to existing ones, lower effort and risk, reduced cost, quicker implementation;
- composite applications built by combining services – simplifies creation of new applications to respond quickly to market changes; and
- loosely coupled systems – benefits are: greater flexibility, increased implementation agility, improved process efficiency, a higher degree of automation.

To meet these requirements, an ESB was chosen as the underlying software infrastructure for the test bed. The primary advantage of such a messaging technology is that it promotes flexibility in the transport layer, enabling loose coupling and easy connection between services, in essence helping facilitate an SOA.
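In plain JMS terms (the Sonic ESB layers its endpoints on top of JMS, as noted in the glossary), the loose coupling looks something like the sketch below: a producer publishes an XML message to a named topic and needs no knowledge of how many consumers exist or where they run. This is an illustrative sketch only; the topic name is assumed and the ConnectionFactory would come from the ESB provider's configuration.

import javax.jms.*;

public class QueryPublisher {

    /**
     * Publishes an XML sales-query message to a JMS topic. The endpoint name is
     * purely illustrative; subscribers are unknown to, and decoupled from, the sender.
     */
    public static void publishQuery(ConnectionFactory factory, String xml) throws JMSException {
        Connection connection = factory.createConnection();
        try {
            Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
            Topic topic = session.createTopic("nbis.sales.query");   // assumed endpoint name
            MessageProducer producer = session.createProducer(topic);
            producer.send(session.createTextMessage(xml));           // fire and forget
        } finally {
            connection.close();   // closes the session and producer too
        }
    }
}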

3. Demonstration scenarios


We chose two scenarios based on real-world business problems. The first is based around a call centre, while the second focuses on a multi-channel retail organisation. The second demonstration builds on the first, adding in-store and online sales channels to the existing call centre sales channel.

3.1 Scenario 1: Call centre
3.1.1 Problem statement
In call centres, problems that affect customer throughput need to be resolved quickly and efficiently to avoid leaving customers with a poor impression. Taking BT's own business as an example, an increase in the number of faults experienced by our broadband customers could increase the number of calls in a call centre queue dedicated to handling broadband faults. The call centre supervisor might only become aware of the problem when a number of call centre operators observe and report the trend. A slow reaction to such problems inevitably means a backlog of increasingly disgruntled customers.

3.1.2 Solution statement
A system that improved or automated the control of resources in the call centre would minimise backlog and reduce the chance of customer upset. For example, when broadband fault reporting queues become overloaded, the system could notify the call centre supervisor automatically. Alternatively, a process could be invoked to reroute customers to alternative queues, or even to reassign operators from quieter queues to handle broadband fault reports. This would in turn require changes to the underlying ICT infrastructure, allowing requirement 1 to be illustrated through automated system optimisation in response to demand changes. As already discussed, an ESB facilitates an SOA which, if used as the architectural model in a call centre, allows Computer Telephony Integration (CTI) data and other call centre information to be aggregated and used as part of a decision-making process. The CTI data involved could include caller location, customer waiting time, customer identity, topic of call and so on.

3.1.3 Design
The overall design pattern is that of a feedback loop. A Decision Management Service (DMS) subscribes to relevant topics of information available on the ESB, such as queue size, location information and the nature of faults. It then maps problems to processes that counteract them.

To continue the above example, the input to the DMS would be the number of calls about broadband faults plus location information relating reported faults to particular exchanges. The output of the DMS (that is, the resultant action) would be instructions to run a number of processes, perhaps including 'page the current supervisor – perhaps with problem details' and 'reassign available operators to the BT broadband fault queue'.
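A minimal sketch of this feedback loop is given below, assuming a plain JMS subscription to a queue-length topic and a fixed mapping from problem to corrective action. The topic names, XML fields and threshold are assumptions made for illustration; the demonstrator's actual DMS behaviour is described in section 5.5.

import javax.jms.*;

/** Illustrative Decision Management Service: subscribes to monitoring events and
 *  maps detected problems to corrective processes. */
public class SimpleDms implements MessageListener {

    private static final int CALLS_PER_OPERATOR = 3;   // assumed acceptable load per operator

    private final Session session;
    private final MessageProducer controlProducer;

    public SimpleDms(Connection connection) throws JMSException {
        session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
        // Subscribe to queue-length events published by the monitors (assumed topic name).
        MessageConsumer consumer = session.createConsumer(session.createTopic("nbis.monitor.queueLength"));
        consumer.setMessageListener(this);
        // Control instructions are published on another assumed topic.
        controlProducer = session.createProducer(session.createTopic("nbis.control.operators"));
        connection.start();   // begin delivery of monitoring events
    }

    @Override
    public void onMessage(Message message) {
        try {
            String xml = ((TextMessage) message).getText();
            String queue = extract(xml, "queueName");
            int length = Integer.parseInt(extract(xml, "queueLength"));

            // Map the problem (a long queue) to a corrective process (assign more operators).
            int operatorsRequired = length / CALLS_PER_OPERATOR + 1;
            String instruction = "<assignOperators queue='" + queue + "' count='" + operatorsRequired + "'/>";
            controlProducer.send(session.createTextMessage(instruction));
        } catch (JMSException e) {
            e.printStackTrace();
        }
    }

    // Naive XML field extraction, sufficient for a sketch; a real service would use a parser.
    private static String extract(String xml, String tag) {
        int start = xml.indexOf("<" + tag + ">") + tag.length() + 2;
        return xml.substring(start, xml.indexOf("</" + tag + ">"));
    }
}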

3.2 Scenario 2: Multi-channel retail
Even if they have physical stores, retailers are increasingly likely to use online sales channels to increase their customer base and reduce the cost per sale. Often, they will want to make existing back-end processes available to these new channels. Our second scenario illustrates this multi-channel approach.

3.2.1 Problem statement
In a real-life situation, a particular hotel chain used a multi-channel approach in which the same business process was used both by its telesales operators and by third-party websites. Queries from the websites caused intermittent increases in demand on the hotel chain's sales business processes that sometimes prevented its own telesales operators from making reservations. The results were dissatisfied customers and lost revenue.

3.2.2 Solution statement
The problem could be solved by giving queries from third-party websites a lower priority than those instigated by call centre agents. However, this approach is heavy-handed: even though many online queries don't turn into sales, plenty do. A better solution is one that places no restrictions on sales process usage at all. If the sales process becomes overloaded, a new sales process instance could be brought online. This flexible approach is at the heart of SOI: a temporary flexing of process power to meet a temporary increase in process demand.

We decided to incorporate this capability into the demonstrator to showcase virtualisation within SOI and meet requirement 1. By equipping the demonstrator with three channels, we would also be able to show how an SOI solution would respond as demand creeps from one channel to another (cross-channel interference). For example, if the online sales portal fails, customers are likely to use the call centre sales channel instead, increasing demand on the software, infrastructure and staff supporting telesales.

4. Demonstrator design


4.1 Logical design
Figure 1 shows a logical view of the demonstrator's architecture. It consists of three sales channels:

- an external web channel;
- a call centre (telesales) channel; and
- an (internal) in-store web channel1.

For the purposes of the demonstrator, each channel's traffic is created artificially by a traffic generator. Channels feed into a back-end sales process via internal and external portals; the sales process queries back-end databases and sends a confirmation email when an order is placed.

Figure 1 shows only the basic demonstrator environment. The visibility and control components are not shown. Visibility is provided via a dashboard that displays, in real time, raw and composite data collected from various points in the infrastructure. To provide the required control capability, this same data is utilised by a DMS to automatically effect changes that keep the system within its SLA. The DMS can, for example, assign additional operators to a call centre queue and bring additional machines online to cope with an overloaded sales process.

4.2 Component-level design
Components are built from standard ESB services and processes, custom Java services and, where necessary2, Java applications. Referring to figure 2, each component is attached to the bus via endpoints (queues or topics) so that the arrows between components represent a soft connection. Bus connectivity is an important contributor towards the visibility requirement as events can be made accessible to the dashboard in real time in either a raw or composite form.

Figure 1. Logical view



1 In-store query and sales traffic (representing intranet traffic from in-store staff) is also generated but never reaches the query/sales process. Its inclusion in the dashboard is primarily to enhance the multi-channel scenario narrative. An extension to this demonstrator might include in-store traffic monitoring and some reaction by the DMS to that traffic, resulting in an improved service for users of the system.

2 Services are managed by the ESB and can have only one entry endpoint (i.e., Java Message Service topic or queue) and as many exit endpoints as are required. Where a service requires more than one entry endpoint (e.g. the DMS) it has to be built as a separate JMS application.

5. Demonstrator components


5.1 Component types
Five categories of component are used in the demonstrator:

- infrastructure components, providing the setting for the demonstration – they are the stage upon which the demonstration takes place;
- hooks and monitors, enabling the visibility and control components;
- dashboard components, making visible what is otherwise a 'black box' demonstration;
- control components, controlling the infrastructure to achieve the desired performance; and
- traffic-generation components, simulating use of the infrastructure components.

Figure 2. Component view

Communication between components is accomplished using XML messages. Each component will be described in the following sections.

5.2 Infrastructure components
The base upon which the demonstrator is built comprises:

- back-end systems, the sales processes used by the three communications channels;
- an external web channel, handling internet queries;
- a call centre channel, handling telephone calls as would a traditional call centre; and
- an in-store web channel, handling intranet queries from staff in the company's shops.

5.2.1 Back-end systems
One sales process was built to handle queries and another to handle orders. The query process involves parsing incoming XML messages to extract item identities, querying an ESB-based database of information and converting the returned XML. The order process is essentially the same, except that it also queries a Customer Relationship Management (CRM) database to complete the order and email the customer. A real system would also handle billing and stock levels.

5.2.2 External web channel
The main elements of the web channel are online requests, a web server and supporting back-end systems. The overall process is quite simple:

- The web-based user clicks on a link – for example, 'submit for a quote' or 'buy'.
- The web server receives and parses the request to determine what to do.
- The web server accesses back-end systems as appropriate to fulfil the request.
- The web server returns its response to the request.

The majority of the components in this channel are realised using the same technology as would be found in a live system, the primary exception being the user. The way in which users are simulated is discussed in section 5.6. Other components are discussed below.

In our demonstrator, an Apache Tomcat server is used to host a Java servlet that parses requests and creates and sends the required JMS (Java Message Service) messages to the back-end sales process. It is essentially an HTTP-to-JMS proxy.
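A stripped-down sketch of such a proxy is shown below. It assumes request parameters naming the action and item, and an assumed queue name; the real servlet's parameters and message format are not reproduced here.

import java.io.IOException;
import javax.jms.*;
import javax.servlet.ServletException;
import javax.servlet.http.*;

/** Illustrative HTTP-to-JMS proxy: parses a web request and forwards it to the
 *  back-end sales process as an XML message on a JMS queue. */
public class SalesProxyServlet extends HttpServlet {

    private ConnectionFactory factory;

    @Override
    public void init() {
        // factory = ... obtain the provider's ConnectionFactory (e.g., via JNDI); omitted here.
    }

    @Override
    protected void doPost(HttpServletRequest request, HttpServletResponse response)
            throws ServletException, IOException {
        String action = request.getParameter("action");   // e.g. "query" or "order" (assumed)
        String itemId = request.getParameter("itemId");
        String xml = "<request type='" + action + "'><item>" + itemId + "</item></request>";
        try {
            sendToSalesProcess(xml);
        } catch (JMSException e) {
            throw new ServletException("Could not forward request to the sales process", e);
        }
        response.setContentType("text/plain");
        response.getWriter().println("Request accepted");
    }

    private void sendToSalesProcess(String xml) throws JMSException {
        Connection connection = factory.createConnection();
        try {
            Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
            Queue queue = session.createQueue("nbis.sales.external");   // assumed endpoint name
            session.createProducer(queue).send(session.createTextMessage(xml));
        } finally {
            connection.close();
        }
    }
}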

5.2.3 Call centre channel
Note: It must be understood that the call centre is not the focus of the demonstration, but a carefully chosen scenario in which to showcase SOI. At a very basic level, the main elements of a traditional call centre are the incoming calls, an Interactive Voice Response (IVR) system to steer the calls to queues, the queues themselves, a team of operators and back-end support systems. The overall process is:

1. A call arrives at the call centre.
2. The caller uses speech or the telephone's keypad to navigate some kind of menu structure and the call is directed to the appropriate call centre queue.
3. One or more operators are attending the queue, and as operators become free, they take the next waiting call from the queue.
4. An operator handling a call speaks to the caller and accesses back-end systems in order to fulfil the caller's requirements.

In our demonstrator, software substitutes perform each of the roles. These are summarised below and, where appropriate, described in detail in the following sections:

- Calls – XML messages represent the requirements of the fictitious caller.
- IVR system – Content-Based Routing (CBR) is used to direct each XML message to the appropriate queue.
- Queues – Since JMS queues and topics are the bedrock of the Sonic ESB on which the demonstrator is built, each call centre queue is easily mapped to a JMS queue.
- Operators – Each operator is represented by a running Java thread. This was originally achieved with each operator as a separate ESB service, but for various reasons a single Java application was used instead, running multiple threads.
- Back-end systems – ESB processes ('sales processes') were created to handle queries and orders from the operators, and in the interests of realism, were placed behind a web server.

Calls
Instead of actual calls being handed off from one component to another, XML messages are passed between components to represent them.
It is assumed that when a call arrives, its CLI (Caller Line Identification – the caller's phone number) is extracted and that the caller's regional information is inferred. The XML messages represent not only the call but the caller.

In a real IVR system, the caller makes selections based upon the purpose of their call. The absence of a real person in the demonstration system means this information must be extracted from the XML message. The call type and caller's geographic region information are then used to help decide the queue to which a call is passed. The IVR component augments the XML with a call ID, sending the extended message on after a randomly-generated delay corresponding to the time a call might have taken to pass through a real IVR system.

Additionally, an expected duration is supplied. This gives the traffic generator component finer-grained control over the simulation. (The actual duration of the call is based upon this expected duration but with an additional random component introduced.)

In a real call centre, when operators become available, they will accept a call from the queue on which they are working. In the demonstrator, this is modelled by the next XML message in the JMS queue being consumed by the Java thread that simulates the operator. As mentioned above, the actual time for the call to be processed is the expected duration with a random component added. Depending upon the type of call, some or all of the processing time will simply be achieved by the Java thread sleeping. More detail on this is provided under 'operators' below.

IVR system
In a real IVR system, callers use the telephone's keypad or their voice to navigate through a menu designed to direct their call to the most appropriate team of operators. In our demonstrator, 'call' direction is achieved using a Content Based Routing (CBR) service, which inspects the XML to extract the geographic region and call type and thereby deduce the appropriate queue.
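One way such a router could be written is sketched below, using the standard Java XPath API to extract the call type and region and map them to a queue name. The element names and queue-naming convention are assumptions made for illustration.

import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

/** Illustrative content-based router: inspects the 'call' XML and deduces the
 *  destination call centre queue from the call type and geographic region. */
public class CallRouter {

    private final XPath xpath = XPathFactory.newInstance().newXPath();

    public String destinationQueue(String callXml) throws XPathExpressionException {
        // Element names are assumed; the demonstrator's message schema is not reproduced here.
        String callType = xpath.evaluate("/call/type", new InputSource(new StringReader(callXml)));
        String region = xpath.evaluate("/call/region", new InputSource(new StringReader(callXml)));

        // Map the extracted fields to a JMS queue name (illustrative naming convention).
        if ("broadbandFault".equals(callType)) {
            return "nbis.CC01." + region + ".faults";
        }
        return "nbis.CC01." + region + ".sales";
    }
}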

Queues
In a real call centre, calls wait in queues to be attended. Messages flow across the ESB on JMS queues3, so these are ideal for simulating call centre queues. Both call centre queues and JMS queues operate on a 'first in, first out' basis. Each call or message is consumed by one operator, real or simulated (see below).



3 Topics (see glossary) also play a part, but queues are of most interest here because each message has only one consumer.

Operators
Human operators accept calls from the queues they are attending, speak to the caller, access back-end systems where appropriate and (hopefully) resolve calls satisfactorily. In the call centre simulation, the operator is represented by a running Java thread and, to enhance realism, this virtual 'operator' will, for certain types of call, make requests to the back-end sales system.
The sequence of requests that an operator can make is both logical (for example, any orders placed will only be for products that have previously been queried) and pseudorandom (in that probabilities are used to decide whether to make additional requests). Figure 11 illustrates this through a flowchart.
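A simplified version of such an operator is sketched below, assuming a plain JMS queue consumer: the thread blocks on the queue, 'handles' each call by sleeping for its expected duration plus a random component and, with an assumed probability, makes a back-end request. The message property name and probability are illustrative.

import java.util.Random;
import javax.jms.*;

/** Illustrative call centre operator: a thread that consumes 'call' messages from a
 *  JMS queue and simulates handling them. The connection is assumed to be started. */
public class OperatorThread extends Thread {

    private final MessageConsumer consumer;
    private final Random random = new Random();

    public OperatorThread(Session session, String queueName) throws JMSException {
        this.consumer = session.createConsumer(session.createQueue(queueName));
    }

    @Override
    public void run() {
        try {
            while (!isInterrupted()) {
                TextMessage call = (TextMessage) consumer.receive();   // blocks until a call arrives
                if (call == null) {
                    break;   // consumer has been closed
                }
                long expectedMillis = call.getLongProperty("expectedDuration");   // assumed property
                // Actual handling time = expected duration plus a random component (cf. section 5.2.3).
                Thread.sleep(expectedMillis + random.nextInt(5000));
                if (random.nextDouble() < 0.4) {
                    queryBackEndSalesProcess(call.getText());   // some calls lead to back-end requests
                }
            }
        } catch (JMSException | InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    private void queryBackEndSalesProcess(String callXml) {
        // Placeholder: in the demonstrator this is an HTTP request to the internal web portal.
    }
}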

Web server
As for the external web channel, a web server is used to marshal requests to the back-end systems. Internal traffic will only come from agents or employees of the company, so a different Java servlet is used that has extended capabilities compared to that used for public queries. One example of an extended capability for employees is pricing. Employees may be given access to a range of offers that differ in price from those directly available to the public via websites. Employees would then achieve the most productive sale based on a combination of their sales technique and offers available.

5.2.4 In-store web channel
This channel is, in a sense, a combination of the other two. Requests are made via a web browser to a web server, but the server is the same one as for the call centre channel. The idea behind this is that the external server used for internet traffic would likely be capability-restricted compared to the internal server, which is only used by employees and agents of the company.

In our demonstrator, the in-store web channel is only implemented as a placeholder, using simulated status information.

5.3 Hooks and monitors
The dashboard and control components are driven by information from hooks and monitors distributed throughout the system. This section briefly describes each of these, how it is generated and how it is used.

5.3.1 Call centre queue lengths
A Java application was written that uses the Sonic MQ management application programming interface to monitor the length of the JMS queues that represent call centre queues. It then publishes updates as XML on a JMS topic. The queue lengths are displayed on the dashboard and are used by the DMS to assess whether additional operators are required on each queue.
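The demonstrator's monitor uses the Sonic MQ management API, which is not reproduced here. The sketch below uses a standard JMS QueueBrowser instead, purely as a stand-in, to illustrate the idea of polling a queue's length and publishing it as XML on a topic; the queue and topic names are assumed and the connection is assumed to be started.

import java.util.Enumeration;
import javax.jms.*;

/** Illustrative queue-length monitor: polls a JMS queue and publishes its length
 *  as XML on a topic for the dashboard and the DMS to consume. */
public class QueueLengthMonitor {

    public static void monitor(Session session, String queueName)
            throws JMSException, InterruptedException {
        QueueBrowser browser = session.createBrowser(session.createQueue(queueName));
        MessageProducer producer = session.createProducer(session.createTopic("nbis.monitor.queueLength"));

        while (true) {
            // Count the messages currently waiting on the queue.
            int length = 0;
            Enumeration<?> waiting = browser.getEnumeration();
            while (waiting.hasMoreElements()) {
                waiting.nextElement();
                length++;
            }
            String xml = "<queueStatus><queueName>" + queueName + "</queueName>"
                       + "<queueLength>" + length + "</queueLength></queueStatus>";
            producer.send(session.createTextMessage(xml));
            Thread.sleep(5000);   // the demonstrator polls every five seconds (section 5.4.1)
        }
    }
}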

5.3.2 Internal/external web portal request/query/order rates
The Java servlets that implement the internal and external web portals in our demonstrator calculate the rate at which requests (that is, queries or orders) arrive. Each back-end sales process (there is one for queries and one for orders) has a step that calculates the rate of requests. These rates are all then published as XML on various JMS topics, displayed on the dashboard and used by the DMS to determine whether additional resource is required to handle requests on the back-end sales processes.

5.3.3 Virtualised service status
The Java application that manages and monitors virtualised compute resources publishes XML on a JMS topic. ESB processes and services are deployed into ESB containers, which are deployed into Management Framework (MF) containers. In turn, these are deployed onto servers – real or virtual. In the demonstrator, virtualised services are those deployed onto virtual servers.

Instances of one ESB container can be deployed into multiple MF containers, and the demonstrator uses this to implement virtualised services. Each instance of an ESB container can be started or stopped independently, so new instances can be brought online when required. Though this approach works well for demonstration purposes, showing a fast response to demand changes, it isn't a viable basis for a real system because the virtual machine already has to be powered up. In this case, it might as well be used constantly. We refer to this as a 'warm start'. Ideally, a 'cold start' arrangement would be used, in which machines are only powered up (from 'cold') as required. This is discussed in section 7.2.2.

The possible virtualised resource states, based upon containers' online/offline states, are:

- MF container offline – virtual server is effectively unavailable.
- MF container online, ESB container offline – virtualised services are standing by.
- MF container online, ESB container online – virtualised services are available for use.

Figure 3. The multi-channel retail demonstrator dashboard

5.4 Dashboard components
Figure 3 shows a screen shot of the multi-channel retail demonstrator's dashboard. It was decided that the dashboard should display the length of each call centre queue, the average queue length for the call centre, the total load on the back-end sales processes, a trace of the number of virtualised services in use and, of particular importance, a simplified view of how close the business is to breaching its SLA.

The back-end process information panel includes a graph showing the number of virtualised services, which is the number of additional processors that are running instances of the sales process.

The data that the dashboard displays arrives as JMS messages from the various hooks and monitors described in the previous section. The general process for displaying information is for JMS messages to be converted to XML files that can optionally be aggregated before being passed to the Apama Dashboard. The dashboard is, in fact, only a small part of Progress Software's Apama software, which offers a much broader range of Complex Event Processing (CEP) functions [4]. If CEP is used, data can be dashboarded straight from the event processor. However, if CEP is not used (i.e., the correlation engine is not employed, also known as 'standalone mode'), the dashboard gets its data from XML files, updating its display as the files change.

As described earlier, various pieces of information are gathered to be used by the control and visibility components. The dashboard shows a variety of this information as well as calculated or aggregated data, as shown in table 1:

Table 1. Information shown on the dashboard

Figure 4. Example XML message showing the length of a call centre queue

Figure 5. Example XML message showing the average length of a call centre's queues

5.4.1 Example: Displaying a trace of average queue length
As intimated above, the process of displaying information on the dashboard is not entirely trivial and involves a number of steps. For example, to display the average queue length on the dashboard, the sequence is as follows:

Step 1
The queue monitor component polls the call centre queues every five seconds to monitor their lengths, observes a change and outputs a message that details the new queue length, as in figure 4. The example says that queue nbis.CC02.Q01 (which is Queue 1 of Call Centre 2) now has 14 calls (represented by 14 XML messages) waiting.

Step 2
This message is received by the KPI generator, which keeps track of the length of each queue. It stores the new queue length, recalculates the average, and creates and sends a message containing the new average queue length, as in figure 5. This example says that Call Centre 2 now has an average queue length of 10.25. The format of the message is specified by the Apama dashboard.

Step 3
This message is received by a component that 'drops' it as an XML file.

Step 4
The dashboard detects that the XML file has changed; it inspects the XML and displays the new average queue length. (The dashboard actually shows average queue length as a trace with the most recent value at the right-hand side of the graph.)
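A compressed sketch of steps 2 and 3 is shown below: a KPI generator keeps the latest length of each queue, recalculates the average and 'drops' the result as an XML file for the dashboard to pick up. The file path and XML shape are assumptions; the real message format is dictated by the Apama dashboard.

import java.io.IOException;
import java.nio.file.*;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Illustrative KPI generator: tracks individual queue lengths and writes the
 *  recalculated average as an XML file that the dashboard watches for changes. */
public class AverageQueueLengthKpi {

    private final Map<String, Integer> queueLengths = new ConcurrentHashMap<>();
    private final Path outputFile = Paths.get("dashboard/cc02-average-queue-length.xml");   // assumed path

    /** Called whenever a queue-length update arrives from the queue monitor. */
    public void onQueueLengthUpdate(String queueName, int length) throws IOException {
        queueLengths.put(queueName, length);

        double average = queueLengths.values().stream()
                .mapToInt(Integer::intValue)
                .average()
                .orElse(0.0);

        // The XML shape is illustrative; the Apama dashboard specifies the real format (cf. figure 5).
        String xml = "<kpi><name>averageQueueLength</name><value>" + average + "</value></kpi>";
        Files.write(outputFile, xml.getBytes());   // the dashboard re-reads the file when it changes
    }
}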

5.4.2 Example: Displaying a bar graph of queue lengths
A similar sequence exists for displaying a bar graph showing queue lengths (see figure 6):

Step 1
The queue monitor notices a change and outputs the new queue length.

Step 2
This message is received by a component that 'drops' it as an XML file.

Step 3
A Perl script detects that the XML file has changed, and notes which call centre the change relates to. It then inspects all XML files for that call centre to extract the queue length, and outputs a single XML file containing all the queue lengths, as in figure 7. The format of the message is specified by the Apama dashboard. Figure 6 shows this example message displayed as a bar graph.



4 The format might seem a little elaborate for such a simple message; this is because queue length was originally monitored via the ESB's management framework and that generates this format. Technical problems necessitated moving to a Java application instead and the format was retained for simplicity.

Figure 6. An example of the bar graph that shows queue lengths

5.4.3 Extension of dashboard to highlight virtualised services
To emphasise the dynamic use of virtualised services during the demo, a Java application was written to illustrate the state of each real and virtual server using stylised graphics. Figure 8 shows the initial state of the application's display, where only the main server ('Cain', part of the business's own IT estate) is in use. Figure 9 shows that two virtualised compute resources have been added.

5.5 Control components
The Decision Management System (DMS) is the component that monitors the state of the system and decides what corrective action, if any, to take. It shares hooks and monitors with the dashboard, as described in the previous two sections, and sends out control messages to components that can effect changes.

It should be noted that the goal of the demonstrator is to showcase SOI technology, not to create innovative new resource management algorithms; the algorithms are kept intentionally simple as a result. The DMS has two areas of control: the number of operators per queue and the number of virtualised resources provided.

Figure 7. Example XML message for the dashboard to render as a bar graph

5.5.1 Operators per queue
The number of operators required to attend a particular queue is defined as:

noperators = floor(ncalls / C) + 1

where:
- noperators is the number of operators required;
- ncalls is the number of calls in the queue (i.e., the queue length);
- C is a constant representing the acceptable number of queued calls per operator; and
- floor(·) returns the integer part of its argument.
If C is set to 3, the number of operators based upon the queue length is as shown in table 2.

Table 2. Operators vs. queue length

Figure 8. Stylised display of compute resource showing that only the main physical machine is in use

Figure 9. Stylised display of compute resource showing that two virtualised resources are also in use

This relationship between these variables is of course overly simplistic: for example, in reality different queues would have different priorities5. At first glance, it may seem as though there are inactive operators waiting to be assigned a queue. However, contact centres (the more generic term for call centres) can handle email as well as telephone calls, so operators not attending a call queue might well be responding to email queries. Of course, email does not demand the quick response that a phone call does, so an operator can be moved swiftly and easily from email duties to phone duties.

5.5.2 Virtualised resources
Additional virtualised resource is deemed to be required if:

r / (n + N) > Rproblem

where:
- r is the current total request rate;
- n is the current number of virtualised resources handling the load;
- N is the (fixed) number of physical (i.e., real, non-virtualised) resources, which is 1 in the demonstrator; and
- Rproblem is the request rate at which a server reaches the capacity beyond which overloading problems will start to occur.
The number of additional virtualised resources required can be shown to be:

ceiling(r / Rproblem) - (n + N)

where the ceiling(·) function rounds its argument up to the nearest integer.
In a real system, the actual load on each server would be monitored instead of the request rate.
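Both control calculations reduce to a few lines of arithmetic. The sketch below restates them in Java using the variable definitions given above; the value chosen for Rproblem, and the guard against negative results, are assumptions made for illustration.

/** Illustrative versions of the DMS control calculations from sections 5.5.1 and 5.5.2. */
public final class DmsCalculations {

    private static final int C = 3;              // acceptable number of queued calls per operator
    private static final double R_PROBLEM = 50;  // requests/sec at which a server starts to struggle (assumed)
    private static final int N_PHYSICAL = 1;     // fixed number of physical resources in the demonstrator

    /** Operators required for a queue of the given length: floor(ncalls / C) + 1. */
    public static int operatorsRequired(int nCalls) {
        return nCalls / C + 1;   // integer division gives the floor for non-negative values
    }

    /** Additional virtualised resources required for the current request rate, given
     *  n virtualised resources already running: ceiling(r / Rproblem) - (n + N). */
    public static int additionalResourcesRequired(double requestRate, int nVirtualised) {
        int needed = (int) Math.ceil(requestRate / R_PROBLEM) - (nVirtualised + N_PHYSICAL);
        return Math.max(needed, 0);   // never negative; surplus resources are relinquished separately
    }

    private DmsCalculations() { }
}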

5.6 Traffic generation components
5.6.1 Graphical user interface (GUI)
Figure 10 shows the simple web page used to control the traffic generation components. There are a number of pre-defined scenarios for 'steady state' (i.e., normal) conditions or problem conditions, which can be run separately or in combination. A standard demo run involves running 'steady state' for the duration of the demo, then triggering problem conditions separately. The three main problem scenarios6 are:

1 an increase in web traffic due to some trigger (e.g., in response to a TV advertisement);
2 an increase in calls to a particular call centre queue (e.g., in response to a problem with broadband in a particular region); and
3 an increase in calls on multiple call centre queues (e.g., a fault affecting multiple services).



5 Strategically, businesses may place different priorities against different queues. Similarly, specific communication methods (e-mail, calls, etc.) may be favoured for certain situations (faults, offers, etc.) and weightings may apply. For clarity, these equations do not include such strategic weightings.

6 As mentioned earlier, combinations of these scenarios can be shown at the same time to demonstrate how extreme demand is handled across multiple channels.

5.6.2 HTTP request generator
This component generates HTTP traffic for the external web portal. The HTTP requests are generated in the same way as when a call centre operator (component) makes requests on the back-end system; the differences are that the requests are made on a different web portal (because the requests are external and hence from untrusted sources) and that multiple customers are simulated instead of just one.

As mentioned under 'operators' in section 5.2.3, the simulation is pseudorandom but is based upon likely sequences of requests. Figure 11 shows a flow diagram used for the generation of HTTP requests. Note that, for clarity, the actual selection of items to query/order is not shown and that random(1) is a function that returns a random floating point number between 0 (inclusive) and 1 (exclusive).
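The flow in figure 11 can be approximated in a few lines; the sketch below is only an approximation, since the exact probabilities and item-selection logic are defined by the demonstrator's flowchart rather than reproduced here. It preserves the key property that orders are only placed for previously queried items.

import java.util.ArrayList;
import java.util.List;
import java.util.Random;

/** Illustrative simulation of one external web customer: a pseudorandom but logical
 *  sequence of queries, optionally followed by an order for a queried item. */
public class CustomerSession {

    private static final Random RANDOM = new Random();

    public static void simulateCustomer(WebPortal portal) {
        List<String> queriedItems = new ArrayList<>();

        // Keep querying items while random(1) falls below an assumed probability.
        do {
            String itemId = "item-" + RANDOM.nextInt(100);   // item selection is simplified here
            portal.query(itemId);
            queriedItems.add(itemId);
        } while (RANDOM.nextDouble() < 0.5);

        // Orders are only ever placed for items that have already been queried.
        if (RANDOM.nextDouble() < 0.3) {
            portal.order(queriedItems.get(RANDOM.nextInt(queriedItems.size())));
        }
    }

    /** Minimal abstraction of the external web portal used by the generator. */
    public interface WebPortal {
        void query(String itemId);
        void order(String itemId);
    }
}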

5.6.3 Call generator
This component is triggered when the user selects a scenario (by clicking on the relevant text or associated button on the GUI) and generates call traffic in the form of JMS messages for the call channel, injecting them into the IVR system. Each scenario defines:

- a duration over which it takes place;
- which queues are involved in the scenario; and
- how many 'calls' each queue should receive.

Figure 10. The web-based GUI used to control the demonstrator

'Calls' are distributed evenly throughout the scenario. Each call centre queue has an average call duration associated with it which is used with a slight random adjustment to decide the duration of each 'call'. For example, scenario 1 involves overloading a single call centre queue: 'sales'. The average duration of 'calls' on this queue is defined to be 60 seconds, the duration of the scenario is 30 seconds and the number of 'calls' the queue will receive in this time is 30. Thus, the rate of calls is one per second. The 'steady state' for the sales queue is about three calls waiting, which equates to two operators attending the queue (see figure 12).

Three seconds after the scenario starts, there will be six calls in the queue, so another operator will be added (which will immediately start handling one of those calls). Four seconds later, the number of queued calls will be nine so another operator will be added, and so on until either the scenario finishes (which will be the case here) or the maximum number of operators (arbitrarily chosen to be 10) is reached.

Subject to certain simplifying assumptions7, figure 12 shows how the number of calls and operators varies with time from the start of scenario 1. The graph visualises how the system rectifies an overloaded queue automatically by adding more operators.

Figure 11. Flowchart showing HTTP requests simulating external web traffic

Figure 12. How the number of calls and operators varies with time in scenario 1

6. The demonstrator


The demonstrator is highly configurable, allowing a variety of problems (and corresponding solutions) to be showcased. They are:

- telesales-centric problems (i.e., the demonstrating user can trigger telesales-related problems such as an overloaded queue or multiple overloaded queues);
- e-commerce-centric problems (i.e., the demonstrating user can trigger e-commerce-related problems such as an overloaded process – the one described in this paper); and
- telesales and e-commerce problems together (i.e., overloaded queues and processes), simply by clicking on two problem scenarios in the GUI.

The highly configurable nature of this demonstrator ensures that presentations can be tailored to meet the differing interests of a variety of customers.

6.1 Multi-channel retail scenario
The SOI concept demonstrator shows how business processes and applications can be created using virtualised resources. Feedback suggests that this is valuable to those demonstrating the principles of SOI and that it is easy to understand and widely relevant.

Feedback also suggests that, as a concept, virtualisation is easily grasped, but that its significance to business in terms of cost reduction, resilience and agility is less so. This demonstrator, which shows how a virtualised (flexible) infrastructure supports business processes, therefore provides a useful portion of the narrative when explaining SOI and how it helps support businesses during a time of high and unpredictable demand.

6.1.1 Demonstration part 1: without SOI
Demonstrator Part 1 shows that with a fixed IT estate there are limits on how much demand can be handled, across all channels. The SLA information graph shows a breach, indicating the infrastructure's inability to handle this peak in demand and more importantly the business's failure to maintain its side of an agreement. Because of paralysis in the sales process, the subsequent processes (e.g., billing, accounts, warehousing, dispatch and so on) do not receive any sales input. As a result, the business is unable to continue trading.

Figure 13. Dashboard showing the system under normal load

Figure 14. Illustrating the effect of overload on the sales process

Figure 15. Handling increase in demand for the sales process

Figure 16. Handling peak demand

Figure 17. Virtualised sales process relinquished

Demonstrator characteristics
- Fixed IT estate
- No automation
- Visibility

Stage 1
System normal – i.e., the business experiences normal demand for its sales process. This is shown in figure 13. The machine that appears lit is the server that hosts the sales process – that is, the business's own IT estate.

Stage 2
Demand for the sales process increases to a level unmanageable with only fixed IT estate supporting it (figure 14, graph A). The SLA information graph indicates a breach (graph C), showing that the level of demand on the sales process was enough to paralyse all channels and prevent the ability to trade. Note that no virtualised services are used (graph B).

6.1.2 Demonstration part 2: with SOI
Demonstrator characteristics
- Flexible IT estate
- Automation
- Visibility

Stage 1
System normal – i.e., the business experiences normal demand for its sales process – same as figure 13.

Stage 2
Demand for the sales process increases, as can be seen in figure 15, both on the e-commerce channel traffic monitor and on the total load monitor in graph B.
The DMS is programmed to recognise this state as requiring a proactive response – in this case, starting a virtualised sales process to help load-balance demand. Graph C indicates a container coming online to host that process. For demonstration purposes, this is also represented graphically (inset A).

Stage 3
Demand reaches its peak (figure 16, graph B). Accordingly, virtualised processes continue to be started to help load-balance that demand (graph C and, graphically, inset A). Unlike the first part of the demonstration, the SLA information graph (graph D) shows that at no point has the SLA been breached. Due to the infrastructure's ability to flex, at no time and on no sales channel was the business unable to trade.

Stage 4
Demand on the sales process from the e-commerce channel decreases to the point where virtualised sales processes are relinquished (figure 17, inset A, graph B and graph C). Graph D shows that the SLA status was unaffected by running this demonstration.

7. Conclusion


The demonstrator successfully showcases the key benefits of SOI. When demand on the sales process increases beyond the capacity of the current IT estate, automated system optimisation can be observed. Stakeholder feedback confirms the demonstrator is a useful sales and marketing tool.

7.1 Observations
A representative definition of SOA is: '...a framework for integrating business processes and supporting IT infrastructure as secure, standardized components (services) that can be reused and combined to address changing business priorities.' [5]

Since one of our objectives was to establish whether SOA really is a useful architectural approach, we tried to keep to the spirit of this definition when building our demonstrator. We found this difficult in the area of service reuse. In theory, it sounds like an excellent idea, but it is a little more difficult to put into practice. Reusable services have to be sufficiently generic that they can be used outside of their original purpose, but this involves significant effort to implement. Of course, if the service ever does get reused, this additional effort was probably justified, but it has been our experience that this is rarely the case. Furthermore, reusable services tend by definition to be fine-grained, which means that multiple services are required to perform higher-level functions. This results in the additional overhead of message flow between these services, thereby reducing efficiency and speed.

Many of the services in the demonstrator were most simply, quickly and efficiently implemented as custom services. Indeed, efficiency is a consideration which seems to get overlooked when evaluating SOA: under normal load and network conditions a single dedicated, optimised service will inevitably perform better than a generic, dynamically-configured, distributed service. Care must therefore be taken when deciding whether to use a process utilising many low-level generic services rather than a single optimised service. The former will likely be more robust and will certainly adhere to SOA's reuse tenet, but the latter will be significantly faster.

7.2 Improvements and further research
7.2.1 Green credentials
By incorporating sustainability audit information on the dashboard, the green credentials of our demonstrator could be enhanced. The concept of a flexible infrastructure whose capacity rises and falls in line with demand is already a 'green' approach when compared to the over-provisioning alternative. In addition, it is beneficial for companies to measure and log the energy used by their servers (virtual or otherwise). Server capacity is now a tradable commodity in its own right.

Aside from the advantages of identifying where and how much energy is used, information regarding a company's carbon footprint can contribute to strategic plans.

7.2.2 Warm start to cold start
To achieve true energy saving, we need to be able to 'cold' start the machines in an SOI. In the current implementation, virtualised processes residing in software containers are started on machines that are already powered on – that is, a 'warm start'. If our demonstrator is to possess full green credentials, the machines should normally be off until they are required (that is, operated in 'cold start' mode).

One possible next step is therefore to convert from 'warm start' to 'cold start', thus enhancing the demonstrator's green credentials and completing the flexible infrastructure narrative.

7.2.3 Sales metrics
While the benefits of SOI are clearly demonstrated, overall sales metrics would be a useful addition to the dashboard. Seeing sales start to fall during what should be a peak of demand would demonstrate even more clearly that infrastructure flexibility and real-time business process visibility are essential prerequisites for an agile business.

7.2.4 Improved dashboarding
The current demonstrator implementation renders events from the business process and IT layers as graphs and traces on the dashboard. Research is needed to identify how to combine events to produce information which the user will find helpful. Events by themselves are not the only consideration: reports and logs which indicate business health may also be available. A system that can return specific reports in response to a user's general request does not yet exist.

Demonstrating business process and IT event visibility in real time has been well received. Many events only become useful as part of a bigger picture, so aggregating events intelligently to produce graphs or reports is an area for future research.

Glossary


- Container: There are two kinds of container in the Sonic ESB: Management Framework (MF) containers and ESB containers. The latter, which contain ESB service instances, are deployed into the former.
- Content Based Routing (CBR): XML is inspected using, for example, XPath and the message routed to an endpoint accordingly.
- Endpoint: An ESB Endpoint is an abstraction layer on top of JMS queues and topics that enables them to be used by services on the ESB without the service having to know the underlying messaging model (i.e., point-to-point for queue-based messaging or publish/subscribe for topic-based messaging).
- ESB Service: An ESB service is a consumer of messages that may or may not produce messages. Services subscribe and optionally publish to ESB endpoints.
- Queue: A JMS queue is designed to support a point-to-point messaging system. It is, however, helpful to consider queues in terms of the publish/subscribe messaging system: essentially, a queue can have multiple publishers and subscribers, but each message can only be consumed by a single subscriber. Each message remains on a queue until a 'subscriber' consumes it. Queues are only used directly when connecting to the JMS layer that underpins the ESB; services on the ESB access the queue via a mapped endpoint.
- Sonic MQ: Sonic's implementation of the JMS (Java Message Service) API. The Sonic ESB is built upon Sonic MQ.
- Topic: A JMS topic is designed to support a publish/subscribe messaging system. Conceptually, a topic is essentially a queue (in the general sense) that can have multiple publishers and subscribers, but each message can be consumed by zero or more subscribers. In contrast to queues, a topic does not store messages by default unless a particular subscriber is marked as 'durable'. If it has no subscribers, any message published to the topic is immediately lost. Topics are only used directly when connecting to the JMS layer that underpins the ESB; services on the ESB access the topic via a mapped endpoint.

References


  1. Gartner Inc, 'Gartner says IT leaders must take ownership of the business outcome, not just concentrating on the IT elements', press release, October 8, 2007, http://www.gartner.com/it/page.jsp?id=529408
  2. Wittgreffe J and Warren P, 'Editorial', BT Technology Journal, vol.26, no.1, September 2008
  3. Deans P and Wiseman R, 'SOI: technology and standards for integration', BT Technology Journal, vol.26, no.1, September 2008
  4. Apama web site, http://www.progress.com/apama
  5. FiereWorks, 'SOA Definitions', http://www.fiereworks.nl/soadefinitions.html

Paul Deans joined BT in 1984 and has undertaken a wide variety of roles across many technical areas, including analogue/digital design, text-to-speech (BT's Laureate), 3D avatar creation for BT's TalkZone in the Millennium Dome, mobility, multimodal portals and SOA. In those areas he has performed application development, project management and research roles. As a Senior Researcher he leads research on SOA including architectural design, implementation and delivery of the SOI concept platform and scenarios. He also offers thought leadership on the Enterprise Service Bus and end-to-end visibility dashboards. He is currently studying for the BT MSc.





Richard Wiseman is a Senior Researcher in the IT Futures Centre at BT. He joined BT in 1996 with an MEng honours degree in Electronic Systems Engineering from the University of York. He worked for three years researching pronunciation variation for speech and speaker recognition, followed by a few months simplifying the process of rationalising BT's customers' VPNs. Richard then worked for six years in the area of multimodal systems, first developing a portal that synchronises different types of web content (including voice and HTML), and subsequently focusing on multimodal applications for mobile devices. Most recently, he has worked in the areas of Service Oriented Architecture and Infrastructure (SOA and SOI), undertaking component design and implementation for a testbed and demonstrator to showcase BT's SOI technology.




