Service oriented infrastructure (Part 2)
Service-oriented infrastructure: proof of concept demonstrator
P Deans and R Wiseman
Organisations are increasingly aware of the benefits they can gain by introducing Service-Oriented Architectures (SOAs),
principally when it comes to flexibility, agility and reuse. Such benefits are not available by default, however. To secure them,
an SOA must be underpinned by a Service-Oriented Infrastructure (SOI) in which both the networks and IT systems are
monitored and intelligently controlled to match their performance to the user's expectations.
This paper looks at BT's research in this area and describes the proof of concept demonstrator we have constructed to
demonstrate SOI's potential. Based on a real-life example, it features an SOA in which demand for the services on offer varies
over time and shows how the underlying SOI can be adjusted automatically to keep overall performance within pre-defined
limits. The result is used to explain how BT's SOI can support a current or potential customer's SOA strategy, keeping its
business running smoothly as the demand on business processes and the services that support them varies.
1. Introduction
This paper details the creation of a proof of concept
demonstrator which both validates our SOA (Service-Oriented
Architecture) research and provides a means for line-of-business
stakeholders to present our research to their potential
customers. It describes an SOA infrastructure using Enterprise
Service Bus (ESB) technology combined with virtualisation,
event correlation and event dashboarding and has been
created to better understand and demonstrate the viability of
BT's approach to Service-Oriented Infrastructure (SOI).
The demonstrator also showcases how technologies like
SOA and SOI benefit both those deploying the technology,
such as operators like BT, and those customers who make use
of it, for example, BT customers.
Overall, the three key SOI benefits are flexibility, visibility
and automated control.
1.1 Flexibility
Modern businesses need networked IT infrastructures that
are highly flexible.
First, they need to be able to change their size and structure
quickly in response to market pressures, when they complete
mergers and acquisitions, and on a host of other occasions.
Typically, such changes will require amendments to their IT
infrastructures, and these must be completed quickly if their
commitments to both existing and newly-acquired customers
are to be honoured. Should the amendments take too long to
complete, initiatives conceived to create competitive advantage
can all too easily end up weakening a firm's market position.
Second, businesses need to be able to respond quickly as
demand for their products and services varies. Significant
increases in demand can cause particular problems. Even if a
policy of deliberate over-provisioning has been adopted,
there is no guarantee an IT infrastructure will be able to cope.
SOI responds to these needs.
1.2 Visibility
Business and IT managers are often forced to make decisions
based on data that has taken time to percolate through their
organisation, and is therefore somewhat out of date. To improve
the quality of their decision making, they need real-time access
to information about their organisation's business processes.
Consider the problems of operating an online store around
the Christmas period, for example. Demand will be significantly
higher than at other times, and this will push stock management,
resource management and IT infrastructure to the limit. Clearly,
managers will find it easier to cope if they have access to real-time
information and, by inference, up-to-date knowledge.
The increased levels of real-time visibility possible using
SOI also benefit businesses at times of change, ensuring that
decisions made are relevant to the shifting conditions.
1.3 Automated control
Business processes can be performed manually or automated by
machine. Automation reduces the need for repetitive tasks to be
performed manually, thereby reducing cost in terms of both time
and human error.
Other benefits of automation emerge when the IT and
network infrastructure that supports business processes is
considered. Businesses can react faster if their infrastructure's
capability and performance are automatically adjusted in
response to changes in demand.
For example, additional (virtual) processing resource
could be provided automatically – and almost immediately –
to support a sales process that's experiencing a temporary
increase in demand. In contrast, it would take days, weeks or
even months to provide the additional capacity using
traditional 'manual' approaches based on the purchase, build
and deployment of extra servers and software.
Gartner endorses this automated approach. 'Exploit
technologies like virtualization to lubricate the gears of IT,
permitting quick shifts', said Carl Claunch, a vice president at
the company. 'Apply automation; it not only helps cut down
rising labour costs, but it accelerates responses to events and
delivers consistent, repeatable actions.' [1]
1.4 Service-Oriented Architecture (SOA)
We see SOI as an essential enabler for the Service-Oriented
Architectures (SOAs) now being deployed by organisations all
over the world [2].
In an SOA, functional software components exist as
individual services. These are available over a network through
standardised software interfaces. A major benefit is that the
way services are implemented can be changed at any time. As
long as their interfaces remain unchanged, there is no impact
on the other services and systems that use them. By promoting
reuse, such 'loose coupling' enables cost reduction.
Many companies have prioritised their investments around
SOA in the hope that reusability and other attributes such as
flexibility, agility and control are attained by default. SOA
certainly offers abstracted ownership of processes, potentially
allowing multiple users (or even organisations) to share
processes. However, there are no guarantees that an increase in
demand can be handled. Despite being service-oriented, SOA
does not provide agility by default or performance guarantees.
The same principle applies to virtualised services.
Applications are distributed or load-balanced across a
number of machines, enhancing flexibility and reuse.
However, because of the abstracted ownership and multiple
servers, control becomes harder to achieve.
Similarly, the task of billing customers becomes more
complex if they share use of infrastructure and processes. To
benefit fully from an SOA, organisations therefore also need
a flexible, agile and controllable ICT infrastructure that's
ready to support it – one that can adapt as demand for
services comes and goes.
2. SOI concept demonstrator
BT's SOI has been created to guide businesses through SOA
adoption. The high quality infrastructure is equipped with
systems that ensure the levels of agility, flexibility and control
organisations need to respond to the unpredictable demands
typical during periods of growth and change.
Our concept demonstrator was designed both to
demonstrate SOI's viability and to showcase the key
technologies and associated benefits for business customers.
2.1 Key requirements
It was important for us to be able to convince both IT
managers and business managers of SOI's key benefits.
IT managers often obtain their budgets from business
managers, while business managers often set IT-based
policies on the advice of IT managers.
We decided that a demonstrator that emphasised the
levels of flexibility, automated control and visibility possible
through SOI would be the most powerful sales tool.
2.1.1 Requirement 1: Flexibility
The demonstrator must show how enterprises can become
more agile by investing in SOI and the flexible infrastructure
it offers.
The feature of SOI that is most relevant is its ability to
automatically optimise a system in response to demand changes.
The demonstrator highlights this capability by
automatically recognising a negative situation and correcting
it. It shows how an organisation can maintain control of its
operations in the face of sudden changes in traffic.
2.1.2 Requirement 2: Automated control
By tying the performance of business processes to a Service
Level Agreement (SLA), organisations can ensure their
operations perform within defined parameters (such as sales
process response time and sales per hour). Real-time
information helps managers monitor such SLAs.
The demonstrator must show both what happens when
automatic system optimisation does occur – that is,
performance is kept in line with the related SLA because
resources were brought online to meet the additional with real-time
information from the business processes across their
enterprise. It can improve the 'currency' of their
understanding of their organisation's performance.
The ability to see how a system is performing against an
SLA before, during and after system optimisations, and thus
whether a business is efficiently meeting its legal, contractual
and physical obligations, therefore ranked highly as a
requirement for the concept demonstrator.
2.2 Key architectural requirements
The main architectural requirement was that there must be a
strict adherence to three well-known principles of SOA [3],
namely:
- service reuse – benefits are: accelerated implementation of
new business functions and changes to existing ones, lower
effort and risk, reduced cost, quicker implementation;
- composite applications built by combining services –
simplifies creation of new applications to respond
quickly to market changes; and
- loosely coupled systems – benefits are: greater
flexibility, increased implementation agility, improved
process efficiency, a higher degree of automation.
To meet these requirements, an ESB was chosen as the
underlying software infrastructure for the test bed. The
primary advantage of such a messaging technology is that it
promotes flexibility in the transport layer, enabling loose
coupling and easy connection between services, in essence
helping facilitate an SOA.
3. Demonstration scenarios
We chose two scenarios based on real-world business
problems. The first is based around a call centre, while the
second focuses on a multi-channel retail organisation. The
second demonstration builds on the first, adding in-store and
online sales channels to the existing call centre sales channel.
3.1 Scenario 1: Call centre
3.1.1 Problem statement
In call centres, problems that affect customer throughput need
to be resolved quickly and efficiently to avoid leaving customers
with a poor impression. Taking BT's own business as an
example, an increase in the number of faults experienced by
our broadband customers could increase the number of calls in
a call centre queue dedicated to handling broadband faults. The
call centre supervisor might only become aware of the problem
when a number of call centre operators observe and report the
trend. A slow reaction to such problems inevitably means a
backlog of increasingly disgruntled customers.
3.1.2 Solution statement
A system that improved or automated the control of resources in
the call centre would minimise backlog and reduce the chance of
customer upset. For example, when broadband fault reporting
queues become overloaded, the system could notify the call
centre supervisor automatically. Alternatively, a process could be
invoked to reroute customers to alternative queues, or even to
reassign operators from quieter queues to handle broadband fault
reports. This would in turn require changes to the underlying ICT
infrastructure, allowing requirement 1 to be illustrated through
automated system optimisation in response to demand changes.
As already discussed, an ESB facilitates an SOA which, if
used as the architectural model in a call centre, allows Computer
Telephony Integration (CTI) data and other call centre information
to be aggregated and used as part of a decision making process.
The CTI data involved could include caller location, customer
waiting time, customer identity, topic of call and so on.
3.1.3 Design
The overall design pattern is that of a feedback loop. A
Decision Management Service (DMS) subscribes to relevant
topics of information available on the ESB such as queue size,
location information and the nature of fault. It then maps
problems to processes that counteract them.
To continue the above example, the input to the DMS would
be the number of calls about broadband faults plus location
information relating reported faults to particular exchanges. The
output of the DMS (that is, the resultant action) would be
instructions to run a number of processes, perhaps including
'page the current supervisor – perhaps with problem details' and
'reassign available operators to the BT broadband fault queue'.
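The mapping from an observed problem to the processes that counteract it can be sketched as a simple rule. This is a minimal illustration only, not the demonstrator's code: the class name, the threshold value and the queue and exchange names are assumptions, while the two actions are the examples given above.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of one DMS rule: maps a queue observation to counteracting processes. */
class DecisionSketch {
    // Assumed threshold; the paper does not state the values used in the demonstrator.
    static final int MAX_WAITING_CALLS = 20;

    /** Returns the processes to run for an observation (an empty list if no action is needed). */
    static List<String> decide(String queueName, int waitingCalls, String exchange) {
        List<String> actions = new ArrayList<>();
        if (waitingCalls > MAX_WAITING_CALLS) {
            actions.add("page the current supervisor: " + queueName + " has " + waitingCalls
                    + " calls waiting, faults reported at " + exchange);
            actions.add("reassign available operators to " + queueName);
        }
        return actions;
    }

    public static void main(String[] args) {
        // Illustrative input only: queue and exchange names are invented for the example.
        for (String action : decide("broadband-faults", 35, "exchange-A")) {
            System.out.println(action);
        }
    }
}
```

In the feedback loop, the inputs would arrive from ESB topics and the returned actions would be sent out as control messages; both are omitted here.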
3.2 Scenario 2: Multi-channel retail
Even if they have physical stores, retailers are increasingly likely
to use online sales channels to increase their customer base and
reduce the cost per sale. Often, they will want to make existing
back-end processes available to these new channels. Our
second scenario illustrates this multi-channel approach.
3.2.1 Problem statement
In a real-life situation, a particular hotel chain used a multi-channel
approach in which the same business process was
used both by its telesales operators and by third-party
websites. Queries from the websites caused intermittent
increases in demand on the hotel chain's sales business
processes that sometimes prevented its own telesales
operators from making reservations. The results were
dissatisfied customers and lost revenue.
3.2.2 Solution statement
The problem could be solved by giving queries from third-party
websites a lower priority than those instigated by call centre
agents. However, this approach is heavy-handed in that, even
though many online queries don't turn into sales, plenty do.
A better solution would be one where online queries
are not restricted and no restrictions are placed on sales
process usage at all. If the sales process becomes
overloaded, a new sales process instance could be brought
online. This flexible approach is at the heart of SOI: a
temporary flexing of process power to meet a temporary
increase in process demand.
We decided to incorporate this capability into the
demonstrator to showcase virtualisation within SOI and meet
requirement 1. By equipping the demonstrator with three
channels, we would also be able to show how an SOI solution
would respond as demand creeps from one channel to
another (cross-channel interference). For example, if the
online sales portal fails, customers are likely to use the call
centre sales channel instead, increasing demand on the
software, infrastructure and staff supporting telesales.
4. Demonstrator design
4.1 Logical design
Figure 1 shows a logical view of the demonstrator's
architecture. It consists of three sales channels:
- an external web channel;
- a call centre (telesales) channel; and
- an (internal) in-store web channel¹.
For the purposes of the demonstrator, each channel's
traffic is created artificially by a traffic generator.
Channels feed into a back-end sales process via internal and
external portals; the sales process queries back-end databases
and sends a confirmation email when an order is placed.
Figure 1 shows only the basic demonstrator
environment. The visibility and control components are not
shown. Visibility is provided via a dashboard that displays, in
real time, raw and composite data collected from various
points in the infrastructure. To provide the required control
capability, this same data is utilised by a DMS to
automatically effect changes that keep the system within its
SLA. The DMS can, for example, assign additional operators
to a call centre queue and bring additional machines online
to cope with an overloaded sales process.
4.2 Component-level design
Components are built from standard ESB services and
processes, custom Java services and, where necessary², Java
applications. Referring to figure 2, each component is
attached to the bus via endpoints (queues or topics) so that
the arrows between components represent a soft connection.
Bus connectivity is an important contributor towards the
visibility requirement as events can be made accessible to the
dashboard in real time in either a raw or composite form.

Figure 1. Logical view
1 In-store query and sales traffic (representing intranet traffic from in-store
staff) is also generated but never reaches the query/sales process. Its
inclusion in the dashboard is primarily to enhance the multi-channel
scenario narrative. An extension to this demonstrator might include in-store
traffic monitoring and some reaction by the DMS to that traffic, resulting in
an improved service for users of the system.
2 Services are managed by the ESB and can have only one entry endpoint
(i.e., Java Message Service topic or queue) and as many exit endpoints as are
required. Where a service requires more than one entry endpoint (e.g. the
DMS) it has to be built as a separate JMS application.
5. Demonstrator components
5.1 Component types
Five categories of component are used in the demonstrator:
- infrastructure components, providing the setting for the
demonstration – they are the stage upon which the
demonstration takes place;
- hooks and monitors, enabling the visibility and control
components;
- dashboard components, making visible what is
otherwise a 'black box' demonstration;
- control components, controlling the infrastructure to
achieve the desired performance; and
- traffic-generation components, simulating use of the
infrastructure components.

Figure 2. Component view
Communication between components is accomplished
using XML messages. Each component will be described in
the following sections.
5.2 Infrastructure components
The base upon which the demonstrator is built comprises:
- back-end systems, the sales processes used by the
three communications channels;
- an external web channel, handling internet queries;
- a call centre channel, handling telephone calls as
would a traditional call centre; and
- an in-store web channel, handling intranet queries
from staff in the company's shops.
5.2.1 Back-end systems
One sales process was built to handle queries and another to
handle orders. The query process involves parsing incoming
XML messages to extract item identities, querying an ESB-based
database of information and converting the returned
XML. The order process is essentially the same, except that it
also queries a Customer Relationship Management (CRM)
database to complete the order and email the customer. A
real system would also handle billing and stock levels.
5.2.2 External web channel
The main elements of the web channel are online requests, a
web server and supporting back-end systems. The overall
process is quite simple:
- The web-based user clicks on a link – for example,
'submit for a quote' or 'buy'.
- The web server receives and parses the request to
determine what to do.
- The web server accesses back-end systems as
appropriate to fulfil the request.
- The web server returns its response to the request.
The majority of the components in this channel are
realised using the same technology as would be found in a
live system, the primary exception being the user. The way in
which users are simulated is discussed in section 5.6. Other
components are discussed below.
In our demonstrator, an Apache Tomcat server is used to
host a Java servlet that parses requests and creates and sends
the required JMS (Java Message Service) messages to the back-end
sales process. It is essentially an HTTP-to-JMS proxy.
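The translation step at the heart of such a proxy can be sketched without the servlet or JMS plumbing. The example below is an assumption-laden illustration: the class name, the element names and the attribute names are hypothetical and do not reflect the demonstrator's actual message schema.

```java
import java.util.Map;

/** Hypothetical sketch: turns web request parameters into an XML message body. */
class RequestToMessageSketch {
    /** Escapes characters that are not allowed in XML text or attribute values. */
    static String escape(String value) {
        return value.replace("&", "&amp;").replace("<", "&lt;")
                    .replace(">", "&gt;").replace("\"", "&quot;");
    }

    /** Builds the message text; the layout is assumed, not taken from the paper. */
    static String toXml(String action, Map<String, String> params) {
        StringBuilder xml = new StringBuilder("<request action=\"" + escape(action) + "\">");
        for (Map.Entry<String, String> p : params.entrySet()) {
            xml.append("<param name=\"").append(escape(p.getKey())).append("\">")
               .append(escape(p.getValue())).append("</param>");
        }
        return xml.append("</request>").toString();
    }
}
```

In a servlet, the parameter map would come from the HTTP request and the resulting string would be sent as the body of a JMS text message to the sales process endpoint.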
5.2.3 Call centre channel
Note: It must be understood that the call centre is
not the focus of the demonstration, but a carefully
chosen scenario in which to showcase SOI.
At a very basic level, the main elements of a traditional call
centre are the incoming calls, an Interactive Voice Response
(IVR) system to steer the calls to queues, the queues
themselves, a team of operators and back-end support
systems. The overall process is:
1. A call arrives at the call centre.
2. The caller uses speech or the telephone's keypad to
navigate some kind of menu structure and the call is
directed to the appropriate call centre queue.
3. One or more operators are attending the queue, and as
operators become free, they take the next waiting call
from the queue.
4. An operator handling a call speaks to the caller and
accesses back-end systems in order to fulfil the caller's
requirements.
In our demonstrator, software substitutes perform each
of the roles. These are summarised below and, where
appropriate, described in detail in the following sections:
- Calls – XML messages represent the requirements of the
fictitious caller.
- IVR system – Content-Based Routing (CBR) is used to
direct each XML message to the appropriate queue.
- Queues – Since JMS queues and topics are the bedrock
of the Sonic ESB on which the demonstrator is built,
each call centre queue is easily mapped to a JMS queue.
- Operators – Each operator is represented by a running
Java thread. This was originally achieved with each
operator as a separate ESB service, but for various
reasons a single Java application was used instead,
running multiple threads.
- Back-end systems – ESB processes ('sales processes')
were created to handle queries and orders from the
operators, and in the interests of realism, were placed
behind a web server.
Calls
Instead of actual calls being handed off from one component
to another, XML messages are passed between components
to represent them.
It is assumed that when a call arrives, its CLI (Caller
Line Identification – the caller's phone number) is
extracted and that the caller's regional information is
inferred. The XML messages represent not only the call
but the caller.
In a real IVR system, the caller makes selections based
upon the purpose of their call. The absence of a real person
in the demonstration system means this information must be
extracted from the XML message. The call type and caller's
geographic region information are then used to help decide
the queue to which a call is passed. The IVR component
augments the XML with a call ID, sending the extended
message on after a randomly-generated delay
corresponding to the time a call might have taken to pass
through a real IVR system.
Additionally, an expected duration is supplied. This gives
the traffic generator component finer-grained control over
the simulation. (The actual duration of the call is based upon
this expected duration but with an additional random
component introduced.)
In a real call centre, when operators become
available, they will accept a call from the queue on which
they are working. In the demonstrator, this is modelled by
the next XML message in the JMS queue being consumed
by the Java thread that simulates the operator. As
mentioned above, the actual time for the call to be
processed is the expected duration with a random
component added. Depending upon the type of call,
some or all of the processing time will simply be achieved
by the Java thread sleeping. More detail on this is
provided under 'operators' below.
IVR system
In a real IVR system, callers use the telephone's keypad or
their voice to navigate through a menu designed to direct
their call to the most appropriate team of operators. In our
demonstrator, 'call' direction is achieved using a Content
Based Routing (CBR) service, which inspects the XML to
extract the geographic region and call type and thereby
deduce the appropriate queue.
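Content-based routing of this kind can be sketched with the standard Java XPath API. The message layout (`/call/region`, `/call/type`), the lookup tables and the fallback values are assumptions made for illustration; only the queue naming pattern (for example nbis.CC02.Q01) follows the paper.

```java
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

/** Hypothetical sketch of content-based routing for simulated calls. */
class CallRouterSketch {
    // Assumed lookup tables; the real routing rules are not given in the paper.
    static final Map<String, String> CALL_CENTRE_BY_REGION = new HashMap<>();
    static final Map<String, String> QUEUE_BY_CALL_TYPE = new HashMap<>();
    static {
        CALL_CENTRE_BY_REGION.put("north", "CC01");
        CALL_CENTRE_BY_REGION.put("east", "CC02");
        QUEUE_BY_CALL_TYPE.put("broadband-fault", "Q01");
        QUEUE_BY_CALL_TYPE.put("sales", "Q02");
    }

    /** Reads one text value from the message using an XPath expression. */
    static String read(String xml, String path) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            return xpath.evaluate(path, new InputSource(new StringReader(xml))).trim();
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Unreadable call message", e);
        }
    }

    /** Deduces the destination queue name from the region and call type in the message. */
    static String routeToQueue(String callXml) {
        String centre = CALL_CENTRE_BY_REGION.getOrDefault(read(callXml, "/call/region"), "CC01");
        String queue = QUEUE_BY_CALL_TYPE.getOrDefault(read(callXml, "/call/type"), "Q99");
        return "nbis." + centre + "." + queue;
    }
}
```

An ESB's own CBR service would normally be configured with routing rules rather than hand-written code; the sketch only makes the lookup explicit.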
Queues
In a real call centre, calls wait in queues to be attended.
Messages flow across the ESB on JMS queues³, so these are
ideal for simulating call centre queues. Both call centre
queues and JMS queues operate on a 'first in, first out' basis.
Each call or message is consumed by one operator, real or
simulated (see below).
3 Topics (see glossary) also play a part, but queues are of most interest here
because each message has only one consumer.
Operators
Human operators accept calls from the queues they are
attending, speak to the caller, access back-end systems where
appropriate and (hopefully) resolve calls satisfactorily.
In the call centre simulation, the operator is represented
by a running Java thread and, to enhance realism, this virtual
'operator' will, for certain types of call, make requests to the
back-end sales system.
The sequence of requests that an operator can make is
both logical (for example, any orders placed will only be for
products that have previously been queried) and
pseudorandom (in that probabilities are used to decide
whether to make additional requests). Figure 11 illustrates
this through a flowchart.
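The behaviour of a simulated operator can be sketched as follows. This is not the demonstrator's code: a `BlockingQueue` stands in for the JMS queue, and the probabilities, catalogue, class name and the size of the random component are all assumptions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Random;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch of a simulated operator thread. */
class OperatorSketch implements Runnable {
    static final double P_ORDER = 0.3;          // assumed chance that a query leads to an order
    static final double P_ANOTHER_QUERY = 0.5;  // assumed chance of a further query
    static final List<String> CATALOGUE = Arrays.asList("item-1", "item-2", "item-3");

    private final BlockingQueue<String> callQueue; // stand-in for a JMS queue
    private final Random rng;

    OperatorSketch(BlockingQueue<String> callQueue, Random rng) {
        this.callQueue = callQueue;
        this.rng = rng;
    }

    /** Plans a logical, pseudorandom request sequence: orders only follow a query for the same item. */
    static List<String> planRequests(Random rng) {
        List<String> requests = new ArrayList<>();
        do {
            String item = CATALOGUE.get(rng.nextInt(CATALOGUE.size()));
            requests.add("QUERY " + item);
            if (rng.nextDouble() < P_ORDER) {
                requests.add("ORDER " + item);
            }
        } while (rng.nextDouble() < P_ANOTHER_QUERY);
        return requests;
    }

    /** Expected duration plus a random component (assumed here to be up to 50% extra). */
    static long actualDurationMillis(long expectedMillis, Random rng) {
        return expectedMillis + (long) (rng.nextDouble() * expectedMillis * 0.5);
    }

    @Override
    public void run() {
        try {
            String call;
            // Take the next waiting call, first in first out; stop when the queue stays empty.
            while ((call = callQueue.poll(200, TimeUnit.MILLISECONDS)) != null) {
                System.out.println(call + " -> " + planRequests(rng));
                Thread.sleep(actualDurationMillis(50, rng)); // remaining call time is simply slept
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();
        queue.add("call-1");
        queue.add("call-2");
        Thread operator = new Thread(new OperatorSketch(queue, new Random(42)));
        operator.start();
        operator.join();
    }
}
```

Running several such threads against one queue mirrors a team of operators attending the same call centre queue, since each message is consumed by exactly one thread.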
Web server
As for the external web channel, a web server is used to
marshal requests to the back-end systems. Internal traffic will
only come from agents or employees of the company, so a
different Java servlet is used that has extended capabilities
compared to that used for public queries. One example of an
extended capability for employees is pricing. Employees may
be given access to a range of offers that differ in price from
those directly available to the public via websites. Employees
would then achieve the most productive sale based on a
combination of their sales technique and offers available.
5.2.4 In-store web channel
This channel is, in a sense, a combination of the other two.
Requests are made via a web browser to a web server, but the
server is the same one as for the call centre channel. The idea
behind this is that the external server used for internet traffic
would likely be capability-restricted compared to the internal
server, which is only used by employees and agents of the
company.
In our demonstrator, the in-store web channel is only
implemented as a placeholder, using simulated status
information.
5.3 Hooks and monitors
The dashboard and control components are driven by
information from hooks and monitors distributed throughout
the system. This section briefly describes each of these, how
it is generated and how it is used.
5.3.1 Call centre queue lengths
A Java application was written that uses the Sonic MQ
management application programming interface to monitor
the length of the JMS queues that represent call centre
queues. It then publishes updates as XML on a JMS topic.
The queue lengths are displayed on the dashboard and
are used by the DMS to assess whether additional operators
are required on each queue.
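The change-detection part of such a monitor can be sketched in a few lines. The Sonic MQ management calls and the JMS publishing are omitted, and the XML layout and class name shown are assumptions rather than the demonstrator's actual format.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of a queue-length monitor that reports only changes. */
class QueueMonitorSketch {
    private final Map<String, Integer> lastSeen = new HashMap<>();

    /**
     * Records the current length of a queue.
     * Returns an XML update if the length differs from the previous poll, otherwise null.
     */
    String poll(String queueName, int currentLength) {
        Integer previous = lastSeen.put(queueName, currentLength);
        if (previous != null && previous == currentLength) {
            return null; // nothing changed, so nothing is published
        }
        // Assumed message layout, for illustration only.
        return "<queueLength queue=\"" + queueName + "\" length=\"" + currentLength + "\"/>";
    }
}
```

In use, a scheduler would call `poll` for every monitored queue at a fixed interval and publish each non-null result on the relevant topic.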
5.3.2 Internal/external web portal request/query/order rates
The Java servlets that implement the internal and external
web portals in our demonstrator calculate the rate at which
requests (that is, queries or orders) arrive. Each back-end
sales process (there is one for queries and one for orders) has
a step that calculates the rate of requests. These rates are all
then published as XML on various JMS topics, displayed on
the dashboard and used by the DMS to determine whether
additional resource is required to handle requests on the
back-end sales processes.
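The paper does not say how the rates are calculated. One common approach, shown here purely as an assumption, is to count the requests seen within a sliding time window; the class and method names are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical sketch of a sliding-window request rate calculator. */
class RequestRateSketch {
    private final long windowMillis;
    private final Deque<Long> timestamps = new ArrayDeque<>();

    RequestRateSketch(long windowMillis) {
        this.windowMillis = windowMillis;
    }

    /** Records the arrival time of one request (a query or an order). */
    void record(long nowMillis) {
        timestamps.addLast(nowMillis);
    }

    /** Returns requests per second, averaged over the window ending at nowMillis. */
    double ratePerSecond(long nowMillis) {
        // Discard arrivals that have fallen out of the window.
        while (!timestamps.isEmpty() && timestamps.peekFirst() <= nowMillis - windowMillis) {
            timestamps.removeFirst();
        }
        return timestamps.size() * 1000.0 / windowMillis;
    }
}
```

Timestamps are passed in explicitly so that the calculation can be exercised without waiting on a real clock; a servlet would pass `System.currentTimeMillis()`.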
5.3.3 Virtualised service status
The Java application that manages and monitors virtualised
compute resources publishes XML on a JMS topic.
ESB processes and services are deployed into ESB
containers, which are deployed into Management
Framework (MF) containers. In turn, these are deployed onto
servers – real or virtual. In the demonstrator, virtualised
services are those deployed onto virtual servers.
Instances of one ESB container can be deployed into
multiple MF containers, and the demonstrator uses this to
implement virtualised services. Each instance of an ESB
container can be started or stopped independently, so new
instances can be brought online when required.
Though this approach works well for demonstration
purposes, showing a fast response to demand changes, it
isn't a viable basis for a real system because the virtual
machine already has to be powered up. In this case, it might
as well be used constantly. We refer to this as a 'warm start'.
Ideally, a 'cold start' arrangement would be used, in
which machines are only powered up (from 'cold') as
required. This is discussed in section 7.2.2.
The possible virtualised resource states, based upon
containers' online/offline states, are:
- MF container offline – virtual server is effectively
unavailable.
- MF container online, ESB container offline – virtualised
services are standing by.
- MF container online, ESB container online – virtualised
services are available for use.
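The three states listed above follow directly from the two containers' online/offline flags, as this small sketch shows. The type and method names are illustrative assumptions.

```java
/** Hypothetical sketch: derives a virtualised resource's state from its containers. */
class VirtualisedResourceSketch {
    enum State { UNAVAILABLE, STANDING_BY, AVAILABLE }

    /** An offline MF container makes the server unavailable regardless of the ESB container. */
    static State stateOf(boolean mfContainerOnline, boolean esbContainerOnline) {
        if (!mfContainerOnline) {
            return State.UNAVAILABLE;
        }
        return esbContainerOnline ? State.AVAILABLE : State.STANDING_BY;
    }
}
```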

Figure 3. The multi-channel retail demonstrator dashboard
5.4 Dashboard components
Figure 3 shows a screen shot of the multi-channel retail
demonstrator's dashboard. It was decided that the dashboard
should display the length of each call centre queue, the average
length of queue for the call centre, the total load on the back-end
sales processes, a trace of the number of virtualised services
in use and, of particular importance, a simplified view of how
close the business is to breaching its SLA.
The back-end process information panel includes a
graph showing the number of virtualised services, which is
the number of additional processors that are running
instances of the sales process.
The data that the dashboard displays arrives as JMS
messages from the various hooks and monitors described in
the previous section. The general process for displaying
information is for JMS messages to be converted to XML files
that can optionally be aggregated before being passed to the
Apama Dashboard. The dashboard is, in fact, only a small
part of Progress Software's Apama software, which offers a
much broader range of Complex Event Processing (CEP)
functions [4]. If using CEP, data can be dashboarded straight
from the processor. However, if CEP is not used (i.e., the
correlation engine is not employed, also known as 'standalone
mode'), the dashboard gets its data from XML files,
updating its display as the files change.
As described earlier, various pieces of information are
gathered to be used by the control and visibility components.
The dashboard shows a variety of this information as well as
calculated or aggregated data, as shown in table 1:

Table 1. Information shown on the dashboard

Figure 4. Example XML message showing the length of a call centre queue

Figure 5. Example XML message showing the average length of a call centre's queues
5.4.1 Example: Displaying a trace of average queue length
As intimated above, the process of displaying information on
the dashboard is not entirely trivial and involves a number of
steps. For example, to display the average queue length on
the dashboard, the sequence is as follows:
Step 1
The queue monitor component polls the call centre
queues every five seconds to monitor their lengths,
observes a change and outputs a message that details
the new queue length, as in figure 4⁴. The example says
that queue nbis.CC02.Q01 (which is Queue 1 of Call
Centre 2) now has 14 calls (represented by 14 XML
messages) waiting.
Step 2
This message is received by the KPI generator, which keeps
track of the length of each queue. It stores the new queue
length, recalculates the average, and creates and sends a
message containing the new average queue length, as in
figure 5. This example says that Call Centre 2 now has an
average queue length of 10.25. The format of the message is
specified by the Apama dashboard.
Step 3
This message is received by a component that 'drops' it as an
XML file.
Step 4
The dashboard detects that the XML file has changed; it
inspects the XML and displays the new average queue
length. (The dashboard actually shows average queue length
as a trace with the most recent value at the right-hand side
of the graph.)
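The KPI generator's part in step 2 can be sketched as a running table of queue lengths from which the average is recalculated on every update. The class name is hypothetical, and the lengths of the queues other than nbis.CC02.Q01 are invented so that the average works out to the 10.25 quoted above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch of the KPI generator's average queue length calculation. */
class KpiGeneratorSketch {
    private final Map<String, Integer> lengths = new LinkedHashMap<>();

    /** Stores the new length for one queue and returns the recalculated average. */
    double update(String queueName, int newLength) {
        lengths.put(queueName, newLength);
        int total = 0;
        for (int length : lengths.values()) {
            total += length;
        }
        return (double) total / lengths.size();
    }

    public static void main(String[] args) {
        KpiGeneratorSketch kpi = new KpiGeneratorSketch();
        kpi.update("nbis.CC02.Q02", 10); // assumed value
        kpi.update("nbis.CC02.Q03", 9);  // assumed value
        kpi.update("nbis.CC02.Q04", 8);  // assumed value
        System.out.println(kpi.update("nbis.CC02.Q01", 14)); // (14 + 10 + 9 + 8) / 4
    }
}
```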
5.4.2 Example: Displaying a bar graph of queue lengths
A similar sequence exists for displaying a bar graph showing
queue lengths (see figure 6):
Step 1
The queue monitor notices a change and outputs the new
queue length.
Step 2
This message is received by a component that 'drops' it as an
XML file.
Step 3
A Perl script detects that the XML file has changed, and notes
which call centre the change relates to. It then inspects all
XML files for that call centre to extract the queue length, and
outputs a single XML file containing all the queue lengths, as
in figure 7. The format of the message is specified by the
Apama dashboard. Figure 6 shows this example message
displayed as a bar graph.
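The aggregation performed in Step 3 can be sketched as follows. The demonstrator uses a Perl script and an XML format specified by the Apama dashboard; neither is reproduced here. This Python sketch only illustrates the idea, and every element and attribute name in it (queue, queueLengths, name, length, callCentre) is a hypothetical placeholder.

```python
import xml.etree.ElementTree as ET


def combine_queue_lengths(call_centre: str, queue_docs) -> str:
    """Merge per-queue XML documents into one document for a call centre.

    queue_docs is an iterable of XML strings, one per queue, in a
    hypothetical format such as <queue name="nbis.CC02.Q01" length="14"/>.
    """
    root = ET.Element("queueLengths", {"callCentre": call_centre})
    for doc in queue_docs:
        queue = ET.fromstring(doc)
        name = queue.get("name")
        # Skip queues that belong to a different call centre.
        if name.split(".")[1] != call_centre:
            continue
        ET.SubElement(root, "queue",
                      {"name": name, "length": queue.get("length")})
    return ET.tostring(root, encoding="unicode")


docs = [
    '<queue name="nbis.CC02.Q01" length="14"/>',
    '<queue name="nbis.CC02.Q02" length="9"/>',
    '<queue name="nbis.CC01.Q01" length="3"/>',  # different call centre
]
combined = combine_queue_lengths("CC02", docs)
print(combined)
```

A file-watching loop would call a function like this whenever one of the per-queue files changes, then write the combined document where the dashboard expects it.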
4 The format might seem a little elaborate for such a simple message; this is
because queue length was originally monitored via the ESB's management
framework and that generates this format. Technical problems necessitated
moving to a Java application instead and the format was retained for simplicity.

Figure 6. An example of the bar graph that shows queue lengths
5.4.3 Extension of dashboard to highlight virtualised
services
To emphasise the dynamic use of virtualised services
during the demo, a Java application was written to
illustrate the state of each real and virtual server using
stylised graphics. Figure 8 shows the initial state of the
application's display, where only the main server ('Cain',
part of the business's own IT estate) is in use. Figure 9
shows that two virtualised compute resources have been
added.
5.5 Control components
The Decision Management System (DMS) is the component
that monitors the state of the system and decides what
corrective action, if any, to take. It shares hooks and monitors
with the dashboard, as described in the previous two
sections, and sends out control messages to components
that can effect changes.
It should be noted that the goal of the demonstrator is
to showcase SOI technology, not to create innovative new
resource management algorithms; the algorithms are kept
intentionally simple as a result.
The DMS has two areas of control: the number of
operators per queue and the number of virtualised resources
provided.

Figure 7. Example XML message for the dashboard to render as a bar graph
5.5.1 Operators per queue
The number of operators required to attend a particular
queue is defined as:

where:
- noperators is the number of operators required;
- ncalls is the number of calls in the queue (i.e., the queue
length);
- C is a constant representing the acceptable number of
queued calls per operator; and
- floor(·) returns the integer part of its argument.
If C is set to 3, the number of operators based upon the
queue length is as shown in table 2.

Table 2. Operators vs. queue length

Figure 8. Stylised display of compute resource showing that only the main physical machine is in use

Figure 9. Stylised display of compute resource showing that two virtualised resources are also in use
The relationship between these variables is of course
overly simplistic: for example, in reality different queues would
have different priorities (see footnote 5). At first glance, it may seem as
though there are inactive operators waiting to be assigned a
queue. However, contact centres (the more generic term for
call centres) can handle email as well as telephone calls, so
operators not attending a call queue might well be responding
to email queries. Of course, email does not demand the quick
response that a phone call does, so an operator can be moved
swiftly and easily from email duties to phone duties.
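The operator rule in this section can be sketched in Python. The equation itself is not legible in this copy of the paper, so the form used below, floor(ncalls / C) + 1, is an assumption: it was chosen because it uses only the symbols defined above and agrees with the worked example in section 5.6.3, where C = 3 gives two operators for three queued calls, three for six and four for nine. The function name operators_required is hypothetical.

```python
def operators_required(n_calls: int, calls_per_operator: int = 3) -> int:
    """Assumed rule: floor(n_calls / C) + 1, with C = calls_per_operator."""
    if n_calls < 0 or calls_per_operator <= 0:
        raise ValueError("n_calls must be >= 0 and calls_per_operator > 0")
    # Integer division gives floor() for non-negative integers.
    return n_calls // calls_per_operator + 1


for n_calls in (0, 3, 6, 9, 14):
    print(n_calls, operators_required(n_calls))
```

Under this assumption a queue always has at least one operator attending it, and an extra operator is assigned each time the queue grows by another C calls.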
5.5.2 Virtualised resources
Additional virtualised resource is deemed to be required if:

where:
- r is the current total request rate;
- n is the current number of virtualised resources handling
the load;
- N is the (fixed) number of physical (i.e., real, non-virtualised)
resources, which is 1 in the demonstrator; and
- Rproblem is the rate at which a server is operating at a
capacity beyond which overloading problems will start
to occur.
The number of additional virtualised resources required
can be shown to be:

where the ceiling(·) function rounds its argument up
to the nearest integer.
In a real system, the actual load on each server would be
monitored instead of the request rate.
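The two expressions in this section are not legible in this copy of the paper. The Python sketch below shows one reading that is consistent with the definitions of r, n, N and Rproblem given above; it is an assumption, not the authors' stated formula. It treats extra resource as required when the request rate per server, r / (n + N), exceeds Rproblem, and takes the number of additional resources to be ceiling(r / Rproblem) - (n + N). The function names and the example rates are hypothetical.

```python
import math


def needs_more_resource(r: float, n: int, N: int, r_problem: float) -> bool:
    """Assumed condition: per-server request rate exceeds r_problem."""
    return r / (n + N) > r_problem


def additional_resources(r: float, n: int, N: int, r_problem: float) -> int:
    """Assumed count: servers needed to keep every server at or below
    r_problem, minus the servers already in use (never negative)."""
    servers_needed = math.ceil(r / r_problem)
    return max(0, servers_needed - (n + N))


# Invented figures: 250 requests/s, one physical server, no virtualised
# resources yet, and problems starting above 100 requests/s per server.
print(needs_more_resource(250, 0, 1, 100))   # True
print(additional_resources(250, 0, 1, 100))  # 2
```

With these figures three servers are needed in total, so two virtualised resources are added to the single physical machine.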
5.6 Traffic generation components
5.6.1 Graphical user interface (GUI)
Figure 10 shows the simple web page used to control the traffic
generation components. There are a number of pre-defined
scenarios for 'steady state' (i.e., normal) conditions or problem
conditions, which can be run separately or in combination. A
standard demo run involves running 'steady state' for the duration
of the demo, then triggering problem conditions separately.
The three main problem scenarios (see footnote 6) are:
1 an increase in web traffic due to some trigger (e.g., in
response to a TV advertisement);
2 an increase in calls to a particular call centre queue (e.g.,
in response to a problem with broadband in a particular
region); and
3 an increase in calls on multiple call centre queues (e.g., a
fault affecting multiple services).
5 Strategically, businesses may place different priorities against different
queues. Similarly, specific communication methods (e-mail, calls, etc.) may be
favoured for certain situations (faults, offers, etc.) and weightings may apply.
For clarity, these equations do not include such strategic weightings.
6 As mentioned earlier, combinations of these scenarios can be shown at the
same time to demonstrate how extreme demand is handled across multiple
channels.
5.6.2 HTTP request generator
This component generates HTTP traffic for the external web
portal. The HTTP requests are generated in the same way as
when a call centre operator (component) makes requests on
the back-end system; the differences are that the requests
are made on a different web portal (because the requests are
external and hence from untrusted sources) and that
multiple customers are simulated instead of just one.
As mentioned under 'operators' in section 5.2.3, the
simulation is pseudorandom but is based upon likely
sequences of requests. Figure 11 shows a flow diagram used
for the generation of HTTP requests. Note that, for clarity,
the actual selection of items to query/order is not shown and
that random(1) is a function that returns a random floating
point number between 0 (inclusive) and 1 (exclusive).
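The general technique of driving a request sequence from random(1) can be sketched in Python, where random.random() likewise returns a floating point number in the range 0 (inclusive) to 1 (exclusive). The sketch does not reproduce the flow of figure 11: the branch probabilities, the limit on requests and the function name simulate_customer are all hypothetical placeholders, and only the 'query' and 'order' request types come from the text.

```python
import random


def simulate_customer(rng=random.random, p_query=0.5, p_order=0.2,
                      max_requests=20):
    """Return a pseudorandom list of request types for one simulated customer.

    rng plays the role of random(1). The probabilities are placeholders,
    not values taken from figure 11.
    """
    requests = []
    while len(requests) < max_requests:
        x = rng()
        if x < p_query:
            requests.append("query")
        elif x < p_query + p_order:
            requests.append("order")
        else:
            break  # the simulated customer leaves the site
    return requests


# Simulate several external customers, as the generator does.
for customer in range(3):
    print(customer, simulate_customer())
```

Passing the random source in as a parameter makes a run repeatable, for example by supplying a fixed sequence of values instead of random.random.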
5.6.3 Call generator
This component is triggered when the user selects a scenario
(by clicking on the relevant text or associated button on the
GUI) and generates call traffic in the form of JMS messages
for the call channel, injecting them into the IVR system.
Each scenario defines:
- a duration over which it takes place;
- which queues are involved in the scenario; and
- how many 'calls' each queue should receive.

Figure 10. The web-based GUI used to control the demonstrator
'Calls' are distributed evenly throughout the scenario.
Each call centre queue has an average call duration
associated with it which is used with a slight random
adjustment to decide the duration of each 'call'.
For example, scenario 1 involves overloading a single
call centre queue: 'sales'. The average duration of 'calls' on
this queue is defined to be 60 seconds, the duration of the
scenario is 30 seconds and the number of 'calls' the queue
will receive in this time is 30. Thus, the rate of calls is one per
second. The 'steady state' for the sales queue is about three
calls waiting, which equates to two operators attending the
queue (see figure 12).
Three seconds after the scenario starts, there will be six
calls in the queue, so another operator will be added (which
will immediately start handling one of those calls). Four seconds later, the number of queued calls will be nine so
another operator will be added, and so on until either the
scenario finishes (which will be the case here) or the
maximum number of operators (arbitrarily chosen to be 10)
is reached.
Subject to certain simplifying assumptions, figure 12
shows how the number of calls and operators varies with
time from the start of scenario 1. The graph visualises how
the system rectifies an overloaded queue automatically by
adding more operators.
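The behaviour described for scenario 1 can be reproduced with a short Python simulation. It rests on assumptions that are not stated in this copy of the paper: the operator rule is taken to be floor(ncalls / C) + 1 with C = 3, operators already on a call are taken to stay busy for the whole scenario (the 60-second average call is longer than the 30-second scenario), and each newly added operator removes exactly one call from the queue. The function names are hypothetical.

```python
def operators_required(n_calls: int, calls_per_operator: int = 3) -> int:
    """Assumed operator rule: floor(n_calls / C) + 1."""
    return n_calls // calls_per_operator + 1


def simulate_scenario(duration_s: int = 30, arrivals_per_s: int = 1,
                      queued: int = 3, operators: int = 2,
                      max_operators: int = 10):
    """Step through the scenario one second at a time.

    Returns the times at which operators were added, the final number of
    operators and the final queue length.
    """
    added_at = []
    for t in range(1, duration_s + 1):
        queued += arrivals_per_s  # 'calls' arrive evenly, one per second
        needed = min(operators_required(queued), max_operators)
        if operators < needed:
            operators += 1
            queued -= 1  # the new operator immediately takes one call
            added_at.append(t)
    return added_at, operators, queued


added_at, operators, queued = simulate_scenario()
print(added_at, operators, queued)
# [3, 7, 11, 15, 19, 23, 27] 9 26
```

Under these assumptions the first operator is added three seconds into the scenario and the next four seconds later, as described above, and the scenario ends with nine operators, before the maximum of 10 is reached.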

Figure 11. Flowchart showing HTTP requests simulating external web traffic

Figure 12. How the number of calls and operators varies with time
in scenario 1
6. The demonstrator
The demonstrator is highly configurable, allowing a variety of
problems (and corresponding solutions) to be showcased.
These are:
- telesales-centric problems (i.e., the demonstrating user
can trigger telesales-related problems such as an overloaded
queue or multiple overloaded queues);
- e-commerce-centric problems (i.e., the demonstrating
user can trigger e-commerce-related problems such as an
overloaded process, which is the one described in this paper);
- telesales and e-commerce problems together (i.e.,
overloaded queues and processes), simply by clicking on
two problem scenarios in the GUI.
The highly configurable nature of this demonstrator
ensures that presentations can be tailored to meet the
differing interests of a variety of customers.
6.1 Multi-channel retail scenario
The SOI concept demonstrator shows how business processes
and applications can be created using virtualised resources.
Feedback suggests that this is valuable to those
demonstrating the principles of SOI and that it is easy to
understand and widely relevant.
Feedback also suggests that as a concept, virtualisation
is easily grasped but its significance to business in terms of
cost reduction, resilience and agility is less so. So this
demonstrator, which shows how a virtualised (flexible)
infrastructure supports business processes, provides a useful
portion of the narrative when explaining SOI and how it
helps support businesses during a time of high and
unpredictable demand.
6.1.1 Demonstration part 1: without SOI
Demonstrator Part 1 shows that with a fixed IT estate there
are limits on how much demand can be handled, across all
channels. The SLA information graph shows a breach,
indicating the infrastructure's inability to handle this peak in
demand and more importantly the business's failure to
maintain its side of an agreement. Because of paralysis in
the sales process, the subsequent processes (e.g., billing,
accounts, warehousing, dispatch and so on) do not receive
any sales input. As a result, the business is unable to
continue trading.

Figure 13. Dashboard showing the system under normal load

Figure 14. Illustrating the effect of overload on the sales process

Figure 15. Handling increase in demand for the sales process

Figure 16. Handling peak demand

Figure 17. Virtualised sales process relinquished
Demonstrator characteristics
- Fixed IT estate
- No automation
- Visibility
Stage 1
System normal – i.e., the business experiences normal
demand for its sales process. This is shown in figure 13. The
machine that appears lit is the server that hosts the sales
process – that is, the business's own IT estate.
Stage 2
Demand for the sales process increases to a level unmanageable
with only fixed IT estate supporting it (figure 14, graph A). The
SLA information graph indicates a breach (graph C), showing
that the level of demand on the sales process was enough to
paralyse all channels and prevent the ability to trade. Note that
no virtualised services are used (graph B).
6.1.2 Demonstration part 2: with SOI
Demonstrator characteristics
- Flexible IT estate
- Automation
- Visibility
Stage 1
System normal – i.e., the business experiences normal
demand for its sales process – same as figure 13.
Stage 2
Demand for the sales process increases, as can be seen in
figure 15, both on the e-commerce channel traffic monitor
and on the total load monitor in graph B.
The DMS is programmed to recognise this state as requiring a
proactive response, in this case, starting a virtualised sales
process to help load-balance demand. Graph C indicates a
container coming online to host that process. For
demonstration purposes, this is also represented graphically
(inset A).
Stage 3
Demand reaches its peak (figure 16, graph B). Accordingly,
virtualised processes continue to be started to help load-balance
that demand (graph C and graphically, inset A). Unlike
the first part of the demonstrator, the SLA information graph
(graph D) shows that at no point has the SLA been breached.
Due to the infrastructure's ability to flex, at no time and on no
sales channel was the business unable to trade.
Stage 4
Demand on the sales process from the e-commerce channel
decreases to the point where virtualised sales processes are
relinquished (figure 17, inset A, graph B and graph C). Graph
D shows that the SLA status was unaffected by running this
demonstration.
7. Conclusion
The demonstrator successfully showcases the key benefits
of SOI. When demand on the sales process increases
beyond the capacity of the current IT estate, automated
system optimisation can be observed. Stakeholder
feedback confirms the demonstrator is a useful sales and
marketing tool.
7.1 Observations
A representative definition of SOA is:
"...a framework for integrating business processes
and supporting IT infrastructure as secure,
standardized components (services) that can be
reused and combined to address changing business
priorities." [5]
Since one of our objectives was to establish whether
SOA really is a useful architectural approach, we tried to keep
to the spirit of this definition when building our
demonstrator. We found this difficult in the area of service
reuse. In theory, it sounds like an excellent idea, but it is a
little more difficult to put into practice. Reusable services
have to be sufficiently generic that they can be used outside
of their original purpose, but this involves significant effort
to implement. Of course, if the service ever does get reused,
this additional effort was probably justified, but in our
experience such reuse is rare. Furthermore,
reusable services tend by definition to be fine-grained,
which means that multiple services are required to perform
higher-level functions. This results in the additional
overhead of message flow between these services, thereby
reducing efficiency and speed.
Many of the services in the demonstrator were most
simply, quickly and efficiently implemented as custom
services. Indeed, efficiency is a consideration which seems
to get overlooked when evaluating SOA: under normal load
and network conditions a single dedicated, optimised
service will inevitably perform better than a generic,
dynamically-configured, distributed service. Care must
therefore be taken when deciding whether to use a process
utilising many low-level generic services rather than a
single optimised service. The former will likely be more
robust and will certainly adhere to SOA's reuse tenet, but
the latter will be significantly faster.
7.2 Improvements and further research
7.2.1 Green credentials
By incorporating sustainability audit information on the
dashboard, the green credentials of our demonstrator could
be enhanced. The concept of a flexible infrastructure whose
capacity rises and falls in line with demand is already a
'green' approach when compared to the over-provisioning
alternative. In addition, it is beneficial for companies to
measure and log the energy used by their servers (virtual or
otherwise). Server capacity is now a tradable commodity in
its own right.
Aside from the advantages of identifying where and how
much energy is used, information regarding a company's
carbon footprint can contribute to strategic plans.
7.2.2 Warm start to cold start
To achieve true energy saving, we need to be able to 'cold'
start the machines in an SOI. In the current implementation,
virtualised processes residing in software containers are
started on machines that are already powered on – that is, a
'warm start'. If our demonstrator is to possess full green
credentials, the machines should normally be off until
they are required (that is, operated in 'cold start' mode).
One possible next step is therefore to convert from
'warm start' to 'cold start', thus enhancing the
demonstrator's green credentials and completing the flexible
infrastructure narrative.
7.2.3 Sales metrics
While the benefits of SOI are clearly demonstrated, overall
sales metrics would be a useful addition to the dashboard.
Seeing sales start to fall during what should be a peak of
demand would demonstrate even more clearly that
infrastructure flexibility and real-time business process
visibility are essential prerequisites for an agile business.
7.2.4 Improved dashboarding
The current demonstrator implementation renders events
from the business process and IT layers as graphs and traces on
the dashboard. Research is needed to identify how to combine
events to produce information which the user will find helpful.
Events by themselves are not the only consideration: reports
and logs which indicate business health may also be available.
A system that can return specific reports in response to a user's
general request does not yet exist.
Demonstrating business process and IT event visibility in
real time has been well received. There are many events that
become useful as part of a bigger picture. Aggregation of
events in an intelligent way, producing graphs or reports, is an
area of future research.
Glossary
- Container: There are two kinds of container in the Sonic
ESB: Management Framework (MF) containers and ESB
containers. The latter, which contain ESB service
instances, are deployed into the former.
- Content Based Routing (CBR): XML is inspected using,
for example, XPath and the message routed to an
endpoint accordingly.
- Endpoint: An ESB Endpoint is an abstraction layer on top
of JMS queues and topics that enables them to be used
by services on the ESB without the service having to
know the underlying messaging model (i.e., point-to-point
for queue-based messaging or publish/subscribe
for topic-based messaging).
- ESB Service: An ESB service is a consumer of messages
that may or may not produce messages. Services
subscribe and optionally publish to ESB endpoints.
- Queue: A JMS queue is designed to support a point-to-point
messaging system. It is, however, helpful to
consider queues in terms of the publish/subscribe
messaging system: essentially, a queue can have
multiple publishers and subscribers, but each message
can only be consumed by a single subscriber. Each
message remains on a queue until a 'subscriber'
consumes it. Queues are only used directly when
connecting to the JMS layer that underpins the ESB;
services on the ESB access the queue via a mapped
endpoint.
- Sonic MQ: Sonic's implementation of the JMS (Java
Message Service) API. The Sonic ESB is built upon
Sonic MQ.
- Topic: A JMS topic is designed to support a
publish/subscribe messaging system. Conceptually, a
topic is essentially a queue (in the general sense) that
can have multiple publishers and subscribers, but each
message can be consumed by zero or more subscribers.
In contrast to queues, a topic does not store messages by
default unless a particular subscriber is marked as
'durable'. If it has no subscribers, any message published
to the topic is immediately lost. Topics are only used
directly when connecting to the JMS layer that
underpins the ESB; services on the ESB access the topic
via a mapped endpoint.
References
[1] Gartner Inc, 'Gartner says IT leaders must take ownership of the
business outcome, not just concentrating on the IT elements', Press
release, October 8, 2007 (accessed at
http://www.gartner.com/it/page.jsp?id=529408)
[2] Wittgreffe J and Warren P, 'Editorial', BT Technology Journal, vol.26,
no.1, September 2008
[3] Deans P and Wiseman R, 'SOI: technology and standards for
integration', BT Technology Journal, vol.26, no.1, September 2008
[4] Apama web site, http://www.progress.com/apama
[5] FiereWorks, 'SOA Definitions',
http://www.fiereworks.nl/soadefinitions.html
Paul Deans joined BT in 1984 and has
undertaken a wide variety of roles across many
technical areas, including analogue/digital
design, text-to-speech (BT's Laureate), 3D
avatar creation for BT's TalkZone in the
Millennium Dome, mobility, multimodal
portals and SOA. In those areas he has
performed application development, project
management and research roles. As a Senior
Researcher he leads research on SOA including
architectural design, implementation and
delivery of the SOI concept platform and
scenarios. He also offers thought leadership on
the Enterprise Service Bus and end-to-end
visibility dashboards. He is currently studying for the BT MSc.
Richard Wiseman is a Senior Researcher in
the IT Futures Centre at BT. He joined BT in
1996 with an MEng honours degree in
Electronic Systems Engineering from the
University of York. He worked for three
years researching pronunciation variation
for speech and speaker recognition,
followed by a few months simplifying the
process of rationalising BT's customers'
VPNs. Richard then worked for six years in
the area of multimodal systems, first
developing a portal that synchronises
different types of web content (including
voice and HTML), and subsequently
focusing on multimodal applications for mobile devices. Most recently, he
has worked in the areas of Service Oriented Architecture and Infrastructure
(SOA and SOI), undertaking component design and implementation for a
testbed and demonstrator to showcase BT's SOI technology.