Service oriented infrastructure (Part 2)
Adaptation of service-orientated infrastructure using right-time business intelligence
J Shiangoli and M Spott
The subject of this paper is an approach, based on Business Intelligence (BI), that helps Service Oriented Infrastructures (SOIs)
adapt dynamically as demand for services varies by providing decision support. The approach takes data from various sources
as and when it becomes available, collects it to give an historic view and applies intelligent data mining techniques to discover
any patterns that exist. Connected to appropriate service level monitoring technology, the BI system can learn to predict
business application performance based on both internal environment configuration changes and external influences. In a
closed loop approach, predictions can be used to change the IT infrastructure dynamically to optimise its operational
performance. We discuss how dependencies between controllable parameters and the performance metrics that underpin
customer service levels can be learned using data mining techniques and then explored using what-if analysis.
1. Introduction
Commercial organisations use a number of processes,
practices and activities to deliver their products and services.
The goal of our Business Intelligence (BI) based system is to
provide insight into the efficiency of these processes and a
set of concepts and methods that can improve business
decision making by leveraging fact-based support systems1.
The questions a BI system will typically be expected to
answer include the following:
- Are we delivering to our customers at the right time or is
there a delay and where is the delay occurring in our
activities?
- Are we getting returns and where are we failing in terms
of quality?
- In sales and marketing, are we responding to customer
needs and any marketing opportunities?
- What are the effects of pricing and promotion? Are we
able to respond quickly to a sudden change in demand?
- How well are our business-critical applications
performing currently?
- Are we able to react to external influences?
BI is applied at three levels in an organisation – strategic,
tactical and operational:
- Strategic BI systems support long-term business goals,
such as improving customer satisfaction by a certain
percentage within the following fiscal year, and are
targeted at the management and executive level.
- Tactical BI systems support ongoing improvements by
focusing on short-term initiatives. They involve shorter-term
decisions or actions that have to be adapted
according to both internal and external influences.
Primarily, they are used by departmental analysts to
infer business trends by studying changes in business
metrics over time.
- Operational BI systems support an organisation's day-to-day
decision-making processes, typically to increase
efficiency and improve customer service.
The vast majority of BI implementations focus on the
tactical and strategic levels and, until recently, operational BI
was rarely implemented. In particular, the operational
performance of the underlying IT infrastructure and the application software performance have not usually been
taken into account, although they are critical for the overall
success of a business. This area has provided the main focus
of our research and is the subject of this paper.
Business Performance Management2 (BPM), a subset of
BI, marries operational performance to the planning and
control cycle of an organisation. This approach, which allows
systems and processes to respond quickly and flexibly to
adverse or opportune events or developments, is also known
as closed loop operational BI.
Consider online trading, for example. As people's habits
change and are influenced by factors such as seasonal and
recreational events, the number of visits to the site will vary.
From time to time, the load on the site may become
disproportionately higher than normal, putting the
underlying IT system's resources under pressure.
Often these days, retailers and others outsource
provision of their online trading platforms, so the web hosting
service provider will want to predict the number of web site
visits and understand the impact that varying levels of loading
could have on the user's experience. Using this information,
they can adapt the service they supply to accommodate
increases – and, for that matter, reductions – in demand. This
reduces the chance that losses of availability or performance
will damage user satisfaction. Fewer sales will be lost as a
result of poor performance, the online trader's customers will
form a better impression of its service and the risk that its
brand will be damaged by performance problems is reduced.
However, traditional process-driven options for
adapting online trading platforms and other services to
changing demand are tedious and slow, leading to demands
for the adaptation process to be automated.
Typically, the goal will be to maintain availability and
performance in accordance with a well-specified Service
Level Agreement (SLA). Service providers will want to ensure
that they:
- meet the obligations detailed in the SLA and so avoid any
penalties;
- proactively spot and resolve issues such that SLA
breaches are avoided;
- identify the root causes of any issues that prevent
conformance to the SLA.
Hence it is of paramount importance that any problems
are recognised in advance and acted upon as quickly as
possible. A key requirement of an operational BI system is
therefore to forecast events that could impact performance
based on the historical data at its disposal. This allows it to be
used as a part of a Dynamic Application Service Management
(DASM) system that automates the configuration, activation
and scaling of enterprise applications in real time.
This paper introduces such an operational BI solution. It
forms part of a decision support system that can intelligently
adapt an SOI environment, taking data from sources as and
when it becomes available, storing it so that trends can be
identified and predicting factors relevant to the performance
of the hosted applications.
In a closed loop approach, our BI system's predictions
can be used to control any configurable component of the IT
infrastructure to optimise overall operational performance.
Using historic data, it automatically identifies any
relationships that exist between the infrastructure's
observed performance and the parameters specified in the
SLA. This allows the IT infrastructure to be configured
automatically in such a way that the customer requirements
are constantly met.
1 Fact-based support systems are sometimes also referred to as decision
support systems. They use facts that originate from information systems to
support decision making.
2 We define business performance management as a framework of
automated activities and systems that drives business performance. Metrics
based on the current financial state of the business are used to determine a
course of action that will improve the situation.
2. Scenario
The scenario we'll explore in this paper is that of a web
hosting provider whose platform supports a share trading
service. Our reasons for choosing this example are that the
content is volatile, transaction volumes are high and
significant swings are commonplace – for example, as a
result of emerging press information. Together, these factors
make usage very difficult to predict. Activity is also time
sensitive – for example, there will typically be bursts of
activity when markets open. Finally, there is a significant
element of complexity involved. The completion of
transactions can involve a number of back-end systems.
Orders may need to go through vetting procedures before
they are submitted to the market for execution, for instance.
The success of the service depends on the provider's
ability to design and implement an infrastructure – a
combination of hardware, networks and software – that
yields high performance, availability and reliability. The
property that determines the infrastructure's ability to adapt
as demand increases without putting performance at risk is
called its scalability. However, the ability to predict (and
therefore be able to make adjustments to accommodate)
demand variations is just as important to the infrastructure's
ultimate performance.
As we'll see, given the correct technology, usefully accurate
predictions are possible. The key is to understand the factors that influence use of the online trading service.
Bursts of activity that occur when markets open are
obviously predictable. Their extent will depend on external
market influences. Press releases can directly influence the
price of a stock, for example, so real-time information from a
web intelligence system might give an indication of the scale
and duration of a burst, as could page hit statistics from sites
such as Reuters.

Figure 1. An example web hosting provider architecture
The ability to respond quickly and effectively to
whatever indicators are available is key to the service
provider's success. A way of achieving this is the main topic
of this paper.
Remember, a web hosting service provider's objective is
to maintain its performance in accordance with relevant
SLAs. With this in mind, providers need to implement an
effective SLA-compliance strategy. This has many
advantages for both providers and their customers, including
reduced application downtime, improved service visibility,
achieving alignment of IT performance to business goals, and
ultimately being more competitive in their markets.
Figure 1 illustrates the challenge. The architecture of an
online trading system will include tiers of functionality such
as firewalls, load balancers, web servers, application servers
and back-end systems. As we'll see later in this paper, the use
of virtualisation techniques allows resources to be allocated
dynamically as demand for these key functions varies.
Performance will be managed through a hierarchy of
SLAs. The high-level SLA will be based on web page load
times, with what's deemed acceptable varying according to
the application and type of user. To understand how this SLA
relates to the performance of the underlying infrastructure,
we need to identify which resources transactions use and how
they use them. For example, we need to understand how
much CPU, memory and network capacity are used per transaction. Armed with these details, the high-level SLA can
be translated into a set of low-level SLAs that specify the
performance of individual resources. Examples of low-level
SLAs include the time it takes to serve a database request and
the network round trip time. The translation can be achieved
in an isolated test environment, although this rarely replicates
production conditions precisely. For this reason, the results of
the initial translation can only be considered to be a baseline
– a first iteration in the derivation of the relationships
between the high- and low-level SLAs. Later, in production,
we will apply machine learning techniques to reassess the
relationships. The granular breakdown of the SLA is important
not just because it allows us to decompose the high-level
objective into a set of low-level objectives, but because it
allows responsibility to be assigned to any third-party vendors
whose services form part of the overall solution. As the
environment changes and the application evolves, the use of
resources per transaction will change. The overall service
provider management system must therefore be able to
collect metrics via a monitoring system and provide decision
support data to a control system that will adapt the
infrastructure such that service levels are maintained.
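The paper does not prescribe a rule for the initial translation of the high-level SLA into low-level SLAs. The following is a minimal sketch of a first-iteration translation in Python. It assumes that page load time is the sum of its component times and that each component receives a budget in proportion to its share of the baseline measured in the test environment. The class `LowLevelSLA`, the function `derive_low_level_slas` and the baseline figures are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class LowLevelSLA:
    component: str
    budget_s: float  # maximum time allowed for this component, in seconds


def derive_low_level_slas(page_load_target_s, baseline_times_s):
    """Split a high-level page load target into per-component budgets.

    Assumption: each component receives a share of the target proportional
    to its share of the baseline time measured in the test environment.
    """
    total = sum(baseline_times_s.values())
    if total <= 0:
        raise ValueError("baseline times must sum to a positive value")
    return [
        LowLevelSLA(name, page_load_target_s * t / total)
        for name, t in baseline_times_s.items()
    ]


# Hypothetical baseline from an isolated test environment (seconds).
baseline = {"network time": 0.6, "server time": 0.9}
for sla in derive_low_level_slas(2.0, baseline):
    print(f"{sla.component}: {sla.budget_s:.2f} s")
```

In production, the proportional rule would give way to relationships learned from monitoring data, as described in section 3.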
The IT infrastructure needs to be modelled in a way that
allows low-level parameters or configurations to be changed
such that the high-level SLA goal is achieved at minimum
cost to the provider.
3. Predictive analytics in performance management
In the following, we describe a methodology that allows a
performance model of an application transaction to be built
based on factors and performance measures provided by a
monitoring system. We also discuss the business analysis
functions associated with such a model – that is, the setting
of performance targets, the prediction of performance, the
performance of what-if analyses and the optimisation of
levers and transactions in ways that improve the overall end-user
experience and thence customer satisfaction.
3.1 Defining business performance frameworks
Performance management functions are founded on profiles
of applications that allow their dependencies on component
applications and services and the relationships between
them to be discovered. In the scenario discussed previously,
the hardware, applications and services that contribute to
the end user's experience of the web application must be
identified and related, for example.
A suitable performance model can be developed as
follows:
- Decide which strategic factors are important to the
success of the service provider's strategy.
- Identify the variables that might contribute to each
strategic factor.
- Determine operational levers ('what can be controlled
within the hosting infrastructure') and external influences
('what cannot be controlled but influences performance').
- Decide how to measure the performance of each factor
(definition of metrics) including the unit of measurement.
- Model relationships between levers and factors in a
directed graph.
The result of this exercise is a directed graph with
functional relationships in the vertices which we call a
business performance framework (BPF). An example is
shown in figure 2.
In this example, the most important factors are web page
load time and cost. The former is the elapsed time to load a
web page, which is used as an indicator of user experience.
(Other or additional measures are, of course, conceivable.)

Figure 2. A business performance framework
As shown in the figure, web page load time is
determined by the time it takes for the application
transaction to traverse the network (network time) and the
processing time (server time). Additionally, it is determined
by the time the user's client takes to respond (client time) but
we have deliberately omitted this factor because it is out of
the web hosting provider's control.
Both network and server time can be broken down even
further. For example, network time is a function of latency
and available bandwidth. In turn, latency is a function of
forwarding delay, serialisation delay, distance delay, queue
delay and protocol delay. Similarly, server time is a function
of a number of resources such as available memory, available
CPU and disk queue length.
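The Python sketch below makes the decomposition of latency concrete by summing the five delay components named above. Only two of them are computed from first principles. Serialisation delay is packet size divided by link rate. Distance delay is path length divided by propagation speed, taken here as roughly 2 × 10^8 m/s in optical fibre. Forwarding, queue and protocol delays are passed in as if measured. All function names and example values are illustrative assumptions.

```python
# Approximate propagation speed in optical fibre (about two-thirds of c).
FIBRE_SPEED_M_PER_S = 2.0e8


def serialisation_delay_s(packet_bytes, link_bits_per_s):
    """Time to clock all of a packet's bits onto the link."""
    return packet_bytes * 8 / link_bits_per_s


def distance_delay_s(distance_m, speed_m_per_s=FIBRE_SPEED_M_PER_S):
    """Propagation delay over the physical path."""
    return distance_m / speed_m_per_s


def one_way_latency_s(packet_bytes, link_bits_per_s, distance_m,
                      forwarding_s, queue_s, protocol_s):
    """Latency as the sum of forwarding, serialisation, distance, queue and
    protocol delays; the three device-dependent terms are supplied as measured."""
    return (forwarding_s
            + serialisation_delay_s(packet_bytes, link_bits_per_s)
            + distance_delay_s(distance_m)
            + queue_s
            + protocol_s)


# Illustrative values: 1500-byte packet, 100 Mbit/s link, 1000 km path.
latency = one_way_latency_s(1500, 100e6, 1_000_000,
                            forwarding_s=0.0001, queue_s=0.002, protocol_s=0.0005)
print(f"{latency * 1000:.2f} ms")
```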
The number of requests is an external influence. In
general, external influences will relate to factors such as
current news, the state of the market and seasonal factors
like the weather.
The links between the metric boxes indicate the
functional relationships. For example:
server time = f(available memory, available CPU, disk queue length, number of servers)
When the framework is drawn, the details of a relationship
do not need to be known. It is sufficient
to add a link as an indication that there might be a
relationship. If the designer knows the relationship, he or she
can specify an equation. Otherwise, relationships can be
learned automatically from data using machine learning
techniques (as described below).
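A business performance framework can be held as a small data structure. Each metric records its inputs (incoming edges) and, optionally, a known equation. The Python sketch below is a hypothetical encoding of part of figure 2. Links drawn without an equation are listed as relationships still to be learned. Nodes that only ever appear as inputs are reported as leaf nodes, that is, levers or external influences. The additive page load time relation is an assumption, and 'latency' is treated as a leaf only to keep the fragment short.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Metric:
    inputs: list = field(default_factory=list)        # incoming edges
    relation: Optional[Callable[..., float]] = None   # None: learn from data


# Hypothetical fragment of figure 2; latency is treated as a leaf here.
bpf = {
    "page load time": Metric(["network time", "server time"],
                             relation=lambda net, srv: net + srv),
    "server time": Metric(["available memory", "available CPU",
                           "disk queue length", "number of servers"]),
    "network time": Metric(["latency", "available bandwidth"]),
}


def leaf_nodes(framework):
    """Nodes with no incoming edge: operational levers or external influences."""
    inputs = {i for metric in framework.values() for i in metric.inputs}
    return sorted(inputs - set(framework))


def to_be_learned(framework):
    """Links drawn without an equation, to be learned from data."""
    return sorted(name for name, m in framework.items() if m.relation is None)


print(leaf_nodes(bpf))
print(to_be_learned(bpf))
```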
3.2 Learning relationship and forecasting models
As discussed above, a relationship between linked measures
is represented by a function, which can be approximated by
models derived or learned from data.
Picking up from the example of the factor 'server time'
being a function of available memory, available CPU, disk
queue length and number of servers, we assume we have
collected time-stamped values for all factors. In a preprocessing
step, tuples (time stamp, server time, available
memory, available CPU, disk queue length, number of servers)
are formed to record data samples that evidence the unknown
relationship. If the data for different factors is not collected at
the same frequency or points in time, timestamps for the
sample data have to be defined and performance values for
them have to be computed using interpolation techniques.
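The preprocessing step can be sketched in Python with NumPy. The sketch assumes that linear interpolation is an acceptable choice, since the paper does not name a specific interpolation technique. The function `align_samples` and the sample data are hypothetical. The function forms the tuples described above on a common grid of time stamps.

```python
import numpy as np


def align_samples(sample_times, series):
    """Form (time stamp, value_1, value_2, ...) tuples on a common time grid.

    `series` maps each factor name to a pair (timestamps, values) collected at
    that factor's own frequency; timestamps must be increasing. Values are
    linearly interpolated onto `sample_times`. np.interp holds the end values
    constant outside the observed range rather than extrapolating.
    """
    columns = [np.interp(sample_times, ts, vs) for ts, vs in series.values()]
    return [(t, *row) for t, row in zip(sample_times, zip(*columns))]


# Hypothetical data: server time every 60 s, available CPU every 30 s.
series = {
    "server time": (np.array([0, 60, 120]), np.array([0.80, 1.00, 1.20])),
    "available CPU": (np.array([0, 30, 60, 90, 120]),
                      np.array([0.50, 0.45, 0.40, 0.35, 0.30])),
}
grid = np.array([0, 30, 60, 90, 120])
for row in align_samples(grid, series):
    print(row)
```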
Given a set of sample data, machine learning techniques
such as neural networks, support vector machine regression
and regression trees, or more traditional statistical
techniques like linear regression can be used to derive
models that approximate the unknown function.
In the case of machine learning techniques, models are
typically learned from a training subset of the sample data
and tested on a test subset. More refined techniques include
cross-validation, while techniques like linear regression can
make use of the entire sample dataset. However the model is
derived, its validity can be assessed by looking at statistics
like the mean or variance of the approximation error on the
sample data (or the test subset). Such statistics can be used
as an indication of the expected error on unseen data3.
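The Python sketch below is a minimal illustration of learning a relationship model and assessing it on a test subset. It fits the 'server time' relationship with ordinary least-squares linear regression, the simplest of the techniques listed, and reports the mean and variance of the error on held-out samples. The data is synthetic, generated from a made-up linear relationship plus noise, so the numbers say nothing about real servers.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for monitored samples; the "true" relationship is made up.
n = 200
X = np.column_stack([
    rng.uniform(1, 8, n),        # available memory (GB)
    rng.uniform(0.1, 0.9, n),    # available CPU (fraction)
    rng.uniform(0, 10, n),       # disk queue length
    rng.integers(2, 9, n),       # number of servers
])
true_w = np.array([-0.02, -0.5, 0.03, -0.05])
y = 1.5 + X @ true_w + rng.normal(0, 0.02, n)   # server time (s)


def fit_linear(X, y):
    """Ordinary least squares with an intercept column."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def predict(coef, X):
    return coef[0] + X @ coef[1:]


# Hold out a test subset to estimate the error on unseen data.
idx = rng.permutation(n)
train, test = idx[:150], idx[150:]
coef = fit_linear(X[train], y[train])
err = predict(coef, X[test]) - y[test]
print(f"mean error {err.mean():.4f}, error variance {err.var():.6f}")
```

Swapping in a regression tree or a neural network would change only `fit_linear` and `predict`. The train/test split and the error statistics would stay the same.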
A feature of this approach is that it allows a performance
value to be estimated based on the values of the related
measures at a given time. It does not matter whether the point
in time in question is the present, a point in the past or
fictitious – for example, if the user makes up a set of input
values to see what the consequences might be.
In contrast, forecasting uses both historic values and the
current value of a single performance measure to predict future
values for the same measure. As we explain in section 3.4, we
are particularly interested in forecasting values of an external
influence like number of requests in the example in figure 2.
3 For more about machine learning techniques, consult references on data
mining such as [5]. An approach to automate data mining to support non-experts
in building models is described in [6].
Again, machine learning techniques like neural networks can be
used to learn a model from historic data. Other forecasting
algorithms – the widely used ARIMA [7], for example – can
compute future values 'on the fly' based on historic and present
values without the need to prepare and store a model.
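The Python sketch below shows the forecasting idea with an autoregressive AR(p) model fitted by least squares. This is a deliberately simplified stand-in for ARIMA, which adds differencing and moving-average terms. A production system would more likely use an established implementation such as the ARIMA class in statsmodels. Keeping with the idea of computing forecasts 'on the fly', the model can simply be refitted from the stored history each time a forecast is needed. The request series is hypothetical and noise-free so that the fitted coefficients can be checked.

```python
import numpy as np


def fit_ar(series, p):
    """Least-squares fit of an AR(p) model: y_t = c + a_1*y_(t-1) + ... + a_p*y_(t-p)."""
    y = np.asarray(series, dtype=float)
    lags = np.array([y[t - p:t][::-1] for t in range(p, len(y))])  # newest lag first
    A = np.column_stack([np.ones(len(lags)), lags])
    coef, *_ = np.linalg.lstsq(A, y[p:], rcond=None)
    return coef  # [c, a_1, ..., a_p]


def forecast_ar(series, coef, steps):
    """Iterate the model forward, feeding each forecast back in as a lag."""
    p = len(coef) - 1
    history = [float(v) for v in series]
    out = []
    for _ in range(steps):
        recent = np.array(history[-p:][::-1])
        nxt = float(coef[0] + coef[1:] @ recent)
        history.append(nxt)
        out.append(nxt)
    return out


# Hypothetical, noise-free request counts per minute that settle towards 200.
requests = [100.0]
for _ in range(19):
    requests.append(0.8 * requests[-1] + 40.0)

coef = fit_ar(requests, p=1)
print(coef)
print(forecast_ar(requests, coef, steps=3))
```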
Both learning relationship models and forecasting models
can be extended by introducing additional information – day of
the week, month, season, year, trading sector and region, for
example. For instance, we could look at the performance
measures and relationships in figure 2 for different web
applications, different seasons and so on. For each value of
these attributes, the relationships might be different. The
number of requests or latency will depend on the web
application because the audience will be different and their
back-end systems might be located in different regions. Such
attributes are termed conformed dimensions (see section 4.2
and table 1). In accordance with the definition of conformed
dimensions, we assume that every factor (box) in a business
performance framework may depend on these attributes – in
other words, they are global attributes for the entire
framework. For the data, this means that, rather than having
timestamp-value pairs for each factor, we have to extend the
pairs to tuples that include a potentially large number of value
combinations of attributes for every time stamp. For instance,
at a single point in time, we would need to keep many different
values of 'server time' for different applications, transactions
and so on. Consequently, we will end up having different
relationship models – for example, one for each application or
transaction combination.
Alternatively, the models could take the attributes as
additional input values, so that a single model distinguishes
between attribute values. Such an extension comes with two benefits. In the case of
attributes like 'application' and 'transaction', models can be
used at different levels of aggregation, allowing users and
software to drill down from high-level views of performance to
explore smaller regions or more specific transactions. If
performance values depend on them as in the example of
'server time', attributes like 'day of week', 'season' and 'region'
can potentially improve the accuracy of models. The number of
requests to the web application, for instance, clearly shows
patterns regarding all three attributes. The volume of requests
differs depending on the day of the week, time of day and the
number of hits to related web sites (external influence).
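One common way to use an attribute such as day of week as an additional model input is one-hot (indicator) encoding. The Python sketch below assumes that encoding and fits a single linear model. In this model, each day type gets its own base level of requests, while the effect of hits to related web sites is shared. The alternative described earlier, one model per attribute value, would instead fit separate models to the weekday samples and to the weekend samples. The data, the `weekday`/`weekend` split and the function names are hypothetical.

```python
import numpy as np

DAY_TYPES = ["weekday", "weekend"]


def one_hot(values, categories):
    """Encode a categorical attribute as 0/1 indicator columns."""
    return np.array([[1.0 if v == c else 0.0 for c in categories] for v in values])


# Hypothetical samples: requests depend on day type and on hits to related sites.
day_type = ["weekday", "weekday", "weekend", "weekday", "weekend", "weekend"]
hits = np.array([100.0, 250.0, 120.0, 300.0, 80.0, 200.0])
base = {"weekday": 1000.0, "weekend": 400.0}
requests = np.array([base[d] for d in day_type]) + 2.0 * hits

# One model: indicator columns give a separate base level per day type
# (no global intercept, so the columns are not collinear), plus a shared slope.
X = np.column_stack([one_hot(day_type, DAY_TYPES), hits])
coef, *_ = np.linalg.lstsq(X, requests, rcond=None)


def predict_requests(day, hit_count):
    return float(one_hot([day], DAY_TYPES)[0] @ coef[:2] + coef[2] * hit_count)


print(coef)
print(predict_requests("weekend", 150.0))
```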
3.3 What-if analysis
The web hosting provider will be keen to know what it can
change in the application infrastructure (operational levers) or
what might happen to external influences in the future and
how such changes would affect overall SLA performance.
Playing with such scenarios is called what-if analysis. It requires
knowledge of how different performance metrics are
mathematically related, as was discussed in section 3.1.
The user will specify values for all the leaf nodes – that is,
for all the nodes in a performance framework that don't have
an incoming edge. Leaf nodes are either business levers or
external influences. The values for the external influences
represent events in the environment that surrounds the
online service. They can be based on the user's experience or
derived from data. Business levers will be set to values that
seem reasonable to control the business. By applying the
functional relationships between the nodes, the system will
then propagate the values from the leaf nodes through the
entire relationship graph to the root nodes that represent the
strategic performance measures. Typically, users will vary the
values of some of the leaf nodes to get a feel for their
influence on other nodes.
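The propagation step can be sketched in Python by visiting the nodes of the framework in topological order, so that every node is evaluated after all of its inputs. This sketch uses `graphlib.TopologicalSorter` from the Python standard library (Python 3.9 or later). The framework, its functions and its units are hypothetical placeholders for relationships that would be specified by a designer or learned from data.

```python
from graphlib import TopologicalSorter  # Python 3.9+

# Hypothetical framework: node -> (input nodes, function of those inputs).
# The functions and units are illustrative placeholders, not learned models.
framework = {
    "server time": (["number of requests", "number of servers"],
                    lambda req, servers: 0.2 + req / (500 * servers)),
    "network time": (["number of requests", "available bandwidth"],
                     lambda req, bw: 0.1 + req / (10 * bw)),
    "page load time": (["network time", "server time"],
                       lambda net, srv: net + srv),
}


def what_if(framework, leaf_values):
    """Propagate values set at the leaf nodes up through the graph to the roots."""
    graph = {node: inputs for node, (inputs, _) in framework.items()}
    values = dict(leaf_values)
    for node in TopologicalSorter(graph).static_order():
        if node in framework:
            inputs, fn = framework[node]
            values[node] = fn(*(values[i] for i in inputs))
        elif node not in values:
            raise KeyError(f"no value given for leaf node {node!r}")
    return values


for servers in (4, 8):
    result = what_if(framework, {"number of requests": 1000,
                                 "number of servers": servers,
                                 "available bandwidth": 800})
    print(servers, round(result["page load time"], 3))
```

Under these made-up relationships, doubling the number of servers from four to eight lowers the predicted page load time from 0.925 s to 0.675 s.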
3.4 Forecasting
The propagation mechanism of what-if analysis can also be
used to predict future values – that is, to forecast the value of
metrics higher up in the hierarchy based on projections of
values at leaf nodes. For instance, we could forecast the
impact on user experience that would result if the volume of
requests increased (external influence) and a change was
made to the environment, such as an increase in the number
of servers in a cluster (operational lever).
3.5 Target and lever optimisation
What-if analyses help model the behaviour of the
performance of a web application and can also be used to
respond to any predicted adverse events. However, web
service providers and other businesses usually want to be able
to take a top-down approach. Having set a high-level SLA
target, they need to know how to set the operational levers
that are available such that it is achieved.
The functional relationships all point upwards in the
hierarchy from operational levers and external influences to
strategic SLA targets. Such relationships are not one-to-one
mappings – that is, they cannot be inverted. Target values
cannot therefore be propagated down from strategic factors
to operational levers. In some cases, there will be several
combinations of values at leaf nodes that allow a strategic
target to be met. In others, there may be none. Managers
will, of course, be seeking the best solution, which turns the
task into an optimisation problem. As a result, an objective
function must be defined that represents quantities like cost
and utility and a heuristic search algorithm must typically be
employed to find the solution that will minimise or maximise
its value. For instance, given an assumption about the
external influence 'number of requests', we might look for
values to apply at other leaf nodes (such as the number of
servers and bandwidth) to ensure an SLA is achieved at least
95 per cent of the time (the target agreed with the customer)
at minimal cost (the objective function).
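The optimisation can be illustrated on a toy scale. The Python sketch below searches a small grid of lever settings (number of servers and bandwidth). It looks for the cheapest combination that meets a 1-second page load target in at least 95 per cent of a set of forecast demand scenarios. Because the grid is tiny, the sketch uses exhaustive search; realistic lever spaces would call for the heuristic search the text mentions. The performance model, cost function and scenarios are all hypothetical.

```python
import itertools


def page_load_time(requests, servers, bandwidth_mbps):
    """Hypothetical learned model: server time plus network time (seconds)."""
    server_time = 0.2 + requests / (500 * servers)
    network_time = 0.1 + requests / (10 * bandwidth_mbps)
    return server_time + network_time


def cost(servers, bandwidth_mbps):
    """Hypothetical objective function: charge per server and per Mbit/s."""
    return 50.0 * servers + 0.1 * bandwidth_mbps


def compliance(request_scenarios, servers, bandwidth_mbps, target_s):
    """Fraction of forecast demand scenarios in which the SLA target is met."""
    met = sum(page_load_time(r, servers, bandwidth_mbps) <= target_s
              for r in request_scenarios)
    return met / len(request_scenarios)


def optimise(request_scenarios, target_s, min_compliance=0.95):
    """Exhaustive search over a small lever grid; returns (cost, servers, bandwidth)."""
    best = None
    for servers, bw in itertools.product(range(1, 17), (200, 400, 800)):
        if compliance(request_scenarios, servers, bw, target_s) < min_compliance:
            continue
        c = cost(servers, bw)
        if best is None or c < best[0]:
            best = (c, servers, bw)
    return best  # None means no lever setting meets the target


# Forecast: 19 ordinary periods at 1000 requests and one spike at 5000.
scenarios = [1000] * 19 + [5000]
print(optimise(scenarios, target_s=1.0))
```

Under these assumptions, the cheapest compliant setting is four servers and 800 Mbit/s, at a cost of 280 in arbitrary units. The single demand spike is deliberately allowed to breach the target, because it falls within the 5 per cent tolerance.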
4. Business intelligence framework for SOI
As we have already explained, the service provider's
environment needs to adapt dynamically as services and
external influences change. The primary goal is the
maintenance of an applicable SLA, but providers will also
want to improve resource usage, which in turn will reduce
operating cost.
Traditionally, such systems would be subjected to ongoing
review by teams of experts who would manually translate SLAs
to system-level thresholds in an empirical manner.
In this section, we outline an implementation in which
we apply the modelling methods we have discussed using a
BI platform to translate SLAs to lower-level resource
requirements. These methods allow us to predict resource
usage such that the process of adapting a system or service
in a virtualised environment can be automated. Eliminating
manual involvement minimises the chance of service level
breaches.
4.1 Customer SLA discovery
To be able to use BI to predict adverse events and in turn
adapt service levels, we must first discover the factors within
the service provider platform that influence its performance
and the external influences that apply. We must also find out
how they relate to the high-level service agreements.
Consider the example of a simple online financial
application. The owner of the application may have a number
of partners. One partner may offer stock quotations and
online trading services. Others may offer commodity trading
services or investment banking. The financial application
service must therefore be mapped to a number of service
domains, each of which will have a service entry component
and a number of dependencies. These could be quantified
using learning techniques but, in the first instance, this is not
advisable because it would require the collection of all
possible dimensions and measures. Even when only internal
influences are considered, the numbers of dimensions and
measures involved can be large. Once external influences are
factored in, the situation can become significantly more
complex.
During the discovery phase, we therefore limit ourselves
to defining the scope of the BI system by defining the
relevant dimensions and measures. The exercise can be split
into two phases: a high-level analysis and a more-detailed
micro-level approach.
At the high level, we identify and build our
understanding of the key web services the service provider
offers its customers. We then break these down into
dependencies.
At the micro level, we define metrics for the usage of the
various transactions. Measures are proportioned across key
services in the infrastructure. This is particularly important in
a virtualised environment where hardware resources will be
shared across multiple services.
Depending on the size and complexity of an
organisation, the exercise can be a challenging one. Recent
developments in automated application discovery4 can help
automate the process.
4.2 SLA dependencies
The availability of a customer's applications is determined by
the performance of constituent transactions such as login and
order placement. The customer's applications are therefore
broken down into sets of key transactions and their
dependencies. These are then profiled to identify dependencies
in the IT infrastructure between components such as clients,
networks, application servers and database servers.
This process allows us to establish dimensions and
measures. Dimensions are business entities – location,
clients, applications, transactions, subnets, servers and so
on. Measures are quantities – the volume of network traffic,
elapsed time, utilisation, response time, available CPU, etc.
To allow third-party SLA monitoring tools to co-exist and
provide a BI environment in which we can evaluate cause and
effect, an integration framework is required. The basis of such a
framework is a matrix of SLA monitoring tools and dimensions
that allows us to align dimensions to one or more of these tools.
Using the framework, we integrate data sources across the
monitoring piece into a single conformed view – that is, a view
in which, instead of having duplication of the same dimension
member in multiple sources (SLA monitoring tools), we have a
single integrated view. This provides us with an end-to-end
'one truth' picture of measures of, for instance, an application
across the service provider infrastructure.
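A conformed view can be sketched as a mapping from each tool's own name for a dimension member to a single conformed member. Measures from different tools can then be merged under one key. In the Python sketch below, the monitoring system names follow those used elsewhere in this paper. The tool-specific member names, the measures and the values are hypothetical.

```python
# Hypothetical conformance table: (monitoring system, tool-specific member)
# -> conformed member of the 'application' dimension.
CONFORMED_APPLICATION = {
    ("User Experience", "ShareInfo-Web"): "Share Price Information",
    ("Server Health", "shareprice_app"): "Share Price Information",
    ("Network Performance", "HTTP/8080 shareprice"): "Share Price Information",
}


def conform(records):
    """Re-key raw (tool, member, measure, value) records on conformed members."""
    view = {}
    for tool, member, measure, value in records:
        conformed = CONFORMED_APPLICATION.get((tool, member))
        if conformed is None:
            raise KeyError(f"no conformed member for {member!r} from {tool!r}")
        view.setdefault(conformed, {})[measure] = value
    return view


raw = [
    ("User Experience", "ShareInfo-Web", "elapsed time (s)", 1.8),
    ("Server Health", "shareprice_app", "available CPU (%)", 35.0),
    ("Network Performance", "HTTP/8080 shareprice", "round trip time (ms)", 42.0),
]
print(conform(raw))
```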
To illustrate this, consider the HTTP application protocol.
To monitor a server's health, you might configure HTTP keep-alives5
or count connection timeouts6 or HTTP successful
responses. These measures will appear in the server health facts related to the dimension 'application' for the dimension
member 'HTTP'. (We use 'facts' here as another term for a
measure group.)
4 There are a number of application discovery tools in the market place.
These discover applications and IT infrastructure and identify resources and
application dependencies.
5 A browser will make multiple requests to retrieve an entire web page. To
enhance server performance, most web browsers request that the server
keep the connection open across these multiple requests – a feature known
as an 'HTTP keep-alive'. HTTP keep-alives are required for security and
connection-based authentication.

Table 1. Matrix linking conformed dimensions to SLA monitoring systems

Table 2. Example dimensions illustrating some possible hierarchies
Similarly, to assess network performance, you might
monitor traffic volume, percentage of bandwidth utilisation,
throughput and round trip time. These measures will appear
within the network health facts related to HTTP, which in turn
is a member of the application dimension. This gives the
database framework the ability to evaluate network
performance for a given server configuration change.
Table 1 shows a scaled-down example of a matrix to
illustrate the concept.
In the scenario under discussion, another example is the
search engine transaction which, for illustrative purposes,
we'll consider to be part of a share price information
application. Referring to table 2, the search engine
transaction would be a member of the 'Application'
dimension in the 'Application – Transaction' hierarchy for the
application 'Share Price Information'.
Dimension hierarchies allow us to organise the way we
view measures at different levels. For example, if we needed to
check the 'Elapsed Time' for a transaction for a given location
and for a particular day, an aggregate value could be retrieved
at this level. The aggregate function here would be maximum –
the most adverse value – for that particular day. The monitoring
system 'User Experience' would provide the measure 'Elapsed
Time' for the transaction. The 'Server Health' monitoring
system would provide the values of the performance counters
for this particular transaction on all the dependent servers
being monitored. And the 'Network Performance' and
'Application Performance' monitoring systems would provide
the given transaction's bandwidth usage and network latency.
This framework would provide both an overall view of the
health of the service for the transaction and details of the factors
and levers that control its availability and performance.
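Rolling measures up a dimension hierarchy, with the maximum as the aggregate function, can be sketched in Python as follows. Timestamps are rolled up to days. For each combination of location, day, application and transaction, the most adverse 'Elapsed Time' is kept. The fact records, including the 'Search' transaction name and the dates, are hypothetical.

```python
from datetime import datetime

# Hypothetical 'Elapsed Time' facts from the 'User Experience' system:
# (timestamp, location, application, transaction, elapsed time in seconds)
facts = [
    ("2009-03-02T08:01", "London", "Share Price Information", "Search", 1.2),
    ("2009-03-02T08:05", "London", "Share Price Information", "Search", 2.7),
    ("2009-03-02T09:30", "Paris", "Share Price Information", "Search", 1.9),
    ("2009-03-03T08:02", "London", "Share Price Information", "Search", 1.4),
]


def worst_elapsed_by_day(facts):
    """Roll timestamps up to days and keep the most adverse (maximum) value
    for each (location, day, application, transaction) combination."""
    worst = {}
    for ts, location, application, transaction, elapsed in facts:
        key = (location, datetime.fromisoformat(ts).date(), application, transaction)
        if key not in worst or elapsed > worst[key]:
            worst[key] = elapsed
    return worst


for key, value in worst_elapsed_by_day(facts).items():
    print(key, value)
```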
5. Architecture
Thus far, we have identified transactions with their respective
dependencies including IT infrastructure alignment and have
quantified these in terms of measures. We have also
established that service providers need to be able to predict
IT infrastructure resource usage and to translate low-level
metrics to high-level SLAs and vice versa. Because
transactions are being processed on an on-going basis,
predictions and metric translations have to be executed 'on
the fly' at the 'right time'. In other words, information needs
to be provided within a realistic timeframe if it is to be able to
support the use of an autonomic system. With this in mind,
the BI system architecture must meet requirements such as
those we discuss below.
The architecture consists of two parts:
- a web service provider architecture; and
- a BI system architecture, which is a subset of the web
service provider architecture.
5.1 Web service provider architecture
The web hosting provider's environment is dynamic and its
business needs are constantly changing. As a result, the key
prerequisites for its architecture are that it should be agile,
adaptable and efficient.
At a high level, the architecture will receive data from a
number of different systems. This will be consumed by other
components, so the architecture's primary function is to
allow event messages to be transmitted from different
sources to all relevant support systems.
With this in mind, Service Orientated Architecture (SOA)
is a key enabler. Within such an architecture, a web-services-based
application integration system is required that supports
message-flow functions such as transformation and routing
for messages from a number of systems. These may be within the service
provider's domain, in the customer's environment or
operated by third-party suppliers. Commercial solutions such
as Enterprise Service Bus (ESB) from Sonic [9] and the BizTalk
server from Microsoft [10] can provide the functionality
required.
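To illustrate the transformation and routing concepts only, the following Python sketch implements a toy in-process message bus. It is not the API of Sonic ESB or BizTalk; the class name MiniBus, the message fields and the 'server_health' source are hypothetical.

```python
from collections import defaultdict
from typing import Callable, Dict, List

Message = Dict[str, object]
Handler = Callable[[Message], None]
Transform = Callable[[Message], Message]


class MiniBus:
    """Toy bus: normalises messages per source, then routes them by type.

    Hypothetical illustration only; not the API of any commercial ESB.
    """

    def __init__(self) -> None:
        self._routes: Dict[str, List[Handler]] = defaultdict(list)
        self._transforms: Dict[str, Transform] = {}

    def subscribe(self, msg_type: str, handler: Handler) -> None:
        """Register a support system to receive every message of msg_type."""
        self._routes[msg_type].append(handler)

    def register_transform(self, source: str, fn: Transform) -> None:
        """Register the reformatting step applied to messages from source."""
        self._transforms[source] = fn

    def publish(self, source: str, raw: Message) -> int:
        """Transform a raw message and deliver it to all subscribers of its type.

        Returns the number of handlers that received the message.
        """
        msg = self._transforms.get(source, lambda m: m)(raw)
        handlers = self._routes.get(str(msg.get("type")), [])
        for handler in handlers:
            handler(msg)
        return len(handlers)


# Usage: a hypothetical 'Server Health' source reports CPU as a percentage string.
bus = MiniBus()
bus.register_transform(
    "server_health",
    lambda m: {"type": "counter", "name": "cpu", "host": m["host"],
               "value": float(str(m["cpu_pct"])) / 100.0},
)
etl_inbox: List[Message] = []
dss_inbox: List[Message] = []
bus.subscribe("counter", etl_inbox.append)   # e.g. the ETL system
bus.subscribe("counter", dss_inbox.append)   # e.g. the decision support system
delivered = bus.publish("server_health", {"host": "web01", "cpu_pct": "85"})
print(delivered, etl_inbox[0]["value"])  # 2 0.85
```

Keeping the transformation per source and the routing per message type means a new support system only has to subscribe, without any change to the systems that publish events.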
The flow of messages allows the components illustrated
in figure 3 to inform and execute changes to the infrastructure
and its configuration that ensure the web service functions at
an optimal level. Performance is measured by the service-level
monitoring system. It will gather end-to-end
performance metrics such as application transaction elapsed
time measured by the 'User Experience' system, bandwidth
throughput and latency from the 'Network Performance'
system and performance counters within the 'Server Health'
system. The application profile system would discover the
dependencies in the infrastructure for each given application.
This process is key to the success of the business
intelligence system as a decision support tool, because the
profile ensures that the information collected is correct, reliable,
relevant and complete. We need to clearly define what we
are monitoring and retrieve performance data from the
dependency hierarchy of the application in the infrastructure
as well as external market influences. This information would
inform the preparation of data for the data mining models
used within the business intelligence system. In practice,
external dependencies can be difficult to quantify, so
statistical techniques or machine learning algorithms must be
employed to learn them automatically from the historic data
stored in the events data warehouse.
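As a deliberately simple example of learning a dependency from historic data, the Python sketch below fits an ordinary least-squares line relating a hypothetical 'Server Health' counter (CPU utilisation) to transaction elapsed time. Simple linear regression and the sample values are our assumptions for illustration; real dependencies are often non-linear and involve many variables, and no particular algorithm is prescribed here.

```python
from typing import List, Tuple


def fit_line(xs: List[float], ys: List[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of y = a + b * x; returns (a, b)."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two paired observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


# Hypothetical history retrieved from the events data warehouse
cpu_util = [0.2, 0.4, 0.6, 0.8]    # 'Server Health' counter (fraction of capacity)
elapsed_s = [1.0, 1.5, 2.0, 2.5]   # 'User Experience' elapsed time in seconds
a, b = fit_line(cpu_util, elapsed_s)
print(f"elapsed ~ {a:.2f} + {b:.2f} * cpu")  # elapsed ~ 0.50 + 2.50 * cpu
```

Once learnt, the coefficients can be reused to estimate elapsed time for utilisation levels not yet observed, for example a + b * 0.9 for 90% CPU.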

Figure 3. An example service-orientated architecture
(The highlighted area is shown in more detail in figure 4)
Once the BI system is in place and its models are
continuously refreshed with the latest data, predictive
queries can be issued against it. Results returned
to the decision support system indicate the likely effect of
changes it initiates and the events most likely to result from
particular environmental conditions.
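A what-if query can be pictured as evaluating a learnt model under several candidate configurations. In the sketch below, predict_elapsed is a hypothetical stand-in for a mining model in which elapsed time grows linearly with load per server; its coefficients are invented for illustration.

```python
from typing import Dict, Iterable


def predict_elapsed(requests_per_s: float, servers: int,
                    base_s: float = 0.5, per_load_s: float = 0.02) -> float:
    """Hypothetical learnt model: elapsed time grows linearly with load per server.

    base_s and per_load_s stand in for coefficients a mining model would learn.
    """
    if servers < 1:
        raise ValueError("at least one server is required")
    return base_s + per_load_s * (requests_per_s / servers)


def what_if(requests_per_s: float, server_options: Iterable[int]) -> Dict[int, float]:
    """Predict elapsed time for each candidate server count."""
    return {n: predict_elapsed(requests_per_s, n) for n in server_options}


scenarios = what_if(requests_per_s=300.0, server_options=[2, 3, 4])
for n, t in scenarios.items():
    print(f"{n} servers -> predicted elapsed {t:.2f} s")
```

The decision support system could then compare these predictions with an SLA target before choosing which change to initiate.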
The decision support system is a policy-driven application for
dynamic application service management. Because the
policies it implements depend on both prevailing and
predicted environmental conditions, the web hosting
platform's resources can be adapted in real time to ensure
SLAs are met. This is achieved using both virtualised
platforms and right-time configuration of networks and
server resources.
The deployment or decommissioning of these resources
is managed by an orchestration system under the decision
support system's control.
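The policy-driven behaviour and its hand-off to orchestration might be sketched as follows. The thresholds (an 80% headroom factor against a hypothetical SLA target) and the Orchestrator class, which merely records requested actions, are illustrative assumptions rather than features of any particular product.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Conditions:
    current_elapsed_s: float    # prevailing value from 'User Experience' monitoring
    predicted_elapsed_s: float  # value returned by a predictive query to the BI system
    servers: int                # servers currently allocated to the transaction


@dataclass
class Policy:
    sla_elapsed_s: float        # hypothetical SLA target for elapsed time
    headroom: float = 0.8       # act once the worse value exceeds 80% of the target
    min_servers: int = 1


def decide(c: Conditions, p: Policy) -> str:
    """Return 'deploy', 'decommission' or 'none' from prevailing and predicted conditions."""
    worst = max(c.current_elapsed_s, c.predicted_elapsed_s)
    threshold = p.headroom * p.sla_elapsed_s
    if worst > threshold:
        return "deploy"
    if worst < 0.5 * threshold and c.servers > p.min_servers:
        return "decommission"
    return "none"


@dataclass
class Orchestrator:
    """Stand-in for the orchestration system: it only records the actions requested."""
    log: List[str] = field(default_factory=list)

    def execute(self, action: str, transaction: str) -> None:
        if action != "none":
            self.log.append(f"{action} server for {transaction}")


policy = Policy(sla_elapsed_s=3.0)
orchestrator = Orchestrator()
action = decide(Conditions(current_elapsed_s=1.8, predicted_elapsed_s=2.7, servers=2), policy)
orchestrator.execute(action, "Search Engine")
print(orchestrator.log)  # ['deploy server for Search Engine']
```

Acting on the worse of the prevailing and predicted values is one way to let a prediction trigger a change before the SLA is actually breached, in keeping with the closed-loop approach.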
5.2 Business intelligence system architecture
The objective of the BI system is to react to business needs at
the right time. By anticipating issues, it allows corrective action
to be taken before problems escalate or even become apparent.
The various parts of the architecture are as follows (see
figure 4).
5.2.1 Data acquisition
The data from the source systems is transformed into a
common XML format as it flows through the Web Services
Integration System (WSIS) illustrated in figure 3. The
advantage of using a common format is that the same data
can be consumed by several target applications, each of
which can reformat it to suit its own needs.
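The idea of a common format can be sketched with Python's standard xml.etree.ElementTree module. The element and attribute names (event, dimension, measure) form a hypothetical schema, not one defined by the WSIS products mentioned earlier.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Union


def to_common_xml(source: str, timestamp: str, dims: Dict[str, str],
                  measure: str, value: float) -> str:
    """Wrap one measurement from a source system in the hypothetical common format."""
    event = ET.Element("event", attrib={"source": source, "timestamp": timestamp})
    for name, member in dims.items():
        ET.SubElement(event, "dimension", attrib={"name": name}).text = member
    ET.SubElement(event, "measure", attrib={"name": measure}).text = str(value)
    return ET.tostring(event, encoding="unicode")


def read_common_xml(text: str) -> Dict[str, Union[str, float, Dict[str, str]]]:
    """Parse the common format back into a plain dictionary for a consuming application."""
    event = ET.fromstring(text)
    measure = event.find("measure")
    if measure is None or measure.text is None:
        raise ValueError("message has no measure")
    return {
        "source": event.get("source", ""),
        "dimensions": {d.get("name", ""): d.text or "" for d in event.findall("dimension")},
        "measure": measure.get("name", ""),
        "value": float(measure.text),
    }


msg = to_common_xml("user_experience", "2008-12-01T14:30",
                    {"Application": "Share Price Information",
                     "Transaction": "Search Engine", "Location": "London"},
                    "Elapsed Time", 3.4)
parsed = read_common_xml(msg)
print(parsed["value"], parsed["dimensions"]["Location"])  # 3.4 London
```

Each consumer only needs a reader for the one common format, rather than one parser per source system.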
Messages passing through the WSIS are usually routed in
as close to real time as possible. Given the volume of
messages involved, the available resources may often be
insufficient for significant data transformation and lookup. A
WSIS is therefore better suited to basic message
reformatting than to complex data transformation. The Extract,
Transform and Load (ETL) system illustrated in figure 4 must
generate the data needed by a given application profile or
business process from the messages it receives. Its task is to
read messages, import them into a relational database
management system and prepare the data, based on our
matrix/dimensional model, for loading into the multidimensional
On-Line Analytical Processing (OLAP) database
[8]. Analytical queries can then be performed using pre-aggregations
defined within the hierarchical dimensional model,
and data can be prepared for data mining models to use as
the basis of predictive analytics.
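The ETL step can be sketched with Python's built-in sqlite3 module standing in for the relational database management system. The XML layout, the table name fact_elapsed and the sample messages are hypothetical, and a daily maximum computed with GROUP BY stands in for the pre-aggregations an OLAP server would maintain.

```python
import sqlite3
import xml.etree.ElementTree as ET
from typing import Iterable, List, Tuple

Row = Tuple[str, str, str, str, float]


def extract(text: str) -> Row:
    """Extract (application, transaction, location, day, elapsed) from one message."""
    ev = ET.fromstring(text)
    dims = {d.get("name"): d.text for d in ev.findall("dimension")}
    value = ev.findtext("measure")
    if value is None:
        raise ValueError("message has no measure")
    day = ev.get("timestamp", "")[:10]   # keep YYYY-MM-DD for the day level
    return (dims["Application"], dims["Transaction"], dims["Location"], day, float(value))


def load(conn: sqlite3.Connection, messages: Iterable[str]) -> None:
    """Load extracted rows into a staging fact table."""
    conn.execute("CREATE TABLE IF NOT EXISTS fact_elapsed ("
                 "application TEXT, txn TEXT, location TEXT, day TEXT, elapsed_s REAL)")
    conn.executemany("INSERT INTO fact_elapsed VALUES (?, ?, ?, ?, ?)",
                     [extract(m) for m in messages])
    conn.commit()


def daily_max(conn: sqlite3.Connection) -> List[Row]:
    """Pre-aggregate to the transaction/location/day level using MAX."""
    return conn.execute(
        "SELECT application, txn, location, day, MAX(elapsed_s) FROM fact_elapsed "
        "GROUP BY application, txn, location, day ORDER BY application, txn, location, day"
    ).fetchall()


def message(ts: str, location: str, elapsed: float) -> str:
    """Build a sample message in the hypothetical common format."""
    return (f'<event timestamp="{ts}">'
            '<dimension name="Application">Share Price Information</dimension>'
            '<dimension name="Transaction">Search Engine</dimension>'
            f'<dimension name="Location">{location}</dimension>'
            f'<measure name="Elapsed Time">{elapsed}</measure></event>')


conn = sqlite3.connect(":memory:")
load(conn, [message("2008-12-01T09:00", "London", 1.2),
            message("2008-12-01T14:30", "London", 3.4),
            message("2008-12-01T10:15", "Leeds", 0.9)])
for row in daily_max(conn):
    print(row)
```

Doing this work in the ETL system rather than in the WSIS keeps the message flow fast while still delivering data shaped for analytical and predictive queries.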

Figure 4. Architecture of the business intelligence system
5.2.2 Data storage
There are several ways in which real-time OLAP can be
implemented, but such matters are beyond the scope of this
paper.
To ensure constant availability, each OLAP database will
be partitioned and duplicated. This allows one copy to
be updated while the other is used to support analytical
queries. Swap-overs can be scheduled so that the latency of the
learnt models, that is, how far they lag behind the latest data, stays at an acceptable level.
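The partition-and-duplicate scheme amounts to double buffering. The Python sketch below keeps two in-memory copies of a hypothetical aggregate store: queries always read the active copy, the loader writes to the standby copy, and swap() exchanges them. A lock makes each operation atomic; a real OLAP server would implement this very differently.

```python
import threading
from typing import Dict, List, Optional, Tuple

Key = Tuple[str, ...]


class SwappableCube:
    """Two copies of an aggregate store: one serves queries while the other is refreshed."""

    def __init__(self) -> None:
        self._copies: List[Dict[Key, float]] = [{}, {}]
        self._active = 0
        self._lock = threading.Lock()

    def query(self, key: Key) -> Optional[float]:
        """Read from the active copy only."""
        with self._lock:
            return self._copies[self._active].get(key)

    def update_standby(self, key: Key, value: float) -> None:
        """Write new data into the standby copy; queries do not see it until swap()."""
        with self._lock:
            self._copies[1 - self._active][key] = value

    def swap(self) -> None:
        """Make the refreshed copy active, then reseed the new standby from it."""
        with self._lock:
            self._active = 1 - self._active
            self._copies[1 - self._active] = dict(self._copies[self._active])


cube = SwappableCube()
key = ("Share Price Information", "Search Engine", "London", "2008-12-01")
cube.update_standby(key, 3.4)
print(cube.query(key))  # None: the update is not yet visible
cube.swap()
print(cube.query(key))  # 3.4
```

How often swap() is called sets the trade-off described above: more frequent swaps reduce how far the queried data lags behind incoming events, at the cost of more frequent reseeding of the standby copy.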
5.2.3 Data delivery
The objective is to feed the decision support system with data
about predicted events. To achieve this goal, the decision
support system makes requests via the ESB to the data mining
engine, which then executes both predictive and analytical
queries against a number of models (see figure 4). The results
returned allow the system to implement any required changes to the
infrastructure, for example by adapting the
resources in the customer's virtualised environment
dynamically to meet SLAs for a particular application
transaction, perhaps by adding or upgrading servers.
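The request/response exchange with the data mining engine can be sketched as follows. In the architecture described here the request travels over the ESB; in the sketch it is a direct method call, and MiningEngine, Request and both registered models are hypothetical stand-ins with invented coefficients.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Callable, Dict, List, Mapping

Model = Callable[[Mapping[str, float]], float]


@dataclass
class Request:
    models: List[str]          # predictive models to evaluate
    inputs: Dict[str, float]   # forecast or proposed conditions
    history_key: str           # series to summarise in the analytical query


class MiningEngine:
    """Toy data mining engine answering predictive and analytical queries."""

    def __init__(self, history: Mapping[str, List[float]]) -> None:
        self._history = history
        self._models: Dict[str, Model] = {}

    def register(self, name: str, model: Model) -> None:
        self._models[name] = model

    def handle(self, req: Request) -> Dict[str, float]:
        # Predictive part: evaluate each requested model on the supplied inputs.
        result = {name: self._models[name](req.inputs) for name in req.models}
        # Analytical part: summarise the historic series for context.
        series = self._history[req.history_key]
        result["historic_max"] = max(series)
        result["historic_mean"] = mean(series)
        return result


engine = MiningEngine({"Search Engine/London": [1.2, 3.4, 2.0]})
engine.register("elapsed_from_cpu", lambda x: 0.5 + 2.5 * x["cpu"])
engine.register("elapsed_per_server",
                lambda x: 0.5 + 0.02 * x["requests_per_s"] / x["servers"])
response = engine.handle(Request(
    models=["elapsed_from_cpu", "elapsed_per_server"],
    inputs={"cpu": 0.6, "requests_per_s": 300.0, "servers": 3.0},
    history_key="Search Engine/London",
))
print(response)
```

Returning predictions from several models alongside historic context lets the decision support system judge how unusual a predicted value is before it acts on it.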
6. Conclusion
We have shown how a business intelligence system can be
used to control an ICT infrastructure. The system predicts
developments within an organisation and in its external
environment, using its insights to recommend changes that
optimise operational performance and ensure compliance
with SLAs.
To illustrate the system's operation, we have explored
the example of a web service provider. We discussed how
frameworks can be developed to build relationship models
that relate operational performance to service level
agreements and outlined how such a solution could be
implemented at the architectural level. By retrieving
messages and events from different sources, the solution
provides both a platform for storing and applying predictive
data mining models and the means to learn them through
data acquisition.
The data sources discussed were primarily focused on
internal influences such as the design of the infrastructures
used by service providers, customers and third-party suppliers.
We touched on a relatively new area of research – the
use of a web intelligence system to adapt IT infrastructure in
response to external influences – which involves the analysis
of the ways in which users interact with web site pages and
other features. In this regard, external influences can have a
significant impact on the performance of a service provider's
infrastructure, so it is important to develop ways for the BI
system to measure them and take them into account.
We must also create a means of assessing the accuracy of
the mining models we use. This is important both during the
testing phase of an implementation, when the sample size
will be small, and later as the sample size grows and more
patterns and details become apparent. Ways of assessing the
accuracy of the BI system on an ongoing basis are another
important area of our research.
7. Acknowledgements
The authors would like to thank Chris Webb for his help in
reading and commenting on a draft of this paper.
References
- Kaplan RS and Norton DP, 'Balanced scorecard: translating strategy
into action', Harvard Business School Press, 1996
- Kaplan RS and Norton DP, 'Strategy maps: converting intangible assets
into tangible outcomes', Harvard Business School Press, 2004
- Golfarelli M, Rizzi S and Zella I, 'Beyond data warehousing: what's next
in business intelligence?', Proceedings of the 7th ACM International
Workshop on Data Warehousing and OLAP, Washington, 2004
- Martin W and Nußdorfer R, 'Pulse check: operational, tactical and
strategic BPM', White Paper, CSA Consulting GmbH, 2003
- Hand D, Mannila H and Smyth P, 'Principles of data mining', MIT Press,
2001
- Nauck D, Spott M and Azvine B, 'SPIDA – a novel data analysis tool', BT
Technology Journal, vol.21, no.4, 2003, pp.104-112
- Box GEP and Jenkins GM, 'Time series analysis: forecasting and
control', Holden-Day, San Francisco, 1970
- Lemire D, 'Data Warehousing and OLAP', http://www.daniellemire.com/OLAP
(accessed December 2008)
- The Sonic Enterprise Service Bus, Progress Software Corporation,
http://www.sonicsoftware.com
- Microsoft BizTalk Server, http://www.microsoft.com/biztalk/
John Shiangoli works in BT's centre for
information and security systems as a
Microsoft business intelligence solutions
specialist. Previously, he was lead design
developer for BT's Applications Assured
Infrastructure (AAI) dashboard. The tool
provided integrated reporting and analytics
based on information from a number of
application performance management tools.
Before coming to BT, John worked as a
reporting solutions specialist at Compuware
for a number of years, providing
enhancements and innovations to its
Vantage performance management suite and
winning awards including 'Best Led Field Developed Solution'. His current
interests include the introduction of business intelligence to SOI ('total ICT
orchestration') and transforming ICT operations (sustainability and ICT root
cause analysis). A graduate of University College London, he has experience
in implementing solutions for use in industries such as engineering,
insurance, logistics, health care and telecommunications.
After receiving an MSc in Mathematics from
the University of Karlsruhe in 1995, Martin
Spott continued working there as a researcher
in the university's computer science
department. On completing a PhD in
reasoning in fuzzy expert systems in 2001, he
joined BT where he now works as a principal
researcher in the company's intelligent
systems research centre. Martin's current
research interests include soft computing,
machine learning and data mining and their
application to real-world problems. He has
published numerous papers in these areas, is a
member of programme committees for
related conferences and has acted as a reviewer for related scientific
journals. Since joining BT, he has worked on intelligent data analysis projects
with goals including travel time prediction, complaint prevention and
automatic data analysis. More recently, he has worked on real-time business
intelligence tools.