Adaptation of service-orientated infrastructure using right-time business intelligence

Service oriented infrastructure (Part 2)


J Shiangoli and M Spott

The subject of this paper is an approach, based on Business Intelligence (BI), that helps Service Oriented Infrastructures (SOIs) adapt dynamically as demand for services varies by providing decision support. The approach takes data from various sources as and when it becomes available, collects it to give an historic view and applies intelligent data mining techniques to discover any patterns that exist. Connected to appropriate service level monitoring technology, the BI system can learn to predict business application performance based on both internal environment configuration changes and external influences. In a closed loop approach, predictions can be used to change the IT infrastructure dynamically to optimise its operational performance. We discuss how dependencies between controllable parameters and the performance metrics that underpin customer service levels can be learned using data mining techniques and then explored using what-if analysis.

1. Introduction

Commercial organisations use a number of processes, practices and activities to deliver their products and services. The goal of our Business Intelligence (BI) based system is to provide insight into the efficiency of these processes and a set of concepts and methods that can improve business decision making by leveraging fact-based support systems1. The questions a BI system will typically be expected to answer include the following:
- Are we delivering to our customers at the right time or is there a delay and where is the delay occurring in our activities?
- Are we getting returns and where are we failing in terms of quality?
- In sales and marketing, are we responding to customer needs and any marketing opportunities?
- What are the effects of pricing and promotion? Are we able to respond quickly to a sudden change in demand?
- How well are our business-critical applications performing currently?
- Are we able to react to external influences?
BI is applied at three levels in an organisation – strategic, tactical and operational:

- Strategic BI systems support long term business goals, such as improving customer satisfaction by a certain percentage within the following fiscal year, and are targeted at the management and executive level.
- Tactical BI systems support ongoing improvements by focusing on short term initiatives. They involve shorter-term decisions or actions that have to be adapted according to both internal and external influences. Primarily, they are used by departmental analysts to infer business trends by studying changes in business metrics over time.
- Operational BI systems support an organisation's day-to-day decision-making processes, typically to increase efficiency and improve customer service.

The vast majority of BI implementations focus at the tactical and strategic levels and, until recently, operational BI was rarely implemented. In particular, the operational performance of the underlying IT infrastructure and the application software performance have not usually been taken into account, although they are critical for the overall success of a business. This area has provided the main focus of our research and is the subject of this paper. Business Performance Management2 (BPM), a subset of BI, marries operational performance to the planning and control cycle of an organisation. This approach, which allows systems and processes to respond quickly and flexibly to adverse or opportune events or developments, is also known as closed loop operational BI.

Consider online trading, for example. As people's habits change and are influenced by factors such as seasonal and recreational events, the number of visits to the site will vary. From time to time, the load on the site may become disproportionately higher than normal, putting the underlying IT system's resources under pressure. Retailers and others now often outsource provision of their online trading platforms, so the web hosting service provider will want to predict the number of web site visits and understand the impact that varying levels of loading could have on the user's experience. Using this information, it can adapt the service it supplies to accommodate increases – and, for that matter, reductions – in demand. This reduces the risk that losses of availability or performance will damage user satisfaction, means fewer sales are lost as a result of poor performance, and protects both the online trader's brand and its customers' perceptions of the service.

However, traditional process-driven options for adapting online trading platforms and other services to changing demand are tedious and slow, leading to demands for the adaptation process to be automated.

Typically, the goal will be to maintain availability and performance in accordance with a well-specified Service Level Agreement (SLA). Service providers will want to ensure that they:

- meet the obligations detailed in the SLA, avoiding any penalties;
- proactively spot and resolve issues such that SLA breaches are avoided;
- identify the root causes of any issues that prevent conformance to the SLA.

Hence it is of paramount importance that any problems are recognised in advance and acted upon as quickly as possible. A key requirement of an operational BI system is therefore to forecast events that could impact performance based on the historical data at its disposal. This allows it to be used as a part of a Dynamic Application Service Management (DASM) system that automates the configuration, activation and scaling of enterprise applications in real time.

This paper introduces such an operational BI solution. It forms part of a decision support system that can intelligently adapt an SOI environment, taking data from sources as and when it becomes available, storing it so that trends can be identified and predicting factors relevant to the performance of the hosted applications.
In a closed loop approach, our BI system's predictions can be used to control any configurable component of the IT infrastructure to optimise overall operational performance. Using historic data, it automatically identifies any relationships that exist between the infrastructure's observed performance and the parameters specified in the SLA. This allows the IT infrastructure to be configured automatically in such a way that the customer requirements are constantly met.

1 Fact-based support systems are sometimes also referred to as decision support systems. They use facts that originate from information systems to support decision making.

2 We define business performance management as a framework of automated activities and systems that drives business performance. Metrics based on the current financial state of the business are used to determine a course of action that will improve the situation.

2. Scenario

The scenario we'll explore in this paper is that of a web hosting provider whose platform supports a share trading service. Our reasons for choosing this example are that the content is volatile, transaction volumes are high and significant swings are commonplace – for example, as a result of emerging press information. Together, these factors make usage very difficult to predict. Activity is also time sensitive – for example, there will typically be bursts of activity when markets open. Finally, there is a significant element of complexity involved. The completion of transactions can involve a number of back-end systems. Orders may need to go through vetting procedures before they are submitted to the market for execution, for instance. The success of the service depends on the provider's ability to design and implement an infrastructure – a combination of hardware, networks and software – that yields high performance, availability and reliability. The property that determines the infrastructure's ability to adapt as demand increases without putting performance at risk is called its scalability. However, the ability to predict (and therefore be able to make adjustments to accommodate) demand variations is just as important to the infrastructure's ultimate performance.

As we'll see, given the correct technology, usefully accurate predictions are possible. The key is to understand the factors that influence use of the online trading service. Bursts of activity that occur when markets open are obviously predictable. Their extent will depend on external market influences. Press releases can directly influence the price of a stock, for example, so real-time information from a web intelligence system might give an indication of the scale and duration of a burst, as could page hit statistics from sites such as Reuters.

Figure 1. An example web hosting provider architecture

The ability to respond quickly and effectively to whatever indicators are available is key to the service provider's success. A way of achieving this is the main topic of this paper.
Remember, a web hosting service provider's objective is to maintain its performance in accordance with relevant SLAs. With this in mind, providers need to implement an effective SLA-compliance strategy. This has many advantages for both providers and their customers, including reduced application downtime, improved service visibility, alignment of IT performance to business goals and, ultimately, greater competitiveness in their markets. Figure 1 illustrates the challenge. The architecture of an online trading system will include tiers of functionality such as firewalls, load balancers, web servers, application servers and back-end systems. As we'll see later in this paper, the use of virtualisation techniques allows resources to be allocated dynamically as demand for these key functions varies.

Performance will be managed through a hierarchy of SLAs. The high-level SLA will be based on web page load times, with what's deemed acceptable varying according to the application and type of user. To understand how this SLA relates to the performance of the underlying infrastructure, we need to identify which resources transactions use and how they use them. For example, we need to understand how much CPU, memory and network capacity are used per transaction. Armed with these details, the high-level SLA can be translated into a set of low-level SLAs that specify the performance of individual resources. Examples of low-level SLAs include the time it takes to serve a database request and the network round trip time. The translation can be achieved in an isolated test environment, although this rarely replicates production conditions precisely. For this reason, the results of the initial translation can only be considered to be a baseline – a first iteration in the derivation of the relationships between the high- and low-level SLAs. Later, in production, we will apply machine learning techniques to reassess the relationships. The granular breakdown of the SLA is important not just because it allows us to decompose the high-level objective into a set of low-level objectives, but because it allows responsibility to be assigned to any third-party vendors whose services form part of the overall solution. As the environment changes and the application evolves, the use of resources per transaction will change. The overall service provider management system must therefore be able to collect metrics via a monitoring system and provide decision support data to a control system that will adapt the infrastructure such that service levels are maintained.
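As a toy illustration of this translation, the sketch below decomposes a high-level page-load SLA into per-tier budgets and checks monitored values against them. The 2-second target, the budget shares (standing in for a baseline derived from an isolated test environment) and the measured values are all invented for illustration:

```python
# Hypothetical sketch: decompose a high-level page-load SLA into
# per-tier budgets, then check monitored values against them.

HIGH_LEVEL_SLA_SECONDS = 2.0  # assumed page-load target

# Baseline split derived from a test environment (assumed figures)
budget_share = {"network_time": 0.4, "server_time": 0.6}

low_level_sla = {tier: HIGH_LEVEL_SLA_SECONDS * share
                 for tier, share in budget_share.items()}

def breaches(measured: dict) -> list:
    """Return the tiers whose measured time exceeds their low-level budget."""
    return [tier for tier, limit in low_level_sla.items()
            if measured.get(tier, 0.0) > limit]

# Example: server tier over its 1.2s budget, network tier within its 0.8s budget
print(breaches({"network_time": 0.5, "server_time": 1.4}))  # ['server_time']
```

In production, the learned relationship models described in section 3 would replace the fixed budget shares as the baseline is iteratively refined.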

The IT infrastructure needs to be modelled in a way that allows low-level parameters or configurations to be changed such that the high-level SLA goal is achieved at minimum cost to the provider.

3. Predictive analytics in performance management

In the following, we describe a methodology that allows a performance model of an application transaction to be built based on factors and performance measures provided by a monitoring system. We also discuss the business analysis functions associated with such a model – that is, the setting of performance targets, the prediction of performance, the running of what-if analyses and the optimisation of levers and transactions in ways that improve the overall end-user experience and thence customer satisfaction.

3.1 Defining business performance frameworks
Performance management functions are founded on profiles of applications that allow their dependencies on component applications and services and the relationships between them to be discovered. In the scenario discussed previously, the hardware, applications and services that contribute to the end user's experience of the web application must be identified and related, for example.
A suitable performance model can be developed as follows:

- Decide which strategic factors are important to the success of the service provider's strategy.
- Identify the variables that might contribute to each strategic factor.
- Determine operational levers ('what can be controlled within the hosting infrastructure') and external influences ('what cannot be controlled but influences performance').
- Decide how to measure the performance of each factor (definition of metrics) including the unit of measurement.
- Model relationships between levers and factors in a directed graph.

The result of this exercise is a directed graph with functional relationships in the vertices which we call a business performance framework (BPF). An example is shown in figure 2.
In this example, the most important factors are web page load time and cost. The former is the elapsed time to load a web page, which is used as an indicator of user experience. (Other or additional measures are, of course, conceivable.)

Figure 2. A business performance framework

As shown in the figure, web page load time is determined by the time it takes for the application transaction to traverse the network (network time) and the processing time (server time). Additionally, it is determined by the time the user's client takes to respond (client time) but we have deliberately omitted this factor because it is out of the web hosting provider's control.

Both network and server time can be broken down even further. For example, network time is a function of latency and available bandwidth. In turn, latency is a function of forwarding delay, serialisation delay, distance delay, queue delay and protocol delay. Similarly, server time is a function of a number of resources such as available memory, available CPU and disk queue length.

The number of requests is an external influence. In general, external influences will relate to factors such as current news, the state of the market and seasonal factors like the weather.

The links between the metric boxes indicate the functional relationships. For example:

server time = f(available memory, available CPU, disk queue length, number of servers)

At the time the framework is drawn, the detail of the relationship does not need to be known. It is sufficient to add a link as an indication that there might be a relationship. If the designer knows the relationship, he or she can specify an equation. Otherwise, relationships can be learned automatically from data using machine learning techniques (as described below).
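A rough sketch of this idea: a framework can be represented as a directed graph whose leaf nodes hold observed values (levers and external influences) and whose derived nodes hold functions of their parents. The node names, values and the formula below are invented for illustration:

```python
# Minimal sketch of a business performance framework: a dict-based
# directed graph. Leaf nodes carry observed values; derived nodes
# carry a function of their parent nodes' values.

framework = {
    # leaf nodes: operational levers / external influences (assumed values)
    "available_cpu": {"value": 0.5},   # fraction of CPU free
    "num_servers":   {"value": 4},
    # derived node: a made-up relationship standing in for a learned one
    "server_time":   {"inputs": ["available_cpu", "num_servers"],
                      "fn": lambda cpu, n: 1.0 / (cpu * n)},
}

def evaluate(node: str) -> float:
    """Recursively evaluate a node by propagating leaf values upwards."""
    spec = framework[node]
    if "value" in spec:
        return spec["value"]
    args = [evaluate(parent) for parent in spec["inputs"]]
    return spec["fn"](*args)

print(evaluate("server_time"))  # 0.5 under the assumed values
```

Where a relationship is unknown, the lambda would be replaced by a model learned from data, as described in section 3.2.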

3.2 Learning relationship and forecasting models
As discussed above, a relationship between linked measures is represented by a function, which can be approximated by models derived or learned from data. Picking up from the example of the factor 'server time' being a function of available memory, available CPU, disk queue length and number of servers, we assume we have collected time-stamped values for all factors. In a preprocessing step, tuples (time stamp, server time, available memory, available CPU, disk queue length, number of servers) are formed to record data samples that evidence the unknown relationship. If the data for different factors is not collected at the same frequency or points in time, timestamps for the sample data have to be defined and performance values for them have to be computed using interpolation techniques. Given a set of sample data, machine learning techniques such as neural networks, support vector machine regression and regression trees or more-traditional statistical techniques like linear regression can be used to derive models that approximate the unknown function.

In the case of machine learning techniques, models are typically learned from a training subset of the sample data and tested on a test subset. More refined techniques include cross-validation, and techniques like linear regression can make use of the entire sample dataset. However the model is derived, its validity can be assessed by looking at statistics like the mean or variance of the approximation error on the sample data (or the test subset). Such statistics can be used as an indication of the expected error on unseen data3.
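A minimal illustration of this train/test procedure, using plain least squares on synthetic monitoring data; the relationship, its coefficients and the noise level are all made up:

```python
import random
import statistics

# Illustrative sketch: learn server_time = f(number of requests) by
# ordinary least squares, holding out a test subset to estimate the
# approximation error on unseen data. Synthetic data stands in for
# monitored samples; the 0.02x + 0.5 relationship is invented.

random.seed(0)
samples = [(x, 0.02 * x + 0.5 + random.gauss(0, 0.05)) for x in range(100)]
random.shuffle(samples)
train, test = samples[:70], samples[70:]

def fit(data):
    """Least-squares slope and intercept for (x, y) pairs."""
    xs, ys = zip(*data)
    mx, my = statistics.mean(xs), statistics.mean(ys)
    slope = (sum((x - mx) * (y - my) for x, y in data)
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx

slope, intercept = fit(train)
test_errors = [abs(slope * x + intercept - y) for x, y in test]
print(round(slope, 3), round(statistics.mean(test_errors), 3))
```

The mean absolute error on the held-out subset plays the role of the expected-error statistic described above.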

A feature of this approach is that it allows a performance value to be estimated based on the values of the related measures at a given time. It does not matter whether the point in time in question is the present, a point in the past or fictitious – for example, if the user makes up a set of input values to see what the consequences might be.

In contrast, forecasting uses both historic values and the current value of a single performance measure to predict future values for the same measure. As we explain in section 3.4, we are particularly interested in forecasting values of an external influence like number of requests in the example in figure 2.

3 For more about machine learning techniques, consult references on data mining such as [5]. An approach to automate data mining to support nonexperts in building models is described in [6].

Again, machine learning techniques like neural networks can be used to learn a model from historic data. Other forecasting algorithms – the extensively-used ARIMA [7], for example – can compute future values 'on the fly' based on historic and present values without the need to prepare and store a model.

Both learning relationship models and forecasting models can be extended by introducing additional information – day of the week, month, season, year, trading sector and region, for example. For instance, we could look at the performance measures and relationships in figure 2 for different web applications, different seasons and so on. For each value of these attributes, the relationships might be different. The number of requests or latency will depend on the web application because the audience will be different and their back-end systems might be located in different regions. Such attributes are termed conformed dimensions (see section 4.2 and table 1). In accordance with the definition of conformed dimensions, we assume that every factor (box) in a business performance framework may depend on these attributes – in other words, they are global attributes for the entire framework. For the data, this means that, rather than having timestamp-value pairs for each factor, we have to extend the pairs to tuples that include a potentially large number of value combinations of attributes for every time stamp. For instance, at a single point in time, we would need to keep many different values of 'server time' for different applications, transactions and so on. Consequently, we will end up having different relationship models – for example, one for each application or transaction combination.

Alternatively, the models could use the attributes as additional input values, differentiating between attribute values within a single model. Such an extension comes with two benefits. In the case of attributes like 'application' and 'transaction', models can be used at different levels of aggregation, allowing users and software to drill down from high-level views of performance to explore smaller regions or more specific transactions. And attributes like 'day of week', 'season' and 'region' can potentially improve the accuracy of models if performance values depend on them, as in the example of 'server time'. The number of requests to the web application, for instance, clearly shows patterns of this kind: the volume of requests differs depending on the day of the week, the time of day and the number of hits to related web sites (an external influence).
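As a much simpler stand-in for ARIMA-style methods, the following sketches how a forecast can be computed 'on the fly' from historic and present values using single exponential smoothing; the hourly request volumes and smoothing constant are hypothetical:

```python
# Single exponential smoothing as a one-step-ahead forecaster.
# Real deployments would use a proper time-series method such as
# ARIMA; this only illustrates forecasting without a stored model.

def ses_forecast(series, alpha=0.3):
    """Return the one-step-ahead forecast after smoothing the series."""
    level = series[0]
    for value in series[1:]:
        level = alpha * value + (1 - alpha) * level
    return level

requests = [100, 120, 110, 130, 125, 140]  # hypothetical hourly volumes
print(round(ses_forecast(requests), 1))    # 124.1
```

Larger values of `alpha` weight recent observations more heavily, which suits bursty series like the request volumes discussed in section 2.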

3.3 What-if analysis
The web hosting provider will be keen to know what it can change in the application infrastructure (operational levers) or what might happen to external influences in the future and how such changes would affect overall SLA performance. Playing with such scenarios is called what-if analysis. It requires knowledge of how different performance metrics are mathematically related, as was discussed in section 3.1. The user will specify values for all the leaf nodes – that is, for all the nodes in a performance framework that don't have an incoming edge. Leaf nodes are either business levers or external influences. The values for the external influences represent events in the environment that surrounds the online service. They can be based on the user's experience or derived from data. Business levers will be set to values that seem reasonable to control the business. By applying the functional relationships between the nodes, the system will then propagate the values from the leaf nodes through the entire relationship graph to the root nodes that represent the strategic performance measures. Typically, users will vary the values of some of the leaf nodes to get a feel for their influence on other nodes.
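A toy what-if sweep along these lines, varying one operational lever while holding an external influence fixed; the relationship and its coefficients are invented, standing in for a learned model:

```python
# Hypothetical what-if analysis: vary the number of servers (an
# operational lever) under a fixed request rate (an external
# influence) and propagate through an assumed relationship to the
# strategic measure, page load time.

def page_load_time(num_servers, requests_per_sec):
    server_time = requests_per_sec / (50.0 * num_servers)  # assumed model
    network_time = 0.3                                     # assumed constant
    return server_time + network_time

for n in (2, 4, 8):
    print(n, "servers ->", round(page_load_time(n, 200), 2), "s")
```

In the full framework, `page_load_time` would be the root-node propagation described above rather than a single hand-written function.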

3.4 Forecasting
The propagation mechanism of what-if analysis can also be used to predict future values – that is, to forecast the value of metrics higher up in the hierarchy based on projections of values at leaf nodes. For instance, we could forecast the impact on user experience that would result if the volume of requests increased (external influence) and a change was made to the environment, such as an increase in the number of servers in a cluster (operational lever).

3.5 Target and lever optimisation
What-if analyses help model the behaviour of the performance of a web application and can also be used to respond to any predicted adverse events. However, web service providers and other businesses usually want to be able to take a top-down approach. Having set a high-level SLA target, they need to know how to set the operational levers that are available such that it is achieved.

The functional relationships all point upwards in the hierarchy from operational levers and external influences to strategic SLA targets. Such relationships are not one-to-one mappings – that is, they cannot be inverted. Target values cannot therefore be propagated down from strategic factors to operational levers. In some cases, there will be several combinations of values at leaf nodes that allow a strategic target to be met. In others, there may be none. Managers will, of course, be seeking the best solution, which turns the task into an optimisation problem. As a result, an objective function must be defined that represents quantities like cost and utility and a heuristic search algorithm must typically be employed to find the solution that will minimise or maximise its value. For instance, given an assumption about the external influence 'number of requests', we might look for values to apply at other leaf nodes (such as the number of servers and bandwidth) to ensure an SLA is achieved at least 95 per cent of the time (the target agreed with the customer) at minimal cost (the objective function).
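The optimisation just described can be sketched as a search over candidate lever settings: pick the cheapest combination whose predicted performance meets the SLA target. A real system would use a learned model and a heuristic search rather than brute force, and the predictor, costs and target below are all assumed:

```python
# Sketch of lever optimisation as a constrained search over
# hypothetical (servers, bandwidth) settings.

SLA_TARGET = 1.0           # seconds (assumed page-load target)
FORECAST_REQUESTS = 300.0  # assumed external influence (requests/s)

def predicted_load_time(servers, bandwidth_mbps):
    # invented relationship standing in for a learned model
    return FORECAST_REQUESTS / (100.0 * servers) + 50.0 / bandwidth_mbps

def cost(servers, bandwidth_mbps):
    # invented objective function: cost of provisioned resources
    return 10.0 * servers + 0.5 * bandwidth_mbps

candidates = [(s, b) for s in range(1, 9) for b in (100, 200, 400)]
feasible = [c for c in candidates if predicted_load_time(*c) <= SLA_TARGET]
best = min(feasible, key=lambda c: cost(*c))
print(best)  # cheapest feasible setting: (6, 100)
```

For realistic search spaces, the exhaustive enumeration would be replaced by the heuristic search algorithm mentioned above.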

4. Business intelligence framework for SOI

As we have already explained, the service provider's environment needs to adapt dynamically as services and external influences change. The primary goal is the maintenance of an applicable SLA, but providers will also want to improve resource usage which in turn will reduce operating cost.

Traditionally, such systems would be subjected to ongoing review by teams of experts who would manually translate SLAs to system-level thresholds in an empirical manner. In this section, we outline an implementation in which we apply the modelling methods we have discussed using a BI platform to translate SLAs to lower-level resource requirements. These methods allow us to predict resource usage such that the process of adapting a system or service in a virtualised environment can be automated. Eliminating manual involvement minimises the chance of service level breaches.

4.1 Customer SLA discovery
To be able to use BI to predict adverse events and in turn adapt service levels, we must first discover the factors within the service provider platform that influence its performance and the external influences that apply. We must also find out how they relate to the high-level service agreements.

Consider the example of a simple online financial application. The owner of the application may have a number of partners. One partner may offer stock quotations and online trading services. Others may offer commodity trading services or investment banking. The financial application service must therefore be mapped to a number of service domains, each of which will have a service entry component and a number of dependencies. These could be quantified using learning techniques but, in the first instance, this is not advisable because it would require the collection of all possible dimensions and measures. Even when only internal influences are considered, the numbers of dimensions and measures involved can be large. Once external influences are factored in, the situation can become significantly more complex.

During the discovery phase, we therefore limit ourselves to defining the scope of the BI system by defining the relevant dimensions and measures. The exercise can be split into two phases: a high-level analysis and a more detailed micro-level approach. At the high level, we identify and build our understanding of the key web services the service provider delivers to its customers. We then break these down into dependencies.

At the micro level, we define metrics for the usage of the various transactions. Measures are proportioned across key services in the infrastructure. This is particularly important in a virtualised environment where hardware resources will be shared across multiple services. Depending on the size and complexity of an organisation, the exercise can be a challenging one. Recent developments in automated application discovery4 can help automate the process.

4.2 SLA dependencies
The availability of a customer's applications is determined by the performance of constituent transactions such as login and order placement. The customer's applications are therefore broken down into sets of key transactions and their dependencies. These are then profiled to identify dependencies in the IT infrastructure between components such as clients, networks, application servers and database servers.

This process allows us to establish dimensions and measures. Dimensions are business entities – location, clients, applications, transactions, subnets, servers and so on. Measures are quantities – the volume of network traffic, elapsed time, utilisation, response time, available CPU, etc. To allow third-party SLA monitoring tools to co-exist and provide a BI environment in which we can evaluate cause and effect, an integration framework is required. The basis of such a framework is a matrix of SLA monitoring tools and dimensions that allows us to align dimensions to one or more of these tools. Using the framework, we integrate data sources across the monitoring piece into a single conformed view – that is, a view in which, instead of having duplication of the same dimension member in multiple sources (SLA monitoring tools), we have a single integrated view. This provides us with an end-to-end 'one truth' picture of measures of, for instance, an application across the service provider infrastructure.
To illustrate this, consider the HTTP application protocol. To monitor a server's health, you might configure HTTP keep-alives5 or count connection timeouts or HTTP successful responses. These measures will appear in the server health facts related to the dimension 'application' for the dimension member 'HTTP'. (We use 'facts' here as another term for a measure group.)
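A hypothetical sketch of building such a conformed view, with two monitoring sources naming the same dimension member differently; all tool, member and measure names are invented:

```python
# Toy illustration of conforming a dimension: map each source's raw
# member name to a canonical member, then merge the facts from all
# sources under that single member.

canonical = {"http": "HTTP", "HTTP/1.1": "HTTP", "smtp": "SMTP"}

# (raw dimension member, facts) as reported by each monitoring tool
server_health = [("http", {"keepalives": 940})]
network_perf = [("HTTP/1.1", {"round_trip_ms": 42})]

conformed = {}
for source in (server_health, network_perf):
    for raw_name, facts in source:
        conformed.setdefault(canonical[raw_name], {}).update(facts)

print(conformed)  # one 'HTTP' member carrying facts from both tools
```

The `canonical` mapping plays the role of the matrix in table 1, aligning each dimension to the monitoring tools that report on it.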

4 There are a number of application discovery tools in the market place. These discover applications and IT infrastructure and identify resources and application dependencies.

5 A browser will make multiple requests to retrieve an entire web page. To enhance server performance, most web browsers request that the server keep the connection open across these multiple requests – a feature known as an 'HTTP keep-alive'. HTTP keep-alives are required for security and connection-based authentication.


Table 1. Matrix linking conformed dimensions to SLA monitoring systems

Table 2. Example dimensions illustrating some possible hierarchies

Similarly, to assess network performance, you might monitor traffic volume, percentage of bandwidth utilisation, throughput and round trip time. These measures will appear within the network health facts related to HTTP, which in turn is a member of the application dimension. This provides the database framework needed to evaluate the network performance for a given server configuration change. Table 1 shows a scaled-down example of a matrix to illustrate the concept.

In the scenario under discussion, another example is the search engine transaction which, for illustrative purposes, we'll consider to be part of a share price information application. Referring to table 2, the search engine transaction would be a member of the 'Application' dimension in the 'Application – Transaction' hierarchy for the application 'Share Price Information'.

Dimension hierarchies allow us to organise the way we view measures at different levels. For example, if we needed to check the 'Elapsed Time' for a transaction for a given location and for a particular day, an aggregate value could be retrieved at this level. The aggregate function here would be maximum – the most adverse value – for that particular day. The monitoring system 'User Experience' would provide the measure 'Elapsed Time' for the transaction. The 'Server Health' monitoring system would provide the values of the performance counters for this particular transaction on all the dependent servers being monitored. And the 'Network Performance' and 'Application Performance' monitoring systems would provide the given transaction's bandwidth usage and network latency. This framework would provide both an overall view of the health of the service for the transaction and details of the factors and levers that control its availability and performance.
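The aggregation described above might be sketched as follows, rolling hypothetical transaction-level 'Elapsed Time' records up to the most adverse value per location and day:

```python
from collections import defaultdict

# Toy roll-up along a dimension hierarchy: aggregate transaction-level
# elapsed times (seconds) to one value per (location, day) using max,
# the most adverse value. Records are invented for illustration.

records = [
    ("London", "2024-03-01", 1.2),
    ("London", "2024-03-01", 1.9),
    ("London", "2024-03-02", 0.8),
    ("Leeds",  "2024-03-01", 1.1),
]

worst = defaultdict(float)
for location, day, elapsed in records:
    worst[(location, day)] = max(worst[(location, day)], elapsed)

print(worst[("London", "2024-03-01")])  # 1.9
```

Other aggregate functions (mean, percentile) could be substituted depending on how the SLA defines acceptable performance at each level.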

5. Architecture

Thus far, we have identified transactions with their respective dependencies including IT infrastructure alignment and have quantified these in terms of measures. We have also established that service providers need to be able to predict IT infrastructure resource usage and to translate low-level metrics to high-level SLAs and vice versa. Because transactions are being processed on an on-going basis, predictions and metric translations have to be executed 'on the fly' at the 'right time'. In other words, information needs to be provided within a realistic timeframe if it is to be able to support the use of an autonomics system. With this in mind, the BI system architecture must meet requirements such as those we discuss below.
The architecture consists of two parts:

- a web service provider architecture; and
- a BI system architecture, which is a subset of the web service provider architecture.

5.1 Web service provider architecture
The web hosting provider's environment is dynamic and its business needs are constantly changing. As a result, the key prerequisites for its architecture are that it should be agile, adaptable and efficient.
At a high level, the architecture will receive data from a number of different systems. This data will be consumed by other components, so the architecture's primary function is to allow event messages to be transmitted from different sources to all relevant support systems. With this in mind, Service Orientated Architecture (SOA) is a key enabler. Within such an architecture, a web-services-based application integration system is required that supports flow-related concepts such as transformations and routing of messages from a number of systems. These may be within the service provider's domain, in the customer's environment or operated by third-party suppliers. Commercial solutions such as Enterprise Service Bus (ESB) from Sonic [9] and the BizTalk server from Microsoft [10] can provide the functionality required.
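The integration pattern can be illustrated with a toy message bus in Python; `MiniBus`, the topic name and the message fields are invented for illustration and stand in for a commercial ESB such as those cited above:

```python
from collections import defaultdict

class MiniBus:
    """Toy message bus: routes events from sources to all subscribed
    target systems, applying an optional per-route transformation."""
    def __init__(self):
        self.routes = defaultdict(list)  # topic -> [(transform, handler)]

    def subscribe(self, topic, handler, transform=None):
        self.routes[topic].append((transform or (lambda m: m), handler))

    def publish(self, topic, message):
        # Each subscriber receives its own (possibly reformatted) copy.
        for transform, handler in self.routes[topic]:
            handler(transform(message))

bus = MiniBus()
received = []
# Two consumers of the same event, one taking it verbatim and one
# reformatting it to suit its own needs.
bus.subscribe("server.health", received.append)
bus.subscribe("server.health",
              received.append,
              transform=lambda m: {**m, "cpu_pct": round(m["cpu_pct"])})
bus.publish("server.health", {"host": "web01", "cpu_pct": 87.6})
print(len(received))  # 2
```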

The flow of messages allows the components illustrated in figure 3 to inform and execute changes to the infrastructure and its configuration that ensure the web service functions at an optimal level. Performance is measured by the service-level monitoring system, which gathers end-to-end performance metrics such as application transaction elapsed time from the 'User Experience' system, bandwidth throughput and latency from the 'Network Performance' system and performance counters from the 'Server Health' system. The application profile system discovers the dependencies in the infrastructure for each given application. This process is key to the success of the business intelligence system as a decision support tool, since the profile enables the collection of correct, reliable, relevant and complete information. We need to define clearly what we are monitoring and retrieve performance data from the application's dependency hierarchy in the infrastructure as well as from external market influences. This information informs the preparation of data for the data mining models used within the business intelligence system. In practice, external dependencies can be difficult to quantify, so statistical techniques or machine learning algorithms must be employed to learn them automatically from the historic data stored in the events data warehouse.
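In the simplest case, such a learnt dependency is a regression model fitted to historic observations. The Python sketch below, using invented sample data, fits an ordinary least-squares line relating load to transaction elapsed time and uses it for what-if prediction; a production system would of course use the richer data mining techniques discussed earlier:

```python
# Hypothetical historic samples relating a controllable parameter
# (concurrent sessions) to a performance metric (elapsed time in ms).
history = [(10, 210), (20, 310), (30, 405), (40, 515), (50, 600)]

n = len(history)
mean_x = sum(x for x, _ in history) / n
mean_y = sum(y for _, y in history) / n

# Ordinary least-squares fit: the learnt dependency between the two.
slope = (sum((x - mean_x) * (y - mean_y) for x, y in history)
         / sum((x - mean_x) ** 2 for x, _ in history))
intercept = mean_y - slope * mean_x

def predict_elapsed(sessions):
    """What-if query: predicted elapsed time under a given load."""
    return intercept + slope * sessions
```

Refreshing the model as new events arrive is then simply a matter of re-fitting against the growing history in the events data warehouse.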

Figure 3. An example service-orientated architecture
(The highlighted area is shown in more detail in figure 4)

Once the BI system is in place and the models are continuously being refreshed with the latest data, predictive expressions can be issued by the BI system. Results returned to the decision support system will establish the effect of changes it initiates and the events most likely to result from particular environment conditions.

The decision support system is a policy-driven dynamic application service management application. Because the policies it implements depend on both prevailing and predicted environmental conditions, the web hosting platform's resources can be adapted in real time to ensure SLAs are met. This is achieved using both virtualised platforms and right-time configuration of networks and server resources.
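A minimal sketch of such a policy, with invented thresholds and function names, might look like this in Python:

```python
def plan_action(predicted_elapsed_ms, sla_ms, current_servers, max_servers=10):
    """Hypothetical policy: add a server when the predicted transaction
    elapsed time would breach the SLA, remove one when there is ample
    headroom, and otherwise leave the virtualised pool unchanged."""
    if predicted_elapsed_ms > sla_ms and current_servers < max_servers:
        return current_servers + 1
    if predicted_elapsed_ms < 0.5 * sla_ms and current_servers > 1:
        return current_servers - 1
    return current_servers

print(plan_action(950, sla_ms=800, current_servers=3))  # 4
```

The important point is that the policy acts on predicted, not just prevailing, conditions, so resources can be adapted before the SLA is actually breached.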

The deployment or decommissioning of these resources is managed by an orchestration system under the decision support system's control.

5.2 Business intelligence system architecture
The objective of the BI system is to react to business needs at the right time. By anticipating issues, corrective action can be taken before problems escalate or even become apparent. The various parts of the architecture are as follows (see figure 4).

5.2.1 Data acquisition
The data from the source systems is transformed into a common XML format as it flows through the Web Services Integration System (WSIS) illustrated in figure 3. The advantage of using a common format is that the same data can be consumed by several target applications, each of which can reformat it to suit its own needs. Messages passing through the WSIS are usually routed in as close to real time as possible. Given the volume of messages involved, the resources available may often not be sufficient for significant data transformation and lookup, so a WSIS is better suited to basic message reformatting than to complex data transformation. The Extract, Transform and Load (ETL) system illustrated in figure 4 must generate the data needed by a given application profile or business process from the messages it receives. Its task is to read messages, import them into a relational database management system and, based on our matrix/dimensional model, prepare the data for loading into the multidimensional On-Line Analytical Processing (OLAP) database [8]. This allows analytical queries to be answered using pre-aggregations within the hierarchical dimensional model and provides data that the data mining models can use as the basis of predictive analytics.
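As an illustration of the ETL system's first step, the following Python sketch flattens one hypothetical common-format XML event into a relational staging row; the element and attribute names are invented:

```python
import xml.etree.ElementTree as ET

# Hypothetical common-format event as it might leave the WSIS.
raw = """<event system="UserExperience">
  <transaction>search</transaction>
  <metric name="ElapsedTime" unit="ms">530</metric>
</event>"""

def to_row(xml_text):
    """Flatten one XML event into a row suitable for relational staging."""
    root = ET.fromstring(xml_text)
    metric = root.find("metric")
    return {
        "system": root.get("system"),
        "transaction": root.findtext("transaction"),
        "metric": metric.get("name"),
        "value": float(metric.text),
    }

print(to_row(raw)["value"])  # 530.0
```

From the staging tables, the same rows feed both the OLAP pre-aggregations and the training data for the mining models.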

Figure 4. Architecture of the business intelligence system

5.2.2 Data storage
There are several ways in which real-time OLAP can be implemented but such matters are beyond the scope of the paper.
To ensure constant availability, each OLAP database will be partitioned and duplicated. This allows one copy to be updated while the other supports analytical queries. Swap-overs can be scheduled to keep the latency of the data behind the learnt models at an acceptable level.
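The partition-and-duplicate scheme can be sketched as a double-buffered store in Python; this is a deliberate simplification of a real OLAP partitioning strategy, with class and key names invented for illustration:

```python
class DoubleBufferedStore:
    """Two copies of an aggregate store: queries read the 'live' copy
    while the 'standby' copy is refreshed, then the roles are swapped."""
    def __init__(self):
        self.copies = [{}, {}]
        self.live = 0

    def refresh(self, new_data):
        standby = 1 - self.live
        self.copies[standby] = dict(new_data)  # rebuild the offline copy
        self.live = standby                    # swap-over: readers move across

    def query(self, key):
        return self.copies[self.live].get(key)

store = DoubleBufferedStore()
store.refresh({"search/London": 530})
print(store.query("search/London"))  # 530
```

Queries never observe a half-loaded copy: the swap is a single assignment, so the scheduled swap-over interval directly bounds the data latency.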

5.2.3 Data delivery
The objective is to feed the decision support system with data about predictive events. To achieve this goal, the decision support system makes requests via the ESB to the data mining engine which will then execute both predictive and analytical queries against a number of models (see figure 4). The results returned will allow the system to implement changes to the infrastructure that are required – for example, to adapt the resources in the customer's virtualised environment dynamically to meet SLAs for a particular application transaction, perhaps by adding or upgrading servers.

6. Conclusion

We have shown how a business intelligence system can be used to control an ICT infrastructure. The system predicts developments within an organisation and in its external environment, using its insights to recommend changes that optimise operational performance and ensure compliance with SLAs.

To illustrate the system's operation, we have explored the example of a web service provider. We discussed how frameworks can be developed to build relationship models that relate operational performance and service level agreements and outlined how such a solution could be implemented at the architectural level. By retrieving messages and events from different sources, the solution provides both a platform for storing and applying predictive data mining models and the means to learn them through data acquisition.

The data sources discussed were primarily focused on internal influences such as the design of the infrastructures used by service providers, customers and third-party suppliers.

We touched on a relatively new area of research – the use of a web intelligence system to adapt IT infrastructure in response to external influences – which involves the analysis of the ways in which users interact with web site pages and other features. In this regard, external influences can have a big impact on the performance of a service provider's infrastructure, so it's important to develop ways for the BI system to measure them and take them into account.

We must also create a means of assessing the accuracy of the mining models we use. This is important both during the testing phase of an implementation, when the sample size will be small, and later as the sample size grows and more patterns and details become apparent. Ways of assessing the accuracy of the BI system on an ongoing basis are another important area of our research.

7. Acknowledgements

The authors would like to thank Chris Webb for his help in reading and commenting on a draft of this paper.


  1. Kaplan RS and Norton DP, 'Balanced scorecard: translating strategy into action', Harvard Business School Press, 1996
  2. Kaplan RS and Norton DP, 'Strategy maps: converting intangible assets into tangible outcomes', Harvard Business School Press, 2004
  3. Golfarelli M, Rizzi S and Zella I, 'Beyond data warehousing: what's next in business intelligence?', Proceedings of the 7th ACM International Workshop on Data Warehousing and OLAP, Washington, 2004
  4. Martin W and Nußdorfer R, 'Pulse check: operational, tactical and strategic BPM', White Paper, CSA Consulting GmbH, 2003
  5. Hand D, Mannila H and Smyth P, 'Principles of data mining', MIT Press, 2001
  6. Nauck D, Spott M and Azvine B, 'SPIDA – a novel data analysis tool', BT Technology Journal, vol.21, no.4, 2003, pp.104-112
  7. Box GEP and Jenkins GM, 'Time series analysis: forecasting and control', Holden-Day, San Francisco, 1970
  8. Lemire D, 'Data Warehousing and OLAP', http://www.daniellemire.com/OLAP (accessed December 2008)
  9. The Sonic Enterprise Service Bus, Progress Software Corporation
  10. Microsoft BizTalk Server, Microsoft Corporation

John Shiangoli works in BT's centre for information and security systems as a Microsoft business intelligence solutions specialist. Previously, he was lead design developer for BT's Applications Assured Infrastructure (AAI) dashboard. The tool provided integrated reporting and analytics based on information from a number of application performance management tools. Before coming to BT, John worked as a reporting solutions specialist at Compuware for a number of years, providing enhancements and innovations to its Vantage performance management suite and winning awards including 'Best Led Field Developed Solution'. His current interests include the introduction of business intelligence to SOI ('total ICT orchestration') and transforming ICT operations (sustainability and ICT root cause analysis). A graduate of University College London, he has experience in implementing solutions for use in industries such as engineering, insurance, logistics, health care and telecommunications.

After receiving an MSc in Mathematics from the University of Karlsruhe in 1995, Martin Spott continued working there as a researcher in the university's computer science department. On completing a PhD in reasoning in fuzzy expert systems in 2001, he joined BT where he now works as a principal researcher in the company's intelligent systems research centre. Martin's current research interests include soft computing, machine learning and data mining and their application to real-world problems. He has published numerous papers in these areas, is a member of programme committees for related conferences and has acted as a reviewer for related scientific journals. Since joining BT, he has worked on intelligent data analysis projects with goals including travel time prediction, complaint prevention and automatic data analysis. More recently, he has worked on real-time business intelligence tools.
