2012 December 10 | 11:17 am

Business Goal

Reduce the mean time to determine the root cause of a performance problem with an application middleware resource.

Technical Goals

  1. Performance Monitoring service providers will update, in real time, the health and performance of an application’s middleware resource components.
  2. Performance Monitoring consumers will query the service provider directly for data about the resource, or they may use a yellow pages-style search for the resource of interest, for example against a central registry that provides lookup services and returns the locations of monitored resources and their providers.
  3. Consumers of the application's health data can dynamically determine the health of its components. Whenever a user hovers over the icon representing the middleware resource, an OSLC client dynamically determines the current providers and federates their data about the resource into a UI preview.
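A minimal Python sketch of the federation step in goal 3, assuming each provider can be modeled as a function that returns a flat dictionary of metric values for a resource URI. The provider names, metric names, and the `federate` helper are hypothetical; a real OSLC client would issue an HTTP request to each provider's URL instead of calling a local function.

```python
from typing import Callable, Dict

# Hypothetical provider model: a callable that returns {metric_name: value}
# for a resource URI. A real OSLC client would issue an HTTP GET instead.
ProviderFetch = Callable[[str], Dict[str, float]]


def federate(resource_uri: str,
             providers: Dict[str, ProviderFetch]) -> Dict[str, Dict[str, float]]:
    """Collect metrics from every current provider, keyed by provider name."""
    merged: Dict[str, Dict[str, float]] = {}
    for name, fetch in providers.items():
        try:
            merged[name] = fetch(resource_uri)
        except OSError:
            # Network failures (ConnectionError, TimeoutError, ...) are OSError
            # subclasses; skip that provider so the preview still shows the rest.
            continue
    return merged


def apm_provider(uri: str) -> Dict[str, float]:
    return {"heapUsedPct": 71.5}


def os_provider(uri: str) -> Dict[str, float]:
    return {"cpuPct": 38.0}


def unreachable_provider(uri: str) -> Dict[str, float]:
    raise ConnectionError("provider down")


preview = federate("https://example.com/resources/was1",
                   {"apm": apm_provider, "os": os_provider, "down": unreachable_provider})
print(preview)  # {'apm': {'heapUsedPct': 71.5}, 'os': {'cpuPct': 38.0}}
```

Skipping an unreachable provider, rather than failing the whole hover, is a design choice: the preview degrades to whatever data is currently available.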

Preconditions

Postconditions

Steps

  1. An existing resource (e.g. web application server) is used to host an application.
  2. An application health consumer queries for monitoring information about the resource.
  3. The application health provider responds with a set of Best Practices metrics that summarize the health of the web application server and its applications.
  4. The end user views the current health of the application server and its applications in the UI preview and uses that information to determine whether the performance problem is a trend or an anomaly.
  5. Based on their determination, the user opens a ticket, launches into a deep-dive monitoring tool, or runs some automation to quickly resolve a known issue.
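Step 4 leaves the trend-versus-anomaly judgment to the user. One way a tool could pre-compute a hint is sketched below, assuming a metric arrives as evenly spaced samples: a least-squares slope over the history flags a trend, and a z-score of the newest sample against that history flags an anomaly. The thresholds and the `classify` helper are illustrative assumptions, not Best Practices values.

```python
from statistics import mean, pstdev
from typing import Sequence


def slope(values: Sequence[float]) -> float:
    """Least-squares slope of the values against their sample index 0..n-1."""
    n = len(values)
    x_bar = (n - 1) / 2
    y_bar = mean(values)
    num = sum((x - x_bar) * (y - y_bar) for x, y in enumerate(values))
    den = sum((x - x_bar) ** 2 for x in range(n))
    return num / den


def classify(samples: Sequence[float],
             slope_limit: float = 0.5,
             z_limit: float = 3.0) -> str:
    """Return 'trend', 'anomaly', or 'normal' for a series of metric samples.

    The newest sample is compared against the history that precedes it.
    Both limits are hypothetical and would need tuning per metric.
    """
    if len(samples) < 3:
        raise ValueError("need at least three samples")
    history, latest = samples[:-1], samples[-1]
    if abs(slope(history)) >= slope_limit:
        return "trend"
    sigma = pstdev(history)
    if sigma == 0:
        # Perfectly flat history: any change in the newest sample stands out.
        return "normal" if latest == history[0] else "anomaly"
    z = (latest - mean(history)) / sigma
    return "anomaly" if abs(z) >= z_limit else "normal"


print(classify([10, 12, 14, 16, 18, 20]))          # trend
print(classify([50, 51, 50, 49, 50, 51, 50, 90]))  # anomaly
print(classify([50, 51, 50, 49, 50, 51, 50, 50]))  # normal
```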

Examples

UI preview shows | Implication(s)
--- | ---
The number of users connecting to a component and doing work is trending up. | Might point to a capacity problem under peak usage, or to a tuning problem during normal usage of the application.
The number of outstanding connections between components is trending up over time. | Points to a connection leak in the application. The app owner should open a defect against the app.
The heap usage of a software server is trending up over time. | Points to a memory leak. The app owner should open a defect against the app.
If the resource is an operating system/computer system, a list of running agents, showing that a monitor that should be running is not running. |
Available disk space is trending down, or is simply lower than expected. | Points to an application not cleaning up logs or files, or to a capacity problem beginning to form.
The Top 5 processes by CPU utilization, one of which is pegging a CPU. |
The Top 5 processes by memory utilization. |
The database server's operational status. | If not Active, the server may require administration.
The percentage of used buffer pool is getting close to 100, and/or the number of connection entries is higher than normal. | May point to a connection leak in an application or to a capacity problem.
A table needs to be reorganized. |
A particular database has a high number of failed SQL statements. | Could point to a poorly written application.
The garbage collection count is high, the percentage of time spent in garbage collection is large, and the free memory available after a garbage collection is decreasing. | Indicates a memory constraint, probably a leak in the Web Application Server.
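Several rows of the table follow an "if this pattern, then that implication" shape, so a consumer could encode them as rules. A minimal sketch, assuming a hypothetical metrics snapshot: the key names (`outstanding_connections`, `heap_used_mb`, `disk_free_pct`, `buffer_pool_used_pct`, `db_status`) and the 90% buffer pool threshold are invented for illustration.

```python
from typing import Any, Callable, Dict, List, Sequence, Tuple

Snapshot = Dict[str, Any]


def rising(series: Sequence[float]) -> bool:
    """True if the series never decreases and ends higher than it started."""
    return all(b >= a for a, b in zip(series, series[1:])) and series[-1] > series[0]


# (observation, check, implication) rows paraphrased from the table.
# Metric keys and the 90% threshold are hypothetical.
RULES: List[Tuple[str, Callable[[Snapshot], bool], str]] = [
    ("outstanding connections trending up",
     lambda s: rising(s["outstanding_connections"]),
     "possible connection leak; open a defect against the app"),
    ("heap usage trending up",
     lambda s: rising(s["heap_used_mb"]),
     "possible memory leak; open a defect against the app"),
    ("free disk space trending down",
     lambda s: rising([-v for v in s["disk_free_pct"]]),
     "app not cleaning up logs or files, or a capacity problem forming"),
    ("buffer pool nearly full",
     lambda s: s["buffer_pool_used_pct"] >= 90,
     "possible connection leak or capacity problem"),
    ("database not Active",
     lambda s: s["db_status"] != "Active",
     "may require administration"),
]


def evaluate(snapshot: Snapshot) -> List[str]:
    """Return 'observation: implication' for every rule that fires."""
    return [f"{name}: {implication}"
            for name, check, implication in RULES if check(snapshot)]


snapshot = {
    "outstanding_connections": [20, 24, 31, 40],
    "heap_used_mb": [512, 500, 530, 515],
    "disk_free_pct": [40, 35, 30, 22],
    "buffer_pool_used_pct": 72,
    "db_status": "Active",
}
for finding in evaluate(snapshot):
    print(finding)
```

The strictly non-decreasing `rising` check is deliberately simple; noisy metrics would call for a slope or smoothing test instead.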

Variations

  1. This scenario does not preclude other resource types besides applications (e.g. computer systems, network switches, etc.)

Detailed Steps

Assumptions: The consumer and provider have shared knowledge about a target resource. It could be a shared name or shared properties. In the example below, the consumer and provider have shared properties about a resource. The consumer does a yellow pages-style search against a central registry of resource names and locations to find target candidates.
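A minimal in-memory stand-in for the central registry, assuming resources are matched on shared properties and that the registry keeps a list of monitoring service provider URLs per resource record. The property names, URLs, and the `ResourceRegistry` class are hypothetical, and a real registry would answer with RDF rather than a Python list.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ResourceRecord:
    properties: Dict[str, str]
    provider_urls: List[str] = field(default_factory=list)


class ResourceRegistry:
    """Hypothetical yellow pages: resource records plus their provider URLs."""

    def __init__(self) -> None:
        self._records: List[ResourceRecord] = []

    def register(self, properties: Dict[str, str], provider_url: str) -> None:
        """Attach a provider URL to the record with exactly these properties."""
        for rec in self._records:
            if rec.properties == properties:
                if provider_url not in rec.provider_urls:
                    rec.provider_urls.append(provider_url)
                return
        self._records.append(ResourceRecord(dict(properties), [provider_url]))

    def lookup(self, **query: str) -> List[str]:
        """Return provider URLs of every record containing all query pairs."""
        urls: List[str] = []
        for rec in self._records:
            if all(rec.properties.get(k) == v for k, v in query.items()):
                urls.extend(rec.provider_urls)
        return urls


registry = ResourceRegistry()
server = {"hostname": "app01.example.com", "serverName": "server1"}
registry.register(server, "https://apm.example.com/oslc/resources/42")
registry.register(server, "https://os.example.com/oslc/resources/app01")
print(registry.lookup(hostname="app01.example.com", serverName="server1"))
# ['https://apm.example.com/oslc/resources/42', 'https://os.example.com/oslc/resources/app01']
print(registry.lookup(serverName="server2"))  # []
```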

  1. Consumer queries the resource registry for a monitoring service provider URL for the selected resource
    1. Resource registry looks up the record for the resource and determines whether any monitoring service provider URLs have been registered for it
    2. Resource registry finds a monitoring service provider URL and returns it to the consumer as an RDF response
  2. Consumer invokes a GET method on the monitoring service provider URL that the resource registry returned for the selected resource
    1. Consumer requests the compact XML representation (application/x-oslc-compact+xml) in the Accept header because the receiving client is a UI preview window/iframe
    2. Consumer connects to monitoring service provider and issues a GET request on its URL for the target resource
  3. Monitoring Service provider responds to the UI preview with a compact XML document that references an HTML preview page
    1. Monitoring Service provider maps the OSLC resource to an internal resource name
    2. Monitoring Service provider gets the Best Practices health metrics data for the resource
    3. Monitoring Service provider builds an HTML page, embedding the data and any UI elements (e.g. charts, labels) needed to display it
    4. Monitoring Service provider encodes the response document as compact XML and returns it to the requesting consumer
  4. Consumer displays the HTML page in the UI preview window/iframe
  5. Based on the content returned, the user takes appropriate action
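The consumer request in step 2 and the provider response in step 3 can be sketched with the Python standard library. This assumes the OSLC Core 2.0 Compact representation, which a client requests with the application/x-oslc-compact+xml media type in the Accept header, and in which the preview HTML page is referenced by URI through oslc:smallPreview / oslc:Preview / oslc:document. The resource URI, title, preview URL, hint sizes, and helper names are hypothetical; no network call is made.

```python
import urllib.request
import xml.etree.ElementTree as ET

COMPACT_TYPE = "application/x-oslc-compact+xml"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DCTERMS = "http://purl.org/dc/terms/"
OSLC = "http://open-services.net/ns/core#"

# Emit readable prefixes (rdf:, dcterms:, oslc:) instead of ns0:, ns1:, ...
for prefix, uri in (("rdf", RDF), ("dcterms", DCTERMS), ("oslc", OSLC)):
    ET.register_namespace(prefix, uri)


def build_preview_request(provider_url: str) -> urllib.request.Request:
    """Consumer side (step 2): ask for the Compact representation via Accept."""
    return urllib.request.Request(provider_url,
                                  headers={"Accept": COMPACT_TYPE}, method="GET")


def build_compact(resource_uri: str, title: str, preview_html_uri: str) -> bytes:
    """Provider side (step 3): Compact document pointing at the HTML preview page."""
    root = ET.Element(f"{{{RDF}}}RDF")
    compact = ET.SubElement(root, f"{{{OSLC}}}Compact",
                            {f"{{{RDF}}}about": resource_uri})
    ET.SubElement(compact, f"{{{DCTERMS}}}title").text = title
    small = ET.SubElement(compact, f"{{{OSLC}}}smallPreview")
    preview = ET.SubElement(small, f"{{{OSLC}}}Preview")
    # The HTML page built in step 3.3 is referenced by URI, not inlined.
    ET.SubElement(preview, f"{{{OSLC}}}document",
                  {f"{{{RDF}}}resource": preview_html_uri})
    # Size hints for the preview iframe (hypothetical values).
    ET.SubElement(preview, f"{{{OSLC}}}hintWidth").text = "400px"
    ET.SubElement(preview, f"{{{OSLC}}}hintHeight").text = "250px"
    return ET.tostring(root, encoding="utf-8", xml_declaration=True)


req = build_preview_request("https://apm.example.com/oslc/resources/42")
print(req.get_method(), req.get_header("Accept"))
# GET application/x-oslc-compact+xml
doc = build_compact("https://apm.example.com/oslc/resources/42",
                    "server1 health",
                    "https://apm.example.com/oslc/resources/42/preview.html")
print(doc.decode("utf-8"))
```

The consumer would then load the oslc:document URI into the UI preview iframe, as in step 4.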