This is the revision from 2013 November 21 at 11:37 am.

This page covers the scenarios that are driving the Availability work that the Automation WG is undertaking.

Introduction

The goal of this workgroup is to describe automation scenarios in the context of availability. The scenarios range from simple operational tasks that are carried out automatically on a day-to-day basis, through high availability scenarios where automated failover ensures that the business recovers quickly or even stays up 24x7, to disaster recovery scenarios where automated data replication and site switch procedures help to achieve a continuous or near-continuous availability solution for the business.

With availability, the concept of an Availability Resource is introduced. An Availability Resource carries several kinds of status information. Most importantly, it distinguishes:

  • observed status - in what status is the Availability Resource right now?
  • desired status - in what status should the Availability Resource be?
  • compound status - do the observed and desired status match?
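The relationship between the three kinds of status can be sketched in a few lines of Python. This is only an illustration: the class name, the field names and the two status values (ONLINE, OFFLINE) are assumptions made for the example and are not defined by the workgroup.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    # Hypothetical status values; the scenarios do not enumerate them.
    ONLINE = "online"
    OFFLINE = "offline"


@dataclass
class AvailabilityResource:
    name: str
    observed: Status  # in what status is the resource right now?
    desired: Status   # in what status should the resource be?

    @property
    def compound_ok(self) -> bool:
        """Compound status: True when observed and desired status match."""
        return self.observed == self.desired


db = AvailabilityResource("db", observed=Status.OFFLINE, desired=Status.ONLINE)
web = AvailabilityResource("web", observed=Status.ONLINE, desired=Status.ONLINE)

for resource in (db, web):
    print(resource.name, "ok" if resource.compound_ok else "needs attention")
```

In this sketch the compound status is derived rather than stored, so it cannot drift out of sync with the other two values.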

Availability Resources can also be collections of other Availability Resources to represent redundancy in various forms, for example a primary member with multiple cold standby backup members. In another use case, an Availability Resource represents a replication group with a primary data source and a secondary copy of it. Replication can be done synchronously or asynchronously.

Finally, Availability Resources have a history. Looking at the history, it is possible to forecast their future behaviour, for example by projecting the expected planned downtime of an Availability Resource or by computing a Mean-Time-To-Repair (MTTR) value.
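As an illustration of deriving a value from the history, the following sketch computes an MTTR as the average outage duration. It assumes that the history is available as a list of (down, up) timestamp pairs; that representation, the function name and the sample timestamps are invented for the example.

```python
from datetime import datetime, timedelta


def mean_time_to_repair(outages):
    """Average duration of the recorded outages, or None if there are none.

    `outages` is a list of (down_at, up_at) datetime pairs (assumed format).
    """
    if not outages:
        return None
    total = sum((up - down for down, up in outages), timedelta())
    return total / len(outages)


# Made-up sample history: one 30-minute and one 90-minute outage.
history = [
    (datetime(2013, 11, 1, 8, 0), datetime(2013, 11, 1, 8, 30)),
    (datetime(2013, 11, 9, 22, 0), datetime(2013, 11, 9, 23, 30)),
]

mttr = mean_time_to_repair(history)
print(mttr)  # 1:00:00
```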

These attributes bring with them a list of advanced automation scenarios. Taken together, they seem to justify introducing the concept of Availability Resources, with their own scenarios and specification, built on top of Automation.

Scenarios

Provide service to list workloads

Workloads are single entities or groups of entities executed on a server to deliver a particular business value. Examples are started tasks on a z/OS system, a middleware subsystem consisting of several processes / address spaces, or even multi-tiered business applications that can span multiple servers. This scenario is about listing all or selected workloads to retrieve status information or further workload-specific details.
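A minimal sketch of such a listing service, assuming that workloads are held in memory and that a consumer may select by kind or by observed status. The Workload fields, the filter parameters and the sample entries are hypothetical and only serve the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Workload:
    name: str
    kind: str             # e.g. "started task" or "middleware" (assumed labels)
    observed_status: str  # e.g. "running" or "stopped" (assumed labels)


def list_workloads(workloads, kind=None, observed_status=None):
    """Return all workloads, or only those matching the given filters."""
    return [
        w for w in workloads
        if (kind is None or w.kind == kind)
        and (observed_status is None or w.observed_status == observed_status)
    ]


# Invented sample inventory.
inventory = [
    Workload("PAYROLL", "started task", "running"),
    Workload("MQ01", "middleware", "stopped"),
    Workload("WEBSHOP", "business application", "running"),
]

running = list_workloads(inventory, observed_status="running")
print([w.name for w in running])  # ['PAYROLL', 'WEBSHOP']
```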

Provide service to start and stop workloads

This scenario builds on the scenario introduced above. Given the observed status of a workload, this scenario is about starting and stopping that workload in an automated way.
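One possible shape for start and stop requests is sketched below. It assumes that a request is only acted on when the observed status differs from the requested one; the class, the status labels and the return convention are assumptions made for the example, and a real automation system would trigger the actual start or stop action at the marked points.

```python
class Workload:
    def __init__(self, name, observed_status="stopped"):
        self.name = name
        self.observed_status = observed_status  # "running" or "stopped" (assumed)


def start(workload):
    """Start the workload unless it is already running. Returns True if acted on."""
    if workload.observed_status == "running":
        return False
    # A real provider would issue the start action here.
    workload.observed_status = "running"
    return True


def stop(workload):
    """Stop the workload unless it is already stopped. Returns True if acted on."""
    if workload.observed_status == "stopped":
        return False
    # A real provider would issue the stop action here.
    workload.observed_status = "stopped"
    return True


job = Workload("PAYROLL")
first_start = start(job)   # acted on: the workload was stopped
second_start = start(job)  # no action: the workload is already running
```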

Provide service to obtain redundancy information for a workload

  • Owner: Jürgen Holtz, Tim Frießinger
  • Scenario: tbd

This scenario allows a consumer to retrieve redundancy information for a workload. That is, the service returns the list of members in the workload, together with each member's role in terms of redundancy and its status. For example, one member can be the designated primary (= active) member, while the others are backup (= inactive) members.
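The returned information could look like the following sketch. The Role values mirror the primary and backup distinction described above; the Member type, the status labels and the tuple layout of the result are assumptions for illustration only.

```python
from dataclasses import dataclass
from enum import Enum


class Role(Enum):
    PRIMARY = "primary"  # the active member
    BACKUP = "backup"    # an inactive member


@dataclass(frozen=True)
class Member:
    name: str
    role: Role
    observed_status: str  # assumed labels, e.g. "running" or "standby"


def redundancy_info(members):
    """Return (name, role, status) for every member of the workload."""
    return [(m.name, m.role.value, m.observed_status) for m in members]


# Invented sample workload with one primary and two backups.
workload_members = [
    Member("node-a", Role.PRIMARY, "running"),
    Member("node-b", Role.BACKUP, "standby"),
    Member("node-c", Role.BACKUP, "standby"),
]

info = redundancy_info(workload_members)
for name, role, status in info:
    print(f"{name}: {role} ({status})")
```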

Provide service to failover primary workload

  • Owner: Jürgen Holtz, Tim Frießinger
  • Scenario: tbd

A workload that has been configured to contain one primary member and one or more backup members is said to be highly available. In the case of a planned or unplanned outage, one of the backup members can take over the work of the primary member if the primary member is not available. This scenario describes the steps to fail over the primary workload to one of its backup members in case of a planned outage. Note that automatic failover in case of an unplanned outage is inherent to the automation system (= service provider) that is responsible for keeping this workload highly available.
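Since the scenario itself is still to be defined, the following is only a sketch of what a planned failover might involve: pick a backup member, stop the primary, start the backup and swap the roles. The step order, the choice of the first backup, and all names are assumptions, not the workgroup's procedure.

```python
from dataclasses import dataclass


@dataclass
class Member:
    name: str
    role: str      # "primary" or "backup" (assumed labels)
    running: bool


def planned_failover(members):
    """Move the primary role to the first backup member and return that member."""
    primary = next((m for m in members if m.role == "primary"), None)
    if primary is None:
        raise ValueError("workload has no primary member")
    backups = [m for m in members if m.role == "backup"]
    if not backups:
        raise ValueError("no backup member available for failover")
    target = backups[0]  # assumed selection rule: first backup in the list

    # Stop the old primary, then activate the chosen backup.
    primary.running = False
    primary.role = "backup"
    target.running = True
    target.role = "primary"
    return target


members = [
    Member("node-a", "primary", True),
    Member("node-b", "backup", False),
]
new_primary = planned_failover(members)
print(new_primary.name)  # node-b
```

A workload without any backup member is rejected up front, which matches the requirement above that a highly available workload has at least one backup.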