Failover workload

Active specification development is now at Automation TC

2014 January 23 | 10:16 am

Failover workloads

When the currently active (= primary) member has to be stopped, for example because of software or system maintenance, a consumer can “move” the primary role to one of the backup members. The former primary member becomes inactive and the selected backup member becomes active.

This scenario builds on the scenario Obtain redundancy information for a workload.

As a prerequisite, the consumer has obtained a list of all Availability Resources from the service provider. The consumer therefore understands which workloads can be automated and which of them provide redundancy because they represent groups containing other Availability Resources as members. See also the scenarios Obtain list of workloads and Obtain redundancy information for a workload.

A typical consumer scenario is a system administrator who wants to shut down a server without causing an outage of a business application. When the business application consists of redundant resources, the system administrator can first fail over its services to one of the remaining resources and then shut down the server.

In another scenario, a disaster recovery manager can orchestrate the failover from one site to another programmatically, knowing that a backup site can compensate for a failure of the primary site. Here, the Availability Resource could represent a group of systems that are themselves represented as Availability Resources.

Generic Flow

The consumer is given the service provider to use.
The consumer knows that the Availability Resource provides redundancy.
The consumer requests a list of members for such an Availability Resource with their status and their role as seen by the service provider. An overall redundancy status is provided as well.
The consumer requests from the service provider that, for the given Availability Resource, the current primary member become a backup member and the selected member become the new primary member.
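The generic flow can be sketched as a small in-memory model. This is only an illustration under stated assumptions: the `Member` and `RedundancyGroup` classes, the role and status strings, and the `failover` method are hypothetical names chosen here, not part of any specification. A real service provider would expose the same information through its own resources.

```python
from dataclasses import dataclass, field
from typing import List

PRIMARY = "primary"
BACKUP = "backup"


@dataclass
class Member:
    name: str
    role: str    # PRIMARY or BACKUP, as seen by the service provider
    status: str  # "active" or "inactive" (hypothetical status values)


@dataclass
class RedundancyGroup:
    """Hypothetical stand-in for an Availability Resource that provides redundancy."""
    name: str
    members: List[Member] = field(default_factory=list)

    def primary(self) -> Member:
        """Return the member that currently holds the primary role."""
        return next(m for m in self.members if m.role == PRIMARY)

    def failover(self, target_name: str) -> None:
        """Make the named backup member the new primary and demote the old primary."""
        target = next((m for m in self.members if m.name == target_name), None)
        if target is None:
            raise ValueError(f"{target_name!r} is not a member of {self.name!r}")
        if target.role != BACKUP:
            raise ValueError(f"{target_name!r} is not a backup member")
        old = self.primary()
        # The former primary becomes an inactive backup member.
        old.role, old.status = BACKUP, "inactive"
        # The selected backup member becomes the active primary.
        target.role, target.status = PRIMARY, "active"
```

A consumer-side call would then look like `group.failover("node-b")`, where `node-b` is a placeholder member name. Rejecting a target that is not a backup member keeps the sketch from silently "failing over" to the member that is already primary.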

Specific flow using Automation 2.1

The consumer is given the service provider to use.
The consumer knows that the Availability Resource provides redundancy.
The consumer requests a list of members for such an Availability Resource with their status and their role as seen by the service provider. An overall redundancy status is provided as well.
The consumer requests an Automation Plan for this Availability Resource (see Note 1).
The consumer picks a backup member of choice, creates an Automation Request for the Automation Plan requested before, and sets the input parameters such that the primary member of this Availability Resource will be stopped and the selected backup member will be started.
Once created, the Automation Request is executed asynchronously by the service provider. The service provider creates an Automation Result and returns it to the consumer.
The consumer periodically polls the Automation Result until the request has been fulfilled.
The consumer queries the observed (= current) status of the former backup Availability Resource member from the service provider.
The former backup Availability Resource is now the primary Availability Resource in this Redundancy Group.
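The request-and-poll part of this flow might look roughly like the sketch below. It makes several assumptions that should be checked against the Automation specification: the property names `oslc_auto:executesAutomationPlan`, `oslc_auto:inputParameter` and `oslc_auto:state` and the namespace URI are taken from the OSLC Automation vocabulary as commonly published; the parameter names `stopMember` and `startMember`, the dictionary layout, and the `fetch_result` callable are hypothetical and stand in for whatever the service provider actually defines. No HTTP is performed here; the caller supplies the function that retrieves the Automation Result.

```python
import time
from typing import Callable, Dict

OSLC_AUTO = "http://open-services.net/ns/auto#"
# States after which the request will not progress any further (assumed set).
FINAL_STATES = {OSLC_AUTO + "complete", OSLC_AUTO + "canceled"}


def build_failover_request(plan_uri: str, primary_uri: str, backup_uri: str) -> Dict:
    """Describe an Automation Request that stops the primary and starts the backup."""
    return {
        "rdf:type": OSLC_AUTO + "AutomationRequest",
        "oslc_auto:executesAutomationPlan": plan_uri,
        "oslc_auto:inputParameter": [
            {"oslc:name": "stopMember", "rdf:value": primary_uri},   # hypothetical parameter name
            {"oslc:name": "startMember", "rdf:value": backup_uri},   # hypothetical parameter name
        ],
    }


def poll_until_done(
    fetch_result: Callable[[], Dict],
    interval: float = 5.0,
    max_polls: int = 120,
) -> Dict:
    """Poll the Automation Result until it reports a final state or polls run out."""
    for _ in range(max_polls):
        result = fetch_result()
        if result.get("oslc_auto:state") in FINAL_STATES:
            return result
        time.sleep(interval)
    raise TimeoutError("Automation Result did not reach a final state in time")
```

Bounding the number of polls is a design choice of this sketch: because a failover can be long-running, a consumer probably wants an upper limit rather than an open-ended loop. After `poll_until_done` returns, the consumer would still query the observed status of the former backup member, as described in the last two steps.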

Notes: 1. An Automation Plan, as opposed to a specific service provided by an Availability service provider, is chosen because a failover means changing the desired status of Availability Resources. Since this can be a long-running process, depending on the complexity of the involved Availability Resources, it is very similar to starting or stopping an Availability Resource using an Automation Request.

Variations

The consumer requests an Automation Plan for the selected Redundancy Group to “move” the primary role back to the member that held the primary role before the failover.
The consumer creates an Automation Request for the Automation Plan requested before.
Once created, the Automation Request is executed asynchronously by the service provider. The service provider creates an Automation Result and returns it to the consumer.
The consumer periodically polls the Automation Result until the request has been fulfilled.
The consumer queries the observed (= current) status of the preferred Availability Resource member in the Redundancy Group from the service provider.
The preferred Availability Resource is now the primary Availability Resource in this Redundancy Group.
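In this variation the consumer does not name a target member, so the service provider has to know which member is the preferred one. One way to picture that is the sketch below. It is an assumption-laden illustration: the `preferred` flag, the `Member` class, and both function names are hypothetical and are not defined by the scenario or by any specification.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Member:
    name: str
    role: str        # "primary" or "backup"
    preferred: bool  # hypothetical flag: True for the member that originally held the primary role


def failback_target(members: List[Member]) -> Optional[Member]:
    """Return the preferred member if it is not currently primary, else None."""
    for m in members:
        if m.preferred and m.role != "primary":
            return m
    return None


def failback(members: List[Member]) -> bool:
    """Give the primary role back to the preferred member. Returns True if roles changed."""
    target = failback_target(members)
    if target is None:
        return False  # nothing to restore: the preferred member is already primary
    for m in members:
        if m.role == "primary":
            m.role = "backup"
    target.role = "primary"
    return True
```

Returning `False` when the preferred member is already primary makes the operation safe to repeat, which may be convenient if the consumer cannot tell whether an earlier failback request completed.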

Examples

Software Maintenance

Software maintenance is required for the system on which the primary SAP central services region runs. A Redundancy Group exists that contains two SAP central services regions. The consumer requests an Automation Plan for the SAP central services Redundancy Group and creates an Automation Request specifying the new primary member in this Redundancy Group. The Automation Request is executed asynchronously by the service provider. After successful completion of the request, the roles in the SAP central services group are switched.

Restore environment as it was before Failover

After software maintenance has been completed, the primary role of the SAP central services region should be restored to the system that held it before the failover. The consumer requests an Automation Plan for the SAP central services Redundancy Group to move the primary role to the preferred member and creates an Automation Request for it. The Automation Request is executed asynchronously by the service provider. After successful completion of the request, the roles in the SAP central services group are restored to what they were before the failover.