[oslc-core] [Lifecycle-query-workgroup] TRS 2.0 Specification - Rollback Behavior

Fri Jun 21 15:06:47 EDT 2013

Hi Arthur, my comments below.

Regards
Vivek

From:   Arthur Ryman/Toronto/IBM at IBMCA
To:     Vivek Garg/Cupertino/IBM at IBMUS, 
Cc:     Benjamin Williams <bwilliams at uk.ibm.com>, 
lifecycle-query-workgroup at mailman.hursley.ibm.com, 
lifecycle-query-workgroup-bounces at mailman.hursley.ibm.com, 
oslc-core at open-services.net, "Oslc-Core" 
<oslc-core-bounces at open-services.net>
Date:   06/21/2013 11:54 AM
Subject:        Re: [oslc-core] [Lifecycle-query-workgroup] TRS 2.0 Specification - Rollback Behavior

Vivek,

It is not acceptable to simply re-index or remove a data source just 
because of some bad data that LQE interprets as a rollback. 
<vivek>
- It is not some bad data. In this case LQE encounters a change event 
older than the one it last processed before finding the last processed 
event. If that is not intended, i.e. it is not due to a real rollback, then 
in my opinion it is a defect in the tool that should be fixed.
- Re-index and Remove are the actions offered today. I already mentioned 
automatic "undo" as a possible enhancement in a future release of LQE, 
though I don't think we should add actions in LQE to deal with defects in 
the data provider. 
</vivek>

This can take days. It's simply not an option if we expect LQE to be used 
in production on very large data sets. The current behavior is too 
unstable. A human admin must be shown the alleged bad events and be 
allowed to make an informed decision, which must include the option to 
proceed by setting a new cutoff event. LQE should help by presenting the 
alleged rollback condition in context as clearly as possible. 
<vivek>
- LQE does record and display the information to the administrator about 
the offending event that is encountered as well as the information about 
the expected event. This information includes the event type, ID, order 
and the URI of the changed resource. It will not be easy for an 
administrator to determine precisely which change event is the most 
appropriate one to resume change log processing from. If the administrator 
is not able to zero in on the exact change event, it opens the possibility 
of an inconsistent index. I think it is better for us to fix the defect in 
the tool.
</vivek>

<vivek>
In summary, I think making the following changes resolves the issues around 
rollback.
1. Focal Point should fix the race condition that causes unintended 
rollback conditions.
2. LQE should support an "undo" capability in case of rollback. This would 
provide faster repair of the index than a re-index.
3. LQE should continue to retry change log processing in case of a 
rollback or truncated change log instead of halting. This would help if 
the administrator is able to identify the problem based on the information 
from the LQE admin UI and fix the problem in the tool. In this case LQE 
would continue processing once the TRS is behaving correctly.
</vivek>
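
As a rough illustration of item 3 (retrying instead of halting), here is a minimal Python sketch. It is not LQE code: process_with_retry, run_cycle, max_attempts and delay_seconds are hypothetical names, and the assumption is that one change log processing cycle can report whether it completed cleanly.

```python
import time
from typing import Callable, Optional


def process_with_retry(
    run_cycle: Callable[[], bool],
    max_attempts: int,
    delay_seconds: float,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[int]:
    """Run change log processing cycles until one of them succeeds.

    run_cycle (hypothetical) returns True when the change log was processed
    cleanly and False when a rollback or truncated log was detected.
    Returns the number of the attempt that succeeded, or None if all failed.
    """
    for attempt in range(1, max_attempts + 1):
        if run_cycle():
            return attempt
        if attempt < max_attempts:
            # Wait before retrying so the admin has time to repair the TRS.
            sleep(delay_seconds)
    return None
```

The sleep function is injectable only so the loop can be exercised without real delays; a real client would presumably also surface each failed attempt in the admin UI.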

The human admin may be able to correct the problem at the data source. For 
example, he may be able to create some new change events so that any 
skipped resources get re-indexed.

Regards, 
___________________________________________________________________________ 

Arthur Ryman 

DE, Chief Architect, Reporting &
Portfolio and Strategy Management
IBM Software, Rational 

Toronto Lab | +1-905-413-3077 (office) | +1-416-939-5063 (mobile) 

From:   Vivek Garg/Cupertino/IBM at IBMUS
To:     Arthur Ryman <ryman at ca.ibm.com>, 
Cc:     Benjamin Williams <bwilliams at uk.ibm.com>, 
lifecycle-query-workgroup at mailman.hursley.ibm.com, 
lifecycle-query-workgroup-bounces at mailman.hursley.ibm.com, 
oslc-core at open-services.net, "Oslc-Core" 
<oslc-core-bounces at open-services.net>
Date:   06/21/2013 11:35 AM
Subject:        Re: [oslc-core] [Lifecycle-query-workgroup] TRS 2.0 Specification - Rollback Behavior

A few comments and questions: 

1. Current behavior: On a change log processing cycle, LQE scans the 
change log pages, looking for the change event it last processed (on a 
previous change log processing cycle). If LQE encounters a change event 
older than the event it last processed, before finding the last processed 
event, LQE treats it as a rollback condition. In such cases, LQE essentially 
halts and waits for the admin's input via the UI (it is not an automatic 
re-index). The 
UI currently offers two recommended actions for the admin in such cases: 
Re-Index data source or Remove data source. Currently there is no Ignore 
action offered. 

2. Current behavior: LQE currently does not retain the change event 
history locally, which it would need in order to perform an undo in case 
of a rollback. This appears to be a good enhancement for us to make in a 
future release of LQE. 

3. Arthur, you mentioned Ignore as another possible action to be offered to 
the administrator. What is the Ignore behavior from a client perspective? 
Also, is the need for an Ignore action still valid if Focal Point's race 
condition issue were fixed (along with any issues in the TRS spec that make 
the spec hard to implement)? 
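
The scan described in item 1 can be sketched roughly as follows. This is an illustrative Python sketch, not LQE's actual implementation: ChangeEvent, ScanResult and scan_change_log are hypothetical names, the change log is assumed to be read newest-first, and "older" is assumed to mean a smaller order value. Treating an event with the same order but a different identifier as a rollback is also an assumption.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class ChangeEvent:
    uri: str       # identifier of the change event
    order: int     # order value; assumed larger means newer
    resource: str  # URI of the changed resource


class ScanResult(Enum):
    OK = "ok"                # last processed event was found
    ROLLBACK = "rollback"    # an older event appeared before it was found
    TRUNCATED = "truncated"  # the log ended without finding it


def scan_change_log(
    events_newest_first: Iterable[ChangeEvent],
    last_processed: ChangeEvent,
) -> Tuple[ScanResult, List[ChangeEvent]]:
    """Scan the log for last_processed.

    On OK, returns the unprocessed events oldest-first. On ROLLBACK,
    returns the single offending event. On TRUNCATED, returns nothing.
    """
    new_events: List[ChangeEvent] = []
    for event in events_newest_first:
        if event.uri == last_processed.uri and event.order == last_processed.order:
            new_events.reverse()  # process oldest first
            return ScanResult.OK, new_events
        if event.order <= last_processed.order:
            # Same or older order with a different identity (assumption).
            return ScanResult.ROLLBACK, [event]
        new_events.append(event)
    return ScanResult.TRUNCATED, []
```

The offending event returned in the rollback case corresponds to the information (type, ID, order, resource URI) that the admin UI would display.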

Regards 
Vivek

From:        Arthur Ryman <ryman at ca.ibm.com> 
To:        Benjamin Williams <bwilliams at uk.ibm.com>, 
Cc:        oslc-core at open-services.net, 
lifecycle-query-workgroup-bounces at mailman.hursley.ibm.com, 
lifecycle-query-workgroup at mailman.hursley.ibm.com 
Date:        06/21/2013 10:11 AM 
Subject:        Re: [oslc-core] [Lifecycle-query-workgroup] TRS 2.0 Specification - Rollback Behavior 
Sent by:        "Oslc-Core" <oslc-core-bounces at open-services.net> 

Ben,

Yes, if the server is rolled back, then the index should react so that it 
mirrors the actual state of the server. The index might do that 
efficiently if it stored change events. In the worst case (and the normal 
case) it re-indexes from scratch, which can take days.
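
To illustrate how an index that stores change events might undo them, here is a minimal Python sketch. It is an assumption-laden toy, not a description of LQE: UndoableIndex, apply and undo_after are hypothetical names, resource content is modelled as a string, and the journal keeps the previous content of each resource (a real client might instead re-fetch the affected resources from the rolled-back server).

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class UndoableIndex:
    # Indexed content keyed by resource URI.
    resources: Dict[str, str] = field(default_factory=dict)
    # Journal of (event_id, resource_uri, previous content or None).
    journal: List[Tuple[str, str, Optional[str]]] = field(default_factory=list)

    def apply(self, event_id: str, resource_uri: str, new_content: Optional[str]) -> None:
        """Apply a change event; new_content None means the resource was deleted."""
        self.journal.append((event_id, resource_uri, self.resources.get(resource_uri)))
        if new_content is None:
            self.resources.pop(resource_uri, None)
        else:
            self.resources[resource_uri] = new_content

    def undo_after(self, event_id: str) -> int:
        """Undo every event applied after event_id, newest first.

        Returns the number of events undone.
        """
        if event_id not in [entry[0] for entry in self.journal]:
            raise KeyError(f"event {event_id} is not in the local journal")
        undone = 0
        while self.journal[-1][0] != event_id:
            _, uri, previous = self.journal.pop()
            if previous is None:
                self.resources.pop(uri, None)  # resource did not exist before
            else:
                self.resources[uri] = previous
            undone += 1
        return undone
```

The cost of an undo is proportional to the number of events being rolled back rather than to the size of the data source, which is the point of avoiding a full re-index.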

My top priority would be to improve the admin UI so that an admin user can 
manually correct or override the problem, e.g. simply ignore it so LQE 
proceeds. In parallel, the admin can touch resources on the server to 
force them to get re-indexed later. We need to avoid a full re-index.

Regards, 
___________________________________________________________________________ 

Arthur Ryman 

DE, Chief Architect, Reporting &
Portfolio and Strategy Management
IBM Software, Rational 

Toronto Lab | +1-905-413-3077 (office) | +1-416-939-5063 (mobile) 

From:   Benjamin Williams <bwilliams at uk.ibm.com>
To:     Arthur Ryman/Toronto/IBM at IBMCA, 
Cc:     lifecycle-query-workgroup at mailman.hursley.ibm.com, 
lifecycle-query-workgroup-bounces at mailman.hursley.ibm.com, 
oslc-core at open-services.net
Date:   06/13/2013 06:06 AM
Subject:        Re: [Lifecycle-query-workgroup] TRS 2.0 Specification - Rollback Behavior

Arthur 

Is it true that if a server performs a rollback then the desired state of 
the index is to reflect the rolled-back state of indexed resources? 

In terms of desired outcome I would prioritise as below: 

1. Client detects a rollback (either through detecting change log 
inconsistencies or through an explicit trs:Rollback event) and processes 
the delta based on local history record 
2. Client detects a rollback (either through detecting change log 
inconsistencies or through an explicit trs:Rollback event) and - due to 
absence of local history - halts and waits for admin intervention to 
select re-index or ignore 
3. Client detects a rollback (either through detecting change log 
inconsistencies or through an explicit trs:Rollback event) and - due to 
absence of local history - proceeds with ignore 
4. Client detects a rollback (either through detecting change log 
inconsistencies or through an explicit trs:Rollback event) and - due to 
absence of local history - proceeds with re-index 

In all cases, a trs:Rollback event would seem a desirable addition. 
However, I'm not sure of its real value, as most server rollbacks would 
likely be at the entire server/OS level, so the server would not be 
aware it had been rolled back and could not issue the event. 

With #1 being the optimal outcome, is there any guidance or 
recommendations regarding client implementations 'retaining a local record 
of previously processed events'? 

Regards, 

Ben Williams
Senior Product Manager
IBM Rational Systems Engineering 

Phone: 44-1344 443020
E-mail: bwilliams at uk.ibm.com

5 Guillemot Street
Bracknell, Berkshire RG12 8ER
United Kingdom

IBM United Kingdom Limited
Registered in England and Wales with number 741598
Registered office: PO Box 41, North Harbour, Portsmouth, Hants. PO6 3AU 

From:        Arthur Ryman <ryman at ca.ibm.com> 
To:        oslc-core at open-services.net, 
Cc:        lifecycle-query-workgroup at mailman.hursley.ibm.com 
Date:        12/06/2013 19:13 
Subject:        [Lifecycle-query-workgroup] TRS 2.0 Specification - Rollback Behavior 
Sent by:        lifecycle-query-workgroup-bounces at mailman.hursley.ibm.com 

The TRS spec mentions server rollbacks in several places, but never 
defines what these are. A definition should be added. There is actually no 
concrete representation for a rollback event. Instead, a server rollback 
is inferred when the client detects certain conditions. The spec [1] has 
the following text:

"In the (hopefully rare) situation that the Client fails to find its sync 
point event, one of two things is likely to have happened on the Server: 
either the Server has truncated its Change Log, or the Server has been 
rolled back to an earlier state.
If the Client had been retaining a local record of previously processed 
events, the Client may be able to detect a Server rollback if it notices 
the successor event of some previously processed event has been removed or 
changed to one with a different identifier than before."
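
The successor check in the quoted text could look roughly like the following Python sketch. It is only an illustration under stated assumptions: find_divergence is a hypothetical name, both lists hold event identifiers in oldest-first order, and events that have dropped off the truncated end of the server's log are simply skipped.

```python
from typing import List, Optional


def find_divergence(local_history: List[str], server_log: List[str]) -> Optional[str]:
    """Compare the client's record of processed events with the server's log.

    Returns the identifier of the last previously processed event whose
    successor on the server was removed or replaced by a different event,
    or None if every successor still matches.
    """
    position = {event_id: i for i, event_id in enumerate(server_log)}
    for prev_id, next_id in zip(local_history, local_history[1:]):
        i = position.get(prev_id)
        if i is None:
            # prev_id is no longer in the log (truncation); nothing to compare.
            continue
        successor = server_log[i + 1] if i + 1 < len(server_log) else None
        if successor != next_id:
            return prev_id  # the server state diverges after this event
    return None
```

For example, with a local history of e1, e2, e3, e4 and a server log of e1, e2, e3b, e4b, the function returns e2, which is the last event both sides still agree on.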

My dev team is working with a client implementation of the TRS spec (LQE) 
that interprets certain conditions in the TRS feed as indicating a rollback 
event, and then re-indexes the entire data source. This behavior is 
undesirable since indexing a large data source can take days, during which 
time users can't get accurate query results.

I recommend that we expand the guidance for how TRS clients should respond 
to an inferred rollback event. There should be other less disruptive 
courses of action. In some cases the rollback event is caused by other 
factors. We have observed that the spec is difficult to implement unless 
the server maintains certain information, e.g. a record of each change. In 
our experience, we have never actually rolled back our server, but due to 
race conditions we occasionally produce a change log that appears to 
contain a rollback event.

The alternative responses to a rollback include:
1. ignore - the client continues to process the change log and makes a 
sensible guess about where to cut off, e.g. by remembering some 
information from the previous change log
2. halt - the client stops processing and waits for an administrator to 
explicitly select the next action, which could be ignore or re-index

The client should be configured with a suitable policy, e.g. ignore, halt, 
or re-index, and have an admin interface so that a human administrator can 
take the best course of action. In any case, a unilateral automatic 
decision to re-index is problematic.
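
A configurable policy of this kind might be sketched as follows in Python. The names RollbackPolicy, NextStep and respond_to_rollback are hypothetical and not part of the TRS spec or of LQE; the sketch only shows how ignore, halt and re-index could be dispatched, with halt deferring to an explicit admin choice.

```python
from enum import Enum
from typing import Optional


class RollbackPolicy(Enum):
    IGNORE = "ignore"
    HALT = "halt"
    REINDEX = "re-index"


class NextStep(Enum):
    CONTINUE_FROM_NEW_CUTOFF = "continue from a new cutoff event"
    WAIT_FOR_ADMIN = "wait for the administrator"
    FULL_REINDEX = "re-index the data source"


def respond_to_rollback(
    policy: RollbackPolicy,
    admin_choice: Optional[RollbackPolicy] = None,
) -> NextStep:
    """Decide what the client does after inferring a rollback.

    Under HALT the client waits until the admin picks ignore or re-index.
    """
    if policy is RollbackPolicy.IGNORE:
        return NextStep.CONTINUE_FROM_NEW_CUTOFF
    if policy is RollbackPolicy.REINDEX:
        return NextStep.FULL_REINDEX
    # policy is HALT: never act without an explicit admin decision.
    if admin_choice is None:
        return NextStep.WAIT_FOR_ADMIN
    if admin_choice is RollbackPolicy.HALT:
        raise ValueError("the administrator must choose ignore or re-index")
    return respond_to_rollback(admin_choice)
```

With halt as the default policy, a re-index only ever happens after a human has selected it, which avoids the unilateral automatic decision.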

Another way to deal with rollback events is to add a new type of event to 
the change log, i.e. a trs:Rollback event. Only when this event is 
received should a client re-index.

Minor point: the text of the specification should not use both the terms 
"cutoff event" and "sync point". Let's pick one and use it throughout.

Regards, 
___________________________________________________________________________ 

Arthur Ryman 

DE, Chief Architect, Reporting &
Portfolio and Strategy Management
IBM Software, Rational 

Toronto Lab | +1-905-413-3077 (office) | +1-416-939-5063 (mobile) 
