[oslc-core] TRS 2.0 Specification - Rollback Behavior

Wed Jun 12 14:04:08 EDT 2013

The TRS spec mentions server rollbacks in several places, but never 
defines what these are. A definition should be added. There is actually no 
concrete representation for a rollback event. Instead, a server rollback 
is inferred when the client detects certain conditions. The spec [1] has 
the following text:

"In the (hopefully rare) situation that the Client fails to find its sync 
point event, one of two things is likely to have happened on the Server: 
either the Server has truncated its Change Log, or the Server has been 
rolled back to an earlier state.
If the Client had been retaining a local record of previously processed 
events, the Client may be able to detect a Server rollback if it notices 
the successor event of some previously processed event has been removed or 
changed to one with a different identifier than before."
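
As a rough illustration of the inference the quoted text describes, here is a minimal Python sketch. It is not taken from the spec or from any client implementation. It assumes the client keeps an ordered list of the event identifiers it has already processed (oldest first) and can read the identifiers currently in the server's Change Log in the same order. The names SyncStatus and classify_sync are hypothetical.

```python
from enum import Enum
from typing import Optional, Sequence, Tuple


class SyncStatus(Enum):
    IN_SYNC = "in_sync"                        # sync point event is still present
    ROLLBACK_SUSPECTED = "rollback_suspected"  # an older event survives, its successor does not
    UNKNOWN = "unknown"                        # truncation or rollback; cannot tell which


def classify_sync(retained: Sequence[str],
                  served: Sequence[str]) -> Tuple[SyncStatus, Optional[str]]:
    """Compare the client's retained event ids with the server's change log.

    Both sequences are assumed to be ordered oldest to newest. Returns the
    status plus the newest retained event id still found on the server.
    """
    served_ids = set(served)
    if not retained:
        # Nothing was retained, so there is no sync point to look for.
        return SyncStatus.UNKNOWN, None
    if retained[-1] in served_ids:
        # The sync point event is still in the log: normal processing.
        return SyncStatus.IN_SYNC, retained[-1]
    # The sync point is missing. If an older retained event is still served,
    # its successor has been removed or replaced, which is the condition the
    # quoted text associates with a rollback.
    for event_id in reversed(retained[:-1]):
        if event_id in served_ids:
            return SyncStatus.ROLLBACK_SUSPECTED, event_id
    # No retained event survives: truncation and rollback look the same.
    return SyncStatus.UNKNOWN, None
```

Note that the result is only a suspicion. Any log in which an older event survives while a newer one is missing gets the same classification, whatever the actual cause.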

My dev team is working with a client implementation of the TRS spec (LQE) 
that interprets certain conditions in the TRS feed as indicating a rollback 
event, and then re-indexes the entire data source. This behavior is 
undesirable since indexing a large data source can take days, during which 
time users can't get accurate query results.

I recommend that we expand the guidance for how TRS clients should respond 
to an inferred rollback event. Less disruptive courses of action than a 
full re-index should be available. In some cases the apparent rollback is 
caused by something other than an actual server rollback. We have observed 
that the spec is difficult to implement unless the server maintains certain 
information, e.g. a record of each change. In our experience, we have never 
actually rolled back our server, but due to race conditions we occasionally 
produce a change log that appears to contain a rollback event.

Alternative responses to an inferred rollback include:
1. ignore - the client continues to process the change log and makes a 
sensible guess about where to cut off, e.g. by remembering some 
information from the previous change log
2. halt - the client stops processing and waits for an administrator to 
explicitly select the next action, which could be ignore or re-index

The client should be configured with a suitable policy, e.g. ignore, halt, 
or re-index, and have an admin interface so that a human administrator can 
take the best course of action. In any case, a unilateral automatic 
decision to re-index is problematic.
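
A minimal sketch of such a policy setting, in Python, might look like the following. It is only an illustration under stated assumptions: the names RollbackPolicy, Action and next_action are hypothetical and do not come from the spec or from LQE, and the admin interface is reduced to an optional argument.

```python
from enum import Enum
from typing import Optional


class RollbackPolicy(Enum):
    IGNORE = "ignore"      # keep processing the change log
    HALT = "halt"          # stop and wait for an administrator
    REINDEX = "re-index"   # rebuild the index from the base


class Action(Enum):
    CONTINUE = "continue"
    WAIT_FOR_ADMIN = "wait_for_admin"
    REINDEX = "re-index"


def next_action(policy: RollbackPolicy,
                admin_choice: Optional[RollbackPolicy] = None) -> Action:
    """Decide what the client does after it infers a rollback."""
    if policy is RollbackPolicy.IGNORE:
        return Action.CONTINUE
    if policy is RollbackPolicy.REINDEX:
        return Action.REINDEX
    # Policy is HALT: nothing happens until an administrator decides.
    if admin_choice is None:
        return Action.WAIT_FOR_ADMIN
    if admin_choice is RollbackPolicy.HALT:
        raise ValueError("administrator must choose ignore or re-index")
    return next_action(admin_choice)
```

With the policy set to halt, the client never re-indexes on its own; a re-index happens only after an administrator explicitly chooses it.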

Another way to deal with rollback events is to add a new type of event to 
the change log, i.e. a trs:Rollback event. Only when this event is 
received should a client re-index.
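
On the client side, the check could then be as simple as the Python sketch below. trs:Rollback is the event type proposed here and does not exist in the current spec. The representation of a change event as a dict with a "type" key, and the name should_reindex, are assumptions made for the example only.

```python
from typing import Iterable, Mapping

# Hypothetical event type proposed in this message; not part of the current spec.
ROLLBACK_TYPE = "trs:Rollback"


def should_reindex(new_events: Iterable[Mapping[str, str]]) -> bool:
    """Return True only if the server explicitly announced a rollback.

    Each event is assumed to be a mapping with a "type" key such as
    "trs:Creation", "trs:Modification" or "trs:Deletion".
    """
    return any(event.get("type") == ROLLBACK_TYPE for event in new_events)
```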

Minor point: the text of the specification should not use both the terms 
"cutoff event" and "sync point". Let's pick one and use it throughout.

Regards, 
___________________________________________________________________________ 

Arthur Ryman 

DE, Chief Architect, Reporting &
Portfolio and Strategy Management
IBM Software, Rational 

Toronto Lab | +1-905-413-3077 (office) | +1-416-939-5063 (mobile)