[Oslc-Automation] Thoughts on teardown scenarios - increased number of resources & length of persistence

Wed May 29 11:56:30 EDT 2013

To summarise my previous email:

Provider implementations may want to be "thin adapters" over existing 
synchronous APIs. These implementations would want to generate OSLC 
resources on the fly for specific requests, based solely on information 
available from the existing API.

This means that Automation Results are not actively "deleted"; rather, 
there comes a time when they can no longer be generated.

If the existing API is synchronous, then these Automation Results may 
appear to be "deleted" as soon as they have completed. (But they could 
have been returned to the consumer in the HTTP response to the Automation 
Request creation HTTP request, so they are not totally lost.)

Therefore, if we are imposing the requirement that OSLC providers cannot 
delete Results immediately, then such thin adapters must add extra weight 
to their implementation by storing the Automation Results themselves, if 
the existing API does not provide the information to generate them after 
the automation has completed. So they are not as "thin" as they could be.

However, the spec does not currently impose that requirement (at least not 
by my understanding of it), but perhaps in order to aid interoperability 
it ought to alert consumer implementors that the Results may be deleted 
immediately in the case of synchronous execution.

Do we want to support this immediate "deletion" explicitly? Or 
have another way of helping consumer implementations handle Results being 
deleted before they expect? (Such as a minimum "reasonable time" that they 
should exist for.)

Martin

From:   Martin P Pain/UK/IBM at IBMGB
To:     David N Brauneis <brauneis at us.ibm.com>, 
Cc:     oslc-automation at open-services.net, Oslc-Automation 
<oslc-automation-bounces at open-services.net>
Date:   29/05/2013 16:50
Subject:        Re: [Oslc-Automation] Thoughts on teardown scenarios - 
increased number of resources & length of persistence
Sent by:        "Oslc-Automation" 
<oslc-automation-bounces at open-services.net>

The example that I am thinking of: 

- is a thin adapter (which exposes an existing synchronous API over OSLC). 

(Side note: When I use the word "adapter", I am using it to mean an 
implementation that consumes one API and exposes the same functionality in 
another API, which is not necessarily the same as the RQM "execution 
adapter" sense of the word. Just to be clear...) 

- does not store any resources of its own, but rather only generates them 
on the fly as needed for a specific HTTP request (based on what is exposed 
through the existing non-OSLC API that is being adapted). 

- uses the OSLC-defined synchronous operation. (This provides the 
Automation Result in the HTTP response from the Automation Request 
creation request. That is how the consumer gets the "record of the 
success/failure".) 

This means that Automation Results are not actively "deleted"; rather, 
there comes a time when they can no longer be generated. 

If the existing non-OSLC API does not expose any results or reports for 
automation executions that have completed (as all the necessary 
information is provided in a synchronous manner) then the thin adapter to 
OSLC cannot generate an Automation Result representation for completed 
executions as there is no data to generate them from. Therefore to 
consumers it looks like the Automation Results have been "deleted" as soon 
as they completed. 

-------------------------------- 

For a concrete example, I will use the VM image example again. 
So, there might be an existing API (perhaps over Java RMI, or some other 
non-HTTP, synchronous transport) for an enterprise (or perhaps even 
public) cloud that allows the deployment of VM images. This API exposes 
the VM images that are available, an action to "start" each of those VM 
images, a list of the current running VMs, and an action to "stop" and 
"restart" each of those running VMs. 

There is a requirement to expose those actions over OSLC. As the number of 
images, running VMs and actions taken are vast, and some of it may change 
very frequently, there is a requirement not to duplicate information 
stored in the existing system in the adapter. (At best there would be a 
cache for speed purposes, but this would not be an exhaustive store of all 
the information.) 

An OSLC consumer is used to pick a running VM from the cloud. The adapter 
(OSLC provider implementation) exposes these by looking up the list of 
running VMs that match the query from the existing API and provides RDF 
representations of these. (I am presuming for the sake of this example 
that there is a common [non-OSLC] vocabulary for describing these VMs, 
which both consumer and adapter understand.) The adapter exposes the 
"stop" and "reset" actions as Automation Plans linked from the "running 
VM" resource. 

The consumer POSTs an Automation Request to execute the "reset" operation. 
The adapter calls the (synchronous) call on the old API to perform the 
reset, holding the connection that the HTTP request came in on open 
without a response, until the call to the old API returns, at which point 
the adapter constructs an Automation Result representation of the result 
of the call, and returns this along with the Automation Request 
representation back to the consumer in an HTTP response. 
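
To make the flow above concrete, here is a minimal Python sketch of such a 
thin adapter. Every name in it (LegacyCloudApi, ThinAdapter, the dictionary 
keys, the "passed"/"failed" verdict strings, the URI layout) is an 
assumption made for illustration; none of it comes from the OSLC Automation 
spec or from any real product API. It only shows the shape of the 
interaction: the adapter blocks on the old API, returns the Result inline, 
and keeps nothing afterwards.

```python
import uuid
from dataclasses import dataclass
from typing import Optional


@dataclass
class AutomationResult:
    """Illustrative stand-in for an Automation Result representation."""
    uri: str
    plan: str
    vm_id: str
    verdict: str  # "passed" or "failed"; these strings are assumptions


class LegacyCloudApi:
    """Hypothetical existing synchronous API; keeps no record of past calls."""

    def reset(self, vm_id: str) -> bool:
        # A real implementation would block here until the reset finished.
        return True


class ThinAdapter:
    """Stores nothing: each representation is built for one HTTP request."""

    def __init__(self, api: LegacyCloudApi) -> None:
        self.api = api

    def handle_reset_request(self, vm_id: str) -> dict:
        # Block on the old API, as if holding the HTTP connection open.
        ok = self.api.reset(vm_id)
        result = AutomationResult(
            uri=f"/results/{uuid.uuid4()}",
            plan="reset",
            vm_id=vm_id,
            verdict="passed" if ok else "failed",
        )
        # The Result goes back inline with the Request; nothing is saved.
        return {"request": {"plan": "reset", "vm": vm_id}, "result": result}

    def handle_get_result(self, uri: str) -> Optional[AutomationResult]:
        # No stored state and nothing in the old API to rebuild from,
        # so a later lookup finds nothing (a 404 to the consumer).
        return None
```

A consumer that reads the "result" entry of the creation response gets the 
outcome; one that ignores it and looks up the Result URI afterwards gets 
nothing back.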

Once the consumer has that Automation result representation, it may (if it 
was not written for this specific adapter) attempt to look up the URL of 
the result (perhaps it was not written to take advantage of the 
synchronous method of returning the Auto Result in the response to that 
request). The adapter receives that request, decodes the URL or query to 
determine that it is looking for an Automation Result for a "reset" Plan. 
However, at this point, it has no means of telling when that VM was last 
reset. 
So the adapter now has three options: 
- Not return any Automation Results. 
- Generate an Automation Result, assuming that the action was performed, 
and assuming that it completed successfully (it may not be possible to 
assume this reliably) 
- Or the adapter implementation is forced to persist Auto Result 
resources, despite the intention that it should be a "thin" adapter merely 
generating resources on the fly. 

-------------------------------- 

If the people implementing the consumer had been alerted by the spec that 
results might be deleted as soon as they are completed, and that the only 
representation of the result that is available might be the one being 
returned as part of the Auto Request creation HTTP response, then they 
might have made sure that their consumer checked for the Result there, and 
so avoided the need to query for it. This way the thin provider does not 
need to provide a means to query Automation Results. 
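
A consumer written with that warning in mind might look roughly like the 
following Python sketch. The response layout ("result" and "result_uri" 
keys) and the fetch callback are hypothetical, chosen only to illustrate 
the ordering: take the inline Result if it is there, and fall back to a 
lookup only when it is not.

```python
from typing import Callable, Optional


def obtain_result(creation_response: dict,
                  fetch: Callable[[str], Optional[dict]]) -> Optional[dict]:
    """Prefer the Result delivered inline; look it up only if it is absent."""
    inline = creation_response.get("result")
    if inline is not None:
        return inline
    result_uri = creation_response.get("result_uri")
    if result_uri is None:
        return None
    # The lookup may find nothing if the provider no longer has the Result.
    return fetch(result_uri)
```

With a thin provider that always inlines the Result, the fetch callback is 
never called, so the provider never has to answer a Result query.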

Alternatively, if the spec had made clear that there was a minimum time 
that the Result must be available for, then the people implementing the 
adapter might have been able to identify that their desire for an 
implementation that merely generated resources on the fly was not 
compatible with an OSLC Automation implementation. Either this would have 
prevented the use of OSLC Automation (do we really want that?) or they 
would have had the need to find a compromise, and allow for Auto Results 
to be persisted for a short amount of time (approx 1 minute, maybe?) after 
they completed. 
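
The compromise of keeping Results for a short time could be as small as the 
following Python sketch. The class name, the 60-second default and the 
injectable clock are all assumptions for illustration; the spec does not 
define a minimum time, and an in-memory store like this one would be lost 
on a restart.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class ShortLivedResultStore:
    """Keeps each Result for ttl_seconds after it is stored, then drops it."""

    def __init__(self, ttl_seconds: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl = ttl_seconds
        self.clock = clock
        self._items: Dict[str, Tuple[float, dict]] = {}

    def put(self, uri: str, result: dict) -> None:
        # Record the expiry time alongside the Result.
        self._items[uri] = (self.clock() + self.ttl, result)

    def get(self, uri: str) -> Optional[dict]:
        entry = self._items.get(uri)
        if entry is None:
            return None
        expires_at, result = entry
        if self.clock() >= expires_at:
            # Expired: forget it, so the consumer would now see a 404.
            del self._items[uri]
            return None
        return result
```

Entries are only removed when they are next asked for, which keeps the 
sketch short; a real adapter would probably also sweep expired entries so 
that memory use stays bounded.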

So, if we want to support this type of adapter with resources "generated 
on the fly", then do we need to impose that either: 
1. They are only adapting from APIs that provide information on completed 
automation executions (which may not be provided for much software that 
was not designed for an Enterprise scale), or 
2. They must provide storage of Auto Results (which may be required to be 
persisted storage if in-memory storage may be reset at any time, such as 
with a J2EE web app) 

Perhaps we can say that in the vast majority of cases at least one of 
those two options is available, and leave other cases unsupported. (Which 
might require a note in the spec, or perhaps a link to a separate 
description of this issue.)  Or we could change the consumers' 
expectations about when a Result might be deleted.

From:        David N Brauneis <brauneis at us.ibm.com> 
To:        Martin P Pain/UK/IBM at IBMGB, 
Cc:        Charles Rankin <rankinc at us.ibm.com>, 
oslc-automation at open-services.net, Oslc-Automation 
<oslc-automation-bounces at open-services.net> 
Date:        28/05/2013 15:11 
Subject:        Re: [Oslc-Automation] Thoughts on teardown scenarios - 
increased number of resources & length of persistence 

Martin, 

I do not think that you can rely on a 404 to mean either that the 
automation result has not finished yet or that it has not started. 
(Actually, I think the result *should* be created when the automation 
starts, so that it has a state that lets you know where it is in the 
overall progress.) 

I guess I'm struggling to understand why an Automation Provider would 
immediately delete the result (record of the automation run and 
success/failure) - what would be the point? Do you have a concrete example 
of this type of provider? 

In my opinion, a 404 indicates that you have either a bad URL, the item 
has already been deleted, or that the Automation Result has not yet been 
created. 

Regards,
David
____________________________________________________________________
David Brauneis
STSM, Rational Software CTO Office, Advanced Technology & New Product 
Incubation
email: brauneis at us.ibm.com | phone: 720-395-5659 | mobile: 919-656-0874 

From:        Martin P Pain <martinpain at uk.ibm.com> 
To:        David N Brauneis/Raleigh/IBM at IBMUS, 
Cc:        oslc-automation at open-services.net, Oslc-Automation 
<oslc-automation-bounces at open-services.net>, Charles 
Rankin/Austin/IBM at IBMUS 
Date:        05/28/2013 04:25 AM 
Subject:        Re: [Oslc-Automation] Thoughts on teardown scenarios - 
increased number of resources & length of persistence 

David, 

My point about the Automation Results not being available as soon as they 
have completed is based on the situation where the provider does not want 
to persist the results any longer than necessary. e.g. if it is an adapter 
to an API that does not persist information about anything that has 
already finished. In this case it would not be incompatible with the spec 
to delete the result as soon as it completed. "Providers can persist 
automation results for as long as they deem reasonable" does not state a 
minimum "reasonable time", so a provider implementation could deem zero 
time after completion to be "reasonable". 

So, yes, the 404 or empty query set would be when the result was deleted, 
but my question is "what about if the result is deleted as soon as it 
completes"? 

That is, when the OSLC automation resources "are just generated (or 
responded to) on the fly" as Charles mentioned, then the deletion of an 
Automation Result may not be an active operation - it may happen 
implicitly if the data that it is generated from is no longer available 
once the automation has completed. 

So I'll reword my question to clarify: 
From the writing of the first version of the spec, what thoughts were 
there around the problems that might arise from results being deleted 
before consumers expect? Does a 404 (or an empty query result) necessarily 
mean it has finished? Or could that mean it hasn't started yet? 

Martin

From:        David N Brauneis <brauneis at us.ibm.com> 
To:        Martin P Pain/UK/IBM at IBMGB, 
Cc:        Charles Rankin <rankinc at us.ibm.com>, 
oslc-automation at open-services.net, Oslc-Automation 
<oslc-automation-bounces at open-services.net>, "Oslc-Automation" 
<oslc-automation-bounces at open-services.net> 
Date:        23/05/2013 17:57 
Subject:        Re: [Oslc-Automation] Thoughts on teardown scenarios - 
increased number of resources & length of persistence 

Martin, 

As for the following question from your note on this subject: 

> This is perhaps an issue with the words "Providers can persist 
> automation results for as long as they deem reasonable" from the spec. 
> From the writing of the first version of the spec, what thoughts were 
> there around the problems that might arise from results disappearing 
> before consumers expect? Does a 404 (or an empty query result) necessarily 
> mean it has finished? Or could that mean it hasn't started yet? 

The result is not a truly transient resource but a resource that can be 
deleted - the request on the other hand is a transient resource and should 
not be depended upon. A result is persistent so I'm not sure I understand 
why you would think that the fact that it is completed/finished would cause 
it to return a 404 or an empty query set - I would more likely expect that 
to happen if the result had been deleted. 

What I believe this was intended to mean is that a result exists for some 
amount of time but not necessarily forever - for example, if the 
automation plan is for continuous integration and runs occur 10 times an 
hour, that would mean 240 results per day (or 1680 per week or 87600 per 
year)... keeping all of those results forever would eventually be costly 
for most implementations of automation providers, both from a data/disk 
usage and performance perspective. Most available automation providers 
that we looked at had some ability to remove automation results via either 
explicitly removing them or a policy to remove them after something occurs 
(time based, number of results based, etc.). 
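
The figures above, and the two kinds of removal policy mentioned (time 
based and number-of-results based), can be sketched in Python as follows. 
The prune function and its parameters are hypothetical and are not taken 
from any particular automation provider.

```python
from typing import List, Tuple

# The arithmetic from the example: 10 runs per hour, around the clock.
RUNS_PER_HOUR = 10
per_day = RUNS_PER_HOUR * 24    # 240
per_week = per_day * 7          # 1680
per_year = per_day * 365        # 87600


def prune(results: List[Tuple[float, str]], now: float,
          max_age_seconds: float, max_count: int) -> List[Tuple[float, str]]:
    """Apply a time-based policy, then a count-based one.

    Each result is a (completed_at, uri) pair. Results older than
    max_age_seconds are dropped, then only the max_count newest are kept.
    """
    fresh = [r for r in results if now - r[0] <= max_age_seconds]
    fresh.sort(key=lambda r: r[0], reverse=True)
    return fresh[:max_count]
```

Neither policy says anything about a minimum lifetime; with max_count or 
max_age_seconds set to zero, every Result is removed immediately.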

Regards,
David
____________________________________________________________________
David Brauneis
STSM, Rational Software CTO Office, Advanced Technology & New Product 
Incubation
email: brauneis at us.ibm.com | phone: 720-395-5659 | mobile: 919-656-0874 

From:        Martin P Pain <martinpain at uk.ibm.com> 
To:        Charles Rankin/Austin/IBM at IBMUS, 
Cc:        oslc-automation at open-services.net, Oslc-Automation 
<oslc-automation-bounces at open-services.net> 
Date:        05/23/2013 09:21 AM 
Subject:        Re: [Oslc-Automation] Thoughts on teardown scenarios - 
increased number of resources & length of persistence 
Sent by:        "Oslc-Automation" 
<oslc-automation-bounces at open-services.net> 

There is another issue with modelling other actions with the 
plan/request/result model. 

Charles, you said "I think there are exactly 3 plans here...  Thus, it 
doesn't actually scale out based on the number of VM Instances. ...the 
plans are likely to not exist as real resources, but rather OSLC 
Automation facades to existing functionality". 

However, the number of Automation Requests and Automation Results would 
scale out based on the number of VM instances. While these might not need 
to exist for as long as the plans, they still need to be available for 
some amount of time. 

For example, with the teardown of a VM instance there might be cases where 
the length of time that that teardown will take is unknown - it could 
range from less than a second up to 5 or 10 minutes, depending on what's 
running on that VM and how carefully it (or its dependencies) need to be 
torn down - and this might be an unknown value to the automation provider. 
As such, if the request and result were no longer available once the 
teardown had finished, it is possible that the consumer will receive an 
HTTP 404 "not found" error when subsequently requesting the Automation 
Request, and no results when querying for the Automation Result. In that 
case, is that enough to safely infer that the action completed 
successfully, and that the resource was torn down? If a failed teardown would 
result in ongoing costs building up (e.g. per-minute costs for running a 
VM) and such a failure needs to be flagged up promptly to a human user to 
deal with, I do not think the consumer could safely ignore such a response 
from the provider without possibly missing an error case that the human 
user would need to look into. 
On the other hand if the resources are persisted for any length of time 
beyond the completion of the action then the fact that the resources "are 
just generated (or responded to) on the fly" is no longer true - they need 
to be persisted for perhaps longer than they would need to be in the 
provider's native model (if the native interactions were synchronous). 
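
The cautious consumer behaviour argued for above might be sketched in 
Python like this. The Outcome names, the "passed"/"failed" verdict strings 
and both function names are assumptions for illustration only; the point is 
simply that a 404 is mapped to "unknown" and flagged, never to success.

```python
from enum import Enum
from typing import Optional


class Outcome(Enum):
    SUCCEEDED = "succeeded"
    FAILED = "failed"
    IN_PROGRESS = "in progress"
    UNKNOWN = "unknown"


def interpret(status_code: int, verdict: Optional[str] = None) -> Outcome:
    """Map what the consumer observed to an outcome, never guessing success."""
    if status_code == 200:
        if verdict == "passed":
            return Outcome.SUCCEEDED
        if verdict == "failed":
            return Outcome.FAILED
        return Outcome.IN_PROGRESS
    # A 404 (or anything else) could mean deleted, not yet created, or a
    # bad URL, so it is reported as unknown rather than as a completed
    # teardown.
    return Outcome.UNKNOWN


def needs_human_attention(outcome: Outcome) -> bool:
    """Failures and unknowns are both surfaced to the human user."""
    return outcome in (Outcome.FAILED, Outcome.UNKNOWN)
```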

If the action is performed very quickly, then the result might have 
finished and been removed before the consumer even knows its URI - 
especially if the request was created from a delegate UI, which would mean 
that the Automation Result cannot be "included" in the response. 

This is perhaps an issue with the words "Providers can persist automation 
results for as long as they deem reasonable" from the spec. From the 
writing of the first version of the spec, what thoughts were there around 
the problems that might arise from results disappearing before consumers 
expect? Does a 404 (or an empty query result) necessarily mean it has 
finished? Or could that mean it hasn't started yet? 

Martin

From:        Charles Rankin <rankinc at us.ibm.com> 
To:        Stephen Rowles/UK/IBM at IBMGB, 
Cc:        oslc-automation at open-services.net 
Date:        22/05/2013 16:40 
Subject:        Re: [Oslc-Automation] Thoughts on teardown scenarios 
Sent by:        "Oslc-Automation" 
<oslc-automation-bounces at open-services.net> 

"Oslc-Automation" <oslc-automation-bounces at open-services.net> wrote on 
05/22/2013 02:52:44 AM:

> From: Stephen Rowles <stephen.rowles at uk.ibm.com> 
> 
> I don't see why Automation resources are (or should be) any 
> different from the other resources defined in OSLC. When you look 
> at, for example, Quality Management, the spec doesn't expect a Test 
> Script to simply be a pointer to another sort of resource that 
> really contains the information needed, it is a representation of 
> that information. 
> 
> I think that Automation resources should be the same, they should be
> representing the information directly not being a pointer to yet 
> another resource. I think this is more in keeping with the way other
> OSLC resources are defined. 

I agree that an Automation resource should represent its resource 
directly, and I think the description I provided is in line with that. 

> If you look at the language as defined in the spec: 
> 
> Automation Plan - Defines the unit of automation which is available 
> for execution. 
> Test Script Resource - Defines a program or list of steps used to 
> conduct a test 
> 
> The definition of both of these resources doesn't give any 
> indication that they are simple pointers to something else (at least
> to my reading). 

My feeling is that the Automation Plan is a definition of the *action* 
that is to be taken, not of the resource on which the action is to be 
taken.  Typical OSLC resources describe some form of "object" (give me a 
touch of latitude here for the sake of an upcoming analogy).  And OSLC 
describes mechanisms to do basic CRUD (Create/Read/Update/Delete) 
operations on them (in OO parlance, OSLC would provide new/delete and 
getter/setter methods).  My view is that the OSLC Automation spec provides 
a means to define arbitrary "functions" or "methods" for OSLC "objects" 
(or "actions" on "resources" if you prefer). 

In the v2 version of the spec, I think we basically worked through the 
mechanics of how to execute/invoke actions in a standardized way.  Now, as 
we look to the v3 version of the spec, we are really starting to 
understand how to apply that mechanism to various tasks and/or domains.   

> Taking the VM example that you defined I can see that having many 
> Automation plans is nice because there is little understanding 
> required about each one. However what if the running instance of the
> VM is something created many times a day, the number of Automation 
> Plans will rapidly get large, consider a VM template that is turned 
> into a real VM 20 times a day (not unreasonable if you have a large 
> scale dynamic provisioning system). 
> 
> If there needs to be 3 automation plans for each instance for 
> restart/start/stop that's 60 automation plans every day, this 
> rapidly will get out of hand. 

In the generic provider scenario, I think there are exactly 3 plans here, 
one for each of restart/start/stop.  One of the parameters into the plan 
would be the URL to the VM Instance resource upon which to act.  Thus, it 
doesn't actually scale out based on the number of VM Instances.  For the 
purpose built provider, I could easily see the same mechanism being used, 
meaning the references to the restart/start/stop plans on the VM Instance 
are pointing to the "generic" versions, and you still pass the VM Instance 
URL as one of the parameters.  And, if it's truly purpose built, then the 
plans are likely to not exist as real resources, but rather OSLC 
Automation facades to existing functionality.  So, the definitions are 
just generated (or responded to) on the fly. 
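
As a rough Python sketch of the generic-plan idea: the set of plans is 
fixed at three, and the VM Instance URL travels as a parameter of each 
request. The names (PLANS, AutomationRequest, make_request) and the example 
URLs are hypothetical and not taken from the spec.

```python
from dataclasses import dataclass

# Exactly three generic plans, regardless of how many VM Instances exist.
PLANS = ("start", "stop", "restart")


@dataclass(frozen=True)
class AutomationRequest:
    plan: str
    vm_instance_url: str  # the resource to act on, passed as a parameter


def make_request(plan: str, vm_instance_url: str) -> AutomationRequest:
    """Build a request against one of the generic plans."""
    if plan not in PLANS:
        raise ValueError(f"unknown plan: {plan!r}")
    return AutomationRequest(plan=plan, vm_instance_url=vm_instance_url)


# Twenty instances in a day still use the same three plans.
requests_today = [
    make_request("restart", f"https://cloud.example/vms/{i}")
    for i in range(20)
]
```

The number of requests grows with the number of instances, but the number 
of plans does not.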

As an aside, if you take the viewpoint that the Plan/Result *is* the 
resource, I don't understand how you would otherwise account for these 
different actions.  You would invoke the Automation Plan (which would, I 
think, represent the VM Image) for instantiating the VM Instance, with, I 
presume, the Automation Result representing the actual VM Instance.  And, 
I get (I think) that the VM Instance would get deleted when the Automation 
Result goes away.  But, how do I restart/start/stop the instance in this 
scenario?   

Charles Rankin
Rational CTO Team -- Mobile Development Strategy
101/4L-002 T/L 966-2386
_______________________________________________
Oslc-Automation mailing list
Oslc-Automation at open-services.net
http://open-services.net/mailman/listinfo/oslc-automation_open-services.net

Unless stated otherwise above:
IBM United Kingdom Limited - Registered in England and Wales with number 
741598. 
Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU