[Oslc-Automation] Thoughts on teardown scenarios - increased number of resources & length of persistence
Martin P Pain
martinpain at uk.ibm.com
Wed May 29 11:56:30 EDT 2013
To summarise my previous email:
Provider implementations may want to be "thin adapters" over existing
synchronous APIs. These implementations would want to generate OSLC
resources on the fly for specific requests, based solely on information
available from the existing API.
This means that Automation Results are not "deleted", merely there is a
time when they can no longer be generated.
If the existing API is synchronous, then these Automation Results may
appear to be "deleted" as soon as they have completed. (But they could
have been returned to the consumer in the HTTP response to the Automation
Request creation HTTP request, so they are not totally lost.)
Therefore, if we are imposing the requirement that OSLC providers cannot
delete Results immediately, then such thin adapters must add extra weight
to their implementation by storing the Automation Results themselves, if
the existing API does not provide the information to generate them after
the automation has completed. So they are not as "thin" as they could be.
However, the spec does not currently impose that requirement (at least not
by my understanding of it), but perhaps in order to aid interoperability
it ought to alert consumer implementors that the Results may be deleted
immediately in the case of synchronous execution.
Do we want to explicitly support this immediate "deletion"? Or
have another way of helping consumer implementations handle Results being
deleted before they expect? (Such as a minimum "reasonable time" that they
should exist for.)
Martin
From: Martin P Pain/UK/IBM at IBMGB
To: David N Brauneis <brauneis at us.ibm.com>,
Cc: oslc-automation at open-services.net, Oslc-Automation
<oslc-automation-bounces at open-services.net>
Date: 29/05/2013 16:50
Subject: Re: [Oslc-Automation] Thoughts on teardown scenarios -
increased number of resources & length of persistence
Sent by: "Oslc-Automation"
<oslc-automation-bounces at open-services.net>
The example that I am thinking of:
- is a thin adapter (which exposes an existing synchronous API over OSLC).
(Side note: When I use the word "adapter", I am using it to mean an
implementation that consumes one API and exposes the same functionality in
another API, which is not necessarily the same as the RQM "execution
adapter" sense of the word. Just to be clear...)
- does not store any resources of its own, but rather only generates them
on the fly as needed for a specific HTTP request (based on what is exposed
through the existing non-OSLC API that is being adapted).
- uses the OSLC-defined synchronous operation. (This provides the
Automation Result in the HTTP response from the Automation Request
creation request. That is how the consumer gets the "record of the
success/failure".)
This means that Automation Results are not "deleted", merely there is a
time when they can no longer be generated.
If the existing non-OSLC API does not expose any results or reports for
automation executions that have completed (as all the necessary
information is provided in a synchronous manner) then the thin adapter to
OSLC cannot generate an Automation Result representation for completed
executions as there is no data to generate them from. Therefore to
consumers it looks like the Automation Results have been "deleted" as soon
as they completed.
--------------------------------
For a concrete example, I will use the VM image example again.
So, there might be an existing API (perhaps over Java RMI, or some other
non-HTTP, synchronous transport) for an enterprise (or perhaps even
public) cloud that allows the deployment of VM images. This API exposes
the VM images that are available, an action to "start" each of those VM
images, a list of the current running VMs, and an action to "stop" and
"restart" each of those running VMs.
There is a requirement to expose those actions over OSLC. As the number of
images, running VMs and actions taken are vast, and some of it may change
very frequently, there is a requirement not to duplicate information
stored in the existing system in the adapter. (At best there would be a
cache for speed purposes, but this would not be an exhaustive store of all
the information.)
An OSLC consumer is used to pick a running VM from the cloud. The adapter
(OSLC provider implementation) exposes these by looking up the list of
running VMs that match the query from the existing API and provides RDF
representations of these. (I am presuming for the sake of this example
that there is a common [non-OSLC] vocabulary for describing these VMs,
which both consumer and adapter understand.) The adapter exposes the
"stop" and "reset" actions as Automation Plans linked from the "running
VM" resource.
The consumer POSTs an Automation Request to execute the "reset" operation.
The adapter calls the (synchronous) call on the old API to perform the
reset, holding the connection that the HTTP request came in on open
without a response, until the call to the old API returns, at which point
the adapter constructs an Automation Result representation of the result
of the call, and returns this along with the Automation Request
representation back to the consumer in an HTTP response.
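
A minimal sketch of that synchronous flow, in Python. Everything named here is hypothetical: `ThinAdapter`, `legacy_reset`, the URI patterns and the "passed"/"failed" strings are placeholders rather than anything defined by the spec or by a real API. The point it illustrates is that the adapter builds the Automation Result from the return value of the blocking call and stores nothing afterwards.

```python
import itertools
from dataclasses import dataclass
from typing import Callable


@dataclass
class AutomationResult:
    uri: str
    plan: str
    vm_id: str
    verdict: str  # placeholder strings, not the spec's verdict URIs


@dataclass
class AutomationRequest:
    uri: str
    plan: str
    vm_id: str
    result: AutomationResult  # returned inline; the adapter keeps no copy


class ThinAdapter:
    """Hypothetical adapter over a synchronous legacy 'reset' call."""

    def __init__(self, legacy_reset: Callable[[str], bool]):
        self._legacy_reset = legacy_reset
        self._ids = itertools.count(1)

    def post_reset_request(self, vm_id: str) -> AutomationRequest:
        n = next(self._ids)
        # Blocks until the old API returns, which is why the HTTP
        # connection stays open without a response in the meantime.
        ok = self._legacy_reset(vm_id)
        result = AutomationResult(
            uri=f"/results/{n}",
            plan="reset",
            vm_id=vm_id,
            verdict="passed" if ok else "failed",
        )
        # Nothing is persisted: once this returns, the adapter has no
        # data from which to regenerate the result.
        return AutomationRequest(
            uri=f"/requests/{n}", plan="reset", vm_id=vm_id, result=result
        )
```

After `post_reset_request` returns, a later GET on the result URI has nothing to be answered from, which is the situation the next paragraph describes.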
Once the consumer has that Automation result representation, it may (if it
was not written for this specific adapter) attempt to look up the URL of
the result (perhaps it was not written to take advantage of the
synchronous method of returning the Auto Result in the response to that
request). The adapter receives that request, decodes the URL or query to
determine that it is looking for an Automation Result for a "reset" Plan.
However, at this point, it has no means of telling when that VM was last
reset.
So the adapter now has three options:
- Not return any Automation Results.
- Generate an Automation Result, assuming that the action was performed,
and assuming that it completed successfully (it may not be possible to
assume this reliably)
- Or the adapter implementation is forced to persist Auto Result
resources, despite the intention that it should be a "thin" adapter merely
generating resources on the fly.
--------------------------------
If the people implementing the consumer had been alerted by the spec that
results might be deleted as soon as they are completed, and that the only
representation of the result that is available might be the one being
returned as part of the Auto Request creation HTTP response, then they
might have made sure that their consumer checked for the Result there, and
so avoided the need to query for it. This way the thin provider does not
need to provide a means to query Automation Results.
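
As a rough illustration of that consumer behaviour, here is a Python sketch. The dictionary keys `inline_result` and `result_uri`, and the `fetch` callable, are assumptions made for the example and do not correspond to any real OSLC client library. It prefers the result carried in the creation response and only falls back to a lookup when none was included.

```python
from typing import Callable, Optional


def obtain_result(
    creation_response: dict,
    fetch: Callable[[str], Optional[dict]],
) -> Optional[dict]:
    """Return the Automation Result, preferring the inline copy."""
    inline = creation_response.get("inline_result")
    if inline is not None:
        # This may be the only representation that will ever exist.
        return inline
    result_uri = creation_response.get("result_uri")
    if result_uri is None:
        return None
    # Fallback for providers that do not return the result inline;
    # fetch() is assumed to return None on a 404.
    return fetch(result_uri)
```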
Alternatively, if the spec had made clear that there was a minimum time
that the Result must be available for, then the people implementing the
adapter might have been able to identify that their desire for an
implementation that merely generated resources on the fly was not
compatible with an OSLC Automation implementation. Either this would have
prevented the use of OSLC Automation (do we really want that?) or they
would have had the need to find a compromise, and allow for Auto Results
to be persisted for a short amount of time (approx 1 minute, maybe?) after
they completed.
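
One way such a compromise could look is a small in-memory store with a time-to-live, sketched below in Python. The class name, the 60-second default and the injectable `clock` are all assumptions for illustration; the spec does not define a minimum time.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class ShortLivedResultStore:
    """Keeps each result for ttl_seconds after it is stored."""

    def __init__(
        self,
        ttl_seconds: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ):
        self._ttl = ttl_seconds
        self._clock = clock
        self._items: Dict[str, Tuple[float, dict]] = {}

    def put(self, uri: str, result: dict) -> None:
        self._items[uri] = (self._clock() + self._ttl, result)

    def get(self, uri: str) -> Optional[dict]:
        entry = self._items.get(uri)
        if entry is None:
            return None
        expires_at, result = entry
        if self._clock() >= expires_at:
            # Expired: from the consumer's side this looks like a deletion.
            del self._items[uri]
            return None
        return result
```

Being in-memory, this does not survive a restart of the adapter, so it only covers the "short amount of time" case.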
So, if we want to support this type of adapter with resources "generated
on the fly", then do we need to impose that either:
1. They are only adapting from APIs that provide information on completed
automation executions (which may not be provided for much software that
was not designed for an Enterprise scale), or
2. They must provide storage of Auto Results (which may be required to be
persisted storage if in-memory storage may be reset at any time, such as
with a J2EE web app)
Perhaps we can say that in the vast majority of cases at least one of
those two options is available, and leave other cases unsupported. (Which
might require a note in the spec, or perhaps a link to a separate
description of this issue.) Or we could change the consumers'
expectations about when a Result might be deleted.
From: David N Brauneis <brauneis at us.ibm.com>
To: Martin P Pain/UK/IBM at IBMGB,
Cc: Charles Rankin <rankinc at us.ibm.com>,
oslc-automation at open-services.net, Oslc-Automation
<oslc-automation-bounces at open-services.net>
Date: 28/05/2013 15:11
Subject: Re: [Oslc-Automation] Thoughts on teardown scenarios -
increased number of resources & length of persistence
Martin,
I do not think that you can rely on a 404 to have either the meaning that
the automation result has not finished yet (actually, I think the result
*should* be created when the automation starts thus it having a state that
lets you know where it is in the overall progress) or has not started.
I guess I'm struggling to understand why an Automation Provider would
immediately delete the result (record of the automation run and
success/failure) - what would be the point? Do you have a concrete example
of this type of provider?
In my opinion, a 404 indicates that you have either a bad URL, the item
has already been deleted, or that the Automation Result has not yet been
created.
Regards,
David
____________________________________________________________________
David Brauneis
STSM, Rational Software CTO Office, Advanced Technology & New Product
Incubation
email: brauneis at us.ibm.com | phone: 720-395-5659 | mobile: 919-656-0874
From: Martin P Pain <martinpain at uk.ibm.com>
To: David N Brauneis/Raleigh/IBM at IBMUS,
Cc: oslc-automation at open-services.net, Oslc-Automation
<oslc-automation-bounces at open-services.net>, Charles
Rankin/Austin/IBM at IBMUS
Date: 05/28/2013 04:25 AM
Subject: Re: [Oslc-Automation] Thoughts on teardown scenarios -
increased number of resources & length of persistence
David,
My point about the Automation Results not being available as soon as they
have completed is based on the situation where the provider does not want
to persist the results any longer than necessary. e.g. if it is an adapter
to an API that does not persist information about anything that has
already finished. In this case it would not be incompatible with the spec
to delete the result as soon as it completed. "Providers can persist
automation results for as long as they deem reasonable" does not state a
minimum "reasonable time", so a provider implementation could deem zero
time after completion to be "reasonable".
So, yes, the 404 or empty query set would be when the result was deleted,
but my question is "what about if the result is deleted as soon as it
completes"?
That is, when the OSLC automation resources "are just generated (or
responded to) on the fly" as Charles mentioned, then the deletion of an
Automation Result may not be an active operation - it may happen
implicitly if the data that it is generated from is no longer available
once the automation has completed.
So I'll reword my question to clarify:
From the writing of the first version of the spec, what thoughts were
there around the problems that might arise from results being deleted
before consumers expect? Does a 404 (or an empty query result) necessarily
mean it has finished? Or could that mean it hasn't started yet?
Martin
From: David N Brauneis <brauneis at us.ibm.com>
To: Martin P Pain/UK/IBM at IBMGB,
Cc: Charles Rankin <rankinc at us.ibm.com>,
oslc-automation at open-services.net, Oslc-Automation
<oslc-automation-bounces at open-services.net>, "Oslc-Automation"
<oslc-automation-bounces at open-services.net>
Date: 23/05/2013 17:57
Subject: Re: [Oslc-Automation] Thoughts on teardown scenarios -
increased number of resources & length of persistence
Martin,
As for the following question from your note on this subject:
> This is perhaps an issue with the words "Providers can persist
> automation results for as long as they deem reasonable" from the spec.
> From the writing of the first version of the spec, what thoughts were
> there around the problems that might arise from results disappearing
> before consumers expect? Does a 404 (or an empty query result) necessarily
> mean it has finished? Or could that mean it hasn't started yet?
The result is not a truly transient resource but a resource that can be
deleted - the request on the other hand is a transient resource and should
not be depended upon. A result is persistent so I'm not sure I understand
why you would think that the fact that it is completed/finished would cause
it to return a 404 or an empty query set - I would more likely expect that
to happen if the result had been deleted.
What I believe this was intended to mean is that a result exists for some
amount of time but not necessarily forever - for example, if the
automation plan is for continuous integration and they occur 10 times an
hour, that would mean 240 results per day (or 1680 per week or 87600 per
year)... keeping all of those results forever would eventually be costly
for most implementations of automation providers, both from a data/disk
usage and performance perspective. Most available automation providers
that we looked at had some ability to remove automation results via either
explicitly removing them or a policy to remove them after something occurs
(time based, number of results based, etc.).
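
To make the arithmetic and the two kinds of removal policy concrete, here is a small Python sketch. The `prune` function and its parameters are hypothetical and are not taken from any particular automation provider; it combines a time-based limit with a count-based one.

```python
from typing import List, Tuple

RUNS_PER_HOUR = 10
RESULTS_PER_DAY = RUNS_PER_HOUR * 24      # 240
RESULTS_PER_WEEK = RESULTS_PER_DAY * 7    # 1680
RESULTS_PER_YEAR = RESULTS_PER_DAY * 365  # 87600


def prune(
    results: List[Tuple[float, str]],
    now: float,
    max_age_seconds: float,
    max_count: int,
) -> List[Tuple[float, str]]:
    """Return the (completed_at, uri) pairs to keep, newest first."""
    # Time-based policy: drop anything older than max_age_seconds.
    fresh = [r for r in results if now - r[0] <= max_age_seconds]
    # Count-based policy: of what remains, keep only the newest max_count.
    fresh.sort(key=lambda r: r[0], reverse=True)
    return fresh[:max_count]
```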
Regards,
David
____________________________________________________________________
David Brauneis
STSM, Rational Software CTO Office, Advanced Technology & New Product
Incubation
email: brauneis at us.ibm.com | phone: 720-395-5659 | mobile: 919-656-0874
From: Martin P Pain <martinpain at uk.ibm.com>
To: Charles Rankin/Austin/IBM at IBMUS,
Cc: oslc-automation at open-services.net, Oslc-Automation
<oslc-automation-bounces at open-services.net>
Date: 05/23/2013 09:21 AM
Subject: Re: [Oslc-Automation] Thoughts on teardown scenarios -
increased number of resources & length of persistence
Sent by: "Oslc-Automation"
<oslc-automation-bounces at open-services.net>
There is another issue with modelling other actions with the
plan/request/result model.
Charles, you said "I think there are exactly 3 plans here... Thus, it
doesn't actually scale out based on the number of VM Instances. ...the
plans are likely to not exist as real resources, but rather OSLC
Automation facades to existing functionality".
However, the number of Automation Requests and Automation Results would
scale out based on the number of VM instances. While these might not need
to exist for as long as the plans, they still need to be available for
some amount of time.
For example, with the teardown of a VM instance there might be cases where
the length of time that that teardown will take is unknown - it could
range from less than a second up to 5 or 10 minutes, depending on what's
running on that VM and how carefully it (or its dependencies) need to be
torn down - and this might be an unknown value to the automation provider.
As such, if the request and result were no longer available once the
teardown had finished, it is possible that the consumer will receive an
HTTP 404 "not found" error when subsequently requesting the Automation
Request, and no results when querying for the Automation Result, in which
case is that enough to safely infer that the action completed
successfully, that the resource was torn down? If a failed teardown would
result in ongoing costs building up (e.g. per-minute costs for running a
VM) and such a failure needs to be flagged up promptly to a human user to
deal with, I do not think the consumer could safely ignore such a response
from the provider without possibly missing an error case that the human
user would need to look into.
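
The consumer-side reasoning above can be sketched in Python as follows. The `Outcome` enum, the function names and the "passed"/"failed" strings are assumptions for illustration only. The idea is that a 404 or empty query maps to "unknown" rather than to success, and that anything other than a confirmed success is flagged.

```python
from enum import Enum
from typing import Optional


class Outcome(Enum):
    SUCCEEDED = "succeeded"
    FAILED = "failed"
    UNKNOWN = "unknown"


def interpret_lookup(status_code: int, verdict: Optional[str]) -> Outcome:
    """Classify the answer to a lookup of a teardown's Automation Result."""
    if status_code == 200 and verdict == "passed":
        return Outcome.SUCCEEDED
    if status_code == 200 and verdict == "failed":
        return Outcome.FAILED
    # A 404 (or a 200 without a verdict) could mean deleted after success,
    # deleted after failure, or not yet created, so nothing is inferred.
    return Outcome.UNKNOWN


def needs_human_attention(outcome: Outcome) -> bool:
    # A failed teardown may keep costing money, so only a confirmed
    # success is safe to ignore.
    return outcome is not Outcome.SUCCEEDED
```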
On the other hand if the resources are persisted for any length of time
beyond the completion of the action then the fact that the resources "are
just generated (or responded to) on the fly" is no longer true - they need
to be persisted for perhaps longer than they would need to be in the
provider's native model (if the native interactions were synchronous).
If the action is performed very quickly, then the result might have
finished and been removed before the consumer even knows its URI -
especially if the request was created from a delegate UI, which would mean
that the Automation Result cannot be "included" in the response.
This is perhaps an issue with the words "Providers can persist automation
results for as long as they deem reasonable" from the spec. From the
writing of the first version of the spec, what thoughts were there around
the problems that might arise from results disappearing before consumers
expect? Does a 404 (or an empty query result) necessarily mean it has
finished? Or could that mean it hasn't started yet?
Martin
From: Charles Rankin <rankinc at us.ibm.com>
To: Stephen Rowles/UK/IBM at IBMGB,
Cc: oslc-automation at open-services.net
Date: 22/05/2013 16:40
Subject: Re: [Oslc-Automation] Thoughts on teardown scenarios
Sent by: "Oslc-Automation"
<oslc-automation-bounces at open-services.net>
"Oslc-Automation" <oslc-automation-bounces at open-services.net> wrote on
05/22/2013 02:52:44 AM:
> From: Stephen Rowles <stephen.rowles at uk.ibm.com>
>
> I don't see why Automation resources are (or should be) any
> different from the other resources defined in OSLC. When you look
> at, for example, Quality Management, the spec doesn't expect a Test
> Script to simply be a pointer to another sort of resource that
> really contains the information needed, it is a representation of
> that information.
>
> I think that Automation resources should be the same, they should be
> representing the information directly not being a pointer to yet
> another resource. I think this is more in keeping with the way other
> OSLC resources are defined.
I agree that an Automation resource should represent its resource
directly, and I think the description I provided is in line with that.
> If you look at the language as defined in the spec:
>
> Automation Plan - Defines the unit of automation which is available
> for execution.
> Test Script Resource - Defines a program or list of steps used to
> conduct a test
>
> The definition of both of these resources doesn't give any
> indication that they are simple pointers to something else (at least
> to my reading).
My feeling is that the Automation Plan is a definition of the *action*
that is to be taken, not of the resource on which the action is to be
taken. Typical OSLC resources describe some form of "object" (give me a
touch of latitude here for the sake of an upcoming analogy). And OSLC
describes mechanisms to do basic CRUD (Create/Read/Update/Delete)
operations on them (in OO parlance, OSLC would provide new/delete and
getter/setter methods). My view is that the OSLC Automation spec provides
a means to define arbitrary "functions" or "methods" for OSLC "objects"
(or "actions" on "resources" if you prefer).
In the v2 version of the spec, I think we basically worked through the
mechanics of how to execute/invoke actions in a standardized way. Now, as
we look to the v3 version of the spec, we are really starting to
understand how to apply that mechanism to various tasks and/or domains.
> Taking the VM example that you defined I can see that having many
> Automation plans is nice because there is little understanding
> required about each one. However what if the running instance of the
> VM is something created many times a day, the number of Automation
> Plans will rapidly get large, consider a VM template that is turned
> into a real VM 20 times a day (not unreasonable if you have a large
> scale dynamic provisioning system).
>
> If there needs to be 3 automation plans for each instance for
> restart/start/stop that's 60 automation plans every day, this
> rapidly will get out of hand.
In the generic provider scenario, I think there are exactly 3 plans here,
one for each of restart/start/stop. One of the parameters into the plan
would be the URL to the VM Instance resource upon which to act. Thus, it
doesn't actually scale out based on the number of VM Instances. For the
purpose built provider, I could easily see the same mechanism being used,
meaning the references to the restart/start/stop plans on the VM Instance
are pointing to the "generic" versions, and you still pass the VM Instance
URL as one of the parameters. And, if it's truly purpose built, then the
plans are likely to not exist as real resources, but rather OSLC
Automation facades to existing functionality. So, the definitions are
just generated (or responded to) on the fly.
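
A small Python sketch of that arrangement, with made-up names: `GENERIC_PLANS`, the `/plans/...` URIs, and the `executes_plan` and `vmInstance` keys are placeholders, not spec vocabulary. It shows that the number of plans stays fixed at three while the VM Instance URL travels as a parameter on each request.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AutomationPlan:
    uri: str
    title: str


# Exactly three generic plans, regardless of how many VM instances exist.
GENERIC_PLANS = {
    action: AutomationPlan(
        uri=f"/plans/{action}", title=f"{action} a VM instance"
    )
    for action in ("start", "stop", "restart")
}


def build_request(action: str, vm_instance_url: str) -> dict:
    """Build a request body that targets one VM via a parameter."""
    plan = GENERIC_PLANS[action]
    return {
        "executes_plan": plan.uri,
        "parameters": {"vmInstance": vm_instance_url},
    }
```

Twenty new VM instances a day would then produce twenty different parameter values, not sixty new plans.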
As an aside, if you take the viewpoint that the Plan/Result *is* the
resource, I don't understand how you would otherwise account for these
different actions. You would invoke the Automation Plan (which would, I
think, represent the VM Image) for instantiating the VM Instance, with, I
presume, the Automation Result representing the actual VM Instance. And,
I get (I think) that the VM Instance would get deleted when the Automation
Result goes away. But, how do I restart/start/stop the instance in this
scenario?
Charles Rankin
Rational CTO Team -- Mobile Development Strategy
101/4L-002 T/L 966-2386
_______________________________________________
Oslc-Automation mailing list
Oslc-Automation at open-services.net
http://open-services.net/mailman/listinfo/oslc-automation_open-services.net
Unless stated otherwise above:
IBM United Kingdom Limited - Registered in England and Wales with number
741598.
Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU