-- IanGreen - 05 Oct 2009
Action taken at the bi-weekly on this date: to investigate Matt's performance needs, and a possible solution using paging of results.
Statement of concern:
Providers and consumers will suffer poor time/space performance if they are processing a RequirementCollection resource which contains very many requirements.
How can this cause poor performance:
- The resource representation grows in size with the number of requirements it contains. A collection with a large number of requirements would be a large XML document.
- mitigation - don't put anything more than the URI of the requirement in the collection (this way, there are lots of things, but they are small). (At the time of writing this note, the specification does just that.)
- In certain scenarios the client will GET a collection, then iterate over the URIs, GETting each requirement resource. This will require N+1 GETs (1 for the collection and 1 for each of the N requirements), which incurs a high (time) penalty even on very low latency networks. It might be better in such scenarios to provide all the requirements "in one go", perhaps by inlining the requirements into the collection.
- contra- the web already deals with this case, at least to some extent. A caching browser or HTTP proxy will cache each of these requirement representations, so the need for "in one go" is lessened. Indeed, "in one go" can hurt performance in some cases because of this.
- contra-contra- this breaks down for representations which are declared as uncacheable by the originating server. Then there is a need to use conditional GET, and so the latency problem comes back.
- contra- the inlined collection might be quite large, so space performance may be a concern.
- if the inlined collection is changing frequently, clients will need to repeatedly GET this large resource.
- We could allow inlining at the request of the client (say, GET ../requirementcollection24?inline=dc:description, or ...?inline=*) so that selected attributes are inlined in the collection representation. The client is then taking responsibility for dealing with these state management & caching issues.
- Another approach is to allow paging (e.g., Atom paging, RFC 5005) so that large requirement collections are GETted in chunks. Such an approach has been adopted for query results by both the CM and QM specs.
- contra- it can be tricky for implementations to selectively expire pages based on the changing parts of a collection. I'm not aware that this is done in practice.
- contra- a requirement collection is different from a query result set. Whilst it makes sense to abandon paging through a query result set (perhaps the user grows impatient), doing so on a requirement collection seems less likely. There is a quality of atomicity about a requirement collection. Also, there are consistency issues in paging through a requirement collection. The spec would need to consider what happens when paging through a collection whose contents are changing (requirements added or removed). RFC 5005 does not address this concern directly (the next/prev URIs could expire in such cases, but in highly concurrent environments this would make the Atom paging useless). It might be acceptable to repeat a requirement, but to omit one could be unacceptable.
- It is accepted (by img) that paging makes good sense on query results, and we would have to have a paging mechanism when query is introduced.
- The RFC5005 spec is a custom XML vocabulary and is not RDF/XML. The RSS 1.0 spec is RDF, but doesn't seem to include pagination.
- We would look into this when query is admitted into the specification.
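To make the request-count trade-off above concrete, here is a minimal sketch in Python. It only does the arithmetic for the three access patterns discussed (link-only collection, inlined collection, paged link-only collection) and builds the client-requested inlining URL. The function names are hypothetical, and the `inline` query parameter is only the idea floated above, not something any specification defines.

```python
import math
from urllib.parse import urlencode


def gets_link_only(n):
    """Link-only collection: 1 GET for the collection plus 1 GET per requirement."""
    return 1 + n


def gets_inlined(n):
    """Inlined collection: everything arrives in a single (possibly large) GET."""
    return 1


def gets_paged_link_only(n, page_size):
    """Paged link-only collection: one GET per page, then one GET per requirement."""
    pages = max(1, math.ceil(n / page_size))
    return pages + n


def inline_url(collection_uri, selector):
    """Build a URL such as '...?inline=dc:description' (hypothetical parameter).

    ':' and '*' are kept unescaped so the result matches the form used in the note.
    """
    return collection_uri + "?" + urlencode({"inline": selector}, safe=":*")


if __name__ == "__main__":
    n = 500
    print("link-only:", gets_link_only(n))
    print("inlined:  ", gets_inlined(n))
    print("paged:    ", gets_paged_link_only(n, page_size=50))
    print(inline_url("requirementcollection24", "dc:description"))
```

Note that paging a link-only collection does not remove the per-requirement GETs; it only bounds the size of each collection response.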
Background
- CM and QM specs include pagination of query result resources, but not pagination of other resources. Thus, a CR with many attributes will not be paginated; this is mitigated to some extent by provision of path-math URIs that allow elision (by name) of unwanted attributes.
My recommendation for OSLC RM 1.0 is not to introduce pagination. Beyond v1.0, we should consider this matter further.
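The consistency concern with paging through a changing collection can be illustrated with a small simulation. This is only a sketch under a loud assumption: the server pages by numeric offset into a live list. RFC 5005 does not mandate that (it only defines next/prev links), so this shows one plausible failure mode rather than the behaviour of any particular implementation. The identifiers `req7` and `req1` are made up for the example.

```python
def walk_pages(collection, page_size, change_after_first_page):
    """Read a collection page by page using offset-based paging (an assumption).

    One concurrent change is applied to the underlying list right after the
    first page has been fetched, to mimic another client editing the collection.
    """
    seen = []
    offset = 0
    while True:
        chunk = collection[offset:offset + page_size]
        if not chunk:
            break
        seen.extend(chunk)
        if offset == 0:
            change_after_first_page(collection)
        offset += page_size
    return seen


def demo_removal():
    # A requirement is removed while the client is between pages.
    reqs = ["req23", "req300", "req90", "req7"]
    return walk_pages(reqs, 2, lambda c: c.remove("req23"))


def demo_addition():
    # A requirement is added at the front while the client is between pages.
    reqs = ["req23", "req300", "req90", "req7"]
    return walk_pages(reqs, 2, lambda c: c.insert(0, "req1"))


if __name__ == "__main__":
    print(demo_removal())   # req90 is never seen by the client
    print(demo_addition())  # req300 is seen twice
```

Under this assumption a removal causes an omission (the unacceptable case) and an addition causes a repeat (the tolerable case).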
As far as inlining resources into collections is concerned, I see that Scenario A requires a requirement collection to be linked to a test plan, and each requirement in the test plan to be linked to a test case. In practice, I can't imagine that the test case linking would be done without getting a representation of the requirement - it is likely, for example, that the name of the test plan will be seeded from the dc:title of the requirement. So I see there being some merit in having a resource which is an "inlined" collection of requirements, as Matt was suggesting.
One way of achieving this is to make the inlined requirement collection a linked resource. Here is an example (from which I've elided the other properties that a RequirementCollection has):
<RequirementCollection rdf:about="reqcoll32">
  <!-- Non-inlined case -->
  <li rdf:resource="req23"/>
  <li rdf:resource="req300"/>
  <li rdf:resource="req90"/>
  <oslc_rm:inlined rdf:resource="reqcoll32?inlined=true"/>
</RequirementCollection>
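As a sketch of how a client might consume the non-inlined form, the following Python reads the member URIs and follows the `oslc_rm:inlined` link as an opaque URI. The example in this note omits namespace declarations, so the ones below are assumptions: the default namespace `http://example.org/rm#` is a placeholder, and the `oslc_rm` namespace URI is assumed rather than taken from this note.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
OSLC_RM = "http://open-services.net/xmlns/rm/1.0/"  # assumed namespace URI
EX = "http://example.org/rm#"  # hypothetical default namespace for the sketch

COLLECTION = f"""
<RequirementCollection xmlns="{EX}" xmlns:rdf="{RDF}" xmlns:oslc_rm="{OSLC_RM}"
                       rdf:about="reqcoll32">
  <li rdf:resource="req23"/>
  <li rdf:resource="req300"/>
  <li rdf:resource="req90"/>
  <oslc_rm:inlined rdf:resource="reqcoll32?inlined=true"/>
</RequirementCollection>
"""


def read_collection(xml_text):
    """Return (member URIs, URI of the inlined variant or None)."""
    root = ET.fromstring(xml_text)
    members = [li.get(f"{{{RDF}}}resource") for li in root.findall(f"{{{EX}}}li")]
    link = root.find(f"{{{OSLC_RM}}}inlined")
    # The link target is treated as opaque: the client does not parse or
    # construct the '?inlined=true' part itself.
    inlined = link.get(f"{{{RDF}}}resource") if link is not None else None
    return members, inlined


if __name__ == "__main__":
    members, inlined = read_collection(COLLECTION)
    print(members)
    print(inlined)
```

Because the client only follows the link it was given, the server stays free to change how the inlined resource is addressed.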
Notice that no path-math is required, and clients must not infer the existence of a query language from the presence of the ?inlined=true in the URI of the inlined collection. "inlined" seems the wrong vocabulary.
A GET on reqcoll32?inlined=true would yield:
<RequirementCollection rdf:about="reqcoll32?inlined=true">
  <!-- inlined collection -->
  <li>
    <Requirement rdf:about="req23">
      ... contents of requirement 23
    </Requirement>
  </li>
  <li>
    <Requirement rdf:about="req300">
      ... contents of requirement 300
    </Requirement>
  </li>
  <li>
    <Requirement rdf:about="req90">
      ... contents of requirement 90
    </Requirement>
  </li>
</RequirementCollection>
The content-type of this inlined collection would be the same as for the non-inlined case. I'm only providing for one "inlined" case since that simplifies the resource model and finesses what would otherwise require some query design or a more elaborate resource model. We can judge what "inlined" needs to mean based on the scenarios. Perhaps it is only the mandatory properties of a <Requirement/>?
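A client reading the inlined form could then pick up what it needs from each embedded requirement in a single pass, for example the dc:title used to seed a test artefact. The sketch below makes several assumptions: the default namespace is a placeholder, the `dc` prefix is assumed to map to `http://purl.org/dc/terms/`, and the titles are invented sample data standing in for the elided requirement contents.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC = "http://purl.org/dc/terms/"  # assumed binding for the dc prefix
EX = "http://example.org/rm#"     # hypothetical default namespace for the sketch

# Sample data only: the real contents of each requirement are elided in the note.
INLINED = f"""
<RequirementCollection xmlns="{EX}" xmlns:rdf="{RDF}" xmlns:dc="{DC}"
                       rdf:about="reqcoll32?inlined=true">
  <li>
    <Requirement rdf:about="req23"><dc:title>Sample title A</dc:title></Requirement>
  </li>
  <li>
    <Requirement rdf:about="req300"><dc:title>Sample title B</dc:title></Requirement>
  </li>
</RequirementCollection>
"""


def read_inlined_titles(xml_text):
    """Map each embedded requirement's URI to its dc:title (None if absent)."""
    root = ET.fromstring(xml_text)
    titles = {}
    for req in root.findall(f"{{{EX}}}li/{{{EX}}}Requirement"):
        titles[req.get(f"{{{RDF}}}about")] = req.findtext(f"{{{DC}}}title")
    return titles


if __name__ == "__main__":
    print(read_inlined_titles(INLINED))
```

If "inlined" ends up meaning only the mandatory properties, a client like this would still need a follow-up GET for anything optional.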