[oslc-core] ChangeLog Proposal moving to Convergence Phase

Tue Apr 12 16:48:02 EDT 2011

Hi Frank,

I've been giving the OSLC indexing spec proposal a lot of thought. I've 
been looking for a good way to layer the specification, so that it is 
sufficiently general to garner wide use, and not overly dependent on other 
OSLC mechanisms that would make it more awkward for applications that have 
nothing to do with any OSLC domains.

First, a couple of general comments on the 3 capabilities in the current 
spec:

re: Resource Publishing Capability. The need to enumerate is clear. 
However, it is unclear how an index-building client would use this. Even 
if there were stable paging, it's unclear how the client would then catch 
up on all the changes that happened since the client started walking the 
enumeration. I believe the client needs to have a way to learn exactly 
which event it needs to carry on from using the change log. The only way 
to do this currently is for the client to make a separate request to 
retrieve the current event number from the change log before it requests 
the enumeration. Also, the resources enumeration is tied to the time of 
the request. For a server with a very large set of resources, this may be 
expensive. If it were possible for the server to answer an enumeration of 
its resource set at a point in the past of its choosing, the server would 
have more flexibility as to how and when to enumerate its resource set.
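
To make the catch-up problem concrete, here is a minimal Python sketch of 
the workaround described above: the client reads the current event number 
first, enumerates, and then replays every event after that number. 
Everything in it (FakeServer, latest_event_number, enumerate, events_after, 
bootstrap, the event kinds) is a hypothetical in-memory stand-in invented 
for illustration; none of these names come from the proposal.

```python
from dataclasses import dataclass, field


@dataclass
class Event:
    number: int  # event sequence number, increasing by one per change
    uri: str
    kind: str    # assumed kinds: "created", "modified", "deleted"


@dataclass
class FakeServer:
    """Hypothetical in-memory server: a resource set plus its change log."""
    resources: set = field(default_factory=set)
    log: list = field(default_factory=list)

    def apply(self, uri, kind):
        # Change the resource set and record the change in the log.
        if kind == "deleted":
            self.resources.discard(uri)
        else:
            self.resources.add(uri)
        self.log.append(Event(len(self.log) + 1, uri, kind))

    def latest_event_number(self):
        return self.log[-1].number if self.log else 0

    def enumerate(self):
        return sorted(self.resources)

    def events_after(self, number):
        return [e for e in self.log if e.number > number]


def bootstrap(server, during_enumeration=None):
    """Enumerate, then catch up on changes made while enumerating."""
    # Separate request made BEFORE the enumeration starts.
    start = server.latest_event_number()
    known = set(server.enumerate())
    if during_enumeration:
        during_enumeration()  # simulates changes racing with the walk
    # Replay everything logged after the recorded event number.
    for event in server.events_after(start):
        if event.kind == "deleted":
            known.discard(event.uri)
        else:
            known.add(event.uri)
        start = event.number
    return known, start
```

Replaying an event whose effect was already visible in the enumeration is 
harmless here because adding or discarding a URI is idempotent. If the 
enumeration response itself carried the event number it corresponds to, 
the separate first request would not be needed.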

re: Resource Changelog Capability. A stable, paged representation of 
change entries in reverse chronological order based on event sequence 
number - feels right, except for the format of the representation. Using an 
RDF representation for this seems awkward and unwarranted (RDF does not do 
ordering easily). Using Atom and AtomPub would be more appropriate, and 
arguably closer to what people would expect of an internet change log 
protocol. Also, given that the server is allowed to truncate change 
entries from the change log, it might make sense to tell the client up 
front the number of the lowest numbered event in the entire log. That way 
the client can at least determine when they've "missed the boat" - missed 
out on some events that were crucial to incrementally updating their 
picture.
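
A minimal sketch of the "missed the boat" check, assuming the server 
reports the lowest event number it still retains and that event numbers 
are consecutive integers. The function and parameter names are 
hypothetical, chosen only for this example.

```python
def can_resume(last_processed, lowest_retained):
    """Return True if the client can continue incrementally.

    last_processed:  number of the last event the client has applied.
    lowest_retained: lowest event number still present in the change log.

    The client needs event last_processed + 1 next. If the server has 
    already truncated past that event, some changes are lost and the 
    client has to start over with a fresh enumeration.
    """
    return last_processed + 1 >= lowest_retained


def next_step(last_processed, lowest_retained):
    # Hypothetical decision helper built on the check above.
    if can_resume(last_processed, lowest_retained):
        return "resume"
    return "re-enumerate"
```

With this information up front, the client can decide before fetching any 
change log pages whether incremental updating is still possible.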

re: Resource Security Capability. I didn't look at this - but agree we 
will need something that addresses security. One observation: the Resource 
Publishing Capability and Resource Changelog Capability are designed to be 
used by a different clientele from that of the other capabilities found in 
an OSLC service provider. The latter capabilities implicitly require an 
authenticated user, and constrain access based on the permission of that 
user. The former capabilities likely require an authenticated client 
application, will need to reify access constraints, and later apply those 
access constraints when running queries on behalf of an authenticated 
user.

More generally, here's how I've come to think about this problem.

A server maintains a particular set of resources, and wants to make that 
set of resources available to its clients. These clients, who have no a 
priori knowledge of which resources are or are not in the set, need a way 
to enumerate the URIs of the resources in the set. (Hence the Resource 
Publishing Capability.) The set of resources may be continually changing 
underfoot, and clients need a way to track how those changes affect the 
set of resources. (Hence the Resource Changelog Capability.) Our primary 
envisioned clients do both. They start off enumerating, and afterwards 
switch to incremental updating. And the reason our clients are interested 
in certain sets of resources is that they are trying to retrieve them to 
get RDF triples to put into an RDF triple store. However, 99.99% of the 
protocol is about dealing with a large active set of resources, and only 
0.01% about these resources being bearers of RDF triples.

Rather than specify it as two separate capabilities, it would make sense 
to specify them as a single capability, with a single endpoint. Conceived 
of this way, the capability at the heart of things is a protocol for 
dealing with big sets of resources. Call this the Big Resource Set 
protocol. A server would implement Big Resource Set protocol to expose its 
set of resources; a client would consume the Big Resource Set protocol to 
initially enumerate the resource set, and afterwards to continue 
monitoring for incremental changes affecting resources in the set. The Big 
Resource Set protocol would be neutral on how the resource set comes into 
being, what causes it to change, and which resources might end up as 
members of the set. The protocol would also be neutral on the 
representation of the resources; all the spec would need to promise is 
that all resources are identified by URIs, and that HTTP etags are used to 
identify distinct resource states.
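
As a rough illustration of how little the client needs to know about the 
resources themselves, here is a sketch in which the client tracks only 
URIs and etags. The apply_change function and the convention that an etag 
of None means "no longer in the set" are assumptions made for this 
example, not anything the proposal defines.

```python
def apply_change(index, uri, etag):
    """Update the client's picture of the set; report if a fetch is needed.

    index: dict mapping resource URI -> last etag the client has seen.
    etag:  etag reported for the resource, or None if the resource has 
           left the set (an assumed convention for this sketch).

    Returns True when the resource is new or in a state the client has 
    not seen, meaning the client should retrieve it.
    """
    if etag is None:
        index.pop(uri, None)   # forget resources that left the set
        return False
    if index.get(uri) == etag:
        return False           # same state as before, nothing to do
    index[uri] = etag          # new resource or new resource state
    return True
```

The same function serves both phases: during the initial enumeration every 
call returns True, and during incremental updating only genuinely new 
states trigger a retrieval. Nothing in it depends on the representation 
being RDF.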

The Big Resource Set protocol would serve as the lowest level protocol 
spec. We would build a second separate layer atop it. For our problem at 
hand - defining general purpose sources of indexable RDF content - a 
server would provide an endpoint implementing the Big Resource Set 
protocol, with the added proviso that all resources in the resource set 
are dereferenceable to RDF content, with the etag varying with significant 
changes to that RDF content. (We would also address the matter of security 
at this level, and spell out the expectations about how these endpoints 
are available only to trusted indexer clients that can pick up ACL 
information for the resources and correctly apply it when the index data 
is shown to regular users. We would also need to spell out expectations 
regarding the logical consistency of the RDF content across resources in 
the resource set, since it will likely be undesirable if the fact base 
contains contradictions. The matter of overlapping resource sets that you 
raise would also be addressed at this level.)

We would add a thin layer on top of the second to tie things in to an OSLC 
domain. In the context of an OSLC domain specification, we would further 
specify that an OSLC service provider should expose one or more RDF index 
source endpoints and make them discoverable via markup in the OSLC service 
provider. The resources in the resource sets would be the "published 
resources" belonging to that OSLC service provider.

Regards,
Jim

From: Frank Budinsky/Toronto/IBM at IBMCA
To: oslc-core at open-services.net
Cc: RELM Development <RELM_Development%IBMCA at ca.ibm.com>
Date: 04/05/2011 04:23 PM
Subject: [oslc-core] ChangeLog Proposal moving to Convergence Phase
Sent by: oslc-core-bounces at open-services.net

I've updated the Change Log proposal to include all the issues we've 
discussed, and to provide a little more elaboration for things that people 
didn't seem to easily pick up from earlier drafts. It is available here:
http://open-services.net/pub/Main/IndexingProposals/OSLC_indexing_0404.doc 

I'll look at converting it to the proper OSLC TWiki format next.

Thanks,
Frank.