[oslc-core] ChangeLog Proposal moving to Convergence Phase
Jim des Rivieres
Jim_des_Rivieres at ca.ibm.com
Tue Apr 12 16:48:02 EDT 2011
Hi Frank,
I've been giving the OSLC indexing spec proposal a lot of thought. I've
been looking for a good way to layer the specification, so that it is
sufficiently general to garner wide use, and not overly dependent on other
OSLC mechanisms that would make it more awkward for applications that have
nothing to do with any OSLC domains.
First, a couple of general comments on the 3 capabilities in the current
spec:
re: Resource Publishing Capability. The need to enumerate is clear.
However, it is unclear how an index-building client would use this. Even
if there were stable paging, it's unclear how the client would then catch
up on all the changes that happened after it started walking the
enumeration. I believe the client needs a way to learn exactly which
change log event it should carry on from. Currently the only way to do
this is for the client to make a separate request to retrieve the current
event number from the change log before it requests the enumeration.
Also, the resource enumeration is tied to the time of the request. For a
server with a very large set of resources, this may be expensive. If the
server were allowed to answer an enumeration of its resource set as of a
point in the past of its choosing, it would have more flexibility as to
how and when to enumerate its resource set.
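To make the catch-up problem concrete, here is a minimal Python sketch of the alternative suggested above: the enumeration response itself carries the event number it reflects, possibly from a snapshot the server took earlier at a time of its choosing. Everything in it (InMemoryResourceSet, enumerate, changes_since, bootstrap, the create/modify/delete event kinds) is a hypothetical in-memory stand-in, not an OSLC API.

```python
from dataclasses import dataclass, field


@dataclass
class ChangeEvent:
    seq: int   # change log event sequence number, strictly increasing
    kind: str  # "create", "modify" or "delete" (hypothetical vocabulary)
    uri: str


@dataclass
class InMemoryResourceSet:
    """Hypothetical server-side stand-in; not an OSLC API."""
    members: set = field(default_factory=set)
    log: list = field(default_factory=list)
    snapshots: dict = field(default_factory=dict)  # event seq -> frozenset of URIs

    def apply(self, kind, uri):
        """Record a change in the log and apply it to the live set."""
        seq = len(self.log) + 1
        self.log.append(ChangeEvent(seq, kind, uri))
        if kind == "delete":
            self.members.discard(uri)
        else:
            self.members.add(uri)
        return seq

    def snapshot(self):
        """Capture the set as of the current event; the server picks when."""
        seq = self.log[-1].seq if self.log else 0
        self.snapshots[seq] = frozenset(self.members)

    def enumerate(self):
        """Answer with the latest snapshot plus the event number it reflects."""
        as_of = max(self.snapshots)
        return as_of, self.snapshots[as_of]

    def changes_since(self, seq):
        return [e for e in self.log if e.seq > seq]


def bootstrap(server):
    """Client side: enumerate, then replay only the events the snapshot missed."""
    as_of, uris = server.enumerate()
    index, last = set(uris), as_of
    for event in server.changes_since(as_of):
        if event.kind == "delete":
            index.discard(event.uri)
        else:
            index.add(event.uri)
        last = event.seq
    return index, last
```

For example, after two creates, a snapshot, then a delete of the first resource and a create of a third, bootstrap returns the second and third URIs and resumes from event 4, without a separate request for the current event number.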
re: Resource Changelog Capability. A stable, paged representation of
change entries in reverse chronological order based on event sequence
number feels right, except for the format of the representation. Using an
RDF representation for this seems awkward and unwarranted (RDF does not
handle ordering easily). Using Atom and AtomPub would be more appropriate,
and arguably closer to what people would expect of an Internet change log
protocol. Also, given that the server is allowed to truncate change
entries from the change log, it might make sense to tell the client up
front the number of the lowest numbered event in the entire log. That way
the client can at least determine when it has "missed the boat", i.e.
missed out on some events that were crucial to incrementally updating its
picture.
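Here is a rough sketch of the client-side logic this would enable, assuming each page lists (sequence number, URI) entries newest first and also reports the lowest event number still retained in the whole log. Paging is keyed on sequence number ("entries older than N") so that pages stay stable while new events arrive. ChangeLog, page and catch_up are hypothetical names, and the tuples stand in for whatever entry format (e.g. Atom entries) the spec settles on.

```python
class ChangeLog:
    """Hypothetical server-side change log that may be truncated."""

    def __init__(self, page_size=2):
        self.entries = []      # (seq, uri) pairs, oldest first
        self.lowest_seq = 1    # lowest-numbered event still retained
        self.next_seq = 1
        self.page_size = page_size

    def append(self, uri):
        seq = self.next_seq
        self.next_seq += 1
        self.entries.append((seq, uri))
        return seq

    def truncate(self, keep_from):
        """Discard events numbered below keep_from."""
        self.entries = [e for e in self.entries if e[0] >= keep_from]
        self.lowest_seq = max(self.lowest_seq, keep_from)

    def page(self, before=None):
        """One newest-first page of entries with seq < before.

        Returns (entries, before_for_next_page or None, lowest_seq)."""
        older = [e for e in reversed(self.entries) if before is None or e[0] < before]
        chunk = older[: self.page_size]
        next_before = chunk[-1][0] if len(older) > self.page_size else None
        return chunk, next_before, self.lowest_seq


def catch_up(log, last_seen):
    """Events newer than last_seen, oldest first, or None if some of them
    were truncated (the client has missed the boat and must re-enumerate)."""
    collected, before = [], None
    while True:
        chunk, before, lowest = log.page(before)
        if last_seen + 1 < lowest:
            return None
        for seq, uri in chunk:
            if seq <= last_seen:
                return collected[::-1]
            collected.append((seq, uri))
        if before is None:
            return collected[::-1]
```

The lowest-event check is what lets a client distinguish "nothing new" from "the events I needed are gone".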
re: Resource Security Capability. I didn't look at this, but I agree we
will need something that addresses security. One observation: the Resource
Publishing Capability and Resource Changelog Capability are designed to be
used by a different clientele from that of the other capabilities found in
an OSLC service provider. The latter capabilities implicitly require an
authenticated user, and constrain access based on the permissions of that
user. The former capabilities likely require an authenticated client
application, which will need to reify access constraints and later apply
them when running queries on behalf of an authenticated user.
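One way to picture "reify access constraints, then apply them later": the trusted indexer stores each resource's access constraint alongside its triples, and the query service filters on it for the end user. The sketch below assumes, purely for illustration, that an ACL can be flattened to a set of user IDs allowed to read the resource; real ACLs (groups, roles, project membership) would need a richer model. IndexedResource, Index and query are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class IndexedResource:
    uri: str
    triples: frozenset  # (subject, predicate, object) tuples
    readers: frozenset  # reified ACL captured at index time (hypothetical model)


@dataclass
class Index:
    resources: dict = field(default_factory=dict)  # uri -> IndexedResource

    def put(self, resource):
        """Called by the trusted indexer client, which sees everything."""
        self.resources[resource.uri] = resource

    def query(self, user, predicate):
        """Run a query on behalf of an authenticated end user, returning
        only triples from resources that user may read."""
        return {
            t
            for r in self.resources.values()
            if user in r.readers
            for t in r.triples
            if t[1] == predicate
        }
```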
More generally, here's how I've come to think about this problem.
A server maintains a particular set of resources, and wants to make that
set of resources available to its clients. These clients, who have no a
priori knowledge of which resources are or are not in the set, need a way
to enumerate the URIs of the resources in the set. (Hence the Resource
Publishing Capability.) The set of resources may be continually changing
underfoot, and clients need a way to track how those changes affect the
set of resources. (Hence the Resource Changelog Capability.) The primary
clients we envision do both: they start off enumerating, and afterwards
switch to incremental updating. The reason our clients are interested
in certain sets of resources is that they are trying to retrieve them to
get RDF triples to put into an RDF triple store. However, 99.99% of the
protocol is about dealing with a large, active set of resources, and only
0.01% about these resources being bearers of RDF triples.
Rather than specifying these as two separate capabilities, it would make
sense to specify them as a single capability, with a single endpoint.
Conceived of this way, the capability at the heart of things is a protocol for
dealing with big sets of resources. Call this the Big Resource Set
protocol. A server would implement Big Resource Set protocol to expose its
set of resources; a client would consume the Big Resource Set protocol to
initially enumerate the resource set, and afterwards to continue
monitoring for incremental changes affecting resources in the set. The Big
Resource Set protocol would be neutral on how the resource set comes into
being, what causes it to change, and which resources might end up as
members of the set. The protocol would also be neutral on the
representation of the resources; all the spec would need to promise is
that all resources are identified by URIs, and that HTTP etags are used to
identify distinct resource states.
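A minimal sketch of what the Big Resource Set contract might look like from a client's point of view, assuming one endpoint that answers both an enumeration (tagged with an event number) and a change feed, where every member is just a URI plus an etag and nothing is said about representations. BigResourceSet, Member, Change and sync are hypothetical names, and a change whose etag is None is assumed to mean the resource left the set.

```python
from dataclasses import dataclass
from typing import Optional, Protocol


@dataclass(frozen=True)
class Member:
    uri: str
    etag: str  # identifies one distinct state of the resource


@dataclass(frozen=True)
class Change:
    seq: int             # event sequence number
    uri: str
    etag: Optional[str]  # new state's etag; None if the resource left the set


class BigResourceSet(Protocol):
    """Hypothetical single endpoint; says nothing about representations."""

    def enumerate(self) -> tuple[int, list[Member]]: ...

    def changes_since(self, seq: int) -> list[Change]: ...


def sync(endpoint: BigResourceSet, known: dict, last_seq: Optional[int] = None):
    """Bring `known` (uri -> etag) up to date, in place.

    Enumerates first if last_seq is None, then replays the change feed.
    Returns (last_seq, uris_to_fetch, uris_removed)."""
    to_fetch, removed = set(), set()
    if last_seq is None:
        last_seq, members = endpoint.enumerate()
        current = {m.uri: m.etag for m in members}
        removed |= known.keys() - current.keys()
        to_fetch |= {u for u, e in current.items() if known.get(u) != e}
        known.clear()
        known.update(current)
    for ch in endpoint.changes_since(last_seq):
        if ch.etag is None:
            known.pop(ch.uri, None)
            removed.add(ch.uri)
            to_fetch.discard(ch.uri)
        elif known.get(ch.uri) != ch.etag:
            known[ch.uri] = ch.etag
            to_fetch.add(ch.uri)
            removed.discard(ch.uri)
        last_seq = ch.seq
    return last_seq, to_fetch, removed
```

The client only compares etags to decide which URIs to (re)fetch, which is what keeps this layer neutral about whether the resources happen to be RDF.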
The Big Resource Set protocol would serve as the lowest level protocol
spec. We would build a second, separate layer atop it. For the problem at
hand (defining general-purpose sources of indexable RDF content), a
server would provide an endpoint implementing the Big Resource Set
protocol, with the added proviso that all resources in the resource set
are dereferenceable to RDF content, and that the etag varies with
significant changes to that RDF content. (We would also address the matter
of security at this level, and spell out the expectation that these
endpoints are available only to trusted indexer clients that can pick up
ACL information for the resources and correctly apply it when the index
data is shown to regular users. We would also need to spell out expectations
regarding the logical consistency of the RDF content across resources in
the resource set, since it will likely be undesirable if the fact base
contains contradictions. The matter of overlapping resource sets that you
raise would also be addressed at this level.)
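At the RDF index source layer, "the etag varies with significant changes to the RDF content" could, for example, be met by hashing a canonical form of the triples; an indexer can then keep one graph per source resource and replace it only when the etag changes. The sketch below assumes triples are plain string tuples with no blank nodes (blank node labels would break this naive canonicalization). rdf_etag and TripleStore are hypothetical names, and fetch stands in for an HTTP GET.

```python
import hashlib


def rdf_etag(triples):
    """Hypothetical etag: SHA-256 over the sorted triples, so reordering or
    reserializing the same content keeps the etag, while any added, removed
    or changed triple yields a new one. Assumes no blank nodes and no tabs
    or newlines inside terms."""
    h = hashlib.sha256()
    for s, p, o in sorted(set(triples)):
        h.update(f"{s}\t{p}\t{o}\n".encode("utf-8"))
    return '"' + h.hexdigest()[:16] + '"'


class TripleStore:
    """Toy indexer store keeping one graph per source resource."""

    def __init__(self):
        self.graphs = {}  # uri -> (etag, frozenset of triples)

    def update(self, uri, etag, fetch):
        """Re-fetch and replace the resource's graph only if its etag changed."""
        cached = self.graphs.get(uri)
        if cached is not None and cached[0] == etag:
            return False
        self.graphs[uri] = (etag, frozenset(fetch(uri)))
        return True

    def remove(self, uri):
        """Drop all triples contributed by a resource that left the set."""
        self.graphs.pop(uri, None)

    def all_triples(self):
        return set().union(*(g for _, g in self.graphs.values()))
```

Keeping a graph per resource means an update or deletion never has to work out which triples in a shared fact base came from which resource.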
We would add a thin layer on top of the second to tie things into an OSLC
domain. In the context of an OSLC domain specification, we would further
specify that an OSLC service provider should expose one or more RDF index
source endpoints and make them discoverable via markup in the OSLC service
provider. The resources in the resource sets would be the "published
resources" belonging to that OSLC service provider.
Regards,
Jim
From: Frank Budinsky/Toronto/IBM at IBMCA
To: oslc-core at open-services.net
Cc: RELM Development <RELM_Development%IBMCA at ca.ibm.com>
Date: 04/05/2011 04:23 PM
Subject: [oslc-core] ChangeLog Proposal moving to Convergence Phase
Sent by: oslc-core-bounces at open-services.net
I've updated the Change Log proposal to include all the issues we've
discussed, and to provide a little more elaboration for things that people
didn't seem to easily pick up from earlier drafts. It is available here:
http://open-services.net/pub/Main/IndexingProposals/OSLC_indexing_0404.doc
I'll look at converting it to the proper OSLC TWiki format next.
Thanks,
Frank.