[oslc-core] New OSLC ChangeLog Proposal

Ian Green1 ian.green at uk.ibm.com
Fri Jan 7 12:18:51 EST 2011


Thanks Frank, Nick.  Some comments on your document.

I think it would be good to have some material that puts the change log
resource model in the context of the integrations that would benefit from
it, in particular, the resource model that the central index would offer to
consumers (cross-provider query & reporting over OSLC Defined Resources).
I would suggest that the implementation and specification issues be
separated.  Whilst the implementation insights are valuable, I think we
should focus initially on the problem and the proposed specification.

What guidelines should a consumer follow when deciding whether to run
queries against the central index or against the provider-specific query
service?  Would we not want to expose OSLC Query over the central index (as
well as/instead of SPARQL)?

There is no discussion of security; in the current distributed model, each
client can make queries across OSLC providers, and each provider is
responsible for securing resources within its authority.  Copying data into
a central index means that this simplicity is lost - how is the central
index secured?

There is a Talis RDF vocabulary for change set [1] which includes some
parts of your proposal.  It includes the idea of a change to a resource,
and additionally (and relevant to Olivier's follow-up) information on the
nature of the change in terms of triples added/removed.  It does not
expressly deal with deletion or creation. (The Talis resource model doesn't
have the idea of a log.)

Some more detailed comments below.

Timestamps
On reading the draft spec I was wondering about the role that timestamps
have.  Am I right in thinking that oslc:at is being used as a sequence
number and need not be a timestamp?  Or, is oslc:at required to be the time
at which the change occurred?   If so, MUST this timestamp be identical to
the dc:modified of the corresponding OSLC Defined Resource (if it has such
a property)?

The difference between sequence and timestamp could be valuable for some
change log providers.  Knowing that something has changed is less
information than additionally knowing when that change happened.  In an
implementation that I'm experimenting with, the component providing the
change log doesn't know the time of the change, but it does know the
sequence in which all changes occurred.  It does know the time at which a
change notification was received, but this is not the same as the recorded
dc:modified of the requirement that underwent change.   Perhaps this is an
idiosyncrasy of this particular system that we ought not to let influence
the spec?

The alternative is that oslc:at MAY be a timestamp;  what is REQUIRED is
that oslc:at be an xsd:integer which orders the changes as they were made.
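
To make the sequence-versus-timestamp distinction concrete, here is a
minimal Python sketch.  It assumes the alternative above, in which oslc:at
is an integer that orders changes and the time of the change is optional.
The ChangeEntry class, its field names and the two helper functions are
hypothetical and are not taken from the draft.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class ChangeEntry:
    resource: str                        # URI of the changed resource
    at: int                              # assumed integer form of oslc:at
    occurred: Optional[datetime] = None  # optional: provider may not know it


def in_change_order(entries: Iterable[ChangeEntry]) -> List[ChangeEntry]:
    """Order entries by sequence number alone; timestamps are ignored."""
    return sorted(entries, key=lambda e: e.at)


def changes_since(entries: Iterable[ChangeEntry], last_seen: int) -> List[ChangeEntry]:
    """Return, in order, the entries a consumer has not yet processed."""
    return [e for e in in_change_order(entries) if e.at > last_seen]
```

A consumer would then only need to remember the last sequence number it
processed; a provider that does know the time of a change could still
supply it without affecting the ordering.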

Sensitivity of recursive crawling to the OSLC Shape of provided resources
The crawl configuration centralized in the linked data service reflects
aspects of what is really a distributed type system.  For example, how would
the administrator of the crawl configuration react to a change in the OSLC
Resource Shape from some change log provider?  The addition of new
properties is dynamic in most systems so unsuited to a centralized
configuration.   Instead, providers could contribute declarative
configurations into each of their indexers, which would GET those
configuration resources before starting a crawl.  OSLC Core would need to
specify these configuration resources.

The recursive crawler needs to know how to deal with cyclic graphs but this
isn't mentioned in the design.  A simple policy would be to stop when that
graph has already been indexed, irrespective of when it was indexed.  Any
discrepancies will be picked up by the incremental indexing to achieve
eventual consistency.
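
The "stop when already indexed" policy can be sketched in Python as
follows.  This is only an illustration under stated assumptions: links_of
is a hypothetical stand-in for "GET the resource and extract the URIs it
links to", and the function name crawl is mine, not something from the
draft.

```python
from collections import deque
from typing import Callable, Iterable, List, Optional, Set


def crawl(seed: str,
          links_of: Callable[[str], Iterable[str]],
          already_indexed: Optional[Set[str]] = None) -> List[str]:
    """Breadth-first crawl that indexes each resource at most once.

    Resources in already_indexed are skipped, irrespective of when they
    were indexed, so cycles in the graph cannot cause an endless crawl.
    Returns the URIs indexed by this crawl, in visit order.
    """
    seen = set(already_indexed or ())
    order = []
    queue = deque([seed])
    while queue:
        uri = queue.popleft()
        if uri in seen:
            continue              # already indexed: stop following this path
        seen.add(uri)
        order.append(uri)         # "index" the resource
        queue.extend(links_of(uri))
    return order
```

Anything skipped here that has since changed would be left to the
incremental indexing driven by the change log.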

I think this initial seeding of the central index is a key problem and I'm
pretty sure that crawling over the resources as described in this draft
proposal will not suffice because it will take too long and, as a result,
leave large inconsistencies and "gaps" in query results.  Knowing more of
the requirements of the central index would allow us to assess performance
characteristics.   Policies on "what to crawl first" might help in this
regard.  One policy that would be attractive is to give priority to
"recently accessed" resources.
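
As a rough illustration of a "recently accessed first" policy, the crawl
frontier could be a priority queue keyed on last access time.  The sketch
below is in Python; the CrawlFrontier class is hypothetical, and it assumes
that a numeric last-access time is available for each resource, which the
draft does not say.

```python
import heapq


class CrawlFrontier:
    """Queue of resources to crawl, most recently accessed first."""

    def __init__(self):
        self._heap = []

    def add(self, uri, last_accessed):
        # heapq is a min-heap, so negate the time to pop the newest first.
        heapq.heappush(self._heap, (-last_accessed, uri))

    def pop(self):
        """Remove and return the URI with the most recent access time."""
        return heapq.heappop(self._heap)[1]

    def __len__(self):
        return len(self._heap)
```

Resources discovered during the crawl can be added at any point and will
still be taken in priority order.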

[1] http://n2.talis.com/wiki/Changesets

best wishes,
    -ian

ian.green at uk.ibm.com (Ian Green1/UK/IBM at IBMGB)
Chief Software Architect, Requirements Definition and Management
IBM Rational

oslc-core-bounces at open-services.net wrote on 20/12/2010 15:45:56:

> From: Frank Budinsky <frankb at ca.ibm.com>
> To: oslc-core at open-services.net
> Cc: Martin Nally <nally at us.ibm.com>
> Date: 20/12/2010 15:47
> Subject: [oslc-core] New OSLC ChangeLog Proposal
> Sent by: oslc-core-bounces at open-services.net
>
> Hello,
>
> Nick Crossley and I would like to submit a proposal for adding a
> ChangeLog service to the OSLC core specification. This new service
> will be key to the success of an indexer, and therefore we would
> like to queue it up for discussion as soon as possible in January.
>
> The OSLC proposal, itself, is described in section 1.5 of the
> attached document, while the rest of the document describes an
> indexer prototype, including how it intends to use the change log.
>
> (See attached file: RDF_indexer_overview_1220.doc)
>
> Any comments/feedback on the new proposal or the indexer prototype
> itself would be very welcome.
>
> Thanks,
> Frank.
>
> [attachment "RDF_indexer_overview_1220.doc" deleted by Ian Green1/UK/IBM]
> _______________________________________________
> Oslc-Core mailing list
> Oslc-Core at open-services.net
> http://open-services.net/mailman/listinfo/oslc-core_open-services.net




