[oslc-core] New OSLC ChangeLog Proposal

Frank Budinsky frankb at ca.ibm.com
Fri Jan 7 17:25:19 EST 2011


Please ignore this part of my reply (below), it actually makes no sense.

If not a timestamp, the indexer would need to keep track of the "last
sequence number" from each service provider, while with a timestamp it
just needs the last change's timestamp (regardless of SP). And, of course,
you lose the interleaving of changes between SPs, but maybe that doesn't
matter.

I was confusing ServiceProvider with Service, but ServiceProvider would be
managing the sequence numbers anyway.

Thanks,
Frank.


                                                                                                                        
  From:       Frank Budinsky/Toronto/IBM
  To:         Ian Green1 <ian.green at uk.ibm.com>
  Cc:         Martin Nally <nally at us.ibm.com>, Nick Crossley <ncrossley at us.ibm.com>, oslc-core at open-services.net
  Date:       01/07/2011 01:56 PM
  Subject:    Re: [oslc-core] New OSLC ChangeLog Proposal
                                                                                                                        




Thanks for the feedback Ian. I've added some answers below as <FB></FB>.

Thanks,
Frank.



                                                                                                                        
  From:       Ian Green1 <ian.green at uk.ibm.com>
  To:         Frank Budinsky/Toronto/IBM at IBMCA, Nick Crossley <ncrossley at us.ibm.com>
  Cc:         Martin Nally <nally at us.ibm.com>, oslc-core at open-services.net
  Date:       01/07/2011 12:18 PM
  Subject:    Re: [oslc-core] New OSLC ChangeLog Proposal
                                                                                                                        





Thanks Frank, Nick.  Some comments on your document.

I think it would be good to have some material that puts the change log
resource model in the context of the integrations that would benefit from
it, in particular, the resource model that the central index would offer to
consumers (cross-provider query & reporting over OSLC Defined Resources).
<FB>I agree. We will add this.</FB>
I would suggest that the implementation and specification issues be
separated.  Whilst the implementation insights are valuable, I think we
should focus initially on the problem and the proposed specification.
<FB>We got this comment from several people. I've already separated the
material into two documents. Next version, I'll only post the spec
doc.</FB>

What guidelines should a consumer follow when deciding whether to run
queries against the central index or against the provider-specific query
service?  Would we not want to expose OSLC Query over the central index (as
well as/instead of SPARQL)?
<FB>Good question. Another question is, could the central index's Query
service (SPARQL or OSLC/SPARQL) replace the need for Query support in
individual services? That is, if a service provides a ChangeLog, then it
doesn't need to implement the Query service.</FB>

There is no discussion of security; in the current distributed model, each
client can make queries across OSLC providers, and each provider is
responsible for securing resources within its authority.  Copying data into
a central index means that this simplicity is lost - how is the central
index secured?
<FB>We also have a security proposal in the works. The material isn't quite
ready to share, but we recognize the need to discuss security in this
context, so we're aiming to have something to share within a couple of
weeks.</FB>

There is a Talis RDF vocabulary for change sets [1] which includes some
parts of your proposal.  It includes the idea of a change to a resource,
and additionally (and relevant to Olivier's follow-up) information on the
nature of the change in terms of triples added/removed.  It does not
expressly deal with deletion or creation. (The Talis resource model doesn't
have the idea of a log.)
<FB>Right. Marcelo Paternostro also pointed this out on another thread.
I've been thinking about how much of it, if any, we could/should reuse. As
you said, most of it has to do with the fine-grained resource change
details, which we currently want to keep out of this model. Steve Speicher
suggested (also on another thread) that we should use standard predicates
where appropriate (e.g., dc:date instead of oslc:at). I was wondering if
cs:subjectOfChange would be a better replacement for oslc:resource (in the
current proposal). Would you consider this ChangeSet model to be "standard"
enough to justify reusing parts in our OSLC models?</FB>

Some more detailed comments below.

Timestamps
On reading the draft spec I was wondering about the role that timestamps
have.  Am I right in thinking that oslc:at is being used as a sequence
number and need not be a timestamp?  Or is oslc:at required to be the time
at which the change occurred?  If so, MUST this timestamp be identical to
the dc:modified of the corresponding OSLC Defined Resource (if it has such
a property)?
<FB>I think it's safe for the timestamp to be any time after the actual
time of modification (although if not fairly accurate, it can result in
unnecessary extra fetches). It's a good question whether this could
simply be a sequence number instead of a timestamp. If not a timestamp,
the indexer would need to keep track of the "last sequence number" from
each service provider, while with a timestamp it just needs the last
change's timestamp (regardless of SP). And, of course, you lose the
interleaving of changes between SPs, but maybe that doesn't matter.</FB>

The difference between sequence and timestamp could be valuable for some
change log providers.  Knowing that something has changed is less
information than additionally knowing when that change happened.  In an
implementation that I'm experimenting with, the component providing the
change log doesn't know the time of the change, but it does know the
sequence in which all changes occurred.  It does know the time at which a
change notification was received, but this is not the same as the recorded
dc:modified of the requirement that underwent change.  Perhaps this is an
idiosyncrasy of this particular system that we ought not to let influence
the spec?

The alternative is that oslc:at MAY be a timestamp; what is REQUIRED is
that oslc:at be an xsd:integer which orders the changes as they were made.
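
To make the bookkeeping concrete, here is a minimal Python sketch of an
indexer consuming change logs in which oslc:at is an integer that only
orders changes within one provider.  The ChangeEntry shape, the provider
names and the example URIs are assumptions made up for illustration; none
of them come from the proposal.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChangeEntry:
    """Hypothetical change log entry; field names are illustrative only."""
    provider: str   # which service provider's change log this came from
    resource: str   # URI of the changed resource
    at: int         # oslc:at treated as an integer that orders changes


def new_entries(entries, last_seen):
    """Return entries not yet processed and update last_seen in place.

    last_seen maps provider -> highest sequence number already indexed.
    Ordering is only meaningful within a single provider's log, so there
    is no interleaving of changes across providers.
    """
    fresh = [e for e in entries if e.at > last_seen.get(e.provider, -1)]
    fresh.sort(key=lambda e: (e.provider, e.at))
    for e in fresh:
        last_seen[e.provider] = max(last_seen.get(e.provider, -1), e.at)
    return fresh


log = [
    ChangeEntry("rm", "http://example.org/rm/req/1", 7),
    ChangeEntry("cm", "http://example.org/cm/task/9", 3),
    ChangeEntry("rm", "http://example.org/rm/req/2", 8),
]
last_seen = {"rm": 7}          # "rm" was already indexed up to 7
fresh = new_entries(log, last_seen)
```

With a true timestamp, last_seen could collapse to a single value shared
by all providers, which is the trade-off raised in the reply above.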

Sensitivity of recursive crawling to the OSLC Shape of provided resources
The crawl configuration centralized in the linked data service reflects
aspects of what is really a distributed type system. E.g., how would the
administrator of the crawl configuration react to a change in the OSLC
Resource Shape from some change log provider?  The addition of new
properties is dynamic in most systems, so it is unsuited to a centralized
configuration.  Instead, providers could contribute declarative
configurations to each of their indexers, which would GET those
configuration resources before starting a crawl.  OSLC Core would need to
specify these configuration resources.

The recursive crawler needs to know how to deal with cyclic graphs but this
isn't mentioned in the design.  A simple policy would be to stop when that
graph has already been indexed, irrespective of when it was indexed.  Any
discrepancies will be picked up by the incremental indexing to achieve
eventual consistency.
<FB>Right now, the prototype is stopping as you suggest (if it's already
indexed).</FB>
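
The "stop when already indexed" policy can be sketched roughly as follows.
This is not the prototype's code: fetch_links is a hypothetical callable
standing in for "GET the resource and extract its links", and the crawl is
written iteratively with an explicit stack rather than recursively.

```python
def crawl(start, fetch_links, indexed=None):
    """Index every resource reachable from start, visiting each once.

    fetch_links is a hypothetical callable: URI -> iterable of linked URIs.
    indexed is the set of URIs already in the index (from any earlier crawl).
    """
    indexed = set() if indexed is None else indexed
    stack = [start]
    order = []
    while stack:
        uri = stack.pop()
        if uri in indexed:
            continue  # already indexed, whenever that was: stop here
        indexed.add(uri)
        order.append(uri)
        stack.extend(fetch_links(uri))
    return order


# A small graph with cycles (a -> b -> a, and c -> b).
graph = {"a": ["b", "c"], "b": ["a"], "c": ["b", "d"], "d": []}
visited = crawl("a", lambda uri: graph[uri])
```

Because membership in the index is the only stopping test, a cycle costs
one extra lookup rather than an endless loop; stale entries are left for
the incremental indexing to correct.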

I think this initial seeding of the central index is a key problem and I'm
pretty sure that crawling over the resources as described in this draft
proposal will not suffice because it will take too long and as a result
have large inconsistencies and "gaps" in query results.  Knowing more of
the requirements of the central index would allow us to assess performance
characteristics.
<FB>Yes, we are also concerned about this. We expect to be doing lots of
experiments with performance tuning before we have a good sense of how this
can be made to work.  One thought is that there is no "recursive crawl".
The index only indexes the resources returned from the queryBase of each
service included in the index.</FB>
Policies on "what to crawl first" might help in this regard.  One policy
that would be attractive is to give priority to "recently accessed"
resources.
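
One way such a "recently accessed first" policy might be sketched is with
a priority queue.  The last_accessed mapping is an assumption: where access
times would come from is not addressed in the draft, and the URIs and
numbers below are invented for illustration.

```python
import heapq


def crawl_order(last_accessed):
    """Yield URIs most recently accessed first.

    last_accessed maps URI -> access time as a number (e.g. epoch seconds).
    """
    # heapq is a min-heap, so negate the time to pop the newest first.
    heap = [(-t, uri) for uri, t in last_accessed.items()]
    heapq.heapify(heap)
    while heap:
        _, uri = heapq.heappop(heap)
        yield uri


order = list(crawl_order({"req/1": 100, "req/2": 300, "req/3": 200}))
```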

[1] http://n2.talis.com/wiki/Changesets

best wishes,
    -ian

ian.green at uk.ibm.com (Ian Green1/UK/IBM at IBMGB)
Chief Software Architect, Requirements Definition and Management
IBM Rational

oslc-core-bounces at open-services.net wrote on 20/12/2010 15:45:56:

> From: Frank Budinsky <frankb at ca.ibm.com>
> To: oslc-core at open-services.net
> Cc: Martin Nally <nally at us.ibm.com>
> Date: 20/12/2010 15:47
> Subject: [oslc-core] New OSLC ChangeLog Proposal
> Sent by: oslc-core-bounces at open-services.net
>
> Hello,
>
> Nick Crossley and I would like to submit a proposal for adding a
> ChangeLog service to the OSLC core specification. This new service
> will be key to the success of an indexer, and therefore we would
> like to queue it up for discussion as soon as possible in January.
>
> The OSLC proposal, itself, is described in section 1.5 of the
> attached document, while the rest of the document describes an
> indexer prototype, including how it intends to use the change log.
>
> (See attached file: RDF_indexer_overview_1220.doc)
>
> Any comments/feedback on the new proposal or the indexer prototype
> itself would be very welcome.
>
> Thanks,
> Frank.
>
> [attachment "RDF_indexer_overview_1220.doc" deleted by Ian Green1/UK/IBM]
> _______________________________________________
> Oslc-Core mailing list
> Oslc-Core at open-services.net
> http://open-services.net/mailman/listinfo/oslc-core_open-services.net


