[oslc-core] OSLC Compact representation, titles with markup

Arthur Ryman ryman at ca.ibm.com
Fri Aug 26 16:58:10 EDT 2011


Dave,

It's good to know that the W3C RDF Validator accepted your private 
datatype. However, I haven't seen new datatypes being introduced by other 
vocabularies. If we do that, it might reduce interoperability.

Too bad Jena doesn't give you a DOM. Reparsing doesn't seem too 
inconvenient.

Omitting the datatypes because "we" know what they should be would reduce 
interoperability. We assume other applications will use our RDF resources, 
so we should make them explicit.

I agree the spec could be clearer. I believe Steve is improving it.

Regards, 
___________________________________________________________________________ 

Arthur Ryman 

DE, PPM & Reporting Chief Architect
IBM Software, Rational 
Toronto Lab | +1-905-413-3077 (office) | +1-416-939-5063 (mobile) 





From: Dave Steinberg/Toronto/IBM at IBMCA
To: oslc-core at open-services.net
Date: 08/26/2011 04:39 PM
Subject: Re: [oslc-core] OSLC Compact representation, titles with markup
Sent by: oslc-core-bounces at open-services.net



Hi again Arthur,

Lots of good points to address, so my responses are inline (also note that 
I wrote my previous, short reply to your later message in the middle of 
writing this longer one, so there may be a bit of overlap).

[Disclaimer: I realize I'm talking a lot about Jena in this message. 
That's because it's what I know. If there are other RDF toolkits in use, 
their behaviour is certainly relevant, too, and it would be good to hear 
from people with knowledge of them.]

Arthur Ryman/Toronto/IBM wrote on 08/25/2011 04:27:40 PM:
> 
> 1. XML Namespaces. 
> 
> The spec doesn't say "for XHTML, you need to insert an xmlns attribute
> for http://www.w3.org/1999/xhtml" because that is part of the XHTML
> standard, i.e. it's not XHTML unless the elements are in the XHTML
> namespace.

I agree that XHTML does mean using the XHTML namespace, but I also believe 
it would have been helpful to underline that fact in the OSLC Core spec. 
Also, I still can't find a single example in the spec that actually shows 
markup in the value of an XMLLiteral-typed literal, which also would have 
been helpful. I notice that RTC doesn't use a namespace, so I suppose it 
wasn't obvious to them.

> 2. Jena
> 
> I loaded the sample RDF/XML into Fuseki, which uses Jena, and it 
> produced the correct result. I assume the Jena API lets you get an 
> XML DOM from the literal value.

Thanks for this! I didn't know this, and it's reassuring. I was able to 
replicate this behaviour with pure Jena, simply by loading an RDF/XML 
resource containing your input and printing the type and lexical form of 
the literal:

Datatype: http://www.w3.org/1999/02/22-rdf-syntax-ns#XMLLiteral
Lexical form:  12345: <s xmlns="http://www.w3.org/1999/xhtml">Null pointer 
exception during startup</s>

Unfortunately, though, your assumption is not correct: Jena provides no 
access to an underlying DOM. It appears that during parsing it computes the 
exclusive canonical form of the XML subset, as the RDF specs prescribe, 
and simply records that as the literal's lexical value. Jena's built-in 
XMLLiteral type support then just uses java.lang.String as its value 
type. I was surprised when I first discovered that, too, but you can see 
for yourself in XMLLiteralType.parse().
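
For what it's worth, reparsing the lexical form to get a DOM is not much
work. The following is only a minimal sketch, written in Python with the
standard library rather than against the Jena API. It assumes the lexical
form is the string printed above, and it wraps that string in a made-up
root element (the name "wrapper" is hypothetical) because the literal is
mixed content and not a well-formed document on its own.

```python
from xml.dom import minidom

XHTML_NS = "http://www.w3.org/1999/xhtml"


def literal_to_dom(lexical_form: str) -> minidom.Document:
    """Parse an XMLLiteral lexical form by wrapping it in a synthetic root."""
    # "wrapper" is a hypothetical element name used only to make the
    # mixed content (text plus elements) parse as a single document.
    return minidom.parseString("<wrapper>" + lexical_form + "</wrapper>")


# Lexical form as reported for the sample literal.
lexical = ('12345: <s xmlns="http://www.w3.org/1999/xhtml">'
           'Null pointer exception during startup</s>')

doc = literal_to_dom(lexical)
root = doc.documentElement

# The leading text and the XHTML <s> element are now ordinary DOM nodes.
leading_text = root.firstChild.data
struck = root.getElementsByTagNameNS(XHTML_NS, "s")[0]
print(repr(leading_text))
print(struck.firstChild.data)
```

The same idea should carry over to any XML parser that is handed the
lexical string; nothing here depends on how the RDF toolkit stores it.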

> 3.  XHTML versus HTML
> 
> The primary reason is that RDF supports XHTML via the XMLLiteral 
> datatype. There is no parsing support for HTML built into RDF.

RDF doesn't require parsing support to use other datatypes, at least not 
in the abstract sense. Abstractly, a typed literal is just a datatype 
(identified by a URI) and a lexical form. Of course, it's very helpful in 
an implementation if you can automatically convert that to a more useful 
typed value (i.e. an Integer, Boolean, etc.), but that's purely a toolkit 
concern. Jena will support any datatype via its BaseDatatype.TypedValue 
wrapper type, but it's pluggable, so you can add specific parsing support 
for any type you wish.
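
To illustrate the abstract view, here is a small sketch in Python (not
Jena code). The class TypedLiteral, the PARSERS registry, and value_of()
are all hypothetical names invented for this example. The point is only
that a literal with an unrecognized datatype is still perfectly
representable, and that type-specific parsing can be plugged in later.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict

XSD = "http://www.w3.org/2001/XMLSchema#"
HTML_TYPE = "http://open-services.net/ns/core/types#HTML"


@dataclass(frozen=True)
class TypedLiteral:
    """Abstractly, a typed literal is a datatype URI plus a lexical form."""
    datatype: str
    lexical_form: str


# Hypothetical pluggable registry: datatype URI -> lexical-form parser.
PARSERS: Dict[str, Callable[[str], Any]] = {
    XSD + "integer": int,
    XSD + "boolean": lambda s: s in ("true", "1"),
}


def value_of(lit: TypedLiteral) -> Any:
    """Return a parsed value if a parser is registered, else the literal."""
    parser = PARSERS.get(lit.datatype)
    return parser(lit.lexical_form) if parser else lit


count = TypedLiteral(XSD + "integer", "42")
title = TypedLiteral(HTML_TYPE,
                     "12345: <s>Null pointer exception during startup</s>")

print(value_of(count))           # converted by the registered parser
print(value_of(title) is title)  # unknown type: handed back unchanged
```

Registering a parser for HTML_TYPE afterwards would change only what
value_of() returns, not whether the literal can be read in the first
place.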

> Another strong reason is that the syntax of HTML is very irregular 
> and hard to parse correctly - that is one of the reasons XML was 
> invented. This is very important from a security viewpoint. To guard
> against script injection attacks, you really should parse the input 
> and remove any <script> elements or Javascript attributes. Doing 
> that correctly for HTML requires a full HTML parser. On the other 
> hand, the XHTML is given to you as a DOM which you can easily 
> traverse or process using XSLT or XPATH.

But as you said, HTML Tidy can be used to parse HTML. You could even plug 
it into Jena to do the parsing automatically for a defined datatype, 
which would be more convenient than the built-in XMLLiteral support. Also, 
I really think that whether or not you need to parse and cleanse depends 
upon what you're doing with the data. It may not be necessary for an OSLC 
adapter that's merely passing along data from a trusted source, since it's 
incumbent upon a security-conscious client to do that itself.
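
For a client that does want to cleanse, the kind of traversal described
in the quoted text might look roughly like the sketch below. It is Python
with the standard library, it assumes the input is a well-formed XHTML
fragment, and the names strip_scripts and "wrapper" are hypothetical. It
is deliberately incomplete (it ignores javascript: URLs, for instance)
and should not be mistaken for a full sanitizer.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
SCRIPT_TAG = "{%s}script" % XHTML_NS


def _remove_keeping_tail(parent: ET.Element, child: ET.Element) -> None:
    """Remove child but keep the text that followed it in the parent."""
    idx = list(parent).index(child)
    if child.tail:
        if idx == 0:
            parent.text = (parent.text or "") + child.tail
        else:
            prev = parent[idx - 1]
            prev.tail = (prev.tail or "") + child.tail
    parent.remove(child)


def strip_scripts(fragment: str) -> ET.Element:
    """Drop XHTML <script> elements and on* event-handler attributes."""
    # "wrapper" is a hypothetical root so mixed content parses cleanly.
    root = ET.fromstring("<wrapper>" + fragment + "</wrapper>")
    for elem in list(root.iter()):
        for name in [n for n in elem.attrib if n.lower().startswith("on")]:
            del elem.attrib[name]
        for child in list(elem):
            if child.tag == SCRIPT_TAG:
                _remove_keeping_tail(elem, child)
    return root


fragment = ('12345: <s xmlns="http://www.w3.org/1999/xhtml" '
            'onclick="evil()">Null pointer<script>alert(1)</script>'
            ' exception</s>')
cleaned = strip_scripts(fragment)
print(ET.tostring(cleaned, encoding="unicode"))
```

Doing the equivalent for arbitrary HTML would first need an HTML parser
such as HTML Tidy in front of this step.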

> 4. Datatypes
> 
> The specs do specify the datatypes for some properties. Look at the 
> Value-Type column of the tables, e.g. [1]. You need to include the 
> datatype explicitly for ints, dates, XML, etc. You specify that 
> using rdf:datatype in RDF/XML, or using ^^ in Turtle. 

Sorry if I wasn't clear. What I meant was that the specs don't appear to 
say whether consumers and producers should use/expect typed literals or 
plain literals. I do see that the tables prescribe datatypes for the 
various literal-typed properties, but they don't say anywhere whether that 
means that typed literals should actually be used to specify those types, 
or that plain literals should be used (since those prescribed types are 
already presumed to be known).

> 5. Inventing new Datatypes
> 
> The RDF spec defines the XSD datatypes and the XMLLiteral datatype. 
> RDF parsers know how to parse those. If someone introduces a new 
> datatype URI, it could break parsers since they won't know how to 
> parse the contents. There is no standard way to define new datatypes. 
> 
> Try it with the RDF Validation service [2]

I had no problem parsing the following RDF/XML document (in which I've 
used my own HTML type) with the service:

<rdf:RDF
    xmlns:dcterms="http://purl.org/dc/terms/"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:oslc_cm="http://open-services.net/ns/cm#" > 
  <rdf:Description rdf:about="http://example.com/bugs/2314">
    <dcterms:title rdf:datatype="http://open-services.net/ns/core/types#HTML">12345: <s>Null pointer exception during startup</s></dcterms:title>
    <rdf:type rdf:resource="http://open-services.net/ns/cm#ChangeRequest"/>
  </rdf:Description>
</rdf:RDF>


Cheers,
Dave

-- 
Dave Steinberg
IBM Rational Software
davidms at ca.ibm.com