[oslc-core] OSLC Compact representation, titles with markup

Arthur Ryman ryman at ca.ibm.com
Thu Aug 25 16:32:02 EDT 2011


Dave,

1. XML Namespaces. 

RDF/XML is well-formed XML so it must support namespaces correctly. For 
triples whose datatype is XML Literal, the value of this literal is a 
well-formed XML fragment, and therefore the namespaces should be present 
in the content. If there is an enclosing <span> element, then the 
namespace should be there. Otherwise, each element in the content should 
have the namespace. 

The spec doesn't say "for XHTML, you need to insert an xmlns attribute for 
http://www.w3.org/1999/xhtml" because that is part of the XHTML standard, 
i.e. it's not XHTML unless the elements are in the XHTML namespace. 

2. Jena

I loaded the sample RDF/XML  into Fuseki which uses Jena and it produced 
the correct result. I assume the Jena API lets you get an XML DOM from the 
literal value.

The input contained:    <dcterms:title rdf:parseType="Literal" 
xmlns="http://www.w3.org/1999/xhtml"> 12345: <s>Null pointer exception 
during startup</s> </dcterms:title>

The output value is:
" 12345: <s xmlns="http://www.w3.org/1999/xhtml">Null pointer exception during startup</s> "^^<http://www.w3.org/1999/02/22-rdf-syntax-ns#XMLLiteral>
 
3.  XHTML versus HTML

The primary reason is that RDF supports XHTML via the XMLLiteral datatype. 
There is no parsing support for HTML built into RDF.

Another strong reason is that the syntax of HTML is very irregular and 
hard to parse correctly - that is one of the reasons XML was invented. 
This is very important from a security viewpoint. To guard against script 
injection attacks, you really should parse the input and remove any 
<script> elements or JavaScript attributes. Doing that correctly for HTML 
requires a full HTML parser. On the other hand, the XHTML is given to you 
as a DOM which you can easily traverse or process using XSLT or XPath.
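
A minimal sketch of that kind of DOM traversal, assuming the literal value arrives as a well-formed XHTML fragment with a single root element. It uses Python's standard xml.etree.ElementTree; the function names are made up for illustration, and this is not a complete sanitizer:

```python
import xml.etree.ElementTree as ET

def local_name(name):
    """Strip a '{namespace}' prefix and lower-case the remainder."""
    return name.rsplit("}", 1)[-1].lower()

def strip_scripts(xhtml_fragment):
    """Parse a well-formed fragment; drop <script> elements and script attributes."""
    root = ET.fromstring(xhtml_fragment)
    for parent in list(root.iter()):
        previous = None
        for child in list(parent):
            if local_name(child.tag) == "script":
                # ElementTree stores the text after an element in its tail,
                # so move that text onto a neighbour before removing it.
                if child.tail:
                    if previous is None:
                        parent.text = (parent.text or "") + child.tail
                    else:
                        previous.tail = (previous.tail or "") + child.tail
                parent.remove(child)
            else:
                previous = child
    for element in root.iter():
        for name in list(element.attrib):
            value = element.attrib[name].strip().lower()
            # Deliberately broad: any attribute starting with "on" is dropped.
            if local_name(name).startswith("on") or value.startswith("javascript:"):
                del element.attrib[name]
    return root

sample = (
    '<span xmlns="http://www.w3.org/1999/xhtml">12345: '
    '<script>alert(1)</script><s onclick="steal()">Null pointer '
    'exception during startup</s></span>'
)
clean = strip_scripts(sample)
print(ET.tostring(clean, encoding="unicode"))
```

The same walk on tag-soup HTML would first need a real HTML parser, which is the asymmetry described above.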

4. Datatypes

The specs do specify the datatypes for some properties. Look at the 
Value-Type column of the tables, e.g. [1]. You need to include the 
datatype explicitly for ints, dates, XML, etc. You specify that using 
rdf:datatype in RDF/XML, or using ^^ in Turtle. 
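
To make the two notations concrete, here is a small sketch that formats the same typed value both ways. The helper names are hypothetical and the property used in the usage lines is only an example; the XSD namespace is the standard one:

```python
from xml.sax.saxutils import escape, quoteattr

XSD = "http://www.w3.org/2001/XMLSchema#"

def turtle_literal(lexical, datatype_iri):
    """Format a typed literal in Turtle: "lexical form"^^<datatype IRI>."""
    escaped = (
        lexical.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return '"%s"^^<%s>' % (escaped, datatype_iri)

def rdfxml_property(qname, lexical, datatype_iri):
    """Format a property element in RDF/XML carrying an rdf:datatype attribute."""
    return "<%s rdf:datatype=%s>%s</%s>" % (
        qname, quoteattr(datatype_iri), escape(lexical), qname
    )

# Example values only; the property name is an assumption for illustration.
print(turtle_literal("2011-08-25T16:32:02-04:00", XSD + "dateTime"))
print(rdfxml_property("dcterms:created", "2011-08-25T16:32:02-04:00", XSD + "dateTime"))
```

In practice an RDF toolkit would produce these serializations; the sketch only shows where the datatype URI appears in each syntax.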

I don't know what the state of adoption is. We really should get some test 
suites written for the specs.

5. Inventing new Datatypes

The RDF spec defines the XSD datatypes and the XMLLiteral datatype. RDF 
parsers know how to parse those. If someone introduces a new datatype URI, 
it could break parsers since they won't know how to parse the contents. 
There is no standard way to define new datatypes. 

Try it with the RDF Validation service [2]

[1] http://open-services.net/bin/view/Main/OSLCCoreSpecAppendixA
[2] http://www.w3.org/RDF/Validator/

Regards, 
___________________________________________________________________________ 

Arthur Ryman 

DE, PPM & Reporting Chief Architect
IBM Software, Rational 
Toronto Lab | +1-905-413-3077 (office) | +1-416-939-5063 (mobile) 





From:
Dave Steinberg/Toronto/IBM at IBMCA
To:
oslc-core at open-services.net
Date:
08/24/2011 03:05 PM
Subject:
Re: [oslc-core] OSLC Compact representation, titles with markup
Sent by:
oslc-core-bounces at open-services.net



Hi Arthur,

Thanks for the response. Apologies for being slow in replying; I've been 
out sick for the last day and a half.

I agree that putting the XML namespace on the enclosing element would be a 
convenience, but only if tools supported that. As far as I could find, 
Jena provides no fine-grained access to namespace declarations (i.e. other 
than at the model level), so I believe that one couldn't use it to produce 
or consume the fragment that you suggested. Moreover, the other RDF 
representations offer no such convenience, even in theory.

So, it seems to me that the suggestion to use a namespace was actually a 
pretty significant one, and not one that's reflected in the specs, since 
you'd always need an enclosing element for your XML content.

Thanks for the suggestion of using Tidy to convert from HTML to XHTML. 
That was very helpful for me. But I must admit, I'm still left wondering 
what makes XHTML superior to HTML for interchanging formatted text, 
especially in light of the compact representation example and my own 
experiences, where the opposite seems to be true.

One last thing that I'll emphasize is that I mentioned a lack of guidance 
in the OSLC specs specifically about plain vs. typed literals. It seems so 
odd to me that plain literals seem to be favoured everywhere, except when 
it comes to using XMLLiteral with rdf:parseType="Literal", but none of 
this is acknowledged or explained anywhere. It looks like using a typed 
literal in this one case is accepted merely as a requirement to benefit 
from the prettier RDF/XML syntax for XML content. However, I view things 
completely in the opposite light. To me, typed literals are a powerful 
benefit of RDF. You can use a typed literal to decide how to handle a 
literal value, without looking at the value itself, but that advantage is 
lost without a sufficiently specific type. Thus, I don't understand how 
defining and using a new RDF datatype to identify something as widely 
recognized and understood as HTML would impair interoperability. I think 
it would do the opposite.

Cheers,
Dave

-- 
Dave Steinberg
IBM Rational Software
davidms at ca.ibm.com


Arthur Ryman---08/23/2011 10:09:55 AM---Dave, Thx for the comments.


From:

Arthur Ryman/Toronto/IBM

To:

Dave Steinberg/Toronto/IBM at IBMCA

Cc:

oslc-core at open-services.net, oslc-core-bounces at open-services.net

Date:

08/23/2011 10:09 AM

Subject:

Re: [oslc-core] OSLC Compact representation, titles with markup


Dave,

Thx for the comments.

I agree that the guidance on using XMLLiteral is not very clear in the 
spec. There was a lot of discussion about this at the time the spec was 
under development, but not much of that discussion survived the editorial 
process. The only place I see it is in the appendix on standard properties 
- dcterms:title and dcterms:description. [1]

The guidance was that dcterms:title should be valid XHTML <span> content 
and dcterms:description valid XHTML <div> content. This means that the RDF 
datatype should be XMLLiteral and that appropriate namespaces should be 
used for XHTML content.

Putting the XHTML namespace on the enclosing element is a convenience. The 
parser should propagate that to the content, i.e. when you look at the 
triples, the XML literal node should have the inherited namespace. 

If you wanted the namespace directly in the content then you could enclose 
the content in a <div> or <span> and put the namespace there.

Using XHTML is the best way to achieve interchange of formatted text. 
There are converters from HTML to XHTML, e.g. Tidy. However, in the case of 
preview, why would conversion be needed? Shouldn't we be defining content 
that is XHTML?

In another use case, people wanted to use native Wiki text as the content. 
However, that would cause a big interop problem since there are many Wiki 
syntaxes. All of these are convertible to XHTML since that is what the 
Wikis do to display the formatted result. In another use case, people 
wanted to include Rich Text.

The general theme is that developers want to use whatever native format 
their tool supports, e.g. HTML, wiki text, and Rich Text, since it avoids 
conversions. However, this would couple the resource to the tool. OSLC is 
trying to achieve interoperability among heterogeneous tools. Therefore a 
common rich text format is needed.

The alternative of defining new RDF datatypes for HTML, wiki text, RTF 
etc. would mean that OSLC resources would not be understood by other 
applications. In general, the creation of new RDF datatypes is discouraged 
since it impairs interoperability.

[1] 
http://open-services.net/bin/view/Main/OSLCCoreSpecAppendixA?sortcol=table;up=#Dublin_Core_Properties


Regards, 
___________________________________________________________________________ 

Arthur Ryman 


DE, PPM Chief Architect

IBM Software, Rational 

Toronto Lab | +1-905-413-3077 




Dave Steinberg---08/23/2011 12:06:32 AM---Hi all, I've been following this 
thread with interest, as it touches on some of the


From:

Dave Steinberg/Toronto/IBM at IBMCA

To:

oslc-core at open-services.net

Date:

08/23/2011 12:06 AM

Subject:

Re: [oslc-core] OSLC Compact representation, titles with markup

Sent by:

oslc-core-bounces at open-services.net



Hi all,

I've been following this thread with interest, as it touches on some of 
the more general confusion/discomfort I've been developing over the past 
several weeks or months about the use of XMLLiteral with 
rdf:parseType="Literal" for HTML content.

Adam's comments below are particularly interesting. In general, it's not 
clear to me who benefits from the use of the unescaped literal 
representation, or in what scenario. And that approach, then, requires the 
use of the XMLLiteral type, which I also wonder about (as I'll explain 
further). If there is some benefit that I don't know about, perhaps it 
derails this whole line of thought. But if there isn't, could this be a 
case of the concrete representation tail wagging the abstract syntax dog?

One thing that always struck me as odd was that rdf:parseType="Literal" 
examples were the only ones I could find anywhere in OSLC that use typed 
literals (the XMLLiteral type is implicit with this special RDF/XML 
syntax). Moreover, I couldn't find any guidance in the specs about the use 
of plain vs. typed literals at all. From the perspective of a client, 
anyway, it would seem a very nice thing if a particular provider would use 
a typed literal to tell you that a title, for example, should be treated 
as a simple string or as HTML content. And that's the very thing that 
typed literals do. It could be that the presence of an XMLLiteral type is 
supposed to signal the use of XHTML content, and the absence of any type 
is supposed to signal plain text. But I couldn't find that spelled out 
anywhere -- if it is, perhaps it's hard to find, or perhaps I just did a 
poor job of looking -- and I'd argue it would be better to include types 
in both cases. [1]

It's this line of thinking that leads me to question the use of XMLLiteral 
in the first place. I saw in some old discussions that the intention in 
OSLC was not for XMLLiteral to imply XHTML necessarily. Using it for other 
XML languages was considered and endorsed, in principle. But where does 
that leave XHTML? With a type that doesn't really say what it is or what 
you can do with it. We have specs that communicate the XHTML intent in 
words, but we also have a mechanism built into RDF that could communicate 
this, and we're not really using it fully. Thus, I think it would be 
preferable to define and use a type that specifically represents HTML. And 
note, I suggest HTML, not XHTML, since using any type other than 
XMLLiteral eliminates the "benefit" of the special rdf:parseType="Literal" 
syntax. And without that, I don't see a particular benefit in the stricter 
XHTML syntax.

One other possibility that I've considered, which Arthur suggested 
previously, is using a namespace to identify that the XML is XHTML, in 
particular, instead of doing it directly in the literal type. And I 
believe that, strictly, the XHTML namespace is required for the elements 
to be valid XHTML. But I found no hint of this in the spec or any 
examples, and certainly RTC doesn't do this (I haven't checked other 
providers). Moreover, I believe it's also a worse approach, since there's 
no guarantee that your RDF runtime of choice will give you access to 
namespaces declared on the property element (I don't believe Jena does), 
and detecting a namespace inside the element content would require 
actually parsing the value as XML. If all you want to do is pass markup 
along for display in a browser, it would be unfortunate to have to 
actually parse the content to determine that it's XHTML.

And this is where I close the loop on my thinking, by coming back to how a 
consumer might actually want to make use of HTML content. Even outside of 
the compact rendering scenario, ultimately it's probably going to get 
displayed by a browser, whether as part of a larger Web page or in a 
browser-backed widget in a rich client. And for that, HTML is probably 
just as good as, if not better than, XHTML. Rather than worrying about 
whether the content is well-formed XML, it's probably sufficient to just 
give it to the browser and see what it can do with it. I would assert that 
"something a browser can render" has been the working definition of HTML 
for a good number of years now, while XHTML has largely faded in 
importance.

Going the other way, the appeal of HTML really shows. If a provider 
natively deals with HTML (without concern for XML well-formedness), it 
would be attractive to not have to convert that into XHTML to expose it 
via OSLC. Likewise, a consumer may use a rich text control that yields 
HTML. Generalized parsing of HTML for conversion to XHTML is non-trivial, 
and it seems unfortunate to impose that conversion task onto everyone, 
just so that we can use rdf:parseType="Literal" in RDF/XML and avoid 
applying normal XML encoding to markup content (of course, some encoding 
will likely be required for other RDF syntaxes anyway).

So, those are my thoughts on this (admittedly enlarged) topic. Even if 
they all do make perfect sense (and I'm not necessarily claiming they do), 
I realize we may be well past the point of being able to act on them. 
Still, I thought I'd put them out there and see what others make of them.

Cheers,
Dave


[1] In fact, I think that the consistent use of typed literals in general 
would be beneficial. You could even imagine exploiting them as a 
compatibility measure, if it was decided that the type of a property 
needed to change. This is a related, but separate, topic, which I'd be 
thrilled to discuss further, but I don't want to open too many cans of 
worms at once.

[2] Or, perhaps, a less kind way of putting that is that the XHTML 
namespace is required for the elements to 

-- 
Dave Steinberg
IBM Rational Software
davidms at ca.ibm.com


Adam Archer---08/22/2011 06:20:05 PM---The big concern to me is not the 
ability to process the RDF/XML with XPath, it's the ability to do

From:

Adam Archer/Toronto/IBM at IBMCA

To:

Arthur Ryman/Toronto/IBM at IBMCA

Cc:

"oslc-core at open-services.net" <oslc-core at open-services.net>, Randy Hudson 
<hudsonr at us.ibm.com>, oslc-core-bounces at open-services.net

Date:

08/22/2011 06:20 PM

Subject:

Re: [oslc-core] OSLC Compact representation, titles with markup

Sent by:

oslc-core-bounces at open-services.net



The big concern to me is not the ability to process the RDF/XML with 
XPath, it's the ability to do so in a browser environment. Currently all 
implementations of all rich hovers in all Jazz based products encode any 
html tags in their dcterms:title attributes (and doubly encode special 
characters). For the consumer on the browser side, this means simply 
taking the content of the attribute, decoding it (which browsers are very 
good at) and slapping the result into the dom (which browsers are also 
very good at). 

The alternative would be a total consumability nightmare from the point of 
view of a browser (which is the most important consumer of this entire 
spec). If the tags are actually child nodes in the xml representation, it 
means we will have child elements in the resulting document that we get 
back from the xml http request which means we will have to traverse a dom 
tree and recreate a structure which could easily be represented as an 
escaped string, like everyone is doing today. 
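
A rough sketch of the difference being described, written in Python with xml.etree.ElementTree rather than browser JavaScript (an assumption made purely for illustration). With escaped markup the consumer reads one text node; with child elements it has to rebuild the markup from the tree:

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

escaped_form = (
    '<dcterms:title xmlns:dcterms="http://purl.org/dc/terms/">'
    '12345: &lt;s&gt;Null pointer exception during startup&lt;/s&gt;'
    '</dcterms:title>'
)
nested_form = (
    '<dcterms:title xmlns:dcterms="http://purl.org/dc/terms/">'
    '12345: <s>Null pointer exception during startup</s>'
    '</dcterms:title>'
)

def markup_from_escaped(xml_text):
    """The XML parser decodes the entities; the text node is the markup."""
    return ET.fromstring(xml_text).text

def markup_from_children(xml_text):
    """Rebuild a markup string from the leading text and each child subtree."""
    title = ET.fromstring(xml_text)
    parts = [escape(title.text or "")]
    for child in title:
        # tostring() includes the text that follows the child (its tail).
        parts.append(ET.tostring(child, encoding="unicode"))
    return "".join(parts)

print(markup_from_escaped(escaped_form))
print(markup_from_children(nested_form))
```

Both calls yield the same markup string here. Content in the XHTML namespace would additionally need its namespace prefixes handled when re-serializing.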

I realize that implementation is not supposed to lead the spec, but I 
don't even think that would be the case here. The oslc compact spec grew 
organically out of the old jazz compact rendering spec which can be found 
here: 

https://jazz.net/wiki/bin/view/Sandbox/CompactRenderingV1P1 

If we look at the semantic description of the dc:title and jp:abbreviation 
it states explicitly that the content MUST be escaped: 

> The HTML markup MUST be escaped; for example, "<b>" as "&lt;b&gt;". 

This decision was made consciously for very well defined technical reasons 
(discussed above) in the original spec. If that decision was reversed in 
the creation of the OSLC compact spec then I believe that to have been a 
huge mistake and would like to see the spec fixed rather than for all 
providers to have to change how their compact documents are served and all 
consumers to have to go to the trouble of walking the dom to determine 
what the provider is actually trying to show. 

Adam Archer
Jazz Developer
IBM Toronto Lab 



From: Arthur Ryman/Toronto/IBM 
To: Samuel Padgett <spadgett at us.ibm.com> 
Cc: Adam Archer/Toronto/IBM at IBMCA, Randy Hudson <hudsonr at us.ibm.com>, 
"oslc-core at open-services.net" <oslc-core at open-services.net>, 
oslc-core-bounces at open-services.net 
Date: 08/22/2011 04:40 PM 
Subject: Re: [oslc-core] OSLC Compact representation, titles with markup 


Sam, 

You wrote: 

It's very difficult to parse the former using XPath. For instance, the
expression "/oslc:Compact/dcterms:title" takes out the "<s>" and "</s>"

I don't think problems using XPath are a valid reason to encode markup 
since RDF/XML itself is very difficult to process using XPath. At one 
point we tried to define an OSLC-variant of RDF/XML that looked like 
"normal" XML. However, we abandoned that and now require support for 
generic RDF/XML. 

There are many equivalent ways to represent a given set of triples in 
RDF/XML. It would therefore be very problematic to use XPath, XSLT, or 
XQuery to process RDF/XML. The safe way to process RDF/XML is to use an 
RDF toolkit like Jena. 
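
A small sketch of why path expressions are fragile here, using Python's standard xml.etree.ElementTree and a made-up resource URI. The two documents below are alternative RDF/XML spellings of the same triples (a typed node element with a property element, versus rdf:Description with an rdf:type element and a property attribute), yet a fixed path finds the title in only one of them:

```python
import xml.etree.ElementTree as ET

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dcterms": "http://purl.org/dc/terms/",
    "oslc": "http://open-services.net/ns/core#",
}

HEADER = (
    '<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" '
    'xmlns:dcterms="http://purl.org/dc/terms/" '
    'xmlns:oslc="http://open-services.net/ns/core#">'
)

# Typed node element with a property element.
typed_node_form = HEADER + (
    '<oslc:Compact rdf:about="http://example.com/bugs/12345">'
    '<dcterms:title>Null pointer exception during startup</dcterms:title>'
    '</oslc:Compact></rdf:RDF>'
)

# Same triples: rdf:Description, explicit rdf:type, title as a property attribute.
description_form = HEADER + (
    '<rdf:Description rdf:about="http://example.com/bugs/12345" '
    'dcterms:title="Null pointer exception during startup">'
    '<rdf:type rdf:resource="http://open-services.net/ns/core#Compact"/>'
    '</rdf:Description></rdf:RDF>'
)

def titles_via_path(xml_text):
    """Look up titles with a fixed element path, as an XPath user might."""
    root = ET.fromstring(xml_text)
    return [e.text for e in root.findall("oslc:Compact/dcterms:title", NS)]

print(titles_via_path(typed_node_form))
print(titles_via_path(description_form))
```

An RDF parser would report the same triples for both documents, which is the argument for going through an RDF toolkit instead.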

Regards, 
___________________________________________________________________________ 

Arthur Ryman 


DE, PPM Chief Architect 

IBM Software, Rational 

Toronto Lab | +1-905-413-3077 






From: 
Samuel Padgett <spadgett at us.ibm.com> 
To: 
"oslc-core at open-services.net" <oslc-core at open-services.net> 
Cc: 
Adam Archer/Toronto/IBM at IBMCA, Randy Hudson <hudsonr at us.ibm.com> 
Date: 
08/07/2011 01:01 PM 
Subject: 
[oslc-core] OSLC Compact representation, titles with markup 
Sent by: 
oslc-core-bounces at open-services.net





I believe the spec is a bit confusing when it comes to titles with markup
for UI Preview.

The Compact representation has a dcterms:title property. It's defined as an
XML Literal that can contain XHTML markup [1]. My understanding of XML
Literals as discussed in the RDF Primer [2] means a title with markup would
look like this,

<dcterms:title>12345: <s>Null pointer exception during
startup</s></dcterms:title>

The example [3] of this resource has a title like this, however,

<dcterms:title> 12345: &lt;s&gt;Null pointer exception during
startup&lt;/s&gt; </dcterms:title>

The example doesn't seem to fit with the description.

It's very difficult to parse the former using XPath. For instance, the
expression "/oslc:Compact/dcterms:title" takes out the "<s>" and "</s>".
Most implementations I'm aware of also follow the example where markup is
encoded. It means special characters need to be "double encoded." For
instance, "12345: Values > 1000 incorrectly calculated" would be,

<dcterms:title>12345: Values &amp;gt; 1000 incorrectly
calculated</dcterms:title>
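
The double encoding can be sketched with Python's standard html module (the function names are illustrative only). The plain text is escaped once to become HTML markup, and the markup is escaped again to sit inside the XML element as character data:

```python
import html

def encode_title(plain_text):
    """Escape text as HTML markup, then escape that markup for XML content."""
    as_markup = html.escape(plain_text, quote=False)
    return html.escape(as_markup, quote=False)

def decode_title(element_source):
    """Reverse both steps: XML decoding first, then HTML decoding."""
    return html.unescape(html.unescape(element_source))

title = "12345: Values > 1000 incorrectly calculated"
encoded = encode_title(title)
print(encoded)  # 12345: Values &amp;gt; 1000 incorrectly calculated
print(decode_title(encoded) == title)  # True
```

In a real consumer the XML parser performs the first decoding step and the browser performs the second when the string is inserted as markup.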

I think we should add more clarity to the spec here, as getting this wrong
can open up consumers to cross-site scripting attacks. I'd also suggest we
say that providers MUST NOT use any markup with a <script> tag and consumers
MUST NOT display any markup with a <script> tag to guard against this
problem.

Best Regards,
Sam


[1]
http://open-services.net/bin/view/Main/OslcCoreUiPreview?sortcol=table;up=#Representation_Compact

[2] http://www.w3.org/TR/rdf-syntax/#xmlliterals
[3]
http://open-services.net/bin/view/Main/OslcCoreUiPreview?sortcol=table;up=#XML_Representation_Format



_______________________________________________
Oslc-Core mailing list
Oslc-Core at open-services.net
http://open-services.net/mailman/listinfo/oslc-core_open-services.net
