[oslc-core] OSLC Compact representation, titles with markup

Dave Steinberg davidms at ca.ibm.com
Fri Sep 23 10:04:53 EDT 2011


Hi Arthur and all,

Sorry for being slow to respond.

Given all this, I agree with you: I think it makes sense to leave things
alone, make this a special case in this particular content type, and
document it as such. I still don't think that introducing a new property to
represent the same information in another format is ideal, and it's not
necessary if we document title as a plain text string that may contain
valid HTML markup, as you suggest. Sticking with a single property also
seems most pragmatic, as it means not imposing any additional burden on
consumers.

As an aside, I'm still a fan of using typed literals in actual RDF
representations, and I think it would be worth considering that for OSLC.
Something interesting and related that was recently pointed out to me:
There's a current effort to define a new version of RDF (1.1), and they are
considering removing simple literals altogether, replacing them with
syntactic sugar for string-typed literals (see
http://www.w3.org/TR/2011/WD-rdf11-concepts-20110830/#section-Graph-Literal).

Cheers,
Dave

--
Dave Steinberg
IBM Rational Software
davidms at ca.ibm.com



                                                                                                             
  From:       Arthur Ryman/Toronto/IBM                                                                       
                                                                                                             
  To:         Dave Steinberg/Toronto/IBM at IBMCA                                                               
                                                                                                             
  Cc:         oslc-core at open-services.net, oslc-core-bounces at open-services.net                               
                                                                                                             
  Date:       09/09/2011 11:28 AM                                                                            
                                                                                                             
  Subject:    Re: [oslc-core] OSLC Compact representation, titles with markup                                
                                                                                                             




Dave,

I recently discussed this at the Core working group.

I think the best approach is to not regard this as an RDF discussion at all
since the compact rendering format is explicitly NOT RDF. Its content type
is application/x-oslc-compact+xml. By historical accident, it happens to
be valid RDF/XML. However, its intended use is for Web UI, so it is very
appropriate to have content that is only going to be presented in a Web
browser.

Since the content is not RDF, it does not seem useful to perpetuate the
masquerade that it is RDF and go to the lengths of introducing a new RDF
datatype. I therefore favour either leaving it as is, and explicitly
documenting the fact that the value is a plain text string that MAY contain
valid HTML markup, or adding a new property, e.g. oslc:htmlTitle.

I do not think we should provide a mechanism for adding HTML values to
general RDF content since that leads us back to the multiple text format
(XHTML, HTML, RTF, wiki, ...) set of problems.

Regards,
___________________________________________________________________________
                                                                              
 Arthur Ryman                                                                 
                                                                              
 DE, PPM & Reporting Chief Architect                                          
                                                                              
 IBM Software, Rational                                                       
                                                                              
 Toronto Lab | +1-905-413-3077 (office) |                                     
 +1-416-939-5063 (mobile)                                                     
                                                                              





                                                                                                             
  From:       Dave Steinberg/Toronto/IBM at IBMCA                                                               
                                                                                                             
  To:         oslc-core at open-services.net                                                                    
                                                                                                             
  Date:       09/01/2011 11:10 AM                                                                            
                                                                                                             
  Subject:    Re: [oslc-core] OSLC Compact representation, titles with markup                                
                                                                                                             
  Sent by:    oslc-core-bounces at open-services.net                                                            
                                                                                                             





Hi Arthur,

Thanks for the engagement, for seeing both sides, and for figuring out what
was going on with the W3C Validator (and submitting a problem report).

Regarding XHTML vs. HTML in general, I still think it would have been
pragmatic to look at who is actually consuming/producing marked up text and
where it's coming from/what's being done with it, to choose a format that
minimizes the amount of conversion required. That said, I do see your
reasons for favouring XHTML from the outset, and of course I recognize that
the decision was made long ago and revisiting would have been difficult.
Also, I do appreciate that you considered my point of view.

On the particular issue of compact rendering, I would strongly advocate for
option 2, defining a new datatype for HTML and using it together with the
existing dcterms:title property. Defining such a type places no greater
practical burden on providers or consumers than defining a new property. In
either case, it's one new resource in the vocabulary to recognize, and they
can handle values in exactly the same way (either by leaving the content as
a string and leaving it to a browser to render, or by parsing and
interpreting it themselves). However, using a new type separates the
expression of the concerns in the standard RDF way: the property identifies
the characteristic of the subject that the statement specifies, and the
type suggests how to interpret the lexical form of the statement's object.
Moreover, if we define a type, it can be reused with other properties, like
dcterms:description, if that is ever needed.

I would also suggest that the spec should explicitly provide guidance on
typed vs. plain literals, hopefully in favour of the former.

Cheers,
Dave

--
Dave Steinberg
IBM Rational Software
davidms at ca.ibm.com


                                                                           
                                                                           
 From:           Arthur Ryman/Toronto/IBM                                  
                                                                           
                                                                           
 To:             Dave Steinberg/Toronto/IBM at IBMCA                          
                                                                           
                                                                           
 Cc:             oslc-core at open-services.net,                              
                 oslc-core-bounces at open-services.net                       
                                                                           
                                                                           
 Date:           08/31/2011 09:17 AM                                       
                                                                           
                                                                           
 Subject:        Re: [oslc-core] OSLC Compact representation, titles with  
                 markup                                                    
                                                                           




Dave/Randy,

Thx for persisting on this point. It turns out that the W3C RDF Validator
is in fact displaying markup characters in strings wrong. It is escaping
them. You can see the correct, unescaped, results by turning on the
Advanced option of N-Triples output.

This discussion has made me realize that my suggested name for a new
oslc:htmlEncodedTitle property is misleading. Encoding is only required
when you put the triple in an XML document, e.g. the OSLC compact
rendering resource. The encoding is removed by the parser and you end up
with the unescaped string. Since we are defining RDF predicates, the
reference to encoding is inappropriate because there is no encoding at the
RDF value level.

We therefore have the following alternatives for markup in the title:

1. Use XML Literal datatype and XHTML content.
2. Define a new datatype for HTML
3. Define a new predicate for HTML titles, e.g. oslc:htmlTitle

Using HTML within the context of the UI preview is OK since the UI is
expected to be a web UI and you'd just copy the string.

However, I think using HTML in RDF is not a good idea because all readers
of the data would then have to cope with it. I mentioned that Tidy could be
used by the writer of the data to convert it to XHTML. That does not mean
this is practical for all readers of the data. In general, when you are
designing a format for interoperability, you should convert diverse formats
into one common format. We should therefore adopt XHTML as the one common
format for marked up text interchange.

Recall that HTML is only one alternate format. We also have sources that
produce rich text (RTF), and wiki text. Agreeing on XHTML is a useful
simplification.

Regards,
___________________________________________________________________________
                                                                              
 Arthur Ryman                                                                 
                                                                              
 DE, PPM & Reporting Chief Architect                                          
                                                                              
 IBM Software, Rational                                                       
                                                                              
 Toronto Lab | +1-905-413-3077 (office) |                                     
 +1-416-939-5063 (mobile)                                                     
                                                                              


                                                                           
                                                                           
 From:           Arthur Ryman/Toronto/IBM at IBMCA                            
                                                                           
                                                                           
 To:             Dave Steinberg/Toronto/IBM at IBMCA                          
                                                                           
                                                                           
 Cc:             oslc-core at open-services.net,                              
                 oslc-core-bounces at open-services.net                       
                                                                           
                                                                           
 Date:           08/30/2011 05:49 PM                                       
                                                                           
                                                                           
 Subject:        Re: [oslc-core] OSLC Compact representation, titles with  
                 markup                                                    
                                                                           
                                                                           
 Sent by:        oslc-core-bounces at open-services.net                       
                                                                           





Dave,

My point was that when you use rdf:datatype, the content of the element
must be a string, not XML. When you use rdf:parseType="Literal" the
content is expected to be XML. In the RDF data model, the lexical space of
XML consists of well-formed XML fragments, i.e. there is no escaping other
than that required by XML.

You managed to get the rdf:datatype case to validate by escaping the XML
markup, i.e. turning it into a string, which seems like unnecessary work
if you already have an XML fragment.

BTW, I don't understand why the W3C RDF Validation service is displaying
the XML content as escaped. That means the data is actually
double-escaped. I'd be happier seeing plain text  N-Triples or Turtle.

It seems to me that since RDF/XML is well-formed XML, then the natural way
to include XML literals is as XML, not as a string that contains escaped
XML markup. However, I concede your point that in principle we don't need
rdf:parseType="Literal"  if you are sure that we get exactly the same set
of triples using just rdf:datatype. If so, you are correct in saying that
rdf:parseType="Literal" is just syntactic sugar.

I see where you are going with this. You want OSLC to create a new
datatype for HTML and you are demonstrating that rdf:datatype gives you
the mechanism to do this. As I said before, creating a new datatype will
limit interoperability since other processors will not know how to process
the new datatype. There is no standard way to define the meaning of a new
RDF datatype.

Regards,
___________________________________________________________________________


Arthur Ryman

DE, PPM & Reporting Chief Architect
IBM Software, Rational
Toronto Lab | +1-905-413-3077 (office) | +1-416-939-5063 (mobile)





From:
Dave Steinberg/Toronto/IBM at IBMCA
To:
oslc-core at open-services.net
Date:
08/26/2011 05:31 PM
Subject:
Re: [oslc-core] OSLC Compact representation, titles with markup
Sent by:
oslc-core-bounces at open-services.net



Hi Arthur,

Sorry, but I just don't agree. The two links you gave are both to the
RDF/XML spec, and they describe a special syntax for XMLLiteral-typed
literals and a general syntax for typed literals. They do not state that
the general syntax cannot be used for the case of XMLLiteral, and they
don't say anything that contradicts my understanding of the RDF abstract
data model.

Indeed, if you follow the "XML literals" link in Section 2.8, the RDF
Concepts spec defines XMLLiteral, like any other datatype, with a lexical
space, a value space and a mapping between the two. So, given any XML
value, what is to prevent you from using that mapping to compute a
corresponding lexical form, combining it with the datatype URI, and using
the ordinary literal notation (in any RDF concrete syntax)?

I just tried entering the following two RDF/XML documents into the
validation service:

<rdf:RDF xmlns:dcterms="http://purl.org/dc/terms/"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
<rdf:Description rdf:about="http://example.com/bugs/2314">
<dcterms:title rdf:parseType="Literal"
    xmlns="http://www.w3.org/1999/xhtml"> 12345: <s>Null pointer exception during startup</s></dcterms:title>
</rdf:Description>
</rdf:RDF>

<rdf:RDF xmlns:dcterms="http://purl.org/dc/terms/"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
<rdf:Description rdf:about="http://example.com/bugs/2314">
<dcterms:title
    rdf:datatype="http://www.w3.org/1999/02/22-rdf-syntax-ns#XMLLiteral"> 12345: &lt;s xmlns="http://www.w3.org/1999/xhtml"&gt;Null pointer exception during startup&lt;/s&gt;</dcterms:title>
</rdf:Description>
</rdf:RDF>

It yielded exactly the same result in both cases:



I can also confirm Steve's claim that Jena can be configured to write out
exactly the same triples using either syntax.
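
The structural difference between the two serializations can be seen with
a plain XML parser, before any RDF processing takes place. The following
is a minimal sketch only: it uses Python's standard xml.etree.ElementTree
module rather than an RDF toolkit, it assumes the rdf:datatype form carries
its markup XML-escaped, and the variable names are made up for the
illustration. It does not check that an RDF parser maps both forms to the
same typed literal; that is the claim being discussed in this thread.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DCTERMS = "http://purl.org/dc/terms/"
XHTML = "http://www.w3.org/1999/xhtml"

# Form A: rdf:parseType="Literal", markup written as real XML elements.
FORM_A = (
    f'<dcterms:title xmlns:dcterms="{DCTERMS}" xmlns:rdf="{RDF}" '
    f'rdf:parseType="Literal" xmlns="{XHTML}">'
    " 12345: <s>Null pointer exception during startup</s></dcterms:title>"
)

# Form B: rdf:datatype, markup written as an XML-escaped string.
FORM_B = (
    f'<dcterms:title xmlns:dcterms="{DCTERMS}" xmlns:rdf="{RDF}" '
    f'rdf:datatype="{RDF}XMLLiteral">'
    f' 12345: &lt;s xmlns="{XHTML}"&gt;Null pointer exception during startup'
    "&lt;/s&gt;</dcterms:title>"
)

title_a = ET.fromstring(FORM_A)
title_b = ET.fromstring(FORM_B)

# Form A: the XML parser sees one child element in the XHTML namespace.
child_tags_a = [child.tag for child in title_a]

# Form B: the XML parser sees no child elements, only character data,
# and it has already removed the XML escaping from that text.
child_tags_b = [child.tag for child in title_b]
text_b = title_b.text

print(child_tags_a)
print(child_tags_b)
print(text_b)
```

In form B the unescaped text is itself a string of XML markup, which is
what would serve as the lexical form of the literal.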

Cheers,
Dave

--
Dave Steinberg
IBM Rational Software
davidms at ca.ibm.com




From:

Arthur Ryman/Toronto/IBM

To:

Dave Steinberg/Toronto/IBM at IBMCA

Cc:

oslc-core at open-services.net, oslc-core-bounces at open-services.net

Date:

08/26/2011 03:58 PM

Subject:

Re: [oslc-core] OSLC Compact representation, titles with markup


Dave,

No, it's not just syntactic sugar. You need rdf:parseType="Literal" if you
include element content. If you use rdf:datatype then only character
content is allowed.

This is explained in the spec at [1] and [2]. rdf:parseType="Literal"
allows XML Literal content. rdf:datatype="whatever" allows string content.

However, since specs are hard to understand, I suggest you convince
yourself of this, as I did, by using the W3C RDF Validation service. [3]

[1] http://www.w3.org/TR/REC-rdf-syntax/#section-Syntax-XML-literals
[2] http://www.w3.org/TR/REC-rdf-syntax/#section-Syntax-datatyped-literals
[3] http://www.w3.org/RDF/Validator/

Regards,
___________________________________________________________________________


Arthur Ryman

DE, PPM & Reporting Chief Architect
IBM Software, Rational
Toronto Lab | +1-905-413-3077 (office) | +1-416-939-5063 (mobile)




From:

Dave Steinberg/Toronto/IBM at IBMCA

To:

oslc-core at open-services.net

Date:

08/26/2011 03:22 PM

Subject:

Re: [oslc-core] OSLC Compact representation, titles with markup

Sent by:

oslc-core-bounces at open-services.net



Arthur,

I believe you're mistaken. I think that parseType="Literal" is just
syntactic sugar (RDF Primer: "RDF/XML provides a special notation to make
it easy to write literals of this kind"). Either way you write it, you end
up with the same statement. Two statements with the same subject, the same
predicate and a typed literal with the same type (<
http://www.w3.org/1999/02/22-rdf-syntax-ns#XMLLiteral>) and the same
lexical form are indistinguishable.

Also, if you were correct, parseType="Literal" would provide RDF/XML with
some sort of privileged XMLLiteral representation that couldn't be written
out using any other RDF notation.

Cheers,
Dave

--
Dave Steinberg
IBM Rational Software
905-413-3705
davidms at ca.ibm.com



From:

Arthur Ryman/Toronto/IBM

To:

Randy Hudson/Raleigh/IBM at IBMUS

Cc:

Dave Steinberg <davidms at ca.ibm.com>, oslc-core at open-services.net,
oslc-core-bounces at open-services.net

Date:

08/26/2011 02:22 PM

Subject:

Re: [oslc-core] OSLC Compact representation, titles with markup



Randy,

Your example makes the content a string that looks like XHTML, i.e. the
content contains no XHTML elements since all the markup characters are
encoded. A string is simply parsed character data and is valid XML.

The correct way to include the XHTML elements is:

<dcterms:title rdf:parseType="Literal"> 12345: <s
    xmlns="http://www.w3.org/1999/xhtml">Null pointer exception during startup</s></dcterms:title>

The OSLC Guidelines about escaping are for the case where you need to
include characters that might get misinterpreted as XML markup. For
example, consider a math statement like "1 < 2". When you put that in an
XML element, you need to encode it as "1 &lt; 2".
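
As a small illustration of that escaping rule, the sketch below uses
Python's standard library (xml.sax.saxutils.escape and
xml.etree.ElementTree). The <example> element name is made up for the
illustration and is not part of any OSLC vocabulary.

```python
from xml.sax.saxutils import escape
import xml.etree.ElementTree as ET

statement = "1 < 2"

# Escape the characters that XML would otherwise read as markup.
encoded = escape(statement)

# Place the escaped text in an element, then parse it back. The parser
# removes the escaping, so the application sees the original string.
document = f"<example>{encoded}</example>"
decoded = ET.fromstring(document).text

print(encoded)
print(decoded)
```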

Regards,
___________________________________________________________________________



Arthur Ryman

DE, PPM & Reporting Chief Architect
IBM Software, Rational
Toronto Lab | +1-905-413-3077 (office) | +1-416-939-5063 (mobile)





From:
Randy Hudson/Raleigh/IBM at IBMUS
To:
Arthur Ryman <ryman at ca.ibm.com>
Cc:
Dave Steinberg <davidms at ca.ibm.com>, oslc-core at open-services.net,
oslc-core-bounces at open-services.net
Date:
08/25/2011 07:06 PM
Subject:
Re: [oslc-core] OSLC Compact representation, titles with markup


The following input is also equivalent:

<dcterms:title
    rdf:datatype="http://www.w3.org/1999/02/22-rdf-syntax-ns#XMLLiteral"> 12345: &lt;s xmlns="http://www.w3.org/1999/xhtml"&gt;Null pointer exception during startup&lt;/s&gt;</dcterms:title>

So there are (at least) two different ways to serialize a property value
of type XML literal.  But, the OSLC guidelines state:

1.2 If property value is a Literal value-type
1.2.1 Inside the XML element add the value as a string with any required
escaping

That would seem to suggest that the above form should be used.

-Randy




From:
Arthur Ryman <ryman at ca.ibm.com>
To:
Dave Steinberg <davidms at ca.ibm.com>
Cc:
oslc-core at open-services.net, oslc-core-bounces at open-services.net
Date:
08/25/2011 04:34 PM
Subject:
Re: [oslc-core] OSLC Compact representation, titles with markup
Sent by:
oslc-core-bounces at open-services.net



Dave,

1. XML Namespaces.

RDF/XML is well-formed XML so it must support namespaces correctly. For
triples whose datatype is XML Literal, the value of this literal is a
well-formed XML fragment, and therefore the namespaces should be present
in the content. If there is an enclosing <span> element, then the
namespace should be there. Otherwise, each element in the content should
have the namespace.

The spec doesn't say "for XHTML, you need to insert an xmlns attribute for
http://www.w3.org/1999/xhtml" because that is part of the XHTML standard,
i.e. it's not XHTML unless the elements are in the XHTML namespace.

2. Jena

I loaded the sample RDF/XML into Fuseki, which uses Jena, and it produced
the correct result. I assume the Jena API lets you get an XML DOM from the
literal value.

The input contained:

<dcterms:title rdf:parseType="Literal"
    xmlns="http://www.w3.org/1999/xhtml"> 12345: <s>Null pointer exception during startup</s> </dcterms:title>

The output value is:

" 12345: <s xmlns="http://www.w3.org/1999/xhtml">Null pointer exception during startup</s> "^^<http://www.w3.org/1999/02/22-rdf-syntax-ns#XMLLiteral>

3.  XHTML versus HTML

The primary reason is that RDF supports XHTML via the XMLLiteral datatype.
There is no parsing support for HTML built into RDF.

Another strong reason is that the syntax of HTML is very irregular and
hard to parse correctly - that is one of the reasons XML was invented.
This is very important from a security viewpoint. To guard against script
injection attacks, you really should parse the input and remove any
<script> elements or JavaScript attributes. Doing that correctly for HTML
requires a full HTML parser. On the other hand, the XHTML is given to you
as a DOM which you can easily traverse or process using XSLT or XPath.
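
To make the traversal idea concrete, here is a minimal sketch in Python
using the standard xml.etree.ElementTree module. It assumes the title is
already a well-formed XHTML fragment wrapped in a <span>, and the helper
names (local_name, strip_scripts) are invented for this illustration. It
removes <script> elements, attributes whose names start with "on", and
attribute values that start with "javascript:". It is not a complete
sanitizer and should not be relied on as one.

```python
import xml.etree.ElementTree as ET

XHTML = "http://www.w3.org/1999/xhtml"


def local_name(name):
    """Return a tag or attribute name without its '{namespace}' prefix."""
    return name.rsplit("}", 1)[-1]


def strip_scripts(element):
    """Remove script elements and script-bearing attributes, in place."""
    for name in list(element.attrib):
        value = element.attrib[name].strip().lower()
        if local_name(name).lower().startswith("on") or value.startswith("javascript:"):
            del element.attrib[name]
    previous = None
    for child in list(element):
        if local_name(child.tag).lower() == "script":
            # Keep the text that followed the removed element.
            tail = child.tail or ""
            if previous is None:
                element.text = (element.text or "") + tail
            else:
                previous.tail = (previous.tail or "") + tail
            element.remove(child)
        else:
            strip_scripts(child)
            previous = child


fragment = (
    f'<span xmlns="{XHTML}">12345: '
    '<s onclick="steal()">Null pointer exception</s>'
    "<script>alert(1)</script> during startup</span>"
)
root = ET.fromstring(fragment)
strip_scripts(root)

remaining_tags = [local_name(el.tag) for el in root.iter()]
plain_text = "".join(root.itertext())
print(remaining_tags)
print(plain_text)
```

The same walk over tag-soup HTML would first need an HTML parser that
recovers from malformed input, which is the cost described above.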

4. Datatypes

The specs do specify the datatypes for some properties. Look at the
Value-Type column of the tables, e.g. [1]. You need to include the
datatype explicitly for ints, dates, XML, etc. You specify that using
rdf:datatype in RDF/XML, or using ^^ in Turtle.
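
For readers less familiar with the ^^ notation, the following Python
sketch formats a few typed literals the way they appear in Turtle or
N-Triples. The typed_literal helper and the sample values are made up for
this illustration. Only the datatype URIs (the XSD namespace and
rdf:XMLLiteral) are real.

```python
XSD = "http://www.w3.org/2001/XMLSchema#"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def typed_literal(lexical_form, datatype_uri):
    """Format a typed literal as "lexical form"^^<datatype URI>."""
    escaped = (
        lexical_form.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return f'"{escaped}"^^<{datatype_uri}>'


# Sample values, invented for the illustration.
count = typed_literal("42", XSD + "integer")
created = typed_literal("2011-08-25T16:34:00Z", XSD + "dateTime")
title = typed_literal(
    '<s xmlns="http://www.w3.org/1999/xhtml">Null pointer exception</s>',
    RDF + "XMLLiteral",
)

print(count)
print(created)
print(title)
```

In RDF/XML the same datatype URI would appear as the value of the
rdf:datatype attribute on the property element.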

I don't know what the state of adoption is. We really should get some test
suites written for the specs.

5. Inventing new Datatypes

The RDF spec defines the XSD datatypes and the XMLLiteral datatype. RDF
parsers know how to parse those. If someone introduces a new datatype URI,
it could break parsers since they won't know how to parse the contents.
There is no standard way to define new datatypes.

Try it with the RDF Validation service [2]

[1] http://open-services.net/bin/view/Main/OSLCCoreSpecAppendixA
[2] http://www.w3.org/RDF/Validator/

Regards,
___________________________________________________________________________




Arthur Ryman

DE, PPM & Reporting Chief Architect
IBM Software, Rational
Toronto Lab | +1-905-413-3077 (office) | +1-416-939-5063 (mobile)





From:
Dave Steinberg/Toronto/IBM at IBMCA
To:
oslc-core at open-services.net
Date:
08/24/2011 03:05 PM
Subject:
Re: [oslc-core] OSLC Compact representation, titles with markup
Sent by:
oslc-core-bounces at open-services.net



Hi Arthur,

Thanks for the response. Apologies for being slow in replying; I've been
out sick for the last day and a half.

I agree that putting the XML namespace on the enclosing element would be a
convenience, but only if tools supported that. As far as I could find,
Jena provides no fine-grained access to namespace declarations (i.e. other
than at the model level), so I believe that one couldn't use it to produce
or consume the fragment that you suggested. Moreover, the other RDF
representations offer no such convenience, even in theory.

So, it seems to me that the suggestion to use a namespace was actually a
pretty significant one, and not one that's reflected in the specs, since
you'd always need an enclosing element for your XML content.

Thanks for the suggestion of using Tidy to convert from HTML to XHTML.
That was very helpful for me. But I must admit, I'm still left wondering
what makes XHTML superior to HTML for interchanging formatted text,
especially in light of the compact representation example and my own
experiences, where the opposite seems to be true.

One last thing that I'll emphasize is that I mentioned a lack of guidance
in the OSLC specs specifically about plain vs. typed literals. It seems so
odd to me that plain literals seem to be favoured everywhere, except when
it comes to using XMLLiteral with rdf:parseType="Literal", but none of
this is acknowledged or explained anywhere. It looks like using a typed
literal in this one case is accepted merely as a requirement to benefit
from the prettier RDF/XML syntax for XML content. However, I view things
completely in the opposite light. To me, typed literals are a powerful
benefit of RDF. You can use a typed literal to decide how to handle a
literal value, without looking at the value itself, but that advantage is
lost without a sufficiently specific type. Thus, I don't understand how
defining and using a new RDF datatype to identify something as widely
recognized and understood as HTML would impair interoperability. I think
it would do the opposite.
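
The idea of choosing a handler from the datatype alone can be sketched as
follows, in Python. This is an illustration under stated assumptions, not
anything defined by OSLC: the EXAMPLE_HTML datatype URI and the
render_title function are hypothetical, and treating every unrecognized
type as plain text is just one possible policy.

```python
import html

RDF_XML_LITERAL = "http://www.w3.org/1999/02/22-rdf-syntax-ns#XMLLiteral"
# Hypothetical datatype URI, invented for this sketch; OSLC defines no such type.
EXAMPLE_HTML = "http://example.com/ns/datatypes#HTML"


def render_title(lexical_form, datatype_uri=None):
    """Return markup for a browser, chosen from the datatype alone."""
    if datatype_uri in (RDF_XML_LITERAL, EXAMPLE_HTML):
        # Declared to be markup: pass it through without inspecting it.
        return lexical_form
    # Plain, string-typed, or unrecognized: escape so it displays as text.
    return html.escape(lexical_form, quote=False)


as_text = render_title("1 < 2")
as_markup = render_title("<b>Null pointer exception</b>", EXAMPLE_HTML)
print(as_text)
print(as_markup)
```

Note that the function never looks inside the value to decide which
branch to take, which is the advantage described above.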

Cheers,
Dave

--
Dave Steinberg
IBM Rational Software
davidms at ca.ibm.com




From:

Arthur Ryman/Toronto/IBM

To:

Dave Steinberg/Toronto/IBM at IBMCA

Cc:

oslc-core at open-services.net, oslc-core-bounces at open-services.net

Date:

08/23/2011 10:09 AM

Subject:

Re: [oslc-core] OSLC Compact representation, titles with markup


Dave,

Thx for the comments.

I agree that the guidance on using XMLLiteral is not very clear in the
spec. There was a lot of discussion about this at the time the spec was
under development, but not much of that discussion survived the editorial
process. The only place I see it is in the appendix on standard
properties - dcterms:title and dcterms:description. [1]

The guidance was that dcterms:title should be valid XHTML <span> content
and dcterms:description valid XHTML <div> content. This means that the RDF
datatype should be XMLLiteral and that appropriate namespaces should be
used for XHTML content.

Putting the XHTML namespace on the enclosing element is a convenience. The
parser should propagate that to the content, i.e. when you look at the
triples, the XML literal node should have the inherited namespace.
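
At the XML level, the inheritance of a default namespace declaration can
be observed with any namespace-aware parser. The short Python sketch below
uses the standard xml.etree.ElementTree module and made-up variable names.
It shows XML namespace scoping only and does not establish how any
particular RDF parser reports the resulting literal.

```python
import xml.etree.ElementTree as ET

document = (
    '<dcterms:title xmlns:dcterms="http://purl.org/dc/terms/" '
    'xmlns="http://www.w3.org/1999/xhtml">'
    " 12345: <s>Null pointer exception during startup</s></dcterms:title>"
)

title = ET.fromstring(document)
struck = title[0]

# The property element is prefixed, so it stays in the Dublin Core namespace.
print(title.tag)
# The unprefixed child inherits the default (XHTML) namespace.
print(struck.tag)
```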

If you wanted the namespace directly in the content then you could enclose
the content in a <div> or <span> and put the namespace there.

Using XHTML is the best way to achieve interchange of formatted text.
There are converters from HTML to XHTML, e.g. Tidy. However, in the case of
preview, why would conversion be needed? Shouldn't we be defining content
that is XHTML?

In another use case, people wanted to use native Wiki text as the content.
However, that would cause a big interop problem since there are many Wiki
syntaxes. All of these are convertible to XHTML since that is what the
Wikis do to display the formatted result. In another use case, people
wanted to include Rich Text.

The general theme is that developers want to use whatever native format
their tool supports, e.g. HTML, wiki text, and Rich Text, since it avoids
conversions. However, this would couple the resource to the tool. OSLC is
trying to achieve interoperability among heterogeneous tools. Therefore a
common rich text format is needed.

The alternative of defining new RDF datatypes for HTML, wiki text, RTF
etc. would mean that OSLC resources would not be understood by other
applications. In general, the creation of new RDF datatypes is discouraged
since it impairs interoperability.

[1]
http://open-services.net/bin/view/Main/OSLCCoreSpecAppendixA?sortcol=table;up=#Dublin_Core_Properties





Regards,
___________________________________________________________________________




Arthur Ryman


DE, PPM Chief Architect

IBM Software, Rational

Toronto Lab | +1-905-413-3077






From:

Dave Steinberg/Toronto/IBM at IBMCA

To:

oslc-core at open-services.net

Date:

08/23/2011 12:06 AM

Subject:

Re: [oslc-core] OSLC Compact representation, titles with markup

Sent by:

oslc-core-bounces at open-services.net



Hi all,

I've been following this thread with interest, as it touches on some of
the more general confusion/discomfort I've been developing over the past
several weeks or months about the use of XMLLiteral with
rdf:parseType="Literal" for HTML content.

Adam's comments below are particularly interesting. In general, it's not
clear to me who benefits from the use of the unescaped literal
representation, or in what scenario. And that approach, then, requires the
use of the XMLLiteral type, which I also wonder about (as I'll explain
further). If there is some benefit that I don't know about, perhaps it
derails this whole line of thought. But if there isn't, could this be a
case of the concrete representation tail wagging the abstract syntax dog?

One thing that always struck me as odd was that rdf:parseType="Literal"
examples were the only ones I could find anywhere in OSLC that use typed
literals (the XMLLiteral type is implicit with this special RDF/XML
syntax). Moreover, I couldn't find any guidance in the specs about the use
of plain vs. typed literals at all. From the perspective of a client,
anyway, it would seem a very nice thing if a particular provider would use
a typed literal to tell you that a title, for example, should be treated
as a simple string or as HTML content. And that's the very thing that
typed literals do. It could be that the presence of an XMLLiteral type is
supposed to signal the use of XHTML content, and the absence of any type
is supposed to signal plain text. But I couldn't find that spelled out
anywhere -- if it is, perhaps it's hard to find, or perhaps I just did a
poor job of looking -- and I'd argue it would be better to include types
in both cases. [1]

It's this line of thinking that leads me to question the use of XMLLiteral
in the first place. I saw in some old discussions that the intention in
OSLC was not for XMLLiteral to imply XHTML necessarily. Using it for other
XML languages was considered and endorsed, in principle. But where does
that leave XHTML? With a type that doesn't really say what it is or what
you can do with it. We have specs that communicate the XHTML intent in
words, but we also have a mechanism built into RDF that could communicate
this, and we're not really using it fully. Thus, I think it would be
preferable to define and use a type that specifically represents HTML. And
note, I suggest HTML, not XHTML, since using any type other than
XMLLiteral eliminates the "benefit" of the special rdf:parseType="Literal"
syntax. And without that, I don't see a particular benefit in the stricter
XHTML syntax.

One other possibility that I've considered, which Arthur suggested
previously, is using a namespace to identify that the XML is XHTML, in
particular, instead of doing it directly in the literal type. And I
believe that, strictly, the XHTML namespace is required for the elements
to be valid XHTML. But I found no hint of this in the spec or any
examples, and certainly RTC doesn't do this (I haven't checked other
providers). Moreover, I believe it's also a worse approach, since there's
no guarantee that your RDF runtime of choice will give you access to
namespaces declared on the property element (I don't believe Jena does),
and detecting a namespace inside the element content would require
actually parsing the value as XML. If all you want to do is pass markup
along for display in a browser, it would be unfortunate to have to
actually parse the content to determine that it's XHTML.

And this is where I close the loop on my thinking, by coming back to how a
consumer might actually want to make use of HTML content. Even outside of
the compact rendering scenario, ultimately it's probably going to get
displayed by a browser, whether as part of a larger Web page or in a
browser-backed widget in a rich client. And for that, HTML is probably
just as good as, if not better than, XHTML. Rather than worrying about
whether the content is well-formed XML, it's probably sufficient to just
give it to the browser and see what it can do with it. I would assert that
"something a browser can render" has been the working definition of HTML
for a good number of years now, while XHTML has largely faded in
importance.

Going the other way, the appeal of HTML really shows. If a provider
natively deals with HTML (without concern for XML well-formedness), it
would be attractive to not have to convert that into XHTML to expose it
via OSLC. Likewise, a consumer may use a rich text control that yields
HTML. Generalized parsing of HTML for conversion to XHTML is non-trivial,
and it seems unfortunate to impose that conversion task onto everyone,
just so that we can use rdf:parseType="Literal" in RDF/XML and avoid
applying normal XML encoding to markup content (of course, some encoding
will likely be required for other RDF syntaxes anyway).
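
To make the conversion burden concrete, here is a minimal sketch in Python (standard library only). The two sample titles are made up for illustration. It checks whether a fragment is well-formed XML, which is what content carried via rdf:parseType="Literal" would have to be; markup that a browser would typically accept can still fail that check.

```python
import xml.etree.ElementTree as ET


def is_well_formed_xml(fragment: str) -> bool:
    """Return True if the fragment parses as XML once wrapped in a root element."""
    try:
        # A dummy root allows mixed text and element content at the top level.
        ET.fromstring(f"<root>{fragment}</root>")
        return True
    except ET.ParseError:
        return False


# Hypothetical titles a provider might hold natively.
html_title = "12345: <b>Crash<br>on startup</b>"    # HTML style, <br> never closed
xhtml_title = "12345: <b>Crash<br/>on startup</b>"  # XHTML style, <br/> self-closed

print(is_well_formed_xml(html_title))
print(is_well_formed_xml(xhtml_title))
```

Only the second title could be embedded as an XML literal without first being converted.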

So, those are my thoughts on this (admittedly enlarged) topic. Even if
they all do make perfect sense (and I'm not necessarily claiming they do),
I realize we may be well past the point of being able to act on them.
Still, I thought I'd put them out there and see what others make of them.

Cheers,
Dave


[1] In fact, I think that the consistent use of typed literals in general
would be beneficial. You could even imagine exploiting them as a
compatibility measure, if it was decided that the type of a property
needed to change. This is a related, but separate, topic, which I'd be
thrilled to discuss further, but I don't want to open too many cans of
worms at once.

[2] Or, perhaps, a less kind way of putting that is that the XHTML
namespace is required for the elements to

--
Dave Steinberg
IBM Rational Software
davidms at ca.ibm.com



From: Adam Archer/Toronto/IBM at IBMCA
To: Arthur Ryman/Toronto/IBM at IBMCA
Cc: "oslc-core at open-services.net" <oslc-core at open-services.net>, Randy Hudson
<hudsonr at us.ibm.com>, oslc-core-bounces at open-services.net
Date: 08/22/2011 06:20 PM
Subject: Re: [oslc-core] OSLC Compact representation, titles with markup
Sent by: oslc-core-bounces at open-services.net



The big concern to me is not the ability to process the RDF/XML with
XPath, it's the ability to do so in a browser environment. Currently all
implementations of all rich hovers in all Jazz based products encode any
html tags in their dcterms:title attributes (and doubly encode special
characters). For the consumer on the browser side, this means simply
taking the content of the attribute, decoding it (which browsers are very
good at) and slapping the result into the dom (which browsers are also
very good at).

The alternative would be a total consumability nightmare from the point of
view of a browser (which is the most important consumer of this entire
spec). If the tags are actually child nodes in the xml representation, it
means we will have child elements in the resulting document that we get
back from the xml http request which means we will have to traverse a dom
tree and recreate a structure which could easily be represented as an
escaped string, like everyone is doing today.
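
The difference between the two consumer paths can be sketched as follows. This is an illustration only: Python's ElementTree stands in for the browser DOM, the documents are trimmed to the bare elements, rdf:parseType is left out because only XML-level handling is shown, and the helper names are made up.

```python
import xml.etree.ElementTree as ET

NS = {"dcterms": "http://purl.org/dc/terms/",
      "oslc": "http://open-services.net/ns/core#"}
XMLNS = " ".join(f'xmlns:{prefix}="{uri}"' for prefix, uri in NS.items())

# Markup escaped inside the title (what implementations do today).
escaped_doc = (
    f"<oslc:Compact {XMLNS}><dcterms:title>"
    "12345: &lt;s&gt;Null pointer exception during startup&lt;/s&gt;"
    "</dcterms:title></oslc:Compact>"
)
# Markup present as real child elements of the title.
unescaped_doc = (
    f"<oslc:Compact {XMLNS}><dcterms:title>"
    "12345: <s>Null pointer exception during startup</s>"
    "</dcterms:title></oslc:Compact>"
)


def title_element(doc):
    return ET.fromstring(doc).find("dcterms:title", NS)


def markup_from_escaped(doc):
    # The XML parser decodes the entities; the text node already is the markup.
    return title_element(doc).text


def markup_from_unescaped(doc):
    # The markup is a subtree, so it has to be walked and re-serialized.
    title = title_element(doc)
    parts = [title.text or ""]
    for child in title:
        # tostring() also emits the child's trailing text, if any.
        parts.append(ET.tostring(child, encoding="unicode"))
    return "".join(parts)


print(markup_from_escaped(escaped_doc))
print(markup_from_unescaped(unescaped_doc))
```

Both functions yield the same markup string, but the second has to rebuild it from child nodes, and reading only the text node of the unescaped title would silently drop the struck-through part.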

I realize that implementation is not supposed to lead the spec, but I
don't even think that would be the case here. The oslc compact spec grew
organically out of the old jazz compact rendering spec which can be found
here:

https://jazz.net/wiki/bin/view/Sandbox/CompactRenderingV1P1

If we look at the semantic description of the dc:title and jp:abbreviation
it states explicitly that the content MUST be escaped:

> The HTML markup MUST be escaped; for example, "<b>" as "&lt;b&gt;".

This decision was made consciously for very well defined technical reasons
(discussed above) in the original spec. If that decision was reversed in
the creation of the OSLC compact spec then I believe that to have been a
huge mistake and would like to see the spec fixed rather than for all
providers to have to change how their compact documents are served and all
consumers to have to go to the trouble of walking the dom to determine
what the provider is actually trying to show.

Adam Archer
Jazz Developer
IBM Toronto Lab



From: Arthur Ryman/Toronto/IBM
To: Samuel Padgett <spadgett at us.ibm.com>
Cc: Adam Archer/Toronto/IBM at IBMCA, Randy Hudson <hudsonr at us.ibm.com>,
"oslc-core at open-services.net" <oslc-core at open-services.net>,
oslc-core-bounces at open-services.net
Date: 08/22/2011 04:40 PM
Subject: Re: [oslc-core] OSLC Compact representation, titles with markup


Sam,

You wrote:

> It's very difficult to parse the former using XPath. For instance, the
> expression "/oslc:Compact/dcterms:title" takes out the "<s>" and "</s>".

I don't think problems using XPath are a valid reason to encode markup
since RDF/XML itself is very difficult to process using XPath. At one
point we tried to define an OSLC-variant of RDF/XML that looked like
"normal" XML. However, we abandoned that and now require support for
generic RDF/XML.

There are many equivalent ways to represent a given set of triples in
RDF/XML. It would therefore be very problematic to use XPath, XSLT, or
XQuery to process RDF/XML. The safe way to process RDF/XML is to use an
RDF toolkit like Jena.
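
As a small illustration of the point about equivalent serializations, the Python sketch below (standard library only) runs the same fixed path over two RDF/XML documents that encode the same two triples. The resource URI under example.com and the helper name are made up; the path expression is an ElementTree path standing in for an XPath query.

```python
import xml.etree.ElementTree as ET

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dcterms": "http://purl.org/dc/terms/",
    "oslc": "http://open-services.net/ns/core#",
}
XMLNS = " ".join(f'xmlns:{prefix}="{uri}"' for prefix, uri in NS.items())

# Form 1: typed node element (the rdf:type is implied by the element name).
typed_node = f"""<rdf:RDF {XMLNS}>
  <oslc:Compact rdf:about="http://example.com/bugs/12345">
    <dcterms:title>12345: Null pointer exception</dcterms:title>
  </oslc:Compact>
</rdf:RDF>"""

# Form 2: rdf:Description with an explicit rdf:type property. Same triples.
description_node = f"""<rdf:RDF {XMLNS}>
  <rdf:Description rdf:about="http://example.com/bugs/12345">
    <rdf:type rdf:resource="http://open-services.net/ns/core#Compact"/>
    <dcterms:title>12345: Null pointer exception</dcterms:title>
  </rdf:Description>
</rdf:RDF>"""

PATH = "oslc:Compact/dcterms:title"


def titles(doc):
    """Return the text of every element matching PATH under the root."""
    return [element.text for element in ET.fromstring(doc).findall(PATH, NS)]


print(titles(typed_node))
print(titles(description_node))
```

The path finds the title in the first form and nothing in the second, even though an RDF parser would report identical triples for both.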

Regards,
___________________________________________________________________________




Arthur Ryman
DE, PPM Chief Architect
IBM Software, Rational
Toronto Lab | +1-905-413-3077






From: Samuel Padgett <spadgett at us.ibm.com>
To: "oslc-core at open-services.net" <oslc-core at open-services.net>
Cc: Adam Archer/Toronto/IBM at IBMCA, Randy Hudson <hudsonr at us.ibm.com>
Date: 08/07/2011 01:01 PM
Subject: [oslc-core] OSLC Compact representation, titles with markup
Sent by: oslc-core-bounces at open-services.net





I believe the spec is a bit confusing when it comes to titles with markup
for UI Preview.

The Compact representation has a dcterms:title property. It's defined as an
XML Literal that can contain XHTML markup [1]. My understanding of XML
Literals as discussed in the RDF Primer [2] means a title with markup would
look like this,

<dcterms:title>12345: <s>Null pointer exception during
startup</s></dcterms:title>

The example [3] of this resource has a title like this, however,

<dcterms:title> 12345: &lt;s&gt;Null pointer exception during
startup&lt;/s&gt; </dcterms:title>

The example doesn't seem to fit with the description.

It's very difficult to parse the former using XPath. For instance, the
expression "/oslc:Compact/dcterms:title" takes out the "<s>" and "</s>".
Most implementations I'm aware of also follow the example where markup is
encoded. It means special characters need to be "double encoded." For
instance, "12345: Values > 1000 incorrectly calculated" would be,

<dcterms:title>12345: Values &amp;gt; 1000 incorrectly
calculated</dcterms:title>
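
The double encoding can be sketched in Python as below. This is a minimal illustration using only the standard library; the function names are made up, and a real provider or consumer would normally let its XML library handle the outer layer.

```python
from html import escape as html_escape, unescape as html_unescape
from xml.sax.saxutils import escape as xml_escape, unescape as xml_unescape


def provider_encode(plain_text: str) -> str:
    """Turn plain text into the character data placed inside dcterms:title."""
    as_html = html_escape(plain_text, quote=False)  # layer 1: text -> HTML markup
    return xml_escape(as_html)                      # layer 2: markup -> XML text


def consumer_decode(xml_character_data: str) -> str:
    """Undo both layers: first what the XML parser does, then the browser."""
    as_html = xml_unescape(xml_character_data)      # XML parser yields the markup
    return html_unescape(as_html)                   # browser renders the text


title = "12345: Values > 1000 incorrectly calculated"
encoded = provider_encode(title)
print(encoded)
print(consumer_decode(encoded) == title)
```

Skipping the first layer on the provider side is what lets a stray "<" or "&" in a plain-text title be interpreted as markup by the consumer.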

I think we should add more clarity to the spec here, as getting this wrong
can open up consumers to cross-site scripting attacks. I'd also suggest we
say that providers MUST NOT use any markup with a <script> tag and
consumers MUST NOT display any markup with a <script> tag to guard against
this problem.
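
A consumer-side check along these lines might look like the following Python sketch (standard library only; the class and function names are made up). It covers only the <script> rule as proposed here. Script can also arrive through event-handler attributes and javascript: URLs, so this should be read as an illustration of the rule, not as a complete sanitizer.

```python
from html.parser import HTMLParser


class ScriptTagDetector(HTMLParser):
    """Records whether any <script> start tag appears in the markup."""

    def __init__(self):
        super().__init__()
        self.found = False

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names, so <SCRIPT> is caught as well.
        if tag == "script":
            self.found = True


def contains_script_tag(markup: str) -> bool:
    detector = ScriptTagDetector()
    detector.feed(markup)
    detector.close()
    return detector.found


# The markup is what remains after the XML layer has been decoded.
print(contains_script_tag("12345: <s>Null pointer exception during startup</s>"))
print(contains_script_tag("12345: <SCRIPT>alert(1)</SCRIPT>"))
```

A consumer applying the proposed rule would refuse to display any title for which the function returns True.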

Best Regards,
Sam


[1] http://open-services.net/bin/view/Main/OslcCoreUiPreview?sortcol=table;up=#Representation_Compact
[2] http://www.w3.org/TR/rdf-syntax/#xmlliterals
[3] http://open-services.net/bin/view/Main/OslcCoreUiPreview?sortcol=table;up=#XML_Representation_Format

_______________________________________________
Oslc-Core mailing list
Oslc-Core at open-services.net
http://open-services.net/mailman/listinfo/oslc-core_open-services.net
