[oslc-cm] A Modest Proposal for Attachments

Wed Jul 20 07:51:58 EDT 2011

Hi Dave,

I greatly appreciate you taking the time to put together such a
well-thought-out proposal.  I have in fact been contemplating how to
eliminate the intermediate "AttachmentDescriptor" resource type, and I think
this is a nice alternative.  I'll include specific comments inline in the
proposal.

> From: Dave Steinberg <davidms at ca.ibm.com>
> To: oslc-cm at open-services.net
> Date: 07/19/2011 02:19 PM
> Subject: [oslc-cm] A Modest Proposal for Attachments
> Sent by: oslc-cm-bounces at open-services.net
> 
> Hi all,
> 
> The discussion on attachments in the last couple of meetings has been very
> interesting. Various issues and alternatives have been discussed, but
> speaking personally, I was more trying to understand the ideas than
> expressing strong preferences. I really needed to take some time to think
> about it (and look into some things that came up in the discussion that I
> didn't know much about) before forming opinions. I may be mistaken, but my
> sense is that the same might have been true for some other participants. I
> didn't get the sense that a consensus was formed, though if I'm incorrect
> about that, my apologies.
> 
> Anyhow, now that I have thought things through, I thought I'd share my
> thoughts. If you like, it's my proposal for what I believe is the simplest,
> most straightforward approach to attachments that meets our needs.
> Unfortunately, I'm going to be away over the next few days (and so unable
> to attend tomorrow's meeting). Apologies for dropping this and running.
> That wasn't my intention, but unfortunately, I just wasn't able to get it
> done until today. If this spurs any discussion, please don't take my
> silence as disinterest. I'll speak up again as soon as I'm back.
> 
> My proposal is to use a collection resource for attachments, with a URI
> that's separate from the change request itself, i.e.
> 
> <http://example.com/bugs/2314>
> rdf:type oslc_cm:ChangeRequest ;
> dcterms:identifier "00002314" ;
> oslc:shortTitle "Bug 2314" ;
> ...
> oslc_cm:attachments <http://example.com/bugs/2314/attachments> .
> 
> The main reason for this approach is to allow adding attachments by simply
> POSTing to this URI. That said, I think there are other benefits, too. It
> means that a consumer need not be overwhelmed with attachment information
> when retrieving a change request. I think that's quite reasonable, as
> attachments are often handled as a sort of secondary concern. Moreover, in
> cases where attachments are needed, it's just one additional GET to obtain
> all of the information about the attachments (as you'll see in a moment).
> Finally, I'll point out that there is precedent for the collection
> resource approach in Core's discussion mechanism. I can't see any reason
> why it's less appropriate here (indeed, I see reasons why it's even more
> appropriate).
> 
> I think that the other proposal for creating attachments, a separate
> factory, is just a bad fit here. The nice RESTful factory approach would be
> to create the attachment resource by POSTing to it, then to add the
> attachment to a change request. But, as we discussed, that's not possible
> because some of the underlying systems actually create and attach an
> attachment in a single step. The proposal to deal with this was to require
> that a change request be specified in/with the attachment, but I fear that
> might seem non-obvious and arbitrary to consumers. The problem, I think, is
> that it's backwards. Our model should make the change request the primary
> thing, with the attachment subordinate to it (because that's the model
> used in several systems). And the familiar RESTful approach for that
> scenario is to use a collection property of the primary thing as the
> factory for the subordinate thing, i.e. the approach I suggest above.
> 
> I should back up a bit and talk about what the attachments collection
> resource should look like. Here's what my first suggestion would be:
> 
> <http://example.com/bugs/2314/attachments>
> rdf:type oslc_cm:AttachmentList ;
> rdf:_1 <http://example.com/bugs/2314/attachments/screenshot.png> ;
> rdf:_2 <http://example.com/bugs/2314/attachments/fix.patch> ;
> ... .
> 
> I'm just using the standard RDF container membership properties here. That
> said, I know they're not popular in OSLC (though I don't really understand
> why),

I'm not really sure there is any true opposition to it.

>       so my secondary proposal would be to use a new property:
> 
> <http://example.com/bugs/2314/attachments>
> rdf:type oslc_cm:AttachmentList ;
> oslc_cm:attachment <http://example.com/bugs/2314/attachments/screenshot.png> ,
> <http://example.com/bugs/2314/attachments/fix.patch> , ... .
> 
> Here's where, I think, my proposal diverges from the ideas we discussed in
> the meetings, and I'm strongly convinced it's an improvement: The objects
> of the attachment statements represent the attachments themselves, that is,
> if you did a GET on one of those URI's, you would retrieve the attachment
> content itself. The reason I think that's a very good thing is that it
> eliminates the intermediate per-attachment descriptive resource entirely.
> That means that creating an attachment really is just one POST -- no need
> for complications like multipart/mixed. So, where does that descriptive
> information go? Right in the collection resource:

I'm not sure this is entirely accurate.  The triples have as an object the
actual attachment they are describing.  You are asking for additional
semantics: a GET on the attachment collection resource must also return the
triples about the attachments themselves.  That makes it a special case of
GET, which may be worth doing, but I wanted to call it out as such.  I'll
suggest an alternative in a bit.

> 
> <http://example.com/bugs/2314/attachments>
> rdf:type oslc_cm:AttachmentList ;
> oslc_cm:attachment <http://example.com/bugs/2314/attachments/screenshot.png> ,
> <http://example.com/bugs/2314/attachments/fix.patch> , ... .
> 
> <http://example.com/bugs/2314/attachments/screenshot.png>
> dcterms:title "screenshot.png" ;
> dcterms:format "image/png" ;
> dcterms:created "2011-07-18T13:22:30.45-05:00" ;
> ... .
> 
> <http://example.com/bugs/2314/attachments/fix.patch>
> dcterms:title "fix.patch" ;
> dcterms:format "text/x-diff" ;
> dcterms:created "2011-07-19T15:03:54.00-05:00" ;
> ... .
> 
> The purpose of this information is to help a consumer that wants to render
> a UI listing all of the attachments associated with the change request, so
> putting it right in the list resource is ideal. I think it's highly
> preferable to the proposal where an oslc_cm:attachment property would be
> used repeatedly in the change request itself, with the value of each being
> a separate intermediate resource. That would require the consumer to do a
> separate GET for each attachment description in order to render such a UI.
> 
> I'll point out that RDF was designed to handle this very scenario well:
> we're just adding additional information about an existing resource (the
> attachment) by making statements about it in another resource. There was
> some talk of using reification in the last meeting, but I don't think it
> would be appropriate here. These are definitely statements about the
> attachment resource, not about the oslc:attachment statements themselves
> (e.g., the format of the patch attachment, not the statement about it, is
> text/x-diff).
> 
> So now, I think there's just one more question to answer: How did this
> information get there in the first place? I mentioned that eliminating the
> intermediate descriptive resource means that creating an attachment can be
> a simple POST, but what about providing this extra information?
> 
> A couple of observations about this information (dare I call it metadata?):
> First, some of it (creator, created) probably doesn't need to be specified
> at all. The provider can determine it by itself. Second, some of it
> (format, contentSize) overlaps beautifully with the standard Content- HTTP
> headers. So that can be specified by the consumer right in the POST request
> and returned by the provider right in the GET response, along with the
> content itself. Thus, duplicating that information in the collection
> resource is merely a convenience for consumers (for example, to render an
> attachment listing UI, as I described above).
> 
> For any information that we want to represent but for which there isn't a
> corresponding standard header, we can just define a header ourselves. I
> think it will be a very small set.
> 
> I haven't gotten into all the details of exactly what properties and
> headers to use here (and particularly if we're defining our own headers --
> I don't know if this has been done elsewhere in OSLC and there's a standard
> form already). But, if people like my approach, I'm sure those details
> could be worked out within the group.
> 
> So, that's my proposal. I hope people find it helpful. I'm happy to answer
> any questions or address any concerns. Unfortunately, as I mentioned at the
> top, I won't be able to do so for a few days.
> 

Let me suggest a slight modification to this concept: a way to handle the
attachment "metadata" independently of the attachment collection it belongs
to.

Let's start again with your example:

<http://example.com/bugs/2314/attachments>
   rdf:type oslc_cm:AttachmentList ;
   oslc_cm:attachment <http://example.com/bugs/2314/attachments/screenshot.png> ,
   <http://example.com/bugs/2314/attachments/fix.patch> .

Now let's say I want to retrieve the metadata about the attachment
screenshot.png. We could use some URL math to add on a request for metadata,
such as:

GET http://example.com/bugs/2314/attachments/screenshot.png?metadata

This has the nice quality of being a different URL than the attachment
itself, and therefore, in RDF terms, a different resource.  It can be
computed easily from the attachment URL.  You can PUT to that URL to update
just the metadata, without requiring special request headers.  The downside
is that if the server doesn't understand ?metadata, you may get back the
attachment itself when you don't want it.
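
To make the "URL math" concrete, here is a minimal Python sketch of how a
consumer might compute the metadata URL from an attachment URL. The function
name metadata_url is hypothetical, and the handling of an attachment URL that
already carries a query string (appending with "&") is my own assumption;
nothing here is defined by OSLC.

```python
from urllib.parse import urlsplit, urlunsplit


def metadata_url(attachment_url: str) -> str:
    """Derive the metadata URL by adding a bare 'metadata' query token.

    Hypothetical helper: the '?metadata' convention is only a suggestion
    in this thread, not part of any specification.
    """
    parts = urlsplit(attachment_url)
    # Assumption: if a query string already exists, append with '&'.
    query = f"{parts.query}&metadata" if parts.query else "metadata"
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path, query, parts.fragment)
    )


print(metadata_url("http://example.com/bugs/2314/attachments/screenshot.png"))
# prints http://example.com/bugs/2314/attachments/screenshot.png?metadata
```

Because the result is a distinct URL, a consumer could GET or PUT it without
touching the attachment content. A cautious consumer would probably still
check the Content-Type of the response in case the server ignored the query.
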
Another alternative is HTTP content negotiation: you could request Accept:
text/turtle on the attachment URL and get just the metadata.  This has the
drawback of being ambiguous when your attachment IS text/turtle.  That may
be a rare case, but it is still an issue.
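
As a rough illustration of the content negotiation alternative, the sketch
below builds (but does not send) a request for the metadata and flags the
ambiguous case. The helper names metadata_request and is_ambiguous are
hypothetical, and text/turtle as the metadata media type is just the example
used above.

```python
from urllib.request import Request

METADATA_TYPE = "text/turtle"  # assumed metadata representation


def metadata_request(attachment_url: str) -> Request:
    """Build a GET on the attachment URL that asks for Turtle metadata."""
    return Request(
        attachment_url, headers={"Accept": METADATA_TYPE}, method="GET"
    )


def is_ambiguous(attachment_format: str) -> bool:
    """True when the attachment's own format equals the metadata type,

    so an Accept header alone cannot tell content and metadata apart.
    """
    media_type = attachment_format.split(";")[0].strip().lower()
    return media_type == METADATA_TYPE


req = metadata_request(
    "http://example.com/bugs/2314/attachments/screenshot.png"
)
print(req.get_method(), req.full_url, req.get_header("Accept"))
print(is_ambiguous("image/png"), is_ambiguous("text/turtle; charset=utf-8"))
```

The second helper shows why the drawback matters: for a Turtle attachment,
the same request could legitimately mean either the content or its metadata.
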

We will discuss this some on today's call and will await your feedback.
I'll also capture the notes from today's discussion and send them out.
I have a conflict for the next meeting on August 3rd; perhaps we can
reschedule for next week to get something agreed on for attachments.  I
think we are near.

Thanks,
Steve Speicher | IBM Rational Software | (919) 254-0645