Introduction
One of the goals of the Open Services initiative is to enable loosely coupled tools to tightly collaborate through RESTful web services, a common data linkage strategy, and common resource representations using an RDF model. The importance of focusing on good, consumable resource designs cannot be overstated. In fact, tool designers should view the resource formats as the contract among collaborating tools. The reason is simple: programming languages, architectures, APIs, and server platforms will come and go, but if the data matters, it will live on for a very long time.
In our experience, many application developers are not accustomed to thinking that resource formats are this important. Often, developers live and breathe their objects and run-time data structures, with files serving only to serialize these data structures. This perspective leads to formats that are tightly bound to particular implementation technologies. These tight bindings lead to situations where the only reasonable way to access the application data is through the application itself. A more thoughtful approach to resource design is required to enable loosely coupled programs written in a variety of technologies that collaborate via shared data.
Overall, the goals of the guidance are to ensure:
- Simplicity - tend towards the simplest thing that can possibly work. Often, this means reusing existing standards where possible rather than inventing new solutions to old problems.
- Extensibility - the ability for one tool/user to extend an existing resource by adding additional information to it in a safe manner.
- Stability - the ability for a resource “schema” to change over time without impacting clients.
- Composability - the ability to treat data not as fixed, monolithic structures but as collections of reusable components that can be composed.
- Understandability - the ability for a human to be able to understand a document structure, even if the document is primarily produced/consumed by a tool.
This document offers guidance to resource designers on their resource formats, focusing on XML-based resources for use in RESTful applications that can participate in the Open Services for Lifecycle Collaboration initiative. We provide guidance on resource granularity, RDF property and class names, RDF vocabularies, stability, and a few additional topics.
Resource Granularity
There are several reasons to make resources as granular as practical.
- To make it easier to incorporate data from other contexts. One resource can easily refer to a small piece by its URL, indicating that it is being used in a new context. When the contents of the referred-to resource are needed, a simple GET retrieves it. While URLs with fragment identifiers can alternatively be used to indicate a portion of a larger resource, there are limitations to this approach.
- To yield a more flexible permissions model. Individual resources can easily have different permissions associated with them. If a resource contains data that has different permissions than the resource itself, then either an RPC-style interface (with its own ACL) would be needed to update that data, or a PUT would have to examine the contents of the resource being updated and determine whether the user is permitted to make the requested change. The former approach is not RESTful; the latter is unwieldy, counter-intuitive, and error-prone.
- It’s easier to get smaller groups of people to agree on small things than it is to get larger groups of people to agree on large things. If two tools from different domains need to operate on some common data elements, and that data is stored in its own resource, the two tool authors need only agree on the representation of that data. If each tool vendor instead treated this data as part of a larger aggregate (e.g. the requirements tool considers use cases part of a requirements document, while the UML tool considers them part of a use-case model), then the tool vendors would need either to agree on a resource format for a unification of their domains (highly unlikely, and leading to tighter coupling between the tools), or their users would need to share data through an import/export strategy (an often unwieldy approach for users, prone to synchronization errors).
Performance is often offered as an argument for aggregating resources rather than seeking granularity. The fear is that requiring a series of GETs to retrieve all of the resources that make up a larger artifact (e.g. a user-visible element of work like a diagram showing many model elements) will be prohibitively expensive. However, there are strategies for mitigating this, including careful design of the containing artifact to enable the primary use cases without having to GET all of the linked-to resources. Additionally, it is easier to imagine server-side constructs (such as a good query implementation) that return aggregations of resources with one request than it is to imagine a fine-grained PUT that allows users to update only a portion of a resource.
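As an illustrative sketch (the example.org URIs and the ex: vocabulary are invented for this example, not part of any specification), a coarse-grained artifact can simply link to its fine-grained parts by URL, and a client GETs an individual part only when it needs it:

  <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
           xmlns:ex="http://example.org/rm#">
    <!-- A requirements document that refers to each requirement by URL
         rather than embedding its content -->
    <ex:RequirementsDocument rdf:about="http://example.org/documents/1">
      <ex:requirement rdf:resource="http://example.org/requirements/17"/>
      <ex:requirement rdf:resource="http://example.org/requirements/18"/>
    </ex:RequirementsDocument>
  </rdf:RDF>

A GET on http://example.org/requirements/17 then returns just that requirement, which can also be referenced from other documents in other contexts.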
As a result, resource designers should choose as fine-grained a representation as possible (but no finer). In particular, consider breaking entities into individual resources when:
- users will want to find the entities by name,
- users will want to reuse the entities in another context,
- different access control rules than those for the containing resource may apply to the entity, or
- the version history for the entity needs to be tracked independently from that of the containing resource.
For example, in a glossary, individual terms are entities that users want to find by name; they are also candidates for reuse across multiple glossaries. Therefore, they should be in their own resources. In a requirements document, different permissions may apply to requirement comments and discussions. Therefore, they should be defined as resources separate from the requirement. Since it is desirable to track the revision history of each requirement individually, each requirement should be in its own resource as well.
As another example, workflow-related data (for example, the state of a requirement) introduces some interesting characteristics. Often, there are differences between the permissions governing who may change the state of a requirement and who may change the requirement’s data. Moreover, requirements are often shared across multiple projects, with the shared instances being in potentially different states in different projects. Changes to the requirement’s data might be tracked independently from changes to its state. Therefore, the state of a requirement might best be defined as a separate resource from the requirement. In fact, a reasonable approach is for the requirement state to be a work item resource that tracks what remains to be done to complete the requirement.
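A rough sketch of this separation (again using invented example.org URIs and an illustrative ex: vocabulary, with the two resources shown as two separate documents) might look like the following, with the workflow state held in a work item resource that simply points at the requirement:

  <!-- The requirement itself: its own permissions and version history -->
  <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
           xmlns:dcterms="http://purl.org/dc/terms/"
           xmlns:ex="http://example.org/rm#">
    <ex:Requirement rdf:about="http://example.org/requirements/17">
      <dcterms:title>The system shall support offline use.</dcterms:title>
    </ex:Requirement>
  </rdf:RDF>

  <!-- A separate work item resource tracking that requirement's state
       within one particular project -->
  <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
           xmlns:ex="http://example.org/rm#">
    <ex:WorkItem rdf:about="http://example.org/projects/a/workitems/99">
      <ex:tracks rdf:resource="http://example.org/requirements/17"/>
      <ex:state>In Review</ex:state>
    </ex:WorkItem>
  </rdf:RDF>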
Conversely, when there are entities with such strong cohesion that they must share a common version history, cannot sensibly be reused independently, must have identical permissions, and can only be conceived of in some common context, aggregating them into a single resource may make sense. Care must be taken, however, because once this decision is made, these entities can, in fact, only be created within that common context, which might not always correspond to a sound user experience for the application.
RDF Property and Class names
Applications often need to deal with three kinds of data:
- Data that the tool understands deeply: This data might correspond to the tool’s native data structures (such as its Java object model) and often maps directly into the world-view that the tool presents to its users.
- Opaque foreign data: The tool knows nothing about this data, but has to carry it along as a payload. The tool doesn’t even know how to show this data to its users (except perhaps in a debug mode). The tool has no algorithms to analyze or process this data – it is just responsible for carrying it along, unmodified. An email attachment is an example of this for an email client.
- Generic Extension Data: This is data that the tool represents generically in its object model (often as name/value pairs) and presents to users fairly generically in a property view. The tool has no code to reason about this data, but knows that it is important to its users.
For data in category one, most designers will choose RDF Property and Class names that correspond to the domain data being represented. That is, a glossary editing tool would call the root element of its glossary resource glossary; the element describing the glossary would be called description, etc.
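For instance (sketched with an invented example.org namespace; the real vocabulary would be whatever the glossary tool defines), a glossary resource would use class and property names drawn directly from the glossary domain:

  <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
           xmlns:gl="http://example.org/glossary#">
    <gl:Glossary rdf:about="http://example.org/glossaries/1">
      <gl:description>Terms used by the payments team.</gl:description>
      <gl:term rdf:resource="http://example.org/terms/42"/>
    </gl:Glossary>
  </rdf:RDF>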
For opaque data (category 2), there are several choices, depending on the application. Often the data may be embedded in the primary resource in its native format (RDF). To enable this, resources should be designed to accept “open content.” That is, tools should be completely happy with RDF properties of the form <foons:my-attr>foo</foons:my-attr>. When the data cannot be expressed in RDF statements, it may either be embedded as CDATA or stored in its own resource and linked to from the primary resource. In this case, the name of the element carrying the link should correspond to its role in the primary resource (as in <attachment href="http://example.org/myattachment.pdf"/>).
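A small sketch of both options (foons: and gl: are invented namespaces used only for illustration): a term resource that carries a foreign property it does not understand, and that links to an attachment stored as its own resource:

  <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
           xmlns:gl="http://example.org/glossary#"
           xmlns:foons="http://example.org/foo#">
    <gl:Term rdf:about="http://example.org/terms/42">
      <!-- Open content: a foreign property the glossary tool simply
           accepts, preserves, and writes back unmodified -->
      <foons:my-attr>foo</foons:my-attr>
      <!-- Data that cannot be expressed as RDF statements is stored as its
           own resource; the property name reflects its role here -->
      <gl:attachment rdf:resource="http://example.org/myattachment.pdf"/>
    </gl:Term>
  </rdf:RDF>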
Generic extension data provides an interesting case. As an example, many XML formats are defined to include a generic extension mechanism in the form <string-attribute key="my-attr" value="foo"/>. This closely mimics what many developers have historically done in their run-time data structures. However, RDF is already an extensible and open model; if every tool invents its own extension mechanism, then we will have lost one of the benefits of RDF.
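To make the contrast concrete (both snippets are illustrative fragments, with namespace declarations omitted), the same extension value could be carried either through a tool-specific mechanism or as an ordinary RDF property:

  <!-- Tool-specific extension mechanism layered on top of the format;
       every consumer must learn this convention separately -->
  <string-attribute key="my-attr" value="foo"/>

  <!-- Plain RDF: the extension is just another property in its own
       namespace, handled by the same open-content rules as any other -->
  <foons:my-attr>foo</foons:my-attr>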
Class names should follow upper camel case; for example, a change request would be called ChangeRequest.
Property names should follow lower camel case; for example, the resource shape predicate would be resourceShape.
Do not include characters that need to be URI-encoded, such as whitespace, in your names.
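A brief sketch of these conventions (the cm: namespace and the URIs are invented for illustration):

  <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
           xmlns:cm="http://example.org/cm#">
    <!-- Class name in upper camel case -->
    <cm:ChangeRequest rdf:about="http://example.org/changerequests/5">
      <!-- Property name in lower camel case -->
      <cm:resourceShape rdf:resource="http://example.org/shapes/changerequest"/>
    </cm:ChangeRequest>
  </rdf:RDF>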
RDF Vocabularies
There are many common RDF vocabularies already defined and resource designers should try to reuse them where possible. For example, the Dublin Core Metadata Initiative (http://purl.org/dc) describes a common set of metadata (title, description, creator, etc.), and a corresponding RDF vocabulary. By including the http://purl.org/dc/terms/ namespace and using terms from that namespace in documents, designers can avoid re-inventing terminology for these common concepts. Additionally, designers will have enabled tools that already recognize that vocabulary to understand their resources. If a family of tools agrees to support Dublin Core, any tool in this family can at least discover, for example, the title and description of a resource, even if it does not know the details of the format. Reusing common RDF vocabularies in this way helps tools interoperate while remaining loosely coupled.
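For example, a glossary resource (with gl: again an invented namespace) might reuse Dublin Core terms for its title, description, and creator rather than defining its own properties for these concepts:

  <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
           xmlns:dcterms="http://purl.org/dc/terms/"
           xmlns:gl="http://example.org/glossary#">
    <gl:Glossary rdf:about="http://example.org/glossaries/1">
      <dcterms:title>Payments Glossary</dcterms:title>
      <dcterms:description>Terms used by the payments team.</dcterms:description>
      <dcterms:creator rdf:resource="http://example.org/users/jsmith"/>
    </gl:Glossary>
  </rdf:RDF>

A tool that knows nothing about glossaries can still display the title and description of this resource.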
See also Core URI Naming Guidance.
Stability
It’s important that we consider the stability of resource representations as part of their design and evolution. As we have indicated, designers should view resource formats as the contract between tools, and as such the handling of resources must be documented and managed through the life of the product. More importantly, resources must be designed with evolution in mind, that evolution must preserve backwards compatibility, and processors must be aware of these trade-offs.
Concretely:
* Resources should be designed so that they may later be augmented without breaking existing clients. This implies that once defined, most changes to a resource format should be additive changes. When “old” clients encounter the additions, they should be designed to accept and preserve them, even if they do not understand them.
* Resources should be able to accept “open content” wherever possible. This means that tools can add their own data to resources with confidence that other processors will accept and preserve it.
* Resource formats may include format version numbers by way of new namespace URIs. This allows tools that understand previous versions to remain unaffected. New statements will occur in the new namespace URI. Creating new namespace URIs should be a rare occurrence; adding new properties to an existing namespace is preferred. The sketch below illustrates an additive change of this kind.
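As a sketch (cm: is an invented namespace and the property names are illustrative), an additive change simply introduces a new property alongside the existing ones; clients written before the addition should accept and preserve the property they do not understand:

  <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
           xmlns:cm="http://example.org/cm#">
    <cm:ChangeRequest rdf:about="http://example.org/changerequests/5">
      <cm:status>Open</cm:status>
      <!-- Property added in a later revision of the format; older clients
           should carry it along unmodified rather than rejecting or
           discarding it -->
      <cm:severity>Blocker</cm:severity>
    </cm:ChangeRequest>
  </rdf:RDF>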
Additional Guidance
- Don’t invent. RDF and related technologies already provide solutions to many problems. For example, RDF specifications usually shouldn’t concern themselves with extensibility models (RDF is already extensible) or with mechanisms for describing types. Inventing new answers to these problems in the context of a domain’s specification means that tool builders need to learn more things to interoperate with each other. Where possible, tool builders should need to learn only standard RDF technologies and the semantics of the domain’s vocabulary.
- In order to simplify query:
- Minimize usage of RDF blank nodes; blank nodes also make it impossible to identify a resource externally (see the sketch after this list).
- Minimize usage of RDF container constructs such as Seq and Bag.
- Minimize usage of RDF reification.
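A brief sketch of the blank-node point (the ex: vocabulary and URIs are invented): giving a nested entity its own URI keeps it addressable and easy to match in a query, whereas a blank node is neither:

  <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
           xmlns:ex="http://example.org/hr#">
    <!-- Harder to query, and the address cannot be referenced from outside
         this document: it is a blank node -->
    <ex:Person rdf:about="http://example.org/users/jsmith">
      <ex:address rdf:parseType="Resource">
        <ex:city>Raleigh</ex:city>
      </ex:address>
    </ex:Person>

    <!-- Easier to query and externally addressable: the address is its own
         URI-identified resource -->
    <ex:Person rdf:about="http://example.org/users/jdoe">
      <ex:address rdf:resource="http://example.org/users/jdoe/address"/>
    </ex:Person>
  </rdf:RDF>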
See also the OSLC Primer for some additional guidelines.