Copyright © IBM Corporation 2012. This document is available under the W3C Document License. See the W3C Intellectual Rights Notice and Legal Disclaimers for additional information.
This document describes the motivation, scope, use cases, and requirements for a set of best practices and a simple approach to a Linked Data architecture, based on HTTP access to web resources that describe their state using RDF.
This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current W3C publications can be found in the W3C technical reports index at http://www.w3.org/TR/.
This document is part of the Linked Data Basic Profile Submission, which comprises two documents:
By publishing this document, W3C acknowledges that the Submitting Members have made a formal Submission request to W3C for discussion. Publication of this document by W3C indicates no endorsement of its content by W3C, nor that W3C has, is, or will be allocating any resources to the issues addressed by it. This document is not the product of a chartered W3C group, but is published as potential input to the W3C Process. A W3C Team Comment has been published in conjunction with this Member Submission. Publication of acknowledged Member Submissions at the W3C site is one of the benefits of W3C Membership. Please consult the requirements associated with Member Submissions of section 3.3 of the W3C Patent Policy. Please consult the complete list of acknowledged W3C Member Submissions.
Linked Data was defined by Tim Berners-Lee with the following four rules [4Rules]:
1. Use URIs as names for things.
2. Use HTTP URIs so that people can look up those names.
3. When someone looks up a URI, provide useful information, using the standards (RDF*, SPARQL).
4. Include links to other URIs, so that they can discover more things.
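To make the rules concrete, here is a minimal sketch in Python using the rdflib library: it looks up an HTTP URI, reads the RDF returned, and collects the links it contains. The contact URI is hypothetical and stands in for any HTTP URI that serves RDF.

    # A minimal sketch of the four rules in practice (rdflib handles the HTTP
    # lookup and parsing for us). The URI below is hypothetical.
    from rdflib import Graph, URIRef

    contact_uri = URIRef("http://example.org/contacts/alice")  # Rules 1 and 2: an HTTP URI names the thing

    g = Graph()
    g.parse(contact_uri)  # Rule 3: looking up the URI yields useful information, as RDF

    # Rule 4: the returned triples contain links (other HTTP URIs) that can be
    # looked up in turn to discover more things.
    for subject, predicate, obj in g:
        if isinstance(obj, URIRef):
            print(f"linked resource: {obj}")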
These four rules have proven very effective in guiding and inspiring people to publish Linked Data on the web. The amount of data, especially public data, available on the web has grown rapidly, and as a result an impressive number of extremely creative and useful “mashups” have been created using this data.
There has been much less focus on the potential of Linked Data as a model for managing data on the web: the majority of the Application Programming Interfaces (APIs) available on the Internet for creating and updating data follow a Remote Procedure Call (RPC) model rather than a Linked Data model.
If Linked Data were just another model for doing something that RPC models can already do, it would be of only marginal interest. Interest in Linked Data arises from the fact that applications with an interface defined using Linked Data can be much more easily and seamlessly integrated with each other than applications that offer an RPC interface. In many problem domains, the most important problems and the greatest value are found not in the implementation of new applications, but in the successful integration of multiple applications into larger systems.
Some of the features that make Linked Data exceptionally well suited for integration include:
Experience implementing applications and integrating them using Linked Data has shown very promising results, but has also demonstrated that the original four rules defined by Tim Berners-Lee for Linked Data are not sufficient to guide and constrain a writable Linked Data API. As was the case with the original four rules, the need generally is not for the invention of fundamental new technologies, but rather for a series of additional rules and patterns that guide and constrain the use of existing technologies in the construction of a Basic Profile for Linked Data to achieve interoperability.
The following list illustrates a few of the issues that require additional rules and patterns:
A good goal for the Basic Profile for Linked Data would be to define the specification needed to build a writable Linked Data API equivalent to the simple application APIs that are often written on the web today using the Atom Publishing Protocol (APP). APP shares some characteristics with Linked Data, such as the use of HTTP and URLs. One difference is that Linked Data relies on RDF's flexible data model, which allows for multiple representations.
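To illustrate the kind of writable interface meant here, the following sketch walks through a basic create/read/update/delete cycle over plain HTTP, in Python using the requests library. The container URL, the Turtle payload, and the assumption that the server answers a successful POST with a Location header are all illustrative; they are not drawn from any finished specification.

    # A hypothetical writable Linked Data interaction, analogous to what APP
    # offers for Atom feeds and entries.
    import requests

    container = "http://example.org/contacts/"  # a hypothetical collection of contacts

    new_contact = """
    @prefix v: <http://www.w3.org/2006/vcard/ns#> .
    <> a v:VCard ; v:fn "Alice Example" ; v:email <mailto:alice@example.org> .
    """

    # Create: POST a new member to the collection; the server assigns its URI.
    created = requests.post(container, data=new_contact,
                            headers={"Content-Type": "text/turtle"})
    contact_url = created.headers["Location"]  # assumes the server returns a Location header

    # Read: GET the new resource, asking for an RDF representation.
    doc = requests.get(contact_url, headers={"Accept": "text/turtle"})

    # Update: PUT a revised representation back to the same URI.
    requests.put(contact_url, data=doc.text.replace("alice@", "alice.smith@"),
                 headers={"Content-Type": "text/turtle"})

    # Delete: remove the resource (and with it, its membership in the collection).
    requests.delete(contact_url)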
This section collects a limited number of high-level use cases to illustrate the need for a Basic Profile for Linked Data, both for data consumption and for the creation and modification of data. The W3C maintains a repository of Semantic Web Case Studies and Use Cases [SEMWEB-UC]. Some of that material may apply to the Use Cases for the Basic Profile for Linked Data and will be called out in the following sections where appropriate.
Many of us have multiple email accounts that include information about the people and organizations we interact with – names, email addresses, telephone numbers, instant messenger identities and so on. When someone’s email address or telephone number changes (or they acquire a new one), our lives would be much simpler if we could update that information in one spot and all copies of it would automatically be updated. In other words, those copies would all be linked to some definition of “the contact.” There might also be good reasons (like off-line email addressing) to maintain a local copy of the contact, but ideally any copies would still be linked to some central “master.”
Agreeing on a format for “the contact” is not enough, however. Even if all our email providers agreed on the format of a contact, we would still need to use each provider’s custom interface to update or replace the provider’s copy, or we would have to agree on a way for each email provider to link to the “master”. If we look outside our own personal interests, it would be even more useful if the person or organization exposed their own contact information so we could link to it.
What would work in either case is a common understanding of the contact resource: the few formats needed, and guidance for accessing such resources. This would cover how to acquire a link to a contact; how to use those links to interact with a contact, including reading, updating, and deleting it; how to easily create a new contact and add it to my contacts; and how a deleted contact would be removed from my list of contacts. It would also be good to be able to add some application-specific data about my contacts that the original design didn’t consider. Ideally we would eliminate multiple copies of contacts, but additional valuable information about my contacts may be stored on separate servers, so we need a simple way to link that information back to the contacts. Regardless of whether a contact collection is my own, shared by an organization, or all contacts known to an email provider (or to a single email account at an email provider), it would be nice if they all worked pretty much the same way.
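The linking half of this use case can be sketched in a few RDF statements: application-specific notes live on a separate server but point back to the one “master” contact by its URI. The sketch below uses Python with rdflib; the URIs and the vocabulary terms (ContactNote, about, text) are hypothetical, invented for illustration.

    # Application-specific data about a contact, stored elsewhere, linked back
    # to the master contact by URI. All names here are hypothetical.
    from rdflib import Graph, Literal, Namespace, URIRef
    from rdflib.namespace import RDF

    EX = Namespace("http://notes.example.com/vocab#")
    master_contact = URIRef("http://mail.example.org/contacts/alice")  # the one copy

    g = Graph()
    note = URIRef("http://notes.example.com/notes/42")
    g.add((note, RDF.type, EX.ContactNote))
    g.add((note, EX.about, master_contact))   # the link back to the contact
    g.add((note, EX.text, Literal("Met at the 2012 Linked Data workshop")))

    print(g.serialize(format="turtle"))

Because the note carries the contact’s URI rather than a provider-specific account identifier, any application that understands the contact resource can follow the link, whichever server the note happens to live on.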
In our daily lives, we deal with many different organizations in many different relationships, and they each have data about us. However, it is unlikely that any one organization has all the information about us. Each of them typically gives us access to the information (at least some of it), many through websites where we are uniquely identified by some string – an account number, user ID, and so on. We have to use their applications to interact with the data about us, however, and we have to use their identifier(s) for us. If we want to build any semblance of a holistic picture of ourselves (more accurately, collect all the data about us that they externalize), we as humans must use their custom applications to find the data, copy it, and organize it to suit our needs.
Would it not be simpler if at least the Web-addressable portion of that data could be linked to consistently, so that instead of maintaining various identifiers in different formats and instead of having to manually supply those identifiers to each one’s corresponding custom application, we could essentially build a set of bookmarks to it all? When we want to examine or change their contents, would it not be simpler if there were a single consistent application interface that they all supported? Of course it would.
Our set of links would probably be a simple collection. The information held by any single organization might be a mix of simple data and collections of other data, for example, a bank account balance and a collection of historical transactions. Our bank might easily have a collection of accounts for each of its collection of customers.
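Such nested collections are easy to express as linked resources. The following sketch, again in Python with rdflib, is purely illustrative; the URIs and the membership properties are hypothetical rather than taken from any standard vocabulary.

    # A hypothetical bank: a collection of customers, each with accounts, and
    # each account linking to a collection of historical transactions.
    from rdflib import Graph, Namespace, URIRef

    EX = Namespace("http://bank.example.org/vocab#")
    g = Graph()

    customers = URIRef("http://bank.example.org/customers/")
    customer = URIRef("http://bank.example.org/customers/42")
    account = URIRef("http://bank.example.org/customers/42/accounts/checking")
    transactions = URIRef("http://bank.example.org/customers/42/accounts/checking/transactions/")

    g.add((customers, EX.member, customer))
    g.add((customer, EX.account, account))
    g.add((account, EX.transactions, transactions))  # the account's history is one more link

    print(g.serialize(format="turtle"))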
System and software development tools typically come from a diverse set of vendors and are built on various architectures and technologies. These tools are purpose-built to meet the needs of a specific domain scenario (modeling, design, requirements, and so on). Tool vendors often view integrations with other tools as a necessary evil rather than as a way to provide additional value to their end users. Even more of an afterthought is how these tools’ data -- such as people, projects, customer-reported problems, and needs -- integrates and relates to corporate and external applications that manage data such as customers, business priorities, and market trends. The problem can be isolated by standardizing on a small set of tools, or on tools from a single vendor, but this rarely occurs, and when it does it is usually only within small organizations. As these organizations grow in both size and complexity, they need to work with outsourced development and with diverse internal organizations that have their own sets of tools and processes. There is a need for better support of more complete business processes (system and software development processes) that span the roles, tasks, and data addressed by multiple tools. This demand has existed for many years, and the tool vendor industry has tried several different architectural approaches to address the problem. Here are a few:
It is fair to say that although each of those approaches has its adherents and can point to some successes, none of them is wholly satisfactory. The use of Linked Data as an application integration technology has a strong appeal. [OSLC]
The W3C Library Linked Data working group cites a number of use cases in its Use Case Report [LLD-UC]. These use cases focus on the need to extract and correlate library data from disparate sources. Variants of these use cases that provide consistent formats, as well as ways to improve or update the data, would enable simpler methods both for sharing this data efficiently and for producing incremental updates without repeated full extraction and import of the data.
Across cities, towns, counties, and other municipalities, there is a growing number of municipally managed and run services that produce and consume a vast amount of information. This information is used to help monitor services, predict problems, and handle logistics. In order to collect, produce, and analyze all this data effectively and efficiently, a fundamental set of loosely coupled standard data sources is needed. A simple, low-cost way to expose data from the diverse set of monitored services is needed, one that can easily integrate with the municipalities’ other systems that inspect and analyze the data. All these services have links and dependencies on other data and services, so having a simple and scalable linking model is key.
Analyzing, diagnosing, and proposing treatment for patients requires physicians to command a vast amount of complex, changing, and growing knowledge. This knowledge needs to come from a number of sources, including physicians’ own subject knowledge, consultation with their network of other healthcare professionals, public health sources, food and drug regulators, and other repositories of medical research and recommendations.
Diagnosing a patient’s condition requires current data on the patient’s medications and medical history. In addition, recent pharmaceutical advisories about these medications are linked into the patient’s data. If the patient experiences adverse effects from medications, physicians need to publish information about this to an appropriate regulatory source. Other medical professionals require access to both validated and emerging effects of the medication. Similarly, if there are geographic patterns to outbreaks, awareness of new symptoms and treatments needs to reach a very distributed and diverse set of medical information systems quickly. Also, reporting back to regulatory agencies on new occurrences of an outbreak, including additional details of symptoms and causes, is critical to producing the most effective treatment for future incidents.
Additional Contributors:
Arthur Ryman (IBM), Ian Green (IBM), Barbara McKee (IBM), Sally Hehir (IBM)