Mapping Versus Standard Terms
Introduction
Different tools will use different terms for the same concepts. For example, when estimating effort by labor category, one tool may use the term Developer and another the term Programmer to describe a role that is responsible for writing source code. A project management tool may compare these estimates. How can it determine that Developer and Programmer are synonyms? There are two approaches: define a mapping between the terms, or use only standard terms in the interface.
These approaches will be discussed in a faux debate between SteveAbrams and LeeFischman during the 2009-08-14 telecon. Their position statements are as follows:
Interoperability and Vocabularies
Steve Abrams
(Note that this does not necessarily represent the views of IBM Corporation. This is a somewhat polarized view of a position to help make a mock debate more entertaining.)
Let’s say I have two systems that each have a notion of a motorized vehicle. System C calls it a car; system A calls it an automobile. System C’s cars have motors and wheels, track the distance traveled in miles, and the fuel consumed in gallons; system A’s automobiles have engines, wheels, and tires, track the distance traveled in kilometers, and the fuel consumed in liters. Now let’s say that I have a third system Q. It wants to tally up the number of autos (not cars or automobiles, of course) managed in systems like A and C, and wants to report the average distance traveled by each auto in furlongs. Your mission, should you choose to accept it, is to enable Q to draw on information in A and C to produce an accurate result.
We could teach system Q the terminology used by systems A and C. This would indeed solve the problem. However, this is somewhat fragile – when another system (D) comes along with its own vocabulary, we’d have to teach Q a new lesson. Further, if we continue to apply this strategy, what will we do when system R comes along with its own notion of a motorized vehicle and the mission of computing average gas mileage in rods [an Imperial unit of length equal to 5.5 yards] per cubic yard? The only option would be to teach R all of the vocabularies of all of the systems. We’ll have now written point-to-point translations to allow two systems (Q and R) to draw on data from three systems (A, C, and D). Following this approach, whenever you have m systems that want to understand the data in n systems, you’ll need m × n pair-wise translations. As the number of systems that manage car data and provide car metrics grows, this gets unwieldy. In fact, it may get so complicated that a person with a great idea about a new car metric is, essentially, blocked from entering the ecosystem. He would have to write too many translators to understand the data in all of the car management systems. As a result, progress in the industry slows down. (As a practical matter, the distinction between producers and consumers of the data is not helpful – so instead of talking about m producers talking with n consumers requiring m × n translations, we’ll talk about n systems that want to interoperate, requiring n² translations.)
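To make the arithmetic concrete, here is a minimal sketch (purely illustrative) of the translator counts: pairwise integration grows roughly as n², while a shared vocabulary grows linearly.

```python
# Translator counts for n interoperating systems. With point-to-point
# integration every ordered pair needs its own translator; with a shared
# vocabulary each system needs just one translator pair, to and from the
# common language. (The text above rounds n * (n - 1) up to n^2.)

def pairwise_translators(n: int) -> int:
    return n * (n - 1)          # each system translates to the other n - 1

def shared_vocabulary_translators(n: int) -> int:
    return 2 * n                # one translator in each direction per system

for n in (3, 10, 50):
    print(n, pairwise_translators(n), shared_vocabulary_translators(n))
# 3 -> 6 vs 6, 10 -> 90 vs 20, 50 -> 2450 vs 100
```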
One can imagine a configuration tool which simplifies the task of writing each of these point integrations – for example, it might display the schema of systems A, C, and D, and allow a user to make declarations about the relationships to the terms used by system Q. However, make no mistake – this is still a tedious and error-prone task. Among other things, the user of that tool needs to be fluent in the terminology used by all four systems in order to make sound decisions.
What alternatives do we have? We could build a gigantic dictionary. Anybody with a new way of discussing cars would need to define his terms in the dictionary (semantic web folks often call this an ontology). So system Q could consult the dictionary and find out that system A’s automobiles and system C’s cars are equivalent to Q’s autos. It could also learn the relationships among the various units of measures. Now, a new player in the game only has to (in principle) understand the dictionary representation and explain its concepts in the dictionary, and it’s good to go. This can get pretty complicated, though – when Q draws on data from A, C, and D, it has to consult the dictionary and translate the messages it gets into its own terms. Q is relying on its interpretation of the data from A, C, and D based on the dictionary.
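As a rough sketch of what consulting such a dictionary might look like (the terms, records, and conversion factors below are all hypothetical, drawn from the car example):

```python
# A toy "dictionary" (ontology): term equivalences plus unit conversions.
# The consumer (system Q) must interpret every incoming record itself.

TERM_EQUIVALENCES = {"car": "auto", "automobile": "auto"}

# Conversion factors into Q's preferred unit, furlongs
# (1 mile = 8 furlongs; 1 km is roughly 4.971 furlongs).
DISTANCE_TO_FURLONGS = {"miles": 8.0, "kilometers": 4.971}

def interpret(record: dict) -> dict:
    """Translate another system's record into Q's own terms."""
    return {
        "type": TERM_EQUIVALENCES[record["type"]],
        "distance_furlongs": record["distance"] * DISTANCE_TO_FURLONGS[record["unit"]],
    }

print(interpret({"type": "car", "distance": 100, "unit": "miles"}))
# {'type': 'auto', 'distance_furlongs': 800.0}
```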
Another alternative is to get a group of people together and agree on terminology. We don’t have to agree on the terminology used within the systems for managing their data – perhaps there are critically important user-relevant or regulatory reasons why C must use rods, and Q must use furlongs. Maybe it is crucial that A understand the relationships among tires and wheels, while C must avoid the complexity of considering them anything but a coherent system. However, we can agree that when we converse with each other, we speak a common language.
That is one thing that an effort such as OSLC is trying to do – bring a group of people together to decide what the common language is for discussing certain concepts. For this to work, we need to agree on three things:
- the nature of the conversations that we need to have – what we usually call the scenarios;
- what concepts we must share in order to have these conversations; and
- how we should represent those concepts in our resources (the specifications).
Once we have done this, any system wishing to participate in these conversations needs to produce (and likely accept) representations of its data that conform to our specifications. In other words, each of the n systems that want to interoperate has one piece of work to do: translate data to and from the ‘lingua franca’ defined by the group. This is generally much simpler and less error-prone than describing data in some kind of dictionary and leaving the interpretation up to the consumer.
What about critical data that the systems need to exchange which goes outside of the bounds of the defined vocabulary? There is no such thing as a free lunch. For this data, we’re back to our initial set of choices – mappings, ontologies, or shared vocabularies. Presumably, if the conversation that requires this nonstandard data were very common, the standard would accommodate the data. Therefore, not all n systems would care to share this data, so we don’t have the full n² problem. If enough systems want to have this conversation, this data would make its way into a revised version of the common vocabulary.
What about the case where a system is highly user-customizable? Isn’t it impossible to standardize an industry-wide vocabulary if a vendor can’t even get its customers to use a standard vocabulary? Well, no, it isn’t. Firstly, if there is a standard vocabulary, customers can be encouraged to adhere to it as much as possible. This is a powerful force, and has been one of the keys to interoperability in our industry for years. And when that is impossible, the customer should go ahead and customize to their heart’s content. But they must also say something about the relationship between their customized vocabulary and the industry standard vocabulary, if they want their customized tool to participate in these collaborations. The tool vendor can provide a configurable ‘façade’ which surfaces its user's customized data, according to a user-specified mapping, in terms of the industry standard vocabulary. The user only needs to understand two vocabularies for this to work -- the industry standard vocabulary, and his own customized vocabulary.
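A minimal sketch of such a façade, assuming a hypothetical user-specified mapping from the customized vocabulary to the industry standard one:

```python
# A configurable facade: the user supplies a mapping from their customized
# terms to the industry-standard vocabulary, and the tool surfaces its data
# through that mapping. All names here are invented for illustration.

USER_MAPPING = {                      # customized term -> standard term
    "Defect Wrangler": "Developer",
    "Odometer Reading": "distance",
}

def facade(native_record: dict, mapping: dict) -> dict:
    # Unmapped fields are dropped: they fall outside the standard conversation.
    return {mapping[k]: v for k, v in native_record.items() if k in mapping}

print(facade(
    {"Defect Wrangler": "Pat", "Odometer Reading": 42, "Lucky Color": "teal"},
    USER_MAPPING,
))
# {'Developer': 'Pat', 'distance': 42}
```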
What about the case where there are conflicting or overlapping standards? Well, the nice thing about standards is that there are so many to choose from. Kidding aside, each standard should (in general) have some way for a client to specify that it is having a conversation using that standard's vocabulary. In a REST API, this could happen, for example, by the client specifying (through Accept: headers) what representation it wants. If a service wants to participate in multiple standards, it needs to provide appropriate façades, triggered through the standards’ specified mechanisms.
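For instance, a client might select a standard's representation via the Accept header; a minimal sketch using Python's standard library, where the URL and media type are illustrative placeholders rather than values defined by any particular specification:

```python
# Content negotiation: the client tells the service which standard's
# vocabulary it wants to converse in by asking for that representation.
# The URL and media type below are hypothetical.
from urllib.request import Request, urlopen

req = Request(
    "https://example.com/bugs/123",
    headers={"Accept": "application/x-oslc-cm-change-request+xml"},
)
with urlopen(req) as resp:
    # The service's facade for this standard produces the representation.
    print(resp.headers.get("Content-Type"))
    print(resp.read()[:200])
```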
As a practical matter, if one of the system vendors has enough clout, it may tell everyone that it’s not wasting its time writing any translations – anyone who wants to interoperate with that vendor's products must use its vocabulary. If the world is unfortunate enough for this to be the vendor of system R, we’ll be stuck referring to gas mileage as rods per cubic yard forever. This is why it is important for groups of people to get together and agree on these standards – to ensure that a single skewed point of view does not dominate the industry simply due to market strength.
Therefore, we feel that OSLC should be a vehicle for driving industry consensus on standard vocabularies used for common ALM interoperability scenarios. It won't eliminate the need for user-customizable tools, and it won't solve all of the integration problems, but it will dramatically simplify what's needed to get tools to interoperate.
Lee Fischman
I basically agree with Steve. However, I believe that terms for software labor categories (designers, programmers, etc.) and activities (inception, elaboration, etc.) may vary dramatically depending on the development philosophy. We need to consider multiple 'lingua francas' and an internal mapping between them. A contributing tool could then send information that conforms to one of the standards, and an internal mapping could handle the translation.
Comments
Add your comments here:
First, I believe this should be a "cross-workgroup discussion"; the general idea--where a concept is configured--applies to all OSLC interfaces.
The idea that all tools within an enterprise that use a particular concept should map to one standard for it, rather than each pair of tools mapping to each other, is a great one. But that standard may exist at multiple levels, depending on the content. In some cases, the OSLC interface may be able to define the standard. In other cases, the enterprise (i.e. a customer environment that implements a collection of tools) may define the standard to be used by all tools in that enterprise, and in yet other cases, it may be configurable at a lower level (a project, for example, or a template used for multiple projects). What level of configuration is needed depends on the concept. One generality we MAY be able to make: concepts defined by an enumerated list may be "standard concepts" (so OSLC can define the name of the concept) with configurable (at various levels) enumerations. This probably applies to "role" -- see use of role within the enterprise goes beyond estimation and project management.
--
AndyBerner - 16 Aug 2009
Great comment -- we had the same kind of discussion within the OSLC CM group on the topic of "state." While we may not be able to standardize the states that a work item can go through, we may be able to standardize the fact that there are (eg) open and closed states. Tools might be able to define their own states, but they must be classified as open or closed. This lets tools have conversations about 'open' and 'closed' states, at least.
--
SteveAbrams - 17 Aug 2009
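A minimal sketch of the classification idea in the comment above; the tool-specific state names are hypothetical:

```python
# Tools keep their own state names, but each state must be classified under
# a standardized group ("open" or "closed"), so tools can at least converse
# about open vs. closed work items. State names are invented.

TOOL_A_STATES = {"New": "open", "In Progress": "open", "Done": "closed"}
TOOL_B_STATES = {"Submitted": "open", "Resolved": "closed", "Verified": "closed"}

def is_open(tool_states: dict, state: str) -> bool:
    return tool_states[state] == "open"

print(is_open(TOOL_A_STATES, "In Progress"))  # True
print(is_open(TOOL_B_STATES, "Verified"))     # False
```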
You have made a very strong argument for having a "controlled" common vocabulary that grows only when the need for it increases (e.g. as the number of providers and/or consumers increases). I agree with that idea and believe that it will certainly help reduce the complexity of many integrations. However, as you also indicated, the need for some type of mapping, user customization capability, and/or facade does not go away with a limited common vocabulary. Tool vendors will not stop supporting customization, out of fear of not meeting customers' needs, and hence a customer may choose to customize simply because they can or because they have to. I believe that a standard OSLC interface should also provide a mechanism so that a customer using tools that are "OSLC-compliant" service providers or consumers can support the customer's own organization standards (ones that differ from the OSLC standard common vocabulary) without having to configure the tools m × n times.
Take a customer that has various change management tools in use, with various other tools integrated into them - SCM, Customer Relationship Management / Customer Support, Requirements Management, and Quality Management tools. The customer wants to reduce their "Order n squared" (Big O notation) configurations of the tools. Taking the example you gave in your comment about tools grouping their states as "open" or "closed", let's assume that these "state groups" become part of the OSLC standard common vocabulary. The customer with multiple similar tools was able to bring about an organization-wide agreement that the "state groups" for the various CM tools must be configured to be "new", "approved", "open", "closed", and "deferred", since that would support their end-to-end business processes. An OSLC standard vocabulary won't help this customer, since it doesn't match up with the customer's agreed-upon standard. What if OSLC provided a standard way to define a "facade"/"mapping" and a standard way to create/update/read/delete it? Then the customer could create a "facade"/"mapping" for each tool implementation (consumers and producers) to the five "state groups". This would bring the number of configurations down to "Order n".
BTW, enjoyed your analogy. Sorry that I didn't carry forward - not sure how I would have made my point.
--
SamitMehta - 19 Aug 2009
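A sketch of the per-tool mapping Samit describes, in which each tool maps once onto the enterprise's five agreed state groups (n mappings rather than n²); the native state names are hypothetical:

```python
# One facade per tool maps its native states onto the enterprise's agreed
# state groups, so n tools need n mappings rather than n * n pairwise
# configurations. Native state names are invented.

ENTERPRISE_GROUPS = {"new", "approved", "open", "closed", "deferred"}

FACADES = {
    "cm_tool_1": {"Untriaged": "new", "Accepted": "approved",
                  "Active": "open", "Fixed": "closed", "Later": "deferred"},
    "cm_tool_2": {"Submitted": "new", "Scheduled": "approved",
                  "Working": "open", "Shipped": "closed", "Iced": "deferred"},
}

def to_group(tool: str, native_state: str) -> str:
    group = FACADES[tool][native_state]
    assert group in ENTERPRISE_GROUPS   # conformance check against the standard
    return group

print(to_group("cm_tool_1", "Active"))   # open
print(to_group("cm_tool_2", "Shipped"))  # closed
```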
Would it be necessary (or desirable) for the "target" of the facade mapping (the configured set to which each individual tool is mapped) to have a URL itself? This allows checking when the facade is created or updated for a particular tool: does the facade conform to the standard?
That also allows this idea to apply when the concept is cross-tool type (like "role" that applies to multiple different OSLC interfaces) as well as a concept that must be harmonized against all implementations of a single interface like Change Management State.
--
AndyBerner - 20 Aug 2009
It is clear that when we can accomplish a scenario using common vocabularies, we should. However, there are scenarios which make it unavoidable to provide metadata. Query and reporting scenarios are excellent candidates for common vocabularies; results can simply be consumed without a schema. Some of the complexities of this approach arise from the variations in how many change management systems work today, as mentioned, in support of advanced scenarios where tools need to learn more about each other: the state model and progression of states, mandatory values based on these transitions, and synchronization tools that need to preserve the integrity of a change request without facade/translation loss. Also, some advanced query-building and reporting tools would like a standardized way to learn more about the extended shape of a resource to provide additional capability. The ability to provide this metadata in a consistent, standard way also provides interoperability benefits.
--
SteveSpeicher - 31 Aug 2009
It's tough to dislike the argument Abrams gives. I like it. It looks clearly "better". But that's a relative measurement. It doesn't tell us much about whether the approach is "good".
The metaphor is a two-edged sword. It does a nice job helping us understand the approach. It also does a nice job hiding some of the practical problems we'll encounter. Good metaphors are like that.
So in the interest of balance, I'm going to ditch the metaphor and surface a few specific problems.
First of all, we're going to want the common language to be well designed, but it will need the ability to accurately express translations of things that are poorly designed. For example, our product, SourceGear Fortress, has work item tracking. It is not very customizable, and most of its fields are fairly simple, so it SHOULD be easy to support OSLC interop. However, one field is probably going to be a big problem. Instead of a state field and a resolution field, we have one field which tries to handle both. I doubt we're the only product on earth with this regrettable design choice, but let's suppose we are. The CM group is not going to be eager to include quirky stuff like this if it makes our standard vocabulary feel all screwed up.
SourceGear is a tiny company which could probably be ignored, and my membership on this group need not change that. But similar examples are going to show up with circumstances that are much tougher to ignore.
So, I claim that designing the lingua franca will be nearly impossible unless we resign ourselves to a certain amount of tolerance for ugliness.
Next problem: Some things are going to be essentially inexpressible. Lots of work item tracking systems have customization capabilities which go beyond declarative things into procedural things. State changes trigger other changes. Constraints can be defined. The value of one field can determine the allowable values for another. Etc.
It's not just concepts. It's actions.
And every system defines these actions differently. Any common language which tries to solve these aspects of the problem will essentially need to be a scripting language which can provide exact equivalence to all the world's other scripting languages. I'm exaggerating, but you get the idea.
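To illustrate the kind of procedural customization meant here, a toy sketch in which a state change triggers another change and one field constrains another (all rules invented):

```python
# Procedural customization: rules, not just vocabulary. A state change
# triggers another change, and the value of one field restricts the legal
# values of another. Every tool expresses such rules differently, which is
# why a common vocabulary alone cannot capture them. Rules are invented.

ALLOWED_RESOLUTIONS = {
    "open": {None},
    "closed": {"fixed", "won't fix", "duplicate"},
}

def set_state(item: dict, new_state: str) -> None:
    if item.get("resolution") not in ALLOWED_RESOLUTIONS[new_state]:
        raise ValueError("resolution not allowed in this state")
    item["state"] = new_state
    if new_state == "closed":
        item["closed_on"] = "2009-09-16"   # trigger: closing stamps a date

item = {"state": "open", "resolution": None}
item["resolution"] = "fixed"
set_state(item, "closed")   # legal: 'fixed' is allowed in 'closed'
print(item)
```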
Which brings me to the observation I made last week: An approach based on a common language is a heckuva lot easier if we don't need to do any roundtrips. In almost all situations, the common language will be a subset of the tool's native language. So, I can translate my native language into the common language for the purpose of getting the main point across to somebody who only speaks "OSLC CM Common". But when I have to translate the common language into my native tongue, things are going to get lost. This is especially bad if a single record is making a roundtrip, since I started with rich detail, then I lost it when I translated it to the common language, and then I didn't get it back.
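And a toy sketch of that roundtrip loss (field names hypothetical): translating down to the common subset and back cannot recover the native detail.

```python
# Roundtrip loss: the common language is a subset of the native language,
# so native -> common -> native drops fields. Field names are invented.

COMMON_FIELDS = {"title", "state"}

def to_common(native: dict) -> dict:
    return {k: v for k, v in native.items() if k in COMMON_FIELDS}

def from_common(common: dict) -> dict:
    return dict(common)   # nothing can restore fields the common language lacks

original = {"title": "Crash on save", "state": "open",
            "repro_steps": "1. open file, 2. save", "severity": "blocker"}
roundtripped = from_common(to_common(original))
print(set(original) - set(roundtripped))   # {'repro_steps', 'severity'}
```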
Next problem: We're counting on cooperation from vendors, and in some of the most important cases, we are not likely to get it.
Microsoft's Team Foundation Server might end up being one of the products we would really like to see support our common language. I can't speak for Microsoft, but I rather doubt they'll find the idea super appealing.
Technologically, they'll be one of the toughest cases anyway. All work item fields in TFS are "custom". The customization language sends me groping for just the right adjective, but none of the ones I need are terribly positive. It's XML based, but it has a tendency to mix up verbs and nouns and adjectives in very counter-intuitive ways. What I said above about embracing ugliness would apply to TFS work item tracking in at least ten ways.
So, TFS and OSLC are a difficult marriage, for multiple reasons. That doesn't mean OSLC could not be a successful effort without Microsoft's cooperation. I just figure that speaking of vendor cooperation will sometimes require us to speak more specifically about the oddball cases.
Finally, let me acknowledge that this note might be interpreted to say:
This is impossible. Let's give up.
But that's not what I am trying to say. Rather, what I mean is:
A complete and perfect solution is impossible, so let's carefully define which parts we will solve, and let's be as aware as we can be of the ones we will not solve.
--
EricSink - 16 Sep 2009