Friday, October 13, 2006

DITA and the Dublin Core

One of the challenges that we are going to face in this metadata project is the mapping of the DITA schema to the document metadata schema that we choose to implement (let’s call it Industry Schema).

We have a Technical Writer who is eager to introduce a content reuse policy (something very much in line with my own objectives) and would like to use a metadata architecture called the Darwin Information Typing Architecture (DITA) to do it. It’s basically an XML-based architecture that helps organisations create technical publications without having to recreate content. If you want to know more, there is a pretty good Wikipedia entry on DITA, and you could check out the DITA section of the OASIS website. OASIS is the organisation now responsible for its maintenance (it started out as an IBM architecture).

My problem with using only this architecture is that when it comes time to create a composite document, how do you know which bits of metadata should be used to describe the document? For example, it is entirely possible that each fragment has a different Subject and Creator. When assigning these elements of metadata to the composite document, I wouldn’t use any one of the Creators as the Creator for the document, nor would I use all of them – they would now be considered Contributors (at least in the Dublin Core Metadata Standard) and the Creator would be (I suppose) the organisation. As for the Subject, clearly a combination of individual components about certain concepts, when pulled together, does not make up a document that is about all of those separate concepts. So we need a document-level metadata schema – enter Industry Schema.
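To make the roll-up rule above concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the `Fragment` and `DocumentRecord` classes, their field names, and the two helper functions are hypothetical and do not come from DITA, Dublin Core tooling, or our Industry Schema. It shows fragment Creators becoming document Contributors, the organisation becoming the Creator, and fragment Subjects being offered only as suggestions rather than copied across.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Fragment:
    """Hypothetical stand-in for the metadata carried by one reusable fragment."""
    creator: str
    subjects: list


@dataclass
class DocumentRecord:
    """Hypothetical document-level record using Dublin Core style names."""
    creator: str
    contributors: list
    subjects: list = field(default_factory=list)


def draft_document_record(fragments, organisation):
    """Draft a record for a composite document built from the given fragments."""
    # Fragment creators become contributors; duplicates dropped, order kept.
    contributors = list(dict.fromkeys(f.creator for f in fragments))
    # Subjects are deliberately left empty for the cataloguer to assign.
    return DocumentRecord(creator=organisation, contributors=contributors)


def candidate_subjects(fragments):
    """List fragment subjects, most frequent first, as suggestions only."""
    counts = Counter(s for f in fragments for s in f.subjects)
    return [subject for subject, _ in counts.most_common()]
```

The point of keeping `candidate_subjects` separate is that the fragment metadata informs the cataloguing decision without making it.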

I’m thinking that by creating a document that maps the Industry Schema to the DITA Document Type Definition (DTD) that we use, we can populate the DITA architecture based on the metadata associated with the original document.
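As a rough idea of what such a mapping document could look like in executable form, here is a sketch that turns a document-level record into a DITA `<prolog>` element. The field names on the left of the mapping are assumed, Dublin Core style placeholders, since we haven’t chosen the Industry Schema yet; the element names on the right (`author`, `publisher`, `metadata/keywords/keyword`) are standard DITA prolog elements, but the pairing itself is my guess at a sensible mapping, not an agreed one.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping: document-level field -> element path inside <prolog>.
# Keys are placeholders for whatever the Industry Schema ends up calling them.
FIELD_TO_PROLOG = {
    "creator": ("author",),
    "publisher": ("publisher",),
    "subject": ("metadata", "keywords", "keyword"),
}


def build_prolog(record):
    """Build a DITA <prolog> element from a dict of field name -> list of values."""
    prolog = ET.Element("prolog")
    for field_name, path in FIELD_TO_PROLOG.items():
        for value in record.get(field_name, []):
            parent = prolog
            # Walk down the container elements, creating each one only once.
            for tag in path[:-1]:
                found = parent.find(tag)
                parent = found if found is not None else ET.SubElement(parent, tag)
            ET.SubElement(parent, path[-1]).text = value
    return prolog


if __name__ == "__main__":
    record = {"creator": ["Example Org"], "subject": ["installation", "licensing"]}
    print(ET.tostring(build_prolog(record), encoding="unicode"))
```

Keeping the mapping in one table means that changing our minds about a field later is a one-line edit rather than a rewrite.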

When it comes time to catalogue a composite document, we will have to do so from scratch. This isn’t the end of the world; we’d have to do so even if we weren’t using DITA to create composite documents, so it isn’t as though we have to do more. I just can’t help but think that the DITA metadata could be used to inform the Industry Schema metadata that we choose to assign. Is there a tool that we could use?

There is no doubt an article in here somewhere on the use and application of metadata at different levels of content granularity (document versus segment in this case). I would really like to speak to someone who has done this before…

