Monday, June 11, 2007

Metadata and the RMS

The Business Support team (of which I am a member) has been working on the last few aspects of a project to introduce a workflow management system that we refer to as the Research Management System (RMS). When I started looking at how we might apply a metadata schema to our publications and how we might store that metadata (database vs. embedded), the RMS seemed like the most sensible way of capturing the metadata and certainly presented a way of storing it as well (though this isn’t my preferred option – more on that later).

Creating a schema

I looked at other organisations for a schema that we might adopt, but the ones I found were either too closely tailored to their owners’ applications or too general. My conclusion was that it was best to create something that could be adopted by the industry as a whole while still meeting R&D’s needs. I have already written a bit about this work but as a reminder, I assembled a series of elements from the Dublin Core and the e-Government Metadata Standard and then created a few that were specific either to the industry (e.g. Asset Type) or to R&D (e.g. Research Objective). For as many elements as possible, I have used existing, internationally-recognised encoding schemes (e.g. W3C’s Date-Time format) and for the R&D-specific ones, I have used schemes developed through a consultative process with our Heads of Research Section and a selection of Research Managers (e.g. Audience Group). I have created the Application Profile but have yet to create the XML definitions and publish them on our website, which is ultimately my objective.
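
To make the shape of such a profile concrete, here is a rough sketch in Python. It is only an illustration: the element names come from this post, but which encoding scheme belongs to which element (for example, the W3C Date-Time format on a Date element) is my assumption for the example, and the names Element, PROFILE and is_w3cdtf_date are hypothetical. The check covers only the date-only forms of the W3C format.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class Element:
    """One element of the application profile (hypothetical structure)."""
    name: str
    source: str                             # where the element definition comes from
    encoding_scheme: Optional[str] = None   # scheme constraining its values, if any


# Illustrative subset only; the pairing of schemes with elements is assumed.
PROFILE = [
    Element("Title", "Dublin Core"),
    Element("Date", "Dublin Core", "W3CDTF"),
    Element("Publisher", "Dublin Core"),
    Element("Subject", "Dublin Core", "IPSV"),
    Element("Asset Type", "industry-specific"),
    Element("Research Objective", "R&D-specific"),
]


def is_w3cdtf_date(value: str) -> bool:
    """Accept the date-only W3C Date-Time forms: YYYY, YYYY-MM, YYYY-MM-DD."""
    for fmt, length in (("%Y", 4), ("%Y-%m", 7), ("%Y-%m-%d", 10)):
        if len(value) == length:  # length check enforces zero-padding
            try:
                datetime.strptime(value, fmt)
                return True
            except ValueError:
                return False
    return False


if __name__ == "__main__":
    for element in PROFILE:
        print(element.name, "|", element.source, "|", element.encoding_scheme)
    print(is_w3cdtf_date("2007-06-11"))
```

Keeping the source of each element alongside its encoding scheme is what would later make it straightforward to generate the published definitions for the elements we created ourselves.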

Applying the metadata

The RMS is due to go live next week and the metadata schema, along with the controlled vocabularies that I have had to create to support some of the R&D- and industry-specific fields, will be put to the test. Without going into too much detail here, we have worked as many of the elements into the process flow as possible so that as our Research Managers work through a project and record it in the RMS, some of the data that they enter is held in metadata fields for later application to any publications that emerge from the research. Clearly, not all of the elements can be populated this way (the title, for example, can only be filled in once the publication has been completed) but many can (such as research topic).

At the end of the research process, there is a knowledge management stage where the method for publishing, promoting and evaluating the publication is captured and the remaining metadata elements are completed. Some of these are set to defaults that will almost certainly not need to change, such as the publisher (that is pretty much always going to be our organisation). It is all looking like it should work but of course there is really only one way to find out for sure.
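
The staged capture described above can be sketched roughly as follows. This is not how the RMS is implemented; it is a small Python illustration of the idea, and the function names, the required-element list and the placeholder publisher value are all hypothetical.

```python
# Defaults that will almost never change (placeholder value, not our real name).
DEFAULTS = {"Publisher": "Our organisation"}


def build_record(project_fields, publication_fields):
    """Merge metadata in the order it is captured.

    Defaults come first, then values recorded while the project runs,
    then values only known once the publication exists. Later stages
    may override earlier ones.
    """
    record = dict(DEFAULTS)
    record.update(project_fields)
    record.update(publication_fields)
    return record


def missing_elements(record, required):
    """Return the required elements that are still empty or absent."""
    return [name for name in required if not record.get(name)]


if __name__ == "__main__":
    during_project = {"Research Topic": "Level crossings"}
    at_km_stage = {"Title": "An example report"}
    record = build_record(during_project, at_km_stage)
    print(record)
    print(missing_elements(record, ["Title", "Publisher", "Subject"]))
```

A check like missing_elements is the sort of thing the knowledge management stage would need in order to confirm that nothing has been left blank before a publication goes out.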

"Subjectlessness"

My one disappointment in this project was the inability to sort out a controlled vocabulary for the subject element in time for the launch of the RMS. The problem I encountered is that existing controlled vocabularies are either too granular (for example, SELCAT (http://www.levelcrossing.net/) have a highly detailed thesaurus on the topic of Level Crossings, just one of the many areas of research that we pursue) or insufficiently granular (the Integrated Public Sector Vocabulary, or IPSV, directs users to categorise anything having to do with the railways, from electrification to passenger crowding, under Rail transport). Developing one of our own is just too big a task to sort out in only a few weeks (at the same time as all of the other projects that I am working on) so for the time being, the field in the RMS will be populated with: [IPSV] Rail transport. Although this is largely useless to us, it helps us tie in our work with that of the Department for Transport and other government bodies applying the IPSV.

Towards completion and "subjectfulness"?

The next stage of this project for me will have three aspects:
  1. The first is to refine the schema and existing vocabularies (Do we have all of the elements that we need? Are there any that have been included that just aren’t necessary? Are the controlled vocabularies sufficiently granular, too granular or incomplete?).
  2. The second will be to create a final version of the application profile and to publish the element and encoding scheme definitions that I have had to create to accommodate some of the metadata that is specific to our requirements.
  3. The third and final aspect is to resolve this issue of subject headings. I am attending a CILIP workshop tomorrow called “How to construct a thesaurus” which I am hoping will give me some ideas and strategies for solving this problem. Wherever possible, I would like to use existing vocabularies, to build in as much interoperability as I can. Maybe the solution will be to use things like the SELCAT vocabulary and the rolling stock manufacturers’ parts vocabularies but to only go down to a particular level within them (I am not yet sure what the implications of doing so would be).
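
One way to explore the idea of only going down to a particular level within an existing vocabulary is sketched below in Python. It assumes that each term can be written as a path from the top of its hierarchy, which may not hold for every vocabulary, and the terms shown are invented placeholders rather than entries from SELCAT or any manufacturer's list. The function names are hypothetical too.

```python
from collections import defaultdict


def truncate(path, max_depth):
    """Keep only the first max_depth levels of a hierarchical term."""
    return path[:max_depth]


def collapse(paths, max_depth):
    """Group detailed terms under the heading they share after truncation."""
    groups = defaultdict(list)
    for path in paths:
        groups[truncate(path, max_depth)].append(path)
    return dict(groups)


if __name__ == "__main__":
    # Invented sample terms, for illustration only.
    detailed = [
        ("Level crossings", "Barriers", "Type A"),
        ("Level crossings", "Barriers", "Type B"),
        ("Level crossings", "Signage"),
    ]
    for heading, members in collapse(detailed, 2).items():
        print(" > ".join(heading), "covers", len(members), "detailed term(s)")
```

Running something like this over a real vocabulary would show one implication straight away: every heading that ends up covering many detailed terms marks a place where distinctions made in the source vocabulary would be lost in ours.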

Metadata in the database vs. metadata in the document

I have already written about this little bugbear of mine but it is still an issue for me. At the moment, I am going along with a database-held metadata solution but this is largely because that option is available and there is a distinct lack of alternatives. I think that once the metadata schema is relatively settled and the encoding schemes are in use, I will turn my attention to resolving the issue of how we embed the metadata into the documents themselves...

