ENRICH Project and TEI P5

An article on the development and use of TEI P5 within the ENRICH project, by James Cummings and Lou Burnard, Oxford University

There are several quite distinct traditions for the description of primary sources, in particular manuscripts. Unlike books, such sources are unique objects, often of great cultural value, which are typically catalogued locally by the many different institutions holding them. This contributes further to the diversity of approaches taken. Great institutions are able to produce richly detailed, highly scholarly descriptions, while smaller or less well-resourced institutions cannot hope to do so. But with the widespread increase in the digitization of such primary sources, there is increasing pressure to make their cataloguing uniform, so as to facilitate cross-site searching and the sharing of information about resources held at many different institutions. The World Wide Web makes it comparatively easy to share representations of the manuscripts held in all the collections of the world, but differing cataloguing practices, different views of what is essential, and different levels of resource all make it difficult to share any but the most basic of such materials.


The ENRICH project aims to create seamless access to information about the vast collections of manuscripts and incunables distributed across major European libraries. To make this possible, it offers both a specific software platform, the Manuscriptorium, developed for the National Library of the Czech Republic over many years by AIP-Beroun, and an open conceptual model for the description of the resources managed by that system. The existence of this model, which is expressed using international standards such as XML and TEI, makes it possible to integrate existing data sources from many different traditions, and also to convert the resulting data into many different formats. As a simple example, one collection may hold very detailed information about each item's paleography as well as its intellectual contents and curation history, while another records simply title, shelfmark, and date. A user of the Manuscriptorium can extract information which is common to both collections by performing meaningful searches against the structured data it holds. The results of such searches can be displayed, or returned automatically in standard formats, so that they may be integrated into existing library automation systems.
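The idea of extracting the information common to a detailed and a minimal record can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the two sample records, their contents, and the function name common_fields are invented for the example, and the code does not use or describe the Manuscriptorium's own search interface. The element names (msDesc, msIdentifier, idno, title, origDate) and the namespace are those of the TEI.

```python
import xml.etree.ElementTree as ET

# The TEI namespace; the prefix "tei" is only a local convenience.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Hypothetical record from a collection that catalogues in great detail.
DETAILED = """<msDesc xmlns="http://www.tei-c.org/ns/1.0">
  <msIdentifier>
    <settlement>Exampletown</settlement>
    <repository>Example Library</repository>
    <idno>MS A 1</idno>
  </msIdentifier>
  <msContents>
    <msItem><title>Example Psalter</title></msItem>
  </msContents>
  <physDesc>
    <handDesc><handNote>Written in a single gothic bookhand.</handNote></handDesc>
  </physDesc>
  <history>
    <origin><origDate notBefore="1340" notAfter="1360">mid 14th century</origDate></origin>
  </history>
</msDesc>"""

# Hypothetical record from a collection that records only the basics.
MINIMAL = """<msDesc xmlns="http://www.tei-c.org/ns/1.0">
  <msIdentifier><idno>B 27</idno></msIdentifier>
  <head><title>Example Chronicle</title> <origDate>1480</origDate></head>
</msDesc>"""


def common_fields(xml_text):
    """Return the shelfmark, title, and date of one manuscript description."""
    root = ET.fromstring(xml_text)

    def first_text(path):
        # Take the first matching element, wherever it sits in the record.
        node = root.find(path, TEI_NS)
        if node is None:
            return None
        return "".join(node.itertext()).strip()

    return {
        "shelfmark": first_text(".//tei:msIdentifier/tei:idno"),
        "title": first_text(".//tei:title"),
        "date": first_text(".//tei:origDate"),
    }


if __name__ == "__main__":
    for record in (DETAILED, MINIMAL):
        print(common_fields(record))
```

Because both records use the same element names for the shared information, one query serves both, and the extra detail in the richer record is simply ignored rather than getting in the way.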

History

The ENRICH project builds upon a long tradition of expertise and over twenty years of experience in the elaboration of encoding systems. It has adopted as its basis the Recommendations of the Text Encoding Initiative (TEI), now in their fifth edition. These recommendations, TEI P5: Guidelines for Electronic Text Encoding and Interchange, cover many topics and constitute a de facto standard format for digital texts in scholarly research projects.

The TEI was established in 1987, initially as a research project to develop, maintain, and promulgate hardware- and software-independent methods for encoding humanities and cultural heritage data in electronic form. Since 2000 the TEI has been incorporated as an international membership consortium to further these aims. Several North American and European libraries are members of the Consortium, and its Recommendations are at the heart of many current digital library initiatives and projects.

Currently the TEI P5 Guidelines (over 1300 pages in their printed form) contain 23 chapters, not including front matter and appendices, and define over 500 XML elements to distinguish or discuss textual phenomena and metadata. The ENRICH project's format specification is based primarily on the chapter on Manuscript Description, but also incorporates recommendations from many of the other chapters, for example those relating to the description of digital images, of non-Unicode characters, and of paleographic or transcriptional data.

The idea of encoding manuscript descriptions in digital form goes back many years; with the explosion of the World Wide Web in the 1990s came an increased recognition of the need for agreed metadata standards as a prerequisite for any sort of interoperability or union catalogue of manuscript descriptions. In 1996 the Mellon Foundation funded an exploratory collaboration between three pioneering American projects in this area. In November 1996, at Studley Priory near Oxford, a workshop brought together representatives of many major manuscript-holding institutions in Europe and experts in metadata standards such as MARC, TEI and Dublin Core. The consensus developed at that meeting was carried forward by an EU-funded project called MASTER (Manuscript Access through Standards for Electronic Records), in which several leading European and American institutions worked together to formulate an XML-based standard for the description of manuscript materials. This standard was then further elaborated by a group of TEI experts before its incorporation into the current TEI P5 chapter on Manuscript Description.

The ENRICH Specification

The TEI is a very general scheme, permitting maximal flexibility in records conforming to it. It also covers very many aspects of bibliographic, linguistic, historical, literary and other textual materials, not simply manuscripts. For these reasons, any application of it involves a tailoring or customization. Such customizations are expressed using a special XML vocabulary, also standardized by the TEI. The resulting specification has several uses. First, it documents which aspects of the TEI are being used, and declares any additional constraints required for the project, for example to determine the legal values of certain attributes, or to remove unwanted elements. Second, it is used to generate project-specific reference documentation in several different languages. Finally, it is used to generate formal schemas which standard XML software can use to validate document instances according to the needs of the project.
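The two kinds of constraint mentioned above, restricting the legal values of an attribute and removing unwanted elements, can be illustrated with a toy check in Python. This is a sketch only, and a deliberate simplification: it does not read a TEI customization file, and in practice the checking is done by standard XML software against the generated schema. The particular value list, the choice of removed elements, and the function name check_constraints are assumptions made for the example, not a statement of what the ENRICH specification contains.

```python
import xml.etree.ElementTree as ET

# Hypothetical constraints of the kind a customization might declare.
# Keys are (element name, attribute name); values are the legal values.
ALLOWED_VALUES = {
    ("supportDesc", "material"): {"perg", "chart", "mixed", "unknown"},
}
# Hypothetical elements that the project has chosen not to use.
REMOVED_ELEMENTS = {"sp", "speaker"}

# Hypothetical fragment that breaks the attribute constraint above.
SAMPLE = """<physDesc xmlns="http://www.tei-c.org/ns/1.0">
  <objectDesc>
    <supportDesc material="paper"><p>Paper, well preserved.</p></supportDesc>
  </objectDesc>
</physDesc>"""


def check_constraints(xml_text):
    """Return a list of human-readable problems found in one document."""
    problems = []
    for element in ET.fromstring(xml_text).iter():
        # ElementTree writes tags as "{namespace}name"; keep only the name.
        name = element.tag.split("}")[-1]
        if name in REMOVED_ELEMENTS:
            problems.append(f"element <{name}> is not used in this project")
        for attribute, value in element.attrib.items():
            allowed = ALLOWED_VALUES.get((name, attribute))
            if allowed is not None and value not in allowed:
                problems.append(
                    f'attribute {attribute}="{value}" on <{name}> '
                    f"is not one of {sorted(allowed)}"
                )
    return problems


if __name__ == "__main__":
    for problem in check_constraints(SAMPLE):
        print(problem)
```

Stating such rules once, in a single specification, is what allows the same source to drive both the documentation and the validation, so that the two cannot drift apart.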

In defining the ENRICH specification, care was taken to maintain full compatibility with the published TEI P5 standard. A document prepared in conformance to the ENRICH schema is therefore also a TEI-conformant document, and can be used by any TEI-aware software. Furthermore, because of its use of the TEI, the ENRICH specification provides a complete suite of encoding possibilities, covering not simply the cataloguing and description of manuscripts or early printed books, but also the encoding of a digital edition in which metadata, digital image, transcribed text, edited text, and editorial annotation are all integrated in a standard framework.

Conclusion

The ENRICH project illustrates how XML and web-based technologies promote greater access to the rich cultural heritage of European institutions, without compromising their inherent complexity. It builds upon many years of expertise in the development of technical and operational standards for metadata and encoding. Its use of TEI XML in particular ensures that its outputs remain interoperable with new and developing systems worldwide. The partners in the ENRICH project are also well-placed to influence further development of the TEI standard, as they have already done in the course of developing the ENRICH specification.

For more information on the TEI P5 Guidelines see: http://www.tei-c.org/Guidelines/P5/.