WP6: Multilingual and user friendly sophisticated access

Summary

  • Work package number: 6
  • Start date: Month 3
  • End date: Month 24
  • Work package title: Multilingual and user friendly sophisticated access

Objectives

  • This work package aims to integrate a multilingual module that offers sophisticated, user-friendly access: a multilingual search application, multilingual forums, and a multilingual ontology editor.
  • Based on SYSTRAN’s machine translation technology, this module will also provide terminology extraction and machine translation customization tools for the construction and retrieval of personalised metadata, with the aim of creating new multilingual digital documents and multilingual ontologies in Czech, Polish, Spanish, Portuguese, German, Italian, English, French, Danish, Hungarian, Russian and Serbo-Croatian.

Description of work

  • Work package leader: SYS
  • Task 6.1: Multilingual access development (m0-m12)
    • Task leader: SYS
    • Participants in Task: NKP, AIP
    • The project will provide two types of multilingual access:
      • API integration in the data retrieval interface, either associated with or independent of a multilingual search
      • a dedicated translation interface where ENRICH expert users can dynamically fine-tune the machine translation tools, using adapted linguistic tools for terminology extraction, translation post-editing and customization. The parameters and resources constructed there will automatically be taken into account by the API in the first type of access
  • Task 6.2: Translation Stylesheet design and use (m3-m24)
    • Task leader: SYS
    • Participants in Task: NKP, AIP, CCP, KU, BNCF, UZK, DSP, BNE, BUTE, ULW
    • Activities:
      • analysis of the heterogeneity of metadata with regard to machine translation.
      • implementation of STS exploiting metadata information.
      • cross-language validation of STS, optimization of translation parameters.
    • For the implementation of the metadata translation module, SYSTRAN will provide a fully customized Translation Stylesheet.
    • SYSTRAN Translation Stylesheets (STS) use XSLT to drive and control the machine translation of XML documents (native XML document formats, or XML representations such as XLIFF of other kinds of document formats). STS will provide a simple way to indicate which parts of the document text are to be translated, and will enable the fine-tuning of translation, especially by using the structure of the document to help disambiguate natural language semantics and determine the proper context. With STS, machine translation is treated as part of the authoring and publishing process: source documents can be annotated with natural language mark-up produced by the author, and STS processes this mark-up to improve the quality of translation. This is the gateway to the automatic publishing of a multilingual website from a monolingual (annotated) source. The mechanism is implemented through XSLT extension functions for consulting and setting linguistic options in the translation engine. SYSTRAN will deliver this XSLT file in order to fine-tune the system according to the ENRICH XML data elements.
  • Task 6.3: VICODI implementation (m6-m24)
    • Task leader: SYS
    • Participants in Task: NKP, AIP
    • Activities:
      • definition and homogenization of an initial ontology applicable to this project
      • specification of a user-friendly web interface for visualization of the multilingual ontology, with a special interface for modification
      • implementation of the web interface
    • Based on previous experience in the visualization and contextualization of digital content (IST project VICODI), SYSTRAN technology has been implemented for the construction of multilingual ontologies. The Research Center for Information Technologies (FZI) constructed multilingual ontologies, available under the GNU Free Documentation License (FDL), thanks to the EU-funded IST project VICODI (http://www.vicodi.org/). ENRICH will implement and use VICODI ontologies for the contextualization of the digital content.
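
Illustration for Task 6.2: the Translation Stylesheet idea of selecting which parts of an XML record are translated, and passing structural context to the engine, can be sketched as follows. This is only a minimal sketch in Python rather than XSLT, and it does not use any SYSTRAN interface. The element names (record, title, summary, shelfmark), the rule table TRANSLATABLE, the function fake_translate and the sample record are all hypothetical and were invented for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical rules: element tag -> context hint handed to the translator.
# Elements that are not listed (for example identifiers) stay untranslated.
TRANSLATABLE = {
    "title": "title",
    "summary": "running-text",
}


def fake_translate(text, source, target, context):
    """Stand-in for a machine translation call (NOT a real SYSTRAN API)."""
    return f"[{source}-{target}|{context}] {text}"


def translate_record(xml_text, source, target, translate=fake_translate):
    """Return the record as XML text with only the selected elements translated."""
    root = ET.fromstring(xml_text)
    for element in root.iter():
        context = TRANSLATABLE.get(element.tag)
        if context is not None and element.text and element.text.strip():
            element.text = translate(element.text.strip(), source, target, context)
    return ET.tostring(root, encoding="unicode")


# Invented sample record, for illustration only.
RECORD = (
    "<record>"
    "<title>Kronika</title>"
    "<shelfmark>MS-0001</shelfmark>"
    "<summary>Rukopis</summary>"
    "</record>"
)

if __name__ == "__main__":
    print(translate_record(RECORD, "cs", "en"))
```

In this sketch the shelfmark is passed through unchanged, while the title and summary are sent to the translator together with a context hint derived from the element they appear in. An actual STS would express the same selection as XSLT templates and call the engine through extension functions.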

(Inter-) Dependencies, milestones and expected result

  • Based on the ENRICH Corpus Analysis (Month 6), SYSTRAN will build the ENRICH Translation Stylesheet. After a quality assessment procedure, and based on the evaluation results (Month 20), SYSTRAN will finalise the ENRICH Translation Stylesheet (Month 24).
  • The WP depends mainly on the results of WP3, but is also interrelated with WP4.
  • Feedback from WP7 is necessary.

Deliverables