Corpora of Written Language

Collaborations of the Project

External Collaborations

Within the framework of the EU project CLARIN and the project CLARIN-D, funded by both the BMBF (Federal Ministry of Education and Research) and the MWK-BW (Ministry of Science, Research and Arts), the project “Development and Maintenance of Corpora of Contemporary Written Language” collaborates with the following partners to create a research infrastructure (FI) for linguistics:

The main areas of collaboration are:

  • creating explicit research infrastructure centers and securing their sustainability
  • canonisation and standardisation of formats and interfaces
  • best practice guidelines for handling resources that are not free from third party rights as well as corresponding licensing models
  • facilitating persistent identifiability (and citability) of electronic resources; a corresponding ISO standardisation process has been initiated together with the MPI Nijmegen and the Department of Linguistics at the University of Tübingen

Collaboration within the IDS

  • Close coordination with the project Methods of Corpus Analysis and Corpus Classification is vital with regard to general methodological and epistemological principles, so that the cycle from the generation of primary research data to their linguistic analysis is as tight as possible.
  • The project also collaborates closely with the Central Data Processing Services department and the COSMAS II project.
  • Eric Seubert plays a major role in the development of the TEI-based text model of the IDS. He also develops programs for XML conversion and for quality assurance of corpora, and supervises the student assistants who correct and tag source texts.
  • Peter Harders analyses the data formats of acquired raw data and develops programs for their conversion.
  • The project cooperates with the project Research and Teaching Corpus (FOLK) in the Archive for Spoken German (department of pragmatics), in particular on legal and ethical questions concerning the collection, processing and provision of linguistic data.
  • The project also advises the project Historical Text Corpus (department of lexics).
  • Doris al-Wadi, formerly a staff member for many years, advises the project on issues concerning the IDS text model, especially the corpus text bibliography.