Corpora of Written Language
Collaborations of the Project
Within the framework of the EU project CLARIN and the project CLARIN-D, funded by both the BMBF (Federal Ministry of Education and Research) and the MWK-BW (Ministry of Science, Research and Arts of Baden-Württemberg), the project “Development and Maintenance of Corpora of Contemporary Written Language” collaborates with the following partners to create a research infrastructure (Forschungsinfrastruktur, FI) for linguistics:
- Max Planck Institute for Psycholinguistics, Nijmegen
- Department of Linguistics, Computational Linguistics at the University of Tübingen
- Berlin-Brandenburg Academy of Sciences
- ASV - Institute of Computer Science at the University of Leipzig
- Institute for Natural Language Processing at the University of Stuttgart
- BAS - Bavarian Archive for Speech Signals, Munich
- Hamburg Center for Language Corpora
- Applied Linguistics, Translation and Interpreting, Saarland University
Main areas within the collaboration framework are:
- creating explicit research infrastructure centres and securing their sustainability
- canonisation and standardisation of formats and interfaces
- best-practice guidelines for handling resources that are subject to third-party rights, as well as corresponding licensing models
- facilitating persistent identifiability (and citability) of electronic resources; a corresponding ISO standardisation process has been initiated together with the MPI Nijmegen and the Department of Linguistics at the University of Tübingen
Collaboration within the IDS
- Close cooperation with the project Methods of Corpus Analysis and Corpus Classification is vital with regard to general methodological and epistemological maxims, so that the cycle from the generation of primary research data to their linguistic analysis is as tight as possible.
- The project also collaborates closely with the Central Data Processing Services department and the COSMAS II project.
- Eric Seubert is significantly involved in the development of the TEI-based text model of the IDS. Moreover, he develops programs for XML conversion and for quality assurance of corpora. He is also responsible for the supervision of student assistants correcting and tagging source texts.
- Peter Harders analyses data processing formats of acquired raw data and develops programs for their conversion.
- The project cooperates with the project Research and Teaching Corpus (FOLK) in the Archive for Spoken German (department of pragmatics), in particular on the clarification of legal and ethical questions concerning the collection, processing and provision of linguistic data.
- The project also advises the project Historical Text Corpus (department of lexics).
- Doris al-Wadi, after many years as a staff member, now advises the project on issues concerning the IDS text model, especially the corpus text bibliography.