Corpora of Written Language
Collaborations of the Project
Within the framework of the EU project CLARIN and the project CLARIN-D, funded by both the BMBF (Federal Ministry of Education and Research) and the MWK-BW (Ministry of Science, Research and Arts), the project “Development and Maintenance of Corpora of Contemporary Written Language” collaborates with the following partners to create a research infrastructure (FI) for linguistics:
The main areas of collaboration are:
- creating explicit research infrastructure centers and securing their sustainability
- canonisation and standardisation of formats and interfaces
- best-practice guidelines for handling resources that are subject to third-party rights, as well as corresponding licensing models
- facilitating persistent identifiability (and citability) of electronic resources; a corresponding ISO standardisation process has been initiated together with the MPI Nijmegen and the Department of Linguistics at the University of Tübingen
Collaboration within the IDS
- Close cooperation with the project Methods of Corpus Analysis and Corpus Development is vital with regard to general methodological and epistemological principles, in order to keep the cycle from the generation of primary research data to their linguistic analysis as tight as possible.
- Apart from that, the project also collaborates closely with the Central Data Processing Services department and the COSMAS II project.
- Eric Seubert is significantly involved in the development of the TEI-based text model of the IDS. Moreover, he develops programs for XML conversion and for quality assurance of corpora. He is also responsible for supervising the student assistants who correct and tag source texts.
- Peter Harders analyses data processing formats of acquired raw data and develops programs for their conversion.
- The project cooperates with the project Research and Teaching Corpus (FOLK) in the Archive for Spoken German (Department of Pragmatics), in particular on legal and ethical questions concerning the collection, processing and provision of linguistic data.
- The project also advises the project Historical Text Corpus (department of lexics).
- Doris al-Wadi advises the project (after many years as a staff member) on issues concerning the IDS text model, especially the corpus text bibliography.