New paper on the 'Construction and Dissemination' of the FOLK corpus

by Thomas Schmidt

Abstract: This paper describes the workflow for the construction and dissemination of FOLK (Forschungs- und Lehrkorpus Gesprochenes Deutsch – Research and Teaching Corpus of Spoken German), a large corpus of authentic spoken interaction data recorded on audio and video. Section 2 describes in detail the tools used in the individual steps of transcription, anonymization, orthographic normalization, lemmatization and POS tagging, as well as some utilities used for corpus management. Section 3 deals with the DGD (Datenbank für Gesprochenes Deutsch – Database of Spoken German) as a tool for distributing completed data sets and making them available for qualitative and quantitative analysis. Section 4 sketches some plans for further development.

  • Schmidt, Thomas (2017): Construction and Dissemination of a Corpus of Spoken Interaction – Tools and Workflows in the FOLK project. In: Kupietz, Marc & Geyken, Alexander (eds.): Corpus Linguistic Software Tools. Journal for Language Technology and Computational Linguistics (JLCL) 31(1), pp. 127-154.