The European Reference Corpus EuReCo
A central idea of the open EuReCo initiative, founded in 2012, is to combine national, reference, and other corpora into comparable corpora purely virtually: all corpora remain at their hosting institutions, which avoids legal issues and lets them benefit automatically from local maintenance and curation. The corpora are joined virtually through a shared FOSS analysis platform, KorAP, which supports dynamically defined virtual (comparable) subcorpora, arbitrary annotation layers, arbitrary data sizes, and an extensible set of query languages.
Pilot projects
Corpora
- German Reference Corpus DeReKo
- Reference Corpus of Contemporary Romanian Language CoRoLa
- drukola.20180909.1b: a virtual DeReKo subcorpus comparable to CoRoLa in its composition by topic domain (and publication date)
- Hungarian National Corpus HNC
- 1M sample of the National Corpus of Polish (NKJP1M-SGJP)
Tools
- CoRoLaVecs – distributional semantic and syntactic analysis for Romanian based on CoRoLa and DeReKoVecs
Conference Presentations
- CMLC-5-Workshop at Corpus Linguistics Conference 2017
- UCCTS 2018
- IDS-Jahrestagung 2020
- IVG-Kongress 2021
- International Corpus Linguistics Conference 2023
- International Contrastive Linguistics Conference (ICLC-10) 2023
- EuReCo Workshop at the CLARIN Annual Conference 2023
- 17th Workshop on Building and Using Comparable Corpora (BUCC 2024)
Publications
- Kupietz, Marc/Bański, Piotr/Diewald, Nils/Trawiński, Beata/Witt, Andreas (forthcoming): EuReCo: Not Building and Yet Using Federated Comparable Corpora for Cross-Linguistic Research. In: Proceedings of the BUCC Workshop within LREC 2024. Torino, Italy.
- Bański, Piotr/Diewald, Nils/Kupietz, Marc/Trawiński, Beata (2023): Applying the newly extended European reference corpus EuReCo. Pilot studies of light-verb constructions in German, Romanian, Hungarian and Polish. In: Trawiński, Beata/Kupietz, Marc/Proost, Kristel/Zinken, Jörg (eds.): Book of Abstracts of the 10th International Contrastive Linguistics Conference (ICLC-10). Mannheim: IDS-Verlag, pp. 274–276. https://doi.org/10.14618/f8rt-m155.
- Trawiński, Beata/Kupietz, Marc (2021): Von monolingualen Korpora über Parallel- und Vergleichskorpora zum Europäischen Referenzkorpus EuReCo. In: Lobin, Henning/Witt, Andreas/Wöllstein, Angelika (eds.): Deutsch in Europa. Sprachpolitisch, grammatisch, methodisch. (= Jahrbuch des Instituts für Deutsche Sprache 2020). Berlin/Boston: de Gruyter, pp. 209–234.
- Kupietz, Marc/Diewald, Nils/Trawiński, Beata/Cosma, Ruxandra/Cristea, Dan/Tufiş, Dan/Váradi, Tamás/Wöllstein, Angelika (2020): Recent developments in the European Reference Corpus EuReCo. In: Granger, Sylviane/Lefer, Marie-Aude (eds.): Translating and Comparing Languages: Corpus-based Insights. (= Corpora and Language in Use, Proceedings 6). Louvain-la-Neuve: Presses universitaires de Louvain, pp. 257–273.
- Kupietz, Marc/Cosma, Ruxandra/Cristea, Dan/Diewald, Nils/Trawiński, Beata/Tufiş, Dan/Váradi, Tamás/Wöllstein, Angelika (2018): Recent developments in the European Reference Corpus (EuReCo). In: Granger, Sylviane/Lefer, Marie-Aude/Aguiar de Souza Penha Marion, Laura (eds.): Using Corpora in Contrastive and Translation Studies Conference (5th edition). Book of Abstract. Louvain-la-Neuve: CECL, pp. 101–103.
- Kupietz, Marc/Witt, Andreas/Bański, Piotr/Tufiş, Dan/Cristea, Dan/Váradi, Tamás (2017): EuReCo – Joining Forces for a European Reference Corpus as a sustainable base for cross-linguistic research. In: Bański, Piotr/Kupietz, Marc/Lüngen, Harald/Rayson, Paul/Biber, Hanno/Breiteneder, Evelyn/Clematide, Simon/Mariani, John/Stevenson, Mark/Sick, Theresa (eds.): Proceedings of the Workshop on Challenges in the Management of Large Corpora and Big Data and Natural Language Processing (CMLC-5+BigNLP) 2017 including the papers from the Web-as-Corpus (WAC-XI) guest section. Birmingham, 24 July 2017. Mannheim: Institut für Deutsche Sprache, pp. 15–19.