Multidimensional Corpus Analyses


Methods of Corpus Analysis and Corpus Classification


In this subproject we explore methods for examining whether a given linguistic phenomenon shows a conspicuous frequency distribution along some dimension that could be relevant to a given linguistic question. Examples of such dimensions are time, genre, topic, and style. By linguistic phenomena we mean all objects that occur in a given language sample and can in principle be quantified, ranging from single words through complex expressions to abstract syntactic structures or communicative events.
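As a rough illustration of what a "conspicuous" distribution along one dimension can mean, the following minimal sketch compares the observed number of hits per category with the number expected if the phenomenon were spread evenly over all text. The function name, the category labels, and the token counts are hypothetical and are not taken from the project.

```python
from collections import Counter


def observed_expected_ratio(hits, corpus_sizes):
    """Return observed/expected hit ratios per category.

    hits: iterable with one category label per occurrence of the phenomenon.
    corpus_sizes: dict mapping each category to its size in tokens.
    A ratio near 1.0 means the category behaves as expected from its size;
    values far from 1.0 indicate over- or under-representation.
    """
    counts = Counter(hits)
    # Only hits in categories with a known size enter the calculation.
    total_hits = sum(counts[cat] for cat in corpus_sizes)
    total_tokens = sum(corpus_sizes.values())
    ratios = {}
    for cat, size in corpus_sizes.items():
        expected = total_hits * size / total_tokens
        ratios[cat] = counts[cat] / expected if expected else float("nan")
    return ratios


# Hypothetical example: 40 hits spread over two equally large genres.
example_hits = ["news"] * 30 + ["fiction"] * 10
example_sizes = {"news": 2_000_000, "fiction": 2_000_000}
example_ratios = observed_expected_ratio(example_hits, example_sizes)
```

The same calculation applies to any dimension whose values can be treated as categories, for example years, topics, or text types.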

Results (Selection)

Visualisation of temporal progressions in a semantic context
Systematic generation of time-course charts for the online dictionary of neologisms of the IDS project Neuer Wortschatz (see also the online documentation for OWID)
based on the diachronic frequency distribution of a word, various formal measures quantify the confidence that it qualifies as a neologism candidate
various filters that identify known groups of obvious non-neologisms (regionalisms, proper names, editorial abbreviations, ...)
an empirically constructed typology of the diachronic frequency distribution of verified neologisms
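The general idea of scoring a word by its diachronic frequency distribution and then filtering out obvious non-neologisms can be sketched as follows. This is not the project's actual measure: the "recency share" score, the threshold, the filter set, and all example words and figures are assumptions chosen purely for illustration.

```python
def recency_share(freqs_by_year, recent_years=3):
    """Share of a word's summed relative frequency that falls into the most
    recent `recent_years` time slices (hypothetical measure, range 0.0 to 1.0)."""
    years = sorted(freqs_by_year)
    total = sum(freqs_by_year.values())
    if total == 0:
        return 0.0
    recent = sum(freqs_by_year[y] for y in years[-recent_years:])
    return recent / total


def neologism_candidates(series, known_non_neologisms=frozenset(),
                         recent_years=3, threshold=0.8):
    """Return words whose recency share reaches the threshold and that are
    not in the (hypothetical) filter set of known non-neologisms."""
    return sorted(
        word for word, freqs in series.items()
        if word not in known_non_neologisms
        and recency_share(freqs, recent_years) >= threshold
    )


# Hypothetical per-million frequencies per year.
example_series = {
    "selfie": {2009: 0.0, 2010: 0.0, 2011: 0.1, 2012: 1.0, 2013: 5.0, 2014: 6.0},
    "Haus": {2009: 100.0, 2010: 100.0, 2011: 100.0, 2012: 100.0, 2013: 100.0, 2014: 100.0},
    "Xberg": {2009: 0.0, 2010: 0.0, 2011: 0.0, 2012: 2.0, 2013: 2.0, 2014: 2.0},
}
example_candidates = neologism_candidates(
    example_series, known_non_neologisms={"Xberg"}  # e.g. a proper name
)
```

In this toy example the proper name has a neologism-like frequency profile and is excluded only by the filter, which is why a frequency-based score and such filters complement each other.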

Relevant Research Aspects

typology of possible dimensions (linear order, hierarchical structure, unstructured)
universal and dimension-specific analysis methods
unidimensional and multidimensional analysis
use of epiphenomena / artefacts (base frequency effects, text-length effects, saturation effects)
exploration and evaluation in specific linguistic application scenarios
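One simple artefact of this kind arises when time slices differ in size: raw counts may rise only because more text is available for later years. The sketch below assumes this reading of the effect and normalises raw counts to occurrences per million tokens. The function name and all figures are hypothetical.

```python
def per_million(raw_counts, slice_sizes):
    """Convert raw counts per time slice into occurrences per million tokens.

    raw_counts: dict mapping a slice (e.g. a year) to the raw hit count.
    slice_sizes: dict mapping each slice to its size in tokens.
    Slices without any hits are reported with a frequency of 0.0.
    """
    return {
        s: raw_counts.get(s, 0) * 1_000_000 / size
        for s, size in slice_sizes.items()
    }


# Hypothetical figures: the raw count quadruples, but so does the corpus.
example_raw = {2000: 10, 2010: 40}
example_sizes_by_year = {2000: 5_000_000, 2010: 20_000_000}
example_normalised = per_million(example_raw, example_sizes_by_year)
```

Here the apparent fourfold increase disappears after normalisation, since both slices yield the same relative frequency.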

Current Main Subjects

Current research concentrates on the time dimension. This work is yielding methods for the automatic detection of neologism candidates, that is, words whose diachronic frequency distribution is typical of neologisms. In collaboration with the in-house project Lexical Innovations, these methods are continuously evaluated and developed further.

Publications (Selection)

Fankhauser, Peter / Kupietz, Marc (2017): Visualizing Language Change in a Corpus of Contemporary German (Corpus Linguistics Conference, Birmingham).
Keibel, Holger (2009): Mathematische Häufigkeitsmaße in der Korpuslinguistik: Eigenschaften und Verwendung. (Erw. und überarb. 2. Aufl.). Mannheim: Institut für Deutsche Sprache.
Keibel, Holger / Hennig, Sophie / Perkuhn, Rainer (2011): Effiziente halbautomatische Detektion von Neologismuskandidaten. Technical Report IDS-KL-2010-01. Mannheim: Institut für Deutsche Sprache.


Contact: Dr. Harald Lüngen <luengen@ids-...>
