Methods of Corpus Analysis and Corpus Classification

Multidimensional Corpus Analyses

In this subproject we explore methods for examining whether a given linguistic phenomenon shows a conspicuous frequency distribution along some dimension, such as time, genre, topic or style, that could be relevant to a given linguistic question. By linguistic phenomena we mean all objects that occur in a given language sample and can in principle be quantified, ranging from single words and complex expressions to abstract syntactic structures or communicative events.

Results (Selection)

  • Visualisation of temporal progressions in a semantic context
  • Systematic Generation of Time Behaviour Charts for the Online Dictionary of Neologisms of the IDS project Neuer Wortschatz (see also Online Documentation for OWID)
  • based on the diachronic frequency distribution of a word, various formal measures quantify the confidence that it qualifies as a neologism candidate
  • various filters that screen out known groups of obvious non-neologisms (regionalisms, proper names, editorial abbreviations, ...)
  • an empirically constructed typology of the diachronic frequency distribution of verified neologisms
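
To illustrate what a formal measure over a diachronic frequency distribution could look like, here is a minimal sketch. It is not one of the measures used in the project: the function name late_onset_score, the split_year parameter and the example figures are assumptions made for illustration only. The score is the share of a word's relative-frequency mass that falls in or after a chosen year, so values close to 1 indicate a word that appears almost exclusively in the recent part of the corpus.

```python
from typing import Dict


def late_onset_score(counts: Dict[int, int],
                     totals: Dict[int, int],
                     split_year: int) -> float:
    """Share of a word's relative-frequency mass in or after split_year.

    counts: hypothetical yearly occurrence counts of the word
    totals: hypothetical yearly corpus sizes in tokens
    """
    early = 0.0
    late = 0.0
    for year, total in totals.items():
        if total <= 0:
            continue  # skip years without corpus data
        rel = counts.get(year, 0) / total  # relative frequency in that year
        if year >= split_year:
            late += rel
        else:
            early += rel
    mass = early + late
    return late / mass if mass > 0 else 0.0


# Invented example data: equal corpus size in every year.
totals = {2001: 1_000_000, 2002: 1_000_000, 2003: 1_000_000, 2004: 1_000_000}
new_word = {2003: 5, 2004: 20}                            # absent before 2003
old_word = {2001: 10, 2002: 10, 2003: 10, 2004: 10}       # evenly distributed

print(late_onset_score(new_word, totals, split_year=2003))  # 1.0
print(late_onset_score(old_word, totals, split_year=2003))  # 0.5
```

A single score of this kind cannot separate neologisms from, say, proper names that happen to surface late, which is why filters and a typology of distribution shapes as listed above are needed in addition.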

Relevant Research Aspects

  • typology of possible dimensions (linear order, hierarchical structure, unstructured)
  • universal and dimension-specific analysis methods
  • unidimensional and multidimensional analysis
  • use of epiphenomena / artefacts (base frequency effects, text length effects, saturation effects)
  • exploration and evaluation in specific linguistic application scenarios
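
One simple artefact of the kind mentioned above arises when the slices of a dimension differ in size: raw counts then partly reflect how much text a slice contains. The following sketch, which assumes hypothetical helper names (per_million, normalise_series) and invented figures, converts raw counts into occurrences per million tokens. It only illustrates the general idea and does not describe the project's own procedure.

```python
from typing import Dict


def per_million(count: int, corpus_tokens: int) -> float:
    """Occurrences per million tokens for one slice of the corpus."""
    if corpus_tokens <= 0:
        raise ValueError("corpus_tokens must be positive")
    return count * 1_000_000 / corpus_tokens


def normalise_series(counts: Dict[int, int],
                     totals: Dict[int, int]) -> Dict[int, float]:
    """Relative frequency per million tokens for every slice in totals."""
    return {year: per_million(counts.get(year, 0), totals[year])
            for year in sorted(totals)}


# Invented figures: the raw count doubles while the corpus quadruples.
counts = {1995: 50, 2005: 100}
totals = {1995: 10_000_000, 2005: 40_000_000}

print(normalise_series(counts, totals))  # {1995: 5.0, 2005: 2.5}
```

In this example the raw count rises from 50 to 100, yet the relative frequency halves, so an apparent increase turns out to be an effect of corpus growth.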

Current Focus

Current research concentrates on the dimension of time. In the course of this work, methods for the automatic detection of neologism candidates are being developed, i.e. of words whose diachronic frequency distribution is typical of neologisms. In collaboration with the in-house project Lexical Innovations, these methods are continuously evaluated and refined.

Publications (Selection)

Contact: Dr. Harald Lüngen <luengen@ids-...>