Methods of Corpus Analysis and Corpus Classification

Multidimensional Corpus Analyses

In this subproject we explore methods for examining whether a given linguistic phenomenon shows a conspicuous frequency distribution that could be relevant to a particular linguistic question. The dimensions along which such distributions are examined include, for example, time, genre, topic, and style. We understand linguistic phenomena as all objects that occur in a given language sample and can in principle be quantified, ranging from single words through complex expressions to abstract syntactic structures or communicative events.

Results (Selection)

Relevant Research Aspects

  • typology of possible dimensions (linear order, hierarchical structure, unstructured)
  • universal and dimension-specific analysis methods
  • unidimensional and multidimensional analysis
  • use of epiphenomena / artefacts (base-frequency effects, text-length effects, saturation effects)
  • exploration and evaluation in specific linguistic application scenarios
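
One of the artefacts listed above, the base-frequency effect, can be illustrated with a small sketch: raw counts of a phenomenon may rise simply because one slice of a dimension (a decade, a genre) contains more text than another. Normalising to occurrences per million tokens is one common correction. The function name per_million and all numbers below are invented for illustration and are not taken from the project.

```python
def per_million(hits, corpus_sizes):
    """Convert raw hit counts per slice into occurrences per million tokens.

    hits:         slice label -> raw number of occurrences (toy data)
    corpus_sizes: slice label -> number of tokens in that slice (toy data)
    """
    normalised = {}
    for label, size in corpus_sizes.items():
        if size <= 0:
            raise ValueError(f"slice {label!r} has no tokens")
        # Slices without any hit count as zero occurrences.
        normalised[label] = hits.get(label, 0) * 1_000_000 / size
    return normalised


# Invented example: the raw count quadruples, but so does the corpus size.
hits = {"1990s": 50, "2000s": 200}
sizes = {"1990s": 1_000_000, "2000s": 4_000_000}
normalised = per_million(hits, sizes)
print(normalised)
```

In this toy example both slices end up with the same normalised frequency, so the apparent increase in the raw counts is purely an effect of corpus size.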

Current Research Focus

Current research concentrates on the dimension of time. This work is producing methods for the automatic detection of neologism candidates, that is, words whose diachronic frequency distribution is typical of neologisms. In collaboration with the in-house project Lexical Innovations, these methods are continually being evaluated and developed further.
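
What a "diachronic frequency distribution typical for neologisms" might look like can be sketched with a deliberately simple heuristic: a word is flagged if it is absent in an early reference period and reaches a minimum average frequency afterwards. This is an assumed stand-in for illustration only, not the method developed in the project; the function name, the thresholds, and the frequency values (per million tokens) are all hypothetical.

```python
def is_neologism_candidate(freq_by_year, split_year, max_early=0.0, min_late=1.0):
    """Flag a word whose frequency profile looks like that of a new word.

    freq_by_year: year -> normalised frequency (hypothetical values)
    split_year:   first year of the later observation period
    max_early:    highest frequency tolerated before split_year (assumed threshold)
    min_late:     minimum mean frequency required from split_year on (assumed threshold)
    """
    early = [f for year, f in freq_by_year.items() if year < split_year]
    late = [f for year, f in freq_by_year.items() if year >= split_year]
    if not early or not late:
        return False  # no comparison possible without both periods
    return max(early) <= max_early and sum(late) / len(late) >= min_late


# Invented frequency profiles for two placeholder words.
series = {
    "new_word": {2008: 0.0, 2010: 0.0, 2012: 0.4, 2014: 3.1, 2016: 2.8},
    "old_word": {2008: 120.0, 2010: 118.5, 2012: 121.2, 2014: 119.0, 2016: 122.3},
}
candidates = [w for w, s in series.items() if is_neologism_candidate(s, 2012)]
print(candidates)
```

A heuristic of this kind only yields candidates; words flagged this way would still need to be checked manually, for instance to rule out artefacts such as changes in corpus composition.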

Publications (Selection)


Contact: Dr. Harald Lüngen <luengen@ids-...>  

                               
