ANNIS

A web browser-based search and visualization architecture for complex multilayer linguistic corpora with diverse types of annotation.

Research Projects Employing ANNIS and PAULA

Cooperations within SFB 632 “Information Structure”:
- A5: “Focus realization, focus interpretation, and focus use from a cross-linguistic perspective”
- A6: “A constraint-based analysis of information structure in German, Spanish and French”
- A8: “Structuring linguistic information using discourse particles”
- B1: “The Interaction of Information Structure and Grammar in Gur and Kwa Languages” (data elicited with QUIS; project completed)
- B2: “Information Structuring in Chadic Languages” (data elicited with QUIS; project completed)
- B4: “The Role of Information Structure in the Development of Word Order Regularities in Germanic”
- B6: “Grammatical Reduction and Information Structural Preferences in a Contact Variety of German: Kiezdeutsch”
- B7: “Predicate-centered focus types: A sample-based typological study in African languages”
- C1: “Contextually Licensed Non-canonical Word Order in Language Comprehension” (completed)
- C6: “Experimental and Corpus Investigations of Information Structure in Hindi” (completed)
- D1: “Linguistic Database for Information Structure: Annotation and Retrieval”
- D2: “Typology of Information Structure” (completed)
Argument Structure in Texts - A comparative-typological joint project at the universities of Erfurt (Germany) and Pavia (Italy), working on a quantitative investigation of argument structure in Classical Greek and Yucatec Maya (ANNIS integration in progress)
Atomic - A versatile and platform-independent annotation tool with connection to ANNIS via SaltNPepper developed at the University of Zurich and Friedrich Schiller University Jena
BeMaTaC - The Berlin Map Task Corpus: A deeply annotated multimodal map-task corpus of spoken learner and native German
Modelling Textual Organisation: Coherence and Cohesion - project at CLCG (Center for Language and Cognition, Groningen, NL), hosting a multilayer annotated text corpus of Dutch in ANNIS
Coptic SCRIPTORIUM (Georgetown University/HU Berlin/University of the Pacific): A digital humanities project on resources for Sahidic Coptic texts
DDB - Deutsch Diachrone Baumbank - A comparable treebank of Old, Middle and Early New High German
DDD - Deutsch Diachron Digital - Referenzkorpus Altdeutsch - a reference corpus of historical German texts (8th-13th century)
Falko - Fehlerannotiertes Lernerkorpus des Deutschen als Fremdsprache / An error-annotated learner corpus of German as a foreign language, HU Berlin
Friedrich-Schiller-Universität Jena:
- Informationsstruktur in älteren indogermanischen Sprachen
- Frühneuzeitliche Fürstinnenkorrespondenz im mitteldeutschen Raum
Forschungsverbund Linguistik - Bioinformatik - Syntactic annotation of diachronic German language stages for the calculation of linguistic distance and phylogeny, HU Berlin (project completed)
GME - Gradable Modal Expressions - semantic analysis of gradable modal expressions (GMEs) such as ‘probable’, ‘permissible’ and ‘likelihood’
GUM - Georgetown University Multilayer Corpus - a collaborative multilayer annotation project as part of the Computational Linguistics curriculum at Georgetown University
Kobalt-DaF - Korpusbasierte Analyse von Lernertexten für Deutsch als Fremdsprache - a research network on German learner language
KOMeT - Korpuslinguistische Methoden für e-Philologie mit TEI - a junior researcher group in the Digital Humanities funded by the German Federal Ministry of Education and Research (BMBF)
KOMPOST - BMBF project on identification of competence indicators in school children’s writing
LAUDATIO - Long-term Access and Usage of Deeply Annotated Information - A project working on sustainable [repository](http://www.laudatio-repository.org/repository/) storage for historical corpora at Humboldt-Universität zu Berlin
MASC - The Manually Annotated SubCorpus of the Open American National Corpus (OANC)
MATAS - Morphologically Annotated Corpus of the Lithuanian Language at the Centre for Computational Linguistics (Vytautas Magnus University)
PROIEL (Oslo) - Pragmatic Resources in Old Indo-European Languages
Perseus Latin and Ancient Greek Treebank - Dependency Treebanks of Ancient Greek and Latin Classics
Ramsès - an Egyptological project at the Université de Liège producing corpora and annotation tools for Late Egyptian texts
Referenzkorpus des Frühneuhochdeutschen - a reference corpus of Early New High German from 1350 to 1650
RIDGES - Register in Diachronic German Science: A project on the development of German as a language of science in the 16th-19th centuries, funded by two Google Digital Humanities Research Awards
Roman de Flamenca - a multilayer parallel corpus of the 13th century Old Occitan narrative Le roman de Flamenca compiled at Indiana University
sms4science - a multilingual corpus of multilayer-annotated text messages (SMS)
SUMMaR - Text Summarization Systems for Robust and High Quality Summaries, Potsdam (project completed)
The Anselm Project - “Questions by Saint Anselm about the Lord’s Passion” - an interdisciplinary research project on a 14th-16th century German text at the Ruhr University Bochum
The Language Archive at the Max Planck Institute for Psycholinguistics (Nijmegen, NL), with discourse-annotated corpora of Dutch in ANNIS
University of Regensburg DFG project on Grammaticalization of Peripheral Subjects in Slavic Languages:
- RRuDi - Regensburg Russian Diachronic Corpus
- PolDi - Regensburg Polish Diachronic Corpus
What’s up, Switzerland? - a corpus project using WhatsApp chats