A cross-platform multi-layer corpus annotation tool – and extensible platform – for the desktop.


Atomic does not put constraints on annotation layers, and provides tooling and infrastructure for multi-layer corpus annotation.


Atomic is implemented in Java and runs on all major operating systems (Linux, macOS, Windows).

Open source

Atomic is open source under the Apache License, Version 2.0; the code is published on GitHub.


Atomic is extensible via plugins, e.g., for new editors, data views, or NLP components.


Atomic includes Pepper, a conversion framework for a large number of linguistic formats.


Atomic’s data model Salt works independently of specific theories, annotation schemes, tagsets, etc. Bring your own tagset!

NOTE: Atomic » Hexatomic
Atomic development has come to an end. The resulting prototype platform is currently used to develop Hexatomic, fully featured multi-layer annotation software. For more information, go to the Hexatomic project website.

Atomic originates in the research project “Towards a corpus-based typology of clause linkage: An analytical framework and case studies on non-local dependencies” (LinkType) at the universities of Zurich and Jena.

Atomic is an architectural prototype for an extensible multi-layer annotation platform for linguistic data. It is being developed into Hexatomic, a stable, production-ready software product, in a research project funded by the DFG as part of the “Research Software Sustainability” programme, running from 2018 to 2021.

Atomic and Hexatomic operate on a concrete implementation of the generic graph-based meta model Salt, and embed its complementary conversion framework Pepper, which allows for n:m mapping between data formats: any supported input format can be converted to any supported output format via the common model. Hexatomic will also embed the ANNIS search engine for linguistic data as well as an interface to its query language AQL.
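
The n:m idea can be illustrated with a small sketch. It assumes a toy common model and two toy formats; every name in it (`ConversionHubSketch`, `Graph`, `convert`, the `tab` and `slash` formats) is hypothetical and chosen for illustration only. It is not the Salt or Pepper API. The point is that each format needs only one importer into the common model and one exporter out of it, and every importer can then be combined with every exporter.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class ConversionHubSketch {

    // Hypothetical stand-in for a common model: token nodes that carry
    // free-form key/value annotations. This is NOT the Salt API.
    static final class Graph {
        final List<Map<String, String>> nodes = new ArrayList<>();
    }

    // One importer and one exporter per format; both registries are illustrative.
    static final Map<String, Function<String, Graph>> IMPORTERS = new LinkedHashMap<>();
    static final Map<String, Function<Graph, String>> EXPORTERS = new LinkedHashMap<>();

    static {
        // Toy "tab" format: one token per line, written as token<TAB>tag.
        IMPORTERS.put("tab", text -> parse(text, "\n", "\t"));
        EXPORTERS.put("tab", g -> render(g, "\n", "\t"));
        // Toy "slash" format: space-separated token/tag pairs.
        IMPORTERS.put("slash", text -> parse(text, " ", "/"));
        EXPORTERS.put("slash", g -> render(g, " ", "/"));
    }

    // Reads items separated by itemSep, each split once on fieldSep.
    static Graph parse(String text, String itemSep, String fieldSep) {
        Graph g = new Graph();
        for (String item : text.split(Pattern.quote(itemSep))) {
            if (item.isEmpty()) {
                continue;
            }
            String[] fields = item.split(Pattern.quote(fieldSep), 2);
            Map<String, String> node = new LinkedHashMap<>();
            node.put("tok", fields[0]);
            node.put("pos", fields.length > 1 ? fields[1] : "");
            g.nodes.add(node);
        }
        return g;
    }

    // Writes every node as token + fieldSep + tag, joined by itemSep.
    static String render(Graph g, String itemSep, String fieldSep) {
        return g.nodes.stream()
                .map(n -> n.get("tok") + fieldSep + n.get("pos"))
                .collect(Collectors.joining(itemSep));
    }

    // Any registered input format to any registered output format,
    // always passing through the common model.
    static String convert(String from, String to, String text) {
        Graph g = IMPORTERS.get(from).apply(text);
        return EXPORTERS.get(to).apply(g);
    }

    public static void main(String[] args) {
        System.out.println(convert("slash", "tab", "the/DET dog/NOUN"));
        System.out.println(convert("tab", "slash", "the\tDET\ndog\tNOUN"));
    }
}
```

In this sketch, adding a third format means writing one more importer and one more exporter, rather than a dedicated converter for every pair of formats.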

If you want to refer to the concepts Atomic implements, please cite it as follows:

Druskat, Stephan, Lennart Bierkandt, Volker Gast, Christoph Rzymski & Florian Zipser. 2014. Atomic: an open-source software platform for multi-layer corpus annotation. In Josef Ruppenhofer & Gertrud Faaß (eds.), Proceedings of the 12th Konferenz zur Verarbeitung natürlicher Sprache (KONVENS 2014), Hildesheim, October 2014, 228–234. ISBN 978-3-934105-46-1.

Atomic is an architectural prototype.

Past experimental releases have been removed.

Development of stable annotation software is under way in the Hexatomic research project.



For Atomic-related questions, please write a message to the atomic-user mailing list:

atomic-user @ listserv · uni-jena · de


For questions related to the LinkType research project, please consult the project website.

Keep up to date

Subscribe to the Atomic users mailing list: atomic-user.

Subscribe to the Atomic developers mailing list: atomic-dev.


The following information (Impressum) is required under German law.

Responsible for the content of this site:
Volker Gast
Friedrich Schiller University Jena
Institut für Anglistik und Amerikanistik
Ernst-Abbe-Platz 8
07743 Jena
Tel: +49 (0)3641 944500
atomic @ corpus-tools · org

Universität Jena     Deutsche Forschungsgemeinschaft (DFG)     Humboldt-Universität zu Berlin, Department of corpus linguistics and morphology