A cross-platform multi-layer corpus annotation tool – and extensible platform – for the desktop.


Atomic does not put constraints on annotation layers, and provides tooling and infrastructure for multi-layer corpus annotation.


Atomic is implemented in Java and runs on all major operating systems (Linux, Mac OS, Windows, etc.).

Open source

Atomic is open source under the Apache License, Version 2.0, and the code is published on GitHub.


Atomic is extensible via plugins, e.g., for new editors, data views, or NLP components.


Atomic includes Pepper, a conversion framework for a large number of linguistic formats.


Atomic’s data model, Salt, is independent of any particular theory, annotation scheme, tagset, etc. Bring your own tagset!
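To illustrate what tagset independence can look like in practice, here is a minimal sketch in Java. It is not Salt's actual API: all class and method names (`AnnotationGraph`, `Node`, `Edge`, `annotate`, `values`, `TagsetDemo`) are hypothetical, and the model is reduced to nodes, typed edges and free-form annotations. Tag values are plain strings grouped under a layer name, so the model itself never needs to know a particular tagset.

```java
import java.util.ArrayList;
import java.util.List;

/** A node in the annotation graph, e.g. a token or a span over tokens. */
final class Node {
    final String id;
    // Each annotation is a {layer, key, value} triple of plain strings.
    private final List<String[]> annotations = new ArrayList<>();

    Node(String id) {
        this.id = id;
    }

    /** Adds an annotation; neither layer nor value is checked against a tagset. */
    Node annotate(String layer, String key, String value) {
        annotations.add(new String[] {layer, key, value});
        return this;
    }

    /** Returns all values stored under the given key on the given layer. */
    List<String> values(String layer, String key) {
        List<String> result = new ArrayList<>();
        for (String[] a : annotations) {
            if (a[0].equals(layer) && a[1].equals(key)) {
                result.add(a[2]);
            }
        }
        return result;
    }
}

/** A typed, directed edge between two nodes. */
final class Edge {
    final Node source;
    final Node target;
    final String type;

    Edge(Node source, Node target, String type) {
        this.source = source;
        this.target = target;
        this.type = type;
    }
}

/** A bare-bones annotation graph: a list of nodes and a list of edges. */
final class AnnotationGraph {
    final List<Node> nodes = new ArrayList<>();
    final List<Edge> edges = new ArrayList<>();

    Node addNode(String id) {
        Node node = new Node(id);
        nodes.add(node);
        return node;
    }

    Edge connect(Node source, Node target, String type) {
        Edge edge = new Edge(source, target, type);
        edges.add(edge);
        return edge;
    }
}

final class TagsetDemo {
    /** Builds one token carrying two competing part-of-speech tags on separate layers. */
    static AnnotationGraph build() {
        AnnotationGraph graph = new AnnotationGraph();
        Node token = graph.addNode("tok1");
        token.annotate("morphology", "pos", "NN");   // a tag from an established tagset
        token.annotate("my-project", "pos", "noun"); // a home-grown tagset, same key
        Node span = graph.addNode("span1");
        span.annotate("syntax", "cat", "NP");
        graph.connect(span, token, "covers");
        return graph;
    }

    public static void main(String[] args) {
        Node token = build().nodes.get(0);
        System.out.println(token.values("morphology", "pos"));
        System.out.println(token.values("my-project", "pos"));
    }
}
```

Because layers are only names, two analyses of the same token can coexist without either being privileged by the model.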

Atomic originates in the research project “Towards a corpus-based typology of clause linkage - An analytical framework and case studies on non-local dependencies” (LinkType) at the universities of Zurich and Jena.

Atomic is an architectural prototype for an extensible multi-layer annotation platform for linguistic data. It will be developed into Hexatomic, a stable, production-ready software product, in a research project funded by the DFG as part of the “Research Software Sustainability” programme, starting in autumn 2018.

Atomic and Hexatomic work on a concrete implementation of the generic graph-based meta model Salt, and embed its complementary conversion framework Pepper, allowing for n : m mapping between data formats. Hexatomic will also embed the ANNIS search engine for linguistic data as well as an interface to its query language AQL.
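The idea behind n : m mapping through a common model can be sketched as follows. This is a toy illustration in Java, not Pepper's actual API: the names `Importer`, `Exporter`, `Converter` and `TaggedToken`, and the two formats "slash" and "tsv", are assumptions made for the example, and a flat list of tagged tokens stands in for a full graph model. Each format needs only one importer and one exporter, yet any imported format can be converted into any exported one.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

/** Toy common model: a token with a single tag. */
final class TaggedToken {
    final String text;
    final String tag;

    TaggedToken(String text, String tag) {
        this.text = text;
        this.tag = tag;
    }
}

/** Maps one source format into the common model. */
interface Importer {
    List<TaggedToken> read(String input);
}

/** Maps the common model into one target format. */
interface Exporter {
    String write(List<TaggedToken> tokens);
}

final class Converter {
    private final Map<String, Importer> importers = new LinkedHashMap<>();
    private final Map<String, Exporter> exporters = new LinkedHashMap<>();

    void addImporter(String format, Importer importer) {
        importers.put(format, importer);
    }

    void addExporter(String format, Exporter exporter) {
        exporters.put(format, exporter);
    }

    /** Converts by importing into the common model, then exporting from it. */
    String convert(String from, String to, String input) {
        Importer in = importers.get(from);
        Exporter out = exporters.get(to);
        if (in == null || out == null) {
            throw new IllegalArgumentException("Unsupported conversion: " + from + " -> " + to);
        }
        return out.write(in.read(input));
    }

    /** Registers two toy formats: "the/DT dog/NN" and one "token<TAB>tag" pair per line. */
    static Converter withToyFormats() {
        Converter converter = new Converter();
        converter.addImporter("slash", input -> parse(input, " ", "/"));
        converter.addImporter("tsv", input -> parse(input, "\n", "\t"));
        converter.addExporter("slash", tokens -> render(tokens, " ", "/"));
        converter.addExporter("tsv", tokens -> render(tokens, "\n", "\t"));
        return converter;
    }

    private static List<TaggedToken> parse(String input, String tokenSep, String tagSep) {
        List<TaggedToken> tokens = new ArrayList<>();
        for (String item : input.split(Pattern.quote(tokenSep))) {
            if (item.isEmpty()) {
                continue; // tolerate repeated separators
            }
            int i = item.lastIndexOf(tagSep);
            if (i < 0) {
                throw new IllegalArgumentException("Missing tag in: " + item);
            }
            tokens.add(new TaggedToken(item.substring(0, i), item.substring(i + tagSep.length())));
        }
        return tokens;
    }

    private static String render(List<TaggedToken> tokens, String tokenSep, String tagSep) {
        StringBuilder sb = new StringBuilder();
        for (TaggedToken t : tokens) {
            if (sb.length() > 0) {
                sb.append(tokenSep);
            }
            sb.append(t.text).append(tagSep).append(t.tag);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(withToyFormats().convert("slash", "tsv", "the/DT dog/NN"));
    }
}
```

With n import formats and m export formats, this design needs n + m modules rather than n × m direct converters, and adding a format never requires touching the existing ones.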

If you want to refer to the concepts Atomic implements, please cite it as follows.

Druskat, Stephan, Lennart Bierkandt, Volker Gast, Christoph Rzymski & Florian Zipser. 2014. Atomic: an open-source software platform for multi-layer corpus annotation. In Josef Ruppert and Gertrud Faaß (eds.): Proceedings of the 12th Konferenz zur Verarbeitung natürlicher Sprache (KONVENS 2014), Hildesheim, October 2014. 228–234. ISBN 978-3-934105-46-1.

Atomic is an architectural prototype. It will be released for reference before the start of the Hexatomic project in autumn 2018.

Past experimental releases have been removed.

To be notified of new releases, please subscribe to the Atomic users mailing list: atomic-user.

Atomic will be released under the open-source Apache License, Version 2.0, so that you can use, modify and redistribute the software under the terms of that license, as well as contribute yourself.

Suggest a feature?

If there is a feature, or new or improved functionality, that you would like to see in a future release of Atomic, please create a new issue on the Atomic GitHub site.

Alternatively, you can write us an email at atomic-feature-requests @ corpus-tools · org.

Found a bug?

If you have found something that doesn’t work, or doesn’t work as expected, please create a new issue on the Atomic GitHub site.

Alternatively, you can write us an email at atomic-bugs @ corpus-tools · org.


Do you want to contribute actively to the development of Atomic? Write your own plugin? Add an editor? The Atomic source code is published on GitHub.

Feel free to fork or clone the sources and get started!

It would be great if you kept us updated on your ideas and enhancements! Please write an email to atomic @ corpus-tools · org.



For Atomic-related questions, please write a message to the atomic-user mailing list:

atomic-user @ listserv · uni-jena · de


For questions related to the LinkType research project, please consult the project website.

Keep up to date

Subscribe to the Atomic users mailing list: atomic-user.

Subscribe to the Atomic developers mailing list: atomic-dev.


The following information (Impressum) is required under German law.

Responsible for the content of this site:
Volker Gast
Friedrich Schiller University Jena
Institut für Anglistik und Amerikanistik
Ernst-Abbe-Platz 8
07743 Jena
Tel: +49 (0)3641 944500
atomic @ corpus-tools · org

Universität Jena     Deutsche Forschungsgemeinschaft (DFG)     Humboldt-Universität zu Berlin, Department of corpus linguistics and morphology