A cross-platform multi-layer corpus annotation tool – and extensible platform – for the desktop.


Atomic does not put constraints on annotation layers, and provides tooling and infrastructure for multi-layer corpus annotation.


Atomic is implemented in Java and runs on all major operating systems (Linux, Mac OS, Windows, etc.).

open source

Atomic is open source under the Apache License, Version 2.0, and the code is published on GitHub.


Atomic is extensible via plugins, e.g., for new editors, data views, or NLP components.


Atomic includes Pepper, a conversion framework for a large number of linguistic formats.


Atomic’s data model, Salt, is independent of any particular linguistic theory, annotation scheme, or tagset. Bring your own tagset!

Atomic was originally developed in the context of “Towards a corpus-based typology of clause linkage - An analytical framework and case studies on non-local dependencies” (LinkType), a DFG-funded linguistic research project carried out jointly by the universities of Zurich and Jena. The project aims to model linguistic variation in complex sentences statistically. It is based on the investigation of richly annotated corpora from genealogically and geographically diverse languages.

Atomic is open source software for multi-layer, deep linguistic annotation of text corpora. It is based on the Eclipse RCP, a modular rich client platform implemented in Java, and is therefore extensible via plugins.

Atomic works on a concrete implementation of the generic graph-based meta-model Salt and embeds Salt’s complementary conversion framework, Pepper, which allows n:m mappings between data formats: any supported input format can be converted to any supported output format via the common model.
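To illustrate why routing conversions through one common model gives n:m mappings, here is a minimal sketch in Java. It does not use the actual Salt or Pepper APIs. The class ConversionSketch, the Graph and Node types, the "slash" and "tsv" format names, and the "pos" annotation name are all hypothetical and exist only for this example. The point is that n importers plus m exporters cover n × m conversions, and that annotations are free-form name/value pairs rather than a fixed tagset.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

public class ConversionSketch {

    /** A node in the intermediate graph; annotations are free-form name/value pairs. */
    static final class Node {
        final String text;
        final Map<String, String> annotations = new LinkedHashMap<>();
        Node(String text) { this.text = text; }
    }

    /** Hypothetical stand-in for a graph-based document model (not Salt's API). */
    static final class Graph {
        final List<Node> nodes = new ArrayList<>();
    }

    // Every importer produces the common model; every exporter consumes it.
    static final Map<String, Function<String, Graph>> IMPORTERS = new LinkedHashMap<>();
    static final Map<String, Function<Graph, String>> EXPORTERS = new LinkedHashMap<>();

    static {
        // Hypothetical format "slash": "token/TAG token/TAG ..."
        IMPORTERS.put("slash", input -> parse(input, " ", "/"));
        EXPORTERS.put("slash", graph -> render(graph, " ", "/"));
        // Hypothetical format "tsv": one "token<TAB>TAG" pair per line
        IMPORTERS.put("tsv", input -> parse(input, "\n", "\t"));
        EXPORTERS.put("tsv", graph -> render(graph, "\n", "\t"));
    }

    /** Reads items separated by itemSep; each item is "text" or "text<fieldSep>tag". */
    static Graph parse(String input, String itemSep, String fieldSep) {
        Graph graph = new Graph();
        for (String item : input.split(itemSep)) {
            if (item.isEmpty()) continue;
            String[] fields = item.split(fieldSep, 2);
            Node node = new Node(fields[0]);
            if (fields.length > 1) node.annotations.put("pos", fields[1]);
            graph.nodes.add(node);
        }
        return graph;
    }

    /** Writes each node as "text" or "text<fieldSep>tag", joined by itemSep. */
    static String render(Graph graph, String itemSep, String fieldSep) {
        List<String> items = new ArrayList<>();
        for (Node node : graph.nodes) {
            String pos = node.annotations.get("pos");
            items.add(pos == null ? node.text : node.text + fieldSep + pos);
        }
        return String.join(itemSep, items);
    }

    /** Converts between any registered pair of formats via the common model. */
    static String convert(String from, String to, String input) {
        Graph graph = IMPORTERS.get(from).apply(input);
        return EXPORTERS.get(to).apply(graph);
    }

    public static void main(String[] args) {
        System.out.println(convert("slash", "tsv", "the/DET dog/NOUN barks/VERB"));
    }
}
```

In this sketch, supporting a further format means registering one importer and one exporter; no existing converter has to change.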

Development status

In the context of the LinkType project, Atomic has been developed as an architectural prototype, i.e., as a test bed for an extensible, sustainable annotation platform. Hence, all current functionality must be considered experimental, that is, unstable and subject to change.

Currently, Atomic is being developed towards an initial stable release.

To track progress, have a look at the development branch on GitHub. We aim for a transparent development process via issues and the Features project on the GitHub site. Feel free to file an issue if you want to contribute.

If you use Atomic in your scientific work, please cite it as follows.

Druskat, Stephan, Lennart Bierkandt, Volker Gast, Christoph Rzymski & Florian Zipser. 2014. Atomic: an open-source software platform for multi-layer corpus annotation. In Josef Ruppenhofer and Gertrud Faaß (eds.): Proceedings of the 12th Konferenz zur Verarbeitung natürlicher Sprache (KONVENS 2014), Hildesheim, October 2014. 228–234. ISBN 978-3-934105-46-1.

Atomic is currently being developed towards an initial stable release.

Past experimental releases have been removed.

To be notified of new releases, please subscribe to the Atomic users mailing list: atomic-user.

Atomic is released under the open source Apache License, Version 2.0, so you can freely use, modify, and redistribute the software under the terms of the license, as well as contribute yourself.

Suggest a feature?

If you would like to see a new feature or improved functionality in a future release of Atomic, please create a new issue over at the Atomic GitHub site.

Alternatively, you can write us an email at atomic-feature-requests @ corpus-tools · org.

Found a bug?

If you have found something that doesn’t work, or doesn’t work as expected, please create a new issue over at the Atomic GitHub site.

Alternatively, you can write us an email at atomic-bugs @ corpus-tools · org.


Do you want to contribute actively to the development of Atomic? Write your own plugin? Add an editor? The Atomic source code is published on GitHub:

Feel free to fork or clone the sources and get started!

It would be great if you kept us updated on your ideas and enhancements! Please write an email to atomic @ corpus-tools · org.



For Atomic-related questions, please write a message to the atomic-user mailing list:

atomic-user @ listserv · uni-jena · de


For questions related to the LinkType research project, please consult the project website:

Keep up to date

Subscribe to the Atomic users mailing list: atomic-user.

Subscribe to the Atomic developers mailing list: atomic-dev.


The following information (Impressum) is required under German law.

Responsible for the content of this site:
Volker Gast
Friedrich Schiller University Jena
Institut für Anglistik und Amerikanistik
Ernst-Abbe-Platz 8
07743 Jena
Tel: +49 (0)3641 944500
atomic @ corpus-tools · org

Universität Jena     Deutsche Forschungsgemeinschaft (DFG)     Humboldt-Universität zu Berlin, Department of corpus linguistics and morphology