A highly extensible platform for converting and manipulating linguistic data across an open-ended set of formats. Pepper can be used stand-alone through a command-line interface or integrated as an API into other software products.

The knight Sir Pepper fighting against the format monster

If you need to convert corpora from one linguistic format into another, Pepper is your Swiss Army knife. When your annotation tool produces a data format that your analysis tool cannot read, Pepper comes to the rescue.

To decrease the number of conceptual mappings, Pepper uses an intermediate model, so each conversion consists of two mappings. First, the data in format X is mapped to the intermediate model Salt; second, the data is mapped from Salt to format Y. For a set of n source and target formats, this approach reduces the number of mappings from n²-n for a direct mapping approach to 2n.
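
The two formulas can be compared with a few lines of Java. This is only an illustrative sketch: the class name `MappingCount` and its method names are made up for this example and are not part of Pepper.

```java
// Illustrative sketch (hypothetical names, not part of Pepper):
// compares how many mappings n formats need with and without an intermediate model.
final class MappingCount {

    // Direct approach: every format is mapped to every other format, n * (n - 1) = n² - n.
    static long direct(int formats) {
        return (long) formats * (formats - 1);
    }

    // Intermediate-model approach: one mapping into Salt and one out of Salt per format, 2n.
    static long viaIntermediateModel(int formats) {
        return 2L * formats;
    }

    public static void main(String[] args) {
        for (int n : new int[] {3, 5, 10, 20}) {
            System.out.printf("n=%d direct=%d viaSalt=%d%n",
                    n, direct(n), viaIntermediateModel(n));
        }
    }
}
```

For 10 formats this gives 90 direct mappings versus 20 via Salt. Both approaches need 6 mappings for n = 3, and the intermediate model needs fewer for every n above 3.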

Mapping without the Pepper pipeline (left) and with the Pepper pipeline (right)

Pepper itself is a platform that controls the workflow of a conversion process. The actual mapping is done by a set of Pepper modules. Because the platform is highly pluggable, you can combine all existing modules and easily plug in new ones to support further formats.
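
The division of labour described above (a platform that controls the workflow, and modules that do the mapping via an intermediate model) can be sketched roughly as follows. This is not Pepper's actual API: every type and method name here (`IntermediateModel`, `Importer`, `Exporter`, `Registry`, `convert`) and the toy formats `spaces`, `lines` and `csv` are assumptions invented for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, NOT Pepper's real API: all names below are invented.
final class PivotConversionSketch {

    // Stand-in for the intermediate model (Salt in Pepper); here just a list of tokens.
    static final class IntermediateModel {
        final List<String> tokens = new ArrayList<>();
    }

    // A module that maps a source format into the intermediate model.
    interface Importer {
        IntermediateModel read(String source);
    }

    // A module that maps the intermediate model to a target format.
    interface Exporter {
        String write(IntermediateModel model);
    }

    // The "platform": it knows modules only by format name and runs the two-step workflow.
    static final class Registry {
        private final Map<String, Importer> importers = new HashMap<>();
        private final Map<String, Exporter> exporters = new HashMap<>();

        void registerImporter(String format, Importer importer) {
            importers.put(format, importer);
        }

        void registerExporter(String format, Exporter exporter) {
            exporters.put(format, exporter);
        }

        String convert(String from, String to, String source) {
            Importer importer = importers.get(from);
            Exporter exporter = exporters.get(to);
            if (importer == null || exporter == null) {
                throw new IllegalArgumentException("No module for " + from + " -> " + to);
            }
            // Step 1: source format to intermediate model; step 2: model to target format.
            return exporter.write(importer.read(source));
        }
    }

    // Registers one toy importer and two toy exporters.
    static Registry demoRegistry() {
        Registry registry = new Registry();
        registry.registerImporter("spaces", source -> {
            IntermediateModel model = new IntermediateModel();
            for (String token : source.trim().split("\\s+")) {
                model.tokens.add(token);
            }
            return model;
        });
        registry.registerExporter("lines", model -> String.join("\n", model.tokens));
        registry.registerExporter("csv", model -> String.join(",", model.tokens));
        return registry;
    }

    public static void main(String[] args) {
        System.out.println(demoRegistry().convert("spaces", "csv", "Sir Pepper fights"));
    }
}
```

In this sketch, supporting a new target format means registering one more exporter; the existing importer is reused unchanged, which is the point of routing every conversion through the intermediate model.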

Pepper architecture

Here you will find the current stable release, snapshots, and also older versions of Pepper, including a set of Pepper modules.

Pepper is system-independent and comes as a ready-to-run zip archive, so no installation is needed. However, since Pepper is Java-based, Java must be installed on your system. On most systems, Java is installed by default; if it is not, please download and install it. To check whether Java (or more precisely, a Java Runtime Environment) is available, open a command line and run:

java -version 

You need at least version 1.7.

If you use Pepper in your scientific work, please cite it as follows.

F. Zipser & L. Romary (2010). A model oriented approach to the mapping of annotation formats using standards. In: Proceedings of the Workshop on Language Resource and Language Technology Standards, LREC 2010. Malta.

User documentation

Users new to Pepper should begin by reading the User Guide.

Create your own Pepper module (Module Developer’s Guide)

If no existing module fulfills your needs, you are free to implement your own. With Pepper’s plug-in mechanism, your module can easily be integrated into the Pepper platform. Combining your new module with existing ones lets you create completely new workflows. Please read the Module Developer’s Guide for detailed documentation on how to implement a Pepper module.

Using Pepper as a library

The Pepper library provides programmatic access to the Pepper framework, including configuring a conversion workflow, starting a conversion, and retrieving information about the registered Pepper modules. Since Pepper is based on the plugin framework OSGi, each Pepper module is plugged into the framework separately, whether Pepper runs as a CLI or as a library.

To learn more about integrating Pepper as a library in other software products, go to Pepper as a library.

Pepper is published under the open-source Apache License, Version 2.0. We want to enable everyone to use the software without restrictions and to enable the community to take part in its development.

Found a bug or have a feature request?

Please let us know what you have found or which ideas for enhancements you have. Please open an issue on GitHub at Pepper or write us an e-mail.

Want to contribute to the project?

For both Salt and Pepper, we have published the source code on GitHub at Pepper. If you are interested in contributing to the project, please feel free to fork or clone it. We are happy about any suggestions, bug reports, bug fixes, and so on. It would be nice if you kept us informed about your ideas and enhancements; please write us an e-mail.

  • F. Zipser, A. Röhrig, A. Lüdeling, M. Klotz, T. Krause, S. Druskat & V. Voigt (2015).
    SaltNPepper, ANNIS & Atomic: Eine Infrastruktur für Mehrebenenkorpora.
    „Digital Humanities“-Tag 2015 – Digitale Ressourcen an der Philosophischen Fakultät II. Berlin, 21. Mai 2015. [poster].
  • F. Zipser, T. Krause, A. Lüdeling, A. Neumann, M. Stede & A. Zeldes (2015).
    ANNIS, SaltNPepper & PAULA: A multilayer corpus infrastructure.
    Final Conference of the SFB 632 Information Structure: Advances in Information Structure Research 2003 - 2015. Berlin, 08. - 09. Mai 2015. [poster].
  • F. Zipser, M. Klotz & A. Röhrig (2015).
    From TEI to linguistic corpora using Pepper.
    37. Jahrestagung der Deutschen Gesellschaft für Sprachwissenschaft. Leipzig, 04.- 06. März 2015. [poster].
  • F. Zipser (2014).
    SaltNPepper und das Formatpluriversum.
    LAUDATIO Workshop 2014. Berlin, 07.- 08.10.2014. [slides].
  • F. Zipser, M. Frank & J. Schmolling (2014).
    Merging data, the essence of creation of multi-layer corpora.
    36. Jahrestagung der Deutschen Gesellschaft für Sprachwissenschaft. Marburg, 05.- 07. März 2014. [poster].
  • C. Odebrecht & F. Zipser (2013).
    LAUDATIO - Eine Infrastruktur zur linguistischen Analyse historischer Korpora.
    DTA-/CLARIN-D Konferenz und -Workshops: Historische Textkorpora für die Geistes- und Sozialwissenschaften. Fragestellungen und Nutzungsperspektiven, Berlin 2013.
  • F. Zipser, A. Zeldes, J. Ritz, L. Romary & U. Leser (2011).
    Pepper: Handling a multiverse of formats.
    33. Jahrestagung der Deutschen Gesellschaft für Sprachwissenschaft. Göttingen, 23.- 25. Februar 2011. [poster].
  • F. Zipser & L. Romary (2010).
    A model oriented approach to the mapping of annotation formats using standards.
    In: Proceedings of the Workshop on Language Resource and Language Technology Standards, LREC 2010. Malta. URL:
  • F. Zipser (2009).
    Entwicklung eines Konverterframeworks für linguistisch annotierte Daten auf Basis eines gemeinsamen (Meta-)modells.
    Diplomarbeit, Humboldt-Universität zu Berlin, Institut für Informatik. URL:

  • Sonderforschungsbereich 632, Information structure: The linguistic means for structuring utterances, sentences and texts
  • Deutsche Forschungsgemeinschaft (DFG)
  • Humboldt-Universität zu Berlin, Department of corpus linguistics and morphology
  • INRIA