Pepper

A highly extensible platform for converting and manipulating linguistic data between an unbounded set of formats. Pepper can be used stand-alone through a command line interface, or be integrated as an API into other software products.

Many modules exist for Pepper. This page lists all the modules we know about. If you have written or know of further modules, please let us know by email at saltnpepper@lists.hu-berlin.de.

To install further modules into an existing Pepper instance, open the Pepper console and use the command ‘is’:

pepper>is PATH_TO_PLUGIN

To update a module, open the Pepper console and use the command ‘update’:

pepper>update GROUP_ID::ARTIFACT_ID::REPOSITORY
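
The argument of ‘update’ packs three values into one string separated by ‘::’. As a minimal sketch of how such a string breaks down (this is not Pepper's own code; the names `ModuleCoordinates` and `parse_update_argument` and the example values are hypothetical), the parts could be separated like this:

```python
from typing import NamedTuple


class ModuleCoordinates(NamedTuple):
    """Hypothetical container for the three parts of an 'update' argument."""
    group_id: str
    artifact_id: str
    repository: str


def parse_update_argument(argument: str) -> ModuleCoordinates:
    """Split 'GROUP_ID::ARTIFACT_ID::REPOSITORY' into its three parts."""
    # Split at most twice so that the repository part is kept whole.
    parts = [part.strip() for part in argument.split("::", 2)]
    if len(parts) != 3 or not all(parts):
        raise ValueError(
            f"expected GROUP_ID::ARTIFACT_ID::REPOSITORY, got {argument!r}"
        )
    return ModuleCoordinates(*parts)


# Made-up example values, for illustration only.
coordinates = parse_update_argument(
    "org.example::example-module::https://repo.example.org/maven2"
)
print(coordinates.group_id, coordinates.artifact_id, coordinates.repository)
```

The sketch only illustrates the shape of the argument; how Pepper itself resolves and downloads the module is not covered here.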

If no existing module fulfills your needs, you are free to implement your own. With Pepper’s plug-in mechanism, your module can easily be integrated into the Pepper platform. Combining your new module with existing ones lets you create completely new workflows. Please read the Module Developer’s Guide for detailed documentation on how to implement a Pepper module.

Importers

Importers are modules which map data from a format X to a Salt model. For more information about Pepper’s workflow, please see Pepper’s user guide.

| Module name | Module description | Format names and versions |
| --- | --- | --- |
| AldtImporter | The AldtImporter transforms data in the aldt format used in the Perseus project to a Salt model. | aldt, 1.0; aldt, 1.5 |
| CoNLLImporter | The CoNLLImporter transforms data in CoNLL format to a Salt model. | CoNLL, 1.0 |
| CoraXMLImporter | The CoraXMLImporter transforms data in CoraXML format to a Salt model. | coraXML, 1.0 |
| DoNothingImporter | This is a dummy importer which imports nothing. | |
| EXMARaLDAImporter | The EXMARaLDAImporter transforms data in the exb format of EXMARaLDA to a Salt model. | EXMARaLDA, 1.0 |
| ElanImporter | The ElanImporter transforms data in ELAN format to a Salt model. | elan, 4.5.0 |
| GateImporter | The GateImporter transforms data in GATE’s XML format to a Salt model. | GateDocument, 2.0; GateDocument, 3.0 |
| GenericXMLImporter | Imports data coming from any XML file. The textual content of an element is interpreted as a sequence of primary data. When processing the file, the importer concatenates all these texts into one primary text. | xml, 1.0 |
| GrAFImporter | The GrAFImporter transforms data in the GrAF format to a Salt model. | GrAF, 1.0 |
| MMAX2Importer | The MMAX2Importer maps files produced by the MMAX2 tool to a Salt model. | mmax2, 1.0 |
| PAULAImporter | The PAULAImporter imports data coming from the PAULA format to a Salt model. | paula, 1.0 |
| PTBImporter | The Penn Treebank importer transforms data in the Penn Treebank bracketing format (ptb) to a Salt model. | PTB, 1.0 |
| RSTImporter | This importer transforms data in the rs3 format produced by the RST Tool (see http://www.wagsoft.com/RSTTool/) to a Salt model. | rs3, 1.0 |
| SaltXMLImporter | This importer imports a Salt model from a SaltXML representation. SaltXML is the native format to persist Salt. | SaltXML, 1.0 |
| SpreadsheetImporter | This importer transforms data in the Excel format to a Salt model. | xls, 97-2008; xlsx, 2007+ |
| TCFImporter | This importer transforms data in the TCF format produced, for instance, by WebLicht (see http://weblicht.sfs.uni-tuebingen.de/) or WebAnno (see https://www.ukp.tu-darmstadt.de/software/webanno/) to a Salt model. | TCF, 0.4 |
| TEIImporter | This importer transforms data in TEI format (see http://www.tei-c.org/index.xml) to a Salt model. Please note that this module only supports a subset of the TEI P5 guidelines. | TEI, P5 2.6.0 |
| TextImporter | This importer imports a simple text document such as a .txt file. Other documents can also be imported as plain text. | txt, 0.0 |
| Tiger2Importer | This importer transforms data in the TigerXML and tiger2 formats to a Salt model. | tiger2, 2.0.5; tigerXML, 1.0 |
| TreetaggerImporter | This importer transforms data in the TreeTagger format produced by the TreeTagger tool (see http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/) to a Salt model. | treetagger, 1.0 |
| UAMImporter | This importer transforms data in the UAM format produced by the UAM corpus tool to a Salt model. | UAM, 1.0 |
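
The GenericXMLImporter entry describes concatenating the text of XML elements into one primary text. The following is a rough sketch of that idea only, not the module's implementation: the function name `extract_primary_text` is hypothetical, and trimming whitespace, joining texts with a single space, and ignoring tail text after child elements are simplifying assumptions.

```python
import xml.etree.ElementTree as ET


def extract_primary_text(xml_source: str):
    """Concatenate element texts and record each element's character span."""
    root = ET.fromstring(xml_source)
    pieces = []
    spans = []  # (tag, start, end) offsets into the primary text, end exclusive
    offset = 0
    for element in root.iter():
        # Assumption: only the text directly inside an element is used;
        # text that follows a child element (its "tail") is ignored.
        text = (element.text or "").strip()
        if not text:
            continue
        if pieces:
            # Assumption: element texts are separated by a single space.
            pieces.append(" ")
            offset += 1
        pieces.append(text)
        spans.append((element.tag, offset, offset + len(text)))
        offset += len(text)
    return "".join(pieces), spans


primary_text, spans = extract_primary_text("<s><w>Hello</w><w>world</w></s>")
print(primary_text)
print(spans)
```

Keeping the offsets alongside the concatenated text is one way to tie each element back to its stretch of the primary text.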

Manipulators

With manipulators, the imported data can be extended with further annotations, merged, or processed in other ways. For more information about the architecture of Pepper, please read Pepper’s user guide.

| Module name | Module description |
| --- | --- |
| DOTManipulator | This manipulator exports a Salt model to the dot syntax. This can be used to create a graphical representation of the Salt model. |
| FALKOManipulator | This manipulator was developed especially for the FALKO Corpus. It creates an SSpan object for every SToken object in the document. All annotations of the STokens are duplicated and added to the spans. The annotations of the tokens are renamed from ‘annoName’ to ‘annoName.’. For example, a ‘pos’ annotation of an SToken object is renamed to a ‘pos.’ annotation. All spans, tokens and spanning relations are added to an artificial layer named ‘falko’. |
| Lemmatizer | The lemmatizer adds lemmas to a document based on a list mapping words to lemmas, or word and pos tag combinations to lemmas. A built-in lemma list is included for English, but user-defined lemma lookup files can be used as well. |
| Merger | The Merger merges an unbounded number of corpora into a single corpus. |
| OrderRelationAdder | The OrderRelationAdder connects tokens or spans with each other via an order relation. This manipulator can be customized to connect spans having a specific annotation or to connect all tokens. |
| SaltValidator | The SaltValidator checks a Salt model and detects possible problems for subsequent modules. This can be very helpful when developing an importer or a manipulator, to check their output. End users can also use it to check whether a module produces processable output. |
| Sentencer | The sentencer is a Pepper module which bundles tokens into sentences. It creates a span object for each sentence and connects that span with the tokens belonging to the sentence. Sentence boundaries are identified by punctuation marks (‘.’, ‘!’ and ‘?’). The sentencer uses the abbreviation lists of Salt to identify abbreviations. |
| Timeline2Token | The Timeline2Token manipulator converts all primary text tokens to spans of a newly created artificial primary text which represents the timeline. |
| Tokenizer | The tokenizer tokenizes a document using the tokenizer provided by Salt. The tokenizer uses abbreviation lists and is modeled on the TreeTagger tokenizer by Helmut Schmid (see http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/). |
| TueBaDZManipulator | This manipulator was developed especially for the TueBaDZ Corpus. It creates an SSpan object for every SToken object in the document. All annotations of the STokens are duplicated and added to the spans. The annotations of the tokens are renamed from ‘annoName’ to ‘annoName.’. For example, a ‘pos’ annotation of an SToken object is renamed to a ‘pos.’ annotation. All spans, tokens and spanning relations are added to an artificial layer named ‘TueBaDZ’. |
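
The Sentencer entry describes grouping tokens into sentences at ‘.’, ‘!’ and ‘?’ while consulting abbreviation lists. A minimal sketch of that idea follows. It does not use Salt or Pepper: the function name `group_into_sentences` is hypothetical, the tiny abbreviation set is made up, and plain index ranges stand in for span objects.

```python
SENTENCE_FINAL = {".", "!", "?"}

# Tiny made-up abbreviation list; the real Sentencer uses Salt's lists.
ABBREVIATIONS = {"Dr.", "e.g.", "etc."}


def group_into_sentences(tokens):
    """Return one (start, end) token index range per sentence, end exclusive."""
    sentences = []
    start = 0
    for index, token in enumerate(tokens):
        # A token closes a sentence if it ends in sentence-final punctuation
        # and is not a known abbreviation such as "Dr.".
        ends_sentence = token[-1:] in SENTENCE_FINAL and token not in ABBREVIATIONS
        if ends_sentence:
            sentences.append((start, index + 1))
            start = index + 1
    if start < len(tokens):
        # Trailing tokens without final punctuation form a last sentence.
        sentences.append((start, len(tokens)))
    return sentences


tokens = ["Dr.", "Smith", "arrived", ".", "Was", "he", "late", "?"]
for start, end in group_into_sentences(tokens):
    print(tokens[start:end])
```

Each index range plays the role of the span that the Sentencer connects to the tokens of one sentence.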

Exporters

Exporters are modules which map data from a Salt model to a format Y. For more information about the architecture of Pepper please read Pepper’s user guide.

| Module name | Module description | Format names and versions |
| --- | --- | --- |
| ANNISExporter | This exporter transforms a Salt model into the annis format. | relANNIS, 3.3; annis, 3.3 |
| CoNLLExporter | The CoNLLExporter transforms a Salt model into the CoNLL tabular format. | CoNLL, 1.0 |
| DOTExporter | This exporter exports a Salt model to the dot syntax. This can be used to create a graphical representation of the Salt model. | dot, 1.0 |
| DoNothingExporter | This is a dummy exporter which exports nothing. This exporter can be used to check if a corpus is importable. | |
| EXMARaLDAExporter | The EXMARaLDAExporter transforms data to the exb format of EXMARaLDA. | EXMARaLDA, 1.0 |
| GraphAnnoExporter | This exporter transforms a Salt model into a format for the GraphAnno tool (https://github.com/LBierkandt/graph-anno). | Jason, 1.0 |
| MMAX2Exporter | The MMAX2Exporter maps a Salt model to the MMAX2 format. | mmax2, 1.0 |
| PAULAExporter | The PAULAExporter exports data coming from a Salt model to the PAULA format. | paula, 1.0 |
| PTBExporter | This exporter transforms a Salt model into the Penn Treebank bracketing format (ptb). | PTB, 1.0 |
| RelANNISExporter | Outdated: This module has been replaced by the ANNISExporter. This exporter transforms a Salt model into the relANNIS format. | relANNIS, 3.1; relANNIS, 3.2 |
| SaltInfoExporter | This module produces a corpus site for a corpus. A corpus site is a homepage for the corpus containing all annotation names, their values and the frequencies of annotations. The corpus site can be extended with further description and used as documentation. | xml, 1.0; html, 5.0 |
| SaltXMLExporter | This exporter exports a Salt model to a SaltXML representation. SaltXML is the native format to persist Salt. | SaltXML, 1.0 |
| TCFExporter | This exporter transforms a Salt model into the TCF format produced, for instance, by WebLicht (see http://weblicht.sfs.uni-tuebingen.de/) or WebAnno (see https://www.ukp.tu-darmstadt.de/software/webanno/). | TCF, 0.4 |
| TextExporter | This is a PepperExporter which extracts the primary text of a Salt model and stores it in a text file. | txt, 1.0 |
| TreetaggerExporter | This exporter transforms a Salt model into the TreeTagger format. | treetagger, 1.0 |
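
Taken together, the three module types form a chain: an importer maps format X to a model, manipulators change the model, and an exporter maps it to format Y. The toy sketch below illustrates only that chaining. It does not use Pepper's API; `ToyModel`, `run_workflow` and the three step functions are hypothetical stand-ins, and the model is far simpler than a real Salt model.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class ToyModel:
    """Hypothetical stand-in for a Salt model: a primary text plus tokens."""
    primary_text: str
    tokens: List[str] = field(default_factory=list)


def import_text(source: str) -> ToyModel:
    """Importer step: map plain text (format X) to the toy model."""
    return ToyModel(primary_text=source)


def tokenize(model: ToyModel) -> ToyModel:
    """Manipulator step: add whitespace-separated tokens to the model."""
    model.tokens = model.primary_text.split()
    return model


def export_one_token_per_line(model: ToyModel) -> str:
    """Exporter step: map the toy model to one token per line (format Y)."""
    return "\n".join(model.tokens)


def run_workflow(
    source: str,
    importer: Callable[[str], ToyModel],
    manipulators: List[Callable[[ToyModel], ToyModel]],
    exporter: Callable[[ToyModel], str],
) -> str:
    """Chain one importer, any number of manipulators, and one exporter."""
    model = importer(source)
    for manipulator in manipulators:
        model = manipulator(model)
    return exporter(model)


print(run_workflow("Pepper converts corpora", import_text, [tokenize],
                   export_one_token_per_line))
```

Because every step reads or writes the same intermediate model, any importer can in principle be paired with any exporter, which is what makes new combinations of modules possible.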