Teneo Developers

Custom Input Processor configuration

The Input Processors and the Simplifier are configurable from within the Platform; this allows users to customize the Input Processors (IPs) of the Teneo Engine and their configuration for any solution. The custom IP configuration is provided to a solution as a file resource with a fixed name and content structure.

This page describes the process of creating a custom Input Processor configuration and how to apply it to the solution. The end of the page covers how to handle Platform upgrades when using custom IP configurations and includes a short troubleshooting section.

Detailed information on the Input Processor configuration properties can be found on the Input Processor API page.

Create custom IP configuration

Creating a custom IP configuration involves the following steps, each described in more detail in the sections below:

  1. Exporting the language config as a .zip from Teneo Studio Desktop
  2. Unzipping and modifying the IP configuration
  3. Rezipping the files under the name custominputprocessorsetup.zip
  4. Adding custominputprocessorsetup.zip to the solution as a file resource with the folder path /
  5. Reloading Tryout to apply the changes to the solution
  6. Republishing (full publish) the solution to apply the changes to the published solution/AI application
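
Step 3 can be scripted. The following is a minimal sketch, not part of the Teneo tooling: the function name `repack_ip_config` and its arguments are hypothetical, and it assumes that the folder passed in is the one the export was unzipped into, so that the `<version>` folder ends up at the root of the new archive.

```python
import zipfile
from pathlib import Path

# Fixed archive name required for the file resource.
ARCHIVE_NAME = "custominputprocessorsetup.zip"


def repack_ip_config(config_dir: Path, output_dir: Path) -> Path:
    """Zip the contents of config_dir into output_dir/custominputprocessorsetup.zip.

    Assumption: config_dir is the folder the exported zip was extracted into,
    so its children become the top-level entries of the new archive.
    """
    target = output_dir / ARCHIVE_NAME
    with zipfile.ZipFile(target, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(config_dir.rglob("*")):
            # Skip folders, and skip the archive itself if it lives inside config_dir.
            if path.is_file() and path.resolve() != target.resolve():
                # Store entries relative to config_dir, using forward slashes.
                archive.write(path, path.relative_to(config_dir).as_posix())
    return target
```

The resulting file still has to be added to the solution as a file resource with the folder path / as described in step 4.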

Export language config

This step can currently only be performed in Teneo Studio Desktop and only if the user has the required export permissions.

To export the current Input Processor configuration, follow the below steps:

  • Go to Import/Export in the backstage of Teneo Studio
  • Click Export config

  • Browse to the folder where to store the export and click Save
  • Teneo Studio will start exporting the language configuration zip to the selected local folder; when done, click OK.

Export language config packs the entire Input Processor configuration and .jar files, according to the current solution language, and exports it as a ZIP file that can be stored locally.

Add custom IP config files as a Resource File

To add the custom Input Processor configuration to the solution again, follow the below steps:

  • Create a .zip containing the configuration files; the .zip must be named custominputprocessorsetup.zip
  • Follow the steps outlined here to add the file to Teneo Studio setting the Publish Location to /

Please note that the language selection of the solution must be identical to the one that was in effect when the Input Processor configuration file was exported; if the languages don't match, the Engine will not start.

Also, the Tryout Engine needs to be reloaded for the changes to take effect and the solution republished to apply the changes to the published solution.

A custom Input Processor configuration ZIP added to a solution takes precedence over the Input Processor configuration provided by Teneo Studio.

Modify the custom IP file

To modify and update a custom Input Processor zip file, follow the steps outlined here to download the resource file, update it and re-add it to the solution.

Again, when the custom Input Processor configuration file is updated in the solution, the Tryout Engine must be reloaded and the solution published (full publish) to apply the new configuration.

Content of the language config zip

The folder structure of the Input Processor configuration file downloaded from Teneo Studio, as described here, is similar to the image below:

Folder structure

In the text below, <version> refers to the actual version of the Teneo Platform, e.g. 7.0.0, 7.0.1 or 7.1.0.

  • The top folders /<version>/jars and /<version>/languages are fixed and may not be renamed
  • Folder /<version>/jars contains the jar files containing the IP classes and their dependency jar files
  • Folder /<version>/languages contains one subfolder that is named by the ISO-639-1 two-letter code of the solution language
  • Folder /<version>/languages/<language-code> contains one subfolder per Teneo Input Processor, named by the exact spelling and case of the IP's Java class name. Furthermore, one additional subfolder exists for the Simplifier, named by the exact spelling and case of the Simplifier's Java class name. The number and names of folders may vary, according to the configured language and Input Processors.
  • The Simplifier folder and each IP folder contain at least the Simplifier's / IP's properties file, named after the folder and with the file extension .properties. Depending on the particular Simplifier / IP, the folder may contain additional configuration files.
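
The layout rules above can be checked before a zip is uploaded. The sketch below is illustrative only: `check_layout` is a hypothetical helper, it works on the entry names of an archive (for example from `zipfile.ZipFile.namelist()`), and it assumes the `<version>` folder sits at the root of the archive.

```python
from pathlib import PurePosixPath


def check_layout(names: list[str]) -> list[str]:
    """Return a list of problems found in the archive entry names."""
    problems = []
    # Ignore explicit directory entries (they end with "/").
    paths = [PurePosixPath(n) for n in names if not n.endswith("/")]
    versions = sorted({p.parts[0] for p in paths if len(p.parts) > 1})
    if len(versions) != 1:
        return [f"expected one <version> folder, found {versions}"]

    # Work with the path parts below the <version> folder.
    inside = [p.parts[1:] for p in paths if len(p.parts) > 1]
    if not any(parts[0] == "jars" and parts[-1].endswith(".jar") for parts in inside):
        problems.append("no jar files under <version>/jars")

    languages = sorted(
        {parts[1] for parts in inside if parts[0] == "languages" and len(parts) > 2}
    )
    if len(languages) != 1:
        problems.append(f"expected one language folder, found {languages}")

    # Each IP / Simplifier folder must contain <folder name>.properties.
    folder_contents: dict[str, set[str]] = {}
    for parts in inside:
        if parts[0] == "languages" and len(parts) >= 4:
            folder_contents.setdefault(parts[2], set()).add(parts[3])
    for folder, contents in sorted(folder_contents.items()):
        if f"{folder}.properties" not in contents:
            problems.append(f"{folder}: missing {folder}.properties")
    return problems
```

An empty result only means that the structural rules listed above hold; it does not validate the content of the individual properties files.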

In the folder /<version>/languages/<language-code>, the user will also find the config.properties file. This file specifies the language name and locale code, the Input Processor jar files and dependency jar files, as well as the Simplifier and Input Processors to be used by the Engine. It may look similar to the below example from the English IP configuration:

```properties
language.name = English

# Following properties are removed and not sent to the engine
languageproperty.jar.1=european-input-processors-7.0.1.jar
languageproperty.jar.2=commons-io-2.4.jar
languageproperty.jar.3=commons-lang3-3.11.jar
languageproperty.jar.4=commons-math3-3.6.1.jar
languageproperty.jar.5=english-input-processors-7.0.1.jar
languageproperty.jar.6=opennlp-tools-1.8.4.jar
languageproperty.jar.7=general-input-processors-7.0.1.jar

#Dependencies related to MLEAP
languageproperty.jar.8=scala-library-2.12.10.jar
languageproperty.jar.9=mleap-base_2.12-0.16.0.jar
languageproperty.jar.10=mleap-core_2.12-0.16.0.jar
languageproperty.jar.11=bundle-ml_2.12-0.16.0.jar
languageproperty.jar.12=mleap-runtime_2.12-0.16.0.jar
languageproperty.jar.13=mleap-tensor_2.12-0.16.0.jar
languageproperty.jar.14=spray-json_2.12-1.3.2.jar
languageproperty.jar.15=spark-mllib-local_2.12-2.4.5.jar
languageproperty.jar.16=breeze_2.12-0.13.2.jar
languageproperty.jar.17=breeze-macros_2.12-0.13.2.jar
languageproperty.jar.18=core-1.1.2.jar
languageproperty.jar.19=arpack_combined_all-0.1.jar
languageproperty.jar.20=opencsv-2.3.jar
languageproperty.jar.21=spire_2.12-0.13.0.jar
languageproperty.jar.22=spire-macros_2.12-0.13.0.jar
languageproperty.jar.23=machinist_2.12-0.6.1.jar
languageproperty.jar.24=shapeless_2.12-2.3.2.jar
languageproperty.jar.25=macro-compat_2.12-1.1.1.jar
languageproperty.jar.26=slf4j-api-1.7.30.jar
languageproperty.jar.27=spark-tags_2.12-2.4.5.jar
languageproperty.jar.28=unused-1.0.0.jar
languageproperty.jar.29=jtransforms-2.4.0.jar
languageproperty.jar.30=scalapb-runtime_2.12-0.7.1.jar
languageproperty.jar.31=lenses_2.12-0.7.0-test2.jar
languageproperty.jar.32=fastparse_2.12-1.0.0.jar
languageproperty.jar.33=fastparse-utils_2.12-1.0.0.jar
languageproperty.jar.34=sourcecode_2.12-0.1.4.jar
languageproperty.jar.35=protobuf-java-3.4.0.jar
languageproperty.jar.36=scala-arm_2.12-2.0.jar
languageproperty.jar.37=config-1.3.0.jar
languageproperty.jar.38=scala-reflect-2.12.10.jar

# language of the language setting
language.locale.language=en

# simplifier FQCN.
inputProcessorHandler.simplifier.class=com.artisol.teneo.engine.core.inputprocessor.StandardSimplifier

# order of input processors FQCN.
inputProcessorHandler.inputProcessor.class.1=com.artisol.teneo.engine.core.inputprocessor.StandardSplitting
inputProcessorHandler.inputProcessor.class.2=com.artisol.teneo.engine.core.inputprocessor.StandardAutoCorrection
inputProcessorHandler.inputProcessor.class.3=com.artisol.teneo.engine.inputprocessors.general.Predict
inputProcessorHandler.inputProcessor.class.4=com.artisol.teneo.engine.core.inputprocessor.StandardSimilarityMatchCorrection
inputProcessorHandler.inputProcessor.class.5=com.artisol.teneo.engine.core.inputprocessor.SystemAnnotation
inputProcessorHandler.inputProcessor.class.6=com.artisol.teneo.engine.core.inputprocessor.BasicNumberRecognizer
inputProcessorHandler.inputProcessor.class.7=com.artisol.teneo.engine.core.inputprocessor.DateTimeRecognizer
inputProcessorHandler.inputProcessor.class.8=com.artisol.teneo.engine.inputprocessors.general.LanguageDetector
inputProcessorHandler.inputProcessor.class.9=com.artisol.teneo.engine.inputprocessors.english.EnglishPOSTagger
inputProcessorHandler.inputProcessor.class.10=com.artisol.teneo.engine.inputprocessors.english.EnglishNERTagger

# IPs to be used for normalization IP chain
languageproperty.normalizationIPs=StandardSplitting,StandardAutoCorrection
```

For each jar file, a property with the name languageproperty.jar.<n> must exist, where <n> is a unique positive integer number (the actual value is not important). The property value is the jar file name, without a file path.

Property language.locale.language must be set to the ISO-639-1 language code (that is, the name of the subfolder of /<version>/languages).

Property inputProcessorHandler.simplifier.class must be set to the fully qualified class name (FQCN) of the Simplifier.

For each IP to use, the property inputProcessorHandler.inputProcessor.class.<n> must exist specifying the FQCN of the IP. <n> is a positive integer number. The value specifies the execution order of the Input Processors, where the IP with the lowest number is executed first.
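
The property rules above can be illustrated with a small script. This is a sketch under stated assumptions, not the way the Engine reads the file: the parser only handles simple `key=value` lines and comments (it ignores other features of the Java properties format such as line continuations), and the function names are hypothetical.

```python
import re

IP_KEY = re.compile(r"inputProcessorHandler\.inputProcessor\.class\.(\d+)$")
JAR_KEY = re.compile(r"languageproperty\.jar\.(\d+)$")


def parse_properties(text: str) -> dict[str, str]:
    """Parse simple key=value lines; blank lines and comments are skipped."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def ordered_input_processors(props: dict[str, str]) -> list[str]:
    """Return the IP class names in execution order (lowest <n> first)."""
    numbered = []
    for key, value in props.items():
        match = IP_KEY.match(key)
        if match:
            # Compare <n> as an integer so that 10 sorts after 2.
            numbered.append((int(match.group(1)), value))
    return [fqcn for _, fqcn in sorted(numbered)]


def undeclared_jars(props: dict[str, str], jar_files: list[str]) -> list[str]:
    """Return jar file names that have no languageproperty.jar.<n> entry."""
    declared = {value for key, value in props.items() if JAR_KEY.match(key)}
    return sorted(name for name in jar_files if name not in declared)
```

Because `<n>` is compared numerically, a file that lists `class.10` before `class.2` still yields the Input Processor numbered 2 first, which matches the execution order described above.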

Platform upgrades

Any custom Input Processor configuration applied to a solution takes precedence over the default Input Processor configuration. Because Teneo Input Processor configurations are version specific, after an upgrade the Platform cannot find the correct configuration in the custom Input Processor zip file until the custom IP configuration has been upgraded manually.

This means that, after an upgrade of the Teneo Platform, a few steps are required to also upgrade the custom Input Processor configuration. These steps must be taken for every solution that has a custom Input Processor configuration applied:

  1. Ensure any additional Input Processor libraries and/or modified config files are stored outside the Platform
    If the files are not stored externally, the pre-upgrade version of the IP configuration can be downloaded from Teneo Studio as explained here and the new/modified files can then be extracted from there
  2. Download and Save the existing custominputprocessorsetup.zip resource from Teneo Studio as explained here
  3. Then Delete the existing custominputprocessorsetup.zip resource from the solution as explained here
  4. Download an upgraded version of the Platform IP configuration from the solution (see Export language config)
    If at this stage a LANGUAGE_SETTINGS_MISSING error is seen, refer to the Troubleshooting
  5. Apply any required changes to the upgraded version of the IP configuration
  6. Re-zip the files and add them again as a custominputprocessorsetup.zip file resource in Teneo Studio.
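
To find out which changes need to be carried over in step 5, the old custom zip can be compared with the freshly exported one. The following is a rough sketch with a hypothetical helper, `compare_configs`; it assumes both archives have the `<version>` folder at their root and compares files by path below that folder. Jar files whose names contain a version number will naturally show up as differences.

```python
import hashlib
import zipfile
from pathlib import PurePosixPath


def _digests(zip_path: str) -> dict[str, str]:
    """Map each file path (without the leading <version> folder) to its SHA-256."""
    result = {}
    with zipfile.ZipFile(zip_path) as archive:
        for info in archive.infolist():
            parts = PurePosixPath(info.filename).parts
            if info.is_dir() or len(parts) < 2:
                continue
            relative = "/".join(parts[1:])  # drop the <version> folder
            result[relative] = hashlib.sha256(archive.read(info)).hexdigest()
    return result


def compare_configs(old_custom_zip: str, fresh_export_zip: str) -> dict[str, list[str]]:
    """List files that exist only in the old custom zip or differ from the new export."""
    old = _digests(old_custom_zip)
    new = _digests(fresh_export_zip)
    return {
        "only_in_custom": sorted(old.keys() - new.keys()),
        "differs": sorted(k for k in old.keys() & new.keys() if old[k] != new[k]),
    }
```

Files reported under `differs` are not necessarily custom changes; they may also be changes introduced by the new Platform version, so each one still needs to be reviewed manually.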

Troubleshooting

Tryout - "Directory missing" error

When an environment is upgraded but a solution's custom Input Processor configuration has not yet been upgraded, the Engine will not start and an error will be displayed in the Tryout.

Additionally, the same error is seen in the Engine log files if a publish is made without first upgrading the IP configuration.

The solution to this error is to perform the upgrade steps as described above.
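
Since the configuration is stored under a folder named after the Platform version, a stale custom zip can be spotted before reloading Tryout. The sketch below is only an illustration: the helper names are hypothetical, the Platform version has to be supplied by the caller, and it assumes the `<version>` folder is the top-level entry of the archive.

```python
import zipfile


def config_versions(zip_path: str) -> list[str]:
    """Return the top-level folder names of the archive (the <version> folders)."""
    with zipfile.ZipFile(zip_path) as archive:
        return sorted(
            {name.split("/", 1)[0] for name in archive.namelist() if "/" in name}
        )


def needs_upgrade(zip_path: str, platform_version: str) -> bool:
    """True if the archive has no folder matching the given Platform version."""
    return platform_version not in config_versions(zip_path)
```

For example, `needs_upgrade("custominputprocessorsetup.zip", "7.1.0")` would return True for a zip that only contains a `7.0.1` folder.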

Download IP configuration - LANGUAGE_SETTINGS_MISSING

When an environment is upgraded but a solution's custom IP configuration has not yet been upgraded, any attempt to Export language config (see section Export language config or step 4 in the above upgrade process) will result in an error.

The solution to this error is to ensure that the existing custom IP configuration is removed from the solution before attempting to export as explained in step 3 in the above described upgrade process.