Custom configuration

Custom Input Processor configuration

Within the Teneo Platform, splitting user input text into sentences and words, spelling correction, and other language-dependent processing are handled by separate modules: the Input Processors and the Simplifier. These plug into the Teneo Engine via the Input Processor API.

The Input Processors (IPs) and the Simplifier are configurable from within the Platform. Users can customize them for any solution by providing a custom IP configuration as a resource file in Teneo Studio; the file must have a fixed name and content structure.

This opens up a number of possibilities: for example, users can add their own Input Processors and update them at any point, or create custom auto-correction and abbreviation lists.

Creation of the custom IP configuration

Teneo Studio provides an export function for the custom IP configuration file. To access this export, the Studio user needs permission to export the IP setup.

Export config

The Export language config function packs the entire Input Processor configuration and the .jar files for the current solution language, and exports them as a ZIP file that can be downloaded and stored locally.

This file is ready to be applied to a solution, without any modifications. To apply it, it just needs to be added to the solution as a resource file.

Custom Input Processor file

The file name must be custominputprocessorsetup.zip and the file path must be /. Teneo Studio will recognize a custom Input Processor configuration file only with this exact name and location.

When a custom Input Processor configuration file is added to the solution, it takes precedence over the Input Processor configuration provided by Teneo Studio.

After a custom Input Processor configuration file is added to the solution, the Try Out engine must be restarted to apply it. Likewise, the Try Out engine must be restarted after the custom Input Processor configuration file is removed from the solution resources. In both cases, a full publish is required to apply the changes to a published conversational AI application.

Please note that the language selection of the solution must be identical to the one that was effective when the Input Processor configuration file was exported. If the languages don't match, then the Engine will not start.
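
The language check can be done locally before uploading. The following is a minimal sketch, assuming the ZIP entries follow the `<version>/languages/<language-code>/...` layout described further down; the function names are hypothetical and not part of any Teneo API.

```python
import zipfile


def exported_language_codes(zip_file):
    """Return the language codes found under <version>/languages/ in the ZIP."""
    codes = set()
    with zipfile.ZipFile(zip_file) as archive:
        for name in archive.namelist():
            parts = name.strip("/").split("/")
            # Expected shape: <version>/languages/<language-code>/<something>
            if len(parts) >= 4 and parts[1] == "languages":
                codes.add(parts[2])
    return codes


def matches_solution_language(zip_file, solution_language):
    """True if the archive was exported for exactly the given language code."""
    return exported_language_codes(zip_file) == {solution_language}
```

For example, `matches_solution_language("custominputprocessorsetup.zip", "en")` would be expected to hold for a file exported from an English solution.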

Modification of the custom IP configuration

It is possible to modify the custom Input Processor configuration by downloading the attached resource file to a local file, unpacking it, and applying the desired modifications.

After modification, the files need to be re-zipped and uploaded to the solution again, replacing the existing resource file with the modified local version.

After a custom Input Processor configuration file is updated in the solution, the Try Out engine must be restarted, and the conversational AI application must be fully published to apply the new configuration.
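
The unpack, modify, and re-zip cycle described above can be scripted. This is a sketch only: `modify_ip_config` and the `edit` callback are hypothetical names, and the actual changes depend on the Input Processor being customized.

```python
import tempfile
import zipfile
from pathlib import Path


def modify_ip_config(zip_in, zip_out, edit):
    """Unpack zip_in, let edit(root) change files, and re-zip to zip_out."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        with zipfile.ZipFile(zip_in) as archive:
            archive.extractall(root)
        edit(root)  # caller-supplied function that modifies the unpacked files
        with zipfile.ZipFile(zip_out, "w", zipfile.ZIP_DEFLATED) as archive:
            for path in sorted(root.rglob("*")):
                if path.is_file():
                    # Store paths relative to the root so the layout is kept.
                    archive.write(path, path.relative_to(root).as_posix())
```

The output file still has to be named custominputprocessorsetup.zip when it is uploaded as a resource.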

Content of the custom IP configuration resource file

The folder structure within the IP configuration ZIP file will look similar to this:

Folder structure

Please note that in the text below, <version> refers to the actual version of the Teneo Platform, e.g. 6.2.1, 7.0.0 or 7.0.2.

  • The top folders /<version>/jars and /<version>/languages are fixed and must not be renamed.
  • Folder /<version>/jars contains the jar files with the IP classes, and their dependency jar files.
  • Folder /<version>/languages contains one subfolder, named after the ISO 639-1 two-letter code of the solution language.
  • Folder /<version>/languages/<language-code> contains one subfolder per IP, named with the exact spelling and case of the IP’s Java class name. One additional subfolder exists for the Simplifier, named with the exact spelling and case of the Simplifier’s Java class name. The number and names of the folders may vary according to the configured language and IPs.
  • The Simplifier folder and each IP folder contain at least the properties file of the Simplifier or IP, named after the folder and with the file extension .properties. Depending on the particular Simplifier or IP, the folder may contain additional configuration files.
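
The layout rules above can be checked mechanically against the entry names of the ZIP file. The sketch below is an illustration under the assumption that entry names look like `<version>/jars/...` and `<version>/languages/<language-code>/...`; `check_layout` is a hypothetical helper, and it also looks for the config.properties file described in the next paragraph.

```python
def check_layout(names):
    """Return a list of problems found in the ZIP entry names (empty if none)."""
    # Illustrative shape of a valid archive:
    #   <version>/jars/<jar files>
    #   <version>/languages/<language-code>/config.properties
    #   <version>/languages/<language-code>/<Folder>/<Folder>.properties
    files = [n.strip("/") for n in names if not n.endswith("/")]
    problems = []
    versions = {f.split("/")[0] for f in files}
    for version in sorted(versions):
        inside = [f.split("/")[1:] for f in files if f.split("/")[0] == version]
        if not any(len(p) >= 2 and p[0] == "jars" for p in inside):
            problems.append(f"{version}: no files under jars/")
        lang_paths = [p[1:] for p in inside if len(p) >= 2 and p[0] == "languages"]
        languages = {p[0] for p in lang_paths if len(p) >= 2}
        if len(languages) != 1:
            problems.append(
                f"{version}: expected one language folder, found {sorted(languages)}"
            )
        for lang in sorted(languages):
            in_lang = [p[1:] for p in lang_paths if p[0] == lang]
            if ["config.properties"] not in in_lang:
                problems.append(f"{version}/{lang}: config.properties missing")
            # Every IP or Simplifier folder needs <Folder>/<Folder>.properties.
            for folder in sorted({p[0] for p in in_lang if len(p) >= 2}):
                if [folder, f"{folder}.properties"] not in in_lang:
                    problems.append(
                        f"{version}/{lang}/{folder}: {folder}.properties missing"
                    )
    return problems
```

The names can be obtained with `zipfile.ZipFile(path).namelist()`. The comparison is case-sensitive on purpose, since the folder and properties file names must match the class name exactly.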

Folder /<version>/languages/<language-code> also contains the file config.properties. This file specifies the language name and locale code, the Input Processor jar files and their dependency jar files, and the Simplifier and Input Processors to be used by the Engine. It may look like this:

```properties
language.name = English

# Following properties are removed and not sent to the engine
languageproperty.jar.1=european-input-processors-6.0.0.jar
languageproperty.jar.2=commons-io-2.4.jar
languageproperty.jar.3=commons-lang3-3.11.jar
languageproperty.jar.4=commons-math3-3.6.1.jar
languageproperty.jar.5=english-input-processors-6.0.0.jar
languageproperty.jar.6=opennlp-tools-1.8.4.jar
languageproperty.jar.7=general-input-processors-6.0.0.jar

#Dependencies related to MLEAP
languageproperty.jar.8=scala-library-2.12.10.jar
languageproperty.jar.9=mleap-base_2.12-0.16.0.jar
languageproperty.jar.10=mleap-core_2.12-0.16.0.jar
languageproperty.jar.11=bundle-ml_2.12-0.16.0.jar
languageproperty.jar.12=mleap-runtime_2.12-0.16.0.jar
languageproperty.jar.13=mleap-tensor_2.12-0.16.0.jar
languageproperty.jar.14=spray-json_2.12-1.3.2.jar
languageproperty.jar.15=spark-mllib-local_2.12-2.4.5.jar
languageproperty.jar.16=breeze_2.12-0.13.2.jar
languageproperty.jar.17=breeze-macros_2.12-0.13.2.jar
languageproperty.jar.18=core-1.1.2.jar
languageproperty.jar.19=arpack_combined_all-0.1.jar
languageproperty.jar.20=opencsv-2.3.jar
languageproperty.jar.21=spire_2.12-0.13.0.jar
languageproperty.jar.22=spire-macros_2.12-0.13.0.jar
languageproperty.jar.23=machinist_2.12-0.6.1.jar
languageproperty.jar.24=shapeless_2.12-2.3.2.jar
languageproperty.jar.25=macro-compat_2.12-1.1.1.jar
languageproperty.jar.26=slf4j-api-1.7.30.jar
languageproperty.jar.27=spark-tags_2.12-2.4.5.jar
languageproperty.jar.28=unused-1.0.0.jar
languageproperty.jar.29=jtransforms-2.4.0.jar
languageproperty.jar.30=scalapb-runtime_2.12-0.7.1.jar
languageproperty.jar.31=lenses_2.12-0.7.0-test2.jar
languageproperty.jar.32=fastparse_2.12-1.0.0.jar
languageproperty.jar.33=fastparse-utils_2.12-1.0.0.jar
languageproperty.jar.34=sourcecode_2.12-0.1.4.jar
languageproperty.jar.35=protobuf-java-3.4.0.jar
languageproperty.jar.36=scala-arm_2.12-2.0.jar
languageproperty.jar.37=config-1.3.0.jar
languageproperty.jar.38=scala-reflect-2.12.10.jar

# language of the language setting
language.locale.language=en

# simplifier FQCN.
inputProcessorHandler.simplifier.class=com.artisol.teneo.engine.core.inputprocessor.StandardSimplifier

# order of input processors FQCN.
inputProcessorHandler.inputProcessor.class.1=com.artisol.teneo.engine.core.inputprocessor.StandardSplitting
inputProcessorHandler.inputProcessor.class.2=com.artisol.teneo.engine.core.inputprocessor.StandardAutoCorrection
inputProcessorHandler.inputProcessor.class.3=com.artisol.teneo.engine.inputprocessors.general.Predict
inputProcessorHandler.inputProcessor.class.4=com.artisol.teneo.engine.core.inputprocessor.StandardSimilarityMatchCorrection
inputProcessorHandler.inputProcessor.class.5=com.artisol.teneo.engine.core.inputprocessor.SystemAnnotation
inputProcessorHandler.inputProcessor.class.6=com.artisol.teneo.engine.core.inputprocessor.BasicNumberRecognizer
inputProcessorHandler.inputProcessor.class.7=com.artisol.teneo.engine.core.inputprocessor.DateTimeRecognizer
inputProcessorHandler.inputProcessor.class.8=com.artisol.teneo.engine.inputprocessors.general.LanguageDetector
inputProcessorHandler.inputProcessor.class.9=com.artisol.teneo.engine.inputprocessors.english.EnglishPOSTagger
inputProcessorHandler.inputProcessor.class.10=com.artisol.teneo.engine.inputprocessors.english.EnglishNERTagger

# IPs to be used for normalization IP chain
languageproperty.normalizationIPs=StandardSplitting,StandardAutoCorrection
```

For each jar file, a property with the name languageproperty.jar.<n> must exist, where <n> is a unique positive integer number (the actual value is not important). The property value is the jar file name, without a file path.
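
When custom jar files are added, it is easy to forget the matching property. The sketch below cross-checks the two, assuming that every listed jar is expected to sit in the jars folder of the same ZIP file. `parse_properties` is a deliberately simplified reader (no line continuations or escapes, unlike Java's own properties loader), and both function names are hypothetical.

```python
import re


def parse_properties(text):
    """Parse simple key=value lines; a simplified stand-in for Java's format."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def check_jar_properties(props, jar_files):
    """Compare languageproperty.jar.<n> values with the files in the jars folder."""
    listed = {}
    problems = []
    for key, value in props.items():
        if not re.fullmatch(r"languageproperty\.jar\.(\d+)", key):
            continue
        if "/" in value or "\\" in value:
            problems.append(f"{key}: value must be a file name without a path")
        listed[key] = value
    for key, value in listed.items():
        if value not in jar_files:
            problems.append(f"{key}: {value} not found in jars folder")
    # Every jar file needs its own property.
    for jar in sorted(set(jar_files) - set(listed.values())):
        problems.append(f"{jar}: present in jars folder but not listed")
    return problems
```

Here `jar_files` is the set of file names found under /<version>/jars, and `props` is the parsed content of config.properties.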

Property language.locale.language must be set to the ISO-639-1 language code (that is, the name of the subfolders of /<version>/languages).

Property inputProcessorHandler.simplifier.class must be set to the fully qualified class name (FQCN) of the Simplifier.

For each IP to be used, a property inputProcessorHandler.inputProcessor.class.<n> must exist, specifying the FQCN of the IP, where <n> is a positive integer. The number determines the execution order of the IPs: the IP with the lowest number is executed first.
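
To preview the resulting execution order after editing the file, the numbers have to be compared as integers; a plain text sort would place `class.10` before `class.2`. A small sketch, with `execution_order` as a hypothetical helper that takes the properties as a dictionary:

```python
import re


def execution_order(props):
    """Return IP class names sorted by the numeric suffix of their property key."""
    pattern = r"inputProcessorHandler\.inputProcessor\.class\.(\d+)"
    numbered = []
    for key, value in props.items():
        match = re.fullmatch(pattern, key)
        if match:
            numbered.append((int(match.group(1)), value))
    return [fqcn for _, fqcn in sorted(numbered)]
```

Inserting a custom IP between two existing ones therefore means choosing a number that falls between theirs, or renumbering the later entries.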

Platform upgrades

As stated, any custom Input Processor configuration applied to a solution takes precedence over the system Input Processor configuration. Because Teneo Input Processor configurations are version-specific, after a Platform upgrade the Platform will not be able to find the correct configuration in the custom Input Processor ZIP file until the custom Input Processor configuration has been manually upgraded.
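
Whether a given ZIP file is affected can be estimated from its top-level folder names. This sketch assumes that the Platform looks for a top-level folder named exactly after its own version; the function names are hypothetical.

```python
def configured_versions(names):
    """Return the top-level version folders present in the ZIP entry names."""
    return {n.strip("/").split("/")[0] for n in names if "/" in n.strip("/")}


def needs_upgrade(names, platform_version):
    """True if the archive has no folder for the running Platform version."""
    return platform_version not in configured_versions(names)
```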

This means that after an upgrade of the Teneo Platform, the following steps are required to also upgrade the custom Input Processor configuration (these steps must be taken for every solution that has a custom Input Processor configuration applied):

  1. Ensure you have the additional Input Processor libraries and/or modified configuration files stored outside the Platform.
    If the files are not stored externally, the pre-upgrade version of the IP configuration can be downloaded from the Resources > File backstage tab within the solution. The new or modified files can then be extracted from there.
  2. Download and save the existing custominputprocessorsetup.zip resource from the Resources > File tab (in case there is a need to roll back or review).
  3. Delete the existing custominputprocessorsetup.zip resource from the Resources > File tab.
  4. Download an upgraded version of the Platform IP configuration from the Import/Export tab (see Creation of the custom IP configuration).
    If a LANGUAGE_SETTINGS_MISSING error is seen at this stage, refer to Troubleshooting below.
  5. Apply any required changes to the upgraded version.
  6. Re-zip the files and add the archive to the solution again as the custominputprocessorsetup.zip resource file.
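
For step 5, it can help to see where the saved pre-upgrade ZIP file differs from the freshly exported one. The sketch below compares the two archives while ignoring the leading version folder; `read_without_version` and `compare_exports` are hypothetical helpers. The result is only a starting point for review, since files that changed because of the Platform upgrade itself (for example renamed jar files) show up alongside the customizations.

```python
import zipfile


def read_without_version(zip_file):
    """Map each file path, minus its leading version folder, to its bytes."""
    contents = {}
    with zipfile.ZipFile(zip_file) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue
            _, _, rest = info.filename.strip("/").partition("/")
            contents[rest] = archive.read(info)
    return contents


def compare_exports(old_zip, new_zip):
    """List files that exist on one side only or whose content differs."""
    old = read_without_version(old_zip)
    new = read_without_version(new_zip)
    return {
        "only_in_old": sorted(old.keys() - new.keys()),
        "only_in_new": sorted(new.keys() - old.keys()),
        "changed": sorted(k for k in old.keys() & new.keys() if old[k] != new[k]),
    }
```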

Troubleshooting: Try Out - “Directory missing” error

When an environment has been upgraded but a solution's custom Input Processor configuration has not, the Engine will not start and an error is displayed in the Simple Try Out as well as in the Advanced Try Out window.

Warning in Try Out

Additionally, the same error will appear in the Engine log files if a publish is made without first upgrading the IP configuration.

The solution to this error is to perform the upgrade steps as described above.

Troubleshooting: Download IP configuration - LANGUAGE_SETTINGS_MISSING

When an environment has been upgraded but a solution's custom IP configuration has not, any attempt to “Export language config” (see the section Creation of the custom IP configuration or step 4 in the upgrade process above) will result in the following error:

Language Settings Missing message

The solution to this error is to ensure that the existing custom IP configuration has been removed from the solution before attempting the export.
