Custom Input Processor configuration
Within the Teneo Platform, language-dependent processing such as splitting user input into sentences and words and correcting spelling is done in separate modules: the Input Processors (IPs) and the Simplifier. These modules plug into the Teneo Engine via the Input Processor API.
The Input Processors and the Simplifier are configurable from within the Platform. Users can customize the IP configuration of the Teneo Engine for any solution by providing a custom IP configuration as a resource file in Teneo Studio with a fixed name and content structure.
This opens up a number of possibilities: users can, for example, add their own Input Processors and update them at any point, or create custom autocorrection and abbreviation lists.
Creation of the custom IP configuration
Teneo Studio provides an export function for the custom IP configuration file. To access this export, the Studio user needs permission to export the IP setup.
The Export language config function packs the entire Input Processor configuration and the .jar files for the current solution language, and exports them as a ZIP file that can be downloaded and stored locally.
This file is ready to be applied to a solution, without any modifications. To apply it, it just needs to be added to the solution as a resource file.
The file name must be custominputprocessorsetup.zip and the file path must be /. Teneo Studio will recognize a custom Input Processor configuration file only with this exact name and location.
When a custom Input Processor configuration file is added to the solution, it takes precedence over the Input Processor configuration provided by Teneo Studio.
After a custom Input Processor configuration file is added to the solution, the Try Out engine must be restarted to apply the Input Processor configuration. Likewise, the Try Out engine must be restarted after the custom Input Processor configuration file is removed from the solution resources. In both cases, a full publish is required to apply the changes to a published conversational AI application.
Please note that the language selection of the solution must be identical to the one that was effective when the Input Processor configuration file was exported. If the languages don't match, then the Engine will not start.
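The name and language requirements above can be checked before uploading. The following is a minimal Python sketch, not part of the Teneo tooling: it assumes the ZIP layout described under Content of the custom IP configuration resource file (a `<version>/languages/<language-code>/` folder), and the function names `packaged_languages` and `preflight` are hypothetical. The solution language code has to be supplied by the caller.

```python
import io
import zipfile

# Exact resource file name required by Teneo Studio (taken from the text above).
REQUIRED_NAME = "custominputprocessorsetup.zip"


def packaged_languages(zip_bytes):
    """Return the language codes found under <version>/languages/ in the ZIP."""
    codes = set()
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for name in archive.namelist():
            parts = [part for part in name.split("/") if part]
            # Expected shape: <version>/languages/<language-code>/...
            # Only count folder entries or paths nested below the language folder.
            is_folder_or_nested = len(parts) > 3 or name.endswith("/")
            if len(parts) >= 3 and parts[1] == "languages" and is_folder_or_nested:
                codes.add(parts[2])
    return codes


def preflight(file_name, zip_bytes, solution_language):
    """Return a list of problems found; an empty list means none were found."""
    problems = []
    if file_name != REQUIRED_NAME:
        problems.append(
            f"file name must be exactly {REQUIRED_NAME!r}, got {file_name!r}"
        )
    codes = packaged_languages(zip_bytes)
    if codes != {solution_language}:
        problems.append(
            f"ZIP contains language folder(s) {sorted(codes)}, "
            f"but the solution language is {solution_language!r}"
        )
    return problems
```

A call such as `preflight("custominputprocessorsetup.zip", data, "en")` returns an empty list when both checks pass. The sketch only inspects folder names; it does not verify that Engine will accept the configuration.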
Modification of the custom IP configuration
It is possible to modify the custom Input Processor configuration by downloading the attached resource file to a local file, unpacking it and applying any desired modifications.
After modifying the file, it needs to be uploaded again to the solution, by replacing the existing resource file with the modified local version of the file.
After a custom Input Processor configuration file is updated in the solution, the Try Out engine must be restarted, and the conversational AI application must be fully published to apply the new configuration.
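The unpack and re-pack part of this workflow can be scripted. The following is a minimal Python sketch using only the standard library; the helper names `unpack` and `repack` are hypothetical, and the download from and upload to Teneo Studio are still done manually.

```python
import zipfile
from pathlib import Path


def unpack(zip_path, target_dir):
    """Extract the downloaded resource file into target_dir."""
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(target_dir)


def repack(source_dir, zip_path):
    """Zip every file below source_dir, storing paths relative to source_dir.

    zip_path must be located outside source_dir, otherwise the archive
    would try to include itself.
    """
    source_dir = Path(source_dir)
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(source_dir.rglob("*")):
            if path.is_file():
                # Forward slashes keep the entry names portable across platforms.
                archive.write(path, path.relative_to(source_dir).as_posix())
```

Storing paths relative to the extraction folder keeps the version folder at the top of the archive, as in the exported file. The re-packed file must again be named custominputprocessorsetup.zip before it replaces the existing resource.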
Content of the custom IP configuration resource file
The folder structure within the IP configuration ZIP file will look similar to this:
- The top folders /<version>/jars and /<version>/languages are fixed and may not be renamed.
- /<version>/jars contains the jar files containing the IP classes, and their dependency jar files.
- /<version>/languages contains one subfolder, named with the ISO-639-1 two-letter code of the solution language.
- /<version>/languages/<language-code> contains one subfolder per IP, named with the exact spelling and case of the IP’s Java class name. One additional subfolder exists for the Simplifier, named with the exact spelling and case of the Simplifier’s Java class name. The number and names of folders may vary, according to the configured language and IPs.
- The Simplifier folder and each IP folder contain at least the Simplifier’s or IP’s properties file, named after the folder and with the file extension .properties. Depending on the particular Simplifier or IP, the folder may contain additional configuration files.
- /<version>/languages/<language-code> also contains the file config.properties. This file specifies the language name and locale code, the Input Processor jar files and dependency jar files, and the Simplifier and Input Processors to be used by Engine. It may look like this:
    language.name = English

    # Following properties are removed and not sent to the engine
    languageproperty.jar.1=european-input-processors-6.0.0.jar
    languageproperty.jar.2=commons-io-2.4.jar
    languageproperty.jar.3=commons-lang3-3.11.jar
    languageproperty.jar.4=commons-math3-3.6.1.jar
    languageproperty.jar.5=english-input-processors-6.0.0.jar
    languageproperty.jar.6=opennlp-tools-1.8.4.jar
    languageproperty.jar.7=general-input-processors-6.0.0.jar

    #Dependencies related to MLEAP
    languageproperty.jar.8=scala-library-2.12.10.jar
    languageproperty.jar.9=mleap-base_2.12-0.16.0.jar
    languageproperty.jar.10=mleap-core_2.12-0.16.0.jar
    languageproperty.jar.11=bundle-ml_2.12-0.16.0.jar
    languageproperty.jar.12=mleap-runtime_2.12-0.16.0.jar
    languageproperty.jar.13=mleap-tensor_2.12-0.16.0.jar
    languageproperty.jar.14=spray-json_2.12-1.3.2.jar
    languageproperty.jar.15=spark-mllib-local_2.12-2.4.5.jar
    languageproperty.jar.16=breeze_2.12-0.13.2.jar
    languageproperty.jar.17=breeze-macros_2.12-0.13.2.jar
    languageproperty.jar.18=core-1.1.2.jar
    languageproperty.jar.19=arpack_combined_all-0.1.jar
    languageproperty.jar.20=opencsv-2.3.jar
    languageproperty.jar.21=spire_2.12-0.13.0.jar
    languageproperty.jar.22=spire-macros_2.12-0.13.0.jar
    languageproperty.jar.23=machinist_2.12-0.6.1.jar
    languageproperty.jar.24=shapeless_2.12-2.3.2.jar
    languageproperty.jar.25=macro-compat_2.12-1.1.1.jar
    languageproperty.jar.26=slf4j-api-1.7.30.jar
    languageproperty.jar.27=spark-tags_2.12-2.4.5.jar
    languageproperty.jar.28=unused-1.0.0.jar
    languageproperty.jar.29=jtransforms-2.4.0.jar
    languageproperty.jar.30=scalapb-runtime_2.12-0.7.1.jar
    languageproperty.jar.31=lenses_2.12-0.7.0-test2.jar
    languageproperty.jar.32=fastparse_2.12-1.0.0.jar
    languageproperty.jar.33=fastparse-utils_2.12-1.0.0.jar
    languageproperty.jar.34=sourcecode_2.12-0.1.4.jar
    languageproperty.jar.35=protobuf-java-3.4.0.jar
    languageproperty.jar.36=scala-arm_2.12-2.0.jar
    languageproperty.jar.37=config-1.3.0.jar
    languageproperty.jar.38=scala-reflect-2.12.10.jar

    # language of the language setting
    language.locale.language=en

    # simplifier FQCN.
    inputProcessorHandler.simplifier.class=com.artisol.teneo.engine.core.inputprocessor.StandardSimplifier

    # order of input processors FQCN.
    inputProcessorHandler.inputProcessor.class.1=com.artisol.teneo.engine.core.inputprocessor.StandardSplitting
    inputProcessorHandler.inputProcessor.class.2=com.artisol.teneo.engine.core.inputprocessor.StandardAutoCorrection
    inputProcessorHandler.inputProcessor.class.3=com.artisol.teneo.engine.inputprocessors.general.Predict
    inputProcessorHandler.inputProcessor.class.4=com.artisol.teneo.engine.core.inputprocessor.StandardSimilarityMatchCorrection
    inputProcessorHandler.inputProcessor.class.5=com.artisol.teneo.engine.core.inputprocessor.SystemAnnotation
    inputProcessorHandler.inputProcessor.class.6=com.artisol.teneo.engine.core.inputprocessor.BasicNumberRecognizer
    inputProcessorHandler.inputProcessor.class.7=com.artisol.teneo.engine.core.inputprocessor.DateTimeRecognizer
    inputProcessorHandler.inputProcessor.class.8=com.artisol.teneo.engine.inputprocessors.general.LanguageDetector
    inputProcessorHandler.inputProcessor.class.9=com.artisol.teneo.engine.inputprocessors.english.EnglishPOSTagger
    inputProcessorHandler.inputProcessor.class.10=com.artisol.teneo.engine.inputprocessors.english.EnglishNERTagger

    # IPs to be used for normalization IP chain
    languageproperty.normalizationIPs=StandardSplitting,StandardAutoCorrection
For each jar file, a property with the name languageproperty.jar.<n> must exist, where <n> is a unique positive integer (the actual value is not important). The property value is the jar file name, without a file path.
language.locale.language must be set to the ISO-639-1 language code (that is, the name of the subfolder of /<version>/languages).
inputProcessorHandler.simplifier.class must be set to the fully qualified class name (FQCN) of the Simplifier.
For each IP to use, a property inputProcessorHandler.inputProcessor.class.<n> must exist, specifying the FQCN of the IP. <n> is a positive integer whose value determines the execution order of the IPs: the IP with the lowest number is executed first.
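The numbering rules for config.properties can be checked with a short script. The following Python sketch is an illustration only: the parser handles plain `key=value` lines and comments but not the full Java properties syntax (no line continuations or escapes), and the function names are hypothetical. It shows that the IP order follows the numeric value of `<n>`, so `class.10` runs after `class.2` even though it sorts first as text.

```python
import re

JAR_KEY = re.compile(r"^languageproperty\.jar\.(\d+)$")
IP_KEY = re.compile(r"^inputProcessorHandler\.inputProcessor\.class\.(\d+)$")


def parse_properties(text):
    """Parse simple key=value lines; return (key, value) pairs in file order."""
    pairs = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")):
            continue  # skip blank lines and comments
        key, sep, value = line.partition("=")
        if sep:
            pairs.append((key.strip(), value.strip()))
    return pairs


def jar_files(pairs):
    """Return the jar file names; raise if an index <n> is invalid or reused."""
    seen = {}
    for key, value in pairs:
        match = JAR_KEY.match(key)
        if match:
            n = int(match.group(1))
            if n < 1 or n in seen:
                raise ValueError(f"{key}: <n> must be a unique positive integer")
            seen[n] = value
    return list(seen.values())


def ip_execution_order(pairs):
    """Return the IP class names sorted by numeric <n>, lowest first."""
    numbered = []
    for key, value in pairs:
        match = IP_KEY.match(key)
        if match:
            numbered.append((int(match.group(1)), value))
    return [fqcn for _, fqcn in sorted(numbered)]
```

Applied to the listing above, `ip_execution_order` would start with StandardSplitting and end with EnglishNERTagger.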
As stated, any custom Input Processor configuration applied to a solution takes precedence over the system Input Processor configuration. Because of this, and because Teneo Input Processor configurations are version-specific, the Platform will not be able to find the correct configuration in a custom Input Processor ZIP file created for an earlier version until that configuration has been manually upgraded.
This means that after an upgrade of the Teneo Platform, a couple of steps are required to also upgrade the custom Input Processor configuration (these steps must be taken for every solution which has a custom Input Processor configuration applied):
1. Ensure you have the additional Input Processor libraries and/or modified configuration files available outside the Platform. If the files are not stored externally, the pre-upgrade version of the IP configuration can be downloaded from the Resources > File backstage tab within the solution, and the new or modified files can then be extracted from it.
2. Download and save the existing custominputprocessorsetup.zip resource from the Resources > File tab (in case there is a need to roll back or review).
3. Delete the existing custominputprocessorsetup.zip resource from the Resources > File tab.
4. Download an upgraded version of the Platform IP configuration from the Import/Export tab (see Creation of the custom IP configuration). If a LANGUAGE_SETTINGS_MISSING error is seen at this stage, refer to Troubleshooting below.
5. Apply any required changes to the upgraded version.
6. Re-zip and add the file to the solution again as a resource (see Creation of the custom IP configuration for the required file name and path).
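When carrying changes over to the upgraded configuration, it can help to list how the saved pre-upgrade ZIP differs from the fresh export. The following Python sketch is one possible approach, not a Teneo tool: it assumes both archives use the `<version>/...` layout described earlier, ignores the leading version folder so that paths are comparable, and uses the hypothetical function name `customizations`.

```python
import zipfile


def _relative_entries(zip_file):
    """Map each file path, minus its leading <version> folder, to its CRC-32."""
    entries = {}
    with zipfile.ZipFile(zip_file) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue
            parts = [part for part in info.filename.split("/") if part]
            if len(parts) > 1:
                entries["/".join(parts[1:])] = info.CRC
    return entries


def customizations(old_custom_zip, fresh_export_zip):
    """Return (only_in_old, changed) path lists for manual review."""
    old = _relative_entries(old_custom_zip)
    new = _relative_entries(fresh_export_zip)
    only_in_old = sorted(set(old) - set(new))
    changed = sorted(path for path in set(old) & set(new) if old[path] != new[path])
    return only_in_old, changed
```

The result is a review list rather than an automatic merge: stock jar files whose names carry a version number will appear as present only in the old archive, and files changed by the Platform upgrade itself will appear as changed alongside genuine customizations.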
Troubleshooting: Try out - “Directory missing” error
When an environment has been upgraded but a solution's custom Input Processor configuration has not yet been upgraded, Engine will not start and an error is displayed in the Simple Try Out as well as in the Advanced Try Out window.
The same error appears in the Engine log files if a publish is made without first upgrading the IP configuration.
The solution to this error is to perform the upgrade steps as described above.
Troubleshooting: Download IP configuration - LANGUAGE_SETTINGS_MISSING
When an environment has been upgraded but a solution's custom IP configuration has not yet been upgraded, any attempt to use “Export language config” (see the section Creation of the custom IP configuration, or step 4 in the upgrade process above) results in a LANGUAGE_SETTINGS_MISSING error.
The solution to this error is to ensure that the existing IP configuration has been removed from the solution before attempting to Export.