Custom Input Processor configuration
The Input Processors and the Simplifier are configurable from within the Platform; this allows users to customize the Input Processors (IPs) of the Teneo Engine and their configuration for any solution. The custom IP configuration is provided to a solution as a file resource with a fixed name and content structure.
This page describes the process of creating a custom Input Processor configuration and how to apply it to the solution. At the end of the page, the reader will find information on handling Platform upgrades when using custom IP configurations, as well as a short troubleshooting section.
Detailed information on the Input Processor configuration properties can be found on the Input Processor API page.
Create custom IP configuration
Creating a custom IP configuration involves the steps below, each of which is described in more detail in the following sections:
- Export of the language config in .zip from Teneo Studio Desktop
- Unzipping and modifying the IP configuration
- Rezipping the files with the following name:
- Adding of the ´custominputprocessorsetup.zip´ to the solution as a file resource with the folder path
- Reload of the Tryout to apply the changes to the solution
- Republish (full publish) of the solution to apply the changes to the published solution/AI application
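The unzip and rezip steps above can be sketched in a few lines of Python. This is a minimal illustration, not part of the Teneo tooling: the helper names `unzip_config` and `rezip_config` are hypothetical, and the sketch assumes the re-created archive should keep the same internal folder layout as the exported one.

```python
import zipfile
from pathlib import Path

# File name taken from the steps above.
CUSTOM_ZIP_NAME = "custominputprocessorsetup.zip"


def unzip_config(exported_zip: Path, work_dir: Path) -> None:
    """Extract the exported language config so its files can be edited."""
    with zipfile.ZipFile(exported_zip) as archive:
        archive.extractall(work_dir)


def rezip_config(work_dir: Path, output_dir: Path) -> Path:
    """Pack the edited files again, storing paths relative to work_dir."""
    target = output_dir / CUSTOM_ZIP_NAME
    with zipfile.ZipFile(target, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(work_dir.rglob("*")):
            if path.is_file():
                # Forward slashes keep the entry names portable.
                archive.write(path, path.relative_to(work_dir).as_posix())
    return target
```

Keep `output_dir` outside `work_dir`, otherwise the archive being written would be picked up as one of its own entries.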
Export language config
This step can currently only be performed in Teneo Studio Desktop, and only if the user has the needed export permissions.
To export the current Input Processor configuration, follow the below steps:
- Go to Import/Export in the backstage of Teneo Studio
- Click Export config
- Browse to the folder where to store the export and click Save
- Teneo Studio will start exporting the language configuration zip to the selected local folder; when done, click OK.
The Export language config packs the entire Input Processor configuration and .jar files, according to the current solution language, and exports it as a ZIP file that can be stored locally.
Add custom IP config files as a Resource File
To add the custom Input Processor configuration to the solution again, follow the below steps:
- Create a .zip containing the configuration files, the .zip must be named
- Follow the steps outlined here to add the file to Teneo Studio setting the Publish Location to
Please note that the language selection of the solution must be identical to the one that was in effect when the Input Processor configuration file was exported; if the languages don't match, the Engine will not start.
Also, the Tryout Engine needs to be reloaded for the changes to take effect and the solution republished to apply the changes to the published solution.
A custom Input Processor configuration ZIP added to a solution takes precedence over the Input Processor configuration provided by Teneo Studio.
Modify the custom IP file
To modify and update a custom Input Processor zip file, follow the steps outlined here to download the resource file, update it and re-add it to the solution.
Again, when the custom Input Processor configuration file is updated in the solution, the Tryout Engine must be reloaded and the solution published (full publish) to apply the new configuration.
Content of the language config zip
The folder structure of the Input Processor configuration file, downloaded from Teneo Studio as described here, is organized as follows:
- The top folders /<version>/languages are fixed and may not be renamed
- /<version>/jars contains the jar files containing the IP classes and their dependency jar files
- /<version>/languages contains one subfolder that is named by the ISO-639-1 two-letter code of the solution language
- /<version>/languages/<language-code> contains one subfolder per Teneo Input Processor, named by the exact spelling and case of the IP's Java class name. Furthermore, one additional subfolder exists for the Simplifier, named by the exact spelling and case of the Simplifier's Java class name. The number and names of folders may vary, according to the configured language and Input Processors.
- The Simplifier folder and each IP folder contain at least the Simplifier's / IP's properties file, named after the folder and with the file extension .properties. Depending on the particular Simplifier / IP, the folder may contain additional configuration files.
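The layout rules above lend themselves to a quick check before re-zipping. The following is a rough sketch only: `check_layout` is a hypothetical helper, and it assumes that both `jars` and `languages` are expected directly under the version folder and that every subfolder of the language folder is a Simplifier or IP folder.

```python
from pathlib import Path


def check_layout(version_dir: Path) -> list[str]:
    """Return the problems found under /<version>; an empty list means none."""
    problems = []
    for name in ("jars", "languages"):
        if not (version_dir / name).is_dir():
            problems.append(f"missing top folder: {name}")
    languages = version_dir / "languages"
    if not languages.is_dir():
        return problems
    language_dirs = [p for p in languages.iterdir() if p.is_dir()]
    if len(language_dirs) != 1:
        problems.append(f"expected one language subfolder, found {len(language_dirs)}")
    for language_dir in language_dirs:
        component_dirs = sorted(p for p in language_dir.iterdir() if p.is_dir())
        for component_dir in component_dirs:
            # Each Simplifier/IP folder needs <folder name>.properties.
            expected = component_dir / f"{component_dir.name}.properties"
            if not expected.is_file():
                problems.append(f"missing {expected.name} in {component_dir.name}")
    return problems
```

The check is case-sensitive on purpose, because the folder and file names must match the Java class names exactly.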
In the folder /<version>/languages/<language-code>, the user will also find the config.properties file. This file specifies the language name and locale code, the Input Processor jar files and dependency jar files, as well as the Simplifier and Input Processors to be used by the Engine. It may look similar to the below example from the English IP configuration:
```properties
language.name = English

# Following properties are removed and not sent to the engine
languageproperty.jar.1=european-input-processors-7.0.1.jar
languageproperty.jar.2=commons-io-2.4.jar
languageproperty.jar.3=commons-lang3-3.11.jar
languageproperty.jar.4=commons-math3-3.6.1.jar
languageproperty.jar.5=english-input-processors-7.0.1.jar
languageproperty.jar.6=opennlp-tools-1.8.4.jar
languageproperty.jar.7=general-input-processors-7.0.1.jar

#Dependencies related to MLEAP
languageproperty.jar.8=scala-library-2.12.10.jar
languageproperty.jar.9=mleap-base_2.12-0.16.0.jar
languageproperty.jar.10=mleap-core_2.12-0.16.0.jar
languageproperty.jar.11=bundle-ml_2.12-0.16.0.jar
languageproperty.jar.12=mleap-runtime_2.12-0.16.0.jar
languageproperty.jar.13=mleap-tensor_2.12-0.16.0.jar
languageproperty.jar.14=spray-json_2.12-1.3.2.jar
languageproperty.jar.15=spark-mllib-local_2.12-2.4.5.jar
languageproperty.jar.16=breeze_2.12-0.13.2.jar
languageproperty.jar.17=breeze-macros_2.12-0.13.2.jar
languageproperty.jar.18=core-1.1.2.jar
languageproperty.jar.19=arpack_combined_all-0.1.jar
languageproperty.jar.20=opencsv-2.3.jar
languageproperty.jar.21=spire_2.12-0.13.0.jar
languageproperty.jar.22=spire-macros_2.12-0.13.0.jar
languageproperty.jar.23=machinist_2.12-0.6.1.jar
languageproperty.jar.24=shapeless_2.12-2.3.2.jar
languageproperty.jar.25=macro-compat_2.12-1.1.1.jar
languageproperty.jar.26=slf4j-api-1.7.30.jar
languageproperty.jar.27=spark-tags_2.12-2.4.5.jar
languageproperty.jar.28=unused-1.0.0.jar
languageproperty.jar.29=jtransforms-2.4.0.jar
languageproperty.jar.30=scalapb-runtime_2.12-0.7.1.jar
languageproperty.jar.31=lenses_2.12-0.7.0-test2.jar
languageproperty.jar.32=fastparse_2.12-1.0.0.jar
languageproperty.jar.33=fastparse-utils_2.12-1.0.0.jar
languageproperty.jar.34=sourcecode_2.12-0.1.4.jar
languageproperty.jar.35=protobuf-java-3.4.0.jar
languageproperty.jar.36=scala-arm_2.12-2.0.jar
languageproperty.jar.37=config-1.3.0.jar
languageproperty.jar.38=scala-reflect-2.12.10.jar

# language of the language setting
language.locale.language=en

# simplifier FQCN.
inputProcessorHandler.simplifier.class=com.artisol.teneo.engine.core.inputprocessor.StandardSimplifier

# order of input processors FQCN.
inputProcessorHandler.inputProcessor.class.1=com.artisol.teneo.engine.core.inputprocessor.StandardSplitting
inputProcessorHandler.inputProcessor.class.2=com.artisol.teneo.engine.core.inputprocessor.StandardAutoCorrection
inputProcessorHandler.inputProcessor.class.3=com.artisol.teneo.engine.inputprocessors.general.Predict
inputProcessorHandler.inputProcessor.class.4=com.artisol.teneo.engine.core.inputprocessor.StandardSimilarityMatchCorrection
inputProcessorHandler.inputProcessor.class.5=com.artisol.teneo.engine.core.inputprocessor.SystemAnnotation
inputProcessorHandler.inputProcessor.class.6=com.artisol.teneo.engine.core.inputprocessor.BasicNumberRecognizer
inputProcessorHandler.inputProcessor.class.7=com.artisol.teneo.engine.core.inputprocessor.DateTimeRecognizer
inputProcessorHandler.inputProcessor.class.8=com.artisol.teneo.engine.inputprocessors.general.LanguageDetector
inputProcessorHandler.inputProcessor.class.9=com.artisol.teneo.engine.inputprocessors.english.EnglishPOSTagger
inputProcessorHandler.inputProcessor.class.10=com.artisol.teneo.engine.inputprocessors.english.EnglishNERTagger

# IPs to be used for normalization IP chain
languageproperty.normalizationIPs=StandardSplitting,StandardAutoCorrection
```
For each jar file, a property with the name languageproperty.jar.<n> must exist, where <n> is a unique positive integer number (the actual value is not important). The property value is the jar file name, without a file path.
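The rules for the jar properties can be checked mechanically. The sketch below is an illustration under stated assumptions: `parse_properties` is a simplified reader that only handles `key=value` lines and `#` or `!` comments (no line continuations or escapes), and `check_jar_properties` is a hypothetical helper, not a Teneo API.

```python
import re

JAR_KEY = re.compile(r"^languageproperty\.jar\.(\d+)$")


def parse_properties(text: str) -> list[tuple[str, str]]:
    """Simplified .properties reader: key=value lines, '#' or '!' comments."""
    pairs = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line[0] in "#!":
            continue
        key, _, value = line.partition("=")
        pairs.append((key.strip(), value.strip()))
    return pairs


def check_jar_properties(text: str) -> list[str]:
    """Report jar properties that break the naming rules described above."""
    problems, seen = [], set()
    for key, value in parse_properties(text):
        match = JAR_KEY.match(key)
        if not match:
            continue
        number = int(match.group(1))
        if number < 1:
            problems.append(f"{key}: index must be a positive integer")
        if number in seen:
            problems.append(f"{key}: index used more than once")
        seen.add(number)
        if "/" in value or "\\" in value:
            problems.append(f"{key}: value must be a file name without a path")
    return problems
```

Keeping the entries as a list of pairs rather than a dictionary is deliberate: a dictionary would silently drop a repeated index, which is exactly the mistake the check is meant to surface.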
language.locale.language must be set to the ISO-639-1 language code (that is, the name of the subfolders of
inputProcessorHandler.simplifier.class must be set to the fully qualified class name (FQCN) of the Simplifier.
For each IP to use, the property inputProcessorHandler.inputProcessor.class.<n> must exist specifying the FQCN of the IP. <n> is a positive integer number. The value specifies the execution order of the Input Processors, where the IP with the lowest number is executed first.
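To see the order in which a given config.properties would run the Input Processors, the numbered properties can be sorted by their numeric suffix. This is a small sketch with a hypothetical helper name, `execution_order`, and a simplified line-based reader; it is not how the Engine itself loads the file.

```python
import re

IP_KEY = re.compile(r"^inputProcessorHandler\.inputProcessor\.class\.(\d+)$")


def execution_order(text: str) -> list[str]:
    """Return the IP class names sorted by their numeric suffix, lowest first."""
    numbered = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, _, value = line.partition("=")
        match = IP_KEY.match(key.strip())
        if match:
            numbered.append((int(match.group(1)), value.strip()))
    # Sorting on the integer keeps class.10 after class.9.
    return [fqcn for _, fqcn in sorted(numbered)]
```

The suffix is compared as a number, not as text; a plain string sort would place `class.10` before `class.2`.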
Any custom Input Processor configuration applied to a solution takes precedence over the default Input Processor configuration. Because of this, and because Teneo Input Processor configurations are version-specific, the Platform will not be able to find the correct configuration in the custom Input Processor zip file after an upgrade until the custom IP configuration has been upgraded manually.
This means that, after an upgrade of the Teneo Platform, a few steps are required to also upgrade the custom Input Processor configuration, and these steps must be taken for every solution which has a custom Input Processor configuration applied:
- Ensure any additional Input Processor libraries and/or modified config files are stored outside the Platform
  If the files are not stored externally, the pre-upgrade version of the IP configuration can be downloaded from Teneo Studio as explained here and the new/modified files can then be extracted from there.
- Download and Save the existing custominputprocessorsetup.zip resource from Teneo Studio as explained here
- Then Delete the existing custominputprocessorsetup.zip resource from the solution as explained here
- Download an upgraded version of the Platform IP configuration from the solution (see Export language config)
  If at this stage a LANGUAGE_SETTINGS_MISSING error is seen, refer to the Troubleshooting section.
- Apply any required changes to the upgraded version of the IP configuration
- Re-zip and add again as a custominputprocessorsetup.zip file resource in Teneo Studio.
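When re-applying changes to the upgraded configuration, it can help to compare the old custom config.properties with the freshly exported one. The following is a minimal sketch with hypothetical helper names (`load_properties`, `diff_properties`) and a simplified reader that ignores line continuations and escapes. It only reports differences; deciding which of them are custom changes to carry over remains a manual step.

```python
from typing import Dict, Optional, Tuple


def load_properties(text: str) -> Dict[str, str]:
    """Simplified .properties reader: key=value lines, '#' or '!' comments."""
    result = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line[0] in "#!":
            continue
        key, _, value = line.partition("=")
        result[key.strip()] = value.strip()
    return result


def diff_properties(
    old_text: str, new_text: str
) -> Dict[str, Tuple[Optional[str], Optional[str]]]:
    """Map each differing key to (old value, new value); None means absent."""
    old, new = load_properties(old_text), load_properties(new_text)
    return {
        key: (old.get(key), new.get(key))
        for key in sorted(old.keys() | new.keys())
        if old.get(key) != new.get(key)
    }
```

Differences in jar file names are expected after an upgrade, since the version numbers in the file names usually change; entries that exist only in the old file are the more likely candidates for custom additions.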
Tryout - "Directory missing" error
When an environment is upgraded but a solution's custom Input Processor configuration has not yet been upgraded, the Engine will not start and an error will be displayed in the Tryout.
Additionally, the same error is seen in the Engine log files if a publish is made without first upgrading the IP configuration.
The solution to this error is to perform the upgrade steps as described above.
Download IP configuration - LANGUAGE_SETTINGS_MISSING
When an environment is upgraded but a solution's custom IP configuration has not yet been upgraded, any attempt to Export language config (see section Export language config or step 4 in the above upgrade process) will result in an error.
The solution to this error is to ensure that the existing custom IP configuration is removed from the solution before attempting to export, as explained in step 3 of the upgrade process described above.