Teneo Developers

Class Manager

Handle Classes and Training Data

The Class Manager provides a way of adding training data and training a Machine Learning (ML) model directly in Teneo Studio. The created classes can be used in Match Requirements of Flow Intent Triggers and transitions either on their own or together with other Match Requirements.

Whenever changes are saved to the Class Manager, a Machine Learning model is trained in the Teneo Learn component with the training data provided in the Class Manager. This model is used at runtime in the Teneo Predict Input Processor (IP) to determine to which of the classes the user input is most probably related and hence which Flow the user input should trigger.

For each input, Teneo Predict will generate a top-rated annotation that is tagged as the top intent (with a TOP_INTENT suffix) and it will also create annotations for the most important class intents for that particular input. These annotations will be created with the scheme <CLASS_NAME>.INTENT.

For each of the intent classes, an annotation is generated indicating what confidence the model has in that intent being the correct one. The annotation also defines the order of likelihood of a given class, denoted by the confidence score.

Intent classes and confidence scores

ML model training

The Machine Learning model is trained in the Teneo Learn component directly from Teneo Studio. Teneo Learn automatically selects the Machine Learning pipeline used for training, using an algorithm that assigns a pipeline optimized for the amount of training data provided. Every change to any of the classes in the Class Manager, including the addition or removal of classes, starts a new training of the solution's machine learning model in Teneo Learn as soon as the user saves the changes.

Input processing

Tokenization

For the Intent Classifier to generalize well on unseen user inputs, user inputs are pre-processed in such a way that irrelevant features such as punctuation characters or casing do not have an impact on the machine learning prediction. In general, this means that user inputs are lowercased, and certain punctuation characters are removed. The following table contains several user inputs in English that will be normalized to the same form and as such the Intent Classifier will make the same prediction for them.

Inputs
Hello, how are you?
Hello. How are you?
HELLO HOW ARE YOU??
hello how are you
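
The effect of this normalization can be illustrated with a minimal Python sketch. It is not the Teneo implementation: it assumes that all ASCII punctuation is removed (the text above only says "certain punctuation characters") and the function name normalize is hypothetical.

```python
import string


def normalize(user_input: str) -> str:
    """Lowercase the input, drop ASCII punctuation, and collapse whitespace.

    Assumption: removing every ASCII punctuation character is a
    simplification of the language-specific processing done by Teneo.
    """
    lowered = user_input.lower()
    without_punct = lowered.translate(str.maketrans("", "", string.punctuation))
    return " ".join(without_punct.split())


inputs = [
    "Hello, how are you?",
    "Hello. How are you?",
    "HELLO HOW ARE YOU??",
    "hello how are you",
]

# All four inputs collapse to a single normalized form.
normalized = {normalize(text) for text in inputs}
print(normalized)
```

Because the four inputs share one normalized form, a classifier that only sees the normalized string has to return the same prediction for all of them.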

It is vital that the exact same pre-processing that will be applied to the user inputs at runtime for the intent prediction is applied to the training data that is used to train the Intent Classifier/ML model. This means that pre-processing happens in two places: once for the training data in Teneo Studio and once at prediction time when the model is called, in the Teneo Predict Input Processor.

Pre-processing

For the pre-processing, the language-specific Teneo Input Processors (IPs) are used since they are tailored for the needs of each language. The final pre-processing string will be a concatenation using a single whitespace character " " of all the FINAL word forms of all sentences for a user input.

Note that the order of the input processors in the chain matters and all input processors that modify ORIGINAL, SIMPLIFIED or FINAL word form before the Predict Input Processor will have an impact on the actual prediction. Input Processors that are placed after the Predict IP in the chain do not affect the intent classification.
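
As a rough illustration of how the pre-processing string is built, the following Python sketch joins the FINAL word forms of every sentence with a single space. The Word class and the preprocessing_string function are hypothetical names introduced for this example only; they are not part of any Teneo API.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Word:
    """Hypothetical container for the three word forms of a token."""
    original: str
    simplified: str
    final: str


def preprocessing_string(sentences: List[List[Word]]) -> str:
    """Concatenate the FINAL forms of all words in all sentences."""
    return " ".join(word.final for sentence in sentences for word in sentence)


# Two sentences, as they might look after the Input Processors placed
# before the Predict Input Processor have run (illustrative values).
sentences = [
    [Word("Hello", "hello", "hello")],
    [Word("How", "how", "how"), Word("are", "are", "are"), Word("you", "you", "you")],
]

result = preprocessing_string(sentences)
print(result)
```

Only the final attribute is read here, which is why any Input Processor that changes the FINAL word form before prediction changes the string the classifier sees.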

Configuration

For each language, the chain of input processors that is used to normalize the data is defined explicitly in the language-specific Input Processor configuration file. That configuration file (config.properties) can be found and modified when exporting and re-importing a custom input processor setup (custominputprocessorsetup.zip). See the custom input processor configuration section for more information on how to do this.

The configuration file contains a mandatory property, languageproperty.normalizationIPs, that takes as its argument an ordered, comma-separated list of the Input Processors that will be applied when normalizing the data. At least one Input Processor must be provided.

It is important that the Input Processors listed in the property languageproperty.normalizationIPs occur in the same order, before the Predict Input Processor, in the definition of the input processor chain (inputProcessorHandler.inputProcessor.class.). In addition, Input Processors that modify the FINAL word form must not occur before the Predict Input Processor in the chain unless they are also listed, in the right order, in the property languageproperty.normalizationIPs.
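
The ordering requirement can be checked mechanically. The Python sketch below is a hypothetical helper, not a Teneo tool: it takes the chain and the normalization list as plain lists of names and reports ordering problems. The names "Predict" and "Annotator" in the example chain are placeholders, and the sketch does not attempt to detect which other Input Processors modify the FINAL word form.

```python
from typing import List


def check_normalization_order(
    chain: List[str], normalization_ips: List[str], predict_ip: str
) -> List[str]:
    """Return a list of problems; an empty list means the order is consistent."""
    problems: List[str] = []
    if not normalization_ips:
        problems.append("at least one normalization Input Processor is required")
    if predict_ip not in chain:
        problems.append(f"{predict_ip} is not in the chain")
        return problems

    # Only the part of the chain before the Predict IP matters here.
    before_predict = chain[: chain.index(predict_ip)]
    positions = []
    for name in normalization_ips:
        if name in before_predict:
            positions.append(before_predict.index(name))
        else:
            problems.append(f"{name} does not occur before {predict_ip}")
    if positions != sorted(positions):
        problems.append("normalization Input Processors occur in a different order in the chain")
    return problems


# Placeholder chain: the last two names are illustrative only.
chain = ["StandardSplitting", "StandardAutoCorrection", "Predict", "Annotator"]

ok = check_normalization_order(chain, ["StandardSplitting", "StandardAutoCorrection"], "Predict")
swapped = check_normalization_order(chain, ["StandardAutoCorrection", "StandardSplitting"], "Predict")
print(ok, swapped)
```

A check of this kind could be run on a configuration before re-importing it, so that training-time and runtime normalization do not drift apart.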

Note that the Simplifier defined in the configuration file is always run when normalizing the data and usually affects the FINAL word form and thus the normalized output unless otherwise defined in the input processors.

Standard configuration

The following table shows the input processors that are used by default to normalize the data. Again, note that the Simplifiers are also applied by default.

Language                               Input Processors
Standard (all languages except below)  StandardSplitting, StandardAutoCorrection
Chinese                                ChineseTokenizerIP
Finnish                                StandardSplitting, FinnishSplitting, StandardAutoCorrection
Japanese                               JapaneseTokenizer, JapaneseConcatenator
Turkish                                TurkishAnalyzer

Please see the language-specific information in the Input Processors section.

Generation of annotations

If there are Classes with a confidence value over the confidence threshold defined in the solution, the Intent Trigger whose Class Match Requirement corresponds to the Class with the highest confidence value will be triggered. The Input section in the Tryout window displays the triggered Class name and the confidence score.

Teneo generates annotations for a maximum of five top intent classes. An annotation is only created if the difference between the confidence of the intent and the confidence of the top intent is less than the confidence of the top intent divided by two. For example, imagine that the Machine Learning model predicts these top five intent classes for a particular user input:

A 0.14
B 0.11
C 0.06
D 0.05
E 0.03

Teneo will only generate the following annotations:

A.TOP_INTENT confidence 0.14
A.INTENT confidence 0.14
B.INTENT confidence 0.11

In this example, there are no annotations for C, D, or E. The reason is that the confidence values of C (0.06), D (0.05) and E (0.03) are lower than the confidence of the top intent (0.14) divided by two (0.07), so each of them differs from the top intent by more than 0.07.
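
The rule can be written down as a short Python sketch. This is only an illustration of the rule as described above, not the Teneo implementation; the function name generate_annotations and the constant MAX_INTENTS are hypothetical.

```python
from typing import Dict, List, Tuple

MAX_INTENTS = 5  # at most five top intent classes are considered


def generate_annotations(scores: Dict[str, float]) -> List[Tuple[str, float]]:
    """Return (annotation name, confidence) pairs for the given class scores."""
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)[:MAX_INTENTS]
    if not ranked:
        return []

    top_name, top_conf = ranked[0]
    annotations = [(f"{top_name}.TOP_INTENT", top_conf)]
    for name, conf in ranked:
        # Keep an intent only if it is closer to the top intent than
        # half of the top intent's confidence.
        if top_conf - conf < top_conf / 2:
            annotations.append((f"{name}.INTENT", conf))
    return annotations


scores = {"A": 0.14, "B": 0.11, "C": 0.06, "D": 0.05, "E": 0.03}
annotations = generate_annotations(scores)
for name, conf in annotations:
    print(name, conf)
```

With the scores from the example, the sketch yields A.TOP_INTENT, A.INTENT and B.INTENT, matching the list above.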

Global Confidence threshold

The Confidence threshold of the machine learning Classifier determines the minimum confidence value the model must assign to a Class for a Class Match Requirement to activate its trigger. The confidence threshold is a numeric value between 0 and 1; the default is 0.45.

The Confidence threshold can be modified in the Solution Properties.
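
The following Python sketch shows how such a threshold could be applied to the model output. It is an assumption-laden illustration: the function name class_to_trigger and the class names in the usage lines are hypothetical, and whether a confidence exactly equal to the threshold counts as passing is not stated here, so the sketch assumes that it does.

```python
from typing import Dict, Optional

DEFAULT_CONFIDENCE_THRESHOLD = 0.45  # default value stated above


def class_to_trigger(
    scores: Dict[str, float],
    threshold: float = DEFAULT_CONFIDENCE_THRESHOLD,
) -> Optional[str]:
    """Return the highest-scoring class if it reaches the threshold, else None."""
    if not 0 <= threshold <= 1:
        raise ValueError("the confidence threshold must be between 0 and 1")
    if not scores:
        return None
    best = max(scores, key=lambda name: scores[name])
    # Assumption: a confidence equal to the threshold is accepted.
    return best if scores[best] >= threshold else None


# Low-confidence prediction: no class passes the default threshold.
low = class_to_trigger({"A": 0.14, "B": 0.11})
# Hypothetical class names with a confident top prediction.
high = class_to_trigger({"BOOK_FLIGHT": 0.82, "CANCEL_FLIGHT": 0.10})
print(low, high)
```

Lowering the threshold lets less certain predictions through, while raising it makes Class Match Requirements match less often.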

Testing and improving Classes

Users can test and improve classes in various ways in Teneo Studio, for example:

  • Run Auto-test to see if the examples provided in the Intent Triggers trigger the expected triggers and Match Requirements
  • Run Class Performance to perform cross validation on the classes and training data of the solution
  • Test example inputs in the Tryout to ensure that specific inputs are triggering expected Intent Triggers with Class Match Requirements
  • Review inputs and classes in the Classifier once a solution has been published and has generated log data.
