Teneo Developers

Anonymize user data

When dealing with bots, users are very often in a position to give out personal information. Keeping track of what personal data the bot may capture is therefore part of normal bot development. The Pre-logging script event can be used to anonymize personal data in a way that does not affect the functionality of the bot. With a Pre-logging script, we can:

  • Remove data before it leaves Teneo Engine
  • Redact, remove, or encrypt sensitive data

On this page, we will show you how to anonymize personal data with the following steps:

  • Create a Global variable that stores the values to redact.
  • Add a Post-processing script to capture relevant data, so that it can later be redacted.
  • Add an End-dialog script to sort the values to redact in descending order of length, so that replacing shorter strings doesn't break up longer strings.
  • Add a Pre-logging script to redact the values and remove the variable.

The first two sections show simple use cases of anonymizing personal data, while the last section describes more advanced use cases.


Anonymize a variable

For this example, we will replace the user's first name, which is stored in the variable Lib_sUserFirstName. This variable is located inside the Longberry Baristas solution and comes with the Teneo Dialogue Resources.

  1. While inside your solution, click on the 'Solution' tab.
  2. Click on 'Globals'.
  3. Select the 'Scripts' tab at the top.
  4. Add a new 'Pre-logging' script and name it Replace user name.
  5. Add the following line into the editing window: _.getDialogHistoryUtilities().replaceVariables(['Lib_sUserFirstName'], 'John Doe'). This automatically replaces the variable's value with 'John Doe' right before the conversation is logged.
  6. Hit 'Save'.


This does not affect the functionality of your solution: the variable is only replaced once the conversation is over and all tasks are done, right before the session is logged. The same is possible for outputs, metadata and much more; see the reference documentation for details.
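To make the point about functionality concrete, here is a plain Groovy model of what replacing a variable for logging means. It is a sketch under stated assumptions, not Teneo's implementation: the hypothetical helper replaceVariablesForLog builds a new map from a map of session variables, so the values your flows use at runtime stay unchanged while the logged copy contains the replacement. The variable coffeeType is made up for illustration.

```groovy
// Hypothetical model only (not the Teneo API): build the logged copy of a session's variables,
// replacing the listed variables and leaving the live map untouched
Map<String, Object> replaceVariablesForLog(Map<String, Object> variables, List<String> names, Object replacement) {
    // collectEntries creates a new map, so the original map is not modified
    variables.collectEntries { name, value -> [(name): names.contains(name) ? replacement : value] }
}

// Hypothetical session state; coffeeType is an illustrative variable name
def sessionVariables = [Lib_sUserFirstName: 'Alice', coffeeType: 'latte']
def loggedVariables = replaceVariablesForLog(sessionVariables, ['Lib_sUserFirstName'], 'John Doe')
```

After the call, loggedVariables holds 'John Doe' for Lib_sUserFirstName, while sessionVariables still holds the original name.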

Anonymize personal data

Pre-logging scripts are also powerful when you want to anonymize personal data that users mention during a conversation. Teneo can recognize many kinds of personal data thanks to its Named-entity recognizer and Part-of-speech annotation tags. In the following example, we will redact the names mentioned while communicating with our bot, using the PERSON.NER annotation tag.

Here's a list of the Part-of-speech (POS) and Named-entity recognition (NER) tags included in Teneo.

Create a global variable

As a first step, we need to create a new Global variable that will store the values we want to redact:

  1. In the Solution backstage, select 'Globals' followed by 'Variables'.
  2. Click 'Add'. A panel for specifying the new variable appears on the right-hand side.
  3. Name the variable toRedact, and set its initial value to an empty list: []. (Make sure to edit the "Value" field and not the "Description" field.)
  4. Hit 'Save'.

Add a Post-processing script

Next, add a Post-processing script that stores the relevant values in the global variable created in the previous step.

  1. Select the 'Scripts' tab at the top.
  2. Add a new 'Post-processing' script and name it Find items to redact.
  3. Add the following groovy script into the editing window:

```groovy
// Find names that have been mentioned by the user
_.inputAnnotations.getByName('PERSON.NER').each { person ->
  def sentence = _.sentences[person.sentenceIndex]
  def firstWord = sentence.words[person.wordIndices.min()];
  def lastWord = sentence.words[person.wordIndices.max()];
  def beginIndex = firstWord.beginIndex
  def endIndex = lastWord.endIndex
  // Select the items to redact
  def itemToRedact = [
    'historyIndex': _.dialogHistoryLength,
    'beginIndex': beginIndex,
    'endIndex': endIndex,
    'value': sentence.text.substring(beginIndex, endIndex)
  ]
  toRedact.add(itemToRedact)
}
```
  4. Hit 'Save'.

To store a different kind of value, replace PERSON.NER in line 2 with any other Part-of-speech (POS) or Named-entity recognition (NER) tag.
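The script relies on word indices: an annotation such as PERSON.NER covers one or more words, and the character span from the first word's beginIndex to the last word's endIndex gives the full value, for example a first and last name. The following plain Groovy sketch reproduces that calculation outside Teneo Engine. The Word class, the whitespace tokenizer and the helpers splitIntoWords and itemToRedact are hypothetical stand-ins for the engine's sentence and word objects, and endIndex is treated as exclusive, consistent with the substring call in the script.

```groovy
// Hypothetical stand-in for a Teneo word: only the text and its character offsets are modelled
class Word {
    String text
    int beginIndex   // index of the first character of the word
    int endIndex     // one past the last character, as for String.substring
}

// Simple whitespace tokenizer (an assumption; Teneo's own tokenization may differ, e.g. for punctuation)
List<Word> splitIntoWords(String sentence) {
    def words = []
    def matcher = sentence =~ /\S+/
    while (matcher.find()) {
        words << new Word(text: matcher.group(), beginIndex: matcher.start(), endIndex: matcher.end())
    }
    return words
}

// Same idea as the Post-processing script: span from the first to the last annotated word
Map itemToRedact(String sentenceText, List<Word> words, Collection<Integer> wordIndices, int historyIndex) {
    def firstWord = words[wordIndices.min()]
    def lastWord = words[wordIndices.max()]
    return [
        historyIndex: historyIndex,
        beginIndex: firstWord.beginIndex,
        endIndex: lastWord.endIndex,
        value: sentenceText.substring(firstWord.beginIndex, lastWord.endIndex)
    ]
}

def sentenceText = 'Hi, my name is John Doe'
def words = splitIntoWords(sentenceText)
// Suppose PERSON.NER annotated "John" and "Doe" (word indices 4 and 5 with this tokenizer)
def item = itemToRedact(sentenceText, words, [4, 5], 1)
```

Taking the minimum and maximum word index is what lets a multi-word name end up as a single value ('John Doe') rather than two separate words.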

Add a Pre-logging script

Finally, we will add a Pre-logging script to redact the values stored in the global variable.

  1. While inside the 'Scripts' tab, add a 'Pre-logging' script called Redact names.
  2. Add the following code into the editing window:

```groovy
// Replace the mentioned names with '*' characters in the logged user inputs
toRedact.each {
  _.dialogHistoryUtilities.redact(it.historyIndex, it.beginIndex, it.endIndex, '*' as char)
}
// Collect the stored names plus the user's first name
def values = [*toRedact.collect { it.value }, Lib_sUserFirstName]
// Replace those values in the bot's output texts
_.dialogHistoryUtilities.replaceResponseText(values, '****')
// Remove the variables from the log
_.dialogHistoryUtilities.removeVariables(['Lib_sUserFirstName', 'toRedact'])
```
  3. Hit 'Save'.
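The redact call masks a character range in a logged user input, while replaceResponseText handles the bot's answers. The plain Groovy sketch below models only the masking step. It assumes, rather than documents, that endIndex is exclusive and that every character in the range, spaces included, is replaced by the mask character; maskRange is a hypothetical helper, not part of the Teneo API.

```groovy
// Hypothetical model of masking one input text; assumes endIndex is exclusive
// and that every character in [beginIndex, endIndex) is replaced by maskChar
String maskRange(String text, int beginIndex, int endIndex, char maskChar) {
    text.substring(0, beginIndex) + (maskChar.toString() * (endIndex - beginIndex)) + text.substring(endIndex)
}

def input = 'Hi, my name is John Doe'
// 15 and 23 are the offsets of "John Doe" in this sentence
def masked = maskRange(input, 15, 23, '*' as char)
```

Under these assumptions the masked text keeps its original length, so the number of asterisks still reveals how long the name was.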

Make sure you return to Tryout and reload the engine before continuing with the next step.

Publish and test your bot

In order to see if our scripts work, you will need to publish your bot. Proceed as follows:

  1. Open the 'SOLUTION' tab in the solution's window.
  2. Select 'Publish' in the left sidebar.
  3. Click the 'Manage' button. In the drop-down, locate the 'Latest' section and choose 'Publish'.

You might see a warning saying "Publish to 'Default env' stopped with warnings."


This is nothing to worry about; the warning is shown when you publish your solution for the first time or when you have made certain global changes. To proceed, just check the checkbox 'Perform full application deployment on Try again' and click the 'Try again' button.

The publication may take a couple of minutes; the video below is sped up slightly. When it has finished, you'll receive a confirmation pop-up.

  4. Once published, click on the blue 'Open' icon. This will open the Teneo Web Chat in a new browser tab.

    Teneo Web Chat is also available in the Bots section in your team console. You need to publish the bot from Teneo Studio before using it.

  5. Click on the blue icon in the bottom right corner to open up the Teneo Web Chat window.

  6. Strike up a conversation with the bot, like:

      • Hi, my name is John Doe
      • Goodbye!
  7. Close the chat window to end the conversation.

Read the logs

Now return to Teneo Studio and open the Log Data Source to check whether the name has been redacted.

  1. Open the 'SOLUTION' tab in the solution's window.
  2. Select 'Optimization' in the left sidebar.
  3. Navigate to 'Log Data' and open up your source by clicking on the 'Manage' button followed by 'Open'.

A new window should now open. This is the Log Data window, described here. The next step is to open a new Session Viewer tab to retrieve the latest session.

  4. In the 'Session Viewer' section, click on 'New Session Viewer Tab'.
  5. Change the values to 'Start date' and 'Descending' to retrieve the most recent session.

You should see the following conversation. If not, please repeat the steps above.

(Image: session log with the personal data redacted)

Anonymize personally identifiable information (PII)

In the following example, we will redact the personal information mentioned while communicating with our bot, using regular expressions (regex) to do so.

Create global variables

We must first create a few global variables. Below is a list of all the essential variables to add; you can adjust the values to meet your requirements.

| Variable name | Description | Initial value |
|---|---|---|
| pLogAnnotationsToAnonymise | Annotations that should be anonymised or pseudonymised before being written to the logs | `[[ner: 'PERSON.NER', tag: '<person>'], [ner: 'EMAIL.NER', tag: '<email>'], [ner: 'ADDRESS.NER', tag: '<address>'], [ner: 'LOCATION.NER', tag: '<location>'], [ner: 'ZIP_CODE.NER', tag: '<postcode>'], [ner: 'IP.NER', tag: '<ip>']]` |
| pLogAnonymiseString | String used to replace log data if pLogIsAnonymise = true | `"XXXXXX"` |
| pLogIsAnonymise | If true, PII is anonymised (replaced with pLogAnonymiseString); if false, it is pseudonymised using the tag specified in pLogAnnotationsToAnonymise | `false` |
| pLogToRedact | List of contents to redact | `[]` |
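The choice between anonymising and pseudonymising comes down to picking the replacement text for each stored value. This plain Groovy sketch shows that choice using the initial values of pLogAnonymiseString and the PERSON.NER tag; replacementFor is a hypothetical helper and the sample sentence is made up. In the Pre-logging script, the same decision is made by the expression pLogIsAnonymise ? pLogAnonymiseString : pii.tag.

```groovy
// Hypothetical helper: the fixed anonymisation string, or the annotation's tag (pseudonymisation)
String replacementFor(Map pii, boolean isAnonymise, String anonymiseString) {
    isAnonymise ? anonymiseString : pii.tag
}

def pLogAnonymiseString = 'XXXXXX'
def pii = [value: 'John Doe', tag: '<person>']
def sentence = 'Hi, my name is John Doe'

// pLogIsAnonymise = false: the name is pseudonymised with its tag
def pseudonymised = sentence.replace(pii.value, replacementFor(pii, false, pLogAnonymiseString))
// pLogIsAnonymise = true: the name is replaced with the anonymisation string
def anonymised = sentence.replace(pii.value, replacementFor(pii, true, pLogAnonymiseString))
```

Pseudonymising keeps the kind of data visible in the logs (a person was mentioned), which can help when analysing conversations; anonymising hides even that.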

Add a Post-processing script

Next we add a Post-processing script to store the relevant values in the global variable pLogToRedact, created in the previous step.

  1. Select the 'Scripts' tab at the top.
  2. Add a 'Post-processing' script with a name like Find PII mentioned by user.
  3. Add the following groovy script into the editing window:

```groovy
// Find PII mentioned by the user
def ii = 1;

pLogAnnotationsToAnonymise.each { annotation ->
    println "working on annotation ${ii++}: ${annotation}";
    _.inputAnnotations.getByName(annotation.ner).each { item ->
        println "annotation item: " + item;
        // Declared before the try block so that they are still in scope in the catch block
        def firstWord = null;
        def lastWord = null;
        try {
            def sentence = _.sentences[item.sentenceIndex];
            println "sentence: " + sentence;
            firstWord = sentence.words[item.wordIndices.min()];
            lastWord = sentence.words[item.wordIndices.max()];
            def beginIndex = firstWord.beginIndex;
            def endIndex = lastWord.endIndex;
            // Save the item to redact; strLength is used to sort the list in the End dialog script
            def itemToRedact = [
                'historyIndex': _.dialogHistoryLength,
                'sentenceIndex': item.sentenceIndex,
                'beginIndex': beginIndex,
                'endIndex': endIndex,
                'value': sentence.text.substring(beginIndex, endIndex),
                'strLength': endIndex - beginIndex,
                'tag': annotation.tag
            ]
            pLogToRedact.add(itemToRedact)
            println 'Annotated ' + annotation + ": " + itemToRedact.value;
        } catch (Exception e) {
            println "Exception! " + e;
            println "Using annotation " + annotation.ner;
            println firstWord;
            println lastWord;
        }
    }
}
```
  4. Hit 'Save'.
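For each entry in pLogAnnotationsToAnonymise, the script stores the matched value together with its length (strLength) and the replacement tag. The following plain Groovy sketch reproduces that loop outside Teneo Engine. foundSpans is a hypothetical stand-in for _.inputAnnotations that holds character spans instead of word indices, the configuration is shortened to two entries, and the sample text is made up.

```groovy
// Shortened copy of the pLogAnnotationsToAnonymise configuration
def annotationsToAnonymise = [[ner: 'PERSON.NER', tag: '<person>'], [ner: 'EMAIL.NER', tag: '<email>']]

// Hypothetical stand-in for _.inputAnnotations: annotation name -> [beginIndex, endIndex] spans
def sentenceText = 'I am John Doe, mail me at john@example.com'
def foundSpans = ['PERSON.NER': [[5, 13]], 'EMAIL.NER': [[26, 42]]]

def toRedact = []
annotationsToAnonymise.each { annotation ->
    // Annotation types that were not found in the input contribute nothing
    (foundSpans[annotation.ner] ?: []).each { span ->
        int beginIndex = span[0]
        int endIndex = span[1]
        toRedact << [
            value: sentenceText.substring(beginIndex, endIndex),
            strLength: endIndex - beginIndex,
            tag: annotation.tag
        ]
    }
}
```

Because the tag travels with each stored value, the Pre-logging script can pseudonymise an e-mail address and a name differently without knowing which annotation produced them.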

Add an End dialog script

The next step is to add an End dialog script that sorts the global variable pLogToRedact in descending order of length. This ensures that replacing a shorter string does not break up a longer string that contains it, which would leave part of the longer string unredacted.

  1. While inside the 'Scripts' tab, add an 'End dialog' script with a name like Sort pLogToRedact.
  2. Add the following code into the editing window:

```groovy
if (pLogToRedact.size() > 1) {
    // Sort in descending order of length to ensure that replacement of shorter strings
    // doesn't break up longer strings and lead to lack of redaction
    pLogToRedact.sort { a, b ->
        b.strLength <=> a.strLength
    }
}
```
  3. Hit 'Save'.
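The following plain Groovy sketch shows why the order matters. Assume two inputs produced the stored values John and John Doe. Replacing the shorter value first turns the sentence into one in which the longer value no longer occurs, so Doe is left unredacted. The helper redactAll is hypothetical; it applies a word-boundary pattern similar to the one used in the Pre-logging script and quotes each value with java.util.regex.Pattern.quote.

```groovy
import java.util.regex.Pattern

// Hypothetical helper: replace each stored value in turn with its tag, using a word-boundary pattern
String redactAll(String text, List<Map> items) {
    items.inject(text) { acc, pii ->
        acc.replaceAll(/(?i)(?:\b|^)${Pattern.quote(pii.value)}(?:\b|$)/, pii.tag)
    }
}

def input = 'Hi, my name is John Doe'
// Hypothetical values stored from two different user inputs
def items = [
    [value: 'John', strLength: 4, tag: '<person>'],
    [value: 'John Doe', strLength: 8, tag: '<person>']
]

// Unsorted: "John" is replaced first, so "John Doe" no longer matches and "Doe" stays in the log
def unsortedResult = redactAll(input, items)
// Sorted by descending strLength, as in the End dialog script: the full name is replaced first
def sortedResult = redactAll(input, items.sort(false) { a, b -> b.strLength <=> a.strLength })
```

sort(false) returns a sorted copy here so both orders can be compared; the End dialog script sorts pLogToRedact in place.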

Add a Pre-logging script

Finally, add a Pre-logging script to redact the values stored in the global variable.

  1. While inside the 'Scripts' tab, add a 'Pre-logging' script and give it a name like Redact PII values.
  2. Add the following code into the editing window:

```groovy
public class preloggingHandler {

    // Mask the content of pLogToRedact itself (it holds the raw PII values); return all other values unchanged.
    // Parameter and return types are Object so that both String and ArrayList values can be passed in.
    public static Object maskVars(String varName, Object varValue) {
        if (varName == 'pLogToRedact') {
            return ['<redacted>'];
        } else {
            return varValue;
        }
    }
}

if (pLogToRedact) {

    try {
        pLogToRedact.each { pii ->
            def replaceWith = pLogIsAnonymise ? pLogAnonymiseString : pii.tag;
            // Quote the value so that regex metacharacters in it (e.g. '.' or '+' in an e-mail address) are matched literally
            String pattern = /(?i)(?:\b|^)${java.util.regex.Pattern.quote(pii.value)}(?:\b|$)/;
            _.getDialogHistoryUtilities().replaceUserInputText(text -> text.replaceAll(pattern, replaceWith));
            _.getDialogHistoryUtilities().replaceResponseText(text -> text.replaceAll(pattern, replaceWith));
        }
    } catch (Exception e) {
        println(e.getMessage());
    }

    // Remove request parameters
    _.getDialogHistoryUtilities().replaceRequestParameters(['userinput'], pLogAnonymiseString);
    _.getDialogHistoryUtilities().replaceRequestParameters(['channel'], 'anyChannel');

    // Mask variables
    try {
        _.getDialogHistoryUtilities().replaceVariables((varName, varValue) -> (varValue instanceof String || varValue instanceof ArrayList ? preloggingHandler.maskVars(varName, varValue) : varValue));
    } catch (Exception e) {
        // Write the error message into the logged 'userinput' parameter so it can be inspected in the logs
        _.getDialogHistoryUtilities().replaceRequestParameters(['userinput'], e.getMessage());
    }
}
```
  3. Hit 'Save'.
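The Pre-logging script builds a regular expression from every stored value. When a value is inserted into the pattern without quoting, characters such as + are interpreted as regex syntax, so the value may not match itself and stays in the log (an unbalanced bracket would even make the pattern fail to compile). This plain Groovy sketch, with a made-up e-mail address, compares an unquoted pattern with one built using java.util.regex.Pattern.quote.

```groovy
import java.util.regex.Pattern

String text = 'Contact me at john+doe@example.com'
String value = 'john+doe@example.com'

// Unquoted: '+' is read as a regex quantifier, so the address is not found and stays in the text
String unquoted = text.replaceAll(/(?i)(?:\b|^)${value}(?:\b|$)/, '<email>')
// Quoted: Pattern.quote makes every character of the value match literally
String quoted = text.replaceAll(/(?i)(?:\b|^)${Pattern.quote(value)}(?:\b|$)/, '<email>')
```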

Create customized annotations using Language Objects

Teneo Studio provides great flexibility for customization: you can create your own rules for recognizing and redacting personal information by adding customized annotations. In the following example, we will use the language object TITLES.LIST to capture a person's title, such as Mr or Mrs, and generate a customized annotation via a global Pre-listener.

  1. Open the 'SOLUTION' tab in the solution's window.
  2. Select 'Globals' in the purple bar on the left-hand side, and then select 'Listeners'.
  3. Click 'Add' and select 'Pre listener' in the drop-down list.
  4. Give the listener a name, for example Customize annotations by LO.
  5. Click the back arrow in the top left corner.
  6. Add the following condition in the TLML Syntax field:

```tlml
(%TITLES.LIST^{pLogAnnotLO = []; def tmpAnnot = [:]; tmpAnnot.put("name", 'TITLE'); tmpAnnot.put("sentenceIndex", _.sentenceIndex - 1); tmpAnnot.put("usedWordIndices", _.usedWordIndices); pLogAnnotLO << tmpAnnot})
~
(%TITLES.LIST^{def tmpAnnot = [:]; tmpAnnot.put("name", 'TITLE'); tmpAnnot.put("sentenceIndex", _.sentenceIndex - 1); tmpAnnot.put("usedWordIndices", _.usedWordIndices); pLogAnnotLO << tmpAnnot})
```
  7. Add the following code in the Execution Script field:

```groovy
// Turn every title captured by the condition into an input annotation
pLogAnnotLO.each { annotation ->
    _.inputAnnotations.add(_.createInputAnnotation(annotation.name.toUpperCase(), annotation.sentenceIndex, annotation.usedWordIndices, [:]))
}
```
  8. Save and close the listener.
  9. While inside the 'Globals' panel, select 'Variables'.
  10. Add a variable called pLogAnnotLO with an empty list, [], as its initial value.
  11. Select the pLogAnnotationsToAnonymise variable created earlier, and add [ner: 'TITLE', tag: '<title>'] to its list.
  12. Click 'Save All' on the left-hand side.

Create customized annotations using Regular Expressions

In addition to language objects, you can use regular expressions, which offer even more flexibility for creating customized annotations. In the following example, we will create a regular expression that captures an IBAN (International Bank Account Number) and generates a customized annotation, again via a global Pre-listener.

  1. Download and import the RegAnnotHelper.groovy file into your solution following this guide.
  2. Repeat steps 1-5 of the previous section, Create customized annotations using Language Objects, to create another global Pre-listener, and give it a different name, such as Customize annotations by Regex.
  3. Add %TRUE.SCRIPT (or {true}, if the language object TRUE.SCRIPT cannot be found) to the Condition field, which will allow this listener to be triggered with any input.
  4. Add the following code in the Execution Script field:

```groovy
RegAnnotHelper.annotateAnchoredRegex(_, 'IBAN', /\b[A-Za-z]{2}[0-9]{2}(?:[ ]?[0-9]{4}){4,5}/)
```

  5. Save and close the listener.
  6. Repeat steps 9, 11 and 12 of the previous section to add [ner: 'IBAN', tag: '<iban>'] to the global variable pLogAnnotationsToAnonymise.
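Before relying on the IBAN pattern, you can try it on sample text in plain Groovy. This sketch tests only the regular expression; RegAnnotHelper is not involved, and findIban is a hypothetical helper. The Spanish IBAN is the one used in the test conversation, and the British example IBAN shows a limitation of the pattern: it expects only digits after the country code and check digits, so formats with letters in the account part are not matched.

```groovy
// The IBAN pattern used in the Pre-listener, as a Groovy slashy string
def ibanPattern = /\b[A-Za-z]{2}[0-9]{2}(?:[ ]?[0-9]{4}){4,5}/

// Hypothetical helper: return the first IBAN-like substring, or null if there is none
def findIban = { String text ->
    def matcher = text =~ ibanPattern
    matcher.find() ? matcher.group() : null
}

def compact = findIban('My bank account is ES7921000813610123456789')
def spaced = findIban('IBAN: ES79 2100 0813 6101 2345 6789')
def british = findIban('GB29 NWBK 6016 1331 9268 19')
def noIban = findIban('My order number is 12345')
```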

Make sure you return to Tryout and reload the engine before continuing with the next step.

Publish and test your bot

In order to see if our scripts work, you will need to publish your bot. Proceed as follows:

  1. Open the 'SOLUTION' tab in the solution's window.
  2. Select 'Publish' in the left sidebar.
  3. Click the 'Manage' button. In the drop-down, locate the 'Latest' section and choose 'Publish'.

You might see a warning saying "Publish to 'Default env' stopped with warnings."


This is nothing to worry about; the warning is shown when you publish your solution for the first time or when you have made certain global changes. To proceed, just check the checkbox 'Perform full application deployment on Try again' and click the 'Try again' button.

The publication may take a couple of minutes; the video below is sped up slightly. When it has finished, you'll receive a confirmation pop-up.

  4. Once published, click on the blue 'Open' icon. This will open the Teneo Web Chat in a new browser tab.

    Teneo Web Chat is also available in the Bots section in your team console. You need to publish the bot from Teneo Studio before using it.

  5. Click on the blue icon in the bottom right corner to open up the Teneo Web Chat window.

  6. Strike up a conversation with the bot, like:

      • Hi, my name is John Doe
      • Please call me Doctor
      • I live in 585 Gran Via, Barcelona
      • My bank account is ES7921000813610123456789
      • Goodbye!
  7. Close the chat window to end the conversation.

Read the logs

Now return to Teneo Studio and open the Log Data Source to check whether the personal information has been redacted.

  1. Open the 'SOLUTION' tab in the solution's window.
  2. Select 'Optimization' in the left sidebar.
  3. Navigate to 'Log Data' and open up your source by clicking on the 'Manage' button followed by 'Open'.

A new window should now open. This is the Log Data window, described here. The next step is to open a new Session Viewer tab to retrieve the latest session.

  4. In the 'Session Viewer' section, click on 'New Session Viewer Tab'.
  5. Change the values to 'Start date' and 'Descending' to retrieve the most recent session.

You should see the following conversation. If not, please repeat the steps above.

(Image: session log with the personal information redacted)