Data footprint

Designing our bot to have an effective data footprint is a good idea from many perspectives. It lets us query the conversational log data much faster, puts a number of good practices into focus, and makes us conscious of what we're storing in the conversational log data, which helps from a privacy perspective.

Designed for Conversational Data

Teneo Inquire — which is the analytics and data part of the Teneo platform — along with all of Teneo, is built to perform well on conversational data. Teneo is not designed to store large chunks of binary data or large and unique JSON structures.

Typical conversational data sessions in the Teneo platform range from 50 kb to 400 kb. Very large sessions, for example long sessions with many turns of dialog, can be larger, at 400 kb to 800 kb.

Teneo has a limit of 1 mb per session.
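
To make these ranges concrete, here is a minimal Groovy sketch that sorts a session size into the bands described above. The function name `classifySessionSize` and the band labels are hypothetical and not part of Teneo, and the sketch assumes 1 mb equals 1000 kb.

```groovy
// Hypothetical helper, not part of Teneo: sorts a session size (in kb)
// into the bands described above. Assumes 1 mb = 1000 kb.
def classifySessionSize(Number sizeKb) {
    if (sizeKb > 1000) return 'over limit'      // above the 1 mb session limit
    if (sizeKb > 800)  return 'close to limit'  // larger than even very long sessions
    if (sizeKb > 400)  return 'very large'      // long sessions with many turns
    return 'typical'                            // 400 kb or less, including sessions under 50 kb
}

[120, 650, 900, 1200].each { size ->
    println "${size} kb -> ${classifySessionSize(size)}"
}
```

A session that lands in the two upper bands is worth inspecting for non-conversational data.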

Teneo Inquire scales well with traffic for typical conversational data, meaning it scales well with regard to API calls per session.

Teneo Inquire scales less well with data outside of its purpose, including:

Conversational log data that includes a large amount of non-conversational data, such as big JSON payloads or large binary objects.
Very large conversational log data sessions that fall outside the normal ranges. This is often an indicator that the bot includes large chunks of non-conversational data.

Solution data footprint is key

An effective solution data footprint is a key indicator of a well-designed bot. It also greatly impacts the performance of Teneo Inquire, which affects how fast we can query our conversational log data and how quickly Teneo Studio can give us feedback, for example in the Optimization section.

Measuring the session size

You can use this script to get a sense of how large your session logs are. To do this, you need to download a Groovy file and upload it to your solution resources.

Download the following Groovy file: SessionSizeStatistics.groovy
Go to the solution backstage and select 'Resources'.
Select 'File' at the top.
Use 'Add' on the upper right to add the SessionSizeStatistics.groovy file. (Alternatively, you can drag and drop it.)
Set the 'Published Location' for this file from / to /script_lib. This ensures the file can be accessed using a Groovy script later.
Hit 'Save'.

The uploaded Groovy script can be used in several ways. Below are some examples that can be useful.

Script	Description
`println(SessionSizeStatistics.build(engineAccess).always())`	Continuously print session size (in kb) on Tryout
`println(SessionSizeStatistics.build(engineAccess))`	Only print warnings for sessions passing 0.5, 0.75, and 1 mb

We recommend placing these in an End dialog global script, and nowhere else, to keep the data footprint small.

Store session size in Variables

Each session size can be retrieved and stored in a global variable. The script to retrieve it depends on whether you want to access it in your Development (Dev) and Quality Assurance / Staging (QA) environments or in your Production (Prod) environment.

Note that session size is given in kb.

Here is how to do that:

Navigate to your solution backstage, then select 'Globals' and 'Variables'.
Create a new Global Variable called sessionSize and give it the value 0.
Save and navigate over to 'Scripts'.
Add a new 'End dialog' script and paste in the following snippet:

Environment	Script
Development (Dev) and Quality Assurance / Staging (QA)	`sessionSize = SessionSizeStatistics.build(engineAccess).size`
Production (Prod)	`sessionSize = SessionSizeStatistics.build(engineAccess).production().size`

Please be aware that the changes will not be live until you Publish your solution.

Retrieve Session size with TQL

You can then retrieve the session sizes with Teneo Query Language (TQL) using one of the following queries:

Query	Description
`la avg s.sv:n:sessionSize`	Get the average session size
`la s.sv:n:sessionSize as 'sessionSize' order by sessionSize desc`	List all the sessions in decreasing order
`la s.id, s.sv:n:sessionSize : s.sv:n:sessionSize >= 500`	List all sessions with a size of 0.5 mb or more
`ca s.sv:n:sessionSize : s.sv:n:sessionSize >= 500`	Count the number of sessions with a size of 0.5 mb or more
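
If you want to check several thresholds, the two filter queries above can be generated rather than retyped. The following is a minimal Groovy sketch that only builds the query strings. The helper name `sessionSizeQueries` and its parameters are hypothetical; the query text itself follows the table above and assumes the global variable is called sessionSize.

```groovy
// Hypothetical helper: builds the list and count queries from the table above
// for a given threshold in kb. It only produces strings; it does not run them.
def sessionSizeQueries(int thresholdKb, String variable = 'sessionSize') {
    def field = "s.sv:n:${variable}"
    [
        list : "la s.id, ${field} : ${field} >= ${thresholdKb}".toString(),
        count: "ca ${field} : ${field} >= ${thresholdKb}".toString()
    ]
}

def queries = sessionSizeQueries(750)
println queries.list
println queries.count
```

The resulting strings can be pasted into Teneo Studio or run with the Teneo Inquire Client.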

Good practices

Here are some good practices to follow when working on your data footprint.

Solution design

Avoid sending in large 'blobs' or strings representing objects of data as these are very costly. Instead, use integrations to call web services when you need to retrieve this data.
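
As an illustration of the difference this makes, the Groovy sketch below compares the size of a full web-service response with the size of the two values a dialog might actually need. The payload and its field names (`orderId`, `status`, `history`) are invented for this example; in a real solution the response would come from an integration instead of being built in the script.

```groovy
import groovy.json.JsonOutput
import groovy.json.JsonSlurper

// Invented example payload standing in for a web-service response.
def payload = JsonOutput.toJson([
    orderId: 'A-1001',
    status : 'shipped',
    history: (1..200).collect { [step: it, note: 'status update ' + it] }
])

// Keep only the values the dialog needs instead of storing the whole payload.
def parsed = new JsonSlurper().parseText(payload)
def orderId = parsed.orderId
def orderStatus = parsed.status

def fullBytes = payload.getBytes('UTF-8').length
def slimBytes = (orderId + orderStatus).getBytes('UTF-8').length
println "full payload: ${fullBytes} bytes, stored values: ${slimBytes} bytes"
```

Storing only `orderId` and `orderStatus` in variables keeps the bulky `history` list out of the session log.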

Teneo Inquire

Use Adorners! Adorners can be used to copy variables from event level to session level, which means that they will be faster to query for. You can read more about Adorners in the documentation and here in the Developers pages.
Use Aggregators! Aggregators are used to aggregate data, for example the amount of traffic towards the bot's key flows. These are incredibly fast to query against, and can be used to power dashboards. You can read more about Aggregators in the documentation and here in the Developers pages.
Use Sample! When your bot is successful and your datasets grow larger, TQL queries will take longer to run. To quickly design queries, you can use the sample command to ask Teneo to run your query over a small subset of sessions and return results. Read more about sampling in the TQL Reference.
- You can also use limit, but if the number of hits is small or if a mistake is made in the TQL query, you may end up waiting a long time for no results. Read more about limit in the TQL Reference and in the TQL Reference Guide in the Developers pages.
When you are working on reporting and analytics, it's a good idea to work on your Teneo Query Language queries in Teneo Studio as you have much more support there. However, it is recommended to use the Teneo Inquire Client to run long-running queries.
- Teneo Studio also gives you the possibility of sharing queries, which is a perfect way to save commonly used queries.
- You can publish queries, which are then easy to retrieve using the Teneo Inquire Client.
Do not wait to set up efficient reporting. Set it up in sprint 1 and extend it as you go. This will make sure things are done right from the beginning.