Teneo Linguistic Modeling Language

Teneo supports a wide range of languages, meaning developers can create powerful bots in any one of those languages. Because of the significant variation across different languages, Teneo offers developers an approach called Teneo Linguistic Modeling Language. Teneo Linguistic Modeling Language covers several aspects of the solution design process that are used to define functionality based on linguistic methods, as opposed to solely using machine learning. Teneo Linguistic Modeling Language therefore allows developers to create a linguistically precise solution that takes into account the quirks and features of a specific language.

On this page, we will describe in-depth what Teneo Linguistic Modeling Language (referred to as TLML on this page) is, how it works, and how it can benefit a solution.

Language Capabilities

One of Teneo's biggest strengths is in the field of Natural Language Understanding (NLU), as a very wide range of different languages — no matter the complexity — can be fully supported using Match Requirements.

Classes

The most general approach in this field is to use machine-learned classes to build your flow. These classes have their training examples on which the model is based. When using a classifier, for example, you do not need to understand the exact characteristics that define what an input is about. You can focus on collecting example inputs for the subjects you want to cover and, once you have trained a model using that data, you simply let the model do its magic. That makes it relatively easy to train a system to understand natural language without the need for linguistic skills or resources.

Developers can use Teneo's native intent classifier or connect a different one to the solution. One especially powerful option is LUIS^Teneo, which allows you to combine the NLU power of Microsoft LUIS with the conversational power of Teneo. See LUIS^Teneo in 10 minutes to get started with it.

Teneo Linguistic Modeling Language

Teneo's unique and proprietary linguistic syntax, Teneo Linguistic Modeling Language (TLML), is one of many features that make Teneo stand out. The option to combine and build linguistic syntax conditions gives the user an opportunity to create a linguistically perfect solution. It is possible to decide on things like what order certain words should be in and if a word or sentence should be written in a specific way.

TLML gives users the option to create their own language objects to cover synonyms, phrases, and entities.

It’s also possible to create your own annotations that cover only certain forms of words. Common use cases in English are differentiating between singular and plural word forms (e.g. ‘ticket’ and ‘tickets’) using Part-of-Speech tags, and detecting e-mail addresses using Named Entity Recognition annotations.

Combining both approaches

Combining both approaches, also known as the 'Hybrid Approach', lets Teneo’s modular design use the most suitable machine learning technology: Teneo Learn is provided out of the box, and external technologies like Microsoft LUIS are easy to connect. This can be seen in our LUIS^Teneo approach.

Teneo has a unique, bespoke, and proprietary syntax that has been designed and field-proven over the years to suit specific Conversational AI needs. Teneo Linguistic Modeling Language is crafted precisely to cater to varied and unexpected input. It is able to capture only the relevant bits of the user input, as well as extract information that can be used later, either in responses or for conversation logic. This approach is designed to be much more efficient than other non-bespoke syntax, and it allows you to use the best technology combination to fit your language, business, or use case requirements.

The 'Hybrid Approach' uses a combination of Classes and TLML (linguistic syntax rules) to leverage the best aspects of both machine learning and linguistic rules: the precision of linguistic rules is combined with the flexibility of machine learning. This gives the developer the chance to cover a wide range of sentences while still being very precise. One practical example of combining both (Classes and Teneo Linguistic Modeling Language) for a hybrid approach can be found here.

Benefits of Teneo Linguistic Modeling Language

Teneo Linguistic Modeling Language (TLML) offers many benefits to developers and solutions. As a developer, you get to create your own TLML, giving you full control over what to focus on in an easy-to-maintain project designed to handle varied and unexpected input. Unlike approaches relying on machine-learned classes, you do not need large amounts of data to put together an efficient and accurate bot using the TLML method.

For certain languages, TLML will be especially beneficial. For example, agglutinative languages tend to perform better with TLML as compared to machine learning. These languages' morphology uses agglutination, meaning the same morpheme may be used in many different words with completely different meanings by use of various affixes; for example, in Indonesian, which is an agglutinative language, the words for "student" and "teacher" consist of the same root word and differ only in affixes used. While a machine learning algorithm would likely fail to correctly differentiate between these two words, the linguistic approach of TLML would make it easy to ensure such cases are handled correctly.

Using TLML can make a good solution great, regardless of the language used. Many projects contain use cases that are very specific to that particular project. In these cases, too, TLML can be very beneficial, as the developer can ensure these use cases are covered.

Conditions

Teneo Linguistic Modeling Language (TLML) is used in Syntax Conditions, which allow developers in Teneo to define their own language rules according to their projects' requirements, rather than having to rely solely on machine learning. In practice, this is done using Teneo's Natural Language Understanding (NLU) capabilities by defining conditions within flows to deal with user input.

For an example of how NLU in conditions can be used to improve TLML, consider the following image. In this condition, the developer has typed the plain text "can i bring", and from this Teneo is able to suggest several different language objects or other annotations that the developer can then use within the condition. Each of these language objects covers multiple variations of the same phrase, which reduces the work of collecting training data for the intent:

[Image: Teneo suggesting language objects and annotations for the plain text "can i bring"]

  • See Libraries for more information on this topic.

Teneo Linguistic Modeling Language has its own syntax for writing conditions, which relies on Operators. Operators let a single condition cover different variations of a sentence or, for example, require words to appear in a specific order.

The following are a few of the operators available in Teneo.

OR ( / )

When conditions are combined with the OR-operator (/), at least one of them needs to match the input.

Condition                                   Matched User Input         Unmatched User Input
dog / cat / hamster / parakeet / goldfish   I have a dog               I don't like cats or dogs
                                            I used to have a cat
                                            I don't have a parakeet
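
The table above can be reproduced with a small Python sketch. It only models the OR semantics and is not how Teneo evaluates conditions; the `tokenize` and `matches_or` helpers are hypothetical, and whole-word, case-insensitive matching is an assumption.

```python
import re


def tokenize(text: str) -> list[str]:
    # Lowercase the input and split it into word tokens (apostrophes stay inside words).
    return re.findall(r"[a-z0-9']+", text.lower())


def matches_or(alternatives: list[str], user_input: str) -> bool:
    # OR: at least one alternative must appear as a whole word in the input.
    tokens = set(tokenize(user_input))
    return any(word.lower() in tokens for word in alternatives)


pets = ["dog", "cat", "hamster", "parakeet", "goldfish"]
print(matches_or(pets, "I have a dog"))               # True
print(matches_or(pets, "I don't like cats or dogs"))  # False: only plural forms occur
```

Under this whole-word assumption, the plural forms 'cats' and 'dogs' do not match the singular words in the condition, which is why the last input stays unmatched.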

AND ( & )

Conditions that are combined with the AND-operator (&) must all match the input. The AND-operator does not require the words in the input to appear in a certain order.

Condition        Matched User Input               Unmatched User Input
I & love & you   I love you                       everyone loves you
                 you are the person whom I love
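
As a rough illustration of the AND semantics, the following Python sketch checks that every word is present regardless of order. The helper names are hypothetical and whole-word, case-insensitive matching is an assumption; Teneo's own matching is not shown here.

```python
import re


def tokenize(text: str) -> list[str]:
    # Lowercase the input and split it into word tokens.
    return re.findall(r"[a-z0-9']+", text.lower())


def matches_and(required: list[str], user_input: str) -> bool:
    # AND: every required word must appear somewhere in the input, in any order.
    tokens = set(tokenize(user_input))
    return all(word.lower() in tokens for word in required)


condition = ["I", "love", "you"]
print(matches_and(condition, "I love you"))                      # True
print(matches_and(condition, "you are the person whom I love"))  # True
print(matches_and(condition, "everyone loves you"))              # False
```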

FOLLOWED BY ( + )

Conditions that are combined with the FOLLOWED BY-operator (+) must all match the input, in the order stated. The operator is a good choice for word combinations whose sense would shift if the word order were rearranged.

Condition             Matched User Input                      Unmatched User Input
John + loves + Anna   John loves Anna                         Anna loves John
                      John really loves his girlfriend Anna
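
The examples in the table suggest that the words must appear in the stated order but may have other words between them. A minimal Python sketch of that behavior, using hypothetical helper names and assuming whole-word, case-insensitive matching, is an ordered subsequence check:

```python
import re


def tokenize(text: str) -> list[str]:
    # Lowercase the input and split it into word tokens.
    return re.findall(r"[a-z0-9']+", text.lower())


def matches_followed_by(sequence: list[str], user_input: str) -> bool:
    # FOLLOWED BY: each word must be found after the previous one.
    # Testing membership on an iterator consumes it up to the match,
    # so later words can only match later positions.
    remaining = iter(tokenize(user_input))
    return all(word.lower() in remaining for word in sequence)


condition = ["John", "loves", "Anna"]
print(matches_followed_by(condition, "John loves Anna"))                        # True
print(matches_followed_by(condition, "John really loves his girlfriend Anna"))  # True
print(matches_followed_by(condition, "Anna loves John"))                        # False
```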

NOT (!)

The NOT-operator (!) takes only one condition, and states that whatever is specified in that condition must not match the user input.

Condition   Matched User Input   Unmatched User Input
!cat        I have a dog         I don't have a cat
            I love cats          Are you a cat?
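
The NOT semantics can be sketched in Python in the same spirit. The helpers are hypothetical and the whole-word, case-insensitive matching is an assumption, not a description of the Teneo Engine.

```python
import re


def tokenize(text: str) -> list[str]:
    # Lowercase the input and split it into word tokens; punctuation is dropped.
    return re.findall(r"[a-z0-9']+", text.lower())


def matches_not(word: str, user_input: str) -> bool:
    # NOT: the condition holds only when the word is absent from the input.
    return word.lower() not in tokenize(user_input)


print(matches_not("cat", "I have a dog"))        # True
print(matches_not("cat", "I love cats"))         # True: 'cats' is a different token
print(matches_not("cat", "I don't have a cat"))  # False
print(matches_not("cat", "Are you a cat?"))      # False
```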

Conditions can be coded manually, or Teneo Studio can automatically generate a condition with the 'Draft Condition' function. This drafted condition is based on the positive examples for that node.

Natural Language Processing

Teneo's powerful Natural Language Processing capabilities are key to the effectiveness of Teneo Linguistic Modeling Language (TLML). Teneo processes user input accurately and efficiently and annotates it with useful information, from Part-of-Speech (POS) tags to Named Entity Recognition (NER) tags. These tags can also be used inside conditions, allowing developers to check for the presence of certain POS or NER tags in a user's input.

If you want to see how an input is annotated, enter the input in Try Out and open the 'Advanced' window. You will find all annotations in the Input section under 'Annotations'. Hovering over an annotation will display additional information that is stored in annotation variables.

Annotations in Try Out

Teneo offers the option for developers to create their own annotations for words. Please see Annotate user inputs to read more on this.

Part-of-Speech tags

The machine-learned Part-of-Speech tagger (or POS tagger) that comes with Teneo attaches one or more POS tags to each word in the input. POS taggers are available for Chinese, Danish, Dutch, English, French, German, Italian, Japanese, Korean, Spanish, Swedish, and Turkish. Each language has its own list of relevant Part-of-Speech tags; the entire list for each language can be found here. Here's a list of the English POS tags:

Condition Description
%$3RDPERSON.POS Verbs in third person
%$ADJ.POS Adjectives
%$ADV.POS Adverbs
%$BRACKET.POS Brackets {}()[]
%$CARDINAL.POS Cardinal numbers
%$CC.POS Coordinating conjunctions
%$COMPARATIVE.POS Comparative adverbs and adjectives
%$DET.POS Determiners
%$EX.POS Existential 'there'
%$FOREIGN.POS Foreign words
%$GERUND.POS Gerund verbs
%$INF.POS Verbs in the infinitive
%$INTERJ.POS Interjections
%$INTERROG.POS Interrogative (question) words, function words used to begin a wh- question, like what, who, when
%$LS.POS List item markers
%$MODAL.POS Modal verbs
%$NN.POS Nouns
%$PARTICIPLE.POS Verbs in past participle
%$PARTICLE.POS Particles
%$PAST.POS Verbs in past tense
%$PERS.POS Personal pronouns
%$PL.POS Nouns in plural
%$POSITIVE.POS Positive adverbs and adjectives
%$POSS.POS Nouns and pronouns expressing possession
%$PREDET.POS Predeterminers
%$PREP.POS Prepositions
%$PRESENT.POS Verbs in the present tense
%$PRON.POS Pronouns
%$PROPER.POS Proper nouns
%$PUNCT.POS Punctuation marks such as .,:!"
%$SG.POS Singular nouns
%$SUPERLATIVE.POS Superlatives
%$SYM.POS Symbols
%$VB.POS Verbs

Many of the POS tags are made available as Language Objects with the extension .ANNOT. While you can use POS tags directly in your condition, it is better to use the equivalent language objects instead. The ANNOT language objects can, for example, group multiple similar POS tags.

Named Entities

Teneo comes with machine-learned Named Entity Recognizers (NERs) for English, French, German, Italian, Japanese, Spanish, Swedish, and Turkish. Here's a list of examples:

Condition Description Examples
%$ADDRESS.NER Street names and street numbers when applicable I live on 2967 Washington st.
What time does the office on Birch avenue open?
I recently moved to 23 Main street.
%$CULTURAL_GROUP.NER Nationalities, ethnic groups, religious groups, etc. Do you accept students from German universities?
I need to order special food on the flight, I am Muslim.
Mum is cooking great Arabic food.
%$DATE.NER Full dates (year, month, day) I was born on April 7th, 1991.
The ticket was issued 23 Jan 2017.
I have been a customer since 14-03-2015.
%$EMAIL.NER Email addresses E-mail it to john.doe@artificial-solutions.com
My email is tester at yahoo dot com
%$EVENT.NER Events, national holidays, treaties, awards/prizes, etc. Who won the Oscar's last night?
Where are you celebrating Christmas this year?
The treaty of Versailles was signed in 1919.
%$FACILITY.NER Buildings, airports, train stations, public transport lines, hospitals, etc. I want to fly from Newark to LAX.
Have you ever visited the Empire state building?
What year was the Brooklyn Bridge built?
%$FICTIONAL_FIGURE.NER Fictional figures, cartoons, etc. Who do you think you are, Superman?
He was wearing a black coat, a bit like Batman’s.
I don't believe in Santa Claus.
%$GEOGRAPHY.NER Non-political geographical entities: mountains, bodies of water, planets, moons, suns, etc. How many rings does Saturn have in total?
Who was the first one to climb Mount Everest?
How big is the Sahara Desert?
%$IP.NER IP addresses Connect to 10.1.2.148
%$KNOWN_FIGURE.NER Known, public (non-fictional) figures Barack Obama was the President of the United States for 8 years.
I am a big fan of Meryl Streep.
Play a song from Justin Bieber.
%$KNOWN_GROUP.NER Known groups and bands Play a nice song by The Beatles.
I liked the latest Coldplay album.
Another song from Metallica please.
%$LANGUAGE.NER Languages Do you have this document in Swedish?
Translate ‘hello’ to Hindi.
I speak three languages: Polish, English and Urdu.
%$LOCATION.NER Geopolitical locations (humans made the borders), including: continents, countries, states, counties, cities, neighborhoods, etc. I was born in California.
I want to fly from Paris to London.
What time does the ferry to Staten Island depart?
%$MED_CHEM.NER Medical/chemicals: chemical substances, named diseases, and drugs Is there a vaccine for Malaria?
Who really discovered the Penicillin?
Find me information about Iodine.
%$MISC.NER Miscellaneous: named entities that fit none of the other categories The Tyrannosaurus Rex could be over 6 meters high.
What happened to Obamacare?
What is the Erasmus programme?
%$ORGANIZATION.NER Organizations: companies or a division of a company, universities, schools, embassies, religious organizations, political parties, etc. I work at Apple.
What is the e-mail address to Artificial-Solutions?
I studied at Harvard University.
%$PERSON.NER Names of persons, including titles and surnames if present My name is Mrs Johnsson.
The ticket is booked in the name Joanne Stevens.
Send a text to John please.
%$PRODUCT.NER Product or brand names or organization names used as products in the context I drive a BMW.
How much is the new iPhone X?
Can you open Facebook and make a post for me?
%$SPORTS.NER Sports teams, sports organizations, sporting events Miami Dolphins is a good team!
I want to watch the winter Olympics.
Who won the FIFA world cup?
%$UNIQUE_IDENTIFIER.NER Unique identifiers: product numbers, phone numbers, user names, member numbers, etc. My phone number is 123-44 43 33.
I have a bonus card with number ebb123523111.
When will 103-121-111 be in stock again?
%$UNIT.NER Established units with value, including but not limited to: duration, age, temperature, percentage, information (bits) weight, volume, speed, length, currency My phone has 64 GB.
This flight has cost me almost $600.
We have more than 750 miles to go.
%$URL.NER URLs (also partial / truncated URLs) Open google.com
Go to page www.test.com
Take me to cnn dot com
%$WORK_OF_ART.NER Works of art: songs, albums, movies, books, video games, etc. Do you read the Bible?
Play Diamonds with Rihanna.
Download Frankenstein movie.
%$ZIP_CODE.NER Zip codes My zip is V0G 1Y0.
My address is 24 Main street 23212 New York.

Many of the NERs are made available as entities with the extension .ENTITY. While you can use the NER tags directly in your condition, it is often better to use the entity instead of the annotation, since entities can combine NERs and language objects to provide the best coverage.

Scripting

Teneo Linguistic Modeling Language (TLML) is a powerful tool when personalizing conversations. With minimal effort, it is possible to create a personalized experience for a user based on the interactions they have. This in turn gives a better bot-customer relationship, where users can feel more confident about the bot.

One way of personalizing conversations is through scripting. Scripting makes it easy to retrieve all required information from user input and to store that information in variables. The easiest way to do this is using the _USED_WORDS command, which picks out the relevant parts of the input. In the following example, _USED_WORDS is used in two conditions to store the user's name in the variable userNameForOrder. In one condition, the user's name has been recognized and annotated with a NER tag, and in the other, the name has not been annotated; in both cases, _USED_WORDS makes it simple to store the required information:

Use person annotation in combination with used words

This is also demonstrated within this conversation that takes place in one of our Channel connectors, Teneo Web Chat.

  • See Scripting to read more about this topic.
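
The idea behind the two conditions described above (name annotated with a NER tag versus name not annotated) can be approximated in Python. This is only a sketch: `capture_name` and `person_indexes` are hypothetical, the fallback pattern "my name is" is an assumption about what the second condition might look like, and the returned string merely stands in for what _USED_WORDS would store in userNameForOrder.

```python
from typing import Optional


def capture_name(tokens: list[str], person_indexes: set[int]) -> Optional[str]:
    # Case 1: some tokens carry a (hand-supplied) person annotation;
    # the "used words" are exactly the annotated tokens.
    if person_indexes:
        return " ".join(tokens[i] for i in sorted(person_indexes))
    # Case 2: nothing was annotated; fall back to the words after "my name is".
    lowered = [t.lower() for t in tokens]
    for i in range(len(tokens) - 3):
        if lowered[i:i + 3] == ["my", "name", "is"]:
            return " ".join(tokens[i + 3:])
    return None


# Annotated name: positions 3 and 4 were tagged as a person.
user_name_for_order = capture_name("My name is Joanne Stevens".split(), {3, 4})
print(user_name_for_order)  # Joanne Stevens

# Unknown name that no recognizer tagged.
print(capture_name("my name is Zorblax".split(), set()))  # Zorblax
```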

When should I use Teneo Linguistic Modeling Language?

Teneo Linguistic Modeling Language (TLML) is useful in every aspect of your solution, no matter how complex or simple the use case. As the concept itself is not dependent on training data, developers can build their own rules for any single use case, something that is not achievable when using only machine learning.

An example of a use case that would benefit from TLML is one in which you need to combine class-based triggers with additional rules. For example, the inputs "I want to cancel my trip" and "You canceled my trip" are very similar in terms of words used and structure, but very different in terms of meaning. This is a clear example of a limitation of approaches relying on machine learning only; strictly machine-learned bots would not be able to consistently and accurately parse such inputs. With the TLML approach, on the other hand, Teneo can handle inputs like these and understand the differences between them.
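
A hedged Python sketch of this combination follows. The `classify` function is a hypothetical stand-in for a trained intent classifier (here a trivial keyword check), and the intent and route names are invented for illustration; the point is only that an ordered word rule can separate two inputs that receive the same class.

```python
import re


def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z0-9']+", text.lower())


def classify(user_input: str) -> str:
    # Hypothetical stand-in for a machine-learned class: both example
    # inputs land in the same class because they share vocabulary.
    tokens = set(tokenize(user_input))
    if tokens & {"cancel", "canceled"} and "trip" in tokens:
        return "CANCEL_TRIP"
    return "UNKNOWN"


def followed_by(sequence: list[str], tokens: list[str]) -> bool:
    # Ordered match: each word must occur after the previous one.
    remaining = iter(tokens)
    return all(word in remaining for word in sequence)


def route(user_input: str) -> str:
    tokens = tokenize(user_input)
    if classify(user_input) != "CANCEL_TRIP":
        return "fallback"
    # Linguistic rules refine the class-based trigger.
    if followed_by(["i", "want", "cancel"], tokens):
        return "request_cancellation"
    if followed_by(["you", "canceled"], tokens):
        return "complaint_about_cancellation"
    return "cancellation_other"


print(route("I want to cancel my trip"))  # request_cancellation
print(route("You canceled my trip"))      # complaint_about_cancellation
```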

Another example of a use case that benefits from TLML is the distinction between singular and plural word forms. A machine-learning-based model may not reliably differentiate between the words 'ticket' and 'tickets' in user input. Using TLML, however, makes it simple to distinguish between the two. It is a useful addition in complex cases, no matter the language.
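
To illustrate how a tag-based check separates the two forms, here is a small Python sketch. The annotations are written by hand and only stand in for the output of a POS tagger; the tag names come from the English POS list on this page, but which tags a real tagger assigns to each word is an assumption.

```python
# Hand-written (word, tags) pairs standing in for tagger output (assumption).
SINGULAR_INPUT = [
    ("i", {"PERS.POS"}),
    ("need", {"VB.POS"}),
    ("a", {"DET.POS"}),
    ("ticket", {"NN.POS", "SG.POS"}),
]
PLURAL_INPUT = [
    ("i", {"PERS.POS"}),
    ("need", {"VB.POS"}),
    ("two", {"CARDINAL.POS"}),
    ("tickets", {"NN.POS", "PL.POS"}),
]

TICKET_FORMS = {"ticket", "tickets"}


def has_tagged_word(tokens, words, tag):
    # True if any token is one of `words` and also carries `tag`.
    return any(word in words and tag in tags for word, tags in tokens)


print(has_tagged_word(SINGULAR_INPUT, TICKET_FORMS, "PL.POS"))  # False
print(has_tagged_word(PLURAL_INPUT, TICKET_FORMS, "PL.POS"))    # True
```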

Teneo Linguistic Modeling Language works exceptionally well when you are working with morphologically complex languages, such as agglutinative languages, and/or when you are creating triggers for very specific use cases. As agglutinative languages make frequent use of affixes, a change in meaning may come down to one or two letters, which is much harder for a machine-learned trigger to differentiate than a change of whole words in a non-agglutinative language.

Let's consider English and Turkish, with the example sentence "I like you". When adding a negation to this sentence, it becomes "I don't like you". In order to achieve this in English, we only needed to add a single word: "don't". For a machine-learned trigger, this is an easily recognizable change and would therefore affect the triggering of this particular sentence.

In a morphologically complex language like Turkish, however, "I like you" is "Seni seviyorum", which is two words instead of three. When adding a negation, this sentence becomes "Seni sevmiyorum", where the difference is a single letter. This is not only extremely hard for a machine-learned trigger to differentiate, but could also be very easily misspelled by a user. Not being able to target these small changes can have a huge impact on the overall performance of your solution, and this is where TLML is especially powerful.

English            Turkish
I like you         Seni seviyorum
I don't like you   Seni sevmiyorum
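
The size of the change can be made concrete with a character-level edit distance. The sketch below is a standard Levenshtein implementation in Python; it is included only to quantify the example and says nothing about how Teneo or any classifier processes the input.

```python
def edit_distance(a: str, b: str) -> int:
    # Classic dynamic-programming Levenshtein distance, one row at a time.
    prev = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        curr = [i]
        for j, char_b in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,                       # deletion
                curr[j - 1] + 1,                   # insertion
                prev[j - 1] + (char_a != char_b),  # substitution or match
            ))
        prev = curr
    return prev[-1]


print(edit_distance("I like you", "I don't like you"))       # 6
print(edit_distance("Seni seviyorum", "Seni sevmiyorum"))    # 1
```

The English negation adds a whole word (six characters including the space), while the Turkish negation adds a single letter inside an existing word.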

In addition to this, Teneo also annotates negation words with a special %$NEGATIVE.MST tag that can be useful when building solutions in these morphologically complex languages.

Negation in Turkish

  • See Languages to read what languages are supported in Teneo.

Language objects re-use Teneo Linguistic Modeling Language

Language objects are one of the features in Teneo that work very well together with TLML. Language objects are building blocks for language conditions. They capture words, synonyms, or various ways of expressing the same (partial) intent. They make it possible to efficiently write and re-use conditions. In Teneo, we have naming conventions for Language objects to reflect their intended usage. More on this can be read in Language Objects.

Types of language objects

Suffix Type Example Name Example Condition Generate NLU*
LEX Lexical entry: the smallest building block from which more complex language objects are built. Covers different inflections of a word, but also spelling and regional variations. DOG.NN.LEX dog / dog's / dogs / dogs' yes
MIX Mixed entry: groups LEX language objects that share the same root. HAPPY.ADJV.MIX %HAPPY.ADJ.LEX / %HAPPILY.ADV.LEX yes
MUL Multi-word unit: forms the multi-word correspondence to the LEX language objects in that they capture the dictionary-level entries of multi-word units that are meant to be used as building blocks in higher level language objects. GIVE_UP.VB.MUL %GIVE.VB.LEX + %UP.FW.LEX yes
SYN Synonym set: groups LEX/MUL language objects with similar meaning. DOG.NN.SYN %DOG.NN.LEX / %HOUND.NN.LEX / %MUTT.NN.LEX / %POOCH.NN.LEX / (...) yes
ENTITY Entity: entity-related concepts, such as colors or country names. Entities typically have columns with variables attached to them. Each entity in CURRENCY.ENTITY, for example, has a currency code. CURRENCY.ENTITY See here for an example Entity yes
LIST List: lists related concepts, such as colors or country names. A list often contains other lists. COLOR_SHADES.LIST %COLOR_SHADES_BLUE.LIST / %COLOR_SHADES_BROWN.LIST / %COLOR_SHADES_GRAY.LIST / (...) no
PHR Phrases: groups various ways of expressing the same phrase or partial intent. IS_BROKEN.PHR ((%BE.VB?PRESENT.LEX / %SEEM.VB.SYN)>> %BROKEN.ADJV.SYN ) / (...) yes
THEME Theme: groups words with a common theme. WEATHER.THEME (%RAIN.NNVB.SYN / %SNOW.NNVB.SYN / %SUNNY.ADJ.LEX / %RAINDROP.NN.LEX / (...)) no
PROJ Project: these language objects act as place-holders in the language resources. Should be customized to fit your project. COMPANY_NAME.NN.PROJ ((%YOUR.FW.LEX / %THE.FW.LEX ) >> %COMPANY.NN.SYN) / %ARTIFICIAL_SOLUTIONS.NN.SYN yes
REC Miscellaneous: contains condition that does not fit under any other category. HOW.REC %HOW.FW.LEX / %HOWS.FW.LEX / %HOWRE.FW.LEX / %HOWD.FW.LEX / (...) no
SCRIPT Script language object: contains scripts, e.g. to count words or sentences. WD_LT4.SCRIPT {_.getSentenceWords().length<4} no
ANNOT Annotation language object: contains annotation labels and may be used instead of the labels themselves. This is deprecated; please use .ENTITY instead. POS_NOUN.ANNOT %$NN.POS no

* If 'yes', the language object can be included in conditions that are generated from test data (when clicking the 'Draft Condition' button under the 'Condition' panel in 'Advanced Options' for a Conditional Match Requirement).
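
The way higher-level language objects are built from lower-level ones can be modeled with a small Python sketch. The dictionary below is a toy model, not Teneo's storage format: DOG.NN.LEX uses the word forms from the table, while the forms listed for HOUND.NN.LEX are hypothetical and the synonym set is shortened to two members.

```python
# Toy model: a language object is a list of plain words and/or
# references to other language objects (prefixed with %).
LANGUAGE_OBJECTS = {
    "DOG.NN.LEX": ["dog", "dog's", "dogs", "dogs'"],
    "HOUND.NN.LEX": ["hound", "hounds"],  # hypothetical word forms
    "DOG.NN.SYN": ["%DOG.NN.LEX", "%HOUND.NN.LEX"],
}


def expand(name: str) -> set[str]:
    # Recursively resolve references until only plain words remain.
    words: set[str] = set()
    for entry in LANGUAGE_OBJECTS[name]:
        if entry.startswith("%"):
            words |= expand(entry[1:])
        else:
            words.add(entry)
    return words


print(sorted(expand("DOG.NN.SYN")))
# ['dog', "dog's", 'dogs', "dogs'", 'hound', 'hounds']
```

Because the synonym set only refers to the lexical entries, adding a new inflection to DOG.NN.LEX is picked up by every object that re-uses it.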

Libraries re-use Language Objects

Teneo comes with a large number of supported languages. All languages that are supported by Teneo can be processed by the Teneo Engine, meaning user input can be split into tokens and sentences. When additional resources are required, we can create our own project-specific library. Libraries are powerful tools, as they not only allow us to create and use the resources we need in an efficient manner, but also allow us to re-use language objects across multiple bots. Language objects created in your solution can later be used to build a library. This library can later be re-used, saving the developer a lot of time and making the solution perform much better across different languages.

  • See Libraries to read more about this topic.

Vocabulary

Many of the words covered in the libraries are structured in a hierarchy with more general language objects at the left and more specific entities at the right. This allows you to use the degree of granularity you need for your current bot. To give an example, sometimes it might be sufficient for you to know that the user is talking about an alcoholic drink, while in other cases it may be crucial to know the exact type of beer the user wants to order. Below, we show a simplified part of the hierarchical structure used for drink-related vocabulary, %DRINKS.LIST.

Language object taxonomy
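
The choice of granularity can be sketched in Python as a walk from a specific entry up to the most general one. Only %DRINKS.LIST is named on this page; every other object name and word below is hypothetical and chosen purely for illustration.

```python
# Hypothetical slice of a drinks hierarchy: child object -> parent object.
PARENT = {
    "ALCOHOLIC_DRINKS.LIST": "DRINKS.LIST",
    "SOFT_DRINKS.LIST": "DRINKS.LIST",
    "BEERS.LIST": "ALCOHOLIC_DRINKS.LIST",
}

# Hypothetical mapping from a word to its most specific language object.
WORD_TO_OBJECT = {
    "lager": "BEERS.LIST",
    "lemonade": "SOFT_DRINKS.LIST",
}


def categories(word: str) -> list[str]:
    # Return the objects covering the word, from most specific to most general.
    chain = []
    node = WORD_TO_OBJECT.get(word)
    while node is not None:
        chain.append(node)
        node = PARENT.get(node)
    return chain


print(categories("lager"))
# ['BEERS.LIST', 'ALCOHOLIC_DRINKS.LIST', 'DRINKS.LIST']
print("ALCOHOLIC_DRINKS.LIST" in categories("lemonade"))  # False
```

A condition that only needs to know whether a drink was mentioned would use the most general level, while an ordering flow could use the most specific one.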

More information

More information can be found in our reference documentation here.
