Test-driven Development for CAI Projects

A Conversational AI (CAI) project typically needs to test several modules: language model performance, integrations, channel interaction, etc. In this article we discuss a process borrowed from software development, Test-driven development, and how it can be applied to a CAI project.

What is the Test-driven development process?
As its name suggests, this development process focuses on tests. In a software project, the definition of any requirement is directly followed by the definition of its test. In other words, before a feature is developed, we have already defined its acceptance criteria and its test, so development starts with this test as guidance. Once the feature is developed, the test determines whether it meets the acceptance criteria. The main benefits of this approach are cleaner code and a higher-quality product, as it simplifies and accelerates the testing process.

In a CAI project developed with Teneo, you can use some tools and strategies to follow this development process too. Among the advantages: the acceptance criteria are identified and defined early, you get help deciding at an early stage which type of Match requirements your flows will need, and the testing process becomes easier.

Here are some suggestions to help you and your team benefit from the Test-driven development process.

Where should this process be applied?
Normally, a CAI Project will have different publishing environments:

  • DEV (development) dedicated to developers
  • QA (Quality Assurance) intended for testing the solution
  • PROD (Production) used by end-users

The Test-driven development process should be applied only to the first two environments: DEV and QA. This way you can make sure the solution meets your target audience's expectations before it reaches Production.

How can I apply this process to a CAI Project?
As mentioned, Test-driven development starts at the very first stage of any project, one we have already explored and explained: scope the project. Based on the backlog created during this essential phase, you will write the test cases described above.

For a CAI project, we need to meet two main criteria:

  • The user interactions are correctly understood by the system
  • The system provides the correct information to the user

First, we make sure the system understands the user by creating a language model capable of classifying natural language inputs correctly. Second, we define the answer and the information delivered to users. As these two goals depend on each other, we focus on one or the other depending on the phase of the project we are currently testing.

We are going to distinguish between two phases: before and after publishing a solution to the development (DEV) or Quality Assurance (QA) environment. Before publishing, we make sure intents are recognized correctly. After publishing, we make sure the content is delivered exactly as expected for the given channel, including any specific parameters that are needed.

The following diagram (based on Xavier Pigeon) gives an overview of the process:

Let’s work with an example
Following the diagram, we will work through an example with a new use case. Imagine we have to create an FAQ flow where users can ask for the Wi-Fi password of the Longberry Baristas coffee shop. The acceptance criteria would be the following:

As a [user], I would like to [obtain the Wi-Fi password], in order to [use this service]

Based on this user story we can assume this interaction will be created as a flow. Here is an example of what it could look like:
Let’s continue with the steps outlined earlier.

Before publishing a Solution to DEV/QA
To create our test, we write relevant examples, in other words, utterances that must trigger this answer. Adding these examples to the flow already creates the test, because they are picked up by Auto-test, so make sure they are included.

We will also add negative examples. These examples must not trigger the flow, and they should also be included in the Auto-test, as they are launched in the same batch. I chose examples that resemble the positive ones, because I want to make sure the intent is predicted correctly. Here is the result for both positive and negative examples:
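Outside of Teneo Studio, the same idea can be sketched in a few lines of Python. This is only an illustration of the test-first principle, not how Auto-test is implemented: the utterances, the `Example` type, and the `placeholder_trigger` function are all hypothetical. Because the trigger does not exist yet, every positive example fails, which is the expected starting point in Test-driven development.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Example:
    utterance: str
    should_trigger: bool  # True = positive example, False = negative example


# Hypothetical utterances; the real examples of the article live in Teneo Studio.
EXAMPLES: List[Example] = [
    Example("What is the Wi-Fi password?", True),
    Example("Can I get the password for the wifi?", True),
    Example("Could you tell me the wifi password please?", True),
    Example("I forgot the password for my account", False),
    Example("Do you sell coffee beans?", False),
]


def run_batch(trigger: Callable[[str], bool], examples: List[Example]) -> List[Example]:
    """Return the examples whose actual outcome differs from the expected one."""
    return [ex for ex in examples if trigger(ex.utterance) != ex.should_trigger]


def placeholder_trigger(utterance: str) -> bool:
    """Stand-in for a flow trigger that has not been built yet: it never fires."""
    return False


failures = run_batch(placeholder_trigger, EXAMPLES)
for ex in failures:
    print(f"FAIL: {ex.utterance!r} (expected trigger={ex.should_trigger})")
```

Positive and negative examples run in the same batch, so a single call reports both missed positives and unwanted matches.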

Next, we only need to define the type of trigger and add our answer. For the first step you can click on the wizard button in the Generate panel and Studio will create a Match for you, based on the examples that were included. In this example, a class was created as a Match requirement.

Once the flow is finished, it’s time to test it from the Auto-test section of the flow panel. First, we will test this flow within its own scope, which means only its own trigger is considered during testing.

The test shows that some of our negative examples triggered the flow, which we did not expect:

Two negative examples are still triggering this flow. We need to improve the trigger and iterate on the flow. The system generated a class, and this is the only criterion that needs to be fulfilled to trigger the flow.

Among other possibilities, we can make sure this trigger only fires if the user input mentions the word “Wi-Fi”. To do so, we simply add a Language object to the Match. This approach is described as Hybrid or Advanced NLU, as we are combining two Match requirements. Here is what the flow looks like:
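The logic of combining two requirements can be sketched as follows. This is a simplified illustration under stated assumptions, not Teneo's matching engine: `class_matches` is a hypothetical stand-in for the generated class (deliberately too broad), `mentions_wifi` is a hypothetical stand-in for the Language object, and the utterances are invented.

```python
import re


def class_matches(utterance: str) -> bool:
    """Hypothetical stand-in for the generated class: fires on any password question."""
    return "password" in utterance.lower()


# Stand-in for a Language object covering "Wi-Fi", "wifi" and "wi fi".
WIFI_PATTERN = re.compile(r"\bwi[\s-]?fi\b", re.IGNORECASE)


def mentions_wifi(utterance: str) -> bool:
    return WIFI_PATTERN.search(utterance) is not None


def hybrid_trigger(utterance: str) -> bool:
    """Both requirements must hold, like a Match with two requirements."""
    return class_matches(utterance) and mentions_wifi(utterance)


positives = ["What is the Wi-Fi password?", "can i have the wifi password"]
negatives = ["I forgot my account password", "What is the password for the app?"]

for utterance in positives + negatives:
    print(f"{utterance!r}: class={class_matches(utterance)}, hybrid={hybrid_trigger(utterance)}")
```

In this sketch the class alone accepts the negative examples, while the combined condition rejects them and still accepts the positive ones.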

And here are the results after this approach was applied:

We have ensured that this trigger will be activated correctly. In other words, the system will correctly understand the user inputs that are relevant for our user story. The next step is to check how this trigger performs alongside the rest of your Solution. This can be done at folder or Solution level. You can explore all the options available in our Auto-test section.

Pro tips:

  • Remember that the Auto-test process takes some time to finish. If you already have a particular workflow defined, you can make use of these Studio API methods. They let you interact with Teneo Studio and schedule the Auto-test for a specific time, so it does not interfere with ongoing work.
  • If you need to test a trigger that depends on variables or global scripts, you can exclude those triggers or transitions and test their performance in the next phase.

At this point, once the tests pass across the whole solution, it is time to move on and make sure the answers are delivered correctly once the solution is published.

After publishing a Solution to DEV/QA

Once we get to this point, we can use Dialog Tester. This powerful regression testing tool lets you test the interaction with the Engine once the Solution is published. With it, we can make sure the conversations unfold as they were originally designed and can be approved directly.

With the same examples we used for testing the NLU layer with Auto-test, we can also use Dialog Tester to make sure the system provides the correct information and behaves as expected. We add the same utterances to the Excel file:
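The principle behind this kind of answer check can be sketched in Python. The sketch does not reproduce the Dialog Tester file layout or its way of calling the engine: the column names, the answer text, and the `fake_engine` function are assumptions made for illustration only.

```python
import csv
import io
from typing import Callable, Dict, List

# Hypothetical test sheet; column names and answer text are assumptions,
# not the real Dialog Tester file layout.
TEST_SHEET = """utterance,expected_answer
What is the Wi-Fi password?,The Wi-Fi password is printed on your receipt.
can i have the wifi password,The Wi-Fi password is printed on your receipt.
"""


def check_answers(send: Callable[[str], str], sheet: str) -> List[Dict[str, str]]:
    """Send each utterance and collect the rows whose answer differs from the expected one."""
    mismatches = []
    for row in csv.DictReader(io.StringIO(sheet)):
        actual = send(row["utterance"])
        if actual.strip() != row["expected_answer"].strip():
            mismatches.append({**row, "actual_answer": actual})
    return mismatches


def fake_engine(utterance: str) -> str:
    """Stand-in for a published engine; a real run would call the engine URL instead."""
    return "The Wi-Fi password is printed on your receipt."


mismatches = check_answers(fake_engine, TEST_SHEET)
print(f"{len(mismatches)} mismatching answer(s)")
```

Passing the `send` function as a parameter keeps the check independent of the environment, so the same sheet could be run against DEV or QA by swapping the function.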

As this tool is not part of Teneo Studio itself, we need to set up the solution so it can interact with it. We will need to add a new Global variable and a Post-processing script. Once the Solution and the test are prepared, it is time to publish and launch the test. The results are presented in HTML, and here is what the results of our original test look like:

Pro tips:

  • Be aware of which environment is being tested, as the engine URL changes and there are differences between environments. We suggest always testing in QA, as it represents a release candidate for Production and is therefore a more reliable representation of how the bot will behave once published.
  • With Dialog Tester you can test entire interactions, not only responses. You can simulate user interactions throughout complete flows, including variables and their values, sub-flows, follow-up questions, disambiguation, etc. This is where you can test the triggers that we recommended excluding from the Auto-test earlier.
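Testing an entire interaction means checking a sequence of turns within one session, since later answers depend on earlier ones. Here is a minimal sketch of that idea. The `FakeSession` class, its follow-up question, and the answer texts are hypothetical and only stand in for a conversation with a published engine.

```python
from typing import List, Tuple


class FakeSession:
    """Hypothetical stand-in for one conversation with a published engine."""

    def __init__(self) -> None:
        self.awaiting_confirmation = False

    def send(self, text: str) -> str:
        if self.awaiting_confirmation:
            # Follow-up turn: the answer depends on the previous question.
            self.awaiting_confirmation = False
            if text.strip().lower() in {"yes", "yeah", "sure"}:
                return "The Wi-Fi password is printed on your receipt."
            return "No problem. Anything else?"
        if "wifi" in text.lower().replace("-", ""):
            self.awaiting_confirmation = True
            return "Are you a customer of Longberry Baristas?"
        return "Sorry, I did not understand that."


Turn = Tuple[str, str]  # (user input, expected answer)


def run_script(session: FakeSession, script: List[Turn]) -> List[int]:
    """Return the indexes of the turns whose answer differs from the script."""
    return [i for i, (user, expected) in enumerate(script) if session.send(user) != expected]


SCRIPT: List[Turn] = [
    ("What is the Wi-Fi password?", "Are you a customer of Longberry Baristas?"),
    ("yes", "The Wi-Fi password is printed on your receipt."),
]

failed_turns = run_script(FakeSession(), SCRIPT)
print("failed turns:", failed_turns)
```

Each script should run in a fresh session, so state left over from one conversation cannot influence the next.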

In summary

We have discussed Test-driven development and how it can be applied to a CAI project. As usual, and as there are countless testing tools available, this process can and should be adjusted to your use case and possibilities.

We have identified 3 main advantages:

  • Accelerates the testing process and review of results
  • Lets you clarify and define the acceptance criteria from the beginning of the project
  • Helps establish a defined and methodical framework

Have you ever used this development process before? How was the experience? Do you recognize any other advantages? What are your current regression testing tools?


DialogTester is a great tool which we hope to integrate into our continuous integration pipelines in AWS.


Hello @kate.taylor!
I am glad to hear that you are using Dialog Tester :smiling_face: Have you been using it for a Test-driven development approach? I ask because you can also use Botium as a regression testing tool!