Conversational AI testing basics

I am new to testing AI chatbots and had a few questions.
The chatbots I am testing (text and voice) use Dialogflow ES as the NLP agent. Each agent has training phrases (each phrase can contain entities and synonyms). In order to test:

  1. Do I test using the training phrases, or do I create new phrases that are not part of that list in Dialogflow?
  2. I read something about an 80/20 rule for training and test data. Does that apply to testing Dialogflow-powered bots? If so, can I split the current training phrases into training (80%) and test (20%) data?
  3. If I already have a separate test dataset, what happens if I find an issue, e.g. a particular phrase does not return the correct intent/response? Does that test case become part of the training data so the agent can “learn” from it, or should test and training phrases never be mixed?

Running tests with data that has been used for training is something you should never do in machine learning projects. In our Wiki you can read more about how we see it and about the options you have.
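
If you decide to build a separate test set, a practical first step is to pull the current training phrases out of the agent, so you know exactly what the agent was trained on. Here is a rough sketch with the official google-cloud-dialogflow Python client; the project id is a placeholder, and credentials are assumed to be configured via GOOGLE_APPLICATION_CREDENTIALS:

```python
from google.cloud import dialogflow_v2 as dialogflow


def export_training_phrases(project_id):
    """Collect (intent display name, phrase text) pairs from a Dialogflow ES agent."""
    client = dialogflow.IntentsClient()
    parent = f"projects/{project_id}/agent"  # ES agents live directly under the project
    intents = client.list_intents(
        request={"parent": parent, "intent_view": dialogflow.IntentView.INTENT_VIEW_FULL}
    )
    pairs = []
    for intent in intents:
        for phrase in intent.training_phrases:
            # a training phrase is stored as parts (plain text and annotated entity parts)
            text = "".join(part.text for part in phrase.parts)
            pairs.append((intent.display_name, text))
    return pairs
```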

The 80/20 rule of thumb applies to all machine learning projects. Usually there is even a third data set for parameter tuning, the dev (or validation) set, but that is not relevant when using something like Dialogflow, where you have no real influence on the training parameters.
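
Applied to your second question, the split itself is simple. A minimal sketch that reuses the (intent, phrase) pairs from the snippet above and applies the 80/20 ratio per intent, so small intents are not left without test coverage:

```python
import random


def split_phrases(pairs, test_ratio=0.2, seed=42):
    """Split (intent, phrase) pairs into disjoint training and test sets."""
    by_intent = {}
    for intent, phrase in pairs:
        by_intent.setdefault(intent, []).append(phrase)

    rng = random.Random(seed)
    train, test = [], []
    for intent, phrases in by_intent.items():
        rng.shuffle(phrases)
        cut = int(len(phrases) * test_ratio)
        if cut == 0 and len(phrases) > 1:
            cut = 1  # intents with only a handful of phrases still get one test phrase
        test.extend((intent, p) for p in phrases[:cut])
        train.extend((intent, p) for p in phrases[cut:])
    return train, test


train, test = split_phrases(export_training_phrases("my-gcp-project-id"))
```

Keep in mind that the 20% you hold out for testing then has to be removed from the training phrases in the agent, otherwise you are again testing with training data.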

The test data is basically there for finding flaws in your training data. If a test phrase fails, you typically have to refine your training data, and afterwards you have to re-align your test data; the failing phrase itself stays in the test set instead of being copied into the training phrases.
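
For your third question, a test run can simply send every held-out phrase to the agent and collect the mismatches. Again only a sketch against the Dialogflow ES detect_intent API, using the test pairs from the snippets above:

```python
import uuid

from google.cloud import dialogflow_v2 as dialogflow


def run_test_set(project_id, test_pairs, language_code="en"):
    """Send every held-out test phrase to the agent and report mismatches."""
    client = dialogflow.SessionsClient()
    failures = []
    for expected_intent, phrase in test_pairs:
        # a fresh session per phrase, so contexts from earlier turns cannot interfere
        session = client.session_path(project_id, uuid.uuid4().hex)
        query_input = dialogflow.QueryInput(
            text=dialogflow.TextInput(text=phrase, language_code=language_code)
        )
        response = client.detect_intent(
            request={"session": session, "query_input": query_input}
        )
        result = response.query_result
        if result.intent.display_name != expected_intent:
            failures.append({
                "phrase": phrase,
                "expected": expected_intent,
                "actual": result.intent.display_name,
                "confidence": result.intent_detection_confidence,
            })
    return failures
```

Every mismatch points to an intent whose training phrases need refinement, which is exactly the feedback loop described above.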

You can find information about how Botium can help in our Wiki and in our Blog: