Conversational AI in the Gaming Industry

In this article, let’s look at one of the biggest entertainment industries: the gaming industry. What makes video games so special, and how could Conversational AI become a bigger part of this experience in the future?


Video games are all about immersion. You start a game and become part of a different world, an artificial world unlike your real life. This immersion is built from several blocks, such as story, gameplay, graphics and controls. The last two are limited by the current state of technology, and the gaming industry has evolved steadily in step with technical progress.


This evolution is easy to see in graphics, where we went from games in which you could count the pixels in front of you to today’s 4K graphics and virtual reality offerings.

[reference to Police Quest II - Sierra (1988)]

[reference to Detroit Become Human - Sony]

There is also clear pressure on “AAA” games to keep improving graphical fidelity, because many gamers want realism and the immersion that comes with it.


The controls of a game are also a key point, since they are the player’s way to decide what happens on screen. While joysticks have lost their importance over the years, gamepads have been a constant until today. In 2006, Nintendo’s Wii managed to make the console’s controls its main selling point: it tracked how the player moved the controller, which delivered a (more or less) realistic way of controlling a tennis racket, a gun, a sword, etc., and helped to sell 100 million consoles.



[Wii Sports Commercial - Nintendo]

Sony’s PlayStation 5 promotes its latest controller, the DualSense, with dynamic adaptive triggers, a touchpad, a built-in microphone and rumble functionality. Why? Because it increases the immersion of the player, and it’s an area with lots of unexplored potential!


This is still only the beginning of bringing immersion to gamers. A feature which has rarely been used so far, and which is key to immersion, is voice.
Why pause your game and search through menus when you could just tell the game what you want to change?
Why pick your answer in a dialogue from three preselected options with a button click when you could actually talk and express yourself?

[The Walking Dead – Telltale Games]

Virtual Reality

This also brings us to a technology which has received more attention lately and which is likely to get even more within the next few years: Virtual Reality. Complete immersion is the promise Virtual Reality makes to gamers, and a promise it still struggles to keep. But why? On the one hand there is the graphical aspect, which is very noticeable when you have VR glasses sitting only centimeters in front of your eyes, and which is clearly limited by the raw power available for textures and resolution.

[PSVR ad - Sony]

PlayStation has recently announced a new headset (probably releasing in 2023) to bring their VR hardware setup up to the latest state of the art.

On the other hand there is the control aspect, which takes you out of the immersion whenever you have to do something in Virtual Reality differently from how you would in real life. This is why there are dedicated controllers to simulate your hands, and there’s even a dedicated device to simulate your foot movements.

[See: 3dRudder]

Voice is a key aspect here, and it has hardly been used yet. Why shouldn’t you be able to use your voice in Virtual Reality? The only answer is “due to technical limitations”, and that’s not a valid answer nowadays. People are already used to talking to devices such as Alexa. Widiba created VR Online Banking with Voice Control, powered by Teneo, back in 2018 (The End of Keypad Navigation).

This voice/dialogue layer is simply lacking in VR games, and its absence reduces the immersion.
Voice can obviously be used to make specific decisions inside the game, but it could also be used to add completely new gameplay elements to modern video games. Ever had to convince an artificial character in a conversation to help you out? Ever had a conversation with your favorite video game protagonist?

[Cyberpunk 2077 – CD Projekt]

AT&T now offers a voice-powered AR experience in which customers can talk to their favorite Looney Tunes characters.

[AT&T - Looney Tunes Experience]

A glimpse of what can be done in VR games can already be seen in games like Star Trek: Bridge Crew. In this game you can be the captain of a starship and give orders to your crew by voice.

[Star Trek: Bridge Crew - Ubisoft - on MrHoodlum420]

The VUI.Agency has described several other use cases in a nice article: “Will voice control become a widely-used feature in video games in the future?”


There has been a lot of talk around The Last of Us 2, not only because it is a great game, but also because it took accessibility features to a new level. Naughty Dog implemented three accessibility control setups, for people with vision, hearing or motor disabilities. If you are interested in how this works, I would recommend having a look at this video:

[Accessibility Impressions by Steve Saylor - YouTube]

An additional voice control layer could help many people with special needs to enjoy a whole range of video game titles in the same way as people without accessibility needs do. For example, you could control an adventure game like Broken Sword completely without picking up any input device at all.

[reference to Broken Sword – Revolution Software]

Many companies, such as Microsoft, have recognized that accessibility features are not a ‘niche thing’ that you can do, but something that you should do. “When everybody plays, we all win” is a nice read on that.

Conversational AI

Conversational AI is key to implementing the features discussed above, and it should find its way further into the game development scene over the next few years. Why? The required technology is already available today and has proven itself robust enough for this task. Teneo helps clients and partners build enterprise projects that use voice-controlled AI solutions with millions of user interactions, and Alexa and Siri are an essential part of more and more households nowadays. Why should Conversational AI be limited to voice assistants and chatbots? Teneo gives your team the power to design conversations with complete freedom, understand exactly what the gamers say, and bring immersion and accessibility to the next level.

Dialogue Design

Teneo provides you with an easy-to-pick-up graphical user interface for designing your dialogues in a drag-and-drop manner, without requiring programming skills. It gives narrative designers a convenient way to work on the game’s dialogue setup, and lets them connect their work directly with the game’s logic through a published dialogue backend.

Intent Recognition

Teneo uses a unique hybrid approach to intent recognition, combining linguistic rules with a powerful Machine Learning classifier, to deliver state-of-the-art results in this area. You can create, evaluate and use your model directly inside Teneo without needing to involve natural language experts.
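To make the idea of a hybrid approach a bit more tangible, here is a minimal sketch in Python. It is not how Teneo works internally: the intent names, the rules and the training sentences are all made up for illustration, and a toy word-overlap scorer stands in for a real Machine Learning classifier. The point is only the order of decisions: precise hand-written rules are tried first, and a statistical fallback covers the phrasings nobody wrote a rule for.

```python
import re
from collections import Counter

# Hypothetical game intents; none of these names come from Teneo.
RULES = {
    "open_map": re.compile(r"\b(open|show)\b.*\bmap\b"),
    "save_game": re.compile(r"\bsave\b.*\b(game|progress)\b"),
}

# Made-up example sentences used to "train" the fallback scorer.
TRAINING = {
    "open_map": ["where am i", "i am lost", "show me the area"],
    "save_game": ["store my progress", "keep my progress so far"],
    "heal": ["use a health potion", "i need healing", "patch me up"],
}


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def train(examples):
    """Count how often each word appears in the examples of each intent."""
    return {
        intent: Counter(tok for sentence in sentences for tok in tokenize(sentence))
        for intent, sentences in examples.items()
    }


def classify(text, model, threshold=0.5):
    """Toy stand-in for an ML classifier: score each intent by the share
    of the utterance's words that occur in that intent's examples."""
    tokens = tokenize(text)
    if not tokens:
        return None
    best_intent, best_score = None, 0.0
    for intent, counts in model.items():
        score = sum(1 for tok in tokens if tok in counts) / len(tokens)
        if score > best_score:
            best_intent, best_score = intent, score
    return best_intent if best_score >= threshold else None


def recognize(text, model):
    """Try the linguistic rules first, then fall back to the scorer."""
    lowered = text.lower()
    for intent, pattern in RULES.items():
        if pattern.search(lowered):
            return intent, "rule"
    intent = classify(text, model)
    return (intent, "classifier") if intent else (None, "none")


MODEL = train(TRAINING)
print(recognize("Please open the map", MODEL))
print(recognize("I need healing now", MODEL))
```

In this sketch the first utterance is caught by a rule, while the second one has no matching rule and is resolved by the fallback scorer. A real project would replace the scorer with a properly trained and evaluated model.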

Speech Recognition & Text-to-Speech

Teneo integrates with several speech recognition services, for example the Speech services within Microsoft Azure Cognitive Services. Speech recognition, a natural language processing task in its own right, is nowadays robust enough to be used in production enterprise projects. Alexa, Siri, blue TV and Google Home showcase this every day.

Azure Neural Voice delivers text-to-speech in such high quality that it is already being used in the gaming industry today, for example in game localization or accessibility features.

[Create Dynamic, Accessible Content with Azure Neural Voice; presented by Deb Adeogba; YouTube]

Context Handling

An ongoing conversation creates a certain context in which the speakers communicate. For example, once you have agreed on a specific topic, you would not repeat that topic in each of your statements. One of the strengths of Teneo is its easy way of keeping track of context and applying it to follow-ups inside a conversation in order to deliver an intelligent dialogue.
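As a rough illustration of what keeping track of context means, the following Python sketch models a shop keeper NPC who remembers the last item the player talked about. Everything in it (the class name, the items, the prices and the replies) is invented for this example and has nothing to do with how Teneo stores context; it only shows that a follow-up such as "How much is it?" can be answered once the topic has been set in an earlier turn.

```python
import re


class ShopConversation:
    """Hypothetical NPC dialogue that remembers the last mentioned item."""

    ITEMS = {"sword": 120, "shield": 80}   # made-up prices in gold
    PRICE_WORDS = {"much", "cost", "costs", "price"}

    def __init__(self):
        self.topic = None  # the context carried from turn to turn

    def reply(self, utterance):
        words = re.findall(r"[a-z]+", utterance.lower())
        # If the player names an item, it becomes the current topic.
        mentioned = next((w for w in words if w in self.ITEMS), None)
        if mentioned:
            self.topic = mentioned
        # Without any topic so far, the NPC has to ask back.
        if self.topic is None:
            return "Which item do you mean?"
        # Price questions are answered for the remembered topic.
        if self.PRICE_WORDS.intersection(words):
            return f"The {self.topic} costs {self.ITEMS[self.topic]} gold."
        return f"Ah, the {self.topic}. A fine choice."


npc = ShopConversation()
print(npc.reply("How much is it?"))            # no topic yet
print(npc.reply("Tell me about the sword."))   # sets the topic
print(npc.reply("How much is it?"))            # resolved through context
```

The same question gets two different answers here, depending only on what was said before. That is the behaviour a context-aware dialogue needs.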

When to start?

Technology is progressing faster and faster: we already have robot dogs patrolling New York, and some robots even triumph in dancing!

[See: NYPD uses robot dog during police operation - YouTube ]

[Boston Dynamics - YouTube]

Using Conversational AI in video games is possible, and the complexity of the implementation clearly depends on the use case. Adding a voice command layer to shortcut certain functionalities is easier to implement than an intelligent NPC who is able to talk about almost anything, for example. The most important starting point is probably to analyze where Conversational AI adds to the immersion and overall experience of your game, and then plan the implementation accordingly.
So, when will we start having conversations with video game characters?
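To show how small the simpler end of this range can be, here is a minimal Python sketch of a voice command layer for game settings. It assumes that a speech recognition service has already turned the player's voice into text, and the settings, phrasings and function name are all hypothetical. A production setup would rather hand the text to a proper intent recognition layer than rely on a few patterns.

```python
import re


class GameSettings:
    """Hypothetical settings object that a game menu would normally change."""

    def __init__(self):
        self.volume = 50        # 0..100
        self.subtitles = False
        self.paused = False


def apply_voice_command(transcript, settings):
    """Apply an already transcribed voice command to the settings.
    Returns True if the command was understood, False otherwise."""
    text = transcript.lower().strip()

    # "set the volume to 80": take the first number after the word "volume".
    match = re.search(r"\bvolume\b.*?(\d{1,3})", text)
    if match:
        settings.volume = max(0, min(100, int(match.group(1))))
        return True

    # "turn subtitles on" / "hide subtitles"
    if re.search(r"\bsubtitles?\b", text):
        settings.subtitles = not re.search(r"\b(off|disable|hide)\b", text)
        return True

    # "pause the game" / "resume"
    if re.search(r"\b(pause|resume|continue)\b", text):
        settings.paused = bool(re.search(r"\bpause\b", text))
        return True

    return False


settings = GameSettings()
apply_voice_command("Set the volume to 80", settings)
apply_voice_command("Turn subtitles on", settings)
print(settings.volume, settings.subtitles)
```

Returning False for anything that was not understood gives the game a clear hook to ask the player to repeat the command instead of silently doing nothing.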


Nice Article!

I think conversational AI will be important in the gaming industry. I think that one of the challenges is getting the same reactions/responses for different ways of speaking, such as the use of synonyms or slang.


Hi @Lucho_BG,

Great to see you around in the forum! :slight_smile:
Differences in ways of speaking are actually already handled nicely in many of our projects. This can be a certain accent which needs to be transcribed correctly by the Speech Recognition layer or, as you mention, the usage of synonyms, slang, etc. The latter is a task for the NLU layer (Intent Recognition), and the hybrid approach described in the article can perform nicely here.
I see the challenges more on the game design side: it is one thing to understand everything the user says, and another to have a good answer/action/plot for it :smiley:
I think that’s why the first step is to analyze case by case (or game by game) what usage would make sense.
Thanks for your comment!