Reducing Your Session Data Footprint

When designing your solution, it is important to consider not only factors like scalability, readability and ease of implementation, but also the effect your design has on your session data footprint. When a session is executed in a published engine, the progress of the session is logged. This logging is very detailed in order to allow flexible reporting, debugging and similar tasks at a later date.

While it is generally positive that so much detail is logged (anything logged can be queried in Inquire - anything not logged cannot!), under certain circumstances this verbose logging may lead to unmanageably large session sizes, which can negatively impact both solution and Inquire performance.

What factors affect the size of my session data footprint?

For every transaction in a session, the engine will log:

  • Inputs & Outputs, including Input & Output Parameters
  • Processing Path (which triggers, flows, transitions, nodes were passed through)
  • Metadata Assigned
  • Variable Reference Changes

This article will focus on the last of these points, demonstrating how design considerations can be made to prevent redundant variable logging, which is especially important when dealing with large string variables. To do this we will provide several examples of different implementations of a CMS lookup and the respective variables logged for each.

Solution A: Redundant variable logging

Let’s imagine we are developing an FAQ bot that classifies user intents and collects a dynamic FAQ response from an external content management system. As a part of our design we choose to perform the CMS lookup using an integration for both readability and reusability.

This integration includes the method getFAQArticleString which takes the string input sArticleId, which corresponds to the CMS lookup key for a given intent, and returns the string output sAnswer.

For each intent, we create a flow with the following elements:

  • A class trigger with training examples for that intent
  • An integration node to perform the CMS lookup to retrieve the FAQ response text
  • An output node which displays the FAQ response text
  • A flow variable sArticleId storing the CMS lookup key for that intent
  • A flow variable sAnswerText where the value returned by the integration method is stored

This straightforward implementation may be scalable, readable and easy to implement, but it fails to take the solution's session data footprint into account. All variable changes and outputs are logged, meaning the same FAQ response string will be logged three times in the following contexts:

  • As the updated value of the integration variable sAnswer following the CMS lookup

  • As the updated value of the flow variable sAnswerText when sAnswer is passed into the flow

  • As the Answer Text displayed to the user in the output node

Depending on the size of the FAQ response string for each intent, sessions where the user asks several questions could easily result in a very large session data footprint.
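To get a feel for the numbers, the effect can be estimated with some simple arithmetic. The sketch below is not part of the platform: the function logged_chars and the session figures (five questions, 2,000 characters per answer) are made up for illustration, and the estimate ignores everything else the engine logs, such as per-entry overhead and the processing path.

```python
def logged_chars(answer_lengths, copies_per_answer):
    """Rough count of answer-text characters written to the session log."""
    return copies_per_answer * sum(answer_lengths)


# Hypothetical session: five questions, each answered with a 2,000-character text.
answer_lengths = [2000] * 5

# Solution A logs each answer three times: sAnswer, sAnswerText and the output.
solution_a = logged_chars(answer_lengths, copies_per_answer=3)

# For comparison: the same session if each answer were logged only once.
logged_once = logged_chars(answer_lengths, copies_per_answer=1)

print(solution_a)   # 30000
print(logged_once)  # 10000
```

In this made-up session, two thirds of the logged answer text is redundant.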

Solution B: A size-conscious implementation

For our session size-conscious implementation, let’s keep our existing integration method as we don’t want to sacrifice the reusability and scalability it affords us. To mitigate the redundant variable logging, we can instead change how we use variables within our solution by making the following optimizations:

  • Changing the scope of the variable: First, by using a global variable instead of the flow variable sAnswerText, we eliminate the need to propagate the value returned by the integration to the flow. We can then reset the value of the global variable in the flow's on drop script. This mimics the behavior of a flow variable and avoids accidentally reusing the CMS response in another flow later in the session. With this change, the same FAQ response string will be logged twice instead of three times.

  • Changing the type of the variable: We can further limit the number of times the FAQ response is logged by using a Groovy map variable instead of a string variable to store the result of the CMS lookup. This works because variables are logged whenever the engine considers the value to have changed, which specifically means that the variable reference has changed. The engine does not record changes inside an object that is already referenced (tracking those would be too processor-intensive and error-prone to be worthwhile). If we create a global variable mVariableMap containing the value ["sAnswerText": ""] and use it to store the result of the CMS lookup, the string will not be logged as a variable change. We are modifying a map that is already referenced, so no variable reference change is recorded.
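The difference between changing a reference and changing the content of an already referenced object can be illustrated with a small toy model. This is only a sketch of the rule described above, written in Python rather than Groovy; the class ReferenceChangeLog and its methods are hypothetical and do not correspond to any engine API.

```python
class ReferenceChangeLog:
    """Toy model: records a variable only when it points at a different object."""

    def __init__(self, variables):
        # Remember which object each variable name currently refers to.
        self._seen = dict(variables)
        self.entries = []

    def record_changes(self, variables):
        for name, value in variables.items():
            if name not in self._seen or self._seen[name] is not value:
                self.entries.append((name, value))
                self._seen[name] = value


answer = "A long FAQ answer fetched from the CMS. " * 50

# String variable: assignment swaps the reference, so the change is recorded.
string_vars = {"sAnswerText": ""}
string_log = ReferenceChangeLog(string_vars)
string_vars["sAnswerText"] = answer
string_log.record_changes(string_vars)

# Map variable: the map object stays the same, only its content changes.
map_vars = {"mVariableMap": {"sAnswerText": ""}}
map_log = ReferenceChangeLog(map_vars)
map_vars["mVariableMap"]["sAnswerText"] = answer
map_log.record_changes(map_vars)

print(len(string_log.entries))  # 1
print(len(map_log.entries))     # 0
```

In this model, the same in-place approach applies to the reset on drop: setting the map entry back to an empty string changes the content of the map, not the reference, so it would not be recorded either.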

Solution C: An ultralight implementation

Given the simplicity of the integration method used to look up the FAQ answer text in the CMS, we could opt to do away with it entirely and instead perform the lookup directly in the output node. This approach sacrifices the reusability of the integration and makes the flow slightly less readable, but it has several advantages:

  • Avoiding the use of global variables completely eliminates the risk of giving the user an answer text set from a previous flow.

  • It further reduces the data footprint size by shortening the processing path.

How can I design my solution with session data footprint in mind?

As a general rule of thumb, it is a good idea to take extra care with large string variables.

  • Where possible, avoid design strategies that require propagating large variables between integrations, flows or subflows.
  • Consider storing such strings in a global Groovy map variable instead, to prevent them from being logged redundantly.

Thank you Allison for the concise overview on how to better manage our data footprints. This should help greatly in improving the performance of reporting and storage.