Get word count and sentence count in pre-processing

Georgi · 29 November 2023 07:37

Hey there

The use case is this: if the user input has more than one sentence or more than x words (let’s say 20 or 30), pass it to an LLM to rephrase it as a single sentence of up to x words, and then use the rephrased input for all further processing.
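A rough sketch of the check described above, written in Groovy since that is the scripting language used in this thread. It assumes a simple approximation: words are counted by splitting on whitespace and sentences by splitting after `.`, `!` or `?` followed by whitespace, so the counts will not always match the Teneo engine (an abbreviation such as "e.g." followed by a space would be counted as a sentence end). The helper name `needsRephrasing` is hypothetical.

```groovy
// Hypothetical helper: decides whether an input should be sent to an LLM for rephrasing.
// Word and sentence counts are rough approximations, not Teneo's own NLP results.
boolean needsRephrasing(String text, int maxWords) {
    if (text == null || text.trim().isEmpty()) {
        return false
    }
    String trimmed = text.trim()
    // Approximate word count: split on runs of whitespace
    int wordCount = trimmed.split(/\s+/).length
    // Approximate sentence count: split after sentence-ending punctuation followed by whitespace
    int sentenceCount = trimmed.split(/(?<=[.!?])\s+/).length
    return sentenceCount > 1 || wordCount > maxWords
}
```

With this, "My printer does not work" stays as it is, while "My printer does not work. It shows an error." would be sent for rephrasing because it contains two sentences.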

This avoids triggering the “User input is too long” flow (which is very annoying to see after spending time describing a problem), and it solves the issue of many sentences in one input by preserving the full context of the user’s request in a single sentence (some users are quite descriptive in their inputs).

The problem is that in a pre-processing script I can overwrite the user input with “_.setUserInputText()”, but at that step I can’t tell whether I should, because “_.getUserInputWords().length” and “_.getSentenceCount()” are not available yet: they only become available after processing and before matching.

A pre-match script doesn’t work either: “_.getUserInputWords().length” and “_.getSentenceCount()” are available at that step, but the user input text can no longer be changed, because it is set to read-only before matching.

To implement this use case, I need the input word count and sentence count in the pre-processing step.
I could compute them with a Groovy script, but the result would most likely differ from what the Teneo engine computes, which could lead to inconsistent behavior.
Could you share the part of the NLP pipeline where UserInputWords and SentenceCount are populated, so that I can use it in pre-processing and get results consistent with the engine?
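If an approximation is acceptable, the JDK’s `java.text.BreakIterator` offers locale-aware sentence and word segmentation that handles punctuation better than a plain split. This is only a sketch: it is not Teneo’s tokenizer, so its counts may still differ from `_.getUserInputWords().length` and `_.getSentenceCount()`. The function name `countWordsAndSentences` is hypothetical.

```groovy
import java.text.BreakIterator

// Hypothetical helper: counts words and sentences with the JDK's BreakIterator.
// This approximates, but is not, the Teneo engine's own tokenization.
Map<String, Integer> countWordsAndSentences(String text, Locale locale = Locale.ENGLISH) {
    if (text == null || text.trim().isEmpty()) {
        return [words: 0, sentences: 0]
    }

    // Sentences: count every non-blank segment between sentence boundaries
    int sentences = 0
    BreakIterator sentenceIt = BreakIterator.getSentenceInstance(locale)
    sentenceIt.setText(text)
    int start = sentenceIt.first()
    int end = sentenceIt.next()
    while (end != BreakIterator.DONE) {
        if (!text.substring(start, end).trim().isEmpty()) {
            sentences++
        }
        start = end
        end = sentenceIt.next()
    }

    // Words: word boundaries also delimit spaces and punctuation,
    // so only count segments that contain at least one letter or digit
    int words = 0
    BreakIterator wordIt = BreakIterator.getWordInstance(locale)
    wordIt.setText(text)
    start = wordIt.first()
    end = wordIt.next()
    while (end != BreakIterator.DONE) {
        if ((text.substring(start, end) =~ /[\p{L}\p{N}]/).find()) {
            words++
        }
        start = end
        end = wordIt.next()
    }

    return [words: words, sentences: sentences]
}
```

For "My printer does not work. It shows an error." this returns 9 words and 2 sentences; the result could then drive the same threshold check as in the use case.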

Benjamin · 4 December 2023 16:37

Hi @Georgi,

I would suggest using the Preprocessing script for this and splitting the user input on whitespace to get an approximate word count (which I think is still a good enough indicator here).
Something like:

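// Approximate word count: trim first so leading whitespace does not add an empty token
if (_.getUserInputText() && _.getUserInputText().trim().split("\\s+").size() > 20) {
    // summarize the input (e.g. with an LLM), then overwrite it with _.setUserInputText(...)
}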

Because you then overwrite the original user input with the summarized version using setUserInputText(), the Teneo Engine continues with the summarized input for all further processing, which, as I understand your explanation, is the desired result.

(You might want to save the original user input in a global variable before overwriting it, so that the logs always show what happened.)
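Putting these suggestions together, the Preprocessing logic could look roughly like the sketch below. `getUserInputText()` and `setUserInputText()` are the methods discussed in this thread; everything else is assumed for illustration: `rephraseWithLlm` stands in for your own LLM integration, `originalUserInput` for a global variable you would define in your solution, and `FakeEngine` is a hypothetical stand-in for Teneo’s `_` so the sketch runs outside Studio.

```groovy
// Hypothetical stand-in for Teneo's `_` engine accessor, only so this sketch runs on its own.
// Groovy generates getUserInputText()/setUserInputText() for this property.
class FakeEngine {
    String userInputText
}

// Returns the original input when it was replaced, or null when it was left unchanged.
String shortenLongInput(engine, Closure<String> rephraseWithLlm, int maxWords = 20) {
    String input = engine.getUserInputText()
    // Approximate word count; Teneo's own tokenization may differ
    if (input && input.trim().split(/\s+/).length > maxWords) {
        engine.setUserInputText(rephraseWithLlm(input))
        return input
    }
    return null
}

// In an actual Preprocessing script this might be used roughly like:
// def original = shortenLongInput(_, { text -> rephraseWithLlm(text) })
// if (original) { originalUserInput = original }  // keep the original for the logs
```

Returning the original text instead of writing the global variable inside the function keeps the helper easy to test and leaves it to the script to decide where the original is stored.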

Hope it works, let me know!
/Benjamin

Georgi · 4 December 2023 18:57

Hi, Benjamin,

To be honest, this is not the answer I expected, but thanks anyway.

The idea was to do this the right way, the same way Teneo processing does it, so that we always get consistent results for word count and sentence count.
If your proposal only gives an approximation of what the Teneo engine would count, that is fine too. Not ideal, but still fine.

As for using Generative AI wherever it can improve UX, I firmly believe this is the way to go. Together with the Teneo platform, this could become a very powerful force, I guess…

Thanks also for suggesting keeping the original inputs for later reference and analysis. I already had this in mind.

Best regards!
