Convert a tsv file to the json format required by Microsoft LUIS

In Teneo Studio, you can import training data into the Class Manager with the Import Classes option. This option requires the data to be in tsv format: each line contains the intent class name and an example sentence, separated by a tab. However, if you decide to use the LUIS^Teneo approach, you may need to convert these tsv files into json, which is the format Microsoft LUIS requires both for importing a LUIS app and for batch testing data.
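To make the two formats concrete, here is a minimal Groovy sketch that turns a single Class Manager line into one LUIS example object of the kind used in a batch testing file. The intent name and sentence are made-up sample data.

```groovy
import groovy.json.JsonBuilder

// Made-up sample line in Class Manager format: intent name, a tab, then an example sentence
String tsvLine = "BookFlight\tI want to fly to Stockholm tomorrow"

// Split on the first tab only: the left part is the intent, the right part the example
def (intent, text) = tsvLine.split("\t", 2).toList()

// One LUIS labeled example; entities stays empty because the tsv carries no entity labels
Map utterance = [text: text, intent: intent, entities: []]

// A batch testing file is a json array of such objects
String json = new JsonBuilder([utterance]).toString()
println json
// prints [{"text":"I want to fly to Stockholm tomorrow","intent":"BookFlight","entities":[]}]
```

In an app import file, the same objects go into the `utterances` list, next to an `intents` list and a few app metadata fields.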

You can use the following Groovy code to convert a tsv file for the Teneo Class Manager into a json file that is ready to be imported into Microsoft LUIS, either as a new app or as a batch test:

```groovy
import groovy.json.JsonBuilder

// Converts a Teneo Class Manager tsv file (intent<TAB>example on each line) into json
// for a Microsoft LUIS app import ("app") or for LUIS batch testing ("test").
// Usage: groovy tsv_to_json.groovy <input> [app|test] [output] [separator] [appName] [appVersion] [locale] [schemaVersion]

if (!args) {
    throw new IllegalArgumentException("Please provide the name of the input file as argument")
}

String path          = args[0]                                               // Input file path
String mode          = args.size() >= 2 ? args[1] : "app"                    // "app" or "test", by default "app"
String output        = args.size() >= 3 ? args[2] : "luis_import_file.json"  // Output file in json format
String separator     = args.size() >= 4 ? args[3] : "\t"                     // Separator between intent and example (used as a regex); for tsv use \t
String appName       = args.size() >= 5 ? args[4] : "my_app"                 // App name, app import only
String appVersion    = args.size() >= 6 ? args[5] : "1.0"                    // App version, by default 1.0, app import only
String locale        = args.size() >= 7 ? args[6] : "en-us"                  // LUIS language locale, e.g. "en-us", app import only
String schemaVersion = args.size() >= 8 ? args[7] : "7.0.0"                  // LUIS schema version, app import only

// Catch misordered arguments instead of silently producing an app file
if (!(mode in ["app", "test"])) {
    throw new IllegalArgumentException("Mode must be 'app' or 'test', got '${mode}'")
}

List<Map> examples = []

// Read the input as UTF-8; eachLine closes the file when it finishes or fails
new File(path).eachLine("UTF-8") { String fileLine ->
    if (!fileLine.trim()) return // skip blank lines
    // Limit 2: split only on the first separator, so the example text may contain it too (e.g. commas in csv)
    String[] parts = fileLine.split(separator, 2)
    if (parts.length < 2) {
        System.err.println("Skipping line without separator: ${fileLine}")
        return
    }
    examples << [intent: parts[0].trim(), text: parts[1].trim(), entities: []]
}

def luisJson
if (mode == "test") {
    // Batch testing file: a plain json array of {intent, text, entities}
    luisJson = new JsonBuilder(examples)
} else {
    Map luisApp = [:]
    // One intent entry per distinct intent name, in order of first appearance
    luisApp.intents = examples*.intent.unique().collect { [name: it, features: []] }
    luisApp.entities = []
    luisApp.utterances = examples
    luisApp.name = appName
    luisApp.culture = locale
    luisApp.luis_schema_version = schemaVersion
    luisApp.versionId = appVersion
    luisApp.desc = ""              // obligatory field, leave it empty
    luisApp.closedLists = []       // obligatory field, leave it empty
    luisApp.prebuiltEntities = []  // obligatory field, leave it empty
    luisApp.phraselists = []       // obligatory field, leave it empty
    luisJson = new JsonBuilder(luisApp)
}

// Write the result as UTF-8 so non-ASCII examples survive on any platform
new File(output).write(luisJson.toString(), "UTF-8")
```

Please note that this code is not designed to run inside Teneo Studio. To execute it, you need Groovy installed on your computer. You can install Groovy by following the guide here.

After you have set up Groovy, create a new text file, paste the code into it, and save it as tsv_to_json.groovy (any other name works, as long as the extension is .groovy) in the same folder as the tsv file you want to convert. Then open the Windows Command Prompt, change the current working directory to that folder, and run the following command to convert your file into a json file for batch testing:

```
groovy tsv_to_json.groovy your_file.tsv test output_file.json \t
```

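The script accepts up to eight positional arguments, but only the first is required; the others fall back to the defaults set at the top of the code. The batch testing command above passes four of them: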

  • The first argument is the input file name.
  • The second argument, test, specifies that the output file is for batch testing.
  • The third argument is the output file name. Please use ".json" as the extension of the output file.
  • The fourth argument is the separator. For a tsv file, put "\t" here; for a csv file, put "," as the separator.
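One detail about the separator argument: the converter passes it directly to Java's `String.split`, which interprets it as a regular expression. The short sketch below, using made-up lines, shows why `\t` and `,` work as typed while a character such as `|` would need escaping with `Pattern.quote`. This applies to the Windows Command Prompt; on a Linux or macOS shell an unquoted `\t` loses its backslash, so you would quote it as `'\t'` there.

```groovy
import java.util.regex.Pattern

// From the Windows Command Prompt, \t arrives as two characters: a backslash followed by "t"
String fromCommandLine = "\\t"

// As a regex, backslash-t matches a tab, so tsv lines still split correctly
List<String> tsvParts = "Greeting\tHello there".split(fromCommandLine, 2).toList()

// A comma is not a regex metacharacter, so csv lines work as-is;
// the limit of 2 used in this sketch keeps later commas inside the example text
List<String> csvParts = "Greeting,Hello, how are you?".split(",", 2).toList()

// "|" IS a metacharacter (an alternation of two empty patterns) and would match the empty string;
// Pattern.quote turns any separator into a literal one
List<String> pipeParts = "Greeting|Hello there".split(Pattern.quote("|"), 2).toList()

println tsvParts
println csvParts
println pipeParts
```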

If you want to generate a file to import a LUIS app, please run the following command:

```
groovy tsv_to_json.groovy your_file.tsv app output_file.json \t my_app 1.0 en-us 7.0.0
```

This time the command passes all eight arguments:

  • The first argument is the input file name.
  • The second argument, app, specifies that the output file is for LUIS app import.
  • The third argument is the output file name.
  • The fourth argument is the separator.
  • The fifth argument is the name of the LUIS app to be imported.
  • The sixth argument is the version of the LUIS app to be imported.
  • The seventh argument is the language locale. Click here for the list of languages supported by LUIS.
  • The eighth argument is the LUIS schema version. At the time of writing, the latest one is 7.0.0. Click here for more information about LUIS schema versions.
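Before importing, it can be worth sanity-checking the generated app file. The sketch below is not part of the original converter: `utteranceCounts` is a hypothetical helper that parses the app json with Groovy's `JsonSlurper`, counts the examples per intent, and fails if an utterance refers to an intent missing from the `intents` list. The inline sample is made up, in the shape the converter writes in app mode.

```groovy
import groovy.json.JsonSlurper

// Hypothetical helper: count example utterances per intent in a LUIS app json string,
// and fail if an utterance uses an intent that is not declared in "intents"
Map<String, Integer> utteranceCounts(String appJson) {
    def app = new JsonSlurper().parseText(appJson)
    Set<String> declared = app.intents*.name as Set
    Map<String, Integer> counts = [:]
    app.utterances.each { u ->
        if (!(u.intent in declared)) {
            throw new IllegalStateException("Utterance '${u.text}' uses undeclared intent '${u.intent}'")
        }
        counts[u.intent] = (counts[u.intent] ?: 0) + 1
    }
    counts
}

// Made-up sample in the shape the converter writes for "app" mode
String sample = '''{"intents":[{"name":"Greeting","features":[]},{"name":"Goodbye","features":[]}],
"entities":[],
"utterances":[{"intent":"Greeting","text":"hi","entities":[]},
{"intent":"Greeting","text":"hello there","entities":[]},
{"intent":"Goodbye","text":"see you","entities":[]}],
"name":"my_app","culture":"en-us","luis_schema_version":"7.0.0","versionId":"1.0",
"desc":"","closedLists":[],"prebuiltEntities":[],"phraselists":[]}'''

println utteranceCounts(sample)
// To check a real file instead: println utteranceCounts(new File("output_file.json").getText("UTF-8"))
```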

The code in this post is an example of how you can convert a tsv file for class import in Teneo Studio into a json file for Microsoft LUIS. Since it runs outside Teneo Studio, you could also write similar code in any other programming language. I hope this post helps with your LUIS^Teneo project!
