Wednesday, March 16, 2016

Using commands with Speech Recognition - Sixth Week

In my sixth week, we considered submitting the project to the Microsoft Open Source Challenge. I looked through the tools it offered and found that the only one useful for our project was the Speech API from Project Oxford. We could use speech recognition to add voice commands to the Field Book app, so that users could take notes in the field quickly and easily.

Project Oxford

This Microsoft Research project offers a set of AI-based APIs for vision, speech, and language, with open-source client libraries. It includes code for speech recognition on Android, along with a sample application that uses it; I downloaded the sample and ran it on my phone. Recognition works by sending audio to a server, which processes it and responds with a list of five strings, each one a possible transcription of what the user said. Here is an example of what the code looks like:


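// Client that records from the device microphone and sends the audio to the service
MicrophoneRecognitionClient m_micClient;
// ShortPhrase mode suits brief utterances such as voice commands
SpeechRecognitionMode m_recoMode = SpeechRecognitionMode.ShortPhrase;
m_micClient = SpeechRecognitionServiceFactory.createMicrophoneClient(
        this,             // the Activity
        m_recoMode,
        language,         // recognition language, e.g. "en-us"
        this,             // receives results via ISpeechRecognitionServerEvents
        subscriptionKey); // your Project Oxford subscription key
m_micClient.startMicAndRecognition();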

For this code to work, the Activity must implement the ISpeechRecognitionServerEvents interface, because it is passed to the factory as the handler that receives the recognition results.
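Because the service answers with five candidate transcriptions rather than one, the app has to decide which one to act on. A minimal sketch of one option, assuming the candidates have already been extracted from the server response into a List<String> ordered from most to least likely (the ordering is an assumption, and the SDK call that extracts them is deliberately left out): try each candidate and keep the first that passes a command check. NBestSelector, firstValid, and the placeholder predicate are hypothetical names.

```java
import java.util.List;
import java.util.Optional;
import java.util.function.Predicate;

public class NBestSelector {

    // Returns the first candidate transcription accepted by isValidCommand,
    // or Optional.empty() if none of them is a valid command.
    static Optional<String> firstValid(List<String> candidates,
                                       Predicate<String> isValidCommand) {
        for (String candidate : candidates) {
            if (isValidCommand.test(candidate)) {
                return Optional.of(candidate);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Made-up candidates standing in for the list the service returns.
        List<String> candidates = List.of(
                "said height to 11",
                "set height to 11",
                "set hide to 11");
        // Placeholder check; a real app would plug in its command parser here.
        Predicate<String> looksLikeCommand = s -> s.startsWith("set height");
        System.out.println(firstValid(candidates, looksLikeCommand).orElse("no valid command"));
        // prints "set height to 11"
    }
}
```

Passing the check in as a Predicate keeps this selection step independent of how a valid command is eventually defined.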

There is also a tool called LUIS (Language Understanding Intelligent Service) that lets you build language models. It seemed like a good fit, because I wanted a specific language for the application: the language of valid commands such as "set height to 11". But after testing it for a while, I realized it probably would not work. I had assumed that, given spoken input, LUIS would always return a valid command, but it turns out it does not. So I concluded that I would have to write my own code that takes a string and returns a valid command.
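Before code can turn a string into a valid command, the recognized text has to be broken into individual words. A minimal sketch, assuming the recognizer returns plain English text that may contain capital letters and punctuation; CommandTokenizer and tokenize are hypothetical names:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class CommandTokenizer {

    // Normalizes recognized text and splits it into lower-case words.
    static List<String> tokenize(String recognized) {
        List<String> words = new ArrayList<>();
        // Lower-case so "Set Height" and "set height" compare equal,
        // then treat anything that is not an ASCII letter or digit as a separator.
        String normalized = recognized.toLowerCase(Locale.ROOT);
        for (String word : normalized.split("[^a-z0-9]+")) {
            if (!word.isEmpty()) { // split() yields a leading "" if the text starts with a separator
                words.add(word);
            }
        }
        return words;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("Set height to 11."));
        // prints [set, height, to, 11]
    }
}
```

The regular expression keeps only ASCII letters and digits, which suits English commands like "set height to 11" but would drop accented characters.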

Designing Commands with a Finite State Machine

I approached the problem of understanding recognized speech as feeding a list of words to a Finite State Machine (FSM). If the FSM reads all the words it needs, it ends in an acceptance state (the states drawn with a double circle), and reaching such a state means the input is a valid command in the application. This works better than simply comparing the string returned by the speech recognizer against a list of commands, because the recognizer sometimes adds words by mistake. When working with speech APIs, keep in mind that recognition will often fail, so your program must handle errors and try to recover from them.
Diagram of a proposed Finite State Machine
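The FSM approach described above can be sketched in Java for commands of the form "set <field> to <number>". The diagram is not reproduced here, so the states, the field names (height, width, length), and the class names are assumptions for illustration, not a transcription of the proposed design. The key choice, following the reasoning above, is that a word which does not fit the current state is skipped instead of rejecting the whole input, and only the acceptance state produces a command.

```java
import java.util.List;
import java.util.Optional;
import java.util.Set;

public class CommandFsm {

    enum State { START, GOT_SET, GOT_FIELD, GOT_TO, ACCEPT }

    // Hypothetical fields that a voice command is allowed to change.
    static final Set<String> FIELDS = Set.of("height", "width", "length");

    // A recognized command such as field="height", value="11".
    static final class Command {
        final String field;
        final String value;
        Command(String field, String value) { this.field = field; this.value = value; }
        @Override public String toString() { return field + "=" + value; }
    }

    // Feeds the words to the FSM one at a time. A word that does not match
    // the transition expected in the current state is skipped, so extra words
    // added by the recognizer do not reject an otherwise valid command.
    static Optional<Command> parse(List<String> words) {
        State state = State.START;
        String field = null;
        String value = null;
        for (String word : words) {
            switch (state) {
                case START:
                    if (word.equals("set")) state = State.GOT_SET;
                    break;
                case GOT_SET:
                    if (FIELDS.contains(word)) { field = word; state = State.GOT_FIELD; }
                    break;
                case GOT_FIELD:
                    if (word.equals("to")) state = State.GOT_TO;
                    break;
                case GOT_TO:
                    if (word.matches("\\d+")) { value = word; state = State.ACCEPT; }
                    break;
                case ACCEPT:
                    // Already accepted; trailing words are ignored.
                    break;
            }
        }
        // Only the acceptance state yields a command.
        return state == State.ACCEPT ? Optional.of(new Command(field, value)) : Optional.empty();
    }

    public static void main(String[] args) {
        // "please" and "the" are extra words that the FSM skips.
        System.out.println(parse(List.of("please", "set", "the", "height", "to", "11")));
        // prints Optional[height=11]
        System.out.println(parse(List.of("set", "height")));
        // prints Optional.empty
    }
}
```

Skipping unmatched words is what makes this more tolerant than comparing whole strings, at the cost of occasionally accepting input the user did not intend, such as a stray "set" earlier in a longer sentence.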

