Custom Grammars for Speech Recognition
Overview
In the previous chapter, we built an application using the SAPI 5.1 SDK. With very little voice training, an end user could use the program for basic dictation and text-to-speech output. Although the general grammar used by the application works well for purposes like dictation, a very specific grammar can be beneficial in particular situations.
In this chapter, we build an application that allows numbers and mathematical operations to be entered by speech. The program then computes the answer automatically.
Note | The source code for the projects is located on the CD-ROM in the PROJECTS folder. You can either type the projects in as you go or copy them from the CD-ROM to your hard drive for editing. |
Creating a Custom Grammar
Much of the programming in this application is similar to the previous SAPI example, so we begin with the part that is new: the custom grammar. The grammar rules used by SAPI are defined in XML (eXtensible Markup Language). Anyone with a background in HTML or another markup language will find the format familiar, which makes writing a grammar fairly easy.
We use Notepad, although you could use any text editor, XML editor, HTML editor, or even Visual Studio if you prefer. Let's begin by opening Notepad and adding the opening 'GRAMMAR' tag to the empty document.
The grammar itself is enclosed in 'GRAMMAR' tags. Inside them come the rules, the first (and, for our application, the only) of which is the number rule. This rule is marked as 'ACTIVE', meaning it is something the speech recognition engine should use.
Next, add the lines that open the number rule. At this point, your Notepad document should contain the opening 'GRAMMAR' tag followed by the start of the number rule.
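As a rough sketch, the file at this stage might look like the following. The LANGID value of 409 (U.S. English) and the rule name are assumptions made for illustration, and the ID of 1 is chosen to match the rule ID that the application activates later in the chapter. The closing tags are included only so that the fragment is well-formed; the rule is still empty here, so the phrases described next must be added before the grammar is complete.

```xml
<GRAMMAR LANGID="409">
  <!-- Assumed: a single top-level rule, active by default, with ID 1 -->
  <RULE NAME="number" ID="1" TOPLEVEL="ACTIVE">
    <!-- the list of phrases to recognize goes here -->
  </RULE>
</GRAMMAR>
```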
Before moving on, we need to take a quick look at the various elements we'll encounter:
<L>: Defines a list of alternative phrases. Each child element represents one possible recognition in place of this element. It is a synonym of the LIST element. Empty elements are not valid (that is, the element must have children). The LIST element can define a default property name (PROPNAME) or ID (PROPID), which is inherited by its child PHRASE elements.
<P>: Defines a phrase to be recognized. It is a synonym of the PHRASE element. An associated property name and value pair is generated only if the contents of this element are recognized. Note that an empty P element is not allowed.
We also need to understand the grammar attributes:
<... VALSTR=''>: Specifies the string value to be associated with the semantic property (name/value pair).
<... PROPNAME=''>: Specifies the string identifier to be associated with the semantic property (name/value pair).
<... VAL=''>: Specifies the numeric value to be associated with the semantic property (name/value pair).
Now that we have the attributes and elements to work with, it's very easy to fill in the remaining part of the grammar. We need to recognize the following operations: plus, minus, times, divided by, quit, equal, and new.
Here are the operations:
plus
minus
times
divided by
QUIT
EQUAL
NEW
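One way these phrases could be marked up is sketched below. Each operation is a P element inside a single L element, with VALSTR carrying a string value. The property name "item" and the VALSTR values are hypothetical choices for this sketch. The phrase text itself, including its capitalization, is kept exactly as listed above, because the application later compares the recognized text against these words.

```xml
<L PROPNAME="item">
  <P VALSTR="plus">plus</P>
  <P VALSTR="minus">minus</P>
  <P VALSTR="times">times</P>
  <P VALSTR="divided by">divided by</P>
  <P VALSTR="quit">QUIT</P>
  <P VALSTR="equal">EQUAL</P>
  <P VALSTR="new">NEW</P>
  <!-- the number entries would continue here, inside the same list -->
</L>
```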
The last part of the XML file should handle all of the numeric entries from 0 to 30. Here are the entries:
zero
one
two
three
four
five
six
seven
eight
nine
ten
eleven
twelve
thirteen
fourteen
fifteen
sixteen
seventeen
eighteen
nineteen
twenty
twenty-one
twenty-two
twenty-three
twenty-four
twenty-five
twenty-six
twenty-seven
twenty-eight
twenty-nine
thirty
Note | The application recognizes only the values that you place in the file. If you need it to recognize a value such as 45, for example, you must add entries for the numbers up to 45 to the file. |
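Typing dozens of number entries by hand is tedious, so a small console program could generate them instead. The sketch below is one possible approach, not part of the chapter's project: the module and function names are hypothetical, and the shape of each generated line (a P element with a numeric VAL attribute) is an assumption about how the number entries are marked up.

```vbnet
Imports System

Module GrammarEntries
    ' Words for 0-19 and for the tens; enough to spell out 0 through 99.
    Private ReadOnly Units() As String = { _
        "zero", "one", "two", "three", "four", "five", "six", "seven", _
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", _
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"}
    Private ReadOnly Tens() As String = { _
        "", "", "twenty", "thirty", "forty", "fifty", _
        "sixty", "seventy", "eighty", "ninety"}

    ' Spells out a number from 0 to 99, hyphenated like the entries above.
    Function NumberToWords(ByVal n As Integer) As String
        If n < 0 OrElse n > 99 Then Throw New ArgumentOutOfRangeException("n")
        If n < 20 Then Return Units(n)
        Dim word As String = Tens(n \ 10)
        If n Mod 10 <> 0 Then word &= "-" & Units(n Mod 10)
        Return word
    End Function

    ' Builds one grammar line; the <P VAL="..."> shape is an assumption.
    Function PhraseLine(ByVal n As Integer) As String
        Return "<P VAL=""" & n.ToString() & """>" & NumberToWords(n) & "</P>"
    End Function

    Sub Main()
        ' Print the entries for 0 through 45, ready to paste into the grammar.
        Dim n As Integer
        For n = 0 To 45
            Console.WriteLine(PhraseLine(n))
        Next
    End Sub
End Module
```

Changing the upper bound of the loop extends the range; the helper as written only handles values up to 99.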
The complete XML file should contain all of the entries listed above: the operations first, followed by the numbers.
After you have entered all of the text into Notepad, you can save the XML file. To save a file in Notepad with an extension other than '.txt', choose Save As from the File menu and type the filename inside double quotation marks. In our case, enter "grammar.xml", including the quotation marks. You can save the file to your desktop or some other place that is easily accessible. Later, we'll copy this file to the 'bin' directory of our application so that it is available when we run the program.
User Interface
The custom grammar is now finished and is probably the most important thing we are going to create. However, in order to test the XML-based grammar, we need to build an application that loads it. We begin with a GUI that consists of a few controls, shown in Figure 22.1. You can use the figure as a guide to add the controls found in Table 22.1 to the form:
Figure 22.1: The finished GUI.
Type | Name | Text |
---|---|---|
TextBox | txtSpeech | TextBox1 |
Label | lblFirst | 0 |
Label | lblOperand | + |
Label | lblSecond | 0 |
Label | lblAnswer | = |
You already know that this application differs from the previous example because we are going to load a custom grammar. This is the biggest change, but it is not the only one. Another is the way recognition is started: rather than waiting for a button click, the application is speech-ready on startup. All input is handled by speech, so you never need a mouse or pen. You can enter every value, and even close the application, using only your voice.
Load Grammar
We begin the programming part of the application by adding the reference to the Microsoft Speech Object Library and then adding the Imports statement as we did in the previous example. We also create the same variables, although we don't have a need for m_bRecoRunning because the recognition engine is always running.
Here are the three Dim statements:
Dim WithEvents RecoContext As SpeechLib.SpSharedRecoContext
Dim Grammar As SpeechLib.ISpeechRecoGrammar
Dim m_cChars As Short
The Form_Load event is used for initializing several variables and loading the grammar. Most of the code, with the exception of loading the grammar, should look very similar to the previous example. As you can see from the following code, we are loading our custom grammar from the grammar.xml file.
Here is the code for the procedure:
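' Start with empty controls
txtSpeech.Text = ""
m_cChars = 0
lblFirst.Text = ""
lblSecond.Text = ""

If (RecoContext Is Nothing) Then
    ' Create the shared recognition context and a grammar object for it
    RecoContext = New SpeechLib.SpSharedRecoContext()
    Grammar = RecoContext.CreateGrammar(1)
    ' Load the custom grammar from the application's directory
    Grammar.CmdLoadFromFile(System.AppDomain.CurrentDomain.BaseDirectory() & "grammar.xml", SpeechLoadOption.SLOStatic)
    ' Turn dictation off and activate rule 1 of the custom grammar
    Grammar.DictationSetState(SpeechRuleState.SGDSInactive)
    Grammar.CmdSetRuleIdState(1, SpeechRuleState.SGDSActive)
End If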
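If grammar.xml has not been copied to the application's directory, the load call can be expected to fail at startup. The sketch below shows one way to check for the file first. FindGrammarFile is a hypothetical helper written for this example and is not part of SAPI; it uses only the standard System.IO classes.

```vbnet
Imports System
Imports System.IO

Module GrammarPath
    ' Returns the full path to the grammar file, or Nothing if it is missing.
    Function FindGrammarFile(ByVal baseDir As String, ByVal fileName As String) As String
        ' Path.Combine adds the directory separator only when it is needed.
        Dim fullPath As String = Path.Combine(baseDir, fileName)
        If File.Exists(fullPath) Then Return fullPath
        Return Nothing
    End Function

    Sub Main()
        Dim grammarFile As String = _
            FindGrammarFile(AppDomain.CurrentDomain.BaseDirectory, "grammar.xml")
        If grammarFile Is Nothing Then
            Console.WriteLine("grammar.xml was not found next to the executable.")
        Else
            Console.WriteLine("Grammar file: " & grammarFile)
        End If
    End Sub
End Module
```

In Form_Load, the returned path could be passed to CmdLoadFromFile in place of the concatenated string, with a message shown to the user when the result is Nothing.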
Recognition
The recognition event for this application is handled in the same way as in the earlier application. We use a Select Case statement to examine the recognized text and then make the appropriate changes to the labels and text box. The application works as follows:
1. The application is opened and everything is blank.
2. Recognition starts for the first number (lblFirst).
3. Recognition occurs for the operator (lblOperand).
4. Recognition for the second number takes place (lblSecond).
5. The user says 'Equal' to perform the calculation (lblAnswer).
6. The user has several options. Saying a different operator or second number, followed by 'Equal', produces a different answer. Saying 'New' starts a new problem, and saying 'Quit' exits the application. If the user starts a new problem, the process starts back at step 2.
Here is the code for the procedure:
' The recognized phrase, as it is written in the grammar
Dim strText As String
strText = Result.PhraseInfo.GetText()

Select Case strText
    Case "plus"
        lblOperand.Text = "+"
    Case "minus"
        lblOperand.Text = "-"
    Case "divided by"
        lblOperand.Text = "/"
    Case "times"
        lblOperand.Text = "*"
    Case "QUIT"
        ' Ends the program immediately
        End
    Case "EQUAL"
        ' Calculate only when both numbers have been entered
        If lblFirst.Text <> "" AndAlso lblSecond.Text <> "" Then
            Dim X, Y As Integer
            X = Int32.Parse(lblFirst.Text)
            Y = Int32.Parse(lblSecond.Text)
            If lblOperand.Text = "+" Then
                lblAnswer.Text = "= " & (X + Y).ToString
            ElseIf lblOperand.Text = "-" Then
                lblAnswer.Text = "= " & (X - Y).ToString
            ElseIf lblOperand.Text = "/" Then
                ' The / operator is floating-point division, so the result may be fractional
                lblAnswer.Text = "= " & (X / Y).ToString
            ElseIf lblOperand.Text = "*" Then
                lblAnswer.Text = "= " & (X * Y).ToString
            End If
        End If
    Case "NEW"
        ' Clear both numbers and the answer for a new problem
        lblFirst.Text = ""
        lblSecond.Text = ""
        lblAnswer.Text = "="
    Case Else
        ' Anything else is a number. The first number is filled once;
        ' later numbers replace the second, so that a different second
        ' number can be dictated as described in the steps above.
        If lblFirst.Text = "" Then
            lblFirst.Text = Result.PhraseInfo.Properties.Item(0).Value
        Else
            lblSecond.Text = Result.PhraseInfo.Properties.Item(0).Value
        End If
End Select

' Echo the value of the recognized item in the text box
txtSpeech.Text = Result.PhraseInfo.Properties.Item(0).Value
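Because 'zero' is part of the grammar, a user can say 'divided by' followed by 'zero'. The / operator in Visual Basic performs floating-point division, so the listing would not raise an error in that case; it would display infinity instead. The sketch below moves the arithmetic into a separate function and guards against a zero divisor. The Compute function and the "undefined" text are hypothetical choices made for this example, not part of the chapter's project.

```vbnet
Imports System

Module Calculator
    ' Returns the text for the answer label, for example "= 15".
    ' The operator symbols match the ones written to lblOperand.
    Function Compute(ByVal x As Integer, ByVal y As Integer, ByVal op As String) As String
        Select Case op
            Case "+"
                Return "= " & (x + y).ToString()
            Case "-"
                Return "= " & (x - y).ToString()
            Case "*"
                Return "= " & (x * y).ToString()
            Case "/"
                ' Guard the zero divisor; otherwise / would yield infinity.
                If y = 0 Then Return "= undefined"
                Return "= " & (x / y).ToString()
            Case Else
                ' Unknown operator: leave the answer blank.
                Return "="
        End Select
    End Function

    Sub Main()
        Console.WriteLine(Compute(3, 5, "*"))
        Console.WriteLine(Compute(9, 3, "/"))
        Console.WriteLine(Compute(4, 0, "/"))
    End Sub
End Module
```

With a helper like this, the 'EQUAL' branch could reduce to parsing the two labels and assigning the result of Compute to lblAnswer.Text.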
Testing the Application
Testing this application is very simple. When you run the application, it looks like Figure 22.2. Next, say a number, such as 'Three.' You should now see the application recognize the number in the text box, and it also changes the first of the two labels to the same number (see Figure 22.3). Next, give the application the operation you want to perform. For example, you can say 'Times' to perform multiplication (see Figure 22.4). The last number is entered in the same way as the first, so if you say 'Five,' it is displayed in the text box and changes the second label (see Figure 22.5). Finally, you can say 'Equal' to perform the calculation (see Figure 22.6).
Figure 22.2: The application on startup.
Figure 22.3: The application receives its first number.
Figure 22.4: The operator is recognized.
Figure 22.5: The final number is recognized.
Figure 22.6: The calculation is complete.
If you want to continue performing calculations, you can say 'New' to start over.
Summary
In this chapter, we created an application that uses speech recognition to perform basic calculations. Because we created a custom grammar, the application recognizes only the phrases we want it to. This shows how a restricted grammar can be an effective way to increase speech recognition accuracy for certain types of applications. In the next two chapters, we begin our look at the hardware of the Tablet PC and how we can develop software around it.