Speech-to-Text

This article will guide you through the process of integrating Azure's Bing Speech API with the controls from the UI for WPF suite. More specifically, you will use the RadWatermarkTextBox, RadToggleButton and RadGlyph to create a simple user control which transforms the recorded speech to text.

Create a Bing Speech API Account

Before you start, you need to create a new Bing Speech API account through the Azure portal. This has been explained in great detail in this article.


Once you've created the account, obtain the subscription keys that were generated for you. You will need them later to make API calls from your application.
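Because the subscription key is a secret, you may prefer not to hard-code it in source. The following is a minimal sketch of one option, assuming the key is stored in an environment variable. The SpeechSettings class and the BING_SPEECH_SUBSCRIPTION_KEY variable name are hypothetical and are not part of the Bing Speech API or the Telerik suite.

```csharp
using System;

public static class SpeechSettings
{
    // Hypothetical variable name, chosen for this example only.
    public const string KeyVariableName = "BING_SPEECH_SUBSCRIPTION_KEY";

    // Reads the subscription key from the environment and fails early if it is missing.
    public static string GetSubscriptionKey()
    {
        string key = Environment.GetEnvironmentVariable(KeyVariableName);
        if (string.IsNullOrWhiteSpace(key))
        {
            throw new InvalidOperationException(
                "Set the " + KeyVariableName + " environment variable to your subscription key.");
        }

        // Trim stray whitespace that often sneaks in when copying keys.
        return key.Trim();
    }
}
```

With this approach, the subscriptionKey constant in the user control's code-behind would become a static readonly field initialized from SpeechSettings.GetSubscriptionKey().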

Create the Application

Create a new WPF application and add references to the following assemblies:

Telerik.Windows.Controls
Telerik.Windows.Data

You can then set a theme for the application.
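As a rough illustration, the theme could be set in code through the StyleManager class. This sketch assumes that you use the StyleManager theming mechanism and that the Office2016 theme shown in Figure 1 is the one you want. If you use implicit styles instead, you would merge the theme's XAML resource files rather than run this code.

```csharp
using System.Windows;
using Telerik.Windows.Controls;

public partial class App : Application
{
    public App()
    {
        // Apply the theme before any Telerik control is created.
        // Assumes the StyleManager theming mechanism; any other available theme can be swapped in.
        StyleManager.ApplicationTheme = new Office2016Theme();
    }
}
```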

Once you've set the theme, add the following resource dictionary to the application's merged dictionaries. It enables you to use the TelerikWebUI font glyphs for the icons in the project.

Example 1: Merging the Font Resources Dictionary

<Application.Resources> 
    <ResourceDictionary> 
        <ResourceDictionary.MergedDictionaries> 
            <!-- ... --> 
            <ResourceDictionary Source="/Telerik.Windows.Controls;component/Themes/FontResources.xaml"/> 
        </ResourceDictionary.MergedDictionaries> 
    </ResourceDictionary> 
</Application.Resources>

Define the User Control's Layout

Now, add a new UserControl to your project and give it a descriptive name, such as RecordVoiceControl. Next, define the following XAML:

Example 2: The User Control's Definition

<UserControl x:Class="AzureCognitiveServices.RecordVoiceControl" 
            xmlns="http://schemas.microsoft.com/winfx/2006/xaml/presentation" 
            xmlns:x="http://schemas.microsoft.com/winfx/2006/xaml" 
            xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006"  
            xmlns:d="http://schemas.microsoft.com/expression/blend/2008"  
            xmlns:telerik="http://schemas.telerik.com/2008/xaml/presentation" 
            xmlns:local="clr-namespace:AzureCognitiveServices" 
            mc:Ignorable="d"  
            d:DesignHeight="300" d:DesignWidth="300"> 
    <Grid> 
        <Grid.ColumnDefinitions> 
            <ColumnDefinition Width="Auto" /> 
            <ColumnDefinition Width="*" /> 
        </Grid.ColumnDefinitions> 
        <telerik:RadToggleButton x:Name="recordButton"  
                                Checked="recordButton_Checked"  
                                Unchecked="recordButton_Unchecked"> 
            <Path Fill="{Binding (TextElement.Foreground), RelativeSource={RelativeSource AncestorType=ContentPresenter}}"  
                Width="20" Height="20" Stretch="UniformToFill"  
                Data="{telerik:RadGlyph Glyph={StaticResource GlyphRadioButtonChecked}}" /> 
        </telerik:RadToggleButton> 
        <telerik:RadWatermarkTextBox Text="{Binding Text, Mode=TwoWay, RelativeSource={RelativeSource AncestorType=local:RecordVoiceControl}}"  
                                    AcceptsReturn="True" 
                                    Grid.Column="1" MinLines="3"  
                                    WatermarkContent="Write your short description here or press the record button to the left..." /> 
    </Grid> 
</UserControl>

Add the Speech Recognition NuGet Package

You now have to add the Microsoft.ProjectOxford.SpeechRecognition NuGet package that matches your application's target platform (x86 or x64).


Use the Speech Recognition API

All that's left is to handle the button's Checked and Unchecked events and use the Bing Speech API to pass the recognized text to the RadWatermarkTextBox.

Example 3: The User Control's Code-Behind

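using System;
using System.Windows;
using System.Windows.Controls;
using System.Windows.Threading;
// Assumption: the speech client types live in this namespace; it may differ between package versions.
using Microsoft.CognitiveServices.SpeechRecognition;

public partial class RecordVoiceControl : UserControl 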
{ 
    private MicrophoneRecognitionClient micClient; 
    const string subscriptionKey = "the subscription key you obtained from Azure"; 
 
    public RecordVoiceControl() 
    { 
        InitializeComponent(); 
    } 
 
    public string Text 
    { 
        get { return (string) GetValue(TextProperty); } 
        set { SetValue(TextProperty, value); } 
    } 
 
    public static readonly DependencyProperty TextProperty = 
        DependencyProperty.Register("Text", typeof(string), typeof(RecordVoiceControl), new PropertyMetadata(null)); 
 
    private void recordButton_Checked(object sender, RoutedEventArgs e) 
    { 
        if (this.micClient == null) 
        { 
            this.micClient = SpeechRecognitionServiceFactory.CreateMicrophoneClient( 
                SpeechRecognitionMode.LongDictation, 
                "en-US", 
                subscriptionKey); 
 
            this.micClient.OnResponseReceived += (s, a) => { this.WriteResponseResult(a); }; 
        } 
 
        this.micClient.StartMicAndRecognition(); 
    } 
 
    private void recordButton_Unchecked(object sender, RoutedEventArgs e) 
    { 
        this.micClient.EndMicAndRecognition(); 
    } 
 
    private void WriteResponseResult(SpeechResponseEventArgs e) 
    { 
        Dispatcher.BeginInvoke(new Action(() => 
        { 
            for (int i = 0; i < e.PhraseResponse.Results.Length; i++) 
            { 
                this.Text += e.PhraseResponse.Results[i].DisplayText; 
            } 
        }), DispatcherPriority.Background); 
    } 
}

The Checked handler creates a new instance of the MicrophoneRecognitionClient class through the CreateMicrophoneClient factory method, unless one has already been created. The factory method requires the recognition mode, the language the client will listen for, and the subscription key from the Azure account you've set up. The handler then calls StartMicAndRecognition, which turns on the microphone and starts listening for input. Whenever a response arrives, the OnResponseReceived handler appends the recognized phrases to the control's Text property on the UI thread, and the RadWatermarkTextBox displays them through its binding. Pressing the button a second time invokes EndMicAndRecognition, which stops the recording.
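Note that the handler in Example 3 concatenates the recognized phrases without any separator, so consecutive phrases run together. The following is a minimal sketch of a helper that inserts a space between phrases and skips empty results. The TranscriptBuilder class is hypothetical and is not part of the Bing Speech API.

```csharp
using System;
using System.Collections.Generic;

public static class TranscriptBuilder
{
    // Appends each non-empty phrase to the current text.
    // A space is inserted only when the text so far does not already end in whitespace.
    public static string Append(string current, IEnumerable<string> phrases)
    {
        string result = current ?? string.Empty;

        foreach (string phrase in phrases)
        {
            if (string.IsNullOrWhiteSpace(phrase))
            {
                continue;
            }

            bool needsSeparator =
                result.Length > 0 && !char.IsWhiteSpace(result[result.Length - 1]);

            result += (needsSeparator ? " " : string.Empty) + phrase.Trim();
        }

        return result;
    }
}
```

Inside WriteResponseResult, the loop could then be replaced with a single assignment that passes the current Text value and the DisplayText of each result to TranscriptBuilder.Append.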

Use the User Control

That's it. You can now use the control wherever you need to record your speech.

Example 4: Use the User Control

<local:RecordVoiceControl Text="{Binding Text, Mode=TwoWay}" Height="60" />
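The two-way binding in Example 4 needs a Text property on the control's DataContext. The following is a minimal sketch of such a view model, assuming a plain INotifyPropertyChanged implementation. The NoteViewModel name is hypothetical.

```csharp
using System.ComponentModel;

public class NoteViewModel : INotifyPropertyChanged
{
    private string text;

    public event PropertyChangedEventHandler PropertyChanged;

    // Receives the recognized text through the two-way binding.
    public string Text
    {
        get { return this.text; }
        set
        {
            if (this.text == value)
            {
                return;
            }

            this.text = value;

            // Notify the binding engine only when the value actually changes.
            PropertyChangedEventHandler handler = this.PropertyChanged;
            if (handler != null)
            {
                handler(this, new PropertyChangedEventArgs("Text"));
            }
        }
    }
}
```

An instance of this class would be assigned to the DataContext of the window or view that hosts the control.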

Figure 1 shows the final result in the Office2016 theme.

Figure 1: The custom speech recognition control in the Office2016 theme
