7 Mar

Text-To-Speech in Silverlight Using WCF


Back in February, I wrote a blog post showing how to access system devices from a Silverlight 4 OOB (out-of-browser) application with elevated trust, including how to use the SAPI.SpVoice API through the new Silverlight 4 COM Interop feature to implement text to speech. The biggest problem with that approach is that it only works on the Windows platform.

So I started thinking that the real purpose behind Text-To-Speech isn't just the cool factor; it is accessibility for impaired users. I have been doing web development for nearly a decade, and I have always been conscious of web users with impairments that may make viewing or navigating my websites more difficult. Text-To-Speech isn't very helpful if it only works in a fully trusted out-of-browser Silverlight 4 application on a Windows machine. So instead, let's create a Text-To-Speech solution that works inside the browser, on any browser, on any machine. And heck, I want this to be in Silverlight 3.

Let's get started by creating a new Silverlight 3 application, and be sure to include a web project as well.

[Screenshot: create new Silverlight project]

Right-click the web project and add a reference to System.Speech.dll.

[Screenshot: add reference to System.Speech]

Next, right-click the web project, click Properties -> Web, and set a specific port number to your liking; I set mine to 1914. This will come in handy when we create our WCF service.

[Screenshot: set port number]

Now we need to create the WCF service that will take our text and send it back as a WAV stream, so go ahead and right-click the web project and select "Add New Item". From this list add a new WCF service and name it SpeechService. Now this is important: when you create your WCF service, an endpoint is created for you in the Web.config file. You need to change the binding of that endpoint to basicHttpBinding. Silverlight works with basicHttpBinding here, and Visual Studio will not create the ServiceReferences.ClientConfig file properly if you skip this step.

[Screenshot: change endpoint binding to basicHttpBinding]
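For reference, the edited endpoint in Web.config might look roughly like the fragment below. This is only a sketch: the TextToSpeech.Web namespace in the service and contract names is an assumption, so substitute whatever names your own project generated, and leave any other attributes the template added (such as a behavior configuration) in place.

```xml
<system.serviceModel>
  <services>
    <!-- "TextToSpeech.Web" is a hypothetical namespace; use the one from your project. -->
    <service name="TextToSpeech.Web.SpeechService">
      <!-- The binding attribute is the part that has to read basicHttpBinding. -->
      <endpoint address=""
                binding="basicHttpBinding"
                contract="TextToSpeech.Web.ISpeechService" />
    </service>
  </services>
</system.serviceModel>
```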

Now, let's add an OperationContract called Speak to your WCF service's interface. It takes a string parameter and returns a byte[].

public interface ISpeechService
{
    [OperationContract] byte[] Speak(string textToSay);
}

The implementation of this method will look like the following:

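public byte[] Speak(string textToSay)
{
    SpeechSynthesizer ss = new SpeechSynthesizer();
    MemoryStream ms = new MemoryStream();
    ss.SetOutputToWaveStream(ms);   // render the speech as WAV data into the stream
    ss.Speak(textToSay);            // synchronous; returns once synthesis is complete
    return ms.ToArray();
}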

The next thing we need to do is create our UI in Silverlight.  Here is what mine looks like.

<Grid x:Name="LayoutRoot">
    <TextBox x:Name="_txtTextToSay" />
    <Button Content="Speak To Me" Click="Button_Click" />
    <MediaElement x:Name="_audioPlayer"/>
</Grid>

Create an event handler for your button.  Next add a service reference to your SpeechService in your web project.

[Screenshot: add service reference to your SpeechService]

The next part is somewhat complicated and time consuming. You have to write your own WAV decoding class that takes the byte array returned from the service and converts it to a System.Windows.Media.MediaStreamSource. Luckily, I already did this for you with the help of some resources on MSDN. In the button's event handler add this code:

private void Button_Click(object sender, RoutedEventArgs e)
{
    SpeechServiceClient client = new SpeechServiceClient("BasicHttpBinding_ISpeechService");
    // When the WAV bytes arrive, wrap them in a stream and hand them to the MediaElement.
    client.SpeakCompleted += (o, ea) =>
        _audioPlayer.SetSource(new WavMediaStreamSource(new MemoryStream(ea.Result)));
    client.SpeakAsync(_txtTextToSay.Text);
}

Basically, this code uses the WavMediaStreamSource class I created, which inherits from MediaStreamSource. It converts the byte[] returned from the SpeechService back into a stream and passes it to my WAV decoding classes, and the result is used as the source for the MediaElement responsible for playing the audio.
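To give a feel for what a WAV decoder has to do before it can feed samples to a MediaElement, here is a minimal sketch of reading the header of the byte[] the service returns. It is not the WavMediaStreamSource class from the download: WavInfo and its members are hypothetical names made up for illustration, and the sketch assumes a standard RIFF/WAVE layout with a "fmt " chunk of at least 16 bytes.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper for illustration only; not part of the downloadable source.
public sealed class WavInfo
{
    public short Channels { get; private set; }
    public int SampleRate { get; private set; }
    public short BitsPerSample { get; private set; }
    public long DataOffset { get; private set; }  // index of the first audio byte
    public int DataLength { get; private set; }   // number of audio bytes

    public static WavInfo Parse(byte[] wav)
    {
        // BinaryReader reads little-endian values, which matches the WAV format.
        using (BinaryReader reader = new BinaryReader(new MemoryStream(wav)))
        {
            // File header: "RIFF", overall size, "WAVE".
            if (ReadTag(reader) != "RIFF") throw new FormatException("Not a RIFF file.");
            reader.ReadInt32();
            if (ReadTag(reader) != "WAVE") throw new FormatException("Not a WAVE file.");

            WavInfo info = new WavInfo();
            Stream stream = reader.BaseStream;

            // Walk the chunks until the "data" chunk turns up.
            while (stream.Position + 8 <= stream.Length)
            {
                string tag = ReadTag(reader);
                int size = reader.ReadInt32();
                long next = stream.Position + size + (size & 1); // chunks are padded to even sizes

                if (tag == "fmt ")
                {
                    if (size < 16) throw new FormatException("Format chunk is too short.");
                    reader.ReadInt16();                      // format tag (1 = PCM)
                    info.Channels = reader.ReadInt16();
                    info.SampleRate = reader.ReadInt32();
                    reader.ReadInt32();                      // average bytes per second
                    reader.ReadInt16();                      // block align
                    info.BitsPerSample = reader.ReadInt16();
                }
                else if (tag == "data")
                {
                    info.DataOffset = stream.Position;
                    info.DataLength = size;
                    return info;
                }

                stream.Position = next;                      // skip to the next chunk
            }

            throw new FormatException("No data chunk found.");
        }
    }

    private static string ReadTag(BinaryReader reader)
    {
        byte[] bytes = reader.ReadBytes(4);
        return Encoding.UTF8.GetString(bytes, 0, bytes.Length);
    }
}
```

Calling WavInfo.Parse(ea.Result) would report the channel count, sample rate, and bit depth, which is the kind of information a MediaStreamSource implementation needs in order to describe its audio stream to the MediaElement.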

All that is left is to build your solution and start making your Silverlight applications more accessible.

Download the Source

12 thoughts on “Text-To-Speech in Silverlight Using WCF”

  1. Hi, I’m developing a version of this that I would like to put on the web.

    But I’ve come unstuck when it came to publishing.

    Do I need to change the service URI or port or anything to get this to work in the wild?

    Because my current effort gets me a cross browser security error.


  2. @simon-john roberts

    Yes, you must change the service endpoints in both the web.config and the ServiceReferences.ClientConfig in the Silverlight project to match your environment. I don’t know how you have your solution set up, but to avoid having to worry about cross-domain policies, place your service in the hosting web project (the project that hosts your Silverlight client).

  3. Hi:

    I am getting an exception when running in VS 2010 and Silverlight 4.

    Unrecognized element ‘message’ in service reference configuration. Note that only a subset of the Windows Communication Foundation configuration functionality is available in Silverlight.

  4. Dear Brian,
    I am working with Silverlight 3 and Visual Studio 2008, and I am using the same approach that you presented here.
    “Speech to Text” works perfectly when I start my project inside Visual Studio.
    But when I published the web site on a host, a security error appeared.
    The reason is that System.Speech can only be called from a full trust assembly.
    I gave my assembly a strong name (sn.exe) and put it in the GAC (gacutil.exe).
    But I still get the error.
    I would greatly appreciate it if you could comment on my situation.
    Thank you,

  5. @Johnson
    I came across a similar problem. Brian’s steps work fine in Visual Studio 2008, but when I tried to upgrade the project to Visual Studio 2010 I got the same error you are experiencing.

    The solution to this problem is to go into the ServiceReferences.ClientConfig and you should see:

    You need to remove all references to the message element. For some reason this element is valid when using VS2008 but not VS2010.

  6. Is the solution streaming? Meaning if you have a ton of text, does it take a long time to convert it and bring it back?
