Mac speech recognition - text to speech, and speech to text

Mac speech recognition FAQ: How can I work around the bugs in the Apple Mac speech recognition software?

I'd like to say I've been having a great time with the Mac OS X speech recognition capabilities in OS X 10.6 (Snow Leopard), but the truth is that it seems to have a lot of bugs. Many AppleScript developers on the internet say that Apple apparently "broke" the speech recognition server in Leopard, and never fixed it in Snow Leopard. That's a real bummer, because it's a lot of fun to work with.

Mac speech recognition workarounds

Fortunately, Mac AppleScript developers keep coming up with workarounds, so here's one Mac speech recognition workaround that shows:

  1. How to get your Mac OS X system to prompt you with a question,
  2. Listen for your reply from a list of possible replies, and
  3. Take some action based on your reply.

I can't take credit for most of this script; it's based on this excellent thread on MacScripter.net, including (a) the workaround posted at the end of the thread, which partially solves the problem, and (b) my own addition, where I have to kill the Mac SpeechRecognitionServer on my system to get my AppleScript script to keep running.

A Mac OS X "text to speech" and "speech to text" example

Without any further ado, here's an AppleScript "text to speech" and "speech to text" example, with a few comments to make it all easier to understand. (AppleScript comments begin with the "--" characters.)

-- the computer says this
say "Do you think the Cubs will win the World Series this year?"

-- the computer listens for possible answers. we've told it to
-- listen for either "yes" or "no"
tell application "SpeechRecognitionServer" to set theResponse to listen for {"yes", "no"}
if theResponse is "yes" then
	-- if you answer yes, the computer responds here
	say "Wow, you have a lot of faith."
else
	-- if you answer no, the computer responds here
	say "The odds are definitely against them."
end if

-- the Mac Snow Leopard SpeechRecognitionServer won't go away until it times out with
-- an error, so kill it here
delay 1
do shell script "killall SpeechRecognitionServer"
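
As a variation on the script above, here is a minimal sketch that guards against two failure cases: no answer being heard, and the killall command failing because the server has already exited. It assumes the listen for command accepts a "giving up after" parameter (that is my reading of the server's dictionary), and the 10-second timeout and the "I did not catch that" reply are my own choices, not part of the MacScripter.net workaround.

```applescript
say "Do you think the Cubs will win the World Series this year?"

-- listen, but give up after 10 seconds (assumed parameter; see the note above)
set theResponse to ""
try
	tell application "SpeechRecognitionServer"
		set theResponse to listen for {"yes", "no"} giving up after 10
	end tell
on error
	-- a timeout or any other speech error lands here; treat it as "no answer"
	set theResponse to ""
end try

if theResponse is "yes" then
	say "Wow, you have a lot of faith."
else if theResponse is "no" then
	say "The odds are definitely against them."
else
	say "I did not catch that."
end if

-- killall returns an error when no matching process exists, so ignore that case
delay 1
try
	do shell script "killall SpeechRecognitionServer"
end try
```

The try block around the killall line matters because do shell script raises an AppleScript error whenever the shell command exits with a non-zero status.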

Mac OS X text/speech speak/listen example - summary

I hope this simple Mac OS X "text to speech" and "speech to text" example is helpful. Again, kudos to the AppleScript developers on MacScripter.net for the initial example and bug fix.

I just looked for the AppleScript dictionary for the Mac SpeechRecognitionServer, and I think I now see why there are so many problems with it: it is written in Carbon, Apple's older programming technology. I located the server with the Unix locate command, and then opened this path from the AppleScript Dictionary browse command:

/System/Library/Frameworks/Carbon.framework/Versions/A/Frameworks/SpeechRecognition.framework/Versions/A/Resources/SpeechRecognitionServer.app

As you can see from that path name, the SpeechRecognitionServer has a Carbon footprint. (Sorry, bad joke.) Hopefully one day they will update this to their newer Cocoa developer framework.

These bugs and this Carbon finding have me down a little bit this morning, but I'm really trying to get to a point where I can interact with my Mac OS X system using the "text to speech" and "speech to text" capabilities. I'd like to do all the usual things people dream about with the Mac "text to speech" capability, including having the system prompt me with questions and listen for answers, read the weather, news, stock market and email reports, and read documents, such as Wikipedia pages.

On the flip side, I'd like to use the Mac "speech to text" capability (the ability for the Mac OS X system to listen to my speech) to tell the Mac what to do, including playing music or radio stations, or again, opening and reading (speaking) documents.

Comments

I must say first that I have enjoyed all of your posts on AppleScript that I've read. I've been able to take something away from all of them, even if the topic didn't affect me directly (example: I don't personally need an iTunes alarm, but perusing your script did help me with, oddly enough, the work I'm currently doing with speech recognition).

I've been trying to do essentially what you describe at the end of the post - tell my Mac to do something, have it reply, etc. Specifically, I'm currently working with iTunes and telling it what to play. So far, all has gone fairly smoothly, but I recently hit a HUGE snag, and I'm hoping you might be able to help.

What I'd really like to be able to do is come home, tell my computer to play me some music, tell it what artist, have it respond with a "Which album?" question, and play whichever one I respond with. I know *how* it's all supposed to be set up - the problem is that my Mac doesn't recognize any response I give it! I even tried the simple yes/no script you posted above, and it won't even recognize one of those words.

Any aid you can provide would be greatly appreciated, and if you're interested, I'd be more than willing to share some of the scripts I've created for iTunes.

Technical stuff:
When attempting the yes/no script, my keyword was set to optional (the Mac is always listening), there was no background noise, and I was within 5 feet of the microphone.

Thanx!

This speech stuff on the Mac has gotten harder the last few days for me, mostly because it seems like Apple is not supporting it very well. For instance, their speech recognition server is written in Carbon, and likely hasn't been updated for years.

Back to the question ...

If you're comfortable with the Speech Recognition Server, and you know that either (a) it's set to "always on" (and you're saying "Computer" before your commands), or (b) you're pressing the [Esc] key, and still nothing is happening, this is likely one of the Carbon-related problems.

The old speech server gets into some sort of zombie-land very, very fast, and you have to kill it and restart to get a conversation started again. As I've learned, this makes it really hard to have an ongoing conversation with my Mac. (Think of it as being "stateless", like HTTP, and you have to store cookies so the Mac speech server can figure out where the conversation state was before you last killed it.)
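
To make that "stateless" idea concrete, here is a rough sketch of a two-question conversation in which the script itself, rather than the speech server, remembers what has been said so far. The askQuestion handler, the artist and album names, and the 10-second timeout are all hypothetical, and the "giving up after" parameter is my reading of the server's dictionary; I have not confirmed that this gets around every hang.

```applescript
-- hypothetical helper: speak a question, listen for one of the choices,
-- then kill the speech server so the next question starts with a fresh one
on askQuestion(theQuestion, theChoices)
	say theQuestion
	set theAnswer to ""
	try
		tell application "SpeechRecognitionServer"
			set theAnswer to listen for theChoices giving up after 10
		end tell
	end try
	delay 1
	try
		-- ignore the error killall reports when the server is already gone
		do shell script "killall SpeechRecognitionServer"
	end try
	return theAnswer
end askQuestion

-- the conversation state lives in these variables, not in the speech server
set theArtist to askQuestion("Which artist?", {"Beatles", "Miles Davis"})
if theArtist is not "" then
	set theAlbum to askQuestion("Which album?", {"Abbey Road", "Kind of Blue"})
	if theAlbum is not "" then
		say "Playing " & theAlbum & " by " & theArtist
	end if
end if
```

An empty string is returned when nothing was recognized, so the caller can decide whether to repeat the question or give up.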

If money were no object, the solution would be to pay $199 for MacSpeech, and get on with things, but since I don't want to do that, I'm trying to work around the problem. I also need to get on the Mac mailing lists and complain about this like everyone else is.

In regards to your iTunes scripts, yes, I'm very interested. If you can bear with me, I'll be opening this site up to authors shortly, and we can set up an account for you. I hope to have all this going in a "beta" state in the next two weeks.

Oops, I forgot, another possible problem is the microphone input level. You can check this on Snow Leopard by clicking the Calibrate... button in the System Preferences > Speech > Speech Recognition tab. This opens a Microphone Calibration dialog, where I find it's best to adjust the slider until my voice is in the low end of the green-bar range. You can test this with the phrases on the left side of that dialog.

You can get your Mac to read you the weather using Automator and a Yahoo Weather RSS feed.

In a new workflow, drag over "Get Specified URLs" from the Internet library, and put in the RSS feed address.

Next, drag over "Get Text from Articles".

Here's where you need to play around, as you drag over "Filter Paragraphs".

If only doing current conditions, it's easy. Have one filter with the parameter that it must start with "current" and another that it must end with "f" or "c" depending on how you're having your temperatures read to you.

It's a little harder if you are trying to get the forecast. If that's the case you need to do a bunch of negatives.
Filters:
1. are not empty
2. do not contain "conditions"
3. do not end with "f" or "c"
4. do not contain "weather"
5. do not contain "+"

After you have your filters in place you go to the text library and drag over "speak text" and choose the voice you want to have say it. Works pretty good.

I just finished this after searching around and finding nothing really helpful, so hopefully this works for you. I want to figure out a way so that when it speaks the text, instead of saying "mon" I can get it to say "Monday", and add periods so the lines don't sound like run-on sentences. I'll update you if I figure that one out.
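
For what it's worth, one possible approach is a "Run AppleScript" action placed between the filters and "Speak Text". The sketch below is untested against the real feed and rests on several assumptions: that the action receives the filtered paragraphs as a list of text, that the feed writes day names as capitalized three-letter abbreviations followed by a space, and that a paragraph without a trailing period should get one. The replaceText handler is a hypothetical helper, not a built-in command.

```applescript
-- Run AppleScript action: Automator passes the filtered paragraphs in as "input"
on run {input, parameters}
	-- abbreviations assumed to appear in the feed text, with their spoken forms
	set dayNames to {{"Mon", "Monday"}, {"Tue", "Tuesday"}, {"Wed", "Wednesday"}, {"Thu", "Thursday"}, {"Fri", "Friday"}, {"Sat", "Saturday"}, {"Sun", "Sunday"}}
	set output to {}
	repeat with aParagraph in input
		set theText to aParagraph as text
		repeat with aPair in dayNames
			set {shortName, longName} to contents of aPair
			set theText to replaceText(theText, shortName & " ", longName & " ")
		end repeat
		-- end each paragraph with a period so it is spoken as its own sentence
		if not (theText ends with ".") then set theText to theText & "."
		set end of output to theText
	end repeat
	return output
end run

-- hypothetical helper: replace every occurrence of findText in theText
on replaceText(theText, findText, replaceWith)
	set savedDelimiters to AppleScript's text item delimiters
	-- match case so that "Mon " is not found inside a word such as "common "
	considering case
		set AppleScript's text item delimiters to findText
		set theItems to text items of theText
	end considering
	set AppleScript's text item delimiters to replaceWith
	set theText to theItems as text
	set AppleScript's text item delimiters to savedDelimiters
	return theText
end replaceText
```

If the feed uses a different layout for day names, only the dayNames list and the trailing space in the search text should need to change.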

Not bad for my first day on Automator though, if I do say so myself :)

Cheers,

Ian

Wow, thank you for this Mac Automator URL/weather tip. After a few years of just looking at the Automator icon, I just started using it two weeks ago to batch process photos, and it works like a charm.
Yes, please update me (us) if you find a way to turn "mon" into Monday and add periods. I'm sorry, I don't know how to do that yet myself.
Cheers,
Al