Pure Java speech recognition

Summary: Sphinx-4 is a very promising pure Java speech recognition project. Its WebStart demo is easy to run, and I recently developed my own Java speech recognition app on top of Sphinx-4 to control my Mac OS X system.

So far this looks pretty sweet. If you're interested in programming with speech recognition, Sphinx-4 gives you a speech recognizer written entirely in Java.

I just ran the ZipCity WebStart demo app they have on their website, and it seems to work as advertised, which is pretty cool. I thought I might have to do something special to turn on the iMac microphone, but when you press the "Speak" button on their Java application, the microphone works just fine.

Based on my experience with Dragon NaturallySpeaking, which I used for several years, I know that training is an important part of speech recognition software, and I'm just now reading about the Sphinx-4 training capability.

I'm pretty cranked about this. I currently live and work in a small, relatively quiet apartment in Wasilla, Alaska, and there are a ton of speech recognition things I'd like to try, including a lot of interaction with iTunes, such as:

  • An iTunes alarm clock
  • Playing a radio station
  • Switching from one radio station to another
  • Playing a music album
  • Playing a podcast
  • Playing movies

(On my limited budget I know that I'll have to do some things with the iMac remote control, but that's fine; I usually keep it in the kitchen anyway.)

I can also envision interactions with browsers and websites, but a lot of what I'd like to do involves iTunes. Unfortunately AppleScript has to be the glue between Java and iTunes, but I've been thinking about writing an abstraction layer there anyway (unless someone else already has).
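To make that abstraction layer idea a little more concrete, here's a rough sketch of what it might look like. It only covers the Java-to-AppleScript side and assumes the recognizer has already handed back a phrase as a plain String. The class name, the method names, and the phrases are all hypothetical, made up for illustration. The parts I'm relying on are the macOS osascript command with its -e option, and the basic iTunes AppleScript commands (play, pause, next track, previous track).

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

/** Hypothetical glue layer: turns a recognized phrase into an iTunes AppleScript call. */
final class ITunesVoiceCommands {

    // Spoken phrase -> AppleScript statement sent to iTunes (illustrative set only).
    private static final Map<String, String> STATEMENTS = new LinkedHashMap<>();
    static {
        STATEMENTS.put("play music", "play");
        STATEMENTS.put("pause music", "pause");
        STATEMENTS.put("next track", "next track");
        STATEMENTS.put("previous track", "previous track");
    }

    /** Returns the full AppleScript line for a phrase, or empty if the phrase is unknown. */
    static Optional<String> scriptFor(String phrase) {
        if (phrase == null) {
            return Optional.empty();
        }
        String key = phrase.trim().toLowerCase(Locale.ROOT);
        String statement = STATEMENTS.get(key);
        if (statement == null) {
            return Optional.empty();
        }
        return Optional.of("tell application \"iTunes\" to " + statement);
    }

    /** Builds the argument list for running a script through osascript. */
    static List<String> osascriptCommand(String script) {
        List<String> command = new ArrayList<>();
        command.add("osascript");
        command.add("-e");
        command.add(script);
        return command;
    }

    /** Runs the script for a phrase; returns false if the phrase is unknown or osascript fails. */
    static boolean run(String phrase) throws IOException, InterruptedException {
        Optional<String> script = scriptFor(phrase);
        if (!script.isPresent()) {
            return false;
        }
        Process process = new ProcessBuilder(osascriptCommand(script.get()))
                .inheritIO()
                .start();
        return process.waitFor() == 0;
    }
}
```

With something like this in place, the speech recognition code would only ever call ITunesVoiceCommands.run("next track") and never touch AppleScript directly. Keeping the script building separate from the process launching also means the phrase mapping can be checked on any machine, even one without iTunes.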

Here's a short blurb on the Sphinx-4 software:

Sphinx-4 is a state-of-the-art speech recognition system written entirely in the Java programming language. It was created via a joint collaboration between the Sphinx group at Carnegie Mellon University, Sun Microsystems Laboratories, Mitsubishi Electric Research Labs (MERL), and Hewlett Packard (HP), with contributions from the University of California at Santa Cruz (UCSC) and the Massachusetts Institute of Technology (MIT).

And here's a link to the Sphinx-4 Java speech recognition software website.