Google Summer of Code 2011

CMUSphinx Speech Recognition Toolkit

Web Page: http://cmusphinx.sourceforge.net/wiki/summerofcodeideas

Mailing List: mailto:cmusphinx-devel@lists.sourceforge.net

The CMUSphinx project is the leading speech recognition project in the open source world. Since being released as open source code in 1999, it has provided a platform for building automatic speech recognition (ASR) applications. Today it is used in desktop control software, telephony platforms, intelligent houses, and more than 20 other applications. The growing CMUSphinx community is working to make speech technology accessible to everyone. With an increasing emphasis on minority languages, our goal is to spread open source all over the world. Over its long history, the project has been supported by CMU, SUN, Mitsubishi Electric, LIUM and many other organizations. Thousands of students use CMUSphinx in their studies to learn state-of-the-art machine learning algorithms, and CMUSphinx has been the basis for more than 20 PhD theses. Now CMUSphinx is aiming at the end user. Our goal is to bring open source speech recognition from the universities to every computer, and this moment is getting closer every day.


Our source code repository is located at: http://code.google.com/p/google-summer-of-code-2011-cmusphinx/

Projects

  • Long Audio Alignment: The problem is to align an approximate transcription with the corresponding long audio file, and to improve the transcription at points of low confidence.
  • Training the acoustic model on long audio files: Optimization of SphinxTrain using massively parallel hardware (the NVIDIA CUDA framework). Enable acoustic model training on long audio files by using the NVIDIA CUDA architecture, incorporate a technique for reducing the memory requirements of the Baum-Welch algorithm, and modify SphinxTrain so that it can process long input audio files.
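For the Long Audio Alignment idea, one common approach (assumed here, not taken from the project description) is to run a recognizer over the long audio, align its hypothesis words to the approximate transcription with edit-distance dynamic programming, and treat long runs of exact matches as reliable anchors. The gaps between anchors are then the low-confidence regions to re-align or correct. The sketch below covers only this text-alignment step: the recognizer output is a hypothetical word list, and `align_words`, `find_anchors` and `min_run` are illustrative names, not CMUSphinx APIs.

```python
from typing import List, Optional, Tuple

# (transcript word, hypothesis word); None marks a missing or extra word.
Pair = Tuple[Optional[str], Optional[str]]


def align_words(reference: List[str], hypothesis: List[str]) -> List[Pair]:
    """Levenshtein (edit-distance) alignment of two word sequences."""
    n, m = len(reference), len(hypothesis)
    # cost[i][j] = edit distance between reference[:i] and hypothesis[:j]
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i
    for j in range(1, m + 1):
        cost[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = 0 if reference[i - 1] == hypothesis[j - 1] else 1
            cost[i][j] = min(cost[i - 1][j - 1] + sub,  # match or substitution
                             cost[i - 1][j] + 1,        # word missing from hypothesis
                             cost[i][j - 1] + 1)        # extra word in hypothesis
    # Backtrace from the bottom-right corner to recover one optimal alignment.
    pairs: List[Pair] = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and cost[i][j] == cost[i - 1][j - 1] + (
                0 if reference[i - 1] == hypothesis[j - 1] else 1):
            pairs.append((reference[i - 1], hypothesis[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and cost[i][j] == cost[i - 1][j] + 1:
            pairs.append((reference[i - 1], None))
            i -= 1
        else:
            pairs.append((None, hypothesis[j - 1]))
            j -= 1
    pairs.reverse()
    return pairs


def find_anchors(pairs: List[Pair], min_run: int = 3) -> List[Tuple[int, int]]:
    """Return [start, end) ranges of at least min_run consecutive exact matches."""
    anchors: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for k, (ref, hyp) in enumerate(pairs):
        if ref is not None and ref == hyp:
            if start is None:
                start = k
        else:
            if start is not None and k - start >= min_run:
                anchors.append((start, k))
            start = None
    if start is not None and len(pairs) - start >= min_run:
        anchors.append((start, len(pairs)))
    return anchors
```

For example, aligning the hypothesis `the cat sat on a mat today` to the transcript `the cat sat on the mat today` yields a single substitution at position 4, and with the default `min_run=3` one anchor `(0, 4)`; the trailing two-word match is too short to count as an anchor.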
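The memory problem in the acoustic-model training idea comes from the forward-backward pass of Baum-Welch, which normally keeps a forward vector for every frame, so memory grows linearly with the length of the audio. The project description does not name the reduction technique; checkpointing is one standard option and is assumed here. The sketch keeps only every k-th scaled forward vector and recomputes each segment's vectors while the backward pass sweeps over it, trading extra computation for memory. It uses a small discrete-emission HMM for brevity (SphinxTrain itself trains continuous-density models) and runs on the CPU rather than under CUDA; `checkpointed_occupancy` and its parameters are hypothetical names, not part of SphinxTrain.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def _normalize(v: List[float]) -> Tuple[List[float], float]:
    """Scale a vector to sum to 1; return the scaled vector and the original sum."""
    s = sum(v)
    return [x / s for x in v], s


def _forward_step(alpha: List[float], A: Matrix, B: Matrix,
                  o: int) -> Tuple[List[float], float]:
    """One scaled forward step: alpha_t from alpha_{t-1} and observation o."""
    n = len(alpha)
    nxt = [sum(alpha[i] * A[i][j] for i in range(n)) * B[j][o] for j in range(n)]
    return _normalize(nxt)


def checkpointed_occupancy(pi: List[float], A: Matrix, B: Matrix,
                           obs: List[int], k: int) -> Tuple[List[float], float]:
    """Expected state occupancy sum_t gamma_t(i) and log P(obs), for non-empty obs.

    Forward vectors held at once: about T/k checkpoints plus at most k for
    the segment being processed, instead of T. The per-frame scale factors
    (one float per frame) are still stored.
    """
    n, T = len(pi), len(obs)
    alpha, c0 = _normalize([pi[i] * B[i][obs[0]] for i in range(n)])
    checkpoints = {0: alpha}
    scales = [c0]
    for t in range(1, T):
        alpha, c = _forward_step(alpha, A, B, obs[t])
        scales.append(c)
        if t % k == 0:
            checkpoints[t] = alpha
    occupancy = [0.0] * n
    beta = [1.0] * n  # scaled backward vector at t = T-1
    for start in reversed(range(0, T, k)):
        end = min(start + k, T)
        # Recompute this segment's scaled forward vectors from its checkpoint.
        seg = [checkpoints[start]]
        for t in range(start + 1, end):
            seg.append(_forward_step(seg[-1], A, B, obs[t])[0])
        for t in range(end - 1, start - 1, -1):
            a = seg[t - start]
            for i in range(n):
                occupancy[i] += a[i] * beta[i]  # gamma_t(i) with scaled vectors
            if t > 0:
                # Scaled backward step from beta_t to beta_{t-1}.
                beta = [sum(A[i][j] * B[j][obs[t]] * beta[j] for j in range(n))
                        / scales[t] for i in range(n)]
    return occupancy, sum(math.log(c) for c in scales)
```

Every choice of `k` gives the same occupancy and log-likelihood; choosing `k` near √T keeps roughly 2√T forward vectors in memory instead of T, at the cost of about one extra forward pass. A complete re-estimation step would also accumulate transition and emission counts during the same backward sweep.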