
September 07, 2007

Make Your Computer Sing With Festival

Festival is a free speech synthesis system from the Centre for Speech Technology Research at the University of Edinburgh in Scotland. Besides the ordinary text-to-speech tasks that most operating systems can handle today, Festival has a special singing mode. This mode accepts an XML input file that lists the words to sing, along with the pitch and duration of each one. Here is an example from the documentation:

<?xml version="1.0"?>
<!DOCTYPE SINGING PUBLIC "-//SINGING//DTD SINGING mark up//EN" 
      "Singing.v0_1.dtd"
[]>
<SINGING BPM="30">
<PITCH NOTE="G3"><DURATION BEATS="0.3">doe</DURATION></PITCH>
<PITCH NOTE="A3"><DURATION BEATS="0.3">ray</DURATION></PITCH>
<PITCH NOTE="B3"><DURATION BEATS="0.3">me</DURATION></PITCH>
<PITCH NOTE="C4"><DURATION BEATS="0.3">fah</DURATION></PITCH>
<PITCH NOTE="D4"><DURATION BEATS="0.3">sew</DURATION></PITCH>
<PITCH NOTE="E4"><DURATION BEATS="0.3">lah</DURATION></PITCH>
<PITCH NOTE="F#4"><DURATION BEATS="0.3">tee</DURATION></PITCH>
<PITCH NOTE="G4"><DURATION BEATS="0.3">doe</DURATION></PITCH>
</SINGING>
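
Writing these files by hand gets tedious for a four-part song, so a small script can generate them. Below is a minimal Python sketch that produces a file in the same layout as the example above from a list of (note, beats, syllable) tuples. The helpers `singing_xml` and `note_seconds` are hypothetical, written for illustration and not part of Festival. The duration helper assumes BPM means beats per minute, as the name suggests. Under that assumption each beat at BPM="30" lasts 2 seconds, so each BEATS="0.3" note lasts about 0.6 seconds.

```python
from xml.sax.saxutils import escape, quoteattr


def singing_xml(notes, bpm=30):
    """Build a singing-mode document in the layout of the example above.

    notes: list of (note_name, beats, syllable) tuples, e.g. ("G3", 0.3, "doe").
    Hypothetical helper for illustration, not part of Festival itself.
    """
    lines = [
        '<?xml version="1.0"?>',
        '<!DOCTYPE SINGING PUBLIC "-//SINGING//DTD SINGING mark up//EN"',
        '      "Singing.v0_1.dtd"',
        '[]>',
        f'<SINGING BPM="{bpm}">',
    ]
    for note, beats, syllable in notes:
        # quoteattr adds the surrounding quotes; escape protects &, < and >.
        lines.append(
            f'<PITCH NOTE={quoteattr(note)}>'
            f'<DURATION BEATS={quoteattr(str(beats))}>{escape(syllable)}</DURATION>'
            '</PITCH>'
        )
    lines.append('</SINGING>')
    return "\n".join(lines) + "\n"


def note_seconds(beats, bpm):
    """Length of a note in seconds, assuming BPM is beats per minute."""
    return beats * 60.0 / bpm


if __name__ == "__main__":
    scale = [("G3", "doe"), ("A3", "ray"), ("B3", "me"), ("C4", "fah"),
             ("D4", "sew"), ("E4", "lah"), ("F#4", "tee"), ("G4", "doe")]
    print(singing_xml([(note, 0.3, word) for note, word in scale]))
```

Running it prints the do-re-mi scale in the same form as the documentation example. One such list per vocal part would give one file per part.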

I decided to create a song based on a barbershop quartet arrangement from a book in my collection. To do this, I created one XML file for each of the four vocal parts and played each one individually while recording it into an audio program. Then I combined the four recordings into a single audio file. Voilà! Here is the final product, a computerized version of the song "M-O-T-H-E-R":
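
The combining step here was done by hand in an audio program. As an alternative, a script can sum the separate recordings. The following is a minimal Python sketch that uses only the standard `wave` and `array` modules. The function name `mix_wavs` is hypothetical. The sketch assumes every part was saved as 16-bit PCM WAV with the same sample rate and channel count.

```python
import array
import sys
import wave


def _read_samples(w):
    """Read all frames of an open 16-bit WAV as signed native-order samples."""
    samples = array.array("h")
    samples.frombytes(w.readframes(w.getnframes()))
    if sys.byteorder == "big":  # WAV sample data is little-endian
        samples.byteswap()
    return samples


def mix_wavs(paths, out_path):
    """Mix several 16-bit PCM WAV files into one by summing their samples.

    Assumption for this sketch: all inputs share the same sample rate and
    channel count. Shorter tracks are treated as silent once they run out.
    """
    rate = channels = None
    tracks = []
    for path in paths:
        with wave.open(path, "rb") as w:
            if w.getsampwidth() != 2:
                raise ValueError(f"{path}: only 16-bit PCM is supported")
            if rate is None:
                rate, channels = w.getframerate(), w.getnchannels()
            elif (w.getframerate(), w.getnchannels()) != (rate, channels):
                raise ValueError(f"{path}: sample rate or channel count differs")
            tracks.append(_read_samples(w))

    # Sum sample-by-sample, treating missing samples as silence.
    length = max(len(t) for t in tracks)
    summed = [sum(t[i] for t in tracks if i < len(t)) for i in range(length)]

    # Scale down only if the sum would overflow 16 bits; otherwise keep levels.
    peak = max((abs(s) for s in summed), default=0)
    scale = min(1.0, 32767 / peak) if peak else 1.0
    mixed = array.array("h", (round(s * scale) for s in summed))
    if sys.byteorder == "big":
        mixed.byteswap()

    with wave.open(out_path, "wb") as out:
        out.setnchannels(channels)
        out.setsampwidth(2)
        out.setframerate(rate)
        out.writeframes(mixed.tobytes())
```

For example, `mix_wavs(["part1.wav", "part2.wav", "part3.wav", "part4.wav"], "mixed.wav")` would combine four hypothetical recordings. The sketch scales the mix down only when the summed peak would clip. Plain summing keeps the relative balance of the parts, but a real mix might still need per-part volume adjustments.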

Mother.wav (6 MB)
Mother.m4a (580k, for iTunes and QuickTime)

Here are the XML files I created and fed into Festival:

mother1.xml
mother2.xml
mother3.xml
mother4.xml

Lyrics:

M is for the million things she gave me.
O means only that she's growing old.
T is for the tears were shed to save me.
H is for her heart of purest gold.
E is for her eyes with love-light shining.
R means right and right she will always be.
Put them all together they spell MOTHER,
A word that means the world to me.

NOTE: Keep in mind that Festival is officially released only as source code. If you wish to use it, you will need to either build the binaries from the sources or find prebuilt binaries somewhere online. I got the Windows binaries from a project called Flinger, which uses speech synthesis to sing from a MIDI input. However, Flinger did not include all of the configuration files needed for singing from an XML file, so I also had to download the Festival sources and add those files to my Flinger install. You might have to experiment a little, but you should be able to get Festival/Flinger to sing from an XML file without too much trouble.

Posted by Chuck at 03:58 PM