Pronounce Demo

(American English, trained on the CMU dictionary)

Word list

List the query words, one per line. Capitalization does not matter. Words may contain letters, numbers, the underscore (_), and the apostrophe ('); all other characters are ignored. You can also specify a partial word in context by putting the unpronounced part of the word in [] brackets. Try the example words to see how it works. Acronyms can be written in the form I_B_M or U_S_A; they will be pronounced one letter at a time.
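The input rules above can be sketched in Python. This is only an illustration of the described format, not the demo's actual code: the function name parse_query, the returned dictionary layout, and the exact filtering rule are all assumptions.

```python
import re

# Assumption: anything other than letters, digits, underscore, apostrophe
# and the [] context brackets is dropped. The demo's real rules may differ.
_DISALLOWED = re.compile(r"[^a-z0-9_'\[\]]")


def parse_query(line):
    """Describe one query line (hypothetical structure, for illustration)."""
    text = _DISALLOWED.sub("", line.lower())
    # Letters inside [] are context only; letters outside are pronounced.
    pronounced = re.sub(r"\[[^\]]*\]", "", text)
    full_word = text.replace("[", "").replace("]", "")
    # Acronym form: single characters joined by underscores, e.g. i_b_m.
    is_acronym = re.fullmatch(r"[a-z0-9](_[a-z0-9])+", text) is not None
    return {
        "full_word": full_word,
        "pronounced": pronounced,
        "is_acronym": is_acronym,
        "letters": text.split("_") if is_acronym else None,
    }


if __name__ == "__main__":
    for query in ["I_B_M", "[un]happy!", "Don't"]:
        print(query, "->", parse_query(query))
```

Under these assumptions, "[un]happy!" is treated as the word "unhappy" with only "happy" pronounced, and "I_B_M" is split into the letters i, b, m.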

The word list can also serve as a pronunciation dictionary for forced alignment or for checking model accuracy. To specify a pronunciation dictionary, follow each word with a sequence of phones separated by spaces, and check the check box above. The phones should come from the CMU dictionary phone set. If the check box is not checked, the pronunciation definitions are ignored and the model dictionary is used instead. If a pronunciation definition is found in a dictionary, forced alignment is performed between the string of letters and the sequence of phones.
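As a rough illustration of the dictionary format described above (a word followed by space-separated phones), here is a minimal Python sketch. The function name parse_dictionary is hypothetical, the sample phones are illustrative, and treating a word with no phones as "use the model dictionary" is an assumption rather than documented behavior.

```python
def parse_dictionary(text):
    """Map each word to its list of phones.

    Each non-empty line is "WORD PHONE PHONE ...". A line with a word only
    yields an empty phone list (assumed to mean: no user-supplied definition).
    """
    entries = {}
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        word, phones = fields[0].lower(), fields[1:]
        entries[word] = phones
    return entries


if __name__ == "__main__":
    sample = "hello HH AH L OW\nworld\n"
    print(parse_dictionary(sample))
```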

Model Type

Both monophone and triphone models have been trained. The triphone model almost always gives better results.

Forced alignment

If the word (or partial word plus its context) is found in the pronunciation dictionary, it is possible to force alignment between letters and phones. This is more accurate than asking the model to come up with the sequence of phones on its own. By default the model attempts forced alignment, but it can be turned off with the check box above.

Report Type

Accuracy summary prints the word accuracy rate (a word counts as correct if its predicted phone sequence matches the one in the dictionary) and the minimum edit distance between the correct and predicted phone sequences, summed over all words. Some other statistics are also printed.

Accuracy summary and report mistakes also shows the individual words that were 'mispronounced', along with their edit distances.
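The two accuracy reports can be approximated with a short Python sketch. It assumes the edit distance is the standard Levenshtein distance with unit costs for insertion, deletion, and substitution; the demo may weight operations differently. The function names and the result layout are hypothetical.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two phone sequences (unit costs)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i]
        for j, h in enumerate(hyp, start=1):
            cur.append(min(
                prev[j] + 1,             # delete a reference phone
                cur[j - 1] + 1,          # insert a predicted phone
                prev[j - 1] + (r != h),  # substitute, or match at no cost
            ))
        prev = cur
    return prev[-1]


def accuracy_summary(reference, predicted):
    """Word accuracy, summed edit distance, and per-word mistakes."""
    mistakes = {}
    for word, ref in reference.items():
        distance = edit_distance(ref, predicted[word])
        if distance:
            mistakes[word] = distance
    total = len(reference)
    return {
        "word_accuracy": (total - len(mistakes)) / total if total else 0.0,
        "total_edit_distance": sum(mistakes.values()),
        "mistakes": mistakes,
    }


if __name__ == "__main__":
    reference = {"cat": ["K", "AE", "T"], "dog": ["D", "AO", "G"]}
    predicted = {"cat": ["K", "AE", "T"], "dog": ["D", "AA", "G", "Z"]}
    print(accuracy_summary(reference, predicted))
```

In this made-up example, "dog" differs by one substitution and one insertion, so it is reported as a mistake with edit distance 2, and the word accuracy is 0.5.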

All available information prints a Perl structure containing the dictionary, model, and forced-alignment definitions, or as much of that information as is available. The structure can then be read into Perl with the eval function.

Best pronunciation simply returns the best possible pronunciation, one word per line.



The query will take half a minute even for a small number of words. Please be patient.


Arthur Kantor
