User:OrenBochman/bots/ipa

IPA-BOT

  1. A bot to automate IPA entry generation, based on:
    1. the spelling;
    2. a phonemic model;
    3. all the existing IPA data.

Features

  1. Knowledge-based (rule-based) version.
    1. Start with languages that have simple spelling-to-sound mappings, such as Hungarian and Swahili.
    2. Add phonemic adjustments:
      1. assimilation
      2. elision
  2. Data-based (statistical) version.
    1. An HMM trained on input/output data (spellings paired with their IPA).
    2. Use existing text (to do).
  3. Per-language on/off flag.
  4. Check flag: add a template requesting human checking (e.g. for proper nouns).
  5. Hybrid version:
    1. use both models, with a discriminator choosing between their outputs.
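The rule-based version can be sketched as a greedy longest-match spelling-to-sound table followed by an adjustment pass. This is a minimal sketch assuming a small, simplified subset of Hungarian: stress, elision and long consonants are ignored, and the only adjustment modelled is regressive voicing assimilation in obstruent clusters (with "v" treated as a target but not a trigger). The names `GRAPHEMES`, `spell_to_phones`, `assimilate` and `to_ipa` are hypothetical.

```python
# Hypothetical rule-based G2P sketch for a small subset of Hungarian.
# Assumptions: no stress, no elision, double consonants stay as two phones,
# and only obstruent voicing assimilation is modelled.

# Graphemes are matched longest-first, so "dzs" wins over "dz" and "d".
GRAPHEMES = {
    "dzs": "dʒ",
    "cs": "tʃ", "dz": "dz", "gy": "ɟ", "ly": "j", "ny": "ɲ",
    "sz": "s", "ty": "c", "zs": "ʒ",
    "a": "ɒ", "á": "aː", "b": "b", "c": "ts", "d": "d", "e": "ɛ",
    "é": "eː", "f": "f", "g": "ɡ", "h": "h", "i": "i", "í": "iː",
    "j": "j", "k": "k", "l": "l", "m": "m", "n": "n", "o": "o",
    "ó": "oː", "ö": "ø", "ő": "øː", "p": "p", "r": "r", "s": "ʃ",
    "t": "t", "u": "u", "ú": "uː", "ü": "y", "ű": "yː", "v": "v",
    "z": "z",
}

# Voiceless -> voiced obstruent pairs, plus the reverse mapping.
VOICE = {"p": "b", "t": "d", "k": "ɡ", "f": "v", "s": "z", "ʃ": "ʒ",
         "ts": "dz", "tʃ": "dʒ", "c": "ɟ"}
DEVOICE = {voiced: voiceless for voiceless, voiced in VOICE.items()}


def spell_to_phones(word: str) -> list[str]:
    """Greedy longest-match spelling-to-sound conversion."""
    word = word.lower()
    phones, i = [], 0
    while i < len(word):
        for size in (3, 2, 1):
            chunk = word[i:i + size]
            if len(chunk) == size and chunk in GRAPHEMES:
                phones.append(GRAPHEMES[chunk])
                i += size
                break
        else:  # no grapheme matched at position i
            raise ValueError(f"no rule for {word[i]!r} in {word!r}")
    return phones


def assimilate(phones: list[str]) -> list[str]:
    """Regressive voicing assimilation: an obstruent takes the voicing of
    the obstruent after it. Scanning right to left lets clusters propagate."""
    out = list(phones)
    for i in range(len(out) - 2, -1, -1):
        nxt = out[i + 1]
        if nxt in DEVOICE and nxt != "v":   # voiced trigger ("v" excluded)
            out[i] = VOICE.get(out[i], out[i])
        elif nxt in VOICE:                  # voiceless trigger
            out[i] = DEVOICE.get(out[i], out[i])
    return out


def to_ipa(word: str) -> str:
    return "".join(assimilate(spell_to_phones(word)))
```

For example, `to_ipa("népdal")` gives `neːbdɒl` and `to_ipa("vízpart")` gives `viːspɒrt`. Elision would be one more pass over the phone list, in the same style as `assimilate`.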
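The statistical version can be sketched as a supervised HMM whose hidden states are phonemes and whose observations are letters: counts from existing entries give start, transition and emission probabilities, and Viterbi decoding picks the most probable phoneme sequence for a new spelling. This is a minimal sketch under a strong simplifying assumption: every training word is already aligned one phoneme per letter. Real data would first need an alignment step. The class name `SupervisedHMM`, the add-alpha smoothing and the toy training pairs are hypothetical.

```python
import math
from collections import Counter, defaultdict


class SupervisedHMM:
    """Hypothetical G2P HMM: hidden states are phonemes, observations are
    letters. Assumes 1:1 letter/phoneme alignment in the training pairs."""

    def __init__(self, pairs, alpha=1.0):
        self.alpha = alpha                    # add-alpha smoothing
        self.start = Counter()
        self.trans = defaultdict(Counter)     # trans[prev][cur]
        self.emit = defaultdict(Counter)      # emit[phoneme][letter]
        self.states, self.symbols = set(), set()
        for letters, phones in pairs:
            if len(letters) != len(phones):
                raise ValueError("sketch assumes 1:1 letter/phoneme alignment")
            if not phones:
                continue
            self.start[phones[0]] += 1
            for prev, cur in zip(phones, phones[1:]):
                self.trans[prev][cur] += 1
            for letter, phone in zip(letters, phones):
                self.emit[phone][letter] += 1
            self.states.update(phones)
            self.symbols.update(letters)

    def _logp(self, counts, key, vocab_size):
        total = sum(counts.values())
        return math.log((counts[key] + self.alpha) /
                        (total + self.alpha * vocab_size))

    def decode(self, letters):
        """Viterbi: most probable phoneme sequence for the given letters."""
        if not letters:
            return []
        S = sorted(self.states)
        n_s, n_v = len(S), len(self.symbols) + 1  # +1 keeps mass for unseen letters
        best = {s: self._logp(self.start, s, n_s)
                + self._logp(self.emit[s], letters[0], n_v) for s in S}
        back = []
        for letter in letters[1:]:
            scores, ptr = {}, {}
            for s in S:
                prev = max(S, key=lambda p: best[p] + self._logp(self.trans[p], s, n_s))
                scores[s] = (best[prev] + self._logp(self.trans[prev], s, n_s)
                             + self._logp(self.emit[s], letter, n_v))
                ptr[s] = prev
            best = scores
            back.append(ptr)
        path = [max(S, key=best.get)]
        for ptr in reversed(back):  # follow back-pointers to the start
            path.append(ptr[path[-1]])
        return path[::-1]
```

On made-up toy data where the letter "c" is read /k/ before "a" and /s/ before "i", the transition probabilities are what resolve the ambiguity:

```python
model = SupervisedHMM([("ca", ["k", "a"]), ("ci", ["s", "i"]),
                       ("ka", ["k", "a"]), ("si", ["s", "i"])])
model.decode("ci")  # ['s', 'i']
```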
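The per-language flag, the check flag and the hybrid can be combined in one decision step. The sketch below assumes each model returns an `(ipa, confidence)` pair, and it uses a simple agreement/confidence rule as a stand-in for a learned discriminator. `LanguageConfig`, `Proposal`, `hybrid_ipa` and the 0.9 threshold are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple

# Assumption: a model maps a word to (ipa, confidence in [0, 1]).
Model = Callable[[str], Tuple[str, float]]


@dataclass
class LanguageConfig:
    enabled: bool = False          # per-language on/off flag
    min_confidence: float = 0.9    # hypothetical threshold for human checking


@dataclass
class Proposal:
    ipa: str
    needs_check: bool              # True -> add the human-checking template


def hybrid_ipa(word: str, config: LanguageConfig, rule_model: Model,
               stat_model: Model, is_proper_noun: bool = False) -> Optional[Proposal]:
    """Agreement/confidence rule standing in for a learned discriminator."""
    if not config.enabled:
        return None                # bot is switched off for this language
    rule_ipa, rule_conf = rule_model(word)
    stat_ipa, stat_conf = stat_model(word)
    # Prefer the more confident model; on a tie max() keeps the first (rule-based).
    ipa, conf = max([(rule_ipa, rule_conf), (stat_ipa, stat_conf)],
                    key=lambda p: p[1])
    needs_check = (is_proper_noun
                   or rule_ipa != stat_ipa
                   or conf < config.min_confidence)
    return Proposal(ipa, needs_check)
```

Flagging every disagreement for review is a conservative choice; a trained discriminator could replace the `max` and the threshold without changing the interface.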

Issues

Q.A.: for each language, train on 95% of the existing annotations and test on the remaining 5%.
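A per-language 95% / 5% split can be sketched as follows, assuming the existing annotations are available as `(language, word, ipa)` triples. The function names and the fixed seed are hypothetical; the seed keeps the split reproducible between runs.

```python
import random
from collections import defaultdict


def split_per_language(entries, test_fraction=0.05, seed=0):
    """entries: iterable of (language, word, ipa).
    Returns {language: (train, test)} with lists of (word, ipa)."""
    by_lang = defaultdict(list)
    for lang, word, ipa in entries:
        by_lang[lang].append((word, ipa))
    rng = random.Random(seed)
    splits = {}
    for lang, items in sorted(by_lang.items()):
        items = sorted(items)       # stable order before shuffling
        rng.shuffle(items)
        # Languages with a single entry keep it for training.
        n_test = max(1, round(len(items) * test_fraction)) if len(items) > 1 else 0
        splits[lang] = (items[n_test:], items[:n_test])
    return splits


def accuracy(predict, test):
    """Fraction of test words whose predicted IPA matches exactly."""
    if not test:
        return float("nan")
    return sum(predict(word) == ipa for word, ipa in test) / len(test)
```

Exact-match accuracy is strict; a phone error rate (edit distance over phones) would give partial credit.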

Other Features

  1. Poll:
    1. Is there interest in generating TTS voice files for entries?
    2. Is there interest in generating hyphenation as well?

Resources

  1. Open-source speech and language projects with language models and scripts:
    1. MBROLA (speech synthesis)
    2. CMU Sphinx (speech recognition)
    3. Hspell (Hebrew spell checker)
  2. The CMU Pronouncing Dictionary (CMUdict) for English.
  3. MALLET for graphical models.
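CMUdict could serve as seed IPA data for English. The sketch below assumes the usual whitespace-separated format (`WORD  PH1 PH2 ...`, `;;;` comment lines, `WORD(1)` for alternate pronunciations, and an optional trailing `# ...` note in newer releases). It maps ARPAbet to IPA and drops stress digits, except that they choose between ʌ/ə and ɝ/ɚ; placing stress marks would need syllabification. The function names are hypothetical.

```python
import re

# ARPAbet -> IPA (AH and ER are handled separately because stress changes them).
ARPABET = {
    "AA": "ɑ", "AE": "æ", "AO": "ɔ", "AW": "aʊ", "AY": "aɪ", "B": "b",
    "CH": "tʃ", "D": "d", "DH": "ð", "EH": "ɛ", "EY": "eɪ", "F": "f",
    "G": "ɡ", "HH": "h", "IH": "ɪ", "IY": "i", "JH": "dʒ", "K": "k",
    "L": "l", "M": "m", "N": "n", "NG": "ŋ", "OW": "oʊ", "OY": "ɔɪ",
    "P": "p", "R": "ɹ", "S": "s", "SH": "ʃ", "T": "t", "TH": "θ",
    "UH": "ʊ", "UW": "u", "V": "v", "W": "w", "Y": "j", "Z": "z",
    "ZH": "ʒ",
}


def parse_cmudict(lines):
    """Yield (word, [phones]) from CMUdict-style lines."""
    for line in lines:
        line = re.split(r"\s#", line, maxsplit=1)[0].strip()  # drop trailing notes
        if not line or line.startswith(";;;"):
            continue
        word, *phones = line.split()
        word = re.sub(r"\(\d+\)$", "", word).lower()  # WORD(1) = alternate
        yield word, phones


def arpabet_to_ipa(phones):
    out = []
    for ph in phones:
        base = ph.rstrip("012")
        stress = ph[len(base):]
        if base == "AH":
            out.append("ə" if stress == "0" else "ʌ")
        elif base == "ER":
            out.append("ɚ" if stress == "0" else "ɝ")
        else:
            out.append(ARPABET[base])
    return "".join(out)
```

For example, the line `ABBOTT  AE1 B AH0 T` yields `("abbott", ["AE1", "B", "AH0", "T"])`, which converts to `æbət`.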