2 May 2007 11:30
Problems with glitches in diphone voice with F0 and duration CART-tree models
Hello, we are developing two diphone voices in Spanish (one male and one female).
We started by recording the diphones in a professional recording studio with an EGG (electroglottograph). The
diphone labels were adjusted manually and the pitchmarks were obtained automatically. Both voices are now
finished, but we have encountered some problems with glitches:
1.- When we load our male voice with the rule-based models (F0 and duration) used by Eduardo López (the
previous male voice in Spanish), we hear glitches in some words containing the sound "s" when it is
followed by certain other consonants across a syllable boundary. Examples: sistemas, expuesto, puesto,
este...
After reading the documentation, we concluded that the problem could originate in the pitchmarks and the
diphone labeling (diphones such as "s-t" and "s-d", i.e. a fricative followed by a plosive). We revised the
diphone labeling repeatedly but did not get any better result. We then changed the pitchmark
labeling, but since "s" is a voiceless sound, the EGG signal shows no periodicity there (it is almost
flat). We therefore placed evenly spaced pitchmarks over the "s" sound, but the glitches
persisted. Could you please help us with this? We read the documentation on pitchmarks, but
all the examples there show how to adjust the pitchmarks for voiced sounds (vowels and voiced consonants).
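For reference, the even-spacing idea described above can be sketched in Python as follows. This is only a minimal illustration: the function name, the 10 ms default period and the 20 ms gap threshold are assumptions, not the procedure actually used for these voices nor anything taken from the Festival tools.

```python
import math

def fill_unvoiced_gaps(pitchmarks, end_time, period=0.010, max_gap=0.020):
    """Return pitchmark times (seconds) with evenly spaced marks added in long gaps.

    `period` and `max_gap` are assumed values chosen for illustration only.
    """
    filled = []
    prev = 0.0
    # The end of the recording acts as a final boundary so that trailing
    # unvoiced audio is covered as well.
    for t in sorted(pitchmarks) + [end_time]:
        gap = t - prev
        if gap > max_gap:
            # Split the gap into n equal steps, each no longer than `period`.
            n = max(1, math.ceil(gap / period - 1e-9))
            step = gap / n
            filled.extend(prev + k * step for k in range(1, n))
        filled.append(t)
        prev = t
    filled.pop()  # drop the end-of-recording sentinel, which is not a pitchmark
    return filled

# Two voiced stretches separated by an unvoiced "s" of about 90 ms.
marks = [0.100, 0.105, 0.110, 0.200, 0.205]
print(fill_unvoiced_gaps(marks, end_time=0.210))
```

The original marks are kept untouched; only the gaps longer than `max_gap` receive artificial marks, so the voiced pitchmarks derived from the EGG are not disturbed.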
2.- Since we did not get any positive results, we developed CART-tree models for both F0 and duration. We
trained the models on a corpus of 1006 sentences recorded by the voice talents
themselves. However, we did not have time to label these 1006 sentences manually (diphone labels and
pitchmarks). When we added these two models, we discovered some
additional glitches, and the existing glitches became more prominent. We then thought that
generating new models with this voice (diphone voice + F0 CART-tree model + duration CART-tree model)
could make the glitches disappear, but this did not work.
3.- Finally, we also tried some other things with respect to the target F0 values:
3.1.- We first thought that a large jump between the target F0 of the "s" segment and the next target F0
could produce the glitch. So we modified the "tree_f0.scm" file to reduce the F0 difference between the "s"
segment and the target that follows it. Some glitches disappeared, but others remained.
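As an illustration of the idea in 3.1, limiting the F0 jump between consecutive targets could look like the Python sketch below. It is a hypothetical post-processing step over (time, F0) pairs, not the actual change made to "tree_f0.scm"; the function name and the 20 Hz limit are assumptions.

```python
import math

def limit_f0_jumps(targets, max_jump_hz=20.0):
    """Clamp the F0 change between consecutive (time, f0) targets.

    `max_jump_hz` is an assumed limit chosen for illustration only.
    """
    smoothed = []
    for time, f0 in targets:
        if smoothed:
            prev_f0 = smoothed[-1][1]
            delta = f0 - prev_f0
            if abs(delta) > max_jump_hz:
                # Move at most max_jump_hz away from the previous target,
                # keeping the direction of the original change.
                f0 = prev_f0 + math.copysign(max_jump_hz, delta)
        smoothed.append((time, f0))
    return smoothed

# A 60 Hz rise right after the first target is reduced to 20 Hz.
print(limit_f0_jumps([(0.10, 120.0), (0.18, 180.0), (0.25, 175.0)]))
```

Note that the clamp cascades: each target is compared with the already adjusted previous one, so a single large jump is spread over the following targets instead of being removed.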