Re: Raised f0 in clustergen
On Tue, 04/09/2007 at 07:48 -0400, Alan W Black wrote:
> Nickolay V. Shmyrev wrote:
> > This leads to a more general question: how can we model speech
> > parameters better? Probably we should model the logarithm of f0 instead
> > of f0, as in HTS, and adjust the distance matrix for mcep. What is the
> > base of the logarithm then?
> > Are there articles on this topic?
> Yes, these are appropriate topics. I had noticed before that tuning the
> F0 by hand could make the voice sound better (even if the generated F0
> was not actually close to the source speaker).
> The F0 generation in clustergen is really quite different from HTS: a
> smoothed F0 over the whole sentence is generated (through unvoiced
> regions), which is then predicted by a model separate from the mcep
> model. This is producing pretty good values (correlation and RMSE)
> compared to other F0 models I've done in the past. However, I've not
> really done listening tests on them.
> I have, though, seen on other systems that playing with the F0 values
> can improve the sound of the voice. Log F0 vs absolute F0 may make
> a difference, though in some experiments I've found it makes a flatter
> F0 (smaller variance), which I believe is the bigger problem. I've not
> done listening tests here, but we are deep in the process of building new
> prosodic models for Festival (clustergen and otherwise) based on the new
> story data we now have access to.
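On the question of the logarithm base raised in the quoted text: changing the base only multiplies every log-F0 value by a constant, so once the values are mean/variance normalised the choice makes no difference. A minimal Python sketch of that point follows; the F0 values are made up for illustration, and the helper names (log_f0, zscores) are hypothetical, not part of clustergen or HTS.

```python
import math

def log_f0(f0_hz, base=math.e):
    """Log of voiced F0 values; unvoiced frames (f0 <= 0) are skipped."""
    return [math.log(f, base) for f in f0_hz if f > 0]

def zscores(xs):
    """Mean/variance normalisation of a list of numbers."""
    mean = sum(xs) / len(xs)
    sd = math.sqrt(sum((x - mean) ** 2 for x in xs) / len(xs))
    return [(x - mean) / sd for x in xs]

# Made-up F0 track in Hz; 0.0 marks unvoiced frames.
f0 = [0.0, 110.0, 120.0, 135.0, 0.0, 150.0, 128.0]

ln_f0 = log_f0(f0)            # natural logarithm
log2_f0 = log_f0(f0, base=2)  # base-2 logarithm

# Every natural-log value is the base-2 value times the constant ln(2).
ratios = [a / b for a, b in zip(ln_f0, log2_f0)]

# After normalisation the two versions coincide (up to rounding).
z_ln = zscores(ln_f0)
z_log2 = zscores(log2_f0)
```

So the base matters only if raw (unnormalised) log values are compared against fixed thresholds or mixed with other features in one distance measure.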
Very interesting, thanks.
As for me, it seems I've found the reason for the raised f0, and it's
actually trivial. My speaker has a range of around 75-180 Hz, so systematic
pitch doubling gives this effect. Wavesurfer's pitch extractor seems much
more reliable, so I have to look more closely at the performance of the
pitchmark program. I also have to select a speaker more carefully before recording new
If anyone is interested, some recordings are uploaded to voxforge.org.
They are in urp.tgz.
I wonder if this is the case for jmk too.
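
For anyone who wants to check their own data for the pitch doubling described above, here is a rough Python sketch. It assumes a per-frame F0 track in Hz with 0.0 for unvoiced frames and uses the 75-180 Hz range mentioned for this speaker; the function names and the sample values are made up for illustration, and this is not what the pitchmark program or Wavesurfer actually does.

```python
def halve_doubled_f0(f0_hz, lo=75.0, hi=180.0):
    """Halve frames that sit above the speaker's range when half the
    value falls back inside it. Returns (fixed_track, frames_halved)."""
    fixed = []
    n_halved = 0
    for f in f0_hz:
        if f > hi and lo <= f / 2.0 <= hi:
            fixed.append(f / 2.0)  # likely an octave (doubling) error
            n_halved += 1
        else:
            fixed.append(f)        # in range, unvoiced, or not explainable
    return fixed, n_halved

def mean_voiced(f0_hz):
    """Mean F0 over voiced frames only (f0 > 0)."""
    voiced = [f for f in f0_hz if f > 0]
    return sum(voiced) / len(voiced)

# Made-up track: three frames look doubled (210, 220, 240 Hz).
track = [0.0, 100.0, 210.0, 220.0, 115.0, 0.0, 240.0]
fixed, n_halved = halve_doubled_f0(track)

mean_before = mean_voiced(track)  # 177.0 Hz
mean_after = mean_voiced(fixed)   # 110.0 Hz
```

Note the limits of this heuristic: a doubled value that still lands inside the range (for example, 80 Hz extracted as 160 Hz) cannot be caught this way, so a high share of halved frames is only a hint that the extractor needs a closer look.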