Daniel Nylander | 2 Sep 2007 17:05

Swedish voice


Hi all,

I'm currently looking for a Swedish voice for festvox/festival.
Does one exist or do I need to build one myself?

-- 
Daniel Nylander (CISSP, GCUX, GCFA)
Stockholm, Sweden
http://www.DanielNylander.se
info@... 
yeager@...  dnylande@...

Nickolay V. Shmyrev | 2 Sep 2007 17:44

Re: Swedish voice

On Sun, 02/09/2007 at 17:05 +0200, Daniel Nylander wrote:
> Hi all,
> 
> I'm currently looking for a Swedish voice for festvox/festival.
> Does one exist or do I need to build one myself?
> 

Look here

http://person.sol.lu.se/JohanFrid/webapps/pmwiki/index.php/Synthesis/Demos
Nickolay V. Shmyrev | 4 Sep 2007 08:39

Raised f0 in clustergen

Hi all

I've recently been processing a new database and have noticed some strange
behavior in clustergen voices. If you compare the cg and hts voices on

http://www.festvox.org/voicedemos.html

you'll find that the cg voices have a slightly higher f0 than the hts ones.
It's easily noticeable in the jmk and awb voices too. I've measured it on my
voice, and it seems that instead of the usual 120 Hz in the recordings, the
clustergen tree gives 140 Hz or so. So I just subtract 20 Hz from the
predicted f0, and it gives better overall results.

Any ideas about what causes this behavior?
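
A minimal sketch of the constant-offset workaround described in this message, in Python. It assumes the predicted f0 is available as a plain list of per-frame values in Hz, with 0 marking unvoiced frames; that representation, the function name shift_f0 and the floor_hz safety limit are illustrative assumptions, not part of clustergen.

```python
def shift_f0(f0_track, offset_hz=-20.0, floor_hz=50.0):
    """Shift voiced frames by a constant offset in Hz.

    Frames with f0 <= 0 are treated as unvoiced and returned as 0.0.
    floor_hz is a hypothetical lower bound that keeps shifted values positive.
    """
    shifted = []
    for f0 in f0_track:
        if f0 <= 0.0:
            shifted.append(0.0)
        else:
            shifted.append(max(floor_hz, f0 + offset_hz))
    return shifted


predicted = [0.0, 138.0, 142.0, 145.0, 0.0]
print(shift_f0(predicted))
```

A fixed offset only hides the symptom; it does not explain why the predicted values are high in the first place.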

Nickolay V. Shmyrev | 4 Sep 2007 11:51

Re: Raised f0 in clustergen

This leads to a more general question: how can we model speech parameters
better? We should probably model the logarithm of f0 instead of f0 itself,
as hts does, and adjust the distance matrix for mcep. What should the base
of the logarithm be, then?

Are there any articles on this topic?
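
On the question of the base: since log_b(x) = ln(x) / ln(b), changing the base only rescales every log-f0 value by the same constant, so any base gives an equivalent representation as long as it is used consistently. The Python sketch below illustrates this by computing the log-domain (geometric) mean of a track with three different bases. The function names and the 0-means-unvoiced convention are assumptions made for the example.

```python
import math


def log_f0(f0_hz, base=math.e):
    """Return log-domain f0 for voiced frames (f0 > 0); unvoiced frames are skipped."""
    return [math.log(f, base) for f in f0_hz if f > 0]


def geometric_mean_f0(f0_hz, base=math.e):
    """Average in the log domain, then map the result back to Hz."""
    logs = log_f0(f0_hz, base)
    if not logs:
        raise ValueError("no voiced frames")
    return base ** (sum(logs) / len(logs))


track = [100.0, 0.0, 200.0]
for base in (2, math.e, 10):
    # The value in Hz is the same for every base, up to rounding.
    print(base, geometric_mean_f0(track, base))
```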

Alan W Black | 4 Sep 2007 13:48

Re: Raised f0 in clustergen

Nickolay V. Shmyrev wrote:
> This leads to a more general question: how can we model speech parameters
> better? We should probably model the logarithm of f0 instead of f0 itself,
> as hts does, and adjust the distance matrix for mcep. What should the base
> of the logarithm be, then?
> 
> Are there any articles on this topic?
> 

Yes, these are appropriate topics.  I had noticed before that tuning the 
F0 by hand could make the voice sound better (even if the generated F0 
was not actually close to the source speaker's).

The F0 generation in clustergen is really quite different from HTS: a 
smoothed F0 over the whole sentence is generated (through unvoiced 
regions), which is then predicted by a model separate from the mcep 
model.  This produces pretty good values (correlation and rmse) 
compared to other F0 models I've done in the past.  However, I've not 
really done listening tests on them.

I have seen on other systems, though, that playing with the F0 values 
can improve the sound of the voice.  Log F0 vs absolute F0 may make 
a difference, though in some experiments I've found it makes a flatter 
F0 (smaller variance), which I believe is the bigger problem.  I've not 
done listening tests here, but we are deep in the process of building new 
prosodic models for Festival (clustergen and otherwise) based on the new 
story data we now have access to.

Alan
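
For readers unfamiliar with the two objective measures mentioned here, the Python sketch below shows one common way to compute RMSE and Pearson correlation between a reference and a predicted F0 contour. It is an illustration only, not the evaluation code used for clustergen, and it assumes both contours are equal-length lists of per-frame values in Hz.

```python
import math


def rmse_and_correlation(reference, predicted):
    """Return (rmse, pearson_correlation) for two equal-length F0 contours."""
    if len(reference) != len(predicted) or len(reference) < 2:
        raise ValueError("contours must have the same length (at least 2 frames)")
    n = len(reference)
    mean_r = sum(reference) / n
    mean_p = sum(predicted) / n
    sq_err = sum((r - p) ** 2 for r, p in zip(reference, predicted))
    cov = sum((r - mean_r) * (p - mean_p) for r, p in zip(reference, predicted))
    var_r = sum((r - mean_r) ** 2 for r in reference)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    if var_r == 0 or var_p == 0:
        raise ValueError("correlation is undefined for a constant contour")
    return math.sqrt(sq_err / n), cov / math.sqrt(var_r * var_p)


# A predicted contour that is flatter than the reference but has the same shape.
reference = [100.0, 120.0, 140.0]
predicted = [110.0, 120.0, 130.0]
print(rmse_and_correlation(reference, predicted))
```

In this toy case the predicted contour has half the range of the reference yet still correlates perfectly with it, which is one reason a flattened F0 can look acceptable on these measures and still need a listening test.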

Abdo Jalo | 4 Sep 2007 16:15

Trim leading/trailing silence

Hi,
 
I'm looking for a utility that would trim leading and trailing silence (quiet audio) from wav files. Where can I find one? Please help.
 
Thanks,
Abdo
Jonas Lindh | 4 Sep 2007 16:59

Re: Trim leading/trailing silence

Hi there.
If you mean that you want to remove silence at the beginning and end of 
a recording, you can use my Praat script and loop through your files to 
remove the silence:
http://www.ling.gu.se/%7Ejonas/sounds/Loop_desilence_script.praat
If you want to remove all silence, there is a built-in command in Praat, 
and you can just apply the loop to that.
Best regards
Jonas
Abdo Jalo wrote:
> Hi,
>  
> I'm looking for a utility that would trim leading and trailing silence 
> (quiet audio) from wav files. Where can I find one? Please help.
>  
> Thanks,
> Abdo
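
As an alternative to the Praat route, the sketch below shows roughly how leading and trailing silence could be trimmed with nothing but the Python standard library. It is not the Praat script linked above. It assumes 16-bit mono PCM wav files and a simple fixed amplitude threshold; the function names and the default threshold of 500 are illustrative choices that would need tuning for real recordings.

```python
import array
import sys
import wave


def trim_bounds(samples, threshold=500):
    """Return (start, end) so that samples[start:end] drops quiet edges.

    A sample counts as quiet when its absolute value is <= threshold.
    """
    start = 0
    end = len(samples)
    while start < end and abs(samples[start]) <= threshold:
        start += 1
    while end > start and abs(samples[end - 1]) <= threshold:
        end -= 1
    return start, end


def trim_wav(in_path, out_path, threshold=500):
    """Copy in_path to out_path without leading/trailing silence."""
    with wave.open(in_path, "rb") as src:
        params = src.getparams()
        if params.nchannels != 1 or params.sampwidth != 2:
            raise ValueError("this sketch only handles 16-bit mono PCM")
        samples = array.array("h")
        samples.frombytes(src.readframes(params.nframes))
    if sys.byteorder == "big":
        samples.byteswap()  # wav data is little-endian
    start, end = trim_bounds(samples, threshold)
    trimmed = samples[start:end]
    if sys.byteorder == "big":
        trimmed.byteswap()
    with wave.open(out_path, "wb") as dst:
        dst.setparams(params)
        dst.writeframes(trimmed.tobytes())  # header frame count is fixed on close
```

A call such as trim_wav("in.wav", "out.wav") can then be wrapped in a loop over a directory of files. A per-sample threshold is crude: a single click counts as speech, so an energy measure over short windows would be more robust.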

Nickolay V. Shmyrev | 5 Sep 2007 15:49

Re: Raised f0 in clustergen

On Tue, 04/09/2007 at 07:48 -0400, Alan W Black wrote:
> Nickolay V. Shmyrev wrote:
> > This leads to a more general question: how can we model speech parameters
> > better? We should probably model the logarithm of f0 instead of f0 itself,
> > as hts does, and adjust the distance matrix for mcep. What should the base
> > of the logarithm be, then?
> > 
> > Are there any articles on this topic?
> > 
> 
> Yes, these are appropriate topics.  I had noticed before that tuning the 
> F0 by hand could make the voice sound better (even if the generated F0 
> was not actually close to the source speaker's).
> 
> The F0 generation in clustergen is really quite different from HTS: a 
> smoothed F0 over the whole sentence is generated (through unvoiced 
> regions), which is then predicted by a model separate from the mcep 
> model.  This produces pretty good values (correlation and rmse) 
> compared to other F0 models I've done in the past.  However, I've not 
> really done listening tests on them.
> 
> I have seen on other systems, though, that playing with the F0 values 
> can improve the sound of the voice.  Log F0 vs absolute F0 may make 
> a difference, though in some experiments I've found it makes a flatter 
> F0 (smaller variance), which I believe is the bigger problem.  I've not 
> done listening tests here, but we are deep in the process of building new 
> prosodic models for Festival (clustergen and otherwise) based on the new 
> story data we now have access to.

Very interesting, thanks.

As for me, it seems I've found the reason for the raised f0, and it's
actually trivial. My speaker has a range of around 75-180 Hz, so systematic
pitch doubling gives this effect. Wavesurfer's pitch extractor seems much
more reliable, so I have to look more closely at the performance of the
pitchmark program. I also have to select the speaker more carefully before
recording a new database :(

If anyone is interested, some recordings are uploaded to voxforge.org:

http://www.repository.voxforge1.org/downloads/Russian/Trunk/Audio/16kHz_16bit 

They are in urp.tgz

I wonder if it's the case for jmk too.
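
To illustrate how octave (doubling) errors inflate the average: if a fraction p of voiced frames is reported at twice the true value, the mean rises by roughly a factor of (1 + p), so a shift from about 120 Hz to about 140 Hz would correspond to p of about 1/6 under that simplified assumption. The Python sketch below is a crude range-based check, not what the pitchmark program or Wavesurfer does. It assumes a per-frame f0 list in Hz with 0 for unvoiced frames and uses the speaker's upper limit of 180 Hz from the message above; the function names are illustrative.

```python
def fix_octave_doubling(f0_track, max_hz=180.0):
    """Halve any frame whose f0 exceeds the speaker's assumed upper limit."""
    fixed = []
    for f0 in f0_track:
        if f0 > max_hz:
            fixed.append(f0 / 2.0)
        else:
            fixed.append(f0)
    return fixed


def voiced_mean(f0_track):
    """Mean f0 over voiced frames (f0 > 0); 0.0 if there are none."""
    voiced = [f for f in f0_track if f > 0]
    return sum(voiced) / len(voiced) if voiced else 0.0


track = [0.0, 110.0, 240.0, 125.0, 250.0, 0.0]
print(voiced_mean(track))                       # mean with doubled frames
print(voiced_mean(fix_octave_doubling(track)))  # mean after halving outliers
```

This check misses doubled values that still fall inside the speaker's range (for example 80 Hz reported as 160 Hz), so comparing against a second pitch extractor remains the more reliable test.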
