8 Jul 2010 16:05
Re: Abbreviation in festival
Hi Alan,
Please help me solve the problem I mentioned below. Thanks in advance.
Anand
--
Read my blogs at http://anand85.wordpress.com
On Sun, Jun 20, 2010 at 10:47 PM, kulandai anand <kulandaianand85 <at> gmail.com> wrote:
Hi Alan,
As you suggested, I used the token extraction code. It works fine except when punctuation is involved. For example, if I run the code on a sentence like "WHAT, I'M GETTING AN I.D, THIS IS WHY I'M HERE, MY WALLET WAS STOLEN."
I get the following timestamps:
WHAT 0.000000
I'M 0.619938
GETTING 0.942348
AN 1.036114
I.D 0.000000
THIS 1.819720
IS 1.966930
WHY 2.247694
I'M 2.467787
HERE 0.000000
MY 3.182322
WALLET 3.634726
WAS 3.778209
STOLEN 0.000000
Whenever a word is followed by punctuation, its timestamp is generated as 0.000000. Please help me solve this issue.
Thanks a lot in advance
Anand

On Thu, Jun 17, 2010 at 3:07 PM, kulandai anand <kulandaianand85-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
Hi Alan,
Thanks for your guidance. My problem is now solved.
Thanks and regards
Anand

On Wed, Jun 2, 2010 at 11:00 PM, Alan W Black <awb-ETDLCGt7PQU3uPMLIKxrzw@public.gmane.org> wrote:
kulandai anand wrote:
Hi Alan,
Based on your suggestion, I tried getting the tokens out of the utterance with the command (utt.save relation UTT 'Token "File.lab"). This does store each abbreviation as one word. But I also need the timestamps of the tokens, as I get when I use (utt.save.words UTT "File.lab"). The command (utt.save.tokens UTT "File.lab") doesn't seem to work. Is there a way to get the timestamps too?
Thanks
Anand

Something like

(define (token_plus_times utt)
  ;; Write each top-level Token and the end time of its last word to xxx.out
  (set! f (fopen "xxx.out" "w"))
  (set! x (utt.relation.first utt 'Token))
  (while x
    (format f "%s %f\n"
            (item.name x)
            ;; last word -> last syllable -> last segment -> end time
            (item.feat x "R:Token.daughtern.R:SylStructure.daughtern.daughtern.end"))
    (set! x (item.next x)))
  (fclose f)
  t)

So that
(set! utt1 (SynthText "Mr 1984 and 30 Nov."))
gives a file xxx.out with
Mr 0.602346
1984 2.534533
and 2.720479
30 3.277871
Nov 3.865351
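The code below is a possible workaround for the 0.000000 timestamps in the Jun 20 message. It is an untested sketch, not a confirmed fix. It rests on one assumption that the thread does not confirm: when a token carries punctuation, the token's last daughter in the Token relation is an item that is not in the SylStructure relation. In that case the path "R:Token.daughtern.R:SylStructure.daughtern.daughtern.end" finds nothing, and item.feat returns 0. The helper names token_end_time and token_plus_times2 are hypothetical. The helper walks backwards from the token's last daughter to the last one that is in SylStructure, then reads the end time of that word's last segment.

```scheme
;; Hypothetical helper (sketch): end time of the last daughter of TOK that is
;; in the SylStructure relation. Assumes trailing daughters without a
;; SylStructure (e.g. punctuation) are what produced the 0.000000 values.
(define (token_end_time tok)
  (let ((w (item.daughtern tok)))
    ;; step back over daughters that are not in SylStructure
    (while (and w (not (item.relation w 'SylStructure)))
      (set! w (item.prev w)))
    (if w
        ;; word -> last syllable -> last segment -> end time
        (item.feat (item.relation w 'SylStructure) "daughtern.daughtern.end")
        0)))

;; Same output format as token_plus_times, but using the helper above.
(define (token_plus_times2 utt)
  (let ((f (fopen "xxx.out" "w"))
        (x (utt.relation.first utt 'Token)))
    (while x
      (format f "%s %f\n" (item.name x) (token_end_time x))
      (set! x (item.next x)))
    (fclose f)
    t))
```

To use it, run it the same way as the original, for example (token_plus_times2 (SynthText "WHAT, I'M GETTING AN I.D, THIS IS WHY I'M HERE, MY WALLET WAS STOLEN.")). This sketch does not change the timestamps for tokens without punctuation: for those tokens the last daughter is already a word with a SylStructure, so the loop does not move.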

On Tue, Jun 1, 2010 at 8:43 PM, Alan W Black <awb-ETDLCGt7PQU3uPMLIKxrzw@public.gmane.org> wrote:
kulandai anand wrote:
Hi,
I am using Festival to synthesize an utterance and store its word transcription in a lab file. Everything works fine. But when I synthesize an abbreviation, for example (ID), each letter is stored as a separate word in the word transcription file. I need to compare this word transcription with another ASR's word transcription, where the abbreviation is stored as a single word. Is there any way to get the whole abbreviation as a single word in the lab file through Festival? Please help me with this issue.
Thanks in advance
Anand

You want to consider the distinction between tokens and words. Tokens are (typically) whitespace-separated tokens with outside punctuation removed, while words are the expansion of tokens.
If you dump a label file from the Tokens, you might get something closer to what you are looking for. Note that the Token relation is actually a list of trees rather than a simple list, so you might need to write something specifically to dump only the top tokens themselves.
Alan