Alberto Calderone | 1 Sep 10:15 2008

Moses error.

Dear Moses-Support,
I am writing about an error that comes up when running Moses with
our LM, which was generated from Europarl following the procedure
described on the website.

The error message, at run time, is: terminate called after throwing an 
instance of 'std::bad_alloc'  what():  St9bad_alloc

Do you have any suggestions on how to fix it, or why this might happen?
Thank you for your support.

Regards

--
Alberto Calderone
http://www.translated.net
Direct: (+39) 06 90 25 4 258
Main:   (+39) 06 90 25 4 001
Fax:    (+39) 06 233 200 102
Miles Osborne | 1 Sep 14:24 2008

Re: Moses error.

Buy more memory and/or build smaller language models.

Miles
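
To get a feel for how large a language model is before loading it into the decoder, one option is to read the n-gram counts from the header of the ARPA file. The following is only a minimal sketch, not part of Moses or SRILM: the function name and the sample header are made up for illustration, and it assumes the model is stored as a plain-text ARPA file.

```python
def arpa_ngram_counts(lines):
    """Return {order: count} read from the \\data\\ header of an ARPA-format LM."""
    counts = {}
    in_header = False
    for raw in lines:
        line = raw.strip()
        if line == "\\data\\":
            in_header = True
            continue
        if in_header:
            if line.startswith("ngram "):
                # Header lines look like "ngram 2=12".
                order, count = line[len("ngram "):].split("=")
                counts[int(order)] = int(count)
            elif line.startswith("\\"):
                # First n-gram section (e.g. \1-grams:) ends the header.
                break
    return counts


# Invented header, for illustration only.
sample_header = ["", "\\data\\", "ngram 1=5", "ngram 2=12", "ngram 3=7", "", "\\1-grams:"]
print(arpa_ngram_counts(sample_header))
```

Fewer n-grams (for example from a lower model order or from pruning) generally mean a smaller model in memory.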


E.Y.Kow | 2 Sep 16:20 2008

is this a reasonable moses setup?

Dear Moses team and users,

I am using Moses to translate from an imaginary language "French" to
English, and was hoping I could get some comments on my current setup.

Does the following use of Moses sound reasonable to anybody?  I have
posted it below as a commented Makefile excerpt.  Note that it is
based on the tutorials:

  http://www.statmt.org/moses/?n=FactoredTraining.HomePage
  http://www.statmt.org/moses/?n=Moses.Tutorial

software
--------
- GIZA++ 1.0.2
   (compiled /without/ the -DBINARY_SEARCH_FOR_TTABLE flag)
- SRILM
   (standard)
- moses 2008-7-11
   (standard)

usage
-----
My corpus consists of two text files,
 foo/train-corpus.en
 foo/train-corpus.fr

Each line in the file consists of a sentence in the respective language,
with (for example) the sentence in line 3 of the English file
corresponding to the sentence in line 3 of the "French" file.

Joerg Tiedemann | 2 Sep 17:30 2008

Class Language Models


I have difficulty understanding the description of "class language
models" at
http://www.statmt.org/moses/?n=FactoredTraining.BuildingLanguageModel#ntoc8

What I actually want to do is to use a language model on combined 
(concatenated) factors, let's say word+pos when decoding a factored 
model. I'm not sure if the "class language model" feature is the right 
thing to look at for doing this.

For example, if my -translation-factors are 0-0,1 (0 being words and 1
being target-language POS tags), then I would like to combine the
translation probabilities with a language model over factors 0,1
(word/POS) before the generation step from 0,1 to 0, and then maybe even
add a word language model on the generated surface string. What would be
the proper way of doing this with Moses?

Two other small things:

I noticed that Moses has problems with XML input when using lexicalised
reordering: I get segmentation faults after decoding some sentences. It
works fine with distance-based reordering.

For the IRSTLM developers: it would be nice to replace the hard-coded
gzip/gunzip settings in the bin/*.pl files with more general ones.
Otherwise I always have to do this by hand after downloading a new version.

For example, replace
	my $gzip="/usr/bin/gzip";
	my $gunzip="/usr/bin/gunzip";

Carlos Henriquez | 2 Sep 17:28 2008

Re: is this a reasonable moses setup?

Your training process is fine for a baseline. The only thing missing is tuning. The weights you'll find in moses.ini are not tuned for optimal results; usually a development corpus is used for that. Some information on this can be found at

http://www.statmt.org/moses/?n=FactoredTraining.Tuning

Tuning is really important. It usually improves translation results considerably, so you should always do it. The values in question are the weights of the different models, and wrong or random values will not give results as good as tuned ones.

The rest of your steps (language model, training and translation) are fine. It's a nice start.
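
To illustrate why the weights matter: the decoder ranks candidate translations by a weighted sum of model scores, so different weights can select a different translation. This is only a toy sketch; the feature values, weight sets and function names are invented for illustration and are not taken from any real moses.ini.

```python
def model_score(features, weights):
    """Weighted sum of feature values (log-linear model score)."""
    return sum(weights[name] * value for name, value in features.items())


def best_hypothesis(hypotheses, weights):
    """Return the name of the hypothesis with the highest model score."""
    return max(hypotheses, key=lambda name: model_score(hypotheses[name], weights))


# Invented log-scale feature values for two candidate translations.
hypotheses = {
    "a": {"lm": -10.0, "tm": -2.0, "d": -1.0},
    "b": {"lm": -6.0, "tm": -5.0, "d": -3.0},
}
lm_heavy = {"lm": 0.5, "tm": 0.2, "d": 0.3}  # hypothetical weight set
tm_heavy = {"lm": 0.1, "tm": 0.8, "d": 0.1}  # hypothetical weight set

print(best_hypothesis(hypotheses, lm_heavy))
print(best_hypothesis(hypotheses, tm_heavy))
```

The two weight sets pick different candidates, which is why weights tuned on a development corpus tend to beat default or arbitrary ones.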
 
--
Carlos A. Henríquez Q.
+34-693-278-219
carloshq-kk6s2y0+N2Bi+jfwTKn9+w@public.gmane.org
carlosalberto.henriquez-/E1597aS9LQAvxtiuMwx3w@public.gmane.org


----- Original message ----
From: "E.Y.Kow <at> brighton.ac.uk" <E.Y.Kow-QgJXc4GwR4vQzY9nttDBhA@public.gmane.org>
To: moses-support-3s7WtUTddSA@public.gmane.org
Sent: Tuesday, 2 September 2008 16:20:33
Subject: [Moses-support] is this a reasonable moses setup?

Dear Moses team and users,

I am using Moses to translate from an imaginary language "French" to
English, and was hoping I could get some comments on my current setup.

Does the following use of Moses sound reasonable to anybody?  I have
posted it below as a commented Makefile excerpt.  Note that it is
based on the tutorials:

  http://www.statmt.org/moses/?n=FactoredTraining.HomePage
  http://www.statmt.org/moses/?n=Moses.Tutorial

software
--------
- GIZA++ 1.0.2
  (compiled /without/ the -DBINARY_SEARCH_FOR_TTABLE flag)
- SRILM
  (standard)
- moses 2008-7-11
  (standard)

usage
-----
My corpus consists of two text files,
foo/train-corpus.en
foo/train-corpus.fr

Each line in the file consists of a sentence in the respective language,
with (for example) the sentence in line 3 of the English file
corresponding to the sentence in line 3 of the "French" file.
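
Because the two files are line-aligned, any cleaning has to drop a line from both files together. The sketch below shows length-based filtering in the spirit of the "1 100" arguments passed to clean-corpus-n.perl in the Makefile excerpt; it is a simplified illustration, not a reimplementation, and the real script may apply further checks. The function name and sample sentences are invented.

```python
def clean_pairs(src_lines, tgt_lines, min_len=1, max_len=100):
    """Keep only sentence pairs where both sides have min_len..max_len tokens."""
    kept = []
    for src, tgt in zip(src_lines, tgt_lines):
        n_src = len(src.split())
        n_tgt = len(tgt.split())
        if min_len <= n_src <= max_len and min_len <= n_tgt <= max_len:
            # Both sides are kept or dropped together, so alignment is preserved.
            kept.append((src.strip(), tgt.strip()))
    return kept


# Invented sample data: pair 2 has an empty side, pair 3 is too long.
english = ["hello world", "", "a b c"]
french = ["bonjour monde", "vide", "x " * 101]
print(clean_pairs(english, french))
```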

> %/m-corpus.en %/m-corpus.fr : %/train-corpus.en %/train-corpus.fr
>        cd $(<D) ; $(MOSES_SCRIPTS)/training/clean-corpus-n.perl train-corpus en fr m-corpus 1 100

Before using my corpus directly, I clean it up with the clean-corpus
script, which produces the files foo/m-corpus.en and foo/m-corpus.fr

> %.lm : %
>        $(SRILM_BINDIR)/ngram-count -text $< -lm $@

From foo/m-corpus.en, I train a language model (foo/m-corpus.en.lm)
using SRILM's ngram-count with the options -text and -lm.  I assume
these are reasonable options to pass to SRILM.

> %/model/moses.ini: %/m-corpus.en.lm
>        cd $(<D); $(MOSES_SCRIPTS)/training/train-factored-phrase-model.perl\
>          --root-dir .\
>          --corpus $(basename $(basename $(<F)))\
>          --f fr --e en --lm 0:3:$(<F):0

Armed with an English language model, I use the script
  train-factored-phrase-model.perl
I am using an unfactored language model for simplicity.

This produces foo/model/moses.ini, among other files in foo/model,
notably foo/model/phrase-table.0-0.gz.

> %/test.results: %/test-corpus.fr %/test-corpus.en %/model/moses.ini
>        cd $(<D); moses -f model/moses.ini < $(<F) > $(@F)

Finally, some translation.  I call Moses on the file foo/model/moses.ini
and I produce foo/test.results which looks a bit like English indeed.

Any thoughts?

Thanks!

--
Eric Kow <http://www.nltg.brighton.ac.uk/home/Eric.Kow>
PGP Key ID: 08AC04F9

_______________________________________________
Moses-support mailing list
Moses-support@...
http://mailman.mit.edu/mailman/listinfo/moses-support
Mauro Cettolo | 3 Sep 09:12 2008

Re: Class Language Models

If you work with factored models, it is typically expected that an LM is
trained for each factor. In your case, you have to train one LM on target
words and another LM on target POS tags.

Class LMs are a feature provided by the IRSTLM toolkit: they model n-grams
over the classes to which translation (target) units are mapped. I have
never used them in combination with factored models, but in theory this
should be possible; they were, however, introduced simply as a way to
emulate factored models efficiently in a single-factor framework.
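
As a rough illustration of the idea (not IRSTLM's actual implementation), the sketch below maps each target word to a class and then counts n-grams over the resulting class sequence. The word-to-class map, the "UNK" fallback and the function name are all invented for this example.

```python
from collections import Counter


def class_ngrams(tokens, word_to_class, n=2, unknown="UNK"):
    """Count n-grams over the class sequence that the tokens map to."""
    classes = [word_to_class.get(tok, unknown) for tok in tokens]
    return Counter(tuple(classes[i:i + n]) for i in range(len(classes) - n + 1))


# Hypothetical word-to-class map (here: POS-like classes).
word_to_class = {"the": "DET", "cat": "NOUN", "dog": "NOUN", "sees": "VERB"}
counts = class_ngrams("the cat sees the dog".split(), word_to_class)
print(counts)
```

The counts are collected over class sequences rather than word sequences, so two sentences with different words but the same class pattern contribute to the same n-grams.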

Mauro

Joerg Tiedemann wrote:
> I have difficulties understanding the description about "class language
> models" at
> http://www.statmt.org/moses/?n=FactoredTraining.BuildingLanguageModel#ntoc8
>
> What I actually want to do is to use a language model on combined
> (concatenated) factors, let's say word+pos when decoding a factored
> model. I'm not sure if the "class language model" feature is the right
> thing to look at for doing this.
>
> for example, if my -translation-factors are 0-0,1 (0 being words and 1
> being target language POSs) then I would like to combine the translation
> prob's with a language model using factors 0,1 (word/pos) before the
> generation step from 0,1 to 0. and then maybe even adding a word
> language model on the generated surface string. what would be the proper
> way of doing this with moses?
>
>
> two other small things:
>
> I realised that moses has problems with xml-input when using lexicalised
> reordering. I get segmentation faults after decoding some sentences. It
> works fine using distance-based reordering.
>
> for the irstlm developers: it would be nice to change the hard-coded
> settings for gzip/gunzip in the bin/*.pl files to more general ones.
> otherwise I always have to do this by hand after downloading a new version.
>
> For example, replace
>         my $gzip="/usr/bin/gzip";
>         my $gunzip="/usr/bin/gunzip";
> with
>         my $gzip=`which gzip`;chomp $gzip;
>         my $gunzip=`which gzip`;chomp $gunzip;
>         $gunzip .= ' -d';
> or something like that.
>
> thanks.
> cheers,
> --
>
> Jörg
>
>
> ***********/\/\/\/\/\/\/\/\/\/\/\************************************
> **  Jörg Tiedemann                 j.tiedemann@...              **
> **  Alfa-Informatica               http://www.let.rug.nl/~tiedeman **
> **  Rijksuniversiteit Groningen    Harmoniegebouw, room 1311-429   **
> **  Postbus 716                    phone: +31 (0)50-363 5935       **
> **  9700 AS Groningen              fax:   +31 (0)50-363 6855       **
> *************************************/\/\/\/\/\/\/\/\/\/\/\**********
>
> _______________________________________________
> Moses-support mailing list
> Moses-support@...
> http://mailman.mit.edu/mailman/listinfo/moses-support
> .
>
>   

-- 
Mauro Cettolo
FBK - Ricerca Scientifica e Tecnologica
Via Sommarive 18
38100 Povo (Trento), Italy
Phone: (+39) 0461-314551
E-mail: cettolo@...
URL: http://hlt.fbk.eu/people/cettolo

E cuale esie la me Patrie? cent, centmil, nissune
parcè che par picjâ lis bandieris spes a si picjin i omis
sushil ronghe | 3 Sep 16:49 2008

empty paragraph

hi,

While doing sentence alignment for English and Spanish (en es),
I got several (error?) messages like this:

ep-99-10-06.txt (speaker 78) different number of paragraphs 9 != 13
ep-99-10-06.txt (speaker 87) different number of paragraphs 8 != 9
ep-99-10-06.txt (speaker 113) different number of paragraphs 8 != 9
ep-99-10-06.txt (speaker 170) different number of paragraphs 8 != 7
ep-99-10-06.txt (speaker 171) different number of paragraphs 14 != 16
ep-99-10-06.txt (speaker 181) different number of paragraphs 4 != 3
ep-99-10-06.txt (speaker 219) different number of paragraphs 8 != 7
Warning: No known abbreviations for this language

Then I compared the text in file ep-99-10-06.txt for both languages:

English

<SPEAKER ID=78 NAME="President">
Ladies and gentlemen, as you can well imagine, this is neither the time nor the place to start a debate. In fact, the vote is under way.
<P>
(Parliament adopted the decision)
<P>
Report (A5-0017/1999) by Mr H.-P. Martin, on behalf of the Committee on Industry, External Trade, Research and Energy, on the proposal for a Council Decision providing further macro-financial assistance to Bulgaria (COM(1999)403 - C5-0098/1999 - 1999/0165(CNS))
<P>
(Parliament adopted the legislative resolution)
<P>
Report (A5-0018/1999) by Mr H.-P. Martin, on behalf of the Committee on Industry, External Trade, Research and Energy, on the proposal for a Council Decision providing supplementary macro-financial assistance to the former Yugoslav Republic of Macedonia (COM(1999)404 - C5-0099/1999 - 1999/0166(CNS))
<P>
(Parliament adopted the legislative resolution)
<P>
Report (A5-0019/1999) by Mr H.-P. Martin, on behalf of the Committee on Industry, External Trade, Research and Energy, on the proposal for a Council Decision providing supplementary macro-financial assistance to Romania (COM(1999)405 - C5-0097/1999 - 1999/0167(CNS))
<P>
(Parliament adopted the legislative resolution)
<P>
Joint motion for resolution on the International AIDS Conference in Zambia


Spanish

<SPEAKER ID=78 NAME="La Presidenta">
Señorías, como pueden suponer, no es ni el lugar ni el momento de iniciar un debate. Estamos procediendo a la votación.
<P>
(El Parlamento aprueba la decisión)
<P>

<P>
Informe (A5-0017/1999) del Sr. H.-P. Martin, en nombre de la Comisión de Industria, Comercio Exterior, Investigación y Energía, sobre la propuesta de decisión del Consejo por la que se concede una ayuda macrofinanciera suplementaria a Bulgaria (COM(1999)403 - C5-0098/1999 - 1999/0165(CNS))
<P>
(El Parlamento aprueba la resolución legislativa)
<P>

<P>
Informe (A5-0018/1999) del Sr. H.-P. Martin, en nombre de la Comisión de Industria, Comercio Exterior, Investigación y Energía, sobre la propuesta de decisión del Consejo por la que se concede una ayuda macrofinanciera suplementaria a la Antigua República Yugoslava de Macedonia (COM(1999)404 - C5-0099/1999 - 1999/0166(CNS))
<P>
(El Parlamento aprueba la resolución legislativa)
<P>

<P>
Informe (A5-0019/1999) del Sr. H.-P. Martin, en nombre de la Comisión de Industria, Comercio Exterior, Investigación y Energía, sobre la propuesta de decisión del Consejo por la que se concede una ayuda macrofinanciera suplementaria a Rumania (COM(1999)405 - C5-0097/1999 - 1999/0167(CNS))
<P>
(El Parlamento aprueba la resolución legislativa)
<P>

<P>
Propuesta de resolución común sobre la Conferencia Internacional sobre el sida en Lusaka

 
We can see the cause of the messages: the Spanish text contains extra <P> markers whose paragraphs are empty.
After the alignment I inspected the files and found that, although the messages were logged, the content is
still present in the aligned files. See the same portion of the aligned files:

English:

<SPEAKER ID=78 NAME="President">
Ladies and gentlemen , as you can well imagine , this is neither the time nor the place to start a debate .
In fact , the vote is under way .
<P>
( Parliament adopted the decision )
<P>
Report ( A5-0017 / 1999 ) by Mr H.-P. Martin , on behalf of the Committee on Industry , External Trade , Research and Energy , on the proposal for a Council Decision providing further macro-financial assistance to Bulgaria ( COM ( 1999 ) 403 - C5-0098 / 1999 - 1999 / 0165 ( CNS ) )
<P>
( Parliament adopted the legislative resolution )
<P>
Report ( A5-0018 / 1999 ) by Mr H.-P. Martin , on behalf of the Committee on Industry , External Trade , Research and Energy , on the proposal for a Council Decision providing supplementary macro-financial assistance to the former Yugoslav Republic of Macedonia ( COM ( 1999 ) 404 - C5-0099 / 1999 - 1999 / 0166 ( CNS ) )
<P>
( Parliament adopted the legislative resolution )
<P>
Report ( A5-0019 / 1999 ) by Mr H.-P. Martin , on behalf of the Committee on Industry , External Trade , Research and Energy , on the proposal for a Council Decision providing supplementary macro-financial assistance to Romania ( COM ( 1999 ) 405 - C5-0097 / 1999 - 1999 / 0167 ( CNS ) )
<P>
( Parliament adopted the legislative resolution )
<P>
Joint motion for resolution on the International AIDS Conference in Zambia


Spanish:

<SPEAKER ID=78 NAME="La Presidenta">
Señorías , como pueden suponer , no es ni el lugar ni el momento de iniciar un debate .
Estamos procediendo a la votación .
<P>
( El Parlamento aprueba la decisión )
<P>

<P>
Informe ( A5-0017 / 1999 ) del Sr . H.-P. Martin , en nombre de la Comisión de Industria , Comercio Exterior , Investigación y Energía , sobre la propuesta de decisión del Consejo por la que se concede una ayuda macrofinanciera suplementaria a Bulgaria ( COM ( 1999 ) 403 - C5-0098 / 1999 - 1999 / 0165 ( CNS ) )
<P>
( El Parlamento aprueba la resolución legislativa )
<P>

<P>
Informe ( A5-0018 / 1999 ) del Sr . H.-P. Martin , en nombre de la Comisión de Industria , Comercio Exterior , Investigación y Energía , sobre la propuesta de decisión del Consejo por la que se concede una ayuda macrofinanciera suplementaria a la Antigua República Yugoslava de Macedonia ( COM ( 1999 ) 404 - C5-0099 / 1999 - 1999 / 0166 ( CNS ) )
<P>
( El Parlamento aprueba la resolución legislativa )
<P>


Questions:
-> Does it mean that the aligned files I have generated are not suitable for training the model?
-> Can we modify the pre-processing script to replace the empty paragraphs?
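
One possible way to handle the second question is to drop empty paragraphs before alignment, so that both languages end up with the same number of <P> markers. The sketch below is a hypothetical pre-processing step, not a change to the existing scripts; the function name is invented, and it assumes that paragraphs are introduced by a line containing only <P> and speakers by a line starting with <SPEAKER.

```python
def drop_empty_paragraphs(lines):
    """Remove <P> markers that are followed by no text before the next marker."""
    # Group lines into blocks, each starting at a <P> or <SPEAKER ...> marker.
    blocks = []
    for line in lines:
        stripped = line.strip()
        is_marker = stripped.upper() == "<P>" or stripped.startswith("<SPEAKER")
        if is_marker or not blocks:
            blocks.append([line])
        else:
            blocks[-1].append(line)

    kept = []
    for block in blocks:
        is_paragraph = block[0].strip().upper() == "<P>"
        has_text = any(text.strip() for text in block[1:])
        if is_paragraph and not has_text:
            continue  # empty paragraph: drop the marker and its blank lines
        kept.extend(block)
    return kept


# Shortened version of the Spanish excerpt above.
spanish = [
    '<SPEAKER ID=78 NAME="La Presidenta">',
    "Estamos procediendo a la votación.",
    "<P>",
    "(El Parlamento aprueba la decisión)",
    "<P>",
    "",
    "<P>",
    "Informe (A5-0017/1999) del Sr. H.-P. Martin",
]
cleaned = drop_empty_paragraphs(spanish)
print(cleaned.count("<P>"))
```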


Thanks

--
********************************
sushil ronghe
*********************************
Chris Callison-Burch | 4 Sep 00:12 2008

additional weights in the phrase table

Hi guys,

I'm trying to add another weight to the phrase table.  I get a message  
in MERT training that says "Your model tm needs 6 weights but we  
define the default ranges for only 5 weights.  Cannot use the default,  
you must supply lambdas by hand."    Can anyone tell me how to do so?

Thanks a million.

--Chris
Nguyen Bach | 4 Sep 00:43 2008

Re: additional weights in the phrase table

Hi Chris,

I think you need to supply the MERT weights to the mert-moses.pl script
with the option
 --lambdas="d:  ... lm: ... tm: ... w:"

Cheers,
Nguyen
Sirvan Yahyaei | 4 Sep 01:36 2008

sigtest-filter

Dear all,
I wanted to build the sigtest-filter tool included in the Moses package.
However, the URL for the SALM toolkit does not work, and I could not find
it anywhere else.
If it is still publicly available, could I have a copy of it or a URL to
download it from?

Thanks,
Sirvan
