Hi!
I recently finished training my second model and noticed some differences
from the first run. When running the training script
train-factored-phrase-model, I got a large number of messages: first many
"alignment point out of range ..", then many "use of uninitialized value in
scalar ...", and finally many warnings like the ones below (I'm translating
from English to Norwegian, using 3-gram LMs).
WARNING: sentence 1200 has alignment point (4, 0) out of bounds (4, 4)
E: annen operasjon på urethra
F: other operation on urethra
WARNING: sentence 1201 has alignment point (10, 0) out of bounds (8, 8)
E: perkutan drenasje av pseudocyste eller abscess i pancreas
F: percutaneous drainage of pseudocyst or abscess of pancreas
WARNING: sentence 1202 has alignment point (10, 0) out of bounds (8, 7)
E: lukking av endeenterostomi med anastomose til colon
F: closure of terminal enterostomy with anastomosis to colon
WARNING: sentence 1209 has alignment point (5, 0) out of bounds (5, 4)
E: andre spesifiserte kvinnelige kjønnsorganer
F: other specified female genital organs
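To make the warning format concrete, here is a minimal Python sketch of the
bounds check that these messages appear to describe. It assumes zero-based
indices and that the "out of bounds" pair gives the two sentence lengths in
the same order as the alignment point. Both are assumptions read off the
output above, not taken from the Moses source, and the function name is
hypothetical.

```python
def out_of_bounds_points(points, len_first, len_second):
    """Return the alignment points that fall outside the sentence pair.

    Assumes zero-based indices, so valid positions are 0..length-1.
    """
    return [
        (i, j)
        for (i, j) in points
        if not (0 <= i < len_first and 0 <= j < len_second)
    ]


# Sentence 1200 above: point (4, 0) with bounds (4, 4).
# Index 4 is not a valid position in a 4-word sentence.
flagged = out_of_bounds_points([(4, 0)], 4, 4)
print(flagged)
```

Under the same assumptions, the check also flags (10, 0) against (8, 8) for
sentence 1201.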
I didn't get any errors, and the run terminated normally. However, looking
at the lex files in the newly built model, almost all entries (about 90%) in
the two lex files look like this: "NULL educational 1.0000000",
"NULL plumbing 1.0000000", "NULL reformere 1.0000000",
"NULL renskrivning 1.0000000", "beordre NULL 0.0000058",
"skyfle NULL 0.0000058". After the first training, my lex files had almost
no such entries with NULL, so the difference is huge.
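For anyone wanting to reproduce the roughly 90% figure, the share of NULL
entries could be measured with a short Python sketch like the one below. It
assumes each lex entry is a line of three whitespace-separated fields, as in
the entries quoted above; the function name and the extra sample entry are
made up for illustration.

```python
def null_fraction(lines):
    """Fraction of well-formed lex entries whose first or second token is NULL."""
    total = 0
    with_null = 0
    for line in lines:
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        total += 1
        if "NULL" in fields[:2]:
            with_null += 1
    return with_null / total if total else 0.0


# Entries quoted above, plus one made-up non-NULL entry for contrast.
sample = [
    "NULL educational 1.0000000",
    "NULL plumbing 1.0000000",
    "beordre NULL 0.0000058",
    "hus house 0.5000000",  # hypothetical entry, not from the real tables
]
print(null_fraction(sample))
```

On a real table the function can be given an open file handle instead of a
list, since it only iterates over lines.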
The only thing I did differently this time was to pass several LMs to the
training script, but that should be OK. The data is sentence-aligned, but the
corpus has been extended considerably since the first time.
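One basic way to double-check that an extended corpus is still
sentence-aligned is to confirm that both sides have the same number of lines
and that no line is blank on either side. The Python sketch below shows such
a check on in-memory lists; the function name is hypothetical, and the blank
line in the sample is invented to show what gets reported.

```python
def check_parallel(source_lines, target_lines):
    """Return (same_length, empty_pairs) for two lists of corpus lines.

    empty_pairs lists the 1-based line numbers where either side is blank.
    """
    same_length = len(source_lines) == len(target_lines)
    empty_pairs = [
        n
        for n, (src, tgt) in enumerate(zip(source_lines, target_lines), start=1)
        if not src.strip() or not tgt.strip()
    ]
    return same_length, empty_pairs


# Two pairs taken from the warnings above, plus one hypothetical pair
# with a blank English side (the Norwegian text there is made up).
en = ["other operation on urethra", "", "other specified female genital organs"]
no = ["annen operasjon på urethra", "tom linje", "andre spesifiserte kvinnelige kjønnsorganer"]
print(check_parallel(en, no))
```

This only catches gross problems such as missing or empty lines; it says
nothing about whether the paired sentences are actually translations of each
other.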
Below is the command I used to run the training script, with its parameters:

    bin/moses-scripts/scripts-20070717-1342/training/train-factored-phrase-model.perl \
      -model-dir /home/stig/wsDirMoses/model/2opptrening13aug/ \
      -scripts-root-dir /home/stig/wsDirMoses/bin/moses-scripts/scripts-20070717-1342 \
      -root-dir /home/stig/wsDirMoses \
      -corpus /home/stig/wsDirMoses/corpus/opptrening12aug/alleKodeverk.lowercased \
      -f en -e no \
      -alignment grow-diag-final-and \
      -reordering msd-bidirectional-fe \
      -lm 0:3:/home/stig/wsDirMoses/lm/opptrening12aug/generell.lm:0 \
      -lm 0:3:/home/stig/wsDirMoses/lm/opptrening12aug/icd10.lm:0 \
      -lm 0:3:/home/stig/wsDirMoses/lm/opptrening12aug/ICFtitler.lm:0 \
      -lm 0:3:/home/stig/wsDirMoses/lm/opptrening12aug/ncsp.lm:0

If anyone has any idea why I get so many NULLs in my lexical tables, and all
those messages during training, I'd be happy to know.
Stig Alvestad