Jasper Phillips | 23 May 2013 19:56

Minimum Linux Kernel Version Required for Moses 1.0 Binaries

Good Afternoon All,

I am running the EMS included in the Moses 1.0 64-bit Linux binary release to train a baseline system. It fails at the [TRAINING: prepare data] step with the error "FATAL: kernel too old". What is the minimum kernel version required for the 1.0 binaries? I am running Moses on Red Hat Enterprise Linux Client release 5.9 with kernel version 2.6.18-308.11.1.el5.

The following is the step where it fails:

number of steps doable or running: 1 at Thu May 23 12:36:49 EDT 2013
        doable: TRAINING:prepare-data
        executing /home/g/grad/icos/Moses/moses_64/scripts/ems/test_model_basic/steps/1/TRAINING_prepare-data.1 via sh (1 active)
FATAL: kernel too old
step TRAINING:prepare-data crashed
number of steps doable or running: 0 at Thu May 23 12:40:16 EDT 2013
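
The "FATAL: kernel too old" message is printed by the C library when a binary was built for a newer kernel than the one running. A minimal sketch of the comparison follows, assuming the minimum version is read off the binary separately (for example from the "for GNU/Linux x.y.z" part of the `file` output). The value `ASSUMED_MINIMUM` is a hypothetical placeholder, not the documented requirement of the Moses 1.0 binaries.

```python
import platform
import re


def kernel_tuple(release):
    """Turn a release string such as '2.6.18-308.11.1.el5' into (2, 6, 18)."""
    match = re.match(r"(\d+)(?:\.(\d+))?(?:\.(\d+))?", release)
    if match is None:
        raise ValueError("unrecognised kernel release: %r" % release)
    # Missing components (e.g. '3.2') are treated as zero.
    return tuple(int(part or 0) for part in match.groups())


def kernel_is_new_enough(release, minimum):
    """True if the running kernel is at least the required minimum."""
    return kernel_tuple(release) >= kernel_tuple(minimum)


# HYPOTHETICAL value for illustration only; replace it with the version
# reported for the actual binary.
ASSUMED_MINIMUM = "2.6.24"

if __name__ == "__main__":
    running = platform.release()
    print(running, kernel_is_new_enough(running, ASSUMED_MINIMUM))
```

With the kernel quoted above, `kernel_is_new_enough("2.6.18-308.11.1.el5", ASSUMED_MINIMUM)` is false for any minimum newer than 2.6.18.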


If any other info would be helpful, please let me know.

Thanks in advance!


M. Jasper Phillips

Operations Department

Language Scientific, Inc.®

10 Cabot Road, Suite 209

Medford, MA 02155

T: 617.621.0940 x152

F: 617.621.2552

jphillips-PZpoF4qebg/zh0hh38GeGa6RkeBMCJyt@public.gmane.org

www.languagescientific.com

FKA: RIC International, Inc.

_______________________________________________
Moses-support mailing list
Moses-support@...
http://mailman.mit.edu/mailman/listinfo/moses-support
Cara Greene | 23 May 2013 16:23

Postgraduate student internships


Apologies for cross posting. 
Please forward to interested parties.

**********************************************************************************************

At the Centre for Next Generation Localisation (CNGL) in Dublin, Ireland, we have a number of internships available covering a wide range of topics in Natural Language Processing and Machine Translation based at our Dublin City University site.  The internships are available for both basic research and more applied research projects (including development-focused work).

Candidates are required to be registered as MSc or PhD students (by research) in their home universities while carrying out their internship in Dublin and need to provide written confirmation of this from their home institute. Please find the internship advertisement attached.

Details of a number of specific internships can be found at:
http://www.cngl.ie/outreach/graduate-programme/postgradinternships/

Closing date: 31st May 2013

For any informal enquiries please contact:
CNGL Education and Outreach

Dr. Cara Greene, CNGL, DCU
Phone: +353 (0)1 7006704
E-mail: cgreene (AT) computing.dcu.ie

Web: http://www.cngl.ie

Application Procedure:

For formal applications, please download an application form from the link below and send it to cgreene (AT) computing.dcu.ie by Friday 31st May 2013.
http://www.cngl.ie/outreach/graduate-programme/postgradinternships/


Thank you,
Cara

Dr. Cara Greene

CNGL Education and Outreach
School of Computing
Dublin City University
Dublin 9

T: +353 (0)1 7006704
E: cgreene (AT) computing.dcu.ie

Attachment (CNGL-internships-2013.pdf): application/pdf, 391 KiB
Jacob Dlougach | 23 May 2013 15:37

Error while consolidating

I keep getting the following message when trying to build a baseline model for Turkish-English:
(6.6) consolidating the two halves @ Thu May 23 14:44:11 MSK 2013
Executing: /opt/mosesdecoder/scripts/../bin/consolidate /home/jacob/tr-en/phrasebased/training/model/phrase-table.half.f2e.gz /home/jacob/tr-en/phrasebased/training/model/phrase-table.half.e2f.gz /dev/stdout --GoodTuring /home/jacob/tr-en/phrasebased/training/model/phrase-table.half.f2e.gz.coc | gzip -c > /home/jacob/tr-en/phrasebased/training/model/phrase-table.gz
Consolidate v2.0 written by Philipp Koehn
consolidating direct and indirect rule tables
adjusting phrase translation probabilities with Good Turing discounting
ERROR: source phrase does not match in line 1201: 'ruhe ! ! !' != 'take part ! !'
Executing: rm -f /home/jacob/tr-en/phrasebased/training/model/phrase-table.half.*

My training script looks like this:
export MOSES_DIR=/opt/mosesdecoder
nohup $MOSES_DIR/scripts/training/train-model.perl \
    -corpus clean \
    -root-dir . \
    -f tr -e en \
    -lm 0:5:/home/jacob/tr-en/LM/en.utf8.blm.mm:9 \
    -alignment grow-diag-final-and \
    -reordering msd-bidirectional-fe \
    -external-bin-dir /opt/MGIZA++/ \
    -cores 24 -mgiza -mgiza-cpus 32 -write-lexical-counts \
    -score-options "--GoodTuring" \
    -max-phrase-length 5 \
    >training.log 2>&1 &

As a result, the translation table is much shorter than expected (1135 lines, although the parallel corpus has 12364208 sentences). What could be going wrong here?
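
One way to narrow this down is to find the first line at which the two half tables disagree, since the error reports a phrase mismatch at line 1201. The sketch below is only a diagnostic aid. It assumes that both halves are gzip-compressed text, that fields are separated by `|||`, and that the phrase pair sits in the first two fields in the same order in both files. The helper names are hypothetical and not part of Moses.

```python
import gzip
from itertools import zip_longest


def phrase_pair(line):
    """Return the first two '|||'-separated fields of a phrase-table line."""
    parts = [part.strip() for part in line.split("|||")]
    return tuple(parts[:2])


def first_mismatch(direct_lines, indirect_lines):
    """Return (line number, direct line, indirect line) at the first
    disagreement, or None if the phrase pairs match on every line."""
    pairs = zip_longest(direct_lines, indirect_lines)
    for number, (direct, indirect) in enumerate(pairs, start=1):
        if direct is None or indirect is None:
            # One file ended before the other.
            return number, direct, indirect
        if phrase_pair(direct) != phrase_pair(indirect):
            return number, direct.rstrip("\n"), indirect.rstrip("\n")
    return None


def first_mismatch_in_files(direct_path, indirect_path):
    """Same check, reading two gzip-compressed half tables from disk."""
    with gzip.open(direct_path, "rt", encoding="utf-8") as direct, \
         gzip.open(indirect_path, "rt", encoding="utf-8") as indirect:
        return first_mismatch(direct, indirect)
```

Because the training script deletes the `phrase-table.half.*` files after the failure, they would have to be kept (or regenerated) before calling `first_mismatch_in_files` on them.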
Prashant Mathur | 23 May 2013 01:53

MIRA experiments

Hi All,

I am trying to use MIRA for sparse features in Moses.
In MIRA, many factors strongly affect the final results, such as:
1. Normalisation and the sigmoid parameter for normalisation
2. The value of the aggressive parameter (slack variable)
3. Scaling of the margin
and so on.

I would like to know whether any comparative study has been done on how
much each factor can affect the MT system.
I ask because I don't want to just use the default values.
1. When I experiment with MIRA, normalisation sometimes works much
better than not doing it.
2. Changing the slack variable slightly makes BLEU fluctuate wildly.
So I use Simplex to optimize the value of the slack variable, but that
also fails to converge on most occasions (it doesn't converge even
after 50 iterations).
I read the MIRA for Moses paper, but it didn't have the
comparative study that I am looking for.

Any links?

Thanks,
Prashant
Hieu Hoang | 22 May 2013 14:59

kanzhang.jerry@... requires approval

Hi Kan,

You must subscribe to the mailing list to post to it. You can subscribe here:
  http://mailman.mit.edu/mailman/listinfo/moses-support

To answer your question: I think you need to install the package
   sudo apt-get install libbz2-dev
Also, make sure you have these packages installed too:
   zlib1g-dev
   libboost-all-dev

---------- Forwarded message ----------
From: <moses-support-owner-3s7WtUTddSA@public.gmane.org>
Date: 22 May 2013 12:29
Subject: Moses-support post from kanzhang.jerry-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org requires approval
To: moses-support-owner@mit.edu


As list administrator, your authorization is requested for the
following mailing list posting:

    List:    Moses-support@mit.edu
    From:    kanzhang.jerry-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org
    Subject: MOSES ON Ubuntu 12.04LTS ERROR HELP!
    Reason:  Post by non-member to a members-only list

At your convenience, visit:

    http://mailman.mit.edu/mailman/admindb/moses-support

to approve or deny the request.


---------- Forwarded message ----------
From: Kan Zhang <kanzhang.jerry-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
To: moses-support-3s7WtUTddSA@public.gmane.org
Cc: 
Date: Wed, 22 May 2013 19:29:38 +0800
Subject: MOSES ON Ubuntu 12.04LTS ERROR HELP!
Hi ALL,

I have a problem installing Moses on my laptop running Ubuntu 12.04 LTS; the build.log is attached.

The dependencies, such as the Boost library, are already installed, and I followed the step-by-step instructions, but something still goes wrong. Would you please help me find out what I did wrong, or give any suggestions?

I'm looking forward to your reply and thank you very much!

--
Best wishes!
Kan Zhang


---------- Forwarded message ----------
From: moses-support-request-3s7WtUTddSA@public.gmane.org
To: 
Cc: 
Date: 
Subject: confirm d45cc48f0688f082b21903940c2e19b9b87c87f5
If you reply to this message, keeping the Subject: header intact,
Mailman will discard the held message.  Do this if the message is
spam.  If you reply to this message and include an Approved: header
with the list password in it, the message will be approved for posting
to the list.  The Approved: header can also appear in the first line
of the body of the reply.



--
Hieu Hoang
Research Associate
University of Edinburgh
http://www.hoang.co.uk/hieu

Attachment (build.log): application/octet-stream, 11 KiB
Alexander Raginsky | 22 May 2013 11:05

NLP People news

NLP People news:

1. We continue to publish guidelines on career path selection. The article we have just made available at NLPPeople.com is called “What does it really take to make it in academia?”. We and the author (Paige Harris) would very much appreciate it if you could share your opinion on the topics discussed in the article. The direct link to the publication: https://nlppeople.com/index.php/publications/articles/111-what-does-it-really-take-to-make-it-in-academia

2. Hot NLP job openings: 
- talented NLP researcher for a leading software company based in Cambridge (UK): https://nlppeople.com/index.php/job/2083/nlp-researcher-cambridge
- lead researcher position within Samsung Advanced Research Institute in Bangalore (India): https://nlppeople.com/index.php/job/2059/researcher-bangalore
- NLP software developer advertised by Venbest Recruiting in Kyiv (Ukraine): https://nlppeople.com/index.php/job/2049/software-developer-(nlp)-kyiv

3. Current machine learning and data mining openings:
- entry-level programmer for the Oak Ridge National Laboratory (USA): https://nlppeople.com/index.php/job/2089/entry-level-programmer-knoxville-tn
- data analytics lead for Intelecox in San Jose, CA (USA): https://nlppeople.com/index.php/job/2075/data-analytics-lead-san-jose-ca

... and many more NLP, localization, machine learning and data mining job opportunities are waiting for you at NLPPeople.com
Johannes Hellrich | 21 May 2013 11:44

Incorporating terminological information

Hi all,

I want to incorporate a (biomedical) multilingual terminology into an
SMT pipeline. I have no current plans to use disambiguation, so I
could simply extract a dictionary from my terminology.
But how do I incorporate this dictionary into my pipeline? The common
solution seems to be adding its entries to the training data. Would one
over-sample it or add the entries only once? I also wondered whether one
could treat it like a transliteration and replace the known
translations in the training data with placeholders (and add some kind
of wrapper around Moses to insert the correct ones); this could be
more useful if I use disambiguation later on.
Are there any best practices that I failed to find, or would this be worth
a thorough analysis?
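
The "add the entries to the training data" option, with or without over-sampling, can be sketched as follows. This is only an illustration under stated assumptions: the corpus is held as two parallel lists of sentences, the dictionary is a list of (source term, target term) pairs, and the function name `add_dictionary` and the `copies` parameter are hypothetical. It says nothing about which setting works better.

```python
def add_dictionary(source_lines, target_lines, dictionary, copies=1):
    """Append each dictionary entry to a parallel corpus `copies` times.

    copies=1 adds every entry once; a larger value over-samples the
    dictionary relative to the rest of the training data.
    """
    if len(source_lines) != len(target_lines):
        raise ValueError("corpus sides must have the same number of lines")
    if copies < 1:
        raise ValueError("copies must be at least 1")

    new_source = list(source_lines)
    new_target = list(target_lines)
    for term, translation in dictionary:
        # Each entry becomes its own one-phrase "sentence pair".
        new_source.extend([term] * copies)
        new_target.extend([translation] * copies)
    return new_source, new_target
```

Keeping `copies` as a parameter makes it easy to train with several over-sampling factors and compare the results on a held-out set.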
Thank you very much,

Johannes Hellrich

----------------------------------------------------------------
This message was sent through https://webmail.uni-jena.de
Chen Kehai | 21 May 2013 09:31

Re: Moses-support Digest, Vol 79, Issue 34


 moses-support-request@mit.edu wrote:

>Send Moses-support mailing list submissions to
>	moses-support@mit.edu
>
>To subscribe or unsubscribe via the World Wide Web, visit
>	http://mailman.mit.edu/mailman/listinfo/moses-support
>or, via email, send a message with subject or body 'help' to
>	moses-support-request@mit.edu
>
>You can reach the person managing the list at
>	moses-support-owner@mit.edu
>
>When replying, please edit your Subject line so it is more specific
>than "Re: Contents of Moses-support digest..."
>
>
>Today's Topics:
>
>   1. Unusual failure for train-model.perl step 2.1b (Tom Hoar)
>
>
>----------------------------------------------------------------------
>
>Message: 1
>Date: Mon, 20 May 2013 20:40:40 +0700
>From: Tom Hoar <tahoar@precisiontranslationtools.com>
>Subject: [Moses-support] Unusual failure for train-model.perl step
>	2.1b
>To: Moses-Support <moses-support@mit.edu>
>Message-ID: <519A27D8.8050107@precisiontranslationtools.com>
>Content-Type: text/plain; charset=UTF-8; format=flowed
>
>The train-model.perl script from Beta 0.91 configured for MGIZA++ failed 
>on step 2.1b in the reverse direction with the error below. I think this 
>might be a result of inadequate cleaning. Can anyone confirm this or 
>offer an alternate reason? Thanks.
>
>m5p0 = -1 (fixed value for parameter p_0 in IBM-5 (if negative then it 
>is determined in training))
>manlexfactor1 = 0 ()
>manlexfactor2 = 0 ()
>manlexmaxmultiplicity = 20 ()
>maxfertility = 10 (maximal fertility for fertility models)
>ncpus = 1 (Number of threads to be executed, use 0 if you just want all 
>CPUs to be used)
>p0 = 0.999 (fixed value for parameter p_0 in IBM-3/4 (if negative then 
>it is determined in training))
>pegging = 0 (0: no pegging; 1: do pegging)
>reading vocabulary files
>Reading vocabulary file 
>from:/opt/domy/TRAININGS/alignments/align-dell2_full-en-es/giza.classes/en.vcb
>Reading vocabulary file 
>from:/opt/domy/TRAININGS/alignments/align-dell2_full-en-es/giza.classes/es.vcb
>Source vocabulary list has 85970 unique tokens
>Target vocabulary list has 84643 unique tokens
>Calculating vocabulary frequencies from corpus 
>/opt/domy/TRAININGS/alignments/align-dell2_full-en-es/giza.classes/es-en-int-train.snt
>Reading more sentence pairs into memory ...
>ERROR: target word 118049 is not in the vocabulary list
>Exit code: 255
>
>
>
>
>------------------------------
>
>_______________________________________________
>Moses-support mailing list
>Moses-support@mit.edu
>http://mailman.mit.edu/mailman/listinfo/moses-support
>
>
>End of Moses-support Digest, Vol 79, Issue 34
>*********************************************

j0hn | 20 May 2013 21:39

Question about rule table probabilities and counts

Hello everyone, I'm trying to understand the format of the rule table from the tree-based models and I'm a little confused about the counts.

I use the following corpus to train:

Spanish:
en mi casa
esta es mi casa

English:
in my house
this is my house

Then I use FreeLing to get the tree-based tagging and use that to train Moses. The resulting rule table is here: http://pastebin.com/4k50avpt

I'm trying to understand how the probabilities are calculated, for example, I know that the direct phrase translation probability is calculated as:
count(target & source) / count(source)
but in rule table I get rules like this one:

esta es [sn][sn-chunk] [S] ||| this is [sn][sn-chunk] [S] ||| 1 1 1 1 2.718 ||| 2-2 ||| 0.5 0.5

Note that the source and target counts are decimal numbers. In that case, how are the direct and indirect translation probabilities calculated?
Is there any documentation where all of this is explained?
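
To make the question concrete, the rule above can be split into its `|||`-separated fields and the relative-frequency formula applied to fractional counts. This is a sketch, assuming the last field holds the two counts as described above. The joint count is not shown in the table, so the value 0.5 used in the usage note is an assumption for illustration, and `parse_rule` and `relative_frequency` are hypothetical helper names.

```python
RULE = ("esta es [sn][sn-chunk] [S] ||| this is [sn][sn-chunk] [S] "
        "||| 1 1 1 1 2.718 ||| 2-2 ||| 0.5 0.5")


def parse_rule(line):
    """Split one rule-table line into its five '|||'-separated fields."""
    fields = [field.strip() for field in line.split("|||")]
    source, target, scores, alignment, counts = fields[:5]
    return {
        "source": source,
        "target": target,
        "scores": [float(value) for value in scores.split()],
        "alignment": alignment.split(),
        "counts": [float(value) for value in counts.split()],
    }


def relative_frequency(joint_count, marginal_count):
    """count(target & source) / count(source); the same formula applies
    whether the counts are whole numbers or fractions."""
    return joint_count / marginal_count


if __name__ == "__main__":
    rule = parse_rule(RULE)
    print(rule["scores"], rule["counts"])
```

If the joint count of this rule were also 0.5 (an assumption), `relative_frequency(0.5, 0.5)` would give 1.0, which matches the probabilities of 1 listed in the scores field.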

Thanks!
Tom Hoar | 20 May 2013 15:40

Unusual failure for train-model.perl step 2.1b

The train-model.perl script from Beta 0.91 configured for MGIZA++ failed 
on step 2.1b in the reverse direction with the error below. I think this 
might be a result of inadequate cleaning. Can anyone confirm this or 
offer an alternate reason? Thanks.

m5p0 = -1 (fixed value for parameter p_0 in IBM-5 (if negative then it 
is determined in training))
manlexfactor1 = 0 ()
manlexfactor2 = 0 ()
manlexmaxmultiplicity = 20 ()
maxfertility = 10 (maximal fertility for fertility models)
ncpus = 1 (Number of threads to be executed, use 0 if you just want all 
CPUs to be used)
p0 = 0.999 (fixed value for parameter p_0 in IBM-3/4 (if negative then 
it is determined in training))
pegging = 0 (0: no pegging; 1: do pegging)
reading vocabulary files
Reading vocabulary file 
from:/opt/domy/TRAININGS/alignments/align-dell2_full-en-es/giza.classes/en.vcb
Reading vocabulary file 
from:/opt/domy/TRAININGS/alignments/align-dell2_full-en-es/giza.classes/es.vcb
Source vocabulary list has 85970 unique tokens
Target vocabulary list has 84643 unique tokens
Calculating vocabulary frequencies from corpus 
/opt/domy/TRAININGS/alignments/align-dell2_full-en-es/giza.classes/es-en-int-train.snt
Reading more sentence pairs into memory ...
ERROR: target word 118049 is not in the vocabulary list
Exit code: 255
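
One way to test the cleaning hypothesis is to look for word IDs that occur in the sentence file but not in the vocabulary file. The sketch below assumes the usual GIZA++ plain-text layout: each `.vcb` line starts with a numeric word ID, and the `.snt` file consists of groups of three lines (a count, then two lines of word IDs). Which of the two ID lines counts as "target" depends on the alignment direction, so it is passed in as a parameter. The function names are hypothetical.

```python
def read_vocab_ids(vcb_lines):
    """Collect the numeric word IDs from the first column of a .vcb file."""
    ids = set()
    for line in vcb_lines:
        parts = line.split()
        if parts:
            ids.add(int(parts[0]))
    return ids


def unknown_ids(snt_lines, vocab_ids, line_in_triple):
    """Return the sorted word IDs that appear on the chosen line of each
    three-line group (1 or 2) but are missing from the vocabulary."""
    missing = set()
    for index, line in enumerate(snt_lines):
        if index % 3 == line_in_triple:
            for token in line.split():
                if int(token) not in vocab_ids:
                    missing.add(int(token))
    return sorted(missing)
```

A non-empty result for the line that corresponds to the target side would show which IDs, such as 118049 in the log above, are absent from the vocabulary file that was actually read.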
Lucia Specia | 17 May 2013 19:23

Shared task on quality estimation: test set released

Dear all,

The test sets for all subtasks can be downloaded from http://www.quest.dcs.shef.ac.uk/wmt13_qe.html

We will extend the deadline for submitting system results by two days: June 2nd instead of May 31st.

Best,