Re: A Question on Moses / not recognizing compiled in SRILM language model on OSX
Hi David,
I think your project sounds fun.
Can you send me the line in your moses.ini file that specifies the
language model? It may be that you have a format error which is
causing the trouble.
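In case it helps while you dig that line out, here is a rough Python
sketch of the kind of format check I have in mind. It assumes the
four-field layout I remember for entries under [lmodel-file]
(<lm-type> <factor> <order> <path>, with type 0 selecting SRILM);
please check that against the documentation for your Moses version.
The helper names and the example path are made up for illustration.

```python
# Sketch: sanity-check the [lmodel-file] section of a moses.ini.
# Assumption (not confirmed in this thread): each entry has four
# whitespace-separated fields, <lm-type> <factor> <order> <path>.

def lm_lines(ini_text):
    """Return the non-empty, non-comment lines under [lmodel-file]."""
    lines, in_section = [], False
    for raw in ini_text.splitlines():
        line = raw.strip()
        if line.startswith("["):
            # Any section header switches the flag on or off.
            in_section = (line == "[lmodel-file]")
        elif in_section and line and not line.startswith("#"):
            lines.append(line)
    return lines


def check_lm_line(line):
    """Return a list of problems found in one language-model entry."""
    fields = line.split()
    if len(fields) != 4:
        return ["expected 4 fields, found %d" % len(fields)]
    problems = []
    lm_type, factor, order, _path = fields
    for name, value in (("type", lm_type), ("factor", factor), ("order", order)):
        if not value.isdigit():
            problems.append("%s is not a non-negative integer: %r" % (name, value))
    return problems


# Hypothetical example entry; substitute the contents of your own file.
example = """
[lmodel-file]
0 0 3 /path/to/your.lm
"""

for entry in lm_lines(example):
    print(entry, "->", check_lm_line(entry) or "looks well-formed")
```

A path containing spaces would be flagged as a field-count problem
by this check, which is worth ruling out as well.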
As for the training data: with 48000 segments, you may well have a
reasonable basic system. Although system quality depends heavily on
the language and the genres being translated, there is a popular MT
corpus (BTEC, consisting of tourism-type phrases) that has only about
40000 sentence pairs.
To train a language model, you'll definitely want to include the
English translations you're using to train the translation model since
these are closest to the kinds of sentences your system will be able
to generate. In fact, a reasonable starting point would just be to
use your English translations as the basis for the language model.
Beyond that, I'm not familiar with which corpora are freely available,
but perhaps someone else on the list who's looked into this could make
a suggestion.
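If you want to try training the language model on just the English
side of your translations, something along these lines should work
with SRILM's ngram-count. This is only a sketch: the file names
(comics.en, comics.en.lm) are placeholders I made up, and the order
and smoothing flags are one common choice rather than anything
specific to your setup.

```python
import shlex


def ngram_count_command(text_path, lm_path, order=3):
    """Build an SRILM ngram-count command line as an argument list.

    text_path: one English sentence per line (the target side of the
               parallel data); lm_path: where the ARPA model is written.
    """
    return [
        "ngram-count",
        "-order", str(order),
        "-text", text_path,
        "-lm", lm_path,
        # Interpolated Kneser-Ney smoothing: an assumption, not a
        # recommendation made elsewhere in this thread.
        "-interpolate", "-kndiscount",
    ]


# Placeholder file names; replace with your own.
cmd = ngram_count_command("comics.en", "comics.en.lm")

# Print the command so it can be inspected or pasted into a shell.
print(" ".join(shlex.quote(arg) for arg in cmd))
```

The resulting file is then the one your moses.ini language model
line would point to.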
--Chris
On 10/5/07, David Kirk Evans <dave@...> wrote:
> Hello moses-support,
>
> Just for fun I'm using Moses to learn a translation system based
> off of 42 translations of Japanese comic books that I've done over
> the years.
>
> I thought that I had completed working through the learning cycle,
> but when I tried to run moses as a decoder, I ran into this error:
>
> ...
> Start loading LanguageModel /Users/devans/Documents/workspace/
> GMAOParallelDataExtractor/europarl.en.lm : [16.000] seconds
> ERROR:Language model type unknown. Probably not compiled into library
> ERROR:no LM created. We probably don't have it compiled ...
>
> I believe that it is compiled into the library though, since I
> configured with:
>
> $ ./configure --with-srilm=/usr/local/srilm
>
> and the make process properly found Ngram.h, and it looks like it
> included the lib directory (LDFLAGS = -L/usr/local/srilm/lib/macosx)
>
> Has anyone else run into this problem?
>
> I compiled on
> $ uname -a
> Darwin Dhalsim.local 8.10.0 Darwin Kernel Version 8.10.0: Wed May 23
> 16:50:59 PDT 2007; root:xnu-792.21.3~1/RELEASE_PPC Power Macintosh
> powerpc PowerBook5,8 Darwin
> $ gcc --version
> powerpc-apple-darwin8-gcc-4.0.0 (GCC) 4.0.0 (Apple Computer, Inc.
> build 5026)
> $ automake --version
> automake (GNU automake) 1.9.6
>
> I haven't used Xcode myself, but I do have it around so if that is
> the recommended way to get things running under OSX perhaps I should
> try that?
>
> Anyway, I hope I can work this out since I'm curious to see if I can
> get any sort of reasonable translations out of approximately 48,290
> aligned comic book "bubbles" done by an amateur translator...
>
> By the way, I used the English portion of the Europarl corpus to
> build the language model since it was the only data I knew of that
> was freely available. Does anyone know if someone has built a SRILM
> compatible language model off of the google n-gram data, or some
> other sort of data that would be less parliamentary-like and more
> general text-like?
>
> Thanks in advance,
>
> David K. Evans
> _______________________________________________
> Moses-support mailing list
> Moses-support@...
> http://mailman.mit.edu/mailman/listinfo/moses-support
>