Francis Tyers | 22 May 2013 15:24

Blog post about Apertium course

Blog post by Niklas Laxström (developer of Translatewiki) on the recent
Apertium course in Helsinki:

http://laxstrom.name/blag/2013/05/22/on-course-to-machine-translation/

Interesting reading :)

Fran

_______________________________________________
Apertium-stuff mailing list
Apertium-stuff <at> lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/apertium-stuff
Bernard Chardonneau | 20 May 2013 19:38

Apertium presentation accepted during the next RMLL

The quoted reply below was written in French. In February, I announced on
this list that I was applying to give a presentation of Apertium (in
French) at the next RMLL in Brussels.

This presentation has now been accepted (but it is not yet included in
the schedule of the meeting).

One condition is that I speak a little about Northern European languages
in my presentation.

User-Agent: SquirrelMail
Date: Sun, 19 May 2013 15:22:48 +0200
From: Odile Bénassy <odile.benassy@...>
To: "Bernard Chardonneau" <bechapertium@...>
Subject: Re: Your proposal for RMLL 2013

On Sat 18 May 2013 15:10, Bernard Chardonneau wrote:
> Hello,
>
>
> Now it's my turn to be running a bit late, after dealing with just over
> 400 emails that arrived during the holidays, plus the ones that followed
> until Thursday.

Same here.

We accept you; your explanations convince me. Thank you in advance for
taking into account, during your presentation, the concerns raised in
my previous email; that will make things easier.


Xavi Ivars | 15 May 2013 15:36

Results of the PMC Election 2013

Hi all,

It's been more than three weeks since we started the election for the new Apertium PMC.

We've had a 92% turnout, with 25 of the 27 people in the census voting.

The results are:

President: 
Mikel L. Forcada: 25 votes

As president, Mikel is already part of the PMC, so had he been among the top six elected members, the seventh-placed candidate would also have joined the PMC. However, Mikel is not in the top six, so no changes to the original result are needed.

Member                    Votes  Comment
Francis Tyers             21     Elected PMC member
Sergio Ortiz              14     Elected PMC member
Jacob Nordfalk            13     Elected PMC member
Jimmy O'Regan             13     Elected PMC member
Gema Ramírez Sánchez      11     Elected PMC member
Felipe Sánchez Martínez          Elected PMC member
Mikel L. Forcada          8      Discarded: PMC president
Bernard Chardonneau       5
Juan Antonio Pérez Ortiz  5


With those results, the new Apertium PMC is composed of:
  • Mikel L. Forcada (president)
  • Francis Tyers
  • Sergio Ortiz
  • Jacob Nordfalk
  • Jimmy O'Regan
  • Gema Ramírez Sánchez
  • Felipe Sánchez Martínez
Congratulations to all the new PMC members!

--
< Xavi Ivars >
< http://xavi.ivars.me >
aboobacker sidheeque mk | 14 May 2013 04:55

monodix

How can I split a word formed from more than two words into its components?
For example, the Malayalam word
മഴക്കാലമേഘങ്ങളെല്ലാമിരുണ്ടുകൂടി
is formed from
മഴക്കാലം + മേഘങ്ങള്‍ + എല്ലാം + ഇരുണ്ട് + കൂടി
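A common way to attack this, independently of whatever compounding support Apertium's own tools provide, is dictionary-driven segmentation: find the split of the string into the fewest pieces that are all known surface forms, then map each piece back to its lemma. The sketch below is only an illustration: the lexicon is a hypothetical ASCII stand-in, and `segment` and `lemmas` are made-up helper names. For Malayalam, sandhi changes the forms at the boundaries (for example, the final ം of എല്ലാം and the initial ഇ of ഇരുണ്ട് surface together as മി), so a real lexicon would have to list those altered surface forms.

```python
from functools import lru_cache
from typing import Dict, List, Optional, Tuple

# Hypothetical toy lexicon mapping surface forms to lemmas. For a
# sandhi-heavy language the keys would have to include the altered
# surface variants that appear inside compounds.
LEXICON: Dict[str, str] = {
    "rain": "rain",
    "season": "season",
    "cloud": "cloud",
    "clouds": "cloud",
    "dark": "dark",
}


def segment(word: str, lexicon: Dict[str, str]) -> Optional[List[str]]:
    """Split `word` into the fewest known surface forms, or return None."""

    @lru_cache(maxsize=None)
    def best(start: int) -> Optional[Tuple[str, ...]]:
        # Shortest segmentation of word[start:], or None if there is none.
        if start == len(word):
            return ()
        found: Optional[Tuple[str, ...]] = None
        for end in range(start + 1, len(word) + 1):
            piece = word[start:end]
            if piece not in lexicon:
                continue
            rest = best(end)
            if rest is not None and (found is None or len(rest) + 1 < len(found)):
                found = (piece,) + rest
        return found

    result = best(0)
    return None if result is None else list(result)


def lemmas(word: str, lexicon: Dict[str, str]) -> Optional[List[str]]:
    """Map each surface piece of the segmentation back to its lemma."""
    pieces = segment(word, lexicon)
    return None if pieces is None else [lexicon[p] for p in pieces]


print(segment("rainseasonclouds", LEXICON))  # ['rain', 'season', 'clouds']
print(lemmas("rainseasonclouds", LEXICON))   # ['rain', 'season', 'cloud']
```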
Bernard Chardonneau | 13 May 2013 23:07

Strange behaviour of the on-line version of the translator

The problem appeared during April.

Now, if we use http://www.apertium.org/index.php?id=translatetext for a
translation, the selected translation direction switches automatically
to Spanish -> Catalan when the text is submitted, so no other direction
can be used.

A less important problem: changes made to the available language pairs
over the past year and more have still not been taken into account by
the online translators. This also concerns the http://apertium.saluton.dk
website.

--------------------------------
Bernard Chardonneau (France)
Phone : [33] 1 64 90 87 04 or [33] 9 72 36 32 90
(from Sept to June except holidays)
GSM phone : [33] 6 49 95 13 95 (French school holidays, C zone)

Multilingual websites for my free software:
http://libremail.free.fr and http://libremail.tuxfamily.org
http://cyloop.tuxfamily.org (mainly translated with Apertium)

My general website (in French only)
http://bech.free.fr

Per Tunedal | 12 May 2013 14:26

Status of the pair sv-da


Hi,
I'm pleased to hear of Jonas Fromseier Mortensen's plans to start
working on Norwegian-Danish (no-da), including both bokmål (nb) and
nynorsk (nn). That would make it much easier for me to realize my
original plan to set up the pair Norwegian-Swedish (no-sv), also
including both bokmål (nb) and nynorsk (nn).

What has kept me from starting the work so far is that I was pushed
into first fixing "some minor issues" with the pair Swedish-Danish
(sv-da). "OK, I'll give it a week," I thought, and I have now spent a
year! My goal was to fix the most blatant errors and extend the
dictionaries to include more words used in everyday life, rather than
in the EU Parliament. Further, I wanted to release the other translation
direction, Danish to Swedish (da-sv).

Status as today:

1. I've fixed some errors but many are yet to be found and tackled. Some
errors might be fixed by retraining the tagger, writing some clever
transfer rules and using the new disambiguator: that remains for me to
try.
2. I've added quite a few new words, mainly by:
a) adding entries from the pair Icelandic-Swedish (is-sv)
b) gold-washing from various sources, using wish lists of Danish and
Swedish words.
- I hoped that many of the words would "meet in the middle", i.e. would
be present in both monodixes, letting me just add the translation in
the bidix. Unfortunately, this only happened for about a third of the
added words. Consequently, I have had to add some words manually to the
monodixes.
- By now, I've added most of the wanted nouns and verbs I found. I have
simply skipped all words I couldn't translate effortlessly.
- Many common adjectives and adverbs remain to be added.
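The "meet in the middle" check described above is essentially a set-membership test: a wish-list pair needs only a bidix entry when both lemmas are already in their monodixes. A minimal sketch, assuming the lemma lists have already been extracted from the two dictionaries; the word lists and the `classify` helper are invented for illustration:

```python
from typing import List, Set, Tuple

# Hypothetical lemma sets, e.g. extracted from the sv and da monodixes.
SV_LEMMAS: Set[str] = {"hus", "bil", "äpple", "sommar"}
DA_LEMMAS: Set[str] = {"hus", "bil", "æble"}

# Hypothetical wish list of (Swedish, Danish) translation pairs.
WISH_LIST: List[Tuple[str, str]] = [
    ("hus", "hus"),
    ("bil", "bil"),
    ("äpple", "æble"),
    ("sommar", "sommer"),
]


def classify(pairs, sv_lemmas, da_lemmas):
    """Separate pairs that only need a bidix entry from those that also
    need one or both monodix entries added by hand."""
    bidix_only = []
    needs_monodix = []
    for sv, da in pairs:
        # Record which side(s) are missing from their monodix.
        missing = []
        if sv not in sv_lemmas:
            missing.append("sv")
        if da not in da_lemmas:
            missing.append("da")
        if missing:
            needs_monodix.append(((sv, da), missing))
        else:
            bidix_only.append((sv, da))
    return bidix_only, needs_monodix


bidix_only, needs_monodix = classify(WISH_LIST, SV_LEMMAS, DA_LEMMAS)
print("bidix only:", bidix_only)
print("needs monodix work:", needs_monodix)
```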

Further, I've added quite a few abbreviations and some common false
friends I know of. I've also started some work on pronouns - many are
still missing.

Working with the bidix has revealed that many of the words in the Danish
dictionary (much larger than the Swedish dictionary) simply do not exist
in Danish. All the same, they are nicely put into the monodix with valid
paradigms. Apparently, one or more of the semi-automatic tools has gone
haywire. This is a minor problem for me, as they will all go away when I
trim the dictionaries, but it might be a nuisance for Jonas while
working on the new pair Norwegian-Danish (no-da).

Another problem is that my knowledge of Danish is very limited. I have
tried to make some informed guesses, with the help of dictionaries and
an introductory grammar. All the same, some of my entries, especially in
the Danish monodix, might be erroneous. It might be a good idea to take
a glance at them (marked by my initials PT). Maybe expanding the monodix
and looking for odd entries. Or translating some test texts and spotting
errors.

The translation is still very poor, and unfortunately I believe that
this is very hard to fix. I've identified the tagger and word
disambiguation as the critical steps. I've come to the conclusion that
it's silly to let the tagger choose one and only one translation. A
better disambiguation would be most helpful. Maybe it would be possible
to translate all possible matches, disregarding the part of speech, and
later choose the translation that makes most sense/is the most fluent in
the target language? Or use a disambiguator instead of the tagger? I
will gladly discuss this in a separate thread.

Right now, I'm quite busy with other projects, so I cannot do much work
on Apertium. On the other hand I'm always interested in having a
discussion.
Yours,
Per Tunedal

Volkan Cirik | 1 May 2013 21:22

GSoC New Language Pair TR - EN

Hello again!

I am also interested in a new language pair for Apertium for GSoC. I talked on mIRC
with spectie about Turkish-Azeri, but I am more into Turkish-English.

Using the incubator version of TR-EN as a starting point, together with the HOWTO page, I tried to
translate some sentences from here. My solution is here: https://github.com/wolet/apertium
After checking out the code, please run make test-input.

I should also indicate that I am familiar with MT systems since I am a member of Koc University
AI Lab, which recently delivered English-Turkish SMT service for Bologna Translation project of EU. 

Are there any tips, other than Apertium's GSoC wiki page, for writing a feasible proposal for a new pair?
Thanks,
volkancirik
Volkan Cirik | 1 May 2013 21:11

GSoC - Sliding-window POS Tagger

Hello all,


I am interested in participating in GSoC'13 with Apertium. I am trying to solve
the Sliding-window POS Tagger challenge. I came up with a solution, but it
does not work for the first token, and honestly I do not understand how
the given gold answer was generated. Can you give me any hints?

By the way, my answer is:
I.num.mf.sg have.vbhaver.inf a.det.ind.sg saw.n.sg ...sent

Thanks,
volkan
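For readers unfamiliar with the challenge: as far as I understand the sliding-window tagger, it picks, for each word, the tag in that word's ambiguity class that scores best given the ambiguity classes of its neighbours. The sketch below assumes a window of one class on each side and a hand-filled, hypothetical score table (not trained, and not the challenge's reference solution). It also shows one way to give the first token a left context: pad the sentence with an assumed boundary class.

```python
from typing import Dict, FrozenSet, List, Tuple

AmbClass = FrozenSet[str]

# Assumed marker used as the ambiguity class outside the sentence, so the
# first and last tokens still have a one-left/one-right window.
BOUNDARY: AmbClass = frozenset({"<s>"})

PRN_NUM: AmbClass = frozenset({"prn", "num"})
VERB: AmbClass = frozenset({"vbhaver", "vblex"})
DET: AmbClass = frozenset({"det"})
N_VERB: AmbClass = frozenset({"n", "vblex"})

# Hypothetical hand-filled scores p(tag | left class, right class); a real
# tagger would estimate these from a corpus.
WEIGHTS: Dict[Tuple[AmbClass, AmbClass], Dict[str, float]] = {
    (BOUNDARY, VERB): {"prn": 0.9, "num": 0.1},
    (PRN_NUM, DET): {"vblex": 0.7, "vbhaver": 0.3},
    (DET, BOUNDARY): {"n": 0.8, "vblex": 0.2},
}


def tag(classes: List[AmbClass], weights) -> List[str]:
    """Pick, for each token, the tag in its ambiguity class with the highest
    score given the classes of its left and right neighbours."""
    padded = [BOUNDARY] + classes + [BOUNDARY]
    tags = []
    for i, cls in enumerate(classes):
        scores = weights.get((padded[i], padded[i + 2]), {})
        # sorted() makes ties (including unseen contexts) deterministic.
        tags.append(max(sorted(cls), key=lambda t: scores.get(t, 0.0)))
    return tags


# "I have a saw": ambiguity classes of each token.
sentence = [PRN_NUM, VERB, DET, N_VERB]
print(tag(sentence, WEIGHTS))  # ['prn', 'vblex', 'det', 'n']
```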
Madhura Parikh | 1 May 2013 19:24

[GSoC - 13] : Student Interested in Applying to Apertium

Hello!

I am Madhura Parikh, currently a final-year Computer Science undergrad. I am very interested in working on the following two project ideas from the ideas list:

1. Accent and diacritic restoration
Here the coding challenge says that we should modify the program to respect superblanks. Does this mean that it should also accept inputs such as [ <em> ] this is important [</em>] and ignore the superblanks? Isn't something like this already implemented in charlifter.restorer.restore (near line 691)? I would be glad for clarification on this.

2. Geriaoueg vocabulary assistant

I understand that I am applying very late - unfortunately I have been busy with my term-ending exams, project and submission deadlines at college in the GSoC application phase. I have created an account on the wiki, sourceforge and joined the mailing list.

I am very interested in NLP: I have done a couple of projects and internships in this area and am headed for graduate study in it as well. More about my projects can be found at sites.google.com/site/madhuraparikh/.

Regards,
Madhura
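One reading of "respect superblanks": in Apertium's stream format, formatting is carried in superblanks delimited by square brackets, with a backslash escaping literal special characters, so a restorer should pass superblanks through unchanged and only rewrite the text between them. A minimal sketch under that reading; `restore_text` and the toy word table are hypothetical stand-ins, not charlifter's API:

```python
import re
from typing import Callable, Dict, List, Tuple


def split_superblanks(stream: str) -> List[Tuple[bool, str]]:
    """Split deformatted text into (is_superblank, chunk) pairs.

    Superblanks are taken to be bracketed spans; a backslash escapes the
    next character, so escaped brackets are ordinary text.
    """
    chunks: List[Tuple[bool, str]] = []
    buf: List[str] = []
    in_blank = False
    i = 0
    while i < len(stream):
        ch = stream[i]
        if ch == "\\" and i + 1 < len(stream):
            buf.append(stream[i : i + 2])  # keep escape + escaped char verbatim
            i += 2
            continue
        if ch == "[" and not in_blank:
            if buf:
                chunks.append((False, "".join(buf)))
            buf, in_blank = [ch], True
        elif ch == "]" and in_blank:
            buf.append(ch)
            chunks.append((True, "".join(buf)))
            buf, in_blank = [], False
        else:
            buf.append(ch)
        i += 1
    if buf:
        # An unterminated superblank is passed through untouched.
        chunks.append((in_blank, "".join(buf)))
    return chunks


def restore_stream(stream: str, restore_text: Callable[[str], str]) -> str:
    """Apply restore_text to text chunks only; superblanks pass through."""
    return "".join(
        chunk if is_blank else restore_text(chunk)
        for is_blank, chunk in split_superblanks(stream)
    )


# Hypothetical word-level restorer standing in for the real model.
TOY_TABLE: Dict[str, str] = {"cafe": "café", "naive": "naïve"}


def toy_restore(text: str) -> str:
    return re.sub(r"\w+", lambda m: TOY_TABLE.get(m.group(0), m.group(0)), text)


print(restore_stream('[<a title="cafe">]cafe[</a>] is naive', toy_restore))
# [<a title="cafe">]café[</a>] is naïve
```

Note that the attribute value inside the first superblank is left alone even though it matches an entry in the toy table.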
David Cuenca | 1 May 2013 17:59

MT and Wikipedia

Dear all,

Erik Möller, head of Engineering and Product Development at the Wikimedia Foundation, started a thread on the Wikimedia mailing list about whether it would be worthwhile to support open-source machine translation. Original thread:
http://lists.wikimedia.org/pipermail/wikimedia-l/2013-April/125350.html

I suggested using software like OmegaWiki or Wikidata as a frontend for building the grammar and language-pair files that software like Apertium uses:
http://lists.wikimedia.org/pipermail/wikimedia-l/2013-April/125642.html

It would be great if you could take a look and share your impressions, either here or on the WM mailing list.

Thank you!
David Cuenca
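To make the suggestion a bit more concrete: if a lexical database such as OmegaWiki or Wikidata could export translation pairs with a part-of-speech tag, turning them into Apertium bidix entries is mostly mechanical. This sketch assumes such an export already exists as (left lemma, right lemma, tag) tuples; the records, the single shared tag, and the helper names are illustrative, querying either site is not shown, and multiwords are handled naively (spaces become `<b/>`, without the `<g>` grouping that multiwords with an inflecting head need).

```python
import xml.etree.ElementTree as ET
from typing import Iterable, Tuple


def add_lemma(side: ET.Element, lemma: str, tag: str) -> None:
    """Fill an <l> or <r> element, writing spaces in multiwords as <b/>."""
    words = lemma.split(" ")
    side.text = words[0]
    for word in words[1:]:
        blank = ET.SubElement(side, "b")
        blank.tail = word
    ET.SubElement(side, "s", n=tag)


def bidix_entry(left: str, right: str, tag: str) -> ET.Element:
    """Build an <e><p><l/><r/></p></e> entry for one translation pair."""
    entry = ET.Element("e")
    pair = ET.SubElement(entry, "p")
    add_lemma(ET.SubElement(pair, "l"), left, tag)
    add_lemma(ET.SubElement(pair, "r"), right, tag)
    return entry


def bidix_section(records: Iterable[Tuple[str, str, str]]) -> ET.Element:
    """Wrap entries in a main section, as in an Apertium .dix file."""
    section = ET.Element("section", id="main", type="standard")
    for left, right, tag in records:
        section.append(bidix_entry(left, right, tag))
    return section


# Hypothetical records, e.g. exported from a lexical database.
RECORDS = [("house", "casa", "n"), ("rain cloud", "nube de lluvia", "n")]
print(ET.tostring(bidix_section(RECORDS), encoding="unicode"))
```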


Mikel Forcada | 1 May 2013 17:04

2-year postdoctoral job at Prompsit Language Engineering

Abu-MaTran: automatic building of machine translation

Marie Curie IAPP project FP7-PEOPLE-2012-IAPP

24-month recruitment of a postdoctoral researcher

Overview


Prompsit is a research-oriented company created in 2006 inside the Transducens research group at the Department of Software and Computing Systems (Universitat d'Alacant, Spain). It is a leading company in the development of machine translation, especially linguistically-motivated systems such as Apertium rule-based systems or linguistically-augmented Moses statistical systems. The company is very active in R&D, both through industry-driven work and through participation in public national and international R&D programs.


Abu-MaTran (Automatic Building of Machine Translation) is an IAPP-FP7 project in which the company is currently involved. The project aims at increasing the hitherto low industrial adoption of machine translation by identifying crucial cutting-edge research techniques (automatic acquisition of corpora and linguistic resources, pivot techniques, linguistically augmented statistical translation and diagnostic evaluation) and preparing them to be suitable for commercial exploitation.


Besides Prompsit as a central node of interaction, the project involves four top research institutions: Dublin City University (project coordinator), Universitat d'Alacant, the University of Zagreb and the Institute for Language and Speech Processing. At Prompsit, the project will be led by researcher Sergio Ortiz Rojas, responsible for most of the code of the Apertium MT platform, Prompsit's linguistically-augmented Moses system, a modular version of the Bitextor[12] parallel text collector and other natural language processing tools for information extraction or opinion analysis.


The position involves research, development and participation in outreach activities to achieve the goals of the Abu-MaTran project as well as collaboration with all researchers in the project.

Job Description

Main Duties and Responsibilities

  • Investigate, in collaboration with the partners, better techniques to automate:

    • monolingual and bilingual general and domain-focused corpora acquisition

    • monolingual and bilingual terminology extraction

    • automatic induction of transfer rules

    • building of pivot or linguistically-augmented machine translation systems

    • machine translation automatic evaluation

  • Implement the techniques for each of the previous points

  • Carry out experiments to evaluate their performance.

  • Release the output as free/open-source tools with appropriate interfaces to use them.

  • Write the appropriate documentation for each of the work lines: technical documentation for developers, academic-oriented (papers, posters, etc.) publications, and tutorials or manuals for developers.

  • Attend project-related conferences and meetings

  • Present the results at relevant conferences and scientific meetings

  • Review work plan with the collaborators according to project intermediate milestones and results.

  • Get involved and give support to outreach activities (linguistic olympiads, FreeRBMT workshop)

Person Specification

Applicants should provide evidence in their applications that they meet the following criteria.

The staff in charge of this recruitment process will use a range of selection methods to measure candidates' abilities in these areas, including reviewing your application, seeking references, inviting shortlisted candidates to interviews, and other forms of assessment relevant to the post.

Criteria

Qualifications (compulsory): PhD in Computer Science (or at least 4 years of full-time research experience) and less than 10 years of full-time research experience.

International procedure (compulsory): the candidate cannot have worked or lived for more than 12 months within the last 3 years in Spain.

Experience in:

  • Natural language processing, particularly in machine translation (compulsory).

  • User-level or developer-level experience in Apertium, Moses, OpenMaTrEx, Bitextor, FBC and FMC, ccLexExtractor and DELiC4MT (desirable)

  • Data acquisition (desirable)

  • Terminology extraction (desirable)

  • Machine learning (desirable)

  • MT evaluation (desirable)

  • Creation of user interfaces and software releasing/sharing (desirable)

Programming languages: C++, Python, PHP (compulsory). Java (desirable).

Multilingual skills: Good level of English (compulsory). Basic knowledge of Spanish or Catalan (desirable). Knowledge of the South Slavic Languages targeted in the project use case -- Croatian, Bosnian, Serbian or Montenegrin and Slovenian (desirable).

Good writing and communication skills: ability to intercommunicate with people and to communicate results, ideas, etc. (compulsory).

Collaborative working skills: ability to take and delegate responsibilities (compulsory).

Experience in free/open-source software development: participation in free/open-source software development projects as user or, better, as contributor (desirable).

Experience in transfer of knowledge between industry and academia: interaction between industry and academia in previous positions is highly valued (desirable).

Creativity and flexibility skills: ability to be open to different ideas or opinions, to analyse and solve problems and to make decisions (desirable).

Active research skills: ability to follow state-of-the-art research lines associated with the project, to learn and acquire new skills relevant to the project, to write scientific works, and to meet deadlines (desirable).

Further Information

This post is fixed-term and full-time at Prompsit (Elx/Elche, Spain). The starting date is January 2014 and duration is 24 months. Splits are not possible.

Terms and conditions of employment:

Terms and conditions will be according to the stipulations of the IAPP program. The recruited candidate will have a full-time contract with full social security coverage subject to the Spanish laws and taxes.

Salary:

For an Experienced Researcher with less than 10 years of experience, the stipulated gross salary is €57,154 per year (living allowance), plus an additional mobility allowance of €683 or €977 per month, depending on family situation.

Closing date:

1st June 2013.

Informal enquiries:

For informal enquiries about this job contact us at info-y6CagUfFfa1Wk0Htik3J/w@public.gmane.org.

[12] http://bitextor.sourceforge.net/ (see the prompsit branch in the source code)

-- Mikel L. Forcada (http://www.dlsi.ua.es/~mlf/) Departament de Llenguatges i Sistemes Informàtics Universitat d'Alacant E-03071 Alacant, Spain Phone: +34 96 590 9776 Fax: +34 96 590 9326
