Min-Yen Kan | 2 May 06:16 2009

Call for Exhibits at ACL-IJCNLP 2009

		      ACL-IJCNLP 2009 Call for Exhibits

The Annual Meeting of the ACL is the primary event for computational
linguistics (CL) and natural language processing (NLP) in the calendar
year. In 2009, it is combined with the International Joint Conference on
Natural Language Processing, which is the premier event for CL and NLP in
Asia. In recent years, the event has attracted over 1000 professionals from
around the world. The number and calibre of participants, from academia and
industry, provide an unrivalled opportunity to share ideas and catch up on
the latest advances in the field. In addition, the Singapore venue for
ACL-IJCNLP 2009 will provide a focal point for the rapidly-growing Asian

If you have a commercial product or service of interest to the CL and NLP
community, the ACL 2009 exhibits program is the perfect way to introduce it to
potential customers. Possible application areas include: mobile
communications, machine translation, the semantic web, language interfaces to
robots, dialogue systems, CL publications and e-journals.

The ACL 2009 exhibits space is ideally located near both the presentation
halls and breakout spaces. High-speed internet connectivity as well as a wide
range of audio-visual equipment is available to exhibitors.

We hope to feature a wide range of products in the exhibits program,

  * NLP technologies, including machine translation, natural language
    understanding, dialogue processing and natural language generation systems

  * Next-generation web services based on NLP technologies

Janne Bondi Johannessen | 2 May 09:29 2009


Call for papers

The RILIVS initiative will arrange its first open workshop on language infrastructure this autumn. 
We invite papers on the following topics, within the general area of language variation:

  • Language maps / cartography
  • Corpus search interfaces and results handling
  • Annotation (for grammar as well as for metadata)
  • Standardization (fonts)

Deadline for submission of abstracts: 20. May 2009.

Abstracts should be sent to: jannebj <at> iln.uio.no.

Selected, reviewed papers will be published in the Oslo Studies of Language series.

Confirmed invited speakers:

  • Hans-Jörg Bibiko (Max Planck Institute for Evolutionary Anthropology, Leipzig)
  • Steven Krauwer (ELSNET / CLARIN / Utrecht University)
  • Anke Lüdeling (Humboldt University of Berlin)
  • Bert Vaux (University of Cambridge)

Confirmed local speakers:

  • Janne Bondi Johannessen (University of Oslo)
  • Christian-Emil Ore (University of Oslo)

The workshop will take place at the University of Oslo, on 17. and 18. September 2009.

Local organising committee:

WORKSHOP WEB PAGE: http://omilia.uio.no/rilivs/

Janne Bondi Johannessen
Professor, The Text Laboratory, ILN
University of Oslo
P.O.Box 1102 Blindern, N-0317 Oslo, Norway
Tel: +47 22 85 68 14, mob.: +47 928 966 34
Corpora mailing list
Corpora <at> uib.no
Geoffrey Williams | 3 May 10:06 2009

PhD offer in Corpus Linguistics and NLP at Université de Bretagne-Sud

PhD offer in Corpus Linguistics and NLP

Title: RESON: RElations Sémantiques et Ontologies Naturelles / Semantic
relationships and natural ontologies
The LICORN team of the HCTI research group
(http://web.univ-ubs.fr/corpus/), associated with the VALORIA research
group (http://www-valoria.univ-ubs.fr/), currently has a vacancy for a
PhD research grant in computer science and corpus linguistics, funded by
the regional council of Brittany, France.

RESON is a multidisciplinary project situated at the crossroads between
corpus linguistics and computer sciences, in particular natural language
processing. It involves two closely related research groups from the
Université de Bretagne Sud (UBS), Lorient, France: LiCoRN, part of the
larger humanities group HCTI, which specialises in corpus linguistics, text
encoding and lexicography, and the computer science group VALORIA.

This particular research fits into a number of applied projects under way,
including applications in web crawling, robotics and electronic
dictionaries. The object here is to detect semantic relationships within
large corpora using different theories of collocational networking allied
with models such as FrameNet (Fillmore et al.), Frames (Martin) and Corpus
Pattern Analysis (Hanks). The context is practical: creating models
applicable in two ongoing projects, Emotirob and RITEL.
The former concerns the creation of an interactive robot companion for
children undergoing long stays in hospital. The latter, RITEL, is a
collaboration with the CNRS research group LIMSI to create interactive
dialogue via telephone.

More information can be found on the Association Bernard Gregory website

Candidates should speak fluent French and English and hold a Master’s degree
in corpus linguistics, computer sciences or natural language processing.
Knowledge of other European languages is a plus.

Candidates should contact the proposers, Geoffrey Williams (LiCoRN –
geoffrey.williams <at> univ-ubs.fr ) and Jeanne Villaneau (VALORIA –
jeanne.villaneau <at> univ-ubs.fr ) by email with a full CV and letter of


Geoffrey Williams, MSc, PhD
Professeur des universités en sciences du langage
Département d'ingénierie du document
UFR de Lettres, Sciences Humaines et Sociales
Université de Bretagne-Sud
4 rue Jean Zay,
F-56321 LORIENT Cedex

tél: +33 (0) 2 97 87 29 20
fax: +33 (0) 2 97 87 65 25


Min-Yen Kan | 4 May 02:27 2009

Tutorials at ACL-IJCNLP 2009

Tutorial details are now available for ACL-IJCNLP 2009.
Full details of the tutorials are available at:


The tutorials announced are as follows:

T1: Fundamentals of Chinese Language Processing

Date/Time: Morning, 2 Aug 2009
Presenters/organisers: Chu-Ren Huang and Prof. Qin LU (both from The Hong
Kong Polytechnic University)

T2: Topics in Statistical Machine Translation

Date/Time: Morning, 2 Aug 2009
Presenters/organisers: Kevin Knight (USC/Information Sciences Institute) and
Philipp Koehn (University of Edinburgh)

T3: Semantic Role Labeling: Past, Present and Future

Date/Time: Morning, 2 Aug 2009
Presenter/organiser: Lluís Màrquez (Universitat Politècnica de Catalunya)

T4: Computational Modeling of Human Language Acquisition

Date/Time: Afternoon, 2 Aug 2009
Presenter/organiser: Afra Alishahi (Saarland University)

T5: Learning to Rank and Applications

Date/Time: Afternoon, 2 Aug 2009
Presenter/organiser: Hang Li (Microsoft Research Asia)

T6: State-of-the-art NLP Approaches to Coreference Resolution: Theory and
Practical Recipes

Date/Time: Afternoon, 2 Aug 2009
Presenters/organisers: Simone Paolo Ponzetto (University of Heidelberg) and
Massimo Poesio (Universita' di Trento)

Please send inquiries concerning ACL-IJCNLP 09 tutorials to
tutorials-acl09 "at" sussex "dot" ac "dot" uk

Tutorials Co-Chairs

Diana McCarthy, University of Sussex, UK
Chengqing Zong, Institute of Automation, Chinese Academy of Sciences
(CASIA), China


pcomp | 4 May 02:33 2009

PsychocompLA-2009 *second* Call for papers

************************ Second Call for Short 

Psychocomputational Models of Human Language 

July 28th & 29th at CogSci 2009 - Amsterdam, 

Submission Deadline: May 15, 2009


Workshop Topic:

The workshop is devoted to psychologically-motivated models of language
acquisition. That is, models which are compatible with research in
psycholinguistics, developmental psychology and

Invited Speakers:

* Tom Griffiths, University of California, Berkeley
* Hinrich Schütze, University of Stuttgart (TBC)

Workshop History:
This is the fifth meeting of the Psychocomputational Models of Human Language
Acquisition workshop following PsychoCompLA-2004, held in Geneva, Switzerland as
part of the 20th International Conference on Computational Linguistics
(COLING-2004), PsychoCompLA-2005 as part of the 43rd Annual Meeting of the
Association for Computational Linguistics (ACL-2005) held in Ann Arbor, Michigan
where the workshop shared a joint session with the Ninth Conference on
Computational Natural Language Learning (CoNLL-2005), PsychoCompLA-2007 held in
Tennessee as part of the 29th meeting of the Cognitive Science Society
(CogSci-2007), and PsychoCompLA-2008 held in Washington D.C., as part of the
30th meeting of the Cognitive Science Society (CogSci-2008). Given the increasing
this year the workshop will be spread over two days directly before the main
conference of the 31st meeting of the Cognitive Science Society,
which begins on July 30th, 2009.

Workshop Description:

The workshop will present research and foster discussion
psychologically-motivated computational models of language acquisition, with an
emphasis on the acquisition of syntax. In recent decades there has been a
thriving research agenda that applies computational techniques to emerging
natural language technologies and many meetings, conferences and workshops in
which to present such research. However, there have been only a few (but
growing number of) venues in which psychocomputational models of how humans
acquire their native language(s) are the primary focus.
Psychocomputational models of language acquisition are of particular interest in
light of recent results in developmental psychology that suggest that very young
infants are adept at detecting statistical patterns in an input stream. Though,
how children might plausibly apply statistical 'machinery' to the task of
grammar acquisition, with or without an innate language component, remains an
open and important question. One effective line of investigation is to
computationally model the acquisition process and
between a model and linguistic or psycholinguistic theory, and/or between a
model's performance and data from linguistic
children are exposed to.

Topics and Goals:

Short papers that present research on (but not necessarily limited
to) the following topics are welcome:

* Models that address the acquisition of word-order;
* Models that combine parsing and learning;
* Formal learning-theoretic and grammar induction models that
incorporate psychologically plausible constraints;
* Comparative surveys that critique previously reported
* Models that have a cross-linguistic or bilingual
* Models that address learning bias in terms of innate
linguistic knowledge versus statistical regularity in the
* Models that employ language modeling techniques from corpus
* Models that employ techniques from machine learning;
* Models of language change and its effect on language
acquisition or vice versa;
* Models that employ statistical/probabilistic grammars;
* Computational models that can be used to evaluate
linguistic or developmental theories (e.g., principles &
parameters, optimality theory, construction grammar,
* Empirical models that make use of child-directed corpora such

This workshop intends to bring together researchers from cognitive
computational linguistics, other computer/mathematical
linguistics and psycholinguistics working on all areas of language
Diversity and cross-fertilization of ideas is the central goal.

Workshop Organizers:
Rens Bod, University of Amsterdam (rens.bod at uva.nl)
William Gregory Sakas, City University of New York  
(sakas at
Workshop Co-Organizer:
Taylor Cassidy, City University of New York
(Psycho.Comp <at> hunter.cuny.edu)

Submission details:

Authors are invited to submit short papers of (maximally) 2 pages of
plus 2 pages for data, references and other supplementary materials.
should be anonymous, clearly titled and the narrative section should be no more
than 1400 words in length. Either PDF or MS Word formats are acceptable. Please
include a cover sheet (as a separate attachment) containing the title of your
submission, your name, contact details and affiliation. Send your
electronically to

Email: Psycho.Comp <at> hunter.cuny.edu.
    with  PsychoCompLA-2009 Submission  somewhere in the subject


The accepted papers will appear in the online workshop
Full papers of accepted short papers will be considered in Fall 2009 for
inclusion in an issue of the new Cognitive Science Society Journal - topiCS -
whose focus will be psychocomputational modeling of human language

Submission deadline: May 15, 2009

Contact: Psycho.Comp <at> hunter.cuny.edu
      with  PsychoCompLA-2009  somewhere in the subject line.


Mario Crespo Miguel | 4 May 13:43 2009

Speech Recognition System for Spanish

Dear colleagues,

I wonder if someone could help me find an available speech recognition system for Spanish ;) (ideally one that does not require too much training). Thank you very much in advance for your help.

All the best,

Mario Crespo Miguel

University of Cádiz
Escuela Superior de Ingeniería
C/ Duque de Nájera, 16
11002 Cádiz

Pierre Zweigenbaum | 4 May 16:58 2009

CFP: ACL-IJCNLP-2009 Workshop on Comparable Corpora, May 15th

		       Extended call for papers
	     New deadline for submission: May 15th, 2009

	2nd Workshop on Building and Using Comparable Corpora:
		from parallel to non-parallel corpora

			   ACL-IJCNLP 2009

	August 6th, 2009
	Suntec, Singapore

	  short papers: 4 pages
	  long papers: 8+1 pages


 Following the success of the first Workshop on Building and Using
 Comparable Corpora
 <http://www.limsi.fr/~pz/lrec2008-comparable-corpora/> at LREC 2008,
 this workshop aims to bring together language engineers as well as
 linguists interested in the constitution and use of comparable
 corpora, ranging from parallel to non-parallel corpora. In the larger
 context of the joint ACL-IJCNLP, this workshop aims to solicit
 contributions from researchers in different geographical regions, in
 order to highlight in particular the issues with comparable corpora
 across languages that are very different from each other, such as
 across Asian and European languages. Research in minority languages
 is also of particular interest.


 Research in comparable corpora has been motivated by two main reasons
 in the language engineering and the linguistics communities. In
 language engineering, it is chiefly motivated by the need to use
 comparable corpora as training data for statistical NLP applications
 such as statistical machine translation or cross-lingual retrieval.
 In linguistics, on the other hand, comparable corpora are of interest
 themselves in providing intra-linguistic discoveries and
 comparisons. It is generally accepted in both communities that
 comparable corpora are documents in one to many languages, that are
 comparable in content and form in various degrees and dimensions. It
 was pointed out that parallel corpora are at one end of the spectrum
 of comparability whereas quasi-comparable corpora are at the other
 end. We believe that the linguistic definitions and observations in
 comparable corpora can improve methods to mine such corpora for
 applications to statistical NLP. As such, it is of great interest to
 bring together builders and users of such corpora.

 Parallel corpora are a key resource as training data for statistical
 machine translation, and for building or extending bilingual lexicons
 and terminologies. However, beyond a few language pairs such as
 English-French or English-Chinese and a few contexts such as
 parliamentary debates or legal texts, they remain a scarce resource,
 despite the creation of automated methods to collect parallel corpora
 from the Web. Interest in non-parallel forms of comparable corpora
 in language engineering primarily ensued from the scarcity of
 parallel corpora. This has motivated research into the use of
 comparable corpora: pairs of monolingual corpora selected according
 to the same set of criteria, but in different languages or language
 varieties. Non-parallel yet comparable corpora overcome the two
 limitations of parallel corpora, since sources for original,
 monolingual texts are much more abundant than translated
 texts. However, because of their nature, mining translations in
 comparable corpora is much more challenging than in parallel
 corpora. What constitutes a good comparable corpus, for a given task
 or per se, also requires specific attention: while the definition of
 a parallel corpus is fairly straightforward, building a non-parallel
 corpus requires control over the selection of source texts in both

 With the advent of online data, the potential for building and
 exploring comparable corpora is growing exponentially. Comparable
 documents in languages that are very different from each other pose
 special challenges as very often, the non-parallel-ness in sentences
 can result from cultural and political differences.


 Kenneth Ward Church (Microsoft Research, Redmond)


 We solicit contributions on, but not limited to, the following topics:

 * Building Comparable Corpora
     - Human translations
     - Automatic and semi-automatic methods
     - Methods to mine parallel and non-parallel corpora from the Web
     - Tools and criteria to evaluate the comparability of corpora
     - Parallel vs non-parallel corpora, monolingual corpora
     - Rare and minority languages
     - Across language families
     - Multi-media/multi-modal comparable corpora
 * Applications of Comparable Corpora
     - Human translations
     - Language learning
     - Cross-language information retrieval & document categorization
     - Bilingual projections
     - Machine translation
     - Writing assistance
 * Mining from Comparable Corpora
     - Extraction of parallel segments or paraphrases from
       comparable corpora
     - Extraction of bilingual and multilingual translations of
       single words and multi-word expressions; proper names, named
       entities, etc.


 May 15, 2009	Paper submissions
 Jun 1, 2009	Notification of acceptance
 Jun 7, 2009	Camera-ready copies due
 Aug 6, 2009	Workshop date


 Authors are invited to submit papers on original, unpublished work in
 the topic area of this workshop. We invite the presentation of:

    * Long papers should present completed work and should not exceed
      8 pages (plus one page of references);
    * Short papers can present work in progress (4 pages including

 Please use the official style files for ACL/IJCNLP 2009 available at:

 Submission site:



 Pascale Fung, Hong Kong University of Science & Technology (HKUST)
 Pierre Zweigenbaum, LIMSI-CNRS (France)
 Reinhard Rapp, University of Mainz (Germany)
   and University of Tarragona (Spain)


 Hamdulla Askar (Xinjiang University, China)
 Srinivas Bangalore (AT&T Labs, US)
 Lynne Bowker (University of Ottawa, Canada)
 Éric Gaussier (Université Joseph Fourier, Grenoble, France)
 Gregory Grefenstette (Exalead, Paris, France)
 Hitoshi Isahara (National Institute of Information and Communications
 Technology, Japan)
 Min-Yen Kan (National University of Singapore)
 Adam Kilgarriff (Lexical Computing Ltd)
 Philippe Langlais (Université de Montréal, Canada)
 Rada Mihalcea (University of North Texas, US)
 Dragos Stefan Munteanu (Language Weaver, Inc., US)
 Grace Ngai (Hong Kong Polytechnic University, Hong Kong)
 Carol Peters (ISTI-CNR, Pisa, Italy)
 Serge Sharoff (University of Leeds, UK)
 Richard Sproat (OGI School of Science & Technology, US)
 Mandel Shi (Xiamen University, China)
 Yujie Zhang (National Institute of Information and Communications
 Technology, Japan)


 Ricky Chan Ho Yin, Hong Kong University of Science & Technology


Sebastian Germesin | 5 May 14:19 2009

Corpus with Disfluency Annotations

Hello list,

I recently developed a disfluency detection system based on machine
learning and I'd like to measure the performance of this system on
other corpora. It was developed within the AMI project (see
http://www.amiproject.org) and hence I only have results on AMI data.

So far, I am aware of the ICSI corpus and the Switchboard corpus, which
also have disfluency annotations.

Are there other corpora that have disfluencies annotated? It does
not necessarily have to be an English corpus; any other language is
welcome as well :)

Thanks a lot in advance and best regards,

Sebastian Germesin
M.Sc. Sebastian Germesin	
Deutsches Forschungszentrum fuer Kuenstliche Intelligenz GmbH (DFKI)
Stuhlsatzenhausweg 3
66123 Saarbruecken

Tel.: +49 681-302-5368
Fax.: +49 681-302-5020

email: sebastian.germesin <at> dfki.de
home: http://www.dfki.de/~germesin


Kiril Simov | 5 May 16:53 2009

Second CFPs: Workshop Adaptation of Language Resources and Technology to New Domains

Adaptation of Language Resources and Technology to New Domains

RANLP 2009 Workshop


It is widely acknowledged that despite the great advances in
Computational Linguistics nowadays, the creation of new
Language Resources (LR) and Language Technology (LT) for a
new domain or task is still quite expensive and
time-consuming. At the same time there are already a lot of
varieties of LR and LT, developed for various languages and
purposes. What happens when new tasks come? Do we have to
develop new resources and technology from the beginning, or
can we re-use or adapt the existing ones? The last, but not
least, alternative is to combine both strategies depending on
the task. The first option seems reasonable when richer and
larger data is needed for the new applications. The second
option is justified only if such a resource or technology
does not exist at all, or some new approach is applied. The
third one is the ever ‘compromising’, but also very
realistic option.
As machine learning techniques have matured enough to
successfully support real applications within various
domains, the requirement for large and adequate training
data as input has become a new bottleneck. Thus, the NLP
community faced the question of the relevant LR and LT
adaptation. It concerns the operability between general
domain NLP toolkits and specific domain tasks with respect
to terminology, language, structure, steps of preprocessing
Thus, the Workshop is devoted to various methods for
transferring the linguistic knowledge and supportive
technology from the existing language resources in one
domain into a different one.


- parameters of adaptivity and re-usability of LR and LT
- methods for adaptation of existing NLP resources to specific tasks
- domain specific requirements to the LR and LT
- general domain vs. specific domain processing
- profiling LR
- extrapolation of richer annotations to large data
- evaluation of adapted LR and LT


Núria Bel, Pompeu Fabra University
Erhard Hinrichs, Tuebingen University (co-chair)
Petya Osenova, Bulgarian Academy of Sciences and Sofia University
Kiril Simov, Bulgarian Academy of Sciences (co-chair)

Invited speaker

Jun'ichi Tsujii, University of Tokyo and University of Manchester - NacTeM

Submission details

Authors are invited to submit an extended abstract of up to 800
words. Abstracts should describe existing research connected
to the topics of the workshop. The following formats are
accepted: PDF, PS, MS Word, ASCII text. Each submission
should provide the following information: title; author(s);
affiliation(s); and contact author's e-mail address, postal

The abstracts should be sent electronically to:
Petya Osenova
Email: petya <at> bultreebank.org
by the deadline listed below. The submissions will be
reviewed by the workshop's programme committee.

The accepted papers will appear in the workshop proceedings.
The final paper should not exceed 15 A4 pages formatted
according to the RANLP09 guidelines

Important Dates

Deadline for abstract submission:   7th June 2009
Notification of acceptance:         7th July 2009
Final version of the papers:        23rd August 2009

Program Committee

Núria Bel, Pompeu Fabra University
Gosse Bouma, Groningen University
António Branco, Lisbon University
Walter Daelemans, Antwerp University
Markus Dickinson, Indiana University
Erhard Hinrichs, Tuebingen University
Josef van Genabith, Dublin City University
Iryna Gurevych, Technische Universität Darmstadt - UKP Lab
Atanas Kiryakov, Ontotext OOD
Vladislav Kubon, Charles University
Sandra Kuebler, Indiana University
Lothar Lemnitzer, DWDS, Berlin-Brandenburgische Akademie der Wissenschaften
Bernardo Magnini, FBK
Detmar Meurers, Tuebingen University
Paola Monachesi, Utrecht University
Preslav Nakov, National University of Singapore
John Nerbonne, Groningen University
Petya Osenova, Bulgarian Academy of Sciences and Sofia University
Gabor Proszeky, MorphoLogic
Adam Przepiorkowski, Polish Academy of Sciences
Marta Sabou, Open University - UK
Kiril Simov, Bulgarian Academy of Sciences
Cristina Vertan, Hamburg University 

Adam Kilgarriff | 5 May 23:27 2009

Lexicom Workshop, Brno 15-19 June 2009

Lexicom: a Workshop in Lexicography and Lexical Computing 
    Brno, Czech Republic, 15-19 June 2009    
Led by Sue Atkins, Adam Kilgarriff and Michael Rundell of the Lexicography MasterClass, this is an intensive five-day workshop, with seminars on theoretical issues alternating with practical sessions at the computer. There will be some parallel 'lexicographic' and 'computational' sessions.
  * corpus creation
  * corpus analysis:
      - software and corpus querying
      - discovering word senses, recording contextual information
  * preparing word sketches
  * writing entries for dictionaries and lexicons
  * dictionary databases and writing systems
  * using web data
Applications are invited from people with interests and experience in any of these areas. 

Over the last eight years Lexicom workshops (in Europe, Asia and the Americas) have attracted 270 participants from 34 countries, including lexicographers, computational linguists, professors, research students, translators, terminologists, and editors, managers and technical support staff from dictionary publishers and information-management companies.  
Workshop website, registration: http://nlp.fi.muni.cz/lexicom2009-europe  
Draft programme and reports of past events: http://www.lexmasterclass.com
Sue Atkins, Michael Rundell, Adam Kilgarriff
The Lexicography MasterClass

Adam Kilgarriff                                      http://www.kilgarriff.co.uk              
Lexical Computing Ltd                   http://www.sketchengine.co.uk
Lexicography MasterClass Ltd      http://www.lexmasterclass.com
Universities of Leeds and Sussex       adam <at> lexmasterclass.com