dychen | 2 Jul 04:21 2005

For list of multi-word units

Dear All,
I am looking for a list or database of English multi-word units (including phrases, idioms, compounds, etc.) that is freely available for research. Can anybody direct me to such resources? Any references will be appreciated.
Best Regards

Bernice Nuhfer-Halten | 2 Jul 06:23 2005

comments on concordances requested

I have compiled six concordances of poetry from Spain and Hispanic America.  I would appreciate any comments that you could send as to the usability in your studies of corpus linguistics, or in your language/literature classes.  Any suggestions would also be greatly appreciated.  The URL is: http://www.spsu.edu/sis/nuhfer-halten/concordance.html
Thanks in advance.

Bernice M. Nuhfer-Halten, Ph.D.
Associate Professor of Spanish and French
Department of Social and International Studies
Southern Polytechnic State University
Marietta, GA 30060
678.915.4949 fax
Compiler of Concordances of Spanish and Hispanic American Poetry

Andre Halama | 2 Jul 19:59 2005

Re: For list of multi-word units

dychen wrote:

> I am looking for a list or database of English Multi-word units
> (including phrases, idioms, compounds, etc), which is freely available
> for research.

Here is a link to the list of multiword tokens used in BNC2:



Stelios Piperidis | 4 Jul 15:12 2005

CfP : RANLP2005 Workshop "Language and Speech Infrastructure for Information Access in the Balkan Countries"

* Apologies for multiple postings! *


Language and Speech Infrastructure for Information Access in the Balkan Countries

Workshop to be held in conjunction with RANLP 2005

25 September 2005

Borovets, Bulgaria

The emerging convergence of internet and media technologies, the abundance
of interesting and useful archived and contemporary content and the ever
increasing need for access to this content has set new challenges and
opportunities for human language technologies (HLT). Useful results from the
application of HLT for information access in a number of widely used
languages already exist, while only some first attempts have been made in
lesser used languages.

At the same time, the enlargement process of the European Union and the
forthcoming accession of a number of Balkan countries set new requirements
for technologically enhanced access to information generated and consumed in
the Balkan Peninsula.

The current workshop aims at bringing together researchers working in a
broad range of human language technologies for processing single- and/or
multimedia content.  The focus of the workshop is on issues relevant to the
respective linguistic infrastructure in the Balkan countries.
Submissions should be related to language and speech resources, tools and
applications developed for the languages of the Balkans. More specifically,
topics include but are not limited to:

·	grammars, lexicons and wordnets
·	information retrieval & extraction
·	text summarization
·	question answering
·	natural language generation
·	machine translation and computer-aided translation tools
·	speech synthesis and recognition
·	dialogue systems
·	multimedia information processing
·	computer-assisted language learning
·	architectures and development environments
·	evaluation of resources and language processing components and systems

The official language of the workshop is English.

Submissions should be A4, two-column format and should not exceed seven
pages, including cover page, figures, tables and references. Times New Roman
12 pt font is preferred. The first page should state the title of the paper,
the author's name(s), affiliation(s), postal and email address(es), a list
of keywords and an abstract. Papers should be submitted electronically in
PDF format to Stelios Piperidis, spip <at> ilsp.gr

Each paper will be reviewed by up to three members of the programme
committee. Authors of accepted papers will receive guidelines on how to
produce camera-ready versions of their papers for inclusion in the

Parallel submissions to the main conference and the workshop are allowed but
the review process will be coordinated.

Paper submission deadline: 20 July 2005
Notification of acceptance: 2 September 2005
Camera ready papers due: 8 September 2005
Workshop date: 25 September 2005

Stelios Piperidis, Institute for Language and Speech Processing, GR
Elena Paskaleva, Bulgarian Academy of Sciences, BG

Galia Angelova, Bulgarian Academy of Sciences, BG
Kalina Bontcheva, University of Sheffield, UK
Dan Cristea, "Alexandru Ioan Cuza" University of Iasi, RO
Tomas Erjavec, Jozef Stefan Institute, SL
Bojana Gajic, Norwegian University of Science and Technology, NO
Maria Gavrilidou, Institute for Language and Speech Processing, GR
Florentina Hristea, University of Bucharest, RO
Vangelis Karkaletsis, NCSR Demokritos, GR
Rada Mihalcea, University of  North Texas, US
Ruslan Mitkov, University of Wolverhampton, UK
Ivan Obradovic, University of Belgrade, SR
Kemal Oflazer, Sabanci University, TR
Petya Osenova, Sofia University "St. Kliment Ohridski", BG
Harris Papageorgiou, Institute for Language and Speech Processing, GR
Elena Paskaleva, Bulgarian Academy of Sciences, BG
Katerina Pastra, Institute for Language and Speech Processing, GR
Bojan Petek, University of Ljubljana, SL
Stelios Piperidis, Institute for Language and Speech Processing, GR
Kiril Simov, Bulgarian Academy of Sciences, BG
Sofia Stamou, Computer Technology Institute, GR
Marko Tadic, University of Zagreb, HR
Dan Tufis, Romanian Academy of Sciences, RO
Dusko Vitas, University of Belgrade, SR

For further information, please contact Stelios Piperidis, spip <at> ilsp.gr or
Elena Paskaleva, hellen <at> lml.bas.bg.

Wanxiang Che | 5 Jul 07:30 2005

How can I get the Chinese PropBank

Dear all,

Can anyone tell me how I can get the latest version of the Chinese PropBank?

Thanks a lot!


Carl (Wanxiang Che)
wanxiang <at> gmail.com

eneko agirre | 5 Jul 11:11 2005

GWA-06, Korea Extended deadline

  			3rd Announcement and final Call for Papers
	3rd International Conference of the Global WordNet Association
			Jeju Island, Republic of Korea

The Global Wordnet Association is pleased to announce the Third
International Conference of the Global WordNet Association (GWA'06),
organized by KAIST.
The conference will be held on Jeju Island, Korea, January 23-27, 2006.

More information can be found on the GWA website:


We invite papers addressing the questions listed below.
Proposals for tutorials on building wordnets and demonstrations
of wordnet databases and wordnet-based software on any issue related
to wordnets are welcome, too.

A.      Linguistics and WordNet:
         a.        In depth analysis of Semantic Relations,
         b.        Theoretical definitions of word meaning,
         c.        Necessity and Completeness issues.

B.      Architecture of WordNet:
         a.        Language independent and language dependent components

C.      Tools and Methods for Wordnet Development:
         a.        User and Data entry interface, organization,
         b.        Extending and enriching wordnets
         c.        Integrating WordNet with other linguistic resources
                   (SUMO, Top ontologies, VerbNet, FrameNet, etc.)

D.      WordNet as a lexical resource and component of NLP and MT:
         a.        Word sense disambiguation using wordnet(s)
         b.        Ontologies and WordNet(s)
         c.        The Lexicon and WordNet(s)
         d.        Semantic interpretation and WordNet(s)

E.      Applications of WordNet:
         a.        Information Extraction and Retrieval,
         b.        Document Structuring and Categorization,
         c.        Automatic Hyperlinking
         d.        Language Teaching,
         e.        Psycholinguistic Applications

F.      Standardization, distribution and availability of wordnets and
        wordnet tools.

Presentations will fall into one of the following categories:

--long papers (30 mins)
--short papers (15 mins)
--project reports (10 mins)
--demonstrations (20 mins)

Submissions will have to state the preferred format. Acceptance may
be subject to changes in the format or length of the presentation.
(E.g., a long paper submission may be accepted as a short paper.)

Final papers should be submitted in electronic form (Postscript or PDF).
Long papers should contain approximately 7,500 words (~ 15 pages);
short papers and demonstrations should be approx. 4,000 words long.
Project reports will be limited to approx. 1,000 words (2 pages of text).
Send papers to
Christiane Fellbaum (fellbaum <at> princeton.edu)
Piek Vossen (piek.vossen <at> irion.nl)

The extended deadline for submissions is July 15th, 2005. Decisions
on acceptance will be announced to the authors in early September.

We anticipate publishing the Proceedings both in paper and CD format.

Conference Chairs:
	Piek Vossen and Christiane Fellbaum

Local Organizing Committee Chair:

Key-Sun Choi (kschoi <at> cs.kaist.ac.kr)
Dept. of Computer Science
Bank of Language Resources
Korea Terminology Research Center for Language & Knowledge Engineering

Program Committee (as of March 7)

Eneko Agirre (Donostia, Spain)
Antonietta Alonge (Perugia, Italy)
Pushpak Bhattacharyya (Mumbai, India)
Bill Black (Manchester, England)
Paul Buitelaar (Saarbruecken, Germany)
Key-Sun Choi (KAIST, Korea)
Salvador Climent (Barcelona, Spain)
Bento Diaz (Sao Paulo, Brazil)
Christiane Fellbaum (Princeton, U.S.A./Berlin, Germany)
Julio Gonzalo (Madrid, Spain)
Ben Haskell (Princeton, U.S.A.)
Andreas Hotho (Kassel, Germany)
Chu Ren Huang (Taipei, Republic of China)
Farhad Keyvan (New Jersey, USA)
Adam Kilgarriff (England)
Claudia Kunze (Tuebingen, Germany)
Lothar Lemnitzer (Tuebingen, Germany)
Birte Loenneker (Hamburg, Germany)
Bernardo Magnini (Trento, Italy)
Palmira Marrafa (Lisbon, Portugal)
Toni Marti (Barcelona, Spain)
David Martinez (Donostia, Spain)
Rada Mihalcea (Texas, U.S.A.)
Karel Pala (Brno, Czech Republic)
Adam Pease (U.S.A.)
Ted Pedersen (Minnesota, U.S.A.)
German Rigau (Donostia, Spain)
Horacio Rodriguez (Barcelona, Spain)
Sofia Stamou (Patras, Greece)
Mark Stevenson (Sheffield, England)
Dan Tufis (Bucharest, Romania)
Serge Yablonsky (Russia)
Kadri Vider (Tartu, Estonia)
Piek Vossen (Netherlands)
Shuly Wintner (Haifa, Israel)

Important dates:

07-2005   Deadline for paper submission, extended to July 15th
09-2005   Author notification
10-2005   Final Papers due
11-2005   Registration is open
01-2006   Conference

David Reitter | 5 Jul 13:17 2005

Communicator corpora parsed?

Hi -

is anyone aware of syntactic annotations of the (e.g. DARPA)  
Communicator corpus, or similar large, task-oriented human/machine or  
human/human dialogue corpora?
I'm looking for tree structures, and atomic categories such as VP or
PP would do just fine. I could work with non-perfect (i.e.
machine-parsed) annotations.

Generally I'd be grateful for tips regarding larger spoken dialogue  
corpora (task-oriented dialogue) that have been syntactically annotated.


David Reitter - ICCS/HCRC, Informatics, University of Edinburgh

ELDA | 5 Jul 14:44 2005


[Apologies for multiple postings]

LREC 2006

5th Conference on Language Resources and Evaluation


Magazzini del Cotone Conference Center,  GENOA - ITALY

MAIN CONFERENCE: 24-25-26 MAY 2006

WORKSHOPS and TUTORIALS: 22-23 and 27-28 MAY 2006

Conference web site: http://www.lrec-conf.org

The fifth international conference on Language Resources and Evaluation,
LREC 2006, is organised by ELRA in cooperation with a wide range of
international associations and


In the Information Society, the pervasive character of Human Language
Technologies (HLT) and their relevance to practically all fields of
Information Society Technologies (IST) have been widely recognised. Two
issues are considered particularly relevant: the availability of Language
Resources (LRs) and the methods for the evaluation of resources,
technologies, products and applications. Substantial mutual benefits are
achieved by addressing these issues through international

The term language resources refers to sets of language data and
descriptions in machine readable form, such as written or spoken corpora
and lexica, annotated or not, multimodal resources, grammars, terminology
or domain specific databases and dictionaries, ontologies, multimedia
databases, etc. LRs also cover basic software tools for their acquisition,
preparation, collection, management, customisation and use. LRs are used in
many types of components/systems/applications, such as software
localisation and language services, language enabled information and
communication services, knowledge management, e-commerce, e-publishing,
e-learning, e-government, cultural heritage, linguistic studies, etc. This
large range of usages makes the LRs infrastructure a strategic part of the
e-society, where the creation of a basic set of LRs for all languages must
be ensured in order to bring all languages to the same level of usability
and availability.

The relevance of evaluation to language technology development is
increasingly recognised. This involves assessing the state-of-the-art for a
given technology, measuring the progress achieved within a programme,
comparing different approaches to a given problem, assessing the
availability of technologies for a given application, product benchmarking,
and assessing system usability and user satisfaction.

The aim of the LREC conference is to provide an overview of the
state-of-the-art, explore new R&D directions and emerging trends, exchange
information regarding LRs and their applications, evaluation methodologies
and tools, ongoing and planned activities, industrial uses and needs,
requirements coming from the new e-society, both with respect to policy
issues and to technological and organisational ones.

LREC provides a unique forum for researchers, industry and funding agencies
from across a wide spectrum of areas to discuss problems and opportunities,
find new synergies and promote initiatives for international cooperation in
the areas mentioned above, in support of investigations in language
sciences, progress in language technologies and development of
corresponding products, services and applications.


Examples of the topics which may be addressed by papers submitted to the
conference are given below.

Issues in the design, construction and use of Language Resources (LRs)
*	Methodologies and tools:
*	Guidelines, standards, specifications, models and best practices for LRs.
*	Methods, tools, procedures for the acquisition, creation, annotation,
	management, access, distribution, use of monolingual and multilingual
	LRs.
*	Methods for the extraction and acquisition of knowledge (e.g. terms,
	ontologies, lexical information, language modelling) from LRs, and
	knowledge transfer among
*	Definition and requirements for a Basic and Extended LAnguage Resource
	ELARK) for all languages.
*	Documentation and archiving of languages, including minority and
	endangered languages.
*	LRs for linguistic research in human-machine communication.

*	LRs construction & annotation:
*	Metadata descriptions of LRs and metadata for semantic/content markup.
*	Ontologies and knowledge representation, especially with respect to HLT.
*	Terminology and NLP tools and methodologies for terminology and ontology
	building or mapping, term extraction, domain-specific dictionaries.
*	LRs for machine translation.
*	LRs for ubiquitous processing.
*	Availability and use of generic vs. task/domain specific LRs.
*	Multimedia and Multimodal LRs - Integration of various media and
	modalities in LRs (speech, vision, language).

*	LRs exploitation:
*	Industrial production of LRs.
*	Industrial LRs requirements, user needs and community's response.
*	Exploitation of LRs in different types of applications (information
	extraction, information retrieval, speech dictation, translation,
	summarisation, web services, semantic web, semantic search, text mining,
	inferencing, etc.).
*	Exploitation of LRs in different types of interfaces (dialogue systems,
	natural language and multimodal/multisensorial interactions, etc.).

Issues in Human Language Technologies (HLT) evaluation
*	Methodologies, tools and standardisation:
*	Evaluation, validation, quality assurance of LRs.
*	Evaluation methodologies, protocols and measures.
*	Benchmarking of systems and products, resources for benchmarking and 
	blackbox, glassbox and diagnostic evaluation of systems.
*	From evaluation to standardisation.
*	User centered design tools and methods.
*	Evaluation of ontologies and knowledge bases by means of LR-related 
*	Evaluation in written language processing: (document production and
	management, text retrieval, terminology extraction, message
	understanding, text alignment,
	translation, morphosyntactic tagging, parsing, semantic tagging, word
	sense disambiguation, text understanding, summarization, question
	answering, localization, etc.).
*	Evaluation in spoken language processing: (speech recognition and
	understanding, voice dictation, oral dialogue, speech synthesis, speech
	coding, speaker and language recognition, spoken translation, etc.).
*	Evaluation of multimedia document retrieval and search systems (including 
	indexing, filtering, alert, question answering, etc.).
*	Evaluation of multimodal systems.

*	Usability evaluation of HLT based user Interfaces:
*	Usability and user satisfaction evaluation.
*	Psychophysical and cognitive evaluation.
*	User experience assessment.
*	Heuristic evaluation.
*	Multimodal interaction evaluation.
*	Evaluation of usability in mobile services/applications, etc.

General issues
*	National and international activities and projects.
*	Open architectures for LRs.
*	LRs and the needs/opportunities of the emerging industries.
*	LRs and contributions to societal needs (e.g. e-society).
*	Priorities, perspectives, strategies in national and international
	policies for LRs.
*	Needs, possibilities, forms, initiatives of/for international
	cooperation, and their organisational and technological implications.
*	Organisational, economical and legal issues in the construction,
	distribution, access and use of

Special Highlights

LREC targets the integration of different types of LRs (spoken, written,
and other modalities), and of the respective communities. To this end, LREC
encourages submissions covering issues which are common to different types
of LRs and language technologies, such as dialogue strategy, written and
spoken translation, domain-specific data, multimodal communication or
multimedia document processing, and will organise, in addition to the usual
tracks, common sessions encompassing the different areas of LRs.

The 2006 Conference emphasises in particular the importance of promoting:
-	synergies and integration between (multilingual) LRs and Semantic Web
-	new paradigms for sharing and integrating LRs and LT coming from
	different sources,
-	communication with neighbouring fields for applications in e-government
	and administration,
-	common evaluation campaigns for the objective evaluation of the
	performances of different
-	systems and products (also industrial ones) based on large-size and high
	quality LRs.

LREC therefore encourages submissions of papers, panels, workshops,
tutorials on the use of LRs in these areas.


The Scientific Programme will include invited talks, oral presentations,
poster presentations, peer-reviewed demonstrations and panels.
There is no difference in quality between oral presentations and poster
presentations. Only the appropriateness of the type of communication (more
or less interactive) to the content of the paper will be considered.


Submitted abstracts of papers for oral and poster or demo presentations
should consist of about 1000

A limited number of panels, workshops and tutorials is foreseen: proposals
will be reviewed by the Programme Committee.

For panels, please send a brief description, including an outline of the
intended structure (topic, organiser, panel moderator, tentative list of
panelists).

For workshops and tutorials, see the dedicated section below.

Only electronic submissions will be considered. Further details about
submission will be circulated in the 2nd Call for Papers to be issued at the
end of July and posted on the LREC web site (www.lrec-


The Proceedings of the conference will include both oral and poster papers.
Printed Proceedings will be published only on demand. Proceedings on CD will
be provided to all. In addition a book of Abstracts will be printed.


*	Submission of proposals for panels, workshops and tutorials: 14 October 2005
*	Submission of proposals for oral and poster papers, referenced demos: 14 October 2005
*	Notification of acceptance of panels, workshops and tutorials proposals: 7 November 2005
*	Notification of acceptance of oral papers, posters, referenced demos: 16 January 2006
*	Final versions for the proceedings:  20 February 2006
*	Conference: 24-26 May 2006
*	Pre-conference workshops and tutorials: 22 and 23 May 2006
*	Post-conference workshops and tutorials: 27 and 28 May 2006

Internet connections and various computer platforms and facilities will be
available at the conference site. In addition to referenced demos
concerning LRs and related tools, it will be possible to run unreferenced
demos of language engineering products, systems and tools. Those interested
should contact the organiser of the demonstrations (details will be posted
on www.lrec-conf.org).


Pre-conference workshops and tutorials will be organised on 22 and 23 May
2006, and post-conference workshops and tutorials on 27 and 28 May 2006. A
workshop/tutorial can be either half day or full day.
Proposals for workshops and tutorials should be no longer than three pages,
and include:
*	A brief technical description of the specific technical issues that the
	workshop/tutorial will
*	The reasons why the workshop/tutorial is of interest at this time.
*	The names, postal addresses, phone and fax numbers and email addresses of
	workshop/tutorial organising committee, which should consist of at least
	three people knowledgeable in the field, coming from different
	institutions.
*	The name of the member of the workshop/tutorial organising committee
	designated as the contact person.
*	A time schedule of the workshop/tutorial and a preliminary programme.
*	A summary of the intended workshop/tutorial call for participation.
*	A list of audio-visual or technical requirements and any special room

The workshop/tutorial proposers will be responsible for the organisational
aspects (e.g. workshop/tutorial call preparation and distribution, review of
papers, notification of acceptance, assembling of the workshop/tutorial
proceedings, etc.). Further details about submission will be circulated in
the 2nd Call for Papers and posted on the LREC web site
(www.lrec-conf.org).

Proceedings will be produced for each workshop/tutorial.


Consortia or projects wishing to take this opportunity for organising
meetings should contact the ELDA office, lrec <at> elda.org (further details
are given at the end of the document).


Nicoletta Calzolari, Istituto di Linguistica Computazionale del CNR, Pisa, Italy (Conference chair)
Khalid Choukri, ELRA, Paris, France
Aldo Gangemi, Istituto di Scienze e Tecnologie della Cognizione del CNR, Roma, Italy
Bente Maegaard, CST, University of Copenhagen, Denmark
Joseph Mariani, LIMSI-CNRS, Orsay, France
Jan Odijk, ScanSoft, Merelbeke, Belgium and UIL-OTS, Utrecht, The Netherlands
Daniel Tapias, Telefonica Moviles, Madrid, Spain

The composition of the committees as well as instructions and addresses for
registration and accommodation will be detailed on the LREC web site at
www.lrec-conf.org and will be announced in the 2nd Call for Papers.


For more information about ELRA (European Language Resources Association), 
please contact:

Khalid Choukri, ELRA CEO
55-57 Rue Brillat-Savarin,
75013 Paris - France
Tel: + 33 1 43 13 33 33
Fax: + 33 1 43 13 33 30
Email: choukri <at> elda.org
Web: http://www.elra.info or http://www.elda.org/

The first LREC was organised in Granada (Spain) in 1998: 197 papers and
posters were presented, with about 510 registered participants from 38
different countries from all continents. Among these, the largest group
came from Spain (81 participants), followed by France (75), USA (73),
Germany (47), UK (43) and Italy (41). Registered participants belonged to
over 325 different

LREC 2000, in Athens, had 129 oral papers and 152 posters presented, with
around 600 participants from 51 different countries from all continents.
Among these, the largest group came from Greece (117), followed by USA
(70), France (59), Germany (45), UK (43), Japan (35) and Italy (29).
Registered participants belonged to 319 different organisations.

LREC 2002, which took place in Las Palmas de Gran Canaria (Spain),
attracted over 700 representatives, coming from 38 countries around the
world. The following figures illustrate how successful it proved to be: for
the main conference, 460 papers had been submitted and reviewed, of which
365 were presented at the conference. Most of the areas in HLT were covered
(about 280 papers dealt with written resources, about 100 with spoken
resources, 25 with multimodal and multimedia resources, around 50 dealt
with evaluation of HLT, and 16 with terminology).

The 4th edition of the Language Resources and Evaluation Conference was
held in memory of two dear friends and colleagues we lost in 2003, Angel
Martin Municio and Antonio
LREC 2004, which took place in Lisbon (Portugal), attracted almost 1000
participants, coming from 50 countries from all the continents. Close to
800 submissions for poster and oral presentations were reviewed by the
Scientific Committee: 519 were actually presented, a majority dedicated to
written resources (260), 116 dealt with spoken resources, 40 with
terminological issues, 57 with evaluation, 17 were on general issues, and
29 on multimodal-multimedia ones. In addition, a total of 18 satellite
workshops covering various fields were organised before and after the main
A new award in HLT was launched on that occasion: the ELRA Board created a
prize for "Outstanding Contributions to the Advancement of Language
Resources and Language Technology Evaluation", to honour the memory of its
co-founder and 1st president, Antonio Zampolli. The Antonio Zampolli Prize
was awarded for the first time at LREC 2004 to Frederick Jelinek, from
Johns Hopkins University, Baltimore, USA.

A similar number of participants is expected at LREC 2006.

If you want to know the state-of-the-art in LT and LRs and their application
in all aspects of e-society, this is the Conference to go to!

Bryar Family | 6 Jul 14:23 2005

Looking for lexicon/corpus of ambiguous terms.

I'm trying to find out if anyone has built a lexicon or corpus consisting of
highly ambiguous terms, words with multiple, quite diverse meanings in

Trying to test a new disambiguation approach. 

Best regards,

Jack Bryar
MBN Partners

Gaël Dias | 6 Jul 19:50 2005


[French version follows]
[Apologies for multiple postings]

-------------------------CALL FOR PAPERS--------------------------------



              Submission Deadline: 05/09/2005

Gaël Dias, Simão Melo de Sousa and Maxime Crochemore


The global use of Natural Language Processing (NLP) applications depends
crucially upon the proposed systems’ algorithmic efficiency. Current
considerations of how NLP systems will be applied suggest new challenges
that require both theoretical innovations as well as systems that can be
put to real use. The advent of the Web and its huge resources requires
that the field of NLP be increasingly sensitive to the importance of
scalability. Such considerations are not sterile academic issues: rather
they will define the commercial success or failure of future NLP
Unfortunately, there are but few algorithmic solutions that are
sufficiently efficient in both space and time to be able to handle
problems that arise from the explosion of the gigabyte-sized data now
available on the Web. Until now, only the field of Information Retrieval
has been concerned with the definition of algorithms, data structures
and architectures that allow treatments with acceptable response times.
Today, as  NLP moves more and more towards Natural Language Engineering,
it is appropriate to determine the theoretical limits of the problems
which this new discipline raises, as well as the factors that relate to
system effectiveness, namely complexity and algorithms.
Thus, this Call for Papers aims to bring together communities that are
working on or interested in algorithms, theoretical computer science,
and scalable NLP applications. To this end, we solicit publications that
range from the presentation of theoretical work, to the implementation
of powerful algorithmic solutions, as related to software that is
capable of dealing with huge textual databases.


The following list is non-exhaustive and lists various topics that are
relevant to this call, and which relate to fundamental algorithmic
techniques that are capable of dealing with large textual databases.

- Advanced data structures (suffix trees, suffix arrays, etc).
- Advanced algorithms (search, sorting algorithms, dynamic programming,
- Sequence Algorithms (search, short patterns, repetitions, etc).
- Indexing (search, repetitions, hashing, etc).
- Alignment (linear space, sub-quadratic, etc).
- Automata (finite-state machines, suffix automata, transducers, etc).
- Compression (information theory, fast decompression, compression
transducers, etc).
- Graphs (large graph algorithms, Web graphs, etc).
- Dynamic programming.
- Tabulation.
- Distributed and Parallel systems.
- Grid Computing.
- Complexity (space/time, complexity of parallel algorithms, etc.).
- Theoretical Foundations.

We welcome submissions that apply these techniques to the classical
domains of NLP, i.e. morphology, syntax, semantics and pragmatics. We are
also interested in all submissions tackling the following applications:

- Linguistic Resources Processing (Corpora, Non-linearly Structured
- Lexicon/Thesaurus/Ontology-based NLP,
- Information Retrieval,
- Question-Answering Systems,
- Topic Tracking,
- Information Extraction,
- Text Mining,
- Integrated Systems,
- Collaborative Systems.

The TAL journal (Traitement Automatique des Langues:
http://tal.revuesonline.com/) is an international journal published
since 1960 by the French association ATALA (Association pour le
Traitement Automatique des Langues: http://www.atala.org) with the
collaboration of the CNRS (Centre National de la Recherche Scientifique
: http://www.cnrs.fr/). The journal is published and distributed by
Hermès Lavoisier.


Submitted papers must be no longer than 25 pages, and must be in PDF
format. Style sheets are available online at


Papers may be written in French or English. English submissions are
accepted only from non-French-speaking authors.


Submission Deadline: 05/09/2005


The papers must be sent electronically to the following address:
tal2005 <at> di.ubi.pt.


Ricardo Baeza-Yates  (University of Chile, Santiago, Chile)
Tilman Becker        (DFKI, Saarbrücken, Germany)
Jean Berstel         (University of Marne-la-Vallée, France)
Nieves Brisaboa      (University of La Coruña, Spain)
Maxime Crochemore    (University of Marne-la-Vallée, France)
Gaël Dias            (University of Beira Interior, Covilhã, Portugal)
Patrick Gallinari    (University of Paris 6, France)
Martin Jansche       (Columbia University, New York, USA)
Éric Laporte         (University of Marne-la-Vallée, France)
Thierry Lecroq       (University of Rouen, France)
Gabriel Lopes        (New University of Lisbon, Portugal)
Nuno Mamede          (INESC-ID, Lisbon, Portugal)
Mehryar Mohri        (New York University, USA)
Alexis Nasr          (University of Paris 7, France)
Arlindo Oliveira     (INESC-ID, Lisbon, Portugal)
Ted Pedersen         (University of Minnesota, Duluth, USA)
Dominique Revuz      (University of Marne-la-Vallée, France)
André Salem          (University of Paris 3, France)
Richard Sproat       (University of Illinois, Urbana, USA)
Simão Sousa          (University of Beira Interior, Covilhã, Portugal)
Mikio Yamamoto       (University of Tsukuba, Japan)

-----------------------CALL FOR PAPERS------------------------------


                SPECIAL ISSUE OF THE TAL JOURNAL 2005

        Submission deadline: 5 September 2005 (05/09/2005)

Gaël Dias, Simão Melo de Sousa and Maxime Crochemore.


The widespread use of Natural Language Processing (NLP) techniques in
users' everyday lives will only come about through the algorithmic
efficiency of the proposed systems. Indeed, the challenges specific to
this field lead us to innovate from a theoretical point of view as much
as to propose systems that can also be deployed in real-world settings.
With the advent of the gigantic resources of the Web, NLP must be able
to meet the challenges posed by scaling up. These considerations, far
from being sterile, will determine its commercial success or failure.
Unfortunately, there are few complete algorithmic solutions capable of
handling efficiently, in time and in space, the problems posed by the
explosion of available data, often on the order of gigabytes as on the
Web. Until now, few areas have been concerned with defining algorithms,
data structures and architectures that allow processing with
acceptable response times.
Today, as NLP is increasingly turning into Language Engineering, it is
timely to delineate as precisely as possible the theoretical limits of
the problems raised by this new discipline, just as it is important to
attend to the various factors that weigh on the efficiency of the
proposed systems, that is, their complexity and their
Thus, this call for papers aims to bring together the communities
working on or interested in algorithmics, theoretical computer science
and the scaling up of NLP applications. Accordingly, we will consider
contributions ranging from the presentation of theoretical work to the
implementation of efficient algorithmic solutions in software
applicable to large volumes of textual data.


The following non-exhaustive list enumerates various topics relevant to
this call, relating to the foundations and algorithmic techniques for
processing large quantities of textual data:

- Advanced data structures (suffix trees, suffix arrays, etc.),
- Advanced algorithms (search methods, sorting algorithms, dynamic
programming, etc.),
- Sequence algorithms (approximate matching, short patterns,
repetitions, etc.),
- Indexing (search, repetitions, hashing, etc.),
- Alignment (in linear space, sub-quadratic, etc.),
- Automata (finite automata, suffix automata, transducers, etc.),
- Compression (information theory, fast decompression, compression
transducers, etc.),
- Graphs (algorithms for large graphs, web graphs, etc.),
- Dynamic programming,
- Tabulation,
- Parallelism and distributed systems,
- Grid computing,
- Complexity (in space and time, complexity of parallel algorithms,
etc.),
- Theoretical foundations.

We welcome submissions applying these techniques in the classical
domains of NLP, namely morphological, morpho-syntactic, syntactic,
semantic and pragmatic analysis. We are also interested in the
following applications:

- Linguistic Resources Processing (Corpora, Non-linearly Structured
Corpora),
- Processing of Lexicons, Thesauri and Ontologies,
- Information Retrieval,
- Question-Answering Systems,
- Topic Tracking,
- Information Extraction,
- Text Mining,
- Integrated Systems or NLP Pipelines,
- Collaborative Systems.


The TAL journal (Traitement Automatique des Langues:
http://tal.revuesonline.com/) is an international journal published
since 1960 by ATALA (Association pour le Traitement Automatique des
Langues, http://www.atala.org) with the support of the CNRS. It is
published and distributed by Hermès Lavoisier.


Papers (25 pages maximum) must be submitted in PDF format. Style
sheets are available online at:


Papers may be written in French or English. Submissions in English are
accepted only from non-French-speaking authors.


Submission deadline: 5 September 2005 (05/09/2005)


Papers must be sent electronically to the following address:
tal2005 <at> di.ubi.pt.


Ricardo Baeza-Yates (University of Chile, Santiago, Chile)
Tilman Becker       (DFKI, Saarbrücken, Germany)
Jean Berstel        (University of Marne-la-Vallée, France)
Nieves Brisaboa     (University of La Coruña, Spain)
Maxime Crochemore   (University of Marne-la-Vallée, France)
Gaël Dias           (University of Beira Interior, Covilhã, Portugal)
Patrick Gallinari   (University of Paris 6, France)
Martin Jansche      (Columbia University, New York, USA)
Éric Laporte        (University of Marne-la-Vallée, France)
Thierry Lecroq      (University of Rouen, France)
Gabriel Lopes       (New University of Lisbon, Portugal)
Nuno Mamede         (INESC-ID, Lisbon, Portugal)
Mehryar Mohri       (New York University, USA)
Alexis Nasr         (University of Paris 7, France)
Arlindo Oliveira    (INESC-ID, Lisbon, Portugal)
Ted Pedersen        (University of Minnesota, Duluth, USA)
Dominique Revuz     (University of Marne-la-Vallée, France)
André Salem         (University of Paris 3, France)
Richard Sproat      (University of Illinois, Urbana, USA)
Simão Sousa         (University of Beira Interior, Covilhã, Portugal)
Mikio Yamamoto      (University of Tsukuba, Japan)