Yukio Tono | 1 Jan 16:44 2004

CFP: An International Symposium on Learner Corpora in Asia

(Apologies if you receive this message more than once) 

An International Symposium on Learner Corpora in Asia

Date: 13-March-2004 - 14-March-2004 
Location: Tokyo, JAPAN 
Contact: Yukio Tono 
Contact Email: y.tono <at> meikai.ac.jp 
Meeting URL: http://www.eng.ritsumei.ac.jp/icle-j/symposium/

Linguistic Sub-field: Corpus Linguistics, Second
Language Acquisition 
Call Deadline: 1-Feb-2004 

Meeting Description: 

This international symposium on learner corpora aims 
to gather those involved in designing, compiling and using 
corpora of second language learners for the purpose of 
studying interlanguage features, second language acquisition, 
and language teaching and learning, especially in Asian 
contexts. The target language can be any language, as long as 
the corpus consists of L2 (second language) learner data. We hope to 
give participants an opportunity to share the technical 
and professional expertise they have in learner corpus 
research, in order to promote this exciting research 
methodology among SLA researchers in English and other
languages. 

This symposium is partially supported by the Ministry

John Colby | 4 Jan 09:43 2004

RE: automatic error correction

I want to thank all of those who corresponded with me regarding automatic
error correction for ASR, OCR and textual processing.   You not only gave
me valuable references but focused my attention on certain models used in
automatic error correction.  Your comments were very helpful to me in
finding a sound theoretical basis for my own work in automatic error
correction.

Below is a bibliography compiled with the help of those who corresponded
with me and from my own recent literature search.

John Colby
UC Santa Cruz

Juan Amengual and Enrique Vidal, Efficient Error Correcting Viterbi Parsing,
IEEE Trans. on PAMI, Vol. 20, No. 10.

Juan Amengual, Enrique Vidal and Jose-Miguel Benedi, Simplifying Language
Through Error-Correcting Decoding, in Proceedings of the ICSLP96, PA
(USA), October 1996.

Eric Brill and Robert C. Moore, An Improved Model for Noisy Channel
Spelling Correction, ACL 2000.

CHANIER, THIERRY, M. PENGELLY, M. TWIDALE, and J. SELF.
Conceptual modelling in error analysis in computer-assisted language
learning systems. [SWARTZ and YAZDANI 1992], pp. 125-150.

Eugene Charniak and Mark Johnson, Edit Detection and Parsing for
Transcribed Speech, in Proceedings of the 2nd Meeting of the North
American Chapter of the Association for Computational Linguistics, 2001.

tpederse | 4 Jan 22:25 2004

Discover Word Meanings with SenseClusters!


We are pleased to announce the release of SenseClusters, a free software   
package that does unsupervised discovery of word senses by clustering   
together instances of a word (or words) that are used in similar contexts   
in raw text. It supports a wide range of clustering techniques based on  
both context vectors and similarity matrices. 

SenseClusters is flexible, and can be used in any application that  
requires clustering of similar instances of text. Examples could include   
word sense discrimination, synonymy identification, text classification,  
and summarization. It can also be used to implement models such as Latent  
Semantic Analysis (LSA). 

SenseClusters takes a user through the entire process of unsupervised  
learning of word senses, including text preprocessing, feature selection, 
context vector and similarity matrix construction, dimensionality 
reduction via singular value decomposition (SVD), and clustering via both 
agglomerative and partitional algorithms. 
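The steps named above (context vectors, a similarity measure, agglomerative clustering) can be illustrated with a minimal Python sketch. It is not SenseClusters code and does not use its interfaces: the function names, the stop list, and the toy sentences are all hypothetical, the clustering is a plain average-link procedure, and the SVD step is left out to keep the example dependency-free.

```python
import math
from collections import Counter

# Hypothetical stop list, for illustration only.
STOP = {"the", "of", "and", "on", "at", "was", "he", "she"}


def context_vector(tokens, target):
    """Count the content words that co-occur with `target` in one instance."""
    return Counter(t for t in tokens if t != target and t not in STOP)


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def agglomerate(vectors, k):
    """Average-link agglomerative clustering; returns k lists of instance indices."""
    if k < 1:
        raise ValueError("k must be at least 1")
    clusters = [[i] for i in range(len(vectors))]

    def link(c1, c2):
        # Average pairwise similarity between the members of two clusters.
        total = sum(cosine(vectors[i], vectors[j]) for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > k:
        pairs = [
            (a, b)
            for a in range(len(clusters))
            for b in range(a + 1, len(clusters))
        ]
        # Merge the two most similar clusters.
        a, b = max(pairs, key=lambda p: link(clusters[p[0]], clusters[p[1]]))
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters


# Toy instances of the ambiguous word "bank" (invented for this sketch).
instances = [
    "he sat on the bank of the river and fished".split(),
    "the river bank was muddy after the flood".split(),
    "she deposited money at the bank downtown".split(),
    "the bank approved the loan and the money arrived".split(),
]
vectors = [context_vector(tokens, "bank") for tokens in instances]
clusters = sorted(sorted(c) for c in agglomerate(vectors, 2))
print(clusters)  # [[0, 1], [2, 3]]
```

The two river instances share the context word "river" and the two financial instances share "money", so the instances fall into one cluster per sense. With realistic data the vectors are far sparser, which is where a dimensionality reduction step such as SVD becomes useful.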

SenseClusters provides a great deal of native functionality, and also   
provides seamless interfaces to take advantage of a number of powerful  
tools, including Cluto (a Clustering toolkit), SVDPACKC (which carries  
out singular value decomposition), and the Ngram Statistics Package.

For general information please visit:
http://senseclusters.sourceforge.net 

For immediate download of the first public release (0.47) please visit: 
http://sourceforge.net/projects/senseclusters/ 


Patricia Rodríguez Inés | 5 Jan 13:13 2004

CULT (Corpus Use and Learning to Translate) programme

CORPUS USE AND LEARNING TO TRANSLATE (Barcelona 22nd-24th January 2004)

Attached you will find the programme for the forthcoming CULT conference.

As the conference date is approaching, we would like to invite you to register 
on the conference web page (http://www.fti.uab.es/cg.cult.bcn).

Looking forward to seeing you in Barcelona.

The CULT organisation committee
Attachment (CULT BCN 04_programme.PDF): application/octet-stream, 11 KiB

Gaël Dias | 6 Jan 11:26 2004

Call for Papers: LREC 2004 Workshop: MEMURA

********************* CALL FOR PAPERS *********************

                      MEMURA-2004 
Workshop on Methodologies and Evaluation of Multiword Units 
              in Real-world Applications 
                  (MEMURA Workshop)

          INVITED SPEAKER: KENNETH W. CHURCH

   In association with the 4th International Conference On
        Language Resources and Evaluation - LREC 2004

           Centro Cultural de Belém, Lisbon, Portugal
                         May 25, 2004

                 http://memura2004.di.ubi.pt

********************* CALL FOR PAPERS *********************

This announcement contains:
  [1] Workshop Description
  [2] Target Audience
  [3] Areas of Interest
  [4] Invited Speaker
  [5] Important dates
  [6] Abstract Submission
  [7] Workshop Chairs
  [8] Program Committee
  [9] Contact


Manfred Krifka | 6 Jan 10:00 2004

CfP: ESSLLI workshop on questions

                  *** Our apologies for multiple postings ***

                                Call for papers

            Workshop: Syntax, Semantics and Pragmatics of Questions
                               August 9-13, 2004

Organized as part of the European Summer School on Logic, Language, and
Information (ESSLLI), August 9-20, 2004 in Nancy, France. 

Website of the summer school: http://esslli2004.loria.fr
Website of the workshop: http://amor.rz.hu-berlin.de/~h2816i3x/ESSLLI_Questions.html 

Workshop organizers:
- Ileana Comorovski, Université Nancy 2
- Manfred Krifka, Humboldt-Universität & Zentrum für Allgemeine
  Sprachwissenschaft (ZAS), Berlin

Workshop Purpose:
The investigation of questions has deepened our understanding of syntax (e.g.
the constraints on syntactic dependencies), of semantics (e.g., the
representation of non-declarative information) and of pragmatics (e.g., the
nature of speech acts). However, researchers often took little notice of
research on questions (and answers) in adjacent fields. For example,
syntacticians interested in multiple constituent questions were unaware of the
interpretation of different types of multiple questions, and semanticists
disregarded important pragmatic factors like speaker bias.  This workshop tries
to bring together researchers that transcend such boundaries. The goal is to
gain not only a more profound understanding of questions, but of the
interaction of syntax, semantics and pragmatics in general.

Yuri Tambovtsev | 6 Jan 16:01 2004

Udeghe is added to our Tungus-Manchurian corpora

Dear corpora colleagues,

We have added some Udeghe texts to our world language corpora. Udeghe is said to belong to the Amur group of the Tungus-Manchurian language family. Interestingly, it has a glottal stop, like many American Indian languages of North and South America. Do you know whether any linguist has compared Udeghe with Amerindian languages? I am going to do so, but I am not sure whether it has already been done. I would appreciate it if you sent your replies directly to my e-mail address, yutamb <at> hotmail.com.

Looking forward to hearing from you soon,
Yours sincerely,
Yuri Tambovtsev

Jean-Claude MARTIN | 6 Jan 14:29 2004

LREC 2004 Workshop on Multimodal Corpora : 2nd and final CFP

____________________________________________________________________
             This message is posted to several lists.
           We apologize if you receive multiple copies.
      Please forward it to everyone who might be interested.
_____________________________________________________________________

  **********************************************
 SECOND AND FINAL CALL FOR PAPERS

            Workshop on

   MULTIMODAL CORPORA:

      MODELS OF HUMAN BEHAVIOUR
     FOR THE SPECIFICATION AND EVALUATION
     OF MULTIMODAL INPUT AND OUTPUT INTERFACES

    http://lubitsch.lili.uni-bielefeld.de/MMCORPORA/

Centro Cultural de Belem, LISBON, Portugal, 25th May 2004

 **********************************************

In Association with
4th INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION
LREC2004 http://www.lrec-conf.org/lrec2004/index.php  Main conference
26-27-28 May 2004

MOTIVATIONS
-------------------------
The primary purpose of this one day workshop is to share information
and engage in the collective planning for the future creation of usable
pluridisciplinary multimodal resources.
It will focus on the following issues regarding multimodal corpora:
how researchers build models of human behaviour out of the annotations
of video corpora,
how they use such knowledge for the specification of multimodal input
(e.g. merging users' gestures and speech )
and output (e.g. specification of believable and emotional behaviour in
Embodied Conversational Agents) in human computer interfaces,
and finally how they evaluate multimodal systems (e.g. full system
evaluation and glass box evaluation of individual
system components).

Topics to be addressed in the workshop include, but are not limited to:
* Models of human multimodal behaviour in various disciplines
* Integrating different sources of knowledge (literature in
socio-linguistics, corpora annotation)
* Specifications of coding schemes for annotation of multimodal video
corpora
* Parallel multimodal corpora for different languages
* Methods, tools, and best practice procedures for the acquisition,
creation, management, access, distribution, and use of multimedia and
multimodal corpora
* Methods for the extraction and acquisition of knowledge (e.g. lexical
information, modality modelling) from multimedia and multimodal corpora
* Ontological aspects of the creation and use of multimodal corpora
* Machine learning for and from multimedia (i.e., text, audio, video),
multimodal (visual, auditory, tactile), and multicodal (language,
graphics, gesture) communication
* Exploitation of multimodal corpora in different types of applications
(information extraction, information retrieval, meeting transcription,
multisensorial interfaces,
  translation, summarisation, www services, etc.)
* Multimedia and multimodal metadata descriptions of corpora
* Applications enabled by multimedia and multimodal corpora
* Benchmarking of systems and products; use of multimodal corpora for
the evaluation of real systems
* Processing and evaluation of mixed spoken, typed, and cursive (e.g.,
pen) language processing
* Automated multimodal fusion and/or generation (e.g., coordinated
speech, gaze, gesture, facial expressions)
* Techniques for combining objective and subjective evaluations,  and
for making evaluations cost-effective, predictive and fast

The output of the workshop will be the following:
* Better knowledge of the potential of major models of human multimodal
behaviour
* Challenging issues in the usability of multimodal corpora
* Fostering of a pluridisciplinary community of multimodal researchers
and multimodal interface developers

RATIONALE
-------------------------
Multimodal resources feature the recording and annotation of several
communication modalities
such as speech, hand gesture, facial expression, body posture, graphics.

Several researchers have been developing such multimodal resources for
several years,
often with a focus on a limited set of modalities or on a given
application domain.
A number of projects, initiatives and organisations have addressed
multimodal resources with a federative approach:
* At LREC2002, a workshop addressed the issue of "Multimodal
Resources and Multimodal Systems Evaluation"
http://www.limsi.fr/Individu/martin/wslrec2002/MMWorkshopReport.doc
* At LREC2000, a first workshop addressed the issue of multimodal
corpora, focussing on meta-descriptions and large corpora
http://www.mpi.nl/world/ISLE/events/LREC%202000/LREC2000.htm
* The European 6th Framework program (FP6), started in 2003, includes
multilingual and multisensorial communication as one of the major R&D
issues, and the evaluation of technologies appears as a specific item
in the Integrated Project instrument presentation
http://www.cordis.lu/ist/so/interfaces/home.html
* NIMM was a work group on Natural Interaction and MultiModality which
ran under the IST-ISLE project (http://isle.nis.sdu.dk/). In 2001, NIMM
compiled a survey of existing multimodal resources (more than 60 corpora
are described in the survey), coding schemes and annotation tools.
The ISLE project was developed both in Europe and in the USA
(http://www.ldc.upenn.edu/sb/isle.html)
* ELRA (European Language Resources Association) launched in
November 2001 a survey about multimodal corpora, including marketing
aspects (http://www.icp.inpg.fr/ELRA/).
* A Working Group at the Dagstuhl Seminar on Multimodality collected, in
November 2001, 28 questionnaires from researchers on multimodality, 21 of
whom announced their intention to record further multimodal corpora in
the future
(http://www.dfki.de/~wahlster/Dagstuhl_Multi_Modality/).
* Other surveys have been recently made about multimodal annotation
coding schemes and tools (COCOSDA, LDC, MITRE).

Yet annotations of multimodal corpora have so far been made
mostly on an individual basis,
each researcher or team focusing on its own needs and knowledge about
modality-specific coding schemes
or application examples.
Thus, there is a lack of real common knowledge and understanding of how
to proceed from annotations
to usable models of human multimodal behaviour and how to use such
knowledge
for the design and evaluation of multimodal input and embodied
conversational agent interfaces.

Furthermore, the evaluation of multimodal interaction poses different
(and very complex) problems than the evaluation of monomodal speech
interfaces or
WYSIWYG direct interaction interfaces.
There are a number of recently finished and ongoing projects in the
field of multimodal interaction
in which attempts have been made to evaluate the quality of the
interfaces in all meanings
that can be attached to the term 'quality'.
There is a widely felt need in the field for exchanging information on
multimodal
interaction evaluation with researchers in other projects.
One of the major outcomes of this workshop should be better
understanding of
the extent to which evaluation procedures developed in one project
generalise to other, somewhat related projects.

IMPORTANT DATES
-------------------------
* 24 January 2004: Deadline for paper submission
* 29 February 2004: Acceptance notifications and preliminary program
* 21 March 2004: Deadline for final version of accepted papers
* 25 May 2004: Workshop

SUBMISSIONS
---------------
The workshop will consist primarily of paper presentations and
discussion/working sessions.
Submissions should be 4 pages long, must be in English, and follow the
submission guidelines at http://lubitsch.lili.uni-bielefeld.de/MMCORPORA

Demonstrations of multimodal corpora and related tools are encouraged as
well (a demonstration outline of 2 pages can be submitted).
As soon as possible, authors are encouraged to send to
lrec <at> limsi.u-psud.fr
a brief email indicating their intention to participate, including their
contact information and
the topic they intend to address in their submissions.
Proceedings of the workshop will be printed by the LREC Local Organising
Committee.
The organisers might consider a special issue of a suitable Journal for
selected publications from the workshop.

TIME SCHEDULE AND REGISTRATION FEE
--------------------------------------------------
The workshop will consist of a morning session and an afternoon session,
with a focus on the use of multimodal corpora for building models of
human behaviour and
specifying/evaluating multimodal input and output Human-Computer
Interfaces.
There will also be time slots for collective discussion and one coffee
break in the morning and in the afternoon.
For this full-day Workshop, the registration fee is 100 EURO for LREC
Conference participants
and 170 EURO for other participants. These fees will include coffee
breaks and the Proceedings of the Workshop.

ORGANISING COMMITTEE
--------------------------------------------------
Jean-Claude MARTIN, LIMSI-CNRS, martin <at> limsi.u-psud.fr
Elisabeth Den OS, MPI, Els.denOs <at> mpi.nl
Peter KÜHNLEIN, Univ. Bielefeld, p <at> uni-bielefeld.de
Lou BOVES, L.Boves <at> let.kun.nl
Patrizia PAGGIO, CST, patrizia <at> cst.dk
Roberta CATIZONE, Sheffield, roberta <at> dcs.shef.ac.uk

PRELIMINARY PROGRAM COMMITTEE
--------------------------------------------------
Elisabeth AHLSÉN
Jens ALLWOOD
Elisabeth ANDRE
Niels Ole BERNSEN
Lou BOVES
Stéphanie BUISINE
Roberta CATIZONE
Loredana CERRATO
Piero COSI
Elisabeth Den OS
Jan Peter DE RUITER
Laila DYBKJÆR
David HOROWITZ
Bart JONGEJAN
Alfred KRANSTEDT
Steven KRAUWER
Peter KÜHNLEIN
Knut KVALE
Myriam LAMOLLE
Jean-Claude MARTIN
Joseph MARIANI
Jan-Torsten MILDE
Sharon OVIATT
Patrizia PAGGIO
Catherine PELACHAUD
Janienke STURM

**********************************************

P. Kaszubski | 6 Jan 23:40 2004

CfP Assessing potential of corpora (PLM2004)

Poznań Linguistic Meeting 2004
(http://elex.amu.edu.pl/ifa/plm/)

WORKSHOP on "Assessing the potential of corpora"
May 20, from 9 am.

Call for Papers
===========

The goal of the workshop is to convene a forum of users of language 
corpora interested in the exchange of ideas related to the 
feasibility of corpus use in linguistic study and language teaching. 
We welcome papers and/or demonstrations describing corpus-inspired 
research and pedagogical applications, and, ideally, attempting to 
evaluate the resources, tools and procedures examined for the 
purpose. Some of the likely areas include:
  * insights from large and small corpora for descriptive and 
pedagogical linguistics 
  * data-driven-learning and other uses of corpora in/for the 
classroom 
  * corpus-based/-driven contrastive analysis 
  * corpus-based/-driven genre analysis 
  * learner corpora – Error Analysis and more 
  * corpora and translation 
  * the "web as corpus" for language research and pedagogy 
  * corpus-based/-driven tools and methods for language task learning 
  * ... and many many more :)

Topics proposed so far:
  * Automatic phonetic annotation of corpora for EFL purposes – Prof. 
Włodzimierz Sobkowiak 
  * Studying metaphor with the BNC – Dr. Małgorzata Fabiszak 
  * Corpora for the teaching of translation – Maciej Machniewski (PhD 
report) 
  * Corpus-based teaching of English syntax – Dr. Paweł Scheffler 
  * Web concordancing and EFL writing – Dr. Przemysław Kaszubski

Presentations will last 30 minutes and be followed by a 10-minute 
discussion. The conference language will be English; however, we also 
welcome papers based on corpora of Polish and other languages.

Abstracts of 250-300 words should be e-mailed by March 15 to:
Przemek Kaszubski <przemek <at> ifa.amu.edu.pl>.

To register and receive more information on PLM2004, go to: 
http://elex.amu.edu.pl/ifa/plm.

----------

=======================================
Dr Przemyslaw Kaszubski
t: +48 61 8293515
e: przemka <at> amu.edu.pl
w: http://elex.amu.edu.pl/ifa/staff/kaszubski.html

SEARCH PICLE LEARNER CORPUS ONLINE:
http://main.amu.edu.pl/~przemka/picle.html

COMPREHENSIVE CORPORA BIBLIOGRAPHY:
http://main.amu.edu.pl/~przemka/welcome.html#Corpbibl

IFA WRITING COURSES PAGE:
http://main.amu.edu.pl/~przemka/IFA_writing/ifawrit.htm

School of English (IFA)
Adam Mickiewicz University
Al. Niepodleglosci 4
61-874 Poznan
t: +48 61 8293506
f: +48 61 8523103
w: http://elex.amu.edu.pl/ifa
=======================================

Stefan Th. Gries | 7 Jan 05:38 2004

Russian corpora

Dear colleagues

Does anybody know whether there are freely available corpora of Russian and
where these can be accessed? I only know of the newspaper CDs mentioned on
this list 1.5 years ago, but is there something else by now? I'll post a
summary.

Stefan Th. Gries
-----------------------------------------------------------
IFKI, Southern Denmark University
http://people.freenet.de/Stefan_Th_Gries
-----------------------------------------------------------

