Dirk Siepmann | 1 Apr 2005 13:09

New book - Discourse markers across languages

Dear List Members,

I'd like to draw your attention to my new book 'Discourse markers across languages' (Routledge Advances in
Corpus Linguistics 6, eds. Tony McEnery and Michael Hoey; ISBN 0-415-34949-4) which is now available
through Routledge (www.routledge.com) or amazon.co.uk (amongst others). 

The book deals with ready-made phrases, or 'second-level (multi-word) discourse markers' such as 'it is
argued that' or 'the same goes for'. Specifically, it answers questions such as 'how can such phrases be
defined or translated?' or 'how can they be recorded in dictionaries?'
The book falls into two parts. Part I presents a functional taxonomy of second-level markers in English,
French and German as well as an analysis of their use in continuous text. Part II offers a contrastive
interlanguage analysis of the performance of non-native writers and translators. Lexicographic
implications are then considered, which lead to suggestions for the treatment of second-level markers
in general and learners' dictionaries. 
The book is essential reading for professional linguists or lexicographers with an interest in
collocation or phraseology, as well as for academics, translators and language teachers seeking to
produce well-crafted text in a foreign language. 

More information and an executive summary can be 
found on the web at http://www.dirk-siepmann.de/multiwordmarkers or www.dirk-siepmann.de/multiwordmarkers.htm

Best wishes,

Dirk Siepmann

-- 
Dr. Dirk Siepmann
Universität-GH Siegen Fachbereich 3
Adolf-Reichwein-Straße
57068 Siegen

Burstein, Jill | 1 Apr 2005 16:12

Final CFP: Extended Deadline/Change in Submission Method

The Second Workshop on Building Educational Applications Using Natural
Language Processing 

 
<http://www.ets.org/research/conferences/nlp.html>
Two major research areas in educational applications, automated evaluation
of students' free-responses and intelligent tutoring systems (ITS), have
developed fairly autonomously within the NLP community. We made progress
toward bridging this gap in the First Workshop on Building Educational
Applications Using NLP in 2003, where researchers in a wide variety of
educational applications met at NAACL 2003 in Edmonton to share their
research - both in the speech- and text-based communities. Papers dealt with
automated evaluation of essay-length texts and classification of brief
responses that students enter into a tutoring system. Other research that
was reported included exploring the value of using grammar checking within a
tutoring system, comparing speech- and text-based tutoring systems, and
automatically generating multiple-choice questions. 
There continues to be a significant and fast-growing body of research toward
developing educational applications that incorporate NLP.  This has become
apparent as, since the First Workshop in 2003, subsequent workshops have
been held by scientists working in this field (InSTIL/ICALL 2004 Symposium
on Computer Assisted Learning and the eLearning International Workshop,
COLING 2004 <http://www.issco.unige.ch/coling2004/>). We hope that this
workshop will continue to facilitate communication between researchers who
work on all types of instructional applications, for K-12, undergraduate,
and graduate school. Our goal is to continue to expose the NLP research
community to these technologies with the hope that they may see novel
opportunities for use of their tools in educational applications. 

For this workshop, we will invite submissions including, but not limited to:

Alessandro Lenci | 1 Apr 2005 17:15

Call for Papers - OntoLex 2005, IJCNLP-05 Workshop

                                 Call for Papers
                 OntoLex 2005 - Ontologies and Lexical Resources

                         http://www.ilc.cnr.it/ontolex2005

                                 IJCNLP-05 Workshop

                         October 15, 2005 Jeju Island, South Korea

Background and Goals

The new framework of the Information Society has fostered the growth of the 
HLT area and the project of turning the World Wide Web into a 
machine-understandable resource for accessing digital information (the 
so-called Semantic Web), posing new challenges for integrated technologies. 
Lexicographers, lexical semanticists and ontologists are joining forces to 
build innovative systems for integrating ontological knowledge with lexical 
and semantic resources. Important examples of this interaction are the 
recent work on the conceptual analysis of WordNet, and the wide use of 
upper ontologies in innovative international projects like EuroWordNet, 
SIMPLE, Balkanet, DWDSnet, etc.
OntoLex 2005 will be the fourth workshop on Ontologies and Lexical 
Knowledge Bases, following OntoLex 2000, 2002, and 2004. In this workshop 
we want to discuss the relation between ontological knowledge and language. 
A special focus will be on the role of ontologies in multilingual language 
processing. This relation can be investigated from a number of different 
angles, for example:
- what differences and similarities there are between ontologies and more 
traditional lexical resources such as dictionaries and wordnets;
- how ontologies can be extracted from language corpora;

Carlos Rodriguez | 1 Apr 2005 18:12

Answers to domain corpora request

Thanks to everyone who answered my request for open-source domain corpora. 
Leonel Ruiz and Stella Tagnin pointed me to corpora in Spanish and 
Brazilian Portuguese. For English, Ylva Berglund mentioned OPUS (an open 
source parallel corpus). From the text mining front, big textual 
collections of Bio-Medical full-text articles are now available, as 
pointed out by Paul Buitelaar (http://muchmore.dfki.de/resources1.htm) 
and Kevin Cohen (http://www.biomedcentral.com/info/about/datamining/ 
[8,000 plus articles in xml]), among other data collections. Also, the 
Linux Documentation Project provides a fairly large, typologically 
homogeneous collection.
Unfortunately, big textual collections from other disciplines are more 
difficult to obtain in downloadable form. I am now compiling a 
300-article collection from Sociology journals, in case anyone is also 
interested in cross-genre comparisons and lexical acquisition.

Carlos Rodríguez
National Autonomous University, Mexico

Unnatural Language | 2 Apr 2005 00:47

Unnatural Language Processing Workshop

          First and Last Call for Papers (April 1, 2005)

Frankly, NLP is just too hard, and unsupervised learning is getting
itself into all kinds of trouble now that it's in its teens.  Here in
the heart of the Silicon Swamp, we're alarmed to find ourselves
uttering random n-grams just for emphasis.  It's time to treat the world
to 99.9% accuracy.  It's time to redefine the task.  It's time for the

           1st Workshop on Unnatural Language Processing

                   Johns Hopkins University CLSP

TALK ABSTRACTS of up to 1 page due by APRIL 30, 2005 to ulp <at> nlp.cs.jhu.edu.  
We will attempt to collect these in an online proceedings.  As this is
an electronic workshop, there is no time limit on the talks themselves, 
although there is also no guarantee that anyone will be within earshot.

Self-invited talks (highest bidder)
-----------------------------------
Question Evasion: Lessons from the Loebner Prize Competition
Understanding Abney's Exposition of Blum & Mitchell's Reinterpretation 
    of the Yarowsky Algorithm

Shared task
-----------

   Zero-Sum Corpora: Destructive Mining of the Web

       Twenty teams.  One Web.  Three days.
      Are you computational linguist enough? 

Timothy Baldwin | 3 Apr 2005 15:20

2nd CFP: ACL 2005 Workshop on Deep Lexical Acquisition


Apologies for cross-postings

******************************************************************

			     2ND CALL FOR PAPERS

		ACL 2005 WORKSHOP ON DEEP LEXICAL ACQUISITION

     Sponsored by the ACL Special Interest Group on the Lexicon (SIGLEX)

				30 June, 2005

				Ann Arbor, USA

		 http://www.cs.mu.oz.au/~tim/events/acl2005/

		     Submission deadline: 11 April, 2005

		   *** NOTE REVISED SUBMISSION DETAILS ***

WORKSHOP DESCRIPTION

In natural language processing (NLP), there is a pressing need to develop deep
lexical resources (e.g. lexicons for linguistically-precise grammars, template
sets for information extraction systems, ontologies for word sense
disambiguation). Such resources are critical for enhancing the performance of
systems and for improving their portability between domains. For example, to
perform reliably, an information extraction system needs access to
high-quality lexicons or templates specific to the task at hand.

NICOLAS EMMANUEL | 3 Apr 2005 14:03

French conference announcement: JETOU: Rôle et place des corpus en linguistique (The role and place of corpora in linguistics): new submission date

JETOU 2005 > NEW SUBMISSION DATE: 9 April 2005

The doctoral students and "young researchers" of the three language sciences laboratories in Toulouse:

C.P.S.T. (Centre Pluridisciplinaire de Sémiolinguistique Textuelle)
E.R.S.S. (Equipe de Recherche en Syntaxe et Sémantique)
Laboratoire Jacques Lordat (Centre Interdisciplinaire des Sciences du Langage et de la Cognition)
together with those of the computer science laboratory:
I.R.I.T. (Institut de Recherches en Informatique de Toulouse)

are organizing the second JETOU meeting, which is aimed at doctoral students and "young researchers" in
language sciences.
During the meeting, we will examine the "role and place of corpora in linguistics".

Corpora in linguistics raise many questions, as shown by the numerous colloquia, books and conferences
devoted to the subject. By way of example, for the past year:

- CNRS summer school, « Linguistique de Corpus », Université de Caen, 14-19 June 2004;
- COLDOC'2004 « La construction des observables en sciences du langage », Colloque Jeunes Chercheurs
Modyco, 29-30 April 2004;
- Issue 2004-1 of RFLA, the journal of AFLA, « Linguistique et informatique : nouveaux défis »;
- Issue 3 of Corpus, « L'usage des corpus en phonologie », December 2004; etc.

These events show that the use of corpora is of interest to various fields of language sciences. Beyond
the traditional questions about corpus construction, data formats, exploitation tools, etc., we will
focus on the role and place of corpora in linguistic description and theory. In other words, the aim is
to make explicit how corpora fit into linguistic research.


Aris Xanthos | 3 Apr 2005 21:53

Final CFP - extended deadline: Psychocomputational Models of Language Acquisition

[Apologies for multiple postings]

            *** Final Call for Papers ***

        ***  Extended Deadline: 11 April  ***

Psychocomputational Models of Human Language Acquisition

Workshop at ACL 2005

29-30 June 2005 at University of Michigan Ann Arbor

http://www.colag.cs.hunter.cuny.edu/psychocomp

Workshop Topic
--------------

The workshop, which is a follow-up to the successful workshop held
at COLING in 2004, will be devoted to psychologically motivated
computational models of language acquisition -- models that are
compatible with, or motivated by, research in psycholinguistics and
developmental psychology, with particular emphasis on the acquisition
of syntax, though work on the acquisition of morphology, phonology
and other levels of linguistic description is also welcome.

The workshop will be taking place at the same time as CoNLL-2005
(http://cnts.uia.ac.be/conll/cfp.html) and we expect there to be sufficient
interest for a plenary session of papers that are relevant
to both audiences. There will also be a plenary session for
Mark Steedman's invited talk.

Rosina Weber | 2 Apr 2005 22:30

Textual Case-Based Reasoning Workshop <at> ICCBR05

We apologize for multiple postings.

CALL FOR PAPERS: TCBR <at> ICCBR05

Textual Case-Based Reasoning Workshop
at the 6th International Conference on Case-Based Reasoning

http://www.pages.drexel.edu/~rw37/tcbr05.html

24 August 2005 as part of the ICCBR 2005 Workshop Program, Chicago, IL

DESCRIPTION

Textual CBR is an increasingly important CBR sub-discipline. Textual CBR 
techniques can facilitate rapid construction of CBR systems by reducing 
or eliminating the task of feature-design in domains in which raw cases 
consist of free or semi-structured text. Moreover, many tasks, such as 
question answering, are inherently language-based. For such tasks, 
retaining a textual case representation may be more effective than 
engineering a feature representation that is intermediate between text 
queries and text solutions. In addition, textual CBR can provide 
information extraction or other language analysis tools to assist in 
engineering feature-based cases. Accelerating growth in the number and 
size of incident report, lessons-learned, and other unstructured or 
semi-structured text collections ensures that textual CBR will continue 
to increase in importance.

The goal of this workshop will be to provide a forum for discussion of 
trends, developments, research issues, contributions from other 
communities, and practical experience in textual case-based reasoning. A 

Trilok Khairnar | 4 Apr 2005 10:45

Re: Corpus from Blogs required.

Hello Jean-Phi, Gilad 

Thanks for the inputs.

Permalinks and Technorati APIs will definitely be useful.

The Technorati APIs provide the inbound and outbound links of a blog,
basic user and blog info, etc., but not the list of posts on a blog or
their text.

On the other hand, permalinks should be useful for extracting the text
of one blog post at a time, though surrounding text on the blog, such as
badges and the blogroll, will be included too. (It looks like a hack will
be required to extract only the text of a post when a permalink is
available.)

I will try this sometime using Atom.Net and RSS.Net libraries and let
the list-members know.
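
A minimal sketch of the feed-based approach discussed in this thread
(OPML file -> feed URLs -> post permalinks). It is written in Python with
the standard library only, not with the Atom.Net and RSS.Net libraries
mentioned above. The function names and sample data are hypothetical, the
feeds are assumed to be RSS 2.0 or Atom 1.0, and downloading the feeds
and extracting the post text are left out.

```python
import xml.etree.ElementTree as ET

# Assumption: Atom 1.0 namespace; older Atom 0.3 feeds use a different one.
ATOM_NS = "{http://www.w3.org/2005/Atom}"


def feed_urls_from_opml(opml_text):
    """Return the feed URLs (xmlUrl attributes) listed in an OPML document."""
    root = ET.fromstring(opml_text)
    return [o.attrib["xmlUrl"] for o in root.iter("outline")
            if "xmlUrl" in o.attrib]


def post_links_from_feed(feed_text):
    """Return post permalinks found in an RSS 2.0 or Atom 1.0 feed."""
    root = ET.fromstring(feed_text)
    links = []
    # RSS 2.0: each <item> carries its permalink in a <link> element.
    for item in root.iter("item"):
        link = item.findtext("link")
        if link:
            links.append(link.strip())
    # Atom 1.0: each <entry> has <link href="..."/>; rel defaults to "alternate".
    for entry in root.iter(ATOM_NS + "entry"):
        for link in entry.findall(ATOM_NS + "link"):
            if link.get("rel", "alternate") == "alternate" and link.get("href"):
                links.append(link.get("href"))
                break
    return links


# Made-up sample data, for illustration only.
SAMPLE_OPML = """<opml version="1.1"><body>
  <outline text="Blogs">
    <outline text="A blog" type="rss" xmlUrl="http://example.org/a/rss.xml"/>
    <outline text="B blog" type="rss" xmlUrl="http://example.org/b/atom.xml"/>
  </outline>
</body></opml>"""

SAMPLE_RSS = """<rss version="2.0"><channel><title>A blog</title>
  <item><title>First</title><link>http://example.org/a/2005/03/first</link></item>
  <item><title>Second</title><link>http://example.org/a/2005/04/second</link></item>
</channel></rss>"""

SAMPLE_ATOM = """<feed xmlns="http://www.w3.org/2005/Atom"><title>B blog</title>
  <entry><title>Hello</title>
    <link rel="alternate" href="http://example.org/b/hello"/>
  </entry>
</feed>"""

if __name__ == "__main__":
    print(feed_urls_from_opml(SAMPLE_OPML))
    print(post_links_from_feed(SAMPLE_RSS))
    print(post_links_from_feed(SAMPLE_ATOM))
```

The permalinks collected this way would still need the template-specific
cleanup described above before the post text alone can be kept.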

Thanks,
Trilok.

On Mar 31, 2005 4:05 AM, Jean-Phi <jpprost <at> gmail.com> wrote:
> Hi,
> 
> > In the absence of such corpus and APIs, I am thinking of doing this by
> > 1] using RSS, ATOM feed parsers on some OPML files to get URLs for blog posts
> > 2] Extracting the text (easier if the blog template format is known)
> 
> It might not be that easy: I suspect that many blogs use some sort of

