Torsten Zesch | 1 Jun 2007 10:13

Corpora of commented documents

Dear Colleagues,

I'm looking for corpora which contain natural language annotations and
comments on documents, e.g. textual annotations of PDF or HTML documents
created with Acrobat or with a Web annotation system.

Moreover, I would be very thankful for pointers to corpora of
handwritten free-form text or notes (digital ink).

Thanks,
Torsten Zesch

Claire Jessel | 1 Jun 2007 10:56

corpus of rated words

hello
does anyone know of a word corpus in which all words would be rated as 
positive/neutral/negative (for instance, 'good' would have a positive 
rating, 'bad' a negative rating)?
thanks
claire

-- 

uma - separating the signal from the noise

claire jessel . software engineer . claire.jessel <at> uma.at
uma information technology GmbH . amerlingstrasse 1 . a-1060 vienna
http://www.uma.at . phone +43-1-526 29 67-712 . fax +43-1-526 29 67-200

--
This message contains information which may be confidential and 
privileged. Unless you are the addressee (or authorised to receive 
for the addressee), you may not use, copy or disclose to anyone the 
message or any information contained in the message. If you have 
received the message in error, please advise the sender by reply 
e-mail  <at> uma.at, and delete the message. Thank you very much.

Costas Gabrielatos | 1 Jun 2007 14:48

Help with research

Dear All

I'm helping a colleague find volunteers for his research. It involves
listening to 11 short texts (about a minute each) and selecting one of five
responses to each. Volunteers need to be native speakers of one of the
following languages:

Chinese, English, Farsi, French, Greek, Italian, Japanese, Korean, Malay,
Russian, Spanish

If you would like to help, please go to:
http://www.agu.ac.jp/~saburi/language_survey/instruction.cgi

You will be prompted for a username and password. The username is your
native language; please use the corresponding password from the following
list:

ID: Chinese; password: confucius

ID: English; password: newton

ID: Farsi; password: silkroad

ID: French; password: pasteur

ID: Greek; password: pythagoras

ID: Italian; password: davinci

[...]

Hélène Mazo | 1 Jun 2007 15:26

ELRA - Language Resources Catalogue - Erratum

Our apologies if you have received multiple copies of this announcement.

*ERRATUM:* Wrong URL links were included in this announcement sent to 
you yesterday. The current posting contains the corrected links. Please 
discard the previous posting.
We would like to apologize for the misleading information.

*******************************************************************
ELRA - Language Resources Catalogue - Update
*******************************************************************

ELRA is happy to announce that 2 new Speech Resources, 1 Written Corpus 
and 1 Monolingual Lexicon are now available in its catalogue.

*ELRA-S0238 MIST Multi-lingual Interoperability in Speech Technology 
database*
The MIST Multi-lingual Interoperability in Speech Technology database 
comprises the recordings of 74 native Dutch speakers (52 males, 22 
females), each of whom uttered 10 sentences in each of Dutch, English, 
French and German: per language, 5 sentences identical for all speakers 
and 5 sentences unique to each speaker. The Dutch sentences are 
orthographically annotated.
For more information, see: 
http://catalog.elra.info/product_info.php?products_id=988&language=en 

*ELRA-S0239 N4 (NATO Native and Non Native) database*
The N4 (NATO Native and Non Native) database comprises speech data 
recorded in the naval transmission training centers of four countries 
(Germany, The Netherlands, United Kingdom, and Canada) during naval 
[...]

Martin Wynne | 1 Jun 2007 15:38

Looking at concordances

Is anyone aware of any research that has been done on how people look at 
concordances? I'm interested in experimental research on what users do 
when they look at concordance lines, from the point of view of their 
behaviour, gaze, and any psychological aspects. References to usability 
studies on concordancer software would also be interesting.

I'm thinking of putting together a proposal for an investigation in 
collaboration with psychologists and ethnographers into what we do when 
we look at concordances, and so I need to be aware of existing work.

I'll be happy to post a summary of results to the list.

-- 
Martin Wynne
Head of the Oxford Text Archive and
AHDS Literature, Languages and Linguistics

Oxford University Computing Services
13 Banbury Road
Oxford
UK - OX2 6NN
Tel: +44 1865 283299
Fax: +44 1865 273275
martin.wynne <at> oucs.ox.ac.uk

ELDA | 1 Jun 2007 15:44

ELRA - Language Resources Catalogue - ERRATUM

Janyce M. Wiebe | 1 Jun 2007 15:21

corpus of rated words


Our current subjectivity lexicon has words rated with what we call
their "prior polarity" -- out of context, would you expect a word to
be positive, negative, or neutral.  Many people also use the General
Inquirer lexicon.  We included that lexicon in ours, but filtered out
those of its words that we found, in our work, to be neutral too often
to include.

You can find the lexicon at www.cs.pitt.edu/mpqa under "subjectivity
clues".  (Note that "subjectivity" refers to the linguistic expression
of private states, and includes expressions of sentiments, speculations,
and so on.)
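To make "prior polarity" concrete, here is a minimal Python sketch of a lookup against such a lexicon. It assumes the lexicon has already been loaded into a word-to-label dictionary; the three entries below are hypothetical examples, and neither the MPQA file format nor a loader for it is shown. Unlisted words are skipped rather than assumed neutral, which is a choice made for this sketch only.

```python
from collections import Counter

# Hypothetical toy lexicon: word -> prior polarity, i.e. the polarity one
# would expect for the word out of context. Not the real MPQA data.
PRIOR_POLARITY = {
    "good": "positive",
    "bad": "negative",
    "table": "neutral",
}


def prior_polarity(word):
    """Return the out-of-context polarity of a word, or None if unlisted."""
    return PRIOR_POLARITY.get(word.lower())


def tally(tokens):
    """Count prior-polarity labels over tokens; unlisted words are skipped.

    Context is ignored entirely, so "not good" still counts as positive:
    a word's contextual polarity can differ from its prior polarity.
    """
    counts = Counter()
    for tok in tokens:
        label = prior_polarity(tok)
        if label is not None:
            counts[label] += 1
    return counts


if __name__ == "__main__":
    print(tally("The food was not good but the table was bad".split()))
```

Because the labels are prior (out-of-context) polarities, handling negation or other contextual shifts would need more than a word lookup.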

Hope this is helpful.

Best,
Jan Wiebe
www.cs.pitt.edu/~wiebe

Claire Jessel writes:
 > hello
 > does anyone know of a word corpus in which all words would be rated as 
 > positive/neutral/negative (for instance, 'good' would have a positive 
 > rating, 'bad' a negative rating)?
 > thanks
 > claire
 > 
 > -- 
 > 
 > uma - separating the signal from the noise

Kevin B. Cohen | 1 Jun 2007 17:28

Re: corpus of rated words

At the website http://tcc.itc.it/people/valitutti/home/home.html#wna, see the following:

<paste>
I have been working for some years on affective lexicons, and I developed an extension of WordNet in order to collect affective concepts and correlate them with words. I named this resource WordNet-Affect. It is described in the paper "Developing Affective Lexical Resources", written with Carlo Strapparava and Oliviero Stock, and available at psychnology.org.
Another reference is "WordNet-Affect: an affective extension of WordNet". In /Proceedings of the 4th International Conference on Language Resources and Evaluation (LREC 2004)/, Lisbon, May 2004, pp. 1083-1086.
Alessandro Oltramari described possible uses of this resource as an ontology of emotional concepts. The paper is "Unfolding the mental: the role of cognitive structures", and it is available in the proceedings of MUSIL (Workshop on the Potential of Cognitive Semantics for Ontologies). The resource is part of WordNet Domains and it is freely available. Currently, I am working on a substantial reorganization of WordNet-Affect in order to use it as an ontology. Starting from a subset of affective synsets, I am selecting an "affective core" with a set of hierarchically organized affective categories. I hope to finish this work in one or two months. If you are interested in it, I can provide you with further information on these improvements to the resource. In turn, I would also be interested in your possible applications related to the topic of your thesis.


Kev


On 6/1/07, Claire Jessel <claire.jessel <at> uma.at > wrote:
hello
does anyone know of a word corpus in which all words would be rated as
positive/neutral/negative (for instance, 'good' would have a positive
rating, 'bad' a negative rating)?
thanks
claire



--
K. B. Cohen
Biomedical Text Mining Group Lead
Center for Computational Pharmacology
303-724-7563 (office) 303-916-2417 (cell) 303-377-9194 (home)
http://compbio.uchsc.edu/Hunter_lab/Cohen

Khurshid Ahmad | 1 Jun 2007 17:32

Re: Looking at concordances

Dear Martin
What do people do when ...? Sounds like a first-rate project in cognition.
Concordances are designed to fix your gaze (on the concorded word), and
the purpose of a concordance, as I understood from leading corpora wallahs,
is to establish a statistical pattern.
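Concordance lines are conventionally laid out in KWIC (key word in context) format, with every occurrence of the node word aligned in one central column, which is what fixes the reader's gaze. Below is a minimal Python sketch of that layout, assuming whitespace-tokenised text and case-insensitive matching. The function `kwic` and its `width` parameter are illustrative names, not taken from any particular concordancer, and real tools usually also let you sort the lines by left or right context.

```python
def kwic(tokens, node, width=4):
    """Return KWIC concordance lines for every occurrence of `node`.

    Each line shows up to `width` tokens of left and right context. The left
    context is right-justified to a common width, so the bracketed node word
    falls in the same column on every line.
    """
    hits = []
    for i, tok in enumerate(tokens):
        if tok.lower() == node.lower():  # case-insensitive match (assumption)
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            hits.append((left, tok, right))
    if not hits:
        return []
    # Length of the longest left context; every left context is padded to it.
    pad = max(len(left) for left, _, _ in hits)
    return [f"{left.rjust(pad)}  [{tok}]  {right}" for left, tok, right in hits]


if __name__ == "__main__":
    text = "the cat sat on the mat and the dog sat on the cat"
    for line in kwic(text.split(), "sat", width=3):
        print(line)
```

Right-justifying the left context is the whole trick: it turns the node word into a vertical column that the eye can run down while scanning for patterns on either side.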

Concordances were invented in hermeneutics for the study of texts (of
divine origin); another route is Kabbalah. The late John Sinclair
always used to have a concordance listing as a punch line; the punch lines
appear in his writings as well.

There are a number of studies in translation and terminology studies that
deal with what people do with text and terminology tools; I can send you a
list if you wish.

Best wishes

> Is anyone aware of any research that has been done on how people look at
> concordances? I'm interested in experimental research on what users do
> when they look at concordance lines, from the point of view of their
> behaviour, gaze, and any psychological aspects. References to usability
> studies on concordancer software would also be interesting.
>
> I'm thinking of putting together a proposal for an investigation in
> collaboration with psychologists and ethnographers into what we do when
> we look at concordances, and so I need to be aware of existing work.
>
> I'll be happy to post a summary of results to the list.
>
> --
> Martin Wynne
> Head of the Oxford Text Archive and
> AHDS Literature, Languages and Linguistics
>
> Oxford University Computing Services
> 13 Banbury Road
> Oxford
> UK - OX2 6NN
> Tel: +44 1865 283299
> Fax: +44 1865 273275
> martin.wynne <at> oucs.ox.ac.uk
>
>
>

Khurshid Ahmad

Professor of Computer Science
Department of Computer Science
Trinity College,
DUBLIN-2
IRELAND
Phone 00 353 1 896 8429

Web Page: http://people.tcd.ie/kahmad

Orasan, Constantin | 2 Jun 2007 04:16

RANLP 2007 Workshop on Computer-Aided Language Processing (CALP'07) -- Second CFP & submission information

                     [Apologies for cross-postings]

    RANLP-07 Workshop: Computer-Aided Language Processing (CALP'07)

           Second call for papers and submission information


                           Borovetz, Bulgaria
                           September 30, 2007

          Workshop site:  http://clg.wlv.ac.uk/events/CALP07/
             RANLP'2007 site:  http://lml.bas.bg/ranlp2007/



AIMS:

The past years have seen a variety of promising NLP projects, but in 
the vast majority of real-world applications, fully automatic NLP is 
still far from delivering reliable results. As a result, computer-aided 
methods have emerged as a practical alternative. In the 
computer-aided scenario, processing is not done entirely by 
computers; human intervention improves, post-edits or validates the 
output of the computer program.

The aim of the workshop is to bring together researchers working on 
CALP projects and to provide a forum for fruitful discussion on 
related issues and further developments in the field. Topics of 
interest include areas where computers can be used to help but not 
to fully automate the process such as (but not limited to) machine 
translation, production of summaries, generation of documents, 
extraction of terminology, creation of indexes, ontology creation and 
annotation of texts using semi-automatic methods.

The workshop also encourages discussions and submissions focusing on 
evaluation issues addressing the efficiency of the CALP methods. Of 
particular interest will be studies which compare the time and cost 
savings of CALP methods against manual methods.

TOPICS

Prospective authors are invited to submit proposals in the following 
areas of interest, related to computer-aided language processing:

* computer-aided language processing for NLP tasks including but not 
limited to computer-aided summarisation, indexing, translation, 
generation, etc.
* semi-automatic annotation methods: post-processing vs. interactive 
annotation
* semi-automatic ontology development
* interactive machine learning methods such as active learning
* evaluation issues addressing the efficiency of CALP methods
* translation memories and other translation aids
* interactive machine translation
* pre/post-editing of machine translation
* intelligent tools for language learning such as dictionaries and 
concordancers
* computer-aided assessment tools
* authorship attribution and plagiarism detection


INVITED SPEAKER

Ruslan Mitkov, University of Wolverhampton, UK


SUBMISSION GUIDELINES:

* Format: Authors are invited to submit two types of papers: full 
papers which describe original and unpublished work in the topic area 
of this workshop, and short papers which describe full working 
systems, and which, if accepted, will be presented at a special 
session accompanied by live demo. Papers should be submitted as a PDF 
file, formatted according to the RANLP 2007 stylefiles, and should 
not exceed 8 pages for full papers and 4 pages for short papers. The 
RANLP 2007 stylefiles are available at:

http://lml.bas.bg/ranlp2007/submissions.htm

As reviewing will be blind, the papers should not include the 
authors' names and affiliations. Furthermore, self-references that 
reveal the authors' identities should be avoided. Papers that do not 
conform to these requirements will be rejected without review.

* Submission procedure: Submission of papers is handled using 
the START system available at
http://www.softconf.com/ranlp/CALP07/submit.html

* Reviewing: Each submission will be reviewed by at least two members 
of the Program Committee. Reviewers will be asked to provide detailed 
comments, and to score submitted papers on the following factors:
       - Relevance to the workshop
       - Significance and originality
       - Technical/methodological accuracy
       - References to related work
       - Presentation (clarity, organisation, English)

* Accepted papers policy. Accepted papers will be published in the 
workshop proceedings. By submitting a paper to the workshop, the 
authors agree that, in case the paper is accepted for publication, at 
least one of the authors will attend the workshop; all workshop 
participants are expected to pay the RANLP-2007 workshop registration 
fee.


IMPORTANT DATES

Paper submission deadline: June 15, 2007
Paper acceptance notification:  July 20, 2007
Camera-ready papers due:  August 31, 2007
Workshop date:  September 30, 2007


WORKSHOP CHAIRS:

Constantin Orasan, University of Wolverhampton, UK
Sandra Kübler, Indiana University, USA


PROGRAM COMMITTEE:

Amit Bagga, Ask Jeeves, USA
Kalina Bontcheva, University of Sheffield, UK
Robert Clark, Translution Ltd, UK and Leeds University, UK
Le An Ha, University of Wolverhampton, UK
Catalina Hallett, Open University, UK
Laura Hasler, University of Wolverhampton, UK
Erhard Hinrichs, University of Tuebingen, Germany
Martin Kay, Stanford University, USA
Elina Lagoudaki, Imperial College, UK
Inderjeet Mani, Georgetown University, USA
Patricio Martinez Barco, University of Alicante, Spain
Andrea Mulloni, Cogito Srl, Italy
Masumi Narita, Tokyo International University, Japan
Matteo Negri, IRST, Italy
Gabor Proszeky, Morphologic, Hungary
Frederique Segond, Rank Xerox, France
Doina Tatar, Babes-Bolyai University, Romania


CONTACT:

calp07 at wlv dot ac dot uk

