Ciarán Ó Duibhín | 1 Jun 2009 02:44

Re: Tagging with synsets?

Thank you for all replies on this, which I summarize:
• FreeLing  http://garraf.epsevg.upc.es/freeling/  (Francis Tyers) (compiled application)
• SenseRelate http://senserelate.sourceforge.net; web interface  http://marimba.d.umn.edu/allwords/allwords.html  (Ted Pedersen) (Perl)
• UKB http://ixa2.si.ehu.es/ukb (Eneko Agirre)
• SenseLearner http://lit.csci.unt.edu/~senselearner/ incl web interface (Rada Mihalcea) (Perl)
 
I'm pleasantly surprised to find that some of these include implementations of algorithms to tag running English text with preferred synsets.
 
Any of them could involve a struggle to get it working on Windows, but I have a few possibilities there.
 
Thanks again,
Ciarán Ó Duibhín.




 
_______________________________________________
Corpora mailing list
Corpora <at> uib.no
http://mailman.uib.no/listinfo/corpora
Anne Dister | 1 Jun 2009 10:13

Call for papers : JADT 2010

Call for Papers 

JADT 2010
10th International Conference on the Statistical Analysis of Textual Data
University of Rome (Italy)
June 9-11, 2010

http://jadt2010.uniroma1.it/


Conference Topics

The themes of interest of the conference concern the application of statistical models and tools in the following domains:

Textometry, Statistical Analysis of Textual Data
Exploratory Textual Data Analysis
Corpus and Quantitative Linguistics
Natural Language Processing
Text Corpora Encoding
Statistical Analysis of Unstructured and Structured Data
Text Categorisation, Fuzzy Classification and Visualization
Information Retrieval and Information Extraction
Text Mining, Web Mining, Semantic Web
Stylometry, Discourse analysis
Software for Textual Data Analysis
Machine Learning for Textual Data Analysis
Multilingual and parallel corpora


Languages for the papers and the presentations

Submissions, communications and presentations can be made in any one of the following
languages: English, French, Italian, Spanish.


Important Dates

Title and Abstract (max 20 rows) : 31 July 2009
Submission Deadline: First Version of Paper (Full-text) : 20 October 2009
Notification to Authors : 10 December 2009
Final Camera-Ready Paper Delivery : 10 January 2010
Conference Dates : 9-11 June 2010


Submission

JADT 2010 uses the EasyChair system to manage submissions, so if you do not already have an
EasyChair account, you need to create one to submit a paper to the conference (see below).
Submissions should be limited to original, evaluated work. All papers should include background
survey and/or reference to previous work. The authors should provide explicit explanation when
there is no evaluation in their work. We encourage the authors to include in their papers proposals
and discussions of the relevance of their work to the themes of the conference.


Title and pre-submission (not mandatory)

Participants interested in presenting a paper, as either an oral or a poster presentation, may submit
a title and abstract (maximum 20 rows) by July 31, 2009; this pre-submission is not mandatory.
Abstracts may be submitted online via the conference website. The online abstract submission will
open on June 20, 2009. All fields on the online abstract submission form must be completed (author
name(s), affiliation(s), email address, title of abstract, preferred format: oral/poster, keywords,
topic area). The abstract will not be peer reviewed.

To submit the abstract you must activate an EasyChair account:
https://www.easychair.org/login.cgi?conf=jadt2010.


Papers

Authors should submit the full paper by October 20, 2009. The papers will be peer reviewed;
see the Author guidelines for conference standards. The full text of the communication,
including references, should not exceed 12 pages for oral presentations; a maximum of 8 pages
is recommended for posters.
Use the EasyChair username and password to submit the paper. In the EasyChair menu select "My
Submission", then select "Submit a New Version" from the top right menu. Include the DOC
file in the "File window" and the PDF file in the "Attachment window". To finish, click "Submit a
new version".
Authors will be notified of the Scientific Committee's decisions on the acceptance of their proposal
and on the presentation format (oral communication or poster) by December 10, 2009.


Final Papers

Authors are invited to submit the final version (8/12 pages) of their contribution by
January 10, 2010, using the same paper submission procedure. The final paper will be peer reviewed.


Contacts

For further information or enquiries write to jadt2010 <at> uniroma1.it


---Anne Dister

Université de Louvain
Facultés universitaires Saint-Louis (Bruxelles)
_______________________________________________
Tony Berber Sardinha | 1 Jun 2009 05:42

Re: Tagging with synsets?

Dear all

Thanks for the thread and for this summary.

I installed FreeLing, which looks great, but its tags are numerical
sense codes, for which I can't find an explanation. For example, in
the sentence 'General Motors is beginning its reinvention', the verb
'beginning' is tagged as 00239960. I'd like to know what this tag
means. Thanks in advance for any pointers.
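
A possible reading of the code, offered as an assumption since nothing in this thread confirms it: eight-digit codes of this shape are how WordNet identifies synsets (the number is the synset's byte offset in the WordNet data file for that part of speech), so 00239960 may be a verb synset offset in whichever WordNet version the FreeLing build uses. The sketch below parses one line in the WordNet data-file layout. The sample line and the names `Synset` and `parse_data_line` are made up for illustration; the sample is not the real entry for this code.

```python
from typing import List, NamedTuple


class Synset(NamedTuple):
    offset: str        # eight-digit identifier, e.g. "00239960"
    pos: str           # "n", "v", "a", "s" or "r"
    words: List[str]   # lemmas grouped in this synset
    gloss: str         # definition text after the "|" separator


def parse_data_line(line: str) -> Synset:
    """Parse one line laid out like a WordNet data.* file entry.

    Layout assumed here: offset, lexicographer file number, synset type,
    word count (hexadecimal), then word/lex_id pairs; pointers and verb
    frames follow and are ignored; the gloss comes after "|".
    """
    head, _, gloss = line.partition("|")
    fields = head.split()
    word_count = int(fields[3], 16)
    # Words sit at every second field; the field after each is its lex_id.
    words = fields[4:4 + 2 * word_count:2]
    return Synset(fields[0], fields[2], words, gloss.strip())


# Hypothetical sample line, NOT copied from any WordNet release.
SAMPLE = "00239960 30 v 02 begin 0 start 0 000 | illustrative gloss only"

synset = parse_data_line(SAMPLE)
print(f"{synset.offset}-{synset.pos}: {', '.join(synset.words)}")
```

If the assumption holds, looking the offset up in the verb data file of the matching WordNet version should show which sense of 'begin' was chosen; offsets differ between WordNet versions, so the version has to match.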

bye

tony

On May 31, 2009, at 9:44 PM, Ciarán Ó Duibhín wrote:

> Thank you for all replies on this, which I summarize:
> • FreeLing  http://garraf.epsevg.upc.es/freeling/  (Francis Tyers) (compiled application)
> • SenseRelate http://senserelate.sourceforge.net; web interface http://marimba.d.umn.edu/allwords/allwords.html (Ted Pedersen) (Perl)
> • UKB http://ixa2.si.ehu.es/ukb (Eneko Agirre)
> • SenseLearner http://lit.csci.unt.edu/~senselearner/ incl web interface (Rada Mihalcea) (Perl)
> • Graph-based WSD http://lit.csci.unt.edu/index.php/Downloads#GWSD:_Unsupervised_Graph-based_Word_Sense_Disambiguation (Rada Mihalcea) (Perl)
> • NLTK http://nltk.googlecode.com/svn/trunk/doc/howto/wordnet.html ; http://nltk.googlecode.com/svn/trunk/doc/book/ch02.html (Claire Brierley) (Python)
>
> I'm pleasantly surprised to find that some of these include  
> implementations of algorithms to tag running English text with  
> preferred synsets.
>
> Any of them could involve a struggle to get it working on Windows,  
> but I have a few possibilities there.
>
> Thanks again,
> Ciarán Ó Duibhín.

_______________________________________________

vrentoumi | 1 Jun 2009 11:36

Corpus in English tagged with opinion or sentiment

-------------------------------------------------------------------

Hello, the lab I am working for is looking to buy a corpus in English that
has been annotated with opinion or sentiment. Can someone please provide
relevant information?

Thank you in advance,

Vassiliki Rentoumi
PhD Student
NCSR Demokritos,
IIT (Institute of Informatics and Telecommunications)
SKEL (Software and Knowledge Laboratory)
Athens
Greece

_______________________________________________

CRuehlemann | 1 Jun 2009 13:31

Re: Corpus in English tagged with opinion or sentiment

 
Hi Vassiliki,
 
you might check out the MPQA Corpus freely available at:
 
 
described in:
 

Wiebe, Janyce, Theresa Wilson, and Claire Cardie. ‘Annotating expressions of opinions and emotions in language’. Language Resources and Evaluation 39(2-3): 165-210 (available at: http://nrrc.mitre.org/NRRC/publications.htm)

 

Hope this helps

 

Chris

 

------------------------------------------------------------------

Dr. Christoph Rühlemann, Munich

 

_______________________________________________
Kiril Simov | 1 Jun 2009 14:04

Third CFP: Workshop on Adaptation of Language Resources and Technology to New Domains

Adaptation of Language Resources and Technology to New Domains
(AdaptLRTtoND)
http://www.bultreebank.org/AdaptLRTtoND/

RANLP 2009 Workshop
http://www.lml.bas.bg/ranlp2009/

Motivation

It is widely acknowledged that despite the great advances in
Computational Linguistics nowadays, the creation of new
Language Resources (LR) and Language Technology (LT) for a
new domain or task is still quite expensive and
time-consuming. At the same time there are already a lot of
varieties of LR and LT, developed for various languages and
purposes. What happens when new tasks come? Do we have to
develop new resources and technology from the beginning, or
can we re-use or adapt the existing ones? A last, but not
least, alternative is to combine both strategies depending on
the task. The first option seems reasonable when richer and
larger data is needed for the new applications. The second
option is justified only if such a resource or technology
does not exist at all, or some new approach is applied. The
third one is the ever ‘compromising’, but also very
realistic option.
As machine learning techniques have matured enough to
successfully support real applications within various
domains, the requirement for large and adequate training
data has become a new bottleneck. Thus, the NLP
community faced the question of the relevant LR and LT
adaptation. It concerns the operability between general
domain NLP toolkits and specific domain tasks with respect
to terminology, language, structure, steps of preprocessing
etc.
Thus, the Workshop is devoted to various methods for
transferring the linguistic knowledge and supportive
technology from the existing language resources in one
domain into a different one.

Topics

- parameters of adaptivity and re-usability of LR and LT
- methods for adaptation of existing NLP resources to specific tasks
- domain specific requirements to the LR and LT
- general domain vs. specific domain processing
- profiling LR
- extrapolation of richer annotations to large data
- evaluation of adapted LR and LT

Organizers

Núria Bel, Pompeu Fabra University
Erhard Hinrichs, Tuebingen University (co-chair)
Petya Osenova, Bulgarian Academy of Sciences and Sofia University
Kiril Simov, Bulgarian Academy of Sciences (co-chair)

Invited speaker

Jun'ichi Tsujii, University of Tokyo and University of Manchester - NacTeM

Submission details

Authors are invited to submit an extended abstract up to 800
words. Abstracts should describe existing research connected
to the topics of the workshop. The following formats are
accepted: PDF, PS, MS Word, ASCII text. Each submission
should provide the following information: title; author(s);
affiliation(s); and contact author's e-mail address, postal
address.

The abstracts should be sent electronically to:
Petya Osenova
Email: petya <at> bultreebank.org
by the deadline listed below. The submissions will be
reviewed by the workshop's programme committee.

The accepted papers will appear in the workshop proceedings.
The final paper should not exceed 15 A4 pages formatted
according to the RANLP09 guidelines
(http://www.lml.bas.bg/ranlp2009/).

Important Dates

Deadline for abstract submission: 7th June 2009
Notification of acceptance: 7th July 2009
Final version of the papers: 23rd August 2009

Program Committee

Núria Bel, Pompeu Fabra University
Gosse Bouma, Groningen University
António Branco, Lisbon University
Walter Daelemans, Antwerp University
Markus Dickinson, Indiana University
Erhard Hinrichs, Tuebingen University
Josef van Genabith, Dublin City University
Iryna Gurevych, Technische Universität Darmstadt - UKP Lab
Atanas Kiryakov, Ontotext OOD
Vladislav Kubon, Charles University
Sandra Kuebler, Indiana University
Lothar Lemnitzer, DWDS, Berlin-Brandenburgische Akademie der Wissenschaften
Bernardo Magnini, FBK
Detmar Meurers, Tuebingen University
Paola Monachesi, Utrecht University
Preslav Nakov, National University of Singapore
John Nerbonne, Groningen University
Petya Osenova, Bulgarian Academy of Sciences and Sofia University
Gabor Proszeky, MorphoLogic
Adam Przepiorkowski, Polish Academy of Sciences
Marta Sabou, Open University - UK
Kiril Simov, Bulgarian Academy of Sciences
Cristina Vertan, Hamburg University 

_______________________________________________
Dr.Xu Ruifeng | 1 Jun 2009 16:10

Opinion Corpus in English and Chinese

Dear Corpora-lister,

  I am upgrading a Chinese opinion corpus. I know that the following opinion corpora in English and Chinese are available.

  1. MPQA: English news
     Wiebe, Janyce, Theresa Wilson, and Claire Cardie. ‘Annotating expressions of opinions and emotions in language’. Language Resources and Evaluation 39(2-3): 165-210

  2. NTU opinion corpus: Chinese news
     Lun-Wei Ku, Tung-Ho Wu, Li-Ying Lee and Hsin-Hsi Chen
     Construction of an Evaluation Corpus for Opinion Extraction
     Proceedings of NTCIR-5 Workshop Meeting

     Later, in NTCIR-6, a corpus of Simplified Chinese news was provided.

  3. CUHK opinion corpus: Chinese product reviews
     Ruifeng Xu, Yunqing Xia, Kam-Fai Wong and Wenjie Li
     Opinion Annotation in On-line Chinese Product Reviews
     Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

  I would like to know of other available opinion corpora in English and Chinese, especially of product reviews.
 
  Thanks.

Dr. Ruifeng Xu,
Dept. of Chinese, Translation and Linguistics,
City University of Hong Kong
_______________________________________________
Taras Zagibalov | 1 Jun 2009 21:56

Re: Opinion Corpus in English and Chinese

Dear Dr. Ruifeng Xu,

For my research I designed a review corpus (for text-based sentiment
classification). UTF-8, simplified Chinese.
You can find two versions of it on
http://www.informatics.sussex.ac.uk/users/tz21/

it168test - the corpus consists of 2317 reviews of mobile phones of
which 1158 are negative and 1159 are positive (more info in a
readmr.txt file inside).

coling08 - the corpus consists of 10 subcorpora containing a total of
7982 reviews, distributed between 10 product types (more info in a
readmr.txt file inside).

Hope you can find them useful.

Best regards,
Taras Zagibalov

2009/6/1 Dr.Xu Ruifeng <ruifeng.xu <at> cityu.edu.hk>:
> Dear Corpora-lister,
>
>  I am upgrading a Chinese opinion corpus. I know that the following opinion corpora in English and Chinese are available.
>
>  1. MPQA: English news
>     Wiebe, Janyce, Theresa Wilson, and Claire Cardie. ‘Annotating expressions of opinions and emotions in language’. Language Resources and Evaluation 39(2-3): 165-210
>
>  2. NTU opinion corpus: Chinese news
>     Lun-Wei Ku, Tung-Ho Wu, Li-Ying Lee and Hsin-Hsi Chen
>     Construction of an Evaluation Corpus for Opinion Extraction
>     Proceedings of NTCIR-5 Workshop Meeting
>
>     Later, in NTCIR-6, a corpus of Simplified Chinese news was provided.
>
>  3. CUHK opinion corpus: Chinese product reviews
>     Ruifeng Xu, Yunqing Xia, Kam-Fai Wong and Wenjie Li
>     Opinion Annotation in On-line Chinese Product Reviews
>     Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)
>
>  I would like to know of other available opinion corpora in English and Chinese, especially of product reviews.
>
>  Thanks.
>
> Dr. Ruifeng Xu,
> Dept. of Chinese, Translation and Linguistics,
> City University of Hong Kong

_______________________________________________

Yorick Wilks | 1 Jun 2009 21:58

IE practice: Searching for names in emails

Does anyone recall research on the detection/annotation of proper
names in emails? This has been done in Information Extraction on prose
texts since the early 1990s, but I see someone has patented any search
for (proper) names in email text, which seems absurd in 2007. It seems
to me a public duty to contest this kind of patenting of the obvious
(and the consequent restraints on research), and I'd be glad to be
reminded of clear cases of pre-2007 prior art on this.
Yorick Wilks

_______________________________________________

Rich Cooper | 1 Jun 2009 22:59

Re: IE practice: Searching for names in emails

Yorick Wilks wrote:

Does anyone recall research on the detection/annotation of proper
names in emails? This has been done in Information Extraction on prose
texts since the early 1990s, but I see someone has patented any search
for (proper) names in email text, which seems absurd in 2007. It seems
to me a public duty to contest this kind of patenting of the obvious
(and the consequent restraints on research), and I'd be glad to be
reminded of clear cases of pre-2007 prior art on this.
Yorick Wilks
_______________________________________________

Are you referring to the Brill patent assigned to Microsoft?  A patent or
application number would be useful.  

-Rich

_______________________________________________

