Xinglong Wang | 1 Feb 10:38 2012

Job Opening: NLP Developer Job at Brandwatch

Brandwatch is looking for an NLP developer.

Brandwatch is one of the world’s leading social media monitoring 
applications. We help customers from all across the globe discover, 
understand and respond to comments made about them on the web in real time.

We have assembled a team of the brightest developers and built our own
web crawler, text analysis engine and web-based user interface.
Continuing our rapid expansion, we are looking for an NLP developer to
join our Analysis team. Working closely with the rest of the team and
other software developers, you will play an active role throughout the
whole development process.

Your duties will include:

1. Design and implement accurate and efficient software applications for 
sentiment analysis and topic extraction
2. Analyse clients’ requirements and provide innovative solutions to 
extract and organise information from textual data
3. Maintain and enhance the performance of our existing NLP systems

You should have:

1. PhD (or equivalent R&D experience) in Computational Linguistics or 
Natural Language Processing
2. Good understanding of statistical approaches and machine learning
3. Good Java programming skills
4. Familiarity with Linux
5. Excellent communication skills in English


Motaz SAAD | 1 Feb 12:23 2012

Re: released arabic corpus

Hello Ahmed,

There is also another corpus: http://sites.google.com/site/motazsite/Home/osac

Best,
Motaz

From: "Safi Ahmed" <ahmd_sfi <at> yahoo.fr>
To: corpora <at> uib.no
Sent: Saturday, January 28, 2012 4:40:20 PM
Subject: [Corpora-List] released arabic corpus

Hi,

I found an open-source Arabic corpus; I think it is the most voluminous one, particularly for those working on text categorization. This is the link: http://sites.google.com/site/mouradabbas9/corpora

Regards

Ahmed

Thierry Hamon | 1 Feb 12:42 2012

Final Call: EACL 2012 Workshop on Innovative hybrid approaches to the processing of textual data (extended deadline: Feb 5, 2012)


Call for papers

Innovative hybrid approaches to the processing of textual data
April 22, 2012, Avignon, France

Website:
http://www-limbio.smbh.univ-paris13.fr/membres/hamon/hybrid/

** Extended deadline: Feb 05, 2012 **

Submission deadline: Feb 05, 2012 (extended from Jan 27, 2012)

The term "hybrid approach" covers a wide range of situations in which
different approaches are combined in order to process textual data
more effectively and to better achieve the task at hand.

Among hybridizations, the possible combinations are unlimited. The
most frequent combination, as stressed during The Balancing Act
workshop in 1994, brings together machine learning and rule-based
systems. Beyond this, hybridization can be augmented with
distributional approaches, syntactic and morphological analyses,
semantic distances and similarities, graph theory models,
co-occurrences of linguistic units (e.g., words and their
dependencies, word senses and POS tags, named entities and semantic
roles, ...), knowledge-based approaches (terminologies and
ontologies), etc.

In practice, hybridization requires defining a strategy for combining
several approaches efficiently: cooperation between approaches,
filtering, voting or ranking over the outputs of multiple systems,
etc.

Indeed, combining these different methods and approaches tends to
yield more complete and better-performing results, because each
method is sensitive and effective only for certain data and in
certain contexts. Their combination may therefore improve both
precision and recall: coverage increases, while exploiting several
methods in filtering, voting or similar modes can also improve
precision.
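
As a concrete illustration of one such combination strategy (a
minimal sketch only, not a method prescribed by the workshop; the
taggers and their outputs below are invented), majority voting over
the outputs of several POS taggers could look like this in Python:

    from collections import Counter

    def vote(taggings):
        """Combine aligned tag sequences from several taggers by majority vote.

        taggings: a list of tag sequences, one per tagger, all aligned to the
        same tokens. Among tied tags the earliest-listed tagger wins, because
        Counter (CPython 3.7+) keeps insertion order for equal counts.
        """
        combined = []
        for tags_at_position in zip(*taggings):
            combined.append(Counter(tags_at_position).most_common(1)[0][0])
        return combined

    # Invented outputs of three hypothetical taggers for "Time flies like an arrow"
    rule_based  = ["NN", "VBZ", "IN", "DT", "NN"]
    statistical = ["NN", "NNS", "IN", "DT", "NN"]
    neural      = ["NN", "VBZ", "VB", "DT", "NN"]

    print(vote([rule_based, statistical, neural]))
    # -> ['NN', 'VBZ', 'IN', 'DT', 'NN']

Filtering or ranking strategies can be sketched in much the same way,
by replacing the plain vote with a confidence-weighted score.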

In this workshop, we favour a broad interpretation of the
hybridization of methods, applied to various application areas, such
as (but not limited to):

- automatic creation of linguistic resources
- POS tagging
- building and structuring of terminologies
- information retrieval and filtering
- information extraction
- linguistic annotation
- semantic labeling
- sign language recognition and transcription
- oral data transcription
- filtering and validation of lexical resources
- text summarization
- question answering systems
- natural language generation
- etc.

We invite authors to submit novel methods and novel conceptions of
hybridization applied to various areas of textual data processing.

Important dates

Nov 25, 2011: 1st workshop CFP
Jan 04, 2012: Abstract deadline (optional)
Feb 05, 2012: Paper due date (Extended deadline)
Feb 29, 2012: Notification of acceptance
Mar 09, 2012: Camera-ready deadline
Apr 22, 2012: Workshop

Submission instructions:

Authors are invited to submit full papers on original, unpublished
work in the topic area of this workshop. Submissions should be
formatted using the EACL 2012 style files for LaTeX or MS Word, must
be anonymised for blind review, and must not exceed 8 pages plus an
extra page for references. PDF files should be submitted
electronically at
https://www.softconf.com/eacl2012/Hybrid2012/

Program Committee

Delphine Bernhard, LiLPa, Université de Strasbourg, France
Philipp Cimiano, CITEC, University of Bielefeld, Germany
Vincent Claveau, IRISA-CNRS, Rennes, France
Kevin Cohen, University of Colorado Health Sciences Center, USA
Marie-Claude l'Homme, OLST, Université de Montreal, Canada
Béatrice Daille, Université de Nantes, LINA, France
Stefan Th. Gries, University of California, Santa Barbara, USA
Anna Kazantseva, University of Ottawa, Canada
Alistair Kennedy, University of Ottawa, Canada
Ben Leong, University of North Texas, USA
Bruno Pouliquen, WIPO, Geneva, Switzerland
Sampo Pyysalo, National Centre for Text Mining, University of Manchester, United Kingdom
Mathieu Roche, LIRMM, Université de Montpellier 2, France
Patrick Ruch, Haute école de gestion de Genève, Switzerland
Paul Thompson, National Centre for Text Mining, University of Manchester, United Kingdom
Özlem Uzuner, University at Albany, State University of New York, USA

Organization Committee

Natalia Grabar, CNRS UMR 8163 STL, Université Lille 1&3, France
Marie Dupuch, MOSTRARE/LIFL & CNRS UMR 8163 STL, Université Lille 1&3, France
Amandine Périnet, LIM&BIO, Université Paris 13, France
Thierry Hamon, LIM&BIO, Université Paris 13, France

--
Thierry Hamon                      E-mail : thierry.hamon <at> univ-paris13.fr
Laboratoire d'Informatique Médicale et Bioinformatique - LIM&BIO (EA3969)
UFR SMBH Léonard de Vinci et Institut Galilée
Université Paris 13                                Tel: +33 1 49 40 35 53
74, rue Marcel Cachin                              Tel: +33 1 48 38 73 07
93017 Bobigny Cedex France                         Fax: +33 1 48 38 73 55
URL: http://www-limbio.smbh.univ-paris13.fr/membres/hamon


Valeria Quochi | 1 Feb 14:22 2012

CFP: LREC 2012 Language Resource Merging Workshop

2nd Call for Papers

LREC 2012 Workshop on: Language Resource Merging

http://panacea-lr.eu/en/news/project/2011/12/19/lrec-2012-merging-lr-workshop/

Date: 22 May 2012 – Afternoon Session

Location: Istanbul, Turkey

**** Deadline for paper submission: 15 February 2012 *****

CONTEXT

The availability of adequate language resources has been a well-known  
bottleneck for most high-level language technology applications, e.g.  
Machine Translation, parsing, and Information Extraction, for at least  
15 years, and the impact of the bottleneck is becoming all the more
apparent with the availability of higher computational power and  
massive storage, since modern language technologies are capable of  
using far more resources than the community produces. The present  
landscape is characterized by the existence of numerous scattered  
resources, many of which have differing levels of coverage, types of  
information and granularity. Taken individually, existing resources
do not have sufficient coverage, quality or richness for robust
large-scale applications, and yet they contain valuable information  
(Monachini et al. 2004 and 2006; Soria et al. 2006; Molinero, Sagot  
and Nicolas 2009; Necsulescu et al. 2011). Differing technology or  
application requirements, ignorance of the existence of certain  
resources, and difficulties in accessing and using them have led to
the proliferation of multiple, unconnected resources that, if merged,  
could constitute a much richer repository of information augmenting  
either coverage or granularity, or both, and consequently multiplying  
the number of potential language technology applications. Merging,  
combining and/or compiling larger resources from existing ones thus  
appears to be a promising direction to take.
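
To make the merging scenario concrete, here is a minimal sketch (in
Python, with invented toy entries rather than any of the resources
cited above) of a union-merge of two POS lexicons keyed by lemma, so
that coverage and granularity can both grow:

    def merge_lexicons(lex_a, lex_b):
        """Union-merge two lexicons mapping lemma -> set of POS categories.

        Entries found in only one resource are kept as they are; entries
        found in both are combined, so the merged lexicon is at least as
        rich as either input.
        """
        merged = {}
        for lexicon in (lex_a, lex_b):
            for lemma, categories in lexicon.items():
                merged.setdefault(lemma, set()).update(categories)
        return merged

    # Toy entries for illustration only
    lex_a = {"walk": {"VERB"}, "fish": {"NOUN"}}
    lex_b = {"walk": {"NOUN"}, "bank": {"NOUN", "VERB"}}

    print(merge_lexicons(lex_a, lex_b))
    # e.g. {'walk': {'VERB', 'NOUN'}, 'fish': {'NOUN'}, 'bank': {'NOUN', 'VERB'}}

Real-world merging is of course much harder than this union of sets,
which is precisely the motivation for the workshop: formats, tagsets,
metadata and linguistic assumptions rarely line up so neatly.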

The re-use and merging of existing resources is not altogether  
unknown. For example, WordNet (Fellbaum, 1998) has been successfully  
reused in a variety of applications. But this is the exception rather  
than the rule; in fact, merging and enhancing existing resources is
uncommon, probably because it is by no means a trivial task, given the
profound differences in formats, formalisms, metadata, and linguistic  
assumptions.

The language resource landscape is on the brink of a large change,  
however. With the proliferation of accessible metadata catalogues, and  
resource repositories (such as the new META-SHARE  
(http://www.meta-net.eu/meta-share) infrastructure), a potentially  
large number of existing resources will be more easily located,  
accessed and downloaded. Also, with the advent of distributed  
platforms for the automatic production of language resources, such as  
PANACEA (http://www.panacea-lr.eu/), new language resources and  
linguistic information capable of being integrated into those  
resources will be produced more easily and at a lower cost. Thus, it  
is likely that researchers and application developers will seek out  
resources already available before developing new, costly ones, and  
will require methods for merging/combining various resources and  
adapting them to their specific needs.

Up to the present day, most resource merging has been done manually,  
with only a small number of attempts reported in the literature  
towards (semi-)automatic merging of resources (Crouch & King 2005;  
Pustejovsky et al. 2005; Molinero, Sagot and Nicolas 2009; Necsulescu  
et al. 2011). In order to take a further step  towards the scenario  
depicted above, in which resource merging and enhancing is a reliable  
and accessible first step for researchers and application developers,  
experience and best practices must be shared and discussed, as this  
will help the whole community avoid any waste of time and resources.

AIMS OF THE WORKSHOP

This half-day workshop is meant to be part of a series of meetings  
constituting an ongoing forum for sharing and evaluating the results  
of different methods and systems for the automatic production of  
language resources (the first one was the LREC 2010 Workshop on  
Methods for the Automatic Production of Language Resources and their  
Evaluation Methods). The main focus of this workshop is on  
(semi-)automatic means of merging language resources, such as  
lexicons, corpora and grammars. Merging makes it possible to re-use,  
adapt, and enhance existing resources, alongside new, automatically  
created ones, with the goal of reducing the manual intervention  
required in language resource production, and thus ultimately  
production costs.

WORKSHOP TOPICS

The topics of the workshop are related to best practices, methods,  
techniques and experimental results regarding the merging of various  
types of language resources, such as lexicons and corpora, especially  
in support of language technology applications. In particular, new  
methods for automatic merging with a view towards reducing human  
intervention will be most welcome.

Topics for submission include, but are not limited to:

-       Experiments on (semi-)automatic merging of automatically  
produced resources

-       Experiments on the merging of two or more existing resources  
containing the same or different levels of linguistic information

-       Studies or experiments on merging resources at different  
levels of granularity (corpora, lexicons, grammars)

-       Studies or experiments on unifying, mapping or converting  
encoding formats

-       Comparison between different resources and mapping algorithms  
to provide desired merging

-       Use of linguistic information from different sources in  
high-level language applications

-       Use of new, merged language resources in language technology  
applications

SUBMISSIONS

Interested participants must submit a preliminary paper of about 4-6
pages including references (2000-2500 words). For the
submission please use the online form on START LREC Conference Manager  
at: https://www.softconf.com/lrec2012/MergingLR2012/

When submitting a paper from the START page, authors will be asked to  
provide essential information about resources (in a broad sense, i.e.  
also technologies, standards, evaluation kits, etc.) that have been  
used for the work described in the paper or are a new result of their
research.

For further information on this new initiative, please refer to  
http://www.lrec-conf.org/lrec2012/?LRE-Map-2012

Papers will be peer-reviewed by the workshop Program Committee.

IMPORTANT DATES

- Deadline for paper submission: 15 February 2012

- Notification of acceptance: 15 March 2012

- Submission of camera-ready version of papers: 31 March 2012

- Workshop date: 22 May 2012 – Afternoon Session

CONTACT

lrec12_workshop_merging <at> ilc.cnr.it

ORGANIZING COMMITTEE

Núria Bel, UPF, Barcelona, Spain

Maria Gavrilidou, ILSP-"Athena", Athens, Greece

Monica Monachini, CNR-ILC, Pisa, Italy

Valeria Quochi, CNR-ILC, Pisa, Italy

Laura Rimell, University of Cambridge, UK

PROGRAMME COMMITTEE:

Victoria Arranz, ELDA, Paris, France

Paul Buitelaar, National University of Ireland, Galway, Ireland

Nicoletta Calzolari, CNR-ILC, Pisa, Italy

Olivier Hamon, ELDA, Paris, France

Aleš Horák, Masaryk University, Brno, Czech Republic

Nancy Ide, Vassar College, Poughkeepsie, NY, USA

Bernardo Magnini, FBK, Trento, Italy

Paola Monachesi, Utrecht University, Utrecht, The Netherlands

Jan Odijk, Utrecht University, Utrecht, The Netherlands

Muntsa Padró, IULA, Barcelona, Spain

Karel Pala, Masaryk University, Brno, Czech Republic

Thierry Poibeau, University of Cambridge, UK, and CNRS, Paris, France

Benoît Sagot, INRIA, Paris, France

Kiril Simov, Bulgarian Academy of Sciences, Sofia, Bulgaria

Claudia Soria, CNR-ILC, Pisa, Italy

Maurizio Tesconi, CNR-IIT, Pisa, Italy


Adam Kilgarriff | 1 Feb 15:47 2012

Two Lexicom courses, NZ (Feb) and Austria (Sept)

Lexicom Workshops in Lexicography and Lexical Computing

Final call for lexicom2012-NZ, in Auckland, New Zealand,  
in association with the first Asia Pacific Corpus Linguistics Conference 11-15 Feb 2012

First call for lexicom2012-europe, in Galtür, Austria, 24-28 Sept 2012 
 
Led by Adam Kilgarriff and Michael Rundell of the Lexicography MasterClass, these are intensive five-day workshops, with seminars on theoretical issues alternating with practical sessions at the computer.  There will be some parallel 'lexicographic' and 'computational' sessions.

Topics to be covered include:

* corpus creation
* corpus analysis:
  o software and corpus querying
  o discovering word senses, recording contextual information
* preparing word sketches
* writing entries for dictionaries and lexicons
* dictionary databases and writing systems
* using web data
* the future of lexicography and lexical computing

Applications are invited from people with interests and experience in any of these areas. 

Over the last eleven years Lexicom workshops (in Europe, Asia and the Americas) have attracted over 300 participants from 35 countries, including  lexicographers, computational linguists, professors, research students, translators, terminologists, and editors, managers and technical support staff from dictionary publishers and information-management companies. 

To register, go to:

    http://nlp.fi.muni.cz/lexicom2012nz or 
    http://nlp.fi.muni.cz/lexicom2012eu 

Early registration for the Europe event is advised.  The workshop has been oversubscribed in previous years.

Further details, including draft programme and reports of past events can be found at: http://www.lexmasterclass.com 





--
========================================
Adam Kilgarriff                  adam <at> lexmasterclass.com                                             
Director                                    Lexical Computing Ltd                
Visiting Research Fellow                 University of Leeds     
Corpora for all with the Sketch Engine                 
                        DANTE: a lexical database for English                  
========================================

Verena Henrich | 1 Feb 15:38 2012

Workshop CFP: Comp. Approaches to dialectal and typol. variation

2nd CALL FOR PAPERS:

Computational approaches to the study of dialectal and typological variation

Workshop organized as part of the European Summer School on Logic, 
Language and Information ESSLLI 2012 (http://www.esslli2012.pl), August 
6-10 2012 (ESSLLI first week), Opole, Poland

Workshop Organizers: Erhard Hinrichs (erhard.hinrichs <at> uni-tuebingen.de),
Gerhard Jäger (gerhard.jaeger <at> uni-tuebingen.de)

Workshop Purpose

Computational dialectometry is an innovative method to investigate 
language variation. This still rather young approach employs techniques 
from statistical NLP - such as pattern recognition, sequence alignment, 
clustering, and dimension reduction techniques - to study synchronous 
dialectal variation. It uses easy-to-operationalize data (such as 
phonetic transcriptions of a small core vocabulary) collected from a 
large number of speakers within a certain geographic area. Methods from 
unsupervised machine learning are then used to measure dialect distances 
and to model dialect continua. Together with advances in the digital 
collection of population and geographic data, these methods now make 
it possible to study how linguistic variation correlates with social 
and geographic factors.
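
As a rough illustration of the kind of computation involved (a sketch
only, with invented transcriptions; this is not the organizers' own
method or data), one can take the mean normalised edit distance over a
shared core vocabulary as a dialect distance:

    def levenshtein(a, b):
        """Edit distance between two phonetic transcriptions (plain strings)."""
        previous = list(range(len(b) + 1))
        for i, ca in enumerate(a, 1):
            current = [i]
            for j, cb in enumerate(b, 1):
                current.append(min(previous[j] + 1,                 # deletion
                                   current[j - 1] + 1,              # insertion
                                   previous[j - 1] + (ca != cb)))   # substitution
            previous = current
        return previous[-1]

    def dialect_distance(site_a, site_b):
        """Mean length-normalised edit distance over the shared core vocabulary."""
        shared = set(site_a) & set(site_b)
        distances = [levenshtein(site_a[w], site_b[w]) /
                     max(len(site_a[w]), len(site_b[w])) for w in shared]
        return sum(distances) / len(distances)

    # Invented transcriptions of two core-vocabulary items at two sites
    site_1 = {"water": "water", "house": "hus"}
    site_2 = {"water": "votr", "house": "haus"}
    print(dialect_distance(site_1, site_2))

A matrix of such pairwise distances is what the clustering and
dimension-reduction techniques mentioned above would then operate on.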

Recent years have seen remarkable efforts in typology to set up 
electronic data inventories that contain significant data sets from 
large, typologically diverse and representative samples of languages. 
The data types thus collected in computational typology are remarkably 
similar - from an operational point of view - to the kind of resources 
that are being used in computational dialectometry. It is therefore a 
natural move to bring these two communities into contact and to discuss 
the mutual usability of algorithms and perhaps common standards for data 
encoding and exchange.

The goals of this workshop are twofold:
- to expose the ESSLLI community in general and researchers at the 
interface of language and computation in particular to the application 
of data-driven NLP methods to a rather new domain, and
- to provide a forum for practitioners and students of computational 
dialectometry, of quantitative typology, and of historical linguistics 
to learn about each other's research concerns and accompanying methods, 
and to receive feedback as well as inspiration for possible 
collaboration across sub-disciplines.

Submission Details

Authors are invited to submit an EXTENDED ABSTRACT for a 30-minute 
presentation (including discussion). Submissions should not exceed 3 
pages, including figures, data, and references. Details about the 
anonymous electronic submission procedure will be posted with the second 
Call for Papers. The submissions will be reviewed anonymously by the 
workshop's programme committee. The abstracts accepted for presentation 
will appear in the workshop web site and be published as part of the 
ESSLLI 2012 proceedings. In addition, we are considering the possibility 
of compiling a journal special issue from selected papers presented at 
the workshop.

Program Committee

Balthasar Bickel (Zürich University), Michael Cysouw (LMU München), 
Charlotte Gooskens (Groningen University), Erhard Hinrichs (Tübingen 
University; co-chair), Gerhard Jäger (Tübingen University; co-chair), 
Brian Joseph (The Ohio State University, Columbus, Ohio), John Nerbonne 
(Groningen University), Søren Wichmann (MPI for Evolutionary 
Anthropology, Leipzig)

Local Arrangements

All workshop participants, including the authors, are required to 
register for ESSLLI.

Important Dates

- February 15: Final Call for Papers
- March 1: *Deadline* for Submission
- April 15: Notification of Acceptance
- June 1: Deadline for Proceedings Papers
- August 6-10: Workshop


Maarten Marx | 1 Feb 17:29 2012

Job Opening: Postdoc or senior scientific programmer

The ILPS group at the Informatics Institute of the University of Amsterdam (UvA) has an opening for a postdoc or senior scientific programmer in the ‘Namescape’ project.

We are looking for someone with a background in computational linguistics or AI, with an affinity for literature and the Dutch language.
The project is a collaboration with the Huygens Institute for Dutch History and the Institute for Dutch Lexicology (INL), and is financed by CLARIN-NL. The research concerns the use of named entities within Dutch literature. The aim is to extend a manual pilot project on 20 novels to a fully automatic analysis of a corpus of almost 9000 digitized modern Dutch novels. The project involves named entity recognition, classification of entities (real vs. fictional), entity deduplication and modelling the co-occurrence of entities.
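
For a rough idea of the last of these subtasks, here is a minimal sketch (the chapter segmentation and entity names are invented, and this is not the project's actual pipeline) of counting how often pairs of already-recognised, deduplicated entities co-occur:

    from collections import Counter
    from itertools import combinations

    def cooccurrence_counts(chapters):
        """Count how often each pair of entities appears in the same chapter.

        chapters: iterable of sets of entity names, one set per chapter.
        """
        counts = Counter()
        for entities in chapters:
            counts.update(combinations(sorted(entities), 2))
        return counts

    # Invented example: entity sets for three chapters of one novel
    chapters = [{"Anna", "Willem"}, {"Anna", "Willem", "De Jong"}, {"De Jong"}]
    print(cooccurrence_counts(chapters).most_common())

Deduplication would have to happen before this step, since otherwise spelling variants of the same character would be counted as distinct entities.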

The project runs for 1 year. For more information, see the research proposal at http://ilps.science.uva.nl/PoliticalMashup/uploads/2012/01/proposal_namescape_abstract.pdf

Profile: Background in computational linguistics or AI. Experience with Linux, NLP and XML technology.
Level: Postdoc or scientific programmer. Salary (depending on education and work experience) between €2,379 and €3,195 a month (before taxes).
Duration: 1 year
Start: March or April 2012
Information: Maarten Marx (maartenmarx <at> uva.nl)
How to apply: Send your application with CV before 15 February 2012 to maartenmarx <at> uva.nl with the subject “Namescape”.

***********************************************************************
       M       maartenmarx <at> uva.nl  http://www.science.uva.nl/~marx
  Maarten 
        r        Informatics Institute,   Universiteit van  Amsterdam
      xxx     Science Park 904, 1098 XH Amsterdam The Netherlands
      x  x     Phone: +31 20 525 2888         Mobile: 06 400 16 120
***********************************************************************



Javier Perez Guerra | 1 Feb 19:26 2012

International Workshop ELLIPSIS2012: call for papers

Date: 10-Nov-2012 - 10-Nov-2012
Location: Vigo, Spain
Contact Person: María Evelyn Gandón-Chapela
Meeting Email: ellipsis2012 <at> uvigo.es
Web Site: http://webs.uvigo.es/ellipsis2012

Linguistic Field(s): Discourse Analysis; Pragmatics;
Psycholinguistics; Semantics; Syntax

Call Deadline: 15-May-2012

Meeting Description:
International Workshop 'ELLIPSIS2012, Crosslinguistic, Formal,
Semantic, Discoursive and Processing Perspectives'
We are pleased to announce the International Workshop 'ELLIPSIS2012,
Crosslinguistic, Formal, Semantic, Discoursive and Processing
Perspectives' to be held at the University of Vigo (Spain) on 10
November 2012.
Over the past 40 years, ellipsis has attracted the attention of many
scholars aiming to explain the mismatch between meaning (the
intended message) and sound (what is actually uttered) that ellipsis
evinces in natural language communication. By using ellipsis and by
relying on the context and on the ability of our interlocutors to
decipher what has been omitted, one can avoid redundancy and
repetition. The 'mechanism' of ellipsis has thus become a central
issue of debate for researchers working on semantics, syntax,
pragmatics and psycholinguistics. This workshop aims at bringing
together researchers who are currently looking at ellipsis from
different points of view: formal, semantic, discoursive and
processing. The goal is to discuss what a theory of ellipsis should
explain in light of the assumptions of specific frameworks.

The following speakers have kindly accepted our invitation to lecture
in this workshop:
Lobke Aelbrecht (University of Ghent)
Gerard Kempen (Leiden University)
Jason Merchant (University of Chicago)
Maribel Romero (University of Konstanz)

Workshop Organisation:
Ellipsis 2012 is organised by the Language Variation and Textual
Categorisation (LVTC) research group at the University of Vigo
(http://webs.uvigo.es/lvtc), in cooperation with the research network
'English Linguistics Circle (ELC)' (http://www.elc.org.es), a network
coordinated by Professor Teresa Fanego (University of Santiago de
Compostela), involving five research groups based at the Universities
of Santiago de Compostela and Vigo.

Call for Papers:
We would like to invite presentations concerned with any topic
involving ellipsis, including the following:
- Ellipsis types: Gapping, VP Ellipsis, Sluicing, Stripping,
Pseudogapping, British English do, Antecedent-Contained Deletion,
Comparative Ellipsis, Swiping, Spading, NP ellipsis
- Natural Language Processing of ellipsis
- Functions of ellipsis in discourse
- The structural representation of ellipsis types and their constituents
- The exploration of the implications of particular theoretical
frameworks for the structure of elided elements.
Proposals for 20-minute presentations must be submitted in MS Word or
RTF format as an email attachment to ellipsis2012 <at> uvigo.es before 15
May 2012. The email message should use the subject header
'Ellipsis2012 abstract'. Abstracts should be one page in length
(single-spaced), excluding references, and be written in standard
12-point font. The page should be headed only by the title of the
paper and must not include the presenters' names, affiliations or
addresses. The accompanying email should include:
(a) Title of the paper
(b) Name(s) of the author(s)
(c) Institutional affiliation(s)
(d) Email address(es)
Notification of acceptance will be sent out by 30 June 2012.

Publication:
Authors of papers accepted for presentation will be invited to submit
their paper for publication in a special journal issue or volume with
an international publisher. Papers will be subjected to refereeing.

Important Dates:
15 May 2012: Deadline for abstract submission
30 June 2012: Notification of acceptance or rejection
1 October 2012: (Re-)Submission of 1-page abstract for conference booklet
10 November 2012: Workshop in Vigo

Contact persons: Javier Pérez-Guerra (jperez <at> uvigo.es) and María
Evelyn Gandón-Chapela (evelyn.gandon <at> uvigo.es)

Workshop homepage: http://webs.uvigo.es/ellipsis2012


Yuri Tambovtsev | 2 Feb 10:28 2012

Tambovtsev's Computational dialectometry

Dear Corpora colleagues, I have computed the sound chains of some dialects of the Slavonic, Turkic and Finno-Ugric languages. After comparing their frequency characteristics with the chi-square criterion, I obtained the typological distances between these dialects. Many of them turned out to be too different to be considered dialects of the same language, so I proposed treating them as separate languages. Be well, Yuri Tambovtsev, Novosibirsk, Russia.
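
A minimal sketch of the kind of comparison described above (the sound-class counts are purely hypothetical, SciPy is assumed to be available, and this is not Tambovtsev's own code or data):

    import numpy as np
    from scipy.stats import chi2_contingency  # SciPy assumed available

    # Hypothetical counts of broad sound classes in two dialect samples:
    #                     vowels  sonorants  obstruents
    dialect_a = np.array([  420,     310,       270])
    dialect_b = np.array([  380,     250,       370])

    chi2, p, dof, _ = chi2_contingency(np.vstack([dialect_a, dialect_b]))
    print(f"chi-square = {chi2:.1f}, dof = {dof}, p = {p:.4g}")
    # A large chi-square (small p) indicates that the two frequency profiles
    # differ more than sampling variation alone would explain.
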
Karen Fort | 2 Feb 11:25 2012

Re: Ambiguous words in English and their frequency

Hi all,

I could not find the time to make my question more precise, and in the 
meantime I received a lot of very interesting answers and references.
Thank you all for this!

In fact, I should have said that I'm looking for the number of word 
tokens that are ambiguous in terms of POS in an English corpus, for 
example the Penn Treebank. One solution would be to compute this 
myself from the Brown corpus, but I was curious whether there was a 
reference for this.

I found this reference for French, which reports that 60% of the 
French tokens in their corpus were unambiguous in terms of POS:
Tzoukermann, E., Radev, D. R. & Gale, W. A. (1999). Tagging French 
without lexical probabilities -- combining linguistic knowledge and 
statistical learning. In Armstrong, S., Church, K., Isabelle, P., 
Tzoukermann, E. & Yarowsky, D. (eds.), Natural Language Processing 
Using Very Large Corpora. Kluwer Academic.

Of course, it all depends on the number of tags, how fine-grained 
they are, and so on. It only gives a very rough idea and should be 
taken in context, obviously. But that's all I need.
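
For what it is worth, a minimal sketch of the Brown-corpus computation
mentioned above (assuming NLTK and its tagged Brown corpus are
installed; the figure will of course depend on the tagset used):

    from collections import defaultdict
    from nltk.corpus import brown   # requires: nltk.download('brown')

    tagged = brown.tagged_words()

    # Tags observed for each (lower-cased) word form across the corpus
    tags_per_form = defaultdict(set)
    for word, tag in tagged:
        tags_per_form[word.lower()].add(tag)

    # A token counts as POS-ambiguous if its form occurs with more than one tag
    ambiguous = sum(1 for word, _ in tagged
                    if len(tags_per_form[word.lower()]) > 1)
    print(f"{ambiguous / len(tagged):.1%} of tokens are POS-ambiguous")

With the full Brown tagset this will report a different figure than a
coarser tagset would, which is exactly the caveat above.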

Best,

Karen

Le 26/01/2012 10:39, Eckhard Bick a écrit :
> Hello again,
>
> I forgot to add, that the ambiguous word tokens in my English test run
> amounted to 49.8%.
>
> Best,
> Eckhard
>
> On 2012-01-25 20:33, FORT, Karen wrote:
>> Hi all,
>>
>> I need to find this information (the proportion of ambiguous words in English and their frequency).
>> For example, we know that in French 8% of the words represent 30% of the ambiguity.
>> Of course, it's very rough, but it's only to have a rough idea.
>>
>> Can somebody help me with this (of course, I searched for a ref but could not find anything precise)?
>>
>> Thank you in advance,
>>
>> Regards,
>>
>>
>> Karën FORT
>> Ingénieure/Engineer et/and doctorante/PhD student
>> INIST-CNRS / LIPN
>> 2, allée de Brabois
>> 54500 Vandoeuvre-lès-Nancy
>> France
>> Bureau/Office: H112
>> +33 (0)3 83 50 46 36
>>
>> http://www-lipn.univ-paris13.fr/~fort/
>>
>
>

--
Karën FORT
Ingénieure/Engineer et/and doctorante/PhD student
INIST-CNRS / LIPN
2, allée de Brabois
54500 Vandoeuvre-lès-Nancy
France
Bureau/Office: H112
+33 (0)3 83 50 46 36

http://www-lipn.univ-paris13.fr/~fort/


