Thierry Fontenelle | 1 May 03:15 2008

Re: Distance & word context.

Hi Justin,

You might be interested in the following paper:

Grefenstette, G. (1996) "Evaluation Techniques for Automatic Semantic Extraction: Comparing
Syntactic and Window Based Approaches", in Boguraev and Pustejovsky (eds) Corpus Processing for
Lexical Acquisition, The MIT Press, 205-216.

It's not "new" any more, but it seems to correspond to the opposition between distance and roles within
grammatical constructs you are alluding to.

I hope it helps,

Thierry

Thierry Fontenelle
Microsoft Natural Language Group
Redmond, WA
thierryf <at> microsoft.com

-----Original Message-----
From: corpora-bounces <at> uib.no [mailto:corpora-bounces <at> uib.no] On Behalf Of J Washtell
Sent: Wednesday, April 30, 2008 2:50 PM
To: corpora <at> uib.no
Subject: [Corpora-List] Distance & word context.

Hello all,

This list is stimulating as always. I feel it is my turn to throw some
questions around :-)

Krishnamurthy, Ramesh | 1 May 12:31 2008

Re: Distance & word context.

I don't have a reference, but I think I remember Ken Church (?) mentioning 'long-range collocations, up to
c. 10k words apart' in c. 1992?

Ramesh Krishnamurthy
Lecturer in English Studies, School of Languages and Social Sciences,
Aston University, Birmingham B4 7ET, UK
Tel: +44 (0)121-204-3812 ; Fax: +44 (0)121-204-3766 [Room NX08, 10th
Floor, North Wing of Main Building]
http://www.aston.ac.uk/lss/school/staff/krishnamurthyr.jsp
Director, ACORN (Aston Corpus Network project): http://acorn.aston.ac.uk/

-----Original Message-----
From: corpora-bounces <at> uib.no [mailto:corpora-bounces <at> uib.no] On Behalf Of J Washtell
Sent: 30 April 2008 22:50
To: corpora <at> uib.no
Subject: [Corpora-List] Distance & word context.

Hello all,

This list is stimulating as always. I feel it is my turn to throw some
questions around :-)

Can anybody point me towards works (however old or new) that exploit
the distance between terms in a corpus (such as, but not restricted
to, the use of "distance-weighted" context windows). The specific
applications are not important; I am interested in any works that deal
with the concept of distance as opposed to (or in addition to) say
frequency counts or roles/positions within grammatical constructs.

Related to this, I am also interested in any work that courts the
(Continue reading)

David Reitter | 1 May 14:45 2008

Re: Distance & word context.

On 1 May 2008, at 11:31, Krishnamurthy, Ramesh wrote:

> I don't have a reference, but I think I remember Ken Church (?)  
> mentioning 'long-range collocations, up to c. 10k words apart' in c.  
> 1992?

I know of this paper:

@inproceedings{church2000noriegas,
	Address = {Saarbr{\"u}cken, Germany},
	Author = {Kenneth W. Church},
	Booktitle = {Proceedings of the 18th conference on Computational linguistics (COLING)},
	Pages = {180-186},
	Title = {Empirical Estimates of Adaptation: The chance of Two Noriegas is closer to $p/2$ than $p^2$},
	Year = {2000}}

J Washtell wrote:

> Can anybody point me towards works (however old or new) that exploit
> the distance between terms in a corpus (such as, but not restricted
> to, the use of "distance-weighted" context windows). The specific
> applications are not important; I am interested in any works that deal
> with the concept of distance as opposed to (or in addition to) say
> frequency counts or roles/positions within grammatical constructs.

We've been exploiting the distance between repeated syntactic  
constructions (not terms) in our work on structural (syntactic)  
priming.  For a more applied paper, see
(Continue reading)

Seth Grimes | 1 May 16:39 2008

Sentence diagramming software

Does anyone know of free or on-line automatic sentence-diagramming 
software?  I'm looking for software that would parse text, identify parts 
of speech, and create the kind of *graphical* diagrams that children learn 
to make in grammar school.  I'm looking for something that goes beyond 
drawing software that you can use to diagram manually, and again stress 
the graphical part.

FYI, there's a pretty nice, on-line, non-graphical diagrammer at 
http://1aiway.com/nlp4net/services/enparser/Default.aspx .

Thanks,

 				Seth

--
Seth Grimes   Alta Plana Corp, analytical computing & data management
               Intelligent Enterprise magazine (CMP), Contributing Editor
grimes <at> altaplana.com       http://altaplana.com    301-270-0795

_______________________________________________
Corpora mailing list
Corpora <at> uib.no
http://mailman.uib.no/listinfo/corpora

John F. Sowa | 1 May 18:06 2008

Re: Distance & word context.

JW> I am interested in any works that deal with the concept of
 > distance as opposed to (or in addition to) say frequency counts
 > or roles/positions within grammatical constructs.

There are many different notions of 'distance'.  One example
is the term "semantic distance", for which Google provides
26,700 hits.

 > does anybody know of any (perhaps more linguistically oriented)
 > works that discuss the existence/importance of *very* long range
 > dependencies and associations in text (e.g. Dear... Yours,
 > Results... Conclusion, etc), and the role these play when
 > considering word context.

Some such connections can be defined by context-free rules:

    Letter  ->  Salutation  Body  ComplimentaryClose

The two terms "salutation" and "complimentary close" give
16,400 hits on Google.
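
As a rough illustration of how such a rule constrains a text regardless of how far apart its two ends are, here is a minimal Python sketch. The cue words "Dear" and "Yours" are taken from the example in the question; the function names and the line-labelling heuristic are assumptions made for this sketch, not part of any existing tool.

```python
# The rule from the message: Letter -> Salutation Body ComplimentaryClose
RULE = ("Salutation", "Body", "ComplimentaryClose")

def label(line):
    """Assign a category to one line using hypothetical cue words."""
    if line.startswith("Dear"):
        return "Salutation"
    if line.startswith("Yours"):
        return "ComplimentaryClose"
    return "Body"

def is_letter(lines):
    """Return True if the labelled lines match the Letter rule."""
    labels = []
    for line in lines:
        tag = label(line)
        # Merge a run of body lines into a single Body constituent,
        # so the body may be arbitrarily long.
        if not (tag == "Body" and labels and labels[-1] == "Body"):
            labels.append(tag)
    return tuple(labels) == RULE

letter = [
    "Dear Justin,",
    "Thank you for your question.",
    "More text follows.",
    "Yours, John",
]
fragment = ["Thank you for your question.", "Yours, John"]
```

Here is_letter(letter) holds however many body lines separate the salutation from the close, while is_letter(fragment) does not, because the salutation is missing.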

John Sowa


Noemie Elhadad | 1 May 21:05 2008

Postdoctoral Position in Natural Language Processing – Dept. of Biomedical Informatics, Columbia University

Postdoctoral Position
Natural Language Processing Group
Department of Biomedical Informatics
Columbia University

A 1- to 2-year postdoctoral position is available starting Summer 2008
in the Natural Language Processing Group in the Biomedical Informatics
Department at Columbia University.

The project investigates the automatic processing of health news.
Experience with techniques in statistical natural language processing
and machine learning and fluent programming abilities are highly
desirable. Salary and benefits commensurate with experience.

Applicants must have obtained a PhD in computer science, biomedical informatics
or information science within the past two years; solid experience with
machine learning and/or statistical natural language processing;
strong programming skills; at least two first-author papers in English
in previous area of research; and good verbal and written
communication skills.

To apply, please send your CV (including publication record and the
names and contact information of 3 references), a brief (1-2 pages)
description of your thesis work and related research interests, and
your two best publications to:

Noémie Elhadad - noemie <at> dbmi.columbia.edu
Assistant Professor
Department of Biomedical Informatics
Columbia University

J Washtell | 1 May 21:22 2008

Re: Distance & word context.

Hello,

Thank you to everybody who has contributed to my search. So far it's  
helped me un-earth a small pile of extra relevant material (including  
some I had accidentally buried :-)).

Kudos!

Justin Washtell
University of Leeds



J Washtell | 1 May 21:24 2008

Fwd: Re: Distance & word context.

John,

Thank you very much for your feedback.

I was referring to what one might call the linear "physical distance"  
or "narrative distance" in corpora (as would correlate with the time  
between terms occurring as a reader reads, if you like). Hence my  
citing "distance-weighted context windows" as an example of one way in  
which this is considered (also referred to as "ramped" windows etc).  
As there doesn't seem to be a consistent established terminology - at  
least none that I'm familiar with - Google is unsatisfactory by  
itself. I'm definitely not asking about semantic distance.
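
To make the idea concrete, here is a minimal Python sketch of one possible distance-weighted ("ramped") context window. The linear weighting scheme, the function name and the toy sentence are all assumptions chosen for illustration; other weightings (e.g. inverse distance) would fit the same description.

```python
from collections import defaultdict

def ramped_cooccurrence(tokens, window=5):
    """Accumulate distance-weighted co-occurrence scores.

    Two tokens d positions apart (1 <= d <= window) contribute a
    weight of (window - d + 1) / window, so adjacent words count 1.0
    and words at the edge of the window count 1 / window.
    """
    scores = defaultdict(float)
    for i, target in enumerate(tokens):
        for d in range(1, window + 1):
            j = i + d
            if j >= len(tokens):
                break
            weight = (window - d + 1) / window
            # Store each unordered pair under a sorted key.
            key = tuple(sorted((target, tokens[j])))
            scores[key] += weight
    return dict(scores)

tokens = "dear sir thank you for your letter yours sincerely".split()
scores = ramped_cooccurrence(tokens, window=3)
```

With window=3, the adjacent pair ("dear", "sir") receives the full weight of 1.0, ("dear", "thank") receives 2/3, and pairs more than three positions apart are not counted at all. A plain frequency count would instead give every pair inside the window the same weight.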

What I certainly didn't make clear is that I'm particularly interested  
in approaches that have been used to *mine* these relationships from  
corpora, as well as general linguistic discussions concerning their  
existence, rather than formal ways in which they can be expressed (as  
per context-free grammar) - although I am certainly interested in the  
models that have made such mining possible.

Do you have any more insights which I might find useful in light of  
this? Perhaps something that you might expect to have fewer hits  
(owing to our now hopefully increased precision)?

Many thanks!

Justin Washtell
University of Leeds

Quoting "John F. Sowa" <sowa <at> bestweb.net>:

Paula Newman | 1 May 22:29 2008

Re: Fwd: Re: Distance & word context.


Justin,
Particularly with regard to 

> does anybody know of any (perhaps more linguistically oriented)
> works that discuss the existence/importance of *very* long range
>  dependencies and associations in text (e.g. Dear... Yours,

For a related case of analyzing email messages into components,
both 
(a) the EMU work on text-to-speech (e.g., Richard Sproat, Jianying Hu, Hao
Chen, "EMU: An E-mail Preprocessor for Text-to-Speech," IEEE Signal
Processing Society 1998 Workshop on Multimedia Signal Processing, Los
Angeles, CA)
and
(b)  my work for presenting and summarizing email-based discussion lists
(Newman, P. S. Exploring discussion lists: steps and directions.
Proceedings of the Second Joint ACM/IEEE-CS Conference on Digital Libraries
(JCDL 02))

are relevant.  The approaches used are similar, employing manually weighted
finite-state machines to find the best decompositions.

Paula


Iztok Kosem | 2 May 08:48 2008

REMINDER - Aston Corpus Symposium



We would like to remind you about the following two events in May 2008. There are still spaces left; however, we would advise you to register as soon as possible to avoid disappointment.


Aston University, Birmingham, UK
School of Languages and Social Sciences
ISLS - Institute for the Study of Language and Society

ASTON CORPUS SYMPOSIUM 2008: Friday 23rd May 2008
http://acorn.aston.ac.uk/symposium.html

Speakers:
Svenja Adolphs (University of Nottingham)
Khurshid Ahmad (Trinity College, Dublin)
Kate Beeching (University of the West of England)
Silvia Bernardini (University of Bologna)
Ramesh Krishnamurthy (Aston University)
Kieran O'Halloran (Open University)
Tony McEnery (Lancaster University)
Paul Thompson (University of Reading)
Chris Tribble (King's College, London)

preceded by

POSTGRADUATE CONFERENCE IN CORPUS LINGUISTICS: Thursday 22nd May 2008
http://acorn.aston.ac.uk/postgraduate_conference.html


Please find below the registration fees:

Standard rate:
Corpus Symposium - £50
Postgraduate conference - £20
Both events - £60

Unwaged students:
Corpus Symposium - £25
Postgraduate conference - £15
Both events - £30

The registration form can be found at http://acorn.aston.ac.uk/sym_registration.html.


Conference organisers:
Iztok Kosem (kosemi <at> aston.ac.uk)
Guadalupe Ruiz Yepes (g.r.yepes <at> aston.ac.uk)


