Djamé Seddah | 1 Nov 2009 15:39

Re: Open-source French syntactic parser???

Hi,
As there are many systems available for French, here is a summary of the
tools I'm aware of that are currently maintained and easily available
(please forgive me if I have forgotten some):

----------------------------------------------
Wide coverage Symbolic systems
-----------------------------------------------
** TAG based
  FRMG (French Metagrammar) based parser. This one is based on Tree
Adjoining Grammars and comes with a very efficient disambiguation tool.
The installer is here: http://alpage.inria.fr/alpi.fr.html
  Beware: it's a very big system, and the parser doesn't run (yet) on
64-bit systems.

** LFG based
  SxLFG: a very fast LFG parser. Its grammar is derived from a previous
version of FRMG, and it comes with a huge lexicon
(http://alpage.inria.fr/~sagot/sxlfg.html). I think you can ask the
author for a link to an SVN page.

** Interaction grammars
  This one has a nice graphical interface and makes use of a supertagger:
http://leopar.loria.fr/

** Property Grammars
  This one is from Marseilles (see www.lpl.univ-aix.fr/~blache/projets.html),
but the site is currently down, so I don't know if it's downloadable
(I think you might have to sign a licence).


Fukun Xing | 1 Nov 2009 13:47

About Part of Speech in English and Chinese

Hi everybody,
   I am puzzled by the part of speech of "chief" in the phrase "the chief executive officer". In the Penn Treebank, "chief" in this phrase is sometimes tagged "JJ" and sometimes "NN". Could you tell me how you would tag it, and why? And is it safe to say that English has some POS ambiguities that cannot be resolved even by humans? I know it may be true in Chinese that it is sometimes impossible for a human to decide the right POS of a word. For example, in "一件 包装/v n 精美 的 礼品" (1. a present with wonderful decoration; 2. a present decorated wonderfully), "包装" (decorate/decoration) can be tagged as either a noun or a verb. Both are correct, and the choice does not affect the correct understanding of the sentence. If there is such a thing in English, can you give some examples?
 Thanks in advance!

Xing
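The JJ/NN split described above can be measured directly in a tagged corpus. Below is a minimal Python sketch. It assumes a simple whitespace-separated word/TAG format, which is a simplification: real treebank files carry brackets and other markup that would need extra handling. The function name `tag_counts` and the sample sentences are hypothetical and made up for illustration. They say nothing about actual Penn Treebank counts.

```python
from collections import Counter

# Tags treated as "noun" for the following token (an assumption; adjust to the tagset in use).
NOUN_TAGS = ("NN", "NNS", "NNP", "NNPS")


def tag_counts(tagged_text, word, next_tags=NOUN_TAGS):
    """Count the tags given to `word` when the next token carries a noun tag.

    Assumes whitespace-separated word/TAG tokens; this is a hypothetical
    simplification, since real treebank files need more careful parsing.
    """
    target = word.lower()
    tokens = []
    for tok in tagged_text.split():
        # rpartition splits on the last "/", so words containing "/" stay intact.
        w, _, t = tok.rpartition("/")
        tokens.append((w, t))

    counts = Counter()
    # Pair each token with its right-hand neighbour.
    for (w, t), (_, next_t) in zip(tokens, tokens[1:]):
        if w.lower() == target and next_t in next_tags:
            counts[t] += 1
    return counts


# Made-up toy input for illustration, not real Penn Treebank data.
sample = (
    "the/DT chief/JJ executive/NN officer/NN resigned/VBD ./. "
    "the/DT chief/NN executive/NN officer/NN said/VBD so/RB ./. "
    "a/DT chief/JJ concern/NN ./."
)

print(tag_counts(sample, "chief"))  # Counter({'JJ': 2, 'NN': 1})
```

On a real tagged corpus, a mix of JJ and NN counts for the same prenominal position would reflect the kind of inconsistency Xing reports.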

_______________________________________________
Corpora mailing list
Corpora <at> uib.no
http://mailman.uib.no/listinfo/corpora
Eugenie Giesbrecht | 1 Nov 2009 23:54

Post-Doc/Assistant Professor in NLP and Machine Learning (Karlsruhe)

The Knowledge Management research group at the Institute of Applied
Informatics and Formal Description Methods (AIFB,
http://www.aifb.uni-karlsruhe.de/Forschungsgruppen/WBS) is an
interdisciplinary team of computer scientists and researchers from related
disciplines, and is among the leading international institutions in the
field of semantic technologies. In this context, the institute seeks to
fill a position for a

Postdoctoral Researcher / Assistant Professor
(TV-L E14 German public service salary scale)

for two years, with renewal possible.

We are seeking applications from individuals with a very good Ph.D.
and expertise at an international level in at least one of the following
areas: Large-Scale Information Retrieval, Text Mining, Natural Language
Processing, Machine Learning, or Data Mining. Further, we require
general competencies that qualify you to acquire and coordinate
research projects, supervise Ph.D. students and participate in
teaching. The position requires advanced German and English language
skills.

We offer a first-class and international research environment with
substantial freedom in your thematic orientation and research
activities. The close cooperation with the FZI Research Center for
Information Technology in Karlsruhe, the Karlsruhe Service Research
Institute (KSRI) and a large network of industry partners allows
for the combination of fundamental and applied research and a fast
transfer of research results into practice.

The Karlsruhe Institute of Technology is an equal opportunities
employer and welcomes applications from women. Among equally qualified
applicants, preference will be given to persons with disabilities.

For further questions with respect to this position, please contact
Sebastian Blohm (blohm <at> kit.edu).

For best consideration, please submit your application, preferably
electronically as a PDF, to bewerbung-studer <at> aifb.uni-karlsruhe.de or
to Karlsruhe Institute of Technology, Prof. Dr. Rudi Studer, Institut
AIFB 11.40, D-76128 Karlsruhe. The deadline for applications is
November 30th. However, applications will continue to be reviewed until
the position is filled.

This posting is available online:
http://www.aifb.uni-karlsruhe.de/WBS/seb/E14-AIFB.ml-nlp-web-en.pdf



Mike Scott | 2 Nov 2009 10:47

Re: About Part of Speech in English and Chinese

I think there are two different aspects here. One is that, since linguistic categories aren't well established, POS categories won't be either, as they ultimately derive from linguistic theory. If we take cases like
(1) church tower
(2) tall tower
it is clear that the modifier in (2) is adjectival, but in the case of (1) some linguistic theories will call "church" a noun (because that word-form is arguably used mainly as a noun) while others would call it an adjective because here it premodifies a noun. The former theories seem to act as if word-forms had a primary POS, rather as people have their gender determined before birth, while the latter allow for the possibility that words may swing both ways, so to speak, depending on the company they keep.

The second aspect concerns the information supplied in the context or inferable from it. Take the case of
(3) ... chief distribution ...
English simply does not tell us, without more context, whether we are talking of the way chiefs (e.g. tribal chiefs) are distributed through a population or territory, or of the main patterns of distribution of something. Either way, "chief" premodifies "distribution". In such a case the context may or may not disambiguate, so for those linguists who think word-forms have a predetermined POS, tagging will necessarily vary.

Cheers -- Mike

-- 
Mike Scott
***
If you publish research which uses WordSmith, do let me know so I can include it at
http://www.lexically.net/wordsmith/corpus_linguistics_links/papers_using_wordsmith.htm
***
University of Aston and Lexical Analysis Software Ltd.
mike.scott <at> aston.ac.uk
www.lexically.net
Adam Kilgarriff | 2 Nov 2009 11:06

Re: About Part of Speech in English and Chinese

To add to Mike's response, my particular bugbears are not only noun/adjective ambiguities like "chief" (and many others: male, female, gold, silver, ...) but also past-participle/adjective ambiguities and, worst of all, -ing forms, which can float between nouns, verbs and adjectives in a most licentious manner (and if they modify another word, you don't know whether the underlying relationship is SUBJ or OBJ, as in Chomsky's "visiting relatives").

They are the cause of a lot of the noise in what we do for English.

Adam

--
================================================
Adam Kilgarriff                                      http://www.kilgarriff.co.uk              
Lexical Computing Ltd                   http://www.sketchengine.co.uk
Lexicography MasterClass Ltd      http://www.lexmasterclass.com
Universities of Leeds and Sussex       adam <at> lexmasterclass.com
================================================
Thierry Poibeau | 2 Nov 2009 14:17

CONF: Workshop on Language Acquisition in Cambridge UK

We are pleased to announce a workshop on language acquisition entitled

MULTIMODAL APPROACHES TO LANGUAGE ACQUISITION

26-28 November 2009, RCEAL, Cambridge UK (http://www.rceal.cam.ac.uk/)
Attendance is FREE.

http://www-lipn.univ-paris13.fr/~poibeau/acquisition.html

Please circulate the information.

DESCRIPTION

The study of language acquisition is of paramount importance for
linguistics, psychology, cognitive science, communication, etc. Children
produce forms that look or sound like sketches of adult forms. Those
productions cannot be analyzed without special attention to gestures, gaze
and facial expressions, as well as to the context, the positioning of
interlocutors in space, and the specificity of discourse objects. The child
language community shares tools and data through the Internet (especially
the CHILDES database) that can be used as a basis for multimodal,
multilingual and interdisciplinary research.

PROGRAMME

The workshop will be organized over four half-days, focusing on
different aspects of language acquisition.

Thursday 26 November (afternoon, room GR06/07) - Corpus, coding and multimodality
  - 2-3pm: Corpus, coding and metadata (Christophe Parisse, Modyco-Inserm, CNRS-U. Paris Ouest Nanterre)
  - 3-4pm: Pointing gesture and multimodality (Emmanuelle Mathiot, STL, UMR-CNRS 8163 and Univ. Lille 3; Aliyah Morgenstern, Univ. Sorbonne Nouvelle; Marie Leroy, CNRS-MoDyCo and Univ. Paris Descartes)
  - 4-4:30pm: Coffee break
  - 4:30-5:30pm: Discussion

Friday 27 November (morning, room GR05) - Child Language Argumentation
  - 9:30-10:30am: Argumentation as a motive for syntax development: a case study of the development of "parce que" in child language (Martine Sekali, Univ. Paris Ouest Nanterre)
  - 10:30-11am: Coffee break
  - 11am-12pm: From repairs to self-repairs in adult-child interactions (Marie Leroy, Univ. Paris Descartes; Stéphanie Caet, Univ. Sorbonne Nouvelle; Aliyah Morgenstern, Univ. Sorbonne Nouvelle)
  - 12-12:30pm: Discussion

Friday 27 November (afternoon, room GR05) - Contrastive studies
  - 2-3pm: Over-informative children: Production/comprehension asymmetry, or tolerance to pragmatic violations? (Cat Davies, Univ. of Cambridge)
  - 3-4pm: Acquiring tense and aspect in Tamil (Dr. Lavanya Sankaran, Univ. of London, Queen Mary)
  - 4-4:30pm: Coffee break
  - 4:30-5:30pm: Comparing processes in child L1 and child and adult L2 acquisition (Henriette Hendriks and Helen Engemann, Univ. of Cambridge)

Saturday 28 November (morning, room GR06/07) - Reformulations
  - 10-11am: The acquisition and development of argumentative skills in children from 4 to 18: mechanisms underlying deductive and probabilistic reasoning (Jodi Tommerdahl, University of Birmingham)
  - 11am-12pm: General discussion

Ample time will be devoted to discussion, with panelists commenting after
each paper presentation.

Workshop Chairs

  - Henriette Hendriks (RCEAL, University of Cambridge, UK)
  - Aliyah Morgenstern (Univ. Sorbonne Nouvelle, France)
  - Thierry Poibeau (CNRS and Université Paris 13, France)

The workshop will be held on the ground floor (rooms GR05, GR06 and GR07)
of the Research Centre for English and Applied Linguistics, 9 West Road,
Cambridge, UK (Faculty of English).

More information:
http://www-lipn.univ-paris13.fr/~poibeau/acquisition.html


Mike Scott | 2 Nov 2009 15:17

Re: About Part of Speech in English and Chinese

Simon Smith said

> @Mike: Can you say any more about the theory that words do not have a pre-determined POS? I have to confess I had always thought the interpretation of "church" in "church tower" as an adjective to be nothing more than a non-linguist's misunderstanding.
>
> It seems more economical to say that any noun can (in principle) modify any other noun, as part of a noun compound, than to record an additional POS, namely adjective, for every single noun in the lexicon. I would say, too, that there is no more reason to make out a special case for substance-item compounds, such as "gold watch" and "lead balloon", than there is for "radium watch" (G Pullum's example) or "iridium balloon".
>
> However, theories which assign POS elsewhere than in the lexicon would cause problems for that explanation (as well as for lexicographers, I would have thought).

That assumes there is a "true" POS which each word possesses. A rose is a noun is a noun, so to speak. Wouldn't it be more economical still, though, to say there are roles, and that almost any word can take on almost any role? So that in the case of
(1) That "through" I saw in your essay should have been a "throughout"
both "through" and "throughout" are playing noun roles, and we know that because they are preceded by "that" and "a" respectively.

Cheers -- Mike
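Mike's contrast between a POS fixed in the lexicon and a role read off the context can be made concrete with a toy example. Below is a minimal Python sketch. The lexicon, the determiner list and the function name `lexical_and_role` are all hypothetical, and the single rule (a word directly after a determiner takes a noun role) is deliberately crude. For instance, it ignores that "that" can also be a complementizer.

```python
# Hypothetical toy lexicon recording one "primary" POS per word-form.
LEXICON = {
    "that": "DET", "a": "DET", "your": "DET",
    "through": "PREP", "throughout": "PREP", "in": "PREP",
    "i": "PRON", "saw": "VERB", "should": "AUX", "have": "AUX",
    "been": "AUX", "essay": "NOUN",
}
# Words treated as determiners by the contextual rule (an assumption).
DETERMINERS = {"a", "an", "the", "that", "this", "your"}


def lexical_and_role(words):
    """Pair each word with its lexicon POS and a context-based role.

    The role is "NOUN" whenever the previous word is a determiner;
    otherwise it falls back to the lexicon entry ("UNK" if missing).
    """
    result = []
    for i, word in enumerate(words):
        lexical = LEXICON.get(word.lower(), "UNK")
        prev = words[i - 1].lower() if i > 0 else None
        role = "NOUN" if prev in DETERMINERS else lexical
        result.append((word, lexical, role))
    return result


sentence = "That through I saw in your essay should have been a throughout".split()
for word, lexical, role in lexical_and_role(sentence):
    if lexical != role:
        print(f"{word}: lexicon says {lexical}, context says {role}")
# through: lexicon says PREP, context says NOUN
# throughout: lexicon says PREP, context says NOUN
```

Under the lexicon view, "through" and "throughout" stay prepositions here. Under the role view, the preceding "that" and "a" are what make them nouns in this sentence.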
Xing Fukun | 2 Nov 2009 15:11

Re: Chinese and English POS

Hi,
Thanks very much for all the replies to my question. I will sum up all the relevant responses and share them with all the members later.
Simon said the Chinese sentence in my last mail could not be displayed. The sentence is:
一件 包装/v n 精美 的 礼品
a present with wonderful decoration
a present decorated wonderfully
 
Best regards

Xing

Xing Fukun
PhD candidate in Language Engineering
Center for Language Information Processing
Beijing Language and Culture University
北京语言大学语言信息处理研究所
2009-11-02
From: simon smith
Sent: 2009-11-02 21:06:33
To: xingfukun001 <at> gmail.com; CORPORA <at> uib.no; mike <at> lexically.net; Adam Kilgarriff
Cc:
Subject: Chinese and English POS
 

> Hi everybody,
>
> I am puzzled by the part of speech of "chief" in the phrase "the chief executive officer".

@Fukun


I think that your Chinese example sounds interesting, but unfortunately I can't read it. Can you try again, with a Unicode font?

 

"Chief" is a legitimate example of a word that has distinct noun and adjective readings. As an adjective it means "main" or "principal"; as a noun it means "person in charge". A "chief executive officer" is of course a "person in charge" too, but the "chief" part should still be analyzed as "main", I think.

 

@Mike: Can you say any more about the theory that words do not have a pre-determined POS? I have to confess I had always thought the interpretation of "church" in "church tower" as an adjective to be nothing more than a non-linguist's misunderstanding.

 

It seems more economical to say that any noun can (in principle) modify any other noun, as part of a noun compound, than to record an additional POS, namely adjective, for every single noun in the lexicon. I would say, too, that there is no more reason to make out a special case for substance-item compounds, such as "gold watch" and "lead balloon", than there is for "radium watch" (G Pullum's example) or "iridium balloon".

 

However, theories which assign POS elsewhere than in the lexicon would cause problems for that explanation (as well as for lexicographers, I would have thought).

 

Simon

 

Replies in Chinese are welcome.

Simon Smith, PhD
Assistant Professor
Foreign Language Center
National Chengchi University
政大外文中心助理教授
http://www3.nccu.edu.tw/~smithsgj/



Call for papers: NEWS, (NEW) MEDIA, AND CORPORA: FROM METHODOLOGY TO THEORY

Call for Papers

WORKSHOP: NEWS, (NEW) MEDIA, AND CORPORA: FROM METHODOLOGY TO THEORY
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Giessen, May 26, 2010

To be held in conjunction with ICAME 2010

	
TOPIC
The focus of this pre-conference workshop will be on a set of topical issues
pertaining to corpus-based studies on the language of news, including both
printed and broadcast news.
So far, corpus studies in this context have focused mainly on written media
and on the language of newspapers in particular. While not disregarding
these studies, the workshop is intended to also address the interplay of
different media in the actualization of news on television, radio and on the
Internet. For example, we will also look at blogs, podcasts, vodcasts, and
video sharing from a corpus-linguistic perspective.

Papers (20 mins + 10 mins discussion) are invited, mainly focusing on
methodological and theoretical issues concerning two lead questions: 
* How can corpora and corpus-linguistic methods be applied to the study of
news in old and new media, including the wide range of Internet-based
communication?
* How do corpora and corpus-linguistic methods have to change to come to
grips with this new multi-modal scenario?

DEADLINE
The deadline for the submission of abstracts (max. 400 words) is 20 December
2009. Please include your name and affiliation, and your snail mail and email
addresses, on the abstract and send it as a Word document to:
roberta.facchinetti <at> univr.it

NOTIFICATION of acceptance will be sent out by 31 January 2010.

WEBSITE:
http://www.uni-giessen.de/cms/faculties/f05/engl/ling/icame2010/home/prework/#2

I am looking forward to seeing you at the workshop.

Roberta Facchinetti
University of Verona
Via S. Francesco 22
37129 Verona
Italy


Mike Scott | 2 Nov 2009 15:54

Re: About Part of Speech in English and Chinese


> 1. We can say that '"through"' is a sort of shorthand for 'instance of 
> the word "through"'. Without the inverted commas or special intonation 
> it becomes difficult to interpret.
Maybe.
>  
> 2. Are you happy with "sooner" and "better" as nouns in "the sooner 
> the better"? Or "good" or "bad" or "ugly" in... well, you know. What 
> about "recently" as an adjective in (Pullum example again) "The winner 
> recently of [two prestigious awards]"? -- it modifies the noun 
> "winner", but it doesn't look like an adjective and doesn't even go in 
> the right place.
>  
I'm not saying there won't be difficulties, or that a noun can be clearly
recognised simply by virtue of its being preceded by "the", but I do think
that the assumption that a word has a "natural", in-built POS leads us
into greater difficulties. But I would not want to set myself up as a
grammarian or theoretical linguist. That was my twopennyworth. I'd be
quite happy thinking of the good, the bad and the ugly as nouns,
incidentally. Like the robbers, the bandits and the cowboys that they were.

Cheers -- Mike
