Mark Davies | 2 Jan 2011 00:55

COME and GO + ADJ

I know I've seen some corpus-based analyses of COME and GO + ADJECTIVE, e.g.:

GO: crazy, wrong, bankrupt, postal ( = "negative")
e.g. http://corpus.byu.edu/coca/?c=coca&q=8078028

COME: clean, true, alive, clear ( = "positive")
e.g. http://corpus.byu.edu/coca/?c=coca&q=8078026

Biber et al. (1999: 444-45) discuss it a bit, but I know that I've seen longer treatments of the topic from a
semantic prosody / corpus-based perspective. Searching LLBA and Google for these today, though, I can't
seem to find any of these articles again. Any pointers?

Thanks in advance,

Mark Davies

============================================
Mark Davies
Professor of (Corpus) Linguistics
Brigham Young University
(phone) 801-422-9168 / (fax) 801-422-0906
Web: http://davies-linguistics.byu.edu

** Corpus design and use // Linguistic databases **
** Historical linguistics // Language variation **
** English, Spanish, and Portuguese **
============================================
_______________________________________________
Corpora mailing list
Corpora <at> uib.no
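
The COME/GO + ADJ searches in the message above could be approximated offline roughly as follows. This is only a minimal sketch, not the COCA query behind the links: it assumes a corpus that is already POS-tagged with Penn Treebank style tags ("JJ" for adjectives), and the names LEMMA_FORMS and adjective_collocates, as well as the tiny sample, are invented for illustration. It only catches adjectives directly after the verb, so "go completely wrong" would be missed.

```python
from collections import Counter

# Inflected forms treated as the lemmas GO and COME (an assumption of this sketch).
LEMMA_FORMS = {
    "GO": {"go", "goes", "going", "went", "gone"},
    "COME": {"come", "comes", "coming", "came"},
}


def adjective_collocates(tagged_tokens, adj_tag="JJ"):
    """Count adjectives that immediately follow a form of GO or COME.

    tagged_tokens: list of (word, pos_tag) pairs from any POS tagger.
    adj_tag: tag that marks adjectives (Penn Treebank "JJ" assumed here).
    """
    counts = {lemma: Counter() for lemma in LEMMA_FORMS}
    # Walk over adjacent token pairs: (verb candidate, following token).
    for (word, _), (next_word, next_tag) in zip(tagged_tokens, tagged_tokens[1:]):
        for lemma, forms in LEMMA_FORMS.items():
            if word.lower() in forms and next_tag == adj_tag:
                counts[lemma][next_word.lower()] += 1
    return counts


# Tiny hand-tagged sample, invented for illustration only.
sample = [
    ("Things", "NNS"), ("went", "VBD"), ("wrong", "JJ"), (".", "."),
    ("The", "DT"), ("firm", "NN"), ("went", "VBD"), ("bankrupt", "JJ"), (".", "."),
    ("He", "PRP"), ("came", "VBD"), ("clean", "JJ"), (".", "."),
    ("Dreams", "NNS"), ("come", "VBP"), ("true", "JJ"), (".", "."),
    ("It", "PRP"), ("goes", "VBZ"), ("wrong", "JJ"), ("often", "RB"), (".", "."),
]

result = adjective_collocates(sample)
print("GO:  ", result["GO"].most_common())
print("COME:", result["COME"].most_common())
```

On a real corpus the resulting adjective lists would still need manual inspection before any claim about "positive" or "negative" prosody is made.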

Kristina Vuckovic | 3 Jan 2011 08:07

NooJ 2011 - 2nd Call for Papers

 

Second Call for Papers

 

NooJ 2011 Conference

Dubrovnik, Croatia

June 13-15, 2011

 

http://lt.ffzg.hr/nooj2011

 

***********************************************

The Faculty of Humanities and Social Sciences of the University of Zagreb (FFZG), the Croatian Language Technologies Society (HDJT), the Laboratoire de Sémio-Linguistique et Didactique (LASELDI) of the University of Franche-Comté and the Maison des Sciences de l'Homme et de l'Environnement Ledoux are organizing the 2011 NooJ Conference in Dubrovnik, Croatia, on 13-15 June 2011.

 

NooJ is both a corpus processing tool and a linguistic development environment. It allows linguists to formalize several levels of linguistic phenomena: orthography and spelling; lexicons of simple words; multiword units and frozen expressions; inflectional, derivational and productive morphology; local, structural syntax and transformational syntax. For each of these levels, NooJ provides linguists with one or more formal tools specifically designed to facilitate the description of each phenomenon, as well as parsing tools designed to be as computationally efficient as possible. This approach distinguishes NooJ from most computational linguistic tools, which provide a single formalism intended to describe every linguistic phenomenon. As a corpus processing tool, NooJ allows users to apply sophisticated linguistic queries to large corpora in order to build indices and concordances, annotate texts automatically, perform statistical analyses, etc.

 

NooJ is freely available and linguistic modules can already be freely downloaded for Acadian, Arabic, Armenian, Bulgarian, Catalan, Chinese, Croatian, French, English, German, Hebrew, Greek, Hungarian, Italian, Polish, Portuguese, Spanish and Turkish. A dozen other modules are under construction.

 

******************************

The conference intends to:

******************************

* give NooJ users and researchers in Linguistics and in Computational Linguistics the opportunity to meet and share their experience as developers, researchers and teachers;

* present to NooJ users the latest linguistic resources and NLP applications developed for/with NooJ, its latest functionalities, as well as its future developments;

* offer researchers and graduate students two tutorials (one basic and one advanced) to help them parse corpora and build NLP applications using NooJ;

* provide the occasion to present and discover the recent developments of NooJ itself (v3.0).

 

*******************

Topics of interest:

*******************

* Semantic analysis

* Syntactic analysis

* Lexical analysis

* Linguistic resources

* Dictionaries

 

***************

Submission:

***************

We invite the submission of abstracts in English until 17 January 2011. Abstracts should contain the title and the name, institution and email of the author(s). They should not exceed one page (between 400 and 600 words) and should be sent to nooj2011 <at> ffzg.hr. All proposals will be reviewed by the scientific committee, and authors will be notified of acceptance no later than 15 March 2011.

 

Further information about the conference can be found at http://lt.ffzg.hr/nooj2011/. You can also contact the organizing committee at nooj2011 <at> ffzg.hr for any additional information.

 

*******************

Important dates:

*******************

Abstract submission: 17 January 2011

Notification of acceptance: 15 March 2011

Registration: until 15 April 2011

Conference dates: 13-15 June 2011

 

************************

Scientific Committee:

************************

* Abdelmajid Ben Hamadou (MIRACL, ISIM-Sfax, Tunisia)

* Božo Bekavac (University of Zagreb, Croatia)

* Yaakov Bentolila (Ben-Gurion University, Israel)

* Xavier Blanco (Autonomous University of Barcelona, Spain)

* Krzysztof Bogacki (University of Warsaw, Poland)

* Gisele Chevalier (University of Moncton, Canada)

* Anaid Donabédian (INALCO, Paris)

* Zdravko Dovedan (University of Zagreb, Croatia)

* Chantal Enguehard (LINA, UMR CNRS 6241, France)

* Zoé Gavriilidou (Democritus University of Thrace, Greece)

* Svetla Koeva (Sofia University, Bulgaria)

* Denis Le Pesant (University Paris 10, France)

* Peter Machonis (Florida International University, USA)

* Slim Mesfar (ISI-Tunis, Tunisia)

* Claude Montacié (Université Paris 4, France)

* Odile Piton (University Paris 1, France)

* Max Silberztein (University of Franche-Comté, France)

* Marko Tadić (University of Zagreb, Croatia)

* Tamás Váradi (Budapest Academy of Sciences, Hungary)

* Simona Vietri (University of Salerno, Italy)

* Duško Vitas (University of Belgrade, Serbia)

* Kristina Vučković (University of Zagreb, Croatia)

 

**************************

Organizing Committee:

**************************

* Željko Agić (University of Zagreb, Croatia)

* Božo Bekavac (University of Zagreb, Croatia)

* Marko Tadić (University of Zagreb, Croatia)

* Kristina Vučković (University of Zagreb, Croatia)

 

 

 

_______________________________________________
Corpora mailing list
Corpora <at> uib.no
http://mailman.uib.no/listinfo/corpora
طه taha | 3 Jan 2011 09:56

Arabic Corpora resource now available


Arabic Corpora resource now available at
    http://aracorpus.e3rab.com/index.php?content=english

The Arabic Corpora resource project aims to list freely accessible Arabic
corpora, to help computational linguists carry out their work.

Ajdir Corpora

    * Collector: Dr Ahmed Abdelali
    * Access: Free
    * References: Journals
    * Size: 113 million words, 800 MB
    * Link: compressed file (tar.gz)
      individual files

Arabic corpus (Watan&Khaleej)

    * Collector: Dr Mourad Abbes
    * Access: Free
    * References: Journals (AlWatan & Alkhaleej)
    * Size: 14 MB
    * Link: Arabic Corpus

Arabic Corpora

    * Collector: Dr. Latifa Al-Sulaiti
    * Access: Free
    * References: Journals
    * Size: -
    * Link: Arabic corpus

Open Source Arabic Corpora

    * Collector: Dr Saad Motaz
    * Access: Free
    * References: Journals
    * Size: 20 million words, 15 MB
    * Link: ar-text-mining

Arabic Words Corpora

    * A list of words
    * Collector: Muayyed Al-Saadi
    * Access: Free
    * References: Thwab library
    * Size: 1.5 million words, 6 MB (zip)
    * Link: Arabic word corpus


Sondes BANNOUR | 3 Jan 2011 16:22

Looking for resources

Hello,

For my thesis, I need data to work on; more precisely, in order of preference:

 * An ontology (or a taxonomy) and a corpus semantically annotated with respect to that ontology
 * An ontology (or a taxonomy) and a corpus from the same domain, or the corpus from which the ontology was built
 * An ontology for a domain that does not require much expertise

As I am aware that such resources are rare, any alternative would be a great help.

Thanks in advance,
Best regards,
--
Sondes BANNOUR

LIPN - CNRS UMR 7030
Université Paris 13
99, av. J-B. Clement
93430 Villetaneuse FRANCE
Tel. 01 49 40 40 82
Fax. 01 48 26 07 12
Email: sondes.bannour <at> lipn.univ-paris13.fr
http://www-lipn.univ-paris13.fr/~bannour/

John MCKENNY | 3 Jan 2011 16:43

Stop-list in WordSmith Tools Concord

Could anyone help me with a problem I have encountered with WordSmith Tools? I am working with a large corpus of more than 7 million words, and I want to focus on a large set of technical, sub-technical and content words without too much noise from function words. I made a stop-list of function words as instructed in the extensive help file, which mentions that the KeyWords, Concord and WordList tools can use a stop list. I loaded this .stp file, but each time I tested a small part of my corpus using "the" as a search word, I got a full concordance of "the" in that small section. My stop-list hadn't worked.
Have I misunderstood the use of a stop-list in Concord? Is there something wrong with my stop-list file or my use of the Concord settings?
Thank you for any help you might give.
All the best for 2011
John McKenny
 
 
 
 
Division of English Studies
University of Nottingham Ningbo, China
199 Taikang Dong Lu
Ningbo, Zhejiang Province
P.R.China   315100
Tel +86(0)574 8818 0271
Fax +86(0)574 8818 0125

Georg Marko | 3 Jan 2011

Re: Stop-list in WordSmith Tools Concord

Dear John,

As far as I know, the stoplist works with lists rather than with concordances. So use any noun as a search word
and check the Collocates for function words in order to see whether your stoplist works.

Normally, you should not need a stoplist for concordances (i.e. you would not search for "the" if you know
that the programme will exclude examples with "the"). If you use wildcards (e.g. words starting with
"the*"), you would need to use "Advanced" and "exclude if context contains", listing all the words to
exclude (e.g. the/they/them/these/then/there...) with a horizon of L0/R0. I do not think that it is
possible to use lists in this function, as normally the relevant sets should be limited.

I hope I have not completely misunderstood your question.

All the best

Georg
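
The wildcard search with an exclusion list described in the reply above can be illustrated outside WordSmith. The following is a minimal sketch in plain Python, not WordSmith's own mechanism or API: the function name concordance, its parameters and the sample sentence are invented for illustration, and the exclusion applies only to the node word itself, which is loosely what a horizon of L0/R0 amounts to.

```python
from fnmatch import fnmatchcase


def concordance(tokens, pattern, exclude=(), width=3):
    """Return (left context, node, right context) for wildcard matches.

    pattern: shell-style wildcard matched against the lowercased token.
    exclude: node words to drop even though they match the pattern.
    width: number of context tokens kept on each side.
    """
    excluded = {w.lower() for w in exclude}
    lines = []
    for i, tok in enumerate(tokens):
        node = tok.lower()
        if fnmatchcase(node, pattern) and node not in excluded:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append((left, tok, right))
    return lines


# Invented sample text; a real corpus would be read from files and tokenized.
text = "The theory is that the theatre and the thesis are then themselves related"
tokens = text.split()

hits = concordance(
    tokens,
    "the*",
    exclude=["the", "they", "them", "these", "then", "there"],
)
for left, node, right in hits:
    print(f"{left:>25} [{node}] {right}")
```

Without the exclude list, every occurrence of "the" and "then" would also appear in the output, which is the noise the original question was trying to avoid.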

Mike Scott | 3 Jan 2011 18:33

Re: Stop-list in WordSmith Tools Concord

I think Georg is right, but just wanted to add that you do need to be 
careful to check you're setting the stop-list for the right tool 
(Concord, KeyWords or WordList) in the main controller program.

Cheers -- Mike

On 03/01/2011 16:56, Marko, Georg (georg.marko <at> uni-graz.at) wrote:
> Dear John,
>
> As far as I know, the stoplist works with lists rather than with concordances. So use any noun as a search word
> and check the Collocates for function words in order to see whether your stoplist works.
>
> Normally, you should not need a stoplist for concordances (i.e. you would not search for "the" if you know
> that the programme will exclude examples with "the"). If you use wildcards (e.g. words starting with
> "the*"), you would need to use "Advanced" and "exclude if context contains", listing all the words to
> exclude (e.g. the/they/them/these/then/there...) with a horizon of L0/R0. I do not think that it is
> possible to use lists in this function, as normally the relevant sets should be limited.
>
> I hope I have not completely misunderstood your question.
>
> All the best
>
> Georg

-- 
Mike Scott

***
If you publish research which uses WordSmith, do let me know so I can include it at
http://www.lexically.net/wordsmith/corpus_linguistics_links/papers_using_wordsmith.htm
***
University of Aston and Lexical Analysis Software Ltd.
mike.scott <at> aston.ac.uk
www.lexically.net


Sukriye Ruhi | 4 Jan 2011 00:24

Call for Panel Papers: 6th Intl. Symposium on Politeness - Corpus Approaches

Apologies for multiple postings

Dear Colleagues,

We cordially invite papers for presentation at the following panels,  
which will be held as part of the events at the Sixth International  
Symposium on Politeness, July 11-13, 2011, in Ankara, Turkey:

1. COUNTERPOINT: (IM)POLITENESS RESEARCH THROUGH THE LENS OF CORPORA  
AND CORPORA THROUGH THE LENS OF (IM)POLITENESS RESEARCH

Convenors: Sukriye Ruhi (Middle East Technical University) & Yesim  
Aksan (Mersin University)

2. CORPUS BASED APPROACHES TO POLITENESS AND AUTHOR POSITIONING IN  
ACADEMIC WRITING

Convenor: Yasemin Bayyurt (Bogazici University)

3. POLITENESS IN CHILD LANGUAGE

Convenor: Hatice Sofu (Cukurova University)

Further information on the panels is available at

http://www.arber.com.tr/isp2011.org/index.php/page,34,accepted_panel_proposals

We kindly ask contributors to keep in touch with the convenors on  
papers that they may wish to submit. Please note that abstract  
submissions to panels are handled through the Symposium web site at  
http://www.arber.com.tr/isp2011.org

Sukriye Ruhi
On behalf of the Organising Committee

-- 
Prof. Dr. Sukriye Ruhi
Dept. of Foreign Language Education
Middle East Technical University
06531 Ankara, Turkey
email: sukruh <at> metu.edu.tr
Fax: +90 312 210 79 69
Tel: +90 312 210 40 83
Sözlü Türkçe Derlemi/Spoken Turkish Corpus: http://std.metu.edu.tr

Kevin B. Cohen | 4 Jan 2011 00:49

Re: Looking for resources

Sondes,

>  * An ontology (or a taxonomy) and a corpus semantically annotated with
> respect to that ontology
>  * An ontology (or a taxonomy) and a corpus from the same domain, or the
> corpus from which the ontology was built
>  * An ontology for a domain that does not require much expertise

Early this year we will be releasing the CRAFT corpus, consisting of
about 600,000 words annotated with respect to eight ontologies.  The
documents are in the domain of mouse genomics, and the ontologies are
all biomedical.

Best wishes,

Kevin

-- 
Kevin Bretonnel Cohen, PhD
Biomedical Text Mining Group Lead, Center for Computational
Pharmacology, U. Colorado School of Medicine
and
Lead Artificial Intelligence Engineer, The MITRE Corporation, Human
Language Technology Division
303-916-2417 (cell) 303-377-9194 (home)
http://compbio.ucdenver.edu/Hunter_lab/Cohen
