Miles Osborne | 1 Sep 15:14 2008

Re: Google Ngram count

I can't comment on the status of counts derived from Google's search
page, and there is no published statement on the relationship (if any)
between the Ngram counts and any other counts.

That aside, as with any resource gathered from the Web, caveat
emptor. I do know of people using the Ngram set within SMT language
models to advantage, so it can be a useful resource.
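
For what it's worth, getting conditional probabilities out of the
released counts is straightforward. A minimal sketch in Python,
assuming the counts sit in plain-text files with one ngram<TAB>count
record per line (the file names here are made up, and in practice you
would smooth, e.g. with stupid backoff, since raw maximum-likelihood
estimates assign zero to anything unseen):

    from collections import defaultdict

    def load_counts(path):
        # one "ngram<TAB>count" record per line
        counts = defaultdict(int)
        with open(path, encoding="utf-8") as f:
            for line in f:
                ngram, count = line.rstrip("\n").split("\t")
                counts[ngram] += int(count)
        return counts

    unigrams = load_counts("1gms.txt")   # hypothetical file names
    bigrams = load_counts("2gms.txt")

    def p_mle(w1, w2):
        # P(w2 | w1) = count(w1 w2) / count(w1); 0.0 if w1 is unseen
        c1 = unigrams.get(w1, 0)
        return bigrams.get(w1 + " " + w2, 0) / c1 if c1 else 0.0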

Miles

2008/8/31 Christopher Brewster <C.Brewster <at> dcs.shef.ac.uk>:
> There was an extensive analysis by Jean Véronis a while ago showing that
> Google counts were invalid (long before the release of the ngram corpus).
> Is that analysis still valid?
> Does that mean we should not trust Google's ngram corpus at all?
>
> Christopher
>
> *****************************************************
> Department of Computer Science, University of Sheffield
> Regent Court, 211 Portobello Street
> Sheffield   S1 4DP   UNITED KINGDOM
> Web: http://www.dcs.shef.ac.uk/~kiffer/
> Tel: +44(0)114-22.21967  Fax: +44 (0)114-22.21810
> Skype: christopherbrewster
> SkypeIn (UK): +44 (20) 8144 0088
> SkypeIn (US): +1 (617) 381-4281
> *****************************************************
> Corruptissima re publica plurimae leges. Tacitus. Annals 3.27
>
(Continue reading)

Wray, Rebecca | 1 Sep 16:49 2008

Special issue of the International Journal of Lexicography now online

International Journal of Lexicography
Special Issue: The Legacy of John Sinclair
Guest edited by Patrick Hanks

John Sinclair was the most radical thinker on the lexicon of the 20th
century. His insights into the nature of collocations and discourse
structure have inspired new ways of analysing meaning. He was never
afraid to face up to awkward questions such as the vague and
probabilistic nature of meaning and of the evidence for word use. His
insistence on close, detailed analysis of evidence played a major role
in the development of the emerging discipline of corpus linguistics, now
universally recognised as a cornerstone of modern lexicography. In this
memorial issue, some of his leading former colleagues and admirers from
Asia, Africa, and America, as well as Britain and Europe, present a broad
spectrum of papers inspired by the Sinclairian approach, ranging from
practical dictionary making to new developments in linguistic theory.

Table of contents
Volume 21, Issue 3
http://www.oxfordjournals.org/page/3284/1    

Visit the links below to read article abstracts. If your institution has
a subscription, you will be able to access the full text.

FREE ARTICLE: The Lexicographical Legacy of John Sinclair
Patrick Hanks
http://www.oxfordjournals.org/page/3284/2  

Corpus-driven Lexicography
Ramesh Krishnamurthy
(Continue reading)

Mario Pavone | 2 Sep 09:33 2008

last CFP: NICSO 2008 - extended deadline: September 12


               (Apologies for multiple postings)

                    LAST CALL FOR PAPERS

                         NICSO 2008
           International Workshop on Nature Inspired
            Cooperative Strategies for Optimization

                      Puerto Palace Hotel
                  Puerto de La Cruz, Tenerife
                     12-14 November  2008

                http://www.gci.org.es/nicso2008

                     nicso2008 <at> gci.org.es

       *NEWS*: EXTENDED SUBMISSION DEADLINE: September 12th, 2008

NICSO 2008 is the third edition of the International Workshop on
Nature Inspired Cooperative Strategies for Optimization.
Its aims are:
(1) to foster a deeper understanding of "cooperativity" in
        computational optimisation systems, and
(2) to encourage a vigorous exchange of ideas about emerging
        research areas in cooperative problem solving strategies.

The accepted papers will be published in the Springer book series
Studies in Computational Intelligence.

(Continue reading)

Erik Tjong Kim Sang | 2 Sep 11:54 2008

Treebanks and Linguistic Theories: Final Call for Papers

THE SEVENTH INTERNATIONAL WORKSHOP ON TREEBANKS AND LINGUISTIC THEORIES

January 23-24, 2009
Groningen, The Netherlands
http://www.let.rug.nl/tlt

FINAL CALL FOR PAPERS

The Seventh International Workshop on Treebanks and Linguistic
Theories will be held on January 23 to 24, 2009 in Groningen, the
Netherlands. Submissions are invited for papers, posters and
demonstrations presenting high-quality, previously unpublished
research on the topics described below. Contributions should focus on
results from completed as well as ongoing research, with an emphasis
on novel approaches, methods, ideas, and perspectives, whether
descriptive, theoretical, formal or computational. Papers and poster
abstracts will be published in printed as well as online proceedings.

WORKSHOP MOTIVATION AND AIMS

Treebanks are language resources that include annotations at levels of
linguistic structure beyond the word level. They typically provide
syntactic constituent or dependency structures for sentences and
sometimes functional and predicate-argument structures. Treebanks have
become crucially important for the development of data-driven
approaches to natural language processing, human language
technologies, grammar extraction and linguistic research in
general. There are a number of ongoing projects aiming at compiling
representative treebanks for specific languages. In addition, there
are projects that develop tools or explore annotation beyond syntactic
(Continue reading)
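
To make the definition above concrete: a treebank entry is typically a
bracketed constituent structure. A minimal sketch (the sentence and
its Penn-Treebank-style labels are invented; assumes NLTK is
installed):

    from nltk import Tree

    t = Tree.fromstring(
        "(S (NP (DT The) (NN treebank))"
        " (VP (VBZ provides) (NP (JJ syntactic) (NNS structures)))"
        " (. .))")
    t.pretty_print()   # renders the constituent tree as ASCII art
    print(t.leaves())  # ['The', 'treebank', 'provides', ...]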

Gemma Boleda | 2 Sep 15:20 2008

Chinese texts

Dear members,

I am looking for a couple of texts in Chinese that have the properties
listed below. Finding them is more difficult than I had foreseen (e.g.,
I have checked Project Gutenberg, but most of its texts are in classical
Chinese; those that are not, for instance those by Lu Xun, are too short
for my purposes). Any pointers would be appreciated.

The texts should be:

- freely available for research purposes;
- written in modern Chinese;
- by a single author (no translations);
- long (the longer, the better): at least 100,000 words.

Any topic/genre would do; ideally, I'd like to have at least one novel
and one non-literary piece (e.g., a textbook on economics or history).
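
As a quick sanity check on the length criterion: Chinese is written
without spaces, so a true word count needs segmentation, but counting
CJK characters gives a usable proxy. A minimal sketch, assuming UTF-8
plain text and the rough rule of thumb of 1.5-2 characters per word in
modern Chinese:

    import sys

    def cjk_chars(path):
        # characters in the CJK Unified Ideographs block
        n = 0
        with open(path, encoding="utf-8") as f:
            for line in f:
                n += sum(1 for ch in line if "\u4e00" <= ch <= "\u9fff")
        return n

    if __name__ == "__main__":
        chars = cjk_chars(sys.argv[1])
        print(chars, "CJK characters,",
              "roughly", chars // 2, "to", int(chars / 1.5), "words")

A text of 200,000 or more CJK characters should clear the 100,000-word
mark even under the conservative 2-characters-per-word assumption.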

Thank you,

Gemma Boleda
Universitat Politècnica de Catalunya

_______________________________________________
Corpora mailing list
Corpora <at> uib.no
http://mailman.uib.no/listinfo/corpora

jaime.hunt | 3 Sep 02:10 2008

concordance program for large files

Hello everyone,

I was wondering if you know of a good concordance program that I might
be able to use for my research, one that can handle large files of over
1 million words. Has anyone had any experience with one? There are a
few free ones on the internet, but they often can't handle really large
files.

Regards,
Jaime

Mr Jaime Hunt MAppLing (TESOL), BA (Hons)
PhD (Linguistics) Candidate
School of Humanities and Social Science
McMullin Building
University of Newcastle 
Callaghan
NSW 2308
Australia

Ph. +61 (0)2 4921 5175
Email: jaime.hunt <at> studentmail.newcastle.edu.au

_______________________________________________
Corpora mailing list
Corpora <at> uib.no
http://mailman.uib.no/listinfo/corpora

Paul Johnston | 3 Sep 08:41 2008

Re: concordance program for large files

On Wednesday 03 September 2008 01:10:05 
jaime.hunt <at> studentmail.newcastle.edu.au wrote:
> Hello everyone,
>
> I was just wondering if you know of a good concordance program that deals
> with large files of over 1 million words that I might be able to use for my
> research. Has anyone had any experience with one? There are a few free ones
> on the internet, but they often don't deal with really large files.
>
> Regards,
> Jaime

For REALLY large files you could use the CMU-Cambridge Statistical
Language Modeling Toolkit to build 
(Continue reading)

Chris Tribble | 3 Sep 09:20 2008

Re: concordance program for large files

For larger files in a Windows environment I'd recommend WordSmith Tools
(http://www.lexically.net/wordsmith/) or MonoConc Pro
(http://www.athel.com/). Neither is free, but they're not that expensive
either. I've used WS Tools with the whole of the BNC and larger corpora
with no problems. WS Tools works well with Unicode and is very good for
n-grams, conc-grams and other delights.

Best

C:
--
IN CHESHIRE TODAY
Dr Christopher Tribble
TEL 	|| +44 (0)161 929 4411
EMAIL	|| ctribble <at> clara.co.uk
WEB	|| www.ctribble.co.uk  

> -----Original Message-----
> From: corpora-bounces <at> uib.no [mailto:corpora-bounces <at> uib.no] 
> On Behalf Of jaime.hunt <at> studentmail.newcastle.edu.au
> Sent: 03 September 2008 01:10
> To: Corpora <at> uib.no
> Subject: [Corpora-List] concordance program for large files
> 
> Hello everyone,
> 
> I was just wondering if you know of a good concordance 
> program that deals with large files of over 1 million words 
> that I might be able to use for my research. Has anyone had 
> any experience with one? There are a few free ones on the 
(Continue reading)

Pierre Nugues | 3 Sep 09:11 2008

Re: concordance program for large files

You may try this program in Perl (10 lines):
http://www.cs.lth.se/EDA171/Programs/ch02/concord_perl.pl
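
The same streaming idea in Python (a sketch for illustration, not a
translation of the script above): because it reads the corpus line by
line, memory use stays flat however large the file is, which is
exactly what the original question needs.

    import re
    import sys

    def kwic(path, keyword, width=40):
        # keyword-in-context, streaming one line at a time
        pattern = re.compile(r"\b%s\b" % re.escape(keyword),
                             re.IGNORECASE)
        with open(path, encoding="utf-8") as f:
            for line in f:
                line = line.rstrip("\n")
                for m in pattern.finditer(line):
                    left = line[max(0, m.start() - width):m.start()]
                    right = line[m.end():m.end() + width]
                    print(f"{left:>{width}} [{m.group()}] {right}")

    if __name__ == "__main__":
        kwic(sys.argv[1], sys.argv[2])

Usage: python kwic.py corpus.txt keyword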

Pierre
--
Pierre Nugues, Lunds Tekniska Högskola, Institutionen för
datavetenskap, Box 118, S-221 00 Lund, Sweden.
Tel. (0046) 46 222 96 40, http://www.cs.lth.se/~pierre
Visitors: Lunds Tekniska Högskola, E-huset, room 4134A, Ole Römers väg
3, S-223 63 Lund.
My book: http://www.cs.lth.se/home/Pierre_Nugues/ilppp/

On 3 Sep 08, at 08:41, Paul Johnston wrote:

> On Wednesday 03 September 2008 01:10:05
> jaime.hunt <at> studentmail.newcastle.edu.au wrote:
>> Hello everyone,
>>
>> I was just wondering if you know of a good concordance program that  
>> deals
>> with large files of over 1 million words that I might be able to  
>> use for my
>> research. Has anyone had any experience with one? There are a few  
>> free ones
>> on the internet, but they often don't deal with really large files.
>>
>> Regards,
>> Jaime
>>
>> Mr Jaime Hunt MAppLing (TESOL), BA (Hons)
(Continue reading)

Emiliano Guevara | 3 Sep 09:40 2008

Re: concordance program for large files

Don't waste money:

Laurence Anthony's AntConc is free, multiplatform, and does most of
what the commercial packages do (frequency lists, concordancing,
collocates, keywords).

http://www.antlab.sci.waseda.ac.jp/antconc_index.html

My students use AntConc with corpora as big as 5M words, but it starts
feeling sluggish over 3M words.

However, all the current stand-alone programs suffer from this problem.
Someone in this thread just said that WS Tools can manage "the whole of
the BNC", but that is not true, at least not in this part of the
world...
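
The bottleneck is usually that GUI tools hold the whole corpus in
memory at once; a one-pass, streaming approach needs memory
proportional to the vocabulary, not the corpus. A minimal sketch of a
frequency list (one of the functions listed above) along those lines:

    import re
    import sys
    from collections import Counter

    def freq_list(path):
        # one pass; memory grows with vocabulary size, not corpus size
        counts = Counter()
        with open(path, encoding="utf-8") as f:
            for line in f:
                counts.update(re.findall(r"\w+", line.lower()))
        return counts

    if __name__ == "__main__":
        for word, n in freq_list(sys.argv[1]).most_common(20):
            print(n, word)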

E.

On Sep 3, 2008, at 02:10 AM, jaime.hunt <at> studentmail.newcastle.edu.au  
wrote:

> I was just wondering if you know of a good concordance program that  
> deals with large files of over 1 million words that I might be able  
> to use for my research. Has anyone had any experience with one?  
> There are a few free ones on the internet, but they often don't deal  
> with really large files.
>
> Regards,
> Jaime

(Continue reading)

