Su-hsun Tsai | 2 Oct 02:49 2006

Numbers of English vocabulary required for students

Dear Corpora-Lers,

I am writing to ask for some information about English learning (or textbook compilation) in your country.

I would like to know how many words of English vocabulary are required of a high school graduate entering a college/university program. Do you have a similar requirement for students entering high school?

Thank you for your response.  When you respond, please also indicate whether English is an official/mother/second/foreign language in your context.  This information would be useful, too.

Best regards,

Su-hsun
Su-hsun Tsai
Assistant Professor
Taipei Municipal University of Education
Taipei, Taiwan
 
Brett Reynolds | 2 Oct 03:23 2006

Re: Numbers of English vocabulary required for students

On Oct 1, 2006, at 8:49 PM, Su-hsun Tsai wrote:

> I would like to know how many words of English vocabulary are required of a high school graduate entering a college/university program. Do you have a similar requirement for students entering high school?

Leaving aside the question of what it means to "know" a word...

In Ontario, Canada, I know of no school that gets so specific as to set a minimum number of known words as an entry requirement. From a practical viewpoint, however, you should have a look at Paul Nation's work with word families. Nation's results suggest that the 2000 most common word families, plus the 570 word families of Averil Coxhead's Academic Word List, would be a minimum for most non-native speakers of English to begin tertiary education.

Best,
Brett


-----------------------
Brett Reynolds
English Language Centre
Humber Institute of Technology and Advanced Learning
Toronto, Ontario, Canada



James L. Fidelholtz | 2 Oct 00:18 2006

Re: Numbers of English vocabulary required for students

Hi, Su-hsun, 

Brett Reynolds's suggestions seem to me pretty sound. Just a couple of 
comments: 

Aside from his comment about 'knowing' a word, you also have to be clear
about what you mean by 'word'. I'm assuming Brett is including word
derivation in what he calls 'word families'. Most theorists, as well as
most practitioners, would call derived words separate words (e.g., they
have a separate entry in a normal dictionary). For at least some practical
applications (e.g., spell-checkers), we also need distinctly inflected
forms to be listed separately, or at least to be derivable.
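
To make the counting question concrete, here is a minimal sketch in Python of how the same set of forms yields different "vocabulary sizes" depending on the unit counted. The family_of mapping and the example forms are hypothetical toy data, not taken from Nation's or Coxhead's lists, and the function name is an assumption made for illustration.

```python
def count_vocabulary(forms, family_of):
    """Count distinct word forms and distinct word families.

    family_of maps an inflected or derived form to its headword;
    a form missing from the mapping is treated as its own family.
    """
    distinct_forms = set(forms)
    families = {family_of.get(form, form) for form in distinct_forms}
    return len(distinct_forms), len(families)


# Hypothetical toy mapping: inflected and derived forms point to a headword.
family_of = {
    "nations": "nation",
    "national": "nation",
    "nationally": "nation",
    "studied": "study",
    "studies": "study",
}
forms = ["nation", "nations", "national", "nationally",
         "study", "studied", "studies"]

n_forms, n_families = count_vocabulary(forms, family_of)
print(n_forms, n_families)  # 7 forms, but only 2 families
```

A dictionary-style count (derived words as separate entries) would fall somewhere between the two figures, which is why a bare number such as "2000 words" needs its unit stated.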

In the first half of the last century, a book was published which listed
about 800 'basic' words of English, with the claim that virtually all
communication in English (except, perhaps, the most technical) could be
carried out with just those 800 words. They were, of course, approximately
the 800 most common words. This was clearly a cheat, since many such words
are highly polysemous and, in some cases, even homonymous. I think nowadays
most people would accept Brett's +/- 2500 words (plus close relatives) as
a minimum (or maybe a sub-minimum) for the number of words necessary to
'manage' English in an academic context.

One also needs to keep in mind that, if the student is entering a
predominantly or largely English-speaking environment, their productive
and receptive control of the language will naturally improve with
exposure to many different communicative contexts within and outside of the
academic context.

Jim 

PS: My specific context (Mexico) is not very relevant for your interests, I 
don't think. Of course, Spanish is the national language, though English is 
very widely (and often very lousily) taught in schools, and even used on 
occasion in academic contexts. Like Brett, I know of no university in the US 
or in Mexico that requires (control of) a certain number of words in English 
or in Spanish. Many do require or test for a certain level of (general or 
specific) knowledge, which would require being able to take and pass a test 
written in the language, with the answers likewise written in the language. 
The test, of course, would vary in generality and difficulty, depending on 
the level the student is aspiring to. 

James L. Fidelholtz
Posgrado en Ciencias del Lenguaje, ICSyH
Benemérita Universidad Autónoma de Puebla     MÉXICO

Jakub Marecek | 2 Oct 09:07 2006

Re: Numbers of English vocabulary required for students

Hello folks,

See http://www.infogreta.org/magazine/articles-9-1.htm or
one of the books by Norbert Schmitt if you don't like the 2000
number. Paul Nation was (supposedly) using corpora of spoken
English to arrive at the artificially low number of 2000,
while decent coverage of real written English would
(again, supposedly) require learning many more words
(word families).

Best
Jakub

TadPiotr | 2 Oct 09:22 2006

RE: Numbers of English vocabulary required for students

I am afraid that in work like that by Nation we can find the common pitfall
of vocabulary-based word lists: common items have "little lexical meaning";
it is the infrequent items which really carry the message (that is why we can
find the topic of a text by automatic retrieval procedures). It is
sufficient to have a look at the various vocabulary tests given to American
students to see that the range of vocabulary they are expected to have
is enormous, though there are obviously some preferred areas, such as
Latinate and classical items. Have a look at
http://www.vocaboly.com/vocabulary-test/, for example. And there are
hundreds of pages like that.
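
The point about infrequent items carrying the message can be illustrated with a small Python sketch. It uses tf-idf weighting, which is one common retrieval weighting chosen here as an assumption; the post does not name a specific procedure. The three toy documents and the function name top_terms are hypothetical.

```python
import math
from collections import Counter


def top_terms(docs, doc_index, k=3):
    """Rank the words of one document by tf * idf, highest first."""
    tokenised = [doc.lower().split() for doc in docs]
    # Document frequency: the number of documents each word occurs in.
    df = Counter(word for tokens in tokenised for word in set(tokens))
    tf = Counter(tokenised[doc_index])
    n_docs = len(docs)
    scores = {word: tf[word] * math.log(n_docs / df[word]) for word in tf}
    # Ties are broken alphabetically so the ranking is deterministic.
    return sorted(scores, key=lambda word: (-scores[word], word))[:k]


docs = [
    "the bank raised the interest rate",
    "the team won the match",
    "the river flooded the valley",
]
print(top_terms(docs, 0))  # ['bank', 'interest', 'raised']
```

Although "the" is the most frequent word in every toy document, it occurs in all of them, so its weight is zero and it never surfaces as a topic word; the rarer content words do.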
Best wishes
Tadeusz Piotrowski

Xabier Saralegi Urizar | 2 Oct 12:26 2006

Standard ontology for document classification?

Dear all,
I want to classify a large number of scientific documents into categories
based on their knowledge area, such as health or geography.
My question is whether there is a standard ontology for such a
classification.
Regards,

--
Xabier Saralegi Urizar
Elhuyar I+G+B
Zelai Haundi kalea, 3
Osinalde industrialdea
20170 Usurbil
(+34) 943 36 30 40
xabiers <at> elhuyar.com / www.elhuyar.org

Brett Reynolds | 2 Oct 13:58 2006

Re: Numbers of English vocabulary required for students

On Oct 2, 2006, at 3:22 AM, TadPiotr wrote:

> I am afraid that in work like that by Nation we can find the common pitfall
> of vocabulary-based word lists: common items have "little lexical meaning";
> it is the infrequent items which really carry the message (that is why we can
> find the topic of a text by automatic retrieval procedures). It is
> sufficient to have a look at the various vocabulary tests given to American
> students to see that the range of vocabulary they are expected to have
> is enormous, though there are obviously some preferred areas, such as
> Latinate and classical items. Have a look at
> http://www.vocaboly.com/vocabulary-test/, for example. And there are
> hundreds of pages like that.

Indeed, native-speaker college students know tens of thousands of  
words. The point is that you don't NEED this vocabulary to get by.  
There is research indicating that people can guess many unknown words  
and generally understand a written text if they know 95% of the word  
families in it. That 95% number can often be reached in first-year  
university and college textbooks with a combination of the first 2000  
word families plus the academic word list. For example, using Michael  
West's General Service List as the top 2000 words, an analysis of the  
business textbook used by all first-year business students at my  
college shows that the first 2000 word families give 80% coverage.  
The academic word list gives another 10%. Proper nouns make up  
another roughly 4%, giving 94% coverage.
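
A coverage analysis of this kind can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the tool used for the textbook analysis above: the function name, the tiny word lists, and the sample sentence are all hypothetical, and real lists such as the General Service List or the Academic Word List would first have to be expanded from word families into their individual forms.

```python
import re
from collections import Counter


def coverage_by_list(text, word_lists):
    """Return the share of running words (tokens) credited to each list.

    word_lists maps a list name to a set of lowercase word forms, in
    priority order; a token is credited to the first list containing it,
    and anything left over is counted as "off-list".
    """
    tokens = re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())
    if not tokens:
        return {}
    counts = Counter()
    for token in tokens:
        for name, forms in word_lists.items():
            if token in forms:
                counts[name] += 1
                break
        else:
            counts["off-list"] += 1
    names = list(word_lists) + ["off-list"]
    return {name: counts[name] / len(tokens) for name in names}


# Hypothetical toy lists standing in for the real word lists.
toy_lists = {
    "first_2000": {"the", "firm", "must", "its", "in"},
    "academic": {"analyse", "revenue"},
}
shares = coverage_by_list(
    "The firm must analyse its revenue in Toronto", toy_lists
)
print(shares)  # {'first_2000': 0.625, 'academic': 0.25, 'off-list': 0.125}

# The percentages reported in the post add up as stated.
reported = {"first 2000 families": 80, "academic word list": 10, "proper nouns": 4}
total_reported = sum(reported.values())
print(total_reported)  # 94
```

Note that this counts coverage over running words (tokens), so a frequent word contributes once per occurrence; counting over distinct words would give much lower figures for the same lists.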

If we keep in mind that the learners are *beginning* tertiary
education, and that much of the first-year coursework is dedicated to
learning the technical vocabulary of the field (indeed, this
vocabulary is often glossed in the textbooks), then this group of roughly
2500 word families should represent a reasonable entry minimum. Of course,
the reading will not be easy, and a dictionary will be required, but it
will be possible.

Now, again we come back to the question of what it means to know a
word. In the research that produced the 95% number, 'knowing' was
operationalised simply: a word counted as known if the learner said they
knew it.

Best,
Brett

<http://english-jack.blogspot.com>

-----------------------
Brett Reynolds
English Language Centre
Humber Institute of Technology and Advanced Learning
Toronto, Ontario, Canada
brett.reynolds <at> humber.ca

Tony Abou-Assaleh | 2 Oct 20:12 2006

Re: Standard ontology for document classification?

The Open Directory (dmoz.org) is an example of a classification system
that is used by a number of search engines. It is designed for web sites,
but can equally well be used for documents.

Cheers,

TAA

-----------------------------------------------------
Tony Abou-Assaleh
Email:    taa <at> acm.org
Web site: http://taa.eits.ca
----------------------[THE END]----------------------

George Foster | 2 Oct 21:55 2006

Call for contributions: NIPS 2006 Workshop on MACHINE LEARNING FOR MULTILINGUAL INFORMATION ACCESS

Call for contributions

NIPS 2006 Workshop

MACHINE LEARNING FOR MULTILINGUAL INFORMATION ACCESS
====================================================

http://ilt.iit.nrc.ca/MLIA/

Description:
------------
In many different settings, accessing information available in different
languages is a challenge.

In Europe, the wide variety of languages is clearly a bottleneck for
efficient circulation and access to information. More than half of EU
citizens cannot hold a conversation in a language other than their
mother tongue. Even in an officially bilingual country like Canada, fewer
than one in five people are considered to have a good enough command of both
official languages (2001 census data).

The traditional paradigm for addressing this issue is to perform human
translation on a massive scale, and rely on monolingual information
access technology. Although this model has worked reasonably well in the
past, the rapid increase in the amount of information produced (and, in
Europe, in the number of languages covered) raises questions as to its
sustainability. Machine Learning has the potential to help develop and
deploy technology that provides:

    1. access to information across different languages,
    2. usable translation from one language to another.

We are interested in Machine Learning techniques addressing for example
the following problems:

   * Word alignment
   * Machine translation
   * Multilingual lexicon and terminology extraction
   * Cross-lingual information retrieval
   * Cross-lingual categorisation

Goals of the workshop:
----------------------

Multilingual applications are also emerging as a promising application
for some Machine Learning techniques, for example the use of Kernel CCA
for Cross-Language applications, or large-margin approaches to word
alignment. This new trend converges with a well-established interest of
the Natural Language Processing community in learning approaches.

The purpose of this workshop is to provide a forum for discussion of
current developments at the intersection between multilingual processing
and machine learning. This includes developing new techniques to address
various multilingual information access problems (e.g. translation), but
also scaling up existing techniques to the available NLP data,
developing tools for cross-language information retrieval, etc.

We will promote discussions of some inter-related key issues in applying
Machine Learning to Multilingual problems:

* SCALING UP:
   - Applying ML to 100-million-word corpora (e.g. SMT)
   - Deploying ML solutions on new language pairs

* SCARCE RESOURCES:
   - Languages or domains with limited bilingual corpora
   - Bootstrapping limited resources

* EVALUATION:
   - Design of better performance measures
   - Optimisation of application-specific measures
   - Learning human evaluation

* PRIOR LINGUISTIC KNOWLEDGE:
  - Modelling and using linguistic knowledge in ML
  - The continuum between all-data (SMT) and all prior knowledge
    (handcrafted rules)

Submission instructions:
------------------------

Researchers interested in presenting their work at the workshop should
send an email to: mlia <at> nrc-cnrc.gc.ca
(preferably plain text) with the following information:

- Title
- Author(s)
- Abstract (around 1 page)

Schedule:
---------
Submission deadline: 29 October 2006
Notification: 6 November 2006
Workshop date: 8 or 9 December 2006

Co-organisers:
--------------
Cyril Goutte, National Research Council Canada (contact)
Nicola Cancedda, Xerox Research Centre Europe
Marc Dymetman, Xerox Research Centre Europe
George Foster, National Research Council Canada

Workshop format:
----------------
We intend to leave a good part of the workshop to panel discussions that
will address relevant topics in multilingual information access (MIA),
as well as invited talks presenting some important MIA problems and
associated challenges for Machine Learning. For each half day, we will
start with either a keynote or a short tutorial, continue with a few
shorter technical presentations, and end with a panel discussion (topics
to be decided depending on the confirmed list of speakers).

Invited speakers:
-----------------

- Dan Melamed (Courant Institute, NYU)
- John Shawe-Taylor (ECS, U. of Southampton, UK), tbc
- Ralf Steinberger (JRC, Ispra, Italy)
- Wray Buntine (HIIT, Helsinki, Finland), tbc

Related work:
-------------
Past NIPS workshops have addressed related topics such as learning with
structured data, or the use of Machine Learning for Natural Language
Processing. There is also some ongoing interest within the European
network of excellence Pascal, as exemplified by the recent workshop on
intelligent information access. However, none of these specifically
targets multilingual aspects. We believe there is sufficient interest and
genuine need on this particular aspect to justify a specific focus on
multilingual information access. The newly started European project
SMART (Statistical Multilingual Analysis for Retrieval and Translation)
is specifically targeting advanced machine learning techniques for
multilingual applications.

Ralf Steinberger | 3 Oct 09:09 2006

RE: Standard ontology for document classification?

Xabier,

The multilingual (over 20 languages), wide-coverage Eurovoc thesaurus, with its approximately 6000 classes, has a subset of about 60 science-oriented classes, plus many related terms and classes in other domains that may also be useful (e.g. politics, law, economics, trade, finance, social questions, education, employment, transport, environment, agriculture, energy, geography). The science-oriented classes cover the major science domains, but may not be detailed enough for your purposes. Please check for yourself.

Eurovoc is browsable at http://europa.eu/eurovoc/ and is available free for research purposes. For details on where to get Eurovoc, see http://langtech.jrc.it/0509_EU-Enlargement-Workshop.html#HOW_TO_GET_THE_AC_CORPUS_AND_EUROVOC.

Eurovoc was developed for the manual cataloguing of mainly parliamentary documents, but collections of multi-label classified documents such as the JRC-Acquis (http://langtech.jrc.it/JRC-Acquis.html) have been used to train an automatic multi-label Eurovoc classification system.

I hope this helps. All the best,

Ralf

Ralf Steinberger
European Commission - Joint Research Centre (JRC)
IPSC - SeS - Language Technology
http://langtech.jrc.it, http://press.jrc.it/NewsExplorer
T.P. 267, Via Fermi 1
21020 Ispra (VA), Italy
