Christopher Tribble | 1 Mar 2007 01:08

RE: Search Wordsmith by tag passive voice

Linda, I'm not sure if Rich Text is allowed on the list, so I am also
sending the following search string:

<w VBD* <w VVN*

This works.  Don't forget to declare the BNC World Tag set in Settings /
Tags.
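
For anyone who wants to check the same pattern outside WordSmith, here is a
rough Python sketch. It assumes the BNC World layout in which every token is
preceded by its tag, as in "<w VBD>was <w VVN>adopted". The function name
find_be_past_participle and the sample sentence are invented for illustration
and are not part of WordSmith or the BNC distribution.

```python
import re

# Assumed token layout (BNC World style): "<w TAG>word ".
# A tag starting with VBD followed directly by a tag starting with VVN,
# mirroring the WordSmith search string "<w VBD* <w VVN*".
PASSIVE = re.compile(
    r"<w (VBD[^>]*)>([^<]+?)\s*<w (VVN[^>]*)>([^<]+?)\s*(?=<|$)"
)


def find_be_past_participle(tagged_text):
    """Return (auxiliary, participle) pairs for adjacent VBD + VVN tokens."""
    return [(m.group(2), m.group(4)) for m in PASSIVE.finditer(tagged_text)]


# Hypothetical sample sentence in the assumed format.
sample = "<w AT0>The <w NN1>plan <w VBD>was <w VVN>adopted <w PRP>by <w NN0>staff"
print(find_be_past_participle(sample))
```

Because the tag is matched as a prefix, ambiguity tags that begin with VVN
are picked up as well, which is roughly what the trailing * does in the
search string.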

Best

Chris 

--
Dr Christopher Tribble
TEL #1  || +44 (0)20 7833 4271
TEL #2  || +44 161 929 4411
EMAIL   || ctribble <at> clara.co.uk
WEB     || www.ctribble.co.uk
BLOG    || http://ctribble.blogspot.com

________________________________

	From: owner-corpora <at> lists.uib.no [mailto:owner-corpora <at> lists.uib.no] On Behalf Of Linda Bawcom
	Sent: 28 February 2007 18:28
	To: corpora <at> uib.no
	Subject: [Corpora-List] Search Wordsmith by tag passive voice
	
	
	Dear List Members,

Marcello Federico | 1 Mar 2007 09:18

IWSLT '07: CFP

-------- FIRST CALL FOR PARTICIPATION  ----------------------

INTERNATIONAL WORKSHOP ON SPOKEN LANGUAGE TRANSLATION

IWSLT - 15-16 October 2007, Trento, Italy

http://iwslt07.itc.it

This year's  IWSLT workshop continues  the tradition of  organizing an
open evaluation campaign for spoken language translation followed by a
scientific workshop, in which  both system descriptions and scientific
papers  are  presented.    IWSLT's  evaluations  are  not  competition
oriented but,  on the  contrary, their goal  is to  foster cooperative
work  and  scientific  exchange.   In  this  respect,  IWSLT  proposes
challenging research tasks and an open experimental infrastructure for
the scientific community working on spoken language translation.

Evaluation

The  IWSLT  2007  Evaluation  Campaign will  feature  two  challenges,
namely,  the translation  of spontaneous  conversations in  the travel
domain from Italian and Chinese into English, as well  as two classical
tasks, that is,  the translation of read speech  in the travel domain,
from Arabic  and Japanese into  English.  For all  tasks, participants
will  be provided  with  training and  development  data sets,  useful
linguistic resources, and links  to open software tools for developing
state-of-the-art statistical machine translation systems.  In contrast
with  other MT  evaluation  campaigns, input  for  translation is  not
written text but transcripts generated by automatic speech recognition
systems.    Translation  systems  able   to  process   multiple  input

Georg Marko | 1 Mar 2007 10:57

Re: Search Wordsmith by tag passive voice

Dear Linda,

I just tried a search (on a different corpus, but also one with the same tag set, even though the tags are attached to the ending rather than to the beginning of words). Anyway the following should work:
<w VBD>* <w VVN>  I put * after the first tag to mean any form of BE
<w VB*

This should include all forms of be (I think VBD just refers to the past tense and does not occur on its own, only as VBDZ for was and VBDR for were)

<w VVN>*

This should include any word classified as a past participle.
So the search string <w VB* <w VVN>* should work. Another option is <w VB* as the search word with <w VVN>* as the context word (horizon 0/2), which would also cover passives with an adverb intervening between the auxiliary and the participle (e.g. was really adopted).
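
The horizon idea can be tried out in a few lines of Python. This is only a sketch under assumptions: it takes the tags to precede the words ("<w TAG>word"), it reads "horizon 0/2" as zero tokens to the left and two to the right, and the name passives_with_horizon is made up here rather than taken from WordSmith.

```python
import re

# Assumed token layout: "<w TAG>word " (tag in front of the word).
TOKEN = re.compile(r"<w ([^>]+)>([^<]*)")


def passives_with_horizon(tagged_text, horizon=2):
    """Find a VB* token with a VVN* token among the next `horizon` tokens."""
    tokens = [(tag, word.strip()) for tag, word in TOKEN.findall(tagged_text)]
    hits = []
    for i, (tag, word) in enumerate(tokens):
        if not tag.startswith("VB"):
            continue
        # Context window: 0 tokens to the left, `horizon` tokens to the right.
        for later_tag, later_word in tokens[i + 1 : i + 1 + horizon]:
            if later_tag.startswith("VVN"):
                hits.append((word, later_word))
                break
    return hits


# Hypothetical sample with an adverb between auxiliary and participle.
sample = "<w PNP>It <w VBD>was <w AV0>really <w VVN>adopted <w AV0>late"
print(passives_with_horizon(sample))
```

Note that the window accepts any intervening token, not only adverbs, so a search like this will probably return some false hits that need checking by hand.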

I may have made a mistake here myself...

Best regards

Georg
"I drew a treasure map on your hand" Ani diFranco

Ksenia | 1 Mar 2007 12:00

PhD Studentship

PhD Studentship in Machine Learning at University of Bristol

Applications are invited for a 3-year PhD research studentship in the
Machine Learning and Biological Computation Group at the University of
Bristol. The studentship forms part of an EPSRC-funded project aimed at
learning the rules for morphological analysis of synthetic (morphologically
complex) languages, including Russian, Turkish and isiZulu. The PhD student
will develop a Machine Learning approach in order to efficiently induce
morphological rules (or grammars) both for regular and context-free
languages. More information about the project is available at
http://www.cs.bris.ac.uk/Research/MachineLearning/morph/index.html.

This position is available from April 2007. The funding will cover EU
tuition fees and an annual stipend, currently GBP 12,300. (The funding
will not cover non-EU tuition fees, so candidates from outside the EU must
find alternative means to fund the difference between EU and non-EU tuition
fees.) The successful candidate will have a strong background in artificial
intelligence and machine learning and excellent analytical skills. Previous
experience with computational linguistics is preferred but not essential.
Due to the international context of the project, excellent communication
skills in English are required, as well as the ability to work in a team.

Applicants should send a full CV, with accompanying letter and name and
email address of two referees, to Professor Peter Flach
(Peter.Flach <at> bristol.ac.uk) or Dr. Ksenia Shalonova (ksenia <at> cs.bris.ac.uk).
Further details regarding the studentship are available on request. The
closing date for applications is Friday 9th March 2007.

Niels Ott | 1 Mar 2007 16:04

German lemma list


Dear all,

About a month ago there was a little discussion going on here about
English lemma lists.

We would like to have a lemma list for German. The only requirement is
that it contains lemmata, e.g.

Haus
Katze
gehen
sitzen

Furthermore, it would be nice if the list included POS tags, but
that's not a strict requirement.

It would be ideal if this list were free in the sense of free
speech/open source, or if use were restricted to non-commercial
applications. (This is for a student project at university.)

Thank you very much in advance for your assistance.

Regards,

   Niels Ott

--
Niels Ott - Computational Linguist (B.A.) - http://www.drni.de/niels/
Wer ohne Grund traurig ist, hat Grund traurig zu sein.

Yannick Versley | 1 Mar 2007 17:16

Re: German lemma list

Dear Niels,

The lexicon of the CDG parser, which is available as open source software at
http://nats-www.informatik.uni-hamburg.de/view/CDG/DownloadPage
contains a listing of verb, noun and adjective lemmas with their inflection 
class (for noun lemmas). The list is included in plain-text form, so it 
should be possible to use it without installing the whole parser.
As the lemma list has been compiled by hand, its coverage is good, but not 
perfect.

Best regards,
Yannick

> About a month ago there was a little discussion going on here about
> English lemma lists.
>
> We would like to have a lemma list for German. The only requirement is
> that it contains lemmata, e.g.
>
> Haus
> Katze
> gehen
> sitzen
>
> Furthermore, it would be nice if the list included POS tags, but
> that's not a strict requirement.
>
> It would be ideal if this list were free in the sense of free
> speech/open source, or if use were restricted to non-commercial
> applications. (This is for a student project at university.)
>
> Thank you very much in advance for your assistance.
>
> Regards,
>
>    Niels Ott

-- 
Yannick Versley
Seminar für Sprachwissenschaft, Abt. Computerlinguistik
Wilhelmstr. 19, 72074 Tübingen
Tel.: (07071) 29 77352

Nomi Guthmann | 1 Mar 2007 17:28

Corpus of translated material

Dear corpora list members,

We are doing a project concerned with corpus-based translation studies.
For this purpose, we are trying to collect a corpus of translated
material in the target language. The main requirement is to know
exactly what the source language was. Otherwise, we are happy with
data in any language and of any domain. For example, parallel corpora
(not necessarily aligned) would be an excellent resource, provided
that we know what the source language is.

We would highly appreciate any suggestions and references you may
have. I will post a summary of the replies.

Thanks,

Noemie Guthmann
Translation and Interpreting Studies Department
Bar Ilan University

Mario Crespo Miguel | 1 Mar 2007 17:38

Frequencies of words in English given a certain domain

Dear everybody,

I am looking for different frequencies of words in English given 
different domains. I already have it for the medical domain, but I 
would like to extend my study to other domains. I would really 
appreciate it if someone could help me. All the best,

Mario

Jer M | 1 Mar 2007 18:51

Re: Search Wordsmith by tag passive voice

Linda,

Using the view.byu.edu BNC interface, enter  [vb*]  [v?n]  and this will give you a frequency list.

I know it's not WordSmith, but there it is.
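
To see which tags those two slots would cover, here is a small Python sketch. It assumes that * and ? in the interface behave like ordinary shell wildcards (any run of characters, and exactly one character), which I have not verified against the site itself. The tag list is a hand-picked subset of the BNC (C5) verb tags, and expand is a made-up helper name.

```python
from fnmatch import fnmatchcase

# A few lower-case BNC (C5) verb tags, enough for the illustration.
TAGS = ["vbb", "vbd", "vbg", "vbi", "vbn", "vbz", "vdn", "vhn", "vvd", "vvn"]


def expand(pattern, tags=TAGS):
    """List the tags matched by a wildcard pattern such as 'vb*' or 'v?n'."""
    return [t for t in tags if fnmatchcase(t, pattern)]


print(expand("vb*"))  # forms of BE
print(expand("v?n"))  # past participles of any verb class
```

Under that assumption the second slot admits been, done and had as well as lexical past participles, so the frequency list may need some filtering.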

Jeremiah McGhee

On 3/1/07, Georg Marko <georg.marko <at> uni-graz.at> wrote:
Dear Linda,

I just tried a search (on a different corpus, but also one with the same tag set, even though the tags are attached to the ending rather than to the beginning of words). Anyway the following should work:
<w VBD>* <w VVN>  I put * after the first tag to mean any form of BE
<w VB*

This should include all forms of be (I think VBD just refers to the past tense and does not occur on its own, only as VBDZ for was and VBDR for were)

<w VVN>*

This should include any word classified as a past participle.
So the search string <w VB* <w VVN>* should work. Another option is <w VB* as the search word with <w VVN>* as the context word (horizon 0/2), which would also cover passives with an adverb intervening between the auxiliary and the participle (e.g. was really adopted).

I may have made a mistake here myself...

Best regards

Georg
"I drew a treasure map on your hand" Ani diFranco

Mark Davies | 2 Mar 2007 02:55

RE: Frequencies of words in English given a certain domain

Mario,

At http://view.byu.edu (BNC; 100 million words), you can select any of
70+ registers/genres, and then get a frequency listing for that genre.
Just enter "*" (without quotation marks) for a general frequency listing
for the selected register, "[nn1]" for singular nouns in that register,
etc. 

You can also easily compare word frequency in one register (or set of
registers) against another, e.g. sermons vs. spoken, tabloids vs.
broadsheet, medical vs. academic, etc.
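
If you export the tokens for two registers and want to do the comparison
offline, the arithmetic is just a normalised count. Below is a minimal Python
sketch; the function names per_million and compare and the toy word lists are
invented for illustration and have nothing to do with how the web interface
computes its figures.

```python
from collections import Counter


def per_million(counts):
    """Convert raw counts to occurrences per million tokens."""
    total = sum(counts.values())
    return {w: c * 1_000_000 / total for w, c in counts.items()}


def compare(reg_a, reg_b):
    """Return {word: (rate_a, rate_b)} in occurrences per million words."""
    a, b = per_million(Counter(reg_a)), per_million(Counter(reg_b))
    return {w: (a.get(w, 0.0), b.get(w, 0.0)) for w in sorted(set(a) | set(b))}


# Toy token lists standing in for two registers (not real BNC data).
medical = ["patient", "dose", "the", "patient"]
academic = ["the", "theory", "the", "dose"]

for word, (rate_med, rate_acad) in compare(medical, academic).items():
    print(f"{word:10s} {rate_med:12.1f} {rate_acad:12.1f}")
```

Normalising per million words keeps registers of different sizes comparable,
which matters when one genre is much smaller than the other.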

Best,

Mark Davies

============================================
Mark Davies
Professor of (Corpus) Linguistics
Brigham Young University
(phone) 801-422-9168 / (fax) 801-422-0906
Web: davies-linguistics.byu.edu

** Corpus design and use // Linguistic databases **
** Historical linguistics // Language variation **
** English, Spanish, and Portuguese **
============================================

-----Original Message-----
From: owner-corpora <at> lists.uib.no [mailto:owner-corpora <at> lists.uib.no] On
Behalf Of Mario Crespo Miguel
Sent: Thursday, March 01, 2007 9:39 AM
To: corpora <at> uib.no
Subject: [Corpora-List] Frequencies of words in English given a certain
domain

Dear everybody,

I am looking for different frequencies of words in English given 
different domains. I already have it for the medical domain, but I 
would like to extend my study to other domains. I would really 
appreciate it if someone could help me. All the best,

Mario

