Gill Philip | 1 Dec 2009 11:22

Re: Data Driven Learning: summary of responses

Dear all,
last week I asked the list to pool their knowledge about research carried out on DDL with phrases rather than single words. I didn't expect to find much, having already trawled through the literature, but the picture is pretty bleak. To put it bluntly, with very few exceptions, DDL has not been used to teach phrases.
However, from this depressing picture comes a ray of light: a gaping hole in the literature that is begging to be filled. Short of research projects to assign your students? Run a little dry of inspiration recently? Anything you do in this field will be new (and a welcome addition to knowledge of MWU acquisition)!

Below is a list of the references I was able to collect that have direct relevance to applied linguistics (teaching or learner corpora). Respondents have been thanked individually, but thanks again (you know who you are).

best,
Gill


1 - DDL + phrases: references
->phrasal verbs
Boulton, A. (2008). Looking for empirical evidence for DDL at lower levels. Practical applications of language and computers. In B. Lewandowska-Tomaszczyk (Ed.), Corpus Linguistics, Computer Tools, and Applications: State of the Art, (pp. 581–598). Frankfurt: Peter Lang.

->phraseological false friends
Boulton, A. (forthcoming 2010). Data-driven learning: Taking the computer out of the equation. Language Learning 60 (3).
Philip, G. (2000). L’uso delle concordanze bilingui nell’insegnamento dei “falsi amici”. In Rossini Favretti, R. (Ed.), Linguistica e Informatica. Corpora, Multimedialità e Percorsi di Apprendimento, (pp. 363–373). Bologna, Italy: Bulzoni.

->linking adverbials
Boulton, A. (2009). Testing the limits of data-driven learning: language proficiency and training. ReCALL 21 (1). pp. 37-51

->future forms "will" / "going to"
Boulton, A. (2007). DDL is in the details... and in the big themes. Proceedings of Corpus Linguistics 2007. URL http://ucrel.lancs.ac.uk/publications/CL2007/paper/126_Paper.pdf

->other
Curado Fuentes, A. (2001). Tasks for Business Science and Technology English: Evaluating Corpus-driven Data for ESP. ESP World 1 (1). http://www.esp-world.info/Articles_1/tasks.html
Tan, M. (2002). Fixed expressions, prepositional clusters and language teaching. In M. Tan (Ed.), Corpus Studies in Language Education. Bangkok: IELE Press.

-> there are also kibbitzers available, though phrases, when they occur, are never the main focus (as far as I could see)
http://www.eisu.bham.ac.uk/support/online/kibbitzers.shtml (links to kibbitzers need updating, but I hear the Bham techies are working on it)
https://lw.lsa.umich.edu/eli/micase/kibbitzer.htm


2 - Phrase recognition/identification etc. from corpora/learner corpora
Biber, D. (2006). University language: a corpus-based study of spoken and written registers. Amsterdam/Philadelphia: John Benjamins.
Cortes, V. (2004). Lexical bundles in published and student disciplinary writing: examples from history and biology. English for Specific Purposes. 23 pp. 397-423
Hyland, K. (2008). Academic clusters: text patterning in published and postgraduate writing. International Journal of Applied Linguistics. 18 (1) pp. 41-62
Hyland, K. (2008). As can be seen: Lexical bundles and disciplinary variation. English for Specific Purposes. 27 pp. 4–21
Scott, M. and C. Tribble (2006). Textual Patterns: Key words and corpus analysis in language education. Amsterdam/Philadelphia: John Benjamins.


*********************************
Dr. Gill Philip
CILTA
Università degli Studi di Bologna
Piazza San Giovanni in Monte, 4
40124 Bologna
Italy

_______________________________________________
Corpora mailing list
Corpora <at> uib.no
http://mailman.uib.no/listinfo/corpora
Ulle Endriss | 1 Dec 2009 11:48

ECAI-2010: Final Call for Workshop Proposals

Dear colleague,

Please be reminded that the deadline for submission of workshop
proposals for ECAI-2010 (Lisbon, August 2010) is fast approaching:

Friday, 11 December 2009

Serious proposals from all areas of AI as well as interdisciplinary
proposals with a substantial AI component are very welcome. Further
information is available here (or get in touch with me directly):

19th European Conference on Artificial Intelligence, Lisbon, 2010
http://ecai2010.appia.pt

ECAI-2010 Call for Workshop Proposals
http://www.illc.uva.nl/~ulle/ECAI-2010/ecai-2010-workshop-call.txt

Best wishes,
Ulle Endriss
(ECAI-2010 Workshop Chair)

-- 
Ulle Endriss         http://www.illc.uva.nl/~ulle/
Institute for Logic, Language & Computation (ILLC)
University of Amsterdam    Tel: +31 (0)20 525 6511
Postbus 94242              Fax: +31 (0)20 525 5206
1090 GE Amsterdam (NL)     Email: u.endriss <at> uva.nl


John Corbett | 1 Dec 2009 12:07

Child and teenage language

1. The SCOTS corpus (www.scottishcorpus.ac.uk) has a number of
transcribed oral interactions between young children and caregivers. If
you click on Standard Search, select 'spoken' and put 'buckie' in the
title, you'll get most of them. Click on each one for the transcript. At
the bottom of the screen you'll see options to listen to the recordings,
see the relevant metadata, and to download the transcripts either as
plain text or html for further manipulation.

John

John Corbett
Professor of Applied Language Studies
Head of the Department of English Language
School of Scottish and English Language and Literature 
University of Glasgow
12 University Gardens, GLASGOW G12 8QQ, 
Tel: +44 (0)141 330 6340/2978 Fax: +44 (0)141 330 3531

http://www.glasgow.ac.uk/englishlanguage/  

www.scottishcorpus.ac.uk


María Sánchez-Tornel | 1 Dec 2009 13:45

Re: Child and teenage language

Hello Simone,

You may try this: www.um.es/sacodeyl or  
http://sacodeyl.inf.um.es/sacodeyl-search2/
You'll find 7 corpora of European teen talk in different languages,  
including English.

Regards,
María

Simone Löhndorf <Simone.Lohndorf <at> ling.lu.se> wrote:

> Dear corpora-list members,
>
> I'm a Ph.D. student in linguistics searching for English child and  
> teenage language corpora. I currently work with the CHILDES  
> database, PoW (Polytechnic of  Wales) and COLT (The Bergen Corpus of  
> London Teenage Language), but I am in need of more data. I'm
> interested in both spoken and written language. If you know about
> any such resources, I would be very grateful if you let me know!
>
> Best wishes,
> Simone Löhndorf - Lund University, Sweden (simone.lohndorf <at> ling.lu.se)
>

-- 
María Sánchez-Tornel
Departamento de Filología Inglesa
Campus de La Merced
Universidad de Murcia
30071 Murcia (Spain)
Tel. +34 868 884864


Marketa Lopatkova | 1 Dec 2009 16:33

PhD position in Computational Linguistics at Charles University in Prague

Vacant position for PhD candidates in Computational Linguistics
===============================================================

The Charles University in Prague (CUNI) invites applications for a
position as research scholar (PhD) in computational linguistics at the
Institute of Formal and Applied Linguistics.

The 3-year position involves participation as an Early Stage Researcher
in the project "Translation tools and resources for under-resourced
languages", which is the subproject of the EU project CLARA (Common
Language Resources and their Applications, Marie Curie Initial Training
Network receiving funding from the European Commission's 7th Framework
Programme).
The overall aim of the subproject is to investigate possibilities to
facilitate development of translation tools and resources for languages
that currently do not have or have limited translation technologies and
resources.

Application deadline: January 15, 2010
Duration: 3 years (starting from April 1, 2010 at the earliest and June
1, 2010 at the latest)

Further information, criteria for applicants and how to apply:
http://clara.uib.no/vacancies/esr-position-at-charles-univ-prague/

For further information about the available position, the project and
for other practical information, please contact Marketa Lopatkova.
See also the web page of the institute:
http://ufal.mff.cuni.cz/.

===========================================
RNDr. Marketa Lopatkova, Ph.D.
Institute of Formal and Applied Linguistics
Charles University in Prague


Ines Rehbein | 1 Dec 2009 18:47

First Call for Papers: KONVENS 2010


===========================================
First Call for Papers for KONVENS 2010
===========================================

http://www.konvens2010.de

September 6 - 8, 2010, Saarbrücken, Germany

*IMPORTANT DATES*

      * April 23, 2010   Paper submissions due
      * June 15, 2010    Notification of acceptance
      * July 17, 2010    Camera-ready copy due

*TOPICS*

The Conference on Natural Language Processing ("Konferenz zur Verarbeitung 
Natürlicher Sprache", KONVENS) aims at offering a broad perspective on current 
research and developments within the interdisciplinary field of natural 
language processing. It provides a forum for researchers from all disciplines 
relevant to this field of research to present their work.

The central theme of the 10th KONVENS is
  	"Semantic approaches in Natural Language Processing".

We especially encourage the submission of contributions addressing linguistic 
aspects of meaning. Topics of interest include deep as well as shallow 
approaches, and knowledge-based as well as data-driven methods for modelling 
and acquiring semantic information. We equally encourage submissions on the use 
of semantic information in applications of language technology. Submissions 
should describe original and unpublished research or innovative industrial 
applications.

Young researchers in particular are encouraged to present their completed work 
for discussion.

*INVITED SPEAKERS*

Anette Frank (Universität Heidelberg)
Ed Hovy (ISI, University of Southern California)
Gerhard Weikum (Max-Planck-Institut für Informatik, Saarbrücken)

*FORMATS*

We welcome two types of contributions:

      * Full papers for oral presentation (8 pages)
      * Short papers for presentation as posters (4 pages)

Short papers/posters can be combined with a system demonstration. All 
submissions must be made electronically through the conference website. Reviews 
will be anonymous. Accepted full and short papers will be published in the 
conference proceedings.

The conference languages are English and German. We encourage the submission of 
contributions in English.

Submissions should not exceed 8 pages for full papers or 4 pages for short 
papers/posters. Submission will be electronic, managed by the EasyChair system, 
which will be made available soon. The only accepted format for submitting 
papers is Adobe PDF. For more information and formatting guidelines please see 
the full Call for Papers at http://www.konvens2010.de/cfp.html.

As the review process will be anonymous, your submission must not include the
authors' names and affiliations, or other indications of the authors'
identity. If you encounter a submission problem or have any
questions, please contact the program committee (program <at> konvens2010.de).

WEBSITE:  	http://www.konvens2010.de/
E-MAIL:	 	info <at> konvens2010.de
Graeme Hirst | 1 Dec 2009 21:19

New Book: Kuebler, McDonald, Nivre: Dependency Parsing

BOOK ANNOUNCEMENT

Dependency Parsing

Sandra Kübler (Indiana University)
Ryan McDonald (Google Research)
Joakim Nivre (Uppsala and Växjö Universities)

Synthesis Lectures on Human Language Technologies #2 (Morgan &  
Claypool Publishers), 2009, 127 pages

Dependency-based methods for syntactic parsing have become  
increasingly popular in natural language processing in recent years.  
This book gives a thorough introduction to the methods that are most  
widely used today. After an introduction to dependency grammar and  
dependency parsing, followed by a formal characterization of the  
dependency parsing problem, the book surveys the three major classes  
of parsing models that are in current use: transition-based,
graph-based, and grammar-based models. It continues with a chapter on
evaluation and one on the comparison of different methods, and it  
closes with a few words on current trends and future prospects of  
dependency parsing. The book presupposes a knowledge of basic concepts  
in linguistics and computer science, as well as some knowledge of  
parsing methods for constituency-based representations.

Table of Contents: Introduction / Dependency Parsing /
Transition-Based Parsing / Graph-Based Parsing / Grammar-Based Parsing /
Evaluation / Comparison / Final Thoughts

http://dx.doi.org/10.2200/S00169ED1V01Y200901HLT002

This title is available online without charge to members of  
institutions that have licensed the Synthesis Digital Library of
Engineering and Computer Science.  Members of licensing institutions  
have unlimited access to download, save, and print the PDF without  
restriction; use of the book as a course text is encouraged.  To find  
out whether your institution is a subscriber, visit
<http://www.morganclaypool.com/page/licensed>, or just click on the
book's URL above from an institutional IP
address and attempt to download the PDF.  Others may purchase the book  
from this URL as a PDF download for US$30 or in print for US$40.   
Printed copies are also available from Amazon and from booksellers  
worldwide at approximately US$40 or local currency equivalent.

Graeme Hirst | 1 Dec 2009 21:15

New Book: Zhai: Statistical Language Models for Information Retrieval

BOOK ANNOUNCEMENT

Statistical Language Models for Information Retrieval

ChengXiang Zhai (University of Illinois, Urbana-Champaign)

Synthesis Lectures on Human Language Technologies #1 (Morgan &  
Claypool Publishers), 2009, 141 pages

As online information grows dramatically, search engines such as  
Google are playing a more and more important role in our lives.  
Critical to all search engines is the problem of designing an  
effective retrieval model that can rank documents accurately for a  
given query. This has been a central research problem in information  
retrieval for several decades. In the past ten years, a new generation  
of retrieval models, often referred to as statistical language models,  
has been successfully applied to solve many different information  
retrieval problems. Compared with the traditional models such as the  
vector space model, these new models have a more sound statistical  
foundation and can leverage statistical estimation to optimize  
retrieval parameters. They can also be more easily adapted to model  
non-traditional and complex retrieval problems. Empirically, they tend  
to achieve comparable or better performance than a traditional model  
with less effort on parameter tuning. This book systematically reviews  
the large body of literature on applying statistical language models  
to information retrieval with an emphasis on the underlying  
principles, empirically effective language models, and language models  
developed for non-traditional retrieval tasks. All the relevant  
literature has been synthesized to make it easy for a reader to digest  
the research progress achieved so far and see the frontier of research  
in this area. The book also offers practitioners an informative  
introduction to a set of practically useful language models that can  
effectively solve a variety of retrieval problems. No prior knowledge  
about information retrieval is required, but some basic knowledge  
about probability and statistics would be useful for fully digesting  
all the details.

Table of Contents: Introduction / Overview of Information Retrieval  
Models / Simple Query Likelihood Retrieval Model / Complex Query  
Likelihood Model / Probabilistic Distance Retrieval Model / Language  
Models for Special Retrieval Tasks / Language Models for Latent Topic  
Analysis / Conclusions

http://dx.doi.org/10.2200/S00158ED1V01Y200811HLT001

This title is available online without charge to members of  
institutions that have licensed the Synthesis Digital Library of
Engineering and Computer Science.  Members of licensing institutions  
have unlimited access to download, save, and print the PDF without  
restriction; use of the book as a course text is encouraged.  To find  
out whether your institution is a subscriber, visit
<http://www.morganclaypool.com/page/licensed>, or just click on the
book's URL above from an institutional IP
address and attempt to download the PDF.  Others may purchase the book  
from this URL as a PDF download for US$30 or in print for US$40.   
Printed copies are also available from Amazon and from booksellers  
worldwide at approximately US$40 or local currency equivalent.


egrcpce | 2 Dec 2009 09:07

Deadline extended: 'LSP and Professional Communication' Conference

Languages for Specific Purposes and Professional Communication: Collaboration and Engagement

 

Date: 15 - 17 July 2010
Venue: Petaling Jaya Hilton Hotel, Selangor, Malaysia

Organizers:

Faculty of Languages and Linguistics, University of Malaya

Asia-Pacific LSP and Professional Communication Association

 

Call for Papers

The conference aims to bring together researchers and practitioners in Languages for Specific Purposes (LSP) and Professional Communication in the Asia-Pacific Rim and beyond to take part in a dialogue on research and collaboration in academic, professional and workplace contexts. To build a clearer understanding of communication in the workplace it is crucial to have effective collaboration between language experts and trainers and institutions specializing in communication skills. Such collaboration in turn is fundamental to the design and implementation of effective pedagogy, assessment and curricula.

 

** The deadline for abstract submission has been extended to 31 December 2009.**

 

New Website: http://umconference.um.edu.my/aplspca2010

simon smith | 1 Dec 2009 14:09

Re: Corpora Digest, Vol 30, Issue 1

2009/12/1 simon smith <smithsgj <at> gmail.com>

> Dear all,
> last week I asked the list to pool their knowledge about research carried
> out on DDL with phrases rather than single words. I didn't expect to find
> much, having already trawled through the literature, but the picture is
> pretty bleak. To put it bluntly, with very few exceptions, DDL has not been
> used to teach phrases.

 
I was thinking about Alex Boulton's work when I read this yesterday, but he (or someone else) pipped me to the post.
 
A lot of Michael Barlow's work is on MWUs, so take a look at his site.
 
In a way, though, your question puzzles me. Surely almost all of what we do in corpus studies is about which words go with which? Learning about collocations and context is the whole point, and this applies as much to DDL as to any other application of corpus linguistics.

