Apologies for multiple postings
Registration for the Labs offered at CLEF 2011 remains open until May 15.
If you would like to participate in one or more of the CLEF 2011 Labs, please do not forget to register online at the CLEF 2011 website.
CLEF 2011 Labs - Call for Participation
The CLEF 2011 conference is the continuation of the popular CLEF campaigns and workshops that ran from 2000 to 2009 (http://www.clef-campaign.org/). In 2010, CLEF was organized in a different way: the CLEF 2010 Conference (http://clef2010.org), presenting related research papers, was followed by a series of “Labs” covering a broad range of issues in the fields of multilingual and multimodal information access evaluation. CLEF 2011 will be organized in a similar way.
Two different forms of Labs are offered:
- Labs that run a “campaign-style” evaluation for specific information access problems (during the twelve-month period preceding the conference), similar in nature to the traditional CLEF campaign “tracks”;
- Labs that follow a more classical “workshop” pattern, exploring issues of evaluation methodology, metrics, processes etc. in information access and closely related fields.
Below is a brief summary of the Labs proposed this year.
Cross-Language Image Retrieval (ImageCLEF)
This track evaluates retrieval from visual collections; both textual and visual retrieval techniques can be exploited. Four challenging tasks are foreseen: 1) retrieval from a collection of Wikipedia images with textual annotations and topics in several languages; 2) medical image retrieval with visual, semantic and mixed topics in several languages, using a data collection from the scientific literature; 3) visual classification of leaf images for the identification of plant species; 4) a photo annotation task that investigates automated semantic annotation based on visual information, with approaches based on Flickr user tags as well as multimodal approaches. In addition, a practical showcase is planned at the conference for the real-time evaluation of interactive image search systems.
Lab Coordinators are U. of Applied Sciences Western Switzerland (CH), Oregon Health and Science U. (US), CEA LIST (FR), U. of Geneva (CH), Fraunhofer Society (DE), University of North Texas (US), INRIA (FR). See also http://www.imageclef.org/.
Uncovering Plagiarism, Authorship, and Wikipedia Vandalism (PAN)
PAN <at> CLEF 2011 is divided into three tasks:
- Plagiarism Detection. Today's plagiarism detectors face intricate situations, such as paraphrased plagiarism within and across languages. Moreover, the source of a plagiarism case may be hidden within a large collection of documents such as the Web, or it may not be available at all. Building on the successful evaluation framework developed in the last two years, we continue to add new challenges this year.
- Author Identification. Throughout history and especially today, many texts have been written anonymously or under false names, so that readers may not be certain of a text's alleged author. Author identification is the task of determining the true author of a text, and one of its main challenges is to automatically attribute a text to one of a set of known authors. For the purpose of the evaluation, we have developed a new authorship evaluation corpus.
- Wikipedia Vandalism Detection. Vandalism has always been one of Wikipedia's biggest problems. However, the detection of vandalism is done mostly manually by volunteers, and research on automatic vandalism detection is still in its infancy. Hence, solutions are to be developed which aid Wikipedians in their efforts.
PAN is organized by the Bauhaus-Universität Weimar, the Universidad Politécnica de Valencia, the University of the Aegean, the Bar-Ilan University, the Illinois Institute of Technology, and the Duquesne University. For more information see http://pan.webis.de/.
LogCLEF - Multilingual Log File Analysis
The research goal of LogCLEF is the analysis and classification of queries in order to understand search behaviour in multilingual contexts. It consists of three tasks: 1) language identification; 2) query classification; 3) success of a query. In coordination with the organizers, participating groups will focus on different tasks in exploring and understanding the data. For some of these tasks, test sets of human-labelled queries (a subset of the log files) will be assembled and released to the participants. In order to create an interesting test and training set, participants will be required to submit both manual and automatic annotations. At least two large log datasets will be made available: The European Library action logs and the Sogou Chinese search engine logs.
Lab Coordinators: University of Padova, Dublin City University, University of Hildesheim. For further information: http://www.promise-noe.eu/mining-user-preference/logclef-2011/home
QA4MRE - Question Answering for Machine Reading Evaluation
QA4MRE is a new task of the Question Answering (QA) Track. The goal is to evaluate machine reading abilities through question answering and reading comprehension tests. The focus is on capturing knowledge from given text collections and using it to answer questions. The task consists of reading single documents, where correct answers require some inference and previously acquired background knowledge. As in reading comprehension tests, questions will be posed in multiple-choice form. The participating systems will be required to answer the questions by choosing, in each case, one answer from the proposed alternatives.
Coordination: UNED (Spain), ISI (USA), CELCT (Italy), University of Limerick (Ireland). For more information see: http://celct.fbk.eu/QA4MRE/
CLEF-IP: IR in the IP domain
In 2011, CLEF-IP puts to use a collection of more than 2 million patent documents in XML format with content in English, German, and French. The collection will also include patent images.
During this benchmarking activity, several tasks will be organised:
- The Prior Art Candidate Search task requires the participants to retrieve documents that are potential prior art to a given document.
- The Image-based Document Retrieval task is a pilot task where participants are asked to find patent documents or images relevant to a given patent document containing images.
- The Classification task will ask participants to classify documents according to the International Patent Classification scheme up to the subclass level. An optional Refined Classification task will also be available, where participants are asked to classify patent documents up to the group and subgroup levels when the subclass is given.
- The Image-based Classification task asks participants to categorise given patent images into pre-defined categories (flow charts, hand drawings, technical drawings, chemical structures, etc.).
Coordinators: Information Retrieval Facility (AT), max.recall (AT). Website: http://www.ir-facility.org/clef-ip
MusiCLEF
Currently, music search is still mostly text-based. The goal of MusiCLEF is to promote the development of novel methodologies for music access and retrieval that combine content-based information, automatically extracted from music files, with contextual information provided by users through tags, comments and reviews, possibly in different languages. The combination of these two sources of information is still under-investigated in music retrieval. To this end, MusiCLEF aims at developing and maintaining a test collection of both music files and relevant descriptors, to be distributed to participants in the form of feature vectors (e.g. MFCCs, chroma vectors, rhythmic features, prominent pitch) and links to publicly available music excerpts.
Coordinators: University of Padova, Italy, and University of Alicante, Spain. More information at: http://ims.dei.unipd.it/websites/MusiCLEF/
CHiC 2011 – Cultural Heritage in CLEF: From Use Cases to Evaluation in Practice for Multilingual Information Access to Cultural Heritage
The CHiC 2011 workshop at CLEF 2011 aims to move towards a systematic and large-scale evaluation of cultural heritage digital libraries and information access systems by surveying evaluation efforts in the cultural heritage field, as well as defining user scenarios and identifying possible relevant metrics.
Workshop participants are asked to introduce ideas for evaluation scenarios derived from use cases, user studies and previous evaluation efforts and should focus on describing possible scenarios from the CH domain that are applicable for various systems and projects.
Coordinators are: University of Padua, Humboldt-Universität zu Berlin and University of Sheffield. For more information see: http://www.promise-noe.eu/chic-2011/home
CELCT (web: www.celct.it)
Center for the Evaluation of Language and Communication Technologies
Via alla Cascata 56/c
38100 Povo – Trento – Italy
email: forner <at> celct.it
tel.: +39 0461 314 804
fax: +39 0461 314 846
Secretary Phone: +39 0461 314 870