Elena Beisswanger | 5 Feb 21:51

2nd CFP New Challenges for NLP Frameworks, a workshop at LREC 2010

======================================================================

    2nd Call for Papers

    New Challenges for NLP Frameworks, a workshop at LREC 2010

    22 May 2010, Valletta, Malta

    http://nlpframeworks2010.semanticsoftware.info

======================================================================

Natural language processing frameworks like GATE and UIMA have
significantly changed the way NLP applications are designed,
developed, and deployed. Features such as component-based design,
test-driven development, and resource meta-descriptions now routinely
provide higher robustness, better reusability, faster deployment, and
improved scalability. They have become the staple of both NLP
research and industrial application, fostering a new generation of
NLP users and developers.

Nevertheless, after more than a decade of the current generation of
NLP frameworks, the NLP research and application landscape is
shifting. This brings new challenges to both the developers of NLP
frameworks and their users.  Driving forces include in particular:

- Social Media

- Interoperability needs between different NLP frameworks,
  components, and resources

- Terabyte-Scale Data Sets

- Cloud and Grid Computing

- Semantic Computing, Ontologies, and Reasoning

- Cross-Media Language Analysis (text, speech, images, video)

- Ambient Computing

- Addressing more complex genres of language

THEMES AND TOPICS

This workshop will provide a venue for reporting ongoing work in the
context of NLP frameworks, such as UIMA, GATE, and other related systems.
Principal themes include:

- issues and approaches in processing of very large data collections, e.g.,
  parallelisation and distribution (particularly in relation to cloud
  computing)

- sophisticated tools to build and manage complex processing pipelines and
  to analyse results

- software engineering in relation to language computation

- solutions to interoperability issues combining components from different
  sources (e.g., GATE, UIMA, NLTK, OpenNLP, NooJ)

- integration with related areas (data mining, semantic
  repositories, big table databases)

- persisting experimental contexts (computation and data), e.g. via
  virtualisation

- distribution of self-developed components, repositories of ready-to-use
  UIMA/GATE-based components

- efficient embedding of NLP processing in diverse environments (including
  small memory devices)

- research on genericity of components and type-system independence

- Service-Oriented Architectures (SOAs) and Software-as-a-Service
  (SaaS) models of language computation

- automatic feedback processes of knowledge discovery and reuse from text

INTENDED AUDIENCE

The workshop aims to bring together developers and users of NLP frameworks
from different perspectives, in order to elicit new requirements, feature
successful solutions, and exchange successful patterns of NLP engineering.
In particular, perspectives from the following user groups are welcome:

- Application Developers, from both research and industry,
  with application experience reports

- Framework Developers, with an NLP/software engineering background

- Researcher users of NLP architectures

SUBMISSION FORMATS

We solicit the following types of publications:

Full research papers, describing novel, mature work, with an appropriate
level of evaluation. Maximum of 8 pages in LREC format.

Short research papers, describing novel, early work, with preliminary
results; as well as position papers or application experience reports.
Length of exactly 4 pages in LREC format.

Open source tool/resource papers, between 4 and 8 pages in LREC format. To
qualify for this category, the code or data must be accessible to the
reviewers and, if accepted, published together with the workshop under an
OSI-approved open source or open content license.

Note that the PC may suggest reassignment of a paper into a different
category depending on its contribution.

Your submission must be formatted according to LREC's authoring guidelines,
see http://www.lrec-conf.org/lrec2010/?Author-s-Kit-and-Templates

Submissions will be handled through the START system. When submitting a
paper from the START page, authors will be asked to provide essential
information about resources (in a broad sense, i.e. also technologies,
standards, evaluation kits, etc.) that have been used for the work
described in the paper or are a new result of your research.  For further
information on this new initiative, please refer to
http://www.lrec-conf.org/lrec2010/?LREC2010-Map-of-Language-Resources.

IMPORTANT DATES

February 12, 2010 - Deadline for workshop paper
March 8, 2010 - Notification of acceptance
March 18, 2010 - Camera-ready papers due
May 22, 2010 - Workshop in Malta

ORGANISERS

Rene Witte, Concordia University, Montréal
Hamish Cunningham, University of Sheffield
Jon Patrick, University of Sydney
Elena Beisswanger, University of Jena
Ekaterina Buyko, University of Jena
Udo Hahn, University of Jena
Karin Verspoor, University of Colorado Denver
Anni R. Coden, IBM T.J. Watson Research Center

PROGRAM COMMITTEE

Aaron Kaplan (Xerox, France)
Adam Funk (Uni. Sheffield)
Angus Roberts (Uni. Sheffield)
Anni R. Coden (IBM T.J. Watson Research Center)
Claude Roux (Xerox Research Labs)
Diana Inkpen (Uni Ottawa)
Diana Maynard (Uni. Sheffield)
Dietmar Rösner (Uni. Magdeburg)
Dragan Gasevic (Uni. Athabasca)
Ekaterina Buyko (Uni. Jena)
Elena Beisswanger (Uni. Jena)
Epaminondas Kapetanios (Uni Westminster)
Eric W. Brown (IBM T.J. Watson Research Center)
Graham Wilcock (Uni. Helsinki)
Guergana K. Savova (Mayo Clinic)
Hamish Cunningham (Uni. Sheffield)
Horacio Saggion (Uni. Sheffield)
Iryna Gurevych (Uni. Darmstadt)
Jian Su (I2R, Singapore)
Jochen Leidner (Thomson Reuters)
Jon Patrick (Uni. Sydney)
Juergen Rilling (Concordia Uni, Montréal)
Kalina Bontcheva (Uni. Sheffield)
Karin Verspoor (Uni. Colorado)
Katrin Tomanek (Uni. Jena)
Kevin B. Cohen (MITRE)
Leila Kosseim (Concordia Uni., Montréal)
Leo Ferres (Uni. of Concepcion)
Marc Light (Thomson Corp. R&D)
Michael Tanenblatt (IBM T.J. Watson Research Center)
Nancy Ide (Vassar College)
Nicolas Hernandez (Uni. Nantes)
Philip V. Ogren (Uni. Colorado)
Ralf Krestel (L3S Research Center, Hannover)
Rene Witte (Concordia Uni., Montréal)
Richard Eckart de Castilho (Uni. Darmstadt)
Sameer Pradhan (BBN)
Stefan Geißler (TEMIS GmbH)
Steven Bethard (Stanford Uni.)
Thilo Götz (IBM Germany)
Udo Hahn (Uni. Jena)
Valentin Tablan (Uni. Sheffield)
Yoshinobu Kano (Uni. Tokyo)
Yuntao Zhang (Shanghai Jiaotong Uni.)

-- 
Elena Beisswanger
Jena University Language and Information Engineering (JULIE) Lab
Fuerstengraben 30
07743 Jena, Germany
Phone: +49-(0)3641-944303
Fax:   +49-(0)3641-944321
URL:   http://www.julielab.de

Radwen ANIBA | 5 Feb 13:23

CPM still running after process

Hello,

The problem is in the title.

After calling mCPM.process(), the CPM still seems to be running.

How can I figure out what is going on?

Regards

Rad
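
Note: CollectionProcessingManager.process() starts processing on another
thread and returns immediately, so the CPM being busy right after the call
is expected. The usual way to learn when it has finished is a
StatusCallbackListener registered with addStatusCallbackListener(), whose
collectionProcessComplete() method is called at the end. Below is a minimal
sketch of that waiting pattern in plain Java with no UIMA dependency.
CompletionListener, AsyncJob and WaitForCompletion are hypothetical
stand-ins invented for this illustration, not UIMA classes.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for a completion callback; not a UIMA interface.
interface CompletionListener {
    void processingComplete();
}

// Hypothetical stand-in for an engine whose process() returns immediately.
class AsyncJob {
    private final Runnable work;
    private volatile CompletionListener listener;

    AsyncJob(Runnable work) {
        this.work = work;
    }

    void setListener(CompletionListener listener) {
        this.listener = listener;
    }

    // Starts the work on a background thread and returns at once.
    void process() {
        Thread worker = new Thread(() -> {
            work.run();
            CompletionListener l = listener;
            if (l != null) {
                l.processingComplete();
            }
        });
        worker.start();
    }
}

class WaitForCompletion {
    // Blocks until the job reports completion or the timeout expires.
    // Returns true only if the completion callback fired in time.
    static boolean runAndWait(Runnable work, long timeoutMillis) {
        CountDownLatch done = new CountDownLatch(1);
        AsyncJob job = new AsyncJob(work);
        job.setListener(done::countDown);
        job.process();
        try {
            return done.await(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            // Restore the interrupt flag and report "not finished".
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

With a real CPM, the latch would presumably be counted down inside
collectionProcessComplete() (and aborted()), and the calling thread would
await it after mCPM.process(collectionReader).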
Radwen ANIBA | 5 Feb 11:53

telling the CPM to consider some changes

Hello,

I am back with a problem I have running a CPM programmatically.

This is what I did:

I start from the default UIMA FileSystemCollectionReader descriptor, which I
load using

ResourceSpecifier colReaderSpecifier =
    UIMAFramework.getXMLParser().parseCollectionReaderDescription(
        new XMLInputSource("desc/FileSystemCollectionReader.xml"));
CollectionReader collectionReader =
    UIMAFramework.produceCollectionReader(colReaderSpecifier);

Then I developed four analysis engines, which I add like this:

((BaseCPMImpl) mCPM).addCasProcessor(ae1);
((BaseCPMImpl) mCPM).addCasProcessor(ae2);
((BaseCPMImpl) mCPM).addCasProcessor(ae3);
((BaseCPMImpl) mCPM).addCasProcessor(ae4);

The problem is that I want to let the user tell the collection reader which
folder contains the documents to be analyzed, so I used the following code
after producing the collection reader:

ConfigurationParameterSettings settings =
    collectionReader.getMetaData().getConfigurationParameterSettings();

org.apache.uima.resource.metadata.NameValuePair[] valuePairs =
    settings.getParameterSettings();

for (org.apache.uima.resource.metadata.NameValuePair nvp : valuePairs) {
    // TODO: customize settings and save changes back using this crappy CPE API
    if (nvp.getName().matches("InputDirectory")) {
        nvp.setValue("/path/to/test/if/that/work");
    }
}

Unfortunately, the CPM seems to ignore this change, so I think I'm missing
something here.

How do I tell the CPM that I've changed the collection reader's
configuration parameter settings?

Thx
Rad
Radwen ANIBA | 3 Feb 11:40

CPM class usage

Hello,

I have developed a series of AEs that I tested through the CPE GUI, and now
I'm trying to write a Java application that uses these AEs and does the same
thing as the CPE GUI. I took the SimpleRunCPM example as a reference and
made these changes:

 // create a new Collection Processing Manager
    mCPM = UIMAFramework.newCollectionProcessingManager();

    // Register AE and CAS Consumer with the CPM
    mCPM.setAnalysisEngine(ae1);
    mCPM.setAnalysisEngine(ae2);
    mCPM.setAnalysisEngine(ae3);
    mCPM.setAnalysisEngine(ae4);

Note that I have four AEs here, not just one.

When I run this, I get the following error at mCPM.setAnalysisEngine(ae2):

Initializing AnalysisEngines
Exception in thread "main" java.lang.IndexOutOfBoundsException: Index: 0, Size: 0
    at java.util.LinkedList.entry(LinkedList.java:365)
    at java.util.LinkedList.remove(LinkedList.java:357)
    at org.apache.uima.collection.impl.cpm.engine.CPMEngine.removeCasProcessor(CPMEngine.java:1188)
    at org.apache.uima.collection.impl.cpm.BaseCPMImpl.removeCasProcessor(BaseCPMImpl.java:361)
    at org.apache.uima.collection.impl.cpm.CPMImpl.setAnalysisEngine(CPMImpl.java:70)

So I don't know what the real problem is: whether the CPM cannot take more
than one AE, or whether there is a problem in ae2, which I have already
tested without any errors.

Any ideas?

Thx

Rad
Ekaterina Buyko | 1 Feb 13:16

LREC Workshop "New Challenges for NLP Frameworks" - 2nd Call for Papers

======================================================================

   2nd Call for Papers

   New Challenges for NLP Frameworks, a workshop at LREC 2010

   22 May 2010, Valletta, Malta

   http://nlpframeworks2010.semanticsoftware.info

======================================================================

Natural language processing frameworks like GATE and UIMA have
significantly changed the way NLP applications are designed,
developed, and deployed. Features such as component-based design,
test-driven development, and resource meta-descriptions now routinely
provide higher robustness, better reusability, faster deployment, and
improved scalability. They have become the staple of both NLP
research and industrial application, fostering a new generation of
NLP users and developers.

Nevertheless, after more than a decade of the current generation of
NLP frameworks, the NLP research and application landscape is
shifting. This brings new challenges to both the developers of NLP
frameworks and their users.  Driving forces include in particular:

- Social Media

- Interoperability needs between different NLP frameworks,
 components, and resources

- Terabyte-Scale Data Sets

- Cloud and Grid Computing

- Semantic Computing, Ontologies, and Reasoning

- Cross-Media Language Analysis (text, speech, images, video)

- Ambient Computing

- Addressing more complex genres of language

THEMES AND TOPICS

This workshop will provide a venue for reporting ongoing work in the
context of NLP frameworks, such as UIMA, GATE, and other related systems.
Principal themes include:

- issues and approaches in processing of very large data collections, e.g.,
 parallelisation and distribution (particularly in relation to cloud
 computing)

- sophisticated tools to build and manage complex processing pipelines and
 to analyse results

- software engineering in relation to language computation

- solutions to interoperability issues combining components from different
 sources (e.g., GATE, UIMA, NLTK, OpenNLP, NooJ)

- integration with related areas (data mining, semantic
 repositories, big table databases)

- persisting experimental contexts (computation and data), e.g. via
 virtualisation

- distribution of self-developed components, repositories of ready-to-use
 UIMA/GATE-based components

- efficient embedding of NLP processing in diverse environments (including
 small memory devices)

- research on genericity of components and type-system independence

- Service-Oriented Architectures (SOAs) and Software-as-a-Service
 (SaaS) models of language computation

- automatic feedback processes of knowledge discovery and reuse from text

INTENDED AUDIENCE

The workshop aims to bring together developers and users of NLP frameworks
from different perspectives, in order to elicit new requirements, feature
successful solutions, and exchange successful patterns of NLP engineering.
In particular, perspectives from the following user groups are welcome:

- Application Developers, from both research and industry,
 with application experience reports

- Framework Developers, with an NLP/software engineering background

- Researcher users of NLP architectures

SUBMISSION FORMATS

We solicit the following types of publications:

- full research papers (6-8 pages in LREC format)

- short papers (3-4 pages to be presented as demos/posters)

- open source tool/resource papers (full or short, must be accompanied by
 working code or accessible data)

Submissions will be handled through the START system. When submitting a
paper from the START page, authors will be asked to provide essential
information about resources (in a broad sense, i.e. also technologies,
standards, evaluation kits, etc.) that have been used for the work
described in the paper or are a new result of your research.  For further
information on this new initiative, please refer to
http://www.lrec-conf.org/lrec2010/?LREC2010-Map-of-Language-Resources.

IMPORTANT DATES

February 12, 2010 - Deadline for workshop paper
March 8, 2010 - Notification of acceptance
March 18, 2010 - Camera-ready papers due
May 22, 2010 - Workshop in Malta

ORGANISERS

Rene Witte, Concordia University, Montréal
Hamish Cunningham, University of Sheffield
Jon Patrick, University of Sydney
Elena Beisswanger, University of Jena
Ekaterina Buyko, University of Jena
Udo Hahn, University of Jena
Karin Verspoor, University of Colorado Denver
Anni R. Coden, IBM T.J. Watson Research Center

PROGRAM COMMITTEE

Aaron Kaplan (Xerox, France)
Adam Funk (Uni. Sheffield)
Angus Roberts (Uni. Sheffield)
Anni R. Coden (IBM T.J. Watson Research Center)
Claude Roux (Xerox Research Labs)
Diana Inkpen (Uni Ottawa)
Diana Maynard (Uni. Sheffield)
Dietmar Rösner (Uni. Magdeburg)
Dragan Gasevic (Uni. Athabasca)
Ekaterina Buyko (Uni. Jena)
Elena Beisswanger (Uni. Jena)
Epaminondas Kapetanios (Uni Westminster)
Eric W. Brown (IBM T.J. Watson Research Center)
Graham Wilcock (Uni. Helsinki)
Guergana K. Savova (Mayo Clinic)
Hamish Cunningham (Uni. Sheffield)
Horacio Saggion (Uni. Sheffield)
Iryna Gurevych (Uni. Darmstadt)
Jian Su (I2R, Singapore)
Jochen Leidner (Thomson Reuters)
Jon Patrick (Uni. Sydney)
Juergen Rilling (Concordia Uni, Montréal)
Kalina Bontcheva (Uni. Sheffield)
Kano Yoshinobu (Uni. Tokyo, Tsujii Lab)
Karin Verspoor (Uni. Colorado)
Katrin Tomanek (Uni. Jena)
Kevin B. Cohen (MITRE)
Leila Kosseim (Concordia Uni., Montréal)
Leo Ferres (Uni. of Concepcion)
Marc Light (Thomson Corp. R&D)
Michael Tanenblatt (IBM T.J. Watson Research Center)
Nancy Ide (Vassar College)
Nicolas Hernandez (Uni. Nantes)
Philip V. Ogren (Uni. Colorado)
Ralf Krestel (L3S Research Center, Hannover)
Rene Witte (Concordia Uni., Montréal)
Richard Eckart de Castilho (Uni. Darmstadt)
Sameer Pradhan (BBN)
Stefan Geißler (TEMIS GmbH)
Steven Bethard (Stanford Uni.)
Thilo Götz (IBM Germany)
Udo Hahn (Uni. Jena)
Valentin Tablan (Uni. Sheffield)
Yuntao Zhang (Shanghai Jiaotong Uni.)

Igor Sominsky | 1 Feb 10:01

AUTO: Igor Sominsky is out of the office (returning Mon 02/08/2010)


I am out of the office from Sun 01/31/2010 until Mon 02/08/2010.

Note: This is an automated response to your message  "Getting
non-annotations (e.g. TOP) from a CAS" sent on 2/1/10 3:50:12.

This is the only notification you will receive while this person is away.

Getting non-annotations (e.g. TOP) from a CAS

Hello folks,

after upgrading to UIMA 2.3.0, I notice that all (J)CAS access methods I found so far (including indexes)
always return Annotation or AnnotationFS.

How can one get access now to types that directly inherit from TOP?

Cheers,

Richard

-- 
------------------------------------------------------------------- 
Richard Eckart de Castilho
Software Engineer
Ubiquitous Knowledge Processing Lab 
FB 20 Computer Science Department      
Technische Universität Darmstadt 
Hochschulstr. 10, D-64289 Darmstadt, Germany 
phone +49 (6151) 16 - 6218, fax -5455, room S2/02/E225
eckartde@... 
www.ukp.tu-darmstadt.de 
------------------------------------------------------------------- 

Greg Holmberg | 29 Jan 19:44

XMI XML XSD?

I know that the XMI schema is not defined by UIMA, but does anyone happen  
to know of an XML schema definition (.xsd) file for XMI?  It might make  
parsing XMI XML easier.

Thanks,

Greg Holmberg

Greg Holmberg | 29 Jan 19:40

UIMA-AS binary serialization

Hi UIMA users--

I see in the README for 2.3 that UIMA-AS uses a new, efficient binary  
serialization for remote services.

I couldn't find much information about it in the Async Scaleout docs.  It  
was briefly mentioned as a configuration option, but not described.

Is this the same format that is used to serialize to C++?

If not, where can I find more information?

Must the recipient re-constitute the CAS, or is it self-describing like  
XML and could be handled by a non-UIMA recipient?

Thanks,

Greg Holmberg

Jörn Kottmann | 29 Jan 10:26

UIMA AS: Duplicate Request

Hi,

there is this message in the service logs:
1/29/10 3:09:40 AM - 16:
org.apache.uima.aae.handler.input.ProcessRequestHandler_impl.handleProcessRequestFromRemoteClient:
INFO: Duplicate Request With Cas Reference Id: 2d0e21bb:12664a22384:7eac
Received. Ignoring Duplicate.

What does it mean, and how can it happen?

Jörn

Marshall Schor | 28 Jan 23:58

[Announce] Apache UIMA 2.3.0 released

The Apache UIMA development community is pleased to announce the release
of version 2.3.0 of UIMA (Unstructured Information Management
Architecture).  Apache UIMA is a framework supporting combining and
reusing components that annotate unstructured information content such
as text, audio, and video.

This release consists of 4 packages:

 - UIMA Java SDK - the base framework, with development tools and examples
 - UIMA-AS (Asynchronous Scaleout capability)
 - UIMACPP (C++ support framework, for components written in C++ and
   other languages)
 - UIMA Addons - a growing set of annotators and other tools.

This release is generally backwards compatible with previous releases,
except that Java 5 is now the minimum Java level required.

The add-ons package contains many new components and annotators, including:

  - Bean Scripting Framework supporting annotators written in popular
    scripting languages
  - Lucas - an interface for using UIMA with Apache Lucene
  - TikaAnnotator - an annotator using the Apache Tika project text
    extractors

The UIMA-AS (Asynchronous Scaleout) framework is extensively enhanced
with much more support for error/failure recovery, driven by feedback
from actual use in several large-scale deployments (thousands of nodes).
The base framework now supports Java 5 generics, and is enhanced to make
it even more light-weight and efficient; for example, it now supports a
new network serialization format for communicating with remote
annotators using a "delta-CAS" - limiting the response sent to just
those items which have changed.
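
To make the "delta" idea concrete, here is a minimal sketch in plain Java
of a reply that carries only the items that were added or changed. It is an
illustration of the general technique only: it does not use or reproduce
the UIMA serialization format, and the class DeltaSketch, its method names,
and the flat key/value model are assumptions made for this example (values
are assumed non-null, and deletions are not handled).

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration of a delta reply; not UIMA code.
class DeltaSketch {
    // Returns only the entries of 'after' that are new or differ from 'before'.
    static Map<String, String> delta(Map<String, String> before,
                                     Map<String, String> after) {
        Map<String, String> changed = new HashMap<>();
        for (Map.Entry<String, String> e : after.entrySet()) {
            if (!e.getValue().equals(before.get(e.getKey()))) {
                changed.put(e.getKey(), e.getValue());
            }
        }
        return changed;
    }

    // Merges a delta into a copy of the sender's original state.
    static Map<String, String> apply(Map<String, String> before,
                                     Map<String, String> delta) {
        Map<String, String> merged = new HashMap<>(before);
        merged.putAll(delta);
        return merged;
    }
}
```

The caller keeps its original state, the remote side returns delta(before,
after), and the caller rebuilds the full result with apply(before, delta),
so unchanged items never travel back over the network.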

Full information and summaries of the changes are contained in the
release notes, which you can find on our downloads page - scroll down to
the 2.3.0 release section, and click on the package of interest in the
release notes column.

Apache UIMA welcomes your help.  Any contribution (code, testing,
documentation, bug reporting/fixing) is always appreciated.  For more
information on how to get involved, please visit the website at:

  http://incubator.apache.org/uima

Thank you for your interest in Apache UIMA.

-The Apache UIMA development community

