Peter Klügl | 28 Feb 17:32 2015

Re: Ruta partofneq

Thank you. I'll try to take a look at it this weekend.



On 27.02.2015 at 14:40, Silvestre Losada wrote:
> Hi,
> I submitted a patch with a solution for the reported error.
> Best
> On 25 February 2015 at 10:41, Silvestre Losada <silvestre.losada@...>
> wrote:
>> Done
>> On 24 February 2015 at 19:35, Peter Klügl <pkluegl@...>
>> wrote:
>>> Hi,
>>> could you open an issue and attach it there? That would be great.

Silvestre Losada | 27 Feb 14:46 2015

Ruta select all annotations with same feature value.

Hi All,

I want to select all the annotations that have the same feature value. If
more than one annotation has the same feature value, I want to consider
them duplicates and keep only one of them.
All annotations belong to the same type.


Ann1, Ann3 and Ann6 should be selected. Then I plan to use the UNMARK action
to remove the unneeded annotations.
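
The selection step described above can be sketched in plain Java, independent of Ruta. This is only an illustration under assumptions: the class `Ann`, its `featureValue` field, and the sample values are hypothetical stand-ins, not UIMA or Ruta types. The sketch keeps the first annotation seen for each feature value and returns the later ones as candidates for removal.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class DuplicateFeatureValues {

    /** Hypothetical stand-in for an annotation: a name plus one feature value. */
    static final class Ann {
        final String name;
        final String featureValue;

        Ann(String name, String featureValue) {
            this.name = name;
            this.featureValue = featureValue;
        }
    }

    /** Returns every annotation whose feature value already occurred on an earlier one. */
    static List<Ann> duplicates(List<Ann> annotations) {
        Set<String> seen = new HashSet<>();
        List<Ann> toRemove = new ArrayList<>();
        for (Ann a : annotations) {
            // Set.add returns false when the value is already present.
            if (!seen.add(a.featureValue)) {
                toRemove.add(a);
            }
        }
        return toRemove;
    }

    public static void main(String[] args) {
        // Illustrative data only; the values are made up for this sketch.
        List<Ann> input = Arrays.asList(
                new Ann("first", "x"),
                new Ann("second", "y"),
                new Ann("third", "x"));
        for (Ann a : duplicates(input)) {
            System.out.println("duplicate: " + a.name);
        }
    }
}
```

Which annotation of a duplicate group survives depends only on iteration order here; a different policy (longest span, highest confidence) would need a different comparison.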

Kind regards

Philippe de Rochambeau | 19 Feb 21:28 2015

Analysing archive PDFs


In the past few months, I have indexed tens of thousands of PDFs containing newspaper articles from 1887
until 1940 using SOLR for my company.

Every day, my colleagues in the Archive Department spend hours searching through the archives using SOLR,
looking for articles that are potentially interesting from a social and historical point of view.

Can UIMA or OpenNLP be used to automate their work and/or to analyze patterns in the data?

Many thanks.

Alberto Garcia | 19 Feb 12:50 2015

Concept Mapper Annotator Question

We are starting to use the UIMA framework for entity identification. We base
our solution on some dictionaries which contain the entities we need to

We are using the Concept Mapper annotator, and it is really fast at
recognizing the complete name of an entity, but it fails to recognize part
of an entity. Let me explain with an example.

Let's say we have this entry in the dictionary:

<token canonical="Location" DOCNO="10000">
    <variant base="New York City"/>
</token>


If we call the service with "New York City" as input text, it recognizes
the entity as Location.

If we call the service with "New City York" or a different permutation, it
recognizes the entity as Location.

BUT if we call the service with "New City", it does not recognize it as a
Location.
Can anyone tell me how I can implement or configure this behavior for the
Concept Annotator?
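
To make the requested behavior concrete, here is a minimal plain-Java sketch of order-independent partial matching. It does not use or describe the Concept Mapper API; the class and method names are hypothetical, and whitespace tokenization with lower-casing is an assumption. An input counts as a partial match when all of its tokens occur in the dictionary variant.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

final class PartialVariantMatch {

    /** Splits text on whitespace and lower-cases it (assumed tokenization). */
    static Set<String> tokens(String text) {
        Set<String> out = new HashSet<>();
        for (String t : text.trim().toLowerCase(Locale.ROOT).split("\\s+")) {
            if (!t.isEmpty()) {
                out.add(t);
            }
        }
        return out;
    }

    /** True when the input is non-empty and every input token occurs in the variant. */
    static boolean partiallyMatches(String input, String variant) {
        Set<String> in = tokens(input);
        return !in.isEmpty() && tokens(variant).containsAll(in);
    }

    public static void main(String[] args) {
        String variant = "New York City";
        System.out.println(partiallyMatches("New City", variant));
        System.out.println(partiallyMatches("New Jersey", variant));
    }
}
```

A rule this loose also accepts a single common token such as "New", so a real setup would probably need a minimum token count or a coverage threshold.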

Mario Gazzo | 18 Feb 21:46 2015

Approach for keeping track of formatting associated with text views

We are starting to use the UIMA framework for natural-language (NL) processing of article text, which is
usually stored with metadata in some XML format. We need to extract text elements to be processed by various
NL analysis engines that only work with pure text, but we also need to keep track of the formatting
information related to the processed text. In general it is also valuable for us to be able to track every
annotation back to the original XML to maintain provenance. Before embarking on this, I would like to
validate our approach with more experienced users, since this is the first application we are building with UIMA.

In the first step we would annotate every important element of the XML including formatting elements in the
body. We maintain some DOM-like relationships between the body text and formatting annotations so that
text formatting can be reproduced later with NLP annotations in some article viewer.

Next, in another AE, we would produce a pure text view of the text annotations in the XML view that need NL
analysis. In this new text view we would annotate the different text elements with references back to
their counterparts in the original XML view, so that we can trace back positions in the original XML and the
formatting relations. This will of course require mapping NLP annotation offsets in the text view back to
the XML view, but the information needed to make this possible should then be there.
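
The offset mapping described above can be sketched as a small bookkeeping structure in plain Java. This is a sketch under assumptions, not UIMA API: the class `ViewOffsetMap` and its methods are hypothetical, and it assumes the text view is built by concatenating character ranges copied unchanged from the XML view.

```java
import java.util.ArrayList;
import java.util.List;

final class ViewOffsetMap {

    // Each entry is {textBegin, xmlBegin, length} for one copied range.
    private final List<int[]> segments = new ArrayList<>();
    private final StringBuilder text = new StringBuilder();

    /** Copies xml[xmlBegin, xmlEnd) into the text view and records where it came from. */
    void append(String xml, int xmlBegin, int xmlEnd) {
        segments.add(new int[] {text.length(), xmlBegin, xmlEnd - xmlBegin});
        text.append(xml, xmlBegin, xmlEnd);
    }

    /** The plain text view built so far. */
    String text() {
        return text.toString();
    }

    /**
     * Maps a character offset in the text view to the XML view.
     * Returns -1 when the offset lies outside every recorded segment.
     * For an exclusive end offset, map (end - 1) and add 1 to the result.
     */
    int toXmlOffset(int textOffset) {
        for (int[] s : segments) {
            if (textOffset >= s[0] && textOffset < s[0] + s[2]) {
                return s[1] + (textOffset - s[0]);
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        String xml = "<p>Hello <b>world</b></p>";
        ViewOffsetMap map = new ViewOffsetMap();
        map.append(xml, 3, 9);   // "Hello "
        map.append(xml, 12, 17); // "world"
        System.out.println(map.text() + " / " + map.toXmlOffset(6));
    }
}
```

The linear scan is enough for a sketch; with many segments a binary search over the segment start offsets would be the obvious refinement. Text that is transformed while copying (entity decoding, whitespace normalization) would break the one-to-one character assumption and need finer-grained segments.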

This approach requires somewhat more handcrafted bookkeeping than we initially hoped would be
necessary. We haven't been able to find any examples of how this is usually done, and the UIMA docs are
vague about managing this kind of relationship across views. We would therefore really like to know
if there is a simpler and better approach.

Any feedback is greatly appreciated. Thanks.

Priyanka Das | 16 Feb 13:18 2015

UIMA-AS build failure

Hi,

My UIMA-AS build is failing with the following error:

Apache UIMA-AS: uima-as ............................ FAILURE [07:27 min]

[INFO] Total time: 13:33 min
[INFO] Finished at: 2015-02-16T17:29:16+05:30
[INFO] Final Memory: 81M/490M


[ERROR] Failed to execute goal
(javadocs-distr) on project uima-as: An error has occurred in JavaDocs
report generation:

[ERROR] Exit code: 1 -

Silvestre Losada | 12 Feb 10:12 2015

Ruta partofneq

I don't know if this is a bug or if it is working as intended. I have the following

     begin: 4

Then, if I apply the following Ruta rule:

(AnnotationA{-> UNMARK(AnnotationA)}){PARTOFNEQ(AnnotationA)};

The output is
     begin: 4


Valentin Tablan | 10 Feb 14:08 2015

RUTA: GATHER and optional rule elements


How does GATHER interact with optional rule elements? I've not found any
explicit statements about this in the docs, and I would prefer getting an
authorised opinion.

The following RUTA grammar:

DECLARE Annotation A;
DECLARE Annotation B;
DECLARE Annotation C(Annotation a, Annotation b);


A B?{-> GATHER(C, 1, 2, "a" = 1, "b" = 2)};

Runs as expected on the input "A B C", but throws a NullPointerException on
"A X C", presumably because index 2 is not mapped to anything.

Is there a way to handle this kind of situation; for example to create the
C annotation, with an unpopulated "b" feature?

Here's the exception trace, for reference:

SEVERE: Exception occurred
org.apache.uima.analysis_engine.AnalysisEngineProcessException: Annotator

reshu.agarwal | 10 Feb 13:06 2015

DUCC- Heartbeat Packets?


I read in DUCC book about:

Agents monitor nodes, sending heartbeat packets with node statistics to 
interested components (such as the RM and web-server).


    This shows the current state of a machine. Values include:

        The node is in the DUCCnodes file
        but no DUCC process has been started there, or else there is a
        communication problem and the state messages are not being
        The node has a DUCC Agent process running on it and the web
        server is receiving regular heartbeat packets from it.
        The node had a healthy DUCC Agent on it at some point in the
        past (since the last DUCC boot), but the web server has stopped
        receiving heartbeats from it.

        The agent may have been manually shut down, may have crashed, or
        there may be a communication problem.

        Additionally, very heavy loads from jobs running on the node
        can cause the DUCC Agent's heartbeats to be delayed.

Aleksandar Dimitrov | 25 Jan 22:59 2015

Using OpenNLP type annotations with UIMAfit


The UIMAfit manual (5.1) states that the preferred way to iterate over tokens in
the CAS is the following:

    // JCas version
    for (Token token : JCasUtil.select(jcas, Token.class)) {

This assumes a Token.class is importable somewhere. But I'm using the OpenNLP
tools, which don't provide such a type. Instead, it seems to be generated at run
time during configuration steps, and is not accessible as a class in the AE (to
my knowledge.)

Additionally, when extending instead
of o.a.u.component.JCasAnnotator_ImplBase, the method void typeSystemInit(TypeSystem)
is not provided, which makes instantiating the type system the same way OpenNLP
does it rather cumbersome (I generate an empty CAS with the typeSystemDescription,
then get its TypeSystem and provide the Type and Feature objects from this
TypeSystem instance as UIMAfit configuration parameters before deploying my AE.)

Even then, I can only use the less type-safe method of iterating over
annotations: for (AnnotationFS token : cas.getAnnotationIndex(tokenType)) where
tokenType is the Type instance I acquired from the TypeSystem either during
typeSystemInit() or during configuration with the above hack.

Is there some good way of solving this dilemma while still using UIMAfit's
classes? Obviously, I could go back to using just plain UIMA, but I quite like
UIMAfit's way of dealing with external resources! And I don't like the

Robin Chesterman | 24 Jan 23:31 2015

Getting started issues

Hi all,

I'm so sorry for such a newbie problem - I'm new to UIMA and also to Java
and Eclipse - I'm a .Net / Visual Studio guy usually.

I've just followed all instructions to install the UIMA SDK, installing
Eclipse and Setting up Eclipse to View Example Code, as described here: - I'm at
section 3.2.

It says there should be no compilation errors, but I'm getting:

XML format error in '.classpath' file of project 'uimaj-examples':

Any ideas?