Bonnie MacKellar | 22 Jun 21:55 2016

problems integrating Ruta and uimaFit

I am still trying to figure out how to count Ruta annotations across a
bunch of input files. There doesn't seem to be any Workbench way to do it.
So now I am trying to call Ruta from UimaFit so I can do the job in Java.

However, I am having serious configuration problems, plus I have a question
on how to bring in PlainTextAnnotator.

I am using Maven, with the jcasgen-maven-plugin, the ruta-maven-plugin, and
the uimafit-maven-plugin. I will include the pom file at the end of this message.

I want my Java code to be aware of the types declared in the Ruta script -
that is the whole point - I want to count those annotations.

My Ruta script also uses PlainTextAnnotator. The problem with this is that
I can't figure out where to put it. In a Workbench based Ruta project,
PlainTextAnnotator.xml and PlainTextAnnotatorTypeSystem get put
automatically into descriptor/utils, along with a number of other
descriptors that seem to be built into Ruta. But when I create a project
using maven, there is no such location, and these descriptors do not get
put anywhere. I tried a number of places but could not get my script to see
the type system for PlainTextAnnotator. Finally, I hit on putting the files
in target/generated-sources/ruta/descriptor/utils, and finally my script is
able to see the types and I can run it. This is good because at that point,
the ruta-maven-plugin does its job and generates the descriptors for my
script. However, I suspect this is not a good place to put the
PlainTextAnnotator files since doing a clean overwrites them. Where should
they go? Is there any entry in the pom file that is needed?

The second problem is that although my Ruta script works nicely on its own,

Augusto Ribeiro Silva | 22 Jun 15:29 2016

Non-linear pipelines


I couldn’t find any example in the documentation of how to define non-linear pipelines (I am not sure
this is the right name for them).
What I want to do is something like this:

Pipeline: A -> (B or C) -> D

So step A supports two file formats, and depending on the file format, normalisation step B or C should
be performed. Then D should be performed on the result of B or C. How would I go about defining such a
pipeline, if it is possible at all?
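One way to picture the A -> (B or C) -> D routing is as a dispatch on whatever step A records about the input. In UIMA, the order in which an aggregate calls its delegates is decided by a flow controller, so a custom flow controller is, as far as I know, the usual place for this kind of decision. The plain-Java sketch below only illustrates the routing logic and does not use the UIMA API; the format detection rule, the step bodies, and every class and method name are hypothetical.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.function.UnaryOperator;

public class BranchingPipelineSketch {

    /** Hypothetical document holder: the detected format plus the content. */
    static final class Doc {
        final String format;
        final String content;

        Doc(String format, String content) {
            this.format = format;
            this.content = content;
        }
    }

    // Step A: detect the format (assumed rule: markup starts with '<').
    static Doc stepA(String raw) {
        String format = raw.trim().startsWith("<") ? "xml" : "text";
        return new Doc(format, raw);
    }

    // Step B: normalisation for the "xml" branch (assumed: strip tags).
    static Doc stepB(Doc d) {
        return new Doc(d.format, d.content.replaceAll("<[^>]*>", "").trim());
    }

    // Step C: normalisation for the "text" branch (assumed: collapse whitespace).
    static Doc stepC(Doc d) {
        return new Doc(d.format, d.content.trim().replaceAll("\\s+", " "));
    }

    // Step D: runs on the output of whichever branch was taken.
    static String stepD(Doc d) {
        return d.content.toUpperCase(Locale.ROOT);
    }

    static String run(String raw) {
        Map<String, UnaryOperator<Doc>> branches = new HashMap<>();
        branches.put("xml", BranchingPipelineSketch::stepB);
        branches.put("text", BranchingPipelineSketch::stepC);

        Doc afterA = stepA(raw);
        Doc normalised = branches.get(afterA.format).apply(afterA);
        return stepD(normalised);
    }

    public static void main(String[] args) {
        System.out.println(run("<p>hello world</p>"));
        System.out.println(run("  hello   world "));
    }
}
```

The point of the table-based dispatch is that B and C stay unaware of each other, and D only ever sees a normalised document, regardless of which branch produced it.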

Thanks for the help in advance.

Best regards,
Bonnie MacKellar | 17 Jun 21:21 2016

question on Ruta Query View


I am trying to use Ruta Query View to get a view of all matches for a particular annotation type across a large set of .xmi files. However, I am noticing something strange about Ruta Query View: it doesn't report lots of matches that are shown in the Annotation Browser (and which I believe are correct matches). For example, a given annotation type tsCurrent has 4 matches in the file NCT0036712, but these matches do not appear at all in the list of results in Ruta Query View when I query for tsCurrent. For some files, though, all matches do show up, and for other files, only a partial set of matches is in the query results. I cannot understand why this is happening. Perhaps my query syntax is wrong? I can only find the one example in the manual, which isn't much to go on.

I am attaching a screenshot showing the AnnotationBrowser on the top right in Eclipse, with all of the matches for tsCurrent, and the Ruta Query view on bottom, which does not contain those matches. I think it is easier to see the problem visually.

Also, ultimately I am just trying to get a count of the number of times certain annotations are made across all of my files. Is there a better way to do that than Ruta Query View? I can't find another way to total matches across lots of files.
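Since XMI files are plain XML in which each feature structure is written as an element named after its type, one rough way to total matches across many files is to count elements by local name. The sketch below does this with the JDK's XML parser only; it does not use UIMA or Ruta. It assumes the files are well-formed XMI with declared namespaces, and it would lump together types that share a short name in different packages. The class and method names are hypothetical.

```java
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Map;
import java.util.TreeMap;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

public class XmiAnnotationCounter {

    /** Counts the elements in one XMI file whose local name equals typeName. */
    static int countInFile(Path xmi, String typeName) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // needed for the local-name lookup below
        Document doc = factory.newDocumentBuilder().parse(xmi.toFile());
        // "*" matches any namespace, so the type's package does not matter here.
        return doc.getElementsByTagNameNS("*", typeName).getLength();
    }

    /** Returns a per-file count for every *.xmi file directly inside dir. */
    static Map<String, Integer> countInDirectory(Path dir, String typeName) throws Exception {
        Map<String, Integer> counts = new TreeMap<>();
        try (DirectoryStream<Path> files = Files.newDirectoryStream(dir, "*.xmi")) {
            for (Path file : files) {
                counts.put(file.getFileName().toString(), countInFile(file, typeName));
            }
        }
        return counts;
    }

    /** Sums the per-file counts. */
    static int total(Map<String, Integer> counts) {
        int sum = 0;
        for (int n : counts.values()) {
            sum += n;
        }
        return sum;
    }

    public static void main(String[] args) throws Exception {
        if (args.length < 2) {
            System.err.println("usage: XmiAnnotationCounter <directory> <typeShortName>");
            return;
        }
        Map<String, Integer> counts = countInDirectory(Paths.get(args[0]), args[1]);
        counts.forEach((file, n) -> System.out.println(file + "\t" + n));
        System.out.println("total\t" + total(counts));
    }
}
```

Run with a directory of .xmi files and a type short name such as tsCurrent, this prints one count per file followed by the total.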

Bonnie MacKellar

Jaroslaw Cwiklik | 20 May 15:56 2016

[ANNOUNCE] Apache UIMA-AS 2.8.1 released

The Apache UIMA team is pleased to announce the release of the Apache
UIMA-AS version 2.8.1, which includes asynchronous scaleout capabilities
for the UIMA annotators.

UIMA-AS includes the base UIMA SDK and augments it with scaleout
capability; it is a next-generation replacement for the original CPM
(Collection Processing Management) scaleout that is part of the core UIMA
Framework. For more information, please visit:

This release contains a number of improvements and bug fixes. Notable
changes in this release include:

- Replaced ActiveMQ 5.7.0 with 5.13.2
- Added dependency on UIMA SDK 2.8.1
- Fixed per CAS Performance Metrics breakdown for async deployments
- Added new feature to allow warm up of a JVM service instance before real
processing begins,  by feeding it a specified set of CASes before the
instance connects to the service input queue.
- Allow dd2spring to use a custom XML parser

For a complete list of bugs and improvements included in this release
please see

-- Jerry Cwiklik, for the Apache UIMA development team
Pablo N. Mendes | 7 May 00:30 2016

No sofaFS for specified sofaRef found

I am getting "No sofaFS for specified sofaRef found" while trying to
deserialize an XMI. I found the message a bit cryptic and didn't find much
help on the lazyweb, so I bit the bullet and spent a few hours poking
around. It seems to be a missing "sofa" attribute. If the sofa attribute
has the wrong value, then you get "xmi id <id> is referenced but not
defined" which is very nice and clear. But if you omit the sofa attribute
you get "No sofaFS for specified sofaRef found" which is less informative.
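A quick way to look for this situation before deserializing is to list the elements that a cas:View references in its members attribute but that carry no sofa attribute. The sketch below is a heuristic that uses only the JDK's DOM parser: non-annotation feature structures can legitimately be indexed without a sofa attribute, so a hit is a suspect, not a proven error. It assumes the feature structures are direct children of the root element; the class name, the method name, and the namespace URIs in the sample are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class MissingSofaCheck {

    /** Lists view members that have no "sofa" attribute (heuristic check). */
    static List<String> indexedWithoutSofa(InputStream xmi) throws Exception {
        // The default factory is not namespace-aware, so names keep their prefixes.
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(xmi);
        Map<String, Element> byId = new HashMap<>();
        List<Element> views = new ArrayList<>();

        NodeList children = doc.getDocumentElement().getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node node = children.item(i);
            if (!(node instanceof Element)) {
                continue;
            }
            Element element = (Element) node;
            if (element.getTagName().equals("cas:View")) {
                views.add(element);
            } else if (element.hasAttribute("xmi:id")) {
                byId.put(element.getAttribute("xmi:id"), element);
            }
        }

        List<String> suspects = new ArrayList<>();
        for (Element view : views) {
            for (String id : view.getAttribute("members").trim().split("\\s+")) {
                Element member = byId.get(id);
                if (member != null && !member.hasAttribute("sofa")) {
                    suspects.add(member.getTagName() + " xmi:id=" + id);
                }
            }
        }
        return suspects;
    }

    public static void main(String[] args) throws Exception {
        // Sample modelled on the second file in the post; namespace URIs are made up.
        String sample = "<xmi:XMI xmlns:xmi='http://www.omg.org/XMI'"
                + " xmlns:cas='http:///uima/cas.ecore' xmlns:ls='http:///ls.ecore'>"
                + "<cas:NULL xmi:id='0'/>"
                + "<ls:DocumentMetadata xmi:id='18' source='file001.txt' documentId='001'/>"
                + "<cas:Sofa xmi:id='1' sofaNum='1' sofaID='_InitialView' mimeType='text'"
                + " sofaString='This is a test.'/>"
                + "<cas:View sofa='1' members='18'/>"
                + "</xmi:XMI>";
        InputStream in = new ByteArrayInputStream(sample.getBytes(StandardCharsets.UTF_8));
        System.out.println(indexedWithoutSofa(in));
    }
}
```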

Extra info below.


$ diff cas1.xmi cas2.xmi
< <ls:DocumentMetadata xmi:id="18" sofa="1" source="file001.txt"
> <ls:DocumentMetadata xmi:id="18" source="file001.txt" documentId="001"/>






Exception in thread "main" org.apache.uima.cas.CASRuntimeException: No
sofaFS for specified sofaRef found.
at org.apache.uima.cas.impl.CASImpl.getSofa(
at org.apache.xerces.parsers.AbstractSAXParser.endDocument(Unknown Source)
at org.apache.xerces.impl.XMLDocumentScannerImpl.endEntity(Unknown Source)
at org.apache.xerces.impl.XMLEntityManager.endEntity(Unknown Source)
at org.apache.xerces.impl.XMLEntityScanner.load(Unknown Source)
at org.apache.xerces.impl.XMLEntityScanner.skipSpaces(Unknown Source)
at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)


<?xml version="1.0" encoding="UTF-8"?>
<cas:NULL xmi:id="0"/>
<ls:DocumentMetadata xmi:id="18" sofa="1" source="file001.txt"
<cas:Sofa xmi:id="1" sofaNum="1" sofaID="_InitialView" mimeType="text"
sofaString="This is a test."/>
<cas:View sofa="1" members="18"/>


<?xml version="1.0" encoding="UTF-8"?>
<cas:NULL xmi:id="0"/>
<ls:DocumentMetadata xmi:id="18" source="file001.txt" documentId="001"/>
<cas:Sofa xmi:id="1" sofaNum="1" sofaID="_InitialView" mimeType="text"
sofaString="This is a test."/>
<cas:View sofa="1" members="18"/>


<?xml version="1.0" encoding="UTF-8" ?>

<typeSystemDescription  xmlns="">
        <description>Just an example</description>






Pablo N. Mendes
Mario Diana | 5 May 22:03 2016

Is there a simple PEAR Maven-plugin example project? ("Hello, World!" variety)

Can anyone point me to a UIMA project on GitHub, BitBucket, or elsewhere that uses the Maven-plugin to build
PEAR packages? The closer the project is to the "Hello, World!" variety, the better.

I'm just trying to get a jump on the learning curve. Thanks!

Mario Diana
Software Developer
Technically Creative Inc.
Simplifying IT Solutions
Office: 845.725.7883

Sean Crist | 5 May 21:44 2016

C++/Python annotators in Eclipse on Mac OS


I’m trying to set up the ability to write annotators in C++ and in Python using Eclipse on Mac OS X.

I read the following two sources:

Also the README file in the download of UIMACPP

Both documents seem geared for using UIMA from the command line in Windows or Linux.  It wasn’t
immediately evident how to translate those instructions to my situation.  There were a few passing
mentions of Eclipse or Mac OS, but nothing like a step-by-step.

Is there a writeup on this that I’ve missed in my Google search?  Absent that, any pointers or suggestions
on how to proceed?

—Sean Crist

Anni R Coden | 28 Apr 23:06 2016

UIMAfit - cannot find type system

Hi - 

I am using UIMAfit 

I created a file: META-INF/

in the file I put

classpath*:Users/anni/ ............/typesystem.xml

However, I get an error that TypeSystemMgr requires a particular type 
(specified in typesystem.xml) which was not found in the CAS

Here is the stack

Exception in thread "main" Annotator class requires Type, which was not found in the CAS.
        at org.apache.uima.analysis_engine.asb.impl.ASB_impl.process(

any hints are appreciated. 
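A location with the classpath*: prefix is looked up relative to the classpath roots, so an entry that starts with a filesystem path such as Users/anni/ only matches if that directory structure exists below some classpath root. A small probe like the sketch below, which uses only the JDK, can show whether a given resource name is visible on the classpath at all. The default resource name is a hypothetical example, not the location from the post, and the class and method names are made up for illustration.

```java
import java.io.IOException;
import java.net.URL;
import java.util.Collections;
import java.util.List;

public class ClasspathProbe {

    /** Returns every classpath location at which resourceName can be found. */
    static List<URL> findOnClasspath(String resourceName) throws IOException {
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        if (loader == null) {
            loader = ClassLoader.getSystemClassLoader();
        }
        // Resource names are relative to the classpath roots and use '/' separators.
        return Collections.list(loader.getResources(resourceName));
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical default; pass the real classpath-relative name as an argument.
        String name = args.length > 0 ? args[0] : "desc/types/typesystem.xml";
        List<URL> hits = findOnClasspath(name);
        if (hits.isEmpty()) {
            System.out.println("Not on the classpath: " + name);
        } else {
            for (URL hit : hits) {
                System.out.println("Found: " + hit);
            }
        }
    }
}
```

If the probe reports nothing for the name used after classpath*:, the type system descriptor is not reachable from the classpath under that name, which would be consistent with the type not being found in the CAS.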

Thanks, Anni

Dr. Anni R. Coden
IBM T.J. Watson Research Center
1101 Kitchawan Road, Route 134
Yorktown Heights, NY 10598
Tel: (914) 945 2359 t/l 862 2359

Larry Cousin | 25 Apr 22:31 2016

CPE GUI Jar issue


I have a pipeline with a casProcessor descriptor in a jar file, and it gives me an error when I try to run it in
the Eclipse CPE GUI.
The pipeline casProcessor has the following form:

<casProcessor deployment="integrated" name="annotateThis">
                <import name="com.a.b.c.aggregate.AProcessorAggregate"/>

I get the following type error when the pipeline is loaded (the aggregate definition file is in a Maven jar):

Could not load descriptor from URL
  CPR Configuator only supports file: URLs

So the system correctly translated "com.a.b.c.aggregate.AProcessorAggregate" into
and com/a/b/c/aggregate/APRocessorAggregate.xml is in the Jar /C:/Users/me/.m2/repository/com/a/b/c/1.2.3/Ajar-1.2.3.jar
but CPE GUI can't seem to use it.

Is there a way to reference this Jar aggregate xml definition file in a pipeline so Eclipse CPE GUI will not
error out?
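The mismatch described here can be sketched in a few lines. An import by name replaces the dots with slashes and appends .xml, and when that resource is found inside a jar, its location uses the jar: scheme and not file:. The sketch below uses only the JDK; the jar URL is assembled from the paths quoted in the post as an assumption (the actual URL is not shown there), and the on-disk location, the class name, and the method names are hypothetical.

```java
import java.net.URI;

public class ImportNameSketch {

    /** Maps an import-by-name value to the resource path that is looked up. */
    static String toResourcePath(String importName) {
        return importName.replace('.', '/') + ".xml";
    }

    /** True only for locations a tool restricted to file: URLs could open. */
    static boolean isFileLocation(URI location) {
        return "file".equalsIgnoreCase(location.getScheme());
    }

    public static void main(String[] args) {
        String path = toResourcePath("com.a.b.c.aggregate.AProcessorAggregate");
        System.out.println(path);

        // Assumed shape of the location when the descriptor sits inside the Maven jar.
        URI inJar = URI.create(
                "jar:file:/C:/Users/me/.m2/repository/com/a/b/c/1.2.3/Ajar-1.2.3.jar!/" + path);
        // Hypothetical location of the same descriptor extracted to a directory.
        URI onDisk = URI.create("file:/C:/work/descriptors/" + path);

        System.out.println(inJar.getScheme() + " -> usable: " + isFileLocation(inJar));
        System.out.println(onDisk.getScheme() + " -> usable: " + isFileLocation(onDisk));
    }
}
```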


Richard Eckart de Castilho | 6 Apr 21:05 2016

[ANNOUNCE] Apache uimaFIT 2.2.0 released

The Apache UIMA team is pleased to announce the release of

  Apache uimaFIT, version 2.2.0

Apache uimaFIT is a library that facilitates the building of
Apache UIMA components, the programmatic use of Apache UIMA
analysis pipelines, and their testing.

uimaFIT employs Java annotations to integrate UIMA meta data
directly into the source code, allowing for fewer lines of code
and better refactorability than traditional, XML descriptor-based
UIMA projects. It is capable of automatically detecting meta data,
e.g. type system information, from the classpath. Convenience
methods are provided for constructing components, pipelines,
and for accessing annotations.

The major changes in this release are:

* System requirements changed to Java 7
* new FSUtil class with methods to get/set feature values
* new selectAt method
* improved compatibility with thread context classloaders
* upgrades to dependencies including UIMA SDK and Spring Framework
* use of iteratorWithSnapshot in select methods
* ... otherwise this is a bug-fix release to version 2.1.0

For a full list of the changes, please refer to Jira:

Note on compatibility and migration:

Apache uimaFIT 2.2.0 is a drop-in replacement for previous
Apache uimaFIT 2.x versions.

-- Richard Eckart de Castilho, for the Apache UIMA development team
Jos Denys | 5 Apr 14:54 2016

RE: UIMACPP and multi-threading

Hi Eddie,

I worked on the CPP-side, and what I noticed was that the JNI Interface always passes an instance pointer :

JNIEXPORT void JNICALL JAVA_PREFIX(resetJNI) (JNIEnv* jeEnv, jobject joJTaf) {
  try {
    UIMA_TPRINT("entering resetDocument()");

    uima::JNIInstance* pInstance = JNIUtils::getCppInstance(jeEnv, joJTaf);

Now the strange thing, and ultimately what caused the access violation error, was that the pInstance pointer
was the same for the 3 threads that (simultaneously) did the UIMA processing,
so it looks like the same CAS was passed for 3 different analysis worker threads.

Any idea why and how this can happen ?

Thanks for your feedback,
Jos Denys,
InterSystems Benelux.

From: Benjamin De Boe
Sent: Tuesday, 5 April 2016 09:33
To: user <at>
Cc: Jos Denys <Jos.Denys <at>>; Chen-Chieh Hsu <Chen-Chieh.Hsu <at>>
Subject: RE: UIMACPP and multi-threading

Hi Eddie,

Thanks for your prompt response.

In our experiment, we have one initial thread instantiating a CasPool and then passing it on to newly
spawned threads that each have their own DaveDetector instance and fetch a new CAS from the shared pool.
The UimacppEngine objects' cppEnginePointer variable differs per thread, but on the C++ side, it looks
like all threads are pointing to the same memory address for the CAS they operate on. Given the actions
UimacppEngine.process() performs, and given that the CAS being processed is registered as a protected
field rather than a local variable, it's no wonder it causes trouble.

I can imagine UIMA-AS follows a path that's perhaps slightly different (and apparently safe, given your
test case), but I'm wondering what we're doing wrong that we need to fiddle with synchronized keywords on
the framework classes to ensure we avoid the crash.

Here's our test program. When the CAS pool is small enough (e.g. 5), things work fine. When it is larger than
the number of documents we want to process (23), it also works. When it is somewhere in between (e.g. 20), we
get the crash.

package com.intersys.uima.test;

import org.apache.uima.UIMAFramework;
import org.apache.uima.analysis_engine.AnalysisEngine;
import org.apache.uima.cas.CAS;
import org.apache.uima.resource.ResourceSpecifier;
import org.apache.uima.util.CasCreationUtils;
import org.apache.uima.util.CasPool;
import org.apache.uima.util.Level;
import org.apache.uima.util.XMLInputSource;

/**
 * @author bdeboe
 */
public class Standalone implements Runnable {

    private String text;
    private AnalysisEngine ae;
    private CasPool pool;

    public Standalone(String txt, AnalysisEngine ae, CasPool pool) {
        this.text = txt;
        this.ae = ae;
        this.pool = pool;
    }

    public static void main(String[] args) throws Exception {
        String descPath = ((args != null) && (args.length > 0)) ? args[0] : "C:\\InterSystems\\UIMA\\bin\\DaveDetector.xml";
        int casPoolSize = ((args != null) && (args.length > 1)) ? Integer.valueOf(args[1]) : 20;

        XMLInputSource in = new XMLInputSource(descPath);
        ResourceSpecifier specifier
                = UIMAFramework.getXMLParser().parseResourceSpecifier(in);
        AnalysisEngine ae = UIMAFramework.produceAnalysisEngine(specifier);

        String[] text = new String[23];
        // populating the array…
        text[22] = "…";

        CasPool pool = (casPoolSize > 0) ? new CasPool(casPoolSize, ae) : null;

        // One task, one engine instance and one thread per document.
        for (int i = 0; i < text.length; i++) {
            Standalone task = new Standalone(text[i], UIMAFramework.produceAnalysisEngine(specifier),
                    (casPoolSize > 0) ? pool : null);
            Thread t = new Thread(task);
            // The rest of the loop body is missing from the archived message;
            // starting the thread is an assumed reconstruction.
            t.start();
        }
    }

    @Override
    public void run() {
        CAS cas = null;
        try {
            // Take a CAS from the shared pool if there is one, else create a fresh CAS.
            if (pool != null) {
                cas = pool.getCas();
            } else {
                cas = CasCreationUtils.createCas(ae.getAnalysisEngineMetaData());
            }
            // The processing lines are missing from the archived message;
            // the two calls below are an assumed reconstruction.
            cas.setDocumentText(text);
            ae.process(cas);
            System.out.println("Done processing text");
        } catch (Exception e) {
            // The handler body is missing from the archived message; assumed.
            e.printStackTrace();
        } finally {
            if (pool != null) pool.releaseCas(cas);
        }
    }
}

Probably also of note: we sometimes get a simple exception on destroyJNI() (pasted below), rather than the
outright total process crash described earlier. We assume this is just “luck” in that the different
threads are invoking a not-so-critical section.

Apr 05, 2016 9:25:25 AM org.apache.uima.uimacpp.UimacppAnalysisComponent logJTafException
SEVERE: The following internal exception was caught: 5,002 (UIMA_ERR_ENGINE_UNEXPECTED_EXCEPTION)
Apr 05, 2016 9:25:25 AM org.apache.uima.uimacpp.UimacppAnalysisComponent logJTafException(431)

Error number  : 5002
Recoverable   : No
Error         : Unexpected error

Error number  : 5002
Recoverable   : No
Error         : Unexpected error

        at org.apache.uima.uimacpp.UimacppEngine.destroyJNI(Native Method)
        at org.apache.uima.uimacpp.UimacppEngine.destroy(
        at org.apache.uima.uimacpp.UimacppAnalysisComponent.destroy(
        at org.apache.uima.uimacpp.UimacppAnalysisComponent.finalize(
        at java.lang.System$2.invokeFinalize(
        at java.lang.ref.Finalizer.runFinalizer(
        at java.lang.ref.Finalizer.access$100(
        at java.lang.ref.Finalizer$

Many thanks for your feedback,



Benjamin De Boe | Product Manager

M: +32 495 19 19 27 | T: +32 2 464 97 33

InterSystems Corporation |

-----Original Message-----
From: Eddie Epstein [mailto:eaepstein <at>]
Sent: Tuesday, April 5, 2016 12:47 AM
To: user <at><mailto:user <at>>
Subject: Re: UIMACPP and multi-threading

Hi Benjamin,

UIMACPP is thread safe, as is the JNI interface. To confirm, I just created a UIMA-AS service with 10
instances of DaveDetector, and fed the service 800 CASes with up to 10 concurrent CASes at any time.

It is not the case with DaveDetector, but at annotator initialization some analytics will store info in
thread local storage, and expect the same thread be used to call the annotator process method. UIMA-AS and
DUCC guarantee that an instantiated AE is always called on the same thread.
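The thread-affinity guarantee mentioned here can be illustrated without UIMA. In the sketch below, each worker thread of a fixed pool lazily creates its own engine through a ThreadLocal, so a given engine instance is only ever called from the thread that created it. This is a generic pattern under stated assumptions and not how UIMA-AS or DUCC are implemented; Engine is a hypothetical stand-in for an analysis engine, and all names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class ThreadConfinedEngines {

    /** Hypothetical stand-in for an analysis engine; records who calls it. */
    static final class Engine {
        final Set<String> callers = ConcurrentHashMap.newKeySet();
        final AtomicInteger processed = new AtomicInteger();

        void process(String text) {
            callers.add(Thread.currentThread().getName());
            processed.incrementAndGet();
        }
    }

    /** Processes all texts on a fixed pool, one engine per worker thread. */
    static List<Engine> processAll(List<String> texts, int workers) throws InterruptedException {
        List<Engine> created = new CopyOnWriteArrayList<>();
        // Each worker thread gets its own engine the first time it asks for one.
        ThreadLocal<Engine> perThread = ThreadLocal.withInitial(() -> {
            Engine engine = new Engine();
            created.add(engine);
            return engine;
        });

        ExecutorService pool = Executors.newFixedThreadPool(workers);
        for (String text : texts) {
            pool.execute(() -> perThread.get().process(text));
        }
        pool.shutdown();
        if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
            throw new IllegalStateException("workers did not finish in time");
        }
        return created;
    }

    public static void main(String[] args) throws InterruptedException {
        List<String> texts = new ArrayList<>();
        for (int i = 0; i < 23; i++) {
            texts.add("document " + i);
        }
        for (Engine engine : processAll(texts, 4)) {
            System.out.println(engine.callers + " processed " + engine.processed.get());
        }
    }
}
```

Each printed line shows a single thread name per engine, which is the property an annotator relying on thread-local storage would need.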


On Mon, Apr 4, 2016 at 10:56 AM, Benjamin De Boe <
Benjamin.DeBoe <at><mailto:Benjamin.DeBoe <at>>> wrote:

> Hi,
>
> We're working with a UIMACPP annotator (wrapping our existing NLP
> library) and are running into what appear to be thread-safety issues,
> which we can reproduce with the DaveDetector demo AE.
> When separate threads are accessing separate instances of the
> org.apache.uima.uimacpp.UimacppAnalysisComponent wrapper class on the
> Java side, it appears they are invoking the same object on the C++
> side, which results in quite a mess (access violations and process
> crashes) when different threads concurrently invoke resetJNI() and
> fillCASJNI() on the org.apache.uima.uimacpp.UimacppAnalysisComponent
> object. When using a small CAS pool on the Java side, the problem does
> not seem to occur, but it resurfaces if the CAS pool grows bigger and
> memory settings are not increased accordingly. However, if this were a
> pure memory issue, we would have hoped to see more telling errors, and just
> guessing how big memory should be for larger deployments isn't a very
> appealing option either.
> Adding the synchronized keyword to the relevant method of the wrapper
> class on the Java side also avoids the issue, at the obvious cost of
> performance. Moving to UIMA-AS is not an option for us, currently.
>
> Given that the documentation is not explicit about it, we're hoping to
> get an unambiguous answer from this list: is UIMACPP actually supposed
> to be thread-safe? We saw old and resolved JIRAs that addressed
> thread-safety issues for UIMACPP, so we assumed it was the case, but
> reality seems to point in the opposite direction.
>
> Thanks in advance for your feedback,
>
> benjamin
>
> --
> Benjamin De Boe | Product Manager
> M: +32 495 19 19 27 | T: +32 2 464 97 33 InterSystems Corporation |