Mario Gazzo | 6 Oct 19:02 2015

Support for UIMA arrays in Ruta

Hi Peter,

Does Ruta still not support UIMA arrays in version 2.3?

I found a post from May this year saying that it isn’t supported in version
2.2.1.

Couldn’t find anything about it in the latest Ruta docs either.

Armin.Wegner | 5 Oct 17:20 2015

Ruta Maven Plugin


How is the ruta-maven-plugin supposed to be used? Is there a detailed step-by-step description?

I've created a new, empty Maven project, added a script to the source folder src/main/ruta and added a text file
containing a list of words to src/main/resources.
mvn package builds a ...Engine.xml and a ...TypeSystem.xml in
target/generated-sources/ruta/descriptor and a ...twl file in target/ruta/resources, but none of
them is packaged in the jar file.

I intend to add that jar file as a Maven dependency and create the analysis engine with
AnalysisEngineFactory.createEngineDescription(<engine name>). Did I miss something?
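A possible cause, offered as an assumption rather than a confirmed answer: as far as I know, `AnalysisEngineFactory.createEngineDescription(String)` resolves the dotted descriptor name on the classpath the same way a UIMA import by name does (`a.b.Name` becomes `a/b/Name.xml`). The generated `...Engine.xml`, `...TypeSystem.xml` and word-list files therefore have to end up inside the jar, for example by declaring the generated folders as resource directories in the POM. The small diagnostic below does not use uimaFIT at all; it only checks whether a descriptor is visible on the classpath under a given dotted name. The name `my.pkg.MainEngine` is a hypothetical placeholder.

```java
import java.net.URL;

/**
 * Diagnostic sketch: checks whether a descriptor is visible on the classpath
 * under the dotted name you intend to pass to uimaFIT.
 * Assumption: the dotted name maps to a resource path by replacing '.' with '/'
 * and appending ".xml" (UIMA import-by-name convention).
 */
class DescriptorCheck {

    /** Converts a dotted name such as "my.pkg.MainEngine" to "my/pkg/MainEngine.xml". */
    static String toResourcePath(String dottedName) {
        return dottedName.replace('.', '/') + ".xml";
    }

    /** Returns the URL of the descriptor, or null if it is not on the classpath. */
    static URL findDescriptor(String dottedName) {
        ClassLoader cl = Thread.currentThread().getContextClassLoader();
        if (cl == null) {
            cl = DescriptorCheck.class.getClassLoader();
        }
        return cl.getResource(toResourcePath(dottedName));
    }

    public static void main(String[] args) {
        // Hypothetical default; pass the package and name of your generated ...Engine.xml instead.
        String name = args.length > 0 ? args[0] : "my.pkg.MainEngine";
        URL url = findDescriptor(name);
        if (url == null) {
            System.out.println("Not on classpath: " + toResourcePath(name));
        } else {
            System.out.println("Found: " + url);
        }
    }
}
```

Run it with the dependency jar on the classpath; if it prints "Not on classpath", the jar does not contain the descriptor under the path uimaFIT would presumably look for.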


Juan Ignacio Velez | 1 Oct 22:23 2015

UIMA Ruta.

Hello, I'm developing a tool in Java with UIMA Ruta in the Eclipse IDE. I
created the descriptor in a UIMA Ruta project without any problem. But
when I try to generate the JCas classes with the "JCasGen" button, this message appears:

"no source directories have been defined. jcasgen will not be run"

How can i solve this?

Thanks, Juan.

Ruta, uimafit & configure

Hi all,

I'm trying to use a uimaFIT engine inside a Ruta script and pass it some parameters,
but I get an error. Here is an example:


Document{CONTAINS(Entity)-> CONFIGURE(XmiWriter, "targetLocation" =
"src/test/resources"), EXEC(XmiWriter)};

And here is the error:

Caused by: org.apache.uima.resource.ResourceConfigurationException: No
value has been assigned to the mandatory configuration parameter

Is it possible to pass parameters to a uimaFIT engine?

Thanks in advance,
José Tomás Atria | 29 Sep 20:57 2015

How to correctly implement delta serialization in a locally deployed CPE pipeline?

Hello all,

I've been trying to wrap my head around this for a while, and I can't seem
to get it to work. Could someone please explain the most
straightforward way to implement delta serialization in a local,
multithreaded CPE pipeline?

So far, I've tried the following: a collection reader uses a
SharedSerializationData object stored in the current UIMA session and
creates a CAS marker, which it also stores in a map in the session under a
CAS identifier key. The delta is then serialized to disk using this
SharedSerializationData object and the marker retrieved from the session by
CAS identifier. However, this procedure causes an OutOfMemoryError if I try
to process all of my data (~2000 CASes, which is not that much in my opinion).

I assume that I'm missing some basic aspect of the API, but after trying to
deal with it for a while I just gave up...

A more specific version of the question, as far as I understand it: delta
serialization requires a SharedSerializationData object and a CAS marker.
What is the correct way to create, store and retrieve these in a simple,
multithreaded, locally deployed CPE processing pipeline? (i.e., no need to
support UIMA AS or DUCC facilities.)

Any help would be greatly appreciated.
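I can't say for certain what exhausts the memory here, but one common cause with this design is that the session map keeps a reference to every CAS's marker and shared data because nothing ever removes the entries, so they are never reclaimed. Below is a minimal sketch of a per-CAS registry that removes the state as soon as the consumer takes it. The class name, the `casId` key and the methods are hypothetical; `S` and `M` stand in for the shared serialization data (presumably `XmiSerializationSharedData` in UIMA) and the marker returned by `CAS.createMarker()`, kept generic so the sketch compiles without UIMA.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/**
 * Hypothetical registry for per-CAS delta-serialization state in a
 * multithreaded pipeline. S = shared serialization data, M = CAS marker.
 */
final class DeltaStateRegistry<S, M> {

    /** The state needed to serialize the delta of one CAS. */
    static final class Entry<S, M> {
        final S sharedData;
        final M marker;

        Entry(S sharedData, M marker) {
            this.sharedData = sharedData;
            this.marker = marker;
        }
    }

    // Thread-safe, since reader and consumer may run on different threads.
    private final Map<String, Entry<S, M>> entries = new ConcurrentHashMap<>();

    /** Called once per CAS, e.g. after the reader fills it and creates the marker. */
    void register(String casId, S sharedData, M marker) {
        if (entries.putIfAbsent(casId, new Entry<>(sharedData, marker)) != null) {
            throw new IllegalStateException("State already registered for CAS " + casId);
        }
    }

    /**
     * Called once per CAS by the consumer that writes the delta. Removing the
     * entry, rather than only reading it, lets the state be garbage-collected.
     */
    Entry<S, M> take(String casId) {
        Entry<S, M> e = entries.remove(casId);
        if (e == null) {
            throw new IllegalStateException("No state registered for CAS " + casId);
        }
        return e;
    }

    /** Number of CASes whose state is still held. */
    int size() {
        return entries.size();
    }
}
```

With this pattern, `size()` should stay roughly at the number of CASes in flight instead of growing with the corpus; if it keeps growing, some CASes are registered but never taken.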



Satya Nand Kanodia | 29 Sep 09:12 2015

C-Groups status remains off in web server after installing C-Groups


I am using CentOS release 6.6 for the DUCC installation. I followed the
documentation to enable C-Groups. The following command executed without
any error (I had to execute it using ...):

cgcreate -t ducc -a ducc -g memory:ducc/test-cgroups

But in the Machines section of the web server, it still shows *off* status
under C-Groups.

I don't know what went wrong.


Thanks and Regards,
Satya Nand Kanodia

Baker James D | 28 Sep 15:31 2015

[UK OFFICIAL] Baleen - UIMA Based Text Analytics Framework

Classification: UK OFFICIAL
Afternoon everyone,

I would like to draw your attention to a text analytics framework that has just been released by Dstl (part of
the UK Ministry of Defence). It uses UIMA as part of its underlying architecture but provides additional
functionality on top of that, and simplifies much of the user configuration and experience, as well as the
development process. A number of collection readers, annotators and consumers are included as part of
the framework.

The tool is called Baleen, and is released under Apache Software License 2.

There is more information about the tool in the press release and on the
GitHub page.

Contributions to the code base are welcomed.

Many thanks,
James Baker

reshu.agarwal | 28 Sep 13:52 2015

Re: DUCC - Work Item Queue Time Management

The log is:

1000 Command to exec: /usr/java/jdk1.7.0_71/jre/bin/java
     arg[1]: -DDUCC_HOME=/home/ducc/apache-uima-ducc-2.1.0-SNAPSHOT
     arg[3]: -Dducc.agent.process.state.update.port=56622
     arg[4]: -Dducc.process.log.dir=/home/ducc/ducc/logs/67/
     arg[5]: -Dducc.process.log.basename=67-JD-S211
     arg[8]: -Dducc.deploy.components=jd
     arg[10]: -Xmx300M
     arg[11]: -Dducc.deploy.JobId=67
     arg[14]: -Dducc.deploy.WorkItemTimeout=5
     arg[15]: -Dducc.deploy.JobDirectory=/home/ducc/ducc/logs/
     arg[17]: -Dducc.deploy.JpAeDescriptor=desc/ae/aggregate/AggDescriptor
     arg[19]: -Dducc.deploy.JpDdName=DUCC.Job
     arg[20]: -Dducc.deploy.JpDdDescription=DUCC.Generated
     arg[21]: -Dducc.deploy.JpThreadCount=3

Ronny Hapke | 28 Sep 13:06 2015

Problem with WORDLISTs and WORDTABLEs where an entry starts with a shared substring of another entry

I've stumbled upon a problem with UIMA Ruta Workbench 2.3.1 in Eclipse
Luna 4.4.2. Whenever I work with a WORDLIST or WORDTABLE in which one entry
starts with a substring shared with another entry, that entry is not
recognized and therefore not annotated.

Consider this minimal example:

WORDLIST "Keywords.txt" in the resources directory with the following entries:
Bill Clinton

Input file in input directory with the following contents:
Billy wished he was president, just like Bill Clinton once was.

Main.ruta script in scripts directory:
WORDLIST list = 'Keywords.txt';
DECLARE president;
Document {->MARKFAST(president, list)};

Upon execution, only Bill Clinton is annotated, while Billy is not.

Any help/hints/comments appreciated!

Best regards, 
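For comparison, here is a minimal sketch of the behavior one would expect from MARKFAST, assuming the word list contains both `Billy` and `Bill Clinton`. It uses a character trie: starting at each word start, it keeps walking after a shared prefix and records every entry that ends on a word boundary, so neither entry hides the other. This is not Ruta's implementation; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical dictionary matcher over a character trie. */
final class TrieMatcher {

    private static final class Node {
        final Map<Character, Node> children = new HashMap<>();
        boolean terminal; // true if a dictionary entry ends at this node
    }

    private final Node root = new Node();

    TrieMatcher(List<String> entries) {
        for (String entry : entries) {
            Node node = root;
            for (char c : entry.toCharArray()) {
                node = node.children.computeIfAbsent(c, k -> new Node());
            }
            node.terminal = true;
        }
    }

    private static boolean isWordChar(char c) {
        return Character.isLetterOrDigit(c);
    }

    /** Returns all matches as "start-end:text" strings (end exclusive). */
    List<String> findAll(String text) {
        List<String> matches = new ArrayList<>();
        for (int start = 0; start < text.length(); start++) {
            // Only begin a match at the start of a word.
            if (start > 0 && isWordChar(text.charAt(start - 1))) {
                continue;
            }
            Node node = root;
            for (int i = start; i < text.length(); i++) {
                node = node.children.get(text.charAt(i));
                if (node == null) {
                    break; // no entry continues with this character
                }
                int end = i + 1;
                // Record each entry ending on a word boundary, then keep walking
                // so longer entries sharing this prefix are found as well.
                if (node.terminal && (end == text.length() || !isWordChar(text.charAt(end)))) {
                    matches.add(start + "-" + end + ":" + text.substring(start, end));
                }
            }
        }
        return matches;
    }
}
```

On the example sentence, `new TrieMatcher(Arrays.asList("Billy", "Bill Clinton")).findAll(...)` returns `0-5:Billy` and `41-53:Bill Clinton`. The word-boundary check also keeps an entry such as `Bill` from matching inside `Billy`.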

Kahini Wadhawan | 24 Sep 08:41 2015

Finding Tokenizer references in UIMA Project


I am working on something based on a paper that mentions the authors used
the UIMA default tokenizer. This is confusing to me, as apparently there is
no single default tokenizer. So it would be great if I could get some
insight on a few things:

1) How can I find the default versions of the UIMA-native tokenizer shipped
with UIMA v2.2.2?

2) There is a UIMA project whose code I am exploring. I want to find out
which tokenizer it references, but there are no XML descriptors in the
code. Is it possible to have a UIMA project without XML descriptors? Where
else can I find the tokenizer information in that project?



Kahini Wadhawan
Contact - 720 548 8532
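On the second question: a UIMA project can do without XML descriptors, because uimaFIT lets you build component descriptions in Java code (for example with `AnalysisEngineFactory.createEngineDescription(SomeAnnotator.class, ...)`). In that case the tokenizer most likely appears as a class reference in the source rather than in a descriptor. Below is a rough sketch that lists every line of `.java` and `.xml` files under a project directory containing a given string. The search terms `Tokenizer` and `createEngineDescription` are guesses, the class name is hypothetical, and a plain `grep -rn` does the same job.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Lists source lines that mention a given string, e.g. a tokenizer class name. */
final class TokenizerRefScanner {

    /** Returns "relative/path:lineNumber: line" for each matching line in .java and .xml files. */
    static List<String> scan(Path root, String needle) throws IOException {
        List<Path> files;
        try (Stream<Path> walk = Files.walk(root)) {
            files = walk.filter(Files::isRegularFile)
                    .filter(p -> p.toString().endsWith(".java") || p.toString().endsWith(".xml"))
                    .sorted()
                    .collect(Collectors.toList());
        }
        List<String> hits = new ArrayList<>();
        for (Path file : files) {
            // ISO-8859-1 decodes any byte sequence; class names are plain ASCII anyway.
            List<String> lines = Files.readAllLines(file, StandardCharsets.ISO_8859_1);
            for (int i = 0; i < lines.size(); i++) {
                if (lines.get(i).contains(needle)) {
                    hits.add(root.relativize(file) + ":" + (i + 1) + ": " + lines.get(i).trim());
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) throws IOException {
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        // Hypothetical search terms: tokenizer class names and uimaFIT factory calls.
        for (String needle : new String[] {"Tokenizer", "createEngineDescription"}) {
            for (String hit : scan(root, needle)) {
                System.out.println(hit);
            }
        }
    }
}
```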
Kahini Wadhawan | 23 Sep 05:35 2015

RE: Default UIMA Tokenizer


I am new to UIMA and am looking for the default tokenizers shipped with
UIMA. Please point me to where I can find information about the default tokenizers.


Kahini Wadhawan