Jaroslaw Cwiklik | 23 Oct 20:48 2014

[ANNOUNCE] Apache UIMA DUCC 1.1.0 released

The Apache UIMA team is pleased to announce the release of
Apache UIMA DUCC, version 1.1.0.

DUCC stands for Distributed UIMA Cluster Computing. DUCC is a cluster
management system providing tooling, management, and scheduling facilities
to automate the scale-out of applications written to the UIMA framework.
Core UIMA provides a generalized framework for applications that process
unstructured information such as human language, but does not provide
a scale-out mechanism. UIMA-AS provides a scale-out mechanism to distribute
UIMA pipelines over a cluster of computing resources, but does not provide
job or cluster management of the resources. DUCC defines a formal job model
that closely maps to a standard UIMA pipeline. Around this job model DUCC
provides cluster management services to automate the scale-out of UIMA
pipelines over computing clusters.

This is a maintenance release that contains fixes and improvements over
UIMA DUCC 1.0.0.

For a full list of changes, please refer to Jira:

More information about UIMA DUCC can be found here:

 - Jaroslaw Cwiklik, for the Apache UIMA development team
Armin.Wegner | 23 Oct 13:05 2014

PearPackagingMavenPlugin and CVS


The PearPackagingMavenPlugin copies the CVS subdirectories into the PEAR. Can this be changed, and if so, how?

Piyush Paliwal | 22 Oct 13:35 2014

UIMA Ruta into jar?


We are developing a Ruta project and want to access it from a Java project.
Currently, we add the descriptor generated from the Ruta script to a UIMA
pipeline in the Java project.

The pipeline only runs inside the workspace. We cannot build a single jar
of the Java project and run it from the command line, because the jar cannot
access the Ruta project as a dependency.

There is also a way to read a Ruta script directly from Java, but if we put
the script in the Java project, it cannot import annotations from type systems
(i.e., it needs the Ruta editor).

Is there any way to add the Ruta project as a dependency of the Java project?




Piyush Paliwal
Amit Gupta | 16 Oct 01:11 2014

Scale out tuning for jobs

I've been trying to find the options for configuring the scale-out
of a DUCC job.

So far, the only ones I've found are:

which limits the maximum number of processes spawned by a DUCC job.

At what point does DUCC decide to spawn a new process or spread processing
out to a new node? Is there a tuning parameter for the optimal number of
work items per spawned process? Can the user control this behavior?

For example, I have a job large enough that DUCC on its own spreads it
across 2 nodes. I haven't been able to force this job, via a configuration
parameter, to spread across 4 nodes (or "X" nodes) for faster processing.

Does anyone know if there's a parameter that can directly control scale-out
in this manner?



Amit Gupta
Amit Gupta | 15 Oct 00:40 2014

query about PEAR installation for DUCC


I have a question about PEAR installation on a "headless" system via the
command line.
I don't see any instructions in the documentation on how to proceed.

Specifically, I'm attempting to run the Raw Text Processing example
documented in the DUCC Book,

The problematic step is "Installing the OpenNLP Pear".

Almost everything I have read says to use the GUI installer.

The DUCC Book points to the runPearInstaller script (which, strangely, is not
shipped with the DUCC binaries).
I found it in the UIMA binary distribution. I unpacked the binaries and set up
the environment as instructed in the UIMA SDK documentation (the script resides
in $UIMA_HOME/bin), but running it fails with the following error:

runPearInstaller.sh --help

Exception in thread "AWT-EventQueue-0" java.awt.HeadlessException:

No X11 DISPLAY variable was set, but this program performed an operation
which requires it.

at java.awt.GraphicsEnvironment.checkHeadless(GraphicsEnvironment.java:207)

Debbie Zhang | 9 Oct 12:58 2014

RE: Error in running UIMA Ruta sample file

Hi Peter,

It seems that after I clean the project, run "Debug", and then double-click the .xmi file in the output folder,
the rules appear.



> -----Original Message-----
> From: Debbie Zhang [mailto:debbie.d.zhang@...]
> Sent: Thursday, 9 October 2014 9:49 PM
> To: user@...
> Subject: RE: Error in running UIMA Ruta sample file
> Thanks, Peter, for your reply. I cleaned the project after I received your email.
> However, I still got the error when I tried to debug.
> Just now, I double-clicked the output file Example1.txt.xmi and the rules were
> displayed in the rule views. Does this mean that I don't need to run Debug
> to get rules in the rule views (the reference document says Debug needs to be
> run)? It seems I only need to run Main.ruta from "Annotation Test" to get
> the .xmi files.
> Regards,
> Debbie
> > -----Original Message-----
> > From: Peter Klügl [mailto:pkluegl@...]

Debbie Zhang | 9 Oct 11:33 2014

Error in running UIMA Ruta sample file


I am new to UIMA Ruta and am trying to learn it by following the Ruta Guide
and Reference:

So far, I have been able to follow the guide up to section 3.5, "UIMA Ruta
Explain Perspective". As the guide describes, I imported the UIMA Ruta example
project and opened the main Ruta script file 'Main.ruta'. I right-clicked
"Main.ruta" and selected “Debug As” > “1 UIMA Ruta”. However, I get the
following error:

Source not found for URLClassPath$JarLoader.getJarFile(URL) line: 644

In the “Applied Rules”, “Failed Rules”, and “Matched Rules” views, the
following message is displayed:
The instance view is currently not available

Could someone tell me what I did wrong so I can see Rules in those views?

Thank you in advance.


Debbie Zhang

Peter Klügl | 9 Oct 11:06 2014

Publication about UIMA Ruta


It has been about one and a half years since we renamed the system to its
current and nice name. However, since then, there has been no main
publication to cite in order to refer to UIMA Ruta.

I am proud to announce that this has now changed.

The journal Natural Language Engineering just published the new main
article for UIMA Ruta with the title "UIMA Ruta: Rapid development of
rule-based information extraction applications". It provides a nice
overview of the language and tooling, and additionally a comparison to
related systems and descriptions of some case studies. If you are
interested in UIMA Ruta or in rule-based approaches in general, this
article could be of interest.

The direct link to the FirstView version:
I temporarily added the pdf of the accepted manuscript to my personal

If you use UIMA Ruta in an academic context, please consider citing this
paper. A BibTeX entry will be added to the Ruta page.


jeffery yuan | 1 Oct 05:13 2014

Could UIMA AS client send custom key value parameters to annotator?

Hi, dear UIMA developers and users,

Thanks in advance for any help.

I am using UIMA AS and the RegExAnnotator.
I am wondering whether the client can send some custom key-value parameters to
the annotator.

What I actually want to implement is this: there are many (10+) regexes
defined in regex.pear, but a client may only be interested in a few entity
types, so we want the RegExAnnotator to run only the regexes for the types
the client specifies.

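If I use the synchronous UIMA API, I can set a ResultSpecification in the client:
// specifier: the parsed RegExAnnotator descriptor; types: the requested type names
AnalysisEngine ae = UIMAFramework.produceAnalysisEngine(specifier);
ResultSpecification rs = ae.createResultSpecification();
for (String type : types) rs.addResultType(type, true);
ae.process(cas, rs);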

The RegExAnnotator then checks
getResultSpecification().getResultTypesAndFeatures() and only runs the
regexes that are needed.

But since I use UIMA AS (ae.sendCAS(cas);), there is no API to set a
ResultSpecification or to pass custom key-value parameters.
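
The per-type filtering described above (run only the regexes whose output type the client asked for) can be illustrated in plain Java. This is a minimal sketch, not the RegExAnnotator's actual code: the class `ResultSpecRegexFilter`, the rule map, and the type names are hypothetical, and the `requestedTypes` set merely stands in for what `getResultSpecification().getResultTypesAndFeatures()` would report inside a real annotator.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical stand-in for an annotator that only evaluates the rules whose
 * output type was requested. requestedTypes plays the role of the result
 * specification; it is NOT the UIMA ResultSpecification API.
 */
final class ResultSpecRegexFilter {

    private ResultSpecRegexFilter() {}

    /** Runs only the patterns whose type name is in requestedTypes; returns type -> matches. */
    static Map<String, List<String>> annotate(String text,
                                              Map<String, Pattern> rulesByType,
                                              Set<String> requestedTypes) {
        Map<String, List<String>> results = new LinkedHashMap<>();
        for (Map.Entry<String, Pattern> rule : rulesByType.entrySet()) {
            // Skip the regex entirely if the client did not ask for this type.
            if (!requestedTypes.contains(rule.getKey())) {
                continue;
            }
            List<String> matches = new ArrayList<>();
            Matcher m = rule.getValue().matcher(text);
            while (m.find()) {
                matches.add(m.group());
            }
            results.put(rule.getKey(), matches);
        }
        return results;
    }
}
```

For example, with hypothetical rules "Email" (`\S+@\S+`) and "Year" (`\b\d{4}\b`) and a requested set containing only "Year", the Email regex is never evaluated. The open question in the message remains how to get the requested set from a UIMA AS client to the service.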


Peter Klügl | 29 Sep 10:09 2014

[ANNOUNCE] Apache UIMA Ruta 2.2.1 released

The Apache UIMA team is pleased to announce the release of
Apache UIMA Ruta (Rule-based Text Annotation), version 2.2.1.

Apache UIMA Ruta is a rule-based script language supported by
Eclipse-based tooling. The language is designed to enable rapid
development of text processing applications within UIMA. A special focus
lies on the intuitive and flexible domain specific language for defining
patterns of annotations. The Eclipse-based tooling,
called the Apache UIMA Ruta Workbench, was created to support the
user and to facilitate every step when writing rules. Both
the rule language and the workbench integrate
smoothly with Apache UIMA.

This is a bugfix release. The UIMA Ruta Workbench 2.2.1 requires a newer
version of Eclipse (4.3.2 recommended).

For a full list of the changes, please refer to Jira:

More information about UIMA Ruta can be found here:

 - Peter Klügl, for the Apache UIMA development team

Alexandre Patry | 24 Sep 16:15 2014

Re: sendCAS is slow

This is good news :)

Did you try increasing the number of CASes in the pool, as Jerry suggested?

You can reply to the list as well, there are a lot of people more 
knowledgeable than me that can help you there.


On 24/09/2014 09:49, xym210 wrote:
> No, everything seems to work correctly; when I deploy two collection
> reader instances, the processing speed improves.
> -- 
> Sent from NetEase Mail for Android
> On 2014-09-24 21:44, Alexandre Patry
> <mailto:alexandre.patry@...> wrote:
> Did you get an error message or a stack trace?
> On 24/09/2014 09:38, xym210 wrote:
>> It doesn't work. When I deploy the collectionReader and the AE
>> colocated, it doesn't work either. Is there something I
>> misunderstood? Thanks.