Silvestre Losada | 18 Jul 10:18 2016

Ruta variable annotations

Hi all,

The UIMA-4657 improvement has been implemented in Ruta version 2.4.0,
but I cannot find anything about how to use it in the Ruta
documentation.

Is there an example of how to use it?

Armin.Wegner | 15 Jul 08:59 2016

Collection processing engine remove annotations


How can I remove annotations in a collection processing engine? Doing it in the process() method of an annotator failed.
Is this even possible?

Rich Bowen | 14 Jul 20:05 2016

ApacheCon Europe call for papers open

Dear Apache Enthusiast,

As you are no doubt already aware, we will be holding ApacheCon in
Seville, Spain, the week of November 14th, 2016. The call for papers
(CFP) for this event is now open, and will remain open until
September 9th.

The event is divided into two parts, each with its own CFP. The first
part of the event, called Apache Big Data, focuses on Big Data
projects and related technologies.


The second part, called ApacheCon Europe, focuses on the Apache
Software Foundation as a whole, covering all projects, community
issues, governance, and so on.


ApacheCon is the official conference of the Apache Software
Foundation, and is the best place to meet members of your project and
other ASF projects, and strengthen your project's community.

If your organization is interested in sponsoring ApacheCon, contact me
at evp@...  ApacheCon is a great place to find the brightest
developers in the world, and experts on a huge range of technologies.


Andrea Turbati | 11 Jul 12:50 2016

Type in an FSList starting from the TypeSystem

Is there a way to access the element type of an FSList from the TypeSystem?
I've tried the following code:

FeatureDescription featureDescription = 
String typeStringOfList = featureDescription.getElementType();

but I always get null in the variable typeStringOfList, even though the 
documentation of that method says:

"For a feature with a range type that is an array or list, gets the 
expected type of the elements of that array or list. This is optional; 
if omitted the array or list can contain any type. There is currently 
no guarantee that the framework will enforce this type restriction. This 
property should not be set for features whose range type is not an array 
or list."

Thanks in advance for your feedback,




Dott. Andrea Turbati, PhD
AI Research Group,
Dept. of Enterprise Engineering
University of Roma, Tor Vergata
Via del Politecnico 1 00133 ROMA (ITALY)

Yamen Ajjour | 8 Jul 15:53 2016

Basic question about UIMA AS service deployment

Hello,

I have a node on which I want to deploy an asynchronous analysis engine.
After starting an ActiveMQ broker there, I deployed the analysis engine on
that machine from the client, as described in the specification. Even though
everything seems to be working correctly, I have the feeling that the
analysis engines are still running in the client's JVM and not on the
node; only the queue seems to be running there. Does deploying a UIMA AS
service imply that it executes on the computer where the broker is?

Bonnie MacKellar | 6 Jul 02:41 2016

missing Ruta annotations from uimaFit

I have a very lengthy Ruta script which annotates my files successfully. I
can see all the annotations in AnnotationBrowser and they are correct.
I want to get all the annotations in a Java program, so I can count
occurrences.  I am using uimaFit. I am getting very odd results.

When I use CasDumpWriter, I see all my annotations, correctly written to
the dump file. Here is the code that does this:
AnalysisEngineDescription rutaEngineDesc =
           RutaEngine.PARAM_SCRIPT_PATHS, new String[]
           RutaEngine.PARAM_DESCRIPTOR_PATHS,  new String[]
AnalysisEngineDescription writerDesc =
CasDumpWriter.PARAM_OUTPUT_FILE, "dump2.txt");
AnalysisEngine rae = AnalysisEngineFactory.createEngine(rutaEngineDesc);
SimplePipeline.runPipeline(readerDesc, rutaEngineDesc, writerDesc);

However, when I try to do this myself, using iteratePipeline to iterate
through the JCas structures for each input file, many of the annotations
are missing. I have a suspicion that the missing annotations are ones that
annotate text for which there is another annotation.   For example, text
will be annotated with Line, and with my own annotation. My code to print

Henrik Matzen | 5 Jul 12:02 2016

Serialization NonXML


Because of the known problem that you cannot serialize the CAS if it contains
non-XML characters, I tried this:

I know it's not working because of this (cas =
- because there is no .toCas method.

Does anyone know how I can solve this?

    @Override
    public void process(final JCas cas) throws AnalysisEngineProcessException {
        JCas oldcas = cas;
        cas = doReplaceNonXml(cas.toString()).toCas;
        try {
            final String xmlContent = this.serializeCas(cas);
            final Map<String, String> metadataFields =

            //Do something with metadatafields
            cas = oldcas;

        } catch (SAXException e) {
            throw new AnalysisEngineProcessException(e);
        } catch (IOException e) {
            throw new AnalysisEngineProcessException(e);
        } catch (ParserConfigurationException e) {
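The character-replacement idea behind doReplaceNonXml in the message above can be sketched in plain Java. This is only an illustration under stated assumptions, not the poster's implementation and not a UIMA API: the class XmlCharSanitizer and its methods isValidXmlChar and sanitize are hypothetical names. The sketch replaces every character outside the XML 1.0 "Char" production with a space, so the output has the same length as the input and existing annotation offsets would still line up. How the cleaned text is put back into a CAS (for example as the document text of a separate view) is left open.

```java
/** Hypothetical helper: replaces characters that are not legal in XML 1.0. */
final class XmlCharSanitizer {

    private XmlCharSanitizer() {
    }

    /** Returns true if the code point matches the XML 1.0 "Char" production. */
    static boolean isValidXmlChar(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    /**
     * Returns a copy of the text in which every invalid character is replaced
     * by a single space. Invalid code points (control characters, unpaired
     * surrogates, U+FFFE, U+FFFF) always occupy exactly one char, so the
     * result has the same length as the input.
     */
    static String sanitize(String text) {
        StringBuilder out = new StringBuilder(text.length());
        int i = 0;
        while (i < text.length()) {
            int cp = text.codePointAt(i);
            if (isValidXmlChar(cp)) {
                out.appendCodePoint(cp);
            } else {
                out.append(' ');
            }
            // Advance by one char, or two for a valid surrogate pair.
            i += Character.charCount(cp);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String dirty = "abc" + (char) 0x0 + "def" + (char) 0xB + "ghi";
        System.out.println(sanitize(dirty));
    }
}
```

Replacing rather than deleting the offending characters is a deliberate choice here: deleting them would shift every offset after the first removed character.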


DKPro Core 1.8.0 released

We are pleased to announce the release of

==== DKPro Core, version 1.8.0 ====

a collection of interoperable software components for natural language
processing (NLP) based on the Apache UIMA framework.

== Changed minimal system requirements ==

- Requires Java 8 (Issue #369)
- Upgrade Apache UIMA to version 2.8.1 (Issue #662)
- Upgrade uimaFIT to version 2.2.0 (Issue #664)
- Upgrade Spring Framework to version 3.2.16 (Issue #815)

== Major improvements ==

- Extensive automatically generated reference documentation (e.g. Issues
#753, #635, #589)
- New framework for text normalization and transformation (e.g. Issue #537)
- New validation framework, mainly for improved bug detection in unit
tests (Issue #728)
- Writer components write to console if no target is specified (Issue #700)
- Renamed some components for a more uniform naming scheme (e.g. Issue #717)
- Writers by default refuse to overwrite files (Issues #669, #564)
- Dependency parsers and readers consistently create a self-looped ROOT
node (Issue #628)

== Analysis components ==

Andrea Turbati | 1 Jul 15:30 2016

UIMA Regular Expression Annotator and UIMA documentAnalyzer

I was wondering what the best and easiest way is to run the UIMA 
Regular Expression Annotator inside the UIMA documentAnalyzer 
(launched from the console via the bat file).
Any help or suggestion is appreciated.





Dott. Andrea Turbati, PhD
AI Research Group,
Dept. of Enterprise Engineering
University of Roma, Tor Vergata
Via del Politecnico 1 00133 ROMA (ITALY)
tel: +39 06 7259 7334
lab: +39 06 7259 7332
e_mail: turbati@...
home page:


Bonnie MacKellar | 22 Jun 21:55 2016

problems integrating Ruta and uimaFit

I am still trying to figure out how to count Ruta annotations across a
bunch of input files. There doesn't seem to be any Workbench way to do it.
So now I am trying to call Ruta from UimaFit so I can do the job in Java.

However, I am having serious configuration problems, plus I have a question
on how to bring in PlainTextAnnotator.

I am using Maven, with the jcasgen-maven-plugin, the ruta-maven-plugin, and
the uimafit-maven-plugin. I will include the pom file at the end of this message.

I want my Java code to be aware of the types declared in the Ruta script -
that is the whole point - I want to count those annotations.

My Ruta script also uses PlainTextAnnotator. The problem with this is that
I can't figure out where to put it. In a Workbench based Ruta project,
PlainTextAnnotator.xml and PlainTextAnnotatorTypeSystem get put
automatically into descriptor/utils, along with a number of other
descriptors that seem to be built into Ruta. But when I create a project
using Maven, there is no such location, and these descriptors do not get
put anywhere. I tried a number of places but could not get my script to see
the type system for PlainTextAnnotator. Finally, I hit on putting the files
in target/generated-sources/ruta/descriptor/utils, and finally my script is
able to see the types and I can run it. This is good because at that point,
the ruta-maven-plugin does its job and generates the descriptors for my
script. However, I suspect this is not a good place to put the
PlainTextAnnotator files since doing a clean overwrites them. Where should
they go? Is there any entry in the pom file that is needed?

The second problem is that although my Ruta script works nicely on its own,

Augusto Ribeiro Silva | 22 Jun 15:29 2016

Non-linear pipelines


I couldn’t find any example in the documentation of how to define non-linear pipelines (I am not sure
this is the right name for them).
What I want to do is something like this:

Pipeline: A -> (B or C) -> D

Step A supports two file formats, and depending on the file format, normalisation step B or C should
be performed. Step D should then be performed on the result of whichever of B or C ran. How would I go about
defining such a pipeline, and is it even possible?
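The routing described above can be sketched in plain Java to make the control flow concrete. This sketch does not use the UIMA API; in UIMA this kind of routing is normally the job of a flow controller inside an aggregate analysis engine. Everything here is an assumption made for illustration: the class BranchingPipeline, the Format enum, and the toy behaviour of steps A to D (format detection, tag stripping, whitespace collapsing, upper-casing) are hypothetical stand-ins for the real components.

```java
import java.util.Locale;

/** Hypothetical sketch of a pipeline of the shape A -> (B or C) -> D. */
final class BranchingPipeline {

    /** The two input formats that step A distinguishes (assumed names). */
    enum Format { XML, PLAIN }

    /** Result of step A: the detected format plus the trimmed text. */
    static final class Doc {
        final Format format;
        final String text;

        Doc(Format format, String text) {
            this.format = format;
            this.text = text;
        }
    }

    private BranchingPipeline() {
    }

    /** Step A: reads the input and records which format it has. */
    static Doc stepA(String raw) {
        String trimmed = raw.trim();
        Format format = trimmed.startsWith("<") ? Format.XML : Format.PLAIN;
        return new Doc(format, trimmed);
    }

    /** Step B: normalisation for the XML-like format (strips tags). */
    static String stepB(String text) {
        return text.replaceAll("<[^>]*>", "");
    }

    /** Step C: normalisation for the plain format (collapses whitespace). */
    static String stepC(String text) {
        return text.replaceAll("\\s+", " ");
    }

    /** Step D: common final step, applied to the output of B or C. */
    static String stepD(String text) {
        return text.toUpperCase(Locale.ROOT);
    }

    /** Runs A, then exactly one of B or C chosen by the format, then D. */
    static String run(String raw) {
        Doc doc = stepA(raw);
        String normalised = doc.format == Format.XML ? stepB(doc.text) : stepC(doc.text);
        return stepD(normalised);
    }

    public static void main(String[] args) {
        System.out.println(run("<p>hello</p>"));
        System.out.println(run("  hello   world "));
    }
}
```

The key design point is that the branch decision is taken from information that step A leaves behind (here the format field), so B and C never both run on the same document.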

Thanks for the help in advance.

Best regards,