Frank Enders | 22 Apr 10:38 2014

Hanging UIMA AS requests

Dear all,

we are using a synchronous sendAndReceiveCAS() call within a webservice 
endpoint (JAX WS RI).
Doing so, in some cases we find hanging requests, which are not getting 
I am attaching a corresponding part of a thread dump.

We are using UIMA AS 2.4.0. Application environment is Tomcat 6.0.32, 
JAX WS RI 2.1.7.

Have you encountered a similar behaviour?

Thanks and all the best

"catalina-exec-77" Id=3412437 in WAITING cpu=2083520 ms usr=2056580 ms 
blocked 547742 for -1 ms waited 297560 for -1 ms
     locks java.util.concurrent.locks.ReentrantLock$NonfairSync@19c88785
     at sun.misc.Unsafe.park(Native Method)
     - waiting on (a java.util.concurrent.Semaphore$NonfairSync@3c3939fc)
     at java.util.concurrent.locks.LockSupport.park(
     at java.util.concurrent.Semaphore.acquire(
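The dump above shows a request thread parked in Semaphore.acquire(), which waits indefinitely. The sketch below shows one generic way a caller can bound such a wait: run the blocking call on an executor and give up after a deadline. It is plain java.util.concurrent code, not UIMA AS API; the names BoundedCall and callWithTimeout are hypothetical, and it only limits how long the endpoint thread waits, it does not address why the reply never arrives.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper, not part of UIMA AS.
final class BoundedCall {
    private BoundedCall() {}

    /**
     * Runs the task on the given executor and waits at most timeoutMillis.
     * On timeout the worker is interrupted (which unblocks interruptible
     * waits such as Semaphore.acquire()) and the TimeoutException is rethrown.
     */
    static <T> T callWithTimeout(ExecutorService executor, Callable<T> task, long timeoutMillis)
            throws InterruptedException, ExecutionException, TimeoutException {
        Future<T> future = executor.submit(task);
        try {
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            future.cancel(true); // request interruption of the blocked worker
            throw e;
        }
    }
}
```

In a web service endpoint, the synchronous call would be passed in as the task, and the TimeoutException would be mapped to an error response.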

Petr Baudis | 22 Apr 04:20 2014

Deduplicating Annotations With Same coveredText


  I'm facing a task of deduplicating annotations that have the same
getCoveredText() value (possibly at different sofa locations) - I'd
like to keep just a single one of each; for example if I were to make
a bag-of-words with only a single annotation per word and the number of
occurrences as a feature.  (Or, in my case, the annotations are scored
candidate answers in a QA system that I'd like to merge if they are
textually the same.)

  Is there a better way than simply loading all annotations of the type
to a Java map, mass-dropping them from indexes, then re-adding some of

  My idea was to simply index them by coveredText and then by sequential
iteration, it's enough to just compare getCoveredText() of current and
previous annotation to decide whether to merge them. However, it appears
that coveredText is not supported as a key feature, I'd have to make an
explicit copy of it as a separate feature. Is there any other option?


				Petr "Pasky" Baudis
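For reference, the map-based approach described in the message can be sketched on plain objects as follows. This is a minimal illustration, not UIMA code: Candidate is a hypothetical stand-in for a scored answer annotation, and the merge policy (keep the highest-scoring occurrence of each covered text) is an assumption. In a real pipeline the survivors would be collected first and the other annotations removed from the indexes afterwards.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a scored candidate-answer annotation.
final class Candidate {
    final int begin;
    final int end;
    final String coveredText;
    final double score;

    Candidate(int begin, int end, String coveredText, double score) {
        this.begin = begin;
        this.end = end;
        this.coveredText = coveredText;
        this.score = score;
    }
}

final class CoveredTextDedup {
    private CoveredTextDedup() {}

    /** Keeps one candidate per distinct covered text: the one with the highest score. */
    static List<Candidate> mergeByCoveredText(List<Candidate> candidates) {
        // LinkedHashMap preserves the order in which each text was first seen.
        Map<String, Candidate> best = new LinkedHashMap<>();
        for (Candidate c : candidates) {
            best.merge(c.coveredText, c, (kept, next) -> next.score > kept.score ? next : kept);
        }
        return new ArrayList<>(best.values());
    }

    /** Counts occurrences per covered text, as in the bag-of-words example. */
    static Map<String, Integer> countByCoveredText(List<Candidate> candidates) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (Candidate c : candidates) {
            counts.merge(c.coveredText, 1, Integer::sum);
        }
        return counts;
    }
}
```

A different merge policy, such as summing the scores, only requires changing the function passed to merge.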

Kothuvatiparambil, Viju | 20 Apr 22:10 2014

SemClass feature not working in ConceptMapper add-on

Hi All, 

I am trying to use the ConceptMapper add-on to assign a SemClass feature to tokens. I am getting the following error:

SEVERE: ConceptMapper SEVERE: FeatureList[1] 'SemClass' specified, but does not exist for type: org.apache.uima.conceptMapper.DictTerm

I configured FeatureList and AttributeList in ConceptMapperOffsetTokenizer.xml as given below:


Con O'Leary | 18 Apr 17:02 2014

Con O'Leary is out of the office.

I will be out of the office starting  17/04/2014 and will not return until

 If urgent please contact Vincent_Kelly@...

Peter Klügl | 17 Apr 14:04 2014

Sofa-unaware AEs that create new views in an AAE


as I understand the implementation, an AE is sofa aware if it specifies
input or output views in its capabilities. Let's say it only specifies
an output view, so it's sofa aware. If it is part of an AAE with sofa
mapping (one AAE sofa mapped to the default input view of the AE), then
it gets passed the base CAS independently of the sofa mapping. Shouldn't
it get the view mapped in the AAE?

I have a simple AE that should just get the mapped sofa as input and
then should create a new view, whose name is given by a parameter.  Is
it correct that I have to introduce another parameter for the input view
and have to "getView" in the AE? Is there no way to just use the mapped



Hugo Mougard | 16 Apr 08:26 2014

CAS Multiplier usage in UIMAfit

Dear all,

I'm trying to use a multiplier to discard some CASes based on some
annotation. It currently doesn't work (the CASes are not discarded). I
also noticed several tickets opened on the subject of multipliers and
am therefore not sure if it's currently possible to use them in 

If it is possible, what are the necessary steps so that only CASes
returned by next() are considered?

Any pointer welcome.


Peter Klügl | 15 Apr 10:48 2014

[ANNOUNCE] Apache UIMA Ruta 2.2.0 released

The Apache UIMA team is pleased to announce the release of the Apache
UIMA Ruta (Rule-based Text Annotation), version 2.2.0.

Apache UIMA Ruta is a rule-based script language supported by
Eclipse-based tooling. The language is designed to enable rapid
development of text processing applications within UIMA. A special focus
lies on the intuitive and flexible domain specific language for defining
patterns of annotations. The Eclipse-based tooling,
called the Apache UIMA Ruta Workbench, was created to support the
user and to facilitate every step when writing rules. Both
the rule language and the workbench integrate
smoothly with Apache UIMA.

Major Changes in this Release

UIMA Ruta Language and Analysis Engine:
Major performance improvements (3-17 times faster in test use cases)
Improved import type functionality and handling of ambiguous short names
Support of block extensions for rule inference adaptions
Options to determine where the next match should start
Requires at least Java 6
Many bug fixes

UIMA Ruta Workbench:
Smaller improvements in many views
Support of mixin Java/Ruta projects
Many bug fixes

For a full list of the changes, please refer to Jira:

Brian Dolan | 10 Apr 16:16 2014

auto detect cas serialization method?

Hi All,

I have been handed a serialized CAS by another application.  I'm getting the
"ARRAY OUT OF BOUNDS" error when trying to deserialize.  My understanding is
that this is fixed by sending Serialization.deserializeCAS() the correct
type system, which I don't have.  That guy is gone :)

Any ideas?
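A first step when the serialization method is unknown is to look at the leading bytes of the data. The sketch below is a heuristic, not a UIMA API, and it does not recover a missing type system. Two of its checks are standard: a Java ObjectOutputStream starts with the bytes 0xAC 0xED, and XML forms such as XMI or XCAS start with '<'. The third check, that the UIMA 2.x binary form begins with the ASCII key "UIMA" (or "AMIU" when byte-swapped), is an assumption to verify against the UIMA sources. CasFormatSniffer and its names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper that guesses the container format from leading bytes.
final class CasFormatSniffer {
    enum Format { XML, JAVA_SERIALIZED, UIMA_BINARY, UNKNOWN }

    private CasFormatSniffer() {}

    static Format sniff(byte[] head) {
        // java.io.ObjectOutputStream streams begin with the magic number 0xACED.
        if (head.length >= 2 && (head[0] & 0xFF) == 0xAC && (head[1] & 0xFF) == 0xED) {
            return Format.JAVA_SERIALIZED;
        }
        // Assumption: binary CAS data starts with the key "UIMA" (or byte-swapped "AMIU").
        if (head.length >= 4) {
            String key = new String(head, 0, 4, StandardCharsets.US_ASCII);
            if (key.equals("UIMA") || key.equals("AMIU")) {
                return Format.UIMA_BINARY;
            }
        }
        // XML: optional UTF-8 byte order mark, optional whitespace, then '<'.
        int i = 0;
        if (head.length >= 3 && (head[0] & 0xFF) == 0xEF
                && (head[1] & 0xFF) == 0xBB && (head[2] & 0xFF) == 0xBF) {
            i = 3;
        }
        while (i < head.length
                && (head[i] == ' ' || head[i] == '\t' || head[i] == '\r' || head[i] == '\n')) {
            i++;
        }
        if (i < head.length && head[i] == '<') {
            return Format.XML;
        }
        return Format.UNKNOWN;
    }
}
```

Reading the first few dozen bytes of the file and passing them to sniff() is enough to choose which deserializer to try first.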


Petr Baudis | 9 Apr 04:34 2014

Complex architectures with multiple CASes - how to?


  I'd like to ask about the philosophy and typical usage patterns behind
multiple CASes, CAS multipliers and CAS mergers.

  I'm working on a simple question-answering system built on top of
UIMA and mirroring DeepQA architecture.  Basically, on input I have
a CAS with the input question as a sofa, and after some processing,
a "search" CAS multiplier produces a CAS for each search result that
might contain an answer.

  However, at this point, I may want to use an AE that needs to see both
the question CAS and the search result CAS. Typically, I could try to
align sentences, i.e. with question sofa "Who invented the transistor?"
and stand-off Focus annotation for "Who", I may want to search the
result CAS for "(\S+) invented the transistor".

  But now I'm stuck.  How can I build such an AE that has access to
information in two CASes?  It seems one approach is to copy featuresets
to result CAS in the multiplier.  However, if the CAS sofa is different,
how can stand-off annotations (like Focus) be carried over?  Also, I may
want to match parse trees instead of strings, which suddenly means
potentially a lot of data is copied, and I will need to distinguish
annotations of the question and of the search result.  A similar problem,
but in a much clumsier way, seems to arise if I were to make the
alignment AE a CAS merger.

  I must be missing something obvious here, but reading the developer guide
back and forth doesn't help... Thanks for any hints!
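The alignment example from the message (question "Who invented the transistor?" with a Focus on "Who") can be sketched independently of how the two CASes are combined. The code below is a minimal plain-Java illustration that assumes the question text and the focus offsets are already available to the component; FocusPattern and its method names are hypothetical, and the passage in the usage note is made up.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: turns a question plus its focus span into a search pattern.
final class FocusPattern {
    private FocusPattern() {}

    /** Replaces the focus span by a capturing group and quotes the rest literally. */
    static Pattern fromQuestion(String question, int focusBegin, int focusEnd) {
        String before = question.substring(0, focusBegin);
        // Drop trailing punctuation and whitespace such as the final "?".
        String after = question.substring(focusEnd).replaceAll("[?.!\\s]+$", "");
        return Pattern.compile(Pattern.quote(before) + "(\\S+)" + Pattern.quote(after));
    }

    /** Returns the text matched in place of the focus, or null if there is no match. */
    static String firstAnswer(Pattern pattern, String passage) {
        Matcher m = pattern.matcher(passage);
        return m.find() ? m.group(1) : null;
    }
}
```

For the question above with focus offsets 0 to 3, the pattern is equivalent to "(\S+) invented the transistor"; applied to the passage "Some say Shockley invented the transistor in 1947." it captures "Shockley".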


Erik Fäßler | 8 Apr 17:17 2014

FeaturePath with FSArray

Hi all,

I have a component where a parameter is supposed to be a FeaturePath string, e.g.


Another parameter would be the type name, e.g. “person”.

The component would now get an iterator over all “person” instances in the CAS and from each person get
the name of the street he or she is living in.

The problem is that “address” is actually an FSArray of type “Address”, i.e. one person can have
multiple addresses. Each address then has a feature “streetname”.
I am easily able to get feature values when there is no array involved or when I use built-in functions. But I
cannot manage to get back all my street names.
The code:

TypeSystem ts = aJCas.getTypeSystem();
Type entityType = ts.getType(entityTypeString);
FeaturePath fp = aJCas.createFeaturePath();

try {
	FSIterator<Annotation> entityIterator = aJCas.getAnnotationIndex(entityType).iterator();
	while (entityIterator.hasNext()) {
		Annotation entity = entityIterator.next();
		String streetname = fp.getValueAsString(entity);
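The behaviour the message asks for, a path step that lands on an array and fans out to every element, can be modelled in plain Java as below. This is a simplified sketch and not the UIMA FeaturePath API: feature structures are represented as maps, an FSArray as a list, and the path "/address/streetname" is an illustrative value built from the feature names mentioned above. PathWalker is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Simplified model: Map = feature structure, List = array-valued feature.
final class PathWalker {
    private PathWalker() {}

    /** Follows a slash-separated path; a list-valued step continues with every element. */
    static List<String> values(Object root, String path) {
        List<Object> current = Collections.singletonList(root);
        for (String step : path.split("/")) {
            if (step.isEmpty()) {
                continue; // skip the empty segment produced by a leading slash
            }
            List<Object> next = new ArrayList<>();
            for (Object node : current) {
                if (!(node instanceof Map)) {
                    continue; // primitive values have no further features
                }
                Object value = ((Map<?, ?>) node).get(step);
                if (value instanceof List) {
                    next.addAll((List<?>) value); // fan out over the array elements
                } else if (value != null) {
                    next.add(value);
                }
            }
            current = next;
        }
        List<String> result = new ArrayList<>();
        for (Object value : current) {
            result.add(String.valueOf(value));
        }
        return result;
    }
}
```

With a person whose "address" list holds two addresses, values(person, "/address/streetname") returns both street names in array order. In actual UIMA code the corresponding manual approach would read the array-valued feature and loop over its elements.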

Silvestre Losada | 5 Apr 09:53 2014

New feature in uima-ruta.

Hi All,

I think it would be interesting to add new functionality to UIMA Ruta. I
don't know which is the best way to make such a proposal; I'm willing to
contribute to the development of such a feature if you think it is useful.

Currently UIMA Ruta has WORDLIST, a list of text items that can be
specified in different ways. This is a nice and very powerful feature;
however, there is no way to plug in my own WORDLIST implementation, for example
a WORDLIST that finds matches in a database table, in a Lucene index, etc...