Zesch, Torsten | 2 Sep 22:22 2015

Error when trying to drop CAS with FlowController

Hi all,

I'm trying to implement a FlowController that drops CASes matching certain
criteria. The FlowController is defined on an inner AAE which sets
casproduced to true. The inner AAE resides in an outer AAE which contains
additional processing before and after the inner AAE.

Reader -> Outer AAE { Proc... Inner AAE { FlowController } Proc... Consumer }
The aggregate receives various input CASes and is supposed to drop some
but not others. When I try to drop a CAS in my FlowController now, I get
the error 

Caused by: org.apache.uima.analysis_engine.AnalysisEngineProcessException:
The FlowController attempted to drop a CAS that was passed as input to the
Aggregate AnalysisEngine containing that FlowController.  The only CASes
that may be dropped are those that are created within the same Aggregate
AnalysisEngine as the FlowController.

How can I drop CASes using a FlowController such that they do not proceed
in the outer aggregate?

thanks,
Torsten
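
The rule quoted in the error message can be illustrated with a toy model. The sketch below is plain Java and does not use the UIMA API; DropRuleDemo, ToyCas, ToyAggregate, createCas and canDrop are hypothetical names invented for this illustration. It encodes only what the message states: an aggregate may drop a CAS that was created inside it, but not one it received as input.

```java
// Toy model of the ownership rule stated in the error message.
// Not UIMA code: every class and method name here is hypothetical.
final class DropRuleDemo {

    static final class ToyCas {
        final String id;
        final ToyAggregate creator; // null means the CAS came from outside, e.g. the reader

        ToyCas(String id, ToyAggregate creator) {
            this.id = id;
            this.creator = creator;
        }
    }

    static final class ToyAggregate {
        final String name;

        ToyAggregate(String name) {
            this.name = name;
        }

        // A component inside this aggregate (e.g. a CAS multiplier) produces a new CAS.
        ToyCas createCas(String id) {
            return new ToyCas(id, this);
        }

        // Mirrors the message: only CASes created within this aggregate may be dropped by it.
        boolean canDrop(ToyCas cas) {
            return cas.creator == this;
        }
    }

    public static void main(String[] args) {
        ToyAggregate inner = new ToyAggregate("inner");
        ToyCas input = new ToyCas("from-reader", null);
        ToyCas child = inner.createCas("child");
        System.out.println(inner.canDrop(input)); // false
        System.out.println(inner.canDrop(child)); // true
    }
}
```

In this model the CAS coming from the reader is never droppable by the inner aggregate, which matches the behaviour described in the post. The sketch does not show how to keep such a CAS from proceeding in the outer aggregate.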

Matthew DeAngelis | 28 Aug 23:25 2015

Exposing a View to Downstream AEs

Hi all,

I have been hitting my head against this all afternoon, so I figured I'd
ask.

I am using a CollectionReader to read in HTML files and then using an AE to
add a view that contains the "clean", plaintext versions of those files. I
would like to pass the plaintext version to an AE downstream in the
pipeline (Stanford NLP's tokenizer from DKPro, specifically, for now);
currently, it only sees and tokenizes the original HTML file. Since the
tokenizer is not written by me, I would prefer to be able to modify my
view-creating AE instead of mucking about in someone else's.

I have tried using an Aggregate Analysis Engine to combine the AEs together
and Sofa mappings to connect the views, but the downstream AE does not have
any input views to connect to. When I add an input view named
"_InitialView" to both the Aggregate AE and the descriptor for the
tokenizer, I get an interesting-looking NPE in some deep sub-function of
the tokenizer that suggests I have not been successful.

Is there a way to tell my AE what view to pass as the "_InitialView", or
some way to set a different default view to pass?

Regards and thanks,
Matt
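
The idea behind the Sofa mappings mentioned in the question is a per-component renaming: the downstream component keeps asking for "_InitialView", and the aggregate resolves that name to a different view. The sketch below models only that lookup in plain Java. It is not UIMA code; SofaMappingDemo, map and resolve, the component keys "tokenizer" and "htmlCleaner", and the view name "plainText" are all assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of view-name resolution through a per-component mapping table.
// Not UIMA code: class, method, component and view names are hypothetical.
final class SofaMappingDemo {
    // componentKey -> (name the component asks for -> name used in the aggregate)
    private final Map<String, Map<String, String>> mappings = new HashMap<>();

    void map(String componentKey, String componentSofa, String aggregateSofa) {
        mappings.computeIfAbsent(componentKey, k -> new HashMap<>())
                .put(componentSofa, aggregateSofa);
    }

    // Names without a mapping resolve to themselves.
    String resolve(String componentKey, String componentSofa) {
        Map<String, String> forComponent = mappings.get(componentKey);
        if (forComponent == null) {
            return componentSofa;
        }
        return forComponent.getOrDefault(componentSofa, componentSofa);
    }

    public static void main(String[] args) {
        SofaMappingDemo aggregate = new SofaMappingDemo();
        aggregate.map("tokenizer", "_InitialView", "plainText");
        System.out.println(aggregate.resolve("tokenizer", "_InitialView"));
        System.out.println(aggregate.resolve("htmlCleaner", "_InitialView"));
    }
}
```

Under this model the tokenizer would need no change of its own, since the renaming lives entirely in the enclosing aggregate. Whether such a mapping avoids the NPE described above is not something the sketch can show.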

Martin Wunderlich | 26 Aug 21:07 2015

Wrong CAS being passed on in AAE

Hi all, 

I am running a SimplePipeline which consists of a Reader and two Aggregate Analysis Engines. The output CAS
of the first AAE is meant to feed into the second AAE. However, I find that even though the output of the first
AAE is correct, it is different from the input to the second AAE. I have verified this by writing to XMI as the
last step of AAE 1 and the first step of AAE 2. In AAE 1 I have a CAS merger as the final step and a flow controller
which is set to drop the input CAS to the merger. This is working fine: Only the merged CAS remains after the
merge step. And yet, the input CAS to AAE 2 is missing the annotations which were added by annotator steps in
AAE 1 before the merge. 

If I combine both AAEs into one super-AAE and add this one to the SimplePipeline instead, it works fine. So,
there is a work-around. I am just curious why the AAEs might behave this way when I keep them separate.
Thanks a lot. 

Cheers, 

Martin

Matthew DeAngelis | 26 Aug 16:45 2015

Views or Separate CASes?

Hello UIMA Gurus,

I am relatively new to UIMA, so please excuse the general nature of my
question and any butchering of the terminology.

I am attempting to write an application to process transcripts of audio
files. Each "raw" transcript is in its own HTML file with a section listing
biographical information for the speakers on the call followed by a number
of sections containing transcriptions of the discussion of different
topics. I would like to be able to analyze each speaker's contributions
separately by topic and then aggregate and compare these analyses between
speakers and between each speaker and the full text. I was thinking that I
would break the document into a new segment each time the speaker or the
section of the document changes (attaching relevant speaker metadata to
each section), run additional Analysis Engines on each segment (tokenizer,
etc.), and then arbitrarily recombine the results of the analysis by
speaker, etc.

Looking through the documentation, I am considering two approaches:

1. Using a CAS Multiplier. Under this approach, I would follow the example
in Chapter 7 of the documentation, divide on section and speaker
demarcations, add metadata to each CAS, run additional AEs on the CASes,
and then use a multiplier to recombine the many CASes for each document
(one for the whole transcript, one for each section, one for each speaker,
etc.). The advantage of this approach is that it seems easy to incorporate
into a pipeline of AEs, since they are designed to run on each CAS. The
disadvantage is that it seems unwieldy to have to keep track of all of the
related CASes per document and aggregate statistics across the CASes.
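
The bookkeeping concern in approach 1 can be made concrete with a small sketch. It assumes each segment carries its document id, section and speaker as metadata, so results can be regrouped afterwards. This is plain Java, not UIMA code: SegmentDemo, Segment, bySpeaker and tokenCount are hypothetical names, and the whitespace token count is only a stand-in for whatever analysis the downstream AEs would perform.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model, not UIMA code: each Segment stands in for one child CAS plus its metadata.
final class SegmentDemo {

    static final class Segment {
        final String documentId;
        final String section;
        final String speaker;
        final String text;

        Segment(String documentId, String section, String speaker, String text) {
            this.documentId = documentId;
            this.section = section;
            this.speaker = speaker;
            this.text = text;
        }
    }

    // Groups segments by speaker, keeping speakers in first-seen order.
    static Map<String, List<Segment>> bySpeaker(List<Segment> segments) {
        Map<String, List<Segment>> groups = new LinkedHashMap<>();
        for (Segment s : segments) {
            groups.computeIfAbsent(s.speaker, k -> new ArrayList<>()).add(s);
        }
        return groups;
    }

    // Whitespace token count, used here as a placeholder for real analysis results.
    static int tokenCount(List<Segment> segments) {
        int total = 0;
        for (Segment s : segments) {
            String trimmed = s.text.trim();
            if (!trimmed.isEmpty()) {
                total += trimmed.split("\\s+").length;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        List<Segment> segments = Arrays.asList(
                new Segment("call-1", "Q&A", "Alice", "Thanks for joining us"),
                new Segment("call-1", "Q&A", "Bob", "Happy to be here"),
                new Segment("call-1", "Outlook", "Alice", "Revenue grew"));
        for (Map.Entry<String, List<Segment>> e : bySpeaker(segments).entrySet()) {
            System.out.println(e.getKey() + ": " + tokenCount(e.getValue()) + " tokens");
        }
    }
}
```

The same grouping could be keyed on section or document id instead of speaker. As long as every segment carries that metadata, aggregation across segments reduces to a group-by over it.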


Peter Klügl | 26 Aug 09:24 2015

[ANNOUNCE] Apache UIMA Ruta 2.3.1 released


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256

The Apache UIMA team is pleased to announce the release of
Apache UIMA Ruta (Rule-based Text Annotation), version 2.3.1.

Apache UIMA Ruta is a rule-based script language supported by
Eclipse-based tooling. The language is designed to enable rapid
development of text processing applications within UIMA. A special focus
lies on the intuitive and flexible domain specific language for defining
patterns of annotations. The Eclipse-based tooling,
called the Apache UIMA Ruta Workbench, was created to support the
user and to facilitate every step when writing rules. Both
the rule language and the workbench integrate
smoothly with Apache UIMA.

This is a bugfix release. It fixes diverse problems of the rule inference,
the ruta-maven-plugin and the UIMA Ruta Workbench.

For a full list of the changes, please refer to Jira:
http://uima.apache.org/d/ruta-2.3.1/issuesFixed/jira-report.html

More information about UIMA Ruta can be found here:
http://uima.apache.org/ruta.html

 - Peter Klügl, for the Apache UIMA development team
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2


John Shen | 25 Aug 00:04 2015

Missing user-defined annotations from Aggregate Analysis Engine

Hi all,
I'm having trouble developing Java code around a PEAR that I built with
LanguageWare. The PEAR is an aggregate analysis engine whose last AE is a
JFSTAnnotator. This last AE produces user-defined annotations with features,
as do the stages before it.

When I run the PEAR in the CVD I can see all the annotations I expect.
However, when I use the ExampleApplication example (as packaged in
uimaj-2.8.1) to run the same PEAR, the annotations from the JFSTAnnotator
stage are the only ones that do not appear. I've also tried the RunAE example,
which does produce these annotations.

I feel like I'm missing some configuration parameter that makes the last
produced annotations available.  Does anyone have experience in this domain?

For reference, I'm building against uimaj-2.8.1.  

Thanks in advance,
John

Martin Wunderlich | 17 Aug 21:20 2015

Running UIMA 2.8.1 in Eclipse Mars - Component Descriptor Editor not available

Hi, 

Well, as the subject says, I am having issues with the current release version 2.8.1 and Eclipse Mars. When I
try to open an existing type system description (XML file) with the Component Descriptor Editor, this is
not possible, because the editor is not recognized. I have dropped the
org.apache.uima.desceditor_2.8.1.jar into the Eclipse plugin directory, restarted Eclipse with the
„-clean“ option and, AFAIK, this is all that should be necessary for installation. But the editor is
not even showing up in the list of internal editors with which I could open the XML file. 

Anyone using the Component Descriptor Editor under Eclipse Mars successfully? Is there any special
installation procedure? 

Cheers, 

Martin

Joshua Eisenberg | 17 Aug 16:32 2015

Getting UIMA working with Eclipse-- Cannot find main for Document Analyzer

I am trying to set up UIMA for Java with Eclipse.

I downloaded the UIMA binary zip from Apache, unzipped it, and put it in a
directory that I configured as the UIMA_HOME environment variable in my bash
profile and in Eclipse. I also set my paths as the UIMA readme instructs. I
ran adjustExamplePaths.sh from the terminal with no errors. I am able to run
the document analyzer through the terminal with no errors; I am able to see
the example annotations.

In Eclipse I downloaded the EMF plugin and both UIMA plugins. Then, to test
whether everything works in Eclipse, I imported (from existing projects) the
examples folder in UIMA_HOME. I run it, select document analyzer from the
list, and I get the following error:

Error: could not find or load main class
org.apach.uima.tools.docanalyzer.DocumentAnalyzer

Since then I've added the uima-tools.jar to the class path for doc analyzer
and I am still getting the same error.

Any advice would help.

Thanks!
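
One way to narrow down a "could not find or load main class" error is to ask the JVM directly whether a given class name is loadable from the current class path. The sketch below does that with the standard Class.forName call. ClassCheck and isLoadable are hypothetical helper names, and the default class name used in main is an assumption about the intended spelling. Note that the error message as quoted above spells the package "org.apach.uima"; if the launch configuration spells it that way too, no class path change would help.

```java
// Minimal diagnostic sketch: reports whether a class name can be loaded
// from the class path this program itself is started with.
final class ClassCheck {

    static boolean isLoadable(String className) {
        try {
            // 'false' avoids running static initializers; we only test presence.
            Class.forName(className, false, ClassCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Assumed spelling of the tool's main class; pass another name as an argument.
        String name = args.length > 0
                ? args[0]
                : "org.apache.uima.tools.docanalyzer.DocumentAnalyzer";
        System.out.println(name + " loadable: " + isLoadable(name));
    }
}
```

Running this from the same Eclipse project and launch class path as the failing configuration would show whether the problem is the class path or the class name.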

John David Osborne (Campus | 16 Aug 17:34 2015

CAS to RDBS

Can anybody tell me if there are any UIMA annotators/consumers that can write the CAS to the database in
discrete tables dependent on the type system?

I know the entire CAS can be dumped to the database as a varchar/lob but I was looking for something that would
create a table for each object in the CAS and write the instances out as rows in that table. So it would have to
be smart enough to map an uima.cas.Integer to the appropriate type in the database.

I don't care if it is database specific or not.

I just added some new types and I am not looking forward to writing a bunch of boiler-plate code to save it to
the database.

Any advice appreciated,

 -John
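
To make the requested mapping concrete, here is a minimal sketch of the boilerplate in question: it maps UIMA primitive type names to SQL column types and emits one CREATE TABLE statement per annotation type. It is plain Java and not an existing UIMA component. CasTableSketch, sqlType and createTable are hypothetical names, the chosen SQL types and the VARCHAR length are assumptions, and table and column names are assumed to be valid SQL identifiers.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: derives a table definition from a type's primitive features.
final class CasTableSketch {

    // Maps a UIMA primitive type name to an assumed SQL column type.
    static String sqlType(String uimaType) {
        switch (uimaType) {
            case "uima.cas.Boolean": return "BOOLEAN";
            case "uima.cas.Byte":
            case "uima.cas.Short":   return "SMALLINT";
            case "uima.cas.Integer": return "INTEGER";
            case "uima.cas.Long":    return "BIGINT";
            case "uima.cas.Float":   return "REAL";
            case "uima.cas.Double":  return "DOUBLE PRECISION";
            case "uima.cas.String":  return "VARCHAR(4000)";
            default:
                throw new IllegalArgumentException("No column mapping for " + uimaType);
        }
    }

    // features: feature name -> UIMA range type name, in declaration order.
    static String createTable(String tableName, Map<String, String> features) {
        StringBuilder sql = new StringBuilder("CREATE TABLE ")
                .append(tableName)
                .append(" (id BIGINT PRIMARY KEY");
        for (Map.Entry<String, String> f : features.entrySet()) {
            sql.append(", ").append(f.getKey()).append(' ').append(sqlType(f.getValue()));
        }
        return sql.append(")").toString();
    }

    public static void main(String[] args) {
        Map<String, String> features = new LinkedHashMap<>();
        features.put("category", "uima.cas.String");
        features.put("confidence", "uima.cas.Double");
        features.put("sentenceIndex", "uima.cas.Integer");
        System.out.println(createTable("NamedEntity", features));
    }
}
```

Features whose range is another annotation type or an array are not handled here; they would need foreign keys or join tables, which is where most of the real effort would likely go.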

Jaroslaw Cwiklik | 12 Aug 15:37 2015

[ANNOUNCE] Apache UIMA DUCC 2.0.0 released

The Apache UIMA team is pleased to announce the release of Apache UIMA DUCC,
version 2.0.0.

DUCC stands for Distributed UIMA Cluster Computing. DUCC is a cluster
management system providing tooling, management, and scheduling facilities
to automate the scale-out of applications written to the UIMA framework.
Core UIMA provides a generalized framework for applications that process
unstructured information such as human language, but does not provide a
scale-out mechanism. UIMA-AS provides a scale-out mechanism to distribute
UIMA pipelines over a cluster of computing resources, but does not provide
job or cluster management of the resources. DUCC defines a formal job model
that closely maps to a standard UIMA pipeline. Around this job model DUCC
provides cluster management services to automate the scale-out of UIMA
pipelines over computing clusters.

This is a major release containing new features and bug fixes. Please visit
http://uima.apache.org/news.html for more details.

-Jerry Cwiklik, for the Apache UIMA community

Marshall Schor | 11 Aug 17:55 2015

[ANNOUNCE] Apache UIMA Java SDK 2.8.1 released


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA512

The Apache UIMA team is pleased to announce the release of the Apache UIMA Java
SDK, version 2.8.1.  This release is a bug fix release, and should be used
instead of 2.8.0.

Apache UIMA <http://uima.apache.org> is a component architecture and framework
for the analysis of unstructured content like text, video and audio data.

- -Marshall Schor, for the Apache UIMA development team
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.21 (MingW32)

iQIcBAEBCgAGBQJVyhsLAAoJEHMl+QLhMDqZMEQP+wSZDFjZUJcs6D4rj8ZeQdhW
8ShH1QVhFSONju/gKO1FPv9Q84pcx2K0RlOiFRayXMlCVxv8tL+CbKNqYVKRZLd0
FsrL8kR1c8lGPXnpV02NbIP7vpGnn9gA4YJ7EHbnSneq+aWxDidTPASohnfnbAIv
iu4Px3VWHRt8KNY1QpH7NxasQaAPDtmSa/Cu4XsAtOgiJzCN5L5ZAkCmu4KrFSoB
IMkFbPZ6kqG/gaGC0h80eRpGgcxV7WAjSc/tJrd8Iac9QbJ/9Cy0DPLfVrrgyADC
TZilevojB8v0Kvq4Zj6Z99+7lPIm5DknMGKwSv1hbJsBiG8ofg4ThNi/biKUpjAI
5SeXwSexl/25u2c3hiq/WQ4BsBdWQzaDQABsJdDw8FSByJR9sco7pFlHkMzCPc3O
/9pvOEQ5srZWouPOYSIeNoZG+CRH4n0JnrHUKuxiwJyJZ5taNhmXQEVnucDc6e8v
fTHnNsoapZQfY8Rh3vPqokyi4Ir/2Bxp44HGhRG125EOD3SwzaJOGkfgrxMcmVxf
TqZlM/9wPVLnvsa36mQdwYqd3DKgIzSbMrSNTpEKLt7hYuYSBcskebp/FIwS0dfY
Sz8/WScM8vE2rWDn7+2RG7yPNc7iLmvPtnJyqzFyOTHwdRXgEKHxXUU57ojHM1jL
akpWLudsjgY/wzMOEgv6
=Ow9I
-----END PGP SIGNATURE-----
