Armin.Wegner | 27 Nov 14:09 2014

A simple CAS Consumer for populating a Solr Index

Hi Rob!

This simple code example sends annotations of type Person, Location and Organization to a Solr server.
The fields text, person, location, and organization must also be defined in Solr. You need the
org.apache.solr:solr-solrj jar, version 4.9.0 or higher.


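import java.net.URL;

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.uima.UimaContext;
import org.apache.uima.fit.component.CasConsumer_ImplBase;
import org.apache.uima.fit.descriptor.ConfigurationParameter;
import org.apache.uima.resource.ResourceInitializationException;
import org.apache.uima.util.Logger;

public class SolrPopulator extends CasConsumer_ImplBase {
	private Logger mLogger;

	/** A Solr server. */
	private SolrServer mSolrServer;

	public static final String PARAMETER_SOLR_SERVER_URL = "solrServerUrl";
	@ConfigurationParameter(name = PARAMETER_SOLR_SERVER_URL, mandatory = true)
	private URL mSolrServerUrl;

	@Override
	public void initialize(final UimaContext context) throws ResourceInitializationException {
		// Let uimaFIT inject the configuration parameters (mSolrServerUrl) before using them.
		super.initialize(context);
		mLogger = context.getLogger();

		mSolrServer = new HttpSolrServer(mSolrServerUrl.toString());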

(Continue reading)
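The rest of the consumer is cut off above. As a rough sketch only, assuming that the covered text of each Person, Location or Organization annotation goes into the Solr field of the same lower-cased name and the document text goes into the text field, the following plain-Java code builds the field values for one document. SolrFieldMapping, Span and toFields are hypothetical names, not part of UIMA or SolrJ; in a real consumer the spans would come from the CAS, and each value would be added with SolrJ's SolrInputDocument.addField(...) before the document is sent with mSolrServer.add(...).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

public class SolrFieldMapping {

    /** Hypothetical stand-in for a UIMA annotation: a typed character span. */
    static final class Span {
        final String type; // annotation type short name, e.g. "Person"
        final int begin;   // inclusive start offset in the document text
        final int end;     // exclusive end offset in the document text

        Span(String type, int begin, int end) {
            this.type = type;
            this.begin = begin;
            this.end = end;
        }
    }

    /** The annotation-backed Solr fields the post says must exist. */
    static final List<String> ENTITY_FIELDS = Arrays.asList("person", "location", "organization");

    /**
     * Collects the values for one Solr document: the whole document under "text"
     * and each annotation's covered text under its lower-cased type name.
     * Spans whose type does not map to person, location or organization are skipped.
     */
    static Map<String, List<String>> toFields(String text, List<Span> spans) {
        Map<String, List<String>> fields = new LinkedHashMap<>();
        fields.put("text", new ArrayList<>(Arrays.asList(text)));
        for (String field : ENTITY_FIELDS) {
            fields.put(field, new ArrayList<String>());
        }
        for (Span span : spans) {
            String field = span.type.toLowerCase(Locale.ROOT);
            if (ENTITY_FIELDS.contains(field)) {
                fields.get(field).add(text.substring(span.begin, span.end));
            }
        }
        return fields;
    }

    public static void main(String[] args) {
        String text = "Rob met Armin in Berlin at Apache.";
        List<Span> spans = Arrays.asList(
                new Span("Person", 0, 3),
                new Span("Person", 8, 13),
                new Span("Location", 17, 23),
                new Span("Organization", 27, 33),
                new Span("Token", 0, 3)); // not one of the three types, so ignored
        // prints {text=[Rob met Armin in Berlin at Apache.], person=[Rob, Armin], location=[Berlin], organization=[Apache]}
        System.out.println(toFields(text, spans));
    }
}
```

Keeping the mapping separate from the CAS and SolrJ calls makes it easy to test on its own; supporting another entity type would only mean extending ENTITY_FIELDS and defining the matching field in Solr.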

Simon Hafner | 27 Nov 04:06 2014

DUCC org.apache.uima.util.InvalidXMLException and no logs

When I launch the Raw Text example application, it fails to load with
the following error:

[ducc@ip-10-0-0-164 analysis]$ MyAppDir=$PWD MyInputDir=$PWD/txt MyOutputDir=$PWD/txt.processed ~/ducc_install/bin/ducc_submit -f
Job 50 submitted
id:50 location:5991@ip-10-0-0-164
id:50 state:WaitingForDriver
id:50 state:Completing total:-1 done:0 error:0 retry:0 procs:0
id:50 state:Completed total:-1 done:0 error:0 retry:0 procs:0
id:50 rationale:job driver exception occurred: org.apache.uima.util.InvalidXMLException at

However, there are no logs with a stack trace or anything similar. How do
I get hold of one? The only files in the log directory are:

[ducc@ip-10-0-0-164 analysis]$ cat logs/50/
#Thu Nov 27 03:00:57 UTC 2014
(Continue reading)

Daniel Heinze | 21 Nov 00:45 2014

DUCC web server interfacing

I just installed DUCC this week and can process batch jobs. I would like
DUCC to initiate and manage one or more copies of the same UIMA pipeline,
which has a high startup overhead, keep it/them active, and feed it/them
with documents that arrive periodically over a web service. Are there any
suggestions on the preferred way (if any) to do this in DUCC?

Thanks / Dan 

Dan Heinze | 18 Nov 21:37 2014

DUCC stuck Waiting for Resources - new install on CentOS 6.5 VM

I've read the "DUCC stuck Waiting for Resources on Amazon..." thread.
I have a similar problem. I did my first install of DUCC yesterday on a
CentOS 6.5 VM with 9GB RAM. There were no problems with the install, and
./start_ducc -s seems to work fine, but when I look at Reservations in
ducc-mon, I find that the Job Driver is stuck in "Waiting for Resources".
I have given it hours, but it just stays stuck there. Also, nothing is
being written to the logs: the ${DUCC_HOME}/logs directory is empty. Any
help would be appreciated.


reshu.agarwal | 18 Nov 07:05 2014

DUCC-Un-managed Reservation??


I am a bit confused. Why do we need an unmanaged reservation? Suppose we
give this reservation a memory size of 5GB. Can this RAM be consumed by
any process if required?

In my scenario, when all the RAM on the nodes was consumed by jobs, all
processes went into a waiting state. I need to reserve some RAM so that
it cannot be consumed by the shares for job processes, but can still be
used internally if required.

Can an unmanaged reservation be used for this?

Thanks in advance.


Simon Hafner | 17 Nov 12:48 2014

DUCC doesn't use all available machines

I fired the DuccRawTextSpec.job on a cluster consisting of three
machines, with 100 documents. The scheduler only runs the processes on
two machines instead of all three. Can I mess with a few config
variables to make it use all three?

id:22 state:Running total:100 done:0 error:0 retry:0 procs:1
id:22 state:Running total:100 done:0 error:0 retry:0 procs:2
id:22 state:Running total:100 done:0 error:0 retry:0 procs:4
id:22 state:Running total:100 done:1 error:0 retry:0 procs:8
id:22 state:Running total:100 done:6 error:0 retry:0 procs:8

reshu.agarwal | 17 Nov 07:00 2014

DUCC 1.1.0- How to Run two DUCC version on same machines with different user


I want to run two DUCC versions, i.e. 1.0.0 and 1.1.0, on the same
machines under different users. Is this possible?

Thanks in advance.


James Kitching | 16 Nov 13:08 2014

UIMA pipeline output persistence and multiple layer web based visualisation tools? Suggestions?


(First of all a BIG THANKS to ALL open source developers at UIMA and the 
other projects I mention below whom I am now relying on  :-) ).

I am looking at researching a particular knowledge base extraction task 
using UIMA components as part of the solution.  To do this work I need 
UIMA output persistence and to be able to visualise this output as 
multiple annotation layers on the same text.  Ultimately I want my 
automated annotations and visualisations to be web based and allow me to 
make additional manual annotations if required.  Once I have my multiple 
annotations made on a text I will then be able to apply my new knowledge 
extraction logic.

I have looked at WebAnno (which incorporates Brat for its UI) and
U-Compare, as well as Argo. I had hoped that I could use WebAnno for
this task; however, WebAnno does not allow the direct import of UIMA
components or UIMA output. I found that I could get U-Compare to work as
I wanted, and it shows promise; however, if I get any configuration
between UIMA components wrong, it crashes. I got the software to work
for me after I spent more time reading the manual. I found I needed to
manually configure the input types for each component in the pipeline.
The software recognises subsequent pipeline component compatibility when
a new component is added to a workflow. My initial errors arose because
I had expected subsequent U-Compare components to automatically pick up
their input from the output of previous workflow components. Whilst the
U-Compare software does
(Continue reading)

Renaud Richardet | 16 Nov 11:32 2014

Ruta UIMAFIT component declaration with configuration parameters


I can't seem to find how to define additional configuration parameters for
UIMAFIT components in Ruta scripts, something along the lines of:

UIMAFIT org.apache.uima.ruta.engine.XMIWriter, 'Output', '/path/to/xmi.xml';

(How) is this possible?

Thanks, Renaud

Silvestre Losada | 14 Nov 10:18 2014

ruta update site

Hi All,

I'm trying to generate the Eclipse update site for the current UIMA Ruta
version, 2.2.2-SNAPSHOT. There is a project, ruta-eclipse-update-site, but
when I try to use it, it asks me to set up different properties in
several files. Now there is an error related to the property
uima-eclipse-jar-processor in the settings file. Could you advise me on
how to add this property?

Thanks in advance.

Simon Hafner | 12 Nov 11:51 2014

DUCC stuck at WaitingForResources on an Amazon Linux

I've set up DUCC according to

    ducc_install/bin/ducc_submit -f ducc_install/examples/simple/1.job

the job is stuck at WaitingForResources.

12 Nov 2014 10:37:30,175  INFO Agent.LinuxNodeMetricsProcessor - process     N/A ... Agent Collecting User Processes
12 Nov 2014 10:37:30,176  INFO Agent.NodeAgent - copyAllUserReservations     N/A +++++++++++ Copying User Reservations - List Size:0
12 Nov 2014 10:37:30,176  INFO Agent.LinuxNodeMetricsProcessor - call     N/A ********** User Process Map Size After
12 Nov 2014 10:37:30,176  INFO Agent.LinuxNodeMetricsProcessor - call     N/A ********** User Process Map Size After
12 Nov 2014 10:37:30,182  INFO Agent.LinuxNodeMetricsProcessor - call     N/A
12 Nov 2014 10:37:30,182  INFO Agent.LinuxNodeMetricsProcessor - call     N/A ******************************************************************************
12 Nov 2014 10:37:30,182  INFO Agent.LinuxNodeMetricsProcessor - process     N/A ... Agent Posting Memory:4050676 Memory Free:4013752 Swap Total:0 Swap Free:0 Low Swap Threshold Defined in
12 Nov 2014 10:37:33,303  INFO Agent.AgentEventListener - reportIncomingStateForThisNode     N/A Received OR Sequence:699 Thread
12 Nov 2014 10:37:33,303  INFO Agent.AgentEventListener - reportIncomingStateForThisNode     N/A
(Continue reading)