Daniel Heinze | 21 Nov 00:45 2014

DUCC web server interfacing

I just installed DUCC this week and can process batch jobs.  I would like
DUCC to initiate and manage one or more copies of the same UIMA pipeline
(which has a high startup overhead), keep them active, and feed them
documents that arrive periodically over a web service.  Any suggestions on
the preferred way, if any, to do this in DUCC?
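For context, the usual DUCC pattern for a long-lived pipeline is to register it as a DUCC service, which DUCC keeps alive (and can restart or scale), while the web front end feeds it work. A minimal sketch of a service registration properties file follows; the key names are as I recall them from the DuccBook and should be verified against your install, and all paths and the endpoint are placeholders:

```
# illustrative service registration (verify key names in the DuccBook)
description              = Long-running NLP pipeline service
service_request_endpoint = UIMA-AS:myPipelineQueue:tcp://broker-host:61617
process_DD               = /path/to/deployment-descriptor.xml
process_memory_size      = 8
process_jvm_args         = -Xmx6G
classpath                = /path/to/pipeline/jars/*
scheduling_class         = fixed
```

The service would then be registered and started with something like `ducc_services --register service.properties` followed by `ducc_services --start <service-id>`.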

Thanks / Dan 

Dan Heinze | 18 Nov 21:37 2014

DUCC stuck Waiting for Resources - new install on CentOS 6.5 VM

I've read the "DUCC stuck Waiting for Resources on Amazon..." thread, and
I have a similar problem.  I did my first install of DUCC yesterday on a
CentOS 6.5 VM with 9 GB of RAM.  There were no problems with the install,
and ./start_ducc -s seems to work fine, but when I look at the ducc-mon
Reservations page I find that the Job Driver is stuck "Waiting for
Resources".  I have given it hours, but it just stays there.  Also, nothing
is being written to the logs: the ${DUCC_HOME}/logs directory is empty.
Any help will be appreciated.
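For reference, on a small single-node install the two things usually worth checking first are the node list and the share quantum. A sketch of the relevant entries (property names as in the DuccBook; the values here are illustrative, not recommendations):

```
# ducc.nodes: must contain this host's name exactly as `hostname` reports it,
# otherwise no agent starts and the logs stay empty
my-centos-vm

# ducc.properties: with only 9 GB of RAM, a smaller share quantum (in GB)
# may be needed so the Job Driver reservation can be satisfied
ducc.rm.share.quantum = 1
```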


reshu.agarwal | 18 Nov 07:05 2014

DUCC: un-managed reservations?


I am a bit confused.  Why do we need an un-managed reservation?  Suppose we
give 5 GB of memory to this reservation.  Can that RAM be consumed by any
process if required?

In my scenario, when all the RAM on the nodes was consumed by jobs, all
processes went into a waiting state.  I need to reserve some RAM so that
it cannot be consumed by shares for job processes, but can still be used
internally if required.

Can an un-managed reservation be used for this?

Thanks in advance.
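For context, an un-managed reservation is requested with the ducc_reserve CLI: DUCC takes the memory out of the pool available to job shares, and the requesting user is then free to run arbitrary processes in it. A sketch of such a request (option names as I recall them from the DuccBook; confirm with `ducc_reserve --help`):

```
ducc_install/bin/ducc_reserve \
    --description "RAM held back from job shares" \
    --instance_memory_size 5 \
    --scheduling_class reserve
```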


Simon Hafner | 17 Nov 12:48 2014

DUCC doesn't use all available machines

I fired the DuccRawTextSpec.job on a cluster consisting of three
machines, with 100 documents. The scheduler only runs the processes on
two machines instead of all three. Can I mess with a few config
variables to make it use all three?

id:22 state:Running total:100 done:0 error:0 retry:0 procs:1
id:22 state:Running total:100 done:0 error:0 retry:0 procs:2
id:22 state:Running total:100 done:0 error:0 retry:0 procs:4
id:22 state:Running total:100 done:1 error:0 retry:0 procs:8
id:22 state:Running total:100 done:6 error:0 retry:0 procs:8
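The procs column growing 1, 2, 4, 8 suggests the scheduler is still expanding the job by doubling; whether new processes land on the third machine is then a fair-share placement decision, not something the job controls directly. Two properties that influence this expansion (names as in the DuccBook; values here are illustrative):

```
# ducc.properties (illustrative)
# cap on the number of processes while the job is still initializing
ducc.rm.initialization.cap = 2
# expand the allocation by doubling rather than one share at a time
ducc.rm.expand.by.doubling = true
```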

reshu.agarwal | 17 Nov 07:00 2014

DUCC 1.1.0: How to run two DUCC versions on the same machines with different users


I want to run two DUCC versions, i.e. 1.0.0 and 1.1.0, on the same
machines under different users.  Is this possible?

Thanks in advance.
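At a minimum, the two installs must not share ports or the ActiveMQ broker. Assuming each user runs a complete head-node stack, properties along these lines would have to differ between the two ducc.properties files (names and default values as I recall them from the DuccBook; treat as illustrative):

```
# install for user A (DUCC 1.0.0)
ducc.broker.port = 61617
ducc.ws.port     = 42133

# install for user B (DUCC 1.1.0) - distinct ports for broker and web server
ducc.broker.port = 61618
ducc.ws.port     = 42134
```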


James Kitching | 16 Nov 13:08 2014

UIMA pipeline output persistence and multiple layer web based visualisation tools? Suggestions?


(First of all, a BIG THANKS to ALL the open source developers at UIMA and
the other projects I mention below, on whom I am now relying :-) ).

I am looking at researching a particular knowledge base extraction task 
using UIMA components as part of the solution.  To do this work I need 
UIMA output persistence and to be able to visualise this output as 
multiple annotation layers on the same text.  Ultimately I want my 
automated annotations and visualisations to be web based and allow me to 
make additional manual annotations if required.  Once I have my multiple 
annotations made on a text I will then be able to apply my new knowledge 
extraction logic.

I have looked at WebAnno (which incorporates brat for its UI) and
U-Compare, as well as Argo (see https://code.google.com/p/webanno/,
http://brat.nlplab.org/, http://u-compare.org/,
http://argo.nactem.ac.uk/about-argo/).  I had hoped that I could use
WebAnno for this task; however, WebAnno does not allow the direct import
of UIMA components or UIMA output.  I found that I could get U-Compare
to work as I wanted, and it shows promise; however, if I get any
configuration wrong between UIMA components, it crashes out.  I got
the software to work for me after I spent more time reading the manual:
I needed to manually configure the input types for each component in the
pipeline.  The software checks component compatibility when a new
component is added to a workflow.  My initial errors came because I had
expected subsequent U-Compare components to automatically pick up their
input from the output of previous workflow components.  Whilst the
U-Compare software does
(Continue reading)

Renaud Richardet | 16 Nov 11:32 2014

Ruta UIMAFIT component declaration with configuration parameters


I can't seem to find how to define additional configuration parameters for
UIMAFIT components in Ruta scripts, something along the lines of:

UIMAFIT org.apache.uima.ruta.engine.XMIWriter, 'Output', '/path/to/xmi.xml';

(How) is this possible?
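If UIMAFIT declarations do not accept inline parameters, one possible workaround is Ruta's CONFIGURE and EXEC actions, which set parameters on an already-declared engine at runtime and then invoke it. A sketch (the `"output"` parameter name is a guess; check XMIWriter's descriptor for the real name):

```
UIMAFIT org.apache.uima.ruta.engine.XMIWriter;

Document{-> CONFIGURE(XMIWriter, "output" = "/path/to/xmi.xml"),
            EXEC(XMIWriter)};
```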

Thanks, Renaud
Silvestre Losada | 14 Nov 10:18 2014

Ruta update site

Hi All,

I'm trying to generate the Eclipse update site for the current UIMA Ruta
version, 2.2.2-SNAPSHOT.  There is a project, ruta-eclipse-update-site;
when I try to use it, it asks me to set up different properties in several
files.  Now there is an error related to the property
uima-eclipse-jar-processor in the settings file.  Could you advise me on
how to add this property?

Thanks in advance.
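For what it's worth, build properties like this are normally supplied in ~/.m2/settings.xml inside an active profile. A sketch with a hypothetical path (the real value should point at the JAR processor bundle of a local Eclipse install; the exact jar version will differ):

```
<profiles>
  <profile>
    <id>uima-ruta-build</id>
    <properties>
      <!-- hypothetical location; point this at your local Eclipse's jarprocessor jar -->
      <uima-eclipse-jar-processor>/opt/eclipse/plugins/org.eclipse.equinox.p2.jarprocessor_1.0.300.jar</uima-eclipse-jar-processor>
    </properties>
  </profile>
</profiles>
<activeProfiles>
  <activeProfile>uima-ruta-build</activeProfile>
</activeProfiles>
```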
Simon Hafner | 12 Nov 11:51 2014

DUCC stuck at WaitingForResources on an Amazon Linux

I've set up DUCC and submitted the example job:

    ducc_install/bin/ducc_submit -f ducc_install/examples/simple/1.job

The job is stuck at WaitingForResources.

12 Nov 2014 10:37:30,175  INFO Agent.LinuxNodeMetricsProcessor -
process     N/A ... Agent Collecting User Processes
12 Nov 2014 10:37:30,176  INFO Agent.NodeAgent -
copyAllUserReservations     N/A +++++++++++ Copying User Reservations
- List Size:0
12 Nov 2014 10:37:30,176  INFO Agent.LinuxNodeMetricsProcessor - call
   N/A ********** User Process Map Size After
12 Nov 2014 10:37:30,176  INFO Agent.LinuxNodeMetricsProcessor - call
   N/A ********** User Process Map Size After
12 Nov 2014 10:37:30,182  INFO Agent.LinuxNodeMetricsProcessor - call     N/A
12 Nov 2014 10:37:30,182  INFO Agent.LinuxNodeMetricsProcessor - call
   N/A ******************************************************************************
12 Nov 2014 10:37:30,182  INFO Agent.LinuxNodeMetricsProcessor -
process     N/A ... Agent ip-172-31-7-237.us-west-2.compute.internal
Posting Memory:4050676 Memory Free:4013752 Swap Total:0 Swap Free:0
Low Swap Threshold Defined in ducc.properties:0
12 Nov 2014 10:37:33,303  INFO Agent.AgentEventListener -
reportIncomingStateForThisNode     N/A Received OR Sequence:699 Thread
12 Nov 2014 10:37:33,303  INFO Agent.AgentEventListener -
reportIncomingStateForThisNode     N/A
(Continue reading)

reshu.agarwal | 12 Nov 06:45 2014

DUCC-1.1.0: Machines are going down very frequently


When I was trying DUCC 1.1.0 on multiple machines, I faced an up-and-down
status problem: I have configured two machines, and they keep going down
one by one.  This disables the DUCC services and forces jobs to be
initialized again and again.

DUCC 1.0.0 was working fine on the same machines.

How can I fix this problem?  I have also compared the ducc.properties
files for both versions; both use the same configuration for checking
heartbeats.
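For reference, the agent heartbeat rate and the tolerance before the resource manager declares a node down are controlled by properties along these lines (names as in the DuccBook; the defaults may differ between 1.0.0 and 1.1.0, which would be worth comparing):

```
# ducc.properties (illustrative values)
# how often each agent publishes its node metrics, in milliseconds
ducc.agent.node.metrics.publish.rate = 30000
# how many missed publications before a node is considered down
ducc.rm.node.stability = 5
```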

Re-initialization of the jobs is increasing the processing time.  Can I
change or re-configure this behaviour?

Services are being disabled automatically, and hovering over the disabled
status shows an excessive-initialization-errors message, but the logs do
not show any error.

For now I have to use DUCC 1.0.0 instead of DUCC 1.1.0.

Thanks in advance.


Reshu Agarwal

Carsten Schnober | 6 Nov 14:54 2014

Filter CASes from a uimaFIT pipeline

I wonder whether there is a recommended way to remove certain (J)CASes
(i.e. documents) from a pipeline after reading.
The scenario in my case is that I use a standard reader
(BinaryCasReader) which returns many documents.  I only want a subset of
these documents to be processed by the rest of the pipeline (comprising a
segmenter, a writer and some other engines), subject to a certain value
in a custom annotation.

The initial intuition would be to use or implement a reader that only
selects those documents that fulfil the given condition.  In my case that
would mean, however, that I'd need to implement a new reader extending
BinaryCasReader with the described functionality.  From a high-level
view at least, this seems much more complicated than just removing
documents from the pipeline.
Can I avoid that effort somehow without breaking conventions?
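One convention-friendly alternative, if the pipeline is driven from Java with uimaFIT, is to iterate over the CASes the reader produces and run the remaining engines only on those that pass the predicate. A sketch assuming uimaFIT's SimplePipeline API (the matchesCondition() check is hypothetical and would inspect your custom annotation):

```java
import org.apache.uima.analysis_engine.AnalysisEngineDescription;
import org.apache.uima.collection.CollectionReaderDescription;
import org.apache.uima.fit.pipeline.SimplePipeline;
import org.apache.uima.jcas.JCas;

public class FilteringPipeline {

    /** Hypothetical predicate: inspect the custom annotation in this CAS. */
    static boolean matchesCondition(JCas jcas) {
        return true; // replace with a check on your custom annotation type
    }

    static void run(CollectionReaderDescription reader,
                    AnalysisEngineDescription... engines) throws Exception {
        // iteratePipeline drives the reader; no engines are applied here yet
        for (JCas jcas : SimplePipeline.iteratePipeline(reader)) {
            if (!matchesCondition(jcas)) {
                continue; // drop this document without touching the reader
            }
            // apply the segmenter, writer, etc. only to the surviving CASes
            SimplePipeline.runPipeline(jcas, engines);
        }
    }
}
```

This keeps BinaryCasReader untouched and puts the filtering decision in the driver, at the cost of working outside a single aggregate descriptor.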



Carsten Schnober, M.Sc.
Doctoral Researcher
Ubiquitous Knowledge Processing (UKP Lab)
FB 20 Computer Science Department
Technische Universität Darmstadt
Hochschulstr. 10, D-64289 Darmstadt, Germany
phone (0)6151 16-6227, room S2/02/B111

(Continue reading)