24 May 2013 18:03
24 May 2013 04:44
23 May 2013 13:49
Ruta 2.0.2 Textruler problems
Hi, I tried the TextRuler view (WHISK (token)) under Ruta 2.0.2 and got weird results, such as rules with p=0 and n=0, even though I set the maximum error threshold to 0.1. WHISK (generic) doesn't work for me at all: it seems to crash on the first step and then keeps waiting without producing any results (I am just trying the prebuilt version). I intend to do some development on TextRuler but still haven't found the appropriate version. I was using the TextRuler source code with TextMarker 1.0, but now that TextMarker has been improved I would like to use the latest version. The problem is that the TextRuler source code doesn't work properly with these new versions of TextMarker. Cheers, Sondes
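One possible (unconfirmed) explanation for rules with p=0 and n=0 surviving a 0.1 threshold, assuming the filter computes the error as n/(p+n) in floating point and rejects a rule only when `error > threshold`: in Java, 0.0/0.0 is NaN and every comparison involving NaN is false, so such a rule is never rejected. This is not TextRuler's code; the class and method names are hypothetical. For contrast, it also computes the Laplacian expected error, (errors + 1)/(extractions + 1), that Soderland's WHISK paper uses, which yields 1.0 for a rule with no extractions.

```java
// Hypothetical illustration only: this is NOT TextRuler's actual implementation.
final class ErrorThresholdSketch {

    // Naive error rate: negatives / (positives + negatives).
    // With p = 0 and n = 0 this evaluates 0.0 / 0.0, which is NaN.
    static double naiveErrorRate(int p, int n) {
        return (double) n / (p + n);
    }

    // Laplacian expected error as used by WHISK: (errors + 1) / (extractions + 1),
    // taking n as the errors and p + n as the extractions.
    static double laplacianError(int p, int n) {
        return (n + 1.0) / (p + n + 1.0);
    }

    // "Reject if error > threshold": NaN > threshold is false, so NaN passes.
    static boolean passesRejectIfAbove(double error, double threshold) {
        return !(error > threshold);
    }

    // "Accept only if error <= threshold": NaN <= threshold is false, so NaN is rejected.
    static boolean passesAcceptIfBelow(double error, double threshold) {
        return error <= threshold;
    }

    public static void main(String[] args) {
        double threshold = 0.1;
        double naive = naiveErrorRate(0, 0);
        System.out.println("naive error for p=0, n=0: " + naive);                               // NaN
        System.out.println("reject-if-above keeps it: " + passesRejectIfAbove(naive, threshold)); // true
        System.out.println("accept-if-below keeps it: " + passesAcceptIfBelow(naive, threshold)); // false
        System.out.println("Laplacian error for p=0, n=0: " + laplacianError(0, 0));             // 1.0
    }
}
```

If a filter of the first kind is involved, writing the check as an explicit acceptance condition (or testing `Double.isNaN` first) would drop rules that cover no examples.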
23 May 2013 13:09
23 May 2013 09:36
Adding features to TokenAnnotation and DictTerm in Concept Mapper
Hi, I am using UIMA to annotate some documents with ConceptMapper. I have built the dictionaries and configured it to our requirements. However, I want to add features to TokenAnnotation and DictTerm. For example, the existing TokenAnnotation type supports the following features: text, tokenType, tokenClass and uima.tt.tokenAnnotation. What if I want to add more features, such as POS or group? Is there a clean way of doing this without touching conceptMapper.jar and changing our type system to extend the two types? Thanks, Manisha
22 May 2013 18:31
managing resources for UIMA?
Hi, while not strictly a UIMA issue, we have a problem that seems very relevant in the context of UIMA analysis engines: how to manage large binary resources, such as trained models, used by an AE. So far, we have achieved a good separation between code development and the actual AEs, using Maven (and git for version control). An AE thus consists only of a POM referencing the code, the AE descriptor, and the resources used by the AE. The AE POMs are configured to generate PEAR archives that include all dependencies and resources. At this point the code is in git, as are the AEs' POMs and descriptors, while we manually copy the resources into the directory before running `mvn package` (and exclude those resources from git). We're missing a way to manage those resources, including versioning etc. I'm guessing that this is a rather typical problem, so what solutions do you use? We're thinking of putting all resources into Maven as well (e.g. Artifactory) so we can reference them with a unique identifier and version. This would also help us when moving to more complex pipeline assemblies with uimaFIT, which we would use to create complete packages instead of generating individual PEARs for each component. By the way, we are just a few core developers, and most of the team are linguists, so we want to make it easy for them to save versions of the resources they create and to assemble AEs just by referencing the algorithm and resource (e.g. "create a new OpenNLP POStagger using spanish-pos-model.bin, version 1.2.3"). Thanks for sharing your experiences with this...
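If each model is published as its own Maven artifact, a reference like "spanish-pos-model.bin, version 1.2.3" becomes a `groupId:artifactId:version` coordinate that any Maven repository (Artifactory included) stores under the standard repository layout. The following is a minimal sketch of that mapping, assuming the model file is packaged inside a jar; the class name and the coordinates `org.example.models:spanish-pos-model:1.2.3` are hypothetical.

```java
import java.util.Objects;

// Hypothetical sketch: a versioned model resource identified by Maven coordinates.
final class ModelCoordinate {
    final String groupId;
    final String artifactId;
    final String version;
    final String extension;

    ModelCoordinate(String groupId, String artifactId, String version, String extension) {
        this.groupId = Objects.requireNonNull(groupId);
        this.artifactId = Objects.requireNonNull(artifactId);
        this.version = Objects.requireNonNull(version);
        this.extension = Objects.requireNonNull(extension);
    }

    // Parses "groupId:artifactId:version"; assumes the model is packaged as a jar.
    static ModelCoordinate parse(String gav) {
        String[] parts = gav.split(":");
        if (parts.length != 3) {
            throw new IllegalArgumentException("expected groupId:artifactId:version, got " + gav);
        }
        return new ModelCoordinate(parts[0], parts[1], parts[2], "jar");
    }

    // Standard Maven repository layout:
    // groupId (dots -> slashes) / artifactId / version / artifactId-version.extension
    String repositoryPath() {
        return groupId.replace('.', '/') + "/" + artifactId + "/" + version + "/"
                + artifactId + "-" + version + "." + extension;
    }

    public static void main(String[] args) {
        ModelCoordinate model = ModelCoordinate.parse("org.example.models:spanish-pos-model:1.2.3");
        System.out.println(model.repositoryPath());
        // org/example/models/spanish-pos-model/1.2.3/spanish-pos-model-1.2.3.jar
    }
}
```

With this scheme, an AE POM (or a uimaFIT pipeline build) would depend on the model artifact like any other library, and the model could then be loaded from the classpath at runtime; whether that fits a PEAR-based deployment would need checking.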
21 May 2013 15:49
21 May 2013 12:47
Ruta - Token Order
Hi, In Ruta 2.0.2-SNAPSHOT, a token with begin offset 0 and end offset 2 comes before a token with begin offset 0 and end offset 0. This token order is not what I expected. As a result, in my case SourceDocumentAnnotation was the second token in the token sequence and the rule didn't match. It took me some time to figure that out. It would be better if the end offset of SourceDocumentAnnotation were the length of the text. How is the token ordering defined? Cheers, Armin
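The observed order matches how UIMA's built-in annotation index sorts annotations: by begin offset ascending, then by end offset descending (longer spans first), then by type priority. Ruta's actual token ordering may involve more than this; the sketch below only models the first two keys with a plain `Comparator`, omits type priorities, and uses a hypothetical `Span` class instead of real CAS annotations.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Sketch of the annotation index ordering: begin ascending, then end descending.
// Type priorities are omitted; List.sort is stable, so remaining ties keep input order.
final class AnnotationOrderSketch {

    static final class Span {
        final String type;
        final int begin;
        final int end;

        Span(String type, int begin, int end) {
            this.type = type;
            this.begin = begin;
            this.end = end;
        }

        @Override
        public String toString() {
            return type + "[" + begin + "," + end + "]";
        }
    }

    static final Comparator<Span> ANNOTATION_INDEX_ORDER =
            Comparator.<Span>comparingInt(s -> s.begin)
                    .thenComparing(Comparator.<Span>comparingInt(s -> s.end).reversed());

    static List<Span> sorted(List<Span> spans) {
        List<Span> copy = new ArrayList<>(spans);
        copy.sort(ANNOTATION_INDEX_ORDER);
        return copy;
    }

    public static void main(String[] args) {
        List<Span> spans = Arrays.asList(
                new Span("SourceDocumentAnnotation", 0, 0),
                new Span("Token", 0, 2));
        System.out.println(sorted(spans));
        // [Token[0,2], SourceDocumentAnnotation[0,0]]
    }
}
```

Under this ordering, a zero-length annotation at offset 0 always sorts after any longer annotation starting at 0, which is why giving it an end offset equal to the document length would move it to the front.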
21 May 2013 09:18
Ruta 2.0.2-SNAPSHOT - Eclipse Plugin Installation
Hi! I checked out Ruta 2.0.2-SNAPSHOT with svn checkout https://svn.apache.org/repos/asf/uima/sandbox/ruta/trunk and built it successfully with mvn clean install. Now, how do I install the Eclipse plugins? Is there a local repository or update site for Eclipse? Or which files need to be copied manually? Thanks, Armin
21 May 2013 08:49
Concept Mapper in code
Hello everyone,
I am currently setting up ConceptMapper in code (not using XML files). Basically, I am defining the AnalysisEngineDescription and TypeSystemDescription in Java code. I create the following two parameters using the "ConfigurationParameter" class:
1. AttributeList
2. FeatureList
The problem is that both of these should be arrays, but ConfigurationParameter only provides the following types: String, Boolean, Integer, Float. When I pass these values as Strings, it fails with a java.lang.ClassCastException, because it expects an array rather than a String. How should I solve this issue?
Below is the AnalysisEngineDescription code.
//////////////////////////////////////////////////////////////////////////////////////////////////////////////////////
primitiveDesc = new AnalysisEngineDescription_impl();
primitiveDesc.setPrimitive(true);
primitiveDesc.getMetaData().setName("Concept Mapper Offset Tokenizer");
primitiveDesc.setAnnotatorImplementationName(
"org.apache.uima.conceptMapper.ConceptMapper");
ConfigurationParameter p1 = new ConfigurationParameter_impl();
p1.setName("AttributeList");
p1.setDescription("Attribute List");
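The archived post is cut off after the first parameter, but the exception is consistent with ConceptMapper reading AttributeList and FeatureList as `String[]`. In UIMA, array-ness is not a separate type: a `ConfigurationParameter` keeps its scalar type (e.g. `setType(ConfigurationParameter.TYPE_STRING)`) and is additionally flagged with `setMultiValued(true)`, and the value stored through `ConfigurationParameterSettings.setParameterValue(String, Object)` should then be a `String[]`. That this resolves the poster's case is an assumption. The UIMA-free sketch below only reproduces the failure mode; the map standing in for the parameter settings, the `readStringArray` helper and the attribute name `canonical` are all hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// UIMA-free sketch: a plain map stands in for the descriptor's parameter settings,
// and readStringArray mimics an annotator that casts a multi-valued parameter to String[].
final class MultiValuedParamSketch {

    // Hypothetical stand-in for reading a parameter during annotator initialization.
    static String[] readStringArray(Map<String, Object> settings, String name) {
        return (String[]) settings.get(name); // ClassCastException if the value is a String
    }

    public static void main(String[] args) {
        Map<String, Object> settings = new HashMap<>();

        // Single String value: what the annotator sees if the parameter is not multi-valued.
        settings.put("AttributeList", "canonical");
        try {
            readStringArray(settings, "AttributeList");
        } catch (ClassCastException e) {
            System.out.println("String value -> " + e.getClass().getSimpleName());
        }

        // String[] value: what a multi-valued String parameter holds.
        settings.put("AttributeList", new String[] {"canonical"});
        String[] attrs = readStringArray(settings, "AttributeList");
        System.out.println("String[] value -> " + attrs.length + " element(s)");
    }
}
```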
20 May 2013 22:56
Aggregate Delegates and Remote Services
I've created two primitive AEs and tested them locally on my machine. I then created an aggregate AE that calls each of the primitive AEs. The aggregate AE was created using the Component Descriptor Editor plugin for Eclipse, and things went smoothly until I got to adding the aggregate delegates. I clicked on "Remote", but was then prompted to select either the SOAP or the Vinci protocol. The two primitives have been deployed and are running on a test UIMA AS system (2.4.1). Which protocol should I be using, if either?
Since I wasn't sure which protocol to use, I made a copy of the AE descriptors for each of the primitives and added those using the "Add..." button. Then, in the deployment descriptor for the aggregate AE, I set the delegates to run remotely and gave it the address of the broker and the queue name. After creating the PEAR and deploying it, the aggregate AE deploys fine, but errors appear when we try to run it. The errors are:
*** ERROR: line-number: 13 deployment descriptor for analysisEngine: specifies async="true" but the analysis engine is a primitive
*** ERROR: line-number: 13 deployment descriptor for analysisEngine: specifies false for the async attribute, but contains a delegates element, which is not allowed in this case.
If async="true" is removed, we get the following errors:
*** ERROR: line-number: 16 The delegate in the deployment descriptor with key="MgrsRegExQueue" does not match any delegates in the referenced descriptor
*** ERROR: line-number: 25 The delegate in the deployment descriptor with key="MgrsValidatingAnnotatorQueue" does not match any delegates in the referenced descriptor
I've compared the deployment descriptor to the example one, and nothing jumps out at me as wrong. Is this a problem with the deployment descriptor or the AE descriptor? Thanks in advance.