Roberto Franchini | 3 Dec 16:06

Lucene cas consumer

Hi,
I'm going to write a Lucene CAS consumer. The purpose is to create a
Lucene document, or more than one, for each CAS.
Last year (2007) the Jena university lab (JULIE Lab? Is that right?)
delivered such a component, named LUCAS. Then it disappeared.
LUCAS seems to be a good piece of software.
The Technische Universität Darmstadt developed one too:
http://www.ukp.tu-darmstadt.de/projects/dkpro/. (I will write to
them).

Is anybody interested in sharing knowledge and/or code to build that component?
I think that Lucene and UIMA can be very good friends :)

Roberto

PS: I apologize for my bad English.

-- 
Roberto Franchini
http://www.celi.it
http://www.blogmeter.it
http://www.memesphere.it
Tel +39-011-6600814
jabber:ro.franchini@... skype:ro.franchini

Thilo Goetz | 4 Dec 10:07

Printing UIMA PDF documentation

There's a problem with printing some of the current UIMA
PDFs.  Some pages will not print.  Marshall has fixed this
and regenerated the docs on our website.  So if you're
having problems with printing the docs, get the PDFs from
the website.  This will be fixed in the next release.

--Thilo

Greg Holmberg | 4 Dec 19:12

Re: Lucene cas consumer

Roberto--

It does seem like there should be a close relationship between the two groups.

I don't know much about Lucene--can you educate me?  For example, have you given any thought to what to do with
UIMA annotations?  From what little I've read about Lucene, they seem to have a thing called a document
analyzer, but they don't mean the same thing we mean by analysis in the NLP community.  They appear to mean
something more like "tokenizer".  So I haven't yet found a place to put UIMA annotations, say for example,
named entities or parts of speech.  I'm wondering if Lucene needs a major feature enhancement before it's
truly useful with UIMA?

What are your thoughts on how to integrate the two?  What functionality is possible?

Greg Holmberg

 -------------- Original message ----------------------
From: "Roberto Franchini" <ro.franchini@...>
> Hi,
> I'm going to write a Lucene CAS consumer. The purpose is to create a
> Lucene document, or more than one, for each CAS.
> Last year (2007) the Jena university lab (JULIE Lab? Is that right?)
> delivered such a component, named LUCAS. Then it disappeared.
> LUCAS seems to be a good piece of software.
> The Technische Universität Darmstadt developed one too:
> http://www.ukp.tu-darmstadt.de/projects/dkpro/. (I will write to
> them).
> 
> Is anybody interested in sharing knowledge and/or code to build that component?
> I think that Lucene and UIMA can be very good friends :)
> 

Niels Ott | 4 Dec 19:36

Re: Lucene cas consumer

Hi all,

I'm using both Lucene and UIMA in one project.

Lucene is primarily an information retrieval API. It provides a
framework and default implementations for analyzing several languages.
Analyzing means tokenization, stop words, etc. Furthermore, it brings
the key functionality to build an inverted index and to search it.
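To make "inverted index" concrete, here is a minimal sketch in plain Java, with no Lucene dependency. The class, its methods and the tiny stop-word list are hypothetical illustrations, not Lucene's API. It maps each analyzed term to the IDs of the documents that contain it. The toy analysis step (lowercase, split on non-letters, drop stop words) stands in for what a Lucene analyzer does.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.SortedSet;
import java.util.TreeMap;
import java.util.TreeSet;

/** Toy inverted index illustrating the idea behind a Lucene index; not Lucene's API. */
public class ToyInvertedIndex {
    // Hypothetical stop-word list, standing in for what an analyzer might drop.
    private static final Set<String> STOP_WORDS = Set.of("a", "an", "and", "the", "of");

    // term -> sorted set of IDs of the documents that contain the term
    private final Map<String, SortedSet<Integer>> postings = new TreeMap<>();

    /** Toy "analysis": lowercase, split on runs of non-letters, drop stop words. */
    public static List<String> analyze(String text) {
        List<String> terms = new ArrayList<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}]+")) {
            if (!token.isEmpty() && !STOP_WORDS.contains(token)) {
                terms.add(token);
            }
        }
        return terms;
    }

    /** Records every analyzed term of the text under the given document ID. */
    public void addDocument(int docId, String text) {
        for (String term : analyze(text)) {
            postings.computeIfAbsent(term, t -> new TreeSet<>()).add(docId);
        }
    }

    /** Looks up a single term; returns an empty set if no document contains it. */
    public SortedSet<Integer> search(String term) {
        return postings.getOrDefault(term.toLowerCase(Locale.ROOT), new TreeSet<>());
    }

    public static void main(String[] args) {
        ToyInvertedIndex index = new ToyInvertedIndex();
        index.addDocument(1, "Lucene and UIMA");
        index.addDocument(2, "The UIMA CAS");
        System.out.println(index.search("uima")); // [1, 2]
        System.out.println(index.search("the"));  // [] because stop words are never indexed
    }
}
```

A query is a lookup of the term in the map rather than a scan of every document. That is the property that makes an inverted index fast to search.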

Lucene can be extended easily. E.g. one can implement an analyzer that
does lemmatization or that looks up synonyms in WordNet and adds them
to the index.

What Lucene cannot do, or at least not without a lot of hacking, is
aggregate analyses the way UIMA does with the CAS. Usually your knowledge
grows during a UIMA-based NLP pipeline: you add a token annotation,
a lemma annotation, a POS annotation and so on...  In Lucene, you have
the classical pipeline: the output replaces the input. (Yes, by
subclassing Lucene's "Token" class, one can work around the issue, but
it is not elegant at all.)
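The contrast above can be sketched in plain Java. Nothing here is UIMA or Lucene API: the `Annotation` record, the toy lemmatizer (which only strips one trailing "s") and both methods are hypothetical. `annotate` keeps the text unchanged and stacks a Lemma layer on top of the Token layer, CAS-style. `replaceChain` passes a token through stages that each overwrite the previous form, as in a filter chain.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.function.UnaryOperator;

/**
 * Contrast of the two styles (plain Java, not the UIMA or Lucene API):
 * stand-off annotations that accumulate over unchanged text, versus a
 * filter chain in which each stage replaces its input.
 */
public class AccumulateVsReplace {
    /** Stand-off annotation: a layer name and a value over a character span. */
    public record Annotation(String type, int begin, int end, String value) {}

    /** Hypothetical toy lemmatizer: strips one trailing "s". */
    public static String toyLemma(String word) {
        return word.length() > 1 && word.endsWith("s")
                ? word.substring(0, word.length() - 1)
                : word;
    }

    /** CAS-like style: each stage adds a layer; earlier layers remain available. */
    public static List<Annotation> annotate(String text) {
        List<Annotation> tokens = new ArrayList<>();
        int pos = 0;
        for (String word : text.split("\\s+")) {
            if (word.isEmpty()) continue;
            int begin = text.indexOf(word, pos);
            int end = begin + word.length();
            tokens.add(new Annotation("Token", begin, end, word));
            pos = end;
        }
        List<Annotation> all = new ArrayList<>(tokens);
        // Second stage reads the Token layer and adds a Lemma layer over the same spans.
        for (Annotation t : tokens) {
            String lemma = toyLemma(t.value().toLowerCase(Locale.ROOT));
            all.add(new Annotation("Lemma", t.begin(), t.end(), lemma));
        }
        return all;
    }

    /** Filter-chain style: each stage's output replaces its input. */
    public static String replaceChain(String token, List<UnaryOperator<String>> stages) {
        String current = token;
        for (UnaryOperator<String> stage : stages) {
            current = stage.apply(current); // the previous form is discarded here
        }
        return current;
    }

    public static void main(String[] args) {
        annotate("Two dogs bark").forEach(System.out::println);

        List<UnaryOperator<String>> stages =
                List.of(s -> s.toLowerCase(Locale.ROOT), AccumulateVsReplace::toyLemma);
        System.out.println(replaceChain("Dogs", stages)); // dog ("Dogs" itself is gone)
    }
}
```

In the accumulating version both "dogs" (Token) and "dog" (Lemma) remain addressable at offsets 4 to 8. The chain only hands on "dog".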

What makes Lucene + UIMA interesting for me is a simple fact: I can do
all the NLP I want and be as flexible as I need in UIMA. Then I can feed
the outcome (or rather: a small part of it) into a Lucene index.

In my particular case, I'm not using a CAS Consumer, but I can imagine
other people would appreciate one in their application scenarios.

To conclude: Lucene and UIMA aren't competitors, but in some cases 
having one feeding the other is what you want.

Dan McCreary | 4 Dec 20:32

Re: Lucene cas consumer

Hello,

I am somewhat new to UIMA so I apologize if I misunderstand some things.
But this is a very interesting question for me.

I see Lucene as a very widely adopted but *Java-only framework* of tools for
building and maintaining keyword *indexes* on many types of documents.
Lucene also has great support for Hadoop and MapReduce-type scalability.
Lucene is also designed to work with many front-end tools, such as the POI libraries,
to extract text from Microsoft Word, Excel, PowerPoint etc.

I see Apache UIMA as a general-purpose *analytic pipeline architecture* with
the strengths of a very advanced common in-memory processing model.

I think there is a huge win-win for both projects if we can make UIMA enrich
text documents with entities before they are indexed by Lucene, and also make
these tools much easier to install and use together.  You should not have
to be a Java developer just to install these tools and have them index and
search our file systems.

I have spent many hours trying to get UIMA to work, without success.  Perhaps
it has to do with trying to get it to work on 64-bit Vista....  :-O

- Dan

On Thu, Dec 4, 2008 at 12:12 PM, Greg Holmberg <holmberg2066 <at> comcast.net> wrote:

> Roberto--
>
> It does seem like there should be a close relationship between the two

Olivier Terrier | 5 Dec 09:44

RE: Lucene cas consumer

Hi all

We at Temis have also built a prototype integration of Lucene and UIMA as a proof of concept.
More precisely, we have written a Solr CAS consumer.
Solr (http://lucene.apache.org/solr/) is a Lucene subproject that provides a kind of indexing server
layer on top of Lucene.
The idea was to be able to index documents processed by a UIMA chain with both full text and
entities derived from UIMA annotations.
Moreover, Solr provides support for 'faceted search', which can be based on annotations.
Suppose you have a UIMA type system that defines annotations like Person, Company, Location, etc.
You can easily index these entities into a Lucene index using the Solr Java API.
In the prototype we also used a Solr contribution (not yet integrated into the trunk) named solr-ui,
available here:
https://issues.apache.org/jira/browse/SOLR-634
It provides a simple UI to search your indexed documents using a combination of full text and facets
(see the attached screenshot).
Of course our Solr consumer is, for now, a very basic piece of code: for example, it is tightly linked to our own
type system, but we would be more than happy to collaborate with the community on this subject if there is interest.
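The annotation-to-facet idea can be sketched in plain Java. This is not the Solr or UIMA API, and it is not Temis's consumer. The `Entity` record is a hypothetical, simplified stand-in for a UIMA annotation (type name plus covered text), and the sample documents and names are made up. Each document becomes a map of fields: a "text" field with the full text, plus one multi-valued field per entity type. Facet counts then record how many documents carry each value.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

/** Hypothetical sketch (plain Java, not the Solr or UIMA API) of annotation-based facets. */
public class AnnotationFacets {
    /** Simplified stand-in for a UIMA annotation: type name plus covered text. */
    public record Entity(String type, String coveredText) {}

    /** Builds a field map: "text" holds the full text, plus one field per entity type. */
    public static Map<String, List<String>> toFields(String text, List<Entity> entities) {
        Map<String, List<String>> fields = new LinkedHashMap<>();
        fields.put("text", new ArrayList<>(List.of(text)));
        for (Entity e : entities) {
            fields.computeIfAbsent(e.type(), k -> new ArrayList<>()).add(e.coveredText());
        }
        return fields;
    }

    /** For one facet field, counts how many documents contain each value. */
    public static Map<String, Integer> facetCounts(List<Map<String, List<String>>> docs, String field) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Map<String, List<String>> doc : docs) {
            // A value repeated within one document is counted once for that document.
            for (String value : new TreeSet<>(doc.getOrDefault(field, List.of()))) {
                counts.merge(value, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        var d1 = toFields("Acme opened an office in Paris.",
                List.of(new Entity("Company", "Acme"), new Entity("Location", "Paris")));
        var d2 = toFields("Initech and Acme met in Paris and Rome.",
                List.of(new Entity("Company", "Initech"), new Entity("Company", "Acme"),
                        new Entity("Location", "Paris"), new Entity("Location", "Rome")));
        System.out.println(facetCounts(List.of(d1, d2), "Company"));  // {Acme=2, Initech=1}
        System.out.println(facetCounts(List.of(d1, d2), "Location")); // {Paris=2, Rome=1}
    }
}
```

In a real setup, the consumer would send such fields to Solr, and Solr would compute the facet counts at query time. The sketch only shows why typed annotations map naturally onto facet fields.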

Regards

Olivier Terrier
Temis

> -----Original Message-----
> From: Niels Ott [mailto:nott@...] 
> Sent: Thursday, 4 December 2008 19:37
> To: uima-user@...
> Cc: Roberto Franchini
> Subject: Re: Lucene cas consumer

Ashutosh Sharma | 5 Dec 10:49

Override annotation


Hi All,

In the current analysis engine, I would like to override annotations made by a previous analysis engine,
based on their start offsets.
Suppose there are two AEs: AE1 and AE2. AE1 has annotated VAL1. When AE2 runs, it annotates VAL1 again.
So I need to override the annotation made by AE1 when AE2 annotates the same VAL1.

Is there any feature or option that lets us do this?
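One way to express the requested behaviour is sketched below in plain Java. The `Ann` record and `overrideByBegin` are hypothetical, and plain lists stand in for the CAS annotation index. When AE2 adds an annotation, any existing annotation that starts at the same offset is dropped first. Inside a real UIMA AE2, the rough equivalent would be to iterate the annotation index for AE1's type, collect the matches, and only then remove them from the indexes, because modifying an index while iterating over it is not safe. JCas annotation objects provide `removeFromIndexes()` for the removal step.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch: a later engine's annotation overrides an earlier one at the same start offset. */
public class OverrideByStartOffset {
    /** Simplified annotation: producing engine, span, and covered value. */
    public record Ann(String source, int begin, int end, String value) {}

    /** Returns existing annotations minus those sharing a start offset with an incoming one, plus the incoming ones. */
    public static List<Ann> overrideByBegin(List<Ann> existing, List<Ann> incoming) {
        Set<Integer> incomingBegins = new HashSet<>();
        for (Ann a : incoming) {
            incomingBegins.add(a.begin());
        }
        // Work on a copy, so the caller's list (standing in for the index) is never modified mid-iteration.
        List<Ann> result = new ArrayList<>(existing);
        result.removeIf(a -> incomingBegins.contains(a.begin()));
        result.addAll(incoming);
        result.sort(Comparator.comparingInt(Ann::begin));
        return result;
    }

    public static void main(String[] args) {
        List<Ann> fromAe1 = List.of(new Ann("AE1", 0, 4, "VAL1"), new Ann("AE1", 10, 14, "VAL2"));
        List<Ann> fromAe2 = List.of(new Ann("AE2", 0, 4, "VAL1"));
        // AE2's VAL1 replaces AE1's annotation at offset 0; AE1's VAL2 is kept.
        overrideByBegin(fromAe1, fromAe2).forEach(System.out::println);
    }
}
```

Matching on the start offset alone follows the question as asked. A stricter variant could also compare the end offset or the annotation type before removing anything.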

Thanks & Regards,
Ashu

Christof Mueller | 5 Dec 10:53

Re: Lucene cas consumer

Roberto Franchini wrote:
> Hi,
> I'm going to write a Lucene CAS consumer. The purpose is to create a
> Lucene document, or more than one, for each CAS.
> Last year (2007) the Jena university lab (JULIE Lab? Is that right?)
> delivered such a component, named LUCAS. Then it disappeared.
> LUCAS seems to be a good piece of software.
> The Technische Universität Darmstadt developed one too:
> http://www.ukp.tu-darmstadt.de/projects/dkpro/. (I will write to
> them).
>
> Is anybody interested in sharing knowledge and/or code to build that component?
> I think that Lucene and UIMA can be very good friends :)
>
> Roberto
>
> PS: I apologize for my bad English.
>
>   

Hi Roberto,

our group at Technische Universität Darmstadt has indeed developed a
consumer that outputs Lucene indexes. It is described in our paper for
the LREC workshop "UIMA for NLP" which can be found here:
http://www.ukp.tu-darmstadt.de/fileadmin/user_upload/Group_UKP/publikationen/2008/paper.pdf

We are very willing to share this and other UIMA components with the
community. Unfortunately we haven't been able to release the components
yet, due to a lack of time.

Jörn Kottmann | 5 Dec 11:40

Re: Lucene cas consumer

Christof Mueller wrote:
> our group at Technische Universität Darmstadt has indeed developed a
> consumer that outputs Lucene indexes. It is described in our paper for
> the LREC workshop "UIMA for NLP" which can be found here:
> http://www.ukp.tu-darmstadt.de/fileadmin/user_upload/Group_UKP/publikationen/2008/paper.pdf
>
> We are very willing to share this and other UIMA components with the
> community. Unfortunately we haven't been able to release the components
> yet, due to a lack of time.
>
> If you contact me directly I can send you the code of an example Lucene
> CAS consumer which we use in some student projects. This could be a good
> starting point for building your own consumer.
>   
I am also interested in a Lucene CAS consumer.
Maybe we can work together and set up a sandbox project?

Jörn

Christof Mueller | 5 Dec 16:30

Re: Lucene cas consumer

Jörn Kottmann wrote:
> I am also interested in a Lucene CAS consumer.
> Maybe we can work together and set up a sandbox project?
>
> Jörn

Hi Jörn,

we would be happy to contribute the code of the example Lucene CAS
consumer as the basis for the sandbox project.

Christof

-- 
Christof Müller
UKP Lab
Technische Universität Darmstadt
http://www.ukp.tu-darmstadt.de