Clandes Tino | 1 Aug 16:32

subscription

Hello,
I would like to subscribe to UIMA user list.
Thanks,
Milan

Tong Fin | 1 Aug 17:10

Re: subscription

I think you have to subscribe yourself.
Please follow the instructions at the following link:

" You subscribe and unsubscribe by sending an empty email message to the
list name suffixed with -subscribe or -unsubscribe. The buttons in the table
below will create the correct email for you in your mail system; just push
send...."
http://incubator.apache.org/uima/mail-lists.html
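
The naming rule quoted above can be illustrated with a tiny sketch: the command address is the list address with the suffix inserted before the @ sign. The list address used here is a made-up placeholder on example.org, and `commandAddress` is a hypothetical helper, not part of any mail library.

```java
final class ListAddresses {
    private ListAddresses() {}

    /**
     * Builds the address that a list command ("subscribe" or "unsubscribe")
     * is sent to, by suffixing the list name with "-" + command.
     */
    static String commandAddress(String listAddress, String command) {
        int at = listAddress.indexOf('@');
        if (at <= 0) {
            throw new IllegalArgumentException("not an email address: " + listAddress);
        }
        // local part + "-command" + "@domain"
        return listAddress.substring(0, at) + "-" + command + listAddress.substring(at);
    }

    public static void main(String[] args) {
        // Placeholder list address, for illustration only.
        System.out.println(commandAddress("some-list@example.org", "subscribe"));
    }
}
```

An empty message sent to the resulting address is what the quoted instructions describe.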

-- Tong

On Fri, Aug 1, 2008 at 10:32 AM, Clandes Tino
<clandestino_bgd@...> wrote:

> Hello,
> I would like to subscribe to UIMA user list.
> Thanks,
> Milan
>
Villemos, Gert | 4 Aug 15:33

Using UIMA for structured data sources

We have a number of data sources, some of which are fully structured,
others of which are informal (unstructured). We would like to create a
central search facility covering structured as well as unstructured
data. 
UIMA seems to fit the bill, but is focused on unstructured data.
Can/should I use it to also integrate structured data? 

If yes, what are the modules which I must develop for the framework?

If no, what tools should I use in combination with UIMA to integrate
unstructured data?

Thanks,
Gert.

Eddie Epstein | 4 Aug 18:55

Re: Parallelizing UIMA

Greg Holmberg <holmberg2066@...> wrote:

> I think this "extreme scaling" configuration could be important in
> "selling" UIMA-AS.  I don't see it described in the docs.  It might be
> useful to folks if it was, along with some example configuration files.  Due
> to my misconceptions, I had rejected UIMA-AS as having too much overhead,
> but I will revisit it now.  So showing folks how to deploy it efficiently
> might prevent others from rejecting it for the same reasons.

Thanks for the comment.  This scenario has been added to the example
scenarios section of http://incubator.apache.org/uima/doc-uimaas-what.html

There actually is an example configuration that deploys a service with
embedded collection reader and Cas consumer. See
$UIMA_HOME/examples/deploy/as/Deploy_MeetingFinder.xml and the associated
MeetingFindingAggreate.xml in the same directory.

Karin Verspoor <Karin.Verspoor@...> wrote:
> Thank you all ... wouldn't you know that you have already solved the problem!
> I missed the UIMA-AS announcement as I was on travel last week,
> but I will definitely look into it, as well as Hadoop.

Good luck Karin. Hoping to hear about your experience.

Regards,
Eddie
Greg Holmberg | 4 Aug 23:39

Re: Using UIMA for structured data sources

Gert--

I'm not sure I understand what you're trying to build.  Your description is a little vague.  Perhaps you could
provide some use-cases?

I recommend that you read the UIMA docs and then ask any questions you still have.

Be aware that UIMA is not a search engine.  If all you want to do is index some documents, then maybe all you need
is Apache Lucene.  For the structured side, maybe you need a data warehouse.  Or maybe you just want to index
some of the CLOBs and VARCHARs into a search engine.  It's hard to tell from your description.

Greg Holmberg

 -------------- Original message ----------------------
From: "Villemos, Gert" <gert.villemos@...>
> We have a number of data sources, some of which are fully structured,
> other which are informal (unstructured). We would like to create a
> central search facility covering structured as well as unstructured
> data. 
> UIMA seems to fit the bill, but is focused on unstructured data.
> Can/should I use it to also integrate structured data? 
>  
> If yes, what are the modules which I must develop for the framework?
>  
> If no, what tools should I use in combination with UIMA to integrate
> unstructured data?
>  
> Thanks,
> Gert.
> 

Villemos, Gert | 4 Aug 23:59

AW: Using UIMA for structured data sources

Thanks for your answer. Indeed I need to read the UIMA documentation better.

We are building a system that will support Business Intelligence applications based on a data warehouse, as
well as knowledge management features based on a knowledge base. We are looking at UIMA for the loading
into the knowledge base.

We have multiple data sources. Some are completely structured. Others are semi-structured (well defined
fields, but the main input is free text fields). And others again are completely unstructured (presentations,
concept papers, etc).

We will use the data warehouse for report generation, trending and data mining.

On the knowledge base we would like to perform simple keyword search, and indeed Lucene is a candidate (Solr
is a better candidate, as it supports substitution among other things), but we would also like to perform based
reasoning, as well as ontology based reasoning / derivation of knowledge. We are therefore looking at
a knowledge base containing an RDF data graph, not just a flat index.

As far as I have been able to gather from the internet, there has been some discussion on integrating Apache
UIMA and Lucene, but no integration has actually been made.

A better way of asking the question is therefore: for our knowledge base, what do we use to create the RDF data
graph? Should we:

1. Split this into two separate tool chains, one for structured data and one for unstructured data (based on UIMA)?
2. Use UIMA for structured as well as unstructured?

Gert.

Greg Holmberg | 5 Aug 00:47

Re: AW: Using UIMA for structured data sources

Gert--

Ah, well, I don't know much about RDF, but you might want to take a look at some of the projects IBM Research has
done using UIMA, named entity extraction, and OWL:

    http://researchweb.watson.ibm.com/UIMA/SUKI/index.html

Their Semantic Search engine is also interesting:

    http://domino.research.ibm.com/comm/research_projects.nsf/pages/uima.semanticSearch.html

There are a lot of pieces you'll need to acquire to make this work: crawlers, adapters, file format filters,
an entity and relationship extractor, UIMA-to-RDF converter, etc.  There are many choices both
commercial and open source for each of these pieces.

Except that last one, which I think is a pretty hard problem.  You'll probably also have to hire some
computational linguists for the natural languages you want to support, since reliably extracting facts
from human-generated text is extremely difficult (if not impossible).  I'd say that the system you
describe is probably at or even beyond what researchers are attempting today.  And I'm not aware of any
commercial software that actually tries to reason on facts extracted from natural language.

UIMA can help you process those CLOB and VARCHAR fields from your database, but probably isn't a good match
for processing INTEGER, DOUBLE, TIMESTAMP, etc.
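
One way to picture that split, as a rough sketch only: route database columns by their JDBC type, sending the free-text family to text analysis and loading everything else directly. `ColumnRouter` and `isFreeText` are hypothetical names; only the `java.sql.Types` constants come from the JDK, and the exact set of types treated as free text is an assumption.

```java
import java.sql.Types;

final class ColumnRouter {
    private ColumnRouter() {}

    /** Returns true for SQL types that typically hold free text (the CLOB/VARCHAR family). */
    static boolean isFreeText(int sqlType) {
        switch (sqlType) {
            case Types.CLOB:
            case Types.NCLOB:
            case Types.VARCHAR:
            case Types.NVARCHAR:
            case Types.LONGVARCHAR:
            case Types.LONGNVARCHAR:
                return true;
            default:
                // INTEGER, DOUBLE, TIMESTAMP and similar types are already structured values.
                return false;
        }
    }

    public static void main(String[] args) {
        int[] columnTypes = {Types.CLOB, Types.VARCHAR, Types.INTEGER, Types.TIMESTAMP};
        for (int type : columnTypes) {
            String route = isFreeText(type) ? "text analysis" : "load directly";
            System.out.println("JDBC type code " + type + ": " + route);
        }
    }
}
```

In a real loader the type codes would come from `ResultSetMetaData.getColumnType`, which returns these same constants.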

Greg Holmberg

 -------------- Original message ----------------------
From: "Villemos, Gert" <gert.villemos@...>
> Thanks for your answer. Indeed I need to read the UIMA documentation better.
>  

Villemos, Gert | 5 Aug 00:54

AW: AW: Using UIMA for structured data sources

Luckily we have included some pretty tough semantic / linguistic experts in the project.

Another question:
You mention that we need a UIMA-to-RDF converter. I had assumed that Apache UIMA stored the data graph in RDF
format. Since this is apparently not the case, which format does UIMA use?

Thanks,
Gert.

________________________________

From: Greg Holmberg [mailto:holmberg2066@...]
Sent: Tue 05.08.2008 00:47
To: uima-user@...
Cc: Villemos, Gert
Subject: Re: AW: Using UIMA for structured data sources

Gert--

Ah, well, I don't know much about RDF, but you might want to take a look at some of the projects IBM Research has
done using UIMA, named entity extraction, and OWL:

    http://researchweb.watson.ibm.com/UIMA/SUKI/index.html

Their Semantic Search engine is also interesting:

    http://domino.research.ibm.com/comm/research_projects.nsf/pages/uima.semanticSearch.html

There are a lot of pieces you'll need to acquire to make this work: crawlers, adapters, file format filters,
an entity and relationship extractor, UIMA-to-RDF converter, etc.  There are many choices both

Greg Holmberg | 5 Aug 02:12

Re: AW: AW: Using UIMA for structured data sources

Gert--

UIMA doesn't store anything at all.  It's just an API you call--document in, annotations out.  That is to say, Java
objects.  What you do with those returned objects is your business.  There's example code that can write the
annotations to an XML file (one XML file for each input document).  If you want to write the annotations to a
database, a search engine, an RDF store, etc. you'll have to write that code.  UIMA knows nothing about RDF
or OWL.
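
The "document in, annotations out" shape can be sketched in plain Java. Everything here is a stand-in and not the UIMA API: `Span` plays the role of an annotation object, `analyze` plays the role of the analysis step (it only finds four-digit numbers), and `toLines` is the caller-owned code that decides how the results are written out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class DocumentInAnnotationsOut {

    /** Stand-in for an annotation object: a typed span over the document text. */
    static final class Span {
        final String type;
        final int begin;
        final int end;

        Span(String type, int begin, int end) {
            this.type = type;
            this.begin = begin;
            this.end = end;
        }

        String coveredText(String document) {
            return document.substring(begin, end);
        }
    }

    private static final Pattern YEAR = Pattern.compile("\\b[0-9]{4}\\b");

    /** Stand-in for the analysis step: document in, annotations out. */
    static List<Span> analyze(String document) {
        List<Span> spans = new ArrayList<>();
        Matcher m = YEAR.matcher(document);
        while (m.find()) {
            spans.add(new Span("Year", m.start(), m.end()));
        }
        return spans;
    }

    /** Caller-owned persistence: one tab-separated line per annotation. */
    static String toLines(String document, List<Span> spans) {
        StringBuilder out = new StringBuilder();
        for (Span s : spans) {
            out.append(s.type).append('\t')
               .append(s.begin).append('\t')
               .append(s.end).append('\t')
               .append(s.coveredText(document)).append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String document = "UIMA-AS was announced in 2008.";
        System.out.print(toLines(document, analyze(document)));
    }
}
```

Swapping `toLines` for code that writes to a database, a search index or an RDF store changes nothing in the analysis step, which is the point being made above.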

Greg

 -------------- Original message ----------------------
From: "Villemos, Gert" <gert.villemos@...>
> Luckily we have included some pretty tough semantic / linguistic experts in the 
> project.
>  
> Another question; 
> You mention that we need a UIMA-to-RDF converter. I had assumed that Apache UIMA 
> stored the data graph in RDF format... as this is apparently not the case; which 
> format is UIMA using?
>  
> Thanks,
> Gert.
> 
> ________________________________
> 
> From: Greg Holmberg [mailto:holmberg2066@...]
> Sent: Tue 05.08.2008 00:47
> To: uima-user@...
> Cc: Villemos, Gert
> Subject: Re: AW: Using UIMA for structured data sources

jochen.leidner | 5 Aug 03:22

RE: AW: AW: Using UIMA for structured data sources

Gert,

As Greg pointed out, UIMA is just an API spec with Java (and soon C++)
implementation(s), but you can iterate over UIMA's "CAS" annotations and
then use another existing API, Jena, to create and persist RDF as XML or
to a DB via JDBC:
	http://jena.sourceforge.net/
This has a class Triple for RDF statements, an interface RDFXMLWriterI,
and so on.
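
A minimal sketch of that glue step, assuming the annotation values have already been read out of the CAS as plain strings. To stay free of dependencies it formats N-Triples lines by hand instead of calling Jena; a real converter would more likely build statements through Jena's own API. The class name, the example.org IRIs and the predicate are all invented for illustration.

```java
final class AnnotationTriples {
    private AnnotationTriples() {}

    /** Escapes backslash, double quote, newline and carriage return for an N-Triples literal. */
    static String escapeLiteral(String value) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            switch (c) {
                case '\\': sb.append("\\\\"); break;
                case '"':  sb.append("\\\""); break;
                case '\n': sb.append("\\n");  break;
                case '\r': sb.append("\\r");  break;
                default:   sb.append(c);
            }
        }
        return sb.toString();
    }

    /** Formats one statement with an IRI subject, an IRI predicate and a plain literal object. */
    static String triple(String subjectIri, String predicateIri, String literal) {
        return "<" + subjectIri + "> <" + predicateIri + "> \"" + escapeLiteral(literal) + "\" .";
    }

    public static void main(String[] args) {
        // Hypothetical IRIs: one subject per annotation, one predicate for its covered text.
        System.out.println(triple(
                "http://example.org/doc/1#ann0",
                "http://example.org/vocab#coveredText",
                "IBM Research"));
    }
}
```

Hand-built strings are only reasonable for a demonstration; an RDF library also takes care of IRI validation, typed literals and the other serializations.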
    Your requirement sounds like such a piece of code might be useful for
more people as well, so you might want to consider contributing your
solution to the UIMA codebase. The community would benefit by not
having to rewrite that glue code, and you would get free extensions and
bug patches from other people who choose to use it.

	Best regards,
		Jochen

-----Original Message-----
From: Greg Holmberg [mailto:holmberg2066@...] 
Sent: Monday, August 04, 2008 7:13 PM
To: uima-user@...
Cc: Villemos, Gert
Subject: Re: AW: AW: Using UIMA for structured data sources

Gert--

UIMA doesn't store anything at all.  It's just an API you call--document in,
annotations out.  That is to say, Java objects.  What you do with those