Encolpe Degoute | 1 Jul 03:15 2011

ESA Summer of Code in Space 2011

Hello,

Does anybody know about this project: http://sophia.estec.esa.int/socis2011/

It seems to be similar to GSoC, and the deadline for mentoring
organizations is July 15th.
One downside: there are country restrictions for both students and
organizations.

Regards,

-- 
Encolpe DEGOUTE
http://encolpe.degoute.free.fr/
Free software, ice hockey and other cerebral activities

------------------------------------------------------------------------------
All of the data generated in your IT infrastructure is seriously valuable.
Why? It contains a definitive record of application performance, security 
threats, fraudulent activity, and more. Splunk takes this data and makes 
sense of it. IT sense. And common sense.
http://p.sf.net/sfu/splunk-d2d-c2
Plone Tests Summarizer | 1 Jul 13:57 2011

Plone Tests: 6 OK

Summary of messages to the testbot list.
Period Thu Jun 30 12:00:00 2011 UTC to Fri Jul  1 12:00:00 2011 UTC.
There were 6 messages: 1 from ATContentTypes Tests, 1 from Archetypes Tests, 1 from Plone Libraries Tests,
1 from Plone Products Tests, 2 from Plone Tests.

Tests passed OK
---------------

Subject: OK : Plone-3.3 Zope-2.10 Python-2.4.6
From: Plone Products Tests
Date: Fri Jul  1 06:34:55 UTC 2011
URL: https://lists.plone.org/pipermail/plone-testbot/2011-July/016850.html

Subject: OK : Plone-3.3 Zope-2.10 Python-2.4.6
From: Plone Libraries Tests
Date: Fri Jul  1 06:39:55 UTC 2011
URL: https://lists.plone.org/pipermail/plone-testbot/2011-July/016851.html

Subject: OK : AT-1.5 Plone-3.3 Zope-2.10 Python-2.4.6
From: Archetypes Tests
Date: Fri Jul  1 06:46:12 UTC 2011
URL: https://lists.plone.org/pipermail/plone-testbot/2011-July/016852.html

Subject: OK : ATCT-1.3 Plone-3.3 Zope-2.10 Python-2.4.6
From: ATContentTypes Tests
Date: Fri Jul  1 06:52:39 UTC 2011
URL: https://lists.plone.org/pipermail/plone-testbot/2011-July/016853.html

Subject: OK (99 packages) : Plone-4.0 Zope-2.12 Python-2.6.6
From: Plone Tests

Matt Hamilton | 1 Jul 18:21 2011

Re: Plone now sexy: plone.app.cmsui testing and help needed!

Martin Aspeli <optilude <at> ...> writes:

> As many of you know, there was a UI Sprint in Bristol, over the previous
> weekend, fantastically hosted by Matt Hamilton and NetSight.

A write-up of the sprint is now up here:

http://www.netsight.co.uk/blog/plone-ui-sprint-2011-writeup

There is a screencast in there showing the main features worked on at 
the sprint and how the package currently stands... an amazing amount 
was done in such a short time!

-Matt

Alan Runyan | 1 Jul 18:42 2011

Re: File improvements?

> What formats do we want to support?
>
> I think we should aim to support any halfway modern versions of
> Microsoft Office incl. the old binary formats, OpenOffice, some Apple
> Pages/Numbers, PDF, ... and some more.

Windows has built-in functionality for indexing.  It indexes all of the
formats you mention.  Here is the Zope component:

http://dist.enfoldsystems.com/catalog/filterpack

This could ship *today* with the Windows installer.  It uses the
"modern" (been around for 8+ years) IFilter functionality built into
Windows.  It has been open source since we opened up all
Enfold Server components 2 years ago.

Running SOLR/pyUNO is only really required for more advanced
features (preview, transformations, etc.).  For indexing, Windows users
should use the native functionality of the OS.

Some notes:
   - When you install the latest Adobe Reader, it comes with an IFilter.
   You need to reinstall FilterPack, but then you can index PDF, etc.
   - When you install the SharePoint IFilter extensions you get all
   MS Office extensions.
   - There are 3rd party IFilter companies that index nearly everything.

Regardless of IFilter/SOLR/PyUNO/poppler -- it's always best to do
this stuff out-of-process.  IFilter is C++ and a bad extension can take

Alan Runyan | 1 Jul 18:49 2011

Re: File improvements?

>> How do other CMS's handle pdf support? What about other office documents?
>
> They use commercially licensed components, generally, either in Java or
> .NET/COM.

IFilter is a part of the OS.  It is how Windows Search works.

If we want to think big, I believe we should scrap all of this insanity
of installing software everywhere.  Let a company provide a web API.
Give them prominent exposure.  Let's have a web API handle the
tokenizing, preview, etc.  And Plone can ship with that.  Installing
software at customer locations sucks.

We can continue to use the existing catalogs.  Just go out of process to
call into the web API.  Maybe refactor the transformation subsystem and
the indexing subsystem.  Both need some love to work asynchronously.

The modern way to think about this is exposing a web API.  This could be
the "default" transformation.  People enable the "web API" and it will
send information to the vendor's web API.  The vendor maintains the
service.  They could even make money off it.

Enfold would possibly be interested in something like this.  But I'm sure
there are many other vendors who are interested and could do it.
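
To make the web API idea a bit more concrete, here is a minimal sketch of
what the client side might look like in Python.  Everything specific in it
is an assumption made up for illustration: the endpoint URL, the
X-Filename header, and the JSON reply with a "text" field.  No such
service is named in this thread.

```python
import json
import urllib.request

# Hypothetical endpoint; no such service exists in this discussion.
EXTRACT_URL = "https://extract.example.com/v1/text"


def build_request(data, filename, url=EXTRACT_URL):
    """Build a POST request carrying the raw file bytes."""
    return urllib.request.Request(
        url,
        data=data,
        method="POST",
        headers={
            "Content-Type": "application/octet-stream",
            # Assumed convention: the service reads the name from this header.
            "X-Filename": filename,
        },
    )


def parse_response(body):
    """Pull the extracted text out of the assumed JSON reply."""
    return json.loads(body.decode("utf-8")).get("text", "")


def extract_text(data, filename, timeout=30, opener=urllib.request.urlopen):
    """Send the file to the service and return the extracted text.

    Intended to be called from a worker, outside the request/transaction.
    """
    with opener(build_request(data, filename), timeout=timeout) as resp:
        return parse_response(resp.read())
```

The request building and response parsing are kept separate from the
network call so they can be tested without a live service, and the
timeout keeps a slow vendor from blocking the caller indefinitely.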

alan


Alan Runyan | 1 Jul 19:56 2011

Re: File improvements?

> software everywhere.  Let a company provide a WEB API.  Give them prominent
> exposure.  Let's have a web-api handle the tokenizing, preview, etc.  And Plone
> can ship with that.  Installing software sucks at customer location.

I was rushing while writing the other 2 emails.

IIRC it is convoluted to integrate an indexing/preview/transformation
system into Plone without quite a bit of implementation knowledge of a
lot of the system.  The public API for this functionality exists but it
is not isolated.  Also, the transformation system does not support async
operations.

When working with large sets of Files/content Enfold has found:

  - Transactions are not always useful, especially for bulk operations.
    Batching works but it's kinda painful.
  - You want to split navigation indexing (synchronous) from
    full-text/transformation indexing (asynchronous).
  - zc.async works but it's a heavy hammer.  A simpler queue
    implementation could work just as well (celery?).
  - Transformation, indexing, and preview are separate concerns, often
    handled by different processes.  Setting this up and having it run
    in a customer environment is a maintenance burden.
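
The synchronous/asynchronous split in the list above can be sketched
with nothing but the Python standard library.  This is only an
illustration, not zc.async, celery, or anything Plone ships: the two
dicts stand in for catalogs, and extract_text is a placeholder for
whatever slow extractor is actually used.

```python
import queue
import threading

navigation_index = {}   # updated synchronously, inside the request
fulltext_index = {}     # updated later by the background worker
pending = queue.Queue()


def extract_text(data):
    """Stand-in for a slow extractor (IFilter, SOLR, a web API, ...)."""
    return data.decode("utf-8", errors="replace")


def index_object(path, title, data):
    """Index navigation data now; defer the expensive full-text work."""
    navigation_index[path] = title
    pending.put((path, data))


def worker():
    """Process queued jobs until a None sentinel arrives."""
    while True:
        job = pending.get()
        if job is None:
            pending.task_done()
            break
        path, data = job
        try:
            fulltext_index[path] = extract_text(data)
        finally:
            pending.task_done()


thread = threading.Thread(target=worker, daemon=True)
thread.start()
index_object("/docs/report", "Report", b"quarterly numbers")
pending.join()          # wait for the background indexing to finish
pending.put(None)       # tell the worker to stop
thread.join()
```

An in-memory queue like this loses its jobs if the process dies, which
is one reason a persistent queue or a separate worker process would be
the more realistic choice.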

What do people think is off limits to the File discussion?  What is
taboo?  What changes should we not consider?  For instance, we can
say, Windows users can index binary documents in process.  Linux users
don't get this functionality.  It requires them to use 3rd party
add-ons or install SOLR.  That seems

Martin Aspeli | 1 Jul 22:38 2011

Re: File improvements?


On 1 Jul 2011, at 18:56, Alan Runyan <runyaga@...> wrote:

>> software everywhere.  Let a company provide a WEB API.  Give them prominent
>> exposure.  Let's have a web-api handle the tokenizing, preview, etc.  And Plone
>> can ship with that.  Installing software sucks at customer location.
> 
> I was rushing while writing the other 2 emails.
> 
> IIRC it is convoluted to integrate an indexing/preview/transformation
> system into Plone without quite a bit of implementation knowledge of a
> lot of the system.  The public API for this functionality exists but it
> is not isolated.  Also, the transformation system does not support async
> operations.
>
> When working with large sets of Files/content Enfold has found:
>
>  - Transactions are not always useful, especially for bulk operations.
>    Batching works but it's kinda painful.
>  - You want to split navigation indexing (synchronous) from
>    full-text/transformation indexing (asynchronous).
>  - zc.async works but it's a heavy hammer.  A simpler queue
>    implementation could work just as well (celery?).
>  - Transformation, indexing, and preview are separate concerns, often
>    handled by different processes.  Setting this up and having it run
>    in a customer environment is a maintenance burden.

> What do people think is off limits to the File discussion?  What is
> taboo?  What changes should we not consider?  For instance, we can

Jon Stahl | 1 Jul 23:04 2011

Re: File improvements?

Just some random Googling, but I wonder if we could use:

catdoc (& xls & ppt): http://vitus.wagner.pp.ru/software/catdoc/
python-docx: https://github.com/mikemaccana/python-docx
slate (PDF): http://pypi.python.org/pypi/slate

As open-source "batteries included" tokenizers?
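
For a command-line tool such as catdoc, one way to use it from Python is
to run it as a separate process with a timeout, so that a bad document
cannot hang or crash the caller.  This is a rough sketch under two
assumptions: the tool takes the file path as its last argument and
writes the extracted text to stdout, and that output is UTF-8 (the
actual charset depends on how the tool is configured).

```python
import subprocess


def extract_text(path, command=("catdoc",), timeout=30):
    """Run an external extractor on `path` and return its stdout as text.

    Returns None if the tool is missing, exits with an error, or runs
    longer than `timeout` seconds.
    """
    try:
        result = subprocess.run(
            list(command) + [path],
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            timeout=timeout,
        )
    except (OSError, subprocess.TimeoutExpired):
        return None
    if result.returncode != 0:
        return None
    # Assumption: the tool emits UTF-8; undecodable bytes are replaced.
    return result.stdout.decode("utf-8", errors="replace")
```

With catdoc installed, extract_text("report.doc") would run
"catdoc report.doc"; passing a different command tuple swaps in another
extractor without changing the caller.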

:jon

Alan Runyan | 1 Jul 23:27 2011

Re: File improvements?

On Fri, Jul 1, 2011 at 4:04 PM, Jon Stahl <jonstahl@...> wrote:
> Just some random Googling, but I wonder if we could use:
>
> catdoc (& xls & ppt): http://vitus.wagner.pp.ru/software/catdoc/
> python-docx: https://github.com/mikemaccana/python-docx
> slate (PDF): http://pypi.python.org/pypi/slate
>
> As open-source "batteries included" tokenizers?

I would shoot for 1 tool that handles all text extraction per platform.
I believe you won't be able (in Python) to have a better experience
than IFilter integration for extraction on Windows.

Having multiple extractors, each for a different format, is a
maintenance/support problem.

The best:
  - Cross platform web service

Good:
  - SOLR or some other service you must maintain on-premise
  - IFilter on Windows (better than SOLR)

Worst:
  - Each format gets its own extractor

Each of the above options requires infrastructure in Plone to
operate seamlessly.  The big question, in my mind, is how
do we handle the synchronous vs. asynchronous updating.
Should the catalog be split into at least 2: navigation/structure,

Alan Runyan | 1 Jul 23:31 2011

Re: File improvements?

>> I have brought up the idea of having vendors owning/maintaining
>> certain functionality for Plone.  It seems having vendors tied to
>> specific OOTB Plone functionality is an unpopular idea.  If it is not,
>> maybe a vendor would step up for hosting a web api for tokenizing.
>
> I think it should be optional, it's interesting to explore.

So, how do we explore it?  Maybe we could get feedback by asking
whether customers would feel comfortable sending their documents to
a web service for transformation/text extraction.  If people ask their
customers, we may get some results indicating they are comfortable
with the service, given it's private and no information is retained.

That is one way.  I have brought this up with several customers and
they are hesitant.  Although they would be less hesitant if there were
rich features such as web-based preview of content.  Features compelling
enough would possibly change their minds.

I still think out-of-process calls to a web service should not be done
synchronously/transactionally.

alan
