Charlie Hull | 2 May 11:10

Xapian's 9th Birthday

Hi all,

Last year we had a number of small real-life social gatherings to 
celebrate Xapian's birthday - let's try and do this again! I hope with 
some more notice we can get more people together this time.

We can use this page to arrange things:
http://trac.xapian.org/wiki/MeetingsAndGatherings

I propose a date either in the first or second week of September.

Cheers

Charlie
tindal | 5 May 12:44

locate and omega: how to index file names?

hello,

I'm indexing a filesystem using omindex, and users can query the 
database via omega: everything works fine

now I'd like to add an option like "search files by name" and I'm 
wondering how to do this

can omega search files by name directly? how should I build the query?

could I use scriptindex to index, e.g., the locate database? how?

should I index file names directly?
can omindex do this or should I use scriptindex? how?

thanks in advance
tindal
Bill Hutten | 6 May 01:57

Dreaded "Premature end of script headers: omega"...


Hi all:

I've (almost) successfully set up Xapian and Omega on a Linux machine -  
the index has been created successfully, and using omega from the  
command line works perfectly...

However - it's not working as a CGI - I consistently get "Premature  
end of script headers: omega" in the Apache error log.

I've placed the "omega" executable in the cgi-bin directory, along  
with the "omega.conf" file.  (Is the .conf file required to be  
"beside" the omega executable)?  I've also checked permissions - my  
cgi-bin directory is owned by "southshore.psaserv", as is the omega  
executable and the omega.conf.

I'm obviously missing something... anyone care to point it out to me?   
I've checked the mailing list archives and the only suggestion was to  
make sure that the permissions on the omega executable matched the  
permissions on the cgi-bin directory - but they seem to match fine...

Help appreciated.

- bill
---
bill <at> hutten.org
Olly Betts | 6 May 11:45

Re: Dreaded "Premature end of script headers: omega"...

On Mon, May 05, 2008 at 08:57:42PM -0300, Bill Hutten wrote:
> I've (almost) successfully set up Xapian and Omega on a Linux machine -  
> the index has been created successfully, and using omega from the  
> command line works perfectly...

OK, that's good so far.

> However - it's not working as a CGI - I consistently get "Premature  
> end of script headers: omega" in the Apache error log.

In case you aren't aware, this generally means "script died before
writing any output" (strictly speaking, before writing a blank line to
end the headers, but it's rare to die halfway through the headers for
a CGI which works elsewhere).
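
To illustrate what Apache is waiting for, here is a minimal sketch of a CGI
response written as a shell script. It is not part of Omega, and the function
name is made up for the example. Dropping something like it into the same
cgi-bin directory is one way to check whether Apache can run anything at all
from there: if this also produces "Premature end of script headers", the
problem probably lies with the server setup rather than with omega.

```shell
#!/bin/sh
# Minimal CGI response: header line(s), then an empty line that ends
# the headers, then the body. Apache reports "Premature end of script
# headers" when the program exits before that empty line is written.
cgi_hello() {
    printf 'Content-Type: text/plain\n\n'
    printf 'CGI execution works\n'
}

cgi_hello
```

The script has to be executable by the user the web server runs CGI programs
as, just like the omega binary.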

> I've placed the "omega" executable in the cgi-bin directory, along  
> with the "omega.conf" file.  (Is the .conf file required to be  
> "beside" the omega executable)?

It's not *required* to be there, but that is one of the places it can
be.  For more details, see the "omega configuration" section here:

http://xapian.org/docs/omega/overview.html

> I've also checked permissions - my  
> cgi-bin directory is owned by "southshore.psaserv", as is the omega  
> executable and the omega.conf.
> 
> I'm obviously missing something... anyone care to point it out to me?   
> I've checked the mailing list archives and the only suggestion was to  

Olly Betts | 6 May 11:53

Re: locate and omega: how to index file names?

On Mon, May 05, 2008 at 12:44:18PM +0200, tindal wrote:
> I'm indexing a filesystem using omindex, and users can query the 
> database via omega: everything works fine
> 
> now I'd like to add an option like "search files by name" and I'm 
> wondering how to do this
> 
> can omega search files by name directly? how should I build the query?

Not if you index with omindex, since it doesn't index the full path of
files in any way.

> could I use scriptindex to index, e.g., the locate database? how?

If you're able to dump the locate database's contents, just write a
script in your favourite scripting language to convert that to
scriptindex's input format.
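
As a rough sketch of such a conversion, assuming the dump is simply one
absolute path per line (for example the output of something like `locate /`),
the function below prints one scriptindex record per path. The field names
`url` and `name` and the function name are only illustrative; they have to
match whatever your index script expects.

```shell
#!/bin/sh
# Read one path per line on stdin and print one scriptindex record per
# path: "field=value" lines, with an empty line ending each record.
# Paths containing newlines are not handled by this sketch.
paths_to_records() {
    while IFS= read -r p; do
        # ${p##*/} strips everything up to the last slash (the basename)
        printf 'url=%s\nname=%s\n\n' "$p" "${p##*/}"
    done
}

# Example with two literal paths standing in for a real dump
printf '%s\n' /home/a/report.txt /srv/data/notes.pdf | paths_to_records
```

The output can then be piped into scriptindex along with the database path
and an index script.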

> should I index file names directly?
> can omindex do this or should I use scriptindex? how?

Directly?  You can certainly index the filenames as you index the files
but you'd have to modify omindex to do this, or recurse the directory
tree dumping it into scriptindex's input format.  Or write your own
indexer from scratch.

Cheers,
    Olly
Francis Irving | 6 May 12:25

Re: acts_as_xapian, pre-release (Ruby on Rails)

I'm now using this on the live version of our Freedom of Information
website.

http://www.whatdotheyknow.com

If you fancy using acts_as_xapian, it's quite a bit more mature now.

Is anybody a Rails / Gem guru, or does anybody know one who fancies
working out how to package it up properly for the Rails world to enjoy?

Francis

On Fri, Apr 25, 2008 at 11:51:00AM +0100, Francis Irving wrote:
> Hi all,
> 
> I've been using Ruby on Rails, and finally got fed up with Solr/Lucene. So I've
> made acts_as_xapian. An early version is available here:
> 
> https://secure.mysociety.org/cvstrac/dir?d=mysociety/foi/vendor/plugins/acts_as_xapian
> 
> It works, but isn't deployed on a live site yet (will be on our UK Freedom of
> Information request filing/archiving site www.whatdotheyknow.com soon)
> 
> I've put the parts of the documentation which compare it to acts_as_solr at
> the bottom of this email.
> 
> Any suggestions as to features it should have that would be easy to add? It's
> got sort, date range, collapse, spelling, offline indexing, and integration
> with Rails models. Anything else big/obvious that most people will need?
> Or anything easy to add and genius looking (like spelling was!)?

tindal | 6 May 12:31

Re: locate and omega: how to index file names?

ok, this is my (still partial) solution:

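# one scriptindex record per file; the %T* fields are the modification time
find /dir/* -type f \
  -printf 'url=%p\npath=%h\nname=%f\nsize=%s\nmodtime=%TY-%Tm-%Td\n\n' |
awk '{
  if ($1 ~ /^path=/) gsub(/\//, "\n=")         # one path component per line
  if ($1 ~ /^name=/) sub(/\./, "\nformat=")    # split the name at the first dot
  print
}' |
scriptindex /database/dir/filelist filelist2omega.script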

in which filelist2omega.script contains:

url : index field=id field=url
name : weight=3 indexnopos hash field=name
path : indexnopos field=path
format : index field=format
size : index field=size
modtime : index field=modtime

and that's the record format for scriptindex:

url=/full/url/of/the/file.txt
path=
=full
=url
=of
=the
name=file
format=txt
size=436110
modtime=2008-05-06


James Aylett | 6 May 12:55

Re: locate and omega: how to index file names?

On Tue, May 06, 2008 at 12:31:13PM +0200, tindal wrote:

> url=/full/url/of/the/file.txt
> path=
> =full
> =url
> =of
> =the
> name=file
> format=txt
> size=436110
> modtime=2008-05-06

modtime should be a Unix timestamp (number of seconds since midnight,
1st January 1970).
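
For illustration, one way to turn a yyyy-mm-dd date into such a timestamp is
sketched below. It assumes GNU date (for the `-d` option), and the function
name is made up for the example. UTC is forced so that the result does not
depend on the local time zone.

```shell
#!/bin/sh
# Convert a yyyy-mm-dd date to seconds since the Unix epoch.
# Assumes GNU date; midnight UTC of the given day is used.
to_epoch() {
    TZ=UTC date -d "$1" +%s
}

to_epoch 2008-05-06    # prints 1210032000
```

With GNU find you may be able to skip the conversion entirely by printing the
timestamp directly, e.g. with `%T@` in `-printf`, though that form includes a
fractional part which would need to be stripped.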

> omega reads the size wrong, as it says, in this example, "436 bytes": why?

That's not obvious to me. If you pull out the document data for that
document, what does it look like?

> for the search I'd like to be able to choose between the "default" 
> database (made with omindex) and my "filelist" database, but I end up 
> searching both databases
> 
> the relevant code in the template is:
> <INPUT TYPE=radio NAME="DB" VALUE="default" 
> $if{$eq{$dbname,default},CHECKED}>Search the contents<br>
> <INPUT TYPE=radio NAME="DB" VALUE="filelist" 
> $if{$eq{$dbname,filelist},CHECKED}>Search the names<br>

tindal | 6 May 14:35

Re: locate and omega: how to index file names?

ok, problems solved :)

>> modtime=2008-05-06
> 
> modtime should be a Unix timestamp (number of seconds since midnight,
> 1st January 1970).

I hadn't noticed the specification of the "date" action; it claims that 
"yyyymmdd" format should work too, but it doesn't (at least for me)

in unix format it works, thanks

> 
>> omega reads the size wrong, as it says, in this example, "436 bytes": why?
> 
> That's not obvious to me. If you pull out the document data for that
> document, what does it look like?
> 

it turns out that I hadn't purged the database, and leftovers from a past 
experiment gave me the unexpected result; now it works as expected

>> for the search I'd like to be able to choose between the "default" 
>> database (made with omindex) and my "filelist" database, but I end up 
>> searching both databases
>> 
>> the relevant code in the template is:
>> <INPUT TYPE=radio NAME="DB" VALUE="default" 
>> $if{$eq{$dbname,default},CHECKED}>Search the contents<br>
>> <INPUT TYPE=radio NAME="DB" VALUE="filelist" 

James Aylett | 6 May 14:53

Re: locate and omega: how to index file names?

On Tue, May 06, 2008 at 02:35:32PM +0200, tindal wrote:

> > modtime should be a Unix timestamp (number of seconds since midnight,
> > 1st January 1970).
> 
> I didn't notice the specification of the "date" action, but it claims it 
> should work with "yyyymmdd" format, too, but it doesn't (at least for me)
> in unix format it works, thanks

The display uses the omegascript $date{} command, which only takes
unix times.

J

-- 
/--------------------------------------------------------------------------\
  James Aylett                                                  xapian.org
  james <at> tartarus.org                               uncertaintydivision.org
