Steve Zatz | 9 Feb 13:53

Any update on Python 3.x compatible bindings for Xapian?

I see this trac ticket:  http://trac.xapian.org/ticket/346 which looks like
it was last updated 6 months ago.  Is there any information available on the
time frame?  Thanks,

Steve
Henry C. | 5 Feb 10:35

trac error adding comment to #394

Greets,

Just tried to add a comment to trac ticket 394.  Resulted in error:

Trac detected an internal error:
AttributeError: 'NoneType' object has no attribute 'split'

----

My comment, for posterity, was:

You may want to seriously consider pushing this patch into trunk since the
performance benefit is, well, significant.

My phrase test:  from 15s+ down to 1s

----

regards
Henry
Henry C. | 2 Feb 13:49

Optimal usage of xapian-compact for merging

Greets,

I've been wondering, what's the sane/optimal use of xapian-compact when
merging many indexes with a view to maximum merging performance?

The obvious:
- only use -F on the final db.
- use -m since I'm merging more than 3 dbs.

Best strategy?
a)  loop:  merge batches (of say 50, where the individual db's are small)
into a temp index, then merge the (larger) temp into the final product...
end-loop

b)  loop:  merge batches (of say 50, where the individual db's are small)
into many temp indexes... end-loop
Then merge those (larger) temps into the final product.
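For illustration, strategy (b) could be planned with a small Python helper like the one below. This is only a sketch: the function name, the temporary database names and the batch size are made up for the example, and it only builds the xapian-compact command lines (using the -m and -F options mentioned above) without running them or saying anything about which strategy is actually faster.

```python
from typing import List


def plan_two_stage_merge(sources: List[str], final_db: str,
                         batch_size: int = 50,
                         tmp_prefix: str = "tmp-merge-") -> List[List[str]]:
    """Build xapian-compact command lines for strategy (b).

    Hypothetical helper: names and defaults are illustrative only.
    """
    if not sources:
        raise ValueError("need at least one source database")

    commands = []
    temps = []
    # Stage 1: merge each batch of small databases into its own temp index.
    for n, start in enumerate(range(0, len(sources), batch_size)):
        batch = sources[start:start + batch_size]
        tmp = f"{tmp_prefix}{n}"
        cmd = ["xapian-compact"]
        if len(batch) > 3:
            cmd.append("-m")  # multipass only when merging more than 3 dbs
        commands.append(cmd + batch + [tmp])
        temps.append(tmp)

    # Stage 2: merge the (larger) temps into the final product; -F only here.
    final = ["xapian-compact", "-F"]
    if len(temps) > 3:
        final.append("-m")
    commands.append(final + temps + [final_db])
    return commands


if __name__ == "__main__":
    for cmd in plan_two_stage_merge([f"db{i}" for i in range(120)], "final"):
        print(" ".join(cmd[:3]), "...", cmd[-1])
```

Each returned list could be passed to subprocess.run() as is; keeping the planning separate from the execution makes it easy to time both strategies on the same input.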

Finally, presumably it's best to use the same blocksize (-b) as the
underlying filesystem?  I see the default is 8K, but the default blocksize
on (eg) ext3 is 4k...  or am I way off here?

Thanks
Henry
Eugene! | 2 Feb 08:22

How to use a custom stemmer from Python bindings?

Hi,

I'm using Xapian bindings for Python in my project. How could I use a
custom stemmer instead of the included one (Snowball)? The one I'm
looking at right now is Hunspell (http://hunspell.sourceforge.net/)
which has Python bindings (http://code.google.com/p/pyhunspell/).

Thanks in advance,

Eugene
Tom | 1 Feb 10:41

Xapian core dumping on Solaris

Hi everyone,

I'm having a problem with xapian (the matchspy branch, with Python
bindings) on Solaris 10 / SPARC. Users can run queries for a few hours
with no problems, then it returns "inflate failed (invalid code
lengths set)"  to Python and dumps core. gdb reports:

#0  0xfe6dc910 in inflate_table () from /usr/lib/libz.so.1
#1  0xfe6d9d6c in inflate () from /usr/lib/libz.so.1
#2  0xfe2ebe20 in FlintTable::read_tag (this=0x75ed70, C_=0x75ede0,
    tag=0xfdcf8568, keep_compressed=false)
    at backends/flint/flint_table.cc:1254
#3  0xfe2eff70 in FlintTable::get_exact_entry (this=0x75ed70, key=@0xfdcf8578,
    tag=@0xfdcf8568) at backends/flint/flint_table.cc:1190
#4  0xfe2ddc04 in FlintSpellingTable::open_termlist (this=0x75ed70,
    word=@0xfdcf8998) at backends/flint/flint_spelling.h:46
#5  0xfe20dcb0 in Xapian::Database::get_spelling_suggestion (this=0x75b5a8,
    word=@0xfdcf8998, max_edit_distance=2) at include/xapian/base.h:476
#6  0xfe39ba90 in Xapian::QueryParser::Internal::parse_query (this=0x75b590,
    qs=@0x798288, flags=128, default_prefix=@0xfdcf8a44)
    at queryparser/queryparser.lemony:948
#7  0xfe38fa0c in Xapian::QueryParser::parse_query (this=0x798218,
    query_string=@0x798288, flags=128, default_prefix=@0xfdcf8be0)
    at include/xapian/base.h:154
#8  0xfe5551f0 in _wrap_QueryParser_parse_query (self=0x0, args=0x712648)
    at /usr/sfw/lib/gcc/sparc-sun-solaris2.10/3.4.3/../../../../include/c++/3.4.3/ext/new_allocator.h:69
#9  0x00117290 in PyCFunction_Call (func=0x60d9e0, arg=0x712648, kw=0x0)
    at Objects/methodobject.c:116

This doesn't seem to be related to any particular query, and I haven't

Tom | 1 Feb 10:48

Solaris core dump

Hi everyone,

I'm having a problem with xapian (the matchspy branch, with Python
bindings) on Solaris 10 / SPARC. Users can run queries for a few hours
with no problems, then it returns "inflate failed (invalid code
lengths set)"  to Python and dumps core. gdb reports:

#0  0xfe6dc910 in inflate_table () from /usr/lib/libz.so.1
#1  0xfe6d9d6c in inflate () from /usr/lib/libz.so.1
#2  0xfe2ebe20 in FlintTable::read_tag (this=0x75ed70, C_=0x75ede0,
   tag=0xfdcf8568, keep_compressed=false)
   at backends/flint/flint_table.cc:1254
#3  0xfe2eff70 in FlintTable::get_exact_entry (this=0x75ed70, key=@0xfdcf8578,
   tag=@0xfdcf8568) at backends/flint/flint_table.cc:1190
#4  0xfe2ddc04 in FlintSpellingTable::open_termlist (this=0x75ed70,
   word=@0xfdcf8998) at backends/flint/flint_spelling.h:46
#5  0xfe20dcb0 in Xapian::Database::get_spelling_suggestion (this=0x75b5a8,
   word=@0xfdcf8998, max_edit_distance=2) at include/xapian/base.h:476
#6  0xfe39ba90 in Xapian::QueryParser::Internal::parse_query (this=0x75b590,
   qs=@0x798288, flags=128, default_prefix=@0xfdcf8a44)
   at queryparser/queryparser.lemony:948
#7  0xfe38fa0c in Xapian::QueryParser::parse_query (this=0x798218,
   query_string=@0x798288, flags=128, default_prefix=@0xfdcf8be0)
   at include/xapian/base.h:154
#8  0xfe5551f0 in _wrap_QueryParser_parse_query (self=0x0, args=0x712648)
   at /usr/sfw/lib/gcc/sparc-sun-solaris2.10/3.4.3/../../../../include/c++/3.4.3/ext/new_allocator.h:69
#9  0x00117290 in PyCFunction_Call (func=0x60d9e0, arg=0x712648, kw=0x0)
   at Objects/methodobject.c:116

This doesn't seem to be related to any particular query, and I haven't

Jesper Krogh | 30 Jan 14:30

Failure trying to update document.

Hi list.

I have a specific document sitting in the index that cannot be updated.
What can I do about that?

2010-01-30T13:58:07     Eval failure: Exception: No termlist for
document 287376 at /usr/lib/perl5/Search/Xapian/Enquire.pm line 56.
2010-01-30T13:58:07     job failed.  considering retry.  is max_retries
of 1000 >= failures of 1?
2010-01-30T13:58:07     job failed: Exception: No termlist for document
287376 at /usr/lib/perl5/Search/Xapian/Enquire.pm line 56.

Xapian version 1.0.14

-- 
Jesper
emmanuel | 28 Jan 11:50

Problem getting Xapian working with Burmese


 On Fri, Aug 21, 2009 at 02:44:44PM +0200, emmanuel at engelhart.org wrote:

>> I want to update my request.
>> Is my question bad formulated? too trivial? ... or maybe pretty
>> complicated/unclear?
>
>I think nobody answered as it was hard to follow your example because
>the Burmese characters seem to have been mangled (at least the message I
>received wasn't valid utf-8).
>
>But looking at the code, I see an issue:
>
>> my $db = Search::Xapian::Database->new( './xapdb' );
>> my $enq = $db->enquire( $ARGV[0] );
>
>What this does is to create an Enquire object and set Query($ARGV[0]) as
>the query.  That works OK if $ARGV[0] is a single word which gets
>indexed as a single term, but you really want to parse the query string
>to get a Query object:
>
> my $db = Search::Xapian::Database->new( './xapdb' );
> my $queryparser = Search::Xapian::QueryParser->new();
> my $query = $queryparser->parse_query( $ARGV[0] );
> my $enq = $db->enquire( $query );
>
>I'd guess that is probably your problem, but I can't tell for sure as I
>can't test your examples...
>
>For further information on debugging this sort of problem, see:
>
>http://trac.xapian.org/wiki/FAQ/NoMatches
>
Hi Olly,

Thanks for your answer (and sorry for not having answered before).

Your answer helped me, and I think I now understand why "it does not work".

For test purposes I index one document containing one string with
index_text_without_positions() (C++ API). The string is
"ဝီ​ကီ​ပိ​သုံး​စွဲ​သူ​များက".
See this log: http://tmp.kiwix.org/tmp/kiwix-index.log (utf8 encoded)

But if I run "delve -r 1 /path/to/db" on the index I get the following answer:

Term List for record #1: test က စ ပ မ ဝ သ
(utf8 encoded)
See the log: http://tmp.kiwix.org/tmp/delve.log

So it seems clear to me why "it does not work": my word is split into
single letters and a lot of letters are removed.

Am I right? Can we avoid that and index "ဝီ​ကီ​ပိ​သုံး​စွဲ​သူ​များက" as only one word?

Regards
Emmanuel
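The term list above is what one would expect if the indexer only treats letters and digits as word characters, so that the Burmese vowel signs (combining marks) and the zero-width spaces between syllables act as word breaks. The Python sketch below imitates that behaviour with the standard unicodedata module. It is an assumption about the cause and not Xapian's actual tokenizer; the function name is made up, and the test string is my transcription of the one above into escapes.

```python
import unicodedata


def naive_terms(text: str) -> list:
    """Split text into runs of letters/digits (Unicode categories L* and N*).

    Hypothetical stand-in for a tokenizer that ignores combining marks.
    """
    terms = set()
    current = []
    for ch in text:
        if unicodedata.category(ch)[0] in ("L", "N"):
            current.append(ch)
        elif current:
            # A mark, space or format character ends the current term.
            terms.add("".join(current))
            current = []
    if current:
        terms.add("".join(current))
    return sorted(terms)


# The test string, written with escapes so the zero-width spaces (U+200B)
# and the combining vowel signs are visible.
SAMPLE = (
    "\u101d\u102e\u200b\u1000\u102e\u200b\u1015\u102d\u200b"
    "\u101e\u102f\u1036\u1038\u200b\u1005\u103d\u1032\u200b"
    "\u101e\u1030\u200b\u1019\u103b\u102c\u1038\u1000"
)

if __name__ == "__main__":
    print(ascii(naive_terms(SAMPLE)))
```

Under that assumption only the six bare consonants survive as terms, which matches the six Burmese terms delve reports. Treating combining marks as word characters would keep each syllable together, but joining the syllables into one term would also need the zero-width spaces to be handled.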

Marlon Baculio | 21 Jan 17:31

Xapian under 360 MB VPS


Hello,

I wish to get some feedback on the use of Xapian in a virtual machine hosting plan with 360MB. The processes to
share the 360MB will be the following:

0. nginx web server as front (estimated 5MB)

1. custom C++ FastCGI for dynamic requests (estimated 10MB)

2. Xapian writer (1 process and 1 thread)

3. Xapian readers (1 process with n threads for n readers)

4. PostgreSQL (estimated 50MB or lower)

That leaves about 300MB for Xapian and the rest of the Linux OS. The main UI will be a Google style search box.

Questions:

0. How would you configure Xapian for such low memory systems (e.g. how many readers, flush threshold for writer)?

1. Will file handle limitation be a problem for multithreaded Xapian reader?

2. What are the advantages of multiprocess readers (compared to multithreaded) aside from crash isolation?

Thanks so much!
Marlon

 		 	   		  

Henry | 20 Jan 10:18

Error when creating trac bug ticket

Greets

Just tried to create a bug ticket on trac.xapian.org and it croaked with
the error:

-----------
Trac detected an internal error:
IntegrityError: columns ticket, name are not unique

The action that triggered the error was:
POST: /newticket
-----------

Clicking on the Create button to report the error results in an invalid URL.

What's the best way to proceed to report my bug?

Thanks
Henry
Menard, Daniel | 19 Jan 16:35

QueryParser: aliases and OP_AND

Hello,

I'm wondering about how the QueryParser parses a query containing an "alias" when the default operator is OP_AND
(by "alias", I mean a search field mapped to multiple term prefixes).

With the following PHP code:
<?php
$parser=new XapianQueryParser();
$parser->set_default_op(XapianQuery::OP_AND);
$parser->add_prefix('alias', 'AUT1:');
$parser->add_prefix('alias', 'AUT2:');
echo $parser->parse_query('alias:(john smith)')->get_description();
?>

I get:

Xapian::Query(((AUT1:john:(pos=1) OR AUT2:john:(pos=1)) AND (AUT1:smith:(pos=2) OR AUT2:smith:(pos=2))))
i.e. (AUT1:john OR AUT2:john) AND (AUT1:smith OR AUT2:smith)

I was expecting something like:

Xapian::Query(((AUT1:john:(pos=1) AND AUT1:smith:(pos=2)) OR (AUT2:john:(pos=1) AND AUT2:smith:(pos=2))))
i.e. (AUT1:john AND AUT1:smith) OR (AUT2:john AND AUT2:smith)

I think that I understand why I get the current result: "alias:(john smith)" is parsed as "alias:john AND
alias:smith" and the alias is then expanded.

But for my application, it produces some "noise" because a record containing the following data :
"Aut1=john lennon, Aut2=will smith" will appear in the Mset.
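To make the difference between the two query shapes concrete, here is a small Python sketch that evaluates both of them against that record by hand. It does not use Xapian at all: documents are modelled as plain sets of prefixed terms, and the function names are invented for the example.

```python
PREFIXES = ["AUT1:", "AUT2:"]


def matches_current(doc_terms: set, words: list) -> bool:
    """(AUT1:w OR AUT2:w) for each word, all ANDed together.

    This is the shape the parser produces today.
    """
    return all(any(p + w in doc_terms for p in PREFIXES) for w in words)


def matches_expected(doc_terms: set, words: list) -> bool:
    """(AUT1:w1 AND AUT1:w2) OR (AUT2:w1 AND AUT2:w2).

    This is the shape I was expecting.
    """
    return any(all(p + w in doc_terms for w in words) for p in PREFIXES)


# "Aut1=john lennon, Aut2=will smith"
noisy_record = {"AUT1:john", "AUT1:lennon", "AUT2:will", "AUT2:smith"}
# A record where one author really is "john smith"
good_record = {"AUT1:john", "AUT1:smith"}

if __name__ == "__main__":
    words = ["john", "smith"]
    for name, record in [("noisy", noisy_record), ("good", good_record)]:
        print(name, matches_current(record, words),
              matches_expected(record, words))
```

The noisy record matches the current shape but not the expected one, while the good record matches both.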


