Tim Brody | 23 Jan 11:23

Re: get_docid???

On Sat, 2012-01-21 at 09:56 +0000, Olly Betts wrote:
> On Sat, Jan 21, 2012 at 04:26:41AM -0500, Jim Lynch wrote:
> > Thanks, but I'm just a little confused.  I thought all we had today were  
> > SWIG generated Perl bindings??
> 
> No.  Search::Xapian is hand-coded XS in 1.2.x (and earlier).  Since
> 1.2.4 there are also SWIG-generated Perl bindings in xapian-bindings,
> but we decided to make the final switch with a new release series, as
> it may cause minor incompatibilities with existing Perl code which uses
> Xapian.

The CPAN module is the hand-coded XS. You will need to build or get
packages for "xapian-bindings" to get the SWIG version.

Both versions live in the "Search::Xapian" namespace so you can install
one or other but not both.

/Tim.
Wes Chow | 20 Jan 18:16

Python Xapian tutorial


I've written a little Xapian tutorial using Python that indexes tweets. 
It's hopefully useful to anybody trying to figure out how it all works 
in Python (actually, how it all works in general). Please let me know if 
I've made any mistakes, and comments are always welcome.

http://www.s7labs.com/learn/xaptut/

Wes

-- 
http://www.s7labs.com
Jim Lynch | 20 Jan 17:00

get_docid???

     my $mset = $enq->get_mset($nstart,$nrecords);

     for(my $mit=$mset->begin(); $mit != $mset->end();$mit++) {

         my $doc = $mit->get_document();
         my $dat = $doc->get_data();
         my $id = $doc->get_docid();
     }

[Fri Jan 20 10:35:06 2012] newmail.cgi: Can't locate 
auto/Search/Xapian/Document/get_docid.al in @INC (@INC contains: 
/etc/perl /usr/local/lib/perl/5.10.1 /usr/local/share/perl/5.10.1 
/usr/lib/perl5 /usr/share/perl5 /usr/lib/perl/5.10 /usr/share/perl/5.10 
/usr/local/lib/site_perl .) at newmail.cgi line 286

From the API at 
http://xapian.org/docs/apidoc/html/classXapian_1_1Document.html

docid Xapian::Document::get_docid() const

Get the document id which is associated with this document (if any).

NB If multiple databases are being searched together, then this will be 
the document id in the individual database, not the merged database!

Returns:
     If this document came from a database, return the document id in 
that database. Otherwise, return 0 (in Xapian 1.0.22/1.2.4 or later; 
prior to this the returned value was uninitialised).

Jim Lynch | 20 Jan 12:02

Perl version of sortable_serialize missing?

I attempted to use the sortable_serialize function from Perl; however, it 
doesn't seem to exist.  The only occurrence of the string "sortable" in 
the /usr/local/perl/5.10.1/Search/ tree is in the pod in Xapian.pm.

What am I doing wrong?

use Search::Xapian;
...
             $doc->add_value(4,sortable_serialize($recdate));

Undefined subroutine &main::sortable_serialize called at 
newgenstaticindex line 444
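
For context on why the function matters here: values are compared as byte strings when sorting, so a number has to be encoded in a way that makes byte order match numeric order. Below is a pure-Python sketch of that idea for non-negative integers, using fixed-width big-endian packing. It is an illustration only, not Xapian's actual sortable_serialize encoding, and the helper name sortable_uint is invented for this example.

```python
import struct

def sortable_uint(n):
    """Pack a non-negative integer (< 2**64) so byte order equals numeric order."""
    return struct.pack(">Q", n)

dates = [1181883741, 999999999, 1326844800]

# Sorting on the decimal text puts "999999999" after "1181883741".
as_text = sorted(dates, key=str)

# Sorting on the packed bytes gives the numeric order.
as_bytes = sorted(dates, key=sortable_uint)

print(as_text == sorted(dates), as_bytes == sorted(dates))  # False True
```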

Thanks,
Jim.
James Aylett | 15 Jan 17:49

Re: I'm trying to relate what I know about Omega/Scriptindex with the actual data

On 15 Jan 2012, at 15:05, Jim Lynch <jim <at> fayettedigital.com> wrote:

> So scriptindex does a set_doc but delve doesn't show the data placed by
> set_doc as data.  I'm guessing delve is in the omega family and interprets
> what's in the data but doesn't dump it in a raw format.  That makes sense.

Close -- delve is dumping the raw format, it just happens to be human readable.
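
To make that concrete: scriptindex's "field" action stores the document data as plain text with one name=value pair per line, which is why the delve output is readable as it stands. Here is a minimal Python sketch of reading such a record back, assuming exactly that layout. The helper name parse_fields is made up for illustration and this is not Omega's own code.

```python
def parse_fields(data):
    """Split 'name=value' lines into a dict mapping name -> list of values."""
    fields = {}
    for line in data.splitlines():
        # Split at the first '=' only, so values may themselves contain '='.
        name, sep, value = line.partition("=")
        if not sep:
            continue  # ignore lines that are not name=value pairs
        fields.setdefault(name, []).append(value)
    return fields

record = (
    "summary=Do you remember what was wrong with the bearings?\n"
    "unixdate=1181883741"
)
fields = parse_fields(record)
print(fields["unixdate"][0])
```

A list is kept per name because the same field name can appear on more than one line.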

J
Jim Lynch | 15 Jan 14:55

I'm trying to relate what I know about Omega/Scriptindex with the actual data

James, thanks for the explanations.  I misread the notes.

As an exercise, I'm trying to convert an existing project that currently 
uses Scriptindex and Omega to direct Xapian API calls.  I did a (I 
think) complete dump of a document with
delve -r 565 -d database
and I see things like

subject='A typical subject'
  with a corresponding set of terms like
  Sa Stypical Ssubject

Which is what I expect, however I have two "fields" unixdate and summary 
which I've specified as

unixdate : field date=unix
summary : field

In my index file.  They are displayed in the delve output as

summary=Do you remember what was wrong with the bearings?
unixdate=1181883741

I don't see a set of terms that would correspond to either of these.  
Yes, the words (terms) are there but no prefixes to indicate how they 
are related to the field names.  I assume there is some magic and/or 
delve isn't dumping everything.

The purpose of this investigation is to figure out how to add something 
to the document, storing this info.  In looking at the Document api, I 
(Continue reading)

Jim Lynch | 15 Jan 14:07
Jim Lynch | 15 Jan 12:57

Wiki broken link

There's a broken link on the wiki at http://trac.xapian.org/wiki/SampleCode.

The first Perl example points to 
http://svn.xapian.org/examples/?root=Search-Xapian but that url gives a 
404 error.  I'd fix it but I don't know where the examples are kept 
these days.

Jim.
Shane Spencer | 8 Jan 22:32

Testing document size preallocation.

https://gist.github.com/ad2accc5b4655753923d

So here I am creating a database with no values for each small
document and one with a bunch of blank values (uuid_blank).  Once
those are flushed then I reopen them and start replacing the documents
of each with identical documents that have an identical large set of
values.  I am using replace_document and a specific document ID.

Is there a specific problem that I'm up against that shows that
preallocation is up to 2 times slower for replacing an identically
sized document rather than adding to its final serialized size?

- Shane
hightman | 5 Jan 06:28

Enhance synonyms feature of the query parser (patch included)

Very few people seem to be using synonyms in Xapian. I recently found some problems when using them.

Normally, I think the synonym table should not contain any prefix info except 'Z'.
For example, I have the following synonyms and prefix info:

db.add_synonym("search", "find");
db.add_synonym("Zsearch", "Zfind");
db.add_synonym("foo bar", "foobar");
qp.add_prefix("title", "T");

I would expect the query parser to produce results like this:

"search something" ==> "(Zsearch:(pos=1) SYNONYM find:(pos=1)) AND Zsometh:(pos=2)"
"title:search" ==> "ZTsearch:(pos=1) SYNONYM Tfind:(pos=1)"
"title:searching" ==> "ZTsearch:(pos=1) SYNONYM ZTfind:(pos=1)"
"title:(foo bar)" ==> "(ZTfoo:(pos=1) AND ZTbar:(pos=2)) SYNONYM Tfoobar:(pos=1)"
...
In general, I would hope the query parser could add prefix info to synonym terms automatically, but this
is not supported in the current Xapian version.
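
The behaviour I am asking for can be sketched as a toy model in Python. This is only an illustration of the idea, not the query parser's code: it ignores stemming, positions and multi-word synonyms, and the function name expand_with_prefix is invented for this example.

```python
def expand_with_prefix(term, prefix, synonyms):
    """Return the prefixed term followed by its synonyms with the same prefix."""
    return [prefix + term] + [prefix + s for s in synonyms.get(term, [])]

# The synonym table holds unprefixed entries only.
synonyms = {"search": ["find"]}

# For "title:search" with prefix "T", the field prefix is applied to the
# original term and to each synonym found for it.
print(expand_with_prefix("search", "T", synonyms))  # ['Tsearch', 'Tfind']
```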

In addition, I have a question about the prefix_info of the Term object: it holds a list of prefixes, but I
don't know when a term would have multiple prefixes. This makes me worry about the modification for
multiple words, because I only consider the first prefix.

--- PATCH CONTENT BEGIN 'queryparser/queryparser.lemon' ---

*** queryparser.lemony  2012-01-05 12:28:39.000000000 +0800
--- queryparser.lemony.new      2012-01-05 12:52:56.000000000 +0800
***************
(Continue reading)

hightman | 4 Jan 08:53

[issue] The difference between QueryParser::FLAG_AUTO_SYNONYMS and QueryParser::FLAG_AUTO_MULTIWORD_SYNONYMS

I don't know whether this is a bug or intended behaviour...

According to the definitions in "xapian/queryparser.h", FLAG_AUTO_MULTIWORD_SYNONYMS includes the bit for
FLAG_AUTO_SYNONYMS.

Therefore, as long as I set FLAG_AUTO_SYNONYMS in the parse flags, the query parser will automatically
activate the behaviour of FLAG_AUTO_MULTIWORD_SYNONYMS. See the following excerpt from "queryparser.lemon":

...
    subqs.reserve(terms.size());
    if (state->flags & QueryParser::FLAG_AUTO_MULTIWORD_SYNONYMS) {
        // Check for multi-word synonyms.
        Database db = state->get_database();
...
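
The effect can be reproduced outside Xapian with plain integers. In the Python sketch below, the numeric values are my reading of the 1.2.x header and should be treated as assumptions; the two function names are invented for illustration. The stricter comparison is just one possible test, not necessarily how this should be fixed in Xapian.

```python
# Assumed values; what matters is that the multi-word flag includes the
# FLAG_AUTO_SYNONYMS bit.
FLAG_AUTO_SYNONYMS = 512
FLAG_AUTO_MULTIWORD_SYNONYMS = 1024 | FLAG_AUTO_SYNONYMS

def loose_check(flags):
    """Mimic `flags & FLAG_AUTO_MULTIWORD_SYNONYMS`: true if any bit overlaps."""
    return bool(flags & FLAG_AUTO_MULTIWORD_SYNONYMS)

def strict_check(flags):
    """True only when every bit of the multi-word flag is set."""
    return (flags & FLAG_AUTO_MULTIWORD_SYNONYMS) == FLAG_AUTO_MULTIWORD_SYNONYMS

flags = FLAG_AUTO_SYNONYMS  # the caller asked for single-word synonyms only
print(loose_check(flags), strict_check(flags))  # True False
```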
