Olly Betts | 1 Sep 10:34

Re: "Undefined Reference to" errors

On Thu, Aug 30, 2007 at 11:57:22AM +0100, Kwok-yau Kwong wrote:
> kwok <at> kwok-desktop:~/Desktop/Xapian-Index$ make -f Makefile all
> g++ -c -O3 -g -Wno-deprecated -I/usr/local/include/xapian/
> -I/usr/local/include/ trec_index.cc -o trec_index.o
> g++ -O3 -g -Wno-deprecated config_file.o htmlparse.o stopword.o gunzipper.o
> split.o timer.o trec_index.o /usr/local/lib/libxapian.a -o trec_index

It's better to replace:

    /usr/local/lib/libxapian.a

with (including the ``):

    `/usr/local/bin/xapian-config --libs`

(And if you really want to force static linking, just don't build the
shared libraries at all - configure with --disable-shared).

The purpose of xapian-config is to take care of any special compiler and
linker options for you, so it makes your makefile more portable.

> All errors seem to be related to "inflate" or "deflate". Am I missing
> something?

You didn't link against zlib.  If you use `xapian-config --libs` then
that should happen automatically.

Cheers,
    Olly

Olly Betts | 1 Sep 11:38

Re: The zh_TW translation of intro_ir.html

On Thu, Aug 09, 2007 at 05:20:23PM +0800, Yung-chung Lin wrote:
> The translation is available at
> http://docs.google.com/View?docID=dxwtbbr_2fqwqmv
> 
> You may include the body on xapian's web site, or just provide a link to it.

Thank you for translating, but I'm afraid we can't really use your
translation - you've licensed it as
http://creativecommons.org/licenses/by-nc-nd/3.0/ which doesn't allow
commercial use or modification, both of which we really need to be able
to permit.

Both restrictions are contrary to the DFSG, for example, so your
translation couldn't be included in Debian packages.  I imagine similar
issues would apply to most other Linux distributions.

Would you be willing to consider a less restrictive licence?

Cheers,
    Olly

Re: The zh_TW translation of intro_ir.html

It's all good. I have changed the license to the Creative Commons
Attribution-Share Alike 3.0 License. Please visit the web page again.
There should be no problem now.

Best,
Yung-chung Lin

On 9/1/07, Olly Betts <olly <at> survex.com> wrote:
> On Thu, Aug 09, 2007 at 05:20:23PM +0800, Yung-chung Lin wrote:
> > The translation is available at
> > http://docs.google.com/View?docID=dxwtbbr_2fqwqmv
> >
> > You may include the body on xapian's web site, or just provide a link to it.
>
> Thank you for translating, but I'm afraid we can't really use your
> translation - you've licensed it as
> http://creativecommons.org/licenses/by-nc-nd/3.0/ which doesn't allow
> commercial use or modification, both of which we really need to be able
> to permit.
>
> Both restrictions are contrary to the DFSG, for example, so your
> translation couldn't be included in Debian packages.  I imagine similar
> issues would apply to most other Linux distributions.
>
> Would you be willing to consider a less restrictive licence?
>
> Cheers,
>     Olly
>

Olly Betts | 1 Sep 12:54

Re: Djapian project

On Sat, Aug 25, 2007 at 04:13:59PM -0300, Rafael SDM Sierra wrote:
> Hi all, I have developed an application to integrate with other Django
> projects and improve full-text indexing support. It is based on the
> discontinued search-api branch:
> 
> http://code.google.com/p/djapian/

I've added a link to the "users" page.

Cheers,
    Olly
Olly Betts | 1 Sep 21:57

Re: BUG IN XAPIAN_FLUSH_THRESHOLD

On Tue, Aug 28, 2007 at 11:17:22AM +0900, Sungsoo Kim wrote:
> I have the same experience with xapian 0.9.4 that Kevin described
> before. I am sure that XAPIAN_FLUSH_THRESHOLD is not working in 0.9.4.

Incidentally, you ought to consider upgrading - even if you aren't ready
to migrate to 1.0.x, 0.9.10 has a number of bug fixes and a few
performance tweaks too.

> I can see my indexer stops for a while every 10,000 records to flush
> the buffer after I set XAPIAN_FLUSH_THRESHOLD environment variable to
> 100,000.

I don't have 0.9.4 around, but in SVN HEAD, setting
XAPIAN_FLUSH_THRESHOLD to 1000 makes indexing the 5000 odd documents in
/usr/share/doc with omega flush 6 times rather than just once as it does
if XAPIAN_FLUSH_THRESHOLD isn't set.

There was a bug (fixed in 0.9.7) which double-counted calls to
replace_document(docid, doc) if docid wasn't already used, but otherwise
this code hasn't changed for ages, as far as I can see.

Pauses could perhaps be due to other factors, but a reliable indicator
of how many flushes you've had can be obtained by running quartzcheck on
the record table:

    quartzcheck /path/to/database/record_

The "revision" reported is how many times the database has been flushed
(implicitly or explicitly).
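
As a rough cross-check on the numbers above, here is a minimal arithmetic
sketch of how many flushes to expect. It assumes one flush each time the
threshold is reached plus a final flush for any remainder, and that the
default threshold is 10000; the function name and the figure of 5300
documents (standing in for "5000 odd") are hypothetical.

```python
import math

def expected_flushes(num_docs, threshold=10000):
    """Estimate the flush count: one flush per `threshold` changes,
    plus a final flush for any remaining changes (assumed model)."""
    if num_docs <= 0:
        return 0
    return math.ceil(num_docs / threshold)

# 5300 is a hypothetical stand-in for "5000 odd documents".
print(expected_flushes(5300, 1000))          # 6
print(expected_flushes(5300))                # 1 (default threshold assumed 10000)
print(expected_flushes(1_000_000, 100_000))  # 10
```

If the "revision" reported by quartzcheck is far higher than such an
estimate, that would suggest the threshold setting is not taking effect.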


Kwok-yau Kwong | 2 Sep 13:44

Re: "Undefined Reference to" errors

Thank you for that; it has solved the problem that was occurring.
I am now encountering another problem when I attempt to execute my program,
but am unsure as to why:
kwok <at> kwok-desktop:~/Desktop/Xapian-Index$ ./trec_index
./trec_index: error while loading shared libraries: libxapian.so.15: cannot
open shared object file: No such file or directory

So I am just wondering why these shared libraries were not built, as I
cannot find them manually either, and whether and how I can build them
manually?

Many many Thanks
Kwok!

On 9/1/07, Olly Betts <olly <at> survex.com> wrote:
>
> It's better to replace:
>
>     /usr/local/lib/libxapian.a
>
> with (including the ``):
>
>     `/usr/local/bin/xapian-config --libs`
>
James Aylett | 2 Sep 14:38

Re: Python bindings and unicode strings

On Thu, Aug 30, 2007 at 03:02:22PM -0400, Deron Meranda wrote:

> I understand that the Xapian core uses UTF-8, but is there a way to
> get the Python bindings to always work with Python's native unicode
> string type so that the underlying UTF-8 is not exposed?

This isn't true, and therein lies the problem. Xapian core treats
everything as blobs of bytes; in many cases the sensible choice for
applications is to put UTF-8 in there.

> It appears that I can store unicode strings, like:
> 
> >>>  document.set_term( u'panach\u00e9' )
> 
> but then when I get them back out they're plain byte sequences (UTF-8
> encoded) rather than nice unicode strings,
> 
> >>>  [t.term for t in document.allterms()]
> ['panach\xc3\xa9']
> 
> I would have expected to get [u'panach\u00e9'] out instead.

I'm not sure what the right way of solving this is. Ideally we want a
way of saying what encoding is being used, and have Python do the
right thing. It would probably always come out as a Unicode string,
but the deserialisation would depend on the encoding used.
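
To make the idea concrete, here is a minimal sketch of what "declare the
encoding and let Python do the right thing" could look like at the
application level. It makes no Xapian calls and is written for Python 3
(the quoted session above is Python 2); decode_terms and its encoding
parameter are hypothetical, not part of the bindings.

```python
def decode_terms(raw_terms, encoding="utf-8"):
    """Decode byte-string terms into text using the declared encoding.

    `raw_terms` stands in for the byte strings the bindings hand back;
    the encoding is whatever the application used when indexing.
    """
    return [term.decode(encoding) for term in raw_terms]

# The byte sequence from the example above, decoded back to text.
raw = [b"panach\xc3\xa9"]
print(decode_terms(raw) == ["panach\u00e9"])  # True
```

The same encoding would have to be applied in reverse when storing terms,
which is why recording it in one place is attractive.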

We might be okay having one encoding for everything, rather than
separate for terms and doc data... and values. Hmm. And I guess we
could stuff this into database metadata, which would make it

Olly Betts | 2 Sep 14:38

Re: "Undefined Reference to" errors

On Sun, Sep 02, 2007 at 12:44:31PM +0100, Kwok-yau Kwong wrote:
> kwok <at> kwok-desktop:~/Desktop/Xapian-Index$ ./trec_index
> ./trec_index: error while loading shared libraries: libxapian.so.15: cannot
> open shared object file: No such file or directory

This isn't really Xapian-specific.  You've installed libraries in
/usr/local/lib but the dynamic loader doesn't usually look there by
default.

Some options:

* configure xapian-core with --prefix=/usr so the libraries install in
  /usr/lib where the dynamic loader looks by default.

* Add /usr/local/lib to /etc/ld.so.conf and run /sbin/ldconfig (as
  root).

* Set environment variable LD_LIBRARY_PATH=/usr/local/lib (and export it).

* Set an rpath on trec_index - if you're using GCC, pass
  -Wl,-R/usr/local/lib when linking trec_index, or use libtool to
  do the linking (this is how the binaries shipped with Xapian know
  where to find libxapian even if it's installed outside the dynamic
  loader's search path).

Which approach is best is mostly down to local policy and personal
preference.  If you want a recommendation, setting rpath seems
neatest to me.
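
Whichever option is chosen, one way to check whether the dynamic loader
can now find the library is to ask it directly. The following is a small
sketch using Python's ctypes module; can_load is a hypothetical helper
name. Note that LD_LIBRARY_PATH has to be set before the Python process
starts, and that an rpath set on trec_index does not affect this check,
so it only exercises the ld.so.conf, --prefix and LD_LIBRARY_PATH options.

```python
import ctypes

def can_load(library_name):
    """Return True if the dynamic loader can find and open the library."""
    try:
        ctypes.CDLL(library_name)
    except OSError:
        # Raised when the shared object cannot be found or opened.
        return False
    return True

# The soname from the error message above.
print(can_load("libxapian.so.15"))
```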

It would be useful to others to have a TREC indexer included in the

James Aylett | 2 Sep 14:46

Re: Clarification of values, data, fields, and prefixed terms

On Thu, Aug 30, 2007 at 05:31:23PM -0400, Deron Meranda wrote:

> I'm fairly new to Xapian and one of the more confusing hurdles to
> understand is the different ways to attach meta-data to documents.  It
> seems like there are several different ways:
> 
>  * values
>  * data (which can then by convention be formatted into fields)
>  * prefixed terms

Each of these has a distinct use in Xapian. Two (values and prefixed
terms) provide different types of metadata that Xapian itself can
use; the other (data) is for application metadata that Xapian can
happily ignore.

> Values are user-defined discrete strings (identified by a "slot"
> number).  A document can have either zero or exactly one value for any
> given slot number.  Xapian does not interpret the meaning of the value
> string nor does it predefine any slots, but it does allow for
> filtering queries based upon a simple lexicographical "range" of values
> that matched documents should possess.

Values are used for filtering in the match process. So collapsing can
be done on a value; you can use them in a MatchDecider and so
on. Range filtering is another example, as you point out.
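
Because the range test is a plain string comparison, numeric values need
care. Here is a small pure-Python sketch of the comparison (no Xapian
calls; in_value_range is a hypothetical helper, and Python compares by
code point, which matches a byte-wise comparison for ASCII values):

```python
def in_value_range(value, low, high):
    """Lexicographical range test on strings, as a value range filter would see them."""
    return low <= value <= high

print(in_value_range("2007", "2005", "2009"))  # True
# Unpadded numbers compare as strings, so 15 falls outside "9".."20".
print(in_value_range("15", "9", "20"))         # False
# Zero-padding to a fixed width restores the numeric ordering.
print(in_value_range("15", "09", "20"))        # True
```

So when storing numbers in a value slot, padding them to a fixed width
keeps the string order in line with the numeric order.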

> Prefixed terms index documents just like ordinary terms/words and thus
> are used in probabilistic searches, and can carry positional
> information if desired.  Prefixed terms are really just a convention
> (not part of Xapian core) of prepending some letters to the front of

James Aylett | 2 Sep 14:50

Re: "Undefined Reference to" errors

On Sun, Sep 02, 2007 at 01:38:48PM +0100, Olly Betts wrote:

> * Add /usr/local/lib to /etc/ld.so.conf and run /sbin/ldconfig (as
>   root).

This isn't pertinent here, but note that on Solaris this is done using
the crle(1) tool (Configure Runtime Linking Environment). Other
approaches should work fine on Solaris, however.

J

-- 
/--------------------------------------------------------------------------\
  James Aylett                                                  xapian.org
  james <at> tartarus.org                               uncertaintydivision.org
