Olly Betts | 4 May 03:39 2010

Debian and Ubuntu packages of 1.2.0

I've updated the Debian and Ubuntu packaging for Xapian 1.2.

I'm aiming to get Xapian 1.2 into the next stable Debian release.  There's
no definite freeze date yet, but I've heard May/June mentioned.

Currently there are packages of Xapian 1.2.0 in Debian experimental, but
unstable and testing still have 1.0.x while I finish checking compatibility
with dependent packages, and wait for a suitable time to make the transition.

If you want to use these packages, this page explains how:

http://wiki.debian.org/DebianExperimental

The warning about "dangerous and harmful" doesn't apply to the Debian
packages currently there.

I've also created a new PPA on launchpad for backported versions of these
packages for various Ubuntu releases:

https://launchpad.net/~xapian-backports/+archive/xapian-1.2

At the time of writing, some i386 builds are still pending there.

For Xapian 1.2, I've stopped backporting packages to Ubuntu 6.06 (dapper) as
users conservative enough to stick with such an old release seem unlikely to
want the latest and greatest Xapian release, and the time it takes to continue
to support it can be more usefully spent.  There are also no packages for
Ubuntu 8.10 (intrepid) as that reached end of life at the end of last month.

Packages of 1.0.x are still available from the usual PPA:

Per Jessen | 11 May 15:18 2010

indexing words with alternative spellings

Some languages (e.g. German and Danish) have special letters that are
often written using two-letter combinations when the appropriate
keyboard or medium is not available:

ä = ae
ü = ue
ö = oe
æ = ae
ø = oe
å = aa
ß = ss 

(there are undoubtedly far more examples than those)

As a user of an index, I would like to be able to search for
e.g. "schaefer" and get matches on both 'ae' and 'ä' returned. Same if
I searched on 'schäfer'.  Is this something I would need to take into
account when I do the indexing or?

/Per Jessen, Zürich

_______________________________________________
Xapian-discuss mailing list
Xapian-discuss <at> lists.xapian.org
http://lists.xapian.org/mailman/listinfo/xapian-discuss
Oliver Flimm | 11 May 15:46 2010

Re: indexing words with alternative spellings

Hi,

On Tue, May 11, 2010 at 03:18:38PM +0200, Per Jessen wrote:
> Some languages (e.g. German and Danish) have special letters that are
> often written using two-letter combinations when the appropriate
> keyboard or medium is not available:
> ä = ae
[...]
> As a user of an index, I would like to be able to search for
> e.g. "schaefer" and get matches on both 'ae' and 'ä' returned. Same if
> I searched on 'schäfer'.  Is this something I would need to take into
> account when I do the indexing or?

you have to take it into account both when indexing and searching.

I'm using Xapian in a library catalogue and convert these "special"
characters to their two-letter combinations - both when generating terms
or postings and when processing user input.
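
A minimal sketch of that approach in Python, assuming the letter mapping from the original question and a hypothetical `normalise` helper (neither is part of Xapian). The point is that one and the same function is applied to document text before indexing and to user input before the query is built:

```python
# Assumed mapping, copied from the list in the original question.
DIGRAPHS = str.maketrans({
    "ä": "ae", "ö": "oe", "ü": "ue", "ß": "ss",
    "æ": "ae", "ø": "oe", "å": "aa",
})


def normalise(text):
    """Lower-case the text, then expand special letters to two letters."""
    return text.lower().translate(DIGRAPHS)


def index_terms(document_text):
    """Terms to store in the index (hypothetical helper)."""
    return [normalise(word) for word in document_text.split()]


def query_terms(user_input):
    """Terms to search for; must use the same normalisation as indexing."""
    return [normalise(word) for word in user_input.split()]
```

With this, "Schäfer", "schäfer" and "schaefer" all end up as the same term, so a search for any one of them matches the others. The mapping is lossy ("ä" and "æ" both become "ae"), which is usually acceptable for this purpose.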

Regards,

O. Flimm

-- 
Universitaet zu Koeln :: Universitaets- und Stadtbibliothek
IT-Dienste :: Abteilung Universitaetsgesamtkatalog
Universitaetsstr. 33 :: D-50931 Koeln
Tel.: +49 221 470-3330 :: Fax: +49 221 470-5166
flimm <at> ub.uni-koeln.de :: www.ub.uni-koeln.de

Michel Pelletier | 11 May 18:00 2010

Re: indexing words with alternative spellings

Different languages have different libraries for dealing with this
issue.  We use one for Python called 'translitcodec' which can do both
long (ä -> ae) and short (ä -> a) conversion.  It's very likely there
is a similar library for whatever language you are using.

http://pypi.python.org/pypi/translitcodec/0.1
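
The difference between the "long" and "short" forms can be illustrated with the Python standard library alone. This is a sketch, not translitcodec's implementation: `to_long` and `to_short` are hypothetical names, and the long-form table is an assumption that only covers the lower-case letters listed in the original question.

```python
import unicodedata

# Assumed long-form table (lower-case letters from the original question only).
LONG_FORMS = str.maketrans({
    "ä": "ae", "ö": "oe", "ü": "ue", "ß": "ss",
    "æ": "ae", "ø": "oe", "å": "aa",
})


def to_long(text):
    """Long conversion: replace each special letter with two letters."""
    return text.translate(LONG_FORMS)


def to_short(text):
    """Short conversion: decompose, then drop combining accent marks."""
    decomposed = unicodedata.normalize("NFD", text)
    stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
    return unicodedata.normalize("NFC", stripped)
```

Note that accent stripping only works for letters that Unicode decomposes into a base letter plus a combining mark (such as "ä" or "å"). Letters like "ø", "æ" and "ß" have no such decomposition and pass through `to_short` unchanged, which is one reason a dedicated library or an explicit table is useful.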

-Mike

On Tue, May 11, 2010 at 6:18 AM, Per Jessen <per <at> computer.org> wrote:
> Some languages (e.g. German and Danish) have special letters that are
> often written using two-letter combinations when the appropriate
> keyboard or medium is not available:
>
> ä = ae
> ü = ue
> ö = oe
> æ = ae
> ø = oe
> å = aa
> ß = ss
>
> (there are undoubtedly far more examples than those)
>
> As a user of an index, I would like to be able to search for
> e.g. "schaefer" and get matches on both 'ae' and 'ä' returned. Same if
> I searched on 'schäfer'.  Is this something I would need to take into
> account when I do the indexing or?
>
>

Per Jessen | 11 May 18:59 2010

Re: indexing words with alternative spellings

Michel Pelletier wrote:

> Different languages have different libraries for dealing with this
> issue.  We use one for Python called 'translitcodec' which can do both
> long (ä -> ae) and short (ä -> a) conversion.  It's very likely there
> is a similar library for whatever language you are using.
> 
> http://pypi.python.org/pypi/translitcodec/0.1

Thanks, interesting. (I'm using C/C++). 

/Per Jessen, Zürich

Olly Betts | 13 May 04:06 2010

Re: indexing words with alternative spellings

On Tue, May 11, 2010 at 03:18:38PM +0200, Per Jessen wrote:
> Some languages (e.g. German and Danish) have special letters that are
> often written using two-letter combinations when the appropriate
> keyboard or medium is not available:

For German, you can use the "german2" stemmer which transliterates as
you describe.
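
For intuition, the kind of folding involved can be sketched in plain Python. This is only an illustration of the idea and not the "german2" stemmer itself: the rules below (ß to ss, ae/oe/ue to the umlaut form except "ue" after "q", then dropping the umlaut accents) are an assumption based on the published Snowball description, no suffix stripping is done, and `fold_german` is a hypothetical name.

```python
import re

# Final step assumed from the German stemmer: drop the umlaut accents.
STRIP_UMLAUTS = str.maketrans("äöü", "aou")


def fold_german(word):
    """Fold digraph and umlaut spellings of a word to one common form."""
    w = word.lower().replace("ß", "ss")
    w = w.replace("ae", "ä").replace("oe", "ö")
    # "ue" becomes "ü" unless preceded by "q" (as in "Quelle").
    w = re.sub(r"(?<!q)ue", "ü", w)
    return w.translate(STRIP_UMLAUTS)
```

Under these assumptions both "schaefer" and "schäfer" fold to "schafer", so the two spellings index and search as the same term. Such rules can also fold genuine "ue" or "ae" sequences that were never umlauts, so results are worth checking against real data.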

There's also unac for more general accent normalisation:

http://www.nongnu.org/unac/

There's actually a version 1.8.0 not mentioned there (but Debian has it).
Not sure what's up, but the upstream page at http://www.senga.org/unac/ is no
longer there.

Cheers,
    Olly
Charlie Hull | 14 May 13:31 2010

Re: New stable release series - Xapian 1.2.0 released

On 29/04/2010 08:22, Olly Betts wrote:
> I've uploaded Xapian 1.2.0 (including Search::Xapian 1.2.0.0).  This is the
> first release in a new stable release series!
>
Hi all,

I've uploaded Visual C++ build files and prebuilt Windows binaries for 
1.2.0 to:

http://www.flax.co.uk/xapian_binaries

(please note this link has changed slightly - if you're linking to these 
binaries please update accordingly, the old link will work for a while).

There are some caveats with this release. Full details are in
readme.txt, but in summary:

- xapian fails a single test of the replication system with Flint 
databases, so we don't recommend you use Flint & replication currently
- no java-swig bindings as these don't currently build on 1.2.0 or SVN trunk
- ruby bindings fail one test, we've included them anyway but be aware
- omega needs a simple patch before building

Cheers

Charlie
Emmanuel Engelhart | 14 May 19:29 2010

Re: New stable release series - Xapian 1.2.0 released


On 14/05/2010 13:31, Charlie Hull wrote:
> On 29/04/2010 08:22, Olly Betts wrote:
>> I've uploaded Xapian 1.2.0 (including Search::Xapian 1.2.0.0).  This
>> is the first release in a new stable release series!
>>
> Hi all,
> 
> I've uploaded Visual C++ build files and prebuilt Windows binaries for
> 1.2.0 to:

Thank you both very much for the work. I'm really happy to be able to
use Xapian 1.2.0 and the new chert backend in production now.

I have tested the Windows compilation scripts and they work for me, for
what I need.

I have only detected a problem with the following files:
* win32_expand.mak
* win32_unicode.mak

It seems that the empty line after "# DO NOT DELETE THIS LINE -- xapdep
depends on it" breaks the build: compilation fails unless that empty
line is removed.

Regards
Emmanuel
Olly Betts | 16 May 08:22 2010

git mirror

While git-svn allows you to work against an SVN repo, it takes ages to clone
the full history for Xapian, so to make life easier for git users, I've set up
a git mirror of the Xapian SVN repo - details here:

http://xapian.org/bleeding#git

I'm certainly not a git wizard, so if anything seems wrong or weird, please
let me know (and thanks to Carl Worth for helping me get this far).

If there's interest in doing something similar for other VCSes, I can take a
look at doing so.

Cheers,
    Olly
Charlie Hull | 18 May 11:49 2010

Re: New stable release series - Xapian 1.2.0 released

On 14/05/2010 18:29, Emmanuel Engelhart wrote:
>
> On 14/05/2010 13:31, Charlie Hull wrote:
>> On 29/04/2010 08:22, Olly Betts wrote:
>>> I've uploaded Xapian 1.2.0 (including Search::Xapian 1.2.0.0).  This
>>> is the first release in a new stable release series!
>>>
>> Hi all,
>>
>> I've uploaded Visual C++ build files and prebuilt Windows binaries for
>> 1.2.0 to:
>
> Thank you both very much for the work. I'm really happy to be able to
> use Xapian 1.2.0 and the new chert backend in production now.
>
> I have tested the Windows compilation scripts and they work for me, for
> what I need.
>
> I have only detected a problem with the following files:
> * win32_expand.mak
> * win32_unicode.mak
>
> It seems that the empty line after "# DO NOT DELETE THIS LINE -- xapdep
> depends on it" breaks the build: compilation fails unless that empty
> line is removed.

Thanks Emmanuel for the feedback - can you confirm which platform and 
compiler you're using?