Sean McCleary | 4 Jan 2011 20:51

Excessive memory use when using FLAG_PARTIAL?

Hi everyone,

Sorry if this is an easy one, but I've Googled and can't find anyone else
who's mentioned this same problem.

I'm using Xapian (tried both versions 1.0.17 and 1.2.4) with the PHP
bindings on Ubuntu 10.04 (Lucid) and Apache 2.2.14.  I'm using it for an
"auto-complete" in the search form on a web page.  But whenever I use
FLAG_PARTIAL on my search, the memory usage of the apache process quickly
balloons up to almost 100% of the available memory resources, and hangs
there in "Sending reply" status.

The execution of the PHP script finishes, but the apache process is stuck,
and consuming almost all the available memory.

I've found that when I remove the "FLAG_PARTIAL" flag from my query, this
problem does not happen.

Is this expected behavior?  The server this is running on has 512 MB of
memory.  My Xapian index is only 108 MB in size.

Any help would be greatly appreciated.

Thanks,
Sean
Charlie Hull | 6 Jan 2011 14:50

Re: Xapian 1.2.4 released

On 20/12/2010 12:40, Olly Betts wrote:
> I've uploaded Xapian 1.2.4 (including Search::Xapian 1.2.4.0).
>
Windows binaries and build files are available from the usual place:
http://www.flax.co.uk/xapian_binaries

If you're building from source, please read the README file first -
there are a few patches you'll need to apply.

Charlie
Olly Betts | 10 Jan 2011 02:19

Re: unoconv 0.4 issues

On Wed, Dec 29, 2010 at 08:32:26PM +0530, xapian <at> catcons.co.uk wrote:
> Implementing unoconv 0.4 as a filter as per
> http://trac.xapian.org/ticket/324 has exposed two issues with unoconv.  The
> first was fixed and is described here for community benefit; the second is
> unresolved.

Thanks for the information.  I've added a link to your message from ticket
324.

I suggest you talk to the author of unoconv, Dag Wieers:

http://dag.wieers.com/personal/

Hopefully he can apply the patch for the first issue, and advise on the
second.  Since unoconv 0.4 was only released a few months ago, it looks
to be actively maintained.

Cheers,
    Olly
xapian | 10 Jan 2011 13:59

Re: unoconv 0.4 issues


> -----Original Message-----
> Date: Mon, 10 Jan 2011 01:19:57 +0000
> From: Olly Betts <olly <at> survex.com>
> Subject: Re: [Xapian-discuss] unoconv 0.4 issues
> To: xapian <at> catcons.co.uk
> Cc: 'Xapian Discuss' <xapian-discuss <at> lists.xapian.org>
> Message-ID: <20110110011957.GL19840 <at> survex.com>
> Content-Type: text/plain; charset=us-ascii
> 
> On Wed, Dec 29, 2010 at 08:32:26PM +0530, xapian <at> catcons.co.uk wrote:
> > Implementing unoconv 0.4 as a filter as per
> > http://trac.xapian.org/ticket/324 has exposed two issues with unoconv.  The
> > first was fixed and is described here for community benefit; the second is
> > unresolved.
>
> Thanks for the information.  I've added a link to your message from ticket
> 324.
>
> I suggest you talk to the author of unoconv, Dag Wieers:
>
> http://dag.wieers.com/personal/
>
> Hopefully he can apply the patch for the first issue, and advise on the
> second.  Since unoconv 0.4 was only released a few months ago, it looks
> to be actively maintained.

Olly Betts | 11 Jan 2011 13:41

Re: Excessive memory use when using FLAG_PARTIAL?

On Tue, Jan 04, 2011 at 11:51:16AM -0800, Sean McCleary wrote:
> I'm using Xapian (tried both versions 1.0.17 and 1.2.4) with the PHP
> bindings on Ubuntu 10.04 (Lucid) and Apache 2.2.14.  I'm using it for an
> "auto-complete" in the search form on a web page.  But whenever I use
> FLAG_PARTIAL on my search, the memory usage of the apache process quickly
> balloons up to almost 100% of the available memory resources, and hangs
> there in "Sending reply" status.
> 
> The execution of the PHP script finishes, but the apache process is stuck,
> and consuming almost all the available memory.
> 
> I've found that when I remove the "FLAG_PARTIAL" flag from my query, this
> problem does not happen.
> 
> Is this expected behavior?  The server this is running on has 512 MB of
> memory.  My Xapian index is only 108 MB in size.

FLAG_PARTIAL currently just expands the partial word at the end of the
query to all the possible completions, so if the partial word is short
this can generate a query with a lot of terms (particularly when the
partial word is just a single common character, such as 's' in English).

Each term in the query needs a certain amount of memory, regardless of
the size of the database on disk - judging by the figures in another
recent post to the list, this is something like 55KB currently, so if
the partial word expands to 10000 or more terms, the process size will
grow to more than the size of your physical memory.  My guess would be
that this is the cause of your problem.
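
As a rough check of the arithmetic above, here is a minimal sketch in C++.  The
55KB-per-term figure is the approximate one quoted in this message, not a
measured constant, and estimate_query_kb is a hypothetical helper rather than
anything in the Xapian API.

```cpp
#include <cstdint>

// Rough estimate of the memory needed by an expanded query, in KB.
// per_term_kb defaults to the ~55KB figure mentioned in the thread;
// treat it as an assumption, not a property of any Xapian release.
std::uint64_t estimate_query_kb(std::uint64_t terms,
                                std::uint64_t per_term_kb = 55) {
    return terms * per_term_kb;
}

// True if the estimate is larger than the machine's physical memory.
bool exceeds_ram(std::uint64_t terms, std::uint64_t ram_mb) {
    return estimate_query_kb(terms) > ram_mb * 1024;
}
```

With these assumptions, estimate_query_kb(10000) gives 550000 KB, against
524288 KB on a 512 MB server, so exceeds_ram(10000, 512) is true.  The
crossover is at roughly 9500 terms.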

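The memory overhead per term could probably be reduced, but actually
it's probably not useful to expand such short partial terms - a search
for all words starting with the same letter is just going to be too
noisy to be useful, regardless of the resources it would need.  So
my thought would be to add a minimum length for the partial words
which will be expanded under FLAG_PARTIAL, and probably a way to
specify this via the API.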

Richard Boulton | 11 Jan 2011 13:45

Re: Excessive memory use when using FLAG_PARTIAL?

On 11 January 2011 12:41, Olly Betts <olly <at> survex.com> wrote:
> The memory overhead per term could probably be reduced, but actually
> it's probably not useful to expand such short partial terms - a search
> for all words starting with the same letter is just going to be too
> noisy to be useful, regardless of the resources it would need.  So
> my thought would be to add a minimum length for the partial words
> which will be expanded under FLAG_PARTIAL, and probably a way to
> specify this via the API.

Agreed - though perhaps setting a limit on the number of terms it
expands to would be more useful (ie, it can try to expand, and if it
finds more than N terms, it gives up and doesn't generate a query with
extra terms at all).

-- 
Celestial Navigation Limited, incorporated in England & Wales
(registration number 06978117), registered office address: 58
Kingsway, Duxford, Cambridgeshire, CB224QN, UK.

_______________________________________________
Xapian-discuss mailing list
Xapian-discuss <at> lists.xapian.org
http://lists.xapian.org/mailman/listinfo/xapian-discuss
Olly Betts | 11 Jan 2011 13:49

Re: Printing all records when search is not present

On Tue, Dec 28, 2010 at 05:30:34PM -0500, Shripad Bodas wrote:
> I want to print all the records from the passed database when the
> search string is not present. I quickly skimmed through the API but I
> couldn't find a way to do it. Can someone give me some pointers?

Just use Xapian::Query::MatchAll and AND_NOT.  So if q is the Query object
from parsing your search string:

Xapian::Query all_but_q(Xapian::Query::AND_NOT, Xapian::Query::MatchAll, q);

MatchAll is very efficient if there are no gaps in the docid numbering.
If that isn't the case, it's more efficient with chert (default backend
in Xapian 1.2) than with flint (default backend in 1.0).

Cheers,
    Olly
Olly Betts | 11 Jan 2011 13:59

Re: Excessive memory use when using FLAG_PARTIAL?

On Tue, Jan 11, 2011 at 12:45:05PM +0000, Richard Boulton wrote:
> On 11 January 2011 12:41, Olly Betts <olly <at> survex.com> wrote:
> > The memory overhead per term could probably be reduced, but actually
> > it's probably not useful to expand such short partial terms - a search
> > for all words starting with the same letter is just going to be too
> > noisy to be useful, regardless of the resources it would need.  So
> > my thought would be to add a minimum length for the partial words
> > which will be expanded under FLAG_PARTIAL, and probably a way to
> > specify this via the API.
> 
> Agreed - though perhaps setting a limit on the number of terms it
> expands to would be more useful (ie, it can try to expand, and if it
> finds more than N terms, it gives up and doesn't generate a query with
> extra terms at all).

The problem there is you do significant work before deciding that you
aren't going to expand after all.  So both limits are probably useful.
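
To illustrate how the two limits could interact, here is a minimal sketch in
C++ over a plain sorted set of terms.  expand_partial, min_prefix_len and
max_terms are hypothetical names for illustration only; this is not the
QueryParser implementation or the patch in the ticket.  In real code the
candidate terms would come from the database (for example by iterating from
Xapian::Database::allterms_begin(prefix)) rather than from a std::set.

```cpp
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical helper: expand `prefix` to its completions, applying both
// limits.  An empty result means "add no expansion terms to the query".
std::vector<std::string> expand_partial(const std::set<std::string>& terms,
                                        const std::string& prefix,
                                        std::size_t min_prefix_len,
                                        std::size_t max_terms) {
    // Limit 1: reject short prefixes before looking at the term list at all.
    if (prefix.size() < min_prefix_len) return {};

    std::vector<std::string> out;
    // Terms sharing a prefix are contiguous in sorted order, so start at the
    // first term >= prefix and stop at the first term that lacks the prefix.
    for (auto it = terms.lower_bound(prefix);
         it != terms.end() && it->compare(0, prefix.size(), prefix) == 0;
         ++it) {
        // Limit 2: more than max_terms completions, so give up.  The
        // iteration done up to this point is the wasted work.
        if (out.size() == max_terms) return {};
        out.push_back(*it);
    }
    return out;
}
```

The length check costs nothing, while the term cap is only reached after up
to max_terms + 1 terms have been read, which is why the cheap check goes
first.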

Incidentally, there's already a ticket for the "term limit" feature for
FLAG_WILDCARD:

http://trac.xapian.org/ticket/350

Cheers,
    Olly
Adam Sjøgren | 11 Jan 2011 15:02

Re: Excessive memory use when using FLAG_PARTIAL?

On Tue, 11 Jan 2011 12:59:32 +0000, Olly wrote:

> On Tue, Jan 11, 2011 at 12:45:05PM +0000, Richard Boulton wrote:

>> Agreed - though perhaps setting a limit on the number of terms it
>> expands to would be more useful (ie, it can try to expand, and if it
>> finds more than N terms, it gives up and doesn't generate a query with
>> extra terms at all).

> The problem there is you do significant work before deciding that you
> aren't going to expand after all.  So both limits are probably useful.

> Incidentally, there's already a ticket for the "term limit" feature for
> FLAG_WILDCARD:

> http://trac.xapian.org/ticket/350

Speaking of which - is there anything I can do with the patch in ticket
350 to push it further along?

Ought the problem be solved in a different way, or?

  Best regards,

    Adam

-- 
 "Accept the mystery!"                                        Adam Sjøgren
                                                         asjo <at> koldfront.dk


Olly Betts | 11 Jan 2011 15:28

Re: Excessive memory use when using FLAG_PARTIAL?

On Tue, Jan 11, 2011 at 03:02:30PM +0100, Adam Sjøgren wrote:
> > http://trac.xapian.org/ticket/350
> 
> Speaking of which - is there anything I can do with the patch in ticket
> 350 to push it further along?
> 
> Ought the problem be solved in a different way, or?

I think the only issue is that we may want to push the expansion of the
wildcard out of the QueryParser and into the query optimiser or the
matcher itself, and then we'd have to expand twice to throw the
exception from QueryParser.

I'm not sure there is a better way to address that than just documenting
that the exception may be thrown later in future releases though.  I've
set the milestone to 1.2.5 to make sure we consider it for the next
release.

Cheers,
    Olly
