Chris Hostetter | 1 Sep 2008 02:05

Re: distributed search mechanism


: This is typically the kind of description I need, but I wonder if the 
: one cited above is still valid (since it was apparently written quite a 
: time before final commit).
: Assuming it is, what's then the difference between the STEPS mentioned 
: and the STAGES later introduced (STAGE_START, STAGE_PARSE_QUERY, etc...) ?

I don't have any answers to your questions, but I would like to suggest 
that whoever does have the answers (ahem: Yonik?) update 
"DistributedSearchDesign" to reflect reality (I think it was originally 
just some brainstorming notes) and link to it from "DistributedSearch" ...

http://wiki.apache.org/solr/DistributedSearch
http://wiki.apache.org/solr/DistributedSearchDesign

-Hoss

sanraj25 | 1 Sep 2008 08:23

Index partioning


Hi,
      I read the document at http://wiki.apache.org/solr/IndexPartitioning
Now I want to partition my Solr index into two. Based on that document I
changed solrconfig.xml, but I can't see any partition folder other than
the default one. I need help with index partitioning; please give me some
suggestions.
Thanks in advance

-Santhanaraj

-- 
View this message in context: http://www.nabble.com/Index-partioning-tp19249441p19249441.html
Sent from the Solr - User mailing list archive at Nabble.com.

Erik Hatcher | 1 Sep 2008 10:12

Re: Index partioning

That wiki page is purely an idea proposal at this time, not a feature  
of Solr (yet or perhaps ever).

	Erik


Tobias Hill | 1 Sep 2008 11:16

Conditional caching

Hi all,

Is there any way to prevent a certain query from being added to the
caches (or from affecting cache statistics) in Solr?

*Reason:* We have a very search-oriented website. The SEO aspects
of the site are also important, which is why almost the entire search space
is traversable by indexing bots (Googlebot, for instance). These bots
are a substantial part of the traffic on the site*. Needless to say, the
usage pattern of a bot is very different from that of a human being ... and
in short, the bots are filling the caches with "corner data" from the
search space. As a consequence, human-initiated searches suffer
a lot and are far from *as cached as they could be*.

I have no problem with serving a bot a cached page, the only problem
is that the bots are allowed to be part of the cache-statistics.

Is there any way to easily suppress this?

Best regards,
Tobias

*) Actually this is not rare; see the book "Release It!: Design and Deploy
   Production-Ready Software" for more details on this reality.

Shalin Shekhar Mangar | 1 Sep 2008 13:47

Re: Conditional caching

If you are serving cached queries to the bot, what would be the benefit of
suppressing those queries from figuring into the cache statistics page?


Tobias Hill | 1 Sep 2008 16:14

Re: Conditional caching

Maybe I was a bit unclear, let me try in other words.

I didn't have the statistics page in mind. All I care about is that I don't
want a massive amount of bot-generated queries to affect the internal
statistics of the caches in Solr. If caching could be switched off for
bot queries, the cache would reflect the human search pattern much better.
This in turn would increase the cache hit rate enormously for the clients
that we care most about (i.e. humans).

Think about it: Say that you have 10-20 queries per second coming from
bots exploring the corners of your data (because that is what they do best)
... wouldn't you consider it a problem that each such result (which is
highly unlikely to get another hit during its lifetime) gets cached,
pushing other (possibly human-generated) items out of the cache in LRU
fashion?

Most other cache solutions I've worked with offer ways to handle things like
this by providing silent ways (statistics-wise) to get the data from the
cache.

For instance, we are using EHCache for another part of our application like
this:

  Result result =
     search.isCacheUpdateAllowed() ? cache.get(search) : cache.getQuietly(search);
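
To make the idea concrete, here is a minimal sketch of an LRU cache with a
"quiet" lookup path. It is neither Solr's cache nor EHCache's implementation:
the class QuietLruCache and all of its members are hypothetical, and it
assumes null values are never stored.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;

// Hypothetical illustration only; not part of Solr or EHCache.
class QuietLruCache<K, V> {
    private final int capacity;
    // Insertion-ordered map: the first key is the least recently used one.
    private final LinkedHashMap<K, V> map = new LinkedHashMap<>();
    long lookups; // number of normal (non-quiet) lookups
    long hits;    // number of normal lookups that found an entry

    QuietLruCache(int capacity) {
        this.capacity = capacity;
    }

    // Normal lookup: counted in the statistics and refreshes recency.
    V get(K key) {
        lookups++;
        V value = map.remove(key);
        if (value != null) {
            hits++;
            map.put(key, value); // re-insert at the most-recent end
        }
        return value;
    }

    // Quiet lookup: no statistics update and no change to eviction order.
    V getQuietly(K key) {
        return map.get(key);
    }

    // Store an entry, evicting the least recently used one when full.
    void put(K key, V value) {
        map.remove(key);
        map.put(key, value);
        if (map.size() > capacity) {
            Iterator<K> eldest = map.keySet().iterator();
            eldest.next();
            eldest.remove();
        }
    }

    boolean contains(K key) {
        return map.containsKey(key);
    }
}
```

A bot request would call getQuietly and skip put on a miss, so it can still
be served from the cache without changing what gets evicted.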

Ryan McKinley | 1 Sep 2008 16:32

Re: Conditional caching

I get what you are trying to do... yes, Googlebot essentially fills
up the cache with edge cases.

There is nothing in Solr that lets you use the cache for some queries
and not others. Given the way parts of Solr work, it is also a bad idea
to turn off caching completely (a Document may be retrieved a few times
within a single request).

One idea (I don't know if it is a good one): if you are in a
load-balanced environment, you could send all the bot-based requests to a
single machine or set of machines while normal requests use the whole
cluster. This would keep the caches on most of the machines filled with
common 'user' requests.
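
A minimal sketch of that routing decision, assuming bots can be recognised
by a substring of the User-Agent header. The host names, the marker list and
the class BotRouter are all made up for illustration; in practice this logic
would more likely live in the load balancer's own configuration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical illustration only; host names and markers are assumptions.
final class BotRouter {
    // Machines reserved for crawler traffic.
    static final List<String> BOT_POOL = Arrays.asList(
            "http://solr-bot1.example:8983/solr");

    // Normal requests may use the whole cluster.
    static final List<String> USER_POOL = Arrays.asList(
            "http://solr1.example:8983/solr",
            "http://solr2.example:8983/solr",
            "http://solr-bot1.example:8983/solr");

    // Lower-case substrings that identify a crawler's User-Agent.
    static final List<String> BOT_MARKERS = Arrays.asList(
            "googlebot", "bingbot", "slurp");

    private BotRouter() {
    }

    // True if the User-Agent contains one of the known crawler markers.
    static boolean isBot(String userAgent) {
        if (userAgent == null) {
            return false;
        }
        String ua = userAgent.toLowerCase(Locale.ROOT);
        for (String marker : BOT_MARKERS) {
            if (ua.contains(marker)) {
                return true;
            }
        }
        return false;
    }

    // Pool of Solr base URLs that may serve this request.
    static List<String> poolFor(String userAgent) {
        return isBot(userAgent) ? BOT_POOL : USER_POOL;
    }

    // Round-robin choice within the selected pool.
    static String pick(String userAgent, int requestCounter) {
        List<String> pool = poolFor(userAgent);
        return pool.get(Math.floorMod(requestCounter, pool.size()));
    }
}
```

With this split, the rarely repeated bot queries only ever reach the caches
of the machines in BOT_POOL.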

ryan


Shalin Shekhar Mangar | 1 Sep 2008 16:34

Re: Conditional caching

Apart from hacking the internals, there's nothing inside Solr which will let
you do that. EHCache is for application-layer caches; Solr is an external
server, so it can't know about your application. I think that over a period
of time, the caches will be back to normal (through user-generated requests)
and it shouldn't be a big problem.

How slow are your user queries becoming? Will it help if you limit all bot
queries to a certain fixed number of Solr instances?


Walter Underwood | 1 Sep 2008 18:07

Re: Conditional caching

How many documents do you have in your index? How many unique
queries per day, bot and human? What are your cache hit ratios?

Maybe you can increase the size of the caches and not worry about
it. Search engine position is important. Have marketing pay for
the extra memory (I'm not kidding).

Sending all the bot queries to a separate machine is also
a reasonable approach. Heck, bill that machine to marketing!

wunder


Erik Holstad | 1 Sep 2008 19:15

Problems connecting to service!

Hi!
I'm trying to use SolrJ to work with Solr and tried the example at
http://e-mats.org/tag/solrj/
but I get this error message:

Sep 1, 2008 9:49:09 AM org.apache.commons.httpclient.HttpMethodDirector executeWithRetry
INFO: I/O exception (java.net.ConnectException) caught when processing request: Connection refused
Sep 1, 2008 9:49:09 AM org.apache.commons.httpclient.HttpMethodDirector executeWithRetry
INFO: Retrying request
Sep 1, 2008 9:49:09 AM org.apache.commons.httpclient.HttpMethodDirector executeWithRetry
INFO: I/O exception (java.net.ConnectException) caught when processing request: Connection refused
Sep 1, 2008 9:49:09 AM org.apache.commons.httpclient.HttpMethodDirector executeWithRetry
INFO: Retrying request
Sep 1, 2008 9:49:09 AM org.apache.commons.httpclient.HttpMethodDirector executeWithRetry
INFO: I/O exception (java.net.ConnectException) caught when processing request: Connection refused
Sep 1, 2008 9:49:09 AM org.apache.commons.httpclient.HttpMethodDirector executeWithRetry
INFO: Retrying request
org.apache.solr.client.solrj.SolrServerException: Error executing query
    at org.apache.solr.client.solrj.request.QueryRequest.process(QueryRequest.java:96)
    at org.apache.solr.client.solrj.SolrServer.query(SolrServer.java:109)