Lance Norskog | 1 Nov 2010 03:12

Re: problem of solr replication's speed

If you are copying from an indexer while you are indexing new content,
this would cause contention for the disk head. Does indexing slow down
during this period?

Lance

2010/10/31 Peter Karich <peathal <at> yahoo.de>:
>  we have an identical-sized index and it takes ~5 minutes
>
>
>> It takes about one hour to replicate a 6G index for solr in my env. But my
>> network can transfer files at about 10-20M/s using scp. So solr's http
>> replication is too slow. Is that normal, or am I doing something wrong?
>>
>
>

-- 
Lance Norskog
goksron <at> gmail.com

Lance Norskog | 1 Nov 2010 03:23

Re: Design and Usage Questions

2.
The SolrJ library's handling of content streams is "pull", not "push".
That is, you give it a reader and it pulls content when it feels like
it. If the software feeding the connection wants to write the data
itself, you have to either buffer the whole thing or use a dual-thread
writer/reader pair.
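
To make the dual-thread writer/reader idea concrete, here is a minimal
sketch of the pattern. It is written in Python purely for illustration and
does not touch SolrJ at all; in Java the analogous pieces would be
java.io.PipedOutputStream and PipedInputStream. The names push_to_pull and
export_document are made up for this example.

```python
import os
import threading


def push_to_pull(produce):
    """Bridge a "push" producer to a "pull" consumer with a pipe.

    produce(writer) is run in a background thread and writes bytes to
    writer; the returned reader can be handed to code that wants to pull.
    """
    read_fd, write_fd = os.pipe()
    reader = os.fdopen(read_fd, "rb")
    writer = os.fdopen(write_fd, "wb")

    def run():
        try:
            produce(writer)
        finally:
            writer.close()  # closing the write end signals end-of-stream

    thread = threading.Thread(target=run, daemon=True)
    thread.start()
    return reader, thread


def export_document(out):
    # Hypothetical producer standing in for "the repository writes to a stream".
    for i in range(3):
        out.write(f"line {i}\n".encode("utf-8"))


reader, thread = push_to_pull(export_document)
pulled = reader.read()  # the consumer pulls until the writer closes
thread.join()
reader.close()
print(pulled)
```

The simpler alternative, buffering the whole thing, amounts to collecting
the bytes first and wrapping them in an in-memory reader; that is fine as
long as the documents fit comfortably in memory.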

The easiest way to pull stuff from SVN is to use one of the web server
apps. Solr takes a "stream.url" parameter (and also "stream.file"). Note
that outbound authentication is not supported; your web server
has to be open (at least to the Solr instance).
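
As a rough sketch of what a "stream.url" request could look like, the
Python below only builds the request URL. The Solr base URL, the
"/update/extract" handler path, and the SVN document URL are all
assumptions for illustration; adjust them to your setup. As far as I know,
remote streaming also has to be enabled in solrconfig.xml for this to work.

```python
from urllib.parse import urlencode


def stream_url_request(solr_base, handler, document_url):
    """Build an update URL that asks Solr to fetch document_url itself."""
    params = urlencode({"stream.url": document_url, "commit": "true"})
    return f"{solr_base}{handler}?{params}"


# All three values below are hypothetical placeholders.
url = stream_url_request(
    "http://localhost:8983/solr",
    "/update/extract",
    "http://svn.example.com/repo/trunk/doc.pdf",
)
print(url)
```

The point of the encoding step is that the document URL travels as a
parameter value, so its own ":" and "/" characters must be escaped.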

On Sun, Oct 31, 2010 at 4:06 PM, getagrip <getagrip <at> web.de> wrote:
> Hi,
>
> I've got some basic usage / design questions.
>
> 1. The SolrJ wiki proposes to use the same CommonsHttpSolrServer
>   instance for all requests to avoid connection leaks.
>   So if I create a Singleton instance upon application-startup I can
>   securely use this instance for ALL queries/updates throughout my
>   application without running into performance issues?
>
> 2. My System's documents are stored in a Subversion repository.
>   For fast searchresults I want to periodically index new documents
>   from the repository.
>
>   What I get from the repository is a ByteArrayOutputStream. How can I
>   pass this Stream to Solr?
>

Lance Norskog | 1 Nov 2010 03:26

Re: Solr in virtual host as opposed to /lib

With virtual hosting you can give CPU & memory quotas to your
different VMs. This allows you to control the Nutch vs. The World
problem. Unfortunately, you cannot allocate the disk channel. With two
I/O-bound apps, this is a problem.

On Sun, Oct 31, 2010 at 4:38 PM, Eric Martin <eric <at> makethembite.com> wrote:
> Excellent information. Thank you. Solr is acting just fine then. I can
> connect to it with no issues, it indexes fine and there didn't seem to be
> any complication with it. Now I can rule it out and go about solving what
> you pointed out, and I agree, to be a java/nutch issue.
>
> Nutch is a crawler I use to feed URLs into Solr for indexing. Nutch is open
> source and found on apache.org
>
> Thanks for your time.
>
> -----Original Message-----
> From: Jonathan Rochkind [mailto:rochkind <at> jhu.edu]
> Sent: Sunday, October 31, 2010 4:33 PM
> To: solr-user <at> lucene.apache.org
> Subject: RE: Solr in virtual host as opposed to /lib
>
> What servlet container are you putting your Solr in? Jetty? Tomcat?
> Something else?  Are you fronting it with apache on top of that? (I think
> maybe you are, otherwise I'm not sure how the phrase 'virtual host'
> applies).
>
> In general, Solr of course doesn't care what directory it's in on disk, so
> long as the process running solr has the necessary read/write permissions to
> the necessary directories (and if it doesn't, you'd usually find out right

Eric Martin | 1 Nov 2010 04:09

RE: Solr in virtual host as opposed to /lib

Oh. So I should take out the installations and move them to /<some_dir> as opposed to inside my virtual host
of /home/<my solr & nutch is here>/www?

-----Original Message-----
From: Lance Norskog [mailto:goksron <at> gmail.com] 
Sent: Sunday, October 31, 2010 7:26 PM
To: solr-user <at> lucene.apache.org
Subject: Re: Solr in virtual host as opposed to /lib

With virtual hosting you can give CPU & memory quotas to your
different VMs. This allows you to control the Nutch vs. The World
problem. Unfortunately, you cannot allocate the disk channel. With two
I/O-bound apps, this is a problem.

On Sun, Oct 31, 2010 at 4:38 PM, Eric Martin <eric <at> makethembite.com> wrote:
> Excellent information. Thank you. Solr is acting just fine then. I can
> connect to it with no issues, it indexes fine and there didn't seem to be
> any complication with it. Now I can rule it out and go about solving what
> you pointed out, and I agree, to be a java/nutch issue.
>
> Nutch is a crawler I use to feed URLs into Solr for indexing. Nutch is open
> source and found on apache.org
>
> Thanks for your time.
>
> -----Original Message-----
> From: Jonathan Rochkind [mailto:rochkind <at> jhu.edu]
> Sent: Sunday, October 31, 2010 4:33 PM
> To: solr-user <at> lucene.apache.org

sivaprasad | 1 Nov 2010 03:40

Filtering results based on score


Hi,
As part of the Solr results I am able to get the max score. If I want to
filter the results based on the max score, let's say the max score is 10 and
I need only the results between the max score and 50% of the max score. This
max score is going to change dynamically. How can we implement this? Do we
need to customize Solr? Any suggestions, please.

Regards,
JS
-- 
View this message in context: http://lucene.472066.n3.nabble.com/Filtering-results-based-on-score-tp1819769p1819769.html
Sent from the Solr - User mailing list archive at Nabble.com.

sivaprasad | 1 Nov 2010 03:48

Solr Relevancy Calculation


Hi,
I have 25 indexed fields in my document. But by default, if I give
"q=laptops", this is going to search on five fields, and I am getting the
score as part of the search results. How will Solr calculate the score? Is it
going to calculate it only on the five fields or on all 25 indexed fields?
In what order does it calculate the score? Any documents related to
this topic would be helpful.

Regards,
JS
-- 
View this message in context: http://lucene.472066.n3.nabble.com/Solr-Relevency-Calculation-tp1819798p1819798.html
Sent from the Solr - User mailing list archive at Nabble.com.

sivaprasad | 1 Nov 2010 03:55

Boosting the score based on certain field


Hi,

In my document I have a field called category. This contains
"electronics", "games", etc. For some of the category values I need to boost
the document score. Let us say, for the "electronics" category, I will set a
boosting parameter greater than for the "games" category. Does anybody have
an idea how to achieve this functionality?

Regards,
Siva

-- 
View this message in context: http://lucene.472066.n3.nabble.com/Boosting-the-score-based-on-certain-field-tp1819820p1819820.html
Sent from the Solr - User mailing list archive at Nabble.com.

Ahmet Arslan | 1 Nov 2010 09:59

Re: Filtering results based on score

> As part of the Solr results I am able to get the max score. If I want to
> filter the results based on the max score, let's say the max score is 10
> and I need only the results between the max score and 50% of the max
> score. This max score is going to change dynamically. How can we implement
> this? Do we need to customize Solr? Any suggestions, please.

frange is advised in a similar discussion:
http://search-lucene.com/m/4AHNF17wIJW1/
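
For what it's worth, here is a small sketch of the "50% of max score" rule
applied on the client side, plus a helper that formats an frange filter
string. The Python is illustrative only; the document dicts and function
names are made up, and the "{!frange l=...}query($q)" form is the idiom as
I understand it from discussions like the one linked, so please verify it
against your Solr version. Note that the cutoff depends on the max score,
which you only know after a first query.

```python
def keep_top_fraction(docs, fraction=0.5):
    """Keep docs whose score is at least `fraction` of the best score."""
    if not docs:
        return []
    max_score = max(d["score"] for d in docs)
    cutoff = fraction * max_score
    return [d for d in docs if d["score"] >= cutoff]


def frange_filter(cutoff):
    """Format a filter that keeps only docs scoring >= cutoff (assumed syntax)."""
    return "{!frange l=%s}query($q)" % cutoff


# Hypothetical results, shaped like (id, score) pairs from a response.
results = [
    {"id": "a", "score": 10.0},
    {"id": "b", "score": 6.0},
    {"id": "c", "score": 5.0},
    {"id": "d", "score": 4.9},
]
kept = keep_top_fraction(results)
print([d["id"] for d in kept])
print(frange_filter(0.5 * results[0]["score"]))
```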

Pawan Darira | 1 Nov 2010 10:00

Multiple Keyword Search

Hi

There is a situation where I search for more than one keyword, and my two
main fields are ad_title & ad_description.
I want the results which match all of the keywords across both fields to
come on top. Then the keywords can be dropped one by one in further
results.

E.g., in a search of 3 keywords, say there are 100 results. If 35 contain all
the keywords combined in ad_title & ad_description, then they should come
first. Then if 50 results contain a combination of any 2 keywords, they should
come next. Finally, results with a single keyword should come last.
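
The ordering described above can be sketched in a few lines. This is only
a plain-Python illustration of the desired ranking (count how many distinct
keywords each ad matches across ad_title and ad_description, then sort by
that count); it is not how Solr computes scores, and the sample ads and the
function name are made up.

```python
def rank_by_keyword_coverage(ads, keywords):
    """Order ads by how many distinct keywords they match, most first."""
    wanted = {k.lower() for k in keywords}

    def coverage(ad):
        # Treat both fields as one bag of lowercase words.
        text = (ad["ad_title"] + " " + ad["ad_description"]).lower()
        return len(wanted & set(text.split()))

    matched = [ad for ad in ads if coverage(ad) > 0]
    # sorted() is stable, so ads with equal coverage keep their input order.
    return sorted(matched, key=coverage, reverse=True)


# Hypothetical sample data.
ads = [
    {"id": 1, "ad_title": "red bike", "ad_description": "nearly new"},
    {"id": 2, "ad_title": "cheap red bike", "ad_description": "almost new"},
    {"id": 3, "ad_title": "blue car", "ad_description": "fast"},
    {"id": 4, "ad_title": "bike", "ad_description": "old"},
]
ranked = rank_by_keyword_coverage(ads, ["cheap", "red", "bike"])
print([ad["id"] for ad in ranked])
```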

Please suggest

-- 
Thanks,
Pawan Darira

kafka0102 | 1 Nov 2010 10:30

Re: Re: problem of solr replication's speed

I hacked SnapPuller to log the cost, and the log looks like this:
[2010-11-01 17:21:19][INFO][pool-6-thread-1][SnapPuller.java(1037)]readFully1048576 cost 979
[2010-11-01 17:21:19][INFO][pool-6-thread-1][SnapPuller.java(1037)]readFully1048576 cost 4
[2010-11-01 17:21:19][INFO][pool-6-thread-1][SnapPuller.java(1037)]readFully1048576 cost 4
[2010-11-01 17:21:20][INFO][pool-6-thread-1][SnapPuller.java(1037)]readFully1048576 cost 980
[2010-11-01 17:21:20][INFO][pool-6-thread-1][SnapPuller.java(1037)]readFully1048576 cost 4
[2010-11-01 17:21:20][INFO][pool-6-thread-1][SnapPuller.java(1037)]readFully1048576 cost 5
[2010-11-01 17:21:21][INFO][pool-6-thread-1][SnapPuller.java(1037)]readFully1048576 cost 979

It's saying that roughly every third read of 1M of data costs about 1000ms, while the reads in between take
only a few ms. I used jetty as the server and embedded solr in my app. I'm so confused. What have I done wrong?
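
As a back-of-the-envelope check, the seven log lines above can be turned
into a throughput estimate. The sketch below assumes the "cost" values are
milliseconds, that "6G" means 6 GiB, and that this short sample is
representative of the whole transfer; the function name is made up. Under
those assumptions it works out to about 2.4 MiB/s, or roughly 43 minutes
for the full index, which is in the same ballpark as the reported hour.

```python
import re

# The "readFully<bytes> cost <ms>" tails of the seven log lines quoted above.
LOG_LINES = [
    "readFully1048576 cost 979",
    "readFully1048576 cost 4",
    "readFully1048576 cost 4",
    "readFully1048576 cost 980",
    "readFully1048576 cost 4",
    "readFully1048576 cost 5",
    "readFully1048576 cost 979",
]

PATTERN = re.compile(r"readFully(\d+) cost (\d+)")


def throughput_mib_per_s(lines):
    """Total MiB read divided by total seconds spent in readFully."""
    total_bytes = 0
    total_ms = 0
    for line in lines:
        match = PATTERN.search(line)
        if match:
            total_bytes += int(match.group(1))
            total_ms += int(match.group(2))
    return (total_bytes / 2**20) / (total_ms / 1000.0)


rate = throughput_mib_per_s(LOG_LINES)
minutes_for_6g = (6 * 1024 / rate) / 60  # assumes 6G = 6 GiB
print(f"{rate:.2f} MiB/s, about {minutes_for_6g:.0f} minutes for 6 GiB")
```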

At 2010-11-01 10:12:38,"Lance Norskog" <goksron <at> gmail.com> wrote:

>If you are copying from an indexer while you are indexing new content,
>this would cause contention for the disk head. Does indexing slow down
>during this period?
>
>Lance
>
>2010/10/31 Peter Karich <peathal <at> yahoo.de>:
>>  we have an identical-sized index and it takes ~5 minutes
>>
>>
>>> It takes about one hour to replicate a 6G index for solr in my env. But my
>>> network can transfer files at about 10-20M/s using scp. So solr's http
>>> replication is too slow. Is that normal, or am I doing something wrong?
>>>
>>
>>