Shawn Heisey | 1 Jul 2010 01:44

Disk usage per-field

Is it possible for Solr (or Luke/Lucene) to tell me exactly how much of 
the total index disk space is used by each field?  It would also be very 
nice to know, for each field, how much is used by the index and how much 
is used for stored data.

Jason Chaffee | 1 Jul 2010 01:47

RE: REST calls

Using Accept headers is a pretty standard practice and so are conditional GETs.

Quite easy to test with curl:

curl -X GET -H "Accept: application/xml" http://solr.com/search

curl -X GET -H "Accept: application/json" http://solr.com/search

Jason

-----Original Message-----
From: Don Werve [mailto:donw <at> madwombat.com] 
Sent: Tuesday, June 29, 2010 9:40 PM
To: solr-user <at> lucene.apache.org
Subject: Re: REST calls

2010/6/27 Jason Chaffee <jchaffee <at> ebates.com>

> The solr docs say it is RESTful, yet it seems that it doesn't use http
> headers in a RESTful way.  For example, it doesn't seem to use the Accept:
> request header to determine the media-type to be returned.  Instead, it
> requires a query parameter to be used in the URL.  Also, it doesn't seem to
> return 304 Not Modified if the request header "if-modified-since" is
> used.
>

The summary:

Solr is restful, and does a very good job of it.

Jason Chaffee | 1 Jul 2010 01:52

RE: REST calls

In that case, being able to use Accept headers and conditional GETs
would make them more powerful and easier to use.  The Accept header
could be used if present, with the query parameter as the fallback, or
vice versa.  Also, conditional GETs are a big win when you know the data
and results are not changing often.

Jason
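
The fallback described above (use the Accept header if present, otherwise the query parameter) could look roughly like the following Python sketch. It is only an illustration of the idea, not how Solr behaves: the function name `choose_wt`, the media-type table, and the "xml" default are assumptions made for this example, and q-value ranking is deliberately ignored.

```python
# Hypothetical mapping from media types to response-writer names.
MEDIA_TYPE_TO_WT = {
    "application/xml": "xml",
    "text/xml": "xml",
    "application/json": "json",
}


def choose_wt(accept_header=None, wt_param=None, default="xml"):
    """Pick a response format: Accept header first, then the wt parameter."""
    if accept_header:
        # Walk the comma-separated media types in order, dropping
        # parameters such as ";q=0.9" (no q-value ranking in this sketch).
        for item in accept_header.split(","):
            media_type = item.split(";")[0].strip().lower()
            if media_type in MEDIA_TYPE_TO_WT:
                return MEDIA_TYPE_TO_WT[media_type]
    # No usable Accept header: fall back to the query parameter, then the default.
    return wt_param or default


print(choose_wt("application/json"))
print(choose_wt(None, wt_param="json"))
print(choose_wt("text/html", wt_param="xml"))
```

Swapping the two checks gives the "vice versa" variant, where an explicit query parameter overrides the header.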

-----Original Message-----
From: yseeley <at> gmail.com [mailto:yseeley <at> gmail.com] On Behalf Of Yonik
Seeley
Sent: Wednesday, June 30, 2010 7:12 AM
To: solr-user <at> lucene.apache.org
Subject: Re: REST calls

Solr's APIs are described as "REST-like", and probably do qualify as
"restful" the way the term is commonly used.

I'm personally much more interested in making our APIs more powerful
and easier to use, regardless of any REST purity tests.

-Yonik
http://www.lucidimagination.com

Jason Chaffee | 1 Jul 2010 01:58

RE: REST calls

Two more jaxrs solutions:

http://www.jboss.org/resteasy

http://cxf.apache.org/docs/jax-rs.html

However, I am not suggesting changing the core implementation.  Just want to make it more powerful by
utilizing headers.  I can accept the other issues that have been mentioned as not RESTful.  

Also, I do plan to make patches for the issues I mentioned.  I just wanted to know if I was missing anything or
someone else already had contributed an extension.

Jason

-----Original Message-----
From: Ryan McKinley [mailto:ryantxu <at> gmail.com] 
Sent: Wednesday, June 30, 2010 3:07 PM
To: solr-user <at> lucene.apache.org
Subject: Re: REST calls

If there is a real desire/need to make things "restful" in the
official sense, it is worth looking at using a REST framework as the
controller rather than the current solution.  Perhaps:

http://www.restlet.org/
https://jersey.dev.java.net/

These would be cool since they encapsulate lots of the request
plumbing work; it would be better to leverage more widely used
approaches than to support our own.

Peter Spam | 1 Jul 2010 03:21

Re: Very basic questions: Faceted front-end?

Ah, I found this:

	https://issues.apache.org/jira/browse/SOLR-634

... aka "solr-ui".  Is there anything else along these lines?  Thanks!

-Peter

On Jun 30, 2010, at 3:59 PM, Peter Spam wrote:

> Wow, thanks Lance - it's really fast now!
> 
> The last piece of the puzzle is setting up a nice front-end.  Are there any pre-built front-ends available
> that mimic Google (for example), with facets?
> 
> 
> -Peter
> 
> On Jun 29, 2010, at 9:04 PM, Lance Norskog wrote:
> 
>> To highlight a field, Solr needs some extra Lucene values. If these
>> are not configured for the field in the schema, Solr has to re-analyze
>> the field to highlight it. If you want faster highlighting, you have
>> to add term vectors to the schema. Here is the grand map of such
>> things:
>> 
>> http://wiki.apache.org/solr/FieldOptionsByUseCase
>> 
>> On Tue, Jun 29, 2010 at 6:29 PM, Erick Erickson <erickerickson <at> gmail.com> wrote:
>>> What are your actual highlighting requirements? You could try

Yonik Seeley | 1 Jul 2010 03:32

Re: OOM on uninvert field request

On Wed, Jun 30, 2010 at 6:19 PM, Robert Petersen <robertpe <at> buy.com> wrote:
> Most of these hundreds of facet fields have tens of values but a couple have thousands. Is thousands of
> different values too many to do enum, or is that still ok?  If so I could apply it carte blanche to the whole field...

enum can still handle thousands, but is often slower (and remember to
increase the size of your filterCache, which will now see greater
usage).

I would do facet.method=enum for the default and then override that
for those few fields with thousands of unique terms via
f.123_contentAttributeToken.facet.method=fc

-Yonik
http://www.lucidimagination.com
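
As a rough illustration of the advice above, the request parameters could be assembled like this in Python. The parameter names (`facet.method`, `f.<field>.facet.method`) come from the message; the helper name `facet_params`, the `q=*:*` query, and the field name "color" are made up for the example.

```python
from urllib.parse import urlencode


def facet_params(facet_fields, high_cardinality_fields):
    """Build a query string with enum as the default facet method and
    a per-field override to fc for fields with many unique terms."""
    params = [("q", "*:*"), ("facet", "true"), ("facet.method", "enum")]
    for field in facet_fields:
        params.append(("facet.field", field))
    for field in high_cardinality_fields:
        # Per-field override, in the f.<field>.facet.method form.
        params.append((f"f.{field}.facet.method", "fc"))
    return urlencode(params)


query = facet_params(
    ["color", "123_contentAttributeToken"],  # "color" is a made-up field name
    ["123_contentAttributeToken"],
)
print(query)
```

The resulting string can be appended to the select URL of whichever Solr instance is being queried.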

> -----Original Message-----
> From: yseeley <at> gmail.com [mailto:yseeley <at> gmail.com] On Behalf Of Yonik Seeley
> Sent: Wednesday, June 30, 2010 1:38 PM
> To: solr-user <at> lucene.apache.org
> Subject: Re: OOM on uninvert field request
>
> On Tue, Jun 29, 2010 at 7:32 PM, Robert Petersen <robertpe <at> buy.com> wrote:
>> Hello, I am trying to find the right max and min heap settings for Java 1.6 on a 20GB index with 8 million docs,
>> running the 1.6_018 JVM with Solr 1.4. I currently have Java set to an even 4GB (export
>> JAVA_OPTS="-Xmx4096m -Xms4096m") for both min and max, which is doing pretty well, but I am occasionally still
>> getting the below OOM errors.  We're running on dual quad core Xeons with 16GB memory installed.
>>
>> Is the memSize mentioned in the INFO line for the uninvert in bytes? Does memSize=29604020 mean 29MB?
>
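
For reference, the unit conversion in the question works out as follows. This assumes the logged memSize value is in bytes, which the portion of the reply shown here does not confirm.

```python
mem_size = 29604020  # value from the log line, ASSUMED to be in bytes

megabytes = mem_size / 1_000_000     # decimal megabytes (MB)
mebibytes = mem_size / (1024 ** 2)   # binary mebibytes (MiB)

print(f"{megabytes:.1f} MB, {mebibytes:.1f} MiB")  # prints "29.6 MB, 28.2 MiB"
```

Under that assumption, "29MB" is a fair reading in decimal units, and a little over 28 in binary units.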

Koji Sekiguchi | 1 Jul 2010 04:15

Re: Wiki Documentation of facet.sort

(10/07/01 1:12), Chantal Ackermann wrote:
> Hi there,
>
> in the wiki, on http://wiki.apache.org/solr/SimpleFacetParameters
> it says:
>
> """
> The default is true/count if facet.limit is greater than 0, false/index
> otherwise.
> """
>
> I've just migrated to 1.4.1 (reindexed). I can't remember how it was
> with 1.4.0.
>
> When I specify my facet query with facet.mincount=0 (explicitly) or
> without mincount (default is 0), the resulting facets are sorted by
> count, nevertheless. Changing mincount from 0 to 1 and back actually
> makes no difference in the sorting.
> I'm fine with a constant default behaviour (always sorting by count,
> e.g., no matter what parameters are given).
> If this is intended - shall I change the wiki accordingly?
>
> Cheers,
> Chantal
>    
Chantal,

Wiki says "facet.limit" but you are changing "facet.mincount"?
:)
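
To make the distinction concrete, here is a small Python sketch that treats facet.limit, facet.mincount, and facet.sort as three independent request parameters. The helper name `facet_query`, the `q=*:*` query, and the field name "category" are assumptions for illustration; the "count" and "index" values are the ones named in the wiki text quoted above.

```python
from urllib.parse import urlencode


def facet_query(field, limit=None, mincount=None, sort=None):
    """Build a facet request; each option is sent only when given."""
    params = [("q", "*:*"), ("facet", "true"), ("facet.field", field)]
    if limit is not None:
        params.append(("facet.limit", str(limit)))        # how many values to return
    if mincount is not None:
        params.append(("facet.mincount", str(mincount)))  # minimum count to include
    if sort is not None:
        params.append(("facet.sort", sort))               # "count" or "index"
    return urlencode(params)


# Changing mincount alone leaves facet.sort (and its default) untouched.
print(facet_query("category", mincount=0))
print(facet_query("category", mincount=1))
# Setting facet.sort explicitly avoids depending on the default at all.
print(facet_query("category", limit=-1, sort="index"))
```

Passing facet.sort explicitly is the simplest way to get a constant ordering regardless of what facet.limit is set to.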

Erik Hatcher | 1 Jul 2010 04:50

Re: REST calls

Solr has 304 support with the last-modified and etag headers.

	Erik
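
For readers unfamiliar with conditional GETs, the decision a server makes can be sketched in Python as below. This is generic HTTP logic written for illustration, not Solr's implementation; the function name `is_not_modified` and the sample ETag and dates are invented, and weak ETags (the `W/` prefix) are not handled.

```python
from email.utils import parsedate_to_datetime


def is_not_modified(request_headers, etag, last_modified):
    """Return True when a 304 Not Modified response would be appropriate.

    request_headers: dict of request header names to values.
    etag: the current entity tag, including its quotes.
    last_modified: HTTP-date string for the current state of the resource.
    """
    headers = {name.lower(): value for name, value in request_headers.items()}
    # If-None-Match takes precedence over If-Modified-Since when both are sent.
    if "if-none-match" in headers:
        candidates = [tag.strip() for tag in headers["if-none-match"].split(",")]
        return "*" in candidates or etag in candidates
    if "if-modified-since" in headers:
        since = parsedate_to_datetime(headers["if-modified-since"])
        return parsedate_to_datetime(last_modified) <= since
    return False


current_etag = '"v42"'                              # made-up entity tag
current_date = "Wed, 30 Jun 2010 12:00:00 GMT"      # made-up modification time

print(is_not_modified({"If-None-Match": '"v42"'}, current_etag, current_date))
print(is_not_modified({"If-Modified-Since": "Tue, 29 Jun 2010 00:00:00 GMT"},
                      current_etag, current_date))
```

A client that stores the ETag or Last-Modified value from an earlier response and sends it back in these headers can skip re-downloading unchanged results.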

On Jun 30, 2010, at 7:52 PM, Jason Chaffee wrote:

> In that case, being able to use Accept headers and conditional GETs
> would make them more powerful and easier to use.  The Accept header
> could be used if present, with the query parameter as the fallback, or
> vice versa.  Also, conditional GETs are a big win when you know the data
> and results are not changing often.
>
> Jason
>
> -----Original Message-----
> From: yseeley <at> gmail.com [mailto:yseeley <at> gmail.com] On Behalf Of Yonik
> Seeley
> Sent: Wednesday, June 30, 2010 7:12 AM
> To: solr-user <at> lucene.apache.org
> Subject: Re: REST calls
>
> Solr's APIs are described as "REST-like", and probably do qualify as
> "restful" the way the term is commonly used.
>
> I'm personally much more interested in making our APIs more powerful
> and easier to use, regardless of any REST purity tests.
>
> -Yonik

Ravi Kiran | 1 Jul 2010 06:57

Dilemma - Very Frequent Synonym updates for Huge Index

Hello,
        Hoping some Solr guru can help me out here. We are a news
organization trying to migrate 10 million documents from FAST to Solr. The
plan is to have our editorial team add/modify synonyms multiple times during
a day as they deem appropriate. Hence we plan on using query-time synonyms,
as we cannot reindex every time they modify the synonyms file (for the
entities extracted by OpenNLP, like locations/organizations/person names from
the article body). Since the synonyms are for names, I am concerned that the
multi-phrase issue crops up with the query-time synonyms. For example,
synonyms could be as follows:

The Washington Post Co., The Washington Post, Washington Post, The Post,
TWP, WAPO
DHS,D.H.S,D.H.S.,Department of Homeland Security,Homeland Security
USCIS, United States Citizenship and Immigration Services, U.S.C.I.S.

Barack Obama,Barack H. Obama,Barack Hussein Obama,President Obama
Hillary Clinton,Hillary R. Clinton,Hillary Rodham Clinton,Secretary
Clinton,Sen. Clinton
William J. Clinton,William Jefferson Clinton,President Clinton,President
Bill Clinton

Virginia, Va., VA
D.C,Washington D.C, District of Columbia

I have the following fieldType in schema.xml for the keywords/entities. What
issues should I be aware of? And is there a better way to achieve it
without having to reindex a million docs on each synonym change? NOTE that I
use tokenizerFactory="solr.KeywordTokenizerFactory" for the
SynonymFilterFactory to keep the words intact without splitting
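
The effect of treating each comma-separated entry as one intact phrase can be modeled with a short Python sketch. This is a simplified stand-in, not Solr's SynonymFilterFactory: the function names `parse_synonyms` and `expand` are invented, and the case-insensitive lookup is an assumption of this example.

```python
def parse_synonyms(lines):
    """Map each lowercased phrase to its full group of equivalent phrases."""
    table = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        group = [phrase.strip() for phrase in line.split(",") if phrase.strip()]
        for phrase in group:
            table[phrase.lower()] = group
    return table


def expand(query, table):
    """Expand a query only when the WHOLE string matches a synonym phrase,
    which is what keeping the input as a single token implies."""
    return table.get(query.strip().lower(), [query])


synonym_lines = [
    "DHS,D.H.S,D.H.S.,Department of Homeland Security,Homeland Security",
    "Virginia, Va., VA",
]
table = parse_synonyms(synonym_lines)

print(expand("Homeland Security", table))         # whole phrase: expands to the group
print(expand("Homeland Security budget", table))  # extra word: no expansion
```

The second call shows the limitation to watch for in this model: a phrase embedded in a longer query does not match, so expansion only helps when the query is exactly one of the listed names.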

Rakhi Khatwani | 1 Jul 2010 08:35

Re: Unbuffered Exception while setting permissions

Hi Lance,
               Thank you so much. It worked with pre-emptive authentication.
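
Pre-emptive authentication means sending the credentials with the first request instead of waiting for a 401 challenge and retrying, which matters when the request body cannot be replayed. The thread concerns a Java client; the Python sketch below only illustrates the idea. The host, port, credentials, and helper names are placeholders invented for this example.

```python
import base64
import urllib.request


def basic_auth_header(username, password):
    """Build the value of an HTTP Basic Authorization header."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8"))
    return "Basic " + token.decode("ascii")


def build_update_request(url, body, username, password):
    """Create a POST request that already carries credentials, so the
    server never needs to answer the first attempt with a 401."""
    request = urllib.request.Request(url, data=body, method="POST")
    request.add_header("Content-Type", "text/xml; charset=utf-8")
    request.add_header("Authorization", basic_auth_header(username, password))
    return request


# Placeholder URL and credentials; nothing is sent over the network here.
req = build_update_request(
    "http://localhost:8983/solr/GPTWPI/update", b"<commit/>", "admin", "secret"
)
print(req.get_header("Authorization"))
```

In the access log quoted below, the unauthenticated request that gets a 401 followed by the same request with a user name is the challenge-and-retry pattern that pre-emptive authentication avoids.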

On Thu, Jul 1, 2010 at 2:15 AM, Lance Norskog <goksron <at> gmail.com> wrote:

> Other problems with this error have been solved by doing pre-emptive
> authentication.
>
> On Wed, Jun 30, 2010 at 4:26 AM, Rakhi Khatwani <rkhatwani <at> gmail.com>
> wrote:
> > This error usually occurs when I do a server.add(inpDoc).
> >
> > Behind the logs:
> >
> > 192.168.0.106 - - [30/Jun/2010:11:30:38 +0000] "GET
> > /solr/GPTWPI/update?qt=%2Fupdate&optimize=true&wt=javabin&version=1
> > HTTP/1.1" 200 41
> >
> > 192.168.0.106 - - [30/Jun/2010:11:30:38 +0000] "GET
> > /solr/GPTWPI/select?q=aid%3A30234&wt=javabin&version=1 HTTP/1.1" 401 1389
> >
> > 192.168.0.106 - admin [30/Jun/2010:11:30:38 +0000] "GET
> > /solr/GPTWPI/select?q=aid%3A30234&wt=javabin&version=1 HTTP/1.1" 200 70
> >
> > 192.168.0.106 - - [30/Jun/2010:11:30:38 +0000] "POST
> > /solr/GPTWPI/update?wt=javabin&version=1 HTTP/1.1" 200 41 (Works when I
> > comment out the auth-constraint for RW)
> >
> >                                        AND
> >