bq: Why do you say: "at 10M documents there's rarely a need
to shard at all?"

Because I routinely see 50M docs on a single node, and I've seen
over 300M docs on a single node with sub-second responses. So if
you're saying that performance is poor at 1M docs, then I suspect
there's something wrong with your setup: too little memory, very
bad query patterns, whatever. If my suspicion is right, then
sharding will just mask the underlying problem.

You need to quantify your performance concerns. It's one thing to
say "my node satisfies 50 queries-per-second with 500ms response
time" and another to say "my queries take 5,000 ms".

In the first case, you do indeed need to add more servers to
increase QPS if you need 500 QPS. And adding more slaves is the
best way to do that. In the second, you need to understand the
cause of the slowdown; adding servers will just be a band-aid.
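A back-of-the-envelope version of the first case, as a sketch. It assumes each slave sustains roughly the QPS measured on the single node and that throughput adds up linearly across slaves, which real clusters only approximate. The function name and the `headroom` factor are illustrative, not part of any Solr API.

```python
import math


def replicas_needed(target_qps: float, qps_per_replica: float,
                    headroom: float = 1.0) -> int:
    """Rough number of identical query-serving slaves/replicas.

    Assumes linear scaling: N replicas serve N * qps_per_replica.
    headroom > 1.0 reserves spare capacity (e.g. 1.5 = 50% extra).
    """
    if qps_per_replica <= 0:
        raise ValueError("qps_per_replica must be positive")
    return math.ceil(target_qps * headroom / qps_per_replica)


# 50 QPS per node measured, 500 QPS wanted:
print(replicas_needed(500, 50))       # 10
print(replicas_needed(500, 50, 1.5))  # 15
```

More copies of the index do nothing for the 5,000 ms case: each query is still answered by one copy at the same speed, which is why that case calls for finding the cause instead.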
On Wed, Sep 2, 2015 at 8:19 AM, scott chu <scott.chu <at> udngroup.com> wrote:
> Do you mean I only have to put 10M documents in one index and copy it
> to many slaves in a classic Solr master-slave architecture to provide
> querying service on the internet, and there won't be an obvious
> degradation of query performance? But I did add 1M documents into one
> index, with 2 slaves serving queries on the internet, and the query
> performance is kinda sad.
>
> Why do you say: "at 10M documents there's rarely a need to shard at
> all?" Do I provide too few slaves? At what number of documents does
> sharding become necessary?
> ----- Original Message -----
> From: Erick Erickson
> Date: 2015-09-02, 23:00:29
> Subject: Re: concept and choice: custom sharding or auto
> Frankly, at 10M documents there's rarely a need to shard at all.
> Why do you think you need to? This seems like adding complexity for
> no good reason. Sharding should only really be used when you have
> too many documents to fit on a single shard, as it adds some
> overhead and restricts some functionality (cross-core join for
> instance; a couple of options don't work in distributed mode, etc.).
>
> You can still run SolrCloud and have it manage multiple _replicas_
> of a single shard for HA/DR.
>
> So this seems like an XY problem: you're asking specific questions
> about shard routing because you think it'll solve some problem,
> without telling us what the problem is.
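For the single-shard HA/DR setup mentioned above, a minimal sketch of a Collections API CREATE request with one shard and several replicas. The host, collection name, and replica count are hypothetical; depending on the Solr version and setup you may also need to name a config set (the `collection.configName` parameter) or raise `maxShardsPerNode`.

```python
from urllib.parse import urlencode

# Hypothetical Solr node; adjust to your cluster.
SOLR = "http://localhost:8983/solr"


def create_collection_url(name: str, replicas: int) -> str:
    """Collections API CREATE URL for a single-shard collection.

    numShards=1 keeps the whole index in one shard; replicationFactor
    controls how many copies SolrCloud places across nodes for HA/DR.
    """
    params = {
        "action": "CREATE",
        "name": name,
        "numShards": 1,
        "replicationFactor": replicas,
    }
    return f"{SOLR}/admin/collections?{urlencode(params)}"


print(create_collection_url("news", 3))
```

Sending the URL (for example with `urllib.request.urlopen`) requires a running SolrCloud cluster; the sketch only builds it.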
> On Wed, Sep 2, 2015 at 7:47 AM, scott chu <scott.chu <at> udngroup.com> wrote:
>> I posted this question elsewhere, but since this is a mailing list,
>> I'm reposting it below for suggestions and a more detailed
>> understanding of SolrCloud's behavior on document routing.
>>
>> I want to establish a SolrCloud cluster for over 10 million news
>> articles. After reading the article "Shards and Indexing Data in
>> SolrCloud" in the Apache Solr Reference Guide, I have a plan as
>> follows:
>> 1. Add the prefix ED2001! to the document ID, where ED means some
>>    newspaper source and 2001 is the year part of the article's
>>    published date; i.e. I want to put all news articles of a specific
>>    newspaper source published in a specific year into one shard.
>> 2. Create the collection with router.name set to …
>> 3. Add documents?
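The ID scheme in the plan above can be sketched like this. The helper name, the article-number part of the ID, and the `headline_t` field are hypothetical; the `shardKey!docId` shape is the form the compositeId router understands.

```python
import json


def make_doc_id(source: str, year: int, article_id: str) -> str:
    """Hypothetical helper: '<source><year>!<article_id>'.

    The part before '!' is the shard key, so every article from the
    same source and year shares one shard key.
    """
    if "!" in source or "!" in article_id:
        raise ValueError("'!' separates the shard key from the rest")
    return f"{source}{year}!{article_id}"


# Hypothetical documents; 'headline_t' is an illustrative field name.
docs = [
    {"id": make_doc_id("ED", 2001, "000123"), "headline_t": "..."},
    {"id": make_doc_id("ED", 2001, "000124"), "headline_t": "..."},
]
print(json.dumps(docs))  # body for a JSON update request
```

As far as I know, with the compositeId router the prefix inside `id` is all the routing information an update needs, so these documents can go through an ordinary update request without extra routing parameters.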
>> Practically, I have some questions:
>>
>> 1. How do I add documents based on this plan? Do I have to specify
>>    special parameters when updating the collection/core?
>> 2. Is this called "custom sharding"? If not, what is "custom
>>    sharding"?
>> 3. Is auto sharding a better choice for my case, since there's a
>>    shard-splitting feature for auto sharding when a shard gets too
>>    big?
>> 4. Can I query without _router_ …?
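On the question about querying without `_router_`: the request parameter is `_route_`, and it limits a query to the shard(s) that own a given shard key. Without it, SolrCloud sends the query to every shard and merges the results, so querying still works, just against all shards. A sketch of building both URLs; the host and collection name are hypothetical.

```python
from typing import Optional
from urllib.parse import urlencode

# Hypothetical node and collection.
SELECT = "http://localhost:8983/solr/news/select"


def select_url(query: str, route: Optional[str] = None) -> str:
    """Build a /select URL; route is a shard key such as 'ED2001!'."""
    params = {"q": query, "wt": "json"}
    if route is not None:
        # Only the shard(s) owning this shard key are queried.
        params["_route_"] = route
    return f"{SELECT}?{urlencode(params)}"


print(select_url("title:election"))             # fans out to all shards
print(select_url("title:election", "ED2001!"))  # only ED2001's shard
```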
>> EDIT @ 2015/9/2:
>> This is how I think SolrCloud will behave: "The number of news
>> articles from a specific newspaper source in a specific year tends
>> to be around a fixed number, e.g. every year ED has around 80,000
>> articles, so each shard's size won't increase dramatically. For next
>> year's news articles of ED, I only have to add the prefix 'ED2016!'
>> to the document ID, and SolrCloud will create a new shard for me
>> (containing all ED2016 articles); later the leader will spread
>> replicas of this new shard to other nodes (one replica per node,
>> other than the leader?)". Am I right? If yes, it seems there's no
>> need for …
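For comparison with the expectation above, here is a sketch of how, as I understand Solr's CompositeIdRouter, a prefixed ID is placed. The router combines the upper 16 bits of the MurmurHash3 (x86, 32-bit) hash of the shard key with the lower 16 bits of the hash of the rest of the ID, then sends the document to whichever existing shard's hash range contains the result. A new prefix such as 'ED2016!' therefore lands in one of the shards the collection already has, possibly alongside other prefixes; no shard is created for it. The hashing details reflect my reading of the router and should be treated as an approximation: only the two-part `key!rest` form is handled, shard ranges are assumed to be equal slices, and the returned index is illustrative, not a Solr shard name.

```python
MASK32 = 0xFFFFFFFF


def _rotl32(x: int, r: int) -> int:
    return ((x << r) | (x >> (32 - r))) & MASK32


def murmur3_32(data: bytes, seed: int = 0) -> int:
    """MurmurHash3 x86 32-bit, returned as an unsigned int."""
    c1, c2 = 0xCC9E2D51, 0x1B873593
    h = seed & MASK32
    nblocks = len(data) // 4
    for i in range(nblocks):
        k = int.from_bytes(data[4 * i:4 * i + 4], "little")
        k = _rotl32((k * c1) & MASK32, 15)
        h ^= (k * c2) & MASK32
        h = (_rotl32(h, 13) * 5 + 0xE6546B64) & MASK32
    tail = data[4 * nblocks:]
    k = 0
    if len(tail) >= 3:
        k ^= tail[2] << 16
    if len(tail) >= 2:
        k ^= tail[1] << 8
    if tail:
        k ^= tail[0]
        k = _rotl32((k * c1) & MASK32, 15)
        h ^= (k * c2) & MASK32
    # Finalization mix.
    h ^= len(data)
    h ^= h >> 16
    h = (h * 0x85EBCA6B) & MASK32
    h ^= h >> 13
    h = (h * 0xC2B2AE35) & MASK32
    h ^= h >> 16
    return h


def composite_hash(doc_id: str) -> int:
    """Signed 32-bit hash, compositeId style (two-part 'key!rest' only)."""
    if "!" in doc_id:
        key, rest = doc_id.split("!", 1)
        h = ((murmur3_32(key.encode("utf-8")) & 0xFFFF0000)
             | (murmur3_32(rest.encode("utf-8")) & 0x0000FFFF))
    else:
        h = murmur3_32(doc_id.encode("utf-8"))
    return h - (1 << 32) if h >= (1 << 31) else h  # Java int range


def shard_index(doc_id: str, num_shards: int) -> int:
    """0-based shard whose equal slice of [-2^31, 2^31) holds the hash."""
    offset = composite_hash(doc_id) + (1 << 31)  # 0 .. 2^32 - 1
    return (offset * num_shards) >> 32


ids = ["ED2001!1", "ED2001!2", "ED2016!1"]
print({i: shard_index(i, 4) for i in ids})
```

If one shard per source and year is really the goal, that is closer to the implicit router, where shards are named and added explicitly (the Collections API CREATESHARD action) instead of being derived from hash ranges.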