The state of filtered replication
Stefan du Fresne <stefan@...
2016-05-25 08:34:52 GMT
I work on an app that involves a large amount of CouchDB filtered replication (every user has a filtered
subset of the DB locally via PouchDB). Currently filtered replication is our number 1 performance
bottleneck for rolling out to more users, and I'm trying to work out where we can go from here.
Our current setup is one CouchDB database and N PouchDB installations, which all two-way replicate, with
the CouchDB->PouchDB replication being filtered based on user permissions / relevance.
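
To make the setup concrete, here is a minimal sketch of what a per-user filter could look like. A CouchDB filter function takes the document and the request and returns true when the document should be replicated. The `allowedUsers` field, the `user` query parameter, and the name `relevantToUser` are all hypothetical, since the post does not describe the real schema or the real permission logic.

```typescript
// Hypothetical document shape; the real schema is not given in the post.
interface Doc {
  _id: string;
  allowedUsers?: string[]; // hypothetical permission field
}

// The part of the filter request used here: the replication's query parameters.
interface FilterRequest {
  query: { user?: string };
}

// Shaped like a CouchDB filter function: return true to replicate the document.
function relevantToUser(doc: Doc, req: FilterRequest): boolean {
  const user = req.query.user;
  if (user === undefined) {
    return false; // no user supplied, so replicate nothing
  }
  const allowed = doc.allowedUsers || [];
  return allowed.indexOf(user) !== -1;
}
```

On the PouchDB side, a filter like this would be selected through the `filter` and `query_params` replication options, with the design document name being whatever the app uses.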
Our issue is that as we add users a) total document creation velocity increases, and b) the proportion of
documents that are relevant to any particular user decreases. These two points cause replication, both
initial onboarding and continual, to take longer and longer.
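
A rough way to see why this gets worse faster than linearly: the server evaluates the filter against every change in the feed for every replicating user, whether or not the change is relevant to them. The sketch below is a deliberately simplified model, assuming every user creates the same number of documents and each user replicates the whole feed once. The function names and the numbers in the usage note are illustrative, not measurements from our system.

```typescript
// Total filter evaluations when every user replicates the full feed once.
// Assumption: each user creates `docsPerUser` documents.
function filterEvaluations(users: number, docsPerUser: number): number {
  const totalChanges = users * docsPerUser; // every change is seen by every filter run
  return users * totalChanges;              // grows with the square of the user count
}

// Fraction of the feed that one user actually keeps, assuming a fixed
// number of relevant documents per user.
function relevantFraction(
  users: number,
  docsPerUser: number,
  relevantPerUser: number
): number {
  return relevantPerUser / (users * docsPerUser);
}
```

Under these assumptions, going from 10 to 20 users at 100 documents each raises the filter evaluations from 10,000 to 40,000, while the share of the feed each user keeps halves.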
At this stage we are being forced to manually limit the number of users we onboard at any particular time to
half a dozen or so, or risk CouchDB being unresponsive. As we'd want to be onboarding 50-100 at any
particular time due to how we're rolling out, you can imagine that this is pretty painful.
I have already rewritten the filter in Erlang, which halved its execution time. That is awesome!
I also attempted to simplify the filter to increase performance. However, filter speed seems more
dependent on the physical size of your filter as opposed to what code executes, which makes writing a
simple filter that can fall back to a complicated filter not terribly useful (see:
If the above linked ticket is fixed (if it can be) this would make our filter 3-4x faster again. However, this
still wouldn't address the fundamental issue that filtered replication is very CPU-intensive, and so as
noted above doesn't seem to scale terribly well.
Ideally then, I would like to remove filtered replication completely, but there does not seem to be a good
alternative right now.