1 Jun 2011 01:16
Re: Semi-frequent lockup / "crash" in random nodes in ejabberd cluster
Evgeniy Khramtsov <xramtsov <at> gmail.com>
2011-05-31 23:16:07 GMT
01.06.2011 01:18, Armando Di Cianno wrote:
> I'm having an odd case of freezing / "crashing" on seemingly random
> nodes in my 10 machine ejabberd cluster.
>
> Symptoms:
>
> * Seemingly with no periodicity, one of the nodes in the cluster will
> freeze (the erlang process inside the beam.smp, not the VM)
> * It doesn't crash (ergo the earlier "crash" scare-quotes), so
> there's no good erl crash dump file to look at
> * The OS beam.smp process is still running, so there are some crash
> logs coming from our monitoring agent as it tries to restart ejabberd
> and *that* process crashes, since a node is already using that name
> * The few times I've been right at my workstation, and able to log in
> and manually check what's going on, `ejabberdctl status` fails to run
> manually / connect to the ejabberd process
>
> Notes
> * 10 machines?! Yeah ... this is running on a managed VM service,
> where we control everything about the guest VMs, but nothing about the
> host machines. Suffice to say, our web services don't seem to exhibit
> related issues, and I believe I have nearly exhausted all routes to
> put blame on the fact that we're using VMs (although, frankly, I'm
> still suspect).
> * The machines seem to be over-provisioned for RAM, running ~4GiB
> each -- our stats aggregator shows that ejabberd rarely takes up
> more than 1.8GiB of RAM per node
>
> * Average user count: ~4k
> [...]