As far as I can tell, it crashes even before it uses swap. I
had one test where it didn’t use swap space at all. Were you able to find
any reason for the memory usage to shoot up like that?
Bryan
From: ejabberd-bounces <at> jabber.ru [mailto:ejabberd-bounces <at> jabber.ru] On Behalf Of Staudinger, Ulrich
Sent: Monday, February 06, 2006 12:41 PM
To: ejabberd <at> jabber.ru
Subject: AW: [ejabberd] Jab_simul crashing ejabberd service
That basically matches my experience.
Everything goes fine until the server
starts to swap. So try not to push it into swapping.
We have had 20k users, each sending 1 message per minute
(20k messages/minute in total), on a 1GB machine for days. No
problems so far.
The restriction that I found is the
memory consumption: it climbs linearly until it goes through the ceiling. With 1GB of
RAM, 22k users is the max for our Linux machine.
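
For a rough back-of-the-envelope check of those figures, here is a small sketch. It assumes, purely for illustration, that the whole 1GB is available to ejabberd and that memory grows strictly linearly with the number of connected users; neither is confirmed above, and the function names are made up.

```python
# Back-of-the-envelope numbers from the figures quoted above.
# Assumptions (not confirmed in the thread): all of the 1 GB goes to
# ejabberd, and memory use is strictly linear in the number of users.

TOTAL_RAM_BYTES = 1 * 1024**3   # the 1GB machine
MAX_USERS = 22_000              # observed ceiling on that machine


def bytes_per_user(total_bytes: int, users: int) -> float:
    """Upper bound on memory per connected user under the linear assumption."""
    return total_bytes / users


def messages_per_second(users: int, messages_per_user_per_minute: float) -> float:
    """Aggregate message rate for a given per-user rate."""
    return users * messages_per_user_per_minute / 60.0


per_user = bytes_per_user(TOTAL_RAM_BYTES, MAX_USERS)
print(f"about {per_user / 1024:.1f} KiB per connected user")
print(f"about {messages_per_second(20_000, 1):.0f} messages/second at 20k users")
```

The per-user figure is an upper bound, since the operating system and the Erlang VM itself also take a share of the RAM.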
Cheers,
Ulrich
From: ejabberd-bounces <at> jabber.ru [mailto:ejabberd-bounces <at> jabber.ru] On Behalf Of Bryan Barnes
Sent: Monday, February 6, 2006 19:36
To: ejabberd <at> jabber.ru
Subject: [ejabberd] Jab_simul crashing ejabberd service
Hello,
I am running Ejabberd 1.0.0 with Erlang 10B-8. I am using Jab_simul to
benchmark the system, and am using
http://tkabber.jabber.ru/files/badlop/jab_simul.xml.chat60 as a baseline for my
testing. I have modified it for my server, and turned off rostering. I can run
the test successfully, and as I lower the message frequency my performance
degrades as I expected it to. However, I notice that during these tests my
server occasionally spikes in memory usage and begins using swap space, then
the jabber service refuses all further connections.
The test will run with no errors for about an hour with message
frequency of 500 ms, then in the space of 5 minutes the memory usage will spike
from 400MB to 2GB and the swap space usage will jump from 0MB to 1GB. Then all
current connections are dumped and all further connections are refused. Even if
I restart the ejabberd service I am unable to log in, and have to restart the
server before I can connect again.
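
To pin down when the spike starts, it may help to sample the resident memory of the beam process at regular intervals. The following is only a sketch for Linux: it reads the VmRSS field from /proc/<pid>/status, and the helper names are hypothetical. It also works out the growth rate implied by the numbers above, taking 2GB as 2048MB.

```python
import re
from typing import Optional


def parse_vm_rss_kb(status_text: str) -> Optional[int]:
    """Extract VmRSS (in kB) from the text of /proc/<pid>/status.

    Returns None when the field is absent, which is the case for
    zombie processes and kernel threads.
    """
    match = re.search(r"^VmRSS:\s+(\d+)\s+kB", status_text, re.MULTILINE)
    return int(match.group(1)) if match else None


def read_rss_kb(pid: int) -> Optional[int]:
    """Read the current resident set size of a process (Linux only)."""
    with open(f"/proc/{pid}/status") as status_file:
        return parse_vm_rss_kb(status_file.read())


def growth_rate_mb_per_s(start_mb: float, end_mb: float, seconds: float) -> float:
    """Average memory growth over an interval."""
    return (end_mb - start_mb) / seconds


# Spike described above: 400MB to 2GB (taken as 2048MB) within 5 minutes.
rate = growth_rate_mb_per_s(400, 2048, 5 * 60)
print(f"average growth during the spike: {rate:.1f} MB/s")
```

Calling read_rss_kb with the pid of the beam process once per second and logging the result with a timestamp would show whether the growth is gradual or sudden.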
I am running Gentoo (kernel 2.6.14-r5) with 2GB of memory and a 1Gb NIC. I have
ejabberd starting as a service with the following command line:
ulimit -n 15000;/usr/local/bin/erl -pa /var/lib/ejabberd/ebin -sname
ejabberd -s ejabberd -env ERL_MAX_PORTS 5000 -env ERL_MAX_ETS_TABLES 20000
-ejabberd config \"/etc/ejabberd/ejabberd.cfg\" log-path
\"/var/log/ejabberd.log\" -sasl sasl_error_logger
\{file,\"var/log/ejabberd/sasl.log\"\} -mnesia dir
\"var/lib/ejabberd/spool\" +P 250000 +K true -detached
After troubleshooting:
No error messages appear on the ejabberd server.
I checked the ejabberd.log, sasl.log, and the server logs. All of the errors
appeared on the Jab_simul server.
I figured out that every time the ejabberd server
crashed, the jab_simul server had run out of disk space. I assumed
this was unrelated, but set up a cron job to delete the tmp log files. This did
make the error go away, and I was able to run a simulation for 70 hours this
weekend with no errors. I had 500 users with a message frequency of 100 ms
using 160MB of memory.
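
For anyone who wants to reproduce the cleanup, a job along these lines could be run from cron. This is only a sketch: the directory, the file pattern, and the age threshold are placeholders, since the actual location of the jab_simul tmp log files is not given here.

```python
import time
from pathlib import Path
from typing import List, Optional


def delete_old_files(directory: str, max_age_seconds: float,
                     pattern: str = "*.log",
                     now: Optional[float] = None) -> List[Path]:
    """Delete files in `directory` matching `pattern` that were last
    modified more than `max_age_seconds` ago. Returns the deleted paths.

    `directory` and `pattern` are placeholders; point them at wherever
    the simulator actually writes its temporary logs.
    """
    now = time.time() if now is None else now
    deleted = []
    for path in Path(directory).glob(pattern):
        # Only regular files are removed; subdirectories are left alone.
        if path.is_file() and now - path.stat().st_mtime > max_age_seconds:
            path.unlink()
            deleted.append(path)
    return deleted
```

Something like delete_old_files("/path/to/tmp/logs", 3600) would then remove logs older than an hour; that path is hypothetical.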
I was able to recreate the error even with
these settings by adding an additional 300 users, bringing me to 800 total. The
job runs for 5 minutes, then I start getting "Kolejka za dluga, pakiet
anulowany!" errors (Polish for "Queue too long, packet cancelled!"). Shortly
after that I get "POLLERR: Connection terminated" and
"POLLERR: Connection refused" errors. All of these errors occur on the
Jab_simul server.
Checking my ejabberd server, my beam
service has become a zombie process, and is still running, but refuses all
connections. The memory usage on the server spiked and then came back down
after the beam service crashed.
If I dial the message frequency back to
500 ms with 800 users, it runs.
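
To compare the three runs on one scale, here is a quick calculation of the offered load. It assumes that a "message frequency of N ms" means each simulated user sends one message every N ms; that reading of the jab_simul setting is a guess, not something confirmed here.

```python
# Offered load for the three runs described above.
# Assumption: "message frequency of N ms" means every simulated user
# sends one message every N milliseconds.


def offered_load(users: int, interval_ms: float) -> float:
    """Aggregate messages per second generated by all users."""
    return users * 1000.0 / interval_ms


runs = [
    (500, 100, "ran for 70 hours"),
    (800, 100, "crashed after about 5 minutes"),
    (800, 500, "ran"),
]

for users, interval_ms, outcome in runs:
    load = offered_load(users, interval_ms)
    print(f"{users} users @ {interval_ms} ms: {load:.0f} msg/s ({outcome})")
```

Under that reading, the 800-user run at 500 ms generates less traffic than the 500-user run at 100 ms that completed, which would be consistent with total message rate, rather than user count alone, being what triggers the failure.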
I don't understand the interaction between
ejabberd and jab_simul well enough to know why this is happening, but I am
concerned that the POLLERR errors are causing my ejabberd server to crash. I
was unable to find any mention of this problem. Has anything like it occurred
before?
After even more troubleshooting:
I have discovered that if I restart the
ejabberd service, then I am still unable to connect using a Jabber client, but
that if I restart the server I am able to connect again.
If I leave the Jab_simul simulation
running, it will reconnect to Jabber after a server restart, but not a service
restart. For all of the test data above I was using the default message
in the .xml file.
Any assistance with this would be greatly
appreciated. Thank you.
Bryan Barnes