I'm trying to use Celery+RabbitMQ to run analysis on Amazon AWS Spot Instances.
The issue with Spot Instances is that the entire instance can be shut down instantaneously and without notice (akin to someone pulling the power cord of a server).
For my application, such shutdown events are fine: each analysis chunk is idempotent and can be restarted without a problem.
What I'm trying to achieve is to set up Celery + RabbitMQ in such a way that a killed/lost task is properly detected and restarted.
It would even be acceptable if the task were only marked as failed due to a server error; I would then add application-level logic to re-submit it.
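For illustration, the application-level fallback could look roughly like the sketch below. It is only a sketch under assumptions: `run_with_resubmit`, `submit` and `get_state` are hypothetical names, not part of Celery. In practice `submit` would wrap the task's `apply_async` call and `get_state` would read the `state` attribute of the returned result.

```python
from time import sleep

def run_with_resubmit(submit, get_state, max_attempts=3, poll_interval=1.0):
    """Submit a task and re-submit it whenever it ends in FAILURE.

    submit()          -> handle for a newly submitted task (hypothetical callable)
    get_state(handle) -> Celery-style state string, e.g. 'PENDING', 'STARTED',
                         'SUCCESS' or 'FAILURE' (hypothetical callable)
    Returns (handle, attempt_number) for the attempt that succeeded.
    """
    for attempt in range(1, max_attempts + 1):
        handle = submit()
        while True:
            state = get_state(handle)
            if state == 'SUCCESS':
                return handle, attempt
            if state == 'FAILURE':
                break  # re-submitting is safe because each chunk is idempotent
            sleep(poll_interval)
    raise RuntimeError('task failed %d times' % max_attempts)
```

This only helps if a failure is actually reported; a task that stays in STARTED forever would keep the inner loop polling.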
The problem is that I'm not able to set up Celery or RabbitMQ to properly detect such an event.
I've found only one thread on a similar topic:
But I wasn't even able to get a FAIL status.
I've tested it with the following "task" script:
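from celery import Celery
from time import sleep

app = Celery('tasks', backend='amqp', broker='amqp://guest-iquZ65Jdg7V54TAoqtyWWQ@public.gmane.org//')
# Apply the settings to the app; bare module-level assignments have no effect.
app.conf.update(
    BROKER_HEARTBEAT = 10,
    CELERY_ACKS_LATE = True,
    CELERYD_PREFETCH_MULTIPLIER = 1,
    CELERY_TRACK_STARTED = True,
)

# Register the function as a task so that add.apply_async() works.
@app.task
def add(x, y):
    return x + y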
And started it inside a virtual machine (emulating an AWS Spot instance) using:
celery worker -A tasks -l INFO -Q ec2
Then submitted a task and monitored it like so (from the host):
from tasks import add
from time import sleep, strftime

a = add.apply_async((1, 2), queue='ec2')
# Print the task state once per second until the task finishes.
while not a.ready():
    print("%s: %s = %s" % (strftime("%H:%M:%S"), a.id, a.state))
    sleep(1)
Starting the "submit" script, I see:
18:35:57: eab3656b-2b48-4408-a8f6-1b4f45bd379f = STARTED
18:35:58: eab3656b-2b48-4408-a8f6-1b4f45bd379f = STARTED
18:35:59: eab3656b-2b48-4408-a8f6-1b4f45bd379f = STARTED
The celery log in the virtual machine shows that the task is started.
I then kill the virtual machine running the task (simulating AWS Spot instance termination without any proper shutdown).
But the "monitor" script still shows the state as "STARTED": the disappearance of the celery server is not detected at all (even after waiting several minutes).
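Since no failure is ever reported, the only client-side fallback I can think of is a deadline: treat a task that has not finished within some time budget as lost. A minimal sketch, assuming hypothetical names (`wait_with_deadline`, `get_state`, and the injectable `clock` and `sleep` arguments are mine, not Celery APIs); `get_state` would read the task state the same way the monitor script does:

```python
import time

def wait_with_deadline(get_state, deadline_s, poll_interval=1.0,
                       clock=time.monotonic, sleep=time.sleep):
    """Poll until the task finishes or deadline_s seconds have elapsed.

    Returns 'SUCCESS' or 'FAILURE' if the task finished in time, or 'LOST'
    (a label made up for this sketch, not a Celery state) otherwise.
    """
    start = clock()
    while clock() - start < deadline_s:
        state = get_state()
        if state in ('SUCCESS', 'FAILURE'):
            return state
        sleep(poll_interval)
    return 'LOST'
```

A deadline is only a guess at how long a chunk should take, though, so I would much rather have Celery or RabbitMQ detect the lost worker.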
Am I missing a configuration option?
Any suggestions appreciated,