I am writing a Python application with data warehousing aspects, and was inspired by Roger Barnes' talk and slides (https://www.youtube.com/watch?v=AhIoAMltzVw) to try and figure out if I can use Celery to organize my application.
One of the difficulties that I am encountering is that I'm not sure how to organize and schedule my tasks given that:
1) Many of them rely on *slow* external databases which I am scraping in batch mode; it is slow and inefficient to run multiple simultaneous instances of these tasks.
2) Many of them should run periodically, but others are highly contingent on the outcome of previous tasks.
To deal with (1), I'm using a locking strategy similar to what is discussed here (http://stackoverflow.com/questions/20894771/celery-beat-limit-to-single-task-instance-at-a-time): I take an advisory lock on my PostgreSQL DB via a context manager that wraps tasks which must not run concurrently:

@app.task(bind=True)
def slow_task(self, *args):
    # do something, guarded by the advisory-lock context manager
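A minimal sketch of that context manager, assuming psycopg2-style connections (the names `advisory_lock`, `conn`, and `lock_id` are illustrative, not settled parts of my code): `pg_try_advisory_lock` returns a boolean immediately, so a second instance of the task can skip its work rather than block.

```python
from contextlib import contextmanager

@contextmanager
def advisory_lock(conn, lock_id):
    """Try to take a session-level Postgres advisory lock; yield whether we got it."""
    with conn.cursor() as cur:
        cur.execute("SELECT pg_try_advisory_lock(%s)", (lock_id,))
        acquired = cur.fetchone()[0]
    try:
        yield acquired
    finally:
        if acquired:
            with conn.cursor() as cur:
                cur.execute("SELECT pg_advisory_unlock(%s)", (lock_id,))
```

Inside the task body this becomes `with advisory_lock(conn, 42) as got:` followed by an early return when `got` is False.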
For (2), I thought the simplest solution would be to use the chain signature to order multiple tasks; however, I've found that it's not obvious how to "break the chain" and stop processing if an earlier task produces no useful results. For example, how can I skip the later tasks in this chain if the earlier ones find no data to feed them as input?
ch = celery.chain(slow_external_DB_task_1.s(),
                  slow_external_DB_task_2.s(),
                  process_results.s())
Are there any examples of publicly-available data warehousing code structured around Celery which I could follow to find some possible design patterns for my application?
You received this message because you are subscribed to the Google Groups "celery-users" group.