I am seeing tasks seemingly "disappear" in Celery when running with 2 nodes. It seems to happen at random. The task is created like this:
task = perform_advance.apply_async(...)
logger.info('Task created, id: {}'.format(task.task_id))
When this works, I will see something like:
[2016-04-21 01:13:02,470: INFO/Worker-8] foo.tasks.some_task[e52615da-de7a-49de-88d6-b3ca43a3383f]: Task created, id: eaaeb427-a167-4a78-ba39-4803e20cc753
[2016-04-29 21:18:40,667: DEBUG/MainProcess] Task accepted: foo.tasks.some_task[eaaeb427-a167-4a78-ba39-4803e20cc753] pid:1104
But when it fails, I only see the task being created and never see it being accepted. There are no errors in the logs.
celery version: 3.1.23
rabbitmq version: 3.3.3
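Since the task is logged as created but never as accepted, the message is most likely being lost somewhere between the apply_async call and the worker, either on its way to the message broker (RabbitMQ in this case) or inside it. There are a few possible reasons why this might be happening: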
The message broker is down or experiencing connectivity issues. This could cause the task messages to be lost or not delivered.
The message broker is overloaded or experiencing high latency. This could delay delivery long enough that the task appears lost, or cause messages to be dropped.
There is an issue with task serialization. If the task arguments cannot be serialized by the sender or deserialized by the worker, the task may not be delivered or processed properly. Return values do not affect delivery, since they are only serialized after the task has run.
There is a problem with task routing. If the routing key or queue name does not match a queue that a worker is consuming from, RabbitMQ may discard the message as unroutable or place it in a queue that nobody reads.
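To rule out the serialization cause quickly, the arguments can be round-tripped through a serializer outside Celery. The following is only a rough sketch: find_unserializable is a hypothetical helper, not part of Celery, and it approximates Celery's serializers with the standard library's pickle (the default task serializer in Celery 3.1) and json modules.

```python
import json
import pickle


def find_unserializable(args=(), kwargs=None):
    """Return (serializer_name, error) pairs for each serializer that
    cannot round-trip the given task arguments.

    Hypothetical helper for illustration; it uses the stdlib pickle and
    json modules as stand-ins for Celery's own serializers.
    """
    payload = {"args": list(args), "kwargs": dict(kwargs or {})}
    failures = []

    # pickle is the default task serializer in Celery 3.1.
    try:
        pickle.loads(pickle.dumps(payload))
    except Exception as exc:  # pickle can raise several exception types
        failures.append(("pickle", repr(exc)))

    # json is the usual alternative (set via CELERY_TASK_SERIALIZER).
    try:
        json.loads(json.dumps(payload))
    except (TypeError, ValueError) as exc:
        failures.append(("json", repr(exc)))

    return failures
```

Calling find_unserializable with the same arguments that are passed to perform_advance.apply_async should return an empty list. A non-empty result names the serializer that rejected them.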
To troubleshoot this issue, you can try the following:
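Check the status of the message broker to ensure it is running and reachable from every node (for example with rabbitmqctl status).
Monitor the broker's performance to see whether it is overloaded. Running rabbitmqctl list_queues name messages consumers shows whether messages are piling up in a queue and whether that queue has any consumers.
Check the task serialization settings to ensure that the task arguments can be serialized by the sender and deserialized by the workers.
Check the task routing settings to ensure that tasks are published to a queue that your workers actually consume from.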
Run the workers with debug logging (--loglevel=DEBUG) on every node to get more information about the tasks that are being received and processed.
If none of the above explains it, try reproducing the problem with a different message broker to determine whether the issue is specific to RabbitMQ.
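Because the failure is intermittent, it can help to correlate the logs automatically rather than by eye. The sketch below assumes the two log formats shown in the question ("Task created, id: ..." and "Task accepted: ...[...]"). find_unaccepted and the regular expressions are hypothetical names written for this answer, not part of Celery.

```python
import re

# UUID pattern matching the task IDs shown in the question's log lines.
UUID = r"[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}"

# "Task created, id: <uuid>" is the application's own logger.info message.
CREATED_RE = re.compile(r"Task created, id: (" + UUID + r")")
# "Task accepted: <task name>[<uuid>]" is logged by the worker at DEBUG level.
ACCEPTED_RE = re.compile(r"Task accepted: \S+\[(" + UUID + r")\]")


def find_unaccepted(lines):
    """Return task IDs that were logged as created but never logged as
    accepted, in the order they were created."""
    created = []
    accepted = set()
    for line in lines:
        match = CREATED_RE.search(line)
        if match:
            created.append(match.group(1))
            continue
        match = ACCEPTED_RE.search(line)
        if match:
            accepted.add(match.group(1))
    return [task_id for task_id in created if task_id not in accepted]
```

Feed it the combined log lines from both nodes, since a task created on one node may be accepted on the other. The "Task accepted" line only appears at DEBUG level, so every worker needs to run with debug logging for the result to be meaningful. The IDs it returns are the ones to look for on the broker side.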