Celery dies with DBPageNotFoundError - python

I have 3 machines with celery workers and rabbitmq as a broker, one worker is running with beat flag, all of this is managed by supervisor, and sometimes celery dies with such error.
This error appears only on beat worker, but when it appears, workers on all machines dies.
(celery==3.1.12, kombu==3.0.20)
[2014-07-05 08:37:04,297: INFO/MainProcess] Connected to amqp://user:**#192.168.15.106:5672//
[2014-07-05 08:37:04,311: ERROR/Beat] Process Beat
Traceback (most recent call last):
File "/var/projects/env/local/lib/python2.7/site-packages/billiard/process.py", line 292, in _bootstrap
self.run()
File "/var/projects/env/local/lib/python2.7/site-packages/celery/beat.py", line 527, in run
self.service.start(embedded_process=True)
File "/var/projects/env/local/lib/python2.7/site-packages/celery/beat.py", line 453, in start
humanize_seconds(self.scheduler.max_interval))
File "/var/projects/env/local/lib/python2.7/site-packages/kombu/utils/__init__.py", line 322, in __get__
value = obj.__dict__[self.__name__] = self.__get(obj)
File "/var/projects/env/local/lib/python2.7/site-packages/celery/beat.py", line 491, in scheduler
return self.get_scheduler()
File "/var/projects/env/local/lib/python2.7/site-packages/celery/beat.py", line 486, in get_scheduler
lazy=lazy)
File "/var/projects/env/local/lib/python2.7/site-packages/celery/utils/imports.py", line 53, in instantiate
return symbol_by_name(name)(*args, **kwargs)
File "/var/projects/env/local/lib/python2.7/site-packages/celery/beat.py", line 357, in __init__
Scheduler.__init__(self, *args, **kwargs)
File "/var/projects/env/local/lib/python2.7/site-packages/celery/beat.py", line 184, in __init__
self.setup_schedule()
File "/var/projects/env/local/lib/python2.7/site-packages/celery/beat.py", line 376, in setup_schedule
self._store['entries']
File "/usr/lib/python2.7/shelve.py", line 121, in __getitem__
f = StringIO(self.dict[key])
File "/usr/lib/python2.7/bsddb/__init__.py", line 270, in __getitem__
return _DeadlockWrap(lambda: self.db[key]) # self.db[key]
File "/usr/lib/python2.7/bsddb/dbutils.py", line 68, in DeadlockWrap
return function(*_args, **_kwargs)
File "/usr/lib/python2.7/bsddb/__init__.py", line 270, in <lambda>
return _DeadlockWrap(lambda: self.db[key]) # self.db[key]
DBPageNotFoundError: (-30985, 'DB_PAGE_NOTFOUND: Requested page not found')

I've ran into this issue and the cause was a corrupted db file (usually named "celerybeat-schedule").
Solution would be to delete the existing db file and restart the process.
Relavent:bsddb.db.DBPageNotFoundError
https://mail.python.org/pipermail/python-list/2009-October/554552.html

I had to remove some temp files in the /tmp directory. One was named celeryd-<NAME_OF_WORKER>-state and also celeryd-<NAME_OF_WORKER>-state-renamed. After removing those and I was able to restart my affected worker.

Related

Celery TypeError: unhashable type: 'dict'

I'm trying to run celery, and can't run it because of the following exception:
[2023-02-14 11:25:11,689: CRITICAL/MainProcess] Unrecoverable error: TypeError("unhashable type: 'dict'")
Traceback (most recent call last):
File "/Users/shira/PycharmProjects/demo/venv/lib/python3.10/site-packages/celery/worker/worker.py", line 203, in start
self.blueprint.start(self)
File "/Users/shira/PycharmProjects/demo/venv/lib/python3.10/site-packages/celery/bootsteps.py", line 116, in start
step.start(parent)
File "/Users/shira/PycharmProjects/demo/venv/lib/python3.10/site-packages/celery/bootsteps.py", line 365, in start
return self.obj.start()
File "/Users/shira/PycharmProjects/demo/venv/lib/python3.10/site-packages/celery/worker/consumer/consumer.py", line 332, in start
blueprint.start(self)
File "/Users/shira/PycharmProjects/demo/venv/lib/python3.10/site-packages/celery/bootsteps.py", line 116, in start
step.start(parent)
File "/Users/shira/PycharmProjects/demo/venv/lib/python3.10/site-packages/celery/worker/consumer/consumer.py", line 628, in start
c.loop(*c.loop_args())
File "/Users/shira/PycharmProjects/demo/venv/lib/python3.10/site-packages/celery/worker/loops.py", line 94, in asynloop
update_qos()
File "/Users/shira/PycharmProjects/demo/venv/lib/python3.10/site-packages/kombu/common.py", line 435, in update
return self.set(self.value)
File "/Users/shira/PycharmProjects/demo/venv/lib/python3.10/site-packages/kombu/common.py", line 428, in set
self.callback(prefetch_count=new_value)
File "/Users/shira/PycharmProjects/demo/venv/lib/python3.10/site-packages/celery/worker/consumer/tasks.py", line 43, in set_prefetch_count
return c.task_consumer.qos(
File "/Users/shira/PycharmProjects/demo/venv/lib/python3.10/site-packages/kombu/messaging.py", line 558, in qos
return self.channel.basic_qos(prefetch_size,
File "/Users/shira/PycharmProjects/demo/venv/lib/python3.10/site-packages/amqp/channel.py", line 1894, in basic_qos
return self.send_method(
File "/Users/shira/PycharmProjects/demo/venv/lib/python3.10/site-packages/amqp/abstract_channel.py", line 79, in send_method
return self.wait(wait, returns_tuple=returns_tuple)
File "/Users/shira/PycharmProjects/demo/venv/lib/python3.10/site-packages/amqp/abstract_channel.py", line 99, in wait
self.connection.drain_events(timeout=timeout)
File "/Users/shira/PycharmProjects/demo/venv/lib/python3.10/site-packages/amqp/connection.py", line 525, in drain_events
while not self.blocking_read(timeout):
File "/Users/shira/PycharmProjects/demo/venv/lib/python3.10/site-packages/amqp/connection.py", line 531, in blocking_read
return self.on_inbound_frame(frame)
File "/Users/shira/PycharmProjects/demo/venv/lib/python3.10/site-packages/amqp/method_framing.py", line 77, in on_frame
callback(channel, msg.frame_method, msg.frame_args, msg)
File "/Users/shira/PycharmProjects/demo/venv/lib/python3.10/site-packages/amqp/connection.py", line 537, in on_inbound_method
return self.channels[channel_id].dispatch_method(
File "/Users/shira/PycharmProjects/demo/venv/lib/python3.10/site-packages/amqp/abstract_channel.py", line 156, in dispatch_method
listener(*args)
File "/Users/shira/PycharmProjects/demo/venv/lib/python3.10/site-packages/amqp/channel.py", line 1629, in _on_basic_deliver
fun(msg)
File "/Users/shira/PycharmProjects/demo/venv/lib/python3.10/site-packages/kombu/messaging.py", line 626, in _receive_callback
return on_m(message) if on_m else self.receive(decoded, message)
File "/Users/shira/PycharmProjects/demo/venv/lib/python3.10/site-packages/celery/worker/consumer/consumer.py", line 591, in on_task_received
strategy = strategies[type_]
TypeError: unhashable type: 'dict'
I tried to uninstall celery, stop rabbitMQ process, and googled it and didn't find any solution.
I run a simple basic code of celery using only one function ("add", without any dictionary).
I think maybe there is some issues with the libraries I import.
I put a breakpoint where the exception is thrown.
I found out that I sent the illegal task a few days ago, so I understood I need to remove it from the queue so other tasks could be done.
I used this command:
celery -A tasks purge
This solved me the issue :)

Python Redis - RuntimeError pubsub connection not set

Version: redis-py=3.1.0 and redis=3.2.10
Platform: Python 2.7.5 / CentOS Linux release 7.4.1708 (Core)
Infrastructure:
two machines (worker1, worker2 ) for running celery worker services with default concurrency (=8).
one dedicated machine (redis1) for running redis server.
Issue:
After the workers running for some time, suddenly a worker running on machine1 dies due to a RuntimeError raised losing a connection to pubsub.
machine1.worker.log
[2019-02-01 13:43:39,477: CRITICAL/MainProcess] Unrecoverable error: RuntimeError(u'pubsub connection not set: did you forget to call subscribe() or psubscribe()?',)
Traceback (most recent call last):
File "/opt/c1/cip-middleware/webapp/virtualenv/lib/python2.7/site-packages/celery/worker/worker.py", line 205, in start
self.blueprint.start(self)
File "/opt/c1/cip-middleware/webapp/virtualenv/lib/python2.7/site-packages/celery/bootsteps.py", line 119, in start
step.start(parent)
File "/opt/c1/cip-middleware/webapp/virtualenv/lib/python2.7/site-packages/celery/bootsteps.py", line 369, in start
return self.obj.start()
File "/opt/c1/cip-middleware/webapp/virtualenv/lib/python2.7/site-packages/celery/worker/consumer/consumer.py", line 322, in start
blueprint.start(self)
File "/opt/c1/cip-middleware/webapp/virtualenv/lib/python2.7/site-packages/celery/bootsteps.py", line 119, in start
step.start(parent)
File "/opt/c1/cip-middleware/webapp/virtualenv/lib/python2.7/site-packages/celery/worker/consumer/consumer.py", line 598, in start
c.loop(*c.loop_args())
File "/opt/c1/cip-middleware/webapp/virtualenv/lib/python2.7/site-packages/celery/worker/loops.py", line 91, in asynloop
next(loop)
File "/opt/c1/cip-middleware/webapp/virtualenv/lib/python2.7/site-packages/kombu/asynchronous/hub.py", line 354, in create_loop
cb(*cbargs)
File "/opt/c1/cip-middleware/webapp/virtualenv/lib/python2.7/site-packages/kombu/transport/redis.py", line 1047, in on_readable
self.cycle.on_readable(fileno)
File "/opt/c1/cip-middleware/webapp/virtualenv/lib/python2.7/site-packages/kombu/transport/redis.py", line 344, in on_readable
chan.handlers[type]()
File "/opt/c1/cip-middleware/webapp/virtualenv/lib/python2.7/site-packages/kombu/transport/redis.py", line 674, in _receive
ret.append(self._receive_one(c))
File "/opt/c1/cip-middleware/webapp/virtualenv/lib/python2.7/site-packages/kombu/transport/redis.py", line 685, in _receive_one
response = c.parse_response()
File "/opt/c1/cip-middleware/webapp/virtualenv/lib/python2.7/site-packages/redis/client.py", line 3032, in parse_response
'pubsub connection not set: '
RuntimeError: pubsub connection not set: did you forget to call subscribe() or psubscribe()?
While at the same time I have spotted that worker running on machine2 suffers due to not being able to connect to redis. Eventually, it managed to recover and reconnect to redis and receiving the queued tasks.
machine2.worker.log
[2019-02-01 14:43:41,722: WARNING/MainProcess] consumer: Connection to broker lost. Trying to re-establish the connection...
Traceback (most recent call last):
File "/opt/c1/cip-middleware/webapp/virtualenv/lib/python2.7/site-packages/celery/worker/consumer/consumer.py", line 322, in start
blueprint.start(self)
File "/opt/c1/cip-middleware/webapp/virtualenv/lib/python2.7/site-packages/celery/bootsteps.py", line 119, in start
step.start(parent)
File "/opt/c1/cip-middleware/webapp/virtualenv/lib/python2.7/site-packages/celery/worker/consumer/mingle.py", line 40, in start
self.sync(c)
File "/opt/c1/cip-middleware/webapp/virtualenv/lib/python2.7/site-packages/celery/worker/consumer/mingle.py", line 44, in sync
replies = self.send_hello(c)
File "/opt/c1/cip-middleware/webapp/virtualenv/lib/python2.7/site-packages/celery/worker/consumer/mingle.py", line 57, in send_hello
replies = inspect.hello(c.hostname, our_revoked._data) or {}
File "/opt/c1/cip-middleware/webapp/virtualenv/lib/python2.7/site-packages/celery/app/control.py", line 143, in hello
return self._request('hello', from_node=from_node, revoked=revoked)
File "/opt/c1/cip-middleware/webapp/virtualenv/lib/python2.7/site-packages/celery/app/control.py", line 95, in _request
timeout=self.timeout, reply=True,
File "/opt/c1/cip-middleware/webapp/virtualenv/lib/python2.7/site-packages/celery/app/control.py", line 454, in broadcast
limit, callback, channel=channel,
File "/opt/c1/cip-middleware/webapp/virtualenv/lib/python2.7/site-packages/kombu/pidbox.py", line 315, in _broadcast
serializer=serializer)
File "/opt/c1/cip-middleware/webapp/virtualenv/lib/python2.7/site-packages/kombu/pidbox.py", line 290, in _publish
serializer=serializer,
File "/opt/c1/cip-middleware/webapp/virtualenv/lib/python2.7/site-packages/kombu/messaging.py", line 181, in publish
exchange_name, declare,
File "/opt/c1/cip-middleware/webapp/virtualenv/lib/python2.7/site-packages/kombu/messaging.py", line 203, in _publish
mandatory=mandatory, immediate=immediate,
File "/opt/c1/cip-middleware/webapp/virtualenv/lib/python2.7/site-packages/kombu/transport/virtual/base.py", line 605, in basic_publish
message, exchange, routing_key, **kwargs
File "/opt/c1/cip-middleware/webapp/virtualenv/lib/python2.7/site-packages/kombu/transport/virtual/exchange.py", line 151, in deliver
exchange, message, routing_key, **kwargs)
File "/opt/c1/cip-middleware/webapp/virtualenv/lib/python2.7/site-packages/kombu/transport/redis.py", line 781, in _put_fanout
dumps(message),
File "/opt/c1/cip-middleware/webapp/virtualenv/lib/python2.7/site-packages/redis/client.py", line 2716, in publish
return self.execute_command('PUBLISH', channel, message)
File "/opt/c1/cip-middleware/webapp/virtualenv/lib/python2.7/site-packages/redis/client.py", line 775, in execute_command
return self.parse_response(connection, command_name, **options)
File "/opt/c1/cip-middleware/webapp/virtualenv/lib/python2.7/site-packages/redis/client.py", line 789, in parse_response
response = connection.read_response()
File "/opt/c1/cip-middleware/webapp/virtualenv/lib/python2.7/site-packages/redis/connection.py", line 636, in read_response
raise e
ConnectionError: Error while reading from socket: (u'Connection closed by server.',)
[2019-02-01 14:44:50,237: INFO/MainProcess] Received task: c1_cip_middleware.tasks.validate_purchases.run_purchases_validation[a411dd90-ab50-4101-becb-90adda3663a2]
* Questions *
I wonder what are the circumstances/scenarios when RuntimeError is raised, thus, the worker gets into the "unrecovery" stage and must be stopped?
I am in doubt what could be a root-cause of having this issue, especially that one worker managed to recover but the other one just died?

Celery not revoked tasks

I have a task on Celery 4.1. I want to revoke this task like
task.revoke()
all is nang up on this time, then I raise KeyboardInterrupt and have the same Traceback:
File "/usr/local/lib/python3.6/queue.py", line 164, in get
self.not_empty.wait()
File "/usr/local/lib/python3.6/threading.py", line 295, in wait
waiter.acquire()
After that, I can revoke tasks.
full traceback with gevent.
I'm also try to change broker to RabbitMQ and has the same error
Traceback (most recent call last):
File "/usr/src/app/backend/tests.py", line 779, in test_reactivate_contract
contract.activate_contract(paid=True, reactivate=True)
File "/usr/src/app/backend/models/contract.py", line 83, in activate_contract
revoke_task(self.celery_task)
File "/usr/src/app/backend/celery/tasks.py", line 128, in revoke_task
app.control.revoke(task_id)
File "/usr/local/lib/python3.6/site-packages/celery/app/control.py", line 210, in revoke
}, **kwargs)
File "/usr/local/lib/python3.6/site-packages/celery/app/control.py", line 436, in broadcast
limit, callback, channel=channel,
File "/usr/local/lib/python3.6/site-packages/kombu/pidbox.py", line 315, in _broadcast
serializer=serializer)
File "/usr/local/lib/python3.6/site-packages/kombu/pidbox.py", line 285, in _publish
with self.producer_or_acquire(producer, chan) as producer:
File "/usr/local/lib/python3.6/contextlib.py", line 81, in __enter__
return next(self.gen)
File "/usr/local/lib/python3.6/site-packages/kombu/pidbox.py", line 247, in producer_or_acquire
with self.producer_pool.acquire() as producer:
File "/usr/local/lib/python3.6/site-packages/kombu/resource.py", line 83, in acquire
R = self.prepare(R)
File "/usr/local/lib/python3.6/site-packages/kombu/pools.py", line 62, in prepare
p = p()
File "/usr/local/lib/python3.6/site-packages/kombu/utils/functional.py", line 203, in __call__
return self.evaluate()
File "/usr/local/lib/python3.6/site-packages/kombu/utils/functional.py", line 206, in evaluate
return self._fun(*self._args, **self._kwargs)
File "/usr/local/lib/python3.6/site-packages/kombu/pools.py", line 42, in create_producer
conn = self._acquire_connection()
File "/usr/local/lib/python3.6/site-packages/kombu/pools.py", line 39, in _acquire_connection
return self.connections.acquire(block=True)
File "/usr/local/lib/python3.6/site-packages/kombu/resource.py", line 78, in acquire
R = self._resource.get(block=block, timeout=timeout)
File "/usr/local/lib/python3.6/queue.py", line 164, in get
self.not_empty.wait()
File "/usr/local/lib/python3.6/threading.py", line 295, in wait
waiter.acquire()
File "/usr/local/lib/python3.6/site-packages/gevent/thread.py", line 84, in acquire
return BoundedSemaphore.acquire(self, blocking, timeout)
File "src/gevent/_semaphore.py", line 198, in gevent._semaphore.Semaphore.acquire (src/gevent/gevent._semaphore.c:4541)
File "src/gevent/_semaphore.py", line 226, in gevent._semaphore.Semaphore.acquire (src/gevent/gevent._semaphore.c:4367)
File "src/gevent/_semaphore.py", line 166, in gevent._semaphore.Semaphore._do_wait (src/gevent/gevent._semaphore.c:3562)
File "/usr/local/lib/python3.6/site-packages/gevent/hub.py", line 630, in switch
return RawGreenlet.switch(self)
gevent.hub.LoopExit: ('This operation would block forever', <Hub at 0x7fd1b0269210 epoll default pending=0 ref=0 fileno=4 resolver=<gevent.resolver_thread.Resolver at 0x7fd1a7843c50 pool=<ThreadPool at 0x7fd1a78432b0 0/1/10>> threadpool=<ThreadPool at 0x7fd1a78432b0 0/1/10>>)
I'm using Python 3.6, Celery 4.1, Django 1.11, and Redis.

Issue with Postgres DB and Celery - DB_PAGE_NOTFOUND: Requested page not found [duplicate]

I have 3 machines with celery workers and rabbitmq as a broker, one worker is running with beat flag, all of this is managed by supervisor, and sometimes celery dies with such error.
This error appears only on beat worker, but when it appears, workers on all machines dies.
(celery==3.1.12, kombu==3.0.20)
[2014-07-05 08:37:04,297: INFO/MainProcess] Connected to amqp://user:**#192.168.15.106:5672//
[2014-07-05 08:37:04,311: ERROR/Beat] Process Beat
Traceback (most recent call last):
File "/var/projects/env/local/lib/python2.7/site-packages/billiard/process.py", line 292, in _bootstrap
self.run()
File "/var/projects/env/local/lib/python2.7/site-packages/celery/beat.py", line 527, in run
self.service.start(embedded_process=True)
File "/var/projects/env/local/lib/python2.7/site-packages/celery/beat.py", line 453, in start
humanize_seconds(self.scheduler.max_interval))
File "/var/projects/env/local/lib/python2.7/site-packages/kombu/utils/__init__.py", line 322, in __get__
value = obj.__dict__[self.__name__] = self.__get(obj)
File "/var/projects/env/local/lib/python2.7/site-packages/celery/beat.py", line 491, in scheduler
return self.get_scheduler()
File "/var/projects/env/local/lib/python2.7/site-packages/celery/beat.py", line 486, in get_scheduler
lazy=lazy)
File "/var/projects/env/local/lib/python2.7/site-packages/celery/utils/imports.py", line 53, in instantiate
return symbol_by_name(name)(*args, **kwargs)
File "/var/projects/env/local/lib/python2.7/site-packages/celery/beat.py", line 357, in __init__
Scheduler.__init__(self, *args, **kwargs)
File "/var/projects/env/local/lib/python2.7/site-packages/celery/beat.py", line 184, in __init__
self.setup_schedule()
File "/var/projects/env/local/lib/python2.7/site-packages/celery/beat.py", line 376, in setup_schedule
self._store['entries']
File "/usr/lib/python2.7/shelve.py", line 121, in __getitem__
f = StringIO(self.dict[key])
File "/usr/lib/python2.7/bsddb/__init__.py", line 270, in __getitem__
return _DeadlockWrap(lambda: self.db[key]) # self.db[key]
File "/usr/lib/python2.7/bsddb/dbutils.py", line 68, in DeadlockWrap
return function(*_args, **_kwargs)
File "/usr/lib/python2.7/bsddb/__init__.py", line 270, in <lambda>
return _DeadlockWrap(lambda: self.db[key]) # self.db[key]
DBPageNotFoundError: (-30985, 'DB_PAGE_NOTFOUND: Requested page not found')
I've ran into this issue and the cause was a corrupted db file (usually named "celerybeat-schedule").
Solution would be to delete the existing db file and restart the process.
Relavent:bsddb.db.DBPageNotFoundError
https://mail.python.org/pipermail/python-list/2009-October/554552.html
I had to remove some temp files in the /tmp directory. One was named celeryd-<NAME_OF_WORKER>-state and also celeryd-<NAME_OF_WORKER>-state-renamed. After removing those and I was able to restart my affected worker.

UnpicklingError on celerybeat startup

I've been using supervisord to run celery for my django project for a while, but suddenly celerybeat won't start. It gives the following traceback:
Traceback (most recent call last):
File "[...]celery/apps/beat.py", line 112, in start_scheduler
beat.start()
File "[...]celery/beat.py", line 454, in start
humanize_seconds(self.scheduler.max_interval))
File "[...]kombu/utils/__init__.py", line 322, in __get__
value = obj.__dict__[self.__name__] = self.__get(obj)
File "[...]celery/beat.py", line 494, in scheduler
return self.get_scheduler()
File "[...]celery/beat.py", line 489, in get_scheduler
lazy=lazy)
File "[...]celery/utils/imports.py", line 53, in instantiate
return symbol_by_name(name)(*args, **kwargs)
File "[...]celery/beat.py", line 358, in __init__
Scheduler.__init__(self, *args, **kwargs)
File "[...]celery/beat.py", line 185, in __init__
self.setup_schedule()
File "[...]celery/beat.py", line 377, in setup_schedule
self._store['entries']
File "/usr/local/lib/python2.7/shelve.py", line 122, in __getitem__
value = Unpickler(f).load()
UnpicklingError: pickle data was truncated
Haven't been able to find anything on this.
I pieced it together.
The issue was caused by a corrupt celerybeat-schedule file. To locate the file enter:
find ~/ -name celerybeat-schedule -print
Then delete or rename the file:
mv [filename] [newfilename]
Then restart your processes.

Categories