Celery could not start worker processes when using scrapy-Djangoitem - python

Question detail: https://github.com/celery/celery/issues/3598
I want to run a scrapy spider with celery, which contain Djangoitems.
this is my celery task:
# coding_task.py
import sys
from celery import Celery
from collector.collector.crawl_agent import crawl
app = Celery('coding.net', backend='redis', broker='redis://localhost:6379/0')
app.config_from_object('celery_config')
#app.task
def period_task():
crawl()
collector.collector.crawl_agent.crawl contains a scrapy crawler who uses djangoitem as item.
the item like:
import django
os.environ['DJANGO_SETTINGS_MODULE'] = 'RaPo3.settings'
django.setup()
from scrapy_djangoitem import DjangoItem
from xxx.models import Collection
class CodingItem(DjangoItem):
django_model = Collection
amount = scrapy.Field(default=0)
role = scrapy.Field()
type = scrapy.Field()
duration = scrapy.Field()
detail = scrapy.Field()
extra = scrapy.Field()
when run: celery -A coding_task worker --loglevel=info --concurrency=1, it wil get some errors below:
[2016-11-16 17:33:41,934: ERROR/Worker-1] Process Worker-1
Traceback (most recent call last):
File "/usr/local/lib/python2.7/site-packages/billiard/process.py", line 292, in _bootstrap
self.run()
File "/usr/local/lib/python2.7/site-packages/billiard/pool.py", line 292, in run
self.after_fork()
File "/usr/local/lib/python2.7/site-packages/billiard/pool.py", line 395, in after_fork
self.initializer(*self.initargs)
File "/usr/local/lib/python2.7/site-packages/celery/concurrency/prefork.py", line 80, in process_initializer
signals.worker_process_init.send(sender=None)
File "/usr/local/lib/python2.7/site-packages/celery/utils/dispatch/signal.py", line 151, in send
response = receiver(signal=self, sender=sender, **named)
File "/usr/local/lib/python2.7/site-packages/celery/fixups/django.py", line 152, in on_worker_process_init
self._close_database()
File "/usr/local/lib/python2.7/site-packages/celery/fixups/django.py", line 181, in _close_database
funs = [self._db.close_connection] # pre multidb
AttributeError: 'module' object has no attribute 'close_connection'
[2016-11-16 17:33:41,942: INFO/MainProcess] Connected to redis://localhost:6379/0
[2016-11-16 17:33:41,957: INFO/MainProcess] mingle: searching for neighbors
[2016-11-16 17:33:42,962: INFO/MainProcess] mingle: all alone
/usr/local/lib/python2.7/site-packages/celery/fixups/django.py:199: UserWarning: Using settings.DEBUG leads to a memory leak, never use this setting in production environments!
warnings.warn('Using settings.DEBUG leads to a memory leak, never '
[2016-11-16 17:33:42,968: WARNING/MainProcess] /usr/local/lib/python2.7/site-packages/celery/fixups/django.py:199: UserWarning: Using settings.DEBUG leads to a memory leak, never use this setting in production environments!
warnings.warn('Using settings.DEBUG leads to a memory leak, never '
[2016-11-16 17:33:42,968: WARNING/MainProcess] celery#MacBook-Pro.local ready.
[2016-11-16 17:33:42,969: ERROR/MainProcess] Process 'Worker-1' pid:2777 exited with 'exitcode 1'
[2016-11-16 17:33:42,991: ERROR/MainProcess] Unrecoverable error: WorkerLostError('Could not start worker processes',)
Traceback (most recent call last):
File "/usr/local/lib/python2.7/site-packages/celery/worker/__init__.py", line 208, in start
self.blueprint.start(self)
File "/usr/local/lib/python2.7/site-packages/celery/bootsteps.py", line 127, in start
step.start(parent)
File "/usr/local/lib/python2.7/site-packages/celery/bootsteps.py", line 378, in start
return self.obj.start()
File "/usr/local/lib/python2.7/site-packages/celery/worker/consumer.py", line 271, in start
blueprint.start(self)
File "/usr/local/lib/python2.7/site-packages/celery/bootsteps.py", line 127, in start
step.start(parent)
File "/usr/local/lib/python2.7/site-packages/celery/worker/consumer.py", line 766, in start
c.loop(*c.loop_args())
File "/usr/local/lib/python2.7/site-packages/celery/worker/loops.py", line 50, in asynloop
raise WorkerLostError('Could not start worker processes')
WorkerLostError: Could not start worker processes
if i delete djangoitem in item:
from scrapy.item import Item
class CodingItem(item):
amount = scrapy.Field(default=0)
role = scrapy.Field()
type = scrapy.Field()
duration = scrapy.Field()
detail = scrapy.Field()
extra = scrapy.Field()
the task will play well and doesn't have any error.
What should i do if i want to use djangoitem in this celery-scrapy task?
Thanks!

You should check the ram usage. It might be possible celery is not getting enough ram

Upgrade Celery to 4.0 will solve the problem.
More detail: https://github.com/celery/celery/issues/3598

Related

Module not found when running scrapy in celery

EDIT: I found out it only happened on a Windows computer. Everything works fine on Linux server.
I am running a scrapy crawler in a celery process and keep getting this error. Any ideas what am I doing wrong?
[2021-08-18 11:28:42,294: INFO/MainProcess] Connected to sqla+sqlite:///celerydb.sqlite
[2021-08-18 11:28:42,313: INFO/MainProcess] celery#NP45086 ready.
[2021-08-18 09:46:58,330: INFO/MainProcess] Received task: app_celery.scraping_process_cli[e94dc192-e10e-4921-ad0c-bb932be9b568]
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "C:\Program Files\Python374\lib\multiprocessing\spawn.py", line 105, in spawn_main
exitcode = _main(fd)
File "C:\Program Files\Python374\lib\multiprocessing\spawn.py", line 115, in _main
self = reduction.pickle.load(from_parent)
ModuleNotFoundError: No module named 'app_celery'
[2021-08-18 09:46:58,773: INFO/MainProcess] Task app_celery.scraping_process_cli[e94dc192-e10e-4921-ad0c-bb932be9b568] succeeded in 0.4380000000091968s: None
My app_celery looks like this:
app = Celery('app_celery', backend=..., broker=...)
def scrape_data():
process = CrawlerProcess(get_project_settings())
crawler = process.create_crawler(spider_cls)
process.crawl(spider_cls, **kwargs)
process.start()
#app.task(name='app_celery.scraping_process_cli', time_limit=1200, max_retries=3)
def scraping_process_cli(company_id):
import multiprocessing
a = multiprocessing.Process(target=scrape_data())
a.start()
a.join()
I am running the celery as:
celery -A app_celery worker -c 4 -n worker1 --pool threads
Before do step 2, change directory to the project root of your scrapy project.
You have to tell CrawlerProcess() to load settings.py.
import os
from scrapy.crawler import CrawlerProcess
from scrapy.utils.project import get_project_settings
def scrape_data():
os.chdir("/path/to/your/scrapy/project_root")
process = CrawlerProcess(get_project_settings())
process.crawl('spider_name')
process.start()

Celery task hanging on Django app with Rabbitmq

I'm following the First Steps With Django for my app running on a docker container. I have rabbitmq setup on a separate docker container. Opening a python shell to run the task add just results in hanging/freezing without any reports/errors from celery interface. Below are the details.
Django version - 3.0.5
Celery - 5.0.2
amqp - 5.0.2
kombu - 5.0.2
Rabbitmq - 3.8.9
myapp/myapp/settings.py
CELERY_BROKER_URL = 'pyamqp://guest:guest#myhost.com//'
CELERY_RESULT_BACKEND = 'db+postgresql+psycopg2://postgres:111111#myhost.com/celery'
myapp/myapp/celery.py
import os
from celery import Celery
import logging
logger = logging.getLogger(__name__)
# set the default Django settings module for the 'celery' program.
os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'myapp.settings')
app = Celery('myapp')
# Using a string here means the worker doesn't have to serialize
# the configuration object to child processes.
# - namespace='CELERY' means all celery-related configuration keys
# should have a `CELERY_` prefix.
app.config_from_object('django.conf:settings', namespace='CELERY')
logger.error('in celery')
# Load task modules from all registered Django app configs.
app.autodiscover_tasks()
#app.task(bind=True)
def debug_task(self):
print(f'Request: {self.request!r}')
myapp/myapp/init.py
from .celery import app as celery_app
import logging
logger = logging.getLogger(__name__)
logger.error('in init')
__all__ = ('celery_app',)
myapp/otherapp/tasks.py
from celery import shared_task
import logging
logger = logging.getLogger(__name__)
#shared_task
def add(x, y):
logger.error('add')
return x + y
Once I run celery -A demolists worker --loglevel=INFO in terminal, I get the following.
in celery
in init
/usr/lib/python3.8/site-packages/celery/platforms.py:797: RuntimeWarning: You're running the worker with superuser privileges: this is
absolutely not recommended!
Please specify a different user using the --uid option.
User information: uid=0 euid=0 gid=0 egid=0
warnings.warn(RuntimeWarning(ROOT_DISCOURAGED.format(
-------------- celery#3c389cd683b2 v5.0.2 (singularity)
--- ***** -----
-- ******* ---- Linux-4.14.186-110.268.amzn1.x86_64-x86_64-with 2020-11-29 22:12:54
- *** --- * ---
- ** ---------- [config]
- ** ---------- .> app: myapp:0x7f5a52f51c70
- ** ---------- .> transport: amqp://guest:**#myhost.com:5672//
- ** ---------- .> results: postgresql+psycopg2://postgres:**#myhost.com/celery
- *** --- * --- .> concurrency: 1 (prefork)
-- ******* ---- .> task events: OFF (enable -E to monitor tasks in this worker)
--- ***** -----
-------------- [queues]
.> celery exchange=celery(direct) key=celery
[tasks]
. myapp.celery.debug_task
. otherapp.tasks.add
[2020-11-29 22:12:55,304: INFO/MainProcess] Connected to amqp://guest:**#myhost.com:5672//
[2020-11-29 22:12:55,337: INFO/MainProcess] mingle: searching for neighbors
[2020-11-29 22:12:56,388: INFO/MainProcess] mingle: all alone
[2020-11-29 22:12:56,412: WARNING/MainProcess] /usr/lib/python3.8/site-packages/celery/fixups/django.py:203: UserWarning: Using settings.DEBUG leads to a memory
leak, never use this setting in production environments!
warnings.warn('''Using settings.DEBUG leads to a memory
[2020-11-29 22:12:56,412: INFO/MainProcess] celery#3c389cd683b2 ready.
In a separate terminal tab, I go into the python shell and do the following.
>>> from otherapp.tasks import add
>>> result = add.delay(4, 5)
It hangs here without any change. Control-C produces this error however.
^CTraceback (most recent call last):
File "/usr/lib/python3.8/site-packages/kombu/utils/functional.py", line 32, in __call__
return self.__value__
AttributeError: 'ChannelPromise' object has no attribute '__value__'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/lib/python3.8/site-packages/amqp/transport.py", line 143, in _connect
entries = socket.getaddrinfo(
File "/usr/lib/python3.8/socket.py", line 918, in getaddrinfo
for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
socket.gaierror: [Errno -2] Name does not resolve
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/lib/python3.8/site-packages/kombu/utils/functional.py", line 325, in retry_over_time
return fun(*args, **kwargs)
File "/usr/lib/python3.8/site-packages/kombu/connection.py", line 866, in _connection_factory
self._connection = self._establish_connection()
File "/usr/lib/python3.8/site-packages/kombu/connection.py", line 801, in _establish_connection
conn = self.transport.establish_connection()
File "/usr/lib/python3.8/site-packages/kombu/transport/pyamqp.py", line 128, in establish_connection
conn.connect()
File "/usr/lib/python3.8/site-packages/amqp/connection.py", line 322, in connect
self.transport.connect()
File "/usr/lib/python3.8/site-packages/amqp/transport.py", line 84, in connect
self._connect(self.host, self.port, self.connect_timeout)
File "/usr/lib/python3.8/site-packages/amqp/transport.py", line 152, in _connect
raise (e
File "/usr/lib/python3.8/site-packages/amqp/transport.py", line 168, in _connect
self.sock.connect(sa)
ConnectionRefusedError: [Errno 111] Connection refused
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/lib/python3.8/site-packages/celery/app/task.py", line 421, in delay
return self.apply_async(args, kwargs)
File "/usr/lib/python3.8/site-packages/celery/app/task.py", line 561, in apply_async
return app.send_task(
File "/usr/lib/python3.8/site-packages/celery/app/base.py", line 718, in send_task
amqp.send_task_message(P, name, message, **options)
File "/usr/lib/python3.8/site-packages/celery/app/amqp.py", line 523, in send_task_message
ret = producer.publish(
File "/usr/lib/python3.8/site-packages/kombu/messaging.py", line 175, in publish
return _publish(
File "/usr/lib/python3.8/site-packages/kombu/connection.py", line 525, in _ensured
return fun(*args, **kwargs)
File "/usr/lib/python3.8/site-packages/kombu/messaging.py", line 184, in _publish
channel = self.channel
File "/usr/lib/python3.8/site-packages/kombu/messaging.py", line 206, in _get_channel
channel = self._channel = channel()
File "/usr/lib/python3.8/site-packages/kombu/utils/functional.py", line 34, in __call__
value = self.__value__ = self.__contract__()
File "/usr/lib/python3.8/site-packages/kombu/messaging.py", line 221, in <lambda>
channel = ChannelPromise(lambda: connection.default_channel)
File "/usr/lib/python3.8/site-packages/kombu/connection.py", line 884, in default_channel
self._ensure_connection(**conn_opts)
File "/usr/lib/python3.8/site-packages/kombu/connection.py", line 435, in _ensure_connection
return retry_over_time(
File "/usr/lib/python3.8/site-packages/kombu/utils/functional.py", line 339, in retry_over_time
sleep(1.0)
KeyboardInterrupt
I would appreciate any assistance, very confused by what is happening as on one hand, the celery interface suggests it is connected to rabbitmq, but the error messages in shell suggests there are some issues with that. Thank you in advanced

AttributeError when I'm running Celery worker

I'm trying to use celery for background process in my Django application. Django version is 1.4.8 and latest suitable celery version is 3.1.25.
I use Redis (3.1.0) as broker and backend, json as serializer.
When I start the worker
celery -A celery_app worker -l info I'm getting Attribute error 'unicode' object has no attribute 'iteritems'
My settings.py file:
BROKER_URL = 'redis://localhost'
CELERY_RESULT_BACKEND = 'redis://localhost/'
CELERY_ACCEPT_CONTENT = ['json']
CELERY_TASK_SERIALIZER = 'json'
CELERY_RESULT_SERIALIZER = 'json'
CELERY_STORE_ERRORS_EVEN_IF_IGNORED = True
celery_app.py:
import sys
from django.conf import settings
from celery import Celery
project_root = os.path.dirname(__file__)
sys.path.insert(0, os.path.join(project_root, '../env'))
sys.path.insert(0, os.path.join(project_root, '../'))
os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'project.settings')
app = Celery('project')
app.config_from_object('project.settings')
app.autodiscover_tasks(lambda: settings.INSTALLED_APPS, force=True)
tasks.py:
#celery_app.task
def sample_task(x):
return 'Test response'
and that's how I run this task:
sample_task.delay({'key': 'test'})
And I get the following error:
File "/Users/user/project/venv/lib/python2.7/site-packages/redis/_compat.py", line 94, in iteritems
return x.iteritems()
AttributeError: 'unicode' object has no attribute 'iteritems'
full traceback:
[2019-01-31 16:43:08,909: ERROR/MainProcess] Unrecoverable error: AttributeError("'unicode' object has no attribute 'iteritems'",)
Traceback (most recent call last):
File "/Users/user/project/venv/lib/python2.7/site-packages/celery/worker/__init__.py", line 206, in start
self.blueprint.start(self)
File "/Users/user/project/venv/lib/python2.7/site-packages/celery/bootsteps.py", line 123, in start
step.start(parent)
File "/Users/user/project/venv/lib/python2.7/site-packages/celery/bootsteps.py", line 374, in start
return self.obj.start()
File "/Users/user/project/venv/lib/python2.7/site-packages/celery/worker/consumer.py", line 280, in start
blueprint.start(self)
File "/Users/user/project/venv/lib/python2.7/site-packages/celery/bootsteps.py", line 123, in start
step.start(parent)
File "/Users/user/project/venv/lib/python2.7/site-packages/celery/worker/consumer.py", line 884, in start
c.loop(*c.loop_args())
File "/Users/user/project/venv/lib/python2.7/site-packages/celery/worker/loops.py", line 76, in asynloop
next(loop)
File "/Users/user/project/venv/lib/python2.7/site-packages/kombu/async/hub.py", line 340, in create_loop
cb(*cbargs)
File "/Users/user/project/venv/lib/python2.7/site-packages/kombu/transport/redis.py", line 1019, in on_readable
self._callbacks[queue](message)
File "/Users/user/project/venv/lib/python2.7/site-packages/kombu/transport/virtual/__init__.py", line 534, in _callback
self.qos.append(message, message.delivery_tag)
File "/Users/user/project/venv/lib/python2.7/site-packages/kombu/transport/redis.py", line 146, in append
pipe.zadd(self.unacked_index_key, delivery_tag, time()) \
File "/Users/user/project/venv/lib/python2.7/site-packages/redis/client.py", line 2320, in zadd
for pair in iteritems(mapping):
File "/Users/user/project/venv/lib/python2.7/site-packages/redis/_compat.py", line 94, in iteritems
return x.iteritems()
AttributeError: 'unicode' object has no attribute 'iteritems'
I tried to find the issue on the internet, tried to pass another params to task. I don't know how to debug celery process and could not find the solution by myself. Please help me
Seems that this Celery version doesn't support Redis 3. Try to install Redis 2.10.6.

Django + Celery + Supervisord + Redis error when setting

I am going through the setting of the following components on CentOS server. I get supervisord task to get the web site up and running, but I am blocked on setting the supervisor for celery. It seems that it recognizes the tasks, but when I try to execute the tasks, it won't connect to them. My redis is up and running on port 6380
Django==1.10.3
amqp==1.4.9
billiard==3.3.0.23
celery==3.1.25
kombu==3.0.37
pytz==2016.10
my celeryd.ini
[program:celeryd]
command=/root/myproject/myprojectenv/bin/celery worker -A mb --loglevel=INFO
environment=PATH="/root/myproject/myprojectenv/bin/",VIRTUAL_ENV="/root/myproject/myprojectenv",PYTHONPATH="/root/myproject/myprojectenv/lib/python2.7:/root/myproject/myprojectenv/lib/python2.7/site-packages"
directory=/home/.../myapp/
user=nobody
numprocs=1
stdout_logfile=/home/.../myapp/log_celery/worker.log
sterr_logfile=/home/.../myapp/log_celery/worker.log
autostart=true
autorestart=true
startsecs=10
; Need to wait for currently executing tasks to finish at shutdown.
; Increase this if you have very long running tasks.
stopwaitsecs = 1200
; When resorting to send SIGKILL to the program to terminate it
; send SIGKILL to its whole process group instead,
; taking care of its children as well.
killasgroup=true
; Set Celery priority higher than default (999)
; so, if rabbitmq(redis) is supervised, it will start first.
priority=1000
The process starts and when I go to the project folder and do:
>python manage.py celery status
celery#ssd-1v: OK
1 node online.
When I open the log file of celery I see that the tasks are loaded.
[tasks]
. mb.tasks.add
. mb.tasks.update_search_index
. orders.tasks.order_created
my mb/tasks.py
from mb.celeryapp import app
import django
django.setup()
#app.task
def add(x, y):
print(x+y)
return x + y
my mb/celeryapp.py
from __future__ import absolute_import, unicode_literals
import os
from celery import Celery
from django.conf import settings
# set the default Django settings module for the 'celery' program.
os.environ.setdefault("DJANGO_SETTINGS_MODULE", "mb.settings")
app = Celery('mb', broker='redis://localhost:6380/', backend='redis://localhost:6380/')
app.conf.broker_url = 'redis://localhost:6380/0'
app.conf.result_backend = 'redis://localhost:6380/'
app.conf.timezone = 'Europe/Sofia'
app.config_from_object('django.conf:settings')
app.autodiscover_tasks(lambda: settings.INSTALLED_APPS)
my mb/settings.py:
...
WSGI_APPLICATION = 'mb.wsgi.application'
BROKER_URL = 'redis://localhost:6380/0'
CELERYBEAT_SCHEDULER = 'djcelery.schedulers.DatabaseScheduler'
STATICFILES_STORAGE = 'whitenoise.storage.CompressedManifestStaticFilesStorage'
...
when I run:
python manage.py shell
>>> from mb.tasks import add
>>> add.name
'mb.tasks.add'
>>> result=add.delay(1,1)
>>> result.ready()
False
>>> result.status
'PENDING'
And as mentioned earlier I do not see any change in the log anymore.
If I try to run from the command line:
/root/myproject/myprojectenv/bin/celery worker -A mb --loglevel=INFO
Running a worker with superuser privileges when the
worker accepts messages serialized with pickle is a very bad idea!
If you really want to continue then you have to set the C_FORCE_ROOT
environment variable (but please think about this before you do).
User information: uid=0 euid=0 gid=0 egid=0
But I suppose that's normal since I run it after with user nobody. Interesting thing is that the command just celery status (without python manage.py celery status) gives an error on connection, probably because it is looking for different port for redis, but the process of supervisord starts normally... and when I call 'celery worker -A mb' it says it's ok. Any ideas?
(myprojectenv) [root#ssd-1v]# celery status
Traceback (most recent call last):
File "/root/myproject/myprojectenv/bin/celery", line 11, in <module>
sys.exit(main())
File "/root/myproject/myprojectenv/lib/python2.7/site-packages/celery/__main__.py", line 3
0, in main
main()
File "/root/myproject/myprojectenv/lib/python2.7/site-packages/celery/bin/celery.py", line
81, in main
cmd.execute_from_commandline(argv)
File "/root/myproject/myprojectenv/lib/python2.7/site-packages/celery/bin/celery.py", line
793, in execute_from_commandline
super(CeleryCommand, self).execute_from_commandline(argv)))
File "/root/myproject/myprojectenv/lib/python2.7/site-packages/celery/bin/base.py", line 3
11, in execute_from_commandline
return self.handle_argv(self.prog_name, argv[1:])
File "/root/myproject/myprojectenv/lib/python2.7/site-packages/celery/bin/celery.py", line
785, in handle_argv
return self.execute(command, argv)
File "/root/myproject/myprojectenv/lib/python2.7/site-packages/celery/bin/celery.py", line
717, in execute
).run_from_argv(self.prog_name, argv[1:], command=argv[0])
File "/root/myproject/myprojectenv/lib/python2.7/site-packages/celery/bin/base.py", line 3
15, in run_from_argv
sys.argv if argv is None else argv, command)
File "/root/myproject/myprojectenv/lib/python2.7/site-packages/celery/bin/base.py", line 3
77, in handle_argv
return self(*args, **options)
File "/root/myproject/myprojectenv/lib/python2.7/site-packages/celery/bin/base.py", line 2
74, in __call__
ret = self.run(*args, **kwargs)
File "/root/myproject/myprojectenv/lib/python2.7/site-packages/celery/bin/celery.py", line
473, in run
replies = I.run('ping', **kwargs)
File "/root/myproject/myprojectenv/lib/python2.7/site-packages/celery/bin/celery.py", line
325, in run
return self.do_call_method(args, **kwargs)
File "/root/myproject/myprojectenv/lib/python2.7/site-packages/celery/bin/celery.py", line
347, in do_call_method
return getattr(i, method)(*args)
File "/root/myproject/myprojectenv/lib/python2.7/site-packages/celery/app/control.py", line 100, in ping
return self._request('ping')
File "/root/myproject/myprojectenv/lib/python2.7/site-packages/celery/app/control.py", line 71, in _request
timeout=self.timeout, reply=True,
File "/root/myproject/myprojectenv/lib/python2.7/site-packages/celery/app/control.py", line 316, in broadcast
limit, callback, channel=channel,
File "/root/myproject/myprojectenv/lib/python2.7/site-packages/kombu/pidbox.py", line 283, in _broadcast
chan = channel or self.connection.default_channel
File "/root/myproject/myprojectenv/lib/python2.7/site-packages/kombu/connection.py", line 771, in default_channel
self.connection
File "/root/myproject/myprojectenv/lib/python2.7/site-packages/kombu/connection.py", line 756, in connection
self._connection = self._establish_connection()
File "/root/myproject/myprojectenv/lib/python2.7/site-packages/kombu/connection.py", line 711, in _establish_connection
conn = self.transport.establish_connection()
File "/root/myproject/myprojectenv/lib/python2.7/site-packages/kombu/transport/pyamqp.py", line 116, in establish_connection
conn = self.Connection(**opts)
File "/root/myproject/myprojectenv/lib/python2.7/site-packages/amqp/connection.py", line 165, in __init__
self.transport = self.Transport(host, connect_timeout, ssl)
File "/root/myproject/myprojectenv/lib/python2.7/site-packages/amqp/connection.py", line 186, in Transport
return create_transport(host, connect_timeout, ssl)
File "/root/myproject/myprojectenv/lib/python2.7/site-packages/amqp/transport.py", line 299, in create_transport
return TCPTransport(host, connect_timeout)
File "/root/myproject/myprojectenv/lib/python2.7/site-packages/amqp/transport.py", line 95, in __init__
raise socket.error(last_err)
socket.error: [Errno 111] Connection refused
Any help will be highly appreciated.
UPDATE:
when I run
$:python manage.py shell
>>from mb.tasks import add
>>add
<#task: mb.tasks.add of mb:0x**2b3f6d0**>
the 0x2b3f6d0is different from what celery claims to be its memory space in its log, namely:
[config]
- ** ---------- .> app: mb:0x3495bd0
- ** ---------- .> transport: redis://localhost:6380/0
- ** ---------- .> results: disabled://
- *** --- * --- .> concurrency: 1 (prefork)
Ok, the answer in this case was that the gunicorn file was actually starting the project from the common python library, instead of the virtual env

Using multiprocessing pool from celery task raises exception

FOR THOSE READING THIS: I have decided to use RQ instead which doesn't fail when running code that uses the multiprocessing module. I suggest you use that.
I am trying to use a multiprocessing pool from within a celery task using Python 3 and redis as the broker (running it on a Mac). However, I don't seem to be able to even create a multiprocessing Pool object from within the Celery task! Instead, I get a strange exception that I really don't know what to do with.
Can anyone tell me how to accomplish this?
The task:
from celery import Celery
from multiprocessing.pool import Pool
app = Celery('tasks', backend='redis', broker='redis://localhost:6379/0')
#app.task
def test_pool():
with Pool() as pool:
# perform some task using the pool
pool.close()
return 'Done!'
which I add to Celery using:
celery -A tasks worker --loglevel=info
and then running it via the following python script:
import tasks
tasks.test_pool.delay()
that returns the following celery output:
[2015-01-12 15:08:57,571: INFO/MainProcess] Connected to redis://localhost:6379/0
[2015-01-12 15:08:57,583: INFO/MainProcess] mingle: searching for neighbors
[2015-01-12 15:08:58,588: INFO/MainProcess] mingle: all alone
[2015-01-12 15:08:58,598: WARNING/MainProcess] celery#Simons-MacBook-Pro.local ready.
[2015-01-12 15:09:02,425: INFO/MainProcess] Received task: tasks.test_pool[38cab553-3a01-4512-8f94-174743b05369]
[2015-01-12 15:09:02,436: ERROR/MainProcess] Task tasks.test_pool[38cab553-3a01-4512-8f94-174743b05369] raised unexpected: AttributeError("'Worker' object has no attribute '_config'",)
Traceback (most recent call last):
File "/usr/local/lib/python3.4/site-packages/celery/app/trace.py", line 240, in trace_task
R = retval = fun(*args, **kwargs)
File "/usr/local/lib/python3.4/site-packages/celery/app/trace.py", line 438, in __protected_call__
return self.run(*args, **kwargs)
File "/Users/simongray/Code/etilbudsavis/offer-sniffer/tasks.py", line 17, in test_pool
with Pool() as pool:
File "/usr/local/Cellar/python3/3.4.2_1/Frameworks/Python.framework/Versions/3.4/lib/python3.4/multiprocessing/pool.py", line 150, in __init__
self._setup_queues()
File "/usr/local/Cellar/python3/3.4.2_1/Frameworks/Python.framework/Versions/3.4/lib/python3.4/multiprocessing/pool.py", line 243, in _setup_queues
self._inqueue = self._ctx.SimpleQueue()
File "/usr/local/Cellar/python3/3.4.2_1/Frameworks/Python.framework/Versions/3.4/lib/python3.4/multiprocessing/context.py", line 111, in SimpleQueue
return SimpleQueue(ctx=self.get_context())
File "/usr/local/Cellar/python3/3.4.2_1/Frameworks/Python.framework/Versions/3.4/lib/python3.4/multiprocessing/queues.py", line 336, in __init__
self._rlock = ctx.Lock()
File "/usr/local/Cellar/python3/3.4.2_1/Frameworks/Python.framework/Versions/3.4/lib/python3.4/multiprocessing/context.py", line 66, in Lock
return Lock(ctx=self.get_context())
File "/usr/local/Cellar/python3/3.4.2_1/Frameworks/Python.framework/Versions/3.4/lib/python3.4/multiprocessing/synchronize.py", line 163, in __init__
SemLock.__init__(self, SEMAPHORE, 1, 1, ctx=ctx)
File "/usr/local/Cellar/python3/3.4.2_1/Frameworks/Python.framework/Versions/3.4/lib/python3.4/multiprocessing/synchronize.py", line 59, in __init__
kind, value, maxvalue, self._make_name(),
File "/usr/local/Cellar/python3/3.4.2_1/Frameworks/Python.framework/Versions/3.4/lib/python3.4/multiprocessing/synchronize.py", line 117, in _make_name
return '%s-%s' % (process.current_process()._config['semprefix'],
AttributeError: 'Worker' object has no attribute '_config'
This is a known issue with celery. It stems from an issue introduced in the billiard dependency. A work-around is to manually set the _config attribute for the current process. Thanks to user #martinth for the work-around below.
from celery.signals import worker_process_init
from multiprocessing import current_process
#worker_process_init.connect
def fix_multiprocessing(**kwargs):
try:
current_process()._config
except AttributeError:
current_process()._config = {'semprefix': '/mp'}
The worker_process_init hook will execute the code upon worker process initialization. We simply check to see if _config exists, and set it if it does not.
Via a useful comment in the Celery issue report linked to in Davy's comment, I was able to solve this by importing the billiard module's Pool class instead.
Replace
from multiprocessing import Pool
with
from billiard.pool import Pool
A quick solution is to use the thread-based "dummy" multiprocessing implementation. Change
from multiprocessing import Pool # or whatever you're using
to
from multiprocessing.dummy import Pool
However since this parallelism is thread-based, the usual caveats (GIL) apply.

Categories