Can't bring cassandra and celery together


I'm trying to follow the example for using Celery and Cassandra together:
http://datastax.github.io/python-driver/cqlengine/third_party.html
But without luck.
I get this exception when I start the worker with:
$ celery -A tasks worker -l INFO
[2016-06-12 14:11:53,609: ERROR/Worker-1] Process Worker-1
Traceback (most recent call last):
File "/Users/lutz/work/truncated/truncated-worker/venv/lib/python3.5/site-packages/billiard/process.py", line 292, in _bootstrap
self.run()
File "/Users/lutz/work/truncated/truncated-worker/venv/lib/python3.5/site-packages/billiard/pool.py", line 292, in run
self.after_fork()
File "/Users/lutz/work/truncated/truncated-worker/venv/lib/python3.5/site-packages/billiard/pool.py", line 395, in after_fork
self.initializer(*self.initargs)
File "/Users/lutz/work/truncated/truncated-worker/venv/lib/python3.5/site-packages/celery/concurrency/prefork.py", line 84, in process_initializer
signals.worker_process_init.send(sender=None)
File "/Users/lutz/work/truncated/truncated-worker/venv/lib/python3.5/site-packages/celery/utils/dispatch/signal.py", line 166, in send
response = receiver(signal=self, sender=sender, **named)
TypeError: cassandra_init() got an unexpected keyword argument 'sender'
I'm using OS X El Capitan, Python 3.5.1, Celery 3.1.23 and Cassandra 3.5.
Any help would be welcome.

Your cassandra_init signal handler function needs to accept arbitrary keyword arguments.
Simply change the line:
def cassandra_init():
into:
def cassandra_init(**kwargs):
For more information about Celery signals, see the user guide at:
http://docs.celeryproject.org/en/latest/userguide/signals.html#basics
Note: It would be helpful if you also submitted some kind of report to the author of that tutorial. Celery signal handlers have always required the keyword arguments, so it's sad to have non-working examples out there.
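To illustrate why the signature matters, here is a minimal sketch that does not import Celery at all. The send() function is a simplified, hypothetical stand-in for the dispatcher call shown in the traceback (receiver(signal=self, sender=sender, **named)), and strict_init / tolerant_init are made-up names standing in for the two versions of cassandra_init:

```python
# Simplified stand-in for Celery's signal dispatch; NOT the real implementation.
def send(receivers, sender=None, **named):
    """Call every receiver with keyword arguments, as in the traceback."""
    return [receiver(signal="worker_process_init", sender=sender, **named)
            for receiver in receivers]


def strict_init():
    # Same shape as the tutorial's handler: accepts no arguments at all.
    return "connected"


def tolerant_init(**kwargs):
    # Accepts whatever keyword arguments the dispatcher passes along.
    return "connected"


try:
    send([strict_init])
    strict_error = None
except TypeError as exc:
    # The dispatcher passed keywords the handler does not accept.
    strict_error = str(exc)

tolerant_result = send([tolerant_init])

print(strict_error)
print(tolerant_result)
```

The handler without parameters fails with a TypeError as soon as the dispatcher passes keyword arguments, while the **kwargs version simply ignores them.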

Related

ModuleNotFoundError("'kafka' is not a valid name. Did you mean one of aiokafka, kafka?")

I am using Celery and Kafka to run some jobs that push data to Kafka. I also use Faust to connect the workers. Unfortunately, I get an error after running faust -A project.streams.app worker -l info to start the pipeline. Can anyone help?
/home/admin/.local/lib/python3.6/site-packages/faust/fixups/django.py:71: UserWarning: Using settings.DEBUG leads to a memory leak, never
use this setting in production environments!
warnings.warn(WARN_DEBUG_ENABLED)
Command raised exception: ModuleNotFoundError("'kafka' is not a valid name. Did you mean one of aiokafka, kafka?",)
File "/home/admin/.local/lib/python3.6/site-packages/mode/worker.py", line 67, in exiting
yield
File "/home/admin/.local/lib/python3.6/site-packages/faust/cli/base.py", line 528, in _inner
cmd()
File "/home/admin/.local/lib/python3.6/site-packages/faust/cli/base.py", line 611, in __call__
self.run_using_worker(*args, **kwargs)
File "/home/admin/.local/lib/python3.6/site-packages/faust/cli/base.py", line 620, in run_using_worker
self.on_worker_created(worker)
File "/home/admin/.local/lib/python3.6/site-packages/faust/cli/worker.py", line 57, in on_worker_created
self.say(self.banner(worker))
File "/home/admin/.local/lib/python3.6/site-packages/faust/cli/worker.py", line 97, in banner
self._banner_data(worker))
File "/home/admin/.local/lib/python3.6/site-packages/faust/cli/worker.py", line 127, in _banner_data
(' transport', app.transport.driver_version),
File "/home/admin/.local/lib/python3.6/site-packages/faust/app/base.py", line 1831, in transport
self._transport = self._new_transport()
File "/home/admin/.local/lib/python3.6/site-packages/faust/app/base.py", line 1686, in _new_transport
return transport.by_url(self.conf.broker_consumer[0])(
File "/home/admin/.local/lib/python3.6/site-packages/mode/utils/imports.py", line 101, in by_url
return self.by_name(URL(url).scheme)
File "/home/admin/.local/lib/python3.6/site-packages/mode/utils/imports.py", line 115, in by_name
f'{name!r} is not a valid name. {alt}') from exc

I don't know what was wrong with Faust, but I happened to run pip install faust and it solved the problem.

How to configure webhook in Pybossa

The Pybossa documentation doesn't describe how to configure a webhook.
I ran into an issue while configuring one. These are my steps:
Fork the Pybossa webhook example.
Run the webhook with the default settings (with api_key and endpoint modified).
In Pybossa, edit the project and add a webhook pointing to the URL where the webhook is running.
Open a command line window and execute the following command:
# rqworker high
Then, when a task is completed, logs appear in the command line window with the following error:
14:06:11 *** Listening on high...
14:07:42 high: pybossa.jobs.webhook(u'http://192.168.116.135:5001', {'project_short_name': u'tw', 'task_id': 172, 'fired_at': '2017-08-10 06:07:42', 'project_id': 17, 'result_id': 75, 'event': 'task_completed'}) (e435386c-615d-4525-a65d-f08f0afd2351)
14:07:44 UnboundLocalError: local variable 'project' referenced before assignment
Traceback (most recent call last):
File "/home/baib2/Desktop/pybossa_server/env/local/lib/python2.7/site-packages/rq/worker.py", line 479, in perform_job
rv = job.perform()
File "/home/baib2/Desktop/pybossa_server/env/local/lib/python2.7/site-packages/rq/job.py", line 466, in perform
self._result = self.func(*self.args, **self.kwargs)
File "./pybossa/jobs.py", line 525, in webhook
if project.published and webhook.response_status_code != 200 and current_app.config.get('ADMINS'):
UnboundLocalError: local variable 'project' referenced before assignment
I'm not sure whether we should execute the following command:
# rqworker high
But if this rqworker is not running, I don't see any component picking up work from the Redis queue.

You need to run a specific worker, not the default one from PYBOSSA. Just use https://github.com/Scifabric/pybossa/blob/master/app_context_rqworker.py to run it like this:
python app_context_rqworker.py high
This will set up the Flask context, and it will run properly ;-)
We're in the middle of improving our docs, so this should be better in the coming months.

using django celery beat locally I get error 'PeriodicTask' object has no attribute '_default_manager'

Using django celery beat locally, I get the error 'PeriodicTask' object has no attribute '_default_manager'. I am using Django 1.10. When I schedule a task it works, but a few moments later a red error traceback like the following appears:
[2016-09-23 11:08:34,962: INFO/Beat] Writing entries...
[2016-09-23 11:08:34,965: INFO/Beat] Writing entries...
[2016-09-23 11:08:34,965: INFO/Beat] Writing entries...
[2016-09-23 11:08:34,966: ERROR/Beat] Process Beat
Traceback (most recent call last):
File "/Users/ray/Desktop/myheroku/practice/lib/python3.5/site-packages/billiard/process.py", line 292, in _bootstrap
self.run()
File "/Users/ray/Desktop/myheroku/practice/lib/python3.5/site-packages/celery/beat.py", line 553, in run
self.service.start(embedded_process=True)
File "/Users/ray/Desktop/myheroku/practice/lib/python3.5/site-packages/celery/beat.py", line 486, in start
self.scheduler._do_sync()
File "/Users/ray/Desktop/myheroku/practice/lib/python3.5/site-packages/celery/beat.py", line 276, in _do_sync
self.sync()
File "/Users/ray/Desktop/myheroku/practice/lib/python3.5/site-packages/djcelery/schedulers.py", line 209, in sync
self.schedule[name].save()
File "/Users/ray/Desktop/myheroku/practice/lib/python3.5/site-packages/djcelery/schedulers.py", line 98, in save
obj = self.model._default_manager.get(pk=self.model.pk)
AttributeError: 'PeriodicTask' object has no attribute '_default_manager'
After this happens, the next scheduled run won't execute unless I press Control+C in the terminal and start it again. I saw on GitHub that this may be because I am using Django 1.10. I have already pushed this to my Heroku server. How can I fix this issue? The author of the GitHub post said he fixed it by doing this:
Model = type(self.model)
obj = Model._default_manager.get(pk=self.model.pk)
I was willing to try this, but I don't know where to put it and I don't want to cause a bigger unforeseen issue. What are my options? Am I supposed to go into my remote app manually and reset it every time it runs? That's unfeasible and defeats the purpose of task automation.

I figured it out. At line 98 in schedulers.py it was
obj = self.model._default_manager.get(pk=self.model.pk)
so a line above it I added
Model = type(self.model)
and changed
obj = self.model._default_manager.get(pk=self.model.pk)
to
obj = Model._default_manager.get(pk=self.model.pk)
so completed it looks like this
98 Model = type(self.model)
99 obj = Model._default_manager.get(pk=self.model.pk)
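For anyone wondering why going through type(self.model) helps, here is a small sketch in plain Python with no Django involved. It assumes the attribute is exposed only at the class level (through the metaclass), which is my understanding of what changed in Django 1.10; ModelBase and PeriodicTask below are toy stand-ins, not the real classes:

```python
# Toy metaclass: the property lives on the metaclass, so it is reachable
# from the class but not from instances of the class.
class ModelBase(type):
    @property
    def _default_manager(cls):
        return "manager for " + cls.__name__


class PeriodicTask(metaclass=ModelBase):
    pass


task = PeriodicTask()

try:
    task._default_manager  # instance lookup does not consult the metaclass
    instance_error = None
except AttributeError as exc:
    instance_error = str(exc)

# Going through the class, as in the fix, works.
Model = type(task)
manager = Model._default_manager

print(instance_error)
print(manager)
```

In the scheduler, self.model holds a model instance, so the lookup has to go through its class first.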

How to debug intermittent errors from Django app served with gunicorn (possible race condition)?

I have a Django app being served with nginx+gunicorn with 3 gunicorn worker processes. Occasionally (maybe once every 100 requests or so) one of the worker processes gets into a state where it starts failing most (but not all) requests that it serves, and then it throws an exception when it tries to email me about it. The gunicorn error logs look like this:
[2015-04-29 10:41:39 +0000] [20833] [ERROR] Error handling request
Traceback (most recent call last):
File "/home/django/virtualenvs/homestead_django/local/lib/python2.7/site-packages/gunicorn/workers/sync.py", line 130, in handle
File "/home/django/virtualenvs/homestead_django/local/lib/python2.7/site-packages/gunicorn/workers/sync.py", line 171, in handle_request
File "/home/django/virtualenvs/homestead_django/local/lib/python2.7/site-packages/django/core/handlers/wsgi.py", line 206, in __call__
File "/home/django/virtualenvs/homestead_django/local/lib/python2.7/site-packages/django/core/handlers/base.py", line 196, in get_response
File "/home/django/virtualenvs/homestead_django/local/lib/python2.7/site-packages/django/core/handlers/base.py", line 226, in handle_uncaught_exception
File "/usr/lib/python2.7/logging/__init__.py", line 1178, in error
File "/usr/lib/python2.7/logging/__init__.py", line 1271, in _log
File "/usr/lib/python2.7/logging/__init__.py", line 1281, in handle
File "/usr/lib/python2.7/logging/__init__.py", line 1321, in callHandlers
File "/usr/lib/python2.7/logging/__init__.py", line 749, in handle
File "/home/django/virtualenvs/homestead_django/local/lib/python2.7/site-packages/django/utils/log.py", line 122, in emit
File "/home/django/virtualenvs/homestead_django/local/lib/python2.7/site-packages/django/utils/log.py", line 125, in connection
File "/home/django/virtualenvs/homestead_django/local/lib/python2.7/site-packages/django/core/mail/__init__.py", line 29, in get_connection
File "/home/django/virtualenvs/homestead_django/local/lib/python2.7/site-packages/django/utils/module_loading.py", line 26, in import_by_path
File "/home/django/virtualenvs/homestead_django/local/lib/python2.7/site-packages/django/utils/module_loading.py", line 21, in import_by_path
File "/home/django/virtualenvs/homestead_django/local/lib/python2.7/site-packages/django/utils/importlib.py", line 40, in import_module
ImproperlyConfigured: Error importing module django.core.mail.backends.smtp: "No module named smtp"
So some uncaught exception is happening and then Django is trying to email me about it. The fact that it can't import django.core.mail.backends.smtp doesn't make sense because django.core.mail.backends.smtp should definitely be on the worker process' Python path. I can import it just fine from a manage.py shell and I do get emails for other server errors (actual software bugs) so I know that works. It's like the worker process' environment is corrupted somehow.
Once a worker process enters this state it has a really hard time recovering; almost every request it serves ends up failing in this same manner. If I restart gunicorn everything is good (until another worker process falls into this weird state again).
I don't notice any obvious patterns so I don't think this is being triggered by a bug in my app (the URLs error'ing out are different, etc). It seems like some sort of race condition.
Currently I'm using gunicorn's --max-requests option to mitigate this problem but I'd like to understand what's going on here. Is this a race condition? How can I debug this?

I suggest you use Sentry, which gives you a smarter way of handling errors.
You can use it as a cloud-based solution (getsentry) or install it on your own server (github).
I used to use the Django core log mailer; now I always use Sentry.
I do not work at Sentry, but their solution is pretty awesome!

We discovered one particular view that was pegging the CPU for a few seconds every time it was loaded that seemed to be triggering this issue. I still don't understand how slamming a gunicorn worker could result in a corrupted execution environment, but fixing the high-CPU view seems to have gotten rid of this issue.

Pika blocking_connection.py random timeout connecting to RabbitMQ

I have RabbitMQ running on a machine.
Both the client and RabbitMQ are on the same network.
RabbitMQ has many clients.
I can ping the client from the RabbitMQ machine and back.
The longest latency measured between the machines is 12.1 ms.
Network details: standard switch network (a network of virtual machines running on a single physical machine, using VMware VC).
I'm getting random timeouts when initializing the RPC connection in
/usr/lib/python2.6/site-packages/pika-0.9.5-py2.6.egg/pika/adapters/blocking_connection.py
The problem is that the timeout isn't consistent; it only happens from time to time.
When I test this manually by running blocking_connection.py 1000 times from the same machine on which it fails, no timeout occurs.
This is the error I get when it fails:
2013-04-23 08:24:23,396 runtest-trigger.24397 24397 DEBUG producer_rabbit initiate_rpc_connection Connecting to RabbitMQ RPC queue rpcqueue_java on host: auto-db1
2013-04-23 08:24:25,350 runtest-trigger.24397 24397 ERROR testrunner go Run 1354: cought exception: timed out
Traceback (most recent call last):
File "/testrunner.py", line 193, in go
self.set_runparams(jobid)
File "/testrunner.py", line 483, in set_runparams
self.runparams.producers_testrun = self.initialize_producers_testrun(self.runparams)
File "/basehandler.py", line 114, in initialize_producers_testrun
producer.set_testcase_checkout()
File "/baseproducer.py", line 73, in set_testcase_checkout
self.checkout_handler = pm_checkout.get_producer(self.testcasecheckout)
File "/producer_manager.py", line 101, in get_producer
producer = self.load_producer(plugin_dir, producer_name)
File "/producer_manager.py", line 20, in load_producer
producer = getattr(producer_module, 'Producer')(producer_name, self.runparams)
File "/producer_rabbit.py", line 13, in __init__
self.initiate_rpc_connection()
File "/producer_rabbit.py", line 67, in initiate_rpc_connection
self.connection = pika.BlockingConnection(pika.ConnectionParameters( host=self.conf.rpc_proxy))
File "/usr/lib/python2.6/site-packages/pika-0.9.5-py2.6.egg/pika/adapters/blocking_connection.py", line 32, in __init__
BaseConnection.__init__(self, parameters, None, reconnection_strategy)
File "/usr/lib/python2.6/site-packages/pika-0.9.5-py2.6.egg/pika/adapters/base_connection.py", line 50, in __init__
reconnection_strategy)
File "/usr/lib/python2.6/site-packages/pika-0.9.5-py2.6.egg/pika/connection.py", line 170, in __init__
self._connect()
File "/usr/lib/python2.6/site-packages/pika-0.9.5-py2.6.egg/pika/connection.py", line 228, in _connect
self.parameters.port or spec.PORT)
File "/usr/lib/python2.6/site-packages/pika-0.9.5-py2.6.egg/pika/adapters/blocking_connection.py", line 44, in _adapter_connect
self._handle_read()
File "/usr/lib/python2.6/site-packages/pika-0.9.5-py2.6.egg/pika/adapters/base_connection.py", line 151, in _handle_read
data = self.socket.recv(self._suggested_buffer_size)
timeout: timed out
Please assist.

I had a similar issue. If everything looks fine, then you most likely have some sort of misconfiguration, e.g. a bad binding. If it is misconfigured, you'll get a timeout because the script can't reach where it thinks it needs to go, so the error can be misleading in this case.
For my problem, I specifically had issues with both my rabbitmq.config file and my bindings, and had to use the Python solution shown in "RabbitMQ creating queues and bindings from a command line" rather than the command line example I showed. Once everything was updated and configured properly, it worked fine. Hopefully this points you in the right direction.

Pika can time out when connecting to a different host. The solution is to pass a socket_timeout argument in the connection parameters. Pika should be upgraded to >=0.9.14.
import pika
# RABBITMQ_USER, RABBITMQ_PASS and RABBITMQ_HOST are defined elsewhere.
credentials = pika.PlainCredentials(RABBITMQ_USER, RABBITMQ_PASS)
connection = pika.BlockingConnection(pika.ConnectionParameters(
    credentials=credentials,
    host=RABBITMQ_HOST,
    socket_timeout=300))
channel = connection.channel()
