rqscheduler docker is stopping with timemismatch error - python

I have created 3 dockers in same network
redis queue
rq scheduler
Python based docker
Error is coming when redis is trying to schedule the task on scheduler.
docker ps output
b18b7d21894f redis "docker-entrypoint.s…" 27 minutes ago Up 27 minutes 6379/tcp test_redis_1
140a7c31b87d python "python3" 13 hours ago Up 13 hours pyRed5
55dc5bcd3f57 anarchy/rq-scheduler "rqscheduler --host …" 27 minutes ago Exited (1) 13 minutes ago boring_bohr
I am trying to schedule the periodic task.
File iss.py
from rq_scheduler import Scheduler
from redis import Redis
from datetime import datetime, timedelta,timezone
import pytz
import mail
scheduler = Scheduler(connection=Redis("test_redis_1"))
def get_next_pass():
x= datetime.now() + timedelta(minutes = 1)
return x.replace(tzinfo=timezone.utc)
#.strftime("%Y-%m-%dT%H:%M:%SZ")
def send_text_message(time):
mail.mail()
scheduler.enqueue_at(time+100, iss.send_text_message,time+100)
File scheduler.py
from datetime import datetime
from redis import Redis
from rq_scheduler import Scheduler
import iss
scheduler = Scheduler(connection=Redis("test_redis_1")) # Get a scheduler for the "default" queue
next_pass = iss.get_next_pass()
if next_pass:
print(next_pass)
next_pass
print("reached here")
scheduler.enqueue_at(next_pass, iss.send_text_message,next_pass)
I am calling schduler.py from python docker. Task is going to rq but it is getting failed at rq scheduler with the below error
root#healthbot-build-vm1:~/redis# docker logs 55dc5bcd3f57
19:09:55 Running RQ scheduler...
19:09:55 Checking for scheduled jobs...
19:10:55 Checking for scheduled jobs...
19:11:55 Checking for scheduled jobs...
19:12:55 Checking for scheduled jobs...
19:13:55 Checking for scheduled jobs...
19:14:55 Checking for scheduled jobs...
19:15:56 Checking for scheduled jobs...
19:16:56 Checking for scheduled jobs...
19:17:56 Checking for scheduled jobs...
19:18:56 Checking for scheduled jobs...
19:19:56 Checking for scheduled jobs...
19:20:56 Checking for scheduled jobs...
19:21:56 Checking for scheduled jobs...
19:22:56 Checking for scheduled jobs...
19:23:56 Checking for scheduled jobs...
Traceback (most recent call last):
File "/usr/local/lib/python3.5/site-packages/rq/utils.py", line 164, in utcparse
return datetime.datetime.strptime(string, '%Y-%m-%dT%H:%M:%SZ')
File "/usr/local/lib/python3.5/_strptime.py", line 510, in _strptime_datetime
tt, fraction = _strptime(data_string, format)
File "/usr/local/lib/python3.5/_strptime.py", line 343, in _strptime
(data_string, format))
ValueError: time data '2021-01-14T19:22:07.242474Z' does not match format '%Y-%m-%dT%H:%M:%SZ'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/bin/rqscheduler", line 11, in <module>
sys.exit(main())
File "/usr/local/lib/python3.5/site-packages/rq_scheduler/scripts/rqscheduler.py", line 53, in main
scheduler.run(burst=args.burst)
File "/usr/local/lib/python3.5/site-packages/rq_scheduler/scheduler.py", line 340, in run
self.enqueue_jobs()
File "/usr/local/lib/python3.5/site-packages/rq_scheduler/scheduler.py", line 322, in enqueue_jobs
jobs = self.get_jobs_to_queue()
File "/usr/local/lib/python3.5/site-packages/rq_scheduler/scheduler.py", line 271, in get_jobs_to_queue
return self.get_jobs(to_unix(datetime.utcnow()), with_times=with_times)
File "/usr/local/lib/python3.5/site-packages/rq_scheduler/scheduler.py", line 254, in get_jobs
job = Job.fetch(job_id, connection=self.connection)
File "/usr/local/lib/python3.5/site-packages/rq/job.py", line 294, in fetch
job.refresh()
File "/usr/local/lib/python3.5/site-packages/rq/job.py", line 410, in refresh
self.created_at = to_date(as_text(obj.get('created_at')))
File "/usr/local/lib/python3.5/site-packages/rq/job.py", line 403, in to_date
return utcparse(as_text(date_str))
File "/usr/local/lib/python3.5/site-packages/rq/utils.py", line 167, in utcparse
return datetime.datetime.strptime(string, '%Y-%m-%dT%H:%M:%S.%f+00:00')
File "/usr/local/lib/python3.5/_strptime.py", line 510, in _strptime_datetime
tt, fraction = _strptime(data_string, format)
File "/usr/local/lib/python3.5/_strptime.py", line 343, in _strptime
(data_string, format))
ValueError: time data '2021-01-14T19:22:07.242474Z' does not match format '%Y-%m-%dT%H:%M:%S.%f+00:00'

Related

Airflow triggers Sagemaker job in test mode

I've an Airflow(v1.10.12) dag that triggers a Sagemaker Processor job as part of one of it's tasks. I've written few tests(pytest 6.2.2) to check the basic sanity of the dag.
It seems like just fetching the dag by id from DagBag triggers a Sagemaker job. i.e When I do pytest test_file_name.py, a job is triggered which isn't ideal.
from airflow.models import DagBag
class TestSagemakerDAG:
#classmethod
def setup(cls):
cls.dagbag = DagBag()
cls.dag = DagBag().get_dag(dag_id='sagemaker-processor')
def test_dag_loaded(self):
"""
To verify if dags are loaded onto the dagabag
:return:
"""
assert self.dagbag.import_errors == {}
assert self.dag is not None
assert len(self.dag.tasks) == 2
For more clarity this is how the Sagemaker Processor job(sagemaker 2.24.1) definition looks like
def initiate_sage_maker_job(self, session):
return Processor(
image_uri=self.ecr_uri,
role=self.iam_role,
instance_count=self.instance_count,
instance_type=self.instance_type,
base_job_name=self.processor_name,
sagemaker_session=session,
).run()
And the boto3(v 1.16.63) session is generated as
def get_session(self):
boto_session = boto3.session.Session()
client = boto_session.client('sagemaker', region_name=self.region)
session = sagemaker.session.Session(boto_session=boto_session, sagemaker_client=client)
return session
Finally, the Dag itself
from airflow import DAG
from airflow.operators.bash_operator import BashOperator
from airflow.operators.python_operator import PythonOperator
with DAG('sagemaker-processor',
default_args=default_args,
schedule_interval='#hourly',
) as dag:
t1 = BashOperator(
task_id='print_current_date',
bash_command='date'
)
t2 = PythonOperator(
task_id='sagemaker_trigger', python_callable=initiate_sage_maker_job()
)
t1 >> t2
I'm just trying to import dags from a folder and check import errors, check upstream and downstream list.
On a side note, I've made sure the Dag is turned off on the Airflow UI and or to execute airflow scheduler to start queueing tasks. It's really just a standard test I want to execute using Pytest.
The issue pops up as follows
Job Name: airflow-ecr-test-2021-02-26-21-10-39-935
Inputs: []
Outputs: []
[2021-02-26 16:10:39,935] {session.py:854} INFO - Creating processing-job with name airflow-ecr-test-2021-02-26-21-10-39-935
Traceback (most recent call last):
File "<input>", line 1, in <module>
File "/Applications/PyCharm.app/Contents/plugins/python/helpers/pydev/_pydev_bundle/pydev_umd.py", line 197, in runfile
pydev_imports.execfile(filename, global_vars, local_vars) # execute the script
File "/Applications/PyCharm.app/Contents/plugins/python/helpers/pydev/_pydev_imps/_pydev_execfile.py", line 18, in execfile
exec(compile(contents+"\n", file, 'exec'), glob, loc)
File "/Users/Ajeya.Kempegowda/dags/sample_mwaa.py", line 31, in <module>
initiate_sage_maker_job()
File "/Users/Ajeya.Kempegowda/anaconda3/envs/airflow/lib/python3.7/site-packages/sagemaker/processing.py", line 180, in run
experiment_config=experiment_config,
File "/Users/Ajeya.Kempegowda/anaconda3/envs/airflow/lib/python3.7/site-packages/sagemaker/processing.py", line 695, in start_new
processor.sagemaker_session.process(**process_args)
File "/Users/Ajeya.Kempegowda/anaconda3/envs/airflow/lib/python3.7/site-packages/sagemaker/session.py", line 856, in process
self.sagemaker_client.create_processing_job(**process_request)
File "/Users/Ajeya.Kempegowda/anaconda3/envs/airflow/lib/python3.7/site-packages/botocore/client.py", line 357, in _api_call
return self._make_api_call(operation_name, kwargs)
File "/Users/Ajeya.Kempegowda/anaconda3/envs/airflow/lib/python3.7/site-packages/botocore/client.py", line 676, in _make_api_call
raise error_class(parsed_response, operation_name)
botocore.exceptions.ClientError: An error occurred (ExpiredTokenException) when calling the CreateProcessingJob operation: The security token included in the request is expired
The error displayed says TokenExpired but it really stems while creating a job itself.
Is there something obvious I'm missing while testing airflow? My understanding is that airflow scheduler should queue up dags and only when told to execute(Turn on dag on Airflow UI/CLI) the tasks must be triggered.
Any help would be appreciated. Thanks!

Airflow error while using pandas - Task received SIGTERM signal

I'm getting this SIGTerm error on Airflow 1.10.11 using LocalExecutor.
[2020-09-21 10:26:51,210] {{taskinstance.py:955}} ERROR - Received SIGTERM. Terminating subprocesses.
The dag task is doing this:
reading some data from SQL Server (on Windows) to a pandas dataframe.
And then it writes it to a file (it doesn't even get to this part).
The strange thing is if I limit the number of rows to return in the query (say TOP 100), the dag succeeds.
If I run the python code in my machine locally, it succeeds. I'm using pyodbc and sqlalchemy. It fails on this line after only 20 or 30 seconds:
df_query_results = pd.read_sql(sql_query, engine)
Airflow log
[2020-09-21 10:26:51,210] {{helpers.py:325}} INFO - Sending Signals.SIGTERM to GPID xxx
[2020-09-21 10:26:51,210] {{taskinstance.py:955}} ERROR - Received SIGTERM. Terminating subprocesses.
[2020-09-21 10:26:51,804] {{taskinstance.py:1150}} ERROR - Task received SIGTERM signal
Traceback (most recent call last):
File "/usr/local/lib/python3.6/site-packages/airflow/models/taskinstance.py", line 984, in _run_raw_task
result = task_copy.execute(context=context)
File "/usr/local/airflow/dags/operators/sql_to_avro.py", line 39, in execute
df_query_results = pd.read_sql(sql_query, engine)
File "/usr/local/lib64/python3.6/site-packages/pandas/io/sql.py", line 436, in read_sql
chunksize=chunksize,
File "/usr/local/lib64/python3.6/site-packages/pandas/io/sql.py", line 1231, in read_query
data = result.fetchall()
File "/usr/local/lib64/python3.6/site-packages/sqlalchemy/engine/result.py", line 1216, in fetchall
e, None, None, self.cursor, self.context
File "/usr/local/lib64/python3.6/site-packages/sqlalchemy/engine/base.py", line 1478, in _handle_dbapi_exception
util.reraise(*exc_info)
File "/usr/local/lib64/python3.6/site-packages/sqlalchemy/util/compat.py", line 153, in reraise
raise value
File "/usr/local/lib64/python3.6/site-packages/sqlalchemy/engine/result.py", line 1211, in fetchall
l = self.process_rows(self._fetchall_impl())
File "/usr/local/lib64/python3.6/site-packages/sqlalchemy/engine/result.py", line 1161, in _fetchall_impl
return self.cursor.fetchall()
File "/usr/local/lib/python3.6/site-packages/airflow/models/taskinstance.py", line 957, in signal_handler
raise AirflowException("Task received SIGTERM signal")
airflow.exceptions.AirflowException: Task received SIGTERM signal
[2020-09-21 10:26:51,813] {{taskinstance.py:1194}} INFO - Marking task as FAILED.
EDIT:
I missed this earlier, but there is a warning message about the hostname.
WARNING - The recorded hostname da2mgrl001d1.mycompany.corp does not match this instance's hostname airflow-mycompany-dev.i.mct360.com
I had a Linux/network engineer help out. Unfortunately, I don't know the full details but the fix was they changed the hostname_callable setting in airflow.cfg to hostname_callable = socket:gethostname. It was previously set to socket:getfqdn
Note: I found a couple different (maybe related?) questions where this was the resolution.
How to fix the error "AirflowException("Hostname of job runner does not match")"?
https://stackoverflow.com/a/59108743/220997

Failed to load application: No module named udpecho

I am trying to start a socket connection in amazon AMI server, for that I have some existing code which is in python, I am a newbie to python and unable to start that, As I am trying to start that program using this command, I am facing some issues, please check this
from twisted.application import internet, service
from udpecho import Echo
application = service.Application("echo")
echoService = internet.UDPServer(7401, Echo())
echoService.setServiceParent(application)
but its showing error
Traceback (most recent call last):
File "/usr/lib64/python2.6/dist-packages/twisted/application/app.py", line 694, in run
runApp(config)
File "/usr/lib64/python2.6/dist-packages/twisted/scripts/twistd.py", line 23, in runApp
_SomeApplicationRunner(config).run()
File "/usr/lib64/python2.6/dist-packages/twisted/application/app.py", line 411, in run
self.application = self.createOrGetApplication()
File "/usr/lib64/python2.6/dist-packages/twisted/application/app.py", line 494, in createOrGetApplication
application = getApplication(self.config, passphrase)
--- <exception caught here> ---
File "/usr/lib64/python2.6/dist-packages/twisted/application/app.py", line 505, in getApplication
application = service.loadApplication(filename, style, passphrase)
File "/usr/lib64/python2.6/dist-packages/twisted/application/service.py", line 390, in loadApplication
application = sob.loadValueFromFile(filename, 'application', passphrase)
File "/usr/lib64/python2.6/dist-packages/twisted/persisted/sob.py", line 215, in loadValueFromFile
exec fileObj in d, d
File "udp_server.tac", line 4, in <module>
from udpecho import Echo
exceptions.ImportError: No module named udpecho
Failed to load application: No module named udpecho
What should I do for this to run?
Any Working solution is appreciable

Django + Celery + Supervisord + Redis error when setting

I am going through the setting of the following components on CentOS server. I get supervisord task to get the web site up and running, but I am blocked on setting the supervisor for celery. It seems that it recognizes the tasks, but when I try to execute the tasks, it won't connect to them. My redis is up and running on port 6380
Django==1.10.3
amqp==1.4.9
billiard==3.3.0.23
celery==3.1.25
kombu==3.0.37
pytz==2016.10
my celeryd.ini
[program:celeryd]
command=/root/myproject/myprojectenv/bin/celery worker -A mb --loglevel=INFO
environment=PATH="/root/myproject/myprojectenv/bin/",VIRTUAL_ENV="/root/myproject/myprojectenv",PYTHONPATH="/root/myproject/myprojectenv/lib/python2.7:/root/myproject/myprojectenv/lib/python2.7/site-packages"
directory=/home/.../myapp/
user=nobody
numprocs=1
stdout_logfile=/home/.../myapp/log_celery/worker.log
sterr_logfile=/home/.../myapp/log_celery/worker.log
autostart=true
autorestart=true
startsecs=10
; Need to wait for currently executing tasks to finish at shutdown.
; Increase this if you have very long running tasks.
stopwaitsecs = 1200
; When resorting to send SIGKILL to the program to terminate it
; send SIGKILL to its whole process group instead,
; taking care of its children as well.
killasgroup=true
; Set Celery priority higher than default (999)
; so, if rabbitmq(redis) is supervised, it will start first.
priority=1000
The process starts and when I go to the project folder and do:
>python manage.py celery status
celery#ssd-1v: OK
1 node online.
When I open the log file of celery I see that the tasks are loaded.
[tasks]
. mb.tasks.add
. mb.tasks.update_search_index
. orders.tasks.order_created
my mb/tasks.py
from mb.celeryapp import app
import django
django.setup()
#app.task
def add(x, y):
print(x+y)
return x + y
my mb/celeryapp.py
from __future__ import absolute_import, unicode_literals
import os
from celery import Celery
from django.conf import settings
# set the default Django settings module for the 'celery' program.
os.environ.setdefault("DJANGO_SETTINGS_MODULE", "mb.settings")
app = Celery('mb', broker='redis://localhost:6380/', backend='redis://localhost:6380/')
app.conf.broker_url = 'redis://localhost:6380/0'
app.conf.result_backend = 'redis://localhost:6380/'
app.conf.timezone = 'Europe/Sofia'
app.config_from_object('django.conf:settings')
app.autodiscover_tasks(lambda: settings.INSTALLED_APPS)
my mb/settings.py:
...
WSGI_APPLICATION = 'mb.wsgi.application'
BROKER_URL = 'redis://localhost:6380/0'
CELERYBEAT_SCHEDULER = 'djcelery.schedulers.DatabaseScheduler'
STATICFILES_STORAGE = 'whitenoise.storage.CompressedManifestStaticFilesStorage'
...
when I run:
python manage.py shell
>>> from mb.tasks import add
>>> add.name
'mb.tasks.add'
>>> result=add.delay(1,1)
>>> result.ready()
False
>>> result.status
'PENDING'
And as mentioned earlier I do not see any change in the log anymore.
If I try to run from the command line:
/root/myproject/myprojectenv/bin/celery worker -A mb --loglevel=INFO
Running a worker with superuser privileges when the
worker accepts messages serialized with pickle is a very bad idea!
If you really want to continue then you have to set the C_FORCE_ROOT
environment variable (but please think about this before you do).
User information: uid=0 euid=0 gid=0 egid=0
But I suppose that's normal since I run it after with user nobody. Interesting thing is that the command just celery status (without python manage.py celery status) gives an error on connection, probably because it is looking for different port for redis, but the process of supervisord starts normally... and when I call 'celery worker -A mb' it says it's ok. Any ideas?
(myprojectenv) [root#ssd-1v]# celery status
Traceback (most recent call last):
File "/root/myproject/myprojectenv/bin/celery", line 11, in <module>
sys.exit(main())
File "/root/myproject/myprojectenv/lib/python2.7/site-packages/celery/__main__.py", line 3
0, in main
main()
File "/root/myproject/myprojectenv/lib/python2.7/site-packages/celery/bin/celery.py", line
81, in main
cmd.execute_from_commandline(argv)
File "/root/myproject/myprojectenv/lib/python2.7/site-packages/celery/bin/celery.py", line
793, in execute_from_commandline
super(CeleryCommand, self).execute_from_commandline(argv)))
File "/root/myproject/myprojectenv/lib/python2.7/site-packages/celery/bin/base.py", line 3
11, in execute_from_commandline
return self.handle_argv(self.prog_name, argv[1:])
File "/root/myproject/myprojectenv/lib/python2.7/site-packages/celery/bin/celery.py", line
785, in handle_argv
return self.execute(command, argv)
File "/root/myproject/myprojectenv/lib/python2.7/site-packages/celery/bin/celery.py", line
717, in execute
).run_from_argv(self.prog_name, argv[1:], command=argv[0])
File "/root/myproject/myprojectenv/lib/python2.7/site-packages/celery/bin/base.py", line 3
15, in run_from_argv
sys.argv if argv is None else argv, command)
File "/root/myproject/myprojectenv/lib/python2.7/site-packages/celery/bin/base.py", line 3
77, in handle_argv
return self(*args, **options)
File "/root/myproject/myprojectenv/lib/python2.7/site-packages/celery/bin/base.py", line 2
74, in __call__
ret = self.run(*args, **kwargs)
File "/root/myproject/myprojectenv/lib/python2.7/site-packages/celery/bin/celery.py", line
473, in run
replies = I.run('ping', **kwargs)
File "/root/myproject/myprojectenv/lib/python2.7/site-packages/celery/bin/celery.py", line
325, in run
return self.do_call_method(args, **kwargs)
File "/root/myproject/myprojectenv/lib/python2.7/site-packages/celery/bin/celery.py", line
347, in do_call_method
return getattr(i, method)(*args)
File "/root/myproject/myprojectenv/lib/python2.7/site-packages/celery/app/control.py", line 100, in ping
return self._request('ping')
File "/root/myproject/myprojectenv/lib/python2.7/site-packages/celery/app/control.py", line 71, in _request
timeout=self.timeout, reply=True,
File "/root/myproject/myprojectenv/lib/python2.7/site-packages/celery/app/control.py", line 316, in broadcast
limit, callback, channel=channel,
File "/root/myproject/myprojectenv/lib/python2.7/site-packages/kombu/pidbox.py", line 283, in _broadcast
chan = channel or self.connection.default_channel
File "/root/myproject/myprojectenv/lib/python2.7/site-packages/kombu/connection.py", line 771, in default_channel
self.connection
File "/root/myproject/myprojectenv/lib/python2.7/site-packages/kombu/connection.py", line 756, in connection
self._connection = self._establish_connection()
File "/root/myproject/myprojectenv/lib/python2.7/site-packages/kombu/connection.py", line 711, in _establish_connection
conn = self.transport.establish_connection()
File "/root/myproject/myprojectenv/lib/python2.7/site-packages/kombu/transport/pyamqp.py", line 116, in establish_connection
conn = self.Connection(**opts)
File "/root/myproject/myprojectenv/lib/python2.7/site-packages/amqp/connection.py", line 165, in __init__
self.transport = self.Transport(host, connect_timeout, ssl)
File "/root/myproject/myprojectenv/lib/python2.7/site-packages/amqp/connection.py", line 186, in Transport
return create_transport(host, connect_timeout, ssl)
File "/root/myproject/myprojectenv/lib/python2.7/site-packages/amqp/transport.py", line 299, in create_transport
return TCPTransport(host, connect_timeout)
File "/root/myproject/myprojectenv/lib/python2.7/site-packages/amqp/transport.py", line 95, in __init__
raise socket.error(last_err)
socket.error: [Errno 111] Connection refused
Any help will be highly appreciated.
UPDATE:
when I run
$:python manage.py shell
>>from mb.tasks import add
>>add
<#task: mb.tasks.add of mb:0x**2b3f6d0**>
the 0x2b3f6d0is different from what celery claims to be its memory space in its log, namely:
[config]
- ** ---------- .> app: mb:0x3495bd0
- ** ---------- .> transport: redis://localhost:6380/0
- ** ---------- .> results: disabled://
- *** --- * --- .> concurrency: 1 (prefork)
Ok, the answer in this case was that the gunicorn file was actually starting the project from the common python library, instead of the virtual env

Celery could not start worker processes when using scrapy-Djangoitem

Question detail: https://github.com/celery/celery/issues/3598
I want to run a scrapy spider with celery, which contain Djangoitems.
this is my celery task:
# coding_task.py
import sys
from celery import Celery
from collector.collector.crawl_agent import crawl
app = Celery('coding.net', backend='redis', broker='redis://localhost:6379/0')
app.config_from_object('celery_config')
#app.task
def period_task():
crawl()
collector.collector.crawl_agent.crawl contains a scrapy crawler who uses djangoitem as item.
the item like:
import django
os.environ['DJANGO_SETTINGS_MODULE'] = 'RaPo3.settings'
django.setup()
from scrapy_djangoitem import DjangoItem
from xxx.models import Collection
class CodingItem(DjangoItem):
django_model = Collection
amount = scrapy.Field(default=0)
role = scrapy.Field()
type = scrapy.Field()
duration = scrapy.Field()
detail = scrapy.Field()
extra = scrapy.Field()
when run: celery -A coding_task worker --loglevel=info --concurrency=1, it wil get some errors below:
[2016-11-16 17:33:41,934: ERROR/Worker-1] Process Worker-1
Traceback (most recent call last):
File "/usr/local/lib/python2.7/site-packages/billiard/process.py", line 292, in _bootstrap
self.run()
File "/usr/local/lib/python2.7/site-packages/billiard/pool.py", line 292, in run
self.after_fork()
File "/usr/local/lib/python2.7/site-packages/billiard/pool.py", line 395, in after_fork
self.initializer(*self.initargs)
File "/usr/local/lib/python2.7/site-packages/celery/concurrency/prefork.py", line 80, in process_initializer
signals.worker_process_init.send(sender=None)
File "/usr/local/lib/python2.7/site-packages/celery/utils/dispatch/signal.py", line 151, in send
response = receiver(signal=self, sender=sender, **named)
File "/usr/local/lib/python2.7/site-packages/celery/fixups/django.py", line 152, in on_worker_process_init
self._close_database()
File "/usr/local/lib/python2.7/site-packages/celery/fixups/django.py", line 181, in _close_database
funs = [self._db.close_connection] # pre multidb
AttributeError: 'module' object has no attribute 'close_connection'
[2016-11-16 17:33:41,942: INFO/MainProcess] Connected to redis://localhost:6379/0
[2016-11-16 17:33:41,957: INFO/MainProcess] mingle: searching for neighbors
[2016-11-16 17:33:42,962: INFO/MainProcess] mingle: all alone
/usr/local/lib/python2.7/site-packages/celery/fixups/django.py:199: UserWarning: Using settings.DEBUG leads to a memory leak, never use this setting in production environments!
warnings.warn('Using settings.DEBUG leads to a memory leak, never '
[2016-11-16 17:33:42,968: WARNING/MainProcess] /usr/local/lib/python2.7/site-packages/celery/fixups/django.py:199: UserWarning: Using settings.DEBUG leads to a memory leak, never use this setting in production environments!
warnings.warn('Using settings.DEBUG leads to a memory leak, never '
[2016-11-16 17:33:42,968: WARNING/MainProcess] celery#MacBook-Pro.local ready.
[2016-11-16 17:33:42,969: ERROR/MainProcess] Process 'Worker-1' pid:2777 exited with 'exitcode 1'
[2016-11-16 17:33:42,991: ERROR/MainProcess] Unrecoverable error: WorkerLostError('Could not start worker processes',)
Traceback (most recent call last):
File "/usr/local/lib/python2.7/site-packages/celery/worker/__init__.py", line 208, in start
self.blueprint.start(self)
File "/usr/local/lib/python2.7/site-packages/celery/bootsteps.py", line 127, in start
step.start(parent)
File "/usr/local/lib/python2.7/site-packages/celery/bootsteps.py", line 378, in start
return self.obj.start()
File "/usr/local/lib/python2.7/site-packages/celery/worker/consumer.py", line 271, in start
blueprint.start(self)
File "/usr/local/lib/python2.7/site-packages/celery/bootsteps.py", line 127, in start
step.start(parent)
File "/usr/local/lib/python2.7/site-packages/celery/worker/consumer.py", line 766, in start
c.loop(*c.loop_args())
File "/usr/local/lib/python2.7/site-packages/celery/worker/loops.py", line 50, in asynloop
raise WorkerLostError('Could not start worker processes')
WorkerLostError: Could not start worker processes
if i delete djangoitem in item:
from scrapy.item import Item
class CodingItem(item):
amount = scrapy.Field(default=0)
role = scrapy.Field()
type = scrapy.Field()
duration = scrapy.Field()
detail = scrapy.Field()
extra = scrapy.Field()
the task will play well and doesn't have any error.
What should i do if i want to use djangoitem in this celery-scrapy task?
Thanks!
You should check the ram usage. It might be possible celery is not getting enough ram
Upgrade Celery to 4.0 will solve the problem.
More detail: https://github.com/celery/celery/issues/3598

Categories