I am getting this error when using a Scrapy parsing function (which can sometimes take up to 10 minutes) inside a Celery task.
I use:
- Django==1.6.5
- django-celery==3.1.16
- celery==3.1.16
- psycopg2==2.5.5 (I also tried psycopg2==2.5.4)
[2015-07-19 11:27:49,488: CRITICAL/MainProcess] Task myapp.parse_items[63fc40eb-c0d6-46f4-a64e-acce8301d29a] INTERNAL ERROR: InterfaceError('connection already closed',)
Traceback (most recent call last):
File "/home/mo/Work/python/pb-env/local/lib/python2.7/site-packages/celery/app/trace.py", line 284, in trace_task
uuid, retval, SUCCESS, request=task_request,
File "/home/mo/Work/python/pb-env/local/lib/python2.7/site-packages/celery/backends/base.py", line 248, in store_result
request=request, **kwargs)
File "/home/mo/Work/python/pb-env/local/lib/python2.7/site-packages/djcelery/backends/database.py", line 29, in _store_result
traceback=traceback, children=self.current_task_children(request),
File "/home/mo/Work/python/pb-env/local/lib/python2.7/site-packages/djcelery/managers.py", line 42, in _inner
return fun(*args, **kwargs)
File "/home/mo/Work/python/pb-env/local/lib/python2.7/site-packages/djcelery/managers.py", line 181, in store_result
'meta': {'children': children}})
File "/home/mo/Work/python/pb-env/local/lib/python2.7/site-packages/djcelery/managers.py", line 87, in update_or_create
return get_queryset(self).update_or_create(**kwargs)
File "/home/mo/Work/python/pb-env/local/lib/python2.7/site-packages/djcelery/managers.py", line 70, in update_or_create
obj, created = self.get_or_create(**kwargs)
File "/home/mo/Work/python/pb-env/local/lib/python2.7/site-packages/django/db/models/query.py", line 376, in get_or_create
return self.get(**lookup), False
File "/home/mo/Work/python/pb-env/local/lib/python2.7/site-packages/django/db/models/query.py", line 304, in get
num = len(clone)
File "/home/mo/Work/python/pb-env/local/lib/python2.7/site-packages/django/db/models/query.py", line 77, in __len__
self._fetch_all()
File "/home/mo/Work/python/pb-env/local/lib/python2.7/site-packages/django/db/models/query.py", line 857, in _fetch_all
self._result_cache = list(self.iterator())
File "/home/mo/Work/python/pb-env/local/lib/python2.7/site-packages/django/db/models/query.py", line 220, in iterator
for row in compiler.results_iter():
File "/home/mo/Work/python/pb-env/local/lib/python2.7/site-packages/django/db/models/sql/compiler.py", line 713, in results_iter
for rows in self.execute_sql(MULTI):
File "/home/mo/Work/python/pb-env/local/lib/python2.7/site-packages/django/db/models/sql/compiler.py", line 785, in execute_sql
cursor = self.connection.cursor()
File "/home/mo/Work/python/pb-env/local/lib/python2.7/site-packages/django/db/backends/__init__.py", line 160, in cursor
cursor = self.make_debug_cursor(self._cursor())
File "/home/mo/Work/python/pb-env/local/lib/python2.7/site-packages/django/db/backends/__init__.py", line 134, in _cursor
return self.create_cursor()
File "/home/mo/Work/python/pb-env/local/lib/python2.7/site-packages/django/db/utils.py", line 99, in __exit__
six.reraise(dj_exc_type, dj_exc_value, traceback)
File "/home/mo/Work/python/pb-env/local/lib/python2.7/site-packages/django/db/backends/__init__.py", line 134, in _cursor
return self.create_cursor()
File "/home/mo/Work/python/pb-env/local/lib/python2.7/site-packages/django/db/backends/postgresql_psycopg2/base.py", line 137, in create_cursor
cursor = self.connection.cursor()
InterfaceError: connection already closed
Unfortunately this is a problem with the django + psycopg2 + celery combo.
It's an old and unsolved problem.
Take a look at this thread to understand:
https://github.com/celery/django-celery/issues/121
Basically, when celery starts a worker, it forks the database connection from the django.db framework. If this connection drops for some reason, it doesn't create a new one. Celery can't do anything about this, because there is no way to detect the dropped connection through the django.db libraries. Django doesn't notify you when it happens, because it just opens a connection when it receives a WSGI call (there is no connection pool). I had the same problem in a huge production environment with many worker machines, and sometimes these machines lost connectivity with the postgres server.
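As an aside, a common mitigation for this class of problem (distinct from the supervisord approach described next) is to close stale connections before each task runs, so that django.db reconnects lazily. A minimal sketch, assuming Django >= 1.6 and Celery's task_prerun signal:

from celery.signals import task_prerun
from django.db import close_old_connections

@task_prerun.connect
def reset_stale_connections(*args, **kwargs):
    # Closes connections that are unusable or past their max age;
    # Django opens a fresh one on the next query.
    close_old_connections()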
I solved it by putting each celery master process under a linux supervisord handler with a watcher, and implemented a decorator that handles psycopg2.InterfaceError; when it happens, the decorator dispatches a syscall that makes supervisord gracefully restart the celery process with SIGINT.
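A minimal sketch of such a decorator, assuming a prefork worker whose master process is managed by supervisord with autorestart; the names are illustrative, not the author's actual code:

import functools
import os
import signal

import psycopg2

def restart_worker_on_interface_error(fun):
    @functools.wraps(fun)
    def wrapper(*args, **kwargs):
        try:
            return fun(*args, **kwargs)
        except psycopg2.InterfaceError:
            # Ask the celery master (the parent of this prefork child) to
            # shut down gracefully; supervisord then restarts it with a
            # fresh database connection.
            os.kill(os.getppid(), signal.SIGINT)
            raise
    return wrapper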
Edit:
Found a better solution. I implemented a celery task baseclass like this:
from django.db import connection
import celery

class FaultTolerantTask(celery.Task):
    """ Implements after-return hook to close the invalid connection.
    This way, django is forced to serve a new connection for the next
    task.
    """
    abstract = True

    def after_return(self, *args, **kwargs):
        connection.close()

@celery.task(base=FaultTolerantTask)
def my_task():
    # my database-dependent code here
    pass
I believe it will fix your problem too.
Guys and emanuelcds,
I had the same problem; I have now updated my code and created a new loader for celery:
from djcelery.loaders import DjangoLoader
from django import db

class CustomDjangoLoader(DjangoLoader):
    def on_task_init(self, task_id, task):
        """Called before every task."""
        for conn in db.connections.all():
            conn.close_if_unusable_or_obsolete()
        super(CustomDjangoLoader, self).on_task_init(task_id, task)
This of course only applies if you are using djcelery. It will also require something like this in the settings:
CELERY_LOADER = 'myproject.loaders.CustomDjangoLoader'
os.environ['CELERY_LOADER'] = CELERY_LOADER
I still have to test it; I will update.
If you are running into this when running tests, then you can either change the test to use the TransactionTestCase class instead of TestCase, or add the pytest.mark.django_db(transaction=True) mark (a sketch of the pytest variant appears below). This kept my db connection alive from the creation of the pytest-celery fixtures through to the database calls.
Github issue - https://github.com/Koed00/django-q/issues/167
For context, I am using pytest-celery with celery_app and celery_worker as fixtures in my tests. I am also trying to hit the test db in the tasks referenced in these tests.
If someone could explain why switching to transaction=True keeps it open, that would be great!
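For illustration, a hedged sketch of the pytest variant, assuming pytest-django and pytest-celery are installed; the task and assertion are made up:

import pytest

@pytest.mark.django_db(transaction=True)
def test_task_hits_real_db(celery_app, celery_worker):
    from myapp.tasks import parse_items  # hypothetical task
    result = parse_items.delay()
    assert result.get(timeout=10) is not None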
Setup:
AWS RDS db (Postgres), Dockerised Django backend hosted on Elastic Beanstalk
Some context:
I have a django backend that accepts form POST requests from a separate website, and django saves the form data to the database. The django app I'm creating is replacing existing software used for this purpose. I started testing the django app by having the website send the form data to BOTH the existing software and my django app. I noticed that occasionally my django app will not receive a form that the existing software receives, e.g. django will finish the day with 25 forms but the existing software will finish with 26. I have request logging set up, but even the logs show no record of the 26th form.
To debug what was going wrong, I set up a script command within the django app to send forms to my django app endpoint every minute. It eventually failed:
Exception retrieving data fields
SSL SYSCALL error: EOF detected
Traceback (most recent call last):
File "/usr/local/lib/python3.9/site-packages/django/db/backends/utils.py", line 84, in _execute
return self.cursor.execute(sql, params)
psycopg2.OperationalError: SSL SYSCALL error: EOF detected
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/code/src/leads/management/commands/bug_finder.py", line 115, in handle
data_field = DataField.objects.get(api_field=field)
File "/usr/local/lib/python3.9/site-packages/django/db/models/manager.py", line 85, in manager_method
return getattr(self.get_queryset(), name)(*args, **kwargs)
File "/usr/local/lib/python3.9/site-packages/django/db/models/query.py", line 431, in get
num = len(clone)
File "/usr/local/lib/python3.9/site-packages/django/db/models/query.py", line 262, in __len__
self._fetch_all()
File "/usr/local/lib/python3.9/site-packages/django/db/models/query.py", line 1324, in _fetch_all
self._result_cache = list(self._iterable_class(self))
File "/usr/local/lib/python3.9/site-packages/django/db/models/query.py", line 51, in __iter__
results = compiler.execute_sql(chunked_fetch=self.chunked_fetch, chunk_size=self.chunk_size)
File "/usr/local/lib/python3.9/site-packages/django/db/models/sql/compiler.py", line 1175, in execute_sql
cursor.execute(sql, params)
File "/usr/local/lib/python3.9/site-packages/django/db/backends/utils.py", line 98, in execute
return super().execute(sql, params)
File "/usr/local/lib/python3.9/site-packages/django/db/backends/utils.py", line 66, in execute
return self._execute_with_wrappers(sql, params, many=False, executor=self._execute)
File "/usr/local/lib/python3.9/site-packages/django/db/backends/utils.py", line 75, in _execute_with_wrappers
return executor(sql, params, many, context)
File "/usr/local/lib/python3.9/site-packages/django/db/backends/utils.py", line 84, in _execute
return self.cursor.execute(sql, params)
File "/usr/local/lib/python3.9/site-packages/django/db/utils.py", line 90, in __exit__
raise dj_exc_value.with_traceback(traceback) from exc_value
File "/usr/local/lib/python3.9/site-packages/django/db/backends/utils.py", line 84, in _execute
return self.cursor.execute(sql, params)
django.db.utils.OperationalError: SSL SYSCALL error: EOF detected
[DJANGO] DEBUG 2022-09-08 22:43:09,567 connectionpool urllib3.connectionpool._new_conn:1003: Starting new HTTPS connection (1): url:443
[DJANGO] DEBUG 2022-09-08 22:43:09,730 connectionpool urllib3.connectionpool._make_request:456: https://url:443 "POST /new_lead/5kbLWVNXHQ3U6lJ9trME8hPSf HTTP/1.1" 400 799
So it seems like django is unable to connect with the database to perform queries/saves sometimes.
My local script exception occurred when trying to access datafield models from the database to generate a fake payload for the form:
payload = {}
for field in api_fields:
    try:
        data_field = DataField.objects.get(api_field=field)
        payload[field] = generate_value(data_field)
    except DataField.DoesNotExist:
        continue
So I guess sometimes, when requests are made from the website, django can't connect with the db; it raises this exception, doesn't save the form, and fails to send a response to the website.
When searching the above exception, most answers seem to relate to persistent db connection setups with improper timeout settings, which I don't think applies to django's request/response cycle.
Is this normal? Is it due to AWS going down? Would a solution be to wrap the endpoint's view in a try/except for this exception and store the form data in a cache until django can connect with the DB again? Is there an easy way to test whether django is connected to the db?
Something like a wrapper that effectively does:
try:
    view_function()
except psycopg2.OperationalError:
    counter = 0
    while counter <= 3:
        if django.is_connected_to_db:  # pseudocode check
            return view_function()
        else:
            counter += 1
            time.sleep(20)
    save_request_to_cache_to_try_later()
return response
Seems like I shouldn't have to be doing this though
For your question about checking the database connection in Django, you can use this wherever you want. Note that merely fetching connections['default'] does not open a connection; you have to request a cursor to actually test it:

import time

from django.db import connections
from django.db.utils import OperationalError

try:
    connections['default'].cursor()
except OperationalError:
    time.sleep(20)
About the database problem:
Which AWS region exactly are you using? I have deployed multiple projects on us-east-1 and eu-central-1 and didn't have this problem.
This problem seems unusual, and you should send a ticket to AWS support about it. Anyway, you can try migrating your RDS to the Elastic Beanstalk managed database. It is basically the same as normal RDS, but the network configuration is easier. Just make sure that RDS is in the same VPC as your instances.
I'm getting this autoreconnect error, and there are around 100 connections in the logs during this call to .objects. Here is the document:
class NotificationDoc(Document):
    patient_id = StringField(max_length=32)
    type = StringField(max_length=32)
    sms_sent = BooleanField(default=False)
    email_sent = BooleanField(default=False)
    sending_time = DateTimeField(default=datetime.utcnow)

    def __unicode__(self):
        return "{}_{}_{}".format(self.patient_id, self.type, self.sending_time.strftime('%Y-%m-%d'))
and this is the query call that causes the issue:
docs = NotificationDoc.objects(sending_time__lte=from_date)
This is the complete stacktrace. How can I understand what is going on?
pymongo.errors.AutoReconnect: connection pool paused
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/usr/local/lib/python3.8/site-packages/celery/app/trace.py", line 451, in trace_task
R = retval = fun(*args, **kwargs)
File "/usr/local/lib/python3.8/site-packages/celery/app/trace.py", line 734, in __protected_call__
return self.run(*args, **kwargs)
File "/app/src/reminder/configurations/app/worker.py", line 106, in schedule_recall
schedule_recall_use_case.execute()
File "/app/src/reminder/domain/notification/recall/use_cases/schedule_recall.py", line 24, in execute
notifications_sent = self.notifications_provider.find_recalled_notifications_for_date(no_recalls_before_date)
File "/app/src/reminder/data_providers/database/odm/repositories.py", line 42, in find_recalled_notifications_for_date
if NotificationDoc.objects(sending_time__lte=from_date).count() > 0:
File "/usr/local/lib/python3.8/site-packages/mongoengine/queryset/queryset.py", line 144, in count
return super().count(with_limit_and_skip)
File "/usr/local/lib/python3.8/site-packages/mongoengine/queryset/base.py", line 423, in count
count = count_documents(
File "/usr/local/lib/python3.8/site-packages/mongoengine/pymongo_support.py", line 38, in count_documents
return collection.count_documents(filter=filter, **kwargs)
File "/usr/local/lib/python3.8/site-packages/pymongo/collection.py", line 1502, in count_documents
return self.__database.client._retryable_read(
File "/usr/local/lib/python3.8/site-packages/pymongo/mongo_client.py", line 1307, in _retryable_read
with self._secondaryok_for_server(read_pref, server, session) as (
File "/usr/local/lib/python3.8/contextlib.py", line 113, in __enter__
return next(self.gen)
File "/usr/local/lib/python3.8/site-packages/pymongo/mongo_client.py", line 1162, in _secondaryok_for_server
with self._get_socket(server, session) as sock_info:
File "/usr/local/lib/python3.8/contextlib.py", line 113, in __enter__
return next(self.gen)
File "/usr/local/lib/python3.8/site-packages/pymongo/mongo_client.py", line 1099, in _get_socket
with server.get_socket(
File "/usr/local/lib/python3.8/contextlib.py", line 113, in __enter__
return next(self.gen)
File "/usr/local/lib/python3.8/site-packages/pymongo/pool.py", line 1371, in get_socket
sock_info = self._get_socket(all_credentials)
File "/usr/local/lib/python3.8/site-packages/pymongo/pool.py", line 1436, in _get_socket
self._raise_if_not_ready(emit_event=True)
File "/usr/local/lib/python3.8/site-packages/pymongo/pool.py", line 1407, in _raise_if_not_ready
_raise_connection_failure(
File "/usr/local/lib/python3.8/site-packages/pymongo/pool.py", line 250, in _raise_connection_failure
raise AutoReconnect(msg) from error
pymongo.errors.AutoReconnect: mongo:27017: connection pool paused
It turned out that celery was leaving connections open, so here is the solution:
from mongoengine import connect, disconnect
from celery.signals import task_prerun

@task_prerun.connect
def on_task_init(*args, **kwargs):
    disconnect(alias='default')
    connect(db, host=host, port=port, maxPoolSize=400, minPoolSize=200, alias='default')
This could be related to the fact that PyMongo is not fork-safe. This matters if you're using a process pool in any way, which includes server software like uWSGI and even some configurations of popular ASGI servers.
Every time you run a query in PyMongo, that MongoClient object becomes fork-unsafe. A MongoClient object that has never run any queries is fork-safe. Created Database and Collection objects will reference back to their parent MongoClient without checking for existence.
I do see you are using MongoEngine, which uses PyMongo underneath. I don't know the semantics of that particular library but I would assume they are the same.
In short: you must re-create your connections as part of your forking process.
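A minimal sketch of that rule, assuming a plain multiprocessing pool; the connection string, database, and collection names are illustrative. The key point is that each child process builds its own MongoClient instead of inheriting one that has already run queries:

from multiprocessing import Pool

from pymongo import MongoClient

client = None  # per-process client, created in the initializer

def init_worker():
    global client
    client = MongoClient("mongodb://localhost:27017")  # created after the fork

def handle(doc_id):
    # safe: this client belongs to the current process
    return client.mydb.notifications.find_one({"_id": doc_id})

if __name__ == "__main__":
    with Pool(processes=4, initializer=init_worker) as pool:
        pool.map(handle, range(10))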
pip install -U pymongo; this is already fixed, see https://github.com/mongodb/mongo-python-driver/pull/944
I'm having trouble identifying the source of transaction.interfaces.NoTransaction errors within my Pyramid App. I don't see any patterns to when the error happens, so to me it's quite random.
This app is a (semi-) RESTful API and uses SQLAlchemy and MySQL. I'm currently running within a docker container that connects to an external (bare metal) MySQL instance on the same host OS.
Here's the stack trace for a login attempt within the App. This error happened right after another login attempt that was actually successful.
2020-06-15 03:57:18,982 DEBUG [txn.140501728405248:108][waitress-1] new transaction
2020-06-15 03:57:18,984 INFO [sqlalchemy.engine.base.Engine:730][waitress-1] BEGIN (implicit)
2020-06-15 03:57:18,984 DEBUG [txn.140501728405248:576][waitress-1] abort
2020-06-15 03:57:18,985 ERROR [waitress:357][waitress-1] Exception while serving /auth
Traceback (most recent call last):
File "/usr/local/lib/python3.8/site-packages/waitress/channel.py", line 350, in service
task.service()
File "/usr/local/lib/python3.8/site-packages/waitress/task.py", line 171, in service
self.execute()
File "/usr/local/lib/python3.8/site-packages/waitress/task.py", line 441, in execute
app_iter = self.channel.server.application(environ, start_response)
File "/usr/local/lib/python3.8/site-packages/pyramid/router.py", line 270, in __call__
response = self.execution_policy(environ, self)
File "/usr/local/lib/python3.8/site-packages/pyramid_retry/__init__.py", line 127, in retry_policy
response = router.invoke_request(request)
File "/usr/local/lib/python3.8/site-packages/pyramid/router.py", line 249, in invoke_request
response = handle_request(request)
File "/usr/local/lib/python3.8/site-packages/pyramid_tm/__init__.py", line 178, in tm_tween
reraise(*exc_info)
File "/usr/local/lib/python3.8/site-packages/pyramid_tm/compat.py", line 36, in reraise
raise value
File "/usr/local/lib/python3.8/site-packages/pyramid_tm/__init__.py", line 135, in tm_tween
userid = request.authenticated_userid
File "/usr/local/lib/python3.8/site-packages/pyramid/security.py", line 381, in authenticated_userid
return policy.authenticated_userid(self)
File "/opt/REDACTED-api/REDACTED_api/auth/policy.py", line 208, in authenticated_userid
result = self._authenticate(request)
File "/opt/REDACTED-api/REDACTED_api/auth/policy.py", line 199, in _authenticate
session = self._get_session_from_token(token)
File "/opt/REDACTED-api/REDACTED_api/auth/policy.py", line 320, in _get_session_from_token
session = service.get(session_id)
File "/opt/REDACTED-api/REDACTED_api/service/__init__.py", line 122, in get
entity = self.queryset.filter(self.Meta.model.id == entity_id).first()
File "/usr/local/lib/python3.8/site-packages/sqlalchemy/orm/query.py", line 3375, in first
ret = list(self[0:1])
File "/usr/local/lib/python3.8/site-packages/sqlalchemy/orm/query.py", line 3149, in __getitem__
return list(res)
File "/usr/local/lib/python3.8/site-packages/sqlalchemy/orm/query.py", line 3481, in __iter__
return self._execute_and_instances(context)
File "/usr/local/lib/python3.8/site-packages/sqlalchemy/orm/query.py", line 3502, in _execute_and_instances
conn = self._get_bind_args(
File "/usr/local/lib/python3.8/site-packages/sqlalchemy/orm/query.py", line 3517, in _get_bind_args
return fn(
File "/usr/local/lib/python3.8/site-packages/sqlalchemy/orm/query.py", line 3496, in _connection_from_session
conn = self.session.connection(**kw)
File "/usr/local/lib/python3.8/site-packages/sqlalchemy/orm/session.py", line 1138, in connection
return self._connection_for_bind(
File "/usr/local/lib/python3.8/site-packages/sqlalchemy/orm/session.py", line 1146, in _connection_for_bind
return self.transaction._connection_for_bind(
File "/usr/local/lib/python3.8/site-packages/sqlalchemy/orm/session.py", line 458, in _connection_for_bind
self.session.dispatch.after_begin(self.session, self, conn)
File "/usr/local/lib/python3.8/site-packages/sqlalchemy/event/attr.py", line 322, in __call__
fn(*args, **kw)
File "/usr/local/lib/python3.8/site-packages/zope/sqlalchemy/datamanager.py", line 268, in after_begin
join_transaction(
File "/usr/local/lib/python3.8/site-packages/zope/sqlalchemy/datamanager.py", line 233, in join_transaction
DataManager(
File "/usr/local/lib/python3.8/site-packages/zope/sqlalchemy/datamanager.py", line 89, in __init__
transaction_manager.get().join(self)
File "/usr/local/lib/python3.8/site-packages/transaction/_manager.py", line 91, in get
raise NoTransaction()
transaction.interfaces.NoTransaction
The trace shows that execution eventually reaches my project, but only my custom authentication policy, and it fails right where the database should be queried for the user.
What intrigues me here is the third log line above: it seems Waitress somehow aborted the transaction it created? Any clue why?
EDIT: Here's the code where that happens: policy.py:320
def _get_session_from_token(self, token) -> UserSession:
    try:
        session_id, session_secret = self.parse_token(token)
    except InvalidToken as e:
        raise SessionNotFound(e)
    service = AuthService(self.dbsession, None)
    try:
        session = service.get(session_id)  # <---- Service class called here
    except NoResultsFound:
        raise SessionNotFound("Invalid session found in request headers. "
                              "Session id: {}".format(session_id))
    if not service.check_session(session, session_secret):
        raise SessionNotFound("Session signature does not match")
    now = datetime.now(tz=pytz.UTC)
    if session.validity < now:
        raise SessionNotFound(
            "Current session ID {session_id} is expired".format(
                session_id=session.id
            )
        )
    return session
And here is a view of that service class method:
class AuthService(ModelService):
    class Meta:
        model = UserSession
        queryset = Query(UserSession)
        search_fields = []
        order_fields = [UserSession.created_at.desc()]

    # These below come from the generic ModelService parent class
    def __init__(self, dbsession: Session, user_id: str):
        self.user_id = user_id
        self.dbsession = dbsession
        self.Meta.queryset = self.Meta.queryset.with_session(dbsession)
        self.logger = logging.getLogger("REDACTED")

    @property
    def queryset(self):
        return self.Meta.queryset

    def get(self, entity_id) -> Base:
        entity = self.queryset.filter(self.Meta.model.id == entity_id).first()
        if not entity:
            raise NoResultsFound(f"Could not find requested ID {entity_id}")
        return entity
As you can see, there's already some exception handling. I really don't see what other exception I could try to catch in AuthService.get.
I found the solution to be much simpler than tinkering inside Pyramid or SQLAlchemy.
Debugging my Authentication Policy closely, I found out that it was keeping a sticky reference to the dbsession, stored on the first request that ever used it and never released.
The first request works as expected, but the following one fails. My understanding is that the policy object stays in memory while the app is running, even after the initial transaction is closed. The second request has a new connection and a new transaction, but the object in memory still points to the previous one, which ultimately causes this error when used.
What I don't understand is why the exception didn't happen sometimes. As I mentioned initially, it was seemingly random.
Another thing I struggled with was writing a test case to expose the issue. In my tests the issue never happens, because I have (and I've never seen it done differently) a single connection and a single transaction throughout the entire testing session, as opposed to a new connection/transaction per request, so I have found no way to actually reproduce it.
Please let me know if that makes sense, and if you can shed a light on how to expose the bug on a test case.
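For what it's worth, a hedged sketch of the shape of the fix, assuming the per-request session is exposed as request.dbsession (as in the Pyramid cookiecutter setup); AuthService is the class shown above, and the class and method names here are illustrative:

class TokenAuthenticationPolicy:
    def _authenticate(self, request):
        # Resolve the session from the *current* request every time,
        # instead of caching it on the policy at first use.
        service = AuthService(request.dbsession, None)
        token = request.headers.get('X-Auth-Token')
        return self._get_session_from_token(service, token)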
I'm working on a project where we parse a somewhat large file and process each row asynchronously (we make API calls for each row) using ThreadPoolExecutor. This used to be done synchronously and we had a passing test suite. Now, however, when running the tests Django's default test runner errors out in teardown_databases:
Traceback (most recent call last):
File "manage.py", line 34, in <module>
execute_from_command_line(sys.argv)
File "/usr/local/lib/python3.5/site-packages/django/core/management/__init__.py", line 367, in execute_from_command_line
utility.execute()
File "/usr/local/lib/python3.5/site-packages/django/core/management/__init__.py", line 359, in execute
self.fetch_command(subcommand).run_from_argv(self.argv)
File "/usr/local/lib/python3.5/site-packages/django/core/management/commands/test.py", line 29, in run_from_argv
super(Command, self).run_from_argv(argv)
File "/usr/local/lib/python3.5/site-packages/django/core/management/base.py", line 294, in run_from_argv
self.execute(*args, **cmd_options)
File "/usr/local/lib/python3.5/site-packages/django/core/management/base.py", line 345, in execute
output = self.handle(*args, **options)
File "/usr/local/lib/python3.5/site-packages/django/core/management/commands/test.py", line 72, in handle
failures = test_runner.run_tests(test_labels)
File "/usr/local/lib/python3.5/site-packages/django/test/runner.py", line 551, in run_tests
self.teardown_databases(old_config)
File "/usr/local/lib/python3.5/site-packages/django/test/runner.py", line 526, in teardown_databases
connection.creation.destroy_test_db(old_name, self.verbosity, self.keepdb)
File "/usr/local/lib/python3.5/site-packages/django/db/backends/base/creation.py", line 264, in destroy_test_db
self._destroy_test_db(test_database_name, verbosity)
File "/usr/local/lib/python3.5/site-packages/django/db/backends/base/creation.py", line 283, in _destroy_test_db
% self.connection.ops.quote_name(test_database_name))
File "/usr/local/lib/python3.5/site-packages/django/db/backends/utils.py", line 64, in execute
return self.cursor.execute(sql, params)
File "/usr/local/lib/python3.5/site-packages/django/db/utils.py", line 94, in __exit__
six.reraise(dj_exc_type, dj_exc_value, traceback)
File "/usr/local/lib/python3.5/site-packages/django/utils/six.py", line 685, in reraise
raise value.with_traceback(tb)
File "/usr/local/lib/python3.5/site-packages/django/db/backends/utils.py", line 62, in execute
return self.cursor.execute(sql)
django.db.utils.OperationalError: database "test_sftpm_db" is being accessed by other users
DETAIL: There are 10 other sessions using the database.
(We're using 10 workers)
I've tried to close the connections manually in many places in the code, but to no avail. Is there a proper way of fixing this?
Make sure you are calling connection.close() in your code when working with threads. This solved the issue for me, where other proposed solutions (a decorator for the test methods, changing db settings, DB functions as shown by Nick above, restarting postgres) did not. Useful example
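For illustration, a minimal sketch of that pattern with a plain thread; the work function is a placeholder:

import threading

from django.db import connection

def do_orm_work():
    pass  # real ORM calls would go here

def worker():
    try:
        do_orm_work()
    finally:
        connection.close()  # release this thread's connection when done

t = threading.Thread(target=worker)
t.start()
t.join()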
This looks like more of a Django issue than a Postgres one --
see for example this ticket: https://code.djangoproject.com/ticket/22420
From what you provided, I see that Django didn't close all connections to the test database before attempting to drop it. In this case, Postgres tries to protect other sessions from data loss, so it cannot drop/rename the database before they all disconnect.
If you want, you can drop the test database manually, using the pg_terminate_backend(..) function and the pg_stat_activity view you already used:
select pg_terminate_backend(pid)
from pg_stat_activity
where datname = 'DATABASE_NAME';

drop database DATABASE_NAME;
If, for some reason, somebody is very quick and manages to connect between these two commands, drop database will fail again. In such a case you can repeat it, but before that, revoke the right to connect to this database from public -- this will prevent new connections:
revoke connect on database DATABASE_NAME from public;
...and then repeat the operations described above.
Connections to the database are thread local. I ended up fixing this problem by adding a callback to each future returned by the executor.
from concurrent.futures import ThreadPoolExecutor

from django.db import connections

def on_done(future):
    # Each thread has its own connection, so calling close_all() here
    # closes the connections registered under the current thread.
    connections.close_all()

def main():
    # ...
    with ThreadPoolExecutor() as executor:
        while True:
            future = executor.submit(do, get_a_job())
            future.add_done_callback(on_done)
More found here: https://www.programmersought.com/article/3618244269/
Another thing that could be the problem is that you are subclassing TestCase, which holds a global lock on your test. Subclassing TransactionTestCase will fix this problem and allow any threads your test may spawn to communicate with the database.
I got a similar problem and solved it by using django.test.testcases.TransactionTestCase as the superclass for my test class.
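A minimal sketch of that change, assuming a Django project; the test name and body are placeholders:

from django.test import TransactionTestCase

class RowProcessingTests(TransactionTestCase):
    def test_process_rows_in_threads(self):
        # Threads spawned here get their own connections and can see
        # committed data, unlike under TestCase's wrapping transaction.
        self.assertTrue(True)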
In my PyCharm (Run process) + postgres db case, I fixed it by doing the following:
1. Log in as a user that has the correct rights to the db, in my case the default "postgres":
$ psql -h localhost -U postgres -W
2. Drop all connections to the existing db, in my case "mydbname", with this command:
# select pg_terminate_backend(pid) from pg_stat_activity where datname='mydbname';
3. Click the "Run" button in PyCharm in the normal way.
When one of my unit tests deletes a SQLAlchemy object, the object triggers an after_delete event which triggers a Celery task to delete a file from the drive.
The task runs with CELERY_ALWAYS_EAGER = True when testing.
gist to reproduce the issue easily
The example has two tests. One triggers the task in the event, the other outside the event. Only the one in the event closes the connection.
To quickly reproduce the error you can run:
git clone https://gist.github.com/5762792fc1d628843697.git
cd 5762792fc1d628843697
virtualenv venv
. venv/bin/activate
pip install -r requirements.txt
python test.py
The stack:
$ python test.py
E
======================================================================
ERROR: test_delete_task (__main__.CeleryTestCase)
----------------------------------------------------------------------
Traceback (most recent call last):
File "test.py", line 73, in test_delete_task
db.session.commit()
File "/home/brice/Code/5762792fc1d628843697/venv/local/lib/python2.7/site-packages/sqlalchemy/orm/scoping.py", line 150, in do
return getattr(self.registry(), name)(*args, **kwargs)
File "/home/brice/Code/5762792fc1d628843697/venv/local/lib/python2.7/site-packages/sqlalchemy/orm/session.py", line 776, in commit
self.transaction.commit()
File "/home/brice/Code/5762792fc1d628843697/venv/local/lib/python2.7/site-packages/sqlalchemy/orm/session.py", line 377, in commit
self._prepare_impl()
File "/home/brice/Code/5762792fc1d628843697/venv/local/lib/python2.7/site-packages/sqlalchemy/orm/session.py", line 357, in _prepare_impl
self.session.flush()
File "/home/brice/Code/5762792fc1d628843697/venv/local/lib/python2.7/site-packages/sqlalchemy/orm/session.py", line 1919, in flush
self._flush(objects)
File "/home/brice/Code/5762792fc1d628843697/venv/local/lib/python2.7/site-packages/sqlalchemy/orm/session.py", line 2037, in _flush
transaction.rollback(_capture_exception=True)
File "/home/brice/Code/5762792fc1d628843697/venv/local/lib/python2.7/site-packages/sqlalchemy/util/langhelpers.py", line 63, in __exit__
compat.reraise(type_, value, traceback)
File "/home/brice/Code/5762792fc1d628843697/venv/local/lib/python2.7/site-packages/sqlalchemy/orm/session.py", line 2037, in _flush
transaction.rollback(_capture_exception=True)
File "/home/brice/Code/5762792fc1d628843697/venv/local/lib/python2.7/site-packages/sqlalchemy/orm/session.py", line 393, in rollback
self._assert_active(prepared_ok=True, rollback_ok=True)
File "/home/brice/Code/5762792fc1d628843697/venv/local/lib/python2.7/site-packages/sqlalchemy/orm/session.py", line 223, in _assert_active
raise sa_exc.ResourceClosedError(closed_msg)
ResourceClosedError: This transaction is closed
----------------------------------------------------------------------
Ran 1 test in 0.014s
FAILED (errors=1)
I think I found the problem - it's in how you set up your Celery task. If you remove the app context call from your celery setup, everything runs fine:
class ContextTask(TaskBase):
    abstract = True

    def __call__(self, *args, **kwargs):
        # deleted --> with app.app_context():
        return TaskBase.__call__(self, *args, **kwargs)
There's a big warning in the SQLAlchemy docs about never modifying the session during after_delete events: http://docs.sqlalchemy.org/en/latest/orm/events.html#sqlalchemy.orm.events.MapperEvents.after_delete
So I suspect the with app.app_context(): is being called during the delete, trying to attach to and/or modify the session that Flask-SQLAlchemy stores in the app object, and therefore the whole thing is bombing.
Flask-SQLAlchemy does a lot of magic behind the scenes for you, but you can bypass this and use SQLAlchemy directly. If you need to talk to the database during the delete event, you can create a new session to the db:
@celery.task()
def my_task():
    # obviously here I create a new object
    session = db.create_scoped_session()
    session.add(User(id=13, value="random string"))
    session.commit()
    return
But it sounds like you don't need this; you're just trying to delete an image path. In that case, I would just change your task so it takes a path:
# the instance will call the task
@event.listens_for(User, "after_delete")
def after_delete(mapper, connection, target):
    my_task.delay(target.value)

@celery.task()
def my_task(image_path):
    os.remove(image_path)
Hopefully that's helpful - let me know if any of that doesn't work for you. Thanks for the very detailed setup, it really helped in debugging.
Similar to the answer suggested by deBrice, but using an approach similar to Rachel's.
class ContextTask(TaskBase):
    abstract = True

    def __call__(self, *args, **kwargs):
        import flask
        # tests will be run in the unittest app context
        if flask.current_app:
            return TaskBase.__call__(self, *args, **kwargs)
        else:
            # actual workers need to enter the worker app context
            with app.app_context():
                return TaskBase.__call__(self, *args, **kwargs)
Ask, the creator of Celery, suggested this solution on GitHub:
from celery import signals

def make_celery(app):
    ...

    @signals.task_prerun.connect
    def add_task_flask_context(sender, **kwargs):
        if not sender.request.is_eager:
            sender.request.flask_context = app.app_context().__enter__()

    @signals.task_postrun.connect
    def cleanup_task_flask_context(sender, **kwargs):
        flask_context = getattr(sender.request, 'flask_context', None)
        if flask_context is not None:
            flask_context.__exit__(None, None, None)