Django exception bugging me, don't know how to debug it - python

I recently upgraded to Python 2.7 and Django 1.3, and since then I get the following unhandled exception when starting the development server:
Unhandled exception in thread started by <bound method Command.inner_run of <django.core.management.commands.runserver.Command object at 0x109c57490>>
Traceback (most recent call last):
File "/Users/ApPeL/.virtualenvs/myhunt/lib/python2.7/site-packages/django/core/management/commands/runserver.py", line 88, in inner_run
self.validate(display_num_errors=True)
File "/Users/ApPeL/.virtualenvs/myhunt/lib/python2.7/site-packages/django/core/management/base.py", line 249, in validate
num_errors = get_validation_errors(s, app)
File "/Users/ApPeL/.virtualenvs/myhunt/lib/python2.7/site-packages/django/core/management/validation.py", line 36, in get_validation_errors
for (app_name, error) in get_app_errors().items():
File "/Users/ApPeL/.virtualenvs/myhunt/lib/python2.7/site-packages/django/db/models/loading.py", line 146, in get_app_errors
self._populate()
File "/Users/ApPeL/.virtualenvs/myhunt/lib/python2.7/site-packages/django/db/models/loading.py", line 67, in _populate
self.write_lock.release()
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/threading.py", line 137, in release
raise RuntimeError("cannot release un-acquired lock")
RuntimeError: cannot release un-acquired lock
Your help would be greatly appreciated.

A usual first recommendation is to apply the latest updates to gevent, greenlet, or whatever thread-related package you use. The implementation of threading.Thread.start changed between Python 2.6 and 2.7. There are many recipes for running gevent or greenlet with Django; try a recent one written for Python 2.7, and post a link to the one that triggers the problem.
Debugging:
Add the following lines to your manage.py to enable logging of thread starts etc. to stderr:
import threading
setattr(threading, '__debug__', True)
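For reference, a Django 1.3-era manage.py with these lines added might look like this (a sketch only; the settings import follows the stock Django 1.3 layout and is not taken from the question):
#!/usr/bin/env python
# Enable verbose thread logging to stderr as early as possible, before
# Django's imports pull in any threading-related code.
import threading
setattr(threading, '__debug__', True)

from django.core.management import execute_manager
import settings  # the project's settings module, as in a stock Django 1.3 manage.py

if __name__ == "__main__":
    execute_manager(settings)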
Add the argument verbose to django/db/models/loading.py line 39, in order to also see which threads acquire and release the lock:
- write_lock = threading.RLock(),
+ write_lock = threading.RLock(verbose=True),
Run the development server. With only one thread (no autoreload) you should see something like:
$ python manage.py runserver --noreload
Validating models...
MainThread: <_RLock owner='MainThread' count=1>.acquire(1): initial success
MainThread: <_RLock owner=None count=0>.release(): final release
Notes:
count=1 acquire(1) -- the first acquire by a blocking lock
owner=None count=0>.release() -- the lock is currently being unlocked
$ python manage.py runserver
Validating models...
Dummy-1: <_RLock owner=-1222960272 count=1>.acquire(1): initial success
Dummy-1: <_RLock owner=None count=0>.release(): final release
This is the same with autoreload. Models are validated by the child process.
"Dummy-1" is a symbolic name of the thread. This can be repeated for more threads, but no threads should/can acquire the lock until it is released by the previous thread. We can continue according the results.

Related

Is multiprocessing.Pool not allowed in Airflow task? - AssertionError: daemonic processes are not allowed to have children

Our airflow project has a task that queries BigQuery and uses Pool to dump the results in parallel to local JSON files:
from functools import partial
from multiprocessing import Pool

def dump_in_parallel(table_name):
    base_query = f"select * from models.{table_name}"
    all_conf_ids = range(1, 10)
    n_jobs = 4
    with Pool(n_jobs) as p:
        p.map(partial(dump_conf_id, base_query=base_query), all_conf_ids)
    with open("/tmp/final_output.json", "wb") as f:
        filenames = [f'/tmp/output_file_{i}.json' for i in all_conf_ids]
This task was working fine for us in airflow v1.10, but is no longer working in v2.1+. Section 2.1 here - https://blog.mbedded.ninja/programming/languages/python/python-multiprocessing/ - mentions "If you try and create a Pool from within a child worker that was already created with a Pool, you will run into the error: daemonic processes are not allowed to have children"
Here is the full Airflow error:
[2021-08-22 02:11:53,064] {taskinstance.py:1462} ERROR - Task failed with exception
Traceback (most recent call last):
File "/usr/local/lib/python3.7/site-packages/airflow/models/taskinstance.py", line 1164, in _run_raw_task
self._prepare_and_execute_task_with_callbacks(context, task)
File "/usr/local/lib/python3.7/site-packages/airflow/models/taskinstance.py", line 1282, in _prepare_and_execute_task_with_callbacks
result = self._execute_task(context, task_copy)
File "/usr/local/lib/python3.7/site-packages/airflow/models/taskinstance.py", line 1312, in _execute_task
result = task_copy.execute(context=context)
File "/usr/local/lib/python3.7/site-packages/airflow/operators/python.py", line 150, in execute
return_value = self.execute_callable()
File "/usr/local/lib/python3.7/site-packages/airflow/operators/python.py", line 161, in execute_callable
return self.python_callable(*self.op_args, **self.op_kwargs)
File "/usr/local/airflow/plugins/tasks/bigquery.py", line 249, in dump_in_parallel
with Pool(n_jobs) as p:
File "/usr/local/lib/python3.7/multiprocessing/context.py", line 119, in Pool
context=self.get_context())
File "/usr/local/lib/python3.7/multiprocessing/pool.py", line 176, in __init__
self._repopulate_pool()
File "/usr/local/lib/python3.7/multiprocessing/pool.py", line 241, in _repopulate_pool
w.start()
File "/usr/local/lib/python3.7/multiprocessing/process.py", line 110, in start
'daemonic processes are not allowed to have children'
AssertionError: daemonic processes are not allowed to have children
If it matters, we run airflow using the LocalExecutor. Any idea why this task that uses Pool would have been working in airflow v1.10 but no longer in airflow 2.1?
Airflow 2 uses a different processing model under the hood to speed up processing while still maintaining process-based isolation between running tasks.
That's why it uses forking and multiprocessing under the hood to run tasks, but this also means that if you are using multiprocessing yourself, you will hit the limits of Python's multiprocessing, which does not allow a daemonic process to spawn its own children.
I am not 100% sure it will work, but you might try setting the execute_tasks_new_python_interpreter configuration option to True: https://airflow.apache.org/docs/apache-airflow/stable/configurations-ref.html#execute-tasks-new-python-interpreter . This setting will cause Airflow to start a new Python interpreter when running a task instead of forking/using multiprocessing (though I am not 100% sure of the latter). It will be quite a bit slower (up to a few seconds of overhead per task), because the new Python interpreter has to reinitialize and import all the Airflow code before running your task.
If that does not work, you can launch your multiprocessing job using the PythonVirtualenvOperator - that one will launch a new Python interpreter to run your Python code and you should be able to use multiprocessing.
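A minimal sketch of that PythonVirtualenvOperator route, assuming a callable like the one in the question (the task_id, requirements, and the dump_conf_id stub are illustrative; attach the operator to your DAG as usual):
from airflow.operators.python import PythonVirtualenvOperator

def dump_in_parallel_isolated():
    # Imports go inside the callable: it is executed in a separate
    # virtualenv with its own, non-forked Python interpreter.
    from functools import partial
    from multiprocessing import Pool

    def dump_conf_id(conf_id, base_query):
        ...  # the real BigQuery dump logic would go here

    base_query = "select * from models.my_table"
    with Pool(4) as p:
        p.map(partial(dump_conf_id, base_query=base_query), range(1, 10))

dump_task = PythonVirtualenvOperator(
    task_id="dump_in_parallel",
    python_callable=dump_in_parallel_isolated,
    requirements=[],            # PyPI packages the callable needs
    system_site_packages=True,  # reuse packages already installed on the worker
)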
Replacing the multiprocessing library with the billiard library works, per https://github.com/celery/celery/issues/4525. We have no idea why subbing one library in for the other resolves this issue, though...
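The swap referred to above is essentially a one-line change in the task module (billiard is Celery's maintained fork of multiprocessing and mirrors its Pool API):
# from multiprocessing import Pool   # fails under Airflow 2: "daemonic processes
#                                    #  are not allowed to have children"
from billiard import Pool            # drop-in replacement; the rest of the code is unchanged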
You can also switch to the joblib Python library with the loky backend and kill the daemonic worker processes after execution.
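A sketch of the joblib variant, assuming its loky backend and the same hypothetical dump_conf_id helper as in the question:
from joblib import Parallel, delayed

def dump_in_parallel(table_name):
    base_query = f"select * from models.{table_name}"
    # loky, joblib's default process backend, manages its own worker
    # processes instead of going through multiprocessing.Pool.
    Parallel(n_jobs=4, backend="loky")(
        delayed(dump_conf_id)(conf_id, base_query=base_query)
        for conf_id in range(1, 10)
    )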

Django: running a few tasks in parallel

I have a web application built with Django 2.0.1.
A user uploads a file and, based on its content, a number of tasks are executed in serial fashion. After execution the results are shown to the user. Some of the tasks are independent of each other.
I want to execute the independent tasks in parallel. I tried using multiprocessing within views.py, but some errors are thrown when the processes are spawned. The tasks analyse some information and write to a file; the files are then combined to show the results to the user.
These tasks cannot be run asynchronously, as the results they produce need to be shown to the user who is waiting. So I have dropped the idea of using Celery, which was recommended in other discussions.
Anyone's suggestions would be helpful.
Thanks
This was the error we got:
Traceback (most recent call last):
File "C:\Users\idea\AppData\Local\Enthought\Canopy\edm\envs\python\lib\multiprocessing\spawn.py", line 106, in spawn_main
exitcode = _main(fd)
File "C:\Users\idea\AppData\Local\Enthought\Canopy\edm\envs\python\lib\multiprocessing\spawn.py", line 116, in _main
self = pickle.load(from_parent)
File "G:\work\gitrepo\suprath-github\smartdata\ssd\FinalPlots\uploads\core\views.py", line 6, in
from uploads.core.models import Document
File "G:\work\gitrepo\suprath-github\smartdata\ssd\FinalPlots\uploads\core\models.py", line 7, in
class Document(models.Model):
File "C:\Users\idea\AppData\Local\Enthought\Canopy\edm\envs\python\lib\site-packages\django\db\models\base.py", line 100, in new
app_config = apps.get_containing_app_config(module)
File "C:\Users\idea\AppData\Local\Enthought\Canopy\edm\envs\python\lib\site-packages\django\apps\registry.py", line 244, in get_containing_app_config
self.check_apps_ready()
File "C:\Users\idea\AppData\Local\Enthought\Canopy\edm\envs\python\lib\site-packages\django\apps\registry.py", line 127, in check_apps_ready
raise AppRegistryNotReady("Apps aren't loaded yet.")
django.core.exceptions.AppRegistryNotReady: Apps aren't loaded yet.
These tasks cannot be run asynchronously, as the results they produce need to be shown to the user who is waiting
That doesn't mean you can't use an async queue (Celery or other). We have a very similar use case and do use Celery to run the tasks. The tasks (partly parallel, partly serial) store their progress in Redis, and the frontend polls to get the current state and display progress to the user; when the whole process is done (successfully or not) we display the result (or the errors).
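A rough sketch of that pattern, assuming Celery with a Redis result backend (the task, view, module, and progress-key names are illustrative, not from the original project):
# tasks.py
from celery import shared_task

@shared_task(bind=True)
def analyse_upload(self, file_path):
    steps = ["parse", "analyse", "combine"]
    for done, step in enumerate(steps, start=1):
        # ... run the real work for this step, writing its output file ...
        # Record intermediate progress in the result backend (Redis here).
        self.update_state(state="PROGRESS",
                          meta={"step": step, "done": done, "total": len(steps)})
    return {"result_file": "/tmp/combined_output.txt"}

# views.py  (the frontend polls this endpoint until the task finishes)
from django.http import JsonResponse
from myapp.tasks import analyse_upload  # hypothetical app module

def task_status(request, task_id):
    res = analyse_upload.AsyncResult(task_id)
    payload = {"state": res.state}
    if isinstance(res.info, dict):  # progress meta while running, result dict when done
        payload.update(res.info)
    return JsonResponse(payload)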
I agree with the solution provided by @bruno desthuillieres; however, you can also implement a socket-based solution to push results back to the user.
Since polling from the user's browser can have a significant performance impact, a socket solution would be ideal for this case.

Dask Distributed: Getting some errors after computations

I am running Dask Distributed on Linux CentOS 7 with a Python 3.6.2 installation. My computation seems to be going fine (I am still improving my code, but I am able to get some results), but I keep getting some Python errors apparently linked to the tornado module. I am only launching a single-node standalone Dask distributed cluster.
Here is the most common example:
Exception in thread Client loop:
Traceback (most recent call last):
File "/usr/local/lib/python3.6/threading.py", line 916, in _bootstrap_inner
self.run()
File "/usr/local/lib/python3.6/threading.py", line 864, in run
self._target(*self._args, **self._kwargs)
File "/usr/local/lib/python3.6/site-packages/tornado/ioloop.py", line 832, in start
self._run_callback(self._callbacks.popleft())
AttributeError: 'NoneType' object has no attribute 'popleft'
And here is another one:
tornado.application - ERROR - Exception in callback <bound method WorkStealing.balance of <distributed.stealing.WorkStealing object at 0x7f752ce6d6a0>>
Traceback (most recent call last):
File "/usr/local/lib/python3.6/site-packages/tornado/ioloop.py", line 1026, in _run
return self.callback()
File "/usr/local/lib/python3.6/site-packages/distributed/stealing.py", line 248, in balance
sat = s.rprocessing[key]
KeyError: 'read-block-9024000000-e3fefd2110094168cc0505db69b326e0'
Do you have any idea why? Should I close some connections or stop the standalone cluster?
Yes, if you don't close down the Tornado IOLoop before exiting the process then it can die in an unpleasant way. Fortunately this shouldn't affect your application, except by looking unpleasant.
You might submit a bug report about this, it's still something that we should fix.
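A sketch of the clean shutdown, assuming the standard distributed.Client (the scheduler address is illustrative):
from dask.distributed import Client

client = Client("tcp://127.0.0.1:8786")   # or Client() for a local cluster
try:
    # ... submit tasks, gather results ...
    pass
finally:
    client.close()    # shuts down the client's Tornado IOLoop thread cleanly

# Equivalently, the context-manager form closes the client on exit:
# with Client("tcp://127.0.0.1:8786") as client:
#     ...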

Cement framework receive signal 15 on pool worker close

I'm experiencing a problem with the Cement framework for Python (using Python 3 at the moment). I have a multiprocess application which uses Python's Pool workers. At the end of every multiprocessing section (it does not interfere with the results) my stdout is filled with one or more of these exceptions:
Traceback (most recent call last):
File "/usr/lib/python3.5/multiprocessing/util.py", line 254, in _run_finalizers
finalizer()
File "/usr/lib/python3.5/multiprocessing/util.py", line 186, in __call__
res = self._callback(*self._args, **self._kwargs)
File "/usr/lib/python3.5/multiprocessing/queues.py", line 198, in _finalize_join
thread.join()
File "/usr/lib/python3.5/threading.py", line 1054, in join
self._wait_for_tstate_lock()
File "/usr/lib/python3.5/threading.py", line 1070, in _wait_for_tstate_lock
elif lock.acquire(block, timeout):
File "/home/yogaub/.virtualenvs/seminar/lib/python3.5/site-packages/cement/core/foundation.py", line 123, in cement_signal_handler
raise exc.CaughtSignal(signum, frame)
cement.core.exc.CaughtSignal: Caught signal 15
Does anyone know why this happens, and how to prevent it?
Thanks
edit: I should add that I'm logging with the multiprocess logging system from this question. I don't really know if there is any correlation.
edit2: This is the process pool creation and termination:
pool = Pool(processes=core_num)
pool.map(worker_unpacker.work, formatted_input)
pool.close()
t2 = time.time()
I've tried catching SIGTERM with Cement's hook system, but it doesn't work. The only solution I have found so far is to completely ignore signals in the Cement app configuration (but that is not really a solution I like...).
This is an educated guess: the parent process kills (terminate()s) the started worker processes on exit. If you call pool.join() in the parent process, the parent waits until all subprocesses have finished and will not send SIGTERM to them.
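Concretely, for the pool snippet from the question, that would mean joining before the timing line (worker_unpacker, formatted_input, and core_num are the question's own names):
pool = Pool(processes=core_num)
pool.map(worker_unpacker.work, formatted_input)
pool.close()   # no new tasks will be submitted
pool.join()    # wait for the workers to exit cleanly instead of terminating them at shutdown
t2 = time.time()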

Error in Python multiprocessing process

I am trying to write a Python program with multiple processes, whose structure and flow is something like this:
import multiprocessing
import ctypes
import time
import errno

m = multiprocessing.Manager()
mylist = m.list()
var1 = m.Value('i', 0)
var2 = m.Value('i', 1)
var3 = m.Value('i', 2)
var4 = m.Value(ctypes.c_char_p, "a")
var5 = m.Value(ctypes.c_char_p, "b")
var6 = 3
var7 = 4
var8 = 5
var9 = 6
var10 = 7

def func(var1, var2, var4, var5, mylist):
    i = 0
    try:
        if var1.value == 0:
            print var2.value, var4.value, var5.value
            mylist.append(time.time())
        elif var1.value == 1:
            i = i + 2
            print var2.value + 2, var4.value, var5.value
            mylist.append(time.time())
    except IOError as e:
        if e.errno == errno.EPIPE:
            var3.value = var3.value + 1
            print "Error"

def work():
    for i in range(var3.value):
        print i, var6, var7, var8, var9, var10
        p = multiprocessing.Process(target=func, args=(var1, var2, var4, var5, mylist))
        p.start()

work()
When I run this code, sometimes it works perfectly, sometimes it does not run for the exact number of loop iterations, and sometimes I get the following error:
0
1
Process Process-2:
Traceback (most recent call last):
File "/usr/lib64/python2.6/multiprocessing/process.py", line 232, in _bootstrap
self.run()
File "/usr/lib64/python2.6/multiprocessing/process.py", line 88, in run
self._target(*self._args, **self._kwargs)
File "dummy.py", line 19, in func
if var1.value==0:
File "/usr/lib64/python2.6/multiprocessing/managers.py", line 1005, in get
return self._callmethod('get')
File "/usr/lib64/python2.6/multiprocessing/managers.py", line 722, in _callmethod
self._connect()
File "/usr/lib64/python2.6/multiprocessing/managers.py", line 709, in _connect
conn = self._Client(self._token.address, authkey=self._authkey)
File "/usr/lib64/python2.6/multiprocessing/connection.py", line 149, in Client
answer_challenge(c, authkey)
File "/usr/lib64/python2.6/multiprocessing/connection.py", line 383, in answer_challenge
message = connection.recv_bytes(256) # reject large message
EOFError
What does this error mean? What am I doing wrong here? What does this error indicate? Kindly guide me to the correct path. I am using CentOS 6.5.
Working with shared variables in multiprocessing is tricky. Because of the Python Global Interpreter Lock (GIL), true parallelism with threads is not possible, so the multiprocessing module launches several tasks in different processes, BUT those processes do not share memory by default.
In your case you need shared state, so you use shared values. But what happens here is that you have several processes trying to read the same value at the same time. To avoid corruption, a process locks the value it is currently reading, forbidding other processes from accessing it until it finishes reading.
Here you have several processes trying to evaluate var1.value in the first if branch of your func: the first process reads the value, the others are blocked, and this raises the error.
To avoid this, you should always manage the lock of your shared variables yourself.
You can try this syntax:
var1=multiprocessing.Value('i',0) # create shared variable
var1.acquire() # get the lock : it will wait until lock is available
var1.value # read the value
var1.release() # release the lock
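A small sketch of the equivalent context-manager form, which is harder to get wrong (this uses multiprocessing.Value directly rather than the Manager proxies from the question):
import multiprocessing

var1 = multiprocessing.Value('i', 0)   # shared int, guarded by its own lock

with var1.get_lock():                  # acquired on entry, released on exit
    var1.value += 1                    # read-modify-write while holding the lock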
External documentation:
Locks: https://docs.python.org/2/library/multiprocessing.html#synchronization-between-processes
GIL: https://docs.python.org/2/glossary.html#term-global-interpreter-lock
