How to debug Celery/Django tasks running locally in Eclipse - python

I need to debug a Celery task from the Eclipse debugger.
I'm using Eclipse, PyDev and Django.
First, I open my project in Eclipse and put a breakpoint at the beginning of the task function.
Then I start the Celery workers from Eclipse by right-clicking on manage.py in the PyDev Package Explorer, choosing "Debug As -> Python Run", and specifying "celeryd -l info" as the argument. This starts MainThread, Mediator and three more threads, all visible in the Eclipse debugger.
After that I return to the PyDev view and start the main application by right-clicking on the project and choosing Run As -> PyDev: Django.
My issue is that once the task is submitted via mytask.delay(), the debugger doesn't stop on the breakpoint. I put some traces in the task's code, so I can see that it was executed in one of the worker threads.
So, how do I make the Eclipse debugger stop on a breakpoint placed within the task when it executes in a Celery worker thread?

You should consider running the Celery task in the same process as the main application (normally it runs in a separate process); this will make debugging much easier.
You can tell Celery to run tasks synchronously by adding this setting to your settings.py module:
CELERY_TASK_ALWAYS_EAGER = True
# use this if you are on older versions of celery
# CELERY_ALWAYS_EAGER = True
Note: this is only meant to be used during debugging or development!
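As an illustration (a minimal sketch with hypothetical names, not from the question), with eager mode enabled a .delay() call executes synchronously in the calling process, so a breakpoint inside the task body is hit by the PyDev debugger:
# tasks.py -- illustrative only; mytask stands in for your own task
from celery import shared_task

@shared_task
def mytask(x, y):
    # with CELERY_TASK_ALWAYS_EAGER = True, a breakpoint here is hit
    # in the same process that called .delay(), so PyDev can stop on it
    return x + y

# anywhere in your Django code:
result = mytask.delay(2, 3)  # runs inline, no worker involved
print(result.get())          # -> 5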

You can do it using Celery's rdb:
from celery.contrib import rdb
rdb.set_trace()
Then, in a different terminal, type telnet localhost 6900, and you will get the debug prompt.
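For example (a sketch; the task name is hypothetical), place the trace inside the task body and watch the worker log, which prints the actual host and port to connect to:
from celery import shared_task
from celery.contrib import rdb

@shared_task
def mytask(x):
    rdb.set_trace()  # the worker log prints the host/port to telnet to
    return x * 2
If the default port is taken, the base port can be changed via the CELERY_RDB_PORT environment variable.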

CELERYD_POOL defaults to celery.concurrency.prefork:TaskPool, which spawns a separate process for each worker, and PyDev can't see inside those child processes. If you change it to one of the threaded pool options, you can use the debugger.
For example, for Celery 3.1 you can use this setting:
CELERYD_POOL = 'celery.concurrency.threads:TaskPool'
Note that this requires the threadpool module to be installed.
Also make sure to have CELERY_ALWAYS_EAGER = False, otherwise changing the pool class makes no sense.

I create a management command to test tasks; I find it easier than running them from the shell.
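For example, a command along these lines (the file layout, task module and argument are assumptions for illustration, not from the original post):
# myapp/management/commands/debug_mytask.py
from django.core.management.base import BaseCommand

from myapp.tasks import mytask  # assumed location of the task


class Command(BaseCommand):
    help = "Call mytask directly so it can be debugged in PyDev"

    def handle(self, *args, **options):
        # runs synchronously when CELERY_TASK_ALWAYS_EAGER is set
        result = mytask.delay(42)
        self.stdout.write("Result: %s" % result.get())
python manage.py debug_mytask can then be launched (or debugged) from Eclipse like any other entry point.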

If it runs only on a different thread, it should work on the latest PyDev versions (I think there was an issue before where a spawned thread would not be debugged, but this was fixed).
Now, if it's launched in a different process, you need to use the remote debugger (even if it's on the same machine). See: http://pydev.org/manual_adv_remote_debugger.html
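The gist of that page (a sketch, assuming pydevd is on the worker's path and the PyDev debug server was started in Eclipse on its default port) is to call settrace from inside the code running in the child process:
import pydevd

# connect back to the PyDev debug server running in Eclipse
pydevd.settrace('localhost', port=5678, suspend=True)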

Related

Threads not being executed under supervisord

I am working on a basic crawler which crawls 5 websites concurrently using threads.
For each site it creates a new thread. When I run the program from the shell, the output log indicates that all 5 threads run as expected.
But when I run this program under supervisord, the log indicates that only 2 threads are run every time! The log indicates that all 5 threads have started, but only the same two of them execute and the rest get stuck.
I cannot understand why this inconsistency happens between running from a shell and running from supervisor. Is there something I am not taking into account?
Here is the code which creates the threads:
for sid in entries:
    url = entries[sid]
    threading.Thread(target=self.crawl_loop,
                     args=(sid, url)).start()
UPDATE:
As suggested by tdelaney in the comments, I changed the working directory in the supervisord configuration and now all the threads run as expected. Though I still don't understand why setting the working directory to the crawler file's directory fixes the issue. Perhaps someone who knows how supervisor manages processes can explain?
AFAIK Python threads can't truly run in parallel because of the Global Interpreter Lock; threading just gives you a facility to simulate simultaneous execution, and your code will still use only one core.
https://wiki.python.org/moin/GlobalInterpreterLock
https://en.wikibooks.org/wiki/Python_Programming/Threading
It is therefore possible that your extra threads are not really running in parallel.
You should probably use multiprocessing instead:
https://docs.python.org/2/library/multiprocessing.html
I was having the same silent problem, but then realised that I was setting daemon to True, which was causing problems under supervisor.
https://docs.python.org/2/library/threading.html#threading.Thread.daemon
So the answer is: daemon = True when running the script yourself, False when running under supervisor.
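In code, using the crawler names from the question above, that means leaving the flag at its default or setting it explicitly:
t = threading.Thread(target=self.crawl_loop, args=(sid, url))
t.daemon = False  # non-daemon threads keep running under supervisor
t.start()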
Just to say, I was experiencing a very similar problem.
In my case, I was working on a low powered machine (RaspberryPi), with threads that were dedicated to listening to a serial device (an Arduino nano on /dev/ttyUSB0). Code worked perfectly on the command line - but the serial reading thread stalled under supervisor.
After a bit of hacking around (and trying all of the options here), I tried running python in unbuffered mode and managed to solve the issue! I got the idea from https://stackoverflow.com/a/17961520/741316.
In essence, I simply invoked python with the -u flag.
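In a supervisord config that simply means invoking the interpreter with -u (the program name and path here are placeholders):
[program:serialreader]
command=python -u /path/to/reader.py
Setting environment=PYTHONUNBUFFERED=1 in the same section should have the equivalent effect.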

uwsgi attach-daemon before python process starts

I have a separate process that I want to run alongside the python process managed by uWSGI. I wanted to use the attach-daemon option to start this process, but it seems that the bash command specified in attach-daemon does not get called until after the python process's app has started. However, I need the process to be running before the python process starts in order for everything to run correctly. Is there any way to specify the order in which things get started? It's not even necessary that I use attach-daemon, if there's a simpler way to initialize a set of managed processes in a defined order.
Use --lazy-apps: this way the app will be loaded by each worker after the master has fully spawned (and its external daemons have started).
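For example, in an ini config (the module and daemon paths are placeholders):
[uwsgi]
master = true
; workers load the app only after the master (and its attached daemons) are up
lazy-apps = true
attach-daemon = /path/to/my/daemon
module = myapp.wsgi:application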

Run Python script forever, logging errors and restarting when crashes

I have a python script that continuously processes new data and writes to a mongodb. The script uses a while loop and a sleep to run continuously.
What is the recommended way to run the Python script forever, logging errors when they occur, and restarting when it crashes?
Will node.js's forever be suitable? I'm also running node/meteor on the same Ubuntu server.
supervisord is perfect for this sort of thing. I used to check that programs were still running every couple of minutes with a cron job, but supervisord runs all programs as child processes that it supervises, so in the event your program terminates, supervisord will automatically restart it. I no longer need to parse the output of ps to see if a program crashed.
It has a simple declarative config file and configurable logging. By default it creates log files named your-program-name-stderr.log and your-program-name-stdout.log, which are automatically handled by logrotate when supervisord is installed from an OS package manager (Debian, in my case).
If you don't want to configure supervisord's logging, you should look at logging in python so you can control what goes into those files.
If you're on a Debian derivative, you should be able to install and start the daemon simply by executing apt-get install supervisor as root.
The config file is very straightforward too:
[program:myprogram]
command=/path/to/my/program/script
directory=/path/to/my/program/base
user=myuser
autostart=true
autorestart=true
redirect_stderr=True
supervisorctl also allows you to see what your program is doing interactively, and can start and stop multiple programs with supervisorctl start myprogram, etc.
I recently wrote something similar. The basic pattern I follow is:
import time

while True:
    try:
        pass  # functionality goes here
    except SpecificError:  # placeholder for your own exception type
        pass  # log the exception
    except:  # catch everything else
        pass
    finally:
        time.sleep(600)
To handle reboots you can use init.d or cron jobs.
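For example, a cron @reboot entry (the paths are placeholders):
# crontab -e
@reboot /usr/bin/python /path/to/my/script.py >> /var/log/myscript.log 2>&1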
If you are writing a daemon, you should probably do it with this command:
http://manpages.ubuntu.com/manpages/lucid/man8/start-stop-daemon.8.html
You can spawn this from a System V /etc/init.d/ script, or use Upstart which is slowly replacing it.
Upstart: http://upstart.ubuntu.com/getting-started.html
System V: http://www.cyberciti.biz/tips/linux-write-sys-v-init-script-to-start-stop-service.html
I find System V easier to write, but if this will ever be packaged and distributed as a Debian package, I recommend writing an Upstart conf.
Definitely keep the sleep so the loop doesn't hog the CPU.
I don't know if this is still relevant to you, but I have been reading forever about how to do this and want to share what I did.
For me, the goal was to have a python script always running on my Linux computer. The python script also has a "while True" loop in it which should theoretically run forever, but if for any reason it crashes, I want the script to restart. Also, the script should start again when I restart the computer.
I am not an expert, but for me the best and most understandable approach was to use systemd (assuming you use Linux).
There are two nice examples of how to do this given here and here, showing how to write your .service files in either /etc/systemd/system or /lib/systemd/system. If you want to be completely correct you should use the former:
"/etc/systemd/system/: units installed by the system administrator"
The documentation of systemd here is actually nice to read, even if you are not an expert.
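A minimal unit file might look like this (the names and paths are placeholders to adapt to your script):
# /etc/systemd/system/myscript.service
[Unit]
Description=My always-on Python script
After=network.target

[Service]
ExecStart=/usr/bin/python /path/to/myscript.py
Restart=always
RestartSec=5

[Install]
WantedBy=multi-user.target
Enable it at boot with systemctl enable myscript.service and start it immediately with systemctl start myscript.service.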
Hope this helps someone!

Django in Pydev spawns multiple processes?

I have my project set up in PyDev in Eclipse. Whenever I debug my project, things go great, but once I try to restart the Django server, it spawns an additional runserver process, blocking the port I'm using for the server (8000). Is there a workaround to make sure it really kills the server?
Django reloads the server each time changes are made to any Python code (spawning another instance of the server and killing the old one). It seems this is not handled properly when launched from PyDev. You can deactivate this behaviour by adding the --noreload argument to the server starting command.
More information: --noreload, pydev/django (look for the remark below Run/Debug as Django)
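For example, in the run configuration's arguments (or from a shell):
python manage.py runserver --noreload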

PyDev and Django: how to restart dev server?

I'm new to Django. I think I'm making a simple mistake.
I launched the dev server with Pydev:
Right-click on project >> Django >> Custom command >> runserver
The server came up, and everything was great. But now I'm trying to stop it, and can't figure out how. I stopped the process in the PyDev console, and closed Eclipse, but web pages are still being served from http://127.0.0.1:8000.
I launched and quit the server from the command line normally:
python manage.py runserver
But the server is still up. What am I doing wrong here?
By default, the runserver command runs in autoreload mode, which runs in a separate process. This means that PyDev doesn't know how to stop it, and doesn't display its output in the console window.
If you run the command runserver --noreload instead, the auto-reloader will be disabled. Then you can see the console output and stop the server normally. However, this means that changes to your Python files won't be effective until you manually restart the server.
To run the project: right-click on the project (not a subfolder) and choose Run As > PyDev: Django.
To terminate: click terminate in the console window.
The server is down.
I usually run it from the console. Running from PyDev adds unnecessary confusion and brings no benefit unless you want to use PyDev's interactive GUI debugging.
Edit: Latest PyDev versions (since PyDev 3.4.1) no longer need any workaround:
i.e.: PyDev will properly kill subprocesses on a kill-process operation, and when debugging, even with regular reloading on, PyDev will attach the debugger to the child processes.
Old answer (for PyDev versions older than 3.4.1):
Unfortunately, that's expected, as PyDev will simply kill the parent process (i.e.: as if instead of ctrl+C you kill the parent process in the task manager).
The solution would be editing Django itself so that the child process polls the parent process to know it's still alive and exit if it's not... see: How to make child process die after parent exits? for a reference.
After a quick look it seems related to django/utils/autoreload.py and the way it starts up things -- so, it'd be needed to start a thread that keeps seeing if the parent is alive and if it's not it kills the child process -- I've reported that as a bug in Django itself: https://code.djangoproject.com/ticket/16982
Note: as a workaround for PyDev, you can make Django allocate a new console (out of PyDev) while still running from PyDev (so, until a proper solution is available from Django, the patch below can be used to make the Django autoreload allocate a new console -- where you can properly use Ctrl+C).
Index: django/utils/autoreload.py
===================================================================
--- django/utils/autoreload.py (revision 16923)
+++ django/utils/autoreload.py (working copy)
@@ -98,11 +98,14 @@
 def restart_with_reloader():
     while True:
         args = [sys.executable] + ['-W%s' % o for o in sys.warnoptions] + sys.argv
-        if sys.platform == "win32":
-            args = ['"%s"' % arg for arg in args]
         new_environ = os.environ.copy()
         new_environ["RUN_MAIN"] = 'true'
-        exit_code = os.spawnve(os.P_WAIT, sys.executable, args, new_environ)
+
+        import subprocess
+        popen = subprocess.Popen(args, env=new_environ, creationflags=subprocess.CREATE_NEW_CONSOLE)
+        exit_code = popen.wait()
         if exit_code != 3:
             return exit_code
Solution: introduce a syntax error in some project file. This will cause the server to crash; it can then be restarted as normal.
If you operate on Windows using CMD: quit the server with Ctrl+Break.
python manage.py runserver localhost:8000
You can quit by pressing Ctrl+Pause. Note that the Pause key might be labeled Break, and on some laptops it is produced with Fn+F12. Hope this helps.
Run sudo lsof -i:8000,
then run kill -9 <PID> to kill the process running that server.
Then you can run python manage.py runserver on that port again.
