I am working on a basic crawler which crawls 5 websites concurrently using threads.
For each site it creates a new thread. When I run the program from the shell, the output log indicates that all 5 threads run as expected.
But when I run this program under supervisord, the log indicates that only 2 threads are being run every time! The log shows that all 5 threads have started, but only the same two of them are actually executed and the rest get stuck.
I cannot understand why this inconsistency appears between running from a shell and running under supervisord. Is there something I am not taking into account?
Here is the code which creates the threads:
for sid in entries:
    url = entries[sid]
    threading.Thread(target=self.crawl_loop,
                     args=(sid, url)).start()
UPDATE:
As suggested by tdelaney in the comments, I changed the working directory in the supervisord configuration and now all the threads are being run as expected. Though I still don't understand why setting the working directory to the crawler file's directory rectifies the issue. Perhaps someone who knows how supervisord manages processes can explain?
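For reference, the fix corresponds to the directory option in the program section of the supervisord config. A minimal sketch, with the program name and paths as placeholders:

[program:crawler]
command=/usr/bin/python /path/to/crawler.py
; working directory the process is started in
directory=/path/to/crawler
autostart=true
autorestart=true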
AFAIK CPython threads can't give you real parallelism because of the Global Interpreter Lock: only one thread executes Python bytecode at a time, so threading mostly simulates simultaneous execution and your code will still use only 1 core.
https://wiki.python.org/moin/GlobalInterpreterLock
https://en.wikibooks.org/wiki/Python_Programming/Threading
Therefore it is possible that the work is not actually being spread over more processes/threads.
You should use multiprocessing, I think?
https://docs.python.org/2/library/multiprocessing.html
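A minimal sketch of what that would look like for the crawler; crawl_loop and the entries dict here are stand-ins for the code in the question:

import multiprocessing

def crawl_loop(sid, url):
    # placeholder for the real crawling logic
    print(sid, url)

if __name__ == "__main__":
    entries = {"site1": "http://example.com", "site2": "http://example.org"}
    procs = []
    for sid, url in entries.items():
        p = multiprocessing.Process(target=crawl_loop, args=(sid, url))
        p.start()
        procs.append(p)
    for p in procs:
        p.join()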
I was having the same silent problem, but then realised that I was setting daemon to True, which was causing problems under supervisor.
https://docs.python.org/2/library/threading.html#threading.Thread.daemon
So the answer is: daemon = True when running the script yourself, False when running under supervisor.
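In code, that is just the daemon attribute on the thread object before it is started; crawl_loop and its arguments here are placeholders:

import threading

def crawl_loop(sid, url):
    print(sid, url)  # placeholder for the real work

t = threading.Thread(target=crawl_loop, args=("site1", "http://example.com"))
t.daemon = False  # non-daemonic when running under supervisor, per the note above
t.start()
t.join()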
Just to say, I was experiencing a very similar problem.
In my case, I was working on a low powered machine (RaspberryPi), with threads that were dedicated to listening to a serial device (an Arduino nano on /dev/ttyUSB0). Code worked perfectly on the command line - but the serial reading thread stalled under supervisor.
After a bit of hacking around (and trying all of the options here), I tried running python in unbuffered mode and managed to solve the issue! I got the idea from https://stackoverflow.com/a/17961520/741316.
In essence, I simply invoked python with the -u flag.
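In a supervisord config that might look like the following (program name and paths are placeholders); setting the PYTHONUNBUFFERED environment variable is another way to get the same effect:

[program:serialreader]
command=/usr/bin/python -u /path/to/reader.py
; or, equivalently: environment=PYTHONUNBUFFERED="1"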
Related
What is the correct way to have python schedule (by Daniel Bader) run persistently? I currently run the job by having a terminal open, connected to a VM where the scripts actually run. There I run python "scheduler.py" - where scheduler.py has all the jobs.
But when the connection closes, or I close the terminal, the scheduler stops.
Are there any easy solutions to fix this?
You have a couple of options here. You are starting the process in your ssh session, and then killing the ssh session, which in turn kills the process.
One way to handle this would be to have the VM run the script on startup. You could set the script up as a service, so even if it goes down for some reason it will come back up. Read into init.d for info on how to launch a script at boot on Linux. I'm not well-versed in Windows any more, but I believe there is a way to do the same.
Another option is to keep the session open by connecting to it with screen or tmux. This article explains the problem and gives you a few different ways to work around it: https://www.tecmint.com/keep-remote-ssh-sessions-running-after-disconnection/
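A quick sketch of the screen workflow (the session name is arbitrary):

screen -S scheduler      # start a named session
python scheduler.py      # run the script inside it
# detach with Ctrl-A then D; the script keeps running
screen -r scheduler      # reattach later to check on it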
I have a python script I use at work that checks the contents of a webpage and reports the changes to me. It used to run fine checking the site every 5 minutes, and then our company switched some security software. The script will still run but will stop after an hour or so. It's not consistent: it will sometimes be a few hours, but about an hour seems average. No errors are raised or reported in the shell. Is there a way to have this restarted automatically? The code is below; it used to just call the function and then sleep, but I added the for loop and the print line for debugging, to see at what time it stops.
import time
import datetime
import txtWip
while True:
    txtWip.main()
    for i in range(1, 300, 100):
        current_time = time.strftime(r"%d.%m.%Y %H:%M:%S", time.localtime())
        print(current_time)
        time.sleep(100)
What you want is for your program to run as a daemon[1]: a background process detached from your terminal session, so it no longer receives the SIGHUP sent when that session ends. You also want this daemon to restart itself on termination - effectively making it run forever.
Rather than write your own daemon script, it's far easier to do one of the following:
If on Linux, use Upstart.
This is a replacement for the init.d daemon that supervises all processes while your machine is running. It is capable of respawning a process in the event of an unexpected crash - see an example here, and a Python-specific example here (a minimal job-file sketch also follows at the end of this answer). It is the gold standard for such tasks on this platform.
For newer Ubuntu releases, systemd has taken over from Upstart and should be used instead.
An alternative using an external bash script, which doesn't require messing around with Upstart, is also a possibility.
If on Windows, use ReStartMe
If on Mac, install and configure runit appropriately
[1] The following is not cross-platform advice. You may or may not actually need to run this as a true daemon. A simple background job (i.e. invoked by appending an & to the end of python <file_name>.py) may be sufficient - it will quit when the terminal you ran it in quits, but you can get around this by using the Linux utility screen.
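For illustration, a minimal Upstart job file might look like the sketch below (file name, paths and runlevels are assumptions; on systemd releases you would write a unit file instead, as shown later in this thread):

# /etc/init/checker.conf (hypothetical name)
description "restart the page-checking script if it dies"
start on runlevel [2345]
stop on runlevel [016]
respawn
exec /usr/bin/python /path/to/checker.py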
I'm running a local Django development server together with virtualenv, and for the past couple of days it has been behaving in a weird way. Sometimes I don't see any logs in the console, sometimes I do.
A couple of times I tried to quit the process and restart it and got the "port already in use" error, so I inspected the running processes and there was still an instance of Django running.
Other SO answers say this is due to the autoreload feature, but then why do I sometimes have no problem at all and sometimes I do?
Anyway, out of curiosity I ran ps aux | grep python and the result is always TWO running processes, one from python and one from my activated "virtualenv" python:
/Users/me/.virtualenvs/myvirtualenv/bin/python manage.py runserver
python manage.py runserver
Is this supposed to be normal?
I've solved the mystery: Django was trying to send emails but could not because of improper configuration, so it was hanging there forever trying to send them.
Most probably (I'm not sure here) Django calls an OS function or a subprocess to do so. The point is that the main process was forking itself and handing the job to a subprocess or thread, or whatever - I'm no expert in this.
It turns out that when your Python process is forked and you kill the parent, the children can apparently keep on living after it.
Correct me if I'm wrong.
I have a python script that continuously processes new data and writes to a mongodb. In the script there is a while loop and a sleep that run the code continuously.
What is the recommended way to run the Python script forever, logging errors when they occur, and restarting when it crashes?
Will node.js's forever be suitable? I'm also running node/meteor on the same Ubuntu server.
supervisord is perfect for this sort of thing. I used to check that programs were still running every couple of minutes with a cron job, but supervisord runs your programs as child processes it supervises, so if your program terminates, supervisord will automatically restart it. I no longer need to parse the output of ps to see if a program crashed.
It has a simple declarative config file and configurable logging. By default it creates log files named your-program-name-stderr.log and your-program-name-stdout.log, which are automatically handled by logrotate when supervisord is installed from an OS package manager (Debian for me).
If you don't want to configure supervisord's logging, you should look at logging in python so you can control what goes into those files.
If you're on a Debian derivative you should be able to install and start the daemon simply by executing apt-get install supervisor as root.
The config file is very straightforward too:
[program:myprogram]
command=/path/to/my/program/script
directory=/path/to/my/program/base
user=myuser
autostart=true
autorestart=true
redirect_stderr=True
supervisorctl also allows you to see what your program is doing interactively, and can start and stop multiple programs, e.g. supervisorctl start myprogram.
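A few typical invocations, assuming the [program:myprogram] section above:

supervisorctl status               # list programs and their states
supervisorctl restart myprogram    # restart a single program
supervisorctl tail -f myprogram    # follow its stdout log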
I recently wrote something similar. The basic pattern I follow is:
import time
import logging

while True:
    try:
        pass  # functionality goes here
    except SpecificError:  # whatever exception you expect
        logging.exception("known failure")  # log the exception
    except Exception:
        logging.exception("unexpected failure")  # catch everything else
    finally:
        time.sleep(600)
To handle reboots you can use init.d or cron jobs.
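For the cron approach, a single crontab entry is enough (the path is a placeholder):

@reboot /usr/bin/python /path/to/script.py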
If you are writing a daemon, you should probably do it with this command:
http://manpages.ubuntu.com/manpages/lucid/man8/start-stop-daemon.8.html
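As an illustration, an invocation along these lines (paths and pidfile name are assumptions) starts the script in the background and records its PID:

start-stop-daemon --start --background \
    --make-pidfile --pidfile /var/run/mydaemon.pid \
    --exec /usr/bin/python -- /path/to/script.py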
You can spawn this from a System V /etc/init.d/ script, or use Upstart which is slowly replacing it.
Upstart: http://upstart.ubuntu.com/getting-started.html
System V: http://www.cyberciti.biz/tips/linux-write-sys-v-init-script-to-start-stop-service.html
I find System V easier to write, but if this will ever be packaged and distributed as a Debian package, I recommend writing an Upstart conf.
Definitely keep the sleep so it won't keep a grip on CPU load.
I don't know if this is still relevant to you, but I have been reading forever about how to do this and want to share somewhere what I did.
For me, the goal was to have a python script always running (on my Linux computer). The python script also has a "while True" loop in it which should theoretically run forever, but if it crashes for any reason I cannot foresee, I want the script to restart. Also, when I restart the computer, it should run the script.
I am not an expert but for me the best and most understandable was to use systemd (assuming you use Linux).
There are two nice examples of how to do this given here and here, showing how to write your .service files in either /etc/systemd/system or /lib/systemd/system. If you want to be completely correct you should take the former:
" /etc/systemd/system/: units installed by the system administrator" 1
The documentation of systemd here is actually nice to read, even if you are not an expert.
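As a concrete sketch, a unit file along these lines (service name, user and paths are placeholders) restarts the script on failure and starts it at boot:

# /etc/systemd/system/myscript.service (hypothetical name)
[Unit]
Description=Keep my python script running
After=network.target

[Service]
ExecStart=/usr/bin/python3 /path/to/myscript.py
Restart=always
RestartSec=5
User=myuser

[Install]
WantedBy=multi-user.target

Enable and start it with systemctl enable --now myscript.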
Hope this helps someone!
I am running two python processes (clock and web) through Heroku and Foreman
When I run locally with Foreman:
1. Both processes log to terminal
2. Then the clock process stops outputting (even though it's still running). This halting of the output doesn't happen at a consistent place in the code, but usually somewhere between 3-5 iterations.
3. The web process continues to output correctly.
Oddly enough, when I run the same code on Heroku, the logs output just fine.
We have PYTHONUNBUFFERED set to true locally (with .env) and on Heroku. Has anybody come across this issue? Is there a solution to it? Thanks.
I couldn't fix this problem with Foreman, but I did come up with a solution. There is a python port of Foreman called honcho. I switched to honcho and it fixed my logging/freezing issue.
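For reference, the switch amounts to installing honcho and starting it the same way Foreman is started, with the Procfile unchanged:

pip install honcho
honcho start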