Issues with time.sleep and Multithreading in Python - python

I am having an issue with the time.sleep() function in python. I am running a script that needs to wait for another program to generate txt files. Although, this is a terribly old machine, so when I sleep the python script, I run into issues with the other program not generating files. Is there any alternatives to using time.sleep()? I thought locking the thread might work but essentially it would just be a loop of locking the thread for a couple of seconds. I'll give some pseudo code here of what I'm doing.
While running:
if filesFound != []:
moveFiles
else:
time.sleep(1)

One way to do a non-blocking wait is to use threading.Event:
import threading
dummy_event = threading.Event()
dummy_event.wait(timeout=1)
This can be set() from another thread to indicate that something has completed. But if you are doing stuff in another thread, you could avoid the timeout and event altogether and just join the other thread:
import threading
def create_the_file(completion_event):
# Do stuff to create the file
def Main():
worker = threading.Thread(target=create_the_file)
worker.start()
# We will stop here until the "create_the_file" function finishes
worker.join()
# Do stuff with the file
If you want an example of using events for more fine-grained control, I can show you that...
The threading approach won't work if your platform doesn't provide the threading module. For example, if you try to substitute the dummy_threading module, dummy_event.wait() returns immediately. Not sure about the join() approach.
If you are waiting for other processes to finish, you would be better off managing them from your own script using the subprocess module (and then, for example, using the wait method to be sure the process is done before you do further work).
If you can't manage the subprocess from your script, but you know the PID, you can use the os.waitpid() function. Beware of the OSError if the process has already finished by the time you use this function...
If you want a cross-platform way to watch a directory to be notified of new files, I'd suggest using a GIO FileMonitor from PyGTK/PyGObject. You can get a monitor on a directory using the monitor_directory method of a GIO.File.
Quick sample code for a directory watch:
import gio
def directory_changed(monitor, file1, file2, evt_type):
print "Changed:", file1, file2, evt_type
gfile = gio.File(".")
monitor = gfile.monitor_directory(gio.FILE_MONITOR_NONE, None)
monitor.connect("changed", directory_changed)
import glib
ml = glib.MainLoop()
ml.run()

Related

Terminate thread, process, function in python

I'm quite new at python and for a while I try to fight specific problem. I have function to listen and print radio frames.To do that I'm using NRF24 Lib and whole function is so easy. The point is that I run this function and from time to time I need to terminate it and again run. So in code it looks like
def recv():
radio.openWritingPipe(pipes[0])
radio.openReadingPipe(1, pipes[1])
radio.startListening()
radio.stopListening()
radio.printDetails()
radio.startListening()
while True:
pipe = [0]
while not radio.available(pipe):
time.sleep(10000/1000000.0)
recv_buffer = []
radio.read(recv_buffer)
print(recv_buffer)
I run this function from a server side and now I want to stop it and run again? There is it posible ? why I just cant recv.kill()? I read about threading, multiprocessing but all this didn't give me proper result.
How I run it:
from multiprocessing import Process
def request_handler(api: str, arg: dict) -> dict:
process_radio = Process(target=recv())
if api == 'start_radio':
process_radio.start()
...
elif api == 'stop_radio':
process_radio.terminate():
...
...
There is no way to stop a Python thread "from the outside." If the thread goes into a wait state (e.g. not running because it's waiting for radio.recv() to complete) there's nothing you can do.
Inside a single process the threads are autonomous, and the best you can do it so set a flag for the thread to action (by terminating) when it examines it.
As you have already discovered, it appears, you can terminate a subprocess, but you then have the issue of how the processes communicate with each other.
Your code and the test with it don't really give enough information (there appear to be several NRF24 implementations in Python) to debug the issues you report.

Python create a subprocess and do not wait

I would like to run a series of commands (which take a long time). But I do not want to wait for the completion of each command. How can I go about this in Python?
I looked at
os.fork()
and
subprocess.popen()
Don't think that is what I need.
Code
def command1():
wait(10)
def command2():
wait(10)
def command3():
wait(10)
I would like to call
command1()
command2()
command3()
Without having to wait.
Use python's multiprocessing module.
def func(arg1):
... do something ...
from multiprocessing import Process
p = Process(target=func, args=(arg1,), name='func')
p.start()
Complete Documentaion is over here: https://docs.python.org/2/library/multiprocessing.html
EDIT:
You can also use the Threading module of python if you are using jpython/cpython distribution as you can overcome the GIL (Global Interpreter Lock) in these distributions.
https://docs.python.org/2/library/threading.html
This example maybe is suitable for you:
#!/usr/bin/env python3
import sys
import os
import time
def forked(fork_func):
def do_fork():
pid = os.fork()
if (pid > 0):
fork_func()
exit(0)
else:
return pid
return do_fork
#forked
def command1():
time.sleep(2)
#forked
def command2():
time.sleep(1)
command1()
command2()
print("Hello")
You just use decorator #forked for your functions.
There is only one problem: when main program is over, it waits for end of child processes.
The most straightforward way is to use Python's own multiprocessing:
from multiprocessing import Process
def command1():
wait(10)
...
call1 = Process(target=command1, args=(...))
call1.start()
...
This module was introduced back exactly to ease the burden on controlling external process execution of functions accessible in the same code-base Of course, that could already be done by using os.fork, subprocess. Multiprocessing emulates as far as possible, Python's own threading moudle interface. The one immediate advantage of using multiprocessing over threading is that this enables the various worker processes to make use of different CPU cores, actually working in parallel - while threading, effectively, due to language design limitations is actually limited to a single execution worker at once, thus making use of a single core even when several are available.
Now, note that there are still peculiarities - specially if you are, for example, calling these from inside a web-request. Check this question an answers form a few days ago:
Stop a background process in flask without creating zombie processes

python multiprocessing logging: QueueHandler with RotatingFileHandler "file being used by another process" error

I'm converting a program to multiprocessing and need to be able to log to a single rotating log from the main process as well as subprocesses. I'm trying to use the 2nd example in the python cookbook Logging to a single file from multiple processes, which starts a logger_thread running as part of the main process, picking up log messages off a queue that the subprocesses add to. The example works well as is, and also works if I switch to a RotatingFileHandler.
However if I change it to start logger_thread before the subprocesses (so that I can log from the main process as well), then as soon as the log rotates, all subsequent logging generates a traceback with WindowsError: [Error 32] The process cannot access the file because it is being used by another process.
In other words I change this code from the 2nd example
workers = []
for i in range(5):
wp = Process(target=worker_process, name='worker %d' % (i + 1), args=(q,))
workers.append(wp)
wp.start()
logging.config.dictConfig(d)
lp = threading.Thread(target=logger_thread, args=(q,))
lp.start()
to this:
logging.config.dictConfig(d)
lp = threading.Thread(target=logger_thread, args=(q,))
lp.start()
workers = []
for i in range(5):
wp = Process(target=worker_process, name='worker %d' % (i + 1), args=(q,))
workers.append(wp)
wp.start()
and swap out logging.FileHandler for logging.handlers.RotatingFileHandler (with a very small maxBytes for testing) and then I hit this error.
I'm using Windows and python 2.7. QueueHandler is not part of stdlib til python 3.2 but I've copied the source code from Gist, which it says is safe to do.
I don't understand why starting the listener first would make any difference, nor do I understand why any process other than main would be attempting to access the file.
You should never start any threads before subprocesses. When Python forks, the threads and IPC state will not always be copied properly.
There are several resources on this, just google for fork and threads. Some people claim they can do it, but it's not clear to me that it can ever work properly.
Just start all your processes first.
Example additional information:
Status of mixing multiprocessing and threading in Python
https://stackoverflow.com/a/6079669/4279
In your case, it might be that the copied open file handle is the problem, but you still should start your subprocesses before your threads (and before you open any files that you will later want to destroy).
Some rules of thumb, summarized by fantabolous from the comments:
Subprocesses must always be started before any threads created by the same process.
multiprocessing.Pool creates both subprocesses AND threads, so one mustn't create additional Processes or Pools after the first one.
Files should not already be open at the time a Process or Pool is created. (This is OK in some cases, but not, e.g. if a file will be deleted later.)
Subprocesses can create their own threads and processes, with the same rules above applying.
Starting all processes first is the easiest way to do this
So, you can simply make your own file log handler. I have yet to see logs getting garbled from multiprocessing, so it seems file log rotation is the big issue. Just do this in your main, and you don't have to change any of the rest of your logging
import logging
import logging.handlers
from multiprocessing import RLock
class MultiprocessRotatingFileHandler(logging.handlers.RotatingFileHandler):
def __init__(self, *kargs, **kwargs):
super(MultiprocessRotatingFileHandler, self).__init__(*kargs, **kwargs)
self.lock = RLock()
def shouldRollover(self, record):
with self.lock:
super(MultiprocessRotatingFileHandler, self).shouldRollover(record)
file_log_path = os.path.join('var','log', os.path.basename(__file__) + '.log')
file_log = MultiprocessRotatingFileHandler(file_log_path,
maxBytes=8*1000*1024,
backupCount=5,
delay=True)
logging.basicConfig(level=logging.DEBUG)
logging.addHandler(file_log)
I'm willing to guess that locking every time you try to rotate is probably slowing down logging, but then this is a case where we need to sacrifice performance for correctness.

Programmatically exiting python script while multithreading

I have some code which runs routinely, and every now and then (like once a month) the program seems to hang somewhere and I'm not sure where.
I thought I would implement [what has turned out to be not quite] a "quick fix" of checking how long the program has been running for. I decided to use multithreading to call the function, and then while it is running, check the time.
For example:
import datetime
import threading
def myfunc():
#Code goes here
t=threading.Thread(target=myfunc)
t.start()
d1=datetime.datetime.utcnow()
while threading.active_count()>1:
if (datetime.datetime.utcnow()-d1).total_seconds()>60:
print 'Exiting!'
raise SystemExit(0)
However, this does not close the other thread (myfunc).
What is the best way to go about killing the other thread?
The docs could be clearer about this. Raising SystemExit tells the interpreter to quit, but "normal" exit processing is still done. Part of normal exit processing is .join()-ing all active non-daemon threads. But your rogue thread never ends, so exit processing waits forever to join it.
As #roippi said, you can do
t.daemon = True
before starting it. Normal exit processing does not wait for daemon threads. Your OS should kill them then when the main process exits.
Another alternative:
import os
os._exit(13) # whatever exit code you want goes there
That stops the interpreter "immediately", and skips all normal exit processing.
Pick your poison ;-)
There is no way to kill a thread. You must kill the target from within the target. The best way is with a hook and a queue. It goes something like this.
import Threading
from Queue import Queue
# add a kill_hook arg to your function, kill_hook
# is a queue used to pass messages to the main thread
def myfunc(*args, **kwargs, kill_hook=None):
#Code goes here
# put this somewhere which is periodically checked.
# an ideal place to check the hook is when logging
try:
if q.get_nowait(): # or use q.get(True, 5) to wait a longer
print 'Exiting!'
raise SystemExit(0)
except Queue.empty:
pass
q = Queue() # the queue used to pass the kill call
t=threading.Thread(target=myfunc, args = q)
t.start()
d1=datetime.datetime.utcnow()
while threading.active_count()>1:
if (datetime.datetime.utcnow()-d1).total_seconds()>60:
# if your kill criteria are met, put something in the queue
q.put(1)
I originally found this answer somewhere online, possibly this. Hope this helps!
Another solution would be to use a separate instance of Python, and monitor the other Python thread, killing it from the system level, with psutils.
Wow, I like the daemon and stealth os._exit solutions too!

Using the Queue class in Python 2.6

Let's assume I'm stuck using Python 2.6, and can't upgrade (even if that would help). I've written a program that uses the Queue class. My producer is a simple directory listing. My consumer threads pull a file from the queue, and do stuff with it. If the file has already been processed, I skip it. The processed list is generated before all of the threads are started, so it isn't empty.
Here's some pseudo-code.
import Queue, sys, threading
processed = []
def consumer():
while True:
file = dirlist.get(block=True)
if file in processed:
print "Ignoring %s" % file
else:
# do stuff here
dirlist.task_done()
dirlist = Queue.Queue()
for f in os.listdir("/some/dir"):
dirlist.put(f)
max_threads = 8
for i in range(max_threads):
thr = Thread(target=consumer)
thr.start()
dirlist.join()
The strange behavior I'm getting is that if a thread encounters a file that's already been processed, the thread stalls out and waits until the entire program ends. I've done a little bit of testing, and the first 7 threads (assuming 8 is the max) stop, while the 8th thread keeps processing, one file at a time. But, by doing that, I'm losing the entire reason for threading the application.
Am I doing something wrong, or is this the expected behavior of the Queue/threading classes in Python 2.6?
I tried running your code, and did not see the behavior you describe. However, the program never exits. I recommend changing the .get() call as follows:
try:
file = dirlist.get(True, 1)
except Queue.Empty:
return
If you want to know which thread is currently executing, you can import the thread module and print thread.get_ident().
I added the following line after the .get():
print file, thread.get_ident()
and got the following output:
bin 7116328
cygdrive 7116328
cygwin.bat 7149424
cygwin.ico 7116328
dev etc7598568
7149424
fix 7331000
home 7116328lib
7598568sbin
7149424Thumbs.db
7331000
tmp 7107008
usr 7116328
var 7598568proc
7441800
The output is messy because the threads are writing to stdout at the same time. The variety of thread identifiers further confirms that all of the threads are running.
Perhaps something is wrong in the real code or your test methodology, but not in the code you posted?
Since this problem only manifests itself when finding a file that's already been processed, it seems like this is something to do with the processed list itself. Have you tried implementing a simple lock? For example:
processed = []
processed_lock = threading.Lock()
def consumer():
while True:
with processed_lock.acquire():
fileInList = file in processed
if fileInList:
# ... et cetera
Threading tends to cause the strangest bugs, even if they seem like they "shouldn't" happen. Using locks on shared variables is the first step to make sure you don't end up with some kind of race condition that could cause threads to deadlock.
Of course, if what you're doing under # do stuff here is CPU-intensive, then Python will only run code from one thread at a time anyway, due to the Global Interpreter Lock. In that case, you may want to switch to the multiprocessing module - it's very similar to threading, though you will need to replace shared variables with another solution (see here for details).

Categories