I have a multithreaded program and recently noticed an interesting phenomenon.
If I call the print method in a thread's worker, the program becomes very reactive. There's no big trick to it: just calling print resolves everything.
I recently read an article about Python's Global Interpreter Lock (GIL), which said the GIL is released whenever I/O-bound work is executed. Do you think the print call is also I/O-bound?
I would really like to make my program reactive, but it is obviously awkward to dump data to stdout while it's running. So I tried redirecting the output to /dev/null, but that didn't resolve the issue:
import contextlib

with contextlib.redirect_stdout(None):
    print('')
I would appreciate any ideas for reproducing the same effect as the following call, but without printing anything:
print('')
As far as I can see, the GIL is released while the interpreter is executing print(''). Maybe what I need is just such a short break, one that releases the GIL.
Just for your information, I have also tried the following call:
print('', end='', flush=True)
Of course, it didn't print anything, but my program became a bit jaggy: that thread looked as if it had occupied the execution time, so the other threads ran very infrequently.
Update
If I call QThread's usleep(1), expecting it to sleep for 1 µs, it waits much longer than I specified. For example, the thread worker runs only every 1 ms, which is very slow because I was expecting it to run on the order of microseconds. Calling print('') makes the thread run on the order of a few microseconds. That is the sense in which I call it reactive.
Update
I feel something is dragging out the thread's execution time, but it's not usleep or time.sleep(). However, I have found that print can kick the blocker away, so I would like to know what is actually kicking the blocker away.
So there are two things happening here. First, regarding the GIL itself: most of the I/O functions release it just before calling into platform code, so a print call will definitely release it. This naturally lets the runtime schedule another thread.
Second, regarding usleep: this function is guaranteed to sleep at least as many microseconds as you ask for, but in practice it will rarely sleep for less than one OS scheduler tick. On Linux the tick frequency is often 1,000 Hz, 250 Hz, or 100 Hz, but it can vary quite a bit.
Now, if you want something more granular than that, there is the nanosleep call, which will busy-wait for delays shorter than about 2 ms instead of calling into the kernel.
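If sub-millisecond delays matter, one option in pure Python is a hybrid sleep: use time.sleep for the bulk of the interval (which releases the GIL) and spin for the final stretch. A minimal sketch; the name short_sleep and the 2 ms threshold are my own choices, and note that the spin phase holds the GIL:

```python
import time

def short_sleep(seconds):
    """Sleep for roughly `seconds` with sub-millisecond accuracy."""
    deadline = time.perf_counter() + seconds
    while True:
        remaining = deadline - time.perf_counter()
        if remaining <= 0:
            return
        if remaining > 0.002:
            # Coarse phase: a real sleep that releases the GIL. It may
            # overshoot by up to a scheduler tick, so stop ~2 ms early.
            time.sleep(remaining - 0.002)
        # Fine phase: busy-wait the last ~2 ms (holds the GIL).
```

Tune the threshold to your OS scheduler tick; the trade-off is precision against the CPU burned while spinning.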
Related
This behavior seems really odd to me. I'm running a main pygame loop in which I process the fastevent queue, and I have a separate thread running that actually runs the game. The odd thing is that if I add a short sleep statement within my main loop, the game thread executes much faster. Here's the code for the main loop:
exited = False
while not exited:
    if launcher.is_game_running:
        launcher.game.grm.board.update_turn_timer()
    # Run the event loop so pygame doesn't crash. Note: This loop CANNOT be
    # allowed to hang. It must execute quickly. If any on_click function
    # is going to stall and wait for user input, it had better process the
    # fastevent queue itself.
    # TODO(?) I don't know why, but having this sleep in here *speeds up*
    #         the game execution by a SIGNIFICANT factor. Like 10x. As far
    #         as I can tell, the value in the sleep can be anything small.
    time.sleep(0.001)
    for event in pygame.fastevent.get():
        if event.type == pygame.QUIT:
            exited = True
            break
        # Handle clicks, mouse movement, keyboard, etc.
        launcher.handle_event(event)
    if len(launcher.delayed_on_click_effects) > 0:
        launcher.delayed_on_click_effects.popleft()()
I'm really at a loss here; I don't see how adding that sleep could possibly speed up the execution of the other thread. Any ideas? I know this code snippet isn't enough to tell what's going on in the other thread. I would post more code, but I have so little idea of what's happening that I don't know which parts of my codebase are actually relevant. I can post more if anyone has suggestions.
I wasn't planning on worrying about this too much, but now a new change I've introduced is slowing my runtime back down again. Without knowing what's actually going on, it's hard to figure out how to get the runtime back where it was.
Thanks Thomas - I had no idea the GIL was even a thing, but yes, it looks like my issue is that certain threads are CPU-intensive and are not releasing the GIL frequently enough for the other threads.
I had noticed that I could replace the time.sleep(0.001) in my main loop with a print statement, and I would get the same speedup effect on the other thread. This makes sense if what that sleep is doing is releasing the GIL, because prints also release the GIL.
The "new change I've introduced" that I mentioned in the post was adding more threads (which handled message passing between the game client and a server). So my suspicion is that one of these new threads is CPU-intensive and is not releasing the GIL, thus partially starving the game thread.
To try to debug this, I added a bunch of print statements wherever I was creating new threads just to make sure I understood how many I had. And it turns out, these print statements fixed the runtime issues. So apparently, one of the places where I just added a print statement was within a thread that was hogging the GIL. The new print statement releases the GIL, allowing the game thread to run.
So my takeaways from this are:
The GIL exists (good to know)
Only one thread can actually execute Python bytecode at a time
If you want a thread to "wait and let other threads do things", you should release the GIL with an I/O call (print, socket.recv, etc.) or with time.sleep()
Threads should not "wait" by, e.g., spinning in a while loop and checking for some condition to become true. That hogs the GIL and slows down the other threads (unless you make sure to release the GIL on each iteration of the loop with a sleep)
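The last takeaway can be sketched with threading.Event: the waiting thread blocks inside wait(), a C-level call that releases the GIL, instead of spinning. All names here are hypothetical:

```python
import threading

ready = threading.Event()
results = []

def worker():
    ready.wait()            # suspends the thread; no GIL-hogging busy loop
    results.append("woke")

t = threading.Thread(target=worker)
t.start()
ready.set()                 # another thread wakes the worker
t.join()
```

While blocked in wait(), the worker consumes no GIL time at all, so the other threads run at full speed.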
I am having trouble with a simple multithreaded Python looping program. It should loop indefinitely and stop on Ctrl+C. Here is an implementation using threading:
from threading import Thread, Event
from time import sleep

stop = Event()

def loop():
    while not stop.is_set():
        print("looping")
        sleep(2)

try:
    thread = Thread(target=loop)
    thread.start()
    thread.join()
except KeyboardInterrupt:
    print("stopping")
    stop.set()
This MWE is extracted from a more complex code (obviously, I do not need multithreading to create an infinite loop).
It works as expected on Linux, but not on Windows: the Ctrl+C event is not intercepted and the loop continues indefinitely. According to the python-dev mailing list, the different behaviors are due to the way Ctrl+C is handled by the two OSes.
So, it appears that one cannot simply rely on Ctrl+C with threading on Windows. My question is: what are the other ways to stop a multithreaded Python script on this OS with Ctrl+C?
As explained by Nathaniel J. Smith in the link from your question, at least as of CPython 3.7, Ctrl-C cannot wake your main thread on Windows:
The end result is that on Windows, control-C almost never works to
wake up a blocked Python process, with a few special exceptions where
someone did the work to implement this. On Python 2 the only functions
that have this implemented are time.sleep() and
multiprocessing.Semaphore.acquire; on Python 3 there are a few more
(you can grep the source for _PyOS_SigintEvent to find them), but
Thread.join isn't one of them.
So, what can you do?
One option is simply not to use Ctrl-C to kill your program, and instead use something that calls, e.g., TerminateProcess, such as the built-in taskkill tool, or a Python script using the os module. But you don't want that.
And obviously, waiting until they come up with a fix in Python 3.8 or 3.9 or never before you can Ctrl-C your program is not acceptable.
So, the only thing you can do is not block the main thread on Thread.join, or anything else non-interruptable.
The quick&dirty solution is to just poll join with a timeout:
while thread.is_alive():
    thread.join(0.2)
Now, your program is briefly interruptible while it's running the while loop and calling is_alive, before going back to an uninterruptible sleep for another 200 ms. Any Ctrl-C that arrives during those 200 ms just waits for you to process it, so that isn't a problem.
Except that 200 ms is already long enough to be noticeable and maybe annoying.
And it may be too short as well as too long. Sure, waking up every 200 ms to execute a handful of Python bytecodes doesn't waste much CPU, but it's not nothing, and the process is still getting a timeslice in the scheduler, which may be enough to, e.g., keep a laptop from entering one of its long-term low-power modes.
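Putting the quick-and-dirty workaround together with the Event from the question, a minimal sketch; the Timer here stands in for the Ctrl-C you would press interactively, just so the example terminates on its own:

```python
import threading
import time

stop = threading.Event()

def loop():
    while not stop.is_set():
        time.sleep(0.05)

def main():
    thread = threading.Thread(target=loop)
    thread.start()
    try:
        # join() with a timeout returns to Python every 200 ms, giving
        # the main thread a chance to see KeyboardInterrupt on Windows.
        while thread.is_alive():
            thread.join(0.2)
    except KeyboardInterrupt:
        stop.set()
        thread.join()

# Simulate a shutdown request after 0.3 s so the demo exits by itself.
threading.Timer(0.3, stop.set).start()
main()
```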
The clean solution is to find another function to block on. As Nathaniel J. Smith says:
you can grep the source for _PyOS_SigintEvent to find them
But there may not be anything that fits very well. It's hard to imagine how you'd design your program to block on multiprocessing.Semaphore.acquire in a way that wouldn't be horribly confusing to the reader…
In that case, you might want to drag in the Win32 API directly, whether via PyWin32 or ctypes. Look at how functions like time.sleep and multiprocessing.Semaphore.acquire manage to be interruptible, block on whatever they're using, and have your thread signal whatever it is you're blocking on at exit.
If you're willing to use undocumented internals of CPython, it looks like, at least in 3.7, the hidden _winapi module has a wrapper function around WaitForMultipleObjects that appends the magic _PyOS_SigintEvent for you when you're doing a wait-any rather than a wait-all.
One of the things you can pass to WaitForMultipleObjects is a Win32 thread handle, which has the same effect as a join, although I'm not sure if there's an easy way to get the thread handle out of a Python thread.
Alternatively, you can manually create some kind of kernel sync object (I don't know the _winapi module very well and I don't have a Windows system, so you'll probably have to read the source yourself, or at least explore it in the interactive interpreter, to see what wrappers it offers), WaitForMultipleObjects on that, and have the thread signal it.
I am aware that this question is rather high-level and may be vague. Please ask if you need any more details and I will try to edit.
I am using QuickFix with Python bindings to consume high-throughput market data from around 30 markets simultaneously. Most of the computational work is done on separate CPUs via the multiprocessing module. These parallel processes are spawned by the main process on startup. If I wish to interact with the market in any way via QuickFix, I have to do it within the main process, so any commands (to enter orders, for example) which come from the child processes must be piped (via an mp.Queue object we will call Q) to the main process before execution.
This raises the problem of monitoring Q, which must be done within the main process. I cannot use Q.get(), since that method blocks and my entire main process would hang until something shows up in Q. To keep latency low, I must check Q frequently, on the order of 50 times per second. I have been using apscheduler to do this, but I keep getting warnings stating that the run time was missed. These errors are a serious issue because they prevent me from easily viewing important information.
I have therefore refactored my application to use the code posted by MestreLion as an answer to this question. This is working for me because it starts a new thread from the main process, and it does not print error messages. However, I am worried that this will cause nasty problems down the road.
I am aware of the Global Interpreter Lock in python (this is why I used the multiprocessing module to begin with), but I don't really understand it. Owing to the high-frequency nature of my application, I do not know if the Q monitoring thread and the main process consuming lots of incoming messages will compete for resources and slow each other down.
My questions:
Am I likely to run into trouble in this scenario?
If not, can I add more monitoring threads using the present approach and still be okay? There are at least two other things I would like to monitor at high frequency.
Thanks.
MestreLion's solution that you've linked creates 50 threads per second in your case.
All you need is a single thread to consume the queue without blocking the rest of the main process:
import threading

def consume(queue, sentinel=None):
    for item in iter(queue.get, sentinel):
        pass_to_quickfix(item)

threading.Thread(target=consume, args=[queue], daemon=True).start()
The GIL may or may not matter for performance in this case. Measure it.
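To shut such a consumer down cleanly, put the sentinel on the queue and join the thread. A self-contained sketch using queue.Queue for the demo (the pattern is the same with an mp.Queue); pass_to_quickfix is a stand-in for your real handler:

```python
import queue
import threading

q = queue.Queue()
received = []

def pass_to_quickfix(item):        # stand-in for the real QuickFix call
    received.append(item)

def consume(q, sentinel=None):
    # iter(q.get, sentinel) blocks in q.get (releasing the GIL) and
    # ends the loop as soon as the sentinel arrives.
    for item in iter(q.get, sentinel):
        pass_to_quickfix(item)

t = threading.Thread(target=consume, args=[q], daemon=True)
t.start()

q.put("order-1")
q.put("order-2")
q.put(None)                        # sentinel: tells the consumer to exit
t.join()
```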
Without knowing your scenario, it's difficult to say anything specific. Your question suggests that the threads are waiting in get most of the time, so the GIL is not a problem. Interprocess communication may cause problems much earlier. There you could think about switching to another protocol, e.g. TCP sockets. Then you could write the scheduler more efficiently with select instead of threads, since threads are also slow and resource-consuming. select is a system function that lets you monitor many socket connections at once; it therefore scales extremely efficiently with the number of connections and needs almost no CPU power for monitoring.
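A minimal illustration of select-based monitoring; a socketpair stands in for real TCP connections:

```python
import select
import socket

a, b = socket.socketpair()   # stand-in for real TCP connections
b.send(b"tick")

# select blocks (releasing the GIL) until a monitored socket becomes
# readable or the 1-second timeout expires.
readable, _, _ = select.select([a], [], [], 1.0)
messages = [s.recv(4096) for s in readable]

a.close()
b.close()
```

One thread can watch dozens of connections this way, waking only when there is actual data to dispatch.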
While programming I had to choose between busy-waiting:

while not cont_flag:
    pass

and using an Event object:

if not cont_flag.is_set():
    cont_flag.wait()

I want to know whether there is a difference in performance between the two methods.
There is. The first method is called busy waiting, and it is very different from blocking. In busy waiting, the CPU is used constantly while the while loop executes. In blocking, the thread is actually suspended until a wake-up condition is met.
See also this discussion:
What is the difference between busy-wait and polling?
The first one is referred to as busy waiting; it will eat up 100% of a CPU core while waiting. It is much better practice to use some signalling mechanism to communicate events (e.g. that something is done).
Python only allows a single thread to execute Python bytecode at a time, regardless of how many CPUs your system may have. If multiple threads are ready to run, Python will switch among them periodically. If you "busy wait" as in your first example, that while loop will eat up much of the time your other threads could use for their work. The second solution is far superior, but if you end up using the first one, add a modest sleep to it:
while not cont_flag:
    time.sleep(.1)
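You can measure the difference directly: blocking on the Event consumes almost no CPU time even while wall-clock time passes. A sketch; the Timer plays the role of the other thread that eventually sets the flag:

```python
import threading
import time

cont_flag = threading.Event()
threading.Timer(0.2, cont_flag.set).start()   # another thread sets the flag

cpu_before = time.process_time()
cont_flag.wait()          # the thread is suspended, not spinning
cpu_used = time.process_time() - cpu_before
# cpu_used is a tiny fraction of the 0.2 s of wall time that passed;
# an equivalent busy loop would burn the whole 0.2 s of CPU.
```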
I have a script which runs quite a lot of concurrent threads (at least 200). Each thread performs some quite complex evaluation that can take an unpredictably long time. The evaluation method is implemented in C and I can't change it. I want to limit the method's execution time in every thread. Please advise.
From what I understand of your problem, it might be a good case for using multiprocessing instead of multithreading. Multiprocessing will allow you to make use of all the available resources on the system - and then some, if you're not careful.
Threads don't actually run in parallel in CPython, so unless you're doing a lot of waiting on I/O or something like that, it would make more sense to call the method from a separate process. You could use the Python multiprocessing library to call it from a Python script, or you could use a wrapper written in C and some form of interprocess communication. The second option avoids the overhead of launching another Python instance just to run some C code.
You could call time.sleep (or perform other tasks and check the system clock for elapsed time), then check for results after the desired interval, letting any processes that haven't finished keep running while you make use of the results you have. Or, if you don't care about them at that point, you can send a signal to kill the process.