WX.Python and multiprocessing - python

I have a wx.python application that takes some files and processes them when a button is clicked. I need to process them in parallel.
I use this code inside the bound button function:
my_pool = multiprocessing.Pool(POOLSIZE)
results = [my_pool.apply_async(self.fun, [args]) for file in list_files()]
my_pool.close()
my_pool.join()
for result in results:
    print result.get()
But it seems this code is not run at all, even if I print something on fun. I didn't get any result and my GUI application got stuck. Could someone help? What is the problem here and how can I solve it using the pool multiprocessing module inside my wx frame class?

It looks like you're running up against a pretty common problem encountered by people attempting to use threading with GUI toolkits. The core of the issue is that you must never block the main GUI thread in your code. The graphical toolkit needs to be able to constantly respond to events. When you do the my_pool.join() call, you're putting the main thread to sleep and the result is that your entire process will appear to lock up.
I'm not particularly familiar with wxWidgets but I'm sure there are a few patterns out there for how to use threads with it. It's easy to spin off background threads from the GUI thread but getting the results back is usually the trick. You'll need some sort of asynchronous "work done" event that you can send to the main GUI thread when the background operation completes. Exactly how that's done differs from toolkit to toolkit. I'm not sure what the mechanism is for wxWidgets but I'd expect a quick google search would turn up an answer (or perhaps a kind commenter will provide a link ;-)
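As a rough, toolkit-neutral sketch of that "work done" idea: submit the jobs with a callback and return immediately, then let the GUI side pick up finished results without ever blocking. Everything named here (process_file, start_jobs, poll_results, the file names) is made up for illustration. A thread-backed pool is used so the sketch runs anywhere; multiprocessing.Pool offers the same apply_async(..., callback=...) call, but it needs a picklable module-level function and, on Windows, an `if __name__ == "__main__":` guard.

```python
import queue
import time
from multiprocessing.pool import ThreadPool


def process_file(name):
    """Hypothetical stand-in for the real per-file work."""
    time.sleep(0.05)
    return name.upper()


def start_jobs(pool, names, done):
    """Submit every job and return immediately; no join() here."""
    for name in names:
        # The callback runs on a pool helper thread, so it only hands the
        # result over. In wxPython it could call wx.CallAfter(...) instead.
        pool.apply_async(process_file, (name,), callback=done.put)


def poll_results(done):
    """Collect whatever has finished so far without blocking."""
    finished = []
    while True:
        try:
            finished.append(done.get_nowait())
        except queue.Empty:
            return finished


pool = ThreadPool(2)
done = queue.Queue()
names = ["a.txt", "b.txt", "c.txt"]
start_jobs(pool, names, done)

collected = []
while len(collected) < len(names):   # stand-in for the GUI event loop
    collected.extend(poll_results(done))
    time.sleep(0.01)

pool.close()
pool.join()   # harmless now: all the work has already finished
```

In a real frame, poll_results would be driven by a timer or idle event rather than a while loop, so the button handler itself returns right after start_jobs.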

Related

How is tkinter not affected by threading when tkinter is always occupying the CPU?

Update the title to clarify my question:
I was recently told that Python cannot truly thread, and I find this confusing. It has something to do with IO and CPU. For example, I run multiple "threads" to run multiple queries against an SQL Server at the same time, so that I don't have to wait for them to run one after another. This is also done so as not to block Tkinter's mainloop, as Tkinter is a single-threaded library. At this point I feel like everything in Python is single threaded...
So when I brought up this question, I was told that only one thread can occupy the CPU at a time and that the SQL server is handling the load. I understand that to a point, but if this is the case, how is it that Tkinter's mainloop is not blocked? If the CPU can only work on one thread at a time, then by this logic the mainloop should be blocked any time my other threads are doing stuff.
I know that cannot be true as the mainloop is running just fine and my GUI is not frozen so something else I am not understanding must be going on.
So my question is this:
How is this threading not blocking the tkinter mainloop if it is not parallel processing?
How exactly does Python "Thread" and what is the difference between IO and CPU in terms of threading?
This post has some information on my 2nd question. Still not what I want to know about the mainloop though.
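One way to see the IO/CPU distinction for yourself is a small timing experiment. This is only a sketch: fake_query is a made-up stand-in for a blocking database call, and time.sleep plays the role of waiting on the server. A thread that is blocked waiting does not hold the interpreter lock, so other threads (a GUI mainloop included) get to run in the meantime.

```python
import threading
import time

DELAY = 0.3


def fake_query(delay):
    """Hypothetical stand-in for a blocking database call."""
    time.sleep(delay)   # while waiting, this thread does not hold the GIL


threads = [threading.Thread(target=fake_query, args=(DELAY,)) for _ in range(4)]

start = time.perf_counter()
for t in threads:
    t.start()
for t in threads:
    t.join()
elapsed = time.perf_counter() - start

# Four waits overlap, so the total is close to one DELAY, not four.
print("elapsed: %.2f s" % elapsed)
```

If fake_query did heavy pure-Python computation instead of waiting, the threads would take turns and the total would be roughly the sum of the individual times.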

How to create cancellable tasks in Python?

I'm building a Python IDE, which needs to highlight all occurrences of the name under cursor (using Jedi library). The process of finding the occurrences can be quite slow.
In order to avoid freezing the GUI, I could run the search in another thread, but when the user moves quickly over several words, the background threads could pile up while working on now obsolete tasks. I would like to cancel the search for previous occurrences when user moves to new name.
Looks like killing a thread is complicated in Python. What are the other options for creating easily cancellable background tasks in Python 3.4+?
I think concurrent.futures is the answer.
You can create a Thread / Process pool, submit any callable, receive a Future, which you can cancel if needed.
Reference: https://docs.python.org/3/library/concurrent.futures.html
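A minimal sketch of what that looks like, with one caveat worth knowing: Future.cancel() only succeeds for work that has not started yet. slow_search and the two events are invented for the illustration; they just keep the first task busy so the second one is still waiting in the queue.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

started = threading.Event()
release = threading.Event()


def slow_search(name):
    """Hypothetical stand-in for a slow occurrence search."""
    started.set()
    release.wait(timeout=5)
    return "occurrences of " + name


with ThreadPoolExecutor(max_workers=1) as executor:
    first = executor.submit(slow_search, "foo")
    started.wait(timeout=5)                       # first is now running
    second = executor.submit(slow_search, "bar")  # queued behind it

    was_cancelled = second.cancel()       # True: it had not started yet
    could_stop_running = first.cancel()   # False: already running

    release.set()                         # let the running search finish
    first_result = first.result()
```

So a pool with cancel() gets rid of the piled-up obsolete searches, but a search that is already running still has to finish or cooperate.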
A thread cannot be stopped by another one. This is an OS limitation rather than a Python one. The only thing you can do is periodically inspect a variable and, if it is set, have the thread stop itself (just return).
Moreover, threads in Python suffer from the GIL. This means that CPU-intensive operations, when carried out in a separate thread, will still affect your main loop, as only one thread per process can execute Python code at a time.
I'd recommend you to run the search in a separate process which you can easily cancel whenever you want.
What the guys of YouCompleteMe are doing, for example, is wrapping Jedi in an HTTP server which they can query in the background. If the user moves the cursor before the completion comes back, the IDE can simply drop the request.
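To illustrate the separate-process idea in its simplest form, here is a sketch using only the standard library. The child command is a placeholder that just sleeps; in a real IDE it would be whatever script performs the search, and start_search / cancel_search are names made up for this example.

```python
import subprocess
import sys

# Hypothetical stand-in for a slow search: a child that just sleeps.
SLOW_CHILD = "import time; time.sleep(60)"


def start_search():
    """Launch the slow work in its own interpreter process."""
    return subprocess.Popen([sys.executable, "-c", SLOW_CHILD])


def cancel_search(proc):
    """Stop the child if it is still running, then reap it."""
    if proc.poll() is None:
        proc.terminate()
    proc.wait(timeout=10)
    return proc.returncode


proc = start_search()
exit_code = cancel_search(proc)   # non-zero: the child was cut short
```

Unlike a thread, the child can be stopped at any point without its cooperation, at the cost of having to pass the results back through a pipe, socket, or similar.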
Well, my personal favorites are work queues. If it's a one-time application you should take a look at python rq. Extremely easy and fun to use. If you want to build something more "professional-grade" take a look at something like celery.
You might also want to look at multiprocessing

Checking threadsafe-ness in python?

This is my first foray into threading, so apologies for any obvious mistakes.
I have a PyQt widget, from which a new process, prog, is run in a different thread. In my main thread, I'm also redirecting stdout to a read-only QTextEdit. However, I get errors referring to recursion, and I'm worried that my threads are interfering with each other in a way which causes a print statement to go into an infinite loop. I only get these errors if I run prog from the GUI, and not from the command line. My stdout redirect is using the code in this SO answer.
In pseudo-code, this is basically what I've got:
gui.py
class widget(QWidget):
    def __init__(self):
        self.button = QPushButton("GO!", self)
        self.button.clicked.connect(self.start)
    def start(self):
        self.thread = TaskThread()
        sys.stdout = EmittingStream(textWritten = self.outputText)
        self.thread.start()
    def outputText(self):
        #as in answer provided in link (EmittingStream in separate module)
prog.py
class TaskThread(QThread):
    def run(self):
        '''
        Long complicated program; putting in simpler code here (e.g. loop printing to 10000) doesn't reproduce errors
        '''
Is there any way of finding out if my recursion is caused by an infinite loop, or by anything else?
Is my code obviously thread-unsafe?
How do you make functions guaranteed to be threadsafe? (Links to tutorials / books will be good!)
This is tricky, but I think that your code is thread-unsafe. Specifically, looking at other stackoverflow answers (here and here) it appears that you should not be accessing a Qt GUI object from another thread than the one it was created in (even a QThread).
Since any call to print in your code now accesses a Qt GUI object, it seems this is very thread unsafe.
My suggestion to make it safe would be to:
Qt GUI objects are not reentrant and must be created and used in the main thread only (see here). As such, you will need a QThread to post events back to the main thread (using the Qt signals/slots mechanism, which can be thread safe when done correctly).
Have this QThread block on reading from a Python Queue. When it gets something from the queue, it posts it back to the main thread, and the main thread updates the output box.
Modify your EmittingStream to place things in the Queue, rather than directly into the Qt output box.
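The queue half of that suggestion can be sketched without Qt at all. QueueStream, noisy_task and drain are names made up for this illustration, not the EmittingStream from the linked answer; the point is only that write() never touches a widget, it just enqueues the text.

```python
import contextlib
import queue
import threading


class QueueStream:
    """File-like object whose write() only enqueues text."""

    def __init__(self, q):
        self._q = q

    def write(self, text):
        if text:
            self._q.put(text)   # Queue.put is safe to call from any thread
        return len(text)

    def flush(self):
        pass


def noisy_task():
    """Hypothetical worker that prints as it goes."""
    for i in range(3):
        print("step", i)


def drain(q):
    """Collect everything queued so far without blocking."""
    chunks = []
    while True:
        try:
            chunks.append(q.get_nowait())
        except queue.Empty:
            return "".join(chunks)


messages = queue.Queue()
with contextlib.redirect_stdout(QueueStream(messages)):
    worker = threading.Thread(target=noisy_task)
    worker.start()
    worker.join()   # a GUI would keep running its event loop instead

shown = drain(messages)   # in Qt, this text would be appended in the main thread
```

In the Qt version, the draining side would emit a signal (or run from a timer in the main thread) and the connected slot would append the text to the QTextEdit.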
I can see you have located where the error is. But without the code there is not much that I can tell.
Since you asked for direction, I'll point you to the Python profilers, as it looks like you need some Python profiling tools.
http://docs.python.org/2/library/profile.html
and an answer about the subject:
How can you profile a Python script?
In a Qt application you must use one single thread to handle all of the GUI part. You can use other threads for computations, but not for the user interface.
Just post messages about updates in a queue in the worker threads and use the main thread to pick up those messages and update the GUI.

time.sleep that allows parent application to still evaluate?

I've run into situations lately when writing scripts for both Maya and Houdini where I need to wait for aspects of the GUI to update before I can call the rest of my Python code. I was thinking that calling time.sleep in both situations would fix my problem, but it seems that time.sleep just holds up the parent application as well. This means my script evaluates exactly the same way regardless of whether or not the sleep is in there; it just pauses partway through.
I have a thought to run my script in a separate thread in Python to see if that will free up the application to still run during the sleep, but I haven't had time to test this yet.
Thought I would ask in the meantime if anybody knows of some other solution to this scenario.
Maya - or more precisely Maya Python - is not really multithreaded (Python itself has a dodgy kind of multithreading because all threads fight for the dread global interpreter lock, but that's not your problem here). You can run threaded code just fine in Maya using the threading module; try:
import time
import threading
def test():
    # print one line per second from a background thread
    for n in range(0, 10):
        print("hello")
        time.sleep(1)
t = threading.Thread(target=test)
t.start()
That will print 'hello' to your listener 10 times at one second intervals without shutting down interactivity.
Unfortunately, many parts of maya - including most notably ALL user created UI and most kinds of scene manipulation - can only be run from the "main" thread - the one that owns the maya UI. So, you could not do a script to change the contents of a text box in a window using the technique above (to make it worse, you'll get misleading error messages - code that works when you run it from the listener but errors when you call it from the thread and politely returns completely wrong error codes). You can do things like network communication, writing to a file, or long calculations in a separate thread no problem - but UI work and many common scene tasks will fail if you try to do them from a thread.
Maya has a partial workaround for this in the maya.utils module. You can use the functions executeDeferred and executeInMainThreadWithResult. These will wait for an idle time to run (which means, for example, that they won't run if you're playing back an animation) and then fire as if you'd done them in the main thread. The example from the Maya docs gives the idea:
import maya.utils
import maya.cmds
def doSphere( radius ):
    maya.cmds.sphere( radius=radius )
maya.utils.executeInMainThreadWithResult( doSphere, 5.0 )
This gets you most of what you want, but you need to think carefully about how to break up your task into threading-friendly chunks. And, of course, running threaded programs is always harder than the single-threaded alternative: you need to design the code so that things won't break if another thread messes with a variable while you're working. Good parallel programming is a whole big kettle of fish, although it boils down to a couple of basic ideas:
1) establish exclusive control over objects (for short operations) using RLocks when needed
2) put shared data into safe containers, like Queue in #dylan's example
3) be really clear about what objects are shareable (they should be few!) and which aren't
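A small sketch of the first two ideas in plain Python, outside of Maya. Counter, worker and the squaring job are invented for the illustration: the RLock guards a short update on a shared object, and everything else travels between threads through Queue objects.

```python
import queue
import threading


class Counter:
    """Shared object whose update is guarded by an RLock."""

    def __init__(self):
        self._lock = threading.RLock()
        self.value = 0

    def increment(self):
        with self._lock:   # exclusive control for one short operation
            self.value += 1


def worker(counter, jobs, results):
    """Take numbers from jobs until the None sentinel, push squares to results."""
    while True:
        item = jobs.get()
        if item is None:
            break
        counter.increment()
        results.put(item * item)


counter = Counter()
jobs, results = queue.Queue(), queue.Queue()
threads = [threading.Thread(target=worker, args=(counter, jobs, results))
           for _ in range(4)]
for t in threads:
    t.start()
for n in range(100):
    jobs.put(n)
for _ in threads:
    jobs.put(None)   # one sentinel per worker
for t in threads:
    t.join()

total = sum(results.get() for _ in range(100))
```

Only counter, jobs and results are shared here, which is the third idea: keep the list of shareable objects short and explicit.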
Here's a decent (long) overview.
As for Houdini, I don't know for sure, but this article makes it sound like similar issues arise there.
A better solution, rather than sleep, is a while loop. Set up a while loop to check a shared value (or even a thread-safe structure like a Queue). The parent processes that you're waiting on can do their work (or children, it's not important who spawns what), and when they finish their work, they send a true/false/0/1/whatever to the Queue/variable, letting the other processes know that they may continue.

How to implement pause (and more) functionality?

My apologies beforehand for the length of the question, I didn't want to leave anything out.
Some background information
I'm trying to automate a data entry process by writing a Python application that uses the Windows API to simulate keystrokes, mouse movement and window/control manipulation. I have to resort to this method because I do not (yet) have the security clearance required to access the datastore/database directly (e.g. using SQL) or indirectly through a better suited API. Bureaucracy, it's a pain ;-)
The data entry process involves the correction of sales orders due to changes in article availability. The unavailable articles are either removed from the order or replaced by another suitable article.
Initially I want a human to be able to monitor the automatic data entry process to make sure everything goes right. To achieve this I slow down the actions on the one hand but also inform the user of what is currently going on through a pinned window.
The actual question
To allow the user to halt the automation process I'm registering the Pause/Break key as a hotkey and in the handler I want to pause the automation functionality. However, I'm currently struggling to figure out a way to properly pause the execution of the automation functionality. When the pause function is invoked I want the automation process to stop dead in its tracks, no matter what it is doing. I don't want it to even execute another keystroke.
UPDATE [23/01]: I actually want to do more than just pause, I want to be able to communicate with the automation process while it is running and request it to pause, skip the current sales order, give up completely and perhaps even more.
Can anybody show me The Right Way (TM) to achieve what I want?
Some more information
Here's an example of how the automation works (I'm using the pywinauto library):
from pywinauto import application
app = application.Application()
app.start_("notepad")
app.Notepad.TypeKeys("abcdef")
UPDATE [25/01]: After a few days of working on my application I've noticed I don't really use pywinauto that much. Right now I'm only using it for finding windows, and then I directly use SendKeysCtypes.SendKeys to simulate keyboard input and win32api functions to simulate mouse input.
What I've found out so far
Here are a few methods I've come across so far in my search for an answer:
I could separate the automation functionality and the interface + hotkey listener in two separate processes. Let's refer to the former as "automator" and the latter as "manager". The manager can then pause the execution of the automator by sending the process a SIGSTOP signal and unpause it using the SIGCONT signal (or the Windows equivalents through SuspendThread/ResumeThread).
To be able to update the user interface the automator will need to inform the manager of its progression through some sort of an IPC mechanism.
Cons:
Would using SIGSTOP not be a little harsh? Would it even work properly? Lots of people seem to be advising against it and even calling it "dangerous".
I am worried that implementing the IPC mechanism is going to be a bit complicated. On the other hand, I have worked with DBus which wouldn't be too hard to implement.
The second method and one that lots of people seem to be suggesting involves using threads and essentially boils down to the following (simplified):
while True:
    if self.pause:
        continue  # paused: skip the work and check the flag again
    # Do the work...
However, doing it this way it seems it will only pause after there is no more work to do. The only way I see this method would work would be to divide the work (the entire automation process) into smaller work segments (i.e. tasks). Before starting on a new task the worker thread would check if it should pause and wait.
Cons:
Seems like an implementation to divide the work into smaller segments, such as the one above, would be very ugly code wise (aesthetically).
The way I imagine it, all statements would be transformed to look something like: queue.put((function, args)) (e.g. queue.put((app.Notepad.TypeKeys, "abcdef"))) and you'd have the automating process thread running through the tasks and continuously checking for the pause state before starting a task. That just can't be right...
The program would not actually stop dead in its tracks, but would first finish a task (however small) before actually pausing.
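For reference, the task-queue variant described above can be sketched roughly like this. It is an illustration only: type_keys stands in for a call such as app.Notepad.TypeKeys, and a threading.Event is used in place of a bare pause flag so the worker sleeps while paused instead of spinning. As noted in the cons, a task that has already started still runs to completion.

```python
import queue
import threading


def run_tasks(tasks, may_run, log):
    """Run (function, args) pairs, waiting between tasks while paused."""
    while True:
        task = tasks.get()
        if task is None:      # sentinel: no more work
            break
        may_run.wait()        # blocks here for as long as the event is cleared
        function, args = task
        log.append(function(*args))


def type_keys(text):
    """Hypothetical stand-in for app.Notepad.TypeKeys."""
    return "typed " + text


tasks = queue.Queue()
may_run = threading.Event()   # cleared = paused, set = running
log = []
worker = threading.Thread(target=run_tasks, args=(tasks, may_run, log))
worker.start()

for text in ("abc", "def"):
    tasks.put((type_keys, (text,)))
tasks.put(None)

paused_log = list(log)        # still empty: the worker started out paused
may_run.set()                 # "unpause" (may_run.clear() would pause again)
worker.join()
```

A hotkey handler would simply call may_run.clear() and may_run.set(); further commands such as "skip" or "give up" could be extra events checked at the same spot.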
Progress made
UPDATE [23/01]: I've implemented a version of my application using the first method through the mentioned SuspendThread/ResumeThread functionality. So far this seems to work very nicely and also allows me to write the automation stuff just like you'd write any other script. The only quirk I've come across is that keyboard modifiers (CTRL, ALT, SHIFT) get "stuck" while paused. Something I can probably easily work around.
I've also written a test using the second method (threads and signals/message passing) and implemented the pause functionality. However, it looks really ugly (both checking for the pause flag and everything related to the "doing the work"). So if anybody can show me a proper example of something similar to the second method I'd appreciate it.
Related questions
Pausing a process?
Pausing a thread using threading class
Alex Martelli posted an answer saying:
There is no method for other threads to forcibly pause a thread (any more than there is for other threads to kill that thread) -- the target thread must cooperate by occasionally checking appropriate "flags" (a threading.Condition might be appropriate for the pause/unpause case).
He then referred to the multiprocessing module and SIGSTOP/SIGCONT.
Is there a way to indefinitely pause a thread?
Pausing a process in Windows
An answer to this question quotes the MSDN documentation regarding SuspendThread:
This function is primarily designed for use by debuggers. It is not intended to be used for thread synchronization. Calling SuspendThread on a thread that owns a synchronization object, such as a mutex or critical section, can lead to a deadlock if the calling thread tries to obtain a synchronization object owned by a suspended thread. To avoid this situation, a thread within an application that is not a debugger should signal the other thread to suspend itself. The target thread must be designed to watch for this signal and respond appropriately.
Is there any way to kill a Thread in Python?
How do I pass an exception between threads in python
Keep in mind that although in your level of abstraction, "executing a keystroke" is a single atomic operation, it's implemented on the machine as a rather complicated sequence of machine instructions. So, pausing a thread at arbitrary points could lead to things being in an indeterminate state. Sending SIGSTOP is the same level of dangerous as pausing a thread at an arbitrary point. Depending on where you are in a particular step, though, your automation could potentially be broken. For example, if you pause in the middle of a timing-dependent step.
It seems to me that this problem would be best solved at the level of the automation library. I'm not very familiar with the automation library that you're using. It might be worth contacting the developers of the library to see if they have any suggestions for pausing the execution of automation steps at safe sub-step levels.
I don't know pywinauto. But I'll assume that you have something like an Application class which you obtain and have methods like SendKeys/SendMouseEvent/etc to do things.
Create your own MyApplication class which holds a reference to pywinauto's application class. Provide the same methods but before each method check whether a pause event has occurred. If it has, you can jump into code which handles the pause event. That way you are checking for a pause every time you cause an event, but this all is handled by the one class without putting pause all over your code.
Once you've detected the pause you can handle it any way you like. For example, you can throw an exception to force giving up on the current task.
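A rough sketch of such a wrapper, written without pywinauto so it runs anywhere. FakeApp, PausableProxy and AbortTask are all invented names; the real wrapped object would be whatever automation object you already use. Every method call goes through one checkpoint that waits while paused and raises if an abort was requested.

```python
import threading


class AbortTask(Exception):
    """Raised at a checkpoint to give up on the current task."""


class PausableProxy:
    """Forwards calls to a target, checking for pause/abort before each one."""

    def __init__(self, target):
        self._target = target
        self.may_run = threading.Event()
        self.may_run.set()                 # start in the running state
        self.abort = threading.Event()

    def __getattr__(self, name):
        # Only reached for names not found on the proxy itself.
        attr = getattr(self._target, name)
        if not callable(attr):
            return attr

        def checked(*args, **kwargs):
            self.may_run.wait()            # blocks while paused
            if self.abort.is_set():
                raise AbortTask(name)
            return attr(*args, **kwargs)

        return checked


class FakeApp:
    """Hypothetical stand-in for the automation object."""

    def __init__(self):
        self.sent = []

    def send_keys(self, keys):
        self.sent.append(keys)


app = PausableProxy(FakeApp())
app.send_keys("abc")          # goes through the checkpoint, then runs
app.abort.set()               # e.g. set from the hotkey handler
try:
    app.send_keys("def")
    aborted = False
except AbortTask:
    aborted = True
sent = app.sent
```

The automation script itself stays free of pause checks; it only has to catch AbortTask where it wants to skip an order or stop completely.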
Separating the functionality and the interface thread/process is definitely the best option imho; the second solution is quicker and easier but definitely not better.
Perhaps using multiple threads and an exception would be a better idea than using multiple processes. But if you're using multiple processes, then SIGSTOP might be your only way to get it to work.
Is there anything against using 2 threads for this?
1 thread for actually executing
1 thread for reading the user input
I use Python but not pywinauto; for this sort of task I use AutoHotkey. One way to implement a simple pause in an AutoHotkey script may be to use a "toggle" key like ScrollLock and test the key state in the script. Also, the script can restore the key state after switching the internal pause setting on / off.
