I am writing a Python application in the field of scientific computing. Currently, when the user works with the GUI and starts a new physics simulation, the interpreter immediately imports several necessary modules for this simulation, such as Traits and Mayavi. These modules are heavy and take too long to import, and the user has to wait ~10 seconds before he can continue, which is bad.
I thought of something that might remedy this. I'll describe it and perhaps someone else has already implemented it, if so please give me a link. If not I might do it myself.
What I want is a separate thread that will import modules asynchronously. It will probably be a subclass of threading.Thread.
Here's a usage example:
importer_thread = ImporterThread()
importer_thread.start()
# ...
importer_thread.request_import('Mayavi')
importer_thread.request_import('Traits')
# request_import() is a thread-safe method that puts the module name
# into a queue, which the thread drains in an infinite loop.
# ...
# When the user actually needs the modules:
import Mayavi, Traits
# If they were already loaded by importer_thread, we're good.
# If not, we'll just have to wait as usual.
So do you know of anything like this? If not, do you have any suggestions about the design?
The problem with this is that the imports must still complete before they are usable. Depending on when they're first used, the application could still have to block for 10 seconds before it could start up anyway. Much more productive would be to profile the modules and figure out why they take so long to import.
Why not just do this when the app starts?
import threading

def background_imports():
    import Traits
    import Mayavi

thread = threading.Thread(target=background_imports)
thread.setDaemon(True)
thread.start()
The general idea is good, but the Python/GUI session might not be all that responsive while the background thread is importing away; unfortunately, import inherently and inevitably "locks up" Python substantially (it's not just the GIL, there's specific extra locking for imports).
Still worth trying, as it might make things a bit better -- it's also very easy, since Queues are intrinsically thread-safe and, besides a Queue's put and get, all you need is basically an __import__. Still, don't be surprised if this doesn't help enough and you still need extra oomph.
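For concreteness, here is a minimal sketch of that queue-plus-__import__ design (the class name ImporterThread and the method name request_import come from the question's pseudocode, not from any existing library):

import threading
try:
    import queue            # Python 3
except ImportError:
    import Queue as queue   # Python 2

class ImporterThread(threading.Thread):
    """Imports, in the background, every module name posted to its queue."""
    def __init__(self):
        threading.Thread.__init__(self)
        self.daemon = True
        self._requests = queue.Queue()

    def request_import(self, module_name):
        # Thread-safe: Queue.put may be called from any thread.
        self._requests.put(module_name)

    def run(self):
        while True:
            name = self._requests.get()
            if name is None:          # sentinel: shut the thread down
                break
            try:
                __import__(name)      # the result lands in sys.modules
            except ImportError:
                pass                  # a later plain "import name" will report the error

importer_thread = ImporterThread()
importer_thread.start()
importer_thread.request_import('Mayavi')
importer_thread.request_import('Traits')

A later plain import Mayavi in the GUI thread then finds the module already in sys.modules and returns at once; if the background import is still in progress, it simply blocks until that finishes, exactly as the question anticipates.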
If you have some drive that's intrinsically very fast, but with limited space, such as a "RAM drive" or a particularly snippy solid-state one, it may be worth keeping the needed packages in a .tar.bz2 (or other form of archive) and unpacking it onto the fast drive at program start (that's essentially just I/O and so it won't lock things up badly -- I/O operations rapidly release the GIL -- and also it's especially easy to delegate to a subprocess running tar xjf or the like).
If some of the import slowness is due to a huge number of .py/.pyc/.pyo files, it's worth a try to keep those (in .pyc form only, not as .py) in a zipfile and import from there (but that only helps with the I/O overhead, depending on your OS, filesystem, and drive: it doesn't help with delays due to loading huge DLLs or executing initialization code in packages at load time, which I suspect are the likelier culprits for the slowness).
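If you try the zipfile route, the mechanics are just Python's built-in zipimport support; a sketch, with a made-up archive path:

import sys

# Hypothetical layout: heavy_libs.zip holds only .pyc files, arranged
# exactly as the packages would be on disk (somepackage/__init__.pyc, ...).
sys.path.insert(0, '/fast_drive/heavy_libs.zip')

import somepackage   # now satisfied from the zip archive (fewer per-file stats/opens)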
You could also consider splitting the application up with multiprocessing -- again using Queues (but of the multiprocessing kind) to communicate -- so that both imports and some heavy computations are delegated to a few auxiliary processes and thus made asynchronous (this may also help fully exploiting multiple cores at once). I suspect this may unfortunately be hard to arrange properly for visualization tasks (such as those you're presumably doing with mayavi) but it might help if you also have some "pure heavy computation" packages and tasks.
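A bare-bones sketch of that multiprocessing split (the worker body and the task format are invented; numpy just stands in for one of the heavy packages):

import multiprocessing as mp

def worker(task_q, result_q):
    # The heavy imports happen here, in the child process,
    # so they never stall the GUI process.
    import numpy
    for task in iter(task_q.get, None):        # None is the shutdown sentinel
        result_q.put(numpy.asarray(task).sum())

if __name__ == '__main__':
    task_q, result_q = mp.Queue(), mp.Queue()
    proc = mp.Process(target=worker, args=(task_q, result_q))
    proc.daemon = True
    proc.start()
    task_q.put([1, 2, 3])
    print(result_q.get())                      # 6
    task_q.put(None)
    proc.join()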
"the user works with the GUI and starts a new physics simulation"
Not really clear. Does "works with the GUI" mean double click? Double click what? Some wxWidgets GUI application? Or IDLE?
If so, what does "starts a new physics simulation" mean? Click a button somewhere else? A GUI button to bring up a panel where they write code? Or do they import a script they wrote off line?
Why is the import happening before the simulation starts? How long does a simulation take? What does the GUI show?
I suspect that there's a way to be much, much lazier in doing the big imports. But from the description, it's hard to determine if there's a point in time where the import doesn't matter as much to the user.
Threads don't help much. What helps is rethinking the UI experience.
Does the presence of the Python GIL imply that in Python multithreading the same operation is not so different from repeating it in a single thread?
For example, if I need to upload two files, what is the advantage of doing them in two threads instead of uploading them one after another?
I tried a big math operation both ways, but they seem to take almost equal time to complete.
This is unclear to me. Can someone help me with this?
Thanks.
Python's threads get a slightly worse rap than they deserve. There are three (well, 2.5) cases where they actually get you benefits:
If non-Python code (e.g. a C library, the kernel, etc.) is running, other Python threads can continue executing. It's only pure Python code that can't run in two threads at once. So if you're doing disk or network I/O, threads can indeed buy you something, as most of the time is spent outside of Python itself.
The GIL is not actually part of Python; it's an implementation detail of CPython (the "reference" implementation that the core Python devs work on, and that you usually get if you just run "python" on your Linux box or something).
Jython, IronPython, and any other reimplementations of Python generally do not have a GIL, and multiple pure-Python threads can execute simultaneously.
The 0.5 case: Even if you're entirely pure-Python and see little or no performance benefit from threading, some problems are really convenient in terms of developer time and difficulty to solve with threads. This depends in part on the developer, too, of course.
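To make the first case concrete, here is a small, self-contained comparison (the URLs are arbitrary examples; on Python 2 the module would be urllib2 rather than urllib.request). The two fetches overlap because the GIL is released while each thread waits on the network:

import threading
import time
import urllib.request

URLS = ['https://www.python.org/', 'https://pypi.org/']

def fetch(url):
    # Blocking network I/O: the GIL is dropped while we wait for bytes.
    urllib.request.urlopen(url).read()

start = time.time()
threads = [threading.Thread(target=fetch, args=(u,)) for u in URLS]
for t in threads:
    t.start()
for t in threads:
    t.join()
print('threaded:   %.2fs' % (time.time() - start))

start = time.time()
for u in URLS:
    fetch(u)
print('sequential: %.2fs' % (time.time() - start))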
It really depends on the library you're using. The GIL is meant to prevent Python objects and the interpreter's internal data structures from being changed at the same time. If you're doing an upload, the library you use to do the actual upload might release the GIL while it's waiting for the actual HTTP request to complete (I would assume that is the case with the HTTP modules in the standard library, but I didn't check).
As a side note, if you really want to have things running in parallel, just use multiple processes. It will save you a lot of trouble and you'll end up with better code (more robust, more scalable, and most probably better structured).
It depends on the native code module that's executing. Native modules can release the GIL and then go off and do their own thing, allowing another thread to lock the GIL. The GIL is normally held while code, both Python and native, is operating on Python objects. If you want more detail you'll probably need to go and read quite a bit about it. :)
See:
What is a global interpreter lock (GIL)? and Thread State and the Global Interpreter Lock
Multithreading is a concept where two or more tasks need to be completed simultaneously. For example, in a word processor several tasks have to run in parallel: listening to the keyboard, formatting the input text, and sending the formatted text to the display unit. With sequential processing this is time-consuming, because one task has to wait for the previous one to finish. So we put these tasks in threads and let them complete their work concurrently: the threads are always up, waiting for input to arrive, and each takes its input and produces its output alongside the others.
So multithreading is genuinely faster if we have multiple cores or processors. On a single processor the threads actually run one after the other, yet it feels as though they execute at greater speed: only one instruction runs at a time, but a processor can execute billions of instructions per second, so the computer creates the illusion that the tasks or threads are working in parallel. It is just an illusion.
On Windows, Python (2)'s standard library routine subprocess.Popen allows you to specify arbitrary flags to CreateProcess, and you can access the process handle for the newly-created process from the object that Popen returns. However, the thread handle for the newly-created process's initial thread is closed by the library before Popen returns.
Now, I need to create a process suspended (CREATE_SUSPENDED in creation flags) so that I can manipulate it (specifically, attach it to a job object) before it has a chance to execute any code. However, that means I need the thread handle in order to release the process from suspension (using ResumeThread). The only way I can find, to recover the thread handle, is to use the "tool help" library to walk over all threads on the entire system (e.g. see this question and answer). This works, but I do not like it. Specifically, I am concerned that taking a snapshot of all the threads on the system every time I need to create a process will be too expensive. (The larger application is a test suite, using processes for isolation; it creates and destroys processes at a rate of tens to hundreds a second.)
So, the question is: is there a more efficient way to resume execution of a process that was suspended by CREATE_SUSPENDED, if all you have is the process handle, and the facilities of the Python 2 standard library (including ctypes, but not the winapi add-on)? Vista-and-higher techniques are acceptable, but XP compatibility is preferred.
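For reference, the setup the question describes looks roughly like this (the command line is a placeholder, and the flag value is spelled out by hand because Python 2's subprocess module does not define CREATE_SUSPENDED):

import subprocess

CREATE_SUSPENDED = 0x00000004          # from winbase.h

proc = subprocess.Popen(['python', 'worker.py'],
                        creationflags=CREATE_SUSPENDED)
process_handle = int(proc._handle)     # process handle: usable with ctypes/Win32 calls
# ...assign the process to a job object here...
# ...but resuming it needs a thread handle, which Popen has already closed.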
I have found a faster approach; unfortunately it relies on an undocumented API, NtResumeProcess. This does exactly what it sounds like - takes a process handle and applies the equivalent of ResumeThread to every thread in the process. Python/ctypes code to use it looks something like
import ctypes
from ctypes.wintypes import HANDLE, LONG, ULONG
ntdll = ctypes.WinDLL("ntdll.dll")
RtlNtStatusToDosError = ntdll.RtlNtStatusToDosError
NtResumeProcess = ntdll.NtResumeProcess
def errcheck_ntstatus(status, *etc):
    if status < 0:
        raise ctypes.WinError(RtlNtStatusToDosError(status))
    return status
RtlNtStatusToDosError.argtypes = (LONG,)
RtlNtStatusToDosError.restype = ULONG
# RtlNtStatusToDosError cannot fail
NtResumeProcess.argtypes = (HANDLE,)
NtResumeProcess.restype = LONG
NtResumeProcess.errcheck = errcheck_ntstatus
def resume_subprocess(proc):
    NtResumeProcess(int(proc._handle))
I measured approximately 20% less process setup overhead using this technique than using Toolhelp, on an otherwise-idle Windows 7 virtual machine. As expected given how Toolhelp works, the performance delta gets bigger the more threads exist on the system -- whether or not they have anything to do with the program in question.
Given the obvious general utility of NtResumeProcess and its counterpart NtSuspendProcess, I am left wondering why they have never been documented and given kernel32 wrappers. They are used by a handful of core system DLLs and EXEs all of which, AFAICT, are part of the Windows Error Reporting mechanism (faultrep.dll, werui.dll, werfault.exe, dwwin.exe, etc) and don't appear to re-expose the functionality under documented names. It seems unlikely that these functions would change their semantics without also changing their names, but a defensively-coded program should probably be prepared for them to disappear (falling back to toolhelp, I suppose).
I'm posting this here, because I found something that addresses this question. I'm looking into this myself and I believe that I've found the solution with this.
I can't give you an excerpt or a summary, because it's just too much and I found it just two hours ago. I'm posting this here for all the others who, like me, seek a way to "easily" spawn a proper child process in windows, but want to execute a cuckoo instead. ;)
The whole second chapter is of importance, but the specifics start at page 12.
http://lsd-pl.net/winasm.pdf
I hope that it helps others as much as it's hopefully going to help me.
Edit:
I guess I can add more to it. From what I've gathered, this document explains how to spawn a sleeping process which never gets executed. This way we have a properly set-up Windows process running. Then it explains that by using the win32api functions VirtualAllocEx and WriteProcessMemory, we can easily allocate executable pages and inject machine code into the other process.
Then - the best part in my opinion - it's possible to change the registers of the process, allowing the programmer to change the instruction pointer to point at the cuckoo!
Amazing!
I'm having some trouble conceptualizing what the big deal is with greenlets. I understand how the ability to switch between running functions in the same process could open the door to a world of possibilities; but I haven't come across any examples of how they solve problems standard Python techniques cannot (other than the nested-functions-in-generators problem--which, honestly..."meh").
Take this example from greenlet's main page that is basically a more complex way of doing this:
def test0():
    print 12
    print 56
    print 34
I know it's just a superfluous example, but that seems to be the long and the short of what greenlets can do. Unless you are that much of a control-freak that you have to be the one who decides when, where, and how every line of code in your application is executed, how is test0 improved by using greenlets? Or take the GUI example (which is what interested me in greenlets in the first place); it shouldn't be hard to ponder a strategy that doesn't require the while loop in process_commands, no?
I've seen some of the cool things that can be done with greenlets, but only in conjunction with some other dark sorcery implemented in another package (e.g., Stackless, gevent, etc.). Even with those, the bare greenlets aren't sufficient; the packages have to subclass them.
My question:
What are some real-world examples of how one can use greenlets, by themselves, to enhance the functionality of Python? I suspect the answer lies in networking--which would probably be why I don't understand. But are there any others?
Note that your example has explicitly woven all the prints together into one function. In a real program, you don't just have two functions; you have some arbitrary number of functions, some of them even from third-party libraries you don't control, and rewriting all that code to interleave all the statements is not quite so simple.
GUIs are actually an excellent example: by letting the event loop (which is the way you handle commands in practice, btw) suspend itself when there are no events to read, your GUI can remain interactive on the same thread. If the event loop had to actually stop and wait for the user to press a key, your GUI would freeze, because nothing would be telling the OS to redraw the window.
Not that I'm a huge fan of gevent in particular; I'm placing my bets on the stdlib asyncio library. :) But it's all the same idea really: when you have some work to do that involves a lot of waiting, let other code run in the meantime.
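For what it's worth, the raw switching primitive on its own looks like this (a sketch assuming the third-party greenlet package is installed; libraries such as gevent build on exactly this, performing the switch for you whenever a socket call would block):

from greenlet import greenlet

def producer():
    for i in range(3):
        print('produced %d' % i)
        consumer_gr.switch(i)          # hand control (and a value) to the consumer

def consumer():
    while True:
        item = producer_gr.switch()    # suspend until the producer switches back
        print('consumed %d' % item)

producer_gr = greenlet(producer)
consumer_gr = greenlet(consumer)
consumer_gr.switch()                   # interleaves produced/consumed for 0, 1, 2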
Essentially any problem where you don't want to block the rest of application while waiting for something to "come back at you" (e.g. sleep, socket). Or in other words, any problem where event-driven development would make things easier.
Networking as you mentioned.
GUI.
Simulations/games where you might have 1000s of Actors and you want them somewhat to act independently.
Gluing synchronous with asynchronous libraries/frameworks.
My script accepts arbitrary-length and -content strings of Python code, then runs them inside exec() statements. If the time to run the arbitrary code passes over some predetermined limit, then the exec() statement needs to exit and a boolean flag needs to be set to indicate that a premature exit has occurred.
How can this be accomplished?
Additional information
These pieces of code will be running in parallel in numerous threads (or at least as parallel as you can get with the GIL).
If there is an alternative method in another language, I am willing to try it out.
I plan on cleaning the code to prevent access to anything that might accidentally damage my system (file and system access, import statements, nested calls to exec() or eval(), etc.).
Options I've considered
Since the exec() statements are running in threads, use a poison pill to kill the thread. Unfortunately, I've read that poison pills do not work for all cases.
Running the exec() statements inside processes, then using process.terminate() to kill everything. But I'm running on Windows and I've read that process creation can be expensive. It also complicates communication with the code that's managing all of this.
Allowing only pre-written functions inside the exec() statements and having those functions periodically check for an exit flag then perform clean-up as necessary. This is complicated, time-consuming, and there are too many corner-cases to consider; I am looking for a simpler solution.
I know this is a bit of an oddball question that deserves a "Why would you ever want to allow arbitrary code to run in an exec() statement?" type of response. I'm trying my hand at a bit of self-evolving code. This is my major stumbling block at the moment: if you allow your code to do almost anything, then it can potentially hang forever. How do you regain control and stop it when it does?
This isn't a very detailed answer, but it's more than I wanted to put into a comment.
You may want to consider something like this other question for creating functions with timeouts, using multiprocessing as a start.
The problem with threads is that you probably can't use your poison-pill approach, because they are not workers consuming many small tasks from a queue. Each one would be sitting there blocking on a single statement, so it would never see the value telling it to exit.
You mentioned that your concern about using processes on Windows is that they are expensive. So what you might do is create your own kind of process pool (a list of processes). They are all pulling from a queue, and you submit new tasks to the queue. If any process exceeds the timeout, you kill it, and replace it in the pool with a new one. That way you limit the overhead of creating new processes only to when they are timing out, instead of creating a new one for every task.
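A stripped-down sketch of the terminate-on-timeout part (one process per task for brevity; jdi's pool variant would keep the worker processes alive between tasks and only replace the ones that get killed). The five-second limit and the sample snippets are arbitrary:

import multiprocessing as mp

TIMEOUT = 5.0                          # seconds; pick whatever limit you need

def run_snippet(code):
    # Child process: just exec the snippet (sandboxing is a separate concern).
    exec(code)

def run_with_timeout(code):
    """Return True if the snippet finished in time, False if it was killed."""
    proc = mp.Process(target=run_snippet, args=(code,))
    proc.start()
    proc.join(TIMEOUT)                 # wait at most TIMEOUT seconds
    if proc.is_alive():                # still running: it timed out
        proc.terminate()
        proc.join()
        return False
    return True

if __name__ == '__main__':
    print(run_with_timeout('x = sum(range(1000))'))   # True
    print(run_with_timeout('while True: pass'))       # False, after ~5 seconds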
There are a few different options here.
First, start with jdi's suggestion of using multiprocessing. It may be that Windows process creation isn't actually expensive enough to break your use case.
If it actually is a problem, what I'd personally do is use Virtual PC, or even User Mode Linux, to just run the same code in another OS, where process creation is cheap. You get a free sandbox out of that, as well.
If you don't want to do that, jdi's suggestion of processes pools is a bit more work, but should work well as long as you don't have to kill processes very often.
If you really do want everything to be threads, you can do so, as long as you can restrict the way the jobs are written. If the jobs can always be cleanly unwound, you can kill them just by raising an exception. Of course they also have to not catch the specific exception you choose to raise. Obviously neither of these conditions is realistic as a general-purpose solution, but for your use case, it may be fine. The key is to make sure your code evolver never inserts any manual resource-management statements (like opening and closing a file); only with statements. (Alternatively, insert the open and close, but inside a try/finally.) And that's probably a good idea even if you're not doing things this way, because spinning off hundreds of processes that, e.g., each leak as many file handles as they can until they either time out or hit the file limit would slow your machine to a crawl.
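The usual trick for raising an exception in another thread is the semi-private CPython call PyThreadState_SetAsyncExc, reachable through ctypes; a hedged sketch is below. Note that the exception is only delivered at the next bytecode boundary, so it cannot break a thread out of a blocking C call, which is exactly why the jobs must be cleanly unwindable as described above.

import ctypes

class JobTimeout(Exception):
    """Raised inside a runaway job thread to make it unwind."""

def async_raise(thread, exc_type=JobTimeout):
    """Ask CPython to raise exc_type inside the given threading.Thread (CPython only)."""
    tid = ctypes.c_ulong(thread.ident)
    res = ctypes.pythonapi.PyThreadState_SetAsyncExc(
        tid, ctypes.py_object(exc_type))
    if res == 0:
        raise ValueError('no thread with ident %r' % thread.ident)
    if res > 1:
        # More than one thread state was touched: undo the damage.
        ctypes.pythonapi.PyThreadState_SetAsyncExc(tid, None)
        raise SystemError('PyThreadState_SetAsyncExc affected %d threads' % res)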
If you can restrict the code generator/evolver even further, you could use some form of cooperative threading (e.g., greenlets), which makes things even nicer.
Finally, you could switch from CPython to a different Python implementation that can run multiple interpreter instances in a single process. I don't know whether Jython or IronPython can do so. PyPy can do that, and also has a restricted-environment sandbox, but unfortunately I think both of those -- and Python 3.x support -- are not-ready-for-prime-time features, which means you either have to get a special build of PyPy (probably without the JIT optimizer), or build it yourself. This might be the best long-term solution, but it's probably not what you want today.
I have a code-base that I'm looking to split up and add to by using threading, however I'm relatively new on how to handle it. Please before reading further respect my wish of NOT just re-writing this code and tossing it back at me with the problem solved. I would much rather work the problem out by someone pointing me in the right direction, than someone solving it FOR me; I don't learn well that way.
The fully functioning code-base is here -- It requires the mechanize and beautifulsoup libraries which can be installed via easy_install.
I've separated out all of my functions, and tried to keep the code as clean as possible (I'm sure there are some optimizations in there that I'll get reamed for, but the main problem is how to thread this).
My ultimate goal is to pack this into a thread, and then share cookies between other initialized browser objects in order to do other things while my original code is running 'backgrounded'.
I've tried thus:
class Recon(threading.Thread):
    def __init__(self):
        threading.Thread.__init__(self)
        # Packed the stuff above my original while loop in here, minus functions.

    def run(self):
        # Packed my code past the while loop in here.

somevar = Recon()
somevar.start()
The problem I'm having is that once I run the program it executes the things in __init__, but afterwards it just sits there and freezes on me. No traceback, no errors; it just doesn't do anything and doesn't even return my command prompt to my control.
Could I just get some tips, or a general flow of how to convert this? I got overwhelmed and deleted the code I was trying with so I don't have that example, but do I need to be prepending 'self.' to all of my variables? Do I need to just define my vars as global?
Here is a reproduction of what I'm having trouble with after having tried to convert the script to use threading.
As long as you have a single thread (as in the above snippet, where you instantiate Recon just once), it shouldn't matter much what you do where; but of course I imagine the reason you're introducing threading is to eventually move to having multiple threads active.
If that's the case, then the first key issue is to ensure that you never have two or more threads simultaneously trying to use the same shared system/resource -- for example, multiple threads writing at the same time to ReconFile, in the case of the code at the pastebin URL you mention.
The classic way to avoid such issues is to use locking, but my favorite way is quite different: make sure any such resource is accessed by only one dedicated thread, and use a Queue.Queue instance (intrinsically thread-safe) to have other threads post work requests to the dedicated thread (so, instead of writing to ReconFile directly, each other thread would make a list of lines to be written contiguously, then .put the list on the queue where the "recon file writing" worker thread is waiting via .get).
When you need to get results back from such actions (not the case here), the requesting thread would place its own personal "queue on which to return results" as part of the "work request packet" it puts on the worker thread's queue. I've presented much more detail about this recommended architecture in the threading chapter of "Python in a Nutshell", 2nd edition (and while, as the book's author, I would of course never recommend you perform an illegal download of a free pirate copy of my book, I can however mention there are plenty of sites offering such pirate copies for download -- the legal way to read my book for free is to sign up for a trial offer to O'Reilly's "Safari" online books website).
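A compact sketch of that dedicated-writer pattern (the file name ReconFile comes from the code under discussion; the message format and the Python 3 module name queue are my own choices, on Python 2 it is Queue):

import threading
try:
    import queue             # Python 3
except ImportError:
    import Queue as queue    # Python 2

write_q = queue.Queue()

def recon_file_writer():
    # The only thread that ever touches ReconFile.
    with open('ReconFile', 'a') as f:
        for lines in iter(write_q.get, None):   # None shuts the writer down
            f.writelines(lines)
            f.flush()

writer = threading.Thread(target=recon_file_writer)
writer.daemon = True
writer.start()

# Any other thread just posts its batch of lines, never touching the file:
write_q.put(['first result\n', 'second result\n'])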
This does not address the specific problem you're observing, since it's happening when you only have one thread around. I notice that thread is trying to perform lots of I/O on standard input and standard output, which is possibly problematic from a thread -- consider doing the input for a thread before you start it (in the main thread) and for needed output use Python's standard logging module, which is guaranteed to be thread-safe. Do you still observe problems then? If that's the case, then the next step is to pepper your code with logging.info calls so that you can pinpoint exactly where it's stalling -- and tell us about that, so we can try to help from there!
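For the logging suggestion, something along these lines is enough to get thread-aware, timestamped trace output that is safe to call from any thread:

import logging

logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s %(threadName)s %(message)s')

logging.info('about to log in')   # use instead of print statements inside the thread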