Hello Stack Overflow: sometime reader, first-time poster.
Background:
Windows box running XP SP3, soon to be upgraded to Windows Seven (MSDNAA <3)
I have an injected DLL which gets cycles by hooking a function that is called thousands of times a second.
I would like to communicate with/control this DLL via a Python app. Basically, the DLL does the work, and the Python app supplies the brains/decision making.
My game plan for doing this is to have a counter and an if statement in the DLL. Each time the hooked function is called, the counter is incremented and we jump straight back to the original function, until something like if ( counter == 250 ) { dostuff(); } fires. My thinking is that this will allow the target app to run mostly unimpeded, but will still let me do interesting things.
Problem:
I'm at an utter loss on which IPC method I should use to do the communication. We have sockets, shared memory, pipes, filemapping(?), RPC, and other (seemingly) esoteric stuff like writing to the clipboard.
I've NEVER implemented any kind of IPC beyond toy examples.
I'm fairly sure I need something that:
Can handle talking back and forth between python and a DLL
Doesn't block/wait
Can check for waiting data, and continue if there isn't any
If locks are involved, can continue instead of waiting
Doesn't cost much time to read from/write to
Help? Thank you for your time, I hope I've provided enough general information and not broken any accepted conventions.
I would like to add that the related questions box is very cool, and I did peruse it before posting.
Try sockets. Your requirements essentially call for asynchronous operation; Python has the asyncore module for asynchronous IO on sockets. At the same time, it doesn't look like Python's stdlib can handle the other IPC mechanisms asynchronously, so I wouldn't recommend using them.
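If you go the socket route, the non-blocking "check for waiting data and carry on if there is none" part is easy to sketch on the Python side with select and a zero timeout. This is just a sketch; the host, port, and message framing below are placeholders, not anything the DLL actually provides:

import select
import socket

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.connect(("127.0.0.1", 5555))   # wherever the DLL listens (placeholder)
sock.setblocking(False)

def poll_dll():
    # select() with a timeout of 0 returns immediately, so this never blocks
    readable, _, _ = select.select([sock], [], [], 0)
    if not readable:
        return None                 # nothing waiting; carry on
    return sock.recv(4096)          # message framing is up to you

def send_command(cmd):
    sock.send(cmd)                  # may raise if the send buffer is full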
If you don't care about realtime, then you can use the file system for communication: a log file for the DLL's output, and a config file that is read every now and then to change the DLL's behavior.
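A rough sketch of the Python side of that file-based arrangement (file names are placeholders; the DLL would append to the log and re-read the config on its own schedule):

import os

CONFIG = 'dll_config.txt'   # placeholder; the DLL re-reads this every now and then
LOG = 'dll_output.log'      # placeholder; the DLL appends its output here

def set_dll_option(line):
    f = open(CONFIG, 'w')
    f.write(line + '\n')
    f.close()

def read_new_output(offset):
    # return anything written to the log since the last read, plus the new offset
    if not os.path.exists(LOG):
        return '', offset
    f = open(LOG, 'r')
    f.seek(offset)
    data = f.read()
    offset = f.tell()
    f.close()
    return data, offset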
Related
I'm having a difficult time understanding asynchronous IO, so I hope to clear up some of my misunderstanding, because the word "asynchronous" seems to get thrown around a lot. If it matters, my goal is to get into Twisted Python, but I want a general understanding of the underlying concepts.
What exactly is asynchronous programming? Is it programming with a language and OS that support Asynchronous IO? Or is it something more general? In other words, is asynchronous IO a separate concept from asynchronous programming?
Asynchronous IO means the application isn't blocked while your computer is waiting for something; "waiting" here means not processing. Waiting for a web server? Waiting for a network connection? Waiting for a hard drive to return data from a platter? All of this is IO.
Normally, you write this in a very simple fashion synchronously:
let file = fs.readFileSync('file');
console.log(`got file ${file}`);
This will block, and nothing will happen until readFileSync returns with what you asked for. Alternatively, you can do this asynchronously, which won't block. The work is handled very differently under the hood: it may be using interrupts, it may be polling handles with select statements, and it typically uses a different binding to a low-level library such as libc. That's all you need to know to get your feet wet. Here is what it looks like to us:
fs.readFile(
    'file',
    function (err, file) { console.log(`got file ${file}`); }
);
In this you're providing a "callback". fs.readFile requests the file immediately and, when it gets the file back (or an error), it calls your callback -- here, a function whose arguments are the error (if any) and the file contents.
There are difficulties writing things asynchronously:
Creates pyramid code if using callbacks.
Errors can be harder to pinpoint.
Garbage collection isn't always as clean.
Performance overhead, and memory overhead.
Can create hard to debug situations if mixed with synchronous code.
All of that is the art of asynchronous programming.
On Windows, Python (2)'s standard library routine subprocess.Popen allows you to specify arbitrary flags to CreateProcess, and you can access the process handle for the newly-created process from the object that Popen returns. However, the thread handle for the newly-created process's initial thread is closed by the library before Popen returns.
Now, I need to create a process suspended (CREATE_SUSPENDED in the creation flags) so that I can manipulate it (specifically, attach it to a job object) before it has a chance to execute any code. However, that means I need the thread handle in order to release the process from suspension (using ResumeThread). The only way I can find to recover the thread handle is to use the "tool help" library to walk over all threads on the entire system (e.g. see this question and answer). This works, but I do not like it. Specifically, I am concerned that taking a snapshot of all the threads on the system every time I need to create a process will be too expensive. (The larger application is a test suite, using processes for isolation; it creates and destroys processes at a rate of tens to hundreds a second.)
So, the question is: is there a more efficient way to resume execution of a process that was suspended by CREATE_SUSPENDED, if all you have is the process handle, and the facilities of the Python 2 standard library (including ctypes, but not the winapi add-on)? Vista-and-higher techniques are acceptable, but XP compatibility is preferred.
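For context, creating the suspended process itself looks roughly like this with the Python 2 stdlib. This is only a sketch: CREATE_SUSPENDED isn't exposed by the subprocess module, so the constant is spelled out by hand, "child.exe" is a placeholder, and _handle is a private attribute of the Popen object.

import subprocess

CREATE_SUSPENDED = 0x00000004       # winbase.h value; not defined in subprocess

proc = subprocess.Popen(["child.exe"], creationflags=CREATE_SUSPENDED)
# proc._handle is the process handle (private, but the only one the stdlib keeps);
# the initial thread's handle has already been closed by the library.
# ...attach the process to a job object here, then resume it somehow...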
I have found a faster approach; unfortunately it relies on an undocumented API, NtResumeProcess. This does exactly what it sounds like - takes a process handle and applies the equivalent of ResumeThread to every thread in the process. Python/ctypes code to use it looks something like
import ctypes
from ctypes.wintypes import HANDLE, LONG, ULONG

ntdll = ctypes.WinDLL("ntdll.dll")
RtlNtStatusToDosError = ntdll.RtlNtStatusToDosError
NtResumeProcess = ntdll.NtResumeProcess

def errcheck_ntstatus(status, *etc):
    if status < 0:
        raise ctypes.WinError(RtlNtStatusToDosError(status))
    return status

RtlNtStatusToDosError.argtypes = (LONG,)
RtlNtStatusToDosError.restype = ULONG
# RtlNtStatusToDosError cannot fail

NtResumeProcess.argtypes = (HANDLE,)
NtResumeProcess.restype = LONG
NtResumeProcess.errcheck = errcheck_ntstatus

def resume_subprocess(proc):
    NtResumeProcess(int(proc._handle))
I measured approximately 20% less process setup overhead using this technique than using Toolhelp, on an otherwise-idle Windows 7 virtual machine. As expected given how Toolhelp works, the performance delta gets bigger the more threads exist on the system -- whether or not they have anything to do with the program in question.
Given the obvious general utility of NtResumeProcess and its counterpart NtSuspendProcess, I am left wondering why they have never been documented and given kernel32 wrappers. They are used by a handful of core system DLLs and EXEs all of which, AFAICT, are part of the Windows Error Reporting mechanism (faultrep.dll, werui.dll, werfault.exe, dwwin.exe, etc) and don't appear to re-expose the functionality under documented names. It seems unlikely that these functions would change their semantics without also changing their names, but a defensively-coded program should probably be prepared for them to disappear (falling back to toolhelp, I suppose).
I'm posting this here because I found something that addresses this question. I'm looking into this myself, and I believe I've found the solution in it.
I can't give you an excerpt or a summary, because it's just too much and I found it only two hours ago. I'm posting this here for all the others who, like me, seek a way to "easily" spawn a proper child process in Windows, but want to execute a cuckoo instead. ;)
The whole second chapter is of importance, but the specifics start at page 12.
http://lsd-pl.net/winasm.pdf
I hope that it helps others as much as it's hopefully going to help me.
Edit:
I guess I can add more to it. From what I've gathered, this document explains how to spawn a sleeping process which never gets executed, so that we end up with a properly set-up Windows process running. Then it explains that by using the Win32 API functions VirtualAllocEx and WriteProcessMemory, we can easily allocate executable pages and inject machine code into the other process.
Then - the best part in my opinion - it's possible to change the registers of the process, allowing the programmer to change the instruction pointer to point at the cuckoo!
Amazing!
Clarification: As per some of the comments, I should clarify that this is intended as a simple framework to allow execution of programs that are naturally parallel (so-called embarrassingly parallel programs). It isn't, and never will be, a solution for tasks which require communication or synchronisation between processes.
I've been looking for a simple process-based parallel programming environment in Python that can execute a function on multiple CPUs on a cluster, with the major criterion being that it needs to be able to execute unmodified Python code. The closest I found was Parallel Python, but pp does some pretty funky things, which can cause the code to not be executed in the correct context (with the appropriate modules imported etc).
I finally got tired of searching, so I decided to write my own. What I came up with is actually quite simple. The problem is, I'm not sure if what I've come up with is simple because I've failed to think of a lot of things. Here's what my program does:
I have a job server which hands out jobs to nodes in the cluster.
The jobs are handed out to servers listening on nodes by passing a dictionary that looks like this:
{
    'moduleName': 'some_module',
    'funcName': 'someFunction',
    'localVars': {'someVar': someVal, ...},
    'globalVars': {'someOtherVar': someOtherVal, ...},
    'modulePath': '/a/path/to/a/directory',
    'customPathHasPriority': aBoolean,
    'args': (arg1, arg2, ...),
    'kwargs': {'kw1': val1, 'kw2': val2, ...}
}
moduleName and funcName are mandatory, and the others are optional.
A node server takes this dictionary and does:
sys.path.append(modulePath)
globals()[moduleName]=__import__(moduleName, localVars, globalVars)
returnVal = globals()[moduleName].__dict__[funcName](*args, **kwargs)
On getting the return value, the server then sends it back to the job server which puts it into a thread-safe queue.
When the last job returns, the job server writes the output to a file and quits.
I'm sure there are niggles that need to be worked out, but is there anything obviously wrong with this approach? At first glance, it seems robust, requiring only that the nodes have access to the filesystem(s) containing the .py file and the dependencies. Using __import__ has the advantage that the code in the module is automatically run, and so the function should execute in the correct context.
Any suggestions or criticism would be greatly appreciated.
EDIT: I should mention that I've got the code-execution bit working, but the server and job server have yet to be written.
I have actually written something that probably satisfies your needs: jug. If it does not solve your problems, I promise you I'll fix any bugs you find.
The architecture is slightly different: workers all run the same code, but they effectively generate a similar dictionary and ask the central backend "has this been run?". If not, they run it (there is a locking mechanism too). The backend can simply be the filesystem if you are on an NFS system.
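Roughly, the usage side is something like the following sketch (simplified; see the jug docs for the real details, as the exact API may differ a bit from what's shown here):

from jug import TaskGenerator

@TaskGenerator
def process(filename):
    # the actual work for one input
    return len(open(filename).read())

# Building this list only creates Task objects.  Running
#     jug execute thisfile.py
# on several machines sharing the backend (e.g. an NFS directory)
# makes each worker claim and run whichever tasks haven't run yet.
results = [process(f) for f in ['a.txt', 'b.txt', 'c.txt']]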
I myself have been tinkering with batch image manipulation across my computers, and my biggest problem was the fact that some things don't easily or natively pickle and transmit across the network.
For example: pygame's surfaces don't pickle. These I have to convert to strings by saving them into StringIO objects and then sending that across the network.
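Roughly what that workaround looks like (a sketch; this uses pygame.image.tostring/fromstring rather than going through a StringIO save, but the idea is the same):

import pygame

def surface_to_payload(surf):
    # a surface won't pickle, but its raw pixels plus its size will
    return (pygame.image.tostring(surf, 'RGBA'), surf.get_size())

def payload_to_surface(payload):
    data, size = payload
    return pygame.image.fromstring(data, size, 'RGBA')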
If the data you are transmitting (e.g. your arguments) can be transmitted without fear, you should not have that many problems with network data.
Another thing comes to mind: what do you plan to do if a computer suddenly "disappears" while doing a task? While returning the data? Do you have a plan for re-sending tasks?
I have a code-base that I'm looking to split up and add to by using threading; however, I'm relatively new to it. Please, before reading further, respect my wish of NOT just re-writing this code and tossing it back at me with the problem solved; I would much rather work the problem out by someone pointing me in the right direction than by someone solving it FOR me. I don't learn well that way.
The fully functioning code-base is here -- It requires the mechanize and beautifulsoup libraries which can be installed via easy_install.
I've separated out all of my functions and tried to keep the code as clean as possible (I'm sure there are some optimizations in there that I'll get reamed for), but the main problem is how to thread this.
My ultimate goal is to pack this into a thread, and then share cookies between other initialized browser objects in order to do other things while my original code is running 'backgrounded'.
I've tried thus:
class Recon(threading.Thread):
    def __init__(self):
        threading.Thread.__init__(self)
        # Packed the stuff above my original while loop in here, minus functions.

    def run(self):
        # Packed my code past the while loop in here.

somevar = Recon()
somevar.start()
The problem I'm having is that once I run the program, it runs the things in __init__, but afterwards it just sits there and freezes on me. No traceback, no errors; it just doesn't do anything, and doesn't even return my command prompt to my control.
Could I just get some tips, or a general flow of how to convert this? I got overwhelmed and deleted the code I was trying with so I don't have that example, but do I need to be prepending 'self.' to all of my variables? Do I need to just define my vars as global?
Here is a reproduction of what I'm having trouble with after having tried to convert the script to use threading.
As long as you have a single thread (as in the above snippet, where you instantiate Recon just once), it shouldn't matter much what you do where; but of course I imagine the reason you're introducing threading is to eventually move to having multiple threads active.
If that's the case, then the first key issue is to ensure that you never have two or more threads simultaneously trying to use the same shared system/resource -- for example, multiple threads writing at the same time to ReconFile, in the case of the code at the pastebin URL you mention.
The classic way to avoid such issues is to use locking, but my favorite way is quite different: make sure any such resource is accessed by only one dedicated thread, and use a Queue.Queue instance (intrinsically thread-safe) to have the other threads post work requests to that dedicated thread. So, instead of writing to ReconFile directly, each other thread would build a list of lines to be written contiguously, then .put the list on the queue where the "recon file writing" worker thread is waiting via .get, as in the sketch below.
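A minimal sketch of that arrangement, with 'ReconFile' standing in for whatever shared resource your code actually writes to:

import Queue
import threading

write_q = Queue.Queue()

def recon_writer():
    # the only thread that ever touches ReconFile
    out = open('ReconFile', 'a')
    while True:
        lines = write_q.get()       # blocks until another thread .put()s work
        if lines is None:           # sentinel meaning "shut down"
            break
        out.writelines(lines)
        out.flush()
    out.close()

writer = threading.Thread(target=recon_writer)
writer.setDaemon(True)
writer.start()

# any other thread, instead of writing to ReconFile itself:
write_q.put(['first line\n', 'second line\n'])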
When you need to get results back from such actions (not the case here), the requesting thread would place its own personal "queue on which to return results" as part of the "work request packet" it puts on the worker thread's queue. I've presented much more detail about this recommended architecture in the threading chapter of "Python in a Nutshell", 2nd edition (and while, as the book's author, I would of course never recommend you download an illegal pirate copy of my book, I can mention that there are plenty of sites offering such copies -- the legal way to read my book for free is to sign up for a trial offer to O'Reilly's "Safari" online books website).
This does not address the specific problem you're observing, since it's happening when you only have one thread around. I notice that thread is trying to perform lots of I/O on standard input and standard output, which is possibly problematic from a thread -- consider doing the input for a thread before you start it (in the main thread) and for needed output use Python's standard logging module, which is guaranteed to be thread-safe. Do you still observe problems then? If that's the case, then the next step is to pepper your code with logging.info calls so that you can pinpoint exactly where it's stalling -- and tell us about that, so we can try to help from there!
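Concretely, the logging part can be as simple as this (a sketch): timestamped, thread-labelled breadcrumbs with none of print's interleaving worries.

import logging

logging.basicConfig(level=logging.INFO,
                    format='%(asctime)s %(threadName)s %(message)s')

# then, sprinkled through the thread's code in place of print statements:
logging.info('reached the login step')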
I am writing a Python application in the field of scientific computing. Currently, when the user works with the GUI and starts a new physics simulation, the interpreter immediately imports several necessary modules for this simulation, such as Traits and Mayavi. These modules are heavy and take too long to import, and the user has to wait ~10 seconds before he can continue, which is bad.
I thought of something that might remedy this. I'll describe it and perhaps someone else has already implemented it, if so please give me a link. If not I might do it myself.
What I want is a separate thread that will import modules asynchronously. It will probably be a subclass of threading.Thread.
Here's a usage example:
importer_thread = ImporterThread()
importer_thread.start()
# ...
importer_thread.import('Mayavi')
importer_thread.import('Traits')
# A thread-safe method that will put the module name
# into a queue, which the thread drains in an infinite loop
# ...
# When the user actually needs the modules:
import Mayavi, Traits
# If they were already loaded by importer_thread, we're good.
# If not, we'll just have to wait as usual.
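For what it's worth, here is a rough sketch of the design I have in mind (note that import can't literally be a method name, since it's a keyword, so the sketch calls it add instead; once the thread has imported a module it sits in sys.modules, so the later plain import statement returns immediately):

import threading
import Queue

class ImporterThread(threading.Thread):
    def __init__(self):
        threading.Thread.__init__(self)
        self.setDaemon(True)
        self._names = Queue.Queue()

    def add(self, name):
        self._names.put(name)

    def run(self):
        while True:
            name = self._names.get()    # wait for the next requested module
            try:
                __import__(name)        # caches the module in sys.modules
            except ImportError:
                pass                    # the eventual real import will re-raise

importer_thread = ImporterThread()
importer_thread.start()
importer_thread.add('Mayavi')
importer_thread.add('Traits')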
So do you know of anything like this? If not, do you have any suggestions about the design?
The problem with this is that the imports must still complete before they are usable. Depending on when they're first used, the application could still have to block for 10 seconds before it could start up anyway. Much more productive would be to profile the modules and figure out why they take so long to import.
Why not just do this when the app starts?
import threading

def background_imports():
    import Traits
    import Mayavi

thread = threading.Thread(target=background_imports)
thread.setDaemon(True)
thread.start()
The general idea is good, but the Python/GUI session might not be all that responsive while the background thread is importing away; unfortunately, import inherently and inevitably "locks up" Python substantially (it's not just the GIL, there's specific extra locking for imports).
Still worth trying, as it might make things a bit better -- it's also very easy, since Queues are intrinsically thread-safe and, besides a Queue's put and get, all you need is basically an __import__. Still, don't be surprised if this doesn't help enough and you still need extra oomph.
If you have some drive that's intrinsically very fast, but with limited space, such as a "RAM drive" or a particularly snippy solid-state one, it may be worth keeping the needed packages in a .tar.bz2 (or other form of archive) and unpacking it onto the fast drive at program start (that's essentially just I/O and so it won't lock things up badly -- I/O operations rapidly release the GIL -- and also it's especially easy to delegate to a subprocess running tar xjf or the like).
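That part is only a couple of lines along these lines (a sketch; the archive and drive paths are placeholders):

import subprocess
import sys

# unpack the pre-built bundle onto the fast drive at program start
subprocess.call(['tar', 'xjf', r'D:\bundles\deps.tar.bz2', '-C', r'R:\fastlibs'])
sys.path.insert(0, r'R:\fastlibs')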
If some of the import slowness is due to a huge number of .py/.pyc/.pyo files, it's worth a try to keep those (in .pyc form only, not as .py) in a zipfile and importing from there (but that only helps with the I/O overhead, depending on your OS, filesystem, and drive: doesn't help with delays due to loading huge DLLs or executing initialization code in packages at load time, which I suspect are likelier culprits for the slowness).
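Mechanically, that's just a matter of putting the zipfile on sys.path (path and package name below are placeholders):

import sys

sys.path.insert(0, r'R:\fastlibs\compiled_modules.zip')   # a zip of .pyc files
import some_heavy_package   # now loaded straight out of the zipfile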
You could also consider splitting the application up with multiprocessing -- again using Queues (but of the multiprocessing kind) to communicate -- so that both imports and some heavy computations are delegated to a few auxiliary processes and thus made asynchronous (this may also help fully exploiting multiple cores at once). I suspect this may unfortunately be hard to arrange properly for visualization tasks (such as those you're presumably doing with mayavi) but it might help if you also have some "pure heavy computation" packages and tasks.
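A bare-bones sketch of that shape (the squaring stands in for the real heavy computation, and the commented-out imports for the heavy modules):

import multiprocessing

def worker(task_q, result_q):
    # the slow imports happen here, inside the child process,
    # so they never stall the GUI process:
    # import Mayavi, Traits
    while True:
        job = task_q.get()
        if job is None:             # sentinel: shut the worker down
            break
        result_q.put(job * job)     # stand-in for the real heavy computation

if __name__ == '__main__':
    task_q = multiprocessing.Queue()
    result_q = multiprocessing.Queue()
    p = multiprocessing.Process(target=worker, args=(task_q, result_q))
    p.start()
    task_q.put(21)
    print result_q.get()            # 441
    task_q.put(None)
    p.join()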
"the user works with the GUI and starts a new physics simulation"
Not really clear. Does "works with the GUI" mean double click? Double click what? Some wxWidgets GUI application? Or IDLE?
If so, what does "starts a new physics simulation" mean? Click a button somewhere else? A GUI button to bring up a panel where they write code? Or do they import a script they wrote off line?
Why is the import happening before the simulation starts? How long does a simulation take? What does the GUI show?
I suspect that there's a way to be much, much lazier in doing the big imports. But from the description, it's hard to determine if there's a point in time where the import doesn't matter as much to the user.
Threads don't help much. What helps is rethinking the UI experience.