I'm trying to simulate a network (TCP only) using Python's multiprocessing, multithreading and raw sockets.
What am I doing?
Create a child process which sniffs the network for new connections.
Each child process handles 1000 connections and then terminates itself, after another child process (spawned via the main process, obviously) has taken over the same job.
All the connection information is stored in a dictionary which is process specific.
For each connection, I create a Timer thread which deletes the connection from the dictionary if it has been idle for 5 seconds. (The thread then returns.)
While deleting the connection, I dereference the timer thread and all other dictionary entries for that connection.
After deleting the last element from the dictionary, I call gc.collect() to collect garbage and then os._exit(0) so the child process terminates. (Keep in mind that a sibling process has already taken over.)
Why am I making it so complicated?
Each connection needs its own Timer, as it has to die after 5 seconds of inactivity.
I tried a single process handling all the connections, and it kept eating up memory. (Even though I followed the method above, memory was not being released.) Eventually the machine, which has 4GB of memory, would hang and become unusable. (Keyboard and mouse input became very, very slow.)
So I made each child process handle only 1000 connections and then terminate, which releases the memory. (I've observed that the memory used by Python then stays mostly constant.)
Since I'm terminating the child process, all threads associated with it should be deleted. (I've read that a thread is not removed in Python unless its parent process dies, which is not the case here.)
What is my problem?
I often see the error "Can't start new thread" (multiple times per child process), even though I explicitly terminate each child process with os._exit(). I know there may be a limit on the number of threads that can be created, but I don't think it would be as low as 2000 or 3000. Since I'm terminating the child processes, I expect their threads to be deleted too. (Correct me if I'm wrong.)
Occasionally, the "Can't start new thread" error appears while spawning a new child process. Why would a thread error show up when creating a child process?
Rarely, I've seen the error right at the start of the first child process. (Admittedly, the previous Python instances had been killed only a few seconds earlier.) Since no Python process appears in the (Linux) process list, all threads from the previous instance should have been cleared, yet that is not reflected in this rare case.
I am catching these errors with try/except, but it would be good to know:
Why does this error occur with so few threads?
Which OS and Python parameters does creating a new thread depend on?
How can I avoid this error? (One idea I have is to create a single background daemon thread instead of a Timer thread for each connection. Is there a better solution than that?)
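The single-daemon-thread idea from the last question can be sketched as follows. This is a minimal sketch under stated assumptions, not the author's code: IdleReaper, touch() and the (ip, port) connection key are hypothetical names chosen for illustration. One thread periodically scans a dictionary of last-seen timestamps and drops entries that have been idle longer than the timeout, so the thread count stays at one no matter how many connections are open.

```python
import threading
import time

class IdleReaper:
    """One daemon thread expires idle connections, instead of a Timer each."""

    def __init__(self, idle_timeout=5.0, interval=0.5):
        self.idle_timeout = idle_timeout
        self.interval = interval
        self._last_seen = {}            # hypothetical key, e.g. (ip, port) -> timestamp
        self._lock = threading.Lock()   # guards _last_seen across threads
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def touch(self, key):
        # Call this for every packet seen on the connection.
        with self._lock:
            self._last_seen[key] = time.monotonic()

    def __contains__(self, key):
        with self._lock:
            return key in self._last_seen

    def _run(self):
        while True:
            time.sleep(self.interval)
            cutoff = time.monotonic() - self.idle_timeout
            with self._lock:
                stale = [k for k, t in self._last_seen.items() if t < cutoff]
                for k in stale:
                    del self._last_seen[k]   # connection idle too long: forget it
```

Because the reaper only wakes every `interval` seconds, a connection is dropped somewhere between `idle_timeout` and `idle_timeout + interval` after its last packet, which is usually acceptable for a 5-second idle rule.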
Related
I wrote a data analysis program with python's multiprocessing library for parallelism. As I don't need to control the subprocess in detail, I used the multiprocessing.Pool for simplicity.
However, when running the program, I find that all the sub-processes fall into state S (SLEEPING) after a short period in the active (RUNNING) state.
I investigated the wchan of the processes. The parent process and all but one of the sub-processes are waiting in _futex; the remaining one is waiting in pipe_wait.
Some information about my program:
I used multiprocessing.Pool#map to distribute the tasks.
The sub-process task involves disk IO and high memory usage. During the course of the program, the sub-processes' combined memory use may exceed the memory capacity (32 sub-processes, each taking at most 5% of memory). Disk space is ample.
The arguments and return values of the mapped function are not very large (just the filenames of the files to be processed, to be specific).
I didn't explicitly create any pipe in my code.
This is the code skeleton of my program.
import multiprocessing
# other imports omitted

def subprocess_task(filename):
    # read_the_file, process_the_data, write_the_file and new_filename stand in for the real code
    read_the_file(filename)       # Large disk IO
    process_the_data()            # High memory cost
    write_the_file(new_filename)  # Large disk IO
    return new_filename

if __name__ == "__main__":
    files = ["", "", ...]  # The filenames of the files to process, len(files) == 32.
    p = multiprocessing.Pool(32)  # There are more than 32 cores on the computer.
    res = p.map(subprocess_task, files)
    p.close()
    p.join()
    # Do something with res.
So I want to know why the processes get stuck in this state (especially the one in pipe_wait). Does it have anything to do with the high memory usage, and how do I solve it?
Many thanks!
OK, after some efforts digging into pipe(7), multiprocessing source code and the log of my troublesome program, I finally identified the problem.
The sole child process in pipe_wait looked suspicious, and I wasted hours trying to find the blocking pipe. However, the key problem has nothing to do with pipes.
The cause became clear when I added prints reporting the PID at some checkpoints in my program. The processes are not the same when the tasks are submitted (which I will call the original processes) and when the program gets stuck (the stuck processes). One of the original 32 child processes is missing from the stuck processes, and the only stuck process in pipe_wait was not present when the tasks were submitted.
So I can guess the reason now, and the multiprocessing source code is consistent with my guess.
As I said, the program uses lots of memory. At some point, when the system runs out of memory, the OOM killer kills one of the child processes, selected by its own heuristics. The OOM killer is forcible: the process exits without doing any of its cleanup, including its communication with the multiprocessing.Pool.
As the source code shows, the pool uses one thread to collect task results and another to manage the workers. The result collector thread passively waits for results sent by the child processes, while the worker manager thread actively detects process exits by polling all processes.
Therefore, after the process is killed, the worker manager thread detects it and repopulates the pool by spawning a new process. Since no more tasks are submitted, the new process sits in pipe_wait waiting for a task; that is the sole pipe_wait child process in my problem. Meanwhile, the result collector thread keeps waiting for the result from the killed process, which will never arrive, so the other threads are left sleeping as well.
I don't have root access to the environment; otherwise this could be verified further by checking the OOM killer log.
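On Python 3, one way to turn this silent hang into an error is concurrent.futures.ProcessPoolExecutor, a different tool from the multiprocessing.Pool used above: when a worker dies abruptly, the executor marks itself broken and raises BrokenProcessPool for the pending work. The sketch below simulates the OOM kill with os._exit() inside a hypothetical task() function, and it assumes a POSIX system because it requests the "fork" start method explicitly.

```python
import multiprocessing
import os
from concurrent.futures import ProcessPoolExecutor
from concurrent.futures.process import BrokenProcessPool

def task(n):
    # Hypothetical task. n == 3 simulates a worker killed by the OOM killer:
    # os._exit() skips all cleanup, so no result is ever sent back.
    if n == 3:
        os._exit(1)
    return n * n

def run(inputs):
    """Return the list of results, or None if a worker died mid-task."""
    ctx = multiprocessing.get_context("fork")  # assumption: POSIX system
    try:
        with ProcessPoolExecutor(max_workers=2, mp_context=ctx) as ex:
            return list(ex.map(task, inputs))
    except BrokenProcessPool:
        # Unlike Pool.map, the executor notices the dead worker and fails
        # the pending futures instead of waiting forever for their results.
        return None

if __name__ == "__main__":
    print(run([1, 2, 4]))     # every worker survives
    print(run([1, 2, 3, 4]))  # one worker "killed" mid-task
```

This does not prevent the OOM kill itself; it only makes the failure visible so the caller can retry with fewer workers or less memory per task.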
Each worker runs a long CPU-bound computation. The computation depends on parameters that can change anytime, even while the computation is in progress. Should that happen, the eventual result of the computation will become useless. We do not control the computation code, so we cannot signal it to stop. What can we do?
Nothing: let the worker complete its task and somehow recognize afterwards that the result is incorrect and must be recomputed. That would mean continuing to use a processor for a useless result, possibly for a long time.
Don't use Pool: Create and join the processes as needed. We can then terminate the useless process and create another one. We can even keep bounds on the number of processes existing simultaneously. Unfortunately, we will not be reusing processes.
Find a way to terminate and replace a Pool worker: Is terminating a Pool worker even possible? Will Pool create a replacement for the terminated one? If not, is there an external way of creating a new worker in a pool?
Given the strict "can't change computation code" limitation (which prevents checking for invalidation intermittently), your best option is probably #2.
In this case, the downside you mention for #2 ("Unfortunately, we will not be reusing processes.") isn't a huge deal. Reusing processes is an issue when the work done by a process is small relative to the overhead of launching the process. But it sounds like you're talking about processes that run over the course of seconds or longer; the cost of forking a new process (default on most UNIX-likes) is a trivial fraction of that, and spawning a process (default behavior on MacOS and Windows) is typically still measured in small fractions of a second.
For comparison:
Option #1 is wasteful; if you're anywhere close to using up your cores, and invalidation occurs with any frequency at all, you don't want to leave a core chugging on garbage indefinitely.
Option #3, even if it worked, would work only by coincidence, and might break in a new release of Python, since the behavior of killing workers explicitly is not a documented feature.
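A minimal sketch of option #2 for a single job, under stated assumptions: computation() is a hypothetical stand-in for the uncontrollable CPU-bound code, and RestartableJob, restart() and result() are illustrative names rather than a library API. The usage example assumes a POSIX "fork" context. Each run gets its own Process and its own one-way Pipe, so killing a stale run cannot leave a half-written message in a channel that the next run reads from.

```python
import multiprocessing as mp
import time

def computation(params, conn):
    # Hypothetical stand-in for the long computation we cannot modify.
    time.sleep(params["seconds"])
    conn.send(params["value"] * 2)
    conn.close()

class RestartableJob:
    """Runs one computation per Process; restart() discards a stale run."""

    def __init__(self, ctx, params):
        self._ctx = ctx
        self.proc = None
        self._recv = None
        self.restart(params)

    def restart(self, params):
        # Parameters changed: the running result is useless, so kill that run.
        if self.proc is not None and self.proc.is_alive():
            self.proc.terminate()  # SIGTERM on POSIX
            self.proc.join()
        if self._recv is not None:
            self._recv.close()
        # Fresh one-way pipe per run, so a killed writer cannot corrupt it.
        self._recv, send = self._ctx.Pipe(duplex=False)
        self.proc = self._ctx.Process(target=computation, args=(params, send))
        self.proc.start()
        send.close()  # the parent keeps only the read end

    def result(self, timeout=None):
        if not self._recv.poll(timeout):
            raise TimeoutError("computation still running")
        value = self._recv.recv()
        self.proc.join()
        return value

if __name__ == "__main__":
    job = RestartableJob(mp.get_context("fork"), {"seconds": 30, "value": 1})
    job.restart({"seconds": 0, "value": 21})  # parameters changed mid-run
    print(job.result(timeout=10))
```

The bound on simultaneous processes mentioned in option #2 could be layered on top, for example by refusing to start a new job while a list of live jobs is already as long as the core count; that bookkeeping is left out here.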
When running my code, I start a thread that runs for around 50 seconds and does a lot of background work. If I run this program and close it soon after, that work keeps going in the background for a while because the thread never dies. How can I stop the thread gracefully in the closeEvent method of my MainWindow class? I've tried setting up a method called exit(), creating a signal 'quitOperation' in the thread in question, and then using
myThread.quitOperation.emit()
I expected that this would call my exit() function in my thread because I have this line in my constructor:
self.quitOperation.connect(self.exit)
However, when I use the first line, it fails, saying that 'myThread' has no attribute 'quitOperation'. Why is this? Is there a better way?
I'm not sure about Python, but I assume myThread.quitOperation.emit() emits a signal asking the thread to exit. The point is that while your worker occupies the thread and neither returns nor calls QCoreApplication::processEvents(), myThread never gets the chance to actually process your request (this is called thread starvation).
The correct answer depends on the situation and the nature of the "stuff" your thread is doing. The most common practice is for the main thread to send a signal to the worker thread, where a slot sets a flag. In the blocking process, you check this flag regularly. If it is set, stop whatever "stuff" you are doing, tell your worker thread that it can quit (with a signal, preferably over a queued connection), call deleteLater() on the worker object itself, and return from any functions you are currently in. That lets the thread's event loop run, clean up your worker object and itself, and finally quit.
If your "stuff" is a huge loop of very fast operations, such as simple arithmetic or walking directories one by one, where each step takes only a few milliseconds, this will be enough.
If your "stuff" contains large blocking parts that you have no control over (and thus can't place this flag check in), you may need to wait in the main thread until the worker thread quits.
If you set the flag via a direct connection, or set it directly, be sure to protect read/write access to the flag with a QMutex to prevent inconsistent reads, or use a queued connection to ensure the flag is only accessed from a single thread.
Although it is highly discouraged, you can also use QThread's terminate() method to kill the thread instantly. You should never do this, as it may cause memory leaks, heap corruption, resource leaks and other nasty problems: destructors and cleanup code will not run, and execution can be halted in an undesired state.
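The flag-checking approach described above can be sketched outside Qt with the standard library. This swaps in threading.Thread and threading.Event for QThread and a mutex-protected flag (Event is already thread-safe, so no separate lock is needed), so treat it as an analogue rather than PyQt code; Worker, request_stop() and the short sleep standing in for one step of "stuff" are hypothetical.

```python
import threading
import time

class Worker(threading.Thread):
    """Long-running job that polls a stop flag between short steps."""

    def __init__(self, steps):
        super().__init__()
        self._stop_requested = threading.Event()  # thread-safe flag
        self.steps = steps
        self.completed = 0

    def request_stop(self):
        # Called from the main thread, e.g. from a window's close handler.
        self._stop_requested.set()

    def run(self):
        for _ in range(self.steps):
            if self._stop_requested.is_set():
                return              # leave cleanly; finally/with blocks still run
            time.sleep(0.01)        # stand-in for one short unit of "stuff"
            self.completed += 1

if __name__ == "__main__":
    w = Worker(steps=5000)          # would take about 50 seconds if left alone
    w.start()
    time.sleep(0.1)
    w.request_stop()                # set the flag...
    w.join()                        # ...then wait for the thread to finish
    print("stopped after", w.completed, "steps")
```

In a PyQt closeEvent the equivalent would be to set the flag and then wait for the thread with QThread.wait() before accepting the event. Qt 5.2 and later also provide QThread.requestInterruption() and isInterruptionRequested(), which implement this kind of flag for you.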
I have a multithreaded PyQt application that is leaking memory. All the functions that leak memory are worker threads, and I'm wondering if there's something fundamentally wrong with my approach.
When the main application starts, the various worker thread instances are created from the thread classes, but they are not initially started.
When functions run that require a worker thread, the thread is initialized (data and parameters are passed from the main function, and variables are reset within the worker instance), and then the thread is started. The worker thread does its business, then completes, but is never formally deleted.
If the function is called again, then again the thread instance is initialized, started, runs, stops, etc...
Because the threads can be run again and again, I never saw the need to formally delete them. I originally figured that the same variables were just being reused, but now I'm wondering if I was mistaken.
Does this sound like the cause of my memory leak? Should I be deleting the threads when they complete even if they're going to be called again?
If this is the root of my problem, can someone point me to a code example of how to handle the thread deleting process properly? (If it matters, I'm using PyQt 4.11.3, Qt 4.8.6, and Python 3.4.3)
In a multi-threaded Python process I have a number of non-daemon threads, by which I mean threads which keep the main process alive even after the main thread has exited / stopped.
My non-daemon threads hold weak references to certain objects in the main thread, but when the main thread ends (control falls off the bottom of the file) these objects do not appear to be garbage collected, and my weak reference finaliser callbacks don't fire.
Am I wrong to expect the main thread to be garbage collected? I would have expected that the thread-locals would be deallocated (i.e. garbage collected)...
What have I missed?
Supporting materials
Output from pprint.pprint(threading.enumerate()), showing that the main thread has stopped while the others soldier on.
[<_MainThread(MainThread, stopped 139664516818688)>,
<LDQServer(testLogIOWorkerThread, started 139664479889152)>,
<_Timer(Thread-18, started 139663928870656)>,
<LDQServer(debugLogIOWorkerThread, started 139664437925632)>,
<_Timer(Thread-17, started 139664463103744)>,
<_Timer(Thread-19, started 139663937263360)>,
<LDQServer(testLogIOWorkerThread, started 139664471496448)>,
<LDQServer(debugLogIOWorkerThread, started 139664446318336)>]
And since someone always asks about the use-case...
My network service occasionally misses its real-time deadlines (which causes a total system failure in the worst case). This turned out to be because logging of (important) DEBUG data would block whenever the file-system has a tantrum. So I am attempting to retrofit a number of established specialised logging libraries to defer blocking I/O to a worker thread.
Sadly the established usage pattern is a mix of short-lived logging channels which log overlapping parallel transactions, and long-lived module-scope channels which are never explicitly closed.
So I created a decorator which defers method calls to a worker thread. The worker thread is non-daemon to ensure that all (slow) blocking I/O completes before the interpreter exits, and holds a weak reference to the client-side (where method calls get enqueued). When the client-side is garbage collected the weak reference's callback fires and the worker thread knows no more work will be enqueued, and so will exit at its next convenience.
This seems to work fine in all but one important use-case: when the logging channel is in the main thread. When the main thread stops / exits the logging channel is not finalised, and so my (non-daemon) worker thread lives on keeping the entire process alive.
It's a bad idea for your main thread to end without calling join on all non-daemon threads, or to make any assumptions about what happens if you don't.
If you don't do anything very unusual, CPython (at least 2.0-3.3) will cover for you by automatically calling join on all non-daemon threads as part of _MainThread._exitfunc. This isn't actually documented, so you shouldn't rely on it, but it's what's happening to you.
Your main thread hasn't actually exited at all; it's blocking inside its _MainThread._exitfunc trying to join some arbitrary non-daemon thread. Its objects won't be finalized until the atexit handler is called, which doesn't happen until after it finishes joining all non-daemon threads.
Meanwhile, if you avoid this (e.g., by using thread/_thread directly, or by detaching the main thread from its object or forcing it into a normal Thread instance), what happens? It isn't defined. The threading module makes no reference to it at all, but in CPython 2.0-3.3, and likely in any other reasonable implementation, it falls to the thread/_thread module to decide. And, as the docs say:
When the main thread exits, it is system defined whether the other threads survive. On SGI IRIX using the native thread implementation, they survive. On most other systems, they are killed without executing try ... finally clauses or executing object destructors.
So, if you manage to avoid joining all of your non-daemon threads, you have to write code that can handle both having them hard-killed like daemon threads, and having them continue running until exit.
If they do continue running, at least in CPython 2.7 and 3.3 on POSIX systems, the main thread's OS-level thread handle, and various higher-level Python objects representing it, may still be retained and not get cleaned up by the GC.
On top of that, even if everything were released, you can't rely on the GC ever deleting anything. If your code depends on deterministic GC, there are many cases you can get away with it in CPython (although your code will then break in PyPy, Jython, IronPython, etc.), but at exit time is not one of them. CPython can, and will, leak objects at exit time and let the OS sort 'em out. (This is why writable files that you never close may lose the last few writes—the __del__ method never gets called, and therefore there's nobody to tell them to flush, and at least on POSIX the underlying FILE* doesn't automatically flush either.)
If you want something to be cleaned up when the main thread finishes, you have to use some kind of close function rather than relying on __del__, and you have to make sure it gets triggered via a with block around the main block of code, an atexit function, or some other mechanism.
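A minimal sketch of the explicit-close approach, applied to the asker's non-daemon worker thread: DeferredLogger, log() and close() are hypothetical names, and appending to a list stands in for the slow blocking I/O. The worker exits when close() enqueues a sentinel, not when the client happens to be garbage collected, and the with block guarantees that close() runs when the main block finishes or raises.

```python
import queue
import threading

_SENTINEL = object()  # marks "no more work will be enqueued"

class DeferredLogger:
    """Hypothetical client side: enqueues writes for a non-daemon worker thread."""

    def __init__(self, sink):
        self._q = queue.Queue()
        self._worker = threading.Thread(target=self._drain, args=(sink,))
        self._worker.start()          # non-daemon: the process waits for it

    def _drain(self, sink):
        while True:
            item = self._q.get()
            if item is _SENTINEL:     # explicit close, not GC, ends the thread
                return
            sink.append(item)         # stand-in for the slow blocking I/O

    def log(self, msg):
        self._q.put(msg)

    def close(self):
        self._q.put(_SENTINEL)        # queued after all earlier messages
        self._worker.join()           # wait until everything is written

    def __enter__(self):
        return self

    def __exit__(self, *exc):
        self.close()
        return False                  # don't swallow exceptions

if __name__ == "__main__":
    written = []
    with DeferredLogger(written) as log:
        log.log("starting")
        log.log("done")
    print(written)
```

A with block (or try/finally) around the main code is the safer trigger here: as described above, CPython only reaches its atexit handlers after it has finished joining non-daemon threads, so a close() registered with atexit would never get the chance to stop this worker.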
One last thing:
I would have expected that the thread-locals would be deallocated (i.e. garbage collected)...
Do you actually have thread locals somewhere? Or do you just mean locals and/or globals that are only accessed in one thread?