Different thread limit between Python 3.4 and 3.5?

I have a python program that opens threads in a loop, each thread runs a method that sends an http packet.
The program is supposed to emulate heavy traffic on a certain server we're working on.
The thread creation code looks something like this:
while True:
    try:
        num_of_connections += 1
        thread_obj = HTTP_Tester.Threaded_Test(ip)
        thread_obj.start()
    except RuntimeError as e:
        print("Runtime Error!")
So again, the thread_obj is running a method which sends HTTP requests to the IP it is given, nothing fancy.
When running this code under Python 3.5, I am able to open around 880 threads before the RuntimeError "can't start new thread" is thrown.
When running this code under Python 3.4, however, the number of threads keeps growing and growing - I got up to 2000+ threads until the machine it was running on became unresponsive.
I check the number of threads that are open by looking at the num_of_connections counter, and I also use TCPView to verify that the number of sockets is actually growing. Under Python 3.4, TCPView shows 2000+ sockets open for the program, so I deduce that there are in fact 2000+ threads open.
I googled around and saw people suggest that threading.stack_size is the culprit - not here: I changed the stack size, but the number of possible threads doesn't change either way.
The question is: how come the limit is so low under 3.5, whereas under 3.4 it is (presumably) much higher? Also, can I change the limit? I would prefer to use 3.5, but I want to open as many threads as I can.
Thank you!
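For anyone who wants to reproduce the numbers, here is a minimal probe (independent of the HTTP_Tester code above, and only a sketch) that counts how many threads a given interpreter can start before the RuntimeError is raised:
import threading
import time

def idle():
    time.sleep(60)  # keep each thread alive so it keeps holding its stack

# threading.stack_size(256 * 1024)  # smaller per-thread stack; the asker reports this made no difference

count = 0
try:
    while True:
        t = threading.Thread(target=idle)
        t.daemon = True
        t.start()
        count += 1
except RuntimeError as e:
    print("Hit the limit after", count, "threads:", e)
Running the same probe under both interpreters separates the thread limit itself from anything the HTTP code might be doing.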

Related

os.system and subprocess.run make my multi-threaded process freeze until the call ends

I am new to Python and having some problems.
I wrote an update_manager class that can communicate with the user via TCP and perform installations of different components.
My update_manager class uses 2 other classes (they are its members) to accomplish this. The first is used for TCP communication and the second for the actual installation. The installation class runs on the main thread, and the communication runs on a separate thread created with threading.Thread().
My main looks like this:
if __name__ == "__main__":
    new_update = UpdateManager()
    #time.sleep(10)
    new_update.run()
and the run function is:
def run(self):
    comm_thread = threading.Thread(
        target=self._comm_agent.start_server_tcp_comunication)
    comm_thread.start()
    while True:
        if (False == self.is_recovery_required()):
            self.calculate_free_storage_for_update_zip_extraction()
            self.start_communication_with_client_in_state_machine()
            self._comm_agent.disable_synchronized_communication()
            self.start_update_install()
            self._comm_agent.enable_synchronized_communication()
            if (True == self.is_dry_run_requested()):
                self.preform_cleanup_after_dry_run()
        else:
            self.reset_valid_states()
            self.preform_clean_up_after_update_cycle()
I use 2 multiprocessing.Queue() objects to sync between the threads and with the user: one for incoming messages and one for outgoing messages.
At first the TCP communication is synchronous; the user provides the installation file and a few other things.
Once installation begins, the TCP communication is no longer synchronous.
During the installation I use 4 different install methods, and all but one work just fine (the user can poll the update_manager process, ask progress questions and get an immediate reply).
The problematic one is the installation of rpm files. For this I tried calling os.system() and subprocess.run(), and it works, but for big rpm files I noticed that the entire process, including my threads, freezes until the call finishes (I can see the progress bar of the rpm installation on my screen during this freeze).
What I noticed and tried:
1. There is no freeze during the other installation methods, which use Python libraries.
2. Once the user connects via TCP there are only 2 threads for the update_manager; once the first request is sent and a reply is sent back, 2 more threads appear (I assume it has something to do with the queues I use).
3. I created a third thread that prints the time (and has nothing to do with the queues), and I start it as soon as the update_manager process starts. When the 2 threads freeze, this one keeps going.
4. On some rare occasions the process will unfreeze just long enough for a message to go through from the client to the update_manager, then freeze again.
Edit: I forgot one more important point:
5. The freeze occurs when calling:
os.system("rpm --nodeps --force -ivh rpm_file_name")
but does not happen when calling:
os.system("sleep 5")
I would really appreciate some insight. Thanks.
The problem was with the incoming queue.
I used:
if (True == self._outgoing_message_queue.empty()):
    temp: dict = self._outgoing_message_queue.get()
This is a simple bug: the thread just got stuck on an empty queue.
But even if the code is changed to
if (False == self._outgoing_message_queue.empty()):
    temp: dict = self._outgoing_message_queue.get()
it might cause the same behavior, because between the moment the if statement is evaluated and the moment get() is called, a context switch might occur, the queue might become empty, and the thread would get stuck on .get() just as in my original code.
A better solution is to use get_nowait()
try:
    temp = self._outgoing_message_queue.get_nowait()
except:
    temp = None
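For completeness, the same consumer can also wait briefly instead of returning straight away; multiprocessing.Queue raises the standard queue.Empty exception in both the no-wait and the timed case. A sketch, assuming the same _outgoing_message_queue object:
from queue import Empty  # the exception multiprocessing.Queue raises on timeout

try:
    temp = self._outgoing_message_queue.get(timeout=0.1)  # wait at most 100 ms
except Empty:
    temp = None  # nothing arrived in time; continue with the rest of the loop
Catching queue.Empty explicitly (rather than a bare except) also avoids masking unrelated errors.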

Can reusing the same process name in a loop generate zombie processes?

My script has to run for over a day, and its core cycle runs 2-3 times per minute. I used multiprocessing to issue commands simultaneously, and each process should be terminated/joined within one cycle.
But in reality I found that the software ends up out of swap memory or the computer freezes, which I guess is caused by accumulated processes. In another session, while the program is running, I can see the number of python PIDs increasing abnormally over time, so I assume this must be a process issue. What I don't understand is how this happens, since I made sure each cycle's process has to finish in that cycle before proceeding to the next one.
So I am guessing that the actual computation needs more time than the terminate()/join() step allows, and that I therefore should not "reuse" the same object name. Is this a proper guess, or is there another possibility?
import multiprocessing

def function(a, b):
    try:
        pass  # do stuff: audio / serial things
    except Exception:
        return

flag_for_2nd_cycle = 0
for i in range(1500):  # main loop, runs for a long time
    # do something
    if flag_for_2nd_cycle == 1:
        while my_process.is_alive():
            if (timecondition) < 30:  # kill the process if it is still alive
                my_process.terminate()
                my_process.join()
    flag_for_2nd_cycle = 1
    my_process = multiprocessing.Process(target=function, args=[c, d])
    my_process.start()
    # do something; other process jobs go on, for example:
    my_process2 = multiprocessing.Process()  # *stuff
    my_process2.terminate()
    my_process2.join()
Based on your comment, you are controlling three projectors over serial ports.
The simplest way to do that would be to open three serial connections (using pySerial), then run a loop where you check each connection for available data and, if there is any, read and process it. Then you send commands to each of the projectors in turn, as in the sketch below.
Depending on the speed of the serial link, you might not need more than this.
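A rough sketch of that single-loop approach, assuming hypothetical port names, a 9600 baud link, and a hypothetical handle() function for the replies (the calls used are pySerial's Serial, in_waiting, read and write):
import time
import serial  # pySerial

# Hypothetical device names; replace with the real ports.
ports = ["/dev/ttyUSB0", "/dev/ttyUSB1", "/dev/ttyUSB2"]
projectors = [serial.Serial(p, 9600, timeout=0) for p in ports]  # timeout=0: non-blocking reads

while True:
    for conn in projectors:
        if conn.in_waiting:                # any bytes waiting on this link?
            reply = conn.read(conn.in_waiting)
            handle(reply)                  # hypothetical: parse/process the reply
    for conn in projectors:
        conn.write(b"STATUS?\r\n")         # hypothetical command, sent to each in turn
    time.sleep(0.5)                        # pace the polling loop
Because everything runs in one process and one loop, there is nothing to terminate or join, so no processes can accumulate between cycles.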

How can you skip a loop iteration in python if a function called inside the loop takes too long to execute?

I want to loop over a set of files and perform an operation on each of them, which is specified in runthingy(). However, because this operation gets stuck on some of those files and stops the entire program, I want to skip a file if it takes longer than 120 seconds to complete. I am using Windows, which is why signal.SIGALRM is not available, so I am using the stopit library (https://pypi.org/project/stopit/) instead. The following example code aborts the while loop and prints "Time out" after 3 seconds:
with stopit.ThreadingTimeout(3) as to_ctx_mrg:
    while True:
        continue

if to_ctx_mrg.state == to_ctx_mrg.TIMED_OUT:
    print("Time out")
However, used in this context it never prints "Time out" if the runthingy() function gets stuck or takes forever to complete:
for filename in os.listdir(os.getcwd() + "\\files\\"):
    with stopit.ThreadingTimeout(120) as to_ctx_mrg:
        runthingy(filename)
    if to_ctx_mrg.state == to_ctx_mrg.TIMED_OUT:
        print("Time out")
        continue
I don't have experience with the library you are using, but it says it raises an asynchronous exception in the timed-out thread.
The question is why your function gets 'stuck'. The Python interpreter will only detect that an exception has been raised while it is interpreting Python instructions within that thread. If the reason your function sticks is that it has made a C call that hasn't returned, then other Python threads can probably still run, but they won't be able to interrupt the stuck thread.
You need to look more closely at why runthingy() blocks. Is it perhaps reading from a socket, or waiting for a file lock? If the call that blocks has an optional timeout, make sure that timeout parameter is set fairly low: even if the code just retries the call after a timeout, it at least gives the Python interpreter a chance to get in there and abort the operation.
Better still, if you can find out why the function sticks, you may be able to fix the underlying problem instead of applying a brute-force timeout.
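If runthingy() really does block inside a C call that no thread-based timeout can interrupt, a blunter alternative (not part of the answer above, and only workable if runthingy and its argument can be used from a child process) is to run each file in a separate process and kill that process on timeout:
import os
import multiprocessing

def worker(filename):
    runthingy(filename)  # assumes runthingy is importable in the child process

if __name__ == "__main__":
    for filename in os.listdir(os.path.join(os.getcwd(), "files")):
        p = multiprocessing.Process(target=worker, args=(filename,))
        p.start()
        p.join(120)          # wait up to 120 seconds for this file
        if p.is_alive():     # still running: kill it and move to the next file
            p.terminate()
            p.join()
            print("Time out:", filename)
Unlike an in-thread timeout, terminating a child process works even when the work is stuck in native code, at the cost of starting a new process per file.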

Python socket recv taking ages to deliver packet

I have a Python 3 program which sends short commands to a host and gets short responses back (both 20 bytes). It's not doing anything complicated.
The socket is opened like this:
self.conn = socket.create_connection((self.host, self.port))
self.conn.settimeout(POLL_TIME)
and used like this:
while True:
    buf = self.conn.recv(256)
    # append buffer to a bigger buffer, parse the packet once we've got enough bytes
After my program has been running for a while (hours, usually), sometimes it goes into a strange mode - if I use tcpdump, I can see a response packet arriving at the local machine, but recv doesn't give me the packet until 30s (Windows) to 1m (Linux) later. The time is random +/- about ten seconds. I wondered if the packet was being delayed til the next packet arrived, but this doesn't seem to be true.
In the meantime, the same program is also operating a second socket connection using the same code on a different thread, which continues to work normally.
This doesn't happen all the time, but it's happened several times in a month. Sometimes it's preceded by a few seconds of packets taking longer and longer to arrive, but most of the time it just goes straight from OK to completely broken. Most of the time it stays broken for hours until I restart the server, but last night I noticed it recovering and going back to normal operation, so it's not irrecoverable.
CPU usage is almost zero, and nothing else is running on the same machine.
The weirdest thing is that this happens both on the Windows Subsystem for Linux (two different laptops) and on Linux (an AWS tiny instance running Amazon Linux).
I had a look at the CPython implementation of socket.recv() using GDB. Looking at the source code, it passes calls to socket.recv() straight through to the underlying recv(). However, while the outer function sock_recv() (which implements socket.recv()) gets called frequently, it only calls recv() when there's actually data to read from the socket, using the sock_call() helper to call poll()/select() to see if there's any data waiting. Calls to recv() happen directly before the app receives a packet, so the delay is somewhere before that point, rather than between recv() and my code.
Any ideas on how to troubleshoot this?
(Both the Linux and Windows machines are updated to the most recent everything, and the Python is Python 3.6.2)
[edit] The issue gets even weirder. I got fed up and wrote a method to detect the issue (looking for ten late-arriving packets in a row with near-identical roundtrip times), drop the connection and reconnect (by closing the previous connection and creating a new socket object) ... and it didn't work. Even with a new socket object, the delayed packets remain delayed by exactly the same amount. So I altered the method to completely kill the thread that was running that code and restart it, reasoning that perhaps there was some thread-local state. That still didn't work. The only resort I have left is killing the entire program and having a watchdog to restart it...
[edit2] Killing the entire program and restarting it with an external watchdog worked. It's a terrible solution, but at least it's a solution.
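One way to narrow the problem down further (a diagnostic sketch rather than a fix, reusing the self.conn socket and POLL_TIME from the code above) is to timestamp the readiness notification separately from the recv() call, so you can tell whether the delay sits in the kernel's readiness reporting or in the read itself:
import time
import selectors

sel = selectors.DefaultSelector()
sel.register(self.conn, selectors.EVENT_READ)

while True:
    events = sel.select(timeout=POLL_TIME)  # returns as soon as the socket is readable
    t_ready = time.monotonic()
    if not events:
        continue                            # poll timeout, nothing to read yet
    buf = self.conn.recv(256)
    t_done = time.monotonic()
    print("recv() returned %.3f s after the socket became readable" % (t_done - t_ready))
    # append buf to the bigger buffer, parse once enough bytes have arrived
If the socket only becomes readable tens of seconds after tcpdump sees the packet, the delay is below the application (kernel, WSL translation layer, or network stack); if it becomes readable immediately but recv() still returns late, the problem is inside the process.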

Python-Twisted Reactor Starting too Early

I have an application that uses PyQt4 and python-twisted to maintain a connection to another program. I am using "qt4reactor.py" as found here. This is all packaged up using py2exe. The application works wonderfully for 99% of users, but one user has reported that networking is failing completely on his Windows system. No other users report the issue, and I cannot replicate it on my own Windows VM. The user reports no abnormal configuration.
The debugging logs show that the reactor.connectTCP() call is executing immediately, even though the reactor hasn't been started yet! There's no mistaking run order because this is a single-threaded process with 60 sec of computation and multiple log messages between this line and when the reactor is supposed to start.
There's a lot of code, so I am only putting in pseudo-code, hoping that there is a general solution for this issue. I will link to the actual code below it.
import qt4reactor
qt4reactor.install()
# Start setting up main window
# ...
from twisted.internet import reactor
# Separate listener for detecting/processing multiple instances
self.InstanceListener = ListenerFactory(...)
reactor.listenTCP(LISTEN_PORT, self.InstanceListener)
# The active/main connection
self.NetworkingFactory = ClientFactory(...)
reactor.connectTCP(ACTIVE_IP, ACTIVE_PORT, self.NetworkingFactory)
# Finish setting up main window
# ...
from twisted.internet import reactor
reactor.runReturn()
The code is nested throughout the Armory project files: ArmoryQt.py (containing the above code) and armoryengine.py (containing the ReconnectingClientFactory subclass used for this connection).
So, the reactor.connectTCP() call executes immediately. The client code executes the send command and then immediately connectionLost() gets called. It does not appear to try to reconnect. It also doesn't throw any errors other than connectionLost(). Even more mysteriously, it receives messages from the remote node later on, and this app even processes them! But it believes it's not connected (and handshake never finished, so the remote node shouldn't be sending messages, but might be a bug/oversight in that program).
What on earth is going on!? How could the reactor get started before I tell it to start? I searched the code and found no other code that (I believe) could start the reactor.
The API that you're looking for is twisted.internet.reactor.callWhenRunning.
However, it wouldn't hurt to have less than 60 seconds of computation at startup, either :). Perhaps you should spread that out, or delegate it to a thread, if it's relatively independent?
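A minimal sketch of that, keeping the pseudo-code's names (the self.* references are schematic, exactly as in the question) and simply deferring the network setup until the reactor is actually running:
import qt4reactor
qt4reactor.install()

from twisted.internet import reactor

def start_networking():
    # Runs only once the reactor is live, so neither call can fire early.
    reactor.listenTCP(LISTEN_PORT, self.InstanceListener)
    reactor.connectTCP(ACTIVE_IP, ACTIVE_PORT, self.NetworkingFactory)

reactor.callWhenRunning(start_networking)

# ... finish setting up the main window, run the startup computation ...
reactor.runReturn()
Whatever is prematurely spinning the event loop on that one user's machine then no longer matters, because the connection attempt is tied to the reactor's own startup rather than to module import order.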
