Python: socket.recvfrom() blocks the whole script - python

I have some simple Python 3 code using schedule and socket:
import schedule
import socket
from time import sleep
def readDataFromFile():
    data = []
    with open("/tmp/tmp.txt", "r") as f:
        for singleLine in f.readlines():
            data.append(str(singleLine))
    if(len(data)>0):
        writeToBuffer(data)
def readDataFromUDP():
    udpData = []
    rcvData, addr = sock.recvfrom(256)
    udpData.append(rcvData.decode('ascii'))
    if(len(udpData)>0):
        writeToBuffer(udpData)
.
.
.
def main():
    schedule.every().second.do(readDataFromFile)
    schedule.every().second.do(readDataFromUDP)
    while(1):
        schedule.run_pending()
        sleep(1)
UDP_IP = "192.xxx.xxx.xxx"
UDP_PORT = xxxx
sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.bind((UDP_IP, UDP_PORT))
main()
The problem is that the script hangs on the sock.recvfrom() call and waits until data arrives.
How can I force Python to run this job independently? Would it be a better idea to run it in a thread?

You can use threads here, and it'll work fine, but it will require a few changes. First, the scheduler on your background thread is going to try to kick off a new recvfrom every second, no matter how long the last one took. Second, since both threads are apparently trying to call the same writeToBuffer function, you're probably going to need a Lock or something else to synchronize them.
Rewriting the whole program around an asynchronous event loop is almost certainly overkill here.
Just changing the socket to be nonblocking and doing a hybrid is probably the simplest change, e.g., by using settimeout:
# wherever you create your socket
sock.settimeout(0.8)
# ...
def readDataFromUDP():
    udpData = []
    try:
        rcvData, addr = sock.recvfrom(256)
    except socket.timeout:
        return
    udpData.append(rcvData.decode('ascii'))
    if(len(udpData)>0):
        writeToBuffer(udpData)
Now, every time you call recvfrom, if there's data available, you'll handle it immediately; if not, it'll wait up to 0.8 seconds, and then raise an exception, which means you have no data to process, so go back and wait for the next loop. (There's nothing magical about that 0.8; I just figured something a little less than 1 second would be a good idea, so there's time left to do all the other work before the next schedule time hits.)
Under the covers, this works by setting the OS-level socket to non-blocking mode and doing some implementation-specific thing to wait with a timeout. You could do the same yourself by using setblocking(False) and using the select or selectors module to wait up to 0.8 seconds for the socket to be ready, but it's easier to just let Python take care of that for you.
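For comparison, here is a minimal sketch of the do-it-yourself variant mentioned above, using setblocking(False) plus select. The helper name recv_with_timeout and the loopback address with port 0 are assumptions made for illustration; they are not part of the original script.

```python
import select
import socket


def recv_with_timeout(sock, timeout=0.8, bufsize=256):
    """Return one decoded datagram, or None if nothing arrives in time."""
    # select returns three lists; the first holds the sockets ready for reading
    readable, _, _ = select.select([sock], [], [], timeout)
    if not readable:
        return None  # timed out: no data this round
    data, _addr = sock.recvfrom(bufsize)
    return data.decode("ascii")


sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.setblocking(False)
# Assumption for the demo: bind to loopback and let the OS pick a free port.
sock.bind(("127.0.0.1", 0))
```

In the scheduled job you would call recv_with_timeout(sock) and skip writeToBuffer whenever it returns None, which is the same behaviour the settimeout version gets from catching socket.timeout.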

Related

Python: Improving performance - Writing to database in separate thread

I am running a python app where I for various reasons have to host my program on a server in one part of the world and then have my database in another.
I tested via a simple script, and from my home, which is in a neighboring country to the database server, the time to write and retrieve a row from the database is about 0.035 seconds (which is a nice speed imo) compared to 0.16 seconds when my Python server at the other end of the world performs the same action.
This is an issue as I am trying to keep my python app as fast as possible so I was wondering if there is a smart way to do this?
As I am running my code synchronously my program is waiting every time it has to write to the db, which is about 3 times a second so the time adds up. Is it possible to run the connection to the database in a separate thread or something, so it doesn't halt the whole program while it tries to send data to the database? Or can this be done using asyncio (I have no experience with async code)?
I am really struggling figuring out a good way to solve this issue.
In advance, many thanks!
Yes, you can create a thread that does the writes in the background. In your case, it seems reasonable to have a queue where the main thread puts things to be written and the db thread gets and writes them. The queue can have a maximum depth so that when too much stuff is pending, the main thread waits. You could also do something different like drop things that happen too fast. Or, use a db with synchronization and write a local copy. You also may have an opportunity to speed up the writes a bit by committing multiple at once.
This is a sketch of a worker thread:
import threading
import queue
class SqlWriterThread(threading.Thread):
    def __init__(self, db_connect_info, maxsize=8):
        super().__init__()
        self.db_connect_info = db_connect_info
        self.q = queue.Queue(maxsize)
        # TODO: Can expose q.put directly if you don't need to
        # intercept the call
        # self.put = self.q.put
        self.start()
    def put(self, statement):
        print(f"DEBUG: Putting\n{statement}")
        self.q.put(statement)
    def run(self):
        db_conn = None
        while True:
            # get all the statements you can, waiting on first
            statements = [self.q.get()]
            try:
                while True:
                    # block=False is an argument to get(), not to append()
                    statements.append(self.q.get(block=False))
            except queue.Empty:
                pass
            try:
                # early exit before connecting if channel is closed.
                if statements[0] is None:
                    return
                if not db_conn:
                    # placeholder: replace with your driver's connect call,
                    # using self.db_connect_info
                    db_conn = do_my_sql_connect()
                try:
                    print("Debug: Executing\n", "--------\n".join(f"{id(s)} {s}" for s in statements))
                    # todo: need to detect closed connection, then reconnect and restart loop
                    cursor = db_conn.cursor()
                    for statement in statements:
                        if statement is None:
                            return
                        cursor.execute(*statement)
                finally:
                    # in DB-API drivers, commit() is a method of the connection
                    db_conn.commit()
            finally:
                for _ in statements:
                    self.q.task_done()
sql_writer = SqlWriterThread(('user', 'host', 'credentials'))
sql_writer.put(('execute some stuff',))
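To make the batching and shutdown behaviour concrete, here is a small self-contained sketch of the same idea. It assumes sqlite3 as a stand-in for the remote database, a hypothetical log table, and a temporary file path; none of these come from the question. A None item on the queue acts as the stop signal, as in the sketch above.

```python
import os
import queue
import sqlite3
import tempfile
import threading


def writer(db_path, q):
    """Drain the queue in batches; a None item means 'stop'."""
    conn = sqlite3.connect(db_path)  # connect inside the worker thread
    conn.execute("CREATE TABLE IF NOT EXISTS log (msg TEXT)")
    running = True
    while running:
        batch = [q.get()]  # block until at least one item is available
        try:
            while True:
                batch.append(q.get(block=False))  # grab whatever else is queued
        except queue.Empty:
            pass
        if None in batch:
            running = False
            batch = batch[:batch.index(None)]  # ignore anything after the sentinel
        conn.executemany("INSERT INTO log VALUES (?)", [(m,) for m in batch])
        conn.commit()  # one commit per batch instead of one per row
    conn.close()


# Assumption for the demo: a throwaway database file in a temp directory.
db_path = os.path.join(tempfile.mkdtemp(), "demo.db")
q = queue.Queue(maxsize=8)  # put() blocks the producer when 8 items are pending
worker = threading.Thread(target=writer, args=(db_path, q))
worker.start()
for i in range(3):
    q.put(f"row {i}")
q.put(None)  # sentinel: ask the worker to finish
worker.join()
```

The main thread only pays for q.put(), which returns immediately unless the queue is full, so the round trip to the database no longer stalls the rest of the program.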

Same socket speed is 66% slower on a new thread?

This is crazy! I heard Python threads are slow but this is beyond normal.
Here is the pseudo-code:
class ReadThread:
    v = []
    def __init__(self, threaded = True):
        self.v = MySocket('127.0.0.1')
        if threaded:
            thread.start_new_thread(self._scan, ())
    def read(self):
        t0 = datetime.now()
        self.v.read('SomeVariable')
        t = datetime.now()
        dt = (t-t0).total_seconds()
        print dt
    def _scan(self):
        while True:
            self.read()
If I run the read() in a while loop in the main-thread as this:
r = ReadThread(threaded = False)
while True:
    r.read()
dt is about 78 ms with small variation. Now if I run it in a new thread like this:
r = ReadThread(threaded = True)
while True:
    pass
dt is about 130 ms with +-10ms variance!
Why is it so slow? Am I doing something really wrong? It's the same thing just in a new thread!
MySocket() is an object that uses a socket to read/write variables to a server, and read() just gets some variable for the test.
It is hard to reproduce this problem locally without knowing what MySocket is and without the full example. However, my guess is that the problem is this loop:
while True:
    pass
It is VERY CPU-intensive. The main thread spins continuously, taking CPU cycles for itself (and, in CPython, competing for the global interpreter lock), which gets in the way of the socket thread.
In contrast, socket read operations usually block and idle while waiting for data to arrive, so they consume almost no CPU.
In the first example, you run your socket while nothing else eats CPU. In the second example, the main thread consumes one CPU core completely.
Try replacing this loop with an ordinary idling operation, e.g. time.sleep(60), so that the main thread idles for 60 s while the socket thread reads and processes data.
r = ReadThread(threaded = True)
time.sleep(60)
What do the measurements look like in that case?
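A rough Python 3 sketch of the same pattern, for anyone who wants to try it without MySocket: the worker's time.sleep(0.01) is only a stand-in for the blocking socket read, and the Event-based stop flag is an addition so the demo can end cleanly.

```python
import threading
import time


def scan(stop, samples):
    """Time each 'read' until asked to stop."""
    while not stop.is_set():
        t0 = time.perf_counter()
        time.sleep(0.01)  # stand-in for the blocking socket read
        samples.append(time.perf_counter() - t0)


stop = threading.Event()
samples = []
worker = threading.Thread(target=scan, args=(stop, samples))
worker.start()
time.sleep(0.2)  # main thread idles here instead of spinning in a busy loop
stop.set()
worker.join()
```

Swapping the time.sleep(0.2) in the main thread for a busy loop of the same duration and comparing the values in samples is a simple way to see how much the spinning main thread costs the worker.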

Python select.select leaking threads

I have a program that maintains a connection to a server with a periodic heartbeat. Every once in a while, the server stops responding to heartbeats and I have to reconnect. I implemented this with a timer that, if no response is heard after n seconds, will call reconnect. Every time this happens, I leak a thread and over time I eventually run out of threads.
Now, simplifying massively for an easy repro, this illustrates how reconnecting after a delay always causes an increase in threads. How can I kill the old threads/sockets/selects (which may be waiting on a recv)?
import socket
import select
import threading
class Connection():
    def tick(self):
        print(threading.active_count()) # this increases every 1s!
        # ... certain conditions not met / it's been too long, then:
        self.reconnect()
    def reconnect(self):
        self.socket.shutdown(socket.SHUT_WR)
        self.socket.close()
        self.timer.cancel()
        self.connect()
    def connect(self):
        self.socket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        self.socket.connect((IP, TCP_PORT))
        self.timer = threading.Timer(1, self.tick)
        self.timer.start()
        r,_,_ = select.select([self.socket], [], [])
if __name__ == '__main__':
    Connection().connect()
I'm pretty sure it's not select() that leaks any threads. Let's assume that select() doesn't return, i.e. it blocks forever.
In that case:
.tick() is called from the timer thread.
.tick() calls .reconnect() within the timer thread.
.reconnect() closes the existing socket. This causes the active select() call to fail with IOError "Bad file descriptor" (which is also why you should really fix your code).
.reconnect() tries to cancel the current timer.
This does nothing, since the timer already triggered (we are currently inside the timer function!).
.reconnect() calls .connect() and that one establishes a new timer and here we go again.
So the question is: Where does this mode of operation hang on to the existing timer object? Well, all your timer threads get terminated by an IOError from the select() call. This stores a per-thread reference of the exception.
My guess is that this prevents the reference counted cleanup in CPython to trigger and hence the timer thread will only be cleaned up during garbage collection. This is unreliable, since there is no guarantee that the timer thread is ever cleaned up in time.
If you add import gc; gc.collect() at the start of .connect(), the problem seems to go away. But yeah, that's a non-solution.
Why don't you use the timeout parameter to select() to achieve a similar result without having to use a timer thread?
r = []
while not r:
    if self.socket:
        self.socket.shutdown(socket.SHUT_WR)
        self.socket.close()
    self.socket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    self.socket.connect((IP, TCP_PORT))
    # select returns empty lists on timeout
    r, _, _ = select.select([self.socket], [], [], 1)
Don't forget to set self.socket = None in Connection.__init__() for this to work.
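The same loop can be written as a standalone function, which makes it easier to test. This is only a sketch: the name connect_until_readable and the max_attempts limit are assumptions added here so the loop cannot spin forever, and socket.create_connection is used in place of the separate socket()/connect() calls.

```python
import select
import socket


def connect_until_readable(address, timeout=1.0, max_attempts=5):
    """Reconnect until the socket becomes readable; return it, or None."""
    sock = None
    for attempt in range(max_attempts):
        if sock is not None:
            sock.close()  # drop the connection that stayed silent
        sock = socket.create_connection(address)
        # select returns three empty lists when the timeout expires
        readable, _, _ = select.select([sock], [], [], timeout)
        if readable:
            return sock
    if sock is not None:
        sock.close()
    return None
```

No timer thread is involved, so there is nothing to cancel and nothing left behind after a reconnect; the timeout passed to select() does the job of the heartbeat timer.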

A confusion regarding threads in python [duplicate]

This question already has answers here:
thread.start_new_thread vs threading.Thread.start
(2 answers)
Closed 9 years ago.
I'm a beginner and I'm confused about the thread creation methods in Python. To be specific, is there any difference between the following two approaches?
In the first approach, I import the thread module and later create a thread with thread.start_new_thread(myfunction,()), as myfunction() doesn't take any arguments.
In the second approach, I use from threading import Thread and later create threads with something like t = Thread(target=myfunction), then t.start().
I am asking because my program works fine with the second approach, but with the first approach it doesn't work as intended. I am working on a client-server program. Thanks.
The code is as below:
#!/usr/bin/env python
import socket
from threading import Thread
import thread
data = 'default'
tcpSocket = ''
def start_server():
    global tcpSocket
    tcpSocket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    tcpSocket.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    tcpSocket.bind(('',1520))
    tcpSocket.listen(3)
    print "Server is up...."
def service():
    global tcpSocket
    (clientSocket,address) = tcpSocket.accept()
    print "Client connected with: ", address
    # data = 'default'
    send_data(clientSocket,"Server: This is server\n")
    global data
    while len(data):
        data = receive_data(clientSocket)
        send_data(clientSocket,"Client: "+data)
    print "Client exited....\nShutting the server"
    clientSocket.close()
    tcpSocket.close()
def send_data(socket,data):
    socket.send(data)
def receive_data(socket):
    global data
    data = socket.recv(2048)
    return data
start_server()
for i in range(2):
    t = Thread(target=service)
    t.start()
    #thread.start_new_thread(service,())
@immortal, can you explain a bit more, please? Sorry, I didn't get it. How can the main thread die? It should start service() in my code, and then the server waits for a client. I guess it should wait rather than die.
Your main thread calls:
start_server()
and that returns. Then your main thread executes this:
for i in range(2):
    t = Thread(target=service)
    t.start()
    #thread.start_new_thread(service,())
Those also complete almost instantly, and then your main thread ends.
At that point, the main thread is done. Python enters its interpreter shutdown code.
Part of the shutdown code is waiting to .join() all (non-daemon) threads created by the threading module. That's one of the reasons it's far better not to use thread unless you know exactly what you're doing. For example, if you're me ;-) But the only times I've ever used thread are in the implementation of threading, and to write test code for the thread module.
You're entirely on your own to manage all aspects of a thread module thread's life. Python's shutdown code doesn't wait for those threads. The interpreter simply exits, ignoring them completely, and the OS kills them off (well, that's really up to the OS, but on all major platforms I know of the OS does just kill them ungracefully in midstream).
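A small Python 3 sketch of the difference (in Python 3 the thread module is named _thread). The service function and the Event used to wait for the low-level thread are illustrative assumptions, not part of the question's server code.

```python
import _thread
import threading

results = []


def service(name, done=None):
    results.append(name)
    if done is not None:
        done.set()  # tell the waiting thread that we have finished


# threading: non-daemon threads are joined at interpreter shutdown;
# the explicit join() here just makes the waiting visible.
t = threading.Thread(target=service, args=("threading",))
t.start()
t.join()

# _thread: nothing waits for this thread, so the main thread has to do it
# itself, here by blocking on an Event that the thread sets when done.
done = threading.Event()
_thread.start_new_thread(service, ("_thread", done))
done.wait(timeout=5)
```

Without the done.wait() call, a script that ends right after start_new_thread may exit before the low-level thread has run at all, which matches the behaviour described in the question.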

Timeout and high cpu load problems using multiple telnet connection threads in python

I want to connect to multiple telnet hosts using threading in Python, but I stumbled upon an issue I'm not able to solve.
I am using the following code on Mac OS X Lion / Python 2.7:
import threading,telnetlib,socket
class ReaderThread(threading.Thread):
    def __init__(self, ip, port):
        threading.Thread.__init__(self)
        self.ip = ip
        self.port = port
        self.telnet_con = telnetlib.Telnet()
    def run(self):
        try:
            print 'Start %s' % self.ip
            self.telnet_con.open(self.ip,self.port,30)
            print 'Done %s' % self.ip
        except socket.timeout:
            print 'Timeout in %s' % self.ip
    def join(self):
        self.telnet_con.close()
ta = []
t1 = ReaderThread('10.0.1.162',9999)
ta.append(t1)
t2 = ReaderThread('10.0.1.163',9999)
ta.append(t2)
for t in ta:
    t.start()
print 'Threads started\n'
In general it works, but one of the threads (it is not always the same one) takes a long time to connect (about 20 seconds, and sometimes it even runs into a timeout). During that awfully long connection time (on an all-local network), CPU load also goes up to 100%.
Even more strange is the fact that if I'm using only one thread in the array it always works flawlessly. So it must have something to do with the use of multiple threads.
I already added hostname entries for all IP addresses to avoid a DNS lookup issue. This didn't make a difference.
Thanks in advance for your help.
Best regards
senexi
OK, you have overridden join(), and you are not supposed to do that. The main thread calls join() on each thread when the main thread finishes, which is right after the last line in your code. Since your join() method returns before your telnet thread actually exits, Python's shutdown code sees that the thread is still alive and calls join() again, over and over, and this is what causes the 100% CPU usage. Try putting a print statement in your join() method to see this.
Your implementation of join() also tries to close the socket (probably while the other thread is still trying to open a connection), and this might be what is causing your telnet threads to never finish.
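One way to restructure the class, sketched in Python 3 under a few assumptions: a plain socket.create_connection stands in for telnetlib.Telnet.open, and the cleanup lives in a separate close() method (a name chosen here for illustration) so that the inherited join() stays untouched.

```python
import socket
import threading


class ReaderThread(threading.Thread):
    def __init__(self, ip, port, timeout=30):
        super().__init__()
        self.ip = ip
        self.port = port
        self.timeout = timeout
        self.conn = None
        self.error = None

    def run(self):
        try:
            self.conn = socket.create_connection((self.ip, self.port), self.timeout)
        except OSError as exc:  # socket.timeout is a subclass of OSError
            self.error = exc

    def close(self):
        """Wait for run() to finish, then release the connection."""
        self.join()  # the inherited join(), not an override
        if self.conn is not None:
            self.conn.close()
            self.conn = None
```

Because join() keeps its normal meaning, the interpreter's shutdown code waits for each thread once and returns, and the connection is only closed after run() has finished using it.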
