What is the difference between asyncio and threads? - python

I have some hardware devices on my network that I need to read data from every 100 ms, and I need an asynchronous way to do it so that I don't have to wait on each call.
One way to do it is to use threads; another is to use asyncio with the loop.run_in_executor method (which runs each call in an executor thread).
In both cases the calls are asynchronous, so I really don't understand what asyncio gives us that threads do not.
Can someone explain the advantage of using asyncio over threads?
For example, how can I turn the following code into asyncio code?
def _send(self, data):
    """Send data over current socket
    :param data: registers value to write
    :type data: str (Python2) or class bytes (Python3)
    :returns: True if send ok or None if error
    :rtype: bool or None
    """
    # check link
    if self.__sock is None:
        self.__debug_msg('call _send on close socket')
        return None
    # send
    data_l = len(data)
    try:
        send_l = self.__sock.send(data)
    except socket.error:
        send_l = None
    # handle send error
    if (send_l is None) or (send_l != data_l):
        self.__last_error = const.MB_SEND_ERR
        self.__debug_msg('_send error')
        self.close()
        return None
    else:
        return send_l
This code is taken from the ModbusClient class.
Thanks

With plain threads, each read runs in its own OS thread, so rather than reading and processing the data one at a time you can run as many reads concurrently as your machine can schedule. Each thread carries its own stack and scheduling overhead, though (and in CPython only one thread executes Python code at a time because of the GIL), so you are ultimately limited by the hardware you are programming on.
What asyncio allows you to do is schedule the reads as tasks or futures on a single event loop: you start the I/O, but you don't block on it. Once a certain amount of time has passed, or a certain number of data points have been collected, you can process them all at once.
In this situation asyncio is advantageous because you can add anywhere from zero to many thousands of tasks to the loop with very little per-task overhead and gather their results together.
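To turn the _send method from the question into asyncio code, one option is to let the event loop drive a non-blocking socket with loop.sock_sendall. The sketch below is illustrative only: it keeps the original's return convention but drops the ModbusClient internals (self.__debug_msg, const.MB_SEND_ERR), and the host list in the usage comment is made up.

import asyncio
import socket

async def send_async(sock, data):
    """Sketch of an asyncio counterpart to _send: returns bytes sent, or None on error."""
    if sock is None:
        return None
    loop = asyncio.get_running_loop()
    try:
        # Suspends this coroutine while the OS send buffer is full,
        # instead of blocking the whole thread like socket.send() would.
        await loop.sock_sendall(sock, data)
    except OSError:
        return None
    return len(data)

async def poll_device(host, port, request, period=0.1):
    """Send request to one device every 100 ms; many of these can run on one loop."""
    sock = socket.create_connection((host, port))
    sock.setblocking(False)          # required by the loop.sock_* APIs
    while True:
        await send_async(sock, request)
        await asyncio.sleep(period)

# Usage sketch: poll several (hypothetical) devices concurrently.
# async def main():
#     await asyncio.gather(*(poll_device(h, 502, b'...') for h in ["10.0.0.1", "10.0.0.2"]))
# asyncio.run(main())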

Related

How to achieve concurrency with two threads with different execution times?

I am working on a school project. I set some rules in iptables that log INPUT and OUTPUT connections. My goal is to read these logs line by line, parse them, and find out which process (with which PID) is responsible for each connection.
My problem starts when I use psutil to match an (ip, port) tuple to the corresponding PID. iptables writes log lines to the file very fast, roughly every 1x10^-6 seconds, and my Python script reads lines just as fast. But when I use the following code:
def get_proc(src: str, spt: str, dst: str, dpt: str) -> str:
    proc_info = ""
    if not (src and spt and dst and dpt):
        return proc_info
    for proc in psutil.process_iter(["pid", "name"]):
        for conn in proc.connections(kind="all"):
            if flag.is_set():
                return proc_info
            if not all([
                hasattr(conn.laddr, "ip"), hasattr(conn.laddr, "port"),
                hasattr(conn.raddr, "ip"), hasattr(conn.raddr, "port"),
            ]):
                continue
            if not all([
                conn.laddr.ip == src, conn.laddr.port == int(spt),
                conn.raddr.ip == dst, conn.raddr.port == int(dpt),
            ]):
                continue
            return f"pid={proc.pid},name={proc.name()}"
    return proc_info
psutil finishes its job in about 1x10^-3 seconds, i.e. 10^3 times slower than the reading process. What happens is this: in the time it takes to run get_proc once, I read 1000 lines. So this slowness quickly becomes a problem when 1x10^6 lines have been read by the end, because in order to find the PID I need to run this method immediately when a log line is received.
I thought of using multithreading, but as far as I understand it won't solve my problem, because the same latency remains.
I haven't done much coding so far because I still can't find an algorithm to use. That's why there is no more code here.
How can I solve this problem, with or without multithreading? I can't speed up psutil itself, so I believe there must be a better approach.
Edit
Code part for reading logs from iptables.log:
import re
import signal
import sys
import threading

flag = threading.Event()

def stop(signum, _frame):
    """
    Tell everything to stop themselves.
    :param signum: The captured signal number.
    :param _frame: No use.
    """
    if flag.is_set():
        return
    sys.stderr.write(f"Signal {signum} received.")
    flag.set()

signal.signal(signal.SIGINT, stop)

def receive_logs(file, queue__):
    global CURSOR_POSITION
    with open(file, encoding="utf-8") as _f:
        _f.seek(CURSOR_POSITION)
        while not flag.is_set():
            line = re.sub(r"[\[\]]", "", _f.readline().rstrip())
            if not line:
                continue
            # If all goes okay do some parsing...
            # .
            # .
            queue__.put_nowait((nettup, additional_info))
            CURSOR_POSITION = _f.tell()
Here is an approach that may help a bit. As I've mentioned in the comments, the issue cannot be avoided entirely unless you switch to a completely different approach.
The idea here is to scan the list of processes not once per connection but once for all connections that have arrived since the last scan. Since checking whether a connection belongs to that batch is a simple hash-table lookup in O(1) time, we can process messages much faster.
I chose to go with a simple 1-producer-1-consumer multithreading approach. I think this will work fine because most of the time is spent in system calls, so Python's global interpreter lock (GIL) is less of an issue. But that requires testing. Possible variations:
Use no multithreading; instead, read incoming logs in a non-blocking way, then process whatever you have got
Swap the threading module and queue for the multiprocessing module (a sketch of this swap follows the list)
Use multiple consumer threads, possibly with a maximum batch size, to run several scans through the process list in parallel
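As a rough sketch of that second variation (reusing the receive_logs and consume functions defined below; multiprocessing.Queue raises the same queue.Empty on get_nowait, so get_procs works unchanged):

import multiprocessing

def main_with_processes():
    # Same producer/consumer wiring as main() below, but with processes,
    # so the psutil scan no longer shares the GIL with the log reader.
    q = multiprocessing.Queue()
    producer = multiprocessing.Process(target=receive_logs, args=(q,))
    consumer = multiprocessing.Process(target=consume, args=(q,))
    consumer.start()
    producer.start()
    producer.join()
    consumer.join()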
import psutil
import queue
import threading

def receive_logs(consumer_queue):
    """Placeholder for actual code reading iptables log"""
    for connection in log:
        nettup = (connection.src, int(connection.spt),
                  connection.dst, int(connection.dpt))
        additional_info = connection.additional_info
        consumer_queue.put((nettup, additional_info))
The log reading is not part of the posted code, so this is just some placeholder.
Now we consume all queued connections in a second thread:
def get_procs(producer_queue):
    # 1. Construct a set of connections to search for
    # Blocks until at least one available
    nettup, additional_info = producer_queue.get()
    connections = {nettup: additional_info}
    try:  # read as many as possible
        while True:
            nettup, additional_info = producer_queue.get_nowait()
            connections[nettup] = additional_info
    except queue.Empty:
        pass
    found = []
    for proc in psutil.process_iter(["pid", "name"]):
        for conn in proc.connections(kind="all"):
            try:
                src = conn.laddr.ip
                spt = conn.laddr.port
                dst = conn.raddr.ip
                dpt = conn.raddr.port
            except AttributeError:  # not an IP address
                continue
            nettup = (src, spt, dst, dpt)
            if nettup in connections:
                additional_info = connections[nettup]
                found.append((proc, nettup, additional_info))
    found_connections = {nettup for _, nettup, _ in found}
    lost = [(nettup, additional_info)
            for nettup, additional_info in connections.items()
            if nettup not in found_connections]
    return found, lost
I don't really understand parts of the code posted in the question, such as the if flag.is_set(): return proc_info part, so I just left those out. I also got rid of some of the less Pythonic and potentially slow parts, such as hasattr(). Adapt as needed.
Now we tie it all together by calling the consumer repeatedly and starting both threads:
def consume(producer_queue):
    while True:
        found, lost = get_procs(producer_queue)
        for proc, (src, spt, dst, dpt), additional_info in found:
            print(f"pid={proc.pid},name={proc.name()}")

def main():
    producer_consumer_queue = queue.SimpleQueue()
    producer = threading.Thread(
        target=receive_logs, args=(producer_consumer_queue,))
    consumer = threading.Thread(
        target=consume, args=(producer_consumer_queue,))
    consumer.start()
    producer.start()
    consumer.join()
    producer.join()

main()

Python: Improving performance - Writing to database in seperate thread

I am running a Python app where, for various reasons, I have to host my program on a server in one part of the world and my database in another.
I tested this with a simple script: from my home, which is in a country neighboring the database server, the time to write and retrieve a row from the database is about 0.035 seconds (a nice speed, in my opinion), compared to 0.16 seconds when my Python server at the other end of the world performs the same action.
This is an issue because I am trying to keep my Python app as fast as possible, so I was wondering if there is a smart way to handle it.
Since my code runs synchronously, my program waits every time it has to write to the database, which happens about 3 times a second, so the time adds up. Is it possible to run the database connection in a separate thread or something similar, so it doesn't halt the whole program while it sends data to the database? Or can this be done using asyncio (I have no experience with async code)?
I am really struggling to figure out a good way to solve this issue.
Many thanks in advance!
Yes, you can create a thread that does the writes in the background. In your case, it seems reasonable to have a queue where the main thread puts things to be written and the db thread gets and writes them. The queue can have a maximum depth so that when too much is pending, the main thread waits. You could also do something different, like dropping writes that arrive too fast, or using a database with synchronization and writing to a local copy. You may also have an opportunity to speed up the writes a bit by committing several at once.
This is a sketch of a worker thread:
import threading
import queue

class SqlWriterThread(threading.Thread):
    def __init__(self, db_connect_info, maxsize=8):
        super().__init__()
        self.db_connect_info = db_connect_info
        self.q = queue.Queue(maxsize)
        # TODO: Can expose q.put directly if you don't need to
        # intercept the call
        # self.put = self.q.put
        self.start()

    def put(self, statement):
        print(f"DEBUG: Putting\n{statement}")
        self.q.put(statement)

    def run(self):
        db_conn = None
        while True:
            # get all the statements you can, waiting on the first
            statements = [self.q.get()]
            try:
                while True:
                    statements.append(self.q.get(block=False))
            except queue.Empty:
                pass
            try:
                # early exit before connecting if channel is closed.
                if statements[0] is None:
                    return
                if not db_conn:
                    db_conn = do_my_sql_connect()
                try:
                    print("Debug: Executing\n",
                          "--------\n".join(f"{id(s)} {s}" for s in statements))
                    # todo: need to detect a closed connection, then reconnect and restart the loop
                    cursor = db_conn.cursor()
                    for statement in statements:
                        if statement is None:
                            return
                        cursor.execute(*statement)
                finally:
                    db_conn.commit()
            finally:
                for _ in statements:
                    self.q.task_done()
sql_writer = SqlWriterThread(('user', 'host', 'credentials'))
sql_writer.put(('execute some stuff',))
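The run() loop above exits when it sees a None sentinel, so a possible shutdown sequence for this sketch would be:

sql_writer.q.join()    # wait until everything queued so far has been committed
sql_writer.put(None)   # sentinel: tells run() to return
sql_writer.join()      # wait for the worker thread itself to finish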

How to use kqueue for file monitoring in asyncio?

I want to use kqueue to monitor files for changes. I can see how to use select.kqueue() in a threaded way.
I'm searching for a way to use it with asyncio. I may have missed something really obvious here. I know that Python uses kqueue for asyncio on macOS, so I'm happy with any solution that only works when the kqueue selector is used.
So far the only way I can see to do this is to create a thread that continually calls kqueue.control() and then injects the events with asyncio.loop.call_soon_threadsafe(). I feel like there should be a better way.
You can add the FD of the kqueue object as a reader to the event loop using loop.add_reader(). The event loop will then inform you when events are ready to collect.
There are two aspects of doing this that might seem odd to those familiar with kqueue:
select.kqueue.control is a one-shot method which first applies the changes to the monitor and then waits for new events to arrive. Because we never want it to block, the two actions must be split into one non-blocking call to modify the monitor and a second, later, non-blocking call to collect the resulting events.
Because we never want to block, the timeout can never be used. This can be re-implemented with asyncio.wait_for().
There are more efficient ways to write this, but here's an example of how to completely replace select.kqueue.control with an async method (here named kqueue_control):
import asyncio
import select
from typing import Iterable, Optional

async def kqueue_control(kqueue: select.kqueue,
                         changes: Optional[Iterable[select.kevent]],
                         max_events: int,
                         timeout: Optional[int]):
    def receive_result():
        try:
            # Events are ready to collect; fetch them but do not block
            results = kqueue.control(None, max_events, 0)
        except Exception as ex:
            future.set_exception(ex)
        else:
            future.set_result(results)
        finally:
            loop.remove_reader(kqueue.fileno())

    # If this call is non-blocking then just execute it
    if timeout == 0 or max_events == 0:
        return kqueue.control(changes, max_events, 0)
    # Apply the changes, but DON'T wait for events
    kqueue.control(changes, 0)
    loop = asyncio.get_running_loop()
    future = loop.create_future()
    loop.add_reader(kqueue.fileno(), receive_result)
    if timeout is None:
        return await future
    else:
        return await asyncio.wait_for(future, timeout)
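As a usage sketch (the path and the choice of vnode flags here are illustrative, not from the answer), watching a file for writes could look like this:

import os

async def watch_file(path):
    kq = select.kqueue()
    fd = os.open(path, os.O_RDONLY)
    changes = [select.kevent(
        fd,
        select.KQ_FILTER_VNODE,                  # monitor vnode (file) events
        select.KQ_EV_ADD | select.KQ_EV_CLEAR,   # register, edge-triggered
        select.KQ_NOTE_WRITE,                    # report writes to the file
    )]
    while True:
        events = await kqueue_control(kq, changes, 1, None)
        changes = None   # already registered; later calls only collect events
        for event in events:
            print(f"{path} was written to: {event}")

# asyncio.run(watch_file("/tmp/example.log"))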

how to make python awaitable object

In Python 3.5.1 one can make use of await/async; however, to use it (as I understand it), you need to have an awaitable object.
An awaitable object is an object that defines an __await__() method returning an iterator. More info here.
But I cannot find any example of this, since most examples use some sort of asyncio.sleep(x) to mimic an awaitable object.
My ultimate goal is to make a simple websocket serial server; however, I can't get past this first step.
This is my (non-working) code:
import serial
import asyncio

connected = False
port = 'COM9'
#port = '/dev/ttyAMA0'
baud = 57600
timeout = 1

class startser(object):
    def __init__(self, port, baud):
        self.port = port
        self.baud = baud

    def openconn(self):
        self.ser = serial.Serial(port, baud)

    async def readport(self):
        #gooo = await (self.ser.in_waiting > 0)
        read_byte = async self.ser.read(1).decode('ascii')
        self.handle_data(read_byte)
        print("42")

    def handle_data(self, data):
        print(data)

serr = startser(port, baud)
serr.openconn()

loop = asyncio.get_event_loop()
#loop.run_forever(serr.readport())
loop.run_until_complete(serr.readport())
loop.close()
print("finitto")

#with serial.Serial('COM9', 115200, timeout=1) as ser:
#    x = ser.read()          # read one byte
#    s = ser.read(10)        # read up to ten bytes (timeout)
#    line = ser.readline()   # read a '\n' terminated line
I guess there is still no answer because the question is not entirely clear.
You correctly said that
An awaitable object is an object that defines __await__() method returning an iterator
Not much to add here. Just return an iterator from that method (a minimal sketch follows the list below).
The only thing you need to understand is how it works, i.e. how asyncio or a similar framework achieves concurrency in a single thread. At a high level it is simple: organize all your code as iterators, then call them one by one until their values are exhausted.
So, for example, if you have two iterators, say the first one yields letters and the second one yields numbers, the event loop calls the first one and gets 'A', then calls the second one and gets 1, then calls the first one again and gets 'B', and so on until the iterators are exhausted. Of course, each of these iterators can do whatever you want before yielding the next value. But the longer that takes, the longer the pause between 'task switches' will be. You MUST keep every iteration short:
If you have inner loops, use async for; this allows switching tasks without explicit yielding.
If you have a lot of code that executes for tens or even hundreds of milliseconds, consider rewriting it in smaller pieces. For legacy code, you can use hacks like asyncio.sleep(0), which gives asyncio a chance to switch tasks at that point.
No blocking operations! This is the most important rule. Suppose you do something like socket.recv(): all tasks will be stopped until this call returns. This is why the library is called async io: you must use its implementations of I/O functions, like BaseEventLoop.sock_recv().
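To make that concrete, here is a minimal sketch (the class name ReturnLater is made up) of an object whose __await__() returns an iterator; the bare yield hands control back to the event loop once before the result is produced:

import asyncio

class ReturnLater:
    """Minimal awaitable: __await__ must return an iterator (here, a generator)."""
    def __init__(self, value):
        self.value = value

    def __await__(self):
        yield              # suspend once; the event loop runs other tasks, then resumes us
        return self.value  # becomes the result of awaiting ReturnLater(...)

async def main():
    result = await ReturnLater("42")
    print(result)

loop = asyncio.get_event_loop()
loop.run_until_complete(main())
loop.close()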
I'd recommend you to start (if you haven't yet) with the following docs:
https://pymotw.com/3/asyncio/
https://docs.python.org/3/library/asyncio.html
https://www.python.org/dev/peps/pep-0492

Communication with sockets using threads

I'm currently programming a Python class which acts as a client.
Because I don't want to block the main thread, packets are received in another thread and a callback function is called when a packet arrives.
The received packets are either broadcast messages or a reply to a command sent by the client. The function for sending commands is synchronous; it blocks until the reply arrives so it can directly return the result.
Simplified example:
import socket
import threading

class SocketThread(threading.Thread):
    packet_received_callback = None
    _reply = None
    _reply_event = threading.Event()

    def run(self):
        self._initialize_socket()
        while True:
            # This function blocks until a packet arrives
            p = self._receive_packet()
            if self._is_reply(p):
                self._reply = p
                self._reply_event.set()
            else:
                self.packet_received_callback(p)

    def send_command(self, command):
        # Send command via socket
        self.sock.send(command)
        # Wait for reply
        self._reply_event.wait()
        self._reply_event.clear()
        return self._process_reply(self._reply)
The problem I'm facing now is that I can't send commands from within the callback function, because that would end in a deadlock (send_command waits for a reply, but no packets can be received while the thread that receives packets is executing the callback function).
My current solution is to start a new thread each time the callback function is called. But that way a lot of threads are spawned, and it will be difficult to ensure that packets are processed in order in heavy-traffic situations.
Does anybody know a more elegant solution or am I going the right way?
Thanks for your help!
A proper answer to this question depends a lot on the details of the problem you are trying to solve, but here is one solution:
Rather than invoking the callback function immediately upon receiving the packet, I think it would make more sense for the socket thread to simply store the packet that it received and continue polling for packets. Then when the main thread has time, it can check for new packets that have arrived and act on them.
I recently had another idea; let me know what you think about it. It's a general approach to solving such problems, in case someone else has a similar problem and needs to use multi-threading.
import threading
import queue

class EventBase(threading.Thread):
    ''' Class which provides a base for event-based programming. '''

    def __init__(self):
        super().__init__()
        self._event_queue = queue.Queue()

    def run(self):
        ''' Starts the event loop. '''
        while True:
            # Get next event
            e = self._event_queue.get()
            # If there is a "None" in the queue, someone wants to stop
            if not e:
                break
            # Call event handler
            e[0](*e[1], **e[2])
            # Mark as done
            self._event_queue.task_done()

    def stop(self, join=True):
        ''' Stops processing events. '''
        if self.is_alive():
            # Put poison-pill to queue
            self._event_queue.put(None)
            # Wait until finished
            if join:
                self.join()

    def create_event_launcher(self, func):
        ''' Creates a function which can be used to call the passed func in the event loop. '''
        def event_launcher(*args, **kwargs):
            self._event_queue.put((func, args, kwargs))
        return event_launcher
Use it like so:
event_loop = EventBase()
event_loop.start()

# Or any other callback
sock_thread.packet_received_callback = event_loop.create_event_launcher(my_event_handler)

# ...

# Finally
event_loop.stop()
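With this in place the callback runs on the EventBase thread rather than on the SocketThread, so a handler can call send_command without deadlocking the receive loop. A hypothetical handler (sock_thread and the STATUS command are placeholders):

def my_event_handler(packet):
    # Runs on the EventBase thread; the SocketThread keeps receiving,
    # so the reply that send_command waits for can still be delivered.
    reply = sock_thread.send_command(b"STATUS")
    print("broadcast packet:", packet, "-> status reply:", reply)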
