This is probably a stupid question, but I'm missing something that I think is probably obvious:
I'm working with the Python 3.8 asyncio streams:
import asyncio

async def client_handler():
    reader, writer = await asyncio.open_connection('127.0.0.1', 9128)
    server_name = writer.get_extra_info('peername')[0]
    print(f"Connecting to: { server_name }")
    data = b''
    close_client = False
    try:
        while not close_client:
            print(data)
            data = await reader.read(1024)
            print(data)
            if data != b'':
                print(data.decode('utf-8'))
                data = b''
                writer.close()
    except:
        pass
    finally:
        writer.close()
        await writer.wait_closed()

asyncio.run(client_handler())
I guess I expected that it would try to read up to 1024 bytes, and that if there was nothing there it would just return None or an empty byte string or something. Instead it just sits there until data is received.
Am I misunderstanding what read is supposed to do? Is there instead another method that I could use to peek into a buffer or poll to see if any data is actually incoming?
For instance, let's say I'm writing an example chat program whose server and client need to be able to dynamically send and receive data at the same time... how do I implement that with the asyncio streams? Should I just build my own asyncio.Protocol subclass instead?
You can use tools like asyncio.gather(), asyncio.wait(return_when=FIRST_COMPLETED), and asyncio.wait_for() to multiplex between different operations, such as reads from different streams, or reads and writes. Without additional details about your use case it's hard to give you concrete advice on how to proceed.
Building an asyncio.Protocol or using feed_eof() directly is almost certainly the wrong approach, unless you are writing very specialized software and know exactly what you are doing.
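For example, in the chat scenario from the question, one way to wait for either incoming data or an outgoing message is to keep two tasks alive and multiplex them with asyncio.wait(return_when=FIRST_COMPLETED). This is only a sketch added here, not code from the question; the reader, writer and outgoing queue are assumed to already exist:
import asyncio

async def chat_loop(reader, writer, outgoing: asyncio.Queue):
    read_task = asyncio.create_task(reader.read(1024))
    send_task = asyncio.create_task(outgoing.get())
    while True:
        done, _ = await asyncio.wait(
            {read_task, send_task}, return_when=asyncio.FIRST_COMPLETED)
        if read_task in done:
            data = read_task.result()
            if not data:  # empty bytes means the peer closed the connection
                break
            print(data.decode('utf-8'))
            read_task = asyncio.create_task(reader.read(1024))
        if send_task in done:
            writer.write(send_task.result())
            await writer.drain()
            send_task = asyncio.create_task(outgoing.get())
    send_task.cancel()
    writer.close()
    await writer.wait_closed()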
From the docs:
write(data)
Write data to the stream.
This method is not subject to flow control. Calls to write() should be followed by drain().
coroutine drain()
Wait until it is appropriate to resume writing to the stream. Example:
writer.write(data)
await writer.drain()
From what I understand,
You need to call drain every time write is called.
If not, I guess write will block the loop thread.
Then why is write not a coroutine that calls drain automatically? Why would one call write without having to drain? I can think of two cases:
You want to write and close immediately.
You have to buffer some data before the message is complete.
The first one is a special case; I think we could have a different API for it. Buffering should be handled inside the write function, and the application should not care.
Let me put the question differently. What is the drawback of doing this? Does the Python 3.8 version effectively do this?
async def awrite(writer, data):
    writer.write(data)
    await writer.drain()
Note: the drain() documentation explicitly states the following:
When there is nothing to wait for, the drain() returns immediately.
Reading the answer and links again, I think the functions work like this. Note: check the accepted answer for a more accurate version.
def write(data):
    remaining = socket.try_write(data)
    if remaining:
        _pendingbuffer.append(remaining)  # buffer will keep growing if the other side is slow and we have a lot of data

async def drain():
    if len(_pendingbuffer) < BUF_LIMIT:
        return
    await wait_until_other_side_is_up_to_speed()
    assert len(_pendingbuffer) < BUF_LIMIT

async def awrite(writer, data):
    writer.write(data)
    await writer.drain()
So, when to use what:
When the data is not continuous, like responding to an HTTP request, and memory is not a concern - we just need to send some data and don't care when it arrives - just use write.
Same as above, but memory is a concern - use awrite.
When streaming data to a large number of clients (e.g. a live stream or a huge file). If the data is duplicated in each connection's buffer, it will definitely overflow RAM. In this case, write a loop that takes a chunk of data on each iteration and calls awrite, as in the sketch below. In the case of a huge file, loop.sendfile is better if available.
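Here is that sketch (an illustration added here, not code from the original post): stream a file in chunks and await drain() after every write so that slow clients apply back-pressure instead of growing the write buffer.
CHUNK_SIZE = 64 * 1024  # assumed chunk size

async def stream_file(writer, path):
    # plain blocking file I/O, kept simple for the sketch
    with open(path, 'rb') as f:
        while True:
            chunk = f.read(CHUNK_SIZE)
            if not chunk:
                break
            writer.write(chunk)
            await writer.drain()  # pauses here whenever the peer falls behind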
From what I understand, (1) you need to call drain every time write is called; (2) if not, I guess write will block the loop thread.
Neither is correct, but the confusion is quite understandable. The way write() works is as follows:
A call to write() just stashes the data to a buffer, leaving it to the event loop to actually write it out at a later time, and without further intervention by the program. As far as the application is concerned, the data is written in the background as fast as the other side is capable of receiving it. In other words, each write() will schedule its data to be transferred using as many OS-level writes as it takes, with those writes issued when the corresponding file descriptor is actually writable. All this happens automatically, even without ever awaiting drain().
write() is not a coroutine, and it absolutely never blocks the event loop.
The second property sounds convenient - you can call write() wherever you need to, even from a function that's not async def - but it's actually a major flaw of write(). Writing as exposed by the streams API is completely decoupled from the OS accepting the data, so if you write data faster than your network peer can read it, the internal buffer will keep growing and you'll have a memory leak on your hands. drain() fixes that problem: awaiting it pauses the coroutine if the write buffer has grown too large, and resumes it once the os.write() calls performed in the background succeed and the buffer shrinks.
You don't need to await drain() after every write, but you do need to await it occasionally, typically between iterations of a loop in which write() is invoked. For example:
while True:
    response = await peer1.readline()
    peer2.write(b'<response>')
    peer2.write(response)
    peer2.write(b'</response>')
    await peer2.drain()
drain() returns immediately if the amount of pending unwritten data is small. If the data exceeds a high threshold, drain() will suspend the calling coroutine until the amount of pending unwritten data drops beneath a low threshold. The pause will cause the coroutine to stop reading from peer1, which will in turn cause the peer to slow down the rate at which it sends us data. This kind of feedback is referred to as back-pressure.
Buffering should be handled inside write function and application should not care.
That is pretty much how write() works now - it does handle buffering and it lets the application not care, for better or worse. Also see this answer for additional info.
Addressing the edited part of the question:
Reading the answer and links again, I think the functions work like this.
write() is still a bit smarter than that. It won't try to write only once, it will actually arrange for data to continue to be written until there is no data left to write. This will happen even if you never await drain() - the only thing the application must do is let the event loop run its course for long enough to write everything out.
A more correct pseudo code of write and drain might look like this:
class ToyWriter:
    def __init__(self):
        self._buf = bytearray()
        self._empty = asyncio.Event()
        self._empty.set()  # the buffer starts out empty

    def write(self, data):
        self._buf.extend(data)
        loop.add_writer(self._fd, self._do_write)
        self._empty.clear()

    def _do_write(self):
        # Automatically invoked by the event loop when the
        # file descriptor is writable, regardless of whether
        # anyone calls drain()
        while self._buf:
            try:
                nwritten = os.write(self._fd, self._buf)
            except OSError as e:
                if e.errno == errno.EWOULDBLOCK:
                    return  # continue once we're writable again
                raise
            self._buf = self._buf[nwritten:]
        self._empty.set()
        loop.remove_writer(self._fd, self._do_write)

    async def drain(self):
        if len(self._buf) > 64*1024:
            await self._empty.wait()
The actual implementation is more complicated because:
it's written on top of a Twisted-style transport/protocol layer with its own sophisticated flow control, not on top of os.write;
drain() doesn't really wait until the buffer is empty, but until it reaches a low watermark;
exceptions other than EWOULDBLOCK raised in _do_write are stored and re-raised in drain().
The last point is another good reason to call drain() - to actually notice that the peer is gone by the fact that writing to it is failing.
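For example (a sketch added here, assuming a StreamWriter named writer and some bytes to send in data; the exception types listed are the usual suspects rather than an exhaustive set):
try:
    writer.write(data)
    await writer.drain()
except (ConnectionResetError, BrokenPipeError):
    # drain() is where failed background writes surface,
    # so this is where you learn the peer went away
    writer.close()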
Hi guys, I'm working on a script that will get data from a host using the Data Communications Standard (developed by the Data Communication Standard Committee, Lens Processing Division of The Vision Council) over a serial port, and pass the data into the Modbus protocol so the device can perform its operations.
Since I don't physically have access to the host machine, I'm trying to develop a secondary script to emulate the host. I am currently at the stage where I need to read a lot of information from the serial port, and I only get part of the data. I was hoping to get the whole string sent by the send_job() function in my host emulator script.
Also, can any of you tell me if this would be a good approach? The only thing the machine is supposed to do is grab two values from the host response and assign them to two Modbus holding registers.
NOTE: the initialization function is hard-coded because it will always be the same, and the actual response data will not matter except for the status. The job request is also hard-coded; I only pass the job # that I get from a Modbus holding register. The exact logic for how the host resolves this should not matter; I only need to send the job number scanned from the device in this format.
main script:
def request_job_modbus(job):
    data = F'[06][1c]req=33[0d][0a]job={job}[0d][0a][1e][1d]'.encode('ascii')
    writer(data)

def get_job_from_serial():
    response = serial_client.read_all()
    resp = response.decode()
    return resp

# TODO : SEND INIT SEQUENCE ONCE AND VERIFY IF REQUEST status=0
initiation_request()
init_response_status = get_init_status()
print('init method being active')
print(get_init_status())

while True:
    # TODO: get job request data
    job_serial = get_job_from_serial()
    print(job_serial)
host emulation script:
def send_job():
    job_response = '''[06][1c]ans=33[0d]job=30925[0d]status=0;"ok"[0d]do=l[0d]add=;2.50[0d]ar=1[0d]
bcerin=;3.93[0d]bcerup=;-2.97[0d]crib=;64.00[0d]do=l[0d]ellh=;64.00[0d]engmask=;613l[0d]
erdrin=;0.00[0d]erdrup=;10.00[0d]ernrin=;2.00[0d]ernrup=;-8.00[0d]ersgin=;0.00[0d]
ersgup=;4.00[0d]gax=;0.00[0d]gbasex=;-5.30[0d]gcrosx=;-7.96[0d]kprva=;275[0d]kprvm=;0.55[0d]
ldpath=\\uscqx-tcpmain-at\lds\iot\do\800468.sdf[0d]lmatid=;151[0d]lmatname=;f50[0d]
lnam=;vsp_basic_fh15[0d]sgerin=;0.00[0d]sgerup=;0.00[0d]sval=;5.18[0d]text_11=;[0d]
text_12=;[0d]tind=;1.53[0d][1e][1d]'''.encode('ascii')
    writer(job_response)

def get_init_request():
    req = p.readline()
    print(req)
    request = req.decode()[4:11]
    # print(request)
    if request == 'req=ini':
        print('request == req=ini??? <<<<<<< condition met, sending the response')
        send_init_response()
        send_job()

while True:
    # print(get_init_request())
    get_init_request()
What I get on screen (main script):
init method being active
bce
erd
condition was met init status=0
outside loop
ers
condition was met init status=0
inside while loop
trigger reset <<<--------------------
5782
`:lmatid=;151[0d]lmatname=;f50[0d]
lnam=;vsp_basic_fh15[0d]sgerin=;0.00[0d]sgerup=;0.00[0d]sval=;5.18[0d]text_11=;[0d]
text_12=;[0d]tind=;1.53[0d][1e][1d]
outside loop
condition was met init status=0
outside loop
What I get on screen (host emulation script):
b'[1c]req=ini[0d][0a][1e][1d]'
request == req=ini??? <<<<<<< condition met, sending the response
b''
b'[06][1c]req=33[0d][0a]job=5782[0d][0a][1e][1d]'
b''
b''
b''
b''
b''
b''
I suspect you're trying to write too much at once to a hardware buffer that is fairly small. Especially when dealing with low-power hardware, assuming you can stuff an entire message into a buffer is often not correct. Even fully modern PCs sometimes have very small buffers for legacy hardware like serial ports. You may find, when you switch from development to actual hardware, that the RTS and DTR lines need to be used to determine when to send or receive data. That is unfortunately up to whoever designed the hardware, as those lines are also often ignored.
I would try chunking your data transfer into smaller bits as a test to see if the whole message gets through. This is a quick and dirty first attempt that may have bugs, but it should get you down the right path:
def get_job_from_serial():
    response = b''  # buffer for response
    while True:
        try:
            response += serial_client.read()  # read any available data or wait for timeout
            # this technically could only be reading 1 char at a time, but any
            # remotely modern pc should easily keep up with 9600 baud
        except serial.SerialTimeoutException:  # timeout probably means end of data
            # you could also presumably check the length of the buffer if it's always
            # a fixed length to determine if the entire message has been sent yet.
            break
    return response

def writer(command):
    written = 0      # how many bytes have we actually written
    chunksize = 128  # the smaller you go, the less likely to overflow
                     # a buffer, but the slower you go.
    while written < len(command):
        # you presumably might have to wait for p.dtr() == True or similar
        # though it's just as likely to not have been implemented.
        written += p.write(command[written:written+chunksize])
        p.flush()  # probably don't actually need this
P.S. I had to go to the source code for p.read_all (for some reason I couldn't find it online), and it does not do what I think you expect it to do. The exact code for it is:
def read_all(self):
    """\
    Read all bytes currently available in the buffer of the OS.
    """
    return self.read(self.in_waiting)
There is no concept of waiting for a complete message; it is just shorthand for grabbing everything currently available.
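For instance, if the messages in this protocol emulation always end with the literal text [1e][1d] (as they appear to in the question), a sketch of reading until that terminator instead of relying on read_all() might look like the following. The terminator, the name port, and a configured read timeout are assumptions on my part:
def read_message(port, terminator=b'[1e][1d]'):
    buf = b''
    while terminator not in buf:
        chunk = port.read(port.in_waiting or 1)  # grab whatever is buffered, or block for one byte
        if not chunk:  # read timeout with nothing new; return what we have
            break
        buf += chunk
    return buf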
I'm writing an app using Python and sockets. Here is a piece of the server code:
while True:
    c = random.choice(temp_deck)
    temp_deck.remove(c)
    if hakem == p1:
        p1.send(pickle.dumps(('{} for {}'.format(c,'you'),False)))
        p2.send(pickle.dumps(('{} for {}'.format(c,'other'),False)))
    else:
        p1.send(pickle.dumps(('{} for {}'.format(c,'other'),False)))
        p2.send(pickle.dumps(('{} for {}'.format(c,'you'),False)))
    if c in ['A♠','A♣','A♦','A♥']:
        if hakem == p1:
            p1.send(pickle.dumps(('You are Hakem!',False)))
            p2.send(pickle.dumps(('Other Player is Hakem!',False)))
            break
        else:
            p1.send(pickle.dumps(('Other Player is Hakem!',False)))
            p2.send(pickle.dumps(('You are Hakem!',False)))
            break
    if hakem == p1:
        hakem = p2
        other = p1
    else:
        hakem = p1
        other = p2
This needs two clients to connect. Everything is fine, except the clients don't receive the full data.
For example, one gets:
3♠ for other
2♠ for you
10♣ for other
10♦ for you
A♣ for other
the other gets:
2♠ for you
10♣ for other
10♦ for you
A♣ for other
What should I do?
client code:
import socket
import pickle

s = socket.socket()
host = socket.gethostname()
port = 12345

s.connect((host, port))
while True:
    o = pickle.loads(s.recv(1024))
    print(o[0])
    if o[1] == True:
        s.send(pickle.dumps(input(">")))
s.close
The problem is that TCP sockets are byte streams, not message streams. When you send some data and the client does a recv, there's no guarantee that it will receive everything you sent. It may get half the message. It may get multiple messages at once.
I've explained this at some length in a blog post—but fortunately, you're actually only hitting half the problem, and it's ultimately the simpler half. You've chosen to use a stream of pickle messages as your protocol, and pickle is a self-delimiting (aka framed) protocol.
pickle.load can load pickle after pickle out of anything with a file-like interface. And if your client and server are built around blocking I/O (e.g., using a thread for each direction on the socket), you can simulate read by doing blocking recv calls and appending them onto a buffer until you have enough bytes to satisfy the read.
And, even better, you don't have to do that yourself, because that's exactly what the builtin socket.makefile does. I haven't done any more than a quick test with this, so I won't promise it's bulletproof, but…
On the client side, you probably have something like this:
sock.connect(...)
# more stuff
# in a loop somewhere
buf = sock.recv(16384)
msg = pickle.loads(buf)
# later
sock.close()
Change it to this:
sock.connect(...)
rfile = sock.makefile('rb')
# more stuff
# in a loop somewhere
msg = pickle.load(rfile)
# later
rfile.close()
sock.close()
And it just works.
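The sending side can be treated the same way. A quick sketch (an addition here, assuming the server holds a connected socket named conn), not something from the original code:
wfile = conn.makefile('wb')
# wherever the server currently does conn.send(pickle.dumps(msg)):
pickle.dump(msg, wfile)
wfile.flush()  # push the pickle out; each pickle is self-delimiting on the wire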
Again, you should test this. And you should read either my blog post, or a more complete primer on sockets programming and TCP, to understand what's going on. And really, you're probably better off designing your app around a higher-level framework (asyncio is really cool, especially with the syntactic support in Python 3.5+, or I think Twisted already has a pickle protocol class pre-written for you…). But this may be enough to get you started.
I'm building a simple server-client app using sockets. Right now, I am trying to get my client to print to the console only when it receives a specific message (actually, when it doesn't receive a specific message), but for some reason, every other time I run it, it goes through the other branch in my code and is really inconsistent: sometimes it will work as it should, and then it will randomly break for a couple of uses.
Here is the code on my client side:
def post_checker(client_socket):
    response = client_socket.recv(1024)
    # check if response is "NP" for a new post from another user
    if response == "NP":
        new_response = client_socket.recv(1024)
        print new_response
    else:  # print original message being sent
        print response
where post_checker is called in the main function simply as post_checker(client_socket). Basically, sometimes I get "NPray" printed to my console (when the client only expects to receive the username "ray") and other times it will print correctly.
Here is the related server code:
for sublist in user_list:
    client_socket.send("NP")
    client_socket.send(sublist[1] + " ")
where user_list is a nested list and sublist[1] is the username I wish to print out on the client side.
What's going on here?
The nature of your problem is that TCP is a streaming protocol. The bufsize in recv(bufsize) is a maximum size. The recv function will always return data when available, even if not all of the bytes have been received.
See the documentation for details.
This causes problems when you've only sent half the bytes, but you've already started processing the data. I suggest you take a look at the "recvall" concept from this site or you can also consider using UDP sockets (which would solve this problem but may create a host of others as UDP is not a guaranteed protocol).
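A minimal "recvall"-style helper along those lines (a sketch, under the assumption that you know the expected message length up front) could look like this:
def recvall(sock, length):
    """Keep calling recv() until exactly `length` bytes have been collected."""
    data = b""
    while len(data) < length:
        chunk = sock.recv(length - len(data))
        if not chunk:  # connection closed before the full message arrived
            raise EOFError("socket closed mid-message")
        data += chunk
    return data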
You may also want to let the Python packages handle some of the underlying framework for you. Consider using a SocketServer, as documented in the standard library.
buffer = []

def recv(sock):
    global buffer
    message = b""
    while True:
        if not (b"\r\n" in b"".join(buffer)):
            chunk = sock.recv(1024)
            if not chunk:
                break
            buffer.append(chunk)
        concat = b"".join(buffer)
        if (b"\r\n" in concat):
            message = concat[:concat.index(b"\r\n")]
            concat = concat[concat.index(b"\r\n") + 2:]
            buffer = [concat]
            break
    return message

def send(sock, data):
    sock.send(data + b"\r\n")
I have tested this, and in my opinion it works perfectly.
My use case: I have two scripts that send data quickly, and sooner or later one of the buffers receives more than it should and the messages run together. With this script, whatever extra is received stays saved, and it keeps receiving until there is a newline in the data; then it splits on the newline, saves the remainder, and returns the message cleanly separated.
(I translated this, so please excuse me if anything is wrong or misunderstood)
I have a client that connects to an HTTP stream and logs the text data it consumes.
I send the streaming server an HTTP GET request... The server replies and continuously publishes data... It will either publish text or send a ping (text) message regularly... and will never close the connection.
I need to read and log the data it consumes in a non-blocking manner.
I am doing something like this:
import urllib2

req = urllib2.urlopen(url)
for dat in req:
    with open('out.txt', 'a') as f:
        f.write(dat)
My questions are:
will this ever block when the stream is continuous?
how much data is read in each chunk and can it be specified/tuned?
is this the best way to read/log an http stream?
Hey, that's three questions in one! ;-)
It could block sometimes - even if your server is generating data quite quickly, network bottlenecks could in theory cause your reads to block.
Reading the URL data using "for dat in req" will mean reading a line at a time - not really useful if you're reading binary data such as an image. You get better control if you use
chunk = req.read(size)
which can of course block.
Whether it's the best way depends on specifics not available in your question. For example, if you need to run with no blocking calls whatever, you'll need to consider a framework like Twisted. If you don't want blocking to hold you up and don't want to use Twisted (which is a whole new paradigm compared to the blocking way of doing things), then you can spin up a thread to do the reading and writing to file, while your main thread goes on its merry way:
def func(req):
    # code the read from the URL stream and write to file here
    ...

t = threading.Thread(target=func, args=(req,))
t.start()  # will execute func in a separate thread
...
t.join()  # will wait for spawned thread to die
Obviously, I've omitted error checking/exception handling etc. but hopefully it's enough to give you the picture.
You're using too high-level an interface to have good control about such issues as blocking and buffering block sizes. If you're not willing to go all the way to an async interface (in which case twisted, already suggested, is hard to beat!), why not httplib, which is after all in the standard library? HTTPResponse instance .read(amount) method is more likely to block for no longer than needed to read amount bytes, than the similar method on the object returned by urlopen (although admittedly there are no documented specs about that on either module, hmmm...).
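A rough sketch of that httplib approach (added here; httplib is the Python 2 module, renamed http.client in Python 3, and the host, path and 1024-byte read size are placeholders):
import httplib

conn = httplib.HTTPConnection('stream.example.com')
conn.request('GET', '/feed')
resp = conn.getresponse()
while True:
    chunk = resp.read(1024)  # read up to 1024 bytes of the body at a time
    if not chunk:            # empty string means the server closed the stream
        break
    with open('out.txt', 'a') as f:
        f.write(chunk)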
Another option is to use the socket module directly. Establish a connection, send the HTTP request, set the socket to non-blocking mode, and then read the data with socket.recv() handling 'Resource temporarily unavailable' exceptions (which means that there is nothing to read). A very rough example is this:
import socket, time

BUFSIZE = 1024

s = socket.socket()
s.connect(('localhost', 1234))
s.send('GET /path HTTP/1.0\n\n')
s.setblocking(False)

running = True

while running:
    try:
        print "Attempting to read from socket..."
        while True:
            data = s.recv(BUFSIZE)
            if len(data) == 0:  # remote end closed
                print "Remote end closed"
                running = False
                break
            print "Received %d bytes: %r" % (len(data), data)
    except socket.error, e:
        if e[0] != 11:  # Resource temporarily unavailable
            print e
            raise
    # perform other program tasks
    print "Sleeping..."
    time.sleep(1)
However, urllib.urlopen() has some benefits if the web server redirects, if you need URL-based basic authentication, and so on. You could also make use of the select module, which will tell you when there is data to read.
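For example, a rough sketch of polling with select (added here, reusing the s and BUFSIZE names from the snippet above):
import select

readable, _, _ = select.select([s], [], [], 1.0)  # wait up to 1 second for data
if readable:
    data = s.recv(BUFSIZE)  # the socket is readable, so this recv returns immediately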
Yes, when you catch up with the server it will block until the server produces more data.
Each dat will be one line, including the newline on the end.
twisted is a good option.
I would swap the with and for around in your example; do you really want to open and close the file for every line that arrives?
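Based on the snippet in the question, the swapped version would look something like this:
with open('out.txt', 'a') as f:
    for dat in req:
        f.write(dat)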