Misuse of yield - python

I'm making a SocketServer that will need to handle a lot of commands. To keep my RequestHandler from becoming too long, it calls a different function depending on the command. My dilemma is how to make those functions send info back to the client.
Currently I'm making the functions "yield" everything they want to send back to the client, but I suspect that's probably not the Pythonic way.
# RequestHandler
func = __commands__.get(command, unkown_command)
for message in func():
    self.send(message)
# example_func
def example():
    yield 'ip: {}'.format(ip)
    yield 'count: {}'.format(count)
    . . .
    for ping in pinger(ip, count):
        yield ping
Is this an ugly use of yield? The only alternative I can think of is having the RequestHandler pass itself as an argument when it calls the function:
func(self)
and then in the function
def example(handler):
    . . .
    handler.send('ip: {}'.format(ip))
But this way doesn't feel much better.

def example():
    yield 'ip: {}'.format(ip)
    yield 'count: {}'.format(count)
What strikes me as strange in this solution is not the use of yield itself (which can be perfectly valid) but the fact that you're losing a lot of information by turning your data into strings prematurely.
In particular, for this kind of data, simply returning a dictionary and handling the sending in the caller seems more readable:
def example():
    return {'ip': ip, 'count': count}
This also helps you separate content and presentation, which might be useful if you want, for example, to return data encoded in XML but later switch to JSON.
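For instance, the caller could then choose the encoding itself (a small sketch, reusing the handler's send method from the question):

import json

data = example()               # {'ip': ..., 'count': ...}
self.send(json.dumps(data))    # swap json for an XML serializer without touching example()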
If you want to yield intermediate data, another possibility is using tuples, e.g. yield ('ip', ip). That way you keep the original data and can start processing the values immediately outside the function.
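A minimal sketch of the tuple variant, reusing the asker's pinger helper and the handler's send method (the formatting in the caller is illustrative):

def example(ip, count):
    yield ('ip', ip)
    yield ('count', count)
    for ping in pinger(ip, count):
        yield ('ping', ping)

# In the RequestHandler: presentation is decided here, not inside example()
for key, value in example(ip, count):
    self.send('{}: {}'.format(key, value))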

I do the same as you with yield. The reason for this is simple:
With yield the main loop can easily handle the case where sending data to one socket would block. Each socket gets a buffer for outgoing data that you fill with the yielded chunks. The main loop tries to send as much of that buffer as possible to the socket; when it would block, it records how far it got and waits for the socket to be ready for more. When the buffer is empty, it runs next(func) to get the next chunk of data.
I don't see how you would do that with handler.send('ip: {}'.format(ip)). When that socket blocks, you are stuck: you can't pause that send and handle other sockets easily.
Now for this to be useful there are some assumptions:
the data each yield produces is considerable and you don't want to generate all of it into one massive buffer ahead of time
generating the data for each yield takes time and you want to send the finished parts right away
you want to use reply = yield data to wait for the peer to respond to the data in some way. Yes, you can make this a back and forth: next(func) becomes func.send(reply).
Any of these is a good reason to go the yield way or coroutines in general. The alternative seems to be to use one thread per socket.
Note: the func can also delegate to other generators using yield from. That makes it easy to split a large problem into smaller handlers and to share common parts.
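A rough sketch of the idea, under the assumptions above (all names here are illustrative, not the asker's code):

def whois_handler(ip):
    yield 'ip: {}\n'.format(ip)
    yield from lookup_details(ip)    # delegate to a shared sub-generator

def on_writable(sock, gen, out_buffers):
    # Call this whenever the select/poll loop reports sock as writable.
    buf = out_buffers.get(sock, b'')
    if not buf:
        try:
            buf = next(gen).encode()
        except StopIteration:
            sock.close()
            return
    sent = sock.send(buf)            # non-blocking socket: sends what it can
    out_buffers[sock] = buf[sent:]   # keep the remainder for the next round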

Related

PyQt readyRead: set text from serial to multiple labels

In PyQt5, I want to read my serial port after writing (requesting a value) to it. I've got it working using readyRead.connect(self.readingReady), but then I'm limited to outputting to only one text field.
The code for requesting parameters sends a string to the serial port. After that, I'm reading the serial port using the readingReady function and printing the result to a plainTextEdit form.
def read_configuration(self):
    if self.serial.isOpen():
        self.serial.write(f"?request1\n".encode())
        self.label_massGainOutput.setText(f"{self.serial.readAll().data().decode()}"[:-2])
        self.serial.write(f"?request2\n".encode())
        self.serial.readyRead.connect(self.readingReady)
        self.serial.write(f"?request3\n".encode())
        self.serial.readyRead.connect(self.readingReady)

def readingReady(self):
    data = self.serial.readAll()
    if len(data) > 0:
        self.plainTextEdit_commandOutput.appendPlainText(f"{data.data().decode()}"[:-2])
    else:
        self.serial.flush()
The problem I have is that I want every answer from the serial port to go to a different plainTextEdit form. The only solution I see now is to write a separate readingReady function for every request (and I have a lot! Only three are shown now). This must be possible in a better way. Maybe using arguments in the readingReady function? Or returning a value from the function that I can redirect to the correct form?
Without using the readyRead signal, all my values are one behind. So the first request prints nothing, the second prints the first etc. and the last is not printed out.
Does someone have a better way to implement this functionality?
QSerialPort has an asynchronous API (readyRead) and a synchronous API (waitForReadyRead). If you only read the configuration once at startup and UI freezing during this process is not critical for you, you can use the synchronous API.
serial.write(f"?request1\n".encode())
serial.waitForReadyRead()
res = serial.read(10)
serial.write(f"?request2\n".encode())
serial.waitForReadyRead()
res = serial.read(10)
This simplification assumes that responses come in one chunk and that each message is at most 10 bytes, which is not guaranteed. Actual code should be something like this:
def isCompleteMessage(res):
    # your code here

serial.write(f"?request2\n".encode())
res = b''
while not isCompleteMessage(res):
    serial.waitForReadyRead()
    res += serial.read(10)
Alternatively, you can create a worker or thread, open the port and query the requests in it synchronously, and deliver the responses to the application using signals - no freezes, clear code, but a slightly more complicated system.
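If you stay with readyRead, one way to route each answer to its own widget is to keep a queue of (request, target widget) pairs and consume it in a single slot. A rough sketch, assuming complete line-terminated responses (the widget and request names are placeholders, not your actual forms):

def read_configuration(self):
    # Queue of (request, widget that should display the answer).
    self.pending = [
        (b"?request1\n", self.plainTextEdit_output1),
        (b"?request2\n", self.plainTextEdit_output2),
        (b"?request3\n", self.plainTextEdit_output3),
    ]
    self.buffer = b""
    self.serial.readyRead.connect(self.readingReady)   # connect once
    self.serial.write(self.pending[0][0])

def readingReady(self):
    self.buffer += self.serial.readAll().data()
    if not self.pending or not self.buffer.endswith(b"\n"):
        return                                   # wait for the rest of the answer
    _, target = self.pending.pop(0)
    target.setPlainText(self.buffer.decode().strip())
    self.buffer = b""
    if self.pending:                             # send the next queued request
        self.serial.write(self.pending[0][0])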

Why should asyncio.StreamWriter.drain be explicitly called?

From the docs:
write(data)
    Write data to the stream.
    This method is not subject to flow control. Calls to write() should be followed by drain().

coroutine drain()
    Wait until it is appropriate to resume writing to the stream. Example:

    writer.write(data)
    await writer.drain()
From what I understand,
You need to call drain every time write is called.
If not I guess, write will block the loop thread
Then why is write not a coroutine that calls it automatically? Why would one call write without having to drain? I can think of two cases:
You want to write and close immediately.
You have to buffer some data before the message is complete.
The first one is a special case; I think we can have a different API for it. Buffering should be handled inside the write function and the application should not care.
Let me put the question differently: what is the drawback of doing this? Does the Python 3.8 version effectively do this?
async def awrite(writer, data):
    writer.write(data)
    await writer.drain()
Note: drain doc explicitly states the below:
When there is nothing to wait for, the drain() returns immediately.
Reading the answer and links again, I think the functions work like this. Note: check the accepted answer for a more accurate version.
def write(data):
    remaining = socket.try_write(data)
    if remaining:
        _pendingbuffer.append(remaining)  # Buffer will keep growing if other side is slow and we have a lot of data

async def drain():
    if len(_pendingbuffer) < BUF_LIMIT:
        return
    await wait_until_other_side_is_up_to_speed()
    assert len(_pendingbuffer) < BUF_LIMIT

async def awrite(writer, data):
    writer.write(data)
    await writer.drain()
So when to use what:
When the data is not continuous, like responding to an HTTP request: we just need to send some data, don't care when it arrives, and memory is not a concern - just use write.
Same as above, but memory is a concern - use awrite.
When streaming data to a large number of clients (e.g. some live stream or a huge file), the data would be duplicated in each connection's buffer and would definitely overflow RAM. In this case, write a loop that takes a chunk of data on each iteration and calls awrite, as in the sketch below. In the case of a huge file, loop.sendfile is better if available.
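For instance, a minimal sketch of that chunked loop, using the awrite pattern from above (read_chunk stands in for whatever produces the data):

async def stream_to(writer, read_chunk, chunk_size=64 * 1024):
    while True:
        chunk = await read_chunk(chunk_size)
        if not chunk:
            break
        writer.write(chunk)
        await writer.drain()     # pauses only when this client falls behind
    writer.close()
    await writer.wait_closed()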
From what I understand, (1) You need to call drain every time write is called. (2) If not I guess, write will block the loop thread
Neither is correct, but the confusion is quite understandable. The way write() works is as follows:
A call to write() just stashes the data to a buffer, leaving it to the event loop to actually write it out at a later time, and without further intervention by the program. As far as the application is concerned, the data is written in the background as fast as the other side is capable of receiving it. In other words, each write() will schedule its data to be transferred using as many OS-level writes as it takes, with those writes issued when the corresponding file descriptor is actually writable. All this happens automatically, even without ever awaiting drain().
write() is not a coroutine, and it absolutely never blocks the event loop.
The second property sounds convenient - you can call write() wherever you need to, even from a function that's not async def - but it's actually a major flaw of write(). Writing as exposed by the streams API is completely decoupled from the OS accepting the data, so if you write data faster than your network peer can read it, the internal buffer will keep growing and you'll have a memory leak on your hands. drain() fixes that problem: awaiting it pauses the coroutine if the write buffer has grown too large, and resumes it again once the os.write()'s performed in the background are successful and the buffer shrinks.
You don't need to await drain() after every write, but you do need to await it occasionally, typically between iterations of a loop in which write() is invoked. For example:
while True:
    response = await peer1.readline()
    peer2.write(b'<response>')
    peer2.write(response)
    peer2.write(b'</response>')
    await peer2.drain()
drain() returns immediately if the amount of pending unwritten data is small. If the data exceeds a high threshold, drain() will suspend the calling coroutine until the amount of pending unwritten data drops beneath a low threshold. The pause will cause the coroutine to stop reading from peer1, which will in turn cause the peer to slow down the rate at which it sends us data. This kind of feedback is referred to as back-pressure.
Buffering should be handled inside write function and application should not care.
That is pretty much how write() works now - it does handle buffering and it lets the application not care, for better or worse. Also see this answer for additional info.
Addressing the edited part of the question:
Reading the answer and links again, I think the functions work like this.
write() is still a bit smarter than that. It won't try to write only once, it will actually arrange for data to continue to be written until there is no data left to write. This will happen even if you never await drain() - the only thing the application must do is let the event loop run its course for long enough to write everything out.
A more correct pseudo code of write and drain might look like this:
# Pseudocode: 'loop' is the running event loop and self._fd the socket's
# file descriptor; both are assumed to exist.
class ToyWriter:
    def __init__(self):
        self._buf = bytearray()
        self._empty = asyncio.Event()
        self._empty.set()           # the buffer starts out empty

    def write(self, data):
        self._buf.extend(data)
        loop.add_writer(self._fd, self._do_write)
        self._empty.clear()

    def _do_write(self):
        # Automatically invoked by the event loop when the
        # file descriptor is writable, regardless of whether
        # anyone calls drain()
        while self._buf:
            try:
                nwritten = os.write(self._fd, self._buf)
            except OSError as e:
                if e.errno == errno.EWOULDBLOCK:
                    return  # continue once we're writable again
                raise
            self._buf = self._buf[nwritten:]
        self._empty.set()
        loop.remove_writer(self._fd, self._do_write)

    async def drain(self):
        if len(self._buf) > 64*1024:
            await self._empty.wait()
The actual implementation is more complicated because:
it's written on top of a Twisted-style transport/protocol layer with its own sophisticated flow control, not on top of os.write;
drain() doesn't really wait until the buffer is empty, but until it reaches a low watermark;
exceptions other than EWOULDBLOCK raised in _do_write are stored and re-raised in drain().
The last point is another good reason to call drain() - to actually notice that the peer is gone by the fact that writing to it is failing.
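For example (a sketch; the exact exception depends on the transport, ConnectionResetError being typical):

writer.write(data)           # "succeeds" even if the peer already disconnected
try:
    await writer.drain()     # a stored write error is re-raised here
except ConnectionResetError:
    writer.close()           # the peer is gone - stop writing and clean up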

Generator for n-records of a real time stream

I subscribe to a real-time stream which publishes a small JSON record at a slow rate (0.5 KB every 1-5 seconds). The publisher has provided a Python client that exposes these records, and I write these records to a list in memory. The client is just a Python wrapper for doing a curl command on an HTTPS endpoint for a dataset. A dataset is defined by filters and fields. I can let the client run for a few days and stop it at midnight to process multiple days' worth of data as one batch.
Instead of the multi-day batches described above, I'd like to write out every n records by treating the stream as a generator. The client code is below; I just added the append() line to create a list called 'records' (in memory) to play back later:
records = []
data_set = api.get_dataset(dataset_id='abc')
for record in data_set.request_realtime():
    records.append(record)
which, as expected, shows [*] in Jupyter Notebook and keeps running.
Then, I created a generator from my list in memory as follows to extract one record (n=1 for initial testing):
def Generator():
    count = 1
    while count < 2:
        for r in records:
            yield r.data
        count += 1
But my generator cell also gave me [*] and kept running, which I understand is because the list is still being written to in memory. I thought my generator would be able to lock the state of my list and yield the first n records, but it didn't. How can I code my generator in this case? And if a generator is not a good choice for this use case, please advise.
To give you the full picture, if my code was working, then, I'd have instantiated it, printed it, and received an object as expected like this:
>>>my_generator = Generator()
>>>print(my_generator)
<generator object Gen at 0x0000000009910510>
Then, I'd have written it to a csv file like so:
with open('myfile.txt', 'w') as f:
cf = csv.DictWriter(f, column_headers, extrasaction='ignore')
cf.writeheader()
cf.writerows(i.data for i in my_generator)
Note: I know there are many tools for this, e.g. Kafka, but I am in an initial PoC phase. Please use Python 2.x. Once I get my code working, I plan on stacking generators to set up my next n-record extraction so that I don't lose data in between. Any guidance on stacking would also be appreciated.
That's not how concurrency works. Unless some magic is being used that you didn't tell us about, you can't run more code while your first cell shows [*]. Putting the generator in another cell just adds it to a queue to run when the first cell finishes - and since the first cell will never finish, the second will never even start running!
I suggest looking into some asynchronous networking library, like asyncio, twisted or trio. They allow you to make functions cooperative so while one of them is waiting for data, the other can run, instead of blocking. You'd probably have to rewrite the api.get_dataset code to be asynchronous as well.
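If rewriting the client is not an option, a simpler alternative is to do the batching inside the one loop that already reads the stream, instead of from a second cell. A rough sketch (the batch size and the write_batch helper are illustrative; works on Python 2):

N = 100          # flush every N records
batch = []

for record in data_set.request_realtime():
    batch.append(record.data)
    if len(batch) >= N:
        write_batch(batch)   # e.g. csv.DictWriter.writerows(batch) here
        batch = []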

Socket issues in Python

I'm building a simple server-client app using sockets. Right now, I am trying to get my client to print to the console only when it receives a specific message (actually, when it doesn't receive a specific message). But for some reason, every other time I run it, it goes through the other branch in my code, and the behaviour is really inconsistent - sometimes it works as it should, and then it randomly breaks for a couple of runs.
Here is the code on my client side:
def post_checker(client_socket):
    response = client_socket.recv(1024)
    # check if response is "NP" for a new post from another user
    if response == "NP":
        new_response = client_socket.recv(1024)
        print new_response
    else:  # print original message being sent
        print response
where post_checker is called in the main function simply as post_checker(client_socket). Basically, sometimes I get "NPray" printed to my console (when the client only expects to receive the username "ray"), and other times it prints correctly.
Here is the corresponding server code:
for sublist in user_list:
    client_socket.send("NP")
    client_socket.send(sublist[1] + " ")
where user_list is a nested list and sublist[1] is the username I wish to print out on the client side.
What's going on here?
The nature of your problem is that TCP is a streaming protocol. The bufsize in recv(bufsize) is only a maximum: recv returns whatever data is currently available, which may be less than what was sent, or the contents of several sends glued together (which is how you end up with "NPray").
See the documentation for details.
This causes problems when you've only sent half the bytes but have already started processing the data. I suggest you take a look at the "recvall" concept from this site, or you can also consider using UDP sockets (which would solve this problem but may create a host of others, as UDP is not a guaranteed protocol).
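A minimal sketch of the recvall idea, assuming a fixed-size length prefix in front of every message (the framing is illustrative, not part of the asker's protocol):

import struct

def send_msg(sock, data):
    # prefix every message with its length so the receiver knows where it ends
    sock.sendall(struct.pack("!I", len(data)) + data)

def recvall(sock, n):
    # keep calling recv until exactly n bytes have been collected
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise EOFError("socket closed early")
        buf += chunk
    return buf

def recv_msg(sock):
    (length,) = struct.unpack("!I", recvall(sock, 4))
    return recvall(sock, length)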
You may also want to let the Python standard library handle some of the underlying framework for you; consider using a SocketServer as documented in the standard library.
buffer = []

def recv(sock):
    global buffer
    message = b""
    while True:
        if not (b"\r\n" in b"".join(buffer)):
            chunk = sock.recv(1024)
            if not chunk:
                break
            buffer.append(chunk)
        concat = b"".join(buffer)
        if b"\r\n" in concat:
            message = concat[:concat.index(b"\r\n")]
            concat = concat[concat.index(b"\r\n") + 2:]
            buffer = [concat]
            break
    return message

def send(sock, data):
    sock.send(data + b"\r\n")
I have tested this and, in my opinion, it works perfectly.
My use case: I have two scripts that send data quickly, and every now and then one of the buffers receives more than it should, so messages get glued together. With this recv, the extra bytes are kept in the buffer, and it keeps receiving until there is a newline in the data; then it splits on the newline, saves the rest, and returns the message cleanly separated.
(I translated this, so please excuse me if anything is wrong or misunderstood.)

Is there a way to check whether the data buffer for a socket is empty or not in python?

I want to verify if the socket data buffer is empty or not before calling socket.recv(bufsize[, flags]). Is there a way to do that?
You can peek (look without actually consuming the data):
data = conn.recv(bufsize, socket.MSG_PEEK)
You might want to make your socket non-blocking:
socket.setblocking(0)
After that call, if you read from a socket without available data it will not block, but instead an exception is raised. See socket.setblocking(flag) for details.
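For example, a small Python 3 sketch of that pattern (sock is assumed to be an already-connected socket):

sock.setblocking(False)
try:
    data = sock.recv(4096)    # returns immediately
except BlockingIOError:
    data = None               # nothing in the receive buffer right now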
For more advanced uses, you might look at select. Something like:
r, _, _ = select([socket], [], [], 0)  # timeout of 0 to poll (and not block)
will tell you whether there is data available to read on your socket.
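Putting it together, a minimal sketch of polling before a read (sock is assumed to be an already-connected socket):

import select

readable, _, _ = select.select([sock], [], [], 0)   # timeout 0: poll, don't block
if readable:
    data = sock.recv(4096)    # data (or EOF) is already waiting, so this won't block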
