I am writing a little Python script that parses the input from a QR reader (which is seen as a keyboard by the system).
At the moment I am using raw_input() but this function waits for an EOF/end-of-line symbol in order to submit the received string to the program.
I am wondering if there is a way to continuously parse the input string, rather than in chunks delimited by a line ending.
In practice:
- is there a way in Python to asynchronously and continuously parse console input?
- is there a way to change raw_input() (or an equivalent function) to use a different character as the terminator that submits the string to the program?
It seems like you're generally trying to solve two problems:
Read input in chunks
Parse that input asynchronously
For the first part, it will vary greatly based on the specifics of the input function you're calling, but for standard input you could use something like
sys.stdin.read(1)
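For example, here is a minimal sketch that accumulates characters until a chosen terminator; the '\r' terminator and the handle_code() helper are assumptions for illustration, not something defined by your reader:

import sys

buf = []
while True:
    ch = sys.stdin.read(1)
    if not ch:                        # EOF: the input stream was closed
        break
    if ch == '\r':                    # assumed "submit" character from the QR reader
        handle_code(''.join(buf))     # handle_code is a placeholder for your own parsing
        buf = []
    else:
        buf.append(ch)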
As for parsing asynchronously, there are a number of approaches you could take. Python is synchronous, so you will necessarily have to involve some subprocess calls. Manually spawning a function using the subprocess library is one option. You could also use something like Redis or some lightweight job queue to push input chunks onto and have them read and processed by another background script. Finally, gevent is a very popular coroutine-based library for spawning asynchronous processes. Using gevent, this whole setup would look something like this:
import sys
import gevent

class QRLoader(object):
    def __init__(self):
        self.data = []

    def add_data(self, data):
        self.data.append(data)

        # if self.data constitutes a full QR code,
        # do something with it
        gevent.spawn(parse_async)

def parse_async():
    # do something with qr_loader.data
    pass

qr_loader = QRLoader()

while True:
    data = sys.stdin.read(1)

    if data:
        qr_loader.add_data(data)
In PyQt5, I want to read my serial port after writing (requesting a value) to it. I've got it working using readyRead.connect(self.readingReady), but then I'm limited to outputting to only one text field.
The code for requesting parameters sends a string to the serial port. After that, I'm reading the serial port using the readingReady function and printing the result to a plainTextEdit form.
def read_configuration(self):
    if self.serial.isOpen():
        self.serial.write(f"?request1\n".encode())
        self.label_massGainOutput.setText(f"{self.serial.readAll().data().decode()}"[:-2])

        self.serial.write(f"?request2\n".encode())
        self.serial.readyRead.connect(self.readingReady)

        self.serial.write(f"?request3\n".encode())
        self.serial.readyRead.connect(self.readingReady)

def readingReady(self):
    data = self.serial.readAll()
    if len(data) > 0:
        self.plainTextEdit_commandOutput.appendPlainText(f"{data.data().decode()}"[:-2])
    else:
        self.serial.flush()
The problem I have, is that I want every answer from the serial port to go to a different plainTextEdit form. The only solution I see now is to write a separate readingReady function for every request (and I have a lot! Only three are shown now). This must be possible in a better way. Maybe using arguments in the readingReady function? Or returning a value from the function that I can redirect to the correct form?
Without using the readyRead signal, all my values are one behind. So the first request prints nothing, the second prints the first etc. and the last is not printed out.
Does someone have a better way to implement this functionality?
QSerialPort has an asynchronous API (readyRead) and a synchronous API (waitForReadyRead). If you only read the configuration once on start, and freezing the UI during that process is not critical for you, you can use the synchronous API.
serial.write(f"?request1\n".encode())
serial.waitForReadyRead()
res = serial.read(10)
serial.write(f"?request2\n".encode())
serial.waitForReadyRead()
res = serial.read(10)
This simplification assumes that responses arrive in one chunk and that each message is at most 10 bytes, which is not guaranteed. Actual code should be something like this:
def isCompleteMessage(res):
    # your code here

serial.write(f"?request2\n".encode())
res = b''
while not isCompleteMessage(res):
    serial.waitForReadyRead()
    res += serial.read(10)
Alternatively, you can create a worker or thread, open the port and query the requests in it synchronously, and deliver the responses to the application using signals: no freezes, clear code, a slightly more complicated system.
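A hedged sketch of that worker approach could look like the following; SerialWorker, the request list, and the reuse of isCompleteMessage() are illustrative assumptions, not an existing API:

from PyQt5.QtCore import QIODevice, QObject, pyqtSignal
from PyQt5.QtSerialPort import QSerialPort

class SerialWorker(QObject):
    responseReady = pyqtSignal(str, object)   # request string, raw response bytes
    finished = pyqtSignal()

    def __init__(self, port_name, requests):
        super().__init__()
        self.port_name = port_name
        self.requests = requests

    def run(self):
        serial = QSerialPort(self.port_name)   # open the port inside the worker's thread
        serial.open(QIODevice.ReadWrite)
        for req in self.requests:
            serial.write(f"{req}\n".encode())
            res = b''
            while not isCompleteMessage(res):  # same helper idea as above
                serial.waitForReadyRead(100)
                res += bytes(serial.read(10))
            self.responseReady.emit(req, res)
        serial.close()
        self.finished.emit()

In the GUI you would create the worker with the list of requests, move it to a QThread, connect responseReady to a slot that routes each response to the right plainTextEdit, and start the thread.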
I want to write a command-line program that communicates with other interactive programs through a pseudo-terminal. In particular I want to be able to cause keystrokes received to conditionally be sent to the underlying process. Let's say for an example that I would like to silently ignore any "e" characters that are sent.
I know that Python has a pty module for working with pseudo-terminals and I have a basic version of my program working using it:
import os
import pty

def script_read(stdin):
    data = os.read(stdin, 1024)

    if data == b"e":
        return ...  # What goes here?
    return data

pty.spawn(["bash"], script_read)
From experimenting, I know that returning an empty bytes object b"" causes the pty.spawn implementation to think that the underlying file descriptor has reached the end of file and should no longer be read from, which causes the terminal to become totally unresponsive (I had to kill my terminal emulator!).
For interactive use, the simplest way to do this is probably to just return a bytes object containing a single null byte: b"\0". The terminal emulator will not print anything for it and so it will look like that input is just completely ignored.
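Concretely, that just means filling in the placeholder from the snippet above with a null byte:

def script_read(stdin):
    data = os.read(stdin, 1024)

    if data == b"e":
        return b"\0"  # swallowed: the terminal prints nothing for a null byte
    return data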
This probably isn't great for certain usages of pseudo-terminals. In particular, if the content written to the pseudo-terminal is going to be written out again by the attached program, this would probably cause random null bytes to appear in the file. Testing with cat as the attached program, the sequence ^@ is printed to the terminal whenever a null byte is sent to it.
So, YMMV.
A more proper solution would be to create a wrapper type that can masquerade as an empty string for the purposes of os.write but that would evaluate as "truthy" in a boolean context to not trigger the end of file conditional. I did some experimenting with this and couldn't figure out what needs to be faked to make os.write fully accept the wrapper as a string type. I'm unclear if it's even possible. :(
Here's my initial attempt at creating such a wrapper type:
class EmptyBytes():
    def __init__(self):
        self.sliced = False

    def __class__(self):
        return type(b"")

    def __getitem__(self, _key):
        return b""
I subscribe to a real time stream which publishes a small JSON record at a slow rate (0.5 KBs every 1-5 seconds). The publisher has provided a python client that exposes these records. I write these records to a list in memory. The client is just a python wrapper for doing a curl command on a HTTPS endpoint for a dataset. A dataset is defined by filters and fields. I can let the client go for a few days and stop it at midnight to process multiple days worth of data as one batch.
Instead of the multi-day batches described above, I'd like to write every n records by treating the stream as a generator. The client code is below. I just added the append() line to create a list called 'records' (in memory) to play back later:
records = []
data_set = api.get_dataset(dataset_id='abc')
for record in data_set.request_realtime():
    records.append(record)
which, as expected, shows [*] in Jupyter Notebook and keeps running.
Then, I created a generator from my list in memory as follows to extract one record (n=1 for initial testing):
def Generator():
    count = 1
    while count < 2:
        for r in records:
            yield r.data
        count += 1
But calling my generator also gave me [*] and kept calculating, which I understand is because the list is still being written in memory. But I thought my generator would be able to lock the state of my list and yield the first n records. But it didn't. How can I code my generator in this case? And if a generator is not a good choice in this use case, please advise.
To give you the full picture, if my code was working, then, I'd have instantiated it, printed it, and received an object as expected like this:
>>>my_generator = Generator()
>>>print(my_generator)
<generator object Gen at 0x0000000009910510>
Then, I'd have written it to a csv file like so:
with open('myfile.txt', 'w') as f:
    cf = csv.DictWriter(f, column_headers, extrasaction='ignore')
    cf.writeheader()
    cf.writerows(i.data for i in my_generator)
Note: I know there are many tools for this, e.g. Kafka, but I am in an initial PoC phase. Please use Python 2.x. Once I get my code working, I plan on stacking generators to set up my next n-record extraction so that I don't lose data in between. Any guidance on stacking would also be appreciated.
That's not how concurrency works. Unless some magic is being used that you didn't tell us about, while your first cell shows [*] you can't run more code. Putting the generator in another cell just adds it to a queue to run when the first code finishes; since the first code will never finish, the second code will never even start running!
I suggest looking into some asynchronous networking library, like asyncio, twisted or trio. They allow you to make functions cooperative so while one of them is waiting for data, the other can run, instead of blocking. You'd probably have to rewrite the api.get_dataset code to be asynchronous as well.
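A hedged sketch of what that could look like with asyncio (Python 3); get_dataset_async and write_batch are hypothetical stand-ins, since the real client would have to be rewritten to expose an async iterator:

import asyncio

async def consume(batch_size=100):
    batch = []
    async for record in get_dataset_async(dataset_id='abc'):   # hypothetical async client
        batch.append(record.data)
        if len(batch) >= batch_size:
            write_batch(batch)    # hypothetical: e.g. dump the batch to CSV, then continue
            batch = []

asyncio.run(consume())

While consume() is waiting for the next record, other coroutines (for example one writing out earlier batches) can run instead of blocking the whole program.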
I'm currently using Popen to send instructions to a utility (canutils... the cansend function in particular) via the command line.
The entire function looks like this.
def _CANSend(self, register, value, readWrite='write'):
    """send a CAN frame"""
    queue = self.CANbus.queue
    cobID = hex(0x600 + self.nodeID)  # assign nodeID
    indexByteLow, indexByteHigh, indexByteHigher, indexByteHighest = _bytes(register['index'], register['objectDataType'])
    subIndex = hex(register['subindex'])
    valueByteLow, valueByteHigh, valueByteHigher, valueByteHighest = _bytes(value, register['objectDataType'])
    io = hex(COMMAND_SPECIFIER[readWrite])
    frame = ["cansend", self.formattedCANBus, "-i", cobID, io, indexByteLow, indexByteHigh, subIndex, valueByteLow, valueByteHigh, valueByteHigher, valueByteHighest, "0x00"]

    Popen(frame, stdout=PIPE)

    a = queue.get()
    queue.task_done()
    return a
I was running into some issues as I was trying to send frames (the Popen frame actually executes the command that sends the frame) in rapid succession, but found that the Popen line was taking somewhere on the order of 35 ms to execute... every other line was less than 2 us.
So... what might be a better way to invoke the cansend function (which, again, is part of the canutils utility; _CANSend is the Python function above that calls it) more rapidly?
I suspect that most of that time is due to the overhead of forking every time you run cansend. To get rid of it, you'll want an approach that doesn't have to create a new process for each send.
According to this blog post, SocketCAN is supported in Python since 3.3. It should let your program create and use CAN sockets directly. That's probably the direction you'll want to go.
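A hedged sketch of sending one raw CAN frame through SocketCAN with Python's standard socket module (Linux, Python 3.3+); the interface name and payload below are placeholders:

import socket
import struct

s = socket.socket(socket.AF_CAN, socket.SOCK_RAW, socket.CAN_RAW)
s.bind(('can0',))                       # placeholder interface name

can_id = 0x600 + 0x01                   # COB-ID, as in the original code
data = bytes([0x2B, 0x17, 0x10, 0x00, 0xE8, 0x03, 0x00, 0x00])  # example payload

# struct can_frame layout: 32-bit id, 8-bit dlc, 3 pad bytes, 8 data bytes
frame = struct.pack("=IB3x8s", can_id, len(data), data)
s.send(frame)

Because the socket stays open, each send avoids the fork/exec cost of spawning cansend.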
I currently have a Python application where newline-terminated ASCII strings are being transmitted to me via a TCP/IP socket. I have a high data rate of these strings and I need to parse them as quickly as possible. Currently, the strings are being transmitted as CSV and if the data rate is high enough, my Python application starts to lag behind the input data rate (probably not all that surprising).
The strings look something like this:
chan,2007-07-13T23:24:40.143,0,0188878425-079,0,0,True,S-4001,UNSIGNED_INT,name1,module1,...
I have a corresponding object that will parse these strings and store all of the data into an object. Currently the object looks something like this:
class ChanVal(object):
    def __init__(self, csvString=None, **kwargs):
        if csvString is not None:
            self.parseFromCsv(csvString)
        for key in kwargs:
            setattr(self, key, kwargs[key])

    def parseFromCsv(self, csvString):
        lst = csvString.split(',')
        self.eventTime = lst[1]
        self.eventTimeExact = long(lst[2])
        self.other_clock = lst[3]
        ...
To read the data in from the socket, I'm using a basic "socket.socket(socket.AF_INET,socket.SOCK_STREAM)" (my app is the server socket) and then I'm using the "select.poll()" object from the "select" module to constantly poll the socket for new input using its "poll(...)" method.
I have some control over the process sending the data (meaning I can get the sender to change the format), but it would be really convenient if we could speed up the ASCII processing enough to not have to use fixed-width or binary formats for the data.
So up until now, here are the things I've tried that haven't really made much of a difference:
- Using the string "split" method and then indexing the list of results directly (see above), but "split" seems to be really slow.
- Using the "reader" object in the "csv" module to parse the strings.
- Changing the strings being sent to a string format that I can use to directly instantiate an object via "eval" (e.g. sending something like "ChanVal(eventTime='2007-07-13T23:24:40.143',eventTimeExact=0,...)").
I'm trying to avoid going to a fixed-width or binary format, though I realize those would probably ultimately be much faster.
Ultimately, I'm open to suggestions on better ways to poll the socket, better ways to format/parse the data (though hopefully we can stick with ASCII) or anything else you can think of.
Thanks!
You can't make Python faster. But you can make your Python application faster.
Principle 1: Do Less.
You can't do less input parsing overall, but you can do less input parsing in the process that's also reading the socket and doing everything else with the data.
Generally, do this: break your application into a pipeline of discrete steps.
- Read the socket, break into fields, create a named tuple, write the tuple to a pipe with something like pickle.
- Read a pipe (with pickle) to reconstruct the named tuple, do some processing, write to another pipe.
- Read a pipe, do some processing, write to a file or something.
Each of these three processes, connected with OS pipes, runs concurrently. That means that the first process is reading the socket and making tuples while the second process is consuming tuples and doing calculations while the third process is doing calculations and writing a file.
This kind of pipeline maximizes what your CPU can do. Without too many painful tricks.
Reading and writing to pipes is trivial, since Linux assures you that sys.stdin and sys.stdout will be pipes when the shell creates the pipeline.
Before doing anything else, break your program into pipeline stages.
proc1.py
import sys
import cPickle
from collections import namedtuple

ChanVal = namedtuple('ChanVal', ['eventTime', 'eventTimeExact', 'other_clock', ... ])

for line in socket:  # assuming a file-like object wrapping the socket
    c = ChanVal(*line.strip().split(','))
    cPickle.dump(c, sys.stdout)
proc2.py
import sys
import cPickle
from collections import namedtuple

ChanVal = namedtuple('ChanVal', ['eventTime', 'eventTimeExact', 'other_clock', ... ])

while True:
    item = cPickle.load(sys.stdin)
    # processing
    cPickle.dump(item, sys.stdout)
This idea of processing namedtuples through a pipeline is very scalable.
python proc1.py | python proc2.py
You need to profile your code to find out where the time is being spent.
That doesn't necessarily mean using Python's profiler.
For example, you can just try parsing the same csv string 1000000 times with different methods. Choose the fastest method and divide by 1000000; now you know how much CPU time it takes to parse a string.
Try to break the program into parts and work out what resources are really required by each part.
The parts that need the most CPU per input line are your bottlenecks.
On my computer, the program below outputs this
ChanVal0 took 0.210402965546 seconds
ChanVal1 took 0.350302934647 seconds
ChanVal2 took 0.558166980743 seconds
ChanVal3 took 0.691503047943 seconds
So you can see that about half the time is taken up by parseFromCsv. But also that quite a lot of time is taken extracting the values and storing them in the class.
If the class isn't used right away it might be faster to store the raw data and use properties to parse the csvString on demand.
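A hedged sketch of that on-demand idea (LazyChanVal is an illustrative name, not one of the timed classes below): keep the raw string and only split it the first time a field is actually read.

class LazyChanVal(object):
    def __init__(self, csvString):
        self.csvString = csvString
        self._lst = None

    def _split(self):
        if self._lst is None:                  # split lazily, at most once
            self._lst = self.csvString.split(',')
        return self._lst

    @property
    def eventTime(self):
        return self._split()[1]

    @property
    def eventTimeExact(self):
        return long(self._split()[2])

The program that produced the timings above follows: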
from time import time
import re

class ChanVal0(object):
    def __init__(self, csvString=None, **kwargs):
        self.csvString = csvString
        for key in kwargs:
            setattr(self, key, kwargs[key])

class ChanVal1(object):
    def __init__(self, csvString=None, **kwargs):
        if csvString is not None:
            self.parseFromCsv(csvString)
        for key in kwargs:
            setattr(self, key, kwargs[key])

    def parseFromCsv(self, csvString):
        self.lst = csvString.split(',')

class ChanVal2(object):
    def __init__(self, csvString=None, **kwargs):
        if csvString is not None:
            self.parseFromCsv(csvString)
        for key in kwargs:
            setattr(self, key, kwargs[key])

    def parseFromCsv(self, csvString):
        lst = csvString.split(',')
        self.eventTime = lst[1]
        self.eventTimeExact = long(lst[2])
        self.other_clock = lst[3]

class ChanVal3(object):
    splitter = re.compile("[^,]*,(?P<eventTime>[^,]*),(?P<eventTimeExact>[^,]*),(?P<other_clock>[^,]*)")

    def __init__(self, csvString=None, **kwargs):
        if csvString is not None:
            self.parseFromCsv(csvString)
        self.__dict__.update(kwargs)

    def parseFromCsv(self, csvString):
        self.__dict__.update(self.splitter.match(csvString).groupdict())

s = "chan,2007-07-13T23:24:40.143,0,0188878425-079,0,0,True,S-4001,UNSIGNED_INT,name1,module1"

RUNS = 100000

for cls in ChanVal0, ChanVal1, ChanVal2, ChanVal3:
    start_time = time()
    for i in xrange(RUNS):
        cls(s)
    print "%s took %s seconds" % (cls.__name__, time() - start_time)