outReceived from twisted ProcessProtocol merges messages if received too fast (buffering problem?) - python

I am using Klein, a micro web framework based on Twisted. I have a server (running on Windows!) which spawns an external long-running process (an end-to-end test) via reactor.spawnProcess().
To send status information about the running test, I implemented a ProcessProtocol:
class IPCProtocol(protocol.ProcessProtocol):
    def __init__(self, status: 'Status', history: 'History'):
        super().__init__()
        self.status: Status = status
        self.history: History = history
        self.pid = None

    def connectionMade(self):
        self.pid = self.transport.pid
        log.msg("process started, pid={}".format(self.pid))

    def processExited(self, reason):
        log.msg("process exited, status={}".format(reason.value.exitCode))
        # add current run to history
        self.history.add(self.status.current_run)
        # create empty testrun and save status
        self.status.current_run = Testrun()
        self.status.status = StatusEnum.ready
        self.status.save()
        # check for more queue items
        if not self.status.queue.is_empty():
            start_testrun()

    def outReceived(self, data: bytes):
        data = data.decode('utf-8').strip()
        if data.startswith(constants.LOG_PREFIX_FAILURE):
            self.failureReceived()
        if data.startswith(constants.LOG_PREFIX_SERVER):
            data = data[len(constants.LOG_PREFIX_SERVER):]
            log.msg("Testrunner: " + data)
            self.serverMsgReceived(data)
I start the process with the following command:
ipc_protocol = IPCProtocol(status=app.status, history=app.history)
args = [sys.executable, 'testrunner.py', next_entry.suite, json.dumps(next_entry.testscripts)]
log.msg("Starting testrunn.py with args: {}".format(args))
reactor.spawnProcess(ipc_protocol, sys.executable, args=args)
To send information, I just print out messages (with a prefix to distinguish them) in my testrunner.py.
The problem is that if I issue the print calls too quickly, outReceived merges the messages.
I already tried adding flush=True to the print() calls in the external process, but this didn't fix the problem. Another question suggested using usePTY=True for spawnProcess, but this is not supported on Windows.
Is there a better way to fix this than adding a small delay (like time.sleep(0.1)) to each print() call?

You didn't say it, but it seems like the child process writes lines to its stdout.
You need to parse the output to find the line boundaries if you want to operate on these lines.
You can use LineOnlyReceiver to help you with this. Since processes aren't stream transports, you can't just use LineOnlyReceiver directly. You have to adapt it to the process protocol interface. You can do this yourself or you can use ProcessEndpoint (instead of spawnProcess) to do it for you.
For example:
from twisted.protocols.basic import LineOnlyReceiver
from twisted.internet.protocol import Factory
from twisted.internet.endpoints import ProcessEndpoint
from twisted.internet import reactor
endpoint = ProcessEndpoint(reactor, b"/some/some-executable", ...)
spawning_deferred = endpoint.connect(Factory.forProtocol(LineOnlyReceiver))
...
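For a fuller picture, here is a sketch of that approach adapted to the question's setup (the protocol name, the 'SERVER:' prefix handling, and the argument list are illustrative assumptions, not something the question or Twisted defines):

import sys
from twisted.internet import reactor
from twisted.internet.endpoints import ProcessEndpoint
from twisted.internet.protocol import Factory
from twisted.protocols.basic import LineOnlyReceiver

class TestrunnerProtocol(LineOnlyReceiver):
    # print() ends each message with '\n'; on Windows the child may emit
    # '\r\n', so split on '\n' and strip a trailing '\r' if present.
    delimiter = b'\n'

    def lineReceived(self, line):
        text = line.decode('utf-8').rstrip('\r')
        # Prefix handling equivalent to the original outReceived(), but now
        # each call is guaranteed to see exactly one message.
        if text.startswith('SERVER:'):  # stand-in for constants.LOG_PREFIX_SERVER
            print("Testrunner:", text[len('SERVER:'):])

endpoint = ProcessEndpoint(reactor, sys.executable,
                           args=[sys.executable, 'testrunner.py'])
spawning_deferred = endpoint.connect(Factory.forProtocol(TestrunnerProtocol))
reactor.run()

Each lineReceived() call sees exactly one complete line, no matter how the operating system chunks the pipe, so rapid print() calls in testrunner.py can no longer be merged into a single callback (and partial lines are no longer a problem either).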

Related

Python - improving logging to file and console using multiprocessing

I am trying to download a file over a CAN bus using python-can. It involves sending data very quickly (on the order of 2-3 messages per millisecond). I am trying to log these messages to file without impacting the sending speed, but doing the file I/O slows down the sending due to the logging overhead. I tried various methods to improve this (including using queues and reading the queue from another thread, but that was not much better - possibly due to the GIL). Most of these tests started with the Python logging module and trying various handlers (QueueHandler/QueueListener, MemoryHandler, etc.).
I've managed to make some significant improvements by moving the file I/O into a separate process. I initially ran into an issue with the overhead of sending data from one process to another, so I now buffer it. Now, instead of taking 150% longer with direct file I/O in the main process, I see a ~20% increase in time.
I thought that, since this is running in another process, I could also print() the data to the console (which I know is relatively expensive), but I see a huge increase in the file download time.
Why does the print() affect the main process even though it is running in a child process?
Code below:
file_logger_mp() is called from the main process and it starts the child process that does the logging. The main process then uses the log_hdl function to add a message to the buffer. When the buffer reaches a certain size (100) it is sent to the child process for logging to file or printing to console.
Device: Raspberry Pi 4. The main process uses asyncio, in case that affects things.
import atexit
import multiprocessing

def file_logger_mp(logger_name: str, log_file_pth: str):
    conn_rec, conn_send = multiprocessing.Pipe()
    log_hdl_c = MyLogger(conn_send)
    log_hdl = log_hdl_c.log_hdl  # This is used by main code to provide log messages to child process
    listener = MyProcess(conn_rec, log_file_pth)
    atexit.register(log_hdl_c.final_flush, listener)
    listener.start()  # Start the child process
    return log_hdl, listener

class MyLogger():
    def __init__(self, conn_send) -> None:
        self.buffer = []
        self.conn_send = conn_send

    def log_hdl(self, msg):
        self.buffer.append(msg)
        if len(self.buffer) > 100:
            self.conn_send.send(self.buffer)
            self.buffer.clear()

    def final_flush(self, listener):
        self.conn_send.send(self.buffer)
        listener.terminate()

class MyProcess(multiprocessing.Process):
    def __init__(self, queue, f_hdl):
        multiprocessing.Process.__init__(self)
        self.exit = multiprocessing.Event()
        self.queue = queue
        self.f_hdl = f_hdl

    def run(self):
        f = open(self.f_hdl, "w+")
        while not self.exit.is_set():
            try:
                record = self.queue.recv()
                for msg in record:
                    output = str(msg)
                    f.write(output + '\n')
                    print(output)  # This `print()` causes large delays to main process?!
                record.clear()
            except Exception:
                import sys, traceback
                print('Whoops! Problem:', file=sys.stderr)
                traceback.print_exc(file=sys.stderr)
        for msg in record:  # Flush any pending records before finishing
            f.write(str(msg) + '\n')
        f.close()

    def terminate(self):
        self.exit.set()

Non-blocking thread of subprocess stdout.PIPE stream with a queue on Windows still hangs

I am trying to work with a subprocess routine that spawns an interactive child process which expects user input. This process normally hangs immediately if I try to read its stdout stream directly.
I read through many solutions using fcntl, asynchronous operations, pexpect, and redirecting output to files. Although temporary log files should work, I don't want to go down that route, as I would like to keep the process interactive within the Python interface. Of all of those, threads seemed to be the easiest and most straightforward way (I could not get pexpect to work properly, although it seemed like a good option too).
Indeed, when I implemented the following code (stolen from Non-blocking read on a subprocess.PIPE in python):
import os
import subprocess as sp
from threading import Thread
from queue import Queue, Empty

class App:
    def __init__(self):
        proc = sp.Popen(['app'], stdin=sp.PIPE, stdout=sp.PIPE, stderr=sp.PIPE, encoding='utf8')
        out = NonBlockingStreamReader(proc.stdout)
        print(out.readline(1))

class NonBlockingStreamReader:
    def __init__(self, stream):
        self.s = stream
        self.q = Queue()

        def populateQueue(stream, queue):
            while True:
                line = stream.readline()
                if line:
                    queue.put(line)
                else:
                    raise UnexpectedEndOfStream

        self.t = Thread(target=populateQueue, args=(self.s, self.q))
        self.t.daemon = True
        self.t.start()

    def readline(self, timeout=None):
        try:
            return self.q.get(block=timeout is not None, timeout=timeout)
        except Empty:
            return None

class UnexpectedEndOfStream(Exception):
    pass
everything worked flawlessly. Well, the problem is -- it only worked on Linux, even though the solution should be Windows compatible.
When I try to run this implementation on Windows, the newly created thread hangs the moment it tries to execute stream.readline(), never gets to actually populate the queue and thus the output of out.readline(1) read from the main thread is None.
How can I make this work on Windows?

Python multiprocessing module does not work

I am trying to write a spider with the multiprocessing module.
Here is my Python code:
# -*- coding:utf-8 -*-
import multiprocessing
import requests

class SpiderWorker(object):
    def __init__(self, q):
        self._q = q

    def run(self):
        def _crawl_item(url):
            requests.get("http://www.baidu.com")
            if respon.ok:
                print respon.url

        while True:
            rst = self._q.get()
            _crawl_item(rst)

def general_worker():
    q = multiprocessing.Queue()
    CPU_COUNT = multiprocessing.cpu_count()
    worker_processes = [
        multiprocessing.Process(target=SpiderWorker(q).run)
        for i in range(CPU_COUNT)
    ]
    map(lambda process: process.start(), worker_processes)
    return q, worker_processes
Maybe my way of using processes is wrong. Every time I run this code, my processes report:
<Process(Process-1, stopped[SIGSEGV])>
I hope you can help.
The major problem here is that you don't have any information on why your processes fail. It could be gevent, but it could just as easily be something else. So learning the actual reason why your processes get terminated is the first step before doing anything else.
What you need is multiprocessing.log_to_stderr():
class SpiderWorker(object):
    # ...

    def run(self):
        logger = multiprocessing.log_to_stderr()
        logger.setLevel(multiprocessing.SUBDEBUG)
        try:
            pass  # Here goes your original run() code
        except Exception:
            logger.exception('whoopsie')
What this code does:
Creates a special logger which will transmit its information to the main process and dump it to stderr (the console, by default).
Configures this logger to report everything, including some internal multiprocessing module events (just in case; you probably don't need them).
Wraps your entire code in a catch-all statement, so whatever happens there cannot escape your notice.
Calls the .exception() method on the logger, which not only logs the message (that part is almost meaningless, since we don't know what actually happened) but, most importantly, logs the entire error traceback - which is what we actually need.
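Applied to the worker from the question, a self-contained sketch of the same pattern might look like this (Python 2, matching the question; the sentinel value and the placeholder crawl body are only for the demo):

import multiprocessing

class SpiderWorker(object):
    def __init__(self, q):
        self._q = q

    def run(self):
        logger = multiprocessing.log_to_stderr()
        logger.setLevel(multiprocessing.SUBDEBUG)
        try:
            while True:
                url = self._q.get()
                if url is None:  # demo-only sentinel to stop the worker
                    break
                logger.info('crawling %s', url)
                # ... the original _crawl_item(url) code would go here ...
        except Exception:
            # The full traceback lands on the parent's stderr, so the real
            # reason the process died is no longer hidden.
            logger.exception('worker died')

if __name__ == '__main__':
    q = multiprocessing.Queue()
    q.put('http://www.baidu.com')
    q.put(None)
    p = multiprocessing.Process(target=SpiderWorker(q).run)
    p.start()
    p.join()

Run this with a deliberately broken crawl body and any Python-level error shows up as a full traceback on the console instead of a silently stopped process.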

Running a class method multiple times in parallel in Python

I have implemented a Python socket server. It sends image data from multiple cameras to a client. My request handler class looks like:
class RequestHandler(SocketServer.BaseRequestHandler):
    def handle(self):
        while True:
            data = self.request.recv(1024)
            if data.endswith('0000000050'):  # client requests data
                for camera_id, camera_path in _video_devices.iteritems():
                    message = self.create_image_transfer_message(camera_id, camera_path)
                    self.request.sendto(message, self.client_address)

    def create_image_transfer_message(self, camera_id, camera_path):
        # somecode ...
I am forced to stick with the socket server because of the client. It works; however, the problem is that it runs sequentially, so there are large delays between the camera images being uploaded. I would like to create the transfer messages in parallel, with a small delay between the calls.
I tried to use the Pool class from multiprocessing:
import multiprocessing

class RequestHandler(SocketServer.BaseRequestHandler):
    def handle(self):
        ...
        pool = multiprocessing.Pool(processes=4)
        messages = [pool.apply(self.create_image_transfer_message, args=(camera_id, camera_path))
                    for camera_id, camera_path in _video_devices.iteritems()]
But this throws:
PicklingError: Can't pickle <type 'function'>: attribute lookup __builtin__.function failed
I want to know whether there is another way to create those transfer messages in parallel, with a defined delay between the calls.
EDIT:
I create the response messages using data from multiple cameras. The problem is that if I run the image-grabbing routines too close to each other, I get image artifacts because the USB bus is overloaded. I figured out that calling the image grabbing sequentially with a 0.2 s delay between calls solves the problem. The cameras are not sending data the whole time the image-grabbing function is running, so the delayed parallel calls result in good images with only a small delay between them.
I think you're on the right path already, no need to throw away your work.
Here's an approach to using a class method with multiprocessing that I found via Google after searching for "multiprocessing class method":
from multiprocessing import Pool
import time

pool = Pool(processes=2)

def unwrap_self_f(arg, **kwarg):
    return RequestHandler.create_image_transfer_message(*arg, **kwarg)

class RequestHandler(SocketServer.BaseRequestHandler):

    @classmethod
    def create_image_transfer_message(cls, camera_id, camera_path):
        # your logic goes here
        pass

    def handle(self):
        while True:
            data = self.request.recv(1024)
            if not data.endswith('0000000050'):  # client requests data
                continue
            pool.map(unwrap_self_f,
                     (
                         (camera_id, camera_path)
                         for camera_id, camera_path in _video_devices.iteritems()
                     ))
Note: if you want to return values from the workers, you'll need to explore using a shared resource; see this answer: How can I recover the return value of a function passed to multiprocessing.Process?
This code did the trick for me:
class RequestHandler(SocketServer.BaseRequestHandler):
    def handle(self):
        while True:
            data = self.request.recv(1024)
            if data.endswith('0000000050'):  # client requests data
                process_manager = multiprocessing.Manager()
                messaging_queue = process_manager.Queue()
                jobs = []
                for camera_id, camera_path in _video_devices.iteritems():
                    p = multiprocessing.Process(target=self.create_image_transfer_message,
                                                args=(camera_id, camera_path, messaging_queue))
                    jobs.append(p)
                    p.start()
                    time.sleep(0.3)

                # wait for all processes to finish
                for p in jobs:
                    p.join()

                while not messaging_queue.empty():
                    self.request.sendto(messaging_queue.get(), self.client_address)

Remove threads usage from script

The following script is used to listen to an IMAP connection using IMAP IDLE, and it depends heavily on threads. What's the easiest way for me to eliminate the thread usage and just use the main thread?
As a new Python developer I tried editing the def __init__(self, conn): method, but I just got more and more errors.
A code sample would help me a lot.
#!/usr/local/bin/python2.7
print "Content-type: text/html\r\n\r\n";

import socket, ssl, json, struct, re
import imaplib2, time
from threading import *

# enter gmail login details here
USER = "username#gmail.com"
PASSWORD = "password"

# enter device token here
deviceToken = 'my device token x x x x x'
deviceToken = deviceToken.replace(' ', '').decode('hex')

currentBadgeNum = -1

def getUnseen():
    (resp, data) = M.status("INBOX", '(UNSEEN)')
    print data
    return int(re.findall("UNSEEN (\d)*\)", data[0])[0])

def sendPushNotification(badgeNum):
    global currentBadgeNum, deviceToken
    if badgeNum != currentBadgeNum:
        currentBadgeNum = badgeNum
        thePayLoad = {
            'aps': {
                'alert': 'Hello world!',
                'sound': '',
                'badge': badgeNum,
            },
            'test_data': {'foo': 'bar'},
        }
        theCertfile = 'certif.pem'
        theHost = ('gateway.push.apple.com', 2195)
        data = json.dumps(thePayLoad)
        theFormat = '!BH32sH%ds' % len(data)
        theNotification = struct.pack(theFormat, 0, 32,
                                      deviceToken, len(data), data)
        ssl_sock = ssl.wrap_socket(socket.socket(socket.AF_INET,
                                                 socket.SOCK_STREAM),
                                   certfile=theCertfile)
        ssl_sock.connect(theHost)
        ssl_sock.write(theNotification)
        ssl_sock.close()
        print "Sent Push alert."
# This is the threading object that does all the waiting on
# the event
class Idler(object):
    def __init__(self, conn):
        self.thread = Thread(target=self.idle)
        self.M = conn
        self.event = Event()

    def start(self):
        self.thread.start()

    def stop(self):
        # This is a neat trick to make thread end. Took me a
        # while to figure that one out!
        self.event.set()

    def join(self):
        self.thread.join()

    def idle(self):
        # Starting an unending loop here
        while True:
            # This is part of the trick to make the loop stop
            # when the stop() command is given
            if self.event.isSet():
                return
            self.needsync = False

            # A callback method that gets called when a new
            # email arrives. Very basic, but that's good.
            def callback(args):
                if not self.event.isSet():
                    self.needsync = True
                    self.event.set()

            # Do the actual idle call. This returns immediately,
            # since it's asynchronous.
            self.M.idle(callback=callback)
            # This waits until the event is set. The event is
            # set by the callback, when the server 'answers'
            # the idle call and the callback function gets
            # called.
            self.event.wait()
            # Because the function sets the needsync variable,
            # this helps escape the loop without doing
            # anything if the stop() is called. Kinda neat
            # solution.
            if self.needsync:
                self.event.clear()
                self.dosync()

    # The method that gets called when a new email arrives.
    # Replace it with something better.
    def dosync(self):
        print "Got an event!"
        numUnseen = getUnseen()
        sendPushNotification(numUnseen)
# Had to do this stuff in a try-finally, since some testing
# went a little wrong.....
while True:
    try:
        # Set the following two lines to your creds and server
        M = imaplib2.IMAP4_SSL("imap.gmail.com")
        M.login(USER, PASSWORD)
        M.debug = 4
        # We need to get out of the AUTH state, so we just select
        # the INBOX.
        M.select("INBOX")
        numUnseen = getUnseen()
        sendPushNotification(numUnseen)

        typ, data = M.fetch(1, '(RFC822)')
        raw_email = data[0][1]

        import email
        email_message = email.message_from_string(raw_email)
        print email_message['Subject']
        #print M.status("INBOX", '(UNSEEN)')

        # Start the Idler thread
        idler = Idler(M)
        idler.start()
        # Sleep forever, one minute at a time
        while True:
            time.sleep(60)
    except imaplib2.IMAP4.abort:
        print("Disconnected. Trying again.")
    finally:
        # Clean up.
        #idler.stop()  # Commented out to see the real error
        #idler.join()  # Commented out to see the real error
        #M.close()     # Commented out to see the real error
        # This is important!
        M.logout()
As far as I can tell, this code is hopelessly confused because the author used the third-party "imaplib2" library, which forces a threading model that this code then never takes advantage of.
Only one thread is ever created, which wouldn't need to be a thread but for the choice of imaplib2. However, as the imaplib2 documentation notes:
This module presents an almost identical API as that provided by the standard python library module imaplib, the main difference being that this version allows parallel execution of commands on the IMAP4 server, and implements the IMAP4rev1 IDLE extension. (imaplib2 can be substituted for imaplib in existing clients with no changes in the code, but see the caveat below.)
This makes it appear that you should be able to throw out much of class Idler and just use the connection M. I recommend that you look at Doug Hellmann's excellent Python Module of the Week entry for imaplib before looking at the official documentation. You'll need to reverse-engineer the code to find out its intent, but it looks to me like:
1. Open a connection to GMail
2. Check for unseen messages in the Inbox
3. Count the unseen messages from (2)
4. Send a dummy message to some service at gateway.push.apple.com
5. Wait for notice, go back to (2)
Perhaps the most interesting thing about the code is that it doesn't appear to do anything, although what sendPushNotification (step 4) does is a mystery, and the one line that uses an imaplib2-specific service:
self.M.idle(callback=callback)
uses a named argument that I don't see in the module documentation. Do you know if this code ever actually ran?
Aside from the unneeded complexity, there's another reason to drop imaplib2: it exists independently on SourceForge and PyPI, and one maintainer claimed two years ago that "an attempt will be made to keep it up-to-date with the original". Which one do you have? Which one would you install?
Don't do it
Since you are trying to remove the thread usage solely because you didn't find a way to handle the exceptions coming from the server, I don't recommend removing it: because of the async nature of the library itself, the Idler handles this more smoothly than the main thread alone could.
Solution
You need to wrap the self.M.idle(callback=callback) call in a try-except and then re-raise the exception in the main thread. You then handle the exception by re-running the code in the main thread to restart the connection.
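A minimal, self-contained sketch of that pattern (generic names, not the code from the links below; Python 2 to match the question): the worker thread catches its own exception and hands it to the main thread through a Queue, and the main thread reacts to it by rebuilding the connection.

import Queue
import threading

exc_queue = Queue.Queue()

def idle_worker():
    try:
        # in the real Idler.idle() this is where self.M.idle(callback=callback) runs
        raise RuntimeError("pretend the IMAP server dropped the connection")
    except Exception as e:
        exc_queue.put(e)  # hand the failure to the main thread instead of losing it

while True:
    t = threading.Thread(target=idle_worker)
    t.start()
    exc = exc_queue.get()  # blocks until the worker reports a failure
    t.join()
    print "idle thread failed (%r), reconnecting..." % (exc,)
    break  # in the real script, loop back here and rebuild the IMAP connection

The same idea works with the existing Idler class: put the try-except around self.M.idle(...) inside idle(), push the exception onto a queue that the main thread reads instead of its time.sleep(60) loop, and reconnect there.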
You can find more details of the solution and possible reasons in this answer: https://stackoverflow.com/a/50163971/1544154
Complete solution is here: https://www.github.com/Elijas/email-notifier
