Block jupyter notebook cell execution till specific message received - python

I'm trying to implement an asynchronous, distributed computation engine for Python that is compatible with Jupyter Notebook. The system is supposed to be based on a 'push notification' approach, which makes it (almost, I hope) impossible to let the user wait for a specific computation result (i.e. to block execution of a given notebook cell until the message with the expected result is delivered). To be precise, I'm trying to:
Add a new task to the Jupyter notebook event loop (the task periodically checks in a while loop whether the specific message has arrived, and breaks when it does).
Block the current cell while waiting for the task to complete.
Still be able to process incoming messages (using RabbitMQ and Pika, with slightly modified code from http://pika.readthedocs.io/en/0.10.0/examples/asynchronous_consumer_example.html).
I have prepared notebooks presenting my problem: https://github.com/SLEEP-MAN/RabbitMQ_jupyterNotebook_asyncio
Any ideas? Is it possible (maybe some IPython/IpyKernel magic ;>?), or do I have to change my approach by 180 degrees?

Your issue is that you mixed two different event loops into one; that is why it didn't work. You need to make a few changes.
Use AsyncioConnection instead of TornadoConnection:
return adapters.AsyncioConnection(pika.URLParameters(self._url),
                                  self.on_connection_open)
Next, you need to remove the line below:
self._connection.ioloop.start() #throws exception but not a problem...
This is because your loop is already started in connect(). Then you need to use the code below for waiting:
loop = asyncio.get_event_loop()
loop.run_until_complete(wait_for_eval())
And now it works
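The wait_for_eval() coroutine itself lives in the linked notebooks; for reference, a minimal sketch of what such a coroutine could look like, assuming the consumer's on_message callback stores results in a shared dict (the names below are assumptions, not the notebook's actual code):
import asyncio

results = {}  # hypothetical: filled in by the Pika on_message callback

async def wait_for_eval(task_id='my_task', poll_interval=0.1):
    # Periodically check whether the expected message has arrived, yielding
    # control so the AsyncioConnection can keep consuming messages meanwhile.
    while task_id not in results:
        await asyncio.sleep(poll_interval)
    return results[task_id]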

Related

Can threads work in an organized sequence?

I have a Python script; it is a test for some (automotive) hardware.
The hardware under test has 2 processors, which produce a lot of logs (output on their own console, which I can connect to via Linux minicom) while they work.
Firstly, I needed to record how much time passes from my trigger activity to the processors' response messages. Let's call them processor A and processor B. The messages arrive in a different order each time. So, to avoid being blocked waiting for one signal (while the other could arrive first from the other processor), and to avoid inflating the measured time of the second signal, I used threads: thread1 waits for message 1 on processor A, and thread2 waits for message 2 on processor B. That worked fine regardless of the order in which messages 1 and 2 arrived. I put the threads in an array, started them in a 'for' loop and then joined them in a 'for' loop.
But now I need something like this: on 'B', I am still waiting for only one line of log output, and that works fine; the thread for that operation reads the log line and calculates the correct time from the trigger to that log. But on processor 'A', I have to check the times for 7 new logs. So I made 7 new functions for reading these 7 new messages and calculating the time for each, assigned those functions to 7 new threads, put them in the array as before, then started them in a for loop and joined them in a for loop.
What is the result? The single message from processor B is read and its time is recorded correctly, but no message from processor A is detected (although they definitely arrived).
One more piece of information: there is a function that reads the console log messages, line by line, from the device. But it works like a stack: once a line is read, it is 'popped out' of the buffer. So my 8 threads are popping out messages they aren't interested in, and there is almost no chance that a message is detected by the proper thread before it is popped out by another one.
So my idea for solving the problem is to make the 8 threads work in a strict order, 1-2-3-4-5-6-7-8-1-2-3..., over every next log line. However, I don't know how to do that. Please share some ideas :) (see the sketch below)
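For what it's worth, a minimal sketch of making threads take turns in a fixed order with threading.Event; the reader and handler functions are hypothetical placeholders, not the original script's code:
import threading
import time

NUM_WORKERS = 8
turn_events = [threading.Event() for _ in range(NUM_WORKERS)]
turn_events[0].set()  # thread 0 gets the first turn

def read_next_log_line():
    # Placeholder for the device's line-reading function.
    time.sleep(0.1)
    return "example log line"

def handle_line(index, line):
    # Placeholder for each thread's own message check and timing logic.
    print("thread", index, "saw:", line)

def worker(index):
    while True:
        turn_events[index].wait()                     # block until it is this thread's turn
        turn_events[index].clear()
        line = read_next_log_line()                   # exactly one line is consumed per turn
        handle_line(index, line)
        turn_events[(index + 1) % NUM_WORKERS].set()  # hand the turn to the next thread

threads = [threading.Thread(target=worker, args=(i,), daemon=True)
           for i in range(NUM_WORKERS)]
for t in threads:
    t.start()
time.sleep(3)  # let the demo run a few rounds before the daemon threads exit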

Can't get my async function to properly work

I'm trying to build a script based on data provided by a WebSocket, but I have a tricky problem I can't solve. I have two cells.
The first one:
import asyncio
import json
import websockets

msg = ''
stream = {}

async def call_api(msg):
    async with websockets.connect('wss://www.bitmex.com/realtime?subscribe=quote:XBTUSD,quote:ETHU19') as websocket:
        await websocket.send(msg)
        while websocket.open:
            response = await websocket.recv()
            response = json.loads(response)
            if 'data' in list(response.keys()):
                response = response['data']
                for n in range(len(response)):
                    symbol = response[n]['symbol']
                    stream[symbol] = response[n]['askPrice']

loop = asyncio.get_event_loop()
loop.create_task(call_api(msg))
The second one:
stream['XBTUSD']
If I run the first cell in Jupyter Notebook and then run the second cell manually afterward, Python prints the correct value. But if I press the "restart the current kernel and re-execute the whole notebook" button, I get the error KeyError: 'XBTUSD' in the second cell. This error also happens when I run the script in the Python shell.
I can't understand the difference in behavior between these two executions.
This is because you created an asyncio task in the first cell but did not wait for it to finish. loop.create_task() returns immediately and lets the event loop continue executing the created task in the background for as long as the event loop is alive. (In this case, the event loop keeps running while your notebook kernel is running.) Therefore, loop.create_task() makes your Jupyter notebook think that the first cell is done immediately.
Note that the Jupyter notebook itself also works asynchronously with respect to the kernel process, so if you run the second cell too quickly after the first one (e.g., using "restart the current kernel and re-execute the whole notebook" instead of manually clicking the Run button), the first cell's asyncio task will not have finished before the second cell's execution starts.
To ensure that the first cell actually finishes the task before reporting that the cell's execution has finished, use run_until_complete() instead of create_task():
loop = asyncio.get_event_loop()
loop.run_until_complete(call_api(msg))
or, to get additional control over your task with a reference to it:
loop = asyncio.get_event_loop()
t = loop.create_task(call_api(msg))
loop.run_until_complete(t)
If you want to keep the task running in the background for an indefinite time, you need a different approach.
Don't use a Jupyter notebook for this; write a daemonized process that continuously fetches and processes the websocket messages. Jupyter does not provide any means to keep track of background asyncio tasks in the kernel process or to execute cells triggered by events from such background tasks. A Jupyter notebook is simply not the tool for such patterns.
To decouple the websocket message receiver from the processing routines, use an intermediate queue. If both sides run in the same process and the same event loop, you may use asyncio.Queue. If the processing happens in a different thread using synchronous code, you could try out janus. If the processing happens in a different process, use multiprocessing.Queue or some other IPC mechanism.
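For the asyncio.Queue option, a minimal sketch of the decoupling (the receiver here is a stand-in that fakes incoming messages, not the BitMEX websocket code):
import asyncio

async def receiver(queue):
    # Stand-in for the websocket loop: push each decoded message onto the queue.
    for n in range(5):
        await queue.put({'symbol': 'XBTUSD', 'askPrice': 10000 + n})
        await asyncio.sleep(0.1)
    await queue.put(None)  # sentinel so the processor knows to stop

async def processor(queue):
    # Consumes and processes messages independently of how fast they arrive.
    while True:
        item = await queue.get()
        if item is None:
            break
        print('processing', item)

async def main():
    queue = asyncio.Queue()
    await asyncio.gather(receiver(queue), processor(queue))

asyncio.run(main())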

What is the most efficient way to run independent processes from the same application in Python

I have a script that ultimately executes two functions. It polls for data on a time interval (it runs as a daemon, and the data is retrieved from a shell command run on the local system) and, once it receives this data, will: 1) function 1: write this data to a log file, and 2) function 2: examine the data and then send an email IF the data meets certain criteria.
The logging happens every time, but the alert may not. The issue is that, in cases where an alert needs to be sent, if the email connection stalls or takes a long time to connect to the server, it obviously causes the next polling of the data to stall (for an unpredictable amount of time, depending on the server), and in my case it is very important that the polling interval remains consistent (for analytics purposes).
What is the most efficient way, if any, to keep the email process working independently of the logging process while still operating within the same application and depending on the same data? I was considering creating a separate thread for the mailer, but that seems like overkill in this case.
I'd rather not set a short timeout on the email connection, because I want to give the process some chance to connect to the server, while still allowing the logging to be written consistently on the given interval. Some code:
def send(self, msg_):
    """
    Send the alert message
    :param str msg_: the message to send
    """
    self.msg_ = msg_
    ar = alert.Alert()
    ar.send_message(msg_)

def monitor(self):
    """
    Post to the log file and
    send the alert message when
    applicable
    """
    read = r.SensorReading()
    msg_ = read.get_message()  # the data
    if msg_:  # if there is data in general...
        x = read.get_failed()  # store bad data
        msg_ += self.write_avg(read)
        msg_ += "==============================================="
        self.ctlog.update_templog(msg_)  # write general data to log
        if x:
            self.send(x)  # if bad data, send...
This is exactly the kind of case you want to use threading/subprocesses for. Fork off a thread for the email, which times out after a while, and keep your daemon running normally.
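A minimal, self-contained sketch of that idea, with a stand-in for the slow alert.Alert send rather than the asker's actual class:
import threading
import time

def send_alert(msg):
    # Stand-in for alert.Alert().send_message(); pretend the SMTP connection is slow.
    time.sleep(5)
    print("alert sent:", msg)

def monitor_once(msg, failed):
    print("logged:", msg)  # the logging always happens on schedule
    if failed:
        # Fire the email off on its own thread so a stalled connection cannot
        # delay the next polling cycle; the slow send finishes in the background.
        threading.Thread(target=send_alert, args=(failed,)).start()

for i in range(3):
    monitor_once("reading %d" % i, "bad data" if i == 1 else None)
    time.sleep(1)  # the polling interval stays consistent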
Possible approaches that come to mind:
Multiprocessing
Multithreading
Parallel Python
My personal choice would be multiprocessing, since you clearly mentioned independent processes; you wouldn't want a crashing thread to interrupt the other function.
You may also refer to this before making your design choice: Multiprocessing vs Threading Python
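In the same spirit, a minimal multiprocessing sketch (the function names are illustrative, not from the asker's code):
import multiprocessing
import time

def send_alert(msg):
    # Stand-in for the email step; pretend the SMTP connection is slow.
    time.sleep(5)
    print("alert sent:", msg)

def main():
    for i in range(3):
        print("logged reading", i)  # logging stays on its schedule
        if i == 1:
            # A separate process fully isolates the mailer: even a crash there
            # cannot take the polling loop down with it.
            multiprocessing.Process(target=send_alert, args=("bad data",)).start()
        time.sleep(1)

if __name__ == "__main__":
    main()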
Thanks everyone for the responses; they helped very much. I went with threading, but also updated the code to be sure it handled failing threads. I ran some regressions and found that the subsequent processes were no longer being interrupted by stalled connections and the log was being updated on a consistent schedule. Thanks again!!

Running 2 python scripts without them affecting each other

I have 2 Python scripts that I'm trying to run side by side. However, each of them has to open, close and reopen independently of the other. Also, one of the scripts runs inside a shell script.
Flaskserver.py & ./pyinit.sh
Flaskserver.py is just a Flask server that needs to be restarted every now and again to load a new page (I can't define all the pages, as the HTML is interchangeable). pyinit.sh is run as xinit ./pyinit.sh (it is selenium-webdriver Python code).
So when the Flask server changes and restarts, ./pyinit.sh needs to wait about 20 seconds and then restart as well.
Either one of these can produce errors, so I need to be able to check whether the Flask server has an error before restarting ./pyinit.sh; if ./pyinit.sh errors, I need to set the Flask server to a default value and then relaunch both of them.
I know a little about subprocess, but I'm unsure how it can deal with errors and stop/start code.
Rather than using subprocess, I would recommend that you create a separate thread for each of your processes using multithreading.
Multithreading will not solve the problem if global variables are colliding; by running them as different scripts you might solve that, but you could still collide on something else, like a log file.
Now, if you keep both processes running from a single parent process that takes care of keeping them separated and assigning different global variables where necessary, you should be able to keep better control. Using things like join and lock from the threading library will also ensure that they don't collide, and it should be easy to put one process to sleep while the other is running (as for the 20-second wait).
You can keep a thread list as a global variable, as well as your lock. I have done this successfully with CherryPy's server, for example. For more details about multithreading, look at the question linked above; it's very well explained.
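A rough sketch of that single-supervisor idea, using a threading.Event to coordinate the 20-second delay (the run_flask/run_selenium bodies are hypothetical placeholders, not the asker's code):
import threading
import time

flask_restarted = threading.Event()

def run_flask():
    while True:
        try:
            time.sleep(5)  # placeholder for actually serving Flask pages
            raise RuntimeError("simulated restart")
        except RuntimeError:
            flask_restarted.set()  # tell the other worker that a restart happened
            time.sleep(1)          # then bring the server back up

def run_selenium():
    while True:
        # Placeholder for the selenium-webdriver work would go here.
        if flask_restarted.wait(timeout=1):
            flask_restarted.clear()
            time.sleep(20)         # wait ~20 s after a Flask restart
            print("restarting the selenium side")

workers = [threading.Thread(target=run_flask, daemon=True),
           threading.Thread(target=run_selenium, daemon=True)]
for w in workers:
    w.start()
time.sleep(60)  # keep the supervisor process alive for the demo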

Python re-establishment after server shuts down

I have a Python script running on my server which accesses a database, executes a fetch query and runs a learning algorithm to classify and update certain values and means depending on the query.
I want to know: if for some reason my server shuts down in the middle, my Python script would shut down and my query would be lost.
How do I know where to continue from once I re-run the script, and how do I carry over the updated means from the previous queries?
First of all: the question is not really related to Python at all; it's a general problem.
And the answer is simple: keep track of what your script does (in a file or directly in the database). If it crashes, continue from the last step.
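A minimal checkpointing sketch of that idea (the file name and state fields are arbitrary examples, not tied to the asker's schema):
import json
import os

CHECKPOINT_FILE = "progress.json"

def load_checkpoint():
    # Resume from the last saved step if the script was interrupted.
    if os.path.exists(CHECKPOINT_FILE):
        with open(CHECKPOINT_FILE) as f:
            return json.load(f)
    return {"last_processed_id": 0, "running_means": {}}

def save_checkpoint(state):
    # Write to a temp file and rename so a crash mid-write cannot corrupt the checkpoint.
    tmp = CHECKPOINT_FILE + ".tmp"
    with open(tmp, "w") as f:
        json.dump(state, f)
    os.replace(tmp, CHECKPOINT_FILE)

state = load_checkpoint()
for row_id in range(state["last_processed_id"] + 1, 101):
    # ... fetch row row_id, classify it, update the running means ...
    state["last_processed_id"] = row_id
    save_checkpoint(state)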
