Websocket Threading - python

Below is the code to receive live ticks using WebSocket. Each time a tick is received, the callback function on_ticks() is called, and it prints the ticks.
Can I spawn a single thread in the on_ticks() function and call store_ticks() to store the ticks in the database? If yes, can someone please show how it can be done? Or is there another way to call store_ticks() and store the ticks each time they are received?
from kiteconnect import KiteTicker

kws = KiteTicker("your_api_key", "your_access_token")

def on_ticks(ws, ticks):
    print(ticks)

def on_connect(ws, response):
    # Callback on successful connect.
    # Subscribe to a list of instrument_tokens
    ws.subscribe([738561, 5633])

def store_ticks():
    # Store ticks here
    pass

def on_close(ws, code, reason):
    # On connection close stop the main loop
    # Reconnection will not happen after executing `ws.stop()`
    ws.stop()

# Assign the callbacks.
kws.on_ticks = on_ticks
kws.on_connect = on_connect
kws.on_close = on_close

kws.connect()

If the reason you want to spawn a new thread is to avoid delays, I'd say don't bother.
I have been using mysqlclient (the MySQLdb connector) with a MariaDB server, subscribed to 100+ instruments in Full mode, for the past 2 months, and there have been no delays in writing the ticks to the DB.
Also, we do not know when and how many ticks we'll receive once we start the ticker. This makes it hard to time/count when to close the thread and DB connection, and you could end up exhausting the connection limit and the thread count really fast. (DB connection pooling is overkill here.)
The reason I use the MySQLdb connector and not pymysql: I've seen roughly a 20% increase in write times with pymysql. This wouldn't be obvious with live ticks. I cloned a medium-sized DB (1 million+ rows), dumped it to a DataFrame in Python, wrote it row by row to another DB, and benchmarked the result over 10 iterations.
The reason I use MariaDB: all the features of the MySQL enterprise edition, without the Oracle fuss.
Just make sure you allocate a decent amount of memory to the DB server you use; that gives the DB's buffer some breathing room just in case.
Avoiding a remote server and sticking to a local server also helps to a great extent.
If you want to back up the local data to the cloud, you can set up a daily job that dumps locally, exports to the cloud, and loads into the DB there.
If you are looking for a walkthrough, this page has an example already, along with a code walkthrough video.
Edit:
I just made my code public here

You could modify your store_ticks() function to:

def store_ticks(ticks):
    # code to store tick into database
    pass

and then modify your on_ticks function to:

def on_ticks(ws, ticks):
    print(ticks)
    store_ticks(ticks)
What goes inside store_ticks(ticks) depends on which database you want to use and exactly what information you wish to store there.
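For example, a minimal sketch using the standard-library sqlite3 module (the table name, columns, and the tick fields read below are assumptions; adapt them to your actual schema and database driver):

import sqlite3

def store_ticks(ticks):
    # One connection per batch of ticks; swap in your own driver/DSN as needed
    conn = sqlite3.connect("ticks.db")
    conn.execute(
        "CREATE TABLE IF NOT EXISTS ticks "
        "(instrument_token INTEGER, last_price REAL, ts TEXT)"
    )
    conn.executemany(
        "INSERT INTO ticks (instrument_token, last_price, ts) VALUES (?, ?, ?)",
        [(t["instrument_token"], t["last_price"], str(t.get("timestamp"))) for t in ticks],
    )
    conn.commit()
    conn.close()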
EDIT:
To spawn a new thread for store_ticks(), use the _thread module:

import _thread

def store_ticks(ticks):
    # code to store tick into database
    pass

def on_ticks(ws, ticks):
    print(ticks)
    try:
        _thread.start_new_thread(store_ticks, (ticks,))
    except Exception:
        # unable to start the thread; you probably want some logging here
        pass

Another approach: use a Queue and threading.
on_ticks() inserts the incoming data into the Queue.
A store_ticks() worker contains the code to save to the database and drain the Queue.
Start that worker as a daemon thread that shares the Queue with on_ticks().
PS: too lazy to open an editor and write code.
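A rough sketch of that queue-based approach, assuming the same KiteTicker callbacks as above (the database write itself is left as a stub):

import queue
import threading

tick_queue = queue.Queue()

def on_ticks(ws, ticks):
    print(ticks)
    # Hand the ticks to the worker via the queue; the callback returns immediately
    tick_queue.put(ticks)

def store_ticks():
    # Runs forever in a daemon thread, draining the queue and writing to the DB
    while True:
        ticks = tick_queue.get()
        # ... write `ticks` to the database here ...
        tick_queue.task_done()

worker = threading.Thread(target=store_ticks, daemon=True)
worker.start()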

Related

Telemetry data through python socket, without stopping execution of the program

I'm building photovoltaic motorized solar trackers. They're controlled by Raspberry Pis running a Python script. The RPis are connected to my public OpenVPN server for remote control and continuous software development. That's working fine. Recently a passionate customer asked me for some sort of telemetry data for his tracker - let's say its current orientation, measured wind speed, etc. Being new to Python, I'm really struggling with this part.
I've decided to use the socket approach from guides like this. The Python script listens on a socket, and my OpenVPN server, which is also a web server, connects to it using PHP's fsockopen. Python sends the telemetry data, and PHP makes it user-friendly and displays it on the web. Everything so far works; however, I don't know how to design my Python script around it.
The problem is that my script has to run continuously, and socket.accept() halts its execution while waiting for a connection. I didn't find any obvious solution on the web. Would multi-threading work for this? It sounds a bit like overkill.
Is there a way to run the socket listening asynchronously? Like, for example, the pigpio callbacks, which I'm using abundantly?
Or alternatively, is there a better way to accomplish my goal?
I tried remotely accessing a status file that my script maintains, but that proved to require an involved setup and was prone to errors while the file was being written.
I also tried running a second script. The problem is that it then has no access to the relevant data, or it needs to read the aforementioned status file, which leads to the same problems as above.
The relevant bit of code is literally only this:

# Main loop
try:
    while True:
        # Telemetry
        conn, addr = S.accept()
        conn.send(data.encode())
        conn.close()
Best regards.
For a simple case like this I would probably just wrap the socket code into a separate thread.
With multithreading in Python, the Global Interpreter Lock (GIL) means that only one thread executes at a time, so you don't really need to add any further locks around the data if you're just reading the values and don't care whether they're also being updated at the same time.
Your code would essentially read something like:
from threading import Thread

def handle_telemetry_requests():
    # Main loop
    try:
        while True:
            # Telemetry
            conn, addr = S.accept()
            conn.send(data.encode())
            conn.close()
    except:
        # Error handling here (this will cause the thread to exit if any error occurs)
        pass

socket_thread = Thread(target=handle_telemetry_requests)
socket_thread.daemon = True
socket_thread.start()
Setting the daemon flag means that when the main application ends, the thread will also be terminated.
Python does provide the asyncio module - which may provide the callbacks you're looking for (though I don't have any experience with this).
Other options are to run a Flask server in the Python app, which handles the sockets for you so you can just code the endpoints that return the data. Or think about using an MQTT broker: the current data can be written to it, and other apps can subscribe to updates.
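If you do want to explore the asyncio route mentioned above, a minimal sketch might look like this (the host, port, and the shared data variable are placeholders standing in for your tracker's state):

import asyncio

# Stand-in for the tracker's telemetry state, updated elsewhere in the script
data = "orientation=42.0;wind_speed=3.1"

async def handle_request(reader, writer):
    # Send the current telemetry snapshot and close the connection,
    # mirroring the accept/send/close loop in the original code
    writer.write(data.encode())
    await writer.drain()
    writer.close()
    await writer.wait_closed()

async def main():
    server = await asyncio.start_server(handle_request, "0.0.0.0", 5000)
    async with server:
        await server.serve_forever()

asyncio.run(main())

Note that the rest of the tracker logic would then also need to be scheduled on the event loop (or kept in a thread), which is why the thread-based version above is usually the simpler change.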

Async callbacks on firestore on_snapshot using python

I have an application that listens for updates from a Firestore collection using google-cloud-firestore. For each update I need to upload some data to an FTP server, which takes time. Receiving a lot of data at the same time introduces an unacceptable delay, and I figure the answer is an async callback (i.e. do not wait for my callback to end before continuing), but is that possible?
Imagine a script like this
from google.cloud.firestore import Client
import time
def callback(col_snapshot, changes, read_time):
    print("Received updates")
    # mock FTP upload
    time.sleep(1)
    print("Finished handling the updates")

Client().collection('news').on_snapshot(callback)

while True:
    pass
How can I modify that code so it doesn't queue each callback?
Update
I've created a feature request at google-cloud-firestore
What you need to do is use one of the approaches mentioned in this SO question.
My suggestion is to use the multiprocessing module in Python 3.
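For illustration, here is a minimal sketch that hands the slow FTP work off to a worker pool so the snapshot callback returns immediately. A ThreadPoolExecutor is shown because the upload is I/O-bound; swapping in ProcessPoolExecutor (which builds on multiprocessing) follows the suggestion above. The slow_upload function is a placeholder:

from concurrent.futures import ThreadPoolExecutor
from google.cloud.firestore import Client
import time

executor = ThreadPoolExecutor(max_workers=4)

def slow_upload(doc_data):
    # Placeholder for the FTP upload
    time.sleep(1)

def callback(col_snapshot, changes, read_time):
    print("Received updates")
    for change in changes:
        # Schedule the slow work and return without waiting for it
        executor.submit(slow_upload, change.document.to_dict())
    print("Scheduled the uploads")

Client().collection('news').on_snapshot(callback)

while True:
    time.sleep(1)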

How do I make PySolr drop a connection?

I'm working on time series charts for 300+ clients.
It is beneficial for us to pull each client separately, as the combined data is huge and in some cases a client's data is resampled or manipulated in a slightly different fashion.
My problem is that the function I loop through to get each client's data opens 3 new threads but never closes them (I'm assuming the connections stay open) after the request completes and the function returns the data.
Once I have a client's results, I'd like to close that connection. I just can't figure out how to do that and haven't been able to find anything in my searches.
def solr_data_pull(submitterId):
    zookeeper = pysolr.ZooKeeper('ndhhadr1dnp11,ndhhadr1dnp12,ndhhadr1dnp13:2181/solr')
    solr = pysolr.SolrCloud(zookeeper, collection='tran_timings', timeout=60)
    query = ('SubmitterId:'+ str(submitterId) +' AND Tier:'+tier+' AND Mode:'+mode+' '
             'AND Timestamp:['+ str(start_period)+' TO '+ str(end_period)+ '] ')
    results = solr.search(rows=50000, q=[query], fl=[fl_list])
    return(pd.DataFrame(list(results)))
PySolr uses the Session object from requests as its underlying library (which in turn uses urllib3's connection pooling), so calling solr.get_session().close() should close all connections and drain the pool:

def close(self):
    """Closes all adapters and as such the session"""

(SolrCloud is an extension of Solr, which has the get_session() method.)
For disconnecting from ZooKeeper - which you probably shouldn't do if it's a long-running session, since it will have to set up watches etc. again - you can use the .zk object directly on your SolrCloud instance; zk is a KazooClient:
stop()
    Gracefully stop this Zookeeper session.

close()
    Free any resources held by the client. This method should be called on a stopped client before it is discarded. Not doing so may result in filehandles being leaked.
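Putting that together, a sketch of the pull function that drains the HTTP connection pool once the results have been materialized (the ZooKeeper session is left running, per the note above):

def solr_data_pull(submitterId):
    zookeeper = pysolr.ZooKeeper('ndhhadr1dnp11,ndhhadr1dnp12,ndhhadr1dnp13:2181/solr')
    solr = pysolr.SolrCloud(zookeeper, collection='tran_timings', timeout=60)
    try:
        query = ('SubmitterId:' + str(submitterId) + ' AND Tier:' + tier + ' AND Mode:' + mode + ' '
                 'AND Timestamp:[' + str(start_period) + ' TO ' + str(end_period) + '] ')
        results = solr.search(rows=50000, q=[query], fl=[fl_list])
        return pd.DataFrame(list(results))
    finally:
        # Close the underlying requests Session and its urllib3 connection pool
        solr.get_session().close()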

What is the most efficient way to run independent processes from the same application in Python

I have a script that ultimately executes two functions. It polls for data on a time interval (it runs as a daemon, and the data is retrieved from a shell command run on the local system) and, once it receives the data, will: 1) function 1 - write the data to a log file, and 2) function 2 - inspect the data and send an email IF it meets certain criteria.
The logging happens every time, but the alert may not. The issue is that, in cases where an alert needs to be sent, if the email connection stalls or takes a long time to connect to the server, it causes the next polling of the data to stall (for an unpredictable amount of time, depending on the server), and in my case it is very important that the polling interval remains consistent (for analytics purposes).
What is the most efficient way, if any, to keep the email process working independently of the logging process while still operating within the same application and depending on the same data? I was considering creating a separate thread for the mailer, but that seems like overkill in this case.
I'd rather not set a short timeout on the email connection, because I want to give the process some chance to connect to the server, while still allowing the log to be written consistently on the given interval. Some code:
def send(self, msg_):
    """
    Send the alert message

    :param str msg_: the message to send
    """
    self.msg_ = msg_
    ar = alert.Alert()
    ar.send_message(msg_)

def monitor(self):
    """
    Post to the log file and
    send the alert message when
    applicable
    """
    read = r.SensorReading()
    msg_ = read.get_message()  # the data
    if msg_:  # if there is data in general...
        x = read.get_failed()  # store bad data
        msg_ += self.write_avg(read)
        msg_ += "==============================================="
        self.ctlog.update_templog(msg_)  # write general data to log
        if x:
            self.send(x)  # if bad data, send...
This is exactly the kind of case you want to use threading/subprocesses for. Fork off a thread for the email, which times out after a while, and keep your daemon running normally.
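A minimal sketch of that idea, as a method added to the same class as send() and monitor() above (the name send_async is made up here):

import threading

def send_async(self, msg_):
    """
    Fire the alert off on its own daemon thread so a slow SMTP
    connection cannot delay the next polling cycle.
    """
    t = threading.Thread(target=self.send, args=(msg_,), daemon=True)
    t.start()

Then, inside monitor(), replace self.send(x) with self.send_async(x); because the thread is a daemon, it will not keep the process alive when the script exits.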
Possible approaches that come to mind:
Multiprocessing
Multithreading
Parallel Python
My personal choice would be multiprocessing as you clearly mentioned independent processes; you wouldn't want a crashing thread to interrupt the other function.
You may also refer this before making your design choice: Multiprocessing vs Threading Python
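If you go the multiprocessing route instead, a sketch could look like this; the module-level send_alert helper is hypothetical and mirrors send(), so it pickles cleanly and a crash or stall stays isolated in the child process:

from multiprocessing import Process

import alert  # the module already used by send()

def send_alert(msg_):
    # Hypothetical module-level helper mirroring send()
    ar = alert.Alert()
    ar.send_message(msg_)

# ...and inside monitor(), replace self.send(x) with:
#     Process(target=send_alert, args=(x,), daemon=True).start()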
Thanks everyone for the responses. They helped very much. I went with threading, but also updated the code to be sure it handled failing threads. I ran some regressions and found that the subsequent processes were no longer being interrupted by stalled connections and the log was being updated on a consistent schedule. Thanks again!!

Gspread - Change Listener?

I currently run a daemon thread that grabs all cell values, calculates if there's a change, and then writes out dependent cells in a loop, i.e.:

def f():
    while not event.is_set():
        update()
        event.wait(15)

Thread(target=f).start()
This works, but the looped get-all calls are significant I/O.
Rather than doing this, it would be much cleaner if the thread was notified of changes by Google Sheets. Is there a way to do this?
I rephrased my comment on gspread GitHub's Issues:
Getting a change notification from Google Sheets is possible with the help of installable triggers in Apps Script. You set up a custom function in the Script editor and assign a trigger event to this function. In this function you can fetch an external URL with UrlFetchApp.fetch.
On the listening end (your web server) you'll have a handler for this URL. This handler will do the job. Depending on the server configuration (many threads or processes), make sure to avoid possible race conditions.
Also, I haven't tested non-browser-triggered updates. If Sheets triggers the same event for this type of update, there could be a case for infinite loops.
I was able to get this working by triggering an HTTP request whenever Google Sheets detected a change.
On Google Sheets:
function onEdit(e) {
  UrlFetchApp.fetch("http://myaddress.com");
}
Python-side (w/ Tornado)
import tornado.ioloop
import tornado.web
class MainHandler(tornado.web.RequestHandler):
    def get(self):
        on_edit()
        self.write('Updating.')

def on_edit():
    # Code here
    pass
app = tornado.web.Application([(r'/', MainHandler)])
app.listen(#port here)
tornado.ioloop.IOLoop.current().start()
I don't think this sort of functionality should be within the scope of gspread, but I hope the documentation helps others.
