I have an application that listens for updates from a Firestore collection using google-cloud-firestore. For each update I need to upload some data to an FTP server, which takes time. Receiving a lot of data at the same time introduces unacceptable delay, and I figure the answer is an asynchronous callback (i.e. do not wait for my callback to finish before continuing), but is that even possible?
Imagine a script like this:
from google.cloud.firestore import Client
import time

def callback(col_snapshot, changes, read_time):
    print("Received updates")
    # mock FTP upload
    time.sleep(1)
    print("Finished handling the updates")

Client().collection('news').on_snapshot(callback)

while True:
    pass
How can I modify that code so the callbacks are not queued one after another?
Update
I've created a feature request at google-cloud-firestore.
What you need to do is use one of the approaches mentioned in this SO question.
My suggestion is to use the multiprocessing module in Python 3.
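For illustration, a minimal sketch of that idea (not from the original answer): hand each slow FTP upload off to a worker pool so the on_snapshot callback itself returns immediately. Here upload_to_ftp is a hypothetical stand-in for the real upload code, and I'm assuming the data to upload is the changed document's contents.

from multiprocessing import Pool
from google.cloud.firestore import Client
import time

def upload_to_ftp(doc_data):
    # stand-in for the real FTP upload
    time.sleep(1)
    print("Finished uploading one document")

def callback(col_snapshot, changes, read_time):
    print("Received updates")
    for change in changes:
        # hand the slow work to a worker process and return right away
        pool.apply_async(upload_to_ftp, (change.document.to_dict(),))

if __name__ == "__main__":
    pool = Pool(processes=4)
    Client().collection('news').on_snapshot(callback)
    while True:
        time.sleep(60)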
Related
I'm trying to develop an Event Hub-triggered Azure function that receives events from a first event hub and sends them on to a second event hub.
As additional features, I'd like my function to be asynchronous and to store checkpoints in Azure Blob Storage.
To do so, I wanted to use the EventHubConsumerClient class of the azure-eventhub library (https://pypi.org/project/azure-eventhub/, https://learn.microsoft.com/en-us/javascript/api/#azure/event-hubs/eventhubconsumerclient?view=azure-node-latest).
However, it seems I cannot receive the events in the first place when I test the function locally in VS Code.
The Event Hub I am listening to has two partitions. Its shared access policy is set to send and listen.
I have a small script that sends it messages for testing, and that works great.
My Azure Functions runtime is 4.x with Python 3.9.13, using a local conda base environment.
Here is the code of my function to receive the events with the EventHubConsumerClient class in my __init__.py:
import logging
import asyncio
import os
from azure.eventhub.aio import EventHubConsumerClient
from azure.eventhub.extensions.checkpointstoreblobaio import BlobCheckpointStore
import azure.functions as func

CONNECTION_STR = os.environ.get("EVENT_HUB_CONN_STR")
EVENTHUB_NAME = os.environ.get("EVENT_HUB_NAME")
STORAGE_CONNECTION_STR = os.environ.get("AZURE_STORAGE_CONN_STR")
BLOB_CONTAINER_NAME = os.environ.get("AZURE_STORAGE_NAME")

async def on_event(partition_context, event):
    logging.info("Received event with body: {} from partition: {}.".format(
        event.body_as_str(encoding="UTF-8"), partition_context.partition_id))
    await partition_context.update_checkpoint(event)

async def receive(client):
    await client.receive(
        on_event=on_event,
        starting_position="-1",  # "-1" is from the beginning of the partition.
    )

async def main(one_event: func.EventHubEvent):
    checkpoint_store = BlobCheckpointStore.from_connection_string(STORAGE_CONNECTION_STR, BLOB_CONTAINER_NAME)
    client = EventHubConsumerClient.from_connection_string(
        CONNECTION_STR,
        consumer_group="$Default",
        eventhub_name=EVENTHUB_NAME,
        checkpoint_store=checkpoint_store,
    )
    async with client:
        await receive(client)

if __name__ == '__main__':
    asyncio.run(main())
source: https://github.com/Azure/azure-sdk-for-python/blob/main/sdk/eventhub/azure-eventhub/samples/async_samples/recv_with_checkpoint_store_async.py
Note: I know one_event is not used inside main, but I want it to act as the trigger that runs main.
My function.json file is:
{
  "scriptFile": "__init__.py",
  "bindings": [
    {
      "type": "eventHubTrigger",
      "name": "one_event",
      "direction": "in",
      "eventHubName": "<My_event_hub_name>",
      "connection": "<My_event_hub_co_str>",
      "cardinality": "one",
      "consumerGroup": "$Default"
    }
  ]
}
I defined an event hub input binding in there to use as a trigger.
I also have a local.settings.json that contains some variables, and a requirements.txt that does not seem to be missing any libraries.
FYI: I have tested another method (here: https://learn.microsoft.com/en-us/azure/azure-functions/functions-bindings-event-hubs-trigger?tabs=in-process%2Cfunctionsv2%2Cextensionv5&pivots=programming-language-python) to receive the events without using the EventHubConsumerClient class, and it works fine, but then I do not have the checkpointing and async capabilities.
When I run the function locally with func start, instead of printing some basic information about the received event, I get a flood of messages continuously printing in my terminal.
It keeps printing and locks up my terminal, so I have to kill it manually and open a new one.
So it seems that my code is not working properly.
I am probably messing something up with main() and asyncio.run().
Do you have any idea what the problem might be?
Thank you very much!
I am not a Python expert, but on a conceptual level I can tell you that when using the regular Event Hub trigger, checkpointing still takes place, using a storage account:
AzureWebJobsStorage
The Azure Functions runtime uses this storage account connection string for normal operation. Some uses of this storage account include key management, timer trigger management, and Event Hubs checkpoints. The storage account must be a general-purpose one that supports blobs, queues, and tables
Under the hood, the trigger uses an EventProcessorHost, which is similar to the EventHubConsumerClient (I suppose the Azure Functions runtime will get updated soon to also use the EventHubConsumerClient).
So, I am not sure what you are trying to achieve. It seems like you have combined an Event Hub-triggered function with your own Event Hub listener. The EventHubConsumerClient you are using waits for new Event Hub messages to arrive and blocks further execution until explicitly stopped. That is not going to work for an Azure function, whose execution time should be short and is limited to 5 minutes by default. If you had a continuously running Azure WebJob, for example, using an EventHubConsumerClient would make sense.
I'm trying to develop an event-hub trigger azure function that could receive events from a first event-hub and send these events to a second event-hub.
I would say you need an Event Hub-triggered function with an Event Hub output binding to pass messages from one event hub to another.
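A minimal sketch of that approach, assuming the v1 Python programming model and an output binding named outputEvent (the binding name is illustrative, not from the question):

import azure.functions as func

def main(one_event: func.EventHubEvent, outputEvent: func.Out[str]) -> None:
    # Forward the body of the incoming event to the second event hub
    # via the "outputEvent" output binding defined in function.json.
    outputEvent.set(one_event.get_body().decode("utf-8"))

For this to work, function.json would additionally need an output binding entry with "type": "eventHub", "direction": "out", "name": "outputEvent", pointing at the second hub's name and connection string.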
I'm reading data from the PLC of a machine using a Python script. The core of the script is a while loop that runs every second and reads whatever data the machine sends. What I want to do is save the data in a database and, at the same time, "publish" some of the data over websockets so that I can show it in real time in a web application. The db part is working.
For my other goal, I've been reading the websockets documentation and I've started a very simple websocket server like this:
import asyncio
import websockets

async def create_socket(data):
    async def hello(websocket, path):
        await websocket.send(data)

    start_server = websockets.serve(hello, "localhost", 5678)
    asyncio.get_event_loop().run_until_complete(start_server)
    asyncio.get_event_loop().run_forever()
The "data" input is a json and I can read it without any problems from my JS script but this, of course, blocks the main queue and, since I'm calling the above code from the mentioned while loop, I can't get pass the first iteration. What I would like to do is continuing to go through my while loop and asynchronously update the websocket with the data I read from the plc at every iteration.
Do you have any tips?
Thanks!
Below is the code to receive live ticks using a WebSocket. Each time a tick is received, the callback function on_ticks() is called and it prints the ticks.
Can I spawn a single thread in the on_ticks() function and call the store_ticks() function to store the ticks in the database? If yes, can someone please show how it can be done? Or is there any other way to call store_ticks() and store the ticks each time they are received?
from kiteconnect import KiteTicker

kws = KiteTicker("your_api_key", "your_access_token")

def on_ticks(ws, ticks):
    print(ticks)

def on_connect(ws, response):
    # Callback on successful connect.
    # Subscribe to a list of instrument_tokens
    ws.subscribe([738561, 5633])

def store_ticks():
    # Store ticks here
    pass

def on_close(ws, code, reason):
    # On connection close stop the main loop
    # Reconnection will not happen after executing `ws.stop()`
    ws.stop()

# Assign the callbacks.
kws.on_ticks = on_ticks
kws.on_connect = on_connect
kws.on_close = on_close
kws.connect()
If the reason you want to spawn a new thread is to avoid delays, I'd say don't be bothered.
I have been using mysql-client (MySQLDB connector) with a MariaDB server, subscribed to 100+ instruments in Full mode, for the past 2 months and there have been no delays in writing the ticks to the DB.
Also, we do not know when and how many ticks we'd receive once we start the ticker. This makes it hard to time/count and close the thread and DB connection. You could end up exhausting the connection limit and the thread count really fast. (DB connection pooling is overkill here.)
The reason I use the MySQLdb connector and not pymysql: I've seen an approximately 20% increase in write times while using pymysql. This wouldn't be obvious with live ticks. I cloned a medium-sized DB (1 million+ rows), dumped it to a DataFrame in Python, wrote it row by row to another DB, and benchmarked the result over 10 iterations.
The reason I use MariaDB: all the features of the MySQL Enterprise Edition, without the Oracle fuss.
Just make sure you allocate a decent amount of memory to the DB server you use; this gives the DB's buffer some breathing room, just in case.
Avoiding a remote server and sticking to a local server also helps to a great extent.
If you want to back the data up from local to the cloud, you can set up a daily job that dumps locally, exports to the cloud, and loads it into the DB there.
If you are looking for a walkthrough, this page already has an example, along with a code walkthrough video.
Edit:
I just made my code public here
You could modify your store_ticks() function to
def store_ticks(ticks):
    # code to store tick into database
    pass
and then modify your on_ticks function to:
def on_ticks(ws, ticks):
    print(ticks)
    store_ticks(ticks)
What goes inside store_ticks(ticks) depends on which database you want to use and exactly what information you wish to store.
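For example, a minimal sketch using the standard-library sqlite3 module; the tick fields kept here (instrument_token and last_price) are assumptions about what you want to store:

import sqlite3

# check_same_thread=False because the ticker invokes callbacks from its own thread
conn = sqlite3.connect("ticks.db", check_same_thread=False)
conn.execute(
    "CREATE TABLE IF NOT EXISTS ticks ("
    "instrument_token INTEGER, last_price REAL, "
    "received_at TEXT DEFAULT CURRENT_TIMESTAMP)"
)

def store_ticks(ticks):
    # each tick is a dict; keep only the fields we care about
    rows = [(t["instrument_token"], t["last_price"]) for t in ticks]
    conn.executemany(
        "INSERT INTO ticks (instrument_token, last_price) VALUES (?, ?)", rows
    )
    conn.commit()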
EDIT:
To spawn a new thread for store_ticks(), use the _thread module:
import _thread

def store_ticks(ticks):
    # code to store tick into database
    pass

def on_ticks(ws, ticks):
    print(ticks)
    try:
        _thread.start_new_thread(store_ticks, (ticks,))
    except Exception:
        # unable to start the thread, probably want some logging here
        pass
Import a Queue and threading.
In on_ticks(), insert the data into the Queue.
The store_ticks method contains the code to save to the database and clear the Queue.
Start another daemon thread that shares the Queue with store_ticks.
PS: too lazy to open an editor and write code, but a rough sketch of the idea is below.
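A minimal sketch of that queue-plus-daemon-thread pattern (illustrative only, not from the original answer; the database write is left as a stub):

import queue
import threading
from kiteconnect import KiteTicker

tick_queue = queue.Queue()

def store_ticks():
    # runs forever in a daemon thread, draining the queue into the database
    while True:
        ticks = tick_queue.get()
        # ... write `ticks` to your database here ...
        tick_queue.task_done()

def on_ticks(ws, ticks):
    tick_queue.put(ticks)  # hand off quickly; never block the ticker thread

threading.Thread(target=store_ticks, daemon=True).start()

kws = KiteTicker("your_api_key", "your_access_token")
kws.on_ticks = on_ticks
kws.connect()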
I have some trouble understanding how to use the threading module in Python 3.
Origin: I wrote a Python script to do some image processing on every frame of a camera stream in a for loop.
Therefore I wrote some functions that are used inside the main script. The main script/loop isn't encapsulated inside a function.
Aim: I want the main loop to run the whole time. The result of processing the latest frame has to be sent to a socket client only when the client sends a request to the server socket.
My idea was to use two threads: one for the image processing and one for the server socket, which listens for a request, takes the latest image-processing result, and sends it to the client socket.
I saw different tutorials on how to use threading and understand the workflow in general, but not how to apply it to this particular case. So I hope for your help.
Below there is the rough structure of the origin script:
import cv2
import numpy
import json
import socket
from threading import Thread

def crop(image, coords):
    ...

def cont(image):
    ...

# load parameters
a = json_data["..."]

# init cam
camera = PiCamers()

# main loop
for frame in camera.capture_continuous(...):
    #######
    # some image processing
    #######
    result = (x, y, z)
Thank you in advance for your ideas!
Greetings
Basically, you have to create a so-called ThreadPool.
In this ThreadPool you can add the functions you want executed in a thread, along with their specific parameters. Afterwards you can start the ThreadPool.
https://www.codementor.io/lance/simple-parallelism-in-python-du107klle
There, a ThreadPool with .map is used. There are more advanced functions that do the job. You can read the documentation on ThreadPools or look for other tutorials.
Hope it helps. A rough sketch for your case is below.
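For this particular case, a minimal sketch of that idea could look like the following (illustrative only: the image processing is replaced by a placeholder, the port number is arbitrary, and the shared result is protected with a lock):

from multiprocessing.pool import ThreadPool
import threading
import socket
import json
import time

latest = {"result": None}   # shared between the two threads
lock = threading.Lock()

def processing_loop():
    # stand-in for the camera loop; replace with capture_continuous(...)
    while True:
        result = (1, 2, 3)   # some image processing
        with lock:
            latest["result"] = result
        time.sleep(0.1)      # stand-in for grabbing the next frame

def serve_requests():
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind(("0.0.0.0", 5000))   # port is illustrative
    srv.listen(1)
    while True:
        conn, _ = srv.accept()
        with conn:
            conn.recv(1024)       # wait for the client's request
            with lock:
                payload = json.dumps(latest["result"])
            conn.sendall(payload.encode())

pool = ThreadPool(processes=2)
pool.apply_async(processing_loop)
pool.apply_async(serve_requests)
pool.close()
pool.join()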
I currently run a daemon thread that grabs all cell values, checks whether anything has changed, and then writes out dependent cells in a loop, i.e.:
def f():
    while not event.is_set():
        update()
        event.wait(15)

Thread(target=f).start()
This works, but the looped get-all calls are significant I/O.
Rather than doing this, it would be much cleaner if the thread was notified of changes by Google Sheets. Is there a way to do this?
I rephrased my comment from gspread's GitHub Issues:
Getting a change notification from Google Sheets is possible with the help of installable triggers in Apps Script. You set up a custom function in the Script editor and assign a trigger event to this function. In that function you can fetch an external URL with UrlFetchApp.fetch.
On the listening end (your web server) you'll have a handler for this URL. This handler will do the job. Depending on the server configuration (many threads or processes), make sure to avoid possible race conditions.
Also, I haven't tested non-browser-triggered updates. If Sheets triggers the same event for this type of update, you could end up with an infinite loop.
I was able to get this working by triggering an HTTP request whenever Google Sheets detected a change.
On Google Sheets:
function onEdit(e) {
  UrlFetchApp.fetch("http://myaddress.com");
}
Python-side (w/ Tornado)
import tornado.ioloop
import tornado.web

class MainHandler(tornado.web.RequestHandler):
    def get(self):
        on_edit()
        self.write('Updating.')

def on_edit():
    # Code here
    pass

app = tornado.web.Application([(r'/', MainHandler)])
app.listen(8888)  # port here; 8888 is just an example
tornado.ioloop.IOLoop.current().start()
I don't think this sort of functionality should be within the scope of gspread, but I hope the documentation helps others.