I have built a pipeline where Stream Analytics data triggers an Azure Function.
There are 5000 values merged into a single data payload. I wrote a simple Python program in the Function to validate the data, parse the bulk payload, and save each value in Cosmos DB as an individual document. The problem is that my function doesn't stop: after 30 minutes it generates a timed-out error, and in those 30 minutes I can see more than 300k values in my database, duplicating themselves. I thought the problem was with my code (the for loop), but I tried running it locally and everything works, so I am not sure what is causing this. In the whole code, the only statement I am unable to understand is the container.upsert line.
This is my code:
import logging
import azure.functions as func
import hashlib as h
from azure.cosmos import CosmosClient
import random, string

def generateRandomID(length):
    # choose from all lowercase letters
    letters = string.ascii_lowercase
    result_str = ''.join(random.choice(letters) for i in range(length))
    return result_str

URL = dburl
KEY = dbkey
client = CosmosClient(URL, credential=KEY)
DATABASE_NAME = dbname
database = client.get_database_client(DATABASE_NAME)
CONTAINER_NAME = containername
container = database.get_container_client(CONTAINER_NAME)
def main(req: func.HttpRequest) -> func.HttpResponse:
    logging.info('Python HTTP trigger function processed a request.')
    req_body = req.get_json()
    try:
        # Level 1
        rawMsg = req_body[0]
        filteredMsg = rawMsg['message']
        metaData = rawMsg['metaData']
        logging.info(metaData)
        encodeMD5 = filteredMsg.encode('utf-8')
        generateMD5 = h.md5(encodeMD5).hexdigest()
        parsingMetaData = metaData.split(',')
        parsingMD5Hex = parsingMetaData[3]
        splitingHex = parsingMD5Hex.split(':')
        parsingMD5Value = splitingHex[1]
    except:
        logging.info("Failed to parse the data and generate MD5 checksums. Error at level 1.")
    finally:
        logging.info("Execution successful | First level completed")
        #return func.HttpResponse(f"OK")
    try:
        # Level 2
        if generateMD5 == parsingMD5Value:
            # parsing the ECG values
            logging.info('MD5 checksums matched!')
            splitValues = filteredMsg.split(',')
            for eachValue in range(len(splitValues)):
                ecgRawData = splitValues[eachValue]
                divideEachValue = ecgRawData.split(':')
                timeData = divideEachValue[0]
                ecgData = divideEachValue[1]
                container.upsert_item({'id': generateRandomID(10), 'time': timeData, 'ecgData': ecgData})
        elif generateMD5 != parsingMD5Hex:
            logging.info('The MD5s did not match and the code could not execute properly')
            logging.info(generateMD5)
        else:
            logging.info('Something is going wrong. Please check.')
    except:
        logging.info("Failed to parse ECG values into the DB container. Error at level 2.")
    finally:
        logging.info("Execution successful | Second level completed")
        #return func.HttpResponse(f"OK")
    # Return a 200 status
    return func.HttpResponse(f"OK")
A test I performed:
I commented out the for loop block and deployed the Function; it executes normally without any error.
Please let me know how I can address this issue, and also point out any bad coding practices in the snippet.
I found the solution! (I am the OP)
In my resource group, an App Service plan was already in use for a web application. So, when creating an Azure Function, it didn't let me deploy it with the Serverless option, and I deployed it on the same App Service plan used for the web application. While testing, the function works completely except for the container.upsert line. When I add this line, it fails to stop and creates 10x the values in the database until it is stopped by a timeout error after 30 minutes.
I tried creating an App Service plan dedicated to this Function, but the issue stayed the same.
While testing hundreds of corner-case scenarios, I found out that my function runs perfectly when I deploy it in the other resource group. The only catch is that there I opted for the Serverless option while deploying the Function.
(If you are using an App Service plan in your Azure resource group, you cannot deploy Azure Functions with the Serverless option; it shows that the deployment is not proper. You need to create a dedicated App Service plan for that Function or use the existing App Service plan.)
As per my research, when dealing with bulk data and inserting it into the database, the usual App Service plan doesn't hold up; the plan has to be large enough to sustain the load. Otherwise, choose the Serverless option while deploying the Function, as the compute is then fully managed by Azure.
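A side note on the duplicates themselves (my own observation, not part of the original fix): because generateRandomID(10) produces a new id every time it runs, upsert_item always writes a brand-new document, so any retry or re-delivery of the same payload multiplies the data. Here is a small sketch that derives the id deterministically from the parsed value, so re-processing the same message overwrites the same documents instead of duplicating them (container and filteredMsg are the variables from the snippet above):

# Sketch only: deterministic ids make upsert_item idempotent, so a retried
# or re-delivered message updates existing documents instead of adding new ones.
import hashlib

def deterministic_id(timeData, ecgData):
    # The same time/value pair always maps to the same document id
    return hashlib.md5(f"{timeData}:{ecgData}".encode("utf-8")).hexdigest()

for rawValue in filteredMsg.split(','):
    timeData, ecgData = rawValue.split(':')
    container.upsert_item({
        'id': deterministic_id(timeData, ecgData),
        'time': timeData,
        'ecgData': ecgData,
    })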
Hope this helps.
Related
I'm writing an Azure Durable Function, and I would like to write some unit tests for this whole Azure Function.
I tried to trigger the Client function (the "Start" function, as it is often called), but I can't make it work.
I'm doing this for two reasons:
It's frustrating to run the Azure Function code with "func host start" (or by pressing F5), then go to my browser, find the right tab, open http://localhost:7071/api/orchestrators/FooOrchestrator, and go back to VS Code to debug my code.
I'd like to write some unit tests to ensure the quality of my project's code. Therefore I'm open to suggestions; maybe it would be easier to only test the execution of Activity functions.
Client Function code
This is the code of my Client function, mostly boilerplate code like this one:
import logging
import azure.functions as func
import azure.durable_functions as df
async def main(req: func.HttpRequest, starter: str) -> func.HttpResponse:
    # 'starter' seems to contain the JSON data about
    # the URLs to monitor, stop, etc., for the Durable Function
    client = df.DurableOrchestrationClient(starter)
    # The Client function knows which orchestrator to call
    # according to 'function_name'
    function_name = req.route_params["functionName"]
    # This part fails with a ClientConnectorError
    # with the message: "Cannot connect to host 127.0.0.1:17071 ssl:default"
    instance_id = await client.start_new(function_name, None, None)
    logging.info(f"Orchestration '{function_name}' started with ID = '{instance_id}'.")
    return client.create_check_status_response(req, instance_id)
Unit test try
Then I tried to write some code to trigger this Client function like I did for some "classic" Azure Functions:
import asyncio
import json

import azure.functions as func  # needed to build the HttpRequest below
# 'main' is the Client function shown above; import it from its own module,
# e.g. "from client_function import main" (adjust to your project layout)

if __name__ == "__main__":
    # Build a simple request to trigger the Client function
    req = func.HttpRequest(
        method="GET",
        body=None,
        url="don't care?",
        # What orchestrator do you want to trigger?
        route_params={"functionName": "FooOrchestrator"},
    )
    # I copy-pasted the data that I obtained when I ran the Durable Function
    # with "func host start"
    starter = {
        "taskHubName": "TestHubName",
        "creationUrls": {
            "createNewInstancePostUri": "http://localhost:7071/runtime/webhooks/durabletask/orchestrators/{functionName}[/{instanceId}]?code=aakw1DfReOkYCTFMdKPaA1Q6bSfnHZ/0lzvKsS6MVXCJdp4zhHKDJA==",
            "createAndWaitOnNewInstancePostUri": "http://localhost:7071/runtime/webhooks/durabletask/orchestrators/{functionName}[/{instanceId}]?timeout={timeoutInSeconds}&pollingInterval={intervalInSeconds}&code=aakw1DfReOkYCTFMdKPaA1Q6bSfnHZ/0lzvKsS6MVXCJdp4zhHKDJA==",
        },
        "managementUrls": {
            "id": "INSTANCEID",
            "statusQueryGetUri": "http://localhost:7071/runtime/webhooks/durabletask/instances/INSTANCEID?taskHub=TestHubName&connection=Storage&code=aakw1DfReOkYCTFMdKPaA1Q6bSfnHZ/0lzvKsS6MVXCJdp4zhHKDJA==",
            "sendEventPostUri": "http://localhost:7071/runtime/webhooks/durabletask/instances/INSTANCEID/raiseEvent/{eventName}?taskHub=TestHubName&connection=Storage&code=aakw1DfReOkYCTFMdKPaA1Q6bSfnHZ/0lzvKsS6MVXCJdp4zhHKDJA==",
            "terminatePostUri": "http://localhost:7071/runtime/webhooks/durabletask/instances/INSTANCEID/terminate?reason={text}&taskHub=TestHubName&connection=Storage&code=aakw1DfReOkYCTFMdKPaA1Q6bSfnHZ/0lzvKsS6MVXCJdp4zhHKDJA==",
            "rewindPostUri": "http://localhost:7071/runtime/webhooks/durabletask/instances/INSTANCEID/rewind?reason={text}&taskHub=TestHubName&connection=Storage&code=aakw1DfReOkYCTFMdKPaA1Q6bSfnHZ/0lzvKsS6MVXCJdp4zhHKDJA==",
            "purgeHistoryDeleteUri": "http://localhost:7071/runtime/webhooks/durabletask/instances/INSTANCEID?taskHub=TestHubName&connection=Storage&code=aakw1DfReOkYCTFMdKPaA1Q6bSfnHZ/0lzvKsS6MVXCJdp4zhHKDJA==",
            "restartPostUri": "http://localhost:7071/runtime/webhooks/durabletask/instances/INSTANCEID/restart?taskHub=TestHubName&connection=Storage&code=aakw1DfReOkYCTFMdKPaA1Q6bSfnHZ/0lzvKsS6MVXCJdp4zhHKDJA==",
        },
        "baseUrl": "http://localhost:7071/runtime/webhooks/durabletask",
        "requiredQueryStringParameters": "code=aakw1DfReOkYCTFMdKPaA1Q6bSfnHZ/0lzvKsS6MVXCJdp4zhHKDJA==",
        "rpcBaseUrl": "http://127.0.0.1:17071/durabletask/",
    }
    # I need to use async methods because the "main" of the Client
    # uses async.
    response = asyncio.get_event_loop().run_until_complete(
        main(req, starter=json.dumps(starter))
    )
But unfortunately the Client function still fails in the await client.start_new(function_name, None, None) part.
How could I write some unit tests for my Durable Azure Function in Python?
Technical information
Python version: 3.9
Azure Functions Core Tools version 4.0.3971
Function Runtime Version: 4.0.1.16815
Not sure if this will help, but here is a sample that shows an approach to unit testing for what you are looking for: https://github.com/kemurayama/durable-functions-for-python-unittest-sample
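One way to get such a test running locally is to mock the Durable Functions client. Below is only a rough sketch under my own assumptions (the module name starter and the patch target are hypothetical, and this is not the code from the linked repository): patch DurableOrchestrationClient inside the module that defines the Client function so that start_new never tries to reach the host on 127.0.0.1:17071.

# Sketch only: unit test the Client function with the Durable client mocked out.
import json
import unittest
from unittest import mock

import azure.functions as func

from starter import main  # hypothetical module containing the Client function


class TestClientFunction(unittest.IsolatedAsyncioTestCase):
    async def test_main_starts_orchestration(self):
        req = func.HttpRequest(
            method="GET",
            body=None,
            url="/api/orchestrators/FooOrchestrator",
            route_params={"functionName": "FooOrchestrator"},
        )

        # Replace the real client with a mock so no Functions host is needed.
        with mock.patch("starter.df.DurableOrchestrationClient") as client_cls:
            client = client_cls.return_value
            client.start_new = mock.AsyncMock(return_value="instance-123")
            client.create_check_status_response.return_value = func.HttpResponse(
                json.dumps({"id": "instance-123"}), status_code=202
            )

            resp = await main(req, starter="{}")

        client.start_new.assert_awaited_once_with("FooOrchestrator", None, None)
        self.assertEqual(resp.status_code, 202)


if __name__ == "__main__":
    unittest.main()

With the client mocked, the test exercises only your own routing and response logic, which is usually what you want from a unit test; the orchestration itself can be covered separately.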
I am developing a Flutter app using Flask as the back-end framework and MariaDB as the database.
I'm trying to reduce the response time of my web services (ws):
1. Open the connection at the beginning of the ws
2. Execute the queries
3. Close the database connection before returning the response
Here is an example of my code architecture:
@app.route('/ws_name', methods=['GET'])
def ws_name():
    cnx = db_connexion()
    try:
        id_lanparamguage = request.args.get('param')
        result = function_execute_many_query(cnx, param)
    except:
        cnx.close()
        return jsonify(result), 200
    response = {}
    cnx.close()
    return jsonify(result), 200
db_connexion is my function that handles connecting to the database.
The problem is that when only one user is connected to the app (using the ws), the response time is perfect,
but when 3 users (for example) are connected, the response time goes up from milliseconds to 10 seconds.
I suspect you have a problem with many requests sharing the same thread. Read https://werkzeug.palletsprojects.com/en/1.0.x/local/ for how the local context works and why you need Werkzeug to manage your local context in a WSGI application.
You would want to do something like:
from werkzeug.local import LocalProxy
cnx = LocalProxy(db_connexion)
I also recommend closing your connection in a function decorated by @app.teardown_request.
See https://flask.palletsprojects.com/en/1.1.x/api/#flask.Flask.teardown_request
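To make that concrete, here is a minimal sketch of the per-request pattern those links describe. It assumes db_connexion() and function_execute_many_query() are the helpers from the question; the connection is opened lazily once per request via flask.g and closed in a teardown_request handler:

# Sketch only: one DB connection per request, stored on flask.g and closed
# automatically after the response, even if the view raised an exception.
# db_connexion() and function_execute_many_query() are the question's helpers.
from flask import Flask, g, jsonify, request

app = Flask(__name__)

def get_cnx():
    # Open the connection lazily, at most once per request
    if "cnx" not in g:
        g.cnx = db_connexion()
    return g.cnx

@app.teardown_request
def close_cnx(exc):
    cnx = g.pop("cnx", None)
    if cnx is not None:
        cnx.close()

@app.route('/ws_name', methods=['GET'])
def ws_name():
    param = request.args.get('param')
    result = function_execute_many_query(get_cnx(), param)
    return jsonify(result), 200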
I am trying to build a REST API with only one call.
Sometimes it takes up to 30 seconds for the program to return a response. But if the user thinks the service is lagging, they make a new call, and my app returns a response with error code 500 (Internal Server Error).
For now it is enough for me to block any new requests while the last one is not ready. Is there any simple way to do it?
I know that there are a lot of queueing managers like Celery, but I prefer not to overload my app with any large dependencies, etc.
You could use Flask-Limiter to ignore new requests from that remote address.
pip install Flask-Limiter
Check this quickstart:
from flask import Flask
from flask_limiter import Limiter
from flask_limiter.util import get_remote_address
app = Flask(__name__)
limiter = Limiter(
    app,
    key_func=get_remote_address,
    default_limits=["200 per day", "50 per hour"]
)

@app.route("/slow")
@limiter.limit("1 per day")
def slow():
    return "24"

@app.route("/fast")
def fast():
    return "42"

@app.route("/ping")
@limiter.exempt
def ping():
    return "PONG"
As you can see, you can rate-limit requests from a remote IP address for a certain amount of time while you finish the process you're running.
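For the specific case in the question (one call that can take up to 30 seconds), a rough adaptation of the same quickstart is to cap that route at one request per processing window; the route name and the helper below are made up for illustration:

# Sketch only: allow at most one request per 30-second window on the slow
# route, so a user who re-sends the request while the first one is still
# running gets a 429 instead of triggering the work twice.
@app.route("/slow-call")
@limiter.limit("1 per 30 seconds")
def slow_call():
    result = do_long_running_work()  # hypothetical long-running function
    return result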
DOCS
Check these two links:
Flask-Limiter Documentation
Flask-Limiter Quick start
I'm writing a Python app which is currently hosted on Heroku. It is in an early development stage, so I'm using the free account with one web dyno. Still, I want my heavier tasks to be done asynchronously, so I'm using the IronWorker add-on. I have it all set up and it does the simplest jobs, like sending emails or anything that doesn't require any data being sent back to the application. The question is: how do I send the worker output back to my application from IronWorker? Or even better, how do I notify my app that the worker is done with the job?
I looked at other Iron solutions like the cache and the message queue, but the only thing I can find is that I can explicitly ask for the worker state. Obviously I don't want my web service to poll the worker, because that kind of defeats the original purpose of moving the tasks to the background. What am I missing here?
I see this question ranks high in Google, so in case you came here hoping to find some more details, here is what I ended up doing:
First, I prepared the endpoint on my app. My app uses Flask, so this is how the code looks:
#app.route("/worker", methods=["GET", "POST"])
def worker():
#refresh the interface or whatever is necessary
if flask.request.method == 'POST':
return 'Worker endpoint reached'
elif flask.request.method == 'GET':
worker = IronWorker()
task = worker.queue(code_name="hello", payload={"WORKER_DB_URL": app.config['WORKER_DB_URL'],
"WORKER_CALLBACK_URL": app.config['WORKER_CALLBACK_URL']})
details = worker.task(task)
flask.flash("Work queued, response: ", details.status)
return flask.redirect('/')
Note that in my case, GET is here only for testing; I don't want my users to hit this endpoint and invoke the task. But I can imagine situations where this is actually useful, specifically if you don't use any type of scheduler for your tasks.
With the endpoint ready, I started looking for a way to visit that endpoint from the worker. I found the fantastic requests library and used it in my worker:
import sys, json
from sqlalchemy import *
import requests
print "hello_worker initialized, connecting to database..."
payload = None
payload_file = None
for i in range(len(sys.argv)):
    if sys.argv[i] == "-payload" and (i + 1) < len(sys.argv):
        payload_file = sys.argv[i + 1]
        break
f = open(payload_file, "r")
contents = f.read()
f.close()
payload = json.loads(contents)
print "contents: ", contents
print "payload as json: ", payload
db_url = payload['WORKER_DB_URL']
print "connecting to database ", db_url
db = create_engine(db_url)
metadata = MetaData(db)
print "connection to the database established"
users = Table('users', metadata, autoload=True)
s = users.select()
#def run(stmt):
# rs = stmt.execute()
# for row in rs:
# print row
#run(s)
callback_url = payload['WORKER_CALLBACK_URL']
print "task finished, sending post to ", callback_url
r = requests.post(callback_url)
print r.text
So in the end there is no real magic here; the only important thing is to send the callback URL in the payload if you need to notify your page when the task is done. Alternatively, you can place the endpoint URL in the database if you use one in your app. By the way, the snippet above also shows how to connect to a PostgreSQL database in your worker and print all the users.
One last thing you need to be aware of is how to format your .worker file, mine looks like this:
# set the runtime language. Python workers use "python"
runtime "python"
# exec is the file that will be executed:
exec "hello_worker.py"
# dependencies
pip "SQLAlchemy"
pip "requests"
This will install the latest versions of SQLAlchemy and requests, if your project is dependent on any specific version of the library, you should do this instead:
pip "SQLAlchemy", "0.9.1"
Easiest way: push a message to your API from the worker; it can be a log entry or anything else you need to have in your app.
I am working on a small project in Python. It is divided into two parts.
The first part is responsible for crawling the web, extracting some information, and inserting it into a database.
The second part is responsible for presenting that information using the database.
Both parts share the database. In the second part I am using the Flask framework to display the information as HTML with some formatting and styling to make it look cleaner.
The source files of both parts are in the same package, but to run this program properly the user has to run the crawler and the results presenter separately, like this:
python crawler.py
and then
python presenter.py
Everything is all right except one thing. What I want the presenter to do is create the results in HTML format and open the page with the results in the user's default browser, but the page is always opened twice, probably due to the presence of the run() method, which starts Flask in a new thread, and things get cloudy for me. I don't know what I should do to make my presenter.py open only one tab/window after running it.
Here is a snippet of my code:
from flask import Flask, render_template
import os
import sqlite3
import webbrowser  # needed for webbrowser.open_new() below

# configuration
DEBUG = True
DATABASE = os.getcwd() + '/database/database.db'

app = Flask(__name__)
app.config.from_object(__name__)
app.config.from_envvar('CRAWLER_SETTINGS', silent=True)

def connect_db():
    """Returns a new connection to the database."""
    try:
        conn = sqlite3.connect(app.config['DATABASE'])
        return conn
    except sqlite3.Error:
        print 'Unable to connect to the database'
        return False

@app.route('/')
def show_entries():
    u"""Loads pages information and emails from the database and
    inserts results into the show_entries template. If there is a database
    problem returns the error page.
    """
    conn = connect_db()
    if conn:
        try:
            cur = connect_db().cursor()
            results = cur.execute('SELECT url, title, doctype, pagesize FROM pages')
            pages = [dict(url=row[0], title=row[1].encode('utf-8'), pageType=row[2], pageSize=row[3]) for row in results.fetchall()]
            results = cur.execute('SELECT url, email from emails')
            emails = {}
            for row in results.fetchall():
                emails.setdefault(row[0], []).append(row[1])
            return render_template('show_entries.html', pages=pages, emails=emails)
        except sqlite3.Error, e:
            print ' Exception message %s ' % e
            print 'Could not load data from the database!'
            return render_template('show_error_page.html')
    else:
        return render_template('show_error_page.html')

if __name__ == '__main__':
    url = 'http://127.0.0.1:5000'
    webbrowser.open_new(url)
    app.run()
I use similar code on Mac OS X (with Safari, Firefox, and Chrome browsers) all the time, and it runs fine. Guessing you may be running into Flask's auto-reload feature. Set debug=False and it will not try to auto-reload.
Other suggestions, based on my experience:
Consider randomizing the port you use, as quick edit-run-test loops sometimes find the OS thinking port 5000 is still in use. (Or, if you run the code several times simultaneously, say by accident, the port truly is still in use.)
Give the app a short while to spin up before you start the browser request. I do that through invoking threading.Timer.
Here's my code:
import random, threading, webbrowser
port = 5000 + random.randint(0, 999)
url = "http://127.0.0.1:{0}".format(port)
threading.Timer(1.25, lambda: webbrowser.open(url) ).start()
app.run(port=port, debug=False)
(This is all under the if __name__ == '__main__':, or in a separate "start app" function if you like.)
So this may or may not help, but my issue was that Flask opened in Microsoft Edge when executing my app.py script. A simple fix: go to Settings, then Default apps, and change Microsoft Edge to Chrome. Now it opens Flask in Chrome every time. I still have the same issue where things just load, though.