I'm writing an Azure Durable Function, and I would like to write some unit tests for this whole Azure Function.
I tried to trigger the Client function (the "Start" function, as it is often called), but I can't make it work.
I'm doing this for two reasons:
It's frustrating to run the Azure Function code by running "func host start" (or pressing F5), then going to my browser, finding the right tab, going to http://localhost:7071/api/orchestrators/FooOrchestrator and going back to VS Code to debug my code.
I'd like to write some unit tests to ensure the quality of my project's code. Therefore I'm open to suggestions, maybe it would be easier to only test the execution of Activity functions.
Client Function code
This is the code of my Client function, mostly boilerplate code like this one
import logging

import azure.functions as func
import azure.durable_functions as df


async def main(req: func.HttpRequest, starter: str) -> func.HttpResponse:
    # 'starter' seems to contain the JSON data about
    # the URLs to monitor, stop, etc. the Durable Function
    client = df.DurableOrchestrationClient(starter)

    # The Client function knows which orchestrator to call
    # according to 'function_name'
    function_name = req.route_params["functionName"]

    # This part fails with a ClientConnectorError
    # with the message: "Cannot connect to host 127.0.0.1:17071 ssl:default"
    instance_id = await client.start_new(function_name, None, None)

    logging.info(f"Orchestration '{function_name}' started with ID = '{instance_id}'.")

    return client.create_check_status_response(req, instance_id)
Unit test attempt
Then I tried to write some code to trigger this Client function like I did for some "classic" Azure Functions:
import asyncio
import json

import azure.functions as func

# plus the Client function's 'main' imported from its module

if __name__ == "__main__":
    # Build a simple request to trigger the Client function
    req = func.HttpRequest(
        method="GET",
        body=None,
        url="don't care?",
        # What orchestrator do you want to trigger?
        route_params={"functionName": "FooOrchestrator"},
    )

    # I copy pasted the data that I obtained when I ran the Durable Function
    # with "func host start"
    starter = {
        "taskHubName": "TestHubName",
        "creationUrls": {
            "createNewInstancePostUri": "http://localhost:7071/runtime/webhooks/durabletask/orchestrators/{functionName}[/{instanceId}]?code=aakw1DfReOkYCTFMdKPaA1Q6bSfnHZ/0lzvKsS6MVXCJdp4zhHKDJA==",
            "createAndWaitOnNewInstancePostUri": "http://localhost:7071/runtime/webhooks/durabletask/orchestrators/{functionName}[/{instanceId}]?timeout={timeoutInSeconds}&pollingInterval={intervalInSeconds}&code=aakw1DfReOkYCTFMdKPaA1Q6bSfnHZ/0lzvKsS6MVXCJdp4zhHKDJA==",
        },
        "managementUrls": {
            "id": "INSTANCEID",
            "statusQueryGetUri": "http://localhost:7071/runtime/webhooks/durabletask/instances/INSTANCEID?taskHub=TestHubName&connection=Storage&code=aakw1DfReOkYCTFMdKPaA1Q6bSfnHZ/0lzvKsS6MVXCJdp4zhHKDJA==",
            "sendEventPostUri": "http://localhost:7071/runtime/webhooks/durabletask/instances/INSTANCEID/raiseEvent/{eventName}?taskHub=TestHubName&connection=Storage&code=aakw1DfReOkYCTFMdKPaA1Q6bSfnHZ/0lzvKsS6MVXCJdp4zhHKDJA==",
            "terminatePostUri": "http://localhost:7071/runtime/webhooks/durabletask/instances/INSTANCEID/terminate?reason={text}&taskHub=TestHubName&connection=Storage&code=aakw1DfReOkYCTFMdKPaA1Q6bSfnHZ/0lzvKsS6MVXCJdp4zhHKDJA==",
            "rewindPostUri": "http://localhost:7071/runtime/webhooks/durabletask/instances/INSTANCEID/rewind?reason={text}&taskHub=TestHubName&connection=Storage&code=aakw1DfReOkYCTFMdKPaA1Q6bSfnHZ/0lzvKsS6MVXCJdp4zhHKDJA==",
            "purgeHistoryDeleteUri": "http://localhost:7071/runtime/webhooks/durabletask/instances/INSTANCEID?taskHub=TestHubName&connection=Storage&code=aakw1DfReOkYCTFMdKPaA1Q6bSfnHZ/0lzvKsS6MVXCJdp4zhHKDJA==",
            "restartPostUri": "http://localhost:7071/runtime/webhooks/durabletask/instances/INSTANCEID/restart?taskHub=TestHubName&connection=Storage&code=aakw1DfReOkYCTFMdKPaA1Q6bSfnHZ/0lzvKsS6MVXCJdp4zhHKDJA==",
        },
        "baseUrl": "http://localhost:7071/runtime/webhooks/durabletask",
        "requiredQueryStringParameters": "code=aakw1DfReOkYCTFMdKPaA1Q6bSfnHZ/0lzvKsS6MVXCJdp4zhHKDJA==",
        "rpcBaseUrl": "http://127.0.0.1:17071/durabletask/",
    }

    # I need to use async methods because the "main" of the Client
    # uses async.
    response = asyncio.get_event_loop().run_until_complete(
        main(req, starter=json.dumps(starter))
    )
But unfortunately the Client function still fails at the await client.start_new(function_name, None, None) call.
How could I write some unit tests for my Durable Azure Function in Python?
Technical information
Python version: 3.9
Azure Functions Core Tools version 4.0.3971
Function Runtime Version: 4.0.1.16815
Not sure if this will help, but here is a sample that covers unit testing for Python Durable Functions, which is what you are looking for - https://github.com/kemurayama/durable-functions-for-python-unittest-sample
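One approach is to mock the Durable Functions client so the Client function's main can be called directly in a test, without a running Functions host or the RPC endpoint on port 17071. Below is a minimal sketch, assuming the Client function above lives in a module called starter_function; the module name, instance id and status code are made up for illustration:

import json
import unittest
from unittest import mock

import azure.functions as func

# Hypothetical import: adjust to wherever your Client function's main() lives.
from starter_function import main


class TestClientFunction(unittest.IsolatedAsyncioTestCase):
    async def test_main_starts_orchestration(self):
        req = func.HttpRequest(
            method="GET",
            body=None,
            url="http://localhost:7071/api/orchestrators/FooOrchestrator",
            route_params={"functionName": "FooOrchestrator"},
        )

        # Patch the Durable client so no Functions host or RPC endpoint is needed.
        with mock.patch("starter_function.df.DurableOrchestrationClient") as client_cls:
            client = client_cls.return_value
            client.start_new = mock.AsyncMock(return_value="instance-123")
            client.create_check_status_response.return_value = func.HttpResponse(
                json.dumps({"id": "instance-123"}), status_code=202
            )

            response = await main(req, starter="{}")

        client.start_new.assert_awaited_once_with("FooOrchestrator", None, None)
        self.assertEqual(response.status_code, 202)

Activity functions are even easier to cover: they are plain Python functions, so you can import them and call them directly with test inputs, which matches the suggestion in the question.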
Related
I'm trying to implement a service that will get a request from an external API, do some work (which might take time) and then return a response to the external API with the parsed data. However I'm at a loss on how to achieve this. I'm using FastAPI as my API service and have been looking at the following documentation: OpenAPI Callbacks
By following that documentation I can get the OpenAPI docs looking all pretty and nice. However I'm stumped on how to implement the actual callback and the docs don't have much information about that.
My current implementation is as follows:
from typing import Union

from fastapi import APIRouter, FastAPI
from pydantic import BaseModel, AnyHttpUrl
import requests
import time
from threading import Thread

app = FastAPI()


class Invoice(BaseModel):
    id: str
    title: Union[str, None] = None
    customer: str
    total: float


class InvoiceEvent(BaseModel):
    description: str
    paid: bool


class InvoiceEventReceived(BaseModel):
    ok: bool


invoices_callback_router = APIRouter()


@invoices_callback_router.post(
    "{$callback_url}/invoices/{$request.body.id}", response_model=InvoiceEventReceived
)
def invoice_notification(body: InvoiceEvent):
    pass


@app.post("/invoices/", callbacks=invoices_callback_router.routes)
async def create_invoice(invoice: Invoice, callback_url: Union[AnyHttpUrl, None] = None):
    # Send the invoice, collect the money, send the notification (the callback)
    thread = Thread(target=do_invoice(invoice, callback_url))
    thread.start()
    return {"msg": "Invoice received"}


def do_invoice(invoice: Invoice, callback_url: AnyHttpUrl):
    time.sleep(10)
    url = callback_url + "/invoices/" + invoice.id
    json = {
        "data": ["Payment celebration"],
    }
    requests.post(url=url, json=json)
I thought putting the actual callback in a separate thread might work, and that the {"msg": "Invoice received"} would be returned immediately and then 10s later the external API would receive the result from the do_invoice function. But this doesn't seem to be the case, so perhaps I'm doing something wrong.
I've also tried putting the logic in the invoice_notification function, but that doesn't seem to do anything at all.
So what is the correct way to implement a callback like the one I want? Thankful for any help!
I thought putting the actual callback in a separate thread might work and that the {"msg": "Invoice received"} would be returned immediately and then 10s later the external API would receive the result from the do_invoice function. But this doesn't seem to be the case so perhaps I'm doing something wrong.
If you would like to run a task after the response has been sent, you could use a BackgroundTask, as demonstrated in this answer, as well as here and here. If you instead would like to wait for the task to finish before returning the response, you could run the task in either an external ThreadPool or ProcessPool (depending on the nature of the task) and await it, as explained in this detailed answer.
I would also strongly recommend using the httpx library in an async environment such as FastAPI, instead of using Python requests—you may find details and working examples here, as well as here and here.
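As an illustration of the first option, here is a minimal sketch combining BackgroundTasks with httpx (assumptions: httpx is installed, and the callback endpoint accepts the JSON body shown; the model fields and URL pattern are simplified from the question's code):

import httpx
from fastapi import BackgroundTasks, FastAPI
from pydantic import AnyHttpUrl, BaseModel

app = FastAPI()


class Invoice(BaseModel):
    id: str
    customer: str
    total: float


async def notify_callback(invoice: Invoice, callback_url: AnyHttpUrl) -> None:
    # Runs only after the response below has already been sent.
    async with httpx.AsyncClient() as client:
        await client.post(
            f"{callback_url}/invoices/{invoice.id}",
            json={"description": "Payment celebration", "paid": True},
        )


@app.post("/invoices/")
async def create_invoice(
    invoice: Invoice, callback_url: AnyHttpUrl, background_tasks: BackgroundTasks
):
    # Schedule the callback; FastAPI executes it after returning the response.
    background_tasks.add_task(notify_callback, invoice, callback_url)
    return {"msg": "Invoice received"}

The client gets {"msg": "Invoice received"} immediately, and the POST to the callback URL happens afterwards in the same event loop, so there is no need for a manual Thread.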
I have built a pipeline in which Stream Analytics data triggers an Azure Function.
There are 5000 values merged into a single message. I wrote a simple Python program in the Function to validate the data, parse the bulk message, and save each value in Cosmos DB as an individual document. But the problem is that my function doesn't stop. After 30 minutes I can see that it generated a timeout error, and in those 30 minutes more than 300k values appear in my database, all duplicates. I thought the problem was with my code (the for loop), but when I ran it locally everything worked, so I am not sure what is going on. In the whole code, the only statement I don't fully understand is the container.upsert line.
This is my code:
import logging
import azure.functions as func
import hashlib as h
from azure.cosmos import CosmosClient
import random, string


def generateRandomID(length):
    # choose from all lowercase letter
    letters = string.ascii_lowercase
    result_str = ''.join(random.choice(letters) for i in range(length))
    return result_str


# dburl, dbkey, dbname and containername are placeholders for the real connection settings
URL = dburl
KEY = dbkey
client = CosmosClient(URL, credential=KEY)
DATABASE_NAME = dbname
database = client.get_database_client(DATABASE_NAME)
CONTAINER_NAME = containername
container = database.get_container_client(CONTAINER_NAME)


def main(req: func.HttpRequest) -> func.HttpResponse:
    logging.info('Python HTTP trigger function processed a request.')
    req_body = req.get_json()

    try:
        # Level 1
        rawMsg = req_body[0]
        filteredMsg = rawMsg['message']
        metaData = rawMsg['metaData']
        logging.info(metaData)
        encodeMD5 = filteredMsg.encode('utf-8')
        generateMD5 = h.md5(encodeMD5).hexdigest()
        parsingMetaData = metaData.split(',')
        parsingMD5Hex = parsingMetaData[3]
        splitingHex = parsingMD5Hex.split(':')
        parsingMD5Value = splitingHex[1]
    except:
        logging.info("Failed to parse the Data and Generate MD5 Checksums. Error at the level 1")
    finally:
        logging.info("Execution Successful | First level Completed ")
        #return func.HttpResponse(f"OK")

    try:
        # Level 2:
        if generateMD5 == parsingMD5Value:
            # parsing the ecg values
            logging.info('MD5 Checksums matched!')
            splitValues = filteredMsg.split(',')
            for eachValue in range(len(splitValues)):
                ecgRawData = splitValues[eachValue]
                divideEachValue = ecgRawData.split(':')
                timeData = divideEachValue[0]
                ecgData = divideEachValue[1]
                container.upsert_item({'id': generateRandomID(10), 'time': timeData, 'ecgData': ecgData})
        elif generateMD5 != parsingMD5Hex:
            logging.info('The MD5s did not matched and couldnt execute the code properly')
            logging.info(generateMD5)
        else:
            logging.info('Something is going wrong. Please check.')
    except:
        logging.info("Failed to parse ECG Values into the DB Container. Error ar the level 2")
    finally:
        logging.info("Execution Successful | Second level complete ")
        #return func.HttpResponse(f"OK")

    # Return a 200 status
    return func.HttpResponse(f"OK")
A test I performed:
I commented out the for loop block and deployed the Function; it executed normally without any error.
Please let me know how I can address this issue, and also whether there is anything wrong with my coding practice.
I found the solution! (I am the OP)
In my resource group, an App Service plan was already in use for a web application, so when creating an Azure Function it wouldn't let me deploy with the Serverless option. So I deployed it with the same App Service plan used for the web application. While testing, the function works completely except for the container.upsert line: when I add this line, it fails to stop and writes 10x the values to the database until it is killed by a timeout error after 30 minutes.
I tried creating an App Service plan dedicated to this Function, but the issue stayed the same.
After testing hundreds of corner-case scenarios, I found that my function runs perfectly when I deploy it in another resource group. The only catch is that there I opted for the Serverless option while deploying the Function.
(If you are already using an App Service plan in your Azure resource group, you cannot deploy Azure Functions with the Serverless option; the deployment doesn't work properly. You need to create a dedicated App Service plan for that Function or use the existing App Service plan.)
As per my research, when dealing with bulk data and inserting it into the database, an ordinary App Service plan doesn't cope: it has to be large enough to sustain the load. Otherwise, choose the Serverless option while deploying the Function, as the compute is then managed entirely by Azure.
Hope this helps.
I am preparing an automation solution in Azure and decided to use Azure Durable Functions. As per the Durable Functions design I have created a Client Function (triggered by a Service Bus message), an Activity Function, and an Orchestrator Function. The Service Bus message is in JSON format. Once the Client Function gets the Service Bus message, it has to run the Orchestrator Function. I have prepared the code in Python, but it does not work: in the Azure Function's Code + Test window I get a 500 Internal Server Error. My code is below. The main problem here is running the Orchestrator Function from the Client Function code presented below. The piece of code for receiving the Service Bus JSON message is OK; I tested it in other functions.
import json
import logging

import azure.functions as func
from azure.servicebus import ServiceBusClient, ServiceBusMessage
import azure.durable_functions as df


async def main(msg: func.ServiceBusMessage, starter: str):
    result = ({
        'body': json.loads(msg.get_body().decode('utf-8'))
    })
    try:
        account_name = result.get('body', {}).get('accountName')
        client = df.DurableOrchestrationClient(starter)
        instance_id = await client.start_new(msg.route_params["Orchestrator"], None, None)
        logging.info(f"Started orchestration with ID = '{instance_id}'.")
    except Exception as e:
        logging.info(e)
Solution workflow:
When starting an instance of the orchestrator function using the start_new method, you can pass the payload (the client input) along with the orchestrator name.
You currently start the orchestration with the following code:
instance_id = await client.start_new(msg.route_params["Orchestrator"], None, None)
Adding the payload might work, and by payload I mean this:
payload = msg.get_body().decode('utf-8')
The code will then look like:
instance_id = await client.start_new(msg.route_params["Orchestrator"], None, payload)
Refer to the following documentation.
Also refer to this article by Ajit Patra.
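For illustration, a minimal sketch of the Client function with the Service Bus body passed as client input; the orchestrator name is read from an app setting here, and "FooOrchestrator" and ORCHESTRATOR_NAME are placeholders, not names from the question:

import json
import logging
import os

import azure.durable_functions as df
import azure.functions as func


async def main(msg: func.ServiceBusMessage, starter: str):
    # The Service Bus body becomes the orchestrator's input,
    # retrievable there via context.get_input().
    payload = json.loads(msg.get_body().decode("utf-8"))

    client = df.DurableOrchestrationClient(starter)

    # Placeholder: replace with your orchestrator's function name,
    # e.g. read from an application setting.
    orchestrator_name = os.environ.get("ORCHESTRATOR_NAME", "FooOrchestrator")

    instance_id = await client.start_new(orchestrator_name, None, payload)
    logging.info(f"Started orchestration with ID = '{instance_id}'.")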
I made an API for my AI model, but I would like to avoid any downtime when I update the model. I'm looking for a way to load the new model in the background and, once it's loaded, swap it with the old one. I tried passing values between subprocesses, but that didn't work well. Do you have any idea how I can do that?
You can place the serialized model in raw storage, like an S3 bucket if you're on AWS. In S3's case, you can use bucket versioning, which might prove helpful. Then set up some sort of trigger. You can definitely get creative here, and I've thought about this a lot. In practice, the best options I've tried are:
1. Set up an endpoint that, when called, goes and loads the new model from wherever you store it. Set up a webhook on the storage/S3 bucket that sends a quick automated call to that endpoint and auto-loads the new item (see the sketch below).
2. Same thing as #1, but instead you call the endpoint manually. In both cases you'll really want some security on that endpoint, or anyone who finds your site can absolutely abuse your stack.
3. Set a timer at startup that calls a given function nightly, running internally within the application itself. The function is invoked and then goes and reloads the model.
Could be other ideas I'm not smart enough (yet!) to use, just trying to start some dialogue.
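As a rough illustration of option 1, here is a minimal FastAPI sketch; load_model_from_storage() is a hypothetical helper that downloads and deserializes the newest artifact, and the header token is only a stand-in for real authentication:

from fastapi import FastAPI, Header, HTTPException

app = FastAPI()

model = None                 # the currently served model
RELOAD_TOKEN = "change-me"   # stand-in for real authentication


def load_model_from_storage():
    # Hypothetical helper: download the newest model artifact from your
    # bucket/storage and deserialize it, returning the loaded model object.
    raise NotImplementedError("replace with your own loading logic")


@app.post("/reload-model")
async def reload_model(x_reload_token: str = Header(...)):
    global model
    if x_reload_token != RELOAD_TOKEN:
        raise HTTPException(status_code=403, detail="Forbidden")
    new_model = load_model_from_storage()  # load first...
    model = new_model                      # ...then swap, so requests never see a half-loaded model
    return {"reloaded": True}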
Found a way to do it with async and multiprocessing
import asyncio
import time
from multiprocessing import Process, Manager

from uvicorn import Server, Config
from fastapi import FastAPI

app = FastAPI()

value = {"latest": 1, "b": 2}


@app.get("/")
async def root():
    global value
    return {"message": value}


def background_loading(d):
    time.sleep(2)
    d["test"] = 3


async def update():
    while True:
        global value
        manager = Manager()
        d = manager.dict()
        p1 = Process(target=background_loading, args=(d,))
        p1.daemon = True
        p1.start()
        while p1.is_alive():
            await asyncio.sleep(5)
        print(f'Update to value to {d}')
        value = d


if __name__ == "__main__":
    loop = asyncio.new_event_loop()
    config = Config(app=app, loop=loop)
    server = Server(config)
    loop.create_task(update())
    loop.run_until_complete(server.serve())
I have an API written in Python that makes calls to AWS services, specifically SQS, S3, and DynamoDB. I am trying to write unit tests for the API, and I want to mock all calls to AWS. I have done a lot of research into moto as a way to mock these services; however, every implementation I have tried does not mock my calls and instead sends real requests to AWS. Looking into this problem, I found people discussing some incompatibilities between boto and moto when using boto3>=1.8. Is there any way around this? My ultimate question is this: is there an easy way to mock boto3 calls to SQS, S3, and DynamoDB using either moto or some other library when using boto3>=1.8?
Here are my current versions of boto3 and moto I am using:
boto3 == 1.9.314
moto == 1.3.11
Below is my latest attempt at using moto to mock calls to sqs. I defined a pytest fixture where I create a mock_sqs session and a (hopefully fake) queue. I use this fixture to unit test my get_queue_item function.
SQS Script
# ptr_api.aws.sqs
import boto3

REGION = 'us-east-1'

sqs_r = boto3.resource('sqs', REGION)
sqs_c = boto3.client('sqs', REGION)


def get_queue_item(queue_name):
    queue = sqs_r.get_queue_by_name(QueueName=queue_name)
    queue_url = queue.url

    response = sqs_c.receive_message(
        QueueUrl=queue_url,
        MaxNumberOfMessages=1,
        VisibilityTimeout=10,
        WaitTimeSeconds=3
    )

    try:
        message = response['Messages'][0]
        receipt_handle = message['ReceiptHandle']
        delete_response = sqs_c.delete_message(QueueUrl=queue_url,
                                               ReceiptHandle=receipt_handle)
        return message['Body']
    except Exception as e:
        print("error in get_queue_item: ")
        print(e)
        return False
Test SQS Script
# test_sqs.py
import pytest
from moto import mock_sqs
import boto3

from ptr_api.aws.sqs import get_queue_item


@pytest.fixture
def sqs_mocker(scope='session', autouse=True):
    mock = mock_sqs()
    mock.start()

    sqs_r = boto3.resource('sqs', 'us-east-1')
    sqs_c = boto3.client('sqs', 'us-east-1')

    queue_name = 'test_queue_please_dont_actually_exist'

    queue_url = sqs_c.create_queue(
        QueueName=queue_name
    )['QueueUrl']

    yield (sqs_c, queue_url, queue_name)
    mock.stop()


def test_get_queue_item(sqs_mocker):
    sqs_c, queue_url, queue_name = sqs_mocker

    message_body = 'why hello there'  # Create dummy message
    sqs_c.send_message(               # Send message to fake queue
        QueueUrl=queue_url,
        MessageBody=message_body,
    )

    res = get_queue_item(queue_name)  # Test get_queue_item function

    assert res == message_body
When I go to check the AWS console, however, I see the queue has actually been created. I have also tried moving around the order of my imports, but nothing seemed to work. I tried using the mock decorators and I even briefly played around with moto's stand-alone server mode. Am I doing something wrong, or is it really just the boto3/moto incompatibility I have been hearing about with newer versions of boto3? Downgrading my version of boto3 is unfortunately not an option. Is there another way to get the results I want with another library? I have looked a little bit into localstack, but I want to make sure that it is my only option before I give up on moto entirely.
I figured out a way to mock all my AWS calls! I am confident now that moto and boto3>=1.8 currently have serious incompatibility issues. It turns out the problem is with botocore >= 1.11.0, which no longer uses requests and instead uses urllib3 directly. This means moto cannot use responses the same way it did before, hence the incompatibility issues. To get around this, I instead created stand-alone moto servers for each of the AWS services I wanted to mock, which worked like a charm! By creating the mock servers and not mocking the requests themselves, there weren't any issues with moto using responses.
I set these mock servers running in the background using a separate start_local.py script. Next I made sure to change my unit test's boto3 resource and client objects to reference these mock endpoints. Now I can run my pytests without any calls being made to AWS and no need to mock AWS credentials!
Below is the new start_local.py script and my updated sqs unit test:
Start local AWS services
# start_local.py
import boto3
import threading, subprocess


def start_sqs(port=5002):
    subprocess.call(["moto_server", "sqs", f"-p{port}"])


sqs = threading.Thread(target=start_sqs)
sqs.start()
New Test SQS Script
import pytest
import boto3
import os
from ptr_api.aws import sqs
#pytest.fixture
def sqs_mocker(scope='session', autouse=True):
sqs_r_mock = boto3.resource('sqs', region_name='us-east-1', endpoint_url=f'http://localhost:5002')
sqs_c_mock = boto3.client('sqs', region_name='us-east-1', endpoint_url=f'http://localhost:5002')
queue_name = 'test_queue'
queue_url = sqs_c_mock.create_queue(
QueueName=queue_name
)['QueueUrl']
yield (sqs_r_mock, sqs_c_mock, queue_url, queue_name)
def test_get_queue_item(sqs_mocker):
sqs_r_mock, sqs_c_mock, queue_url, queue_name = sqs_mocker
message_body = 'why hello there' # Create dummy message
sqs_c_mock.send_message( # Send message to fake queue
QueueUrl=queue_url,
MessageBody=message_body,
)
sqs.sqs_r = sqs_r_mock # VERY IMPORTANT - Override boto3 resource global variable within imported module with mock resource
sqs.sqs_c = sqs_c_mock # VERY IMPORTANT - Override boto3 client global variable within imported module with mock client
res = sqs.get_queue_item(queue_name) # Test get_queue_item function
assert res == message_body
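If you prefer not to start start_local.py by hand before each test run, a hedged alternative is a conftest.py fixture that launches the stand-alone moto server for the test session (assumptions: moto_server is on the PATH and port 5002 is free; the one-second sleep is a crude stand-in for a proper readiness check):

# conftest.py
import subprocess
import time

import pytest


@pytest.fixture(scope="session", autouse=True)
def moto_sqs_server():
    # Launch the stand-alone moto SQS server once for the whole test session.
    proc = subprocess.Popen(["moto_server", "sqs", "-p5002"])
    time.sleep(1)  # crude wait for the server to start listening
    yield
    proc.terminate()
    proc.wait()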