Azure Functions for Python, structured logging to Application Insights?

When using Python (3.8) in Azure Functions, is there a way to send structured logs to Application Insights? More specifically, I'm trying to send custom dimensions with a log message. All I could find about logging is this very brief section.

Update 0127:
It's solved, as per this GitHub issue. Here is the sample code:
# Change Instrumentation Key and Ingestion Endpoint before you run this function app
import logging

import azure.functions as func
from opencensus.ext.azure.log_exporter import AzureLogHandler

logger_opencensus = logging.getLogger('opencensus')
logger_opencensus.addHandler(
    AzureLogHandler(
        connection_string='InstrumentationKey=aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee;IngestionEndpoint=https://eastus-6.in.applicationinsights.azure.com/'
    )
)


def main(req: func.HttpRequest) -> func.HttpResponse:
    properties = {
        'custom_dimensions': {
            'key_1': 'value_1',
            'key_2': 'value_2'
        }
    }
    logger_opencensus.info('logger_opencensus.info Custom Dimension', extra=properties)
    logger_opencensus.info('logger_opencensus.info Statement')
    return func.HttpResponse("OK")
Please try the OpenCensus Python SDK.
The example code is in the Logs section, step 5:
Description: You can also add custom properties to your log messages in the extra keyword argument by using the custom_dimensions field. These properties appear as key-value pairs in customDimensions in Azure Monitor.
The sample:
import logging

from opencensus.ext.azure.log_exporter import AzureLogHandler

logger = logging.getLogger(__name__)

# TODO: replace the all-zero GUID with your instrumentation key.
logger.addHandler(AzureLogHandler(
    connection_string='InstrumentationKey=00000000-0000-0000-0000-000000000000')
)

properties = {'custom_dimensions': {'key_1': 'value_1', 'key_2': 'value_2'}}

# Use properties in logging statements
logger.warning('action', extra=properties)
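If you want the same custom_dimensions on every message without repeating the extra argument at each call site, one option (my own sketch using only the standard library's logging.LoggerAdapter on top of the handler above; the 'service' dimension and the connection string are placeholders) is to wrap the logger:

import logging

from opencensus.ext.azure.log_exporter import AzureLogHandler


class CustomDimensionsAdapter(logging.LoggerAdapter):
    """Injects a custom_dimensions dict into every log call."""

    def process(self, msg, kwargs):
        per_call = kwargs.get('extra', {}) or {}
        # Merge per-call dimensions with the adapter-wide ones.
        merged = {**self.extra.get('custom_dimensions', {}),
                  **per_call.get('custom_dimensions', {})}
        kwargs['extra'] = {'custom_dimensions': merged}
        return msg, kwargs


base_logger = logging.getLogger(__name__)
# Replace the placeholder connection string with your own.
base_logger.addHandler(AzureLogHandler(
    connection_string='InstrumentationKey=00000000-0000-0000-0000-000000000000'))

logger = CustomDimensionsAdapter(base_logger, {'custom_dimensions': {'service': 'my-func'}})

# 'service' is sent with every message; per-call keys are merged in.
logger.warning('action', extra={'custom_dimensions': {'key_1': 'value_1'}})
logger.warning('plain message')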

Related

How to include LogRecord attributes as part of logging information on GCP (in json format)?

I would like to send LogRecord attributes (like processName, threadName, exc_info, etc.) as part of the logging information on GCP, in JSON format. How can I do that? I am using the Python code below to send logs to GCP (with a jsonPayload)
(thanks to @ianyoung for the code):
import logging
import google.cloud.logging
import json
client = google.cloud.logging.Client()
client.setup_logging()
logger = logging.getLogger('test')
data_dict = {"hello": "world"}
logging.info(json.dumps(data_dict))
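There is no accepted answer reproduced here, but one way to get those attributes into the jsonPayload (a sketch of mine that keeps the json.dumps pattern from the snippet above; the log_json helper and its field names are made up for illustration) is to collect them yourself before serializing:

import json
import logging
import multiprocessing
import threading
import traceback

import google.cloud.logging

client = google.cloud.logging.Client()
client.setup_logging()
logger = logging.getLogger('test')


def log_json(message, exc=None, **fields):
    """Send a JSON payload that carries LogRecord-style attributes."""
    payload = {
        'message': message,
        'processName': multiprocessing.current_process().name,
        'threadName': threading.current_thread().name,
        **fields,
    }
    if exc is not None:
        # Equivalent of exc_info: attach the formatted traceback.
        payload['exc_info'] = ''.join(
            traceback.format_exception(type(exc), exc, exc.__traceback__))
    logger.info(json.dumps(payload))


log_json('hello', hello='world')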

How to write unit tests for Durable Azure Functions?

I'm writing an Azure Durable Function, and I would like to write some unit tests for this whole Azure Function.
I tried to trigger the Client function (the "Start" function, as it is often called), but I can't make it work.
I'm doing this for two reasons:
It's frustrating to run the Azure Function code with "func host start" (or by pressing F5), then go to my browser, find the right tab, go to http://localhost:7071/api/orchestrators/FooOrchestrator, and then go back to VS Code to debug my code.
I'd like to write some unit tests to ensure the quality of my project's code. Therefore I'm open to suggestions; maybe it would be easier to only test the execution of Activity functions.
Client Function code
This is the code of my Client function, mostly boilerplate code like this one:
import logging

import azure.functions as func
import azure.durable_functions as df


async def main(req: func.HttpRequest, starter: str) -> func.HttpResponse:
    # 'starter' seems to contain the JSON data about
    # the URLs to monitor, stop, etc. of the Durable Function
    client = df.DurableOrchestrationClient(starter)

    # The Client function knows which orchestrator to call
    # according to 'function_name'
    function_name = req.route_params["functionName"]

    # This part fails with a ClientConnectorError
    # with the message: "Cannot connect to host 127.0.0.1:17071 ssl:default"
    instance_id = await client.start_new(function_name, None, None)

    logging.info(f"Orchestration '{function_name}' started with ID = '{instance_id}'.")

    return client.create_check_status_response(req, instance_id)
Unit test attempt
Then I tried to write some code to trigger this Client function like I did for some "classic" Azure Functions:
import asyncio
import json

import azure.functions as func

# 'main' is the Client function shown above
# (import it from your Client function's module)

if __name__ == "__main__":
    # Build a simple request to trigger the Client function
    req = func.HttpRequest(
        method="GET",
        body=None,
        url="don't care?",
        # What orchestrator do you want to trigger?
        route_params={"functionName": "FooOrchestrator"},
    )

    # I copy-pasted the data that I obtained when I ran the Durable Function
    # with "func host start"
    starter = {
        "taskHubName": "TestHubName",
        "creationUrls": {
            "createNewInstancePostUri": "http://localhost:7071/runtime/webhooks/durabletask/orchestrators/{functionName}[/{instanceId}]?code=aakw1DfReOkYCTFMdKPaA1Q6bSfnHZ/0lzvKsS6MVXCJdp4zhHKDJA==",
            "createAndWaitOnNewInstancePostUri": "http://localhost:7071/runtime/webhooks/durabletask/orchestrators/{functionName}[/{instanceId}]?timeout={timeoutInSeconds}&pollingInterval={intervalInSeconds}&code=aakw1DfReOkYCTFMdKPaA1Q6bSfnHZ/0lzvKsS6MVXCJdp4zhHKDJA==",
        },
        "managementUrls": {
            "id": "INSTANCEID",
            "statusQueryGetUri": "http://localhost:7071/runtime/webhooks/durabletask/instances/INSTANCEID?taskHub=TestHubName&connection=Storage&code=aakw1DfReOkYCTFMdKPaA1Q6bSfnHZ/0lzvKsS6MVXCJdp4zhHKDJA==",
            "sendEventPostUri": "http://localhost:7071/runtime/webhooks/durabletask/instances/INSTANCEID/raiseEvent/{eventName}?taskHub=TestHubName&connection=Storage&code=aakw1DfReOkYCTFMdKPaA1Q6bSfnHZ/0lzvKsS6MVXCJdp4zhHKDJA==",
            "terminatePostUri": "http://localhost:7071/runtime/webhooks/durabletask/instances/INSTANCEID/terminate?reason={text}&taskHub=TestHubName&connection=Storage&code=aakw1DfReOkYCTFMdKPaA1Q6bSfnHZ/0lzvKsS6MVXCJdp4zhHKDJA==",
            "rewindPostUri": "http://localhost:7071/runtime/webhooks/durabletask/instances/INSTANCEID/rewind?reason={text}&taskHub=TestHubName&connection=Storage&code=aakw1DfReOkYCTFMdKPaA1Q6bSfnHZ/0lzvKsS6MVXCJdp4zhHKDJA==",
            "purgeHistoryDeleteUri": "http://localhost:7071/runtime/webhooks/durabletask/instances/INSTANCEID?taskHub=TestHubName&connection=Storage&code=aakw1DfReOkYCTFMdKPaA1Q6bSfnHZ/0lzvKsS6MVXCJdp4zhHKDJA==",
            "restartPostUri": "http://localhost:7071/runtime/webhooks/durabletask/instances/INSTANCEID/restart?taskHub=TestHubName&connection=Storage&code=aakw1DfReOkYCTFMdKPaA1Q6bSfnHZ/0lzvKsS6MVXCJdp4zhHKDJA==",
        },
        "baseUrl": "http://localhost:7071/runtime/webhooks/durabletask",
        "requiredQueryStringParameters": "code=aakw1DfReOkYCTFMdKPaA1Q6bSfnHZ/0lzvKsS6MVXCJdp4zhHKDJA==",
        "rpcBaseUrl": "http://127.0.0.1:17071/durabletask/",
    }

    # I need to use async methods because the "main" of the Client
    # uses async.
    response = asyncio.get_event_loop().run_until_complete(
        main(req, starter=json.dumps(starter))
    )
But unfortunately the Client function still fails at the await client.start_new(function_name, None, None) call.
How could I write some unit tests for my Durable Azure Function in Python?
Technical information
Python version: 3.9
Azure Functions Core Tools version 4.0.3971
Function Runtime Version: 4.0.1.16815
Not sure if this will help, but here is a sample, based on the official Microsoft documentation, that covers unit testing for what you are looking for: https://github.com/kemurayama/durable-functions-for-python-unittest-sample
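For what it's worth, here is a rough sketch of what such a unit test can look like when the DurableOrchestrationClient is mocked out, which avoids the 127.0.0.1:17071 RPC endpoint entirely (my own illustration in the spirit of the linked sample; the 'client' module name is an assumption, adjust the import to your project layout):

import json
import unittest
from unittest import mock

import azure.functions as func

# Assumption: the Client function shown above lives in "client/__init__.py".
from client import main


class TestClientFunction(unittest.IsolatedAsyncioTestCase):
    @mock.patch("client.df.DurableOrchestrationClient")
    async def test_main_starts_orchestration(self, client_cls):
        # The mocked client returns a canned instance id and response.
        instance = client_cls.return_value
        instance.start_new = mock.AsyncMock(return_value="fake-instance-id")
        instance.create_check_status_response.return_value = func.HttpResponse("started")

        req = func.HttpRequest(
            method="GET",
            body=None,
            url="/api/orchestrators/FooOrchestrator",
            route_params={"functionName": "FooOrchestrator"},
        )

        resp = await main(req, starter=json.dumps({"taskHubName": "TestHubName"}))

        instance.start_new.assert_awaited_once_with("FooOrchestrator", None, None)
        self.assertEqual(resp.get_body(), b"started")


if __name__ == "__main__":
    unittest.main()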

How can I create a python webhook sender app?

This is a follow up question to this post.
I have a data warehouse table exposed via the xxx.com\data API endpoint.
I have been querying this table and parsing the result into a dataframe with the following code:
import requests
import json
import http.client
import pandas as pd
url = "xxx.com\data?q=Active%20%3D1%20and%20LATITUDE%20%3D%20%20%220.000000%22%20and%20LONGITUDE%20%3D%20%220.000000%22&pageSize =300"
payload = {}
headers = {'Authorization': access_token}
response = requests.request("GET", url, headers=headers, data = payload)
j=json.loads(response.text.encode('utf8'))
df = pd.json_normalize(j['DataSet'])
The warehouse table gets updated periodically, and I am required to create a webhook that is listened to by the following Azure HTTP trigger:
import logging
import os
import json
import pandas as pd
import azure.functions as func


def main(req: func.HttpRequest) -> func.HttpResponse:
    logging.info('Python HTTP trigger function processed a request.')
    d = {
        'Date': ['2016-10-30', '2016-10-30', '2016-11-01', '2016-10-30'],
        'Time': ['09:58:11', '10:05:34', '10:07:57', '11:15:32'],
        'Transaction': [2, 3, 1, 1]
    }
    df = pd.DataFrame(d, columns=['Date', 'Time', 'Transaction'])
    output = df.to_csv(index_label="idx", encoding="utf-8")
    return func.HttpResponse(output)
When run, the HTTP trigger successfully listens to the following webhook sender, which I have created and am running locally:
import logging
import os
import json
import requests
import pandas as pd

data = {
    'Lat': '0.000000',
    'Long': '0.000000',
    'Status': '1',
    'Channel URL': "xxx.com\data"
}

webhook_url = "http://localhost:7071/api/HttpTrigger1"

r = requests.post(webhook_url, headers={'Content-Type': 'application/json'}, data=json.dumps(data))
My question is: How can I deploy the webhook sender to the cloud as an app so that every time "xxx.com\data" is updated with Lat == 0, Long == 0 and Status == 1, a message is sent to my webhook listener?
The app can be Azure/Flask/Postman or any other Python-based webhook builder.
A simple approach would be to wrap your sender code into a Timer Trigger function which polls your xxx.com\data every x seconds (or whatever frequency you decide) and calls your webhook (another HTTP-triggered function) if there is any change. The timer binding (function.json) and function code would look like this:
{
    "name": "mytimer",
    "type": "timerTrigger",
    "direction": "in",
    "schedule": "0 */5 * * * *"
}
import datetime
import logging
import os
import json
import requests
import pandas as pd
import azure.functions as func


def main(mytimer: func.TimerRequest) -> None:
    utc_timestamp = datetime.datetime.utcnow().replace(
        tzinfo=datetime.timezone.utc).isoformat()

    if mytimer.past_due:
        logging.warning('The timer is past due!')

    # "xxx.com\data" polling in a real scenario
    l = {
        'Lat': '0.000000',
        'Long': '0.000000',
        'Status': '1',
        'Channel URL': "xxx.com\data"
    }

    webhook_url = "{function app base url}/api/HttpTrigger1"
    r = requests.post(webhook_url, headers={'Content-Type': 'application/json'}, data=json.dumps(l))

    logging.info('Python timer trigger function ran at %s', utc_timestamp)
At the end of the day, you can deploy both your webhook function (HTTP trigger) and the sender (timer-trigger polling) into one Function App.
You can also consider getting rid of the webhook function altogether (to save one intermediate hop) and doing the work in the same timer-triggered function.
You currently have some polling logic (under "I have been querying this table using the following code"). If you want to move that to the cloud, create a TimerTrigger function and put all your poller code in it.
If you want to leave that poller code untouched, but want to call some code in the cloud whenever the poller detects a change (updated with Lat == 0, Long == 0 and Status == 1), then you can create an HttpTrigger function and invoke it from the poller whenever it detects the change.
The confusing part is this: how do you detect this change today? Where is the poller code hosted, and how often is it executed?
If the data in the DB is changing, then the only ways you can execute "some code" whenever the data changes are:
poll the DB periodically, say every 1 minute, and if there is a change execute "some code", OR
use some feature of this DB that allows you to configure a REST API (HTTP webhook) that is called by the DB whenever there is a change. Implement a REST API (e.g. as an HttpTrigger function) and put that "some code" that you want executed inside it. Now whenever there is a change, the DB calls your webhook/REST API and "some code" is executed.
If the only way you have to read the data is to call a REST API (xxx.com/data?q=...), then polling is the only way you can detect a change.
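If you end up polling, the missing piece in the snippets above is detecting whether the latest response actually changed. A minimal sketch (mine, not from the answers above; where you persist the previous hash between timer runs, e.g. blob or table storage, is left to you, and load_previous_hash/save_previous_hash are hypothetical helpers):

import hashlib

import requests


def fetch_snapshot(url, access_token):
    """Return the raw payload of the warehouse query."""
    response = requests.get(url, headers={'Authorization': access_token})
    response.raise_for_status()
    return response.text


def has_changed(payload, previous_hash):
    """Compare a hash of the new payload against the last seen hash."""
    current_hash = hashlib.sha256(payload.encode('utf-8')).hexdigest()
    return current_hash != previous_hash, current_hash


# Inside the timer-triggered function you would do something like:
# payload = fetch_snapshot("xxx.com/data?q=...", access_token)
# changed, new_hash = has_changed(payload, load_previous_hash())
# if changed:
#     requests.post(webhook_url, headers={'Content-Type': 'application/json'}, data=payload)
#     save_previous_hash(new_hash)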

Google Cloud Trace + Google Cloud Logging in the Logs Viewer

I'm stuck with a problem with Google Cloud Logging and Google Cloud Trace while using Google Kubernetes Engine.
I have an application which consumes a Cloud Pub/Sub topic, and I want to unify the logs into one trace for every Pub/Sub message handler call.
My Google Cloud Logging handler code:
class GCLHandler(CloudLoggingHandler):

    def emit(self, record):
        message = super(GCLHandler, self).format(record)
        resource = Resource(
            type='k8s_container',
            labels={
                'cluster_name': os.environ['CLUSTER_NAME'],
                'container_name': os.environ['POD_APP_NAME'],
                'location': os.environ['CLUSTER_LOCATION'],
                'namespace_name': os.environ['POD_NAMESPACE'],
                'pod_name': os.environ['POD_NAME'],
                'project_id': _settings.PROJECT_NAME
            }
        )
        labels: Dict[str, Any] = {
            'k8s-pod/app': os.environ['POD_APP_NAME'],
            'k8s-pod/app_kubernetes_io/managed-by': os.environ['POD_MANAGED_BY'],
            'k8s-pod/pod-template-hash': os.environ['POD_TEMPLATE_HASH']
        }
        trace = getattr(record, 'traceId', None)
        if trace is not None:
            trace = f'projects/{_settings.PROJECT_NAME}/traces/{trace}'

        self.transport.send(
            record,
            message,
            resource=resource,
            labels=labels,
            trace=trace,
            span_id=getattr(record, 'spanId', None)
        )
I use the OpenCensus integration with Cloud Trace and Cloud Logging, so I can get traceId and spanId and pass them into the Cloud Logging transport. It works fine, and the LogEntry in the Logs Viewer contains the proper traceId and spanId.
My code using Cloud Trace looks like this:
config_integration.trace_integrations(['logging'])

logger = logging.getLogger(__name__)
exporter = stackdriver_exporter.StackdriverExporter(
    project_id=settings.PROJECT_NAME
)


async def handle_message(message: Message) -> None:
    tracer = Tracer(exporter=exporter, sampler=AlwaysOnSampler())
    with tracer.span(name=f'Message#{message.message_id}'):
        logger.debug('debug')
        logger.info('info')
        logger.warning('warning')
So I can see these logs in the Logs Viewer, but they aren't grouped into one trace; however, if I use the Cloud Trace viewer and search by traceId, I do find the trace with the connected logs.
Q: Is there any way to display the trace in the Logs Viewer the way it is displayed for an App Engine service (as appengine.googleapis.com/request_log)?
As confirmed by @Nikita Davydov in the comment section, there's a workaround: you can create a fake http_request payload to group the logs.
If that doesn't work for you, you can file a feature request in the Google Public Issue Tracker in order to change the current behavior.
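For reference, here is a rough sketch of what that fake http_request workaround can look like when you write the entry with the Cloud Logging API client directly instead of the custom handler above (my own illustration; the log name, trace/span values and requestUrl are placeholders, and I'm assuming the http_request/trace/span_id keywords of Logger.log_struct in your installed google-cloud-logging version):

import google.cloud.logging

client = google.cloud.logging.Client()
cloud_logger = client.logger('pubsub_handler')

# traceId/spanId would come from the OpenCensus span context, as in the handler above.
trace_id = '0123456789abcdef0123456789abcdef'   # placeholder
trace = f'projects/{client.project}/traces/{trace_id}'

# A synthetic http_request: the values exist only so the Logs Viewer can group
# the entries per trace; they do not describe a real HTTP call.
fake_http_request = {
    'requestMethod': 'PUBSUB',
    'requestUrl': 'pubsub://my-subscription/message-id',   # placeholder
    'status': 200,
}

cloud_logger.log_struct(
    {'message': 'handled Pub/Sub message'},
    trace=trace,
    span_id='0000000000000001',                 # placeholder
    http_request=fake_http_request,
)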

Logging extra information in asynchronous programming

Although the example below is written in Python, it's not really Python-dependent.
I need to pass extra per-request information, like a client ID, with every single log item (including ORM logs), without modifying every library and application we use.
With synchronous programming, as you can see in the example below, it's easy: we can create the extra dict once and modify it once per request. But what do we do in asynchronous programming, where this technique becomes useless?
The sample code here is written in Python as that's the easiest way to illustrate the problem, but the question itself is language-independent.
Logger configuration file:
# logging_config.py
extra = {
    'application': None,
    'client_id': None
}
# use a custom logstash formatter to send this extra
Our main application file:
#!/usr/bin/env python3
import logging

import our_lib.logging_config as log_cfg

log = logging.getLogger(__name__)


def application_setup():
    """ Sets up application and logging. """
    log_cfg.extra['application'] = 'OurWebApplication'
    ...  # do the rest here
Our request processor:
# api_handler.py
import logging

import our_lib.logging_config as log_cfg

log = logging.getLogger(__name__)

...  # rest of the file


def extract_client_id(request):
    """ Extracts the client ID from the request or session. """
    ...


# some web API request handler
@route('/api/do_smth')
def api_do_smth_handler(request):
    """ Handles the `do_smth` web request. """
    log_cfg.extra['client_id'] = extract_client_id(request)
    ...
Some library module:
import logging

log = logging.getLogger(__name__)


def process_data(*args, **kwargs):
    """ Processes arguments. """
    try:
        ...  # process arguments
    except MyException:
        log.exception("argument processing failed")  # <-- how to log the client ID here?
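For completeness, a common way to handle this in asynchronous Python (my sketch, not part of the original question) is to keep the per-request values in contextvars, which follow each task/request, and copy them onto every LogRecord with a logging.Filter, so neither the libraries nor the ORM need any changes:

import contextvars
import logging

# Each request sets these once; concurrent asyncio tasks see their own values.
client_id_var = contextvars.ContextVar('client_id', default=None)
application_var = contextvars.ContextVar('application', default=None)


class ContextExtraFilter(logging.Filter):
    """Copies the context variables onto every LogRecord."""

    def filter(self, record):
        record.client_id = client_id_var.get()
        record.application = application_var.get()
        return True


# Attach the filter to the handler(s) once, e.g. in application_setup().
handler = logging.StreamHandler()
handler.addFilter(ContextExtraFilter())
handler.setFormatter(logging.Formatter(
    '%(asctime)s %(application)s %(client_id)s %(name)s %(message)s'))
logging.getLogger().addHandler(handler)

# In the request handler, instead of mutating a shared dict:
# client_id_var.set(extract_client_id(request))
# Any library logger, including the ORM's, now carries the client ID
# without being modified.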
