I want to trigger a job when I receive multiple files under the same container/directory in an Azure Storage account. Let's say I receive these two files:
- mycontainer/uploads/files/file.rtf
- mycontainer/uploads/files/file.txt
The job should be triggered only when both of those files have appeared, so I started defining the bindings like this:
{
  "scriptFile": "__init__.py",
  "bindings": [
    {
      "name": "myitem",
      "type": "queueTrigger",
      "direction": "in",
      "queueName": "myfiles",
      "connection": "StorageConnectionString"
    },
    {
      "name": "inputRtf",
      "type": "blob",
      "direction": "in",
      "path": "uploads/files/{blobname}.rtf",
      "connection": "StorageConnectionString"
    },
    {
      "name": "inputTxt",
      "type": "blob",
      "direction": "in",
      "path": "uploads/files/{blobname}.txt",
      "connection": "StorageConnectionString"
    },
    {
      "name": "outputRtf",
      "type": "blob",
      "direction": "out",
      "path": "output/{blobname}.rtf",
      "connection": "StorageConnectionString"
    },
    {
      "name": "outputTxt",
      "type": "blob",
      "direction": "out",
      "path": "output/{blobname}.txt",
      "connection": "StorageConnectionString"
    }
  ]
}
Let's say for simplicity that the Python code just copies the content of the .txt file into the output container, and does the same for the .rtf file. I don't really understand how queueTrigger works, so I'm pretty sure my config isn't right.
You will have to confirm that all the files needed by your function are present and then trigger your function. The input binding won't be able to do that on its own.
You could instead have an Event Grid triggered function that fires for each uploaded blob; for each event, check whether the other files required by your actual function are present.
If not, just return; if all the files are indeed present, trigger your actual function.
You could trigger your actual function with a storage queue message (use the binding) that carries the filename details required by the blob input bindings.
For examples of using the queue trigger binding together with the blob input binding, check the blob input binding docs.
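To make this concrete, here is a rough sketch of the Event Grid "gatekeeper" function (the copy function itself would stay queue-triggered, roughly as in your function.json). The binding names (event, msg), the eventGridTrigger and queue output bindings, and the use of the azure-storage-blob SDK are assumptions for illustration, not the only way to wire this up:

import json
import os

import azure.functions as func
from azure.storage.blob import ContainerClient


def main(event: func.EventGridEvent, msg: func.Out[str]):
    # The event subject looks like:
    # /blobServices/default/containers/mycontainer/blobs/uploads/files/file.rtf
    blob_path = event.subject.split("/blobs/", 1)[1]            # uploads/files/file.rtf
    base_name = blob_path.rsplit("/", 1)[-1].rsplit(".", 1)[0]  # file

    container = ContainerClient.from_connection_string(
        os.environ["StorageConnectionString"], "mycontainer")

    # Only enqueue the real job once both companion files exist.
    rtf_exists = container.get_blob_client(f"uploads/files/{base_name}.rtf").exists()
    txt_exists = container.get_blob_client(f"uploads/files/{base_name}.txt").exists()
    if rtf_exists and txt_exists:
        # Sending JSON lets the {blobname} expressions in your blob
        # input bindings resolve from the message payload.
        msg.set(json.dumps({"blobname": base_name}))

One caveat: sending the queue message as JSON (rather than a plain string) is what allows binding expressions like {blobname} in the queue-triggered function to resolve; a plain string would only be available as {queueTrigger}.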
I have a Python Azure function that triggers based on messages to a topic, which works fine independently. However, if I then try to also write a message to a different ServiceBus Queue it doesn't work (as in the Azure Function won't even trigger if new messages are published to the topic). Feels like the trigger conditions aren't met when I include the msg_out: func.Out[str] component. Any help would be much appreciated!
__init__.py
import logging
import azure.functions as func


def main(msg: func.ServiceBusMessage, msg_out: func.Out[str]):
    # Log the Service Bus Message as plaintext
    # logging.info("Python ServiceBus topic trigger processed message.")
    logging.info("Changes are coming through!")
    msg_out.set("Send an email")
function.json
{
  "scriptFile": "__init__.py",
  "entryPoint": "main",
  "bindings": [
    {
      "name": "msg",
      "type": "serviceBusTrigger",
      "direction": "in",
      "topicName": "publish-email",
      "subscriptionName": "validation-sub",
      "connection": "Test_SERVICEBUS"
    },
    {
      "type": "serviceBus",
      "direction": "out",
      "connection": "Test_SERVICEBUS",
      "name": "msg_out",
      "queueName": "email-test"
    }
  ]
}
host.json
{
  "version": "2.0",
  "logging": {
    "applicationInsights": {
      "samplingSettings": {
        "isEnabled": true,
        "excludedTypes": "Request"
      }
    }
  },
  "extensionBundle": {
    "id": "Microsoft.Azure.Functions.ExtensionBundle",
    "version": "[2.*, 3.0.0)"
  },
  "extensions": {
    "serviceBus": {
      "prefetchCount": 100,
      "messageHandlerOptions": {
        "autoComplete": true,
        "maxConcurrentCalls": 32,
        "maxAutoRenewDuration": "00:05:00"
      },
      "sessionHandlerOptions": {
        "autoComplete": false,
        "messageWaitTimeout": "00:00:30",
        "maxAutoRenewDuration": "00:55:00",
        "maxConcurrentSessions": 16
      }
    }
  }
}
I can reproduce your problem; it seems to be caused by the following error:
Property sessionHandlerOptions is not allowed.
After deleting the sessionHandlerOptions section from host.json, the function triggers normally.
I am going off the documentation here: https://learn.microsoft.com/en-us/azure/azure-functions/functions-bindings-storage-blob-output?tabs=python
Here is the code I currently have:
function.json
{
  "bindings": [
    {
      "queueName": "myqueue-items",
      "connection": "nameofstorageaccount_STORAGE",
      "name": "queuemsg",
      "type": "queueTrigger",
      "direction": "in"
    },
    {
      "name": "inputblob",
      "type": "blob",
      "dataType": "binary",
      "path": "samples-workitems/{queueTrigger}",
      "connection": "nameofstorageaccount_STORAGE",
      "direction": "in"
    },
    {
      "name": "outputblob",
      "type": "blob",
      "dataType": "binary",
      "path": "samples-workitems/{queueTrigger}-Copy",
      "connection": "nameofstorageaccount_STORAGE",
      "direction": "out"
    }
  ],
  "disabled": false,
  "scriptFile": "__init__.py"
}
__init__.py
import logging
import azure.functions as func


def main(queuemsg: func.QueueMessage, inputblob: bytes, outputblob: func.Out[bytes]):
    logging.info(f'Python Queue trigger function processed {len(inputblob)} bytes')
    outputblob.set(inputblob)
If I am understanding correctly, this function should get triggered when a blob is added to a container, and it should save a copy of that blob inside the same container.
The function runs, but nothing happens when a blob is uploaded to the container. I would like to trigger some code when a blob is uploaded; this is the only full example I have found with Python and a blob trigger.
Appreciate any help,
Thanks! :)
No. If you read the document, it states that the function is triggered when a message is sent to the queue:
The following example shows blob input and output bindings in a function.json file and Python code that uses the bindings. The function makes a copy of a blob. The function is triggered by a queue message that contains the name of the blob to copy. The new blob is named {originalblobname}-Copy.
If you want to execute a function when a blob is created, please see Blob Trigger example here: https://learn.microsoft.com/en-us/azure/azure-functions/functions-bindings-storage-blob-trigger?tabs=python.
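If the goal really is "run some code when a blob is uploaded", a minimal blob-trigger sketch could look like the following. The binding names and paths mentioned in the comment are assumptions for illustration, not the exact sample from the docs:

import logging

import azure.functions as func


def main(inputblob: func.InputStream, outputblob: func.Out[bytes]):
    # Assumes a "blobTrigger" binding named "inputblob" with path
    # "samples-workitems/{name}" and a blob output binding named
    # "outputblob" with path "samples-workitems-copies/{name}" in function.json.
    logging.info(f"Blob trigger fired for {inputblob.name}, {inputblob.length} bytes")
    outputblob.set(inputblob.read())

Note that if the output path pointed back into the same container that triggers the function, every copy would re-trigger it (file, file-Copy, file-Copy-Copy, ...), so writing the copies to a different container is the safer design.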
I am using an Azure Function as a webhook consumer to receive HTTP events as JSON and store them in Azure Storage.
I want dynamic naming of the output blob path based on the date, as shown below. I have tried lots of options but have not been able to get the desired output.
I followed this post but had no luck.
Expected write path:
source/
ctry/
yyyy/
mm/
date/
hrs/
event_{systemtime}.json
function.json:
{
  "scriptFile": "__init__.py",
  "bindings": [
    {
      "authLevel": "anonymous",
      "type": "httpTrigger",
      "direction": "in",
      "name": "req",
      "methods": [
        "get",
        "post"
      ]
    },
    {
      "type": "http",
      "direction": "out",
      "name": "$return"
    },
    {
      "type": "blob",
      "name": "outputblob",
      "path": "source/ctry/{datetime:yyyy}/{datetime:MM}/{datetime:dd}/{datetime:hh}/event_{systemtime}.json",
      "direction": "out",
      "connection": "MyStorageConnectionAppSetting"
    }
  ]
}
__init__.py
import logging
import azure.functions as func


def main(req: func.HttpRequest, outputblob: func.Out[str]) -> func.HttpResponse:
    logging.info('Python HTTP trigger function processed a request.')
    name = 'some_name'
    if not name:
        try:
            req_body = 'req_body_test'  # req.get_json()
        except ValueError:
            pass
        else:
            name = 'name'  # req_body.get('name')
    print(str(req.get_json()))
    outputblob.set(str(req.get_json()))
Dynamic blob naming requires you to post a request in JSON format.
For example, if you want to output a blob to test/{testdirectory}/test.txt, then you need to post a request like:
{
  "testdirectory": "nameofdirectory"
}
After that, the Azure Function binding will be able to resolve the directory name.
By the way, I don't recommend bindings for complex blob operations; I recommend using the SDK instead.
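For instance, a minimal sketch of the SDK route for the dated path could look like this; it assumes the azure-storage-blob package, that "MyStorageConnectionAppSetting" holds the connection string, and that "source" is the container with "ctry" as a fixed folder, as in the question:

import json
import os
from datetime import datetime, timezone

import azure.functions as func
from azure.storage.blob import BlobClient


def main(req: func.HttpRequest) -> func.HttpResponse:
    now = datetime.now(timezone.utc)
    blob_path = (f"ctry/{now:%Y}/{now:%m}/{now:%d}/{now:%H}/"
                 f"event_{now:%Y%m%d%H%M%S}.json")

    blob = BlobClient.from_connection_string(
        conn_str=os.environ["MyStorageConnectionAppSetting"],
        container_name="source",
        blob_name=blob_path,
    )
    # Store the incoming JSON event under the dated path.
    blob.upload_blob(json.dumps(req.get_json()), overwrite=True)
    return func.HttpResponse(f"Stored event at source/{blob_path}", status_code=200)

This keeps the path logic in code, so it does not depend on binding expressions at all.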
I was able to achieve the dynamic path by making the changes below to function.json:
{
  "scriptFile": "__init__.py",
  "bindings": [
    {
      "authLevel": "anonymous",
      "type": "httpTrigger",
      "direction": "in",
      "name": "req",
      "methods": [
        "get",
        "post"
      ]
    },
    {
      "type": "http",
      "direction": "out",
      "name": "$return"
    },
    {
      "type": "blob",
      "name": "outputblob",
      "path": "source/ctry/{DateTime:yyyy}/{DateTime:MM}/{DateTime:dd}/event_{DateTime}.json",
      "direction": "out",
      "connection": "MyStorageConnectionAppSetting"
    }
  ]
}
I am reading .xlsx data from a blob in an Azure Function. My code looks something like this:
import pandas as pd
import azure.functions as func


def main(techdatablob: func.InputStream, crmdatablob: func.InputStream, outputblob: func.Out[func.InputStream]):
    # Load in the tech and crm data
    crm_data = pd.read_excel(crmdatablob.read().decode('ISO-8859-1'))
    tech_data = pd.read_excel(techdatablob.read().decode('ISO-8859-1'))
The issue is when I try to decode the files, I get the following error:
ValueError: Protocol not known: PK...
And a lot of strange characters after the "...". Any thoughts on how to properly read in these files?
Please refer to my code; it seems that you don't need to add decode('ISO-8859-1') at all:
import logging
import pandas as pd
import azure.functions as func


def main(techdatablob: func.InputStream, crmdatablob: func.InputStream, outputblob: func.Out[func.InputStream]):
    logging.info(f"Python blob trigger function processed blob \n"
                 f"Name: {techdatablob.name}\n"
                 f"Blob Size: {techdatablob.length} bytes")

    # Load in the tech and crm data
    crm_data = pd.read_excel(crmdatablob.read())
    logging.info(f"{crm_data}")
    tech_data = pd.read_excel(techdatablob.read())
    logging.info(f"{tech_data}")
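For context, the original error happens because an .xlsx file is a zip archive (its first bytes are "PK"); decoding those bytes to a string makes pandas treat the result as a path or URL, hence "Protocol not known". If your pandas version does not accept raw bytes from read(), wrapping the payload in a BytesIO object is a safe variant (a sketch under that assumption; note that openpyxl has to be in requirements.txt for .xlsx support in recent pandas):

import io

import pandas as pd
import azure.functions as func


def main(techdatablob: func.InputStream, crmdatablob: func.InputStream,
         outputblob: func.Out[func.InputStream]):
    # Wrap the raw bytes in a file-like object so pandas reads the
    # .xlsx content from memory instead of treating it as a path.
    crm_data = pd.read_excel(io.BytesIO(crmdatablob.read()))
    tech_data = pd.read_excel(io.BytesIO(techdatablob.read()))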
Note: Your function.json should look like this. Otherwise, an error will occur.
{
  "name": "techdatablob",
  "type": "blobTrigger",
  "direction": "in",
  "path": "path1/{name}",
  "connection": "example"
},
{
  "name": "crmdatablob",
  "dataType": "binary",
  "type": "blob",
  "direction": "in",
  "path": "path2/data.xlsx",
  "connection": "example"
},
{
  "name": "outputblob",
  "type": "blob",
  "direction": "out",
  "path": "path3/out.xlsx",
  "connection": "example"
}
The difference between this and your function.json is that you are missing a dataType attribute.
My test result looks like this; there seem to be no problems.
Right now I'm trying to figure out how to work with Azure, and I'm stuck on a problem while storing my data in the storage account.
I have three strings and want to store each of them in a separate blob. With the first two, my code works fine, but the third one causes some retries and ends with a timeout.
My code is running within an Azure function.
Here is a minimal example:
from azure.storage.blob import BlobClient

blob_client = BlobClient.from_connection_string(
    conn_str='<STORAGE_ACCOUNT_CONNECTION_STRING>',
    container_name='<CONTAINER_NAME>',
    blob_name='<NAME_OF_BLOB>',
)

dic_blob_props = blob_client.upload_blob(
    data='<INFORMATION_THAT_SHOULD_GO_TO_THE_BLOB>',
    blob_type="BlockBlob",
    overwrite=True,
)
For the first two strings everything works fine, but the third fails. The strings have the following lengths:
len(s_1) = 1246209
len(s_2) = 8794086
len(s_3) = 24518001
Most likely it is because the third string is too long, but there must be a way to save it, right?
I have already tried to set the timeout within the .upload_blob method with timeout=600, but this has not changed the result at all, nor the time until a new write attempt is made.
The error is:
Exception: ServiceResponseError: ('Connection aborted.', timeout('The write operation timed out'))
If you have any ideas on the problem please let me know :-)
In my case, the problem disappeared after I deployed the function in the cloud. It seems that there was a problem debugging with Visual Studio code.
On my side, I don't have the problem. You can have a look at my code:
__init__.py
import logging
import azure.functions as func


def main(req: func.HttpRequest, outputblob: func.Out[func.InputStream]) -> func.HttpResponse:
    logging.info('This code is to upload a string to a blob.')
    s_3 = "x" * 24518001
    outputblob.set(s_3)
    return func.HttpResponse(
        "The string has already been uploaded to a blob.",
        status_code=200
    )
function.json
{
  "scriptFile": "__init__.py",
  "bindings": [
    {
      "authLevel": "anonymous",
      "type": "httpTrigger",
      "direction": "in",
      "name": "req",
      "route": "{test}",
      "methods": [
        "get",
        "post"
      ]
    },
    {
      "type": "http",
      "direction": "out",
      "name": "$return"
    },
    {
      "name": "outputblob",
      "type": "blob",
      "path": "test1/{test}.txt",
      "connection": "str",
      "direction": "out"
    }
  ]
}
local.settings.json
{
  "IsEncrypted": false,
  "Values": {
    "AzureWebJobsStorage": "",
    "FUNCTIONS_WORKER_RUNTIME": "python",
    "str": "DefaultEndpointsProtocol=https;AccountName=0730bowmanwindow;AccountKey=xxxxxx==;EndpointSuffix=core.windows.net"
  }
}
Then I hit the endpoint http://localhost:7071/api/bowman; it uploads the string to a blob and there is no timeout error.
So, I think the problem is related to the method you use.
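If you prefer to keep the azure-storage-blob SDK instead of the output binding, one thing that has helped with this particular write-timeout error in similar cases is tuning the client's upload chunking and timeouts. This is only a sketch with illustrative values, assuming azure-storage-blob v12 and a connection string kept in an app setting (the setting name below is hypothetical):

import os

from azure.storage.blob import BlobClient

# Smaller blocks mean each individual write finishes well within the
# socket timeout; the numbers below are illustrative, not prescriptive.
blob_client = BlobClient.from_connection_string(
    conn_str=os.environ["STORAGE_CONNECTION_STRING"],  # hypothetical setting name
    container_name="<CONTAINER_NAME>",
    blob_name="<NAME_OF_BLOB>",
    max_single_put_size=4 * 1024 * 1024,  # upload in 4 MB blocks instead of one shot
    max_block_size=4 * 1024 * 1024,
    connection_timeout=600,
)

dic_blob_props = blob_client.upload_blob(
    data="<LARGE_STRING>",
    blob_type="BlockBlob",
    overwrite=True,
)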