Azure Functions Blob deployment - python

I am going off the documentation here: https://learn.microsoft.com/en-us/azure/azure-functions/functions-bindings-storage-blob-output?tabs=python
Here is the code I currently have:
function.json
{
  "bindings": [
    {
      "queueName": "myqueue-items",
      "connection": "nameofstorageaccount_STORAGE",
      "name": "queuemsg",
      "type": "queueTrigger",
      "direction": "in"
    },
    {
      "name": "inputblob",
      "type": "blob",
      "dataType": "binary",
      "path": "samples-workitems/{queueTrigger}",
      "connection": "nameofstorageaccount_STORAGE",
      "direction": "in"
    },
    {
      "name": "outputblob",
      "type": "blob",
      "dataType": "binary",
      "path": "samples-workitems/{queueTrigger}-Copy",
      "connection": "nameofstorageaccount_STORAGE",
      "direction": "out"
    }
  ],
  "disabled": false,
  "scriptFile": "__init__.py"
}
__init__.py
import logging
import azure.functions as func
def main(queuemsg: func.QueueMessage, inputblob: bytes, outputblob: func.Out[bytes]):
    logging.info(f'Python Queue trigger function processed {len(inputblob)} bytes')
    outputblob.set(inputblob)
If I am understanding correctly, this function should be triggered when a blob is added to a container and should save a copy of that blob inside the same container.
The function runs, however nothing happens when a blob is uploaded to the container. I would like to trigger some code when a blob is uploaded; this is the only full example I have found with Python and a blob trigger.
Appreciate any help,
Thanks! :)

No. If you read the document, it states that the function is triggered when a message is sent to the queue:
The following example shows blob input and output bindings in a function.json file and Python code that uses the bindings. The function makes a copy of a blob. The function is triggered by a queue message that contains the name of the blob to copy. The new blob is named {originalblobname}-Copy.
If you want to execute a function when a blob is created, please see Blob Trigger example here: https://learn.microsoft.com/en-us/azure/azure-functions/functions-bindings-storage-blob-trigger?tabs=python.
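For reference, a minimal blob-triggered setup looks roughly like the sketch below. The container names and the connection setting name are placeholders, and the copy is written to a second container so the copy itself does not retrigger the function:
function.json
{
  "scriptFile": "__init__.py",
  "bindings": [
    {
      "name": "myblob",
      "type": "blobTrigger",
      "direction": "in",
      "path": "samples-workitems/{name}",
      "connection": "nameofstorageaccount_STORAGE"
    },
    {
      "name": "outputblob",
      "type": "blob",
      "direction": "out",
      "path": "samples-workitems-copies/{name}",
      "connection": "nameofstorageaccount_STORAGE"
    }
  ]
}
__init__.py
import logging
import azure.functions as func


def main(myblob: func.InputStream, outputblob: func.Out[bytes]):
    # Runs whenever a blob lands in samples-workitems and writes a copy elsewhere
    logging.info(f"Blob trigger fired for {myblob.name}")
    outputblob.set(myblob.read())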

Related

Input multiple blobs into an Azure function in Python

I'm working in Python on an Azure function. I am trying to read in two blobs, one that is triggered, and a static blob.
When I read them in, both blobs point to the triggered blob (the URI is the same). How can I input and use two blobs correctly?
My bindings look like:
{
  "name": "techdatablob",
  "type": "blobTrigger",
  "direction": "in",
  "path": "path1/{name}",
  "connection": "example"
},
{
  "name": "crmdatablob",
  "type": "blob",
  "direction": "in",
  "path": "path2/data.xlsx",
  "connection": "example"
},
{
  "name": "outputblob",
  "type": "blob",
  "direction": "out",
  "path": "path3/out.xlsx",
  "connection": "example"
}
And the __init__.py file starts with:
def main(techdatablob: func.InputStream, crmdatablob: func.InputStream, outputblob: func.Out[func.InputStream]):
    logging.info(f"Python blob trigger function processed blob \n"
                 f"Name: {techdatablob.name}\n"
                 f"Blob Size: {techdatablob.length} bytes")
    print(techdatablob.uri)
    print(crmdatablob.uri)
When I read them in, both blobs point to the triggered blob (the URI is the same). How can I input and use two blobs correctly?
In fact, you are already receiving both blobs. The problem is that the blob binding metadata in Azure Functions does not come from the Functions host, so properties such as the blob name, length, and URI do not report the correct values. The underlying data is different, however (and the objects are different as well).
You can do something like below to test:
import logging
import azure.functions as func
def main(techdatablob: func.InputStream, crmdatablob: func.InputStream) -> None:
    logging.info("-----" + techdatablob.read().decode('utf-8'))
    logging.info("-----" + crmdatablob.read().decode('utf-8'))
Have a look at the related issue here:
https://github.com/Azure/azure-functions-python-worker/issues/576
I think the problem is not on your side; it is a design issue in the Functions Python worker. If you need accurate metadata, there should be no problem using the Storage SDK to get it.
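If you do need the real name, size, or URI of the non-trigger blob, a workaround is to look it up with the Storage SDK instead of relying on the binding metadata. A rough sketch, assuming the app setting named example (the connection used by the bindings above) holds a storage connection string, and with an illustrative helper name:
import os

from azure.storage.blob import BlobClient


def get_static_blob_metadata():
    # Query the static blob directly so its properties are accurate
    blob_client = BlobClient.from_connection_string(
        conn_str=os.environ["example"],
        container_name="path2",
        blob_name="data.xlsx",
    )
    props = blob_client.get_blob_properties()
    return props.name, props.size, blob_client.url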

How to read xlsx blob into pandas from Azure function in python

I am reading in .xlsx data from a blob in an Azure function. My code looks something like this:
def main(techdatablob: func.InputStream, crmdatablob: func.InputStream, outputblob: func.Out[func.InputStream]):
    # Load in the tech and crm data
    crm_data = pd.read_excel(crmdatablob.read().decode('ISO-8859-1'))
    tech_data = pd.read_excel(techdatablob.read().decode('ISO-8859-1'))
The issue is when I try to decode the files, I get the following error:
ValueError: Protocol not known: PK...
And a lot of strange characters after the "...". Any thoughts on how to properly read in these files?
Please refer to my code; it seems that you don't need to add decode('ISO-8859-1'). An .xlsx file is binary data (a zip archive, which is why the error starts with PK), so decoding it to a string makes pandas treat it as a path or URL; pass the raw bytes to read_excel instead:
import logging
import pandas as pd
import azure.functions as func


def main(techdatablob: func.InputStream, crmdatablob: func.InputStream, outputblob: func.Out[func.InputStream]):
    logging.info(f"Python blob trigger function processed blob \n"
                 f"Name: {techdatablob.name}\n"
                 f"Blob Size: {techdatablob.length} bytes")

    # Load in the tech and crm data
    crm_data = pd.read_excel(crmdatablob.read())
    logging.info(f"{crm_data}")
    tech_data = pd.read_excel(techdatablob.read())
    logging.info(f"{tech_data}")
Note: Your function.json should look like this. Otherwise, an error will occur.
{
  "name": "techdatablob",
  "type": "blobTrigger",
  "direction": "in",
  "path": "path1/{name}",
  "connection": "example"
},
{
  "name": "crmdatablob",
  "dataType": "binary",
  "type": "blob",
  "direction": "in",
  "path": "path2/data.xlsx",
  "connection": "example"
},
{
  "name": "outputblob",
  "type": "blob",
  "direction": "out",
  "path": "path3/out.xlsx",
  "connection": "example"
}
The difference between this and your function.json is that yours is missing the dataType attribute on the crmdatablob binding.
My test result is like this; there seem to be no problems.
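If you also want to write a processed result back through the outputblob binding (path3/out.xlsx above), one possible sketch is to serialize the DataFrame to an in-memory buffer and hand its bytes to the binding. This assumes openpyxl is installed and that the output binding accepts raw bytes; the helper name is just illustrative:
import io

import pandas as pd
import azure.functions as func


def write_output(crm_data: pd.DataFrame, outputblob: func.Out[func.InputStream]) -> None:
    # Serialize the DataFrame to xlsx in memory and pass the bytes to the output binding
    buffer = io.BytesIO()
    crm_data.to_excel(buffer, index=False)
    outputblob.set(buffer.getvalue())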

How to upload a large string in an Azure Blob?

Right now I'm trying to figure out how to work with Azure, and I'm stuck on a problem while storing my data in the storage account.
I have three strings and want to store each of them in a separate blob. With the first two, my code works fine, but the third one causes some retries and ends with a timeout.
My code is running within an Azure function.
Here is a minimal example:
from azure.storage.blob import BlobClient

blob_client = BlobClient.from_connection_string(
    conn_str = '<STORAGE_ACCOUNT_CONNECTION_STRING>',
    container_name = '<CONTAINER_NAME>',
    blob_name = '<NAME_OF_BLOB>',
)

dic_blob_props = blob_client.upload_blob(
    data = '<INFORMATION_THAT_SHOULD_GO_TO_THE_BLOB>',
    blob_type = "BlockBlob",
    overwrite = True,
)
For the first two strings everything works fine, but the third one fails. The strings have the following lengths:
len(s_1) = 1246209
len(s_2) = 8794086
len(s_3) = 24518001
Most likely it is because the third string is too long, but there must be a way to save it, right?
I have already tried to set the timeout within the .upload_blob method with timeout=600, but this has not changed the result at all, nor the time until a new write attempt is made.
The error is:
Exception: ServiceResponseError: ('Connection aborted.', timeout('The write operation timed out'))
If you have any ideas on the problem, please let me know :-)
In my case, the problem disappeared after I deployed the function to the cloud. It seems that there was a problem debugging with Visual Studio Code.
On my side, I don't have the problem. You can have a look at my code:
__init__.py
import logging
import azure.functions as func
def main(req: func.HttpRequest, outputblob: func.Out[func.InputStream]) -> func.HttpResponse:
    logging.info('This code is to upload a string to a blob.')
    s_3 = "x" * 24518001
    outputblob.set(s_3)
    return func.HttpResponse(
        "The string has been uploaded to a blob.",
        status_code=200
    )
function.json
{
  "scriptFile": "__init__.py",
  "bindings": [
    {
      "authLevel": "anonymous",
      "type": "httpTrigger",
      "direction": "in",
      "name": "req",
      "route": "{test}",
      "methods": [
        "get",
        "post"
      ]
    },
    {
      "type": "http",
      "direction": "out",
      "name": "$return"
    },
    {
      "name": "outputblob",
      "type": "blob",
      "path": "test1/{test}.txt",
      "connection": "str",
      "direction": "out"
    }
  ]
}
local.settings.json
{
  "IsEncrypted": false,
  "Values": {
    "AzureWebJobsStorage": "",
    "FUNCTIONS_WORKER_RUNTIME": "python",
    "str": "DefaultEndpointsProtocol=https;AccountName=0730bowmanwindow;AccountKey=xxxxxx==;EndpointSuffix=core.windows.net"
  }
}
Then I hit the endpoint http://localhost:7071/api/bowman; it uploads the string to a blob and there is no timeout error.
So, I think the problem is related to the method you use.
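If you would rather keep using the Storage SDK than an output binding, one thing worth trying (a sketch, not verified against your account) is forcing a chunked upload with smaller blocks via the max_single_put_size and max_block_size client settings, so no single write has to push the whole 24 MB string at once:
from azure.storage.blob import BlobClient

# Smaller blocks mean each individual write is shorter, which can help avoid
# "The write operation timed out" on slow or flaky connections.
blob_client = BlobClient.from_connection_string(
    conn_str='<STORAGE_ACCOUNT_CONNECTION_STRING>',
    container_name='<CONTAINER_NAME>',
    blob_name='<NAME_OF_BLOB>',
    max_single_put_size=4 * 1024 * 1024,  # switch to block upload above 4 MB
    max_block_size=4 * 1024 * 1024,       # 4 MB per block
)

blob_client.upload_blob(
    data='<INFORMATION_THAT_SHOULD_GO_TO_THE_BLOB>',
    blob_type="BlockBlob",
    overwrite=True,
    max_concurrency=1,
)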

How to edit a *.csv file in the azure blob storage with an azure function?

I am working with Azure at the moment, and I am unhappy with the predefined functions in Data Factory because they start a cluster in the background, which is absolutely not necessary for my problem.
I receive a csv file in a predefined folder and want to pick a set of columns and store them in a certain order in a csv file.
At the moment my file looks as follows:
The JSON file:
"bindings": [
{
"name": "myblob",
"type": "blobTrigger",
"path": "input-raw",
"connection": "AzureWebJobsStorage",
"direction": "in"
},
{
"name": "outputblob",
"type": "blob",
"path": "{blobTrigger}-copy",
"connection": "AzureWebJobsStorage",
"direction": "out"
}
],
"disabled": false,
"scriptFile": "__init__.py"
}
The __init__.py:
import logging
import azure.functions as func
def main(myblob: func.InputStream, outputblob: func.Out[func.InputStream]):
    logging.info(f"Python blob trigger function processed blob \n"
                 f"Name: {myblob.name}\n"
                 f"Blob Size: {myblob.length} bytes")
    outputblob.set(myblob)
My function picks up a file in the folder and copies it, with '-copy' appended to the name, into the same folder.
Is there an easy way to access the data and edit it with Python?
Till now I have tried the packages 'csv', 'io' and 'fileinput' to read the information, but I have not managed to edit or even see the data within Visual Studio Code.
If you need more information please let me know.
Best
P
In fact, there is no way to 'edit' the .csv file in place. But you can download the .csv file, change it, and then upload it to overwrite the .csv file on Azure.
By the way, if I read it right, your function has a big problem: when the function is triggered, it will produce endless 'xx-copy' files in your container. The output file itself matches the trigger condition of your function, so the function will run endlessly.
This is my function; it uses func.InputStream to read the blob data:
import logging
import azure.functions as func


def main(myblob: func.InputStream):
    logging.info(myblob.read().decode("utf-8"))
    logging.info(f"Python blob trigger function processed blob \n"
                 f"Name: {myblob.name}\n"
                 f"Blob Size: {myblob.length} bytes")
{
  "scriptFile": "__init__.py",
  "bindings": [
    {
      "name": "myblob",
      "type": "blobTrigger",
      "direction": "in",
      "path": "samples-workitems",
      "connection": "AzureWebJobsStorage"
    }
  ]
}
In my situation, I first read the blob data as bytes and then convert it to a string. Let me know whether this solves your question. :)
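If you want to go one step further and actually pick out a set of columns rather than just copy the file, a possible sketch is below. The column names are placeholders, and the output binding is assumed to point at a different container (for example output-processed/{name}) so the result does not retrigger the function:
import csv
import io
import logging

import azure.functions as func

# Placeholder column selection - replace with the columns you actually need, in order
WANTED_COLUMNS = ["id", "name", "value"]


def main(myblob: func.InputStream, outputblob: func.Out[bytes]):
    logging.info(f"Processing {myblob.name}")

    # Parse the incoming CSV from the trigger stream
    reader = csv.DictReader(io.StringIO(myblob.read().decode("utf-8")))

    # Write only the wanted columns, in the wanted order, to an in-memory CSV
    out_buffer = io.StringIO()
    writer = csv.DictWriter(out_buffer, fieldnames=WANTED_COLUMNS, extrasaction="ignore")
    writer.writeheader()
    for row in reader:
        writer.writerow({col: row.get(col, "") for col in WANTED_COLUMNS})

    # Hand the result to the blob output binding (bound to a different container)
    outputblob.set(out_buffer.getvalue().encode("utf-8"))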

Input multiple blobs in Azure Functions

I want to trigger a job when I receive multiple files under the same container/directory in an Azure Storage. Let's say I receive the 2 files:
- mycontainer/uploads/files/file.rtf
- mycontainer/uploads/files/file.txt
The job I want should be triggered when both of those files appear. So I started defining the bindings like this:
{
  "scriptFile": "__init__.py",
  "bindings": [
    {
      "name": "myitem",
      "type": "queueTrigger",
      "direction": "in",
      "queueName": "myfiles",
      "connection": "StorageConnectionString"
    },
    {
      "name": "inputRtf",
      "type": "blob",
      "direction": "in",
      "path": "uploads/files/{blobname}.rtf",
      "connection": "StorageConnectionString"
    },
    {
      "name": "inputTxt",
      "type": "blob",
      "direction": "in",
      "path": "uploads/files/{blobname}.txt",
      "connection": "StorageConnectionString"
    },
    {
      "name": "outputRtf",
      "type": "blob",
      "direction": "out",
      "path": "output/{blobname}.rtf",
      "connection": "StorageConnectionString"
    },
    {
      "name": "outputTxt",
      "type": "blob",
      "direction": "out",
      "path": "output/{blobname}.txt",
      "connection": "StorageConnectionString"
    }
  ]
}
Let's say for simplicity that the Python code just copies the content of the .txt file into the output container, and does the same for the .rtf file. I don't really understand how queueTrigger works, so I'm pretty sure my config doesn't look right.
You will have to confirm that all the files needed by your function are present and then trigger your function. The input binding won't be able to do this on its own.
You could instead have an Event Grid triggered function which is fired for each blob uploaded, and for each event you check whether the other files required for your actual function are present.
If not, just return; but if all the files are indeed present, trigger your actual function.
You could trigger your actual function with a storage queue message (use the binding) that contains the filename details required for the blob input bindings.
For examples on using the Queue Trigger binding along with the Blob Input binding, check the blob input binding docs.
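A rough sketch of the 'check, then enqueue' idea is below: an Event Grid triggered (eventGridTrigger) function fires for each uploaded blob, checks whether the sibling file already exists, and only then drops a queue message onto myfiles for your actual queue-triggered function. The container name uploads, the simplified URL parsing, and the StorageConnectionString setting are assumptions based on your bindings:
import json
import os

import azure.functions as func
from azure.storage.blob import BlobClient
from azure.storage.queue import QueueClient


def main(event: func.EventGridEvent):
    # e.g. "https://.../containers/uploads/blobs/files/file.txt" (simplified parsing)
    blob_url = event.get_json()["url"]
    file_name = blob_url.split("/uploads/files/", 1)[1]   # "file.txt"
    base_name = file_name.rsplit(".", 1)[0]               # "file"

    conn_str = os.environ["StorageConnectionString"]

    # Check whether both the .rtf and the .txt sibling are present
    both_present = all(
        BlobClient.from_connection_string(conn_str, "uploads", f"files/{base_name}{ext}").exists()
        for ext in (".rtf", ".txt")
    )

    if both_present:
        # Enqueue the base name; the queue-triggered function with the blob input
        # bindings (uploads/files/{blobname}.rtf / .txt) picks it up from here.
        # Depending on your host's queue message encoding setting, you may need to
        # Base64-encode the message (azure.storage.queue.TextBase64EncodePolicy).
        queue = QueueClient.from_connection_string(conn_str, "myfiles")
        queue.send_message(json.dumps({"blobname": base_name}))
The queue message is JSON so that {blobname} in the blob input binding paths can resolve from the message body.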
