Blob name patterns of Azure Functions in Python

I am implementing an Azure Function in Python which is triggered by a file uploaded to blob storage. I want to specify the pattern of the filename and use its parts inside my code as follows:
function.json:
{
  "scriptFile": "__init__.py",
  "bindings": [
    {
      "name": "inputblob",
      "type": "blobTrigger",
      "direction": "in",
      "path": "dev/sources/{filename}.csv",
      "connection": "AzureWebJobsStorage"
    }
  ]
}
The executed __init__.py file looks as follows:
import logging
import azure.functions as func

def main(inputblob: func.InputStream):
    logging.info('Python Blob trigger function processed %s', inputblob.filename)
The error message that I get is: AttributeError: 'InputStream' object has no attribute 'filename'.
As a reference, I used this documentation.
Did I do something wrong or is it not possible to achieve what I want in Python?

Your function code should be this:
import logging
import os
import azure.functions as func

def main(myblob: func.InputStream):
    # myblob.name holds the full blob path, e.g. "dev/sources/report.csv"
    head, filename = os.path.split(myblob.name)
    name = os.path.splitext(filename)[0]
    logging.info(f"Python blob trigger function processed blob \n"
                 f"Name without extension: {name}\n"
                 f"Filename: {filename}")
The attribute is name, not filename. :)
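For illustration, here is a short trace of those two calls on a hypothetical blob name that matches the question's dev/sources/{filename}.csv pattern (the value "report.csv" is an assumption, not from the original post):
import os

blob_name = "dev/sources/report.csv"          # hypothetical value of myblob.name
head, filename = os.path.split(blob_name)     # head = "dev/sources", filename = "report.csv"
name = os.path.splitext(filename)[0]          # name = "report"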

I know it's really late, but I ran into the same problem and found a workaround, so I decided to answer you here.
You can simply reassemble the string in Python.
Inside __init__.py:
filenameraw = inputblob.name
filenameraw = filenameraw.split('/')[-1]       # keep only the last path segment
filenameraw = filenameraw.replace(".csv", "")  # drop the extension
With this you'll get your desired output. :)
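An equivalent approach (a sketch, assuming inputblob.name is a path such as "dev/sources/report.csv") uses pathlib:
from pathlib import PurePosixPath

# .stem strips both the directory part and the file extension in one go
filenameraw = PurePosixPath(inputblob.name).stem  # -> "report"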

Related

How do I get the file name that triggered an Azure Function Blob Trigger

I have an Azure Function with a blob trigger in Python that scans the content of PDF files that get added to a container. How do I get the name of the file that triggered the function, e.g. "bank_data.pdf"?
def main(myblob: func.InputStream):
    blob = {myblob.name}
I get this error when trying to get the name through the InputStream:
Result: Failure
Exception: FunctionLoadError: cannot load the pdf_blob_trigger_test function: the following parameters are declared in Python but not in function.json: {'name'}
Stack:
  File "/azure-functions-host/workers/python/3.8/LINUX/X64/azure_functions_worker/dispatcher.py", line 371, in _handle__function_load_request
    self._functions.add_function(
  File "/azure-functions-host/workers/python/3.8/LINUX/X64/azure_functions_worker/functions.py", line 353, in add_function
    input_types, output_types = self.validate_function_params(params,
  File "/azure-functions-host/workers/python/3.8/LINUX/X64/azure_functions_worker/functions.py", line 137, in validate_function_params
    raise FunctionLoadError(
function.json
{
  "scriptFile": "__init__.py",
  "bindings": [
    {
      "name": "myblob",
      "type": "blobTrigger",
      "direction": "in",
      "path": "data-upload-pdf-test/{name}.pdf",
      "connection": "AzureWebJobsStorage",
      "containerName": "data-upload-pdf-test"
    }
  ]
}
The solution to my problem is in the comments of the accepted answer.
I have reproduced this in my environment and got the expected results, as shown below:
Firstly, go to the Function App. I used the Logs section to find which file triggered my blob trigger, using the KQL query below:
traces
| where message contains "Python blob trigger"
In Storage Account:
__init__.py
import logging
import azure.functions as func

def main(myblob: func.InputStream):
    logging.info(f"Python blob trigger function processed blob \n"
                 f"Name: {myblob.name}\n"
                 f"Blob Size: {myblob.length} bytes")
So, you are able to log the file name in the Logs section of the Function App; f"Name: {myblob.name}\n" prints the name of the blob that triggered the function.
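If you only need the bare file name such as "bank_data.pdf" rather than the full container path, a small addition based on the myblob.name value logged above:
import os

# myblob.name is the full blob path, e.g. "data-upload-pdf-test/bank_data.pdf";
# os.path.basename keeps only the final segment.
file_name = os.path.basename(myblob.name)  # -> "bank_data.pdf"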

Neo4j FastAPI: add labels dynamically

from fastapi import FastAPI
from pydantic import BaseModel
import os
from dotenv import load_dotenv
from neo4j import GraphDatabase

load_dotenv()
uri = os.getenv("uri")
user = os.getenv("user")
pwd = os.getenv("pwd")

class nodemodel(BaseModel):
    label: str
    name: str

def connection():
    driver = GraphDatabase.driver(uri=uri, auth=(user, pwd))
    return driver

app = FastAPI()

@app.post("/createNode")
def createnode(node: nodemodel):
    driver_neo4j = connection()
    session = driver_neo4j.session()
    q1 = """
    CREATE(n{name:$name}) WITH n
    CALL apoc.create.addLabels(n, [$label]) YIELD node
    return n.name as name
    """
    x = {"name": node.name, "label": node.label}
    results = session.run(q1, x)
    return {"response": [record["name"] for record in results]}
That's my current code for a Neo4j REST API, and it is working, but I want to add labels more dynamically instead of just one. I want to be able to send a request body like:
{
    "labels": "Label1:Label2:Label3",
    "name": "string"
}
or just
{
    "labels": "Label1",
    "name": "string"
}
and I want both options to work. Is there a way to do this? Can someone show me an example with code?
Kind regards,
Tim
I fixed some of the syntax errors in your script. See below:
Both parameters of apoc.create.addLabels are lists, so pass (nodes, $labels) as lists: collect the created node with collect(n) and pass the labels as a list such as ["Label1", "Label2", "Label3"].
In the CREATE clause you need a node label, so I used SampleNode.
In the query, return node.name rather than n.name.
Sample value of x:
x= {"labels": ["Label1", "Label2", "Label3"], "name": "Sample Name"}
q1="""
CREATE (n:SampleNode {name:$name})
WITH collect(n) as nodes
CALL apoc.create.addLabels(nodes, $labels) YIELD node
return node.name as name"""
"""
x={"name":node.name, "labels":node.labels}
results=session.run(q1,x)
I tested it in my Python notebook and it returned the expected result.
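To accept the "Label1:Label2:Label3" request body from the question with this query, one option is to split the colon-separated string into a list before passing it as $labels. This is a sketch under the assumption that the Pydantic field is renamed to labels and that the imports, connection() and app from the question are reused:
class nodemodel(BaseModel):
    labels: str  # e.g. "Label1:Label2:Label3" or just "Label1"
    name: str

@app.post("/createNode")
def createnode(node: nodemodel):
    driver_neo4j = connection()
    session = driver_neo4j.session()
    q1 = """
    CREATE (n:SampleNode {name:$name})
    WITH collect(n) AS nodes
    CALL apoc.create.addLabels(nodes, $labels) YIELD node
    RETURN node.name AS name
    """
    # split(":") yields a one-element list for "Label1" and a multi-element
    # list for "Label1:Label2:Label3", so both request bodies work.
    x = {"name": node.name, "labels": node.labels.split(":")}
    results = session.run(q1, x)
    return {"response": [record["name"] for record in results]}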

Great Expectation with Azure and Databricks

I want to run great_expectations test suites against CSV files in my ADLS Gen2. On my ADLS, I have a container called "input" in which I have a file at input/GE/ind.csv. I use an InferredAssetAzureDataConnector. I was able to create and test/validate the data source configuration, but when I validate my data I get the error below.
import datetime
import pandas as pd
from ruamel import yaml
from great_expectations.core.batch import RuntimeBatchRequest
from great_expectations.data_context import BaseDataContext
from great_expectations.data_context.types.base import (
    DataContextConfig,
    FilesystemStoreBackendDefaults,
)
import great_expectations as ge
from great_expectations.core.batch import Batch, BatchRequest

# Root Directory
root_directory = "/dbfs/FileStore/great_expectation_official/"

# Data Context
data_context_config = DataContextConfig(
    store_backend_defaults=FilesystemStoreBackendDefaults(
        root_directory=root_directory
    ),
)
context = BaseDataContext(project_config=data_context_config)

# Configure your Datasource
datasource_config = {
    "name": "my_azure_datasource",
    "class_name": "Datasource",
    "execution_engine": {
        "class_name": "SparkDFExecutionEngine",
        "azure_options": {
            "account_url": "https://<account_Name>.blob.core.windows.net",
            "credential": "ADLS_key",
        },
    },
    "data_connectors": {
        "default_inferred_data_connector_name": {
            "class_name": "InferredAssetAzureDataConnector",
            "azure_options": {
                "account_url": "https://<account_Name>.blob.core.windows.net",
                "credential": "ADLS_key",
            },
            "container": "input",
            "name_starts_with": "/GE/",
            "default_regex": {
                "pattern": "(.*)\\.csv",
                "group_names": ["data_asset_name"],
            },
        },
    },
}
context.test_yaml_config(yaml.dump(datasource_config))
context.add_datasource(**datasource_config)

batch_request = BatchRequest(
    datasource_name="my_azure_datasource",
    data_connector_name="default_inferred_data_connector_name",
    data_asset_name="data_asset_name",
    batch_spec_passthrough={"reader_method": "csv", "reader_options": {"header": True}},
)

context.create_expectation_suite(
    expectation_suite_name="test_suite", overwrite_existing=True
)
validator = context.get_validator(
    batch_request=batch_request, expectation_suite_name="test_suite"
)
[Error snapshot]
[CSV data snapshot]
Can someone help me to find out the issue?
You can check with the following code whether your batch list is indeed empty.
context.get_batch_list(batch_request=batch_request)
If this is empty, you probably have an issue with your data_asset_names.
You can check whether the correct data asset name has been used in the output of the following code:
context.test_yaml_config(yaml.dump(datasource_config))
In the output there is a list of available data_asset_names that you can choose from. If the data_asset_name of your BatchRequest is not in that list, you will have an empty batch_list because the data asset is not available. There should be a warning here, but I think it is not implemented.
I had the same issue but figured it out yesterday. Also, you should produce a workable example of your error so people can explore the code. That way it is easier to help you.
Hopefully this helps!
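For convenience, the same checks can be done programmatically. A sketch, assuming the datasource name from the question and the context and batch_request objects created above:
# List the data asset names the inferred data connector actually discovered.
available = context.get_available_data_asset_names(
    datasource_names=["my_azure_datasource"]
)
print(available)

# If this comes back empty, the data_asset_name in the BatchRequest does not
# match any asset the connector found under input/GE/.
batches = context.get_batch_list(batch_request=batch_request)
print(len(batches))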

Python script for outputting JSON has wrong output for attribute-value pair

I'm writing a Python script to output some JSON files to Cosmos DB on Azure. My script looks as follows:
import logging
import uuid
import json
import azure.functions as func

def main(event: func.EventHubEvent, message: func.Out[func.Document]) -> None:
    event_body = event.get_body().decode('utf-8')
    logging.info('Python event trigger function processed an event item: %s', event_body)
    data = {
        "value": event_body,
        "insertion_time": event_body
    }
    message.set(func.Document.from_json(json.dumps(data)))
The output is written like:
{
    "value": "{\n \"value\": \"66\",\n \"insertion_time\": \"2020-06-02T05:50:00+00:00\"\n}",
    "insertion_time": "{\n \"value\": \"66\",\n \"insertion_time\": \"2020-06-02T05:50:00+00:00\"\n}"
}
However, I'd like it to be like:
{
    "value": "66",
    "insertion_time": "2020-06-02T05:50:00+00:00"
}
How do I correct this?
Your event_body is already a JSON string containing exactly what you want, so you don't need to rebuild it; just use it directly:
message.set(func.Document.from_json(event_body))
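If you do need to inspect or reshape the fields before writing to Cosmos DB, an alternative (assuming event_body is a JSON object with exactly these keys) is to parse it first rather than reusing the raw string for every field:
import json

parsed = json.loads(event_body)  # e.g. {"value": "66", "insertion_time": "2020-06-02T05:50:00+00:00"}
data = {
    "value": parsed["value"],
    "insertion_time": parsed["insertion_time"],
}
message.set(func.Document.from_json(json.dumps(data)))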

Load pickle file in Azure function from Azure Blob Storage

I am trying to build a service via Azure Functions that performs a matrix multiplication using a vector given by the HTTP request and a fixed NumPy matrix. The matrix is stored in Azure Blob Storage as a pickle file and I want to load it via an input binding. However, I cannot manage to load the pickle file; I am only able to load plain text files.
Right now my approach looks like this:
def main(req: func.HttpRequest, blobIn: func.InputStream) -> func.HttpResponse:
    logging.info('Python HTTP trigger function processed a request.')
    matrix = pickle.loads(blobIn.read())
    vector = req.params.get('vector')
    result = matrix.dot(vector)
    return func.HttpResponse(json.dumps(result))
The error I get when running it that way is UnpicklingError: invalid load key, '\xef'. Another approach I tried after some googling was the following:
def main(req: func.HttpRequest, blobIn: func.InputStream) -> func.HttpResponse:
    logging.info('Python HTTP trigger function processed a request.')
    blob_bytes = blobIn.read()
    blob_to_read = BytesIO(blob_bytes)
    with blob_to_read as f:
        A = pickle.load(f)
    vector = req.params.get('vector')
    result = A.dot(vector)
    return func.HttpResponse(json.dumps(result))
But it yields the same error. I also tried to save the matrix in a text file, get the string and build the matrix based on the string, but I encountered other issues.
So how can I load a pickle file in my Azure function? Is it even the correct approach to use input bindings to load such files or is there a better way? Many thanks for your help!
Thanks to evilSnobu for the contribution.
When you face this problem, it means the pickle data your code receives has been corrupted, typically because the blob content was handed to your function as text rather than as raw bytes.
The solution is to add "dataType": "binary" to the input binding in function.json, like this:
{
  "name": "inputBlob",
  "type": "blob",
  "dataType": "binary",
  "direction": "in",
  "path": "xxx/xxx.xxx",
  "connection": "AzureWebJobsStorage"
}
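With that binding in place, the handler from the question should work essentially unchanged. Here is a minimal sketch, assuming the binding name inputBlob from the JSON above and a vector passed as a JSON array in the query string (both of these names and formats are assumptions, not part of the original post):
import json
import logging
import pickle

import azure.functions as func

def main(req: func.HttpRequest, inputBlob: func.InputStream) -> func.HttpResponse:
    logging.info('Python HTTP trigger function processed a request.')
    # With "dataType": "binary", read() yields the raw pickle bytes, so loads() succeeds.
    matrix = pickle.loads(inputBlob.read())
    # Assumed request format: ?vector=[1,2,3] in the query string.
    vector = json.loads(req.params.get('vector'))
    result = matrix.dot(vector)
    return func.HttpResponse(json.dumps(result.tolist()))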
