Problem triggering nested dependencies in Azure Function - python

I have a problem using the videohash package for Python when it is deployed to an Azure Function.
My deployed Azure Function does not seem to be able to use a nested dependency properly. Specifically, I am trying to use the package "videohash" and the VideoHash function from it. The
input to VideoHash is a SAS URL for a video placed on Azure Blob Storage.
In the function's monitor, my output prints an error.
Accessing the SAS URL directly takes me to the video, so that part seems to be working.
Looking at the videohash source code, this error seems to occur while downloading the video from a given URL (link:
https://github.com/akamhy/videohash/blob/main/videohash/downloader.py),
where self.yt_dlp_path = str(which("yt-dlp")). This indicates to me that, after deploying the function, the yt-dlp package isn't properly available. It is a dependency of the videohash
module, but adding yt-dlp directly to the Azure Function's requirements file also does not solve the issue.
Any ideas on what is happening? 
 
What I tried: deploying the code to the Azure Function, which resulted in the behaviour described above.
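Not from the original post, but as a quick sanity check: since videohash resolves the downloader binary via which("yt-dlp"), you can log from inside the deployed function whether the yt-dlp console script is actually on the PATH. A minimal diagnostic sketch (the function body is a placeholder, not the real app):

import logging
import shutil
import sys

import azure.functions as func

def main(req: func.HttpRequest) -> func.HttpResponse:
    # videohash locates the downloader with shutil.which("yt-dlp");
    # if this logs None, the console script is not on the function's PATH.
    yt_dlp_path = shutil.which("yt-dlp")
    logging.info("yt-dlp resolved to: %s (interpreter: %s)", yt_dlp_path, sys.executable)
    return func.HttpResponse(f"yt-dlp path: {yt_dlp_path}")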

I have a workaround where you download the video files on your own, using azure.storage.blob, instead of letting videohash do it.
To download you will need a BlobServiceClient, a ContainerClient and the connection string of the Azure storage account.
Please create the two files, v1.mp3 and v2.mp3, before downloading the videos.
Complete Code:
import logging
from videohash import VideoHash
import azure.functions as func
import subprocess
import tempfile
import os
from azure.storage.blob import BlobServiceClient, BlobClient, ContainerClient

def main(req: func.HttpRequest) -> func.HttpResponse:
    # Local file paths on the server
    local_path = tempfile.gettempdir()
    filepath1 = os.path.join(local_path, "v1.mp3")
    filepath2 = os.path.join(local_path, "v2.mp3")

    # Reference to Blob Storage
    client = BlobServiceClient.from_connection_string("<Connection String>")

    # Reference to the container
    container = client.get_container_client(container="test")

    # Downloading the files
    with open(file=filepath1, mode="wb") as download_file:
        download_file.write(container.download_blob("v1.mp3").readall())
    with open(file=filepath2, mode="wb") as download_file:
        download_file.write(container.download_blob("v2.mp3").readall())

    # Video hash code
    videohash1 = VideoHash(path=filepath1)
    videohash2 = VideoHash(path=filepath2)
    t = videohash2.is_similar(videohash1)

    return func.HttpResponse(f"Hello, {t}. This HTTP triggered function executed successfully.")
Output: here I am getting an ffmpeg error, which is related to my test file and not to the error you are facing.
As far as I know, this workaround will not affect performance, since in both scenarios you are downloading the blobs anyway.

Related

Is there a way to access DriveItems via the Microsoft Graph API for python?

I'm writing a python application requiring that I download a folder from OneDrive. I understand that there was a package called onedrivesdk in the past for doing this, but it has since been deprecated and it is now recommended that the Microsoft Graph API be used to access OneDrive files (https://pypi.org/project/onedrivesdk/). It is my understanding that this requires somehow producing a DriveItem object referring to the target folder.
I was able to access the folder via GraphClient in msgraph.core:
from azure.identity import DeviceCodeCredential
from msgraph.core import GraphClient
import configparser
config = configparser.ConfigParser()
config.read(['config.cfg'])
azure_settings = config['azure']
scopes = azure_settings['graphUserScopes'].split(' ')
device_code_credential = DeviceCodeCredential(azure_settings['clientId'], tenant_id=azure_settings['tenantId'])
client = GraphClient(credential=device_code_credential, scopes=scopes)
import json
endpoint = '/me/drive/root:/Desktop'
x = client.get(endpoint)
x is the requests.models.Response object referring to the target folder (Desktop). I don't know how to extract from x a DriveItem or otherwise iterate over its contents programmatically. How can I do this?
Thanks
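Not part of the original question, but one possible way to iterate the folder's contents with the same GraphClient (a sketch, assuming the setup above): the /children endpoint lists the folder's child DriveItems as JSON, so you can loop over the value array of the response instead of needing a dedicated DriveItem class.

# List the children of the Desktop folder; each entry in "value" is a driveItem.
children_endpoint = '/me/drive/root:/Desktop:/children'
resp = client.get(children_endpoint)
resp.raise_for_status()

for item in resp.json().get('value', []):
    kind = 'folder' if 'folder' in item else 'file'
    print(kind, item['name'], item['id'])
    # A file's bytes can then be fetched via its content endpoint, e.g.
    # client.get(f"/me/drive/items/{item['id']}/content")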

Automatically create zip file with dependencies for AWS Lambda - Python

I am building a script which automatically builds AWS Lambda function. I follow this github repo as an inspiration.
However, the lambda_handler that I want to deploy has extra dependencies, such as numpy, pandas, or even lgbm. A simple example below:
import numpy as np

def lambda_handler(event, context):
    result = np.power(event['data'], 2)
    response = {'result': result}
    return response
example of an event and its response:
event = {'data': [1, 2, 3]}
lambda_handler(event, None)
> {"result": [1, 4, 9]}
I would like to automatically add the needed layer while creating the AWS Lambda. For that I think I need to change the create_lambda_deployment_package function that is in lambda_basic.py in the repo. What I was thinking of doing is the following:
import zipfile
import glob
import io

def create_full_lambda_deployment_package(function_file_name):
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, 'w') as zipped:
        # Adding all the files around the lambda_handler directory
        for i in glob.glob(function_file_name):
            zipped.write(i)
        # Adding the numpy directory
        for i in glob.glob('./venv/lib/python3.7/site-packages/numpy/*'):
            zipped.write(i, f'numpy/{i[41:]}')
    buffer.seek(0)
    return buffer.read()
Despite the fact that the Lambda is created and a 'numpy' folder appears in my Lambda environment, unfortunately this doesn't work (error: cannot import name 'integer' from partially initialized module 'numpy' (most likely due to a circular import) (/var/task/numpy/__init__.py)).
How could I fix this issue? Or is there maybe another way to solve my problem?
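Not from the original question, but one likely culprit is that glob.glob('.../numpy/*') only matches the top-level entries of the package, so nested subpackages such as numpy/core never make it into the archive, which can produce exactly this kind of partially-initialized import error. A sketch that walks site-packages recursively instead (the paths are assumptions carried over from the snippet above):

import io
import os
import zipfile

# Assumed locations, matching the snippet above.
SITE_PACKAGES = './venv/lib/python3.7/site-packages'
PACKAGES = ['numpy']  # add 'pandas', 'lightgbm', ... as needed

def create_full_lambda_deployment_package(function_file_name):
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, 'w', zipfile.ZIP_DEFLATED) as zipped:
        # The handler module itself goes at the root of the archive.
        zipped.write(function_file_name, os.path.basename(function_file_name))
        # Add every file of every dependency, keeping paths relative to
        # site-packages so "import numpy" resolves inside /var/task.
        for package in PACKAGES:
            package_dir = os.path.join(SITE_PACKAGES, package)
            for root, _dirs, files in os.walk(package_dir):
                for name in files:
                    full_path = os.path.join(root, name)
                    zipped.write(full_path, os.path.relpath(full_path, SITE_PACKAGES))
    buffer.seek(0)
    return buffer.read()

Note also that compiled packages like numpy must be built for the Lambda runtime (manylinux wheels for the matching Python version), otherwise the import can still fail even when all files are present.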

Azure Functions - How to Connect to other services using API connections

I'm new to Azure Functions. I want to deploy my Python code in a Function App, where my code is linked with SharePoint, Outlook and SQL Server. Could someone suggest the best way to connect all 3 of them in an Azure Functions App? #python #sql #sharepoint #azure
Firstly, regarding accessing SharePoint files from an Azure Function: we just need a few imports for it, and we also have the Python documentation for Office365-REST-Python-Client.
Below is one example of downloading a file from SharePoint:
import os
import tempfile

from office365.sharepoint.client_context import ClientContext
# test_team_site_url and test_client_credentials are placeholder settings
# taken from the library's test suite; replace them with your own values.
from tests import test_team_site_url, test_client_credentials

ctx = ClientContext(test_team_site_url).with_credentials(test_client_credentials)

# file_url = '/sites/team/Shared Documents/big_buck_bunny.mp4'
file_url = "/sites/team/Shared Documents/report #123.csv"
download_path = os.path.join(tempfile.mkdtemp(), os.path.basename(file_url))

with open(download_path, "wb") as local_file:
    file = ctx.web.get_file_by_server_relative_path(file_url).download(local_file).execute_query()
print("[Ok] file has been downloaded into: {0}".format(download_path))
To get all the details of file and folder operations, refer to this GitHub link.
For connecting to SQL, there is a blog which has all the insights with Python code, thanks to lieben.
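Not from the original answer, but for the SQL Server part a common approach inside an Azure Function is pyodbc with an ODBC connection string; a minimal sketch (server, database and credential values are placeholders):

import os
import pyodbc

# Placeholder connection details; in a Function App these would normally
# come from application settings / environment variables.
conn_str = (
    "Driver={ODBC Driver 17 for SQL Server};"
    "Server=tcp:<your-server>.database.windows.net,1433;"
    "Database=<your-database>;"
    "Uid=<your-user>;"
    "Pwd=" + os.environ.get("SQL_PASSWORD", "<your-password>") + ";"
    "Encrypt=yes;TrustServerCertificate=no;"
)

conn = pyodbc.connect(conn_str)
cursor = conn.cursor()
cursor.execute("SELECT TOP 5 name FROM sys.tables")
for row in cursor.fetchall():
    print(row.name)
conn.close()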

add delay for a task until particular file is moved from bucket

I am new to Airflow. I have to check that a file generated from a DAG (e.g. sample.txt) has been moved away from the bucket (in my case the file I have generated will be moved away from the bucket when it is picked up by another system, and then this output file will no longer be in the bucket; it might take a few minutes for the file to be removed).
How do I add a task in the same DAG which waits/retries until the file has been moved away from the bucket, and only then proceeds with the next task?
Is there any operator which satisfies the above criteria? Please throw some light on how to proceed.
You can create a custom sensor based on the existing GCSObjectExistenceSensor.
The modification is simple:
from airflow.providers.google.cloud.hooks.gcs import GCSHook
from airflow.providers.google.cloud.sensors.gcs import GCSObjectExistenceSensor

class GCSObjectNotExistenceSensor(GCSObjectExistenceSensor):

    def poke(self, context: dict) -> bool:
        self.log.info('Sensor checks if %s, %s does not exist', self.bucket, self.object)
        hook = GCSHook(
            gcp_conn_id=self.google_cloud_conn_id,
            delegate_to=self.delegate_to,
            impersonation_chain=self.impersonation_chain,
        )
        return not hook.exists(self.bucket, self.object)
Then use the sensor GCSObjectNotExistenceSensor in your code like:
gcs_object_does_not_exists = GCSObjectNotExistenceSensor(
    bucket=BUCKET_1,
    object=PATH_TO__FILE,
    mode='poke',
    task_id="gcs_object_does_not_exists_task",
)
The sensor will not let the pipeline continue until the object PATH_TO__FILE is removed.
You can use the Airflow PythonOperator to achieve this. Make the Python callable continuously poke GCS and check whether the file has been removed, and return from the callable once it has.
import time

from airflow.operators.python_operator import PythonOperator
from google.cloud import storage
import google.auth

def check_file_in_gcs():
    credentials, project = google.auth.default()
    storage_client = storage.Client('your_Project_id', credentials=credentials)
    name = 'sample.txt'
    bucket_name = 'Your_Bucket_name'
    bucket = storage_client.bucket(bucket_name)
    while True:
        stats = storage.Blob(bucket=bucket, name=name).exists(storage_client)
        if not stats:
            print("Returning as file is removed!!!!")
            return
        time.sleep(30)  # avoid hammering GCS; poll every 30 seconds

check_gcs_file_removal = PythonOperator(
    task_id='check_gcs_file_removal',
    python_callable=check_file_in_gcs,
    # op_kwargs={'params': xyz},
    # Pass bucket name and other details if needed by uncommenting the above
    dag=dag
)
You might need to install Python packages for the google cloud libraries to work. Please install the relevant ones from below (I am not sure which one is needed exactly; this list is taken from my virtualenv):
google-api-core==1.16.0
google-api-python-client==1.8.0
google-auth==1.12.0
google-auth-httplib2==0.0.3
google-auth-oauthlib==0.4.1
google-cloud-core==1.3.0
google-cloud-storage==1.27.0

FileUploadMiscError while persisting output file from Azure Batch

I'm facing the following error while trying to persist log files to Azure Blob storage from Azure Batch execution - "FileUploadMiscError - A miscellaneous error was encountered while uploading one of the output files". This error doesn't give a lot of information as to what might be going wrong. I tried checking the Microsoft Documentation for this error code, but it doesn't mention this particular error code.
Below is the relevant code for adding the task to Azure Batch that I have ported from C# to Python for persisting the log files.
Note: The container that I have configured gets created when the task is added, but there's no blob inside.
import datetime
import logging
import os

import azure.storage.blob.models as blob_model
import yaml
from azure.batch import models
from azure.storage.blob.baseblobservice import BaseBlobService
from azure.storage.common.cloudstorageaccount import CloudStorageAccount
from dotenv import load_dotenv

LOG = logging.getLogger(__name__)

def add_tasks(batch_client, job_id, task_id, io_details, blob_details):
    task_commands = "This is a placeholder. Actual code has an actual task. This gets completed successfully."

    LOG.info("Configuring the blob storage details")
    base_blob_service = BaseBlobService(
        account_name=blob_details['account_name'],
        account_key=blob_details['account_key'])
    LOG.info("Base blob service created")

    base_blob_service.create_container(
        container_name=blob_details['container_name'], fail_on_exist=False)
    LOG.info("Container present")

    container_sas = base_blob_service.generate_container_shared_access_signature(
        container_name=blob_details['container_name'],
        permission=blob_model.ContainerPermissions(write=True),
        expiry=datetime.datetime.now() + datetime.timedelta(days=1))
    LOG.info(f"Container SAS created: {container_sas}")

    container_url = base_blob_service.make_container_url(
        container_name=blob_details['container_name'], sas_token=container_sas)
    LOG.info(f"Container URL created: {container_url}")

    # fpath = task_id + '/output.txt'
    fpath = task_id
    LOG.info("Creating output file object:")

    out_files_list = list()
    out_files = models.OutputFile(
        file_pattern=r"../stderr.txt",
        destination=models.OutputFileDestination(
            container=models.OutputFileBlobContainerDestination(
                container_url=container_url, path=fpath)),
        upload_options=models.OutputFileUploadOptions(
            upload_condition=models.OutputFileUploadCondition.task_completion))
    out_files_list.append(out_files)
    LOG.info(f"Output files: {out_files_list}")

    LOG.info(f"Creating the task now: {task_id}")
    task = models.TaskAddParameter(
        id=task_id, command_line=task_commands, output_files=out_files_list)
    batch_client.task.add(job_id=job_id, task=task)
    LOG.info(f"Added task: {task_id}")
There is a bug in Batch's OutputFile handling which causes it to fail to upload to containers if the full container URL includes any query-string parameters other than the ones included in the SAS token. Unfortunately, the azure-storage-blob Python module includes an extra query string parameter when generating the URL via make_container_url.
This issue was just raised to us, and a fix will be released in the coming weeks, but an easy workaround is instead of using make_container_url to craft the URL, craft it yourself like so: container_url = 'https://{}/{}?{}'.format(blob_service.primary_endpoint, blob_details['container_name'], container_sas).
The resulting URL should look something like this: https://<account>.blob.core.windows.net/<container>?se=2019-01-12T01%3A34%3A05Z&sp=w&sv=2018-03-28&sr=c&sig=<sig> - specifically it shouldn't have restype=container in it (which is what the azure-storage-blob package is including)
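Applied to the snippet in the question, the workaround would look something like this (a sketch; the rest of add_tasks stays the same):

    # Build the container URL by hand instead of calling make_container_url,
    # so only the SAS token ends up in the query string (no restype=container).
    container_url = 'https://{}/{}?{}'.format(
        base_blob_service.primary_endpoint,
        blob_details['container_name'],
        container_sas)
    LOG.info(f"Container URL created: {container_url}")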
