Uploading a Video to Azure Media Services with Python SDKs - python

I am looking for a way to upload a video to Azure Media Services (AMS v3) via the Python SDKs. I have followed the instructions and am able to connect to AMS successfully:
credentials = AdalAuthentication(
client = AzureMediaServices(credentials, SUBSCRIPTION_ID) # Successful
I can also retrieve the details of all the videos uploaded via the portal:
for data in client.assets.list(RESOUCE_GROUP_NAME, ACCOUNT_NAME).get(0):
    print(f'Asset_name: {data.name}, file_name: {data.description}')
# Asset_name: 4f904060-d15c-4880-8c5a-xxxxxxxx, file_name: 夢想全紀錄.mp4
# Asset_name: 8f2e5e36-d043-4182-9634-xxxxxxxx, file_name: an552Qb_460svvp9.webm
# Asset_name: aef495c1-a3dd-49bb-8e3e-xxxxxxxx, file_name: world_war_2.webm
# Asset_name: b53d8152-6ecd-41a2-a59e-xxxxxxxx, file_name: an552Qb_460svvp9.webm - Media Encoder Standard encoded
However, when I tried to use the following method from the Python SDK, it failed, because I have no idea what to pass as parameters:
create_or_update(resource_group_name, account_name, asset_name,
parameters, custom_headers=None, raw=False, **operation_config)
Therefore, I would like to ask questions as follows (everything is done via Python SDKs):
What kind of parameters does it expect?
Can a video be uploaded directly to AMS, or should it be uploaded to Blob Storage first?
Should an Asset contain only one video, or are multiple files fine?

The documentation for the REST version of that method is at https://learn.microsoft.com/en-us/rest/api/media/assets/createorupdate. This is effectively the same as the Python parameters.
Videos are stored in Azure Storage for Media Services. This is true for input assets, encoded assets, and any streamed content. It is all in Storage but accessed by Media Services. You do need to create an asset in Media Services, which creates the Storage container. Once the Storage container exists, you upload to that Media Services-created container via the Storage APIs.
Technically multiple files are fine, but doing so causes a number of issues that you may not expect. I'd recommend using 1 input video = 1 Media Services asset. On the encoding output side there will be more than one file in the asset. Encoding output contains one or more videos, manifests, and metadata files.
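The upload step described in this answer can be sketched as follows. This is a minimal sketch, not the official method: the helper name `upload_video_to_asset` is hypothetical, and `container_client` is assumed to behave like `azure.storage.blob.ContainerClient` (version 12), pointing at the container that Media Services created for the asset.

```python
from pathlib import Path


def upload_video_to_asset(container_client, local_path, blob_name=None):
    """Upload one local video file into the asset's storage container.

    `container_client` is assumed to offer upload_blob(name, data, overwrite),
    as azure.storage.blob.ContainerClient does. Hypothetical helper.
    """
    path = Path(local_path)
    name = blob_name or path.name  # default to the local file name
    with path.open("rb") as stream:
        # overwrite=True replaces an existing blob that has the same name
        container_client.upload_blob(name=name, data=stream, overwrite=True)
    return name
```

With the real SDK, the client would typically come from something like `BlobServiceClient.from_connection_string(...).get_container_client(asset.container)`, where `asset.container` is the container name reported by Media Services; treat that wiring as an assumption to verify against your SDK version.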

I found a workaround using the Python SDKs and REST; however, I am not sure it is the proper way.
Log in to Azure Media Services and Blob Storage via the Python packages:
import adal
from msrestazure.azure_active_directory import AdalAuthentication
from msrestazure.azure_cloud import AZURE_PUBLIC_CLOUD
from azure.mgmt.media import AzureMediaServices
from azure.mgmt.media.models import MediaService
from azure.storage.blob import BlobServiceClient, BlobClient, ContainerClient
Create Assets for the original file and the encoded one by passing these parameters. Example of creating the Asset for the original file:
asset_name = 'asset-myvideo'
asset_properties = {
    'properties': {
        'description': 'Original File Description',
        'storageAccountName': "storage-account-name"
    }
}
client.assets.create_or_update(RESOUCE_GROUP_NAME, ACCOUNT_NAME, asset_name, asset_properties)
Upload a video to the Blob Storage derived from the created original asset
current_container = [data.container for data in client.assets.list(RESOUCE_GROUP_NAME, ACCOUNT_NAME).get(0) if data.name == asset_name][0] # Get Blob Storage location
file_name = "myvideo.mp4"
blob_client = blob_service_client.get_blob_client(container=current_container, blob=file_name)
with open('original_video.mp4', 'rb') as data:
print(f'Video uploaded to {current_container}')
After that, I create a Transform, a Job, and a Streaming Locator, and successfully get the video streaming link.

I was able to get this to work with the newer python SDK. The python documentation is mostly missing, so I constructed this mainly from the python SDK source code and the C# examples.
0) Import a lot of stuff
from azure.mgmt.media.models import (
    Asset, Transform, Job,
    BuiltInStandardEncoderPreset, TransformOutput,
    JobInputAsset, JobOutputAsset, AssetContainerSas, AssetContainerPermission,
    StreamingLocator, StreamingEndpoint)  # the last two are used in get_urls' type hints
import adal
from msrestazure.azure_active_directory import AdalAuthentication
from msrestazure.azure_cloud import AZURE_PUBLIC_CLOUD
from azure.mgmt.media import AzureMediaServices
from azure.storage.blob import BlobServiceClient, ContainerClient
import datetime as dt
import time
LOGIN_ENDPOINT = AZURE_PUBLIC_CLOUD.endpoints.active_directory
RESOURCE = AZURE_PUBLIC_CLOUD.endpoints.active_directory_resource_id
# AzureSettings is a custom NamedTuple
1) Log in to AMS:
def get_ams_client(settings: AzureSettings) -> AzureMediaServices:
context = adal.AuthenticationContext(LOGIN_ENDPOINT + '/' +
credentials = AdalAuthentication(
return AzureMediaServices(credentials, settings.AZURE_SUBSCRIPTION_ID)
2) Create an input and output asset
input_asset = create_or_update_asset(
    input_asset_name, "My Input Asset", client, azure_settings)
output_asset = create_or_update_asset(
    output_asset_name, "My Output Asset", client, azure_settings)
3) Get the Container Name. (Most documentation refers to BlockBlobService, which seems to have been removed from the SDK.)
def get_container_name(client: AzureMediaServices, asset_name: str, settings: AzureSettings):
expiry_time = dt.datetime.now(dt.timezone.utc) + dt.timedelta(hours=4)
container_list: AssetContainerSas = client.assets.list_container_sas(
permissions = AssetContainerPermission.read_write,
sas_uri: str = container_list.asset_container_sas_urls[0]
container_client: ContainerClient = ContainerClient.from_container_url(sas_uri)
return container_client.container_name
4) Upload a file to the input asset container:
def upload_file_to_asset_container(
container: str, local_file, uploaded_file_name, settings: AzureSettings):
blob_service_client = BlobServiceClient.from_connection_string(settings.AZURE_MEDIA_STORAGE_CONNECTION_STRING)
blob_client = blob_service_client.get_blob_client(container=container, blob=uploaded_file_name)
with open(local_file, 'rb') as data:
5) Create a transform (in my case, using the adaptive streaming preset):
def get_or_create_transform(
client: AzureMediaServices,
transform_name: str,
settings: AzureSettings):
transform_output = TransformOutput(preset=BuiltInStandardEncoderPreset(preset_name="AdaptiveStreaming"))
transform: Transform = client.transforms.create_or_update(
return transform
6) Submit the Job
def submit_job(
client: AzureMediaServices,
settings: AzureSettings,
input_asset: Asset,
output_asset: Asset,
transform_name: str,
correlation_data: dict) -> Job:
job_input = JobInputAsset(asset_name=input_asset.name)
job_outputs = [JobOutputAsset(asset_name=output_asset.name)]
job: Job = client.jobs.create(
return job
7) Then I get the URLs after the Event Grid has told me the job is done:
# side-effect warning: this starts the streaming endpoint $$$
def get_urls(client: AzureMediaServices, output_asset_name: str,
             locator_name: str):
locator: StreamingLocator = client.streaming_locators.create(
except Exception as ex:
print("ignoring existing")
streaming_endpoint: StreamingEndpoint = client.streaming_endpoints.get(
if streaming_endpoint:
if streaming_endpoint.resource_state != "Running":
paths = client.streaming_locators.list_paths(
return [f"https://{streaming_endpoint.host_name}{path.paths[0]}" for path in paths.streaming_paths]
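For the container-name step above, the name can also be read straight from the SAS URL with the standard library instead of constructing a ContainerClient. This is a sketch under one assumption: the SAS URL has the usual `https://<account>.blob.core.windows.net/<container>?<sas>` shape (it would not hold for emulator-style URLs where the account name is part of the path). The function name is hypothetical.

```python
from urllib.parse import unquote, urlsplit


def container_name_from_sas_url(sas_url: str) -> str:
    """Return the container name from a container-level SAS URL.

    Assumes the first path segment is the container, as in
    https://<account>.blob.core.windows.net/<container>?<sas-token>.
    """
    path = urlsplit(sas_url).path          # e.g. "/asset-1234"
    first_segment = path.strip("/").split("/", 1)[0]
    return unquote(first_segment)
```

The query string (the SAS token itself) is ignored, so the function never needs network access or credentials.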


Problem triggering nested dependencies in Azure Function

I have a problem using the videohash package for Python when it is deployed to an Azure Function.
My deployed azure function does not seem to be able to use a nested dependency properly. Specifically, I am trying to use the package “videohash” and the function VideoHash from it. The
input to VideoHash is a SAS url token for a video placed on an Azure blob storage.  
In the monitor of my output it prints: 
Accessing the sas url token directly takes me to the video, so that part seems to be working.  
Looking at the source code for videohash this error seems to occur in the process of downloading the video from a given url (link:
.. where self.yt_dlp_path = str(which("yt-dlp")). This to me indicates, that after deploying the function, the package yt-dlp isn’t properly activated. This is a dependency from the videohash
module, but adding yt-dlp directly to the requirements file of the azure function also does not solve the issue. 
Any ideas on what is happening? 
Deploying code to Azure function, which resulted in the details highlighted in the issue description.
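The lookup described above can be reproduced in isolation to see what the deployed environment reports. This is only a diagnostic sketch: it assumes the `which` used by videohash behaves like the standard library's `shutil.which`, and the helper name `describe_tool` is hypothetical.

```python
import shutil


def describe_tool(name: str = "yt-dlp") -> str:
    """Report whether an executable can be found on the current PATH."""
    path = shutil.which(name)  # returns None when nothing is found
    if path is None:
        return f"{name} not found on PATH"
    return f"{name} found at {path}"
```

Logging `describe_tool()` from inside the function would show whether the executable is visible to the worker process, which is a separate question from whether the Python package is installed.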
I have a workaround: download the video file yourself using azure.storage.blob instead of letting videohash download it.
To download, you will need a BlobServiceClient, a ContainerClient, and the connection string of the Azure storage account.
Please create two files called v1.mp3 and v2.mp3 before downloading the video.
Complete Code:
import logging
from videohash import VideoHash
import azure.functions as func
import subprocess
import tempfile
import os
from azure.storage.blob import BlobServiceClient, BlobClient, ContainerClient
def main(req: func.HttpRequest) -> func.HttpResponse:
# local file path on the server
local_path = tempfile.gettempdir()
filepath1 = os.path.join(local_path, "v1.mp3")
filepath2 = os.path.join(local_path,"v2.mp3")
# Reference to Blob Storage
client = BlobServiceClient.from_connection_string("<Connection String >")
# Reference to Container
container = client.get_container_client(container= "test")
# Downloading the file
with open(file=filepath1, mode="wb") as download_file:
with open(file=filepath2, mode="wb") as download_file:
# video hash code
videohash1 = VideoHash(path=filepath1)
videohash2 = VideoHash(path=filepath2)
t = videohash2.is_similar(videohash1)
return func.HttpResponse(f"Hello, {t}. This HTTP triggered function executed successfully.")
With this code I get an ffmpeg error, which is related to my test file and not to the error you are facing.
As far as I know, this workaround will not affect performance, since in both scenarios the blobs are downloaded anyway.
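The download step of this workaround might look like the sketch below. It is an illustration, not the answerer's exact code: the helper name is hypothetical, and `container_client` is assumed to behave like `azure.storage.blob.ContainerClient` (version 12), whose `download_blob` returns an object with a `readinto` method.

```python
import os
import tempfile


def download_blob_to_temp(container_client, blob_name, suffix=".mp4"):
    """Download one blob into a fresh temporary file and return its path.

    Hypothetical helper; `container_client` only needs download_blob(name)
    returning an object that offers readinto(stream).
    """
    fd, local_path = tempfile.mkstemp(suffix=suffix)
    with os.fdopen(fd, "wb") as handle:
        downloader = container_client.download_blob(blob_name)
        downloader.readinto(handle)  # streams the blob content into the file
    return local_path
```

The returned path could then be handed to `VideoHash(path=...)` as in the answer's code; the caller is responsible for deleting the temporary file afterwards.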

Separating development/staging/production media buckets on S3 in Django

We are currently using AWS S3 buckets as a storage for media files in a Django 1.11 project (using S3BotoStorage from django-storages library). The relevant code is here:
# storage.py
from storages.backends.s3boto import S3BotoStorage
class MediaRootS3BotoStorage(S3BotoStorage):
"""Storage for uploaded media files."""
bucket_name = settings.AWS_MEDIA_STORAGE_BUCKET_NAME
custom_domain = domain(settings.MEDIA_URL)
# common_settings.py
DEFAULT_FILE_STORAGE = 'storage.MediaRootS3BotoStorage'
AWS_MEDIA_STORAGE_BUCKET_NAME = 'xxxxxxxxxxxxxxxx'
MEDIA_URL = "//media.example.com/"
# models.py
import os
import uuid
from django.db import models
from django.utils import timezone
from django.utils.module_loading import import_string
def upload_to_unique_filename(instance, filename):
    try:
        extension = os.path.splitext(filename)[1]
    except Exception:
        extension = ""
    now = timezone.now()
    return f'resume/{now.year}/{now.month}/{uuid.uuid4()}{extension}'
class Candidate(models.Model):
resume = models.FileField(
The issue is that the bucket key is hardcoded in the settings file, and since there are multiple developers + 1 staging environment, all of the junk files that are uploaded for testing/QA purposes end up in the same S3 bucket as the real production data.
One obvious solution would be to override AWS_MEDIA_STORAGE_BUCKET_NAME in the staging_settings.py and development_settings.py files, but that would make the production data unavailable on staging and testing instances. To make this work, we would somehow have to sync the production bucket to the dev/staging one, which I'm unsure how to do efficiently and seamlessly.
Another option would be to use local filesystem for media storage in development and staging environments. This would also require the download of substantial amount of media files, and would exclude one part of the stack (django-storages and S3 API) from the testing/QA process.
How to handle this? Is the mixing of testing and production media files in the same bucket even an issue (I was sure it was until I started thinking about how to handle it)? What are some best practices about separating development/staging/production cloud storages in general?
In that case, our team uses one bucket for all environments, but we add some metadata to uploaded static and media files. This way, to delete non-production S3 objects you can filter on that metadata using the AWS API and delete the matches.
It's possible by adding in settings.py:
ENVIRONMENT = "development/production/qa"
'CacheControl': 'max-age=86400',
'Metadata': {
'environment': ENVIRONMENT
We recently addressed this issue with a custom S3Storage class that supports two buckets instead of one. Each environment writes to their own bucket which means that the production bucket doesn't get polluted with files from the temporary environments (dev, staging, QA, ...). However, if a given environment needs a resource that it can't find in its own bucket, then it automatically tries to fetch it from the production bucket. Accordingly, we do not need to duplicate tons of mostly static resources that are already available in the production bucket.
In settings.py, we add two new variables and specify a custom storage class
# The alternate bucket (typically the production bucket) is used as a fallback when the primary one doesn't contain the resource requested.
# Custom storage class
STATICFILES_STORAGE = 'hello_django.storage_backends.StaticStorage'
Then, in the custom storage class, we override the url() method as follows:
from datetime import datetime, timedelta
from urllib.parse import urlencode
from django.utils.encoding import filepath_to_uri
from storages.backends.s3boto3 import S3Boto3Storage
from storages.utils import setting
class StaticStorage(S3Boto3Storage):
location = 'static'
default_acl = 'public-read'
def __init__(self, **settings):
def get_default_settings(self):
settings_dict = super().get_default_settings()
"alternate_bucket_name": setting("AWS_STORAGE_ALTERNATE_BUCKET_NAME"),
"alternate_custom_domain": setting("AWS_S3_ALTERNATE_CUSTOM_DOMAIN")
return settings_dict
def url(self, name, parameters=None, expire=None, http_method=None):
params = parameters.copy() if parameters else {}
if self.exists(name):
r = self._url(name, parameters=params, expire=expire, http_method=http_method)
if self.alternate_bucket_name:
params['Bucket'] = self.alternate_bucket_name
r = self._url(name, parameters=params, expire=expire, http_method=http_method)
return r
def _url(self, name, parameters=None, expire=None, http_method=None):
Similar to super().url() except that it allows the caller to provide
an alternate bucket name in parameters['Bucket']
# Preserve the trailing slash after normalizing the path.
name = self._normalize_name(self._clean_name(name))
params = parameters.copy() if parameters else {}
if expire is None:
expire = self.querystring_expire
if self.custom_domain:
bucket_name = params.pop('Bucket', None)
if bucket_name is None or self.alternate_custom_domain is None:
custom_domain = self.custom_domain
custom_domain = self.alternate_custom_domain
url = '{}//{}/{}{}'.format(
'?{}'.format(urlencode(params)) if params else '',
if self.querystring_auth and self.cloudfront_signer:
expiration = datetime.utcnow() + timedelta(seconds=expire)
return self.cloudfront_signer.generate_presigned_url(url, date_less_than=expiration)
return url
if params.get('Bucket') is None:
params['Bucket'] = self.bucket.name
params['Key'] = name
url = self.bucket.meta.client.generate_presigned_url('get_object', Params=params,
ExpiresIn=expire, HttpMethod=http_method)
if self.querystring_auth:
return url
return self._strip_signing_parameters(url)
This sample project illustrates the approach.
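Stripped of the django-storages details, the fallback decision in url() comes down to a small rule, sketched below in plain Python. The function name and the `exists_in_primary` callable are hypothetical stand-ins (in the storage class the check is `self.exists(name)`), and returning the primary bucket when no alternate is configured is an assumption rather than something the code above shows.

```python
def choose_bucket(name, exists_in_primary, primary_bucket, alternate_bucket=None):
    """Pick the bucket a resource URL should point at.

    exists_in_primary: callable taking the resource name and returning a bool.
    Falls back to the alternate (typically production) bucket only when the
    resource is missing from the primary one and an alternate is configured.
    """
    if exists_in_primary(name) or not alternate_bucket:
        return primary_bucket
    return alternate_bucket
```

Note that the existence check costs one request to the primary bucket per URL lookup, which is worth keeping in mind for pages that render many media links.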

Get Properties of storage blobs returning empty dict

I've just uploaded 5 GB of data and would like to verify that the MD5 sums match. I've calculated this for my local copy of the files, but am having problems fetching ContentMD5 from Azure. So far, I get an empty dict, but I can see the blob names. I've limited it to the first 10 items at the moment, just for debugging. I'm aware that MD5 is different on Azure from a typical md5sum call and have allowed for that locally. But, currently, I cannot see any blob properties. The properties are there when I browse via the Azure console (as is the ContentMD5 property).
Where am I going wrong?
Here's my code at the moment:
import os
import sys
from azure.storage.blob import BlobServiceClient
def remote_check(connection_str):
blob_service_client = BlobServiceClient.from_connection_string(connection_str)
container_name = "global"
container = blob_service_client.get_container_client(container=container_name)
blob_list = container.list_blobs()
count = 0
for blob in blob_list:
if count < 10:
blob_client = blob_service_client.get_blob_client(container=container_name, blob=blob)
a = blob_client.get_blob_properties()
print("Blob name: " + str(blob_client.blob_name))
count = count + 1
def main():
except KeyError:
if __name__ == '__main__':
Please make sure you're using the latest version of package azure-storage-blob 12.6.0.
Some properties are in content_settings. For example, content_md5 is read from the content_settings attribute of the object returned by get_blob_properties(), not from the top level of the properties.
Maybe you can check the blob properties with a REST call (e.g. using a REST client such as Postman) described here:
The "Content-MD5" is returned as HTTP-Response Header.
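The comparison itself can be done with the standard library once the remote value is in hand. This is a sketch that assumes azure-storage-blob version 12, where `blob_client.get_blob_properties().content_settings.content_md5` holds the raw 16-byte digest (a bytearray, or None if no MD5 was stored), while the portal displays the same digest base64-encoded. The function names are hypothetical.

```python
import base64
import hashlib


def local_md5_digest(path, chunk_size=4 * 1024 * 1024) -> bytes:
    """Return the raw MD5 digest of a local file, read in chunks."""
    md5 = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            md5.update(chunk)
    return md5.digest()


def md5_matches(path, remote_md5) -> bool:
    """Compare a local file against a remote raw digest (bytes or bytearray)."""
    if remote_md5 is None:  # the blob has no stored Content-MD5
        return False
    return local_md5_digest(path) == bytes(remote_md5)


def to_base64(digest) -> str:
    """Format a raw digest the way the portal shows Content-MD5."""
    return base64.b64encode(bytes(digest)).decode("ascii")
```

A typical call would be `md5_matches(local_file, props.content_settings.content_md5)`, with `props` being the result of `get_blob_properties()`.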

Python 3 and Azure table storage tablestorageaccount not working

I'm trying to use the sample provided by Microsoft to connect to an Azure storage table using Python. The code below fails because tablestorageaccount is not found. What am I missing? I installed the azure package, but it still complains that the module is not found.
import azure.common
from azure.storage import CloudStorageAccount
from tablestorageaccount import TableStorageAccount
print('Azure Table Storage samples for Python')
# Create the storage account object and specify its credentials
# to either point to the local Emulator or your Azure subscription
account = TableStorageAccount(is_emulated=True)
account_connection_string = STORAGE_CONNECTION_STRING
# Split into key=value pairs removing empties, then split the pairs into a dict
config = dict(s.split('=', 1) for s in account_connection_string.split(';') if s)
# Authentication
account_name = config.get('AccountName')
account_key = config.get('AccountKey')
# Basic URL Configuration
endpoint_suffix = config.get('EndpointSuffix')
if endpoint_suffix is None:
    table_endpoint = config.get('TableEndpoint')
    table_prefix = '.table.'
    start_index = table_endpoint.find(table_prefix)
    end_index = table_endpoint.endswith(':') and len(table_endpoint) or table_endpoint.rfind(':')
    endpoint_suffix = table_endpoint[start_index+len(table_prefix):end_index]
account = TableStorageAccount(account_name = account_name, connection_string = account_connection_string, endpoint_suffix=endpoint_suffix)
I found the source sample code, and it includes a custom module, tablestorageaccount.py, which is only used to return a TableService. If you already have the storage connection string and want to test, you can connect to the table directly:
from azure.storage.table import TableService, Entity
account_connection_string = 'DefaultEndpointsProtocol=https;AccountName=account name;AccountKey=account key;EndpointSuffix=core.windows.net'
You could also use the new SDK to connect to the table; see the official tutorial, Get started with Azure Table storage.
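The connection-string handling from the question does not need the custom module at all. The sketch below restates it with the standard library only; the function names are hypothetical, and the default protocol and endpoint suffix are assumptions used when the string omits them.

```python
def parse_connection_string(connection_string: str) -> dict:
    """Split 'Key=Value;Key=Value' pairs into a dict.

    Splitting on the first '=' only keeps base64 padding in AccountKey intact.
    """
    return dict(
        part.split("=", 1) for part in connection_string.split(";") if part
    )


def table_endpoint(config: dict) -> str:
    """Return the table endpoint URL for a parsed connection string."""
    explicit = config.get("TableEndpoint")
    if explicit:
        return explicit
    # Assumed defaults for the public Azure cloud
    protocol = config.get("DefaultEndpointsProtocol", "https")
    suffix = config.get("EndpointSuffix", "core.windows.net")
    return f"{protocol}://{config['AccountName']}.table.{suffix}"
```

The parsed values (`AccountName`, `AccountKey`, and the endpoint) can then be passed to whichever table client you use.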

FileUploadMiscError while persisting output file from Azure Batch

I'm facing the following error while trying to persist log files to Azure Blob storage from Azure Batch execution - "FileUploadMiscError - A miscellaneous error was encountered while uploading one of the output files". This error doesn't give a lot of information as to what might be going wrong. I tried checking the Microsoft Documentation for this error code, but it doesn't mention this particular error code.
Below is the relevant code for adding the task to Azure Batch that I have ported from C# to Python for persisting the log files.
Note: The container that I have configured gets created when the task is added, but there's no blob inside.
import datetime
import logging
import os
import azure.storage.blob.models as blob_model
import yaml
from azure.batch import models
from azure.storage.blob.baseblobservice import BaseBlobService
from azure.storage.common.cloudstorageaccount import CloudStorageAccount
from dotenv import load_dotenv
LOG = logging.getLogger(__name__)
def add_tasks(batch_client, job_id, task_id, io_details, blob_details):
task_commands = "This is a placeholder. Actual code has an actual task. This gets completed successfully."
LOG.info("Configuring the blob storage details")
base_blob_service = BaseBlobService(
LOG.info("Base blob service created")
container_name=blob_details['container_name'], fail_on_exist=False)
LOG.info("Container present")
container_sas = base_blob_service.generate_container_shared_access_signature(
expiry=datetime.datetime.now() + datetime.timedelta(days=1))
LOG.info(f"Container SAS created: {container_sas}")
container_url = base_blob_service.make_container_url(
container_name=blob_details['container_name'], sas_token=container_sas)
LOG.info(f"Container URL created: {container_url}")
# fpath = task_id + '/output.txt'
fpath = task_id
LOG.info(f"Creating output file object:")
out_files_list = list()
out_files = models.OutputFile(
container_url=container_url, path=fpath)),
LOG.info(f"Output files: {out_files_list}")
LOG.info(f"Creating the task now: {task_id}")
task = models.TaskAddParameter(
id=task_id, command_line=task_commands, output_files=out_files_list)
batch_client.task.add(job_id=job_id, task=task)
LOG.info(f"Added task: {task_id}")
There is a bug in Batch's OutputFile handling which causes it to fail to upload to containers if the full container URL includes any query-string parameters other than the ones included in the SAS token. Unfortunately, the azure-storage-blob Python module includes an extra query string parameter when generating the URL via make_container_url.
This issue was just raised to us, and a fix will be released in the coming weeks, but an easy workaround is instead of using make_container_url to craft the URL, craft it yourself like so: container_url = 'https://{}/{}?{}'.format(blob_service.primary_endpoint, blob_details['container_name'], container_sas).
The resulting URL should look something like this: https://<account>.blob.core.windows.net/<container>?se=2019-01-12T01%3A34%3A05Z&sp=w&sv=2018-03-28&sr=c&sig=<sig> - specifically it shouldn't have restype=container in it (which is what the azure-storage-blob package is including)
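The workaround can be wrapped in a small helper so the URL is always built the same way. This is a sketch: the function name is hypothetical, and `primary_endpoint` is assumed to be a bare host name such as `<account>.blob.core.windows.net`, as in the format string above.

```python
def build_container_url(primary_endpoint: str, container_name: str, sas_token: str) -> str:
    """Build a container URL whose query string is exactly the SAS token.

    Avoids extra parameters such as restype=container, which the answer
    above identifies as the cause of the upload failure.
    """
    host = primary_endpoint.strip("/")
    token = sas_token.lstrip("?")  # tolerate a token passed with a leading '?'
    return "https://{}/{}?{}".format(host, container_name, token)
```

The result can be passed as `container_url` when constructing the output file's container destination.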
