I have a couple of questions regarding uploading to Azure Blob Storage using a SAS in Python. I have a SAS URL provided by our client, in the form of:
https://<company_name>.blob.core.windows.net/<container_name>?sp<long_string>
I tried following the code from this question: Uploading csv files to azure container using SAS URI in python?
from azure.storage.blob import BlobClient
upload_file_path="d:\\a11.csv"
sas_url="https://xxx.blob.core.windows.net/test5/a11.csv?sastoken"
client = BlobClient.from_blob_url(sas_url)
with open(upload_file_path, 'rb') as data:
    client.upload_blob(data)
print("**file uploaded**")
and I get the following error:
azure.core.exceptions.ResourceExistsError: Public access is not permitted on this storage account.
RequestId:946bd6ea-e01e-0040-3932-ee6a4e000000
Time:2021-12-11T01:58:51.0010075Z
ErrorCode:PublicAccessNotPermitted
The Azure SDK mentions using an account name, which I do not have, so that's a no-go. (I can upload files using Azure Storage Explorer, but that is too slow for what I need; at least it tells me the SAS is working.) Am I using the wrong code for uploading? Also, it is not clear how to tell the code where in the blob container to upload the file. E.g., if I wanted to upload a file image.jpg to 2021-12-11/dataset_1/, where would I put that in the code?
I tried to upload a file using a SAS URL generated from the container and was unable to upload it.
Instead of using the container's SAS URL, use your storage account's SAS URL; that worked for me with the same code you posted.
To generate a SAS URL for the storage account, generate it at the account level in the portal, put that SAS URL into the code above, and run it.
For more information please refer to this MS DOC & GitHub sample.
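To address where the file lands in the container: the blob name itself carries the "folder" path. Below is a minimal sketch, assuming an account-level SAS URL of the form discussed above; the account, container, file names, and the helper names are placeholders, not part of the Azure SDK:

```python
def blob_sas_url(account_sas_url: str, container: str, blob_path: str) -> str:
    """Splice a container and blob path into an account-level SAS URL.
    The 'folder' (e.g. 2021-12-11/dataset_1/) is simply part of the blob name."""
    base, _, token = account_sas_url.partition("?")
    return f"{base.rstrip('/')}/{container}/{blob_path}?{token}"

def upload_with_sas(account_sas_url: str, container: str,
                    blob_path: str, local_path: str) -> None:
    # Deferred import so blob_sas_url() stays dependency-free.
    from azure.storage.blob import BlobClient
    client = BlobClient.from_blob_url(blob_sas_url(account_sas_url, container, blob_path))
    with open(local_path, "rb") as data:
        client.upload_blob(data, overwrite=True)

# Hypothetical usage:
# upload_with_sas("https://<account>.blob.core.windows.net/?<sas>", "mycontainer",
#                 "2021-12-11/dataset_1/image.jpg", "d:\\image.jpg")
```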
I am working on a project to allow users to upload blob into blob container in our storage account. I developed a simple UI (flask) using Azure App Service to allow user choose files to upload, and then want to upload these files to the blob container.
My original design is UI -> Blob Container by Python Storage SDK:
containerClient.upload_blob(filename, file)
But I am facing timeout issues from Azure App Service when uploading large files.
So I changed the upload UI to dropzone.js and enabled chunked uploading, so that the server consistently receives responses and avoids the timeout.
But another issue came up: the upload runs once per chunk, so the blob container ends up holding only the last chunk of the data I upload. (From the documentation, I know that chunking is used automatically in blob uploads. I wonder if it is possible to track the progress of the upload; if so, I probably don't need dropzone.js for chunked uploading.)
I also tried another approach by creating Azure App Function (HTTPS trigger), and then send an http trigger to that endpoint to start the blob upload.
for f in files:
    fileToSend = {'file': (f.filename, f.stream, f.content_type, f.headers)}
    r = requests.post('https://myazurefunctionapp.azurewebsites.net/api/funcName', files=fileToSend)
In the azure function, I use Python Storage SDK to connect to container and then upload blob
container = ContainerClient.from_connection_string(conn_str, container_name)
for k, f in req.files.items():
    container.upload_blob(f.filename, f)
But I notice that the function is triggered once per chunk (one request each), and I again end up with only the last chunk of the data in the container.
I wonder what the better workflow would be, or whether there is a way to make sure the upload is complete (in the Azure Function) before starting the upload to the blob container.
Many Thanks,
• Storage clients default to a 32 MB maximum for a single-block upload. When a block blob upload is larger than the value of the 'SingleBlobUploadThresholdInBytes' property, storage clients break the file into blocks of the maximum allowed size and upload them in sequence. Since the block blob you are trying to upload is larger than 32 MB, the client splits it into the allowed smaller chunks rather than sending it in one request. Also, you might not be using the correct blob service client, i.e., the client that interacts with the storage resources: the storage account, its blob containers, and blobs.
Below is an example of the code for creating the client object, which requires the storage account's blob service endpoint URL and a credential that allows you to access the storage account:
from azure.storage.blob import BlobServiceClient
service = BlobServiceClient(account_url="https://<my-storage-account-name>.blob.core.windows.net/", credential=credential)
• Thus, as you are using the above code in Python to create a blob service client for interacting with storage accounts, kindly refer to the documentation link below, which describes in detail how to develop Python code that integrates with Blob Storage for storing massive amounts of unstructured data, such as text or binary data.
https://learn.microsoft.com/en-us/python/api/overview/azure/storage-blob-readme?view=azure-python
You can deploy this code in your App Service or Function and set the trigger accordingly for uploading and downloading blobs from the storage account. The documentation also describes how to configure authentication for this process to ensure that the correct users and files are given access.
Also refer to the following documentation link for details on how to configure a blob trigger function in Azure for various interactions with the storage account when users initiate transactions through it.
https://learn.microsoft.com/en-us/azure/storage/blobs/blob-upload-function-trigger?tabs=azure-portal
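As a sketch of letting the SDK handle the chunking itself (assuming azure-storage-blob v12; the account URL, credential, and names are placeholders), a single upload_blob call on one stream stages and commits all blocks as one blob, so there is no per-chunk request to reassemble:

```python
def block_count(total_bytes: int, block_size: int = 4 * 1024 * 1024) -> int:
    # How many blocks the SDK would stage for a stream of this size (ceiling division).
    return max(1, -(-total_bytes // block_size))

def upload_large(account_url, credential, container: str, blob_name: str, path: str) -> None:
    # Deferred import: requires azure-storage-blob v12.
    from azure.storage.blob import BlobServiceClient
    service = BlobServiceClient(
        account_url=account_url,
        credential=credential,
        max_single_put_size=4 * 1024 * 1024,  # uploads above this are staged as blocks
        max_block_size=4 * 1024 * 1024,       # size of each staged block
    )
    blob = service.get_blob_client(container=container, blob=blob_name)
    with open(path, "rb") as data:
        # One call: the SDK stages every block and commits them as a single blob.
        blob.upload_blob(data, overwrite=True, max_concurrency=4)
```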
I am using Flask, and I have a form on my web app's index page, which requires users to upload MP4 videos. I expect my users to upload 30min long videos, so the video sizes are likely going to be in the hundreds of megabytes. The issue now is that I intend to deploy this Flask application to Google App Engine, and apparently I cannot work with any static file above 32MB. Somehow, when I try to upload any video in the deployed version that is above 32MB, I get a Request Too Large error.
I see that the BlobStore Python API used to be a recommended solution to work with really large files on the server in the past. But that was for Python 2.7: https://cloud.google.com/appengine/docs/standard/python/blobstore/
I'm using Python 3.7, and Google now recommends that files get uploaded directly to Cloud Storage, and I am not exactly sure how to do that.
Below is a snippet showing how I'm currently storing my users' uploaded videos through the form into Cloud Storage. Unfortunately, I'm still restricted from uploading large files because I get error messages. So again, my question is: How can I make my users upload their files directly to Cloud Storage in a way that won't let the server timeout or give me a Request Too Large error?
form = SessionForm()
blob_url = ""
if form.validate_on_submit():
    f = form.video.data
    video_string = f.read()
    filename = secure_filename(f.filename)
    try:
        # The custom function upload_session_video() uploads the file to a Cloud Storage bucket.
        # It uses the Storage API's upload_from_string() method.
        blob_url = upload_session_video(video_string, filename)
    except FileNotFoundError as error:
        flash(error, 'alert')
    # Create the Cloud Storage bucket (named after the patient)
    user_bucket = create_bucket(form.patient_name.data.lower())
You cannot upload files of more than 32 MB to Cloud Storage through Google App Engine due to a request size limitation. However, you can bypass that by uploading to Cloud Storage with resumable uploads; in Python, use the "google-resumable-media" package. Resumable uploads are useful when:
the size of the resource is not known (i.e. it is generated on the fly)
requests must be short-lived
the client has request size limitations
the resource is too large to fit into memory
Example code is included here.
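A hedged sketch of the direct-to-Cloud-Storage approach the question asks about (assuming google-cloud-storage is installed and credentials that can sign URLs are available; the bucket, naming scheme, and helper names are made up): the server only signs a URL, and the browser PUTs the large file straight to the bucket, so the 32 MB request limit never applies.

```python
from datetime import timedelta

def object_name(patient: str, filename: str) -> str:
    # Hypothetical naming scheme: one "folder" per patient.
    return f"{patient.lower().strip()}/{filename}"

def make_upload_url(bucket_name: str, blob_name: str,
                    content_type: str = "video/mp4") -> str:
    # Deferred import: requires google-cloud-storage and ambient credentials.
    from google.cloud import storage
    blob = storage.Client().bucket(bucket_name).blob(blob_name)
    # A V4 signed URL the browser can PUT the file to directly,
    # bypassing the App Engine request size limit entirely.
    return blob.generate_signed_url(
        version="v4",
        expiration=timedelta(minutes=30),
        method="PUT",
        content_type=content_type,
    )
```

The Flask view would return this URL to the page, and client-side JavaScript would send the file to it with a PUT request.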
I have a Python application which creates an HTML file which I then want to upload to an Azure Web Application.
What is the best way to do this?
I originally started try to do it using FTP and then switched to pushing with GIT. None of these really felt right. How should I be doing this?
UPDATE
I have this 99% working. I'm using a Storage Account to host a static site (which feels like the right way to do this).
This is how I am uploading:
from azure.storage.blob import BlobServiceClient

blob_service_client = BlobServiceClient.from_connection_string(az_string)
# Create a blob client using the local file name as the name for the blob
blob_client = blob_service_client.get_blob_client(container=container_name, blob=local_file_name)
print("\nUploading to Azure Storage as blob:\n\t" + local_file_name)
# Upload the created file
with open('populated.html', "rb") as data:
    blob_client.upload_blob(data)
The only problem that I have now, is that the file is downloading instead of opening in the browser. I think I need to set the content type somewhere.
Update 2
Working now, I added:
from azure.storage.blob import ContentSettings

my_content_settings = ContentSettings(content_type='text/html')
test = blob_client.upload_blob(data, overwrite=True, content_settings=my_content_settings)
Cheers,
Mick
The best way to do this is up to you.
Generally, there are two ways to upload an HTML file to an Azure Web App for Windows, as below.
Following the Kudu wiki page Accessing files via ftp to upload a file via FTP.
Following the sections VFS, Zip and Zip Deployment of Kudu wiki page REST API to call the related PUT REST API to upload a file via HTTP client.
However, based on my understanding of your scenario, the two ways above are not simple. So I recommend using the Static website feature of Azure Blob Storage Gen2 to host the static HTML file generated by your Python application and uploading files via the Azure Storage SDK for Python. I think it's simple enough; you can even bind a custom domain to the default host name of the static website of Azure Blob Storage via a DNS CNAME record.
The steps are below.
Refer to the official document Host a static website in Azure Storage to create an Azure Blob Storage Gen2 account and enable the Static website feature.
Refer to the other official document Quickstart: Azure Blob Storage client library v12 for Python to write the upload code in your current Python application. The default container named $web is the one that hosts the static website; you just need to upload the files to it and then access its primary endpoint (shown in the official document) to see the site.
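A minimal sketch combining the two steps above (assuming azure-storage-blob v12 and that the Static website feature is already enabled; the connection string, file names, and helper names are placeholders). Setting the content type is what makes the browser render the page instead of downloading it:

```python
import mimetypes

def content_type_for(name: str) -> str:
    # text/html for .html files, with a safe fallback for unknown extensions.
    return mimetypes.guess_type(name)[0] or "application/octet-stream"

def publish_html(conn_str: str, local_path: str, blob_name: str) -> None:
    # Deferred import: requires azure-storage-blob v12.
    from azure.storage.blob import BlobServiceClient, ContentSettings
    service = BlobServiceClient.from_connection_string(conn_str)
    # Static website files are served from the special $web container.
    blob = service.get_blob_client(container="$web", blob=blob_name)
    with open(local_path, "rb") as data:
        blob.upload_blob(
            data, overwrite=True,
            content_settings=ContentSettings(content_type=content_type_for(blob_name)),
        )
```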
The Google Ferris2 framework seems to use the Blobstore API exclusively for its Upload component. This makes me question whether it's possible to make images uploaded to Cloud Storage public without writing my own upload method and abandoning the Upload component altogether, which also seems to create compatibility issues when using the Cloud Storage client library (Python).
Backstory / context
Using: Google App Engine, Python, Cloud Storage client library
Requirements
0.5 We require that neither the blob information nor the file itself be stored in the model. We want a public Cloud Storage serving URL on the model, and that is all. This seems to prevent us from using the normal Ferris approach for uploading to Cloud Storage.
Things I already know / road blocks
One of the big roadblocks is that Ferris uses cgi / the Blobstore API for field storage on the form. This seems to cause problems because, so far, it hasn't allowed data to be sent to Cloud Storage through the Google Cloud Storage Python client.
Things we know about the google cloud storage python client and cgi:
To write data to Cloud Storage from our server, it must be called with cloudstorage.open("/bucket/object", "w", ...) (a Cloud Storage library method). However, it appears so far that a cgi.FieldStorage is returned from the POST for the wtforms.fields.FileField() (as shown by a simple "print image" statement) before the data is applied to the model; after it is applied to the model, it is a Blobstore instance.
I would like verification on this:
After a lot of research and testing, it seems that because Ferris is limited to the Blobstore API for the Uploads component, using the Blobstore API and blob keys to handle uploads is basically unavoidable without creating a second upload function just for the Cloud Storage call. Blob instances seem incompatible with the Cloud Storage client library, and it seems there is no way to get anything but metadata from blob files (without actually making a call to Cloud Storage to fetch the original file). However, it appears that this will not require storing extra data on the server. Furthermore, I believe it may be possible to get around the public-link issue by setting the entire bucket to have read permissions.
Clarifying Questions:
1. To make uploaded images available to the public via our application (any user, not only authenticated users), will I have to use the cloudstorage Python client library, or is there a way to do this with the Blobstore API?
2. Is there a way to get the original file from a blob key (on save, with the add action method) without first making a call to Cloud Storage, so that the file can be uploaded using that library?
3. If not, is there a way to grab the file from the cgi.FieldStorage and then send it to Cloud Storage with the Python client library? It seems that cgi.FieldStorage.value is just metadata and not the file, same with cgi.FieldStorage.file.read().
1) You cannot use the GAE GCS client to update an ACL.
2) You can use the GCS json API after the blobstore upload to GCS and change the ACL to make it public. You do not have to upload again.
See this example code which inserts an acl.
3) Or use cgi.FieldStorage to read the data (< 32 MB) and write it to GCS using the GAE GCS client.
import cloudstorage as gcs
import mimetypes
import webapp2

class UploadHandler(webapp2.RequestHandler):
    def post(self):
        file_data = self.request.get("file", default_value=None)
        filename = self.request.POST["file"].filename
        content_type = mimetypes.guess_type(filename)[0]
        # Note: gcs.open() expects an absolute path of the form /bucket/object.
        with gcs.open(filename, 'w',
                      content_type=content_type or b'binary/octet-stream',
                      options={b'x-goog-acl': b'public-read'}) as f:
            f.write(file_data)
A third method: use a form post upload with a GCS signed url and a policy document to control the upload.
And you can always use a public download handler, which reads files from the blobstore or GCS.
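For option 2 above, a sketch of the JSON API ACL insert (endpoint and payload from the public GCS JSON API; token acquisition is omitted, and the requests dependency and helper names are assumptions):

```python
import urllib.parse

def acl_insert_url(bucket: str, object_name: str) -> str:
    # JSON API endpoint for inserting an ACL entry on an existing object;
    # the object name must be URL-encoded, including slashes.
    return ("https://storage.googleapis.com/storage/v1/b/"
            f"{bucket}/o/{urllib.parse.quote(object_name, safe='')}/acl")

def make_public(bucket: str, object_name: str, access_token: str):
    # Deferred import: any HTTP client works; requests is assumed here.
    import requests
    return requests.post(
        acl_insert_url(bucket, object_name),
        json={"entity": "allUsers", "role": "READER"},
        headers={"Authorization": f"Bearer {access_token}"},
    )
```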
You can now specify the ACL when uploading a file from App Engine to Cloud Storage. Not sure how long it's been in place, just wanted to share:
filename = '/' + bucket_name + '/Leads_' + newUNID() + '.csv'
write_retry_params = gcs.RetryParams(backoff_factor=1.1)
gcs_file = gcs.open(filename,
                    'w',
                    content_type='text/csv',
                    options={'x-goog-acl': 'public-read'},
                    retry_params=write_retry_params)
docs: https://cloud.google.com/storage/docs/xml-api/reference-headers#standard
The Python Drive API requires a "local file" to perform a resumable file upload to Google Drive. How can this be accomplished using Google App Engine, which only has blobs and no access to a local file system?
Under the old Documents List API (now deprecated) you could upload files from the Google App Engine Blobstore to Google Drive using the code below:
CHUNK_SIZE = 524288
uploader = gdata.client.ResumableUploader(
    client, blob_info.open(), blob_info.content_type, blob_info.size,
    chunk_size=CHUNK_SIZE, desired_class=gdata.docs.data.DocsEntry)
The key part is using blob_info.open() rather than providing a reference to a local file.
How can we accomplish the same using the new Drive API?
Note the files are fairly big so a resumable upload is required, also I know this can be accomplished in Java but I am looking for a Python solution.
Many thanks,
Ian.
It looks like you are using the older GData client library and the Documents List API. If you use the new Drive SDK and the Google APIs Python client library, you can use the MediaIoBaseUpload class to create a media upload object from memory instead of from a file.
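A hedged sketch of that approach (assuming google-api-python-client and the Drive v3 files().create method; the service object and helper names are placeholders). MediaIoBaseUpload accepts any file-like object, so an in-memory or Blobstore stream works without a local file:

```python
import io

def as_file_like(data: bytes) -> io.BytesIO:
    # Any file-like object works -- e.g. blob_info.open() or in-memory bytes.
    return io.BytesIO(data)

def upload_to_drive(service, data: bytes, content_type: str, name: str):
    # Deferred import: requires google-api-python-client.
    from googleapiclient.http import MediaIoBaseUpload
    media = MediaIoBaseUpload(as_file_like(data), mimetype=content_type,
                              chunksize=524288, resumable=True)
    request = service.files().create(body={"name": name}, media_body=media)
    response = None
    while response is None:
        # next_chunk() uploads one chunk and returns (status, response);
        # response stays None until the final chunk completes.
        _, response = request.next_chunk()
    return response
```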