I am working on a project in which I need to upload video files to a GCS bucket using a V4 signed URL. Currently I generate the signed URL with a Python method that is part of a Flask API. Here is the method I am using to generate the URL:
def GenerateURL(self, bucket_name, blob_name, method, timeout, content_type=None):
    bucket = StoreCon.get_con(bucket_name)
    blob = bucket.blob(blob_name)
    url = blob.generate_signed_url(
        version="v4",
        expiration=datetime.timedelta(minutes=timeout),
        method=method,
        content_type=content_type,
    )
    resp = jsonify({'message': {'%s URL' % method: url}})
    resp.status_code = 200
    return resp
Now this is being called inside a blueprint route. Here is the snippet:
@CloudStoreEnd.route('/uploadMedia', methods=['POST'])
def uploadMedia():
    blob_name = request.get_json()['FILE_NAME']
    return StoreOperator.postMediaURL(blob_name)
When I call this API route from the client-side code, the video files are uploaded to the GCS bucket successfully. But when I download the same video file from the GCS bucket, the file is corrupted and the player reports error "0xc00d36c4".
Here is a sample function for client side:
def upload_file(path):
    file_name = path.split('\\')[-1]
    data = {'FILE_NAME': file_name}
    # GET SIGNED URL FOR MEDIA UPLOAD
    get_signed_url = 'https://CLOUD-RUN-SERVICE/uploadMedia'
    headers = {'Content-Type': 'application/json'}
    resp = requests.post(url=get_signed_url, data=json.dumps(data), headers=headers)
    upload_url = json.loads(resp.content)['message']['PUT URL']
    # SEND A PUT REQUEST WITH MEDIA FILE
    headers = {'Content-Type': MimeTypes().guess_type(file_name)[0]}
    file = {'file': open(path, 'rb')}
    resp = requests.put(url=upload_url, headers=headers, files=file)
    return resp
I am not sure why media files (.mp4, .mov) are corrupted when I retrieve them, whereas other files (.pdf, .png) are fine. Is there an extra request parameter I need to add to get a proper signed URL? Or am I sending the files to the signed URL the wrong way from the client application?
Related
I'm using a Minio server to handle files in my Flask API. I generate presigned URLs to upload images directly from the Angular frontend, to save backend resources.
The presigned URL generation works fine, but when I upload my file from Postman or the Angular code, the file seems corrupted.
The same happens in the Minio web browser.
I use simple code for the presigned URL generation:
def get_presigned_get_url(self, bucket: str, object_path: str) -> str:
    url = self.client.presigned_get_object(
        bucket_name=bucket,
        object_name=object_path,
    )
    return url

def get_presigned_put_url(self, bucket: str, object_path: str) -> str:
    url = self.client.presigned_put_object(
        bucket_name=bucket,
        object_name=object_path,
    )
    return url
And I make the PUT request in Postman.
Thanks for your help.
The key in this case is how the file is uploaded from Postman. While uploading the file, you need to use Body > Binary > Select File rather than Body > Form-Data.
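The same applies to programmatic clients: passing files= to requests builds a multipart/form-data body, so the multipart boundary and part headers end up stored inside the object, which is why the downloaded file looks corrupted. A minimal sketch of a raw-binary PUT with the requests library (upload_url and path are illustrative names for the presigned PUT URL and the local file):

import mimetypes
import requests

def upload_via_presigned_put(upload_url, path):
    # Send the file bytes as the request body, not as a multipart form field.
    content_type = mimetypes.guess_type(path)[0] or 'application/octet-stream'
    with open(path, 'rb') as f:
        resp = requests.put(upload_url, data=f, headers={'Content-Type': content_type})
    resp.raise_for_status()
    return resp

If the presigned URL was generated with a specific content type, the Content-Type header sent here has to match it.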
This method of getting Google Drive file thumbnails has been working for me but seems to have stopped recently.
All answers I can find online indicate that this is because thumbnailLink requires authorization (eg). However, I am accessing the thumbnails with authorized access tokens. I can get the file info using the Drive API "Files: get" with these access tokens, but the thumbnailLink returns 404.
print(http)
# <google_auth_httplib2.AuthorizedHttp object at 0x11561d0f0>
# An instance of google_auth_httplib2.AuthorizedHttp
url = 'https://www.googleapis.com/drive/v3/files/%s?fields=thumbnailLink' % file_id
response, content = http.request(url)
data = json.loads(content)
print(data['thumbnailLink'])
# https://docs.google.com/u//feeds/vt?gd=true&id=***fileID***&v=203&s=***&sz=s220
# Works ✓
response, content = http.request(data['thumbnailLink'])
print(response['status'])
# 404
# :(
Also giving a 404 error:
thumbnailLink + "&access_token=" + YOURTOKEN; as suggested here.
Opening thumbnailLink in a browser (logged in to Google as the file owner).
Opening a modified thumbnailLink in a browser - replacing /u// with /u/0/, /u/1/ , /u/2/ (When I open drive as this user the URL is https://drive.google.com/drive/u/1/my-drive)
Does anyone know a reliable way to get Google Drive thumbnail image files?
I believe your goal is as follows.
You want to retrieve the thumbnail from the thumbnail link returned by the "files.get" method of the Drive API.
From your sample thumbnail link, you want to retrieve the thumbnail of a Google Docs file (Document, Spreadsheet, and so on).
Issue and workaround:
At the current stage, it seems that the 404 returned for the thumbnail is a bug. This has already been reported to the Google issue tracker (Ref), and Google appears to be aware of it. Unfortunately, I think this is the current direct answer, and I believe the issue will be resolved by a future update.
As a current workaround, how about converting the file to PDF and retrieving the thumbnail from that? In this case, the thumbnail link can be used. The flow of this workaround is as follows.
Convert Google Docs to a PDF file.
The PDF file is created in the same folder as the Google Docs file.
Retrieve the thumbnail link from the created PDF file.
When the above flow is converted to a Python script, it becomes the following.
Sample script:
Before you use this script, please set the access token and file ID. In this case, in order to send the multipart/form-data request with a simple script, I used the requests library.
import json
import httplib2
import requests
import time
http = httplib2.Http()
access_token = '###' # Please set the access token.
file_id = '###' # Please set the file ID.
headers = {"Authorization": "Bearer " + access_token}
# 1. Retrieve filename and parent ID.
url1 = "https://www.googleapis.com/drive/v3/files/" + file_id + "?fields=*"
res, res1 = http.request(url1, 'GET', headers=headers)
d = json.loads(res1.decode('utf-8'))
# 2. Retrieve PDF data by converting from the Google Docs.
url2 = "https://www.googleapis.com/drive/v3/files/" + file_id + "/export?mimeType=application%2Fpdf"
res, res2 = http.request(url2, 'GET', headers=headers)
# 3. Upload PDF data as a file to the same folder of Google Docs.
para = {'name': d['name'] + '.pdf', 'parents': d['parents']}
files = {
    'data': ('metadata', json.dumps(para), 'application/json; charset=UTF-8'),
    'file': res2
}
res3 = requests.post(
    "https://www.googleapis.com/upload/drive/v3/files?uploadType=multipart",
    headers=headers,
    files=files
)
obj = res3.json()
# It seems that this is required to use by creating the thumbnail link from the uploaded file.
time.sleep(5)
# 4. Retrieve thumbnail link of the uploaded PDF file.
url3 = "https://www.googleapis.com/drive/v3/files/" + obj['id'] + "?fields=thumbnailLink"
res, res4 = http.request(url3, 'GET', headers=headers)
data = json.loads(res4.decode('utf-8')) # or data = json.loads(res4)
print(data['thumbnailLink'])
# 5. Retrieve thumbnail.
response, content = http.request(data['thumbnailLink'])
print(response['status'])
print(content)
When you run this script, the Google Docs file is exported as PDF data, the PDF data is uploaded to Google Drive, and the thumbnail link of the uploaded PDF file is retrieved.
Note:
In this case, please include the scope https://www.googleapis.com/auth/drive in the scopes of your access token, because a file is uploaded.
The access token is required to retrieve the file metadata, export the PDF, and upload the data. It is not required when the thumbnail is retrieved from the thumbnail link.
Since January 2020, the access token can no longer be passed with the access_token=### query parameter, so please send the access token in the request header. Ref
Once the above issue is resolved, I think you will be able to use your original script.
References:
Files: get
Files: export
Files: create
I'm writing a Django backend for an application in which the client will upload a video file to S3. I want to use presigned URLs, so the Django server will sign a URL and pass it back to the client, who will then upload their video to S3. The problem is, the generate_presigned_url method does not seem to know about the S3 client's upload_file method...
Following this example, I use the following code to generate the URL for the upload:
s3_client = boto3.client('s3')
try:
    s3_object_name = str(uuid4()) + file_extension
    params = {
        "file_name": local_filename,
        "bucket": settings.VIDEO_UPLOAD_BUCKET_NAME,
        "object_name": s3_object_name,
    }
    response = s3_client.generate_presigned_url(ClientMethod="upload_file",
                                                Params=params,
                                                ExpiresIn=500)
except ClientError as e:
    logging.error(e)
    return HttpResponse(503, reason="Could not retrieve upload url.")
When running it I get the error:
File "/Users/bridgedudley/.local/share/virtualenvs/ShoMe/lib/python3.6/site-packages/botocore/signers.py", line 574, in generate_presigned_url
operation_name = self._PY_TO_OP_NAME[client_method]
KeyError: 'upload_file'
which triggers the exception:
botocore.exceptions.UnknownClientMethodError: Client does not have method: upload_file
After debugging I found that the self._PY_TO_OP_NAME dictionary only contains a subset of the S3 client commands offered here:
scrolling down to "upload"...
No upload_file method! I tried the same code using "list_buckets" and it worked perfectly, giving me a presigned URL that listed the buckets under the signer's credentials.
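For illustration, a sketch of that working call (same ExpiresIn as above; list_buckets needs no Params):

import boto3

s3_client = boto3.client('s3')
# 'list_buckets' is a real API operation, so it appears in _PY_TO_OP_NAME and can
# be presigned; 'upload_file' is a managed-transfer helper and does not.
url = s3_client.generate_presigned_url(ClientMethod="list_buckets", ExpiresIn=500)
print(url)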
So without the upload_file method available in the generate_presigned_url function, how can I achieve my desired functionality?
Thanks!
In addition to the already mentioned usage of:
boto3.client('s3').generate_presigned_url('put_object', Params={'Bucket':'your-bucket-name', 'Key':'your-object-name'})
You can also use:
boto3.client('s3').generate_presigned_post('your-bucket_name', 'your-object_name')
Reference: https://boto3.amazonaws.com/v1/documentation/api/latest/guide/s3-presigned-urls.html#generating-a-presigned-url-to-upload-a-file
Sample generation of URL:
import boto3
bucket_name = 'my-bucket'
key_name = 'any-name.txt'
s3_client = boto3.client('s3')
upload_details = s3_client.generate_presigned_post(bucket_name, key_name)
print(upload_details)
Output:
{'url': 'https://my-bucket.s3.amazonaws.com/', 'fields': {'key': 'any-name.txt', 'AWSAccessKeyId': 'QWERTYUOP123', 'x-amz-security-token': 'a1s2d3f4g5h6j7k8l9', 'policy': 'z0x9c8v7b6n5m4', 'signature': 'qaz123wsx456edc'}}
Sample uploading of file:
import requests
filename_to_upload = './some-file.txt'
with open(filename_to_upload, 'rb') as file_to_upload:
    files = {'file': (filename_to_upload, file_to_upload)}
    upload_response = requests.post(upload_details['url'], data=upload_details['fields'], files=files)
print(f"Upload response: {upload_response.status_code}")
Output:
Upload response: 204
Additional notes:
As documented:
The credentials used by the presigned URL are those of the AWS user who generated the URL.
Thus, make sure that the entity generating the presigned URL has a policy that allows s3:PutObject, so a file can be uploaded to S3 using the signed URL. The credentials it uses can be configured in different ways, for example:
As an allowed policy for a Lambda function
Or through boto3:
s3_client = boto3.client('s3',
    aws_access_key_id="your-access-key-id",
    aws_secret_access_key="your-secret-access-key",
    aws_session_token="your-session-token",  # Only for credentials that have it
)
Or on the working environment:
# Run in the Linux environment
export AWS_ACCESS_KEY_ID="your-access-key-id"
export AWS_SECRET_ACCESS_KEY="your-secret-access-key"
export AWS_SESSION_TOKEN="your-session-token"  # Only for credentials that have it
Or through libraries e.g. django-storages for Django
You should be able to use the put_object method here. It is a plain client method (a direct API operation), rather than a managed-transfer helper like upload_file, which is why upload_file does not appear in client._PY_TO_OP_NAME. The two methods take different inputs, which may necessitate a slight refactor in your code.
put_object: https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/s3.html#S3.Client.put_object
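As a rough sketch of that refactor (adapting the Params from the question; same variables and ExpiresIn, only the keys change to put_object's Bucket and Key):

s3_client = boto3.client('s3')
try:
    s3_object_name = str(uuid4()) + file_extension
    # put_object is keyed by Bucket/Key, unlike the upload_file helper's
    # file_name/bucket/object_name arguments.
    params = {
        "Bucket": settings.VIDEO_UPLOAD_BUCKET_NAME,
        "Key": s3_object_name,
    }
    response = s3_client.generate_presigned_url(ClientMethod="put_object",
                                                Params=params,
                                                ExpiresIn=500)
except ClientError as e:
    logging.error(e)
    return HttpResponse(503, reason="Could not retrieve upload url.")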
The accepted answer doesn't let you post your data to S3 from your client. This will:
import boto3
s3_client = boto3.client('s3',
    aws_access_key_id="AKIA....",
    aws_secret_access_key="G789...",
)
url = s3_client.generate_presigned_url('put_object', Params={
    'Bucket': 'cat-pictures',
    'Key': 'whiskers.png',
    'ContentType': 'image/png',  # required!
})
Send that to your front end, then in JavaScript on the frontend:
fetch(url, {
    method: "PUT",
    body: file,
})
Where file is a File object.
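If the PUT is made from Python instead of a browser, a comparable sketch with the requests library would be (url is the presigned URL from above):

import requests

with open('whiskers.png', 'rb') as f:
    # The Content-Type sent must match the ContentType used when signing.
    resp = requests.put(url, data=f, headers={'Content-Type': 'image/png'})
print(resp.status_code)  # 200 on success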
Scenario: an image file stored in a GCP bucket needs to be sent to a third-party REST endpoint via a POST.
Question: Is this really the best pattern? Is there a more efficient, less verbose way?
We have images being uploaded by a mobile app to a GCP Storage bucket. When the finalize event for the image upload fires, a GCP Cloud Function (Python 3) reacts to it by getting a reference to the uploaded image, downloading it to a temp file, and then using that temp file as the image source for the POST. This is our current code and it works, but to my eye it seems convoluted with the multiple open calls. More specifically: is there a better way to simply get the image blob from GCP Storage and attach it to the POST call without first saving it as a local file and then opening it again?
def third_party_upload(data, context):
    # get image from bucket
    storage_client = storage.Client()
    bucket = storage_client.bucket(data['bucket'])
    image_blob = bucket.get_blob(data['name'])
    download_path = '/tmp/{}.jpg'.format(str(uuid.uuid4()))  # temp image file download location
    # save GCP Storage blob as a temp file
    with open(download_path, 'wb') as file_obj:
        image_blob.download_to_file(file_obj)
    # open temp file and send to 3rd-party via REST POST call
    with open(download_path, 'rb') as img:
        files = {'image': (data['name'], img, 'multipart/form-data', {'Expires': '0'})}
        headers = {
            'X-Auth-Token': api_key,
            'Content-Type': 'image/jpg',
            'Accept': 'application/json'
        }
        # make POST call
        response = requests.post(third_party_endpoint, headers=headers, files=files)
    print('POST response:', response)
Update: a couple of commenters have mentioned that signed URLs are a possibility, and I agree they are an excellent choice. However, we are stuck with a requirement to include the image binary as the POST body, so signed URLs won't work in this case.
The HTTP method POST requires data, and you must provide that data in the HTTP request. There is no magic method to obtain Cloud Storage data except to read it. The process is to read the data from Cloud Storage and then provide that data to the POST request.
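That read does not have to go through a temp file, though; the blob can be read into memory and the bytes attached to the POST directly. A minimal sketch under the question's assumptions (api_key and third_party_endpoint defined elsewhere; download_as_bytes exists in recent google-cloud-storage releases, older ones call it download_as_string):

import requests
from google.cloud import storage

def third_party_upload(data, context):
    storage_client = storage.Client()
    blob = storage_client.bucket(data['bucket']).get_blob(data['name'])
    image_bytes = blob.download_as_bytes()  # read the object into memory, no temp file
    files = {'image': (data['name'], image_bytes, 'image/jpg', {'Expires': '0'})}
    # Let requests build the multipart Content-Type (with boundary) itself.
    headers = {'X-Auth-Token': api_key, 'Accept': 'application/json'}
    response = requests.post(third_party_endpoint, headers=headers, files=files)
    print('POST response:', response)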
If you're able to send a URL to the third-party endpoint instead of the actual image contents, you could use signed URLs to give time-limited access to the image without needing to give the third party access to the bucket or make the bucket public.
More information here: https://cloud.google.com/storage/docs/access-control/signed-urls
I have an app that successfully uploads videos less than 10MB to a specified Google Cloud Storage bucket; however, it does not do so if the file is greater than 10MB.
I would like to know how to do this with resumable uploads. I do not want any links to documentation; I have read all of the documentation and it does not help.
I am doing this in Python. I understand that to do a resumable upload I need my authorization token, and I have no idea how to do that. So could someone help me with some code?
Here is the code that posts videos < 10MB:
class GetData(webapp2.RequestHandler):
    def post(self):
        data = self.request.get('file')
        bucketname = self.request.get('bucketname')
        typeOf = self.request.get('content_type')
        sender = self.request.get('sender')
        profileVideo = self.request.get('isProfileVideo')
        file_path = ''
        now = time.time()
        objectname = now
        try:
            keytext = open(conf.PRIVATE_KEY_PATH, 'rb').read()
        except IOError as e:
            sys.exit('Error while reading private key: %s' % e)
        private_key = RSA.importKey(keytext)
        signer = CloudStorageURLSigner(private_key, conf.SERVICE_ACCOUNT_EMAIL,
                                       GCS_API_ENDPOINT)
        subDirectory = 'videoMesseagesFrom' + sender
        if profileVideo == 'true':
            file_path = '/%s/Profile_Videos/%s' % (bucketname, objectname)
            r = signer.Put(file_path, typeOf, data)
        else:
            file_path = '/%s/%s/%s' % (bucketname, subDirectory, objectname)
            r = signer.Put(file_path, typeOf, data)
        self.response.headers['Content-Type'] = 'application/json'
        obj = {
            'Completion': 'Video Successfully uploaded'
        }
        self.response.out.write(json.dumps(obj))
I pass in the file and the bucket name. This grabs my credentials from my Google-generated key that is in the project, signs a PUT URL, and then PUTs the file for me.
def Put(self, path, content_type, data):
    """Performs a PUT request.

    Args:
        path: The relative API path to access, e.g. '/bucket/object'.
        content_type: The content type to assign to the upload.
        data: The file data to upload to the new file.

    Returns:
        An instance of requests.Response containing the HTTP response.
    """
    md5_digest = base64.b64encode(md5.new(data).digest())
    base_url, query_params = self._MakeUrl('PUT', path, content_type,
                                           md5_digest)
    headers = {}
    headers['Content-Type'] = content_type
    headers['Content-Length'] = str(len(data))
    headers['Content-MD5'] = md5_digest
    return self.session.put(base_url, params=query_params, headers=headers,
                            data=data)
It is already making a PUT request with Content-Type headers and all that. Maybe I could put a check on this, so that if the file is larger than 10MB I somehow change that PUT method to a resumable upload, or upload it in chunks. However, I have no idea how to do that.
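For orientation, a minimal sketch of the resumable flow against the GCS JSON API with the requests library. It assumes an OAuth2 access token for the service account is already available (for example via a Google auth library); the names here are illustrative, and chunked uploads would additionally send a Content-Range header per chunk:

import json
import requests

def resumable_upload(access_token, bucket_name, object_name, data, content_type):
    # Step 1: start a resumable session; the session URI is returned in the
    # Location header of the response.
    init_url = ('https://storage.googleapis.com/upload/storage/v1/b/%s/o'
                '?uploadType=resumable' % bucket_name)
    headers = {
        'Authorization': 'Bearer %s' % access_token,
        'Content-Type': 'application/json; charset=UTF-8',
        'X-Upload-Content-Type': content_type,
        'X-Upload-Content-Length': str(len(data)),
    }
    init_resp = requests.post(init_url, headers=headers,
                              data=json.dumps({'name': object_name}))
    init_resp.raise_for_status()
    session_uri = init_resp.headers['Location']

    # Step 2: upload the data to the session URI. A single PUT is shown here;
    # large files can be split into chunks, each sent with a Content-Range header.
    put_resp = requests.put(session_uri, data=data,
                            headers={'Content-Type': content_type})
    put_resp.raise_for_status()
    return put_resp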