I am trying to change the metadata of an image in an S3 bucket through a Lambda function that triggers when an object is uploaded. But for some reason, when I update the metadata through copy_from, it adds user-defined metadata instead of the system metadata, like this:
Is there a special way to edit the system metadata? My code is:
import json
import boto3
import urllib

s3 = boto3.resource('s3')

def lambda_handler(event, context):
    # TODO implement
    bucket = event['Records'][0]['s3']['bucket']['name']
    key = urllib.parse.unquote_plus(event['Records'][0]['s3']['object']['key'])
    s3_object = s3.Object(bucket, key)
    s3_object.metadata.update({'Content-Type': 'image/png'})
    s3_object.copy_from(CopySource={'Bucket': bucket, 'Key': key},
                        Metadata=s3_object.metadata,
                        MetadataDirective='REPLACE')
    return {
        'statusCode': 200,
        'body': json.dumps('Hello from Lambda!')
    }
Content-Type is special metadata categorised as system-defined metadata, and there are other means to update it. It is derived from the contents of the object when it is created/uploaded.
Let's say you want to update the system-defined Content-Type metadata. Try this code, which updates the system-defined metadata and also adds a user-defined metadata entry:
s3_object.metadata.update({'My-Metadata':'abc'})
s3_object.copy_from(CopySource={'Bucket':BUCKET_NAME, 'Key':OBJECT_KEY}, ContentType='image/png', Metadata=s3_object.metadata, MetadataDirective='REPLACE')
As you can see here, copy_from takes the ContentType parameter explicitly to update the content type; you do not need to put it in the Metadata dict. Use the Metadata dict only for user-defined entries.
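Putting the two together, here is a minimal sketch of the full handler with the fix applied (same trigger and object names as the question's code; the response body text is arbitrary):

import json
import urllib.parse

import boto3

s3 = boto3.resource('s3')

def lambda_handler(event, context):
    bucket = event['Records'][0]['s3']['bucket']['name']
    key = urllib.parse.unquote_plus(event['Records'][0]['s3']['object']['key'])
    s3_object = s3.Object(bucket, key)

    # Pass the system-defined Content-Type through its own parameter;
    # the Metadata dict carries only user-defined metadata.
    s3_object.copy_from(
        CopySource={'Bucket': bucket, 'Key': key},
        ContentType='image/png',
        Metadata=s3_object.metadata,
        MetadataDirective='REPLACE',
    )
    return {
        'statusCode': 200,
        'body': json.dumps('Metadata updated')
    }

Keep in mind that copy_from on the same key emits another ObjectCreated event, so if your trigger listens to all ObjectCreated events, add a guard (for example, skip objects whose content_type is already image/png) to avoid the Lambda re-triggering itself.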
I want to upload an image from the front end to Google Storage using JavaScript AJAX functionality. I need a presigned URL that the server would generate, which would provide authentication for my frontend to upload a blob.
How can I generate a presigned URL when using my local machine?
Previously, for AWS S3, I would do:
pp = s3.generate_presigned_post(
    Bucket=settings.S3_BUCKET_NAME,
    Key='folder1/' + file_name,
    ExpiresIn=20  # seconds
)
When generating a signed URL for a user to just view a file stored on Google Storage, I do:
bucket = settings.CLIENT.bucket(settings.BUCKET_NAME)
blob_name = 'folder/img1.jpg'
blob = bucket.blob(blob_name)
url = blob.generate_signed_url(
    version='v4',
    expiration=datetime.timedelta(minutes=1),
    method='GET')
I spent $100 on Google support and two weeks of my time to finally find a solution.
client = storage.Client() # works on app engine standard without any credentials requirements
But if you want to use the generate_signed_url() function, then you need a service account JSON key.
Every App Engine standard app has a default service account (you can find it under IAM/Service Accounts). Create a key for that default service account and download it ('sv_key.json') in JSON format. Store that key in your Django project right next to the app.yaml file. Then do the following:
import datetime

from google.cloud import storage

CLIENT = storage.Client.from_service_account_json('sv_key.json')
bucket = CLIENT.bucket('bucket_name_1')
blob = bucket.blob('img1.jpg')  # name of file to be saved/uploaded to storage
pp = blob.generate_signed_url(
    version='v4',
    expiration=datetime.timedelta(minutes=1),
    method='POST')
This will work on your local machine and on GAE standard. When you deploy your app to GAE, sv_key.json gets deployed with the Django project, and hence it works.
Hope it helps you.
Editing my answer as I didn't understand the problem you were facing.
Taking a look at the comment thread in the question, as @Nick Shebanov stated, there's one possibility to accomplish what you are trying to do when using GAE with the flex environment.
I have been trying to do the same with the GAE standard environment with no luck so far. At this point, I would recommend opening a feature request in the public issue tracker so this gets implemented at some point.
1. Create a service account private key and store it in Secret Manager (SM).
2. In settings.py, retrieve that key from Secret Manager and store it in a constant, SV_ACCOUNT_KEY (a sketch of such a wrapper follows the settings.py snippet below).
3. Override the Client() classmethod from_service_account_json() to take the JSON key content instead of a path to a JSON file. This way we don't have to keep a JSON file in our file system (locally, in Cloud Build, or on GAE); we can just get the private key contents from SM anytime, anywhere.
settings.py
secret = SecretManager()
SV_ACCOUNT_KEY = secret.access_secret_data('SV_ACCOUNT_KEY')
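The SecretManager helper above appears to be a small custom wrapper and isn't shown; a minimal sketch of what it could look like, assuming the google-cloud-secret-manager package and a hypothetical project id:

from google.cloud import secretmanager

class SecretManager:
    """Hypothetical wrapper around Secret Manager used in settings.py above."""

    def __init__(self, project_id='my-project-id'):  # project id is an assumption
        self._client = secretmanager.SecretManagerServiceClient()
        self._project_id = project_id

    def access_secret_data(self, secret_id, version='latest'):
        # Build the full resource name and return the secret payload as text
        name = f"projects/{self._project_id}/secrets/{secret_id}/versions/{version}"
        response = self._client.access_secret_version(request={'name': name})
        return response.payload.data.decode('UTF-8')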
signed_url_mixin.py
import datetime
import json

from django.conf import settings
from google.cloud.storage.client import Client
from google.oauth2 import service_account


class CustomClient(Client):

    @classmethod
    def from_service_account_json(cls, json_credentials_path, *args, **kwargs):
        """
        Copying everything from the base func (from_service_account_json).
        Instead of passing a json file for the private key, we pass the
        private key json contents directly (since we cannot save a file on GAE).
        Since it's not written for easy overriding, we cannot just overwrite
        a class or a func; we have to rewrite this entire func.
        """
        if "credentials" in kwargs:
            raise TypeError("credentials must not be in keyword arguments")
        credentials_info = json.loads(json_credentials_path)
        credentials = service_account.Credentials.from_service_account_info(
            credentials_info
        )
        if cls._SET_PROJECT:
            if "project" not in kwargs:
                kwargs["project"] = credentials_info.get("project_id")
        kwargs["credentials"] = credentials
        return cls(*args, **kwargs)
class _SignedUrlMixin:
    bucket_name = settings.BUCKET_NAME
    CLIENT = CustomClient.from_service_account_json(settings.SV_ACCOUNT_KEY)
    exp_min = 4  # expire minutes

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.bucket = self.CLIENT.bucket(self.bucket_name)

    def _signed_url(self, file_name, method):
        blob = self.bucket.blob(file_name)
        signed_url = blob.generate_signed_url(
            version='v4',
            expiration=datetime.timedelta(minutes=self.exp_min),
            method=method
        )
        return signed_url
class GetSignedUrlMixin(_SignedUrlMixin):
    """
    A GET url to view file on CS
    """

    def get_signed_url(self, file_name):
        """
        :param file_name: name of file to be retrieved from CS,
            e.g. xyz/f1.pdf
        :return: GET signed url
        """
        method = 'GET'
        return self._signed_url(file_name, method)


class PutSignedUrlMixin(_SignedUrlMixin):
    """
    A PUT url to make a put req to upload a file to CS
    """

    def put_signed_url(self, file_name):
        """
        :param file_name: e.g. xyz/f1.pdf
        """
        method = 'PUT'
        return self._signed_url(file_name, method)
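For completeness, a hypothetical usage sketch (the class names below are assumptions, not from the original answer):

class FileDownloadService(GetSignedUrlMixin):
    """Hands out short-lived GET URLs so the frontend can view stored files."""


class FileUploadService(PutSignedUrlMixin):
    """Hands out short-lived PUT URLs so the frontend can upload directly to CS."""


view_url = FileDownloadService().get_signed_url('xyz/f1.pdf')
upload_url = FileUploadService().put_signed_url('xyz/f1.pdf')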
I am trying to detect labels of multiple images using AWS Rekognition in Python.
This process requires around 3 seconds for an image to get labelled. Is there any way I can label these images in parallel?
Since I have refrained from using boto3 sessions, please provide a code snippet if possible.
The best thing you can do is, instead of running your code on a local machine, run it in the cloud as a function. With AWS Lambda you can do this easily. Just add S3 object upload as a trigger to your Lambda: whenever an image is uploaded to your S3 bucket, it will trigger your Lambda function, which will call detect_labels. You can then use those labels however you want; you can even store them in a DynamoDB table for later reference and fetch them from that table.
Best of all, if you upload multiple images simultaneously, each image is processed in parallel, since Lambda is highly scalable, and you get all the results at the same time.
Example code:
from __future__ import print_function

import boto3
from decimal import Decimal
import json
import urllib

print('Loading function')

rekognition = boto3.client('rekognition')


# --------------- Helper Functions to call Rekognition APIs ------------------

def detect_labels(bucket, key):
    response = rekognition.detect_labels(Image={"S3Object": {"Bucket": bucket, "Name": key}})

    # Sample code to write response to DynamoDB table 'MyTable' with 'PK' as Primary Key.
    # Note: role used for executing this Lambda function should have write access to the table.
    #table = boto3.resource('dynamodb').Table('MyTable')
    #labels = [{'Confidence': Decimal(str(label_prediction['Confidence'])), 'Name': label_prediction['Name']} for label_prediction in response['Labels']]
    #table.put_item(Item={'PK': key, 'Labels': labels})

    return response


# --------------- Main handler ------------------

def lambda_handler(event, context):
    # Get the object from the event
    bucket = event['Records'][0]['s3']['bucket']['name']
    key = urllib.unquote_plus(event['Records'][0]['s3']['object']['key'].encode('utf8'))
    try:
        # Calls rekognition DetectLabels API to detect labels in S3 object
        response = detect_labels(bucket, key)
        print(response)
        return response
    except Exception as e:
        print(e)
        print("Error processing object {} from bucket {}. ".format(key, bucket) +
              "Make sure your object and bucket exist and your bucket is in the same region as this function.")
        raise e
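To see the parallelism in action, something like the following (hypothetical bucket and file names) fires one independent Lambda invocation per uploaded image:

import boto3

s3 = boto3.client('s3')

# Each upload triggers the Lambda above on its own, so the images are
# labelled in parallel instead of one after another.
for path in ['img1.jpg', 'img2.jpg', 'img3.jpg']:
    s3.upload_file(path, 'my-trigger-bucket', path)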
Running a line like:
s3_obj = boto3.resource('s3').Object(bucket, key)
s3_obj.meta.client.generate_presigned_url('get_object', ExpiresIn=0, Params={'Bucket':bucket,'Key':key})
Yields a result like:
https://my-bucket.s3.amazonaws.com/my-key/my-object-name?AWSAccessKeyId=SOMEKEY&Expires=SOMENUMBER&x-amz-security-token=SOMETOKEN
For an S3 object with a public-read ACL, all the GET params are unnecessary.
I could cheat and rewrite the URL without the GET params, but that feels unclean and hacky.
How do I use boto3 to provide me with just the public link, e.g. https://my-bucket.s3.amazonaws.com/my-key/my-object-name? In other words, how do I skip the signing step in generate_presigned_url? I don't see anything like a generated_unsigned_url function.
The best solution I found is still to use generate_presigned_url; you just need to set the client's Config.signature_version to botocore.UNSIGNED.
The following returns the public link without the signing stuff.
from botocore import UNSIGNED
from botocore.client import Config
config = Config(signature_version=UNSIGNED)
boto3.client('s3', config=config).generate_presigned_url('get_object', ExpiresIn=0, Params={'Bucket': bucket, 'Key': key})
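For comparison, the "cheat" the question mentions is just string formatting (assuming the bucket is served from the default virtual-hosted-style endpoint):

url = "https://{}.s3.amazonaws.com/{}".format(bucket, key)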
The relevant discussions on the boto3 repository are:
https://github.com/boto/boto3/issues/110
https://github.com/boto/boto3/issues/169
https://github.com/boto/boto3/issues/1415
I am trying to detect faces in an image using the AWS Rekognition API, but I am getting the following error:
Error1:
ClientError: An error occurred (InvalidS3ObjectException) when calling the DetectFaces operation: Unable to get image metadata from S3. Check object key, region and/or access permissions.
Python Code1:
def detect_faces(object_name="path/to/image/001.jpg"):
    client = get_aws_client('rekognition')
    response = client.detect_faces(
        Image={
            # 'Bytes': source_bytes,
            'S3Object': {
                'Bucket': "bucket-name",
                'Name': object_name,
                'Version': 'string'
            }
        },
        Attributes=[
            'ALL',
        ]
    )
    return response
The Object "path/to/image/001.jpg" exists in the AWS S3 Bucket "bucket-name". And the region Name is also correct.
The Permissions for this object '001.jpg' is: Everyone is granted Open/Download/view Permission.
MetaData for the Object: Content-Type: image/jpeg
Not sure how to debug this. Any Suggestion to resolve this please ?
Thanks,
You appear to be asking the service to fetch the object with a version ID of 'string'.
Version
If the bucket is versioning enabled, you can specify the object version.
Type: String
Length Constraints: Minimum length of 1. Maximum length of 1024.
Required: No
http://docs.aws.amazon.com/rekognition/latest/dg/API_S3Object.html#rekognition-Type-S3Object-Version
Remove 'Version': 'string' from your request parameters, unless you really intend to fetch a specific version of the object from a versioned bucket, in which case, provide the actual version id of the object in question.
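A corrected call is just the question's code with the Version key dropped; a sketch using a plain boto3 client in place of the question's get_aws_client helper (the region shown is an assumption, use your bucket's region):

import boto3

client = boto3.client('rekognition', region_name='us-east-1')  # assumed region
response = client.detect_faces(
    Image={'S3Object': {'Bucket': 'bucket-name', 'Name': 'path/to/image/001.jpg'}},
    Attributes=['ALL'],
)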
I had the same issue, and avoiding '-' or whitespace in bucket and uploaded file names solved it for me.
Maybe removing underscores can also help.
Old thread, new solution:
I got the same error message.
My mistake was due to a region mismatch: my S3 bucket was in us-east-2, but my Rekognition client defaulted to us-west-1.
I changed the line
client = get_aws_client('rekognition')
to
client = get_aws_client('rekognition', region_name='us-east-2')
and it worked.
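If you are unsure which region a bucket lives in, you can check it first (a quick lookup, not part of the original answer):

import boto3

# LocationConstraint is None for buckets in us-east-1
location = boto3.client('s3').get_bucket_location(Bucket='bucket-name')['LocationConstraint']
print(location or 'us-east-1')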
How does one use put_bucket_policy()? It throws a MalformedPolicy error even when I try to pass in an existing valid policy:
import boto3
client = boto3.client('s3')
dict_policy = client.get_bucket_policy(Bucket = 'my_bucket')
str_policy = str(dict_policy)
response = client.put_bucket_policy(Bucket = 'my_bucket', Policy = str_policy)
Error message:
botocore.exceptions.ClientError: An error occurred (MalformedPolicy) when calling the PutBucketPolicy operation: This policy contains invalid Json
That's because applying str to a dict doesn't turn it into valid JSON (it uses single quotes). Also note that get_bucket_policy returns a response dict whose 'Policy' key holds the policy document as a JSON string; pull that out and serialise the policy with json.dumps:
import boto3
import json

client = boto3.client('s3')
response = client.get_bucket_policy(Bucket='my_bucket')
dict_policy = json.loads(response['Policy'])  # the policy document itself, as a dict
str_policy = json.dumps(dict_policy)          # a valid JSON string
response = client.put_bucket_policy(Bucket='my_bucket', Policy=str_policy)
The current boto3 API doesn't have a function to APPEND to a bucket policy, i.e. to add more items/elements/attributes. You need to load and manipulate the JSON yourself: write a script that loads the policy into a dict, appends to the "Statement" element list, then uses policy.put to replace the whole policy. Without the original statement id, the user policy will simply be appended. HOWEVER, there is no way to tell whether a later user policy will override the rules of an earlier one.
For example:
import boto3
import json

s3_conn = boto3.resource('s3')
s3_client = boto3.client('s3')
bucket_policy = s3_conn.BucketPolicy('bucket_name')

# get_bucket_policy lives on the client; its 'Policy' field is a JSON string
policy = json.loads(s3_client.get_bucket_policy(Bucket='bucket_name')['Policy'])
user_policy = { "Effect": "Allow",... }
policy['Statement'].append(user_policy)

bucket_policy.put(Policy=json.dumps(policy))
The user doesn't need to know the old policy in the process.