Multiprocessing multiple images using Rekognition in Python

I am trying to detect labels for multiple images using AWS Rekognition in Python.
Labelling a single image takes around 3 seconds. Is there any way I can label these images in parallel?
Since I have refrained from using boto3 sessions, please provide a code snippet if possible.

The best thing you can do is, instead of running your code on your local machine, run it in the cloud as a function. AWS Lambda makes this easy: add an S3 object upload as a trigger to your Lambda, so whenever an image is uploaded to your S3 bucket it invokes the function, which calls detect_labels. You can then use those labels however you want, and even store them in a DynamoDB table for later reference and fetch them from that table.
The best part is that if you upload multiple images simultaneously, each image is processed in parallel, because Lambda is highly scalable, and you get all the results at the same time.
Example code for the same:
from __future__ import print_function

import boto3
from decimal import Decimal
import json
import urllib

print('Loading function')

rekognition = boto3.client('rekognition')


# --------------- Helper Functions to call Rekognition APIs ------------------

def detect_labels(bucket, key):
    response = rekognition.detect_labels(Image={"S3Object": {"Bucket": bucket, "Name": key}})

    # Sample code to write response to DynamoDB table 'MyTable' with 'PK' as Primary Key.
    # Note: role used for executing this Lambda function should have write access to the table.
    #table = boto3.resource('dynamodb').Table('MyTable')
    #labels = [{'Confidence': Decimal(str(label_prediction['Confidence'])), 'Name': label_prediction['Name']} for label_prediction in response['Labels']]
    #table.put_item(Item={'PK': key, 'Labels': labels})

    return response


# --------------- Main handler ------------------

def lambda_handler(event, context):
    # Get the object from the event
    bucket = event['Records'][0]['s3']['bucket']['name']
    key = urllib.unquote_plus(event['Records'][0]['s3']['object']['key'].encode('utf8'))
    try:
        # Calls rekognition DetectLabels API to detect labels in S3 object
        response = detect_labels(bucket, key)
        print(response)
        return response
    except Exception as e:
        print(e)
        print("Error processing object {} from bucket {}. ".format(key, bucket) +
              "Make sure your object and bucket exist and your bucket is in the same region as this function.")
        raise e
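To drive that fan-out from your machine, here is a minimal sketch, assuming a bucket named my-image-bucket that already has the Lambda trigger attached and a few local image files (both are placeholders): it uploads the images concurrently so each upload fires its own Lambda invocation.

# Sketch: upload several images concurrently; each S3 upload triggers one Lambda run.
# 'my-image-bucket' and the file names are placeholders for your own setup.
import boto3
from concurrent.futures import ThreadPoolExecutor

s3 = boto3.client('s3')
BUCKET = 'my-image-bucket'
images = ['img1.jpg', 'img2.jpg', 'img3.jpg']

def upload(path):
    s3.upload_file(path, BUCKET, path)  # use the local file name as the S3 key
    return path

with ThreadPoolExecutor(max_workers=8) as pool:
    for done in pool.map(upload, images):
        print('uploaded', done)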

Related

How to Change System Metadata in AWS S3

I am trying to change the metadata of an image in an S3 bucket through Lambda; this Lambda triggers when an object is uploaded. But for some reason, when I update the metadata through copy_from, it adds user-defined metadata instead of updating the system metadata.
Is there a special way to edit the system metadata? My code is:
import json
import boto3
import urllib

s3 = boto3.resource('s3')

def lambda_handler(event, context):
    # TODO implement
    bucket = event['Records'][0]['s3']['bucket']['name']
    key = urllib.parse.unquote_plus(event['Records'][0]['s3']['object']['key'])
    s3_object = s3.Object(bucket, key)
    s3_object.metadata.update({'Content-Type': 'image/png'})
    s3_object.copy_from(CopySource={'Bucket': bucket, 'Key': key}, Metadata=s3_object.metadata, MetadataDirective='REPLACE')
    return {
        'statusCode': 200,
        'body': json.dumps('Hello from Lambda!')
    }
Content-Type is special metadata categorised as system-defined metadata, and there are other means to update it. It is derived from the contents of the object when it is created/uploaded.
Let's say you want to update the system-defined Content-Type metadata. Try this code, which updates the system-defined metadata and also adds a user-defined metadata entry:
s3_object.metadata.update({'My-Metadata':'abc'})
s3_object.copy_from(CopySource={'Bucket':BUCKET_NAME, 'Key':OBJECT_KEY}, ContentType='image/png', Metadata=s3_object.metadata, MetadataDirective='REPLACE')
As you can see, copy_from takes the ContentType parameter explicitly to update the content type. You do not pass it through the Metadata dictionary; use the Metadata dictionary only for user-defined metadata.
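Applied to the handler from the question, a minimal sketch could look like the following (the 'image/png' value and the extra user metadata key are assumptions for illustration):

import json
import urllib.parse
import boto3

s3 = boto3.resource('s3')

def lambda_handler(event, context):
    bucket = event['Records'][0]['s3']['bucket']['name']
    key = urllib.parse.unquote_plus(event['Records'][0]['s3']['object']['key'])
    s3_object = s3.Object(bucket, key)
    s3_object.metadata.update({'my-metadata': 'abc'})  # optional user-defined metadata
    s3_object.copy_from(
        CopySource={'Bucket': bucket, 'Key': key},
        ContentType='image/png',          # system-defined metadata goes in its own parameter
        Metadata=s3_object.metadata,      # user-defined metadata goes here
        MetadataDirective='REPLACE',
    )
    # Note: copying the object onto itself may fire the upload trigger again,
    # so guard against re-processing if your trigger covers all ObjectCreated events.
    return {'statusCode': 200, 'body': json.dumps('metadata updated')}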

Using Lambda function to parse SES email with attachment and send to S3 bucket

SITUATION: I verified a domain name in AWS and set up an S3 bucket that receives emails.
These emails contain a .csv file and are delivered to this bucket on a daily basis. I can verify the presence of the attachment by manually exploring the raw email. No problems.
DESIRED OUTCOME: I want to parse these emails, extract the attached .csv and send the .csv file to a destination S3 bucket (or destination folder within the same S3 bucket) so that I can later process it using a separate Python script.
ISSUE: I have written the Lambda function in Python and the logs show it executes successfully when testing, yet no files appear in the destination folder.
There is an ObjectCreated trigger enabled on the source bucket, which I believe should activate the function on arrival of a new email, but this does not have any effect on the execution of the function.
See Lambda function code below:
import json
import urllib
import boto3
import os
import email
import base64

FILE_MIMETYPE = 'text/csv'

# destination folder
S3_OUTPUT_BUCKETNAME = 's3-bucketname/folder'

print('Loading function')

s3 = boto3.client('s3')

def lambda_handler(event, context):
    # source email bucket
    inBucket = event['Records'][0]['s3']['bucket']['name']
    key = urllib.parse.quote(event['Records'][0]['s3']['object']['key'].encode('utf8'))

    try:
        response = s3.get_object(Bucket=inBucket, Key=key)
        msg = email.message_from_string(response['Body'].read().decode('utf-8'))
    except Exception as e:
        print(e)
        print('Error retrieving object {} from source bucket {}. Verify existence and ensure bucket is in same region as function.'.format(key, inBucket))
        raise e

    attachment_list = []

    try:
        # scan each part of email
        for message in msg.get_payload():
            # Check filename and email MIME type
            if (msg.get_filename() != None and msg.get_content_type() == FILE_MIMETYPE):
                attachment_list.append({'original_msg_key': key, 'attachment_filename': msg.get_filename(), 'body': base64.b64decode(message.get_payload())})
    except Exception as e:
        print(e)
        print('Error processing email for CSV attachments')
        raise e

    # if multiple attachments send all to bucket
    for attachment in attachment_list:
        try:
            s3.put_object(Bucket=S3_OUTPUT_BUCKETNAME, Key=attachment['original_msg_key'] + '-' + attachment['attachment_filename'], Body=attachment['body'])
        except Exception as e:
            print(e)
            print('Error sending object {} to destination bucket {}. Verify existence and ensure bucket is in same region as function.'.format(attachment['attachment_filename'], S3_OUTPUT_BUCKETNAME))
            raise e

    return event
Unfamiliar territory so please let me know if further information is required.
EDIT
As per the comments, I have checked the logs. It seems that the function is being invoked, but the attachment is not being parsed and sent to the destination folder. It's possible that there's an error in the Python file itself.
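If the error is indeed in the Python file, one plausible culprit, offered as a sketch rather than a confirmed fix: the loop calls msg.get_filename() and msg.get_content_type() on the whole message instead of on each part, so the CSV part may never match. A per-part check would look roughly like this:

# Sketch: inspect each MIME part rather than the top-level message.
for part in msg.walk():
    if part.get_filename() is not None and part.get_content_type() == FILE_MIMETYPE:
        attachment_list.append({
            'original_msg_key': key,
            'attachment_filename': part.get_filename(),
            'body': part.get_payload(decode=True),  # decodes the base64 transfer encoding
        })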

Generate presigned url for uploading file to google storage using python

I want to upload an image from the front end to Google Storage using JavaScript AJAX functionality. I need a presigned URL that the server would generate, which would give my frontend the authorisation to upload a blob.
How can I generate a presigned URL when using my local machine?
Previously, for AWS S3, I would do:
pp = s3.generate_presigned_post(
    Bucket=settings.S3_BUCKET_NAME,
    Key='folder1/' + file_name,
    ExpiresIn=20  # seconds
)
When generating a signed URL for a user to just view a file stored on Google Storage, I do:
bucket = settings.CLIENT.bucket(settings.BUCKET_NAME)
blob_name = 'folder/img1.jpg'
blob = bucket.blob(blob_name)
url = blob.generate_signed_url(
    version='v4',
    expiration=datetime.timedelta(minutes=1),
    method='GET')
I spent $100 on Google support and two weeks of my time to finally find a solution.
client = storage.Client()  # works on App Engine standard without any credentials requirements
But if you want to use the generate_signed_url() function, then you need a service account JSON key.
Every App Engine standard app has a default service account (you can find it under IAM / Service Accounts). Create a key for that default service account and download the key ('sv_key.json') in JSON format. Store that key in your Django project right next to the app.yaml file. Then do the following:
import datetime

from google.cloud import storage

CLIENT = storage.Client.from_service_account_json('sv_key.json')
bucket = CLIENT.bucket('bucket_name_1')
blob = bucket.blob('img1.jpg')  # name of file to be saved/uploaded to storage
pp = blob.generate_signed_url(
    version='v4',
    expiration=datetime.timedelta(minutes=1),
    method='POST')
This will work on your local machine and on GAE standard. When you deploy your app to GAE, sv_key.json also gets deployed with the Django project, and hence it works.
Hope it helps you.
Editing my answer as I didn't understand the problem you were facing.
Taking a look at the comments thread in the question, as @Nick Shebanov stated, there's one possibility to accomplish what you are trying to do when using GAE with the flex environment.
I have been trying to do the same with the GAE standard environment with no luck so far. At this point, I would recommend opening a feature request at the public issue tracker so this gets implemented somehow.
Create a service account private key and store it in Secret Manager (SM).
In settings.py, retrieve that key from Secret Manager and store it in a constant - SV_ACCOUNT_KEY.
Override the Client() classmethod from_service_account_json() to take the JSON key contents instead of a path to a JSON file. This way we don't have to keep a JSON file on the file system (locally, in Cloud Build, or on GAE); we can just get the private key contents from SM anytime, anywhere.
settings.py
secret = SecretManager()
SV_ACCOUNT_KEY = secret.access_secret_data('SV_ACCOUNT_KEY')
signed_url_mixin.py
import datetime
import json

from django.conf import settings
from google.cloud.storage.client import Client
from google.oauth2 import service_account


class CustomClient(Client):
    @classmethod
    def from_service_account_json(cls, json_credentials_path, *args, **kwargs):
        """
        Copying everything from the base func (from_service_account_json).
        Instead of passing a json file for the private key, we pass the private key
        json contents directly (since we cannot save a file on GAE).
        Since it's not written to be extended, we cannot just override part of the
        class or func; we have to rewrite this entire func.
        """
        if "credentials" in kwargs:
            raise TypeError("credentials must not be in keyword arguments")
        credentials_info = json.loads(json_credentials_path)
        credentials = service_account.Credentials.from_service_account_info(
            credentials_info
        )
        if cls._SET_PROJECT:
            if "project" not in kwargs:
                kwargs["project"] = credentials_info.get("project_id")
        kwargs["credentials"] = credentials
        return cls(*args, **kwargs)


class _SignedUrlMixin:
    bucket_name = settings.BUCKET_NAME
    CLIENT = CustomClient.from_service_account_json(settings.SV_ACCOUNT_KEY)
    exp_min = 4  # expire minutes

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.bucket = self.CLIENT.bucket(self.bucket_name)

    def _signed_url(self, file_name, method):
        blob = self.bucket.blob(file_name)
        signed_url = blob.generate_signed_url(
            version='v4',
            expiration=datetime.timedelta(minutes=self.exp_min),
            method=method
        )
        return signed_url


class GetSignedUrlMixin(_SignedUrlMixin):
    """
    A GET url to view a file on CS.
    """

    def get_signed_url(self, file_name):
        """
        :param file_name: name of file to be retrieved from CS, e.g. xyz/f1.pdf
        :return: GET signed url
        """
        method = 'GET'
        return self._signed_url(file_name, method)


class PutSignedUrlMixin(_SignedUrlMixin):
    """
    A PUT url to make a PUT request to upload a file to CS.
    """

    def put_signed_url(self, file_name):
        """
        :param file_name: e.g. xyz/f1.pdf
        """
        method = 'PUT'
        return self._signed_url(file_name, method)
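For completeness, a minimal usage sketch of the mixins above (the class name here is hypothetical, not part of the original answer):

# Hypothetical consumer of the mixin: generates a PUT URL the frontend can upload to.
class UploadUrlService(PutSignedUrlMixin):
    pass

service = UploadUrlService()
url = service.put_signed_url('folder/img1.jpg')
# Hand `url` to the frontend; it can PUT the file bytes directly to that URL until it expires.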

python aws put_bucket_policy() MalformedPolicy

How does one use put_bucket_policy()? It throws a MalformedPolicy error even when I try to pass in an existing valid policy:
import boto3
client = boto3.client('s3')
dict_policy = client.get_bucket_policy(Bucket = 'my_bucket')
str_policy = str(dict_policy)
response = client.put_bucket_policy(Bucket = 'my_bucket', Policy = str_policy)
Error message:
botocore.exceptions.ClientError: An error occurred (MalformedPolicy) when calling the PutBucketPolicy operation: This policy contains invalid Json
That's because applying str to a dict doesn't turn it into valid JSON; use json.loads/json.dumps instead. Also note that get_bucket_policy returns the policy document as a JSON string under the 'Policy' key of the response, not as the whole response:
import boto3
import json

client = boto3.client('s3')
response = client.get_bucket_policy(Bucket='my_bucket')
policy = json.loads(response['Policy'])  # the policy document itself
str_policy = json.dumps(policy)          # a valid JSON string of just the policy
response = client.put_bucket_policy(Bucket='my_bucket', Policy=str_policy)
The current boto3 API doesn't have a function to APPEND to a bucket policy, i.e. to add further statements to it. You need to load and manipulate the JSON yourself: for example, write a script that loads the policy into a dict, appends to the "Statement" list, then uses policy.put to replace the whole policy. Without the original statement id, the user policy will be appended. HOWEVER, there is no way to tell whether a later user policy will override the rules of an earlier one.
For example:
import boto3
import json

s3_conn = boto3.resource('s3')
bucket_policy = s3_conn.BucketPolicy('bucket_name')

policy = json.loads(s3_conn.meta.client.get_bucket_policy(Bucket='bucket_name')['Policy'])
user_policy = {"Effect": "Allow", ...}
policy['Statement'].append(user_policy)
bucket_policy.put(Policy=json.dumps(policy))
The user doesn't need to know the old policy in the process.
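If you do want a later rule to win, one option, continuing from the snippet above and sketched with a hypothetical statement (the Sid, action, and resource are placeholders), is to replace any existing statement carrying the same statement id before putting the policy back:

# Hypothetical statement: the Sid, action, and resource are placeholders.
user_policy = {
    "Sid": "AllowPublicRead",
    "Effect": "Allow",
    "Principal": "*",
    "Action": "s3:GetObject",
    "Resource": "arn:aws:s3:::bucket_name/*",
}
# Drop any existing statement with the same Sid, then append the new one.
policy['Statement'] = [s for s in policy['Statement'] if s.get('Sid') != user_policy['Sid']]
policy['Statement'].append(user_policy)
bucket_policy.put(Policy=json.dumps(policy))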

Reading part of a file in S3 using Boto

I am trying to read a 700MB file stored in S3. However, I only require the bytes from locations 73 to 1024.
I tried to find a usable solution but failed. It would be a great help if someone could help me out.
S3 supports GET requests using the 'Range' HTTP header, which is what you're after.
To specify a Range request in boto, just add a headers dictionary specifying the 'Range' key for the bytes you are interested in. Adapted from Mitchell Garnaat's response:
import boto
s3 = boto.connect_s3()
bucket = s3.lookup('mybucket')
key = bucket.lookup('mykey')
your_bytes = key.get_contents_as_string(headers={'Range' : 'bytes=73-1024'})
import boto3
obj = boto3.resource('s3').Object('mybucket', 'mykey')
stream = obj.get(Range='bytes=32-64')['Body']
print(stream.read())
boto3 version from https://github.com/boto/boto3/issues/1236
Please have a look at the Python script here:
import boto3

region = 'us-east-1'  # define your region here
bucketname = 'test'   # define bucket
key = 'objkey'        # s3 file
bytes_range = 'bytes=73-1024'

client = boto3.client('s3', region_name=region)
resp = client.get_object(Bucket=bucketname, Key=key, Range=bytes_range)
data = resp['Body'].read()
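Continuing from the snippet above, a quick sanity check (the total size reported in ContentRange depends on your object): the Range header is inclusive on both ends, so 'bytes=73-1024' returns 952 bytes.

# The response reports the byte range actually served.
print(resp['ContentRange'])  # e.g. "bytes 73-1024/734003200" (total size varies)
print(len(data))             # 952, since both ends of the range are inclusive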
