Using multiple Python functions within AWS Lambda script - python

SITUATION
I'm using a Lambda function that takes a CSV attachment from an incoming email and places it into what is, in effect, a sub-folder of an S3 bucket. This part of the Lambda works well; however, there are other user-defined functions (UDFs) that I need to execute within the same Lambda function to perform subsequent tasks.
CODE
import boto3
import email
import base64
import math
import pickle
import urllib.parse
import numpy as np
import pandas as pd
import io
###############################
### GET THE ATTACHMENT ###
###############################
#s3 = boto3.client('s3')
FILE_MIMETYPE = 'text/csv'
#'application/octet-stream'
# destination folder
S3_OUTPUT_BUCKETNAME = 'my_bucket'
print('Loading function')
s3 = boto3.client('s3')
def lambda_handler(event, context):
    # source email bucket
    inBucket = event['Records'][0]['s3']['bucket']['name']
    key = urllib.parse.quote(event['Records'][0]['s3']['object']['key'].encode('utf8'))

    try:
        response = s3.get_object(Bucket=inBucket, Key=key)
        msg = email.message_from_string(response['Body'].read().decode('utf-8'))
    except Exception as e:
        print(e)
        print('Error retrieving object {} from source bucket {}. Verify existence and ensure bucket is in same region as function.'.format(key, inBucket))
        raise e

    attachment_list = []

    try:
        # scan each part of the email
        for message in msg.walk():
            # Check filename and email MIME type
            if (message.get_content_type() == FILE_MIMETYPE and message.get_filename() is not None):
                attachment_list.append({'original_msg_key': key,
                                        'attachment_filename': message.get_filename(),
                                        'body': base64.b64decode(message.get_payload())})
    except Exception as e:
        print(e)
        print('Error processing email for CSV attachments')
        raise e

    # if multiple attachments, send all to bucket
    for attachment in attachment_list:
        try:
            s3.put_object(Bucket=S3_OUTPUT_BUCKETNAME,
                          Key='attachments/' + attachment['original_msg_key'] + '-' + attachment['attachment_filename'],
                          Body=attachment['body'])
        except Exception as e:
            print(e)
            print('Error sending object {} to destination bucket {}. Verify existence and ensure bucket is in same region as function.'.format(attachment['attachment_filename'], S3_OUTPUT_BUCKETNAME))
            raise e

#################################
### ADDITIONAL FUNCTIONS ###
#################################
def my_function():
    print("Hello, this is another function")
OUTCOME
The CSV attachment is successfully retrieved and placed in the destination as specified by s3.put_object, however there is no evidence in the Cloudwatch logs that my_function runs.
WHAT I HAVE TRIED
I've tried using def my_function(event, context): in an attempt to ascertain whether the function requires the same arguments as the first function in order to be executed. I've also tried including my_function() as part of the first function, but this does not appear to work either.
How can I ensure that both functions are executed within the Lambda?

Based on the comments:
The issue was that my_function was never called inside the Lambda handler.
The solution is to call my_function() inside lambda_handler so that my_function actually runs.
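In terms of the posted code, a minimal sketch of that fix (only the relevant lines shown):

def my_function():
    print("Hello, this is another function")

def lambda_handler(event, context):
    # ... existing code that retrieves the email and stores the attachments ...

    # Explicitly call the additional function; defining it at module level
    # is not enough, because Lambda only invokes lambda_handler.
    my_function()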

Related

How to get file key out of an SNS connected to S3 Bucket

I have two different profiles on AWS. The s3 bucket and SNS are in profile A and my lambda function is in profile B. When a new file is added to the s3 bucket, SNS triggers the lambda function.
The lambda function is then supposed to access the new file and process it using pandas. Here is what I'm doing now:
sts_connection = boto3.client('sts')
acct_b = sts_connection.assume_role(
    RoleArn="arn:aws:iam::**************:role/AllowS3AccessFromAccountB",
    RoleSessionName="cross_acct_lambda"
)

ACCESS_KEY = acct_b['Credentials']['AccessKeyId']
SECRET_KEY = acct_b['Credentials']['SecretAccessKey']
SESSION_TOKEN = acct_b['Credentials']['SessionToken']

s3 = boto3.client(
    's3',
    aws_access_key_id=ACCESS_KEY,
    aws_secret_access_key=SECRET_KEY,
    aws_session_token=SESSION_TOKEN
)

path = get_file_path(event)
obj = s3.get_object(Bucket='my-bucket-name', Key=path)
csv_string = io.BytesIO(obj['Body'].read())

# Read the csv file and turn it into a DataFrame
df = pd.read_csv(csv_string, delimiter=';', engine='c', encoding='unicode_escape')

def get_file_path(event_body):
    """Get the object key from the SNS message and check that it is a manifest."""
    try:
        # Get message for first SNS record
        sns_message = json.loads(event_body["Records"][0]["Sns"]["Message"])
        path = sns_message["Records"][0]["s3"]["object"]["key"]
    except TypeError as ex:
        logging.error("Unable to parse event: " + str(event_body))
        raise ex
    return path
Everything works fine until the s3.get_object() part. I'm getting the following error;
botocore.errorfactory.NoSuchKey: An error occurred (NoSuchKey) when calling the GetObject operation: The specified key does not exist.
Maybe I'm reading the file key in the wrong way?
Edit:
Here is what path looks like when I debugged it.
svv/sensor%3D11219V22151/year%3D2020/month%3D03/day%3D02/test.csv
And the s3 file structure is like this;
sensor-data/sensor=*******/year=2020/month=03/day=02
Seems like I need to use a regex for the equal signs. But there should be a more generic solution.
Here's a snippet I have in some Lambda code that is directly triggered by Amazon S3 (not via Amazon SNS):
import urllib
key = urllib.parse.unquote_plus(event['Records'][0]['s3']['object']['key'])
You could try similar parsing to see if it corrects the Key.
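As a quick illustration (not part of the original answer), applying unquote_plus to the key shown in the question turns the percent-encoded %3D back into =:

import urllib.parse

raw_key = 'svv/sensor%3D11219V22151/year%3D2020/month%3D03/day%3D02/test.csv'
key = urllib.parse.unquote_plus(raw_key)
print(key)  # svv/sensor=11219V22151/year=2020/month=03/day=02/test.csv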

How to make a link to S3 file download

I want to make a link to download an S3-stored file.
<a href="https://s3.region.amazonaws.com/bucket/file.txt" download>DownLoad</a>
It only displays file.txt in the browser.
So I found a way to force the download: add a Content-Disposition: attachment metadata tag to the file.
But I need to add this metadata tag to each new file automatically, so I made a Lambda function in Python.
import json
import urllib.parse
import boto3

print('Loading function')

s3 = boto3.client('s3')

def lambda_handler(event, context):
    # print("Received event: " + json.dumps(event, indent=2))

    # Get the object from the event and show its content type
    bucket = event['Records'][0]['s3']['bucket']['name']
    key = urllib.parse.unquote_plus(event['Records'][0]['s3']['object']['key'], encoding='utf-8')
    try:
        response = s3.get_object(Bucket=bucket, Key=key)
        print("CONTENT TYPE: " + response['ContentType'])
    except Exception as e:
        print(e)
        print('Error getting object {} from bucket {}. Make sure they exist and your bucket is in the same region as this function.'.format(key, bucket))
        raise e

    try:
        s3_2 = boto3.resource('s3')
        s3_object = s3_2.Object(bucket, key)
        print(s3_object.metadata)
        s3_object.metadata.update({'ContentDisposition': 'attachment'})
        print(bucket, key)
        s3_object.copy_from(CopySource={'Bucket': bucket, 'Key': key}, Metadata=s3_object.metadata, MetadataDirective='REPLACE')
    except:
        print(s3_object.metadata)

    return response['ContentType']
But this function adds a user-defined metadata tag, not a system metadata tag.
What should I do?
Content-Disposition is treated by S3 as (somewhat) more like system metadata than custom/user-defined metadata, so it has its own argument.
s3_object.copy_from(CopySource={'Bucket':bucket, 'Key':key}, ContentDisposition='attachment', Metadata=s3_object.metadata, MetadataDirective='REPLACE')
Note that you still need Metadata and MetadataDirective as shown, for this to work, but s3_object.metadata.update() is not required since you are not changing the custom metadata.
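In context, the second try block from the question might then look like this (a sketch reusing the question's bucket, key, and s3_object variables):

s3_2 = boto3.resource('s3')
s3_object = s3_2.Object(bucket, key)

# Content-Disposition is passed as its own argument; the existing
# user-defined metadata is carried over unchanged.
s3_object.copy_from(
    CopySource={'Bucket': bucket, 'Key': key},
    ContentDisposition='attachment',
    Metadata=s3_object.metadata,
    MetadataDirective='REPLACE'
)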

How to make a Python AWS Lambda open an email stored in S3 as email object

I realize this is a total noob question and hopefully an easy solution exists. However, I'm stuck and turning to you for help! What I'm trying to do is this: I have an SES rule set that stores emails in my S3 bucket. The specific emails I'm storing contain a .txt attachment. I'm hoping to have a Lambda function that is triggered by the S3 bucket's "Create" event, opens the email AND attachment, and then performs some other processing based on specific text in the email attachment.
My specific question is this: How do I allow the Lambda function to take the S3 email "object" and convert it to the standard Python "message" object format so that I can use Python's Email library against it?
Here is what I have so far...not much, I know:
import boto3
import email

def lambda_handler(event, context):
    s3 = boto3.client("s3")
    if event:
        print("My Event is : ", event)
        file_obj = event["Records"][0]
        filename = str(file_obj["s3"]['object']['key'])
        print("filename: ", filename)
        fileObj = s3.get_object(Bucket="mytestbucket", Key=filename)
        print("file has been gotten!")
        # Now that the .eml file that was stored in S3 is stored in fileObj,
        # start parsing it--but how to convert it to "email" class???
        # ??????
Can you try something like this? With this, you will get the msg object back from the stream you opened from the S3 file.
import boto3
import email

def lambda_handler(event, context):
    s3 = boto3.client("s3")
    if event:
        print("My Event is : ", event)
        file_obj = event["Records"][0]
        filename = str(file_obj["s3"]['object']['key'])
        print("filename: ", filename)
        fileObj = s3.get_object(Bucket="mytestbucket", Key=filename)
        print("file has been gotten!")
        msg = email.message_from_bytes(fileObj['Body'].read())
        print(msg['Subject'])
        # Hello
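If you also need the .txt attachment mentioned in the question, you could walk the parsed message much like the first question on this page does (a sketch; the text/plain content type and the decode=True flag are assumptions about how the attachment was sent):

for part in msg.walk():
    # Look for a part that carries a filename and plain-text content
    if part.get_filename() and part.get_content_type() == 'text/plain':
        attachment_text = part.get_payload(decode=True).decode('utf-8')
        # ... further processing based on the attachment text ...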

Using AWS Lambda and boto3 to append new lines to text file objects in S3

I'm trying to use a Python Lambda function to append a new line of text to an object stored in S3. Since objects stored in S3 are immutable, you must first download the file into '/tmp/', then modify it, then upload the new version back to S3. My code appends the data; however, it does not append it on a new line.
import boto3
import botocore

BUCKET_NAME = 'mybucket'
KEY = 'test.txt'

s3 = boto3.resource('s3')

def lambda_handler(event, context):
    try:
        s3.Object(BUCKET_NAME, KEY).download_file('/tmp/test.txt')
    except botocore.exceptions.ClientError as e:
        if e.response['Error']['Code'] == "404":
            print("The object does not exist.")
        else:
            raise
    with open('/tmp/test.txt', 'a') as fd:
        fd.write("this is a new string\n")
    s3.meta.client.upload_file('/tmp/test.txt', BUCKET_NAME, KEY)
The file is always appended with the new string but never with a new line. Any ideas?
UPDATE: This problem does not occur on linux machines or on a Mac. Lambda functions run on linux containers, which means the file in /tmp/ is saved as a Unix-formatted text file. Some Windows applications will not show line breaks on Unix-formatted text files, which was the case here. I'm dumb.
You don't need to download and upload a file in order to overwrite a file in S3; to overwrite an existing object, you can simply upload a file with the same name and the replacement happens automatically (reference). Look into the put_object function (S3 doc).
So your code will look like this:
import boto3
import botocore

BUCKET_NAME = 'mybucket'
KEY = 'test.txt'

# Use .client() instead of .resource()
s3 = boto3.client('s3')

def lambda_handler(event, context):
    try:
        # (Optional) Read the object
        obj = s3.get_object(Bucket=BUCKET_NAME, Key=KEY)
        file_content = obj['Body'].read().decode('utf-8')

        # (Optional) Update the file content
        new_file_content = file_content + "this is a new string\n"

        # Write to the object
        s3.put_object(Bucket=BUCKET_NAME, Key=KEY, Body=new_file_content)
    except botocore.exceptions.ClientError as e:
        if e.response['Error']['Code'] == "404":
            print("The object does not exist.")
        else:
            raise
You need to specify the local file path
import boto3
import botocore
from botocore.exceptions import ClientError

BUCKET_NAME = 'mybucket'
KEY = 'test.txt'
LOCAL_FILE = '/tmp/test.txt'

s3 = boto3.resource('s3')

def lambda_handler(event, context):
    try:
        obj = s3.Bucket(BUCKET_NAME).download_file(LOCAL_FILE, KEY)
    except ClientError as e:
        if e.response['Error']['Code'] == "404":
            print("The object does not exist.")
        else:
            raise
    with open('/tmp/test.txt', 'a') as fd:
        fd.write("this is a new string\n")
    s3.meta.client.upload_file(LOCAL_FILE, BUCKET_NAME, KEY)
Boto3 doc reference: http://boto3.readthedocs.io/en/latest/reference/services/s3.html#S3.Bucket.download_file
Nice post!
Just an adjustment: you should swap the order of LOCAL_FILE and KEY in the parameters of the download_file method.
The correct syntax is:
obj = s3.Bucket(BUCKET_NAME).download_file(KEY, LOCAL_FILE)
It would also be good to delete the local file when the object is not found in the bucket, because if we don't remove the local file (if it exists), we may be appending a new line to an already existing local file.
With the help of this function:
def remove_local_file(filePath):
    import os
    # Check that the file exists before trying to delete it
    if os.path.exists(filePath):
        os.remove(filePath)
    else:
        print("Can not delete the file as it doesn't exist")
the final code starting in the 'try' could be like this:
try:
    obj = s3.Bucket(BUCKET_NAME).download_file(KEY, LOCAL_FILE)
except ClientError as e:
    if e.response['Error']['Code'] == "404":
        print("The object does not exist.")
        remove_local_file(LOCAL_FILE)
    else:
        raise

with open(LOCAL_FILE, 'a') as fd:
    fd.write("this is a new string\n")

s3.meta.client.upload_file(LOCAL_FILE, BUCKET_NAME, KEY)

How to get latest file-name or file from S3 bucket using event triggered lambda

I am very new to AWS services and have just a week's worth of experience with serverless architecture. My requirement is to trigger an event when a new file is uploaded to a specific bucket; once the event trigger is set, my Lambda should get the details of the latest file, such as name, size, and date of creation.
The source uploads this file into a new folder every time and names the folder with the current date.
So far I am able to crack how to create my Lambda function and listen to the event trigger.
Here is my code.
import boto3
import botocore
import datetime
import logging

def lambda_handler(event, context):
    logging.info('Start function')
    s3 = boto3.resource('s3')
    DATE = datetime.datetime.today().strftime('%Y-%m-%d')
    BUCKET_NAME = 'monkey-banana-dev'
    KEY = '/banana/incoming/daily/{}'.format(DATE)
    logging.info('Getting file from {}'.format(KEY))
    try:
        s3.Bucket(BUCKET_NAME).download_file(KEY, 'name_of_my_file')
    except botocore.exceptions.ClientError as e:
        if e.response['Error']['Code'] == "404":
            print("The object does not exist.")
        else:
            raise
Here, since I know the folder will be named with today's date, I am using datetime to build the exact KEY, but the file name will always be different. Although I know it's going to be a text file with a .txt suffix, I can't work out how to get the name of the latest uploaded file and its other details from the trigger.
You have an event object; it contains a key "Records" that is a list.
You can filter the records for eventName 'ObjectCreated:Put' and then sort the list by key "eventTime" to get the latest event data.
def lambda_handler(event, context):
    records = [x for x in event.get('Records', []) if x.get('eventName') == 'ObjectCreated:Put']
    sorted_events = sorted(records, key=lambda e: e.get('eventTime'))
    latest_event = sorted_events[-1] if sorted_events else {}

    info = latest_event.get('s3', {})
    file_key = info.get('object', {}).get('key')
    bucket_name = info.get('bucket', {}).get('name')
As was mentioned, this link has the info - http://docs.aws.amazon.com/lambda/latest/dg/eventsources.html#eventsources-s3-put
What you need to do is utilize the event object that is passed into the function. That contains the detail that is provided in the link. As you can see in the example in the link, you need to access the key. This will contain the full path, including the date that you mentioned, since the key is the full file path.
To help debug this, you can always print the value of event to the console using the print function in Python.
'Key' will contain the whole file path.
Example:

import boto3
import os

s3 = boto3.resource('s3')
bucket = s3.Bucket('hcss-heavyjob-raw-tables')

for key in bucket.objects.all():
    if key.key.startswith('heavyjob/EMPMAST'):
        print(key.key)

Output:
heavyjob/EMPMAST/20190524-165352044.csv
heavyjob/EMPMAST/20190529-153011532.csv
heavyjob/EMPMAST/LOAD00000001.csv
You can get the file name by using os.path.basename on key.key, or:

head, tail = os.path.split(key.key)
print(tail)
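For instance, with one of the keys from the output above (an illustrative snippet, not part of the original answer), either approach yields just the file name:

import os

key = 'heavyjob/EMPMAST/20190524-165352044.csv'

print(os.path.basename(key))     # 20190524-165352044.csv
head, tail = os.path.split(key)
print(tail)                      # 20190524-165352044.csv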
