Unable to use S3 trigger to transfer S3 objects to RDS - Python

I have the Lambda function code below that transfers objects from an S3 bucket to an AWS RDS database.
import json
import boto3
import pymysql

s3_client = boto3.client('s3')

def lambda_handler(event, context):
    bucket_name = event["bucket"]
    s3_file_name = event["object"]
    resp = s3_client.get_object(Bucket=bucket_name, Key=s3_file_name)
    data = resp['Body']

    rds_endpoint = ""
    username = ""   # username for RDS MySQL (redacted)
    password = ""   # RDS MySQL password (redacted)
    db_name = ""    # RDS MySQL DB name (redacted)

    conn = None
    try:
        conn = pymysql.connect(host=rds_endpoint, user=username, password=password, database=db_name)
    except pymysql.MySQLError as e:
        print("ERROR: Unexpected error: Could not connect to MySQL instance.")

    try:
        cur = conn.cursor()
        cur.execute(...)  # db stuff (redacted)
        conn.commit()
    except Exception as e:
        print(e)
        return 'Table not created!'

    with conn.cursor() as cur:
        try:
            cur.execute(...)  # db stuff (redacted)
            conn.commit()
            output = cur.execute()
        except:
            output = "Entry not inputted! Error!"

    print("Deleting the csv file from s3 bucket")
    return {
        'statusCode': 200,
        'body': 'Successfully uploaded!'
    }
The code above works fine with this test event:
{
  "bucket": "python-bucket",
  "object": "bobmarley.mp3"
}
However, when I try to adapt it to the S3 trigger by changing these lines as shown in this tutorial: https://www.data-stats.com/s3-data-ingestion-to-rds-through-lambda/
bucket_name = event["Records"][0]["s3"]["bucket"]["name"]
s3_file_name = event["Records"][0]["s3"]["object"]["key"]
I get this error:
[ERROR] TypeError: list indices must be integers or slices, not str
Traceback (most recent call last):
  File "/var/task/lambda_function.py", line 7, in lambda_handler
    bucket_name = event["Records"]["s3"]["bucket"]["name"]

Related

Boto3 throws exception when uploading file but the file is saved in S3

I need to upload files using boto3 with Flask. I have the following method to upload the files and I want to return the path of the file within S3.
utils.py
import io
import boto3
from app import app

s3 = boto3.client(
    "s3",
    aws_access_key_id=app.config['S3_KEY'],
    aws_secret_access_key=app.config['S3_SECRET']
)

def upload_file(file, bucket_name, acl=app.config['AWS_DEFAULT_ACL']):
    try:
        s3.upload_fileobj(
            file,
            bucket_name,
            f"Profile_Photos/{file.filename}",
            ExtraArgs={
                "ACL": 'private',
                "ContentType": file.content_type
            }
        )
        return "{}{}".format(app.config['S3_LOCATION', file.filename])
    except Exception as e:
        print("Something was wrong: ", e)  # This exception is thrown
        return e
In the main module I have the following:
main.py
@app.route('/registro', methods=['POST'])
def register():
    conn = None
    cursor = None
    try:
        username = request.form.get('user', None)
        password = request.form.get('password', None)
        if username and password:
            hashed_password = hashlib.md5(password.encode()).hexdigest()
            sql = "INSERT INTO Users(username, password) VALUES (%s, %s)"
            data = (username, hashed_password)
            conn = mySQL.connect()
            cursor = conn.cursor()
            cursor.execute(sql, data)
            conn.commit()
            if 'current_photo' in request.files:
                file = request.files['current_photo']
                if file.filename != '':
                    file_name = os.path.splitext(file.filename)[0]
                    extension = file.filename.split('.')[-1]
                    new_name = "{}_{}.{}".format(file_name, username, extension)
                    file.filename = new_name
                    file.filename = secure_filename(file.filename)
                    print("Before")
                    path = upload_file(file, app.config['S3_BUCKET'])  # Error occurs here
                    print("After")
            res = jsonify('User created.')
            res.status_code = 200
            return res
    except Exception as e:
        print(e)
        return 'Error'

if __name__ == '__main__':
    app.run()
The problem is that when the code runs, an exception is always thrown in the upload_file method, even though the photo is uploaded to S3. The message doesn't seem very informative:
Something was wrong ('S3_LOCATION', 'profile_user2.jpg').
What does that message mean and why is the exception being thrown?
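The printed message is most likely the KeyError raised by app.config['S3_LOCATION', file.filename]: the square brackets look up the tuple ('S3_LOCATION', 'profile_user2.jpg') as a single config key, Flask's config has no such key, and printing the exception shows only that tuple. The upload itself has already succeeded by that point, which is why the object still appears in S3. A sketch of what the return line was presumably meant to be, assuming S3_LOCATION holds the bucket's base URL:

# Hypothetical fix: read the config value first, then concatenate the object key.
return "{}Profile_Photos/{}".format(app.config['S3_LOCATION'], file.filename)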

KeyError while loading CSV into DynamoDB

Yesterday my code was working and inserting my CSV into DynamoDB. Today it cannot identify the bucket_name. Yesterday the event was visible in the CloudWatch logs while uploading, but today it is not.
import boto3

s3_client = boto3.client('s3')
dynamodb = boto3.resource('dynamodb')

def lambda_handler(event, context):
    bucket_name = event['Records'][0]['s3']['bucket']['name']
    #bucket_name = event['query']['Records'][0]['s3']['bucket']['name']
    print(bucket_name)
    s3_file_name = event['Records'][0]['s3']['object']['key']
    resp = s3_client.get_object(Bucket=bucket_name, Key=s3_file_name)
    data = resp['Body'].read().decode('utf-8')
    employees = data.split("\n")
    table = dynamodb.Table('employees')
    for emp in employees:
        emp_data = emp.split(',')
        print(emp_data)
        try:
            table.put_item(
                Item={
                    "emp_id": emp_data[0],
                    "Name": emp_data[1],
                    "Company": emp_data[2]
                }
            )
        except Exception as e:
            print('end of file')
    return 'files saved to Dynamodb'
Today I got the error below:
Response:
{
  "errorMessage": "'Records'",
  "errorType": "KeyError",
  "stackTrace": [
    "  File \"/var/task/lambda_function.py\", line 7, in lambda_handler\n    bucket_name = event['Records'][0]['s3']['bucket']['name']\n"
  ]
}
The error means that the event passed to your handler does not contain a Records key. This typically happens when the function is invoked with something other than an S3 notification, for example the console's default test event.
To check for this and protect against the error you can do the following:
def lambda_handler(event, context):
    if 'Records' not in event:
        # execute some operations that you want
        # in case there are no Records
        # in the event
        return

    # continue processing Records if
    # they are available
    event['Records'][0]['s3']['bucket']['name']
    # the rest of your code
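For local or console testing, a hand-built S3-style event with the Records structure avoids the KeyError; a minimal sketch (bucket and key are placeholders):

# Hand-built S3 notification shape for testing; bucket and key are placeholders.
fake_event = {
    "Records": [
        {
            "s3": {
                "bucket": {"name": "my-test-bucket"},
                "object": {"key": "employees.csv"}
            }
        }
    ]
}

# Exercises the Records branch of the handler without a real S3 upload.
lambda_handler(fake_event, None)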

Snowflake python-connector; Error 604 when issuing multiple requests

I have an Azure Function that sends queries to Snowflake using the Python snowflake-connector. It opens a connection, creates a cursor, and sends the query with _no_results=True, so it does not wait to check whether the query was successful. When I run it on its own it works fine. However, when I use it to run multiple queries at once, some queries randomly fail with status code 604: Query Execution was canceled. Is there some sort of concurrency limit that I'm hitting? I cannot find any information in the documentation. The queries being sent are very simple (truncate table x) and are not timing out.
My code is attached below.
import logging
import json
import time
import gc
from flatten_json import flatten
import os
import snowflake.connector
import azure.functions as func

def main(req: func.HttpRequest) -> func.HttpResponse:
    logging.info('Python HTTP trigger function processed a request.')

    # Deserialize request body
    req_body = req.get_json()
    logging.info('Deserialized input successfully. Request body: ' + json.dumps(req_body))

    # Create result JSON to be returned to caller
    result = {
        "TaskName": req_body['TaskName'],
        "Status": "Complete",
        "TaskKey": req_body['TaskKey'],
        "Query_ID": "",
        "Session_ID": ""
    }

    # Create the Snowflake parameters for connection
    USER = <sfusername>
    PASSWD = <sfpw>
    ACCOUNT = <sfAcc>
    WAREHOUSE = <sfwh>
    DATABASE = <sfdb>
    logging.info('Connection string created')

    copy_sql_statement = create_sql_statement(req_body)
    logging.info('Insert SQL Statement: ' + copy_sql_statement)
    logging.info('Attempting to Connect to Snowflake...')
    try:
        # Try to connect to Snowflake
        connection = snowflake.connector.connect(user=USER, password=PASSWD, account=ACCOUNT, warehouse=WAREHOUSE, database=DATABASE)
        logging.info('Connection Successful')
    except Exception as e:
        raise e

    logging.info('Try block for query_snowflake started.')
    try:
        # Call function to execute copy into
        output_list = query_snowflake(req_body, connection, copy_sql_statement)  # return queryid and sessionid from Snowflake
        queryid = output_list[0]
        sessionid = output_list[1]
        result['Query_ID'] = queryid
        result['Session_ID'] = sessionid
        logging.info('Query sent to Snowflake Successfully.')
        return func.HttpResponse(json.dumps(result), status_code=200)
    except Exception as e:
        result['Status'] = 'Failed'
        result['Error_Message'] = str(e)
        logging.info('Copy Into function failed. Error: ' + str(e))
        return func.HttpResponse(json.dumps(result), status_code=400)

def create_sql_statement(req_body):
    # Replace TaskKey and CDCMin
    copy_sql_statement = req_body['InsertSQL'].replace('#TaskKey', req_body['TaskKey']).replace('#CDCMinDate', req_body['CDCMinDate']).replace('#CDCMaxDate', req_body['CDCMaxDate'])
    return copy_sql_statement

def query_snowflake(req_body, connection, copy_sql_statement):
    try:
        # Execute copy into statement
        cur = connection.cursor()
        sessionid = cur.execute("select current_session()").fetchone()
        cur.execute(copy_sql_statement, _no_results=True)
        #connection.execute('COMMIT;')
        return [cur.sfqid, sessionid[0]]  # return queryid and sessionid as list for result body
    except Exception as e:
        raise e
    #finally:
        # Close and dispose connection
        cur.close()
        connection.close()
NEW CODE:
import logging
import json
import time
import gc
from flatten_json import flatten
import os
import snowflake.connector
import azure.functions as func

def main(req: func.HttpRequest) -> func.HttpResponse:
    logging.info('Python HTTP trigger function processed a request.')

    # Deserialize request body
    req_body = req.get_json()
    logging.info('Deserialized input successfully. Request body: ' + json.dumps(req_body))

    # Create result JSON to be returned to caller
    result = {
        "TaskName": req_body['TaskName'],
        "Status": "Complete",
        "TaskKey": req_body['TaskKey'],
        "Query_ID": "",
        "Session_ID": "",
        "Error_Message": ""
    }

    # Create the Snowflake parameters for connection
    USER = <sfusername>
    PASSWD = <sfpw>
    ACCOUNT = <sfAcc>
    WAREHOUSE = <sfwh>
    DATABASE = <sfdb>
    logging.info('Connection string created')

    copy_sql_statement = create_sql_statement(req_body)
    logging.info('SQL Statement: ' + copy_sql_statement)
    logging.info('Attempting to Connect to Snowflake...')
    try:
        # Try to connect to Snowflake
        connection = snowflake.connector.connect(user=USER, password=PASSWD, account=ACCOUNT, warehouse=WAREHOUSE, database=DATABASE)
        logging.info('Connection Successful')
    except Exception as e:
        raise e

    logging.info('Try block for send query started.')
    try:
        # Call function to execute copy into
        logging.info('Sending Query to Snowflake...')
        output_list = query_snowflake(req_body, connection, copy_sql_statement)  # return queryid and sessionid from Snowflake
        queryid = output_list[0]
        sessionid = output_list[1]
        result['Query_ID'] = queryid
        result['Session_ID'] = sessionid

        logging.info('Ensuring Query was Sent...')
        status_stmt = create_status_statement(queryid, sessionid)
        for x in range(1, 14):  # it will try for 3.5min in case query is pending
            time.sleep(5)
            returnValues = get_query_status(status_stmt, connection)
            # check result, if error code 604 we know the query canceled.
            if returnValues[1] == '604':
                result['Status'] = 'Failed'
                result['Error_Message'] = 'SQL Execution Canceled'
                return func.HttpResponse(json.dumps(result), status_code=400)
            # if it's anything but pending, we know the query was sent to Snowflake;
            # the 2nd function worries about the result
            elif returnValues[0] != 'PENDING':
                result['Status'] = returnValues[0]
                logging.info('Query sent to Snowflake Successfully.')
                return func.HttpResponse(json.dumps(result), status_code=200)
            else:
                logging.info('Loop ' + str(x) + ' completed, trying again...')
                time.sleep(10)

        # if it exits the for loop, mark success, let 2nd function surface any failures.
        result['Status'] = 'Success'
        return func.HttpResponse(json.dumps(result), status_code=200)
    except Exception as e:
        result['Status'] = 'Failed'
        result['Error_Message'] = str(e)
        logging.info('Copy Into function failed. Error: ' + str(e))
        return func.HttpResponse(json.dumps(result), status_code=400)

def create_sql_statement(req_body):
    # Replace TaskKey and CDCMin
    copy_sql_statement = req_body['InsertSQL'].replace('#TaskKey', req_body['TaskKey']).replace('#CDCMinDate', req_body['CDCMinDate']).replace('#CDCMaxDate', req_body['CDCMaxDate'])
    return copy_sql_statement

def query_snowflake(req_body, connection, copy_sql_statement):
    try:
        # Execute copy into statement
        cur = connection.cursor()
        sessionid = cur.execute("select current_session()").fetchone()
        cur.execute(copy_sql_statement, _no_results=True)
        # return queryid and sessionid as list for result body
        return [cur.sfqid, sessionid[0]]
    except Exception as e:
        raise e

def create_status_statement(queryid, sessionid):
    sql_statement = "SELECT execution_status, error_code, query_id \
        FROM TABLE(INFORMATION_SCHEMA.QUERY_HISTORY_BY_SESSION(SESSION_ID => " + sessionid + ")) \
        WHERE QUERY_ID = '" + queryid + "'"
    return sql_statement

def get_query_status(etl_sql_statement, conn):
    QueryStatus = ''
    ErrorCode = ''
    QueryID = ''
    try:
        # Execute sql statement.
        cur = conn.cursor()
        Result = cur.execute(etl_sql_statement)
        row = Result.fetchone()
        if row is None:
            ErrorCode = 'PENDING'
        else:
            QueryStatus = str(row[0])
            ErrorCode = str(row[1])
            QueryID = str(row[2])
    except Exception as e:
        logging.info('Failed to get query status. Error: ' + str(e))
        raise e
    finally:
        # Close and dispose cursor
        cur.close()
    return (QueryStatus, ErrorCode, QueryID)
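As a side note, recent versions of the Snowflake Python connector (roughly 2.4 and later) expose an asynchronous-query API that can replace the _no_results=True flag and the hand-rolled QUERY_HISTORY polling above. A minimal sketch, assuming such a connector version; the connection parameters are placeholders:

import time
import snowflake.connector
from snowflake.connector.constants import QueryStatus

# Placeholder credentials; substitute your own account details.
connection = snowflake.connector.connect(
    user="<user>", password="<password>", account="<account>",
    warehouse="<warehouse>", database="<database>")

cur = connection.cursor()
# Submit the statement without waiting for it to finish.
cur.execute_async("TRUNCATE TABLE MY_SCHEMA.MY_TABLE")
query_id = cur.sfqid

# Poll the server-side status of that query id.
while connection.is_still_running(connection.get_query_status(query_id)):
    time.sleep(2)

# Raises a ProgrammingError if the query ended in an error state.
status = connection.get_query_status_throw_if_error(query_id)
print(query_id, status == QueryStatus.SUCCESS)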

How to read a CSV file from s3 and write the content in RDS database table using python lambda function?

I have a CSV file, Employee.csv, in an S3 bucket with all the info about employees: name, age, salary, designation.
I have to write a Python Lambda function that reads this file and writes it to an RDS database: it should create a table called Employee with columns name, age, salary, designation, and the rows should contain the data.
Employee.csv is just an example; it can actually be any CSV file with any number of columns in it.
from __future__ import print_function
import boto3
import logging
import os
import sys
import uuid
import pymysql
import csv
import rds_config

rds_host = rds_config.rds_host
name = rds_config.db_username
password = rds_config.db_password
db_name = rds_config.db_name

logger = logging.getLogger()
logger.setLevel(logging.INFO)

try:
    conn = pymysql.connect(rds_host, user=name, passwd=password, db=db_name, connect_timeout=5)
except Exception as e:
    logger.error("ERROR: Unexpected error: Could not connect to MySQL instance.")
    logger.error(e)
    sys.exit()

logger.info("SUCCESS: Connection to RDS MySQL instance succeeded")

s3_client = boto3.client('s3')

def handler(event, context):
    bucket = event['Records'][0]['s3']['bucket']['name']
    key = event['Records'][0]['s3']['object']['key']
    download_path = '/tmp/{}{}'.format(uuid.uuid4(), key)
    s3_client.download_file(bucket, key, download_path)
    csv_data = csv.reader(open(download_path))
    with conn.cursor() as cur:
        for idx, row in enumerate(csv_data):
            logger.info(row)
            try:
                cur.execute('INSERT INTO target_table(name, age, salary, designation)'
                            ' VALUES("%s", "%s", "%s", "%s")',
                            row)
            except Exception as e:
                logger.error(e)
            if idx % 100 == 0:
                conn.commit()
        conn.commit()
    return 'File loaded into RDS: ' + str(download_path)
Here is the code which is working for me now:
s3 = boto3.resource('s3')
file_object = event['Records'][0]
key = str(file_object['s3']['object']['key'])
obj = s3.Object(bucket, key)
content_lines = obj.get()['Body'].read().decode('utf-8').splitlines(True)
tableName = key.strip('folder/').strip('.csv')

with conn.cursor() as cur:
    try:
        cur.execute('TRUNCATE TABLE ' + tableName)
    except Exception as e:
        print("ERROR: Unexpected error: Table does not exist.")
        sys.exit()

    header = True
    for row in csv.reader(content_lines):
        if header:
            numberOfColumns = len(row)
            columnNames = str(row).replace('[', '').replace(']', '').replace("'", '')
            print("columnNames: " + columnNames)
            values = '%s'
            numberOfValues = 1
            while numberOfValues < numberOfColumns:
                values = values + ",%s"
                numberOfValues += 1
            print("INSERT into " + tableName + "(" + columnNames + ") VALUES(" + values + ")")
            header = False
        else:
            try:
                cur.execute('INSERT into ' + tableName + '(' + columnNames + ') VALUES(' + values + ')', row)
            except Exception as e:
                raise e
    conn.commit()
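One caution about the working snippet: str.strip('.csv') removes any leading or trailing characters from the set {'.', 'c', 's', 'v'} rather than the literal suffix, so a key like folder/services.csv ends up as ervice. A small sketch of a more predictable way to derive the table name from the object key, using only the standard library (the single-folder layout is an assumption):

import os
import urllib.parse

def table_name_from_key(key):
    # Keys arrive URL-encoded in the S3 event, so decode them first.
    decoded = urllib.parse.unquote_plus(key)
    # 'folder/services.csv' -> 'services.csv' -> ('services', '.csv')
    base = os.path.basename(decoded)
    name, _ext = os.path.splitext(base)
    return name

print(table_name_from_key('folder/services.csv'))  # services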

AWS Lambda RDS MySQL DB Connection InterfaceError

When I try to connect to AWS RDS (MySQL), most of the time I receive an InterfaceError. When I edit the Lambda code and re-run, it will work fine the first time, but then the same error occurs.
My code:
import sys
import logging
import pymysql
import json
import traceback

rds_host = "*****.rds.amazonaws.com"
name = "*****"
password = "****"
db_name = "myDB"

logger = logging.getLogger()
logger.setLevel(logging.INFO)

try:
    conn = pymysql.connect(rds_host, user=name, passwd=password, db=db_name, connect_timeout=5)
except:
    logger.error("ERROR: Unexpected error: Could not connect to MySQL instance.")
    sys.exit()

logger.info("SUCCESS: Connection to RDS mysql instance succeeded")

def handler(event, context):
    sub = event['sub']
    username = event['username']
    givenname = event['givenname']
    isAdmin = event['isAdmin']
    print(sub)
    print(username)
    print(givenname)
    print(isAdmin)

    data = {}
    cur = conn.cursor()
    try:
        cmd = "SELECT AuthState FROM UserInfo WHERE UserName=" + "\'" + username + "\'"
        rowCnt = cur.execute(cmd)
        print(cmd)
    except:
        print("ERROR: DB Query Execution failed.")
        traceback.print_exc()
        data['errorMessage'] = 'Internal server error'
        response = {}
        response['statusCode'] = 500
        response['body'] = data
        return response

    if rowCnt <= 0:
        print(username)
        data['errorMessage'] = 'No User Name Found'
        response = {}
        response['statusCode'] = 400
        response['body'] = data
        conn.close()
        return response

    for row in cur:
        print(row[0])
        if int(row[0]) == 0:    # NOT_AUTHORIZED
            ret = "NOT_AUTHORIZED"
        elif int(row[0]) == 1:  # PENDING
            ret = "PENDING"
        elif int(row[0]) == 2:  # AUTHORIZED
            ret = "AUTHORIZED"
        else:                   # BLOCKED
            ret = "BLOCKED"

    data['state'] = ret
    response = {}
    response['statusCode'] = 200
    response['body'] = data
    conn.close()
    return response
The stacktrace:
Traceback (most recent call last):
  File "/var/task/app.py", line 37, in handler
  File "/var/task/pymysql/connections.py", line 851, in query
    self._execute_command(COMMAND.COM_QUERY, sql)
  File "/var/task/pymysql/connections.py", line 1067, in _execute_command
    raise err.InterfaceError("(0, '')")
InterfaceError: (0, '')
Read Understanding Container Reuse in Lambda.
It was written about Node but is just as accurate for Python.
Your code doesn't run from the top with each invocation. Sometimes it starts with the handler.
Why? It's faster.
How do you know when this will happen? You don't... except for each time you redeploy the function, of course, you'll always get a fresh container on the first invocation, because the old containers would have been abandoned by the redeploy.
If you're going to do your DB connection outside the handler, don't call conn.close(), because on the next invocation of the function, you might find your container is still alive, and the handler is invoked with an already-closed database handle.
You have to write Lambda functions so that they neither fail if a container is reused, nor fail if a container is not reused.
The simpler solution is to open the DB connection inside the handler. The more complex but more efficient solution (in terms of runtime) is to never close it, so that it can potentially be reused.
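A minimal sketch of the reuse-friendly pattern described above, assuming pymysql and placeholder connection details: the connection is created once outside the handler so a warm container can reuse it, ping(reconnect=True) re-establishes it if the cached socket has gone stale, and nothing ever calls conn.close():

import pymysql

# Placeholder connection details.
RDS_HOST = "*****.rds.amazonaws.com"
DB_USER = "*****"
DB_PASSWORD = "*****"
DB_NAME = "myDB"

# Created once per container and reused across warm invocations.
conn = pymysql.connect(host=RDS_HOST, user=DB_USER, password=DB_PASSWORD,
                       database=DB_NAME, connect_timeout=5)

def handler(event, context):
    # Reconnect transparently if the cached connection has gone stale.
    conn.ping(reconnect=True)
    with conn.cursor() as cur:
        cur.execute("SELECT AuthState FROM UserInfo WHERE UserName = %s",
                    (event['username'],))
        row = cur.fetchone()
    # The connection is intentionally left open for container reuse.
    return {'statusCode': 200, 'body': {'state': row[0] if row else None}}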
