To get my feet wet with deploying Lambda functions on AWS, I wrote what I thought was a simple one: calculate the current time and write it to a table in RDS. A separate process triggers the Lambda every 5 minutes. For a few hours it works as expected, but then the connection starts to hang. After a few more hours, it magically works again. The error message is:
(pymysql.err.OperationalError) (2003, "Can't connect to MySQL server on '<server info redacted>' (timed out)")
(Background on this error at: http://sqlalche.me/e/e3q8)
I don't think the issue is with the VPC, or else the Lambda wouldn't run at all. I have tried defining the connection outside the lambda handler (as many suggest), inside the handler, and closing the connection after every run; now I have the main code running in a separate helper function that the lambda handler calls (a simplified sketch of the "outside the handler" variation is included after the code below). The connection is created, used, closed, and even explicitly deleted within a try/except block. Still, the intermittent connection issue persists. At this point, I am not sure what else to try. Any help would be appreciated; thank you! Code is below:
import numpy as np
import pandas as pd
import sqlalchemy
from sqlalchemy import event as eventSA

# This function is something I added to fix an issue with writing floats
def add_own_encoders(conn, cursor, query, *args):
    cursor.connection.encoders[np.float64] = lambda value, encoders: float(value)

def writeTestRecord(event):
    try:
        # engine and connection are created fresh on every invocation
        connStr = "mysql+pymysql://user:pwd@server.host:3306/db"
        engine = sqlalchemy.create_engine(connStr)
        eventSA.listen(engine, "before_cursor_execute", add_own_encoders)
        conn = engine.connect()
        timeRecorded = pd.Timestamp.now().tz_localize("GMT").tz_convert("US/Eastern").tz_localize(None)
        s = pd.Series([timeRecorded,])
        s = s.rename("timeRecorded")
        s.to_sql('lambdastest', conn, if_exists='append', index=False, dtype={'timeRecorded': sqlalchemy.types.DateTime()})
        conn.close()
        del conn
        del engine
        return {
            'success': 'true',
            'dateTimeRecorded': timeRecorded.strftime("%Y-%m-%d %H:%M:%S")
        }
    except Exception:
        conn.close()
        del conn
        del engine
        return {
            'success': 'false'
        }

def lambda_handler(event, context):
    toReturn = writeTestRecord(event)
    return toReturn
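For reference, the "connection outside the handler" variation looked roughly like this. This is a simplified sketch rather than the exact code; pool_pre_ping is just one commonly suggested option, not something from the original function:

# engine defined at module scope, so it is reused across warm invocations of the same Lambda container
import sqlalchemy

connStr = "mysql+pymysql://user:pwd@server.host:3306/db"
engine = sqlalchemy.create_engine(connStr, pool_pre_ping=True)  # pool_pre_ping: optional liveness check before each checkout

def lambda_handler(event, context):
    with engine.connect() as conn:
        result = conn.execute(sqlalchemy.text("SELECT NOW()"))
        return {'now': str(result.scalar())}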
Related
I'm getting quite a few timeouts while my blob storage trigger is running. It seems to time out whenever I'm inserting values into an Azure SQL DB. I have the functionTimeout parameter in host.json set to "functionTimeout": "00:40:00", yet I'm seeing timeouts happen within a couple of minutes. Why would this be the case? My function app is on the ElasticPremium pricing tier.
System.TimeoutException message:
Exception while executing function: Functions.BlobTrigger2 The operation has timed out.
My connection to the db (I close it at the end of the script):
import urllib.parse
from sqlalchemy import create_engine

# urllib.parse.quote_plus for python 3
params = urllib.parse.quote_plus(fr'Driver={DRIVER};Server=tcp:{SERVER_NAME},1433;Database=newTestdb;Uid={USER_NAME};Pwd={PASSWORD};Encrypt=yes;TrustServerCertificate=no;Connection Timeout=0;')
conn_str = 'mssql+pyodbc:///?odbc_connect={}'.format(params)
engine_azure = create_engine(conn_str, echo=True)
conn = engine_azure.connect()
This is the line of code that is run before the timeout happens (Inserting to db):
processed_df.to_sql(blob_name_file.lower(), conn, if_exists = 'append', index=False, chunksize=500)
I have a function that connects to a MySQL db and executes a query that takes quite long (approx. 10 min):
import pandas as pd
import sqlalchemy
from sqlalchemy.pool import QueuePool

def foo(connections_string):  # connections_string is something like "mysql://user:key@host/db"
    statement = "SELECT * FROM largtable"
    conn = None
    df = None
    try:
        engine = sqlalchemy.create_engine(
            connections_string,
            connect_args={
                "connect_timeout": 1500,
            },
            poolclass=QueuePool,
            pool_pre_ping=True,
            pool_size=10,
            pool_recycle=3600,
            pool_timeout=900,
        )
        conn = engine.connect()
        df = pd.read_sql_query(statement, conn)
    except Exception:
        raise Exception("could not load data")
    finally:
        if conn:
            conn.close()
    return df
When I run this in my local environment, it works and takes about 600 seconds. When I run it via Airflow, it fails after about 5 to 6 minutes with the error (_mysql_exceptions.OperationalError) (2013, 'Lost connection to MySQL server during query').
I have tried the suggestions on Stack Overflow to adjust the SQLAlchemy timeouts (e.g., this and this) and from the SQLAlchemy docs, which led to the additional arguments (the pool_* and connect_args parameters) to the create_engine() call above. However, these didn't seem to have any effect at all.
I've also tried replacing SQLAlchemy with pymysql, which led to the same error on Airflow. Thus, I didn't try flask-sqlalchemy yet, since I expect the same result.
Since it works in basically the same environment (Python 3.7.x, SQLAlchemy 1.3.3 and pandas 1.3.x) when not run by Airflow, but fails when run by Airflow, I think there is some global setting that overrides my timeout settings. But I have no idea where to start the search.
And some additional info, in case somebody can work with it: I did get it running through Airflow twice, both times during off-hours (5 am and Sundays), but not again since.
PS: Unfortunately, pagination as suggested here is not an option, since the query runtime results from transformations and calculations.
I am trying to run a copy command which loads around 100 GB of data from S3 to Redshift. I am using a Lambda function to initiate this copy command every day. This is my current code:
from datetime import datetime, timedelta
import dateutil.tz
import psycopg2
from config import *

def lambda_handler(event, context):
    con = psycopg2.connect(dbname=dbname, user=user, password=password, host=host, port=port)
    cur = con.cursor()
    try:
        query = """BEGIN TRANSACTION;
        COPY """ + table_name + """ FROM '""" + intermediate_path + """' iam_role '""" + iam_role + """' FORMAT AS parquet;
        END TRANSACTION;"""
        print(query)
        cur.execute(query)
    except Exception as e:
        subject = "Error emr copy: {}".format(str(datetime.now().date()))
        body = "Exception occurred " + str(e)
        print(body)
    con.close()
This function runs fine, but the only problem is that after the 15 minute timeout of the Lambda function, the copy command also stops executing in Redshift. Therefore, I cannot finish my copy load from S3 to Redshift.
I also tried to include the statement_timeout statement below after the begin statement and before the copy command. It didn't help.
SET statement_timeout to 18000000;
Can someone suggest how I can solve this issue?
The AWS documentation isn't explicit about what happens when a timeout occurs, but I think it's safe to say that the function transitions into the "Shutdown" phase, at which point the runtime container is forcibly terminated by the environment.
What this means is that the socket connection used by the database connection will be closed, and the Redshift process that is listening to that socket will receive an end-of-file -- a client disconnect. The normal behavior of any database in this situation is to terminate any outstanding queries and rollback their transactions.
The reason I gave that description is to let you know that you can't extend the life of a query beyond the life of the Lambda that initiates it. If you want to stick with using a database connection library, you will need to use a service that doesn't time out: AWS Batch and ECS are two options.
But, there's a better option: the Redshift Data API, which is supported by Boto3.
This API operates asynchronously: you submit a query to Redshift and get back an identifier that can be used to check on the query's status. You can also instruct Redshift to send a message to AWS EventBridge when the query completes or fails (so you can create another Lambda to take appropriate action).
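For illustration, a minimal sketch of that flow with boto3; the cluster, database, user, and SQL below are placeholders, not values from the question:

import boto3

client = boto3.client('redshift-data')

# submit the COPY asynchronously; execute_statement returns immediately with an Id
resp = client.execute_statement(
    ClusterIdentifier='my-cluster',   # placeholder
    Database='mydb',                  # placeholder
    DbUser='my_user',                 # placeholder
    Sql="COPY my_table FROM 's3://my-bucket/prefix/' IAM_ROLE 'arn:aws:iam::123456789012:role/MyRole' FORMAT AS PARQUET;",
)
statement_id = resp['Id']

# later (here, or from another Lambda), poll the statement's status
status = client.describe_statement(Id=statement_id)['Status']
print(statement_id, status)  # e.g. SUBMITTED, STARTED, FINISHED, FAILED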
I recommend using the Redshift Data API in Lambda to load data into Redshift from S3.
You can get rid of the psycopg2 package and use the built-in boto3 package in Lambda.
This will run the copy query asynchronously, so the Lambda function won't take more than a few seconds to run it.
I use sentry_sdk to get notifications of runtime errors from Lambda.
import boto3
import sentry_sdk
from sentry_sdk.integrations.aws_lambda import AwsLambdaIntegration

sentry_sdk.init(
    "https://aaaaaa@aaaa.ingest.sentry.io/aaaaaa",
    integrations=[AwsLambdaIntegration(timeout_warning=True)],
    traces_sample_rate=0
)

def execute_redshift_query(sql):
    data_client = boto3.client('redshift-data')
    data_client.execute_statement(
        ClusterIdentifier='redshift-cluster-test',
        Database='db',
        DbUser='db_user',
        Sql=sql,
        StatementName='Test query',
        WithEvent=True,
    )

def handler(event, context):
    query = """
        copy schema.test_table
        from 's3://test-bucket/test.csv'
        IAM_ROLE 'arn:aws:iam::1234567890:role/TestRole'
        region 'us-east-1'
        ignoreheader 1 csv delimiter ','
    """
    execute_redshift_query(query)
    return True
And another Lambda function to send an error notification if the copy query fails.
You can add an EventBridge trigger on that Lambda using a rule along the lines of the sketch below.
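This is roughly the event pattern such a rule might use. The source and detail-type strings are my assumption about how Redshift Data API status events arrive and should be verified in the EventBridge console:

import json
import boto3

events_client = boto3.client('events')
events_client.put_rule(
    Name='redshift-data-statement-status',  # hypothetical rule name
    EventPattern=json.dumps({
        # assumed values for Redshift Data API status-change events; verify before relying on them
        "source": ["aws.redshift-data"],
        "detail-type": ["Redshift Data Statement Status Change"],
    }),
)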
Here is lambda code to send error notification.
import sentry_sdk
from sentry_sdk.integrations.aws_lambda import AwsLambdaIntegration

sentry_sdk.init(
    "https://aaaa@aaa.ingest.sentry.io/aaaaa",
    integrations=[AwsLambdaIntegration(timeout_warning=True)],
    traces_sample_rate=0
)

def lambda_handler(event, context):
    try:
        if event["detail"]["state"] != "FINISHED":
            raise ValueError(str(event))
    except Exception as e:
        sentry_sdk.capture_exception(e)
    return True
You can identify which copy query failed by using StatementName defined in the first lambda function.
Hope it is helpful.
I'm using Python to try to connect to a DB. This code worked, but something in my environment changed so that the host is no longer present/accessible. This is as expected. The thing I'm trying to work out is that I can't seem to catch the error when this happens. This is my code:
def create_db_connection(self):
    try:
        message('try...')
        DB_HOST = os.environ['DB_HOST']
        DB_USERNAME = os.environ['DB_USERNAME']
        DB_PASSWORD = os.environ['DB_PASSWORD']
        message('connecting...')
        db = mysql.connector.connect(
            host=DB_HOST,
            user=DB_USERNAME,
            password=DB_PASSWORD,
            auth_plugin='mysql_native_password'
        )
        message('connected...')
        return db
    except mysql.connector.Error as err:
        log.info('bad stuff happened...')
        log.info("Something went wrong: {}".format(err))
        message('exception connecting...')
    except Exception as ex:
        log.info('something bad happened')
        message("Exception: {}".format(ex))
    message('returning false connection...')
    return False
I see up to the message('connecting...') call, but nothing afterwards. Also, I don't see any of the except messages/logs at all.
Is there something else I need to catch/check in order to know that a DB connection attempt has failed?
This is running inside an AWS Lambda and was working until I changed some subnets/etc. The key thing is I want to catch it no longer being able to connect.
The issue is most likely that your lambda function is timing out before the database connection is timing out.
First, modify the Lambda function's timeout to 60 seconds and test. You should find that after about 30 seconds the connection to the database times out.
To resolve this issue, modify the security group on the database instance to include the security group configured for the Lambda, and use that entry to open the correct port, 3306.
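As a side note on catching the failure in code: a minimal sketch, assuming mysql-connector-python's connection_timeout argument and the question's environment variables, that makes a failed connection surface as a catchable exception well before the Lambda itself times out:

import mysql.connector

try:
    db = mysql.connector.connect(
        host=DB_HOST,            # values from the question's environment variables
        user=DB_USERNAME,
        password=DB_PASSWORD,
        connection_timeout=5,    # fail fast instead of hanging until the Lambda is killed
    )
except mysql.connector.Error as err:
    print("connection failed: {}".format(err))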
I'm creating a RESTful API which needs to access the database. I'm using Restish, Oracle, and SQLAlchemy. However, I'll try to frame my question as generically as possible, without taking Restish or other web APIs into account.
I would like to be able to set a timeout for a connection executing a query. This is to ensure that long running queries are abandoned, and the connection discarded (or recycled). This query timeout can be a global value, meaning, I don't need to change it per query or connection creation.
Given the following code:
import cx_Oracle
import sqlalchemy.pool as pool

conn_pool = pool.manage(cx_Oracle)
conn = conn_pool.connect("username/p4ss@dbname")
conn.ping()
try:
    cursor = conn.cursor()
    cursor.execute("SELECT * FROM really_slow_query")
    print(cursor.fetchone())
finally:
    cursor.close()
How can I modify the above code to set a query timeout on it?
Will this timeout also apply to connection creation?
This is similar to what java.sql.Statement's setQueryTimeout(int seconds) method does in Java.
Thanks
For the query, you can use a timer together with a conn.cancel() call.
Something along these lines:
import threading

t = threading.Timer(timeout, conn.cancel)  # cancel the Oracle call if it runs longer than `timeout` seconds
t.start()
cursor = conn.cursor()
cursor.execute(query)
res = cursor.fetchall()
t.cancel()                                 # the query finished in time, so drop the pending cancel
On Linux, see /etc/oracle/sqlnet.ora and set
sqlnet.outbound_connect_timeout = value
You also have the options tcp.connect_timeout and sqlnet.expire_time. Good luck!
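For example, a sqlnet.ora fragment along these lines; the values are placeholders, and note that the first two are in seconds while SQLNET.EXPIRE_TIME is in minutes:

SQLNET.OUTBOUND_CONNECT_TIMEOUT = 10
TCP.CONNECT_TIMEOUT = 10
SQLNET.EXPIRE_TIME = 5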
You could look at setting up PROFILEs in Oracle to terminate the queries after a certain number of logical_reads_per_call and/or cpu_per_call
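A rough sketch of what that could look like, run from a DBA-privileged cx_Oracle session; the profile name, user, and limits below are made up, and CPU_PER_CALL is expressed in hundredths of a second:

import cx_Oracle

admin = cx_Oracle.connect("admin_user/admin_pwd@dbname")  # placeholder credentials with DBA rights
cur = admin.cursor()
# hypothetical profile: cap each call at 100000 logical reads and ~30 seconds of CPU
cur.execute("CREATE PROFILE limited_queries LIMIT LOGICAL_READS_PER_CALL 100000 CPU_PER_CALL 3000")
cur.execute("ALTER USER report_user PROFILE limited_queries")
cur.close()
admin.close()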
Timing Out with the System Alarm
Here's how to use the operating system timeout to do this. It's generic, and works for things other than Oracle.
import signal
import cx_Oracle

class TimeoutExc(Exception):
    """this exception is raised when there's a timeout"""
    def __init__(self):
        Exception.__init__(self)

def alarmhandler(signame, frame):
    """sigalarm handler. raises a Timeout exception"""
    raise TimeoutExc()

nsecs = 5
signal.signal(signal.SIGALRM, alarmhandler)  # set the signal handler function
signal.alarm(nsecs)                          # in 5s, the process receives a SIGALRM
try:
    cx_Oracle.connect(blah blah)             # do your thing, connect, query, etc
    signal.alarm(0)                          # if successful, turn off the alarm
except TimeoutExc:
    print("timed out!")                      # timed out!!