I have a simple lambda function which prints an event and then attempts to insert a row into a database. It runs with no error, but does not execute all of the code.
The event gets printed, but the row never gets inserted into the table. Nothing after the connection call executes, not even a print statement. I'm guessing something is wrong with the connection, but as far as I know I have no way of telling what. Are there more logs somewhere? In CloudWatch I see at the end it says "Task timed out after 3.00 seconds".
import boto3
import psycopg2
s3 = boto3.client('s3')
def insert_data(event=None, context=None):
    print(event)
    connection = psycopg2.connect(user="xxxx", password="xxxx",
                                  host="xxxx", port="xx",
                                  database="xxxx")
    cursor = connection.cursor()
    postgres_insert_query = "INSERT INTO dronedata (name,lat,long,other) VALUES ('img2','54','43','from lambda')"
    cursor.execute(postgres_insert_query)
    connection.commit()
    count = cursor.rowcount
    print(count, "Record inserted successfully into mobile table")
The typical security setup is:
A security group on the AWS Lambda function (Lambda-SG) that permits all outbound access (no need for inbound rules)
A security group on the database (either an EC2 instance or Amazon RDS) (DB-SG) that permits inbound access on the appropriate port from Lambda-SG
That is, DB-SG should specifically reference Lambda-SG in its inbound rules.
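As a sketch, the DB-SG inbound rule referencing Lambda-SG could be created with boto3. The group IDs, port, and helper names here are illustrative assumptions, not values from the question:

```python
def db_ingress_rule(port, lambda_sg_id):
    """Inbound rule for DB-SG: allow TCP on `port` only from Lambda-SG."""
    return [{
        "IpProtocol": "tcp",
        "FromPort": port,
        "ToPort": port,
        "UserIdGroupPairs": [{"GroupId": lambda_sg_id}],
    }]

def allow_lambda_to_db(ec2_client, db_sg_id, lambda_sg_id, port=5432):
    """Attach the rule to DB-SG; `ec2_client` is a boto3 EC2 client."""
    ec2_client.authorize_security_group_ingress(
        GroupId=db_sg_id,
        IpPermissions=db_ingress_rule(port, lambda_sg_id),
    )

# Usage (assumes boto3 credentials and your real security group IDs):
# allow_lambda_to_db(boto3.client("ec2"), "sg-db-placeholder", "sg-lambda-placeholder", 5432)
```

Referencing Lambda-SG by ID (rather than a CIDR range) means the rule keeps working no matter which private IPs Lambda's network interfaces get.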
Yes, you have to increase the default timeout from 3 seconds to a higher value:
Timeout – The amount of time that Lambda allows a function to run before stopping it. The default is 3 seconds. The maximum allowed value is 900 seconds.
Also, psycopg2 is an external library, so upload it along with your code in your Lambda deployment package. The underlying issue is that the function cannot connect to the database, which is why you are seeing the timeout.
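The timeout can be raised from the console or via boto3; a minimal sketch, where the function name is a placeholder:

```python
def valid_lambda_timeout(seconds):
    """Lambda allows function timeouts from 1 to 900 seconds (default is 3)."""
    return 1 <= seconds <= 900

def raise_timeout(lambda_client, function_name, seconds=30):
    """Bump the function timeout; `lambda_client` is a boto3 Lambda client."""
    if not valid_lambda_timeout(seconds):
        raise ValueError("Lambda timeout must be between 1 and 900 seconds")
    lambda_client.update_function_configuration(
        FunctionName=function_name,
        Timeout=seconds,
    )

# Usage (assumes boto3 credentials; function name is hypothetical):
# raise_timeout(boto3.client("lambda"), "my-insert-function", 30)
```

Note that raising the timeout only buys time; if the security groups block the connection, the function will still hang until the new limit is hit.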
Related
Which library is best to use among "boto3" and "Psycopg2" for redshift operations in python lambda functions:
Lookup for a table in redshift cluster
Create a table in redshift cluster
Insert data in redshift cluster
I would appreciate an answer with the following:
Python code, using either library, that addresses all three of the needs above.
Thanks in Advance!!
Connecting directly to Redshift from Lambda with psycopg2 is the simpler, more straightforward way to go, but it comes with a significant limitation. Lambda functions have run-time limits, and even if your SQL commands don't exceed the max run-time, you will be paying for the Lambda function to wait for Redshift to complete the SQL. For fast-running SQL commands things run quickly and this isn't a problem, but inserting data can take some time depending on the amount of data.
If all your Redshift actions take less than a few seconds (and won't grow longer with time), then psycopg2 connecting directly to Redshift is likely the way to go. If the data insert takes a minute or two BUT the process doesn't run very often (say, daily), then psycopg2 may still be the way to go, as Lambda isn't very expensive when run infrequently. It is a process-simplicity vs. cost calculation.
Using the Redshift Data API is more complicated. This approach lets you fire the SQL at Redshift and terminate the Lambda. A later-running Lambda checks whether the SQL has completed and inspects its results. If the SQL has not completed, the Lambda needs to be invoked again later to check whether things are done. This polling process is often implemented with a Step Function and a set of different Lambda functions. Not super difficult, but a level of complexity above a single Lambda. Since this is a polling process, there is a wait time between checks for results; if it is too long you add latency, and if it is too short you over-poll and add cost.
If you need the Data API for timeout reasons, you may want to use both: psycopg2 for short-running queries against the database (like 'does this table exist?'), and the Data API for long-running steps (like 'insert this 1TB set of data into Redshift').
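To make the polling side concrete, here is a minimal sketch of waiting on a Data API statement started with execute_statement; the poll interval and limits are illustrative choices, and the client is passed in rather than constructed:

```python
import time

def wait_for_statement(data_client, statement_id, interval=2.0, max_polls=150):
    """Poll a Redshift Data API statement until it reaches a terminal state.

    `data_client` is a boto3 'redshift-data' client; `statement_id` is the
    Id returned by execute_statement. Returns the final status string.
    """
    terminal = {"FINISHED", "FAILED", "ABORTED"}
    for _ in range(max_polls):
        status = data_client.describe_statement(Id=statement_id)["Status"]
        if status in terminal:
            return status
        time.sleep(interval)  # wait between checks to avoid over-polling
    raise TimeoutError("statement %s still running after polling" % statement_id)
```

In a production setup this loop is usually replaced by a Step Function that re-invokes a checker Lambda, precisely so you don't pay for a Lambda sleeping between polls.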
Sample basic python code for all three operations using boto3.
import json
import boto3
clientdata = boto3.client('redshift-data')
# looks up table and returns true if found
def lookup_table(table_name):
    response = clientdata.list_tables(
        ClusterIdentifier='redshift-cluster-1',
        Database='dev',
        DbUser='awsuser',
        TablePattern=table_name
    )
    print(response)
    if len(response['Tables']) == 0:
        return False
    else:
        return True

# creates table with one integer column
def create_table(table_name):
    sqlstmt = 'CREATE TABLE ' + table_name + ' (col1 integer);'
    print(sqlstmt)
    response = clientdata.execute_statement(
        ClusterIdentifier='redshift-cluster-1',
        Database='dev',
        DbUser='awsuser',
        Sql=sqlstmt,
        StatementName='CreateTable'
    )
    print(response)

# inserts one row with integer value for col1
def insert_data(table_name, dval):
    print(dval)
    sqlstmt = 'INSERT INTO ' + table_name + '(col1) VALUES (' + str(dval) + ');'
    response = clientdata.execute_statement(
        ClusterIdentifier='redshift-cluster-1',
        Database='dev',
        DbUser='awsuser',
        Sql=sqlstmt,
        StatementName='InsertData'
    )
    print(response)

result = lookup_table('date')
if result:
    print("Table exists.")
else:
    print("Table does not exist!")

create_table("testtab")
insert_data("testtab", 11)
I am not using Lambda; I am executing this directly from my shell. Hope this helps. This assumes credentials and a default region are already set up for the client.
From the Best Practices for Working with AWS Lambda Functions:
Take advantage of execution context reuse to improve the performance of your function. Initialize SDK clients and database connections outside of the function handler, [...]
I would like to implement this principle to improve my lambda function, where a database handle is initialized and closed every time the function is invocated. Take the following example:
def lambda_handler(event, context):
    # Open a connection to the database
    db_handle = connect_database()
    # Do something with the database
    result = perform_actions(db_handle)
    # Clean up, close the connection
    db_handle.close()
    # Return the result
    return result
From my understanding of the AWS documentation, the code should be optimized as follows:
# Initialize the database connection outside the handler
db_handle = connect_database()

def lambda_handler(event, context):
    # Do something with the database and return the result
    return perform_actions(db_handle)
This would result in the db_handle.close() method not being called, thus potentially leaking a connection.
How should I handle the cleanup of such resources when using AWS Lambda with Python?
Many people are looking for the same thing as you. I believe it is impossible at this time, but you can handle the issue from the database side.
Take a look at this one
The connection leak would only happen while the Lambda execution environment is alive; in other words, the connection would time out (be closed) after the execution environment is destroyed.
Whether a global connection object is worth implementing depends on your particular use case:
- how much of the total execution time is taken by the database initialization
- how often your function is called
- how you handle database connection errors
If you want to have a bit more control of the connection you can try this approach which recycles the database connection every two hours or when encountering a database-related exception:
import datetime

# Initialize the global object to hold database connection and timestamp
db_conn = {
    "db_handle": None,
    "init_dt": None
}

def lambda_handler(event, context):
    # check database connection
    if not db_conn["db_handle"]:
        db_conn["db_handle"] = connect_database()
        db_conn["init_dt"] = datetime.datetime.now()
    # Do something with the database and return the result
    try:
        result = do_work(db_conn["db_handle"])
    except DBError:
        try:
            db_conn["db_handle"].close()
        except Exception:
            pass
        db_conn["db_handle"] = None
        return "db error occurred"
    # check connection age
    if datetime.datetime.now() - db_conn["init_dt"] > datetime.timedelta(hours=2):
        db_conn["db_handle"].close()
        db_conn["db_handle"] = None
    return result
Please note I haven't tested the above on Lambda so you need to check it with your setup.
I have several Lambda functions. I need to scrape the logs generated by all of my Lambda functions and load them into our internal data warehouse. I thought of these solutions.
Have a Lambda function subscribed to my Lambda functions' CloudWatch log groups that processes the log messages and pushes them to S3.
Pros: works and is simple to implement.
Cons: there is no way for me to "replay". Say my exporter failed for some reason; I wouldn't be able to replay this action.
Have a Lambda function that runs every 10 minutes or so, creates an export task, and scrapes logs from CloudWatch and loads them into S3.
import boto3

client = boto3.client('logs')

response = client.create_export_task(
    taskName='export_task',
    logGroupName='/aws/lambda/<lambda_function_1>',
    fromTime=from_time,
    to=to_time,
    destination='<application_logs>',
    destinationPrefix='<lambda_function_1>'
)
response = client.create_export_task(
    taskName='export_task',
    logGroupName='/aws/lambda/<lambda_function_2>',
    fromTime=from_time,
    to=to_time,
    destination='<application_logs>',
    destinationPrefix='<lambda_function_2>'
)
The second create_export_task fails here:
An error occurred (LimitExceededException) when calling the CreateExportTask operation: Resource limit exceeded.
I can't create multiple export tasks. Is there a way to address this?
From AWS docs: One active (running or pending) export task at a time, per account. This limit cannot be changed.
You can use the code below to check whether the task's status has changed to 'COMPLETED':
import time

response = client.create_export_task(
    taskName='export_cw_to_s3',
    logGroupName='/ecs/',
    logStreamNamePrefix=org_id,
    fromTime=int((yesterday - unix_start).total_seconds() * 1000),
    to=int((today - unix_start).total_seconds() * 1000),
    destination='test-bucket',
    destinationPrefix=f'random-string/{today.year}/{today.month}/{today.day}/{org_id}')
taskId = response['taskId']
status = 'RUNNING'
while status in ['RUNNING', 'PENDING']:
    time.sleep(1)  # wait between polls to avoid hammering the API
    response_desc = client.describe_export_tasks(
        taskId=taskId
    )
    status = response_desc['exportTasks'][0]['status']['code']
Came across the same error message, and the reason is that you can only have one running/pending export task per account at a given time, hence this task is failing. From AWS docs: One active (running or pending) export task at a time, per account. This limit cannot be changed.
https://docs.aws.amazon.com/AmazonCloudWatch/latest/logs/cloudwatch_limits_cwl.html
Sometimes one CreateExportTask stays in the PENDING state for a long time, preventing other Lambda functions with the same task from running. You can find this task and cancel it, allowing the other functions to run.
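A sketch of finding and cancelling such stuck tasks with the CloudWatch Logs API; the helper name is mine, and the client is passed in rather than constructed:

```python
def cancel_stuck_exports(logs_client):
    """Cancel export tasks stuck in PENDING.

    `logs_client` is a boto3 'logs' client. Returns the cancelled task IDs.
    """
    cancelled = []
    pending = logs_client.describe_export_tasks(statusCode="PENDING").get("exportTasks", [])
    for task in pending:
        logs_client.cancel_export_task(taskId=task["taskId"])
        cancelled.append(task["taskId"])
    return cancelled

# Usage (assumes boto3 credentials):
# cancel_stuck_exports(boto3.client("logs"))
```

Cancelling frees the one-active-task slot so the next create_export_task call can succeed.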
Using psycopg2 package with python 2.7 I keep getting the titled error: psycopg2.DatabaseError: SSL SYSCALL error: EOF detected
It only occurs when I add a WHERE column LIKE ''%X%'' clause to my pgrouting query. An example:
SELECT id1 as node, cost FROM PGR_Driving_Distance(
    'SELECT id, source, target, cost
     FROM edge_table
     WHERE cost IS NOT NULL and column LIKE ''%x%'' ',
    1, 10, false, false)
Threads on the internet suggest it is an SSL issue, but whenever I comment out the pattern-matching part, the query and the connection to the database work fine.
This is on a local database running Xubuntu 13.10.
After further investigation: it looks like this may be caused by the pgrouting extension crashing the database, because the query is bad and there are no links which match this pattern.
Will post an answer soon ...
The error: psycopg2.operationalerror: SSL SYSCALL error: EOF detected
The setup: Airflow + Redshift + psycopg2
When: Queries take a long time to execute (more than 300 seconds).
A socket timeout occurs in this instance. What solves this specific variant of the error is adding keepalive arguments to the connection string.
keepalive_kwargs = {
    "keepalives": 1,
    "keepalives_idle": 30,
    "keepalives_interval": 5,
    "keepalives_count": 5,
}

connection = psycopg2.connect(connection_string, **keepalive_kwargs)
Redshift requires a keepalives_idle of less than 300. A value of 30 worked for me; your mileage may vary. It is also possible that keepalives_idle is the only argument you need to set, but make sure keepalives is set to 1.
Link to docs on postgres keepalives.
Link to airflow doc advising on 300 timeout.
I ran into this problem when running a slow query in a Droplet on a Digital Ocean instance. All other SQL would run fine and it worked on my laptop. After scaling up to a 1 GB RAM instance instead of 512 MB it works fine so it seems that this error could occur if the process is running out of memory.
Very similar answer to what @FoxMulder900 did, except I could not get his first select to work. This works, though:
WITH long_running AS (
    SELECT pid, now() - pg_stat_activity.query_start AS duration, query, state
    FROM pg_stat_activity
    WHERE (now() - pg_stat_activity.query_start) > interval '1 minutes'
      AND state = 'active'
)
SELECT * FROM long_running;
If you want to kill the processes from long_running, just comment out the last line and insert SELECT pg_cancel_backend(long_running.pid) FROM long_running;
This issue occurred for me when I had some rogue queries running causing tables to be locked indefinitely. I was able to see the queries by running:
SELECT * from STV_RECENTS where status='Running' order by starttime desc;
then kill them with:
SELECT pg_terminate_backend(<pid>);
I encountered the same error. CPU and RAM usage were fine, and the solution from @antonagestam didn't work for me.
Basically, the issue was at the step of engine creation. pool_pre_ping=True solved the problem:
engine = sqlalchemy.create_engine(connection_string, pool_pre_ping=True)
What it does is send a SELECT 1 query each time the connection is about to be used, to check that the connection is alive. If that check fails, the connection is recycled and checked again; upon success, the query is then executed.
sqlalchemy docs on pool_pre_ping
In my case, I had the same error in the Python logs. I checked the log files in /var/log/postgresql/ and found a lot of error messages like "could not receive data from client: Connection reset by peer" and "unexpected EOF on client connection with an open transaction". This can happen due to network issues.
In my case it was the OOM killer (the query was too heavy).
Check dmesg:
dmesg | grep -A2 Kill
In my case:
Out of memory: Kill process 28715 (postgres) score 150 or sacrifice child
I got this error running a large UPDATE statement on a 3 million row table. In my case it turned out the disk was full. Once I had added more space the UPDATE worked fine.
You may need to express % as %% because % is the placeholder marker. http://initd.org/psycopg/docs/usage.html#passing-parameters-to-sql-queries
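For example: when a query goes through psycopg2's parameter interpolation, every literal % has to be doubled, or better, the LIKE pattern can be passed as a parameter so no escaping is needed at all. A sketch, with an illustrative table and column name:

```python
def escape_percent(sql):
    """Double literal % signs so psycopg2 does not treat them as placeholders.

    Only needed when the query string is passed together with a parameter
    tuple, since that is when psycopg2 scans for % placeholders.
    """
    return sql.replace("%", "%%")

raw = "SELECT id FROM edge_table WHERE col LIKE '%x%' AND owner = %s"
safe = escape_percent(raw.replace("%s", "\x00")).replace("\x00", "%s")

# The cleaner alternative is to let psycopg2 supply the pattern itself
# (requires a live cursor, shown here as a comment):
# cursor.execute("SELECT id FROM edge_table WHERE col LIKE %s", ("%x%",))
```

Passing the pattern as a parameter also protects against SQL injection, which string concatenation does not.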
I have a script that waits until some row in a db is updated:
con = MySQLdb.connect(server, user, pwd, db)
When the script starts the row's value is "running", and it waits for the value to become "finished"
while True:
    sql = '''select value from table where some_condition'''
    cur = self.getCursor()
    cur.execute(sql)
    r = cur.fetchone()
    cur.close()
    res = r['value']
    if res == 'finished':
        break
    print res
    time.sleep(5)
When I run this script it hangs forever. Even though I see the value of the row has changed to "finished" when I query the table, the printout of the script is still "running".
Is there some setting I didn't set?
EDIT: The python script only queries the table. The update to the table is carried out by a tomcat webapp, using JDBC, that is set on autocommit.
This is an InnoDB table, right? InnoDB is transactional storage engine. Setting autocommit to true will probably fix this behavior for you.
conn.autocommit(True)
Alternatively, you could change the transaction isolation level. You can read more about this here:
http://dev.mysql.com/doc/refman/5.0/en/set-transaction.html
The reason for this behavior is that inside a single transaction the reads need to be consistent. All consistent reads within the same transaction read the snapshot established by the first read. Even if your script only reads the table, this is considered a transaction too. This is the default behavior in InnoDB, and you need to change it or run conn.commit() after each read.
This page explains this in more details: http://dev.mysql.com/doc/refman/5.0/en/innodb-consistent-read.html
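To make the commit-after-each-read idea concrete, here is a sketch of the polling loop with the transaction ended after every SELECT so the next read sees a fresh snapshot. The helper names and wiring are illustrative, not from the question:

```python
import time

def wait_until_finished(fetch_value, commit, poll_seconds=5, max_polls=120):
    """Poll until `fetch_value()` returns 'finished'.

    `commit()` ends the current transaction; without it, InnoDB keeps serving
    the snapshot taken by the first SELECT and later updates stay invisible.
    """
    for _ in range(max_polls):
        value = fetch_value()
        commit()  # release the read snapshot before sleeping
        if value == "finished":
            return value
        time.sleep(poll_seconds)
    raise TimeoutError("row never reached 'finished'")

# With MySQLdb it would be wired up roughly as (untested sketch):
# wait_until_finished(
#     fetch_value=lambda: run_select(con),  # your SELECT ... fetchone()
#     commit=con.commit,
# )
```

Setting autocommit or READ COMMITTED isolation (as the other answers suggest) achieves the same effect without the explicit commit call.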
I worked around this by running
c.execute("""set session transaction isolation level READ COMMITTED""")
early on in my reading session. Updates from other threads do come through now.
In my instance I was keeping connections open for a long time (inside mod_python) and so updates by other processes weren't being seen at all.