AWS Lambda Function duration spikes caused by https connection timeouts - python

I have a Flask API running on API Gateway and Lambda Functions, where my Lambda Functions are configured to run in my VPC.
Normal duration for my Lambda Function should be about 3 seconds, but sometimes it spikes to 130 seconds or more, which causes my API Gateway to return a 504.
The Lambda Function makes a GET request using the requests library:
import json
import requests

url = base_url + endpoint
req = requests.get(url, headers=headers)
response = json.loads(req.content.decode('utf-8'))
CloudWatch shows the following error on the request that times out:
requests.exceptions.ConnectionError: HTTPSConnectionPool(host='host', port=port): Max retries exceeded with url: /foo/bar (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at foo>: Failed to establish a new connection: [Errno 110] Connection timed out'))
Most of the posts I have read refer to an incorrectly configured Lambda Function running in a private subnet, but I know that is not my issue, since my functions do have internet access.
My other theory is that a session is getting reused on the function's underlying container, which is causing a timeout.
Thanks for your help in advance!

Since you've set your Lambda to run in your VPC, it's quite possible that a Network ACL is not allowing the traffic.
The behavior you describe is consistent with a blocked ephemeral port: the return traffic eventually times out, producing seemingly random spikes in Lambda duration and failures for no apparent reason.
I'm not even sure it would be apparent from VPC flow logs what happened, since it was the ephemeral port that was blocked, not the reserved port, but I'll have to double-check that.
AWS uses ephemeral ports 1024 - 65535. I would take a look at the Network ACLs and double-check that those ports are allowed.
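If it helps, the rule check can be sketched in a few lines (a simplified, hypothetical helper assuming entries shaped like boto3's ec2 describe_network_acls output; real NACL evaluation is first-match per rule number, per individual port):

```python
# Ephemeral range used by AWS for return traffic.
EPHEMERAL = (1024, 65535)

def ephemeral_ports_allowed(entries):
    """Simplified check: does an allow rule cover the full ephemeral range
    before any deny rule overlapping it? Entries are dicts shaped like
    boto3's describe_network_acls output (hypothetical sample data)."""
    for e in sorted(entries, key=lambda e: e["RuleNumber"]):
        frm = e.get("PortRange", {}).get("From", 0)
        to = e.get("PortRange", {}).get("To", 65535)
        if to < EPHEMERAL[0] or frm > EPHEMERAL[1]:
            continue  # rule doesn't touch the ephemeral range
        if e["RuleAction"] == "deny":
            return False  # a deny overlapping the range matches first
        if frm <= EPHEMERAL[0] and to >= EPHEMERAL[1]:
            return True  # an allow covering the whole range
    return False
```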

It is quite possible that the connection created during the first invocation is being reused when the Lambda container is reused for subsequent requests. The server may have terminated that connection in the meantime, but the Lambda container has no way of knowing this, so it keeps trying to make the new request on the stale connection until it times out.
Possible ways to avoid this:
1. Manage connections appropriately
Create the connection outside the Lambda handler function, but handle connection errors inside the handler. Also consider closing the connection at the end of execution if your function is invoked infrequently.
2. Set a timeout
Every SDK supports a timeout value for its connections; in Python this is around 60 seconds, meaning your request will keep trying the stale connection for that long before timing out. Set a lower custom timeout on the request (low enough that the Lambda itself doesn't time out first), catch the resulting error, and create a new connection. Read more about this here.

Related

Spring Boot REST endpoint: increase the number of retries/connections from a single host that Tomcat will accept

I'm load testing a Spring Boot API, a POC to show the team it can handle high throughput. I'm making the requests with a Python script that uses a multiprocessing pool. When I start sending more than about 10,000 records I get a "Max retries exceeded" error, which I've determined means the endpoint is refusing connections from the client because it is opening too many of them.
Is there a Tomcat setting to allow more requests from a client (temporarily) for something like load testing? I tried setting "server.tomcat.max-threads" in the application.properties file, but that doesn't seem to help.

pyodbc: How to test whether it's possible to establish connection with SQL server without freezing up

I am writing an app with wxPython that incorporates pyodbc to access SQL Server. A user must first establish a VPN connection before they can establish a connection with the SQL server. In cases where a user forgets to establish a VPN connection or is simply not authorized to access a particular server, the app will freeze for up to 60+ seconds before it produces an error message. Often, users will get impatient and force-close the app before the error message pops up.
I wonder if there is a way to test whether it's possible to connect to the server without freezing up. I thought about using a timeout, but it seems that timeout can be applied only after I establish a connection.
A sample connection string I use is below:
connection = pyodbc.connect(r'DRIVER={SQL Server};SERVER=ServerName;database=DatabaseName;Trusted_Connection=True;unicode_results=True')
See https://code.google.com/archive/p/pyodbc/wikis/Connection.wiki under timeout
Note: This attribute only affects queries. To set the timeout for the
actual connection process, use the timeout keyword of the
pyodbc.connect function.
So changing your connection call to:
connection = pyodbc.connect(r'DRIVER={SQL Server};SERVER=ServerName;database=DatabaseName;Trusted_Connection=True;unicode_results=True', timeout=3)
should work.
"took a while before it threw an error message about server not existing or access being denied"
Your comment conflates two very different kinds of errors:
"Server not existing" is a network error: either the name has no address, or the address is unreachable. No connection can be made.
"Access being denied" is a response from the server. For the server to respond, a connection must exist. This is not to be confused with connection refused (ECONNREFUSED), which means the remote host is not accepting connections on the port.
SQL Server uses TCP/IP. You can use standard network functions to determine if the network hostname of the machine running SQL Server can be found, and if the IP address is reachable. One advantage to using them to "pre-test" the connection is that any error you'll get will be much more specific than the typical there was a problem connecting to the server.
Note that not all delay-inducing errors can be avoided. For example, if the DNS server is not responding, the resolver will typically wait 30 seconds before giving up. If an IP address is valid, but there's no machine with that address, attempting a connection will take a long time to fail. There's no way for the client to know there's no such machine; it could just be taking a long time to get a response.
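A lightweight pre-test along these lines might look like this (a sketch; 1433 is the default SQL Server port and may differ in your environment):

```python
import socket

def can_reach(host, port=1433, timeout=3):
    """Return True if a TCP connection to host:port succeeds within timeout.

    OSError covers the relevant failures: socket.gaierror (name not
    found), ConnectionRefusedError, and socket.timeout (unreachable).
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

You could call can_reach(...) before pyodbc.connect and show a friendly "check your VPN" message instead of freezing when it returns False.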

Bluemix Flask API Call Timeout

I have an API written with Python Flask running on Bluemix. Whenever I send it a request and the API takes more than 120 seconds to respond, it times out. Instead of the expected response, it returns the following error: 500 Error: Failed to establish a backside connection.
I need it to be able to process longer requests as well. Is there any way to extend the timeout value or is there a workaround for this issue?
All Bluemix traffic goes through the IBM WebSphere® DataPower® SOA Appliances, which provide reverse proxy, SSL termination, and load balancing functions. For security reasons DataPower closes inactive connections after 2 minutes.
This is not configurable (as it affects all Bluemix users), so the only solution for your scenario is to change your program to make sure the connection is not idle for more than 2 minutes.

make HTTP request from python and wait a long time for a response

I'm using Python to access a REST API that sometimes takes a long time to run (more than 5 minutes). I'm using pyelasticsearch to make the request, and tried setting the timeout to 10 minutes like this:
es = ElasticSearch(config["es_server_url"], timeout=600)
results = es.send_request("POST",
                          [config["es_index"], "_search_with_clusters"],
                          cluster_query)
but it times out after 5 minutes (not 10) with requests.exceptions.ConnectionError (Caused by <class 'socket.error'>: [Errno 104] Connection reset by peer)
I tried setting the socket timeout and using requests directly like this:
socket.setdefaulttimeout(600)
try:
    r = requests.post(url, data=post, timeout=600)
except:
    print "timed out"
and it times out after approximately 5 minutes every time.
How can I make my script wait longer until the request returns?
The error "Connection reset by peer" (ECONNRESET) means that the server, or some router or proxy between you and the server, closed the connection forcibly.
So, specifying a longer timeout on your end isn't going to make any difference. You need to figure out who's closing the connection and configure it to wait longer.
Plausible places to look are the server application itself, whatever server program drives that application (e.g., if you're using Apache with mod_wsgi, Apache), a load-balancing router or front-end server or reverse proxy in front of that server, or a web proxy in front of your client.
Once you figure out where the problem is, if it's something you can't reconfigure yourself, you may be able to work around it by trickling data from the server to the client: have it send something useless but harmless (an HTTP 100, an extra header, some body text that your client knows how to skip over, whatever) every 120 seconds. This may or may not work, depending on which component is hanging up.
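A framework-agnostic sketch of that trickle idea (all names here are illustrative): run the slow work in a background thread and yield a harmless filler byte until it finishes, so intermediaries see activity before their idle timeout fires:

```python
import threading

def streamed_body(work, interval=60):
    """Generator for a streaming response body.

    Runs `work()` (the slow computation) in a background thread and
    yields a single space every `interval` seconds until it finishes,
    then yields the real result. The client must know to skip the
    leading whitespace.
    """
    result = {}
    worker = threading.Thread(target=lambda: result.setdefault("value", work()))
    worker.start()
    while worker.is_alive():
        worker.join(interval)      # wait up to `interval` for the work
        if worker.is_alive():
            yield b" "             # harmless keepalive byte
    yield result["value"]
```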

Python httplib.HTTPSConnection timeout -- connection vs. response

When creating an HTTPSConnection with httplib, it's easy enough to set a timeout:
connection = httplib.HTTPSConnection('some.server.com', timeout=10)
connection.request('POST', '/api', xml, headers={'Content-Type': 'text/xml'})
response = connection.getresponse().read()
There are various parts to this operation, e.g. the connection being accepted and a response being received.
Does the timeout apply to the entire operation? Will it still time out if the remote host accepts the connection but never sends back a response? I want to be sure that setting the timeout ensures that the operation blocks for a maximum of 10 seconds.
Some context:
I am connecting to an external API and want the operation to block. Just not for more than 10 seconds, and if it is blocking for more than 10 seconds, stop blocking and raise an exception. I'm correctly handling the case when an external API is unreachable, but unsure about when it accepts my connection but fails to respond.
It seems the standard library implementation does not support a timeout on the socket read operations. You would have to make the HTTPSConnection (technically the HTTPResponse._safe_read method) non-blocking for this.
There is a similar question here, which might also help:
Does python's httplib.HTTPConnection block?
I would use gevent for the whole application if that's possible in your case; it supports fully non-blocking I/O, and you can implement any timeout scheme you want, even for multiple connections at once.
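If pulling in gevent isn't an option, a cruder stdlib sketch bounds the whole operation (connect + request + read) with a worker thread and an overall deadline; the caveat is that the worker thread keeps running after the timeout fires:

```python
import concurrent.futures

def call_with_deadline(fn, seconds, *args, **kwargs):
    """Run fn(*args, **kwargs); raise concurrent.futures.TimeoutError
    if it takes longer than `seconds`.

    Unlike a per-socket timeout, this bounds the entire operation, no
    matter which phase blocks. The abandoned worker thread is not
    killed; it simply finishes (or hangs) in the background.
    """
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    try:
        return pool.submit(fn, *args, **kwargs).result(timeout=seconds)
    finally:
        pool.shutdown(wait=False)  # don't block waiting for the worker
```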
