Flask RESTful API request, Broken pipe [Errno 32] - python

I'm new to web development and I'm trying to create a RESTful web service using the Flask micro-framework.
Here is my code:
import json

from flask import Flask
from pymongo import MongoClient
from bson import json_util

app = Flask(__name__)
client = MongoClient()
db = client.markets

def toJson(data):
    # Serialize MongoDB documents (ObjectId, datetimes, ...) to a JSON string.
    return json.dumps(data, default=json_util.default)

@app.route('/', methods=['GET'])
def get_tasks():
    cursor = db.europe.find()
    results = []  # renamed from "list" to avoid shadowing the built-in
    for doc in cursor:
        results.append(doc)
    return toJson(results)
When I send the request from my browser, it is constantly waiting for the server and nothing is returned.
Eventually the Flask server running in the terminal gives me: [Errno 32] Broken pipe.
My collection has 1.5 million entries, each with about 20 attributes. Could it be because the request is too large?
Thanks in advance.

A broken pipe indicates that the other end of a socket or pipe that your Flask process wants to talk to has died. Since you are interacting with the database, it's very likely that the database terminated the connection or that the connection died for other reasons.
You should probably analyze the query you run against your DB, because the code itself doesn't seem to have an obvious problem.
Try running the query on your MongoDB manually and see what happens. Does the query return successfully?
You mention that it takes a long time until you get that error. Could it be that some indexes are missing or not properly used in your schema, so the query executes very slowly and, after a long wait, hits a timeout (e.g. maxTimeMS)?
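If the sheer size of the result turns out to be the problem (serializing 1.5 million documents into a single response is a lot), a common workaround is to cap or page the query instead of returning the whole collection. A minimal sketch, reusing db and toJson from the question and assuming hypothetical page/per_page query parameters:

from flask import request

@app.route('/', methods=['GET'])
def get_tasks():
    # Hypothetical paging parameters; the defaults keep the response small.
    page = int(request.args.get('page', 0))
    per_page = int(request.args.get('per_page', 100))
    cursor = db.europe.find().skip(page * per_page).limit(per_page)
    return toJson([doc for doc in cursor])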

Related

Flask Server not responding to HTTP POST Request

I built a simple Flask server to automate Python scripts based on an HTTP POST request coming in from an outside service. This service sends these requests in response to events that we have created based on data conditions. All the Flask server needs to do is parse the request, take the numerical value stored in it, and use it to run the corresponding Python script. At one point this system was working consistently. Fast forward a few months: the server was no longer working when tried again, with no changes made to the code. According to Wireshark, the computer hosting the Flask server receives the request on the correct port, but the request now times out; the host system fails to respond. Any idea what is going on here? The firewall has temporarily been turned off.
Alternatively, is there a better package to achieve this goal with?
from flask import Flask, request
import threading  # imported in the original but not used
import runpy

app = Flask(__name__)

@app.route('/', methods=['POST'])
def PostHandler():
    # Build a {number: script name} lookup from the mapping file.
    directory = {}
    with open(r"M:\redacted") as f:
        for line in f:
            (key, val) = line.split()
            directory[int(key)] = val
    print(directory)

    path = r"M:\redacted"
    content = request.json
    content = content['value']
    print(content)
    sel_script = int(content)
    print(directory[sel_script])

    # Blocks until the selected script finishes running.
    runpy.run_path(path_name=path + directory[sel_script])
    return 'OK'  # a Flask view must return a response, not None

app.run(host="10.244.xx.xx", port=8080, threaded=True)
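Separately from the timeout question, a hedged sketch of one robustness improvement: runpy.run_path blocks the request until the target script finishes, and returning nothing gives Flask no response to send back. The version below (placeholder path and mapping, since the real ones are redacted) runs the script on a background thread, using the threading module the code already imports, and replies immediately:

from flask import Flask, request
import threading
import runpy

app = Flask(__name__)

# Placeholders for the redacted path and mapping file contents.
SCRIPT_DIR = r"C:\scripts" + "\\"
DIRECTORY = {1: "script_one.py", 2: "script_two.py"}

def run_script(script_path):
    # Execute the selected script without tying up the request handler.
    runpy.run_path(script_path)

@app.route('/', methods=['POST'])
def post_handler():
    sel_script = int(request.json['value'])
    threading.Thread(target=run_script,
                     args=(SCRIPT_DIR + DIRECTORY[sel_script],),
                     daemon=True).start()
    # Respond right away so the caller is not left waiting on the script.
    return 'accepted', 202

if __name__ == '__main__':
    app.run(host='0.0.0.0', port=8080, threaded=True)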

GoneException when calling post_to_connection on AWS lambda and API gateway

I want to send a message to a websocket client when it connects to the server on AWS lambda and API gateway. Currently, I use wscat as a client. Since the response 'connected' is not shown on the wscat console when I connect to the server, I added post_to_connection to send a message 'hello world' to the client. However, it raises GoneException.
An error occurred (GoneException) when calling the PostToConnection operation
How can I solve this problem and send some message to wscat when connecting to the server?
My python code is below. I use Python 3.8.5.
import os
import boto3
import botocore

dynamodb = boto3.resource('dynamodb')
connections = dynamodb.Table(os.environ['TABLE_NAME'])

def lambda_handler(event, context):
    domain_name = event.get('requestContext', {}).get('domainName')
    stage = event.get('requestContext', {}).get('stage')
    connection_id = event.get('requestContext', {}).get('connectionId')

    result = connections.put_item(Item={'id': connection_id})

    apigw_management = boto3.client('apigatewaymanagementapi',
                                    endpoint_url=F"https://{domain_name}/{stage}")
    ret = "hello world"
    try:
        _ = apigw_management.post_to_connection(ConnectionId=connection_id,
                                                Data=ret)
    except botocore.exceptions.ClientError as e:
        print(e)
        return {'statusCode': 500,
                'body': 'something went wrong'}
    return {'statusCode': 200,
            'body': 'connected'}
Self-answer: you cannot call post_to_connection for the connection itself inside the $connect handler.
I have found that the GoneException can occur when the client that initiated the websocket has disconnected from the socket and its connectionId can no longer be found. Is there something causing the originating client to disconnect from the socket before it can receive your return message?
My use case is different, but I am basically using a DB to check the state of a connection before replying to it, rather than using the request context to do that. This error's appearance was reduced by writing connectionIds to DynamoDB on connect and deleting them from the table upon disconnect events. Messaging now writes to connectionIds in the table instead of the id in the request context. Most messages go through, but some errors are still emitted when the client leaves the socket without emitting a proper disconnect event, which leaves orphans in the table. The next step is to enforce item deletes when irregular disconnections occur. Involving a DB may be overkill for your situation; I'm just sharing what helped me make progress on the GoneException error.
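For reference, a minimal sketch of the matching $disconnect handler described above; the table name and key shape are taken from the question's code, and the rest is an assumption:

import os
import boto3

dynamodb = boto3.resource('dynamodb')
connections = dynamodb.Table(os.environ['TABLE_NAME'])

def lambda_handler(event, context):
    # Remove the id stored by the $connect handler so the table
    # does not accumulate orphaned connections.
    connection_id = event.get('requestContext', {}).get('connectionId')
    connections.delete_item(Key={'id': connection_id})
    return {'statusCode': 200, 'body': 'disconnected'}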
We need to post to connection after connecting (i.e. when the routeKey is not $connect)
routeKey = event.get('requestContext', {}).get('routeKey')
print(routeKey)  # for debugging

if routeKey != '$connect':  # if we have defined multiple route keys we can choose the right one here
    apigw_management.post_to_connection(ConnectionId=connection_id, Data=ret)
@nemy's answer is totally true, but it doesn't explain the reason, so I just want to explain...
So, first of all, what is GoneException or GoneError 410?
A 410 Gone error occurs when a user tries to access an asset which no longer exists on the requested server. For a request to return a 410 Gone status, the resource must also have no forwarding address and be considered gone permanently.
You can find more details about GoneException in this article.
Here, GoneException occurred because the connection we are trying to post to doesn't exist yet, which fits the scenario perfectly: we still haven't established the connection between client and server. The way API Gateway WebSocket APIs work is that you request an endpoint (route) and that endpoint invokes a Lambda function (in our case the connection Lambda function for the $connect route).
Only if that Lambda function resolves with statusCode: 200 will API Gateway allow the connection to be established. So, until we return statusCode: 200 from our Lambda function we are not connected and are completely unknown to the server, which is why a post_to_connection call made before the return statement throws this error.
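To make that concrete, here is a minimal sketch of a handler attached to a route other than $connect (a hypothetical sendmessage route); by the time it runs, the $connect Lambda has already returned statusCode 200, so post_to_connection can find the connection:

import boto3

def lambda_handler(event, context):
    ctx = event.get('requestContext', {})
    domain_name = ctx.get('domainName')
    stage = ctx.get('stage')
    connection_id = ctx.get('connectionId')

    apigw_management = boto3.client('apigatewaymanagementapi',
                                    endpoint_url=f"https://{domain_name}/{stage}")
    # The connection is already established, so this call no longer raises GoneException.
    apigw_management.post_to_connection(ConnectionId=connection_id,
                                        Data='hello world')
    return {'statusCode': 200, 'body': 'sent'}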

Elasticsearch/dataflow - connection timeout after ~60 concurrent connection

We host an Elasticsearch cluster on Elastic Cloud and call it from Dataflow (GCP). The job works fine in dev, but when we deploy to prod we see lots of connection timeouts on the client side.
Traceback (most recent call last):
  File "apache_beam/runners/common.py", line 1213, in apache_beam.runners.common.DoFnRunner.process
  File "apache_beam/runners/common.py", line 570, in apache_beam.runners.common.SimpleInvoker.invoke_process
  File "main.py", line 159, in process
  File "/usr/local/lib/python3.7/site-packages/elasticsearch/client/utils.py", line 152, in _wrapped
    return func(*args, params=params, headers=headers, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/elasticsearch/client/__init__.py", line 1617, in search
    body=body,
  File "/usr/local/lib/python3.7/site-packages/elasticsearch/transport.py", line 390, in perform_request
    raise e
  File "/usr/local/lib/python3.7/site-packages/elasticsearch/transport.py", line 365, in perform_request
    timeout=timeout,
  File "/usr/local/lib/python3.7/site-packages/elasticsearch/connection/http_urllib3.py", line 258, in perform_request
    raise ConnectionError("N/A", str(e), e)
elasticsearch.exceptions.ConnectionError: ConnectionError(<urllib3.connection.HTTPSConnection object at 0x7fe5d04e5690>: Failed to establish a new connection: [Errno 110] Connection timed out) caused by: NewConnectionError(<urllib3.connection.HTTPSConnection object at 0x7fe5d04e5690>: Failed to establish a new connection: [Errno 110] Connection timed out)
I increased the timeout setting in the Elasticsearch client to 300s as below, but it didn't seem to help.
self.elasticsearch = Elasticsearch([es_host], http_auth=http_auth, timeout=300)
Looking at the deployment at https://cloud.elastic.co/deployments//metrics,
CPU and memory usage are very low (below 10%) and search response time is on the order of 200 ms.
What could be the bottleneck here, and how can we avoid such timeouts?
In the logs, most requests fail with a connection timeout, while a successful request receives its response very quickly.
I tried SSHing into the VM where we experience the connection errors. netstat showed about 60 ESTABLISHED connections to the Elasticsearch IP address. When I curl from the VM to the Elasticsearch address I can reproduce the timeout, yet I can curl other URLs fine. I can also curl Elasticsearch fine from my local machine, so the issue is only the connection between the VM and the Elasticsearch server.
Does Dataflow (Compute Engine) or Elasticsearch have a limit on the number of concurrent connections? I could not find any information online.
I did a little bit of research about the connector for Elasticsearch. There are two principles that you may want to apply to ensure your connector is as efficient as possible.
Note: setting a maximum number of workers, as suggested in the other answer, will probably not help as much (for now). Let's first improve utilization of your Beam/Elastic cluster resources, and if we start hitting limits for either, then we can consider restricting the number of workers; for now, let's try to improve your connector.
Using bulk requests to external services
The code you provided issues an individual search request for every element coming into the DoFn. As you've noted, this works, but it causes your pipeline to spend too much time waiting on an external request for each element, so your wait for round trips will be O(n).
Luckily, the Elasticsearch client has an msearch method, which allows you to perform searches in bulk. You can do something like this:
class PredictionFn(beam.DoFn):
    def __init__(self, ...):
        self.buffer = []
        ...

    def process(self, element):
        self.buffer.append(element)
        if len(self.buffer) > BATCH_SIZE:
            return self.flush()

    def flush(self):
        result = []

        # Perform the search requests for user ids
        user_ids = [uid for cid, did, uid in self.buffer]
        user_ids_request = self._build_uid_reqs(user_ids)

        resp = es.msearch(body=user_ids_request)

        user_id_and_device_id_lists = []
        for r, elm in zip(resp['responses'], self.buffer):
            if len(r["hits"]["hits"]) == 0:
                continue
            # Get the new device_id_list from the hit r (details elided in this sketch)
            user_id_and_device_id_lists.append((elm[2],  # User ID
                                                device_id_list))

        device_id_lists = [elm[1] for elm in user_id_and_device_id_lists]
        device_ids_request = self._build_device_id_reqs(device_id_lists)

        resp = es.msearch(body=device_ids_request)
        # (This replaces the per-element call from the original pipeline:
        #  self.elasticsearch.search(index="sessions", body={"query": {"match": {"userId": user_id}}}))

        # Handle the result, output anything necessary
        self.buffer = []  # clear the buffer so the same elements are not searched twice
        return result

    def _build_uid_reqs(self, uids):
        # Relying on this answer: https://stackoverflow.com/questions/28546253/how-to-create-request-body-for-python-elasticsearch-msearch/37187352
        res = []
        for uid in uids:
            res.append(json.dumps({'index': 'sessions'}))  # Request HEAD
            res.append(json.dumps({"query": {"match": {"userId": uid}}}))  # Request BODY
        return '\n'.join(res)
Reusing the client as it's thread-safe
The Elasticsearch client is also thread safe!
So rather than creating a new one every time, you can do something like this:
class PredictionFn(beam.DoFn):
    CLIENT = None

    def init_elasticsearch(self):
        if PredictionFn.CLIENT is not None:
            return PredictionFn.CLIENT
        es_host = fetch_host()
        http_auth = fetch_auth()
        PredictionFn.CLIENT = Elasticsearch([es_host], http_auth=http_auth,
                                            timeout=300, sniff_on_connection_fail=True,
                                            retry_on_timeout=True, max_retries=2,
                                            maxsize=5)  # 5 connections per client
        return PredictionFn.CLIENT
This should ensure that you keep a single client for each worker, and you won't be creating so many connections to ElasticSearch - and thus not getting the rejection messages.
Let me know if these two help, or if we need to try further improvements!
EDIT: This was a red herring. CLOSE_WAIT is not related. I had the same issue again, and most connections are now in ESTABLISHED status :/
While both of the answers below are insightful, I don't think they answered the question.
After some more investigation, I found out that somehow elasticsearch-py (or urllib3), in combination with Dataflow, leaves connections in CLOSE_WAIT status. Once a connection gets this status it is stuck (the OS will not release these sockets because it thinks the application code will close them), so after the job has run for a while, all of the connections in my connection pool are in CLOSE_WAIT and I cannot make any new connections. If I don't use a connection pool and instead instantiate an Elasticsearch client for each ParDo, it just gets worse; somehow connections get stuck even faster.
I reported the issue here: https://github.com/elastic/elasticsearch-py/issues/1459, but honestly the issue seems to be deeper in the stack, because I had a similar problem when I directly used the requests package's connection pool (which I believe also uses urllib3 under the hood).
Dataflow has no limit on the number of outgoing connections.
It uses a Kubernetes cluster under the hood, and every Python thread lives in its own Docker container.
API calls to Elastic Cloud are rate-limited (take a look at the x-rate-limit-{interval,limit,remaining} fields in the response headers).
With Dataflow it is very easy to hit API rate limits if you run a lot of parallel jobs and/or Google Cloud scales up the nodes of your job to make it faster.
Possible workarounds in your Dataflow / Apache Beam job:
1 - (no code required) Play with the Dataflow execution parameters (https://cloud.google.com/dataflow/docs/guides/specifying-exec-params) to limit the number of concurrent processing threads; see the sketch after this list.
The three parameters you need to tweak are:
max_num_workers: the maximum number of worker instances (machines) running.
number_of_worker_harness_threads: by default 1 thread per vCPU your instance has.
machine_type: the instance type you will use.
2 - Implement rate limiting in your code. See the Apache Beam documentation on timely (and stateful) processing.
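To make option 1 concrete, a minimal sketch of setting those parameters from Python; the flag names are the ones listed above, the values are only examples, and the other required Dataflow options (project, region, temp_location, ...) are omitted:

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

options = PipelineOptions([
    '--runner=DataflowRunner',
    '--max_num_workers=4',                    # cap autoscaling
    '--number_of_worker_harness_threads=2',   # default is one thread per vCPU
    '--machine_type=n1-standard-2',           # smaller machines -> fewer threads each
])

with beam.Pipeline(options=options) as p:
    ...  # build the pipeline as before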

Flask Cloudant slow response time

I am creating a Flask application that is connecting to a Cloudant database using the python cloudant library.
My response time when I just add the connect statement (with no queries) can be anywhere from 0.4 s to 12 s. My connect statement is like so:
client = Cloudant(USERNAME, PASSWORD, url=URL, connect=True)
When I remove the connection code, my response time is very low.
I have run a profiler on my system and it shows that the increase in response time is due to reading an ssl socket.
I have also tried using the default example from the IBM Bluemix GitHub and got similar response times.
I am running my Flask application using the built-in development web server. I have tried connecting to the database before every request, and I have tried having a single connection that gets reused. Could this delay be due to my local machine? And what would cause it to be quick sometimes and not others? Other posts have suggested issues with IPv6 or DNS, but I do not think that is the case.
With API calls like:
ddoc = DesignDocument(g.db, '_design/docs')
g.myview = View(ddoc, 'my-view')
g.myview(key=[somekey])['rows']
I have already created the views and they are indexed by the appropriate key, so it is not slow due to indexing.
Try using this code to connect to your Cloudant database:
from cloudant.client import Cloudant

def conn(user, pwd, db, **kwargs):
    client = Cloudant(user, pwd, account=kwargs.get('host', user))
    client.connect()
    return client[db]  # return the database handle ('self' removed: this is a plain function)
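For completeness, a usage sketch under the same assumptions (python-cloudant library, placeholder credentials and database name): call conn() once when the app starts and reuse the returned database handle across requests, so the connection is established once rather than per request:

from flask import Flask, g, jsonify

app = Flask(__name__)

# Placeholder credentials for the sketch; connect once at startup.
database = conn('my_user', 'my_password', 'mydb', host='my_account')

@app.before_request
def attach_db():
    g.db = database  # views keep using g.db as in the question

@app.route('/count')
def count():
    # doc_count() is part of python-cloudant's database API.
    return jsonify({'docs': g.db.doc_count()})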

Flask JSON request is None

I'm working on my first Flask app (version 0.10.1), and also my first Python (version 3.5) app. One of its pieces needs to work like this:
Submit a form
Run a Celery task (which makes some third-party API calls)
When the Celery task's API calls complete, send a JSON post to another URL in the app
Get that JSON data and update a database record with it
Here's the relevant part of the Celery task:
if not response['errors']:  # response comes from the Salesforce API call
    # do something to notify that the task was finished successfully
    message = {'flask_id': flask_id, 'sf_id': response['id']}
    message = json.dumps(message)
    print('call endpoint now and update it')
    res = requests.post('http://0.0.0.0:5000/transaction_result/', json=message)
And here's the endpoint it calls:
@app.route('/transaction_result/', methods=['POST'])
def transaction_result():
    result = jsonify(request.get_json(force=True))
    print(result.flask_id)
    return result.flask_id
So far I'm just trying to get the data and print the ID, and I'll worry about the database after that.
The error I get though is this: requests.exceptions.ConnectionError: None: Max retries exceeded with url: /transaction_result/ (Caused by None)
My reading indicates that my data might not be coming over as JSON, hence the force=True on get_json, but even this doesn't seem to work. I've also tried doing the same request in CocoaRestClient, with a Content-Type header of application/json, and I get the same result.
Because both of these attempts break, I can't tell if my issue is in the request or in the attempt to parse the response.
First of all, request.get_json(force=True) returns an object (or None if silent=True), while jsonify converts objects to JSON response strings. So you end up trying to read flask_id as an attribute of that string-like result, which cannot work. Even after removing the redundant jsonify call, you'll have to change result.flask_id to result['flask_id'].
So, eventually the code should look like this:
@app.route('/transaction_result/', methods=['POST'])
def transaction_result():
    result = request.get_json()
    return result['flask_id']
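Worth double-checking on the sending side as well (an observation beyond the original answer): requests' json= parameter serializes the object you pass it, so handing it an already json.dumps-ed string makes get_json() return a plain string instead of a dict. A sketch of the Celery-side call, with names taken from the question's snippet:

import requests

message = {'flask_id': flask_id, 'sf_id': response['id']}
# Pass the dict directly; requests sets Content-Type: application/json
# and serializes it, so no json.dumps() is needed here.
res = requests.post('http://0.0.0.0:5000/transaction_result/', json=message)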
And you are absolutely right to use a REST client to test the route: it greatly simplifies testing by reducing the number of moving parts. One well-known problem when sending requests from a Flask app to the same app is running the app under the development server with only one thread; in that case an internal request is always blocked, because the single thread is busy serving the outermost request and cannot handle the internal one. However, since you are sending the request from a Celery task, that's not likely your scenario.
UPD: In the end the remaining problem was the IP address 0.0.0.0; changing it to the real one solved the problem.
