Socket Programming between Django REST API, Vue and Scrapy - python

Right now my code successfully sends a POST request to the Django REST API, which in turn triggers the corresponding spider and stores its output in the database. For this I am using the scrapyd API, as you can see in the code snippet below:
from urllib.parse import urlparse

from django.http import JsonResponse
from rest_framework import permissions
from rest_framework.decorators import api_view, permission_classes

# `scrapyd` is the scrapyd client, e.g. ScrapydAPI('http://localhost:6800')
# from the python-scrapyd-api package.

@api_view(['POST'])
@permission_classes((permissions.AllowAny,))
def crawlRottenTomatoes(request):
    print("in crawl rottentomatoes method")
    url = request.data["url"]
    if not url:
        return JsonResponse({'error': 'Missing args'})
    if not is_valid_url(url):
        return JsonResponse({'error': 'URL is invalid'})
    domain = urlparse(url).netloc
    msg_dic = {}
    try:
        scrapyd.schedule(project="movie_spider", spider="rottentomatoes",
                         url=url, domain=domain)
        msg_dic['msg'] = "Spider RottenTomatoes for given url is up and running"
    except Exception as e:
        print("exception")
        msg_dic['error'] = "Error running spider for RottenTomatoes"
    return JsonResponse(msg_dic, safe=False)
But what I want now is to get some kind of response back from scrapyd when it is done crawling and parsing the website/channel. For that I came across WebSockets. I tried to use WebSockets, but the problem is that scrapyd is itself a daemon, so I am not really able to send a message to the WebSocket client from Scrapy. Does anyone have an idea how to do this, or can share some resources that can help me with it?

Looking at the API reference, it seems the only way to get the status of a scrapyd job is via listjobs (https://scrapyd.readthedocs.io/en/stable/api.html#listjobs-json), looking for your job's id in the list.
Another option is to monitor scrapyd's log file. Or you could try to add this feature to https://github.com/scrapy/scrapyd itself, so that it sends an HTTP request after it is done with a job.
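For illustration, here is a minimal polling sketch (mine, not from the answer above) that watches listjobs.json until the job id returned by scrapyd.schedule() shows up under "finished"; the scrapyd address and the notification step are assumptions:

import time
import requests

SCRAPYD_HOST = "http://localhost:6800"  # assumed scrapyd address

def wait_for_job(job_id, project="movie_spider", poll_interval=5):
    # Poll scrapyd's listjobs.json until our job id appears in 'finished'.
    while True:
        resp = requests.get(SCRAPYD_HOST + "/listjobs.json",
                            params={"project": project})
        jobs = resp.json()
        if any(job["id"] == job_id for job in jobs.get("finished", [])):
            return
        time.sleep(poll_interval)

# Run this in a background task after scheduling; once it returns, notify
# the frontend however you like (Django Channels group_send, a plain HTTP
# callback, etc.).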

Related

flask proxy for ttyd

I am looking for a way to write a simple proxy in Flask for ttyd, an open-source web terminal (https://github.com/tsl0922/ttyd). The most immediate approach is to read the client request and relay it to the ttyd server. However, it fails when the WebSocket is connecting.
My view function is as follows:
from contextlib import closing

import requests
from flask import Flask, Response, request

app = Flask(__name__)

@app.route('/')
@app.route('/auth_token.js')
@app.route('/ws')
def ttyd():
    if request.path == '/ws':
        url = 'ws://192.168.123.172:7681' + request.path
    else:
        url = 'http://192.168.123.172:7681' + request.path
    method = request.method
    data = request.data or request.form or None
    cookies = request.cookies
    headers = request.headers
    with closing(
        requests.request(method, url, headers=headers, data=data, cookies=cookies)
    ) as r:
        resp_headers = []
        for name, value in r.headers.items():
            resp_headers.append((name, value))
        return Response(r, status=r.status_code, headers=resp_headers)
As you can see, the view function handles three URL routes; the first two succeed with status code 200, the third fails with status code 500. The error on the server side is as follows:
requests.exceptions.InvalidSchema: No connection adapters were found for 'ws://192.168.123.172:7681/ws'
I also checked the network in the two cases (with/without proxy). The picture 'without proxy' means typing 'http://192.168.123.172:7681' directly; it succeeds. The picture 'with proxy' means accessing the ttyd server through the Flask proxy; it fails.
[screenshot: network requests without proxy]
[screenshot: network requests with proxy]
Since I am new to Flask and WebSockets, I am confused by this result. The same Flask proxy can handle any other HTTP request (e.g. accessing google.com) but fails on the WebSocket connection.
Can you tell me why, and how I can fix it?
According to Websockets in Flask, there is the flask-sockets project at https://github.com/heroku-python/flask-sockets for serving a WebSocket endpoint in Flask. To make the backend WebSocket connection to the server you can't use requests; use websocket-client instead, see How do I format a websocket request?.
When I had this problem I solved it using the autobahn-python project, see https://github.com/arska/stringreplacingwebsocketproxy/
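For what it's worth, here is a rough sketch of that flask-sockets + websocket-client combination (my own illustration, not code from the linked projects), assuming the app runs under a gevent/geventwebsocket worker; the ttyd address is the one from the question:

import gevent
from flask import Flask
from flask_sockets import Sockets
from websocket import create_connection  # pip install websocket-client

app = Flask(__name__)
sockets = Sockets(app)

@sockets.route('/ws')
def ws_proxy(client_ws):
    # Backend WebSocket connection to ttyd (it may also require its
    # subprotocol, e.g. create_connection(..., subprotocols=['tty'])).
    backend = create_connection('ws://192.168.123.172:7681/ws')

    def backend_to_client():
        # Relay frames coming back from ttyd to the browser.
        while not client_ws.closed:
            client_ws.send(backend.recv())

    relay = gevent.spawn(backend_to_client)
    try:
        # Relay frames from the browser to ttyd.
        while not client_ws.closed:
            message = client_ws.receive()
            if message is not None:
                backend.send(message)
    finally:
        relay.kill()
        backend.close()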
Cheers,
Aarno

HTTP REST Gateway to AMQP Request-Response, Without Web Sockets Or Polling

I've struggled for two days to understand how REST API gateways should return GET responses to browsers when the backend service runs on AMQP (without using WebSockets or polling).
I have successfully done RPC between AMQP services (with RabbitMQ's reply_to and correlation_id), but with a Flask HTTP request waiting on the result I'm still lost.
gateway.py - Response Handler Inside The HTTP Handler, Times out
def products_get():
    def handler(ch=None, method=None, properties=None, body=None):
        if body:
            return body
        return False

    return_queue = 'products.get.return'
    broker.channel.queue_declare(return_queue)
    broker.channel.basic_consume(handler, return_queue)
    broker.publish(exchange='', routing_key='products.get', body='Request data',
                   properties=pika.BasicProperties(reply_to=return_queue))
    now = time.time()  # for timeout. Not having this returns 'no content' immediately
    while time.time() < now + 1:
        if handler():
            return handler()
    return 'Time out'
POST/PUT can simply send the AMQP message, return 200/201/202 immediately, and let the service work at its own pace. A separate REST interface just for GET requests seems implausible, but I don't know the other options.
Regards
I think what you're asking is "how to perform asynchronous GET requests", and I reckon that the answer is: you can't, and you should not. It's bad practice or bad design, and it does not scale.
Why are you trying to get your GET response payload from AMQP?
If the payload (the content of the response) can be pulled from some DB, just pull it from there. That's called a synchronous request.
If the payload must be processed in some backend, send it away and don't have the requester wait for a response. You could assign some ID and have the requester ask again later (or collect a callback URL from the requester and have your backend POST the response once it's ready - a less common design).
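To make the ask-again-later design concrete, here is a minimal Flask sketch (mine, not the answerer's; the route names, job store, and enqueue step are hypothetical):

import uuid
from flask import Flask, jsonify

app = Flask(__name__)
RESULTS = {}  # hypothetical in-memory job store: job_id -> payload

@app.route('/products', methods=['POST'])
def products_post():
    job_id = str(uuid.uuid4())
    # publish the AMQP request here, tagged with job_id (hypothetical step)
    return jsonify({'job_id': job_id}), 202  # accepted, not finished yet

@app.route('/products/<job_id>', methods=['GET'])
def products_result(job_id):
    if job_id not in RESULTS:
        return jsonify({'status': 'pending'}), 404
    return jsonify(RESULTS[job_id]), 200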
EDIT:
So, given that you have to work with an AMQP-backed backend, I would do something a little more elaborate: spawn a thread or a process in your frontend that constantly consumes from AMQP and stores the results locally or in some DB, and serve GET results based on the data you stored locally. If the data isn't available yet, just return 404. Ideally you'll need to reshape your API: split it into "post" requests (that trigger work at the backend) and "get" requests (that return the results if they're available).
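A hedged sketch of that consumer thread with pika 1.x (the queue name and message shape are assumptions; a real frontend would persist to a DB rather than a dict):

import json
import threading

import pika

RESULTS = {}  # shared with the GET handler: job_id -> payload

def consume_results():
    # Background consumer that stores finished results keyed by job id.
    connection = pika.BlockingConnection(pika.ConnectionParameters('localhost'))
    channel = connection.channel()
    channel.queue_declare(queue='products.results')

    def on_message(ch, method, properties, body):
        payload = json.loads(body)  # assumed shape: {'job_id': ..., 'data': ...}
        RESULTS[payload['job_id']] = payload['data']
        ch.basic_ack(delivery_tag=method.delivery_tag)

    channel.basic_consume(queue='products.results', on_message_callback=on_message)
    channel.start_consuming()

threading.Thread(target=consume_results, daemon=True).start()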

Health check failed when service is still running

I'm using Google health checks to send requests to my Flask client to make sure my service is alive.
The same route in the Flask client sends requests to two more Flask clients to make sure the other two are also alive.
For some reason the request sometimes fails while the service is still running.
I tried to figure out why, but there is nothing in my service's logs indicating that anything happened, and in most cases it works fine.
This is my code:
# GET /health_check//
def get(self):
    try:
        for service in INTERNAL_SERVICES_HEALTH_CHECKS:
            client = getattr(all_clients, service + '_client')
            response = client.get('g_health_check')
    except Exception as e:
        sentry_client.captureMessage('health check failed for ' + env + ' environment. error log:' + repr(e))
        return output_json({'I\'m Not fine!': False}, requests.codes.server_error)
    return output_json({'I\'m fine!': True}, requests.codes.ok)
If anyone has any suggestions I will be happy to try and fix it.
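One thing worth ruling out (my suggestion, not from the thread): if any downstream check occasionally takes longer than the prober's deadline, the whole health check fails even though every service is alive. Bounding each sub-check with an explicit timeout makes that failure mode visible; the URLs and helper here are hypothetical:

import requests

DOWNSTREAM_HEALTH_URLS = [  # hypothetical internal endpoints
    'http://service-a/g_health_check',
    'http://service-b/g_health_check',
]

def check_downstream(timeout_seconds=2):
    # Fail fast instead of letting one slow service eat the whole deadline.
    for url in DOWNSTREAM_HEALTH_URLS:
        response = requests.get(url, timeout=timeout_seconds)
        response.raise_for_status()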

Flask JSON request is None

I'm working on my first Flask app (version 0.10.1), and also my first Python (version 3.5) app. One of its pieces needs to work like this:
1. Submit a form
2. Run a Celery task (which makes some third-party API calls)
3. When the Celery task's API calls complete, send a JSON POST to another URL in the app
4. Get that JSON data and update a database record with it
Here's the relevant part of the Celery task:
if not response['errors']:  # response comes from the Salesforce API call
    # do something to notify that the task was finished successfully
    message = {'flask_id': flask_id, 'sf_id': response['id']}
    message = json.dumps(message)
    print('call endpoint now and update it')
    res = requests.post('http://0.0.0.0:5000/transaction_result/', json=message)
And here's the endpoint it calls:
@app.route('/transaction_result/', methods=['POST'])
def transaction_result():
    result = jsonify(request.get_json(force=True))
    print(result.flask_id)
    return result.flask_id
So far I'm just trying to get the data and print the ID, and I'll worry about the database after that.
The error I get though is this: requests.exceptions.ConnectionError: None: Max retries exceeded with url: /transaction_result/ (Caused by None)
My reading indicates that my data might not be coming over as JSON, hence the force=True on the get_json call, but even this doesn't seem to work. I've also tried making the same request from CocoaRestClient, with a Content-Type header of application/json, and I get the same result.
Because both of these attempts break, I can't tell if my issue is in the request or in the attempt to parse the response.
First of all, request.get_json(force=True) returns a Python object (or None if silent=True), while jsonify converts objects into a JSON response. You're trying to access .flask_id as an attribute on that, which won't work. Even after removing the redundant jsonify call, you'll have to change result.flask_id to result['flask_id'], since get_json returns a dict.
So, eventually the code should look like this:
@app.route('/transaction_result/', methods=['POST'])
def transaction_result():
    result = request.get_json()
    return result['flask_id']
And you are absolutely right to use a REST client to test the route: it greatly simplifies testing by reducing the number of moving parts. One well-known problem when sending requests from a Flask app to the same app is running the app under the development server with only one thread. In that case an internal request will always be blocked, because the single thread is busy serving the outermost request and cannot handle the internal one. However, since you are sending the request from a Celery task, that is probably not your scenario.
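A side note of mine, not part of the original answer: if you ever do need the development server to handle such a nested request, running it with more than one thread avoids the deadlock described above:

# Development only: let the built-in server serve concurrent requests,
# so an internal request isn't blocked by the request that spawned it.
if __name__ == '__main__':
    app.run(host='0.0.0.0', port=5000, threaded=True)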
UPD: In the end, the actual cause turned out to be the IP address 0.0.0.0. Changing it to the real one solved the problem.

Reading POST data from Google App Engine pipeline callback

I'm trying to download data from an external API. There are going to be a lot of downloads, so I want to use pipelines for easier parallelization. The way the API is set up, I can make a request to start a download job and pass a postback URL in that request. When the download job finishes, their API sends a POST to the given URL. I want to do the following:
class DownloadPipeline(pipeline.Pipeline):
    async = True
    public_callbacks = True

    def run(self, filename):
        postback = self.get_callback_url()
        # make API request with postback as a param

    def callback(self):
        # Read data from the POST
        pass

However, all the docs I've read online only have examples of GET requests against the callback URL, where data is passed in the query string. Is there a way to read POST data instead?
Looks like both POST and GET call over to run_callback(), so you should be able to do either.
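If that's right, the callback parameters (whether from the query string or a form-encoded POST body) should arrive as keyword arguments, so a hedged sketch of the callback could look like this (the field names depend entirely on the external API and are hypothetical here):

def callback(self, **kwargs):
    # With an async pipeline, form-encoded POST fields should arrive
    # here as keyword arguments, e.g. kwargs.get('status') or
    # kwargs.get('download_url') -- names depend on the external API.
    self.complete(kwargs)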
