how to integrate kafka with rest and to pyspark?

how to integrate kafka with rest and to pyspark? - python

i have a rest service written in flask sitting at localhost:5000. It has endpoint function 'parser'.
It parses one website.
from flask import Flask
import requests
from flask import jsonify
from flask import make_response
from flask import request
app = Flask(__name__)
#app.route("/", methods = ['GET'])
def main():
return jsonify('service is waiting, standby..')
#app.route("/parser/<string:website>", methods = ['GET'])
def parse(website):
if website != 'news.com':
return make_response(jsonify({'error': 'only news.com is parsable' }), 404)
else:
result = main_parse()#need to send variable result via kafka to streaming for analysis
return make_response(jsonify("parsing this website...", 200))
if __name__ == "__main__":
app.run(debug=True)
The function main_parse has other stuff in it, but it returns some parsed data -a list of lists.
I never used kafka before, how do I throw this parsed data result via kafka or kafka topic(task?) to pyspark for basic analysis? I m naive.
It should be a pipeline without clicking buttons- once rest api receives correct website name - it does the rest.

You don't need Spark for Kafka in this code, and I would not put Spark in any web server as it is meant for distributed processing (meaning your Spark code wouldn't even be running within the context of Flask, if you did have a Spark cluster).
You can use any Kafka Python library, which can be easily found on Github for examples and usage documentation.
do we actually send entire data to kafka
Sure. Keep in mind; Kafka has a default record limit of 1MB

Related

Issue with simple python API in flask. Trying to create a post method to add json data to a list

I am trying to build a simple flask api to post json data to a list (eventually with be redshift but this is just a simple test program).
I have attached the api code first followed by the code to send data.
I am getting internal server error issues when running the second script.
The code seems very simple though and I cannot figure out what is wrong.
from flask_restful import Api, Resource
from flask import request
app = Flask(__name__)
api = Api(app)
audit_log = []
class audit(Resource):
#def get (self):
#return {"data":"HelloWorld"}
def put (self):
new_item = request.get_json()
audit_log.append(new_item)
return new_item
api.add_resource(audit,"/")
app.run()
import requests
BASE = "HTTP://127.0.0.1:5000/"
response = requests.put(BASE, params = {'auditid' : 'xyz', 'jobname' : 'abc'})
print (response.json())

It seems that you haven't imported the Flask properly
instead of this
from flask import request
use this
from flask import Flask, request
This should work fine...

Returning 'still loading' response with Flask API

I have a scikit-learn classifier running as a Dockerised Flask app, launched with gunicorn. It receives input data in JSON format as a POST request, and responds with a JSON object of results.
When the app is first launched with gunicorn, a large model (serialised with joblib) is read from a database, and loaded into memory before the app is ready for requests. This can take 10-15 minutes.
A reproducible example isn't feasible, but the basic structure is illustrated below:
from flask import Flask, jsonify, request, Response
import joblib
import json
def classifier_app(model_name):
# Line below takes 10-15 mins to complete
classifier = _load_model(model_name)
app = Flask(__name__)
#app.route('/classify_invoice', methods=['POST'])
def apicall():
query = request.get_json()
results = _build_results(query['data'])
return Response(response=results,
status=200,
mimetype='application/json')
print('App loaded!')
return app
How do I configure Flask or gunicorn to return a 'still loading' response (or suitable error message) to any incoming http requests while _load_model is still running?

Basically, you want to return two responses for one request. So there are two different possibilities.
First one is to run time-consuming task in background and ping server with simple ajax requests every two seconds to check if task is completed or not. If task is completed, return result, if not, return "Please standby" string or something.
Second one is to use websockets and flask-socketio extension.
Basic server code would be something like this:
from threading import Thread
from flask import Flask
app = Flask(__name__)
socketio = SocketIO(app)
def do_work():
result = your_heavy_function()
socketio.emit("result", {"result": result}, namespace="/test/")
#app.route("/api/", methods=["POST"])
def start():
socketio.start_background_task(target=do_work)
# return intermediate response
return Response()
On the client side you should do something like this
var socket = io.connect('http://' + document.domain + ':' + location.port + '/test/');
socket.on('result', function(msg) {
// Process your request here
});
For further details, visit this blog post, flask-socketio documentation for server-side reference and socketio documentation for client-side reference.
PS Using web-sockets this you can make progress-bar too.

Running two process on Flask with common variables

I want to build a Webapp with Flask where some data is printed on a dynamic page in real time.
The data is taken from a Python script which connects to a Websocket, then it's printed on the frontend with Flask.
I have two problems:
1) I can't run both the scripts together
2) I don't know how to call parsed from test to yield
Here is the code:
from time import sleep
from flask import Flask, render_template
import websocket
from bitmex_websocket import Instrument
from bitmex_websocket.constants import InstrumentChannels
from bitmex_websocket.constants import Channels
import json
from threading import Thread, Event
app = Flask(__name__)
websocket.enableTrace(True)
channels = [
InstrumentChannels.trade,
]
XBTUSD = Instrument(symbol='XBTUSD',
channels=channels)
XBTUSD.on('action', lambda msg: test(msg))
def test(msg):
parsed = json.loads(json.dumps(msg))
print(parsed)
#app.route('/')
def index():
# render the template (below) that will use JavaScript to read the stream
return render_template('index.html')
#app.route('/stream_sqrt')
def stream():
def generate():
yield '{}\n'.format('test')
return app.response_class(generate(), mimetype='text/plain')
if __name__ == '__main__':
XBTUSD.run_forever()
app.run()
If i put XBTUSD.run_forever() before app.run() i will start the part supposed to retrieve the data but the Flask app won't start. If i do the opposite, the Flask app will run but not the other part. How can i run together the whole app? How could i "share" variables between test and generate?

An easier way to go, please use flask-socketio instead flask.
https://flask-socketio.readthedocs.io/en/latest/
Sample for sending messages using flask-socketio
https://flask-socketio.readthedocs.io/en/latest/#sending-messages

Make a python script API

I have a script on python, which prints some data. The script is on Centos7, nginx.
How could I connect to the script via URL (GET query) to be able to parse the data?

You can use a framework like Django or flask to make an api out of it. I'll suggest flask since it's very light-weight, making it ideal for such small tasks.
E.g.
def your_function(input):
# do something
return output
from flask import Flask
from flask import request
app = Flask(__name__)
#app.route('/my_api')
def your_api_function():
input = request.args.get('my_query_string')
return your_function(input)
if __name__ == '__main__':
app.run(debug=True)
And then use the endpoint
/my_api?my_query_string=my_input
You can further play around with it to return JSON, take parameters from request body and so on and so forth.
Read more here http://flask.pocoo.org/

Python: Making a Flask Rest API Asynchronous and Deploying it

I have a python server that is currently keeping track of the location of all the buses in my university and generating predictions of arrivals to specific locations.
Now, I wanted to attach a lightweight REST API to this server but I have been running intro problems.
I tried using flask with the following code:
from flask import Flask, jsonify
from PredictionWrapper import *
import threading
class RequestHandler():
def __init__(self,predictionWrapper):
self.app = Flask(__name__)
self.predictor = predictionWrapper
self.app.debug = False
self.app.add_url_rule('/<route>/<int:busStop>','getSinglePrediction',self.getSinglePrediction)
t = threading.Thread(target=self.app.run, kwargs={'host':'0.0.0.0', 'port':80, 'threaded':True})
t.start()
def getSinglePrediction(self, route, busStop):
# TODO Get the actual prediction with given parameters
prediction = self.predictor.getPredictionForStop(route, busStop)
return jsonify({'busStop': busStop, 'prediction': prediction})
def getStopPrediction(self, busStop):
# TODO Get the actual prediction with given parameters
return jsonify({'busStop': busStop, 'prediction': 2})
def run(self):
self.app.run(host='0.0.0.0', port=80, threaded=True)
The problem is that I have been encountering the error below after about half a day of running the server. Note that no requests were made to the server around the time it failed with the following error:
ERROR:werkzeug: - - [01/May/2016 09:55:55] code 400, message Bad request syntax ('\x02\xfd\xb1\xc5!')
After investigating I believe I need to deploy to a WSGI production server. But I have no clue what it means in this specific approach given that 1)the flask server is being threaded in order to run the rest of the prediction application, and 2)I am using classes which none of the documentation uses.
Any help on how to setup the wsgi file with apache, gunicorn, or the technology of your choice would be appreciated. Also, any comments on a better approach on making a non-blocking REST API would be helpful.
Let me know if you need any further clarification!

Not sure if this can actually solve your problem but you can use the coroutine based web server gevent. They have a WSGI server that you can use if that's what you meant by saying that you need to deploy a WSGI production server.
If you want to implement the server to your flask application just do the following:
from gevent.pywsgi import WSGIServer
app = Flask(__name__)
http_server = WSGIServer(('', 5000), app)
http_server.serve_forever()
Gevent in general is a very powerful tool and by issuing context switches as necessary it can handle multiple clients very easily. Also, gevent fully supports flask.

First thing to do would be to put exception handling to deal with bad JSON request data (which maybe is what's happening) something like
def getSinglePrediction(self, route, busStop):
try:
prediction = self.predictor.getPredictionForStop(route, busStop)
return jsonify({'busStop': busStop, 'prediction': prediction})
except:
return jsonify({'busStop': 'error', 'prediction': 'error'})

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.

how to integrate kafka with rest and to pyspark? - python

Related

Issue with simple python API in flask. Trying to create a post method to add json data to a list

Returning 'still loading' response with Flask API

Running two process on Flask with common variables

Make a python script API

Python: Making a Flask Rest API Asynchronous and Deploying it

Categories

Resources