IronWorker job done notification - Python

I'm writing a Python app which is currently hosted on Heroku. It is in an early development stage, so I'm using the free account with one web dyno. Still, I want my heavier tasks to be done asynchronously, so I'm using the IronWorker add-on. I have it all set up and it does the simplest jobs, like sending emails or anything that doesn't require any data being sent back to the application. The question is: how do I send the worker output back to my application from IronWorker? Or even better, how do I notify my app that the worker is done with the job?
I looked at other Iron solutions like the cache and the message queue, but the only thing I can find is that I can explicitly ask for the worker's state. Obviously I don't want my web service to poll the worker, because that kind of defeats the original purpose of moving the tasks to the background. What am I missing here?

I see this question ranks high in Google, so in case you came here hoping to find some more details, here is what I ended up doing:
First, I prepared the endpoint in my app. My app uses Flask, so this is how the code looks:
@app.route("/worker", methods=["GET", "POST"])
def worker():
    # refresh the interface or whatever is necessary
    if flask.request.method == 'POST':
        return 'Worker endpoint reached'
    elif flask.request.method == 'GET':
        worker = IronWorker()
        task = worker.queue(code_name="hello",
                            payload={"WORKER_DB_URL": app.config['WORKER_DB_URL'],
                                     "WORKER_CALLBACK_URL": app.config['WORKER_CALLBACK_URL']})
        details = worker.task(task)
        flask.flash("Work queued, response: {}".format(details.status))
        return flask.redirect('/')
Note that in my case, GET is here only for testing; I don't want my users to hit this endpoint and invoke the task. But I can imagine situations where this is actually useful, specifically if you don't use any type of scheduler for your tasks.
With the endpoint ready, I started to look for a way of visiting that endpoint from the worker. I found the fantastic requests library and used it in my worker:
import sys, json
import requests
from sqlalchemy import *

print "hello_worker initialized, connecting to database..."

# locate the -payload argument that IronWorker passes to the script
payload = None
payload_file = None
for i in range(len(sys.argv)):
    if sys.argv[i] == "-payload" and (i + 1) < len(sys.argv):
        payload_file = sys.argv[i + 1]
        break

f = open(payload_file, "r")
contents = f.read()
f.close()
payload = json.loads(contents)
print "contents: ", contents
print "payload as json: ", payload

db_url = payload['WORKER_DB_URL']
print "connecting to database ", db_url
db = create_engine(db_url)
metadata = MetaData(db)
print "connection to the database established"
users = Table('users', metadata, autoload=True)

# print all the users
rs = users.select().execute()
for row in rs:
    print row

callback_url = payload['WORKER_CALLBACK_URL']
print "task finished, sending post to ", callback_url
r = requests.post(callback_url)
print r.text
So in the end there is no real magic here; the only important thing is to send the callback URL in the payload if you need to notify your page when the task is done. Alternatively, you can place the endpoint URL in the database if you use one in your app. Btw. the snippet above also shows how to connect to the PostgreSQL database in your worker and print all the users.
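In the snippet above the worker only pings the endpoint; if you also want the worker's output delivered back, it can be posted as JSON and stored by the endpoint. A minimal sketch, assuming a Flask app; the `results` list and the exact payload shape are illustrative, not part of the original setup:

```python
from flask import Flask, request, jsonify

app = Flask(__name__)
results = []  # stand-in for a database table or cache

@app.route("/worker", methods=["POST"])
def worker_callback():
    # The worker posts its output as JSON; store it and acknowledge.
    payload = request.get_json(silent=True) or {}
    results.append(payload)
    return jsonify({"received": True})
```

On the worker side this pairs with something like `requests.post(callback_url, json={"status": "done"})` instead of the bare `requests.post(callback_url)`.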
One last thing you need to be aware of is how to format your .worker file; mine looks like this:
# set the runtime language. Python workers use "python"
runtime "python"
# exec is the file that will be executed:
exec "hello_worker.py"
# dependencies
pip "SQLAlchemy"
pip "requests"
This will install the latest versions of SQLAlchemy and requests. If your project depends on a specific version of a library, you should do this instead:
pip "SQLAlchemy", "0.9.1"

Easiest way: push a message to your API from the worker, containing the log or anything else you need to have in your app.

Related

Web services are very slow when several users are connected to the application

I am developing a Flutter app using Flask as the back-end framework and MariaDB as the database.
Trying to reduce the web service response time, each web service:
1- opens the connection at the beginning
2- executes its queries
3- closes the database connection before returning the response
Here is an example of my code architecture:
@app.route('/ws_name', methods=['GET'])
def ws_name():
    cnx = db_connexion()
    try:
        param = request.args.get('param')
        result = function_execute_many_query(cnx, param)
    except:
        cnx.close()
        return jsonify(result), 200
    response = {}
    cnx.close()
    return jsonify(result), 200
db_connexion is my function that handles connecting to the database.
The problem is that when only one user is connected to the app (using the web service), the response time is perfect,
but if 3 users (for example) are connected, the response time goes up from milliseconds to 10 seconds.
I suspect you have a problem with many requests sharing the same thread. Read https://werkzeug.palletsprojects.com/en/1.0.x/local/ for how the local context works and why you need Werkzeug to manage your local context in a WSGI application.
You would want to do something like:
from werkzeug.local import LocalProxy
cnx = LocalProxy(db_connexion)
I also recommend closing your connection in a function decorated by @app.teardown_request.
See https://flask.palletsprojects.com/en/1.1.x/api/#flask.Flask.teardown_request
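A minimal sketch of that pattern, using SQLite from the standard library as a stand-in for MariaDB (this `db_connexion` is a hypothetical replacement for the question's function): the connection is opened lazily once per request via `flask.g` and always closed in the teardown hook, even when the view raises:

```python
import sqlite3
from flask import Flask, g, jsonify

app = Flask(__name__)

def db_connexion():
    # Hypothetical stand-in: the question connects to MariaDB instead.
    return sqlite3.connect(":memory:")

def get_cnx():
    # Open the connection lazily, once per request, and cache it on g.
    if "cnx" not in g:
        g.cnx = db_connexion()
    return g.cnx

@app.teardown_request
def close_cnx(exc):
    # Runs after every request, even if the view raised an exception.
    cnx = g.pop("cnx", None)
    if cnx is not None:
        cnx.close()

@app.route("/ws_name")
def ws_name():
    cur = get_cnx().execute("SELECT 1")
    return jsonify(value=cur.fetchone()[0])
```

With this layout no view has to remember to close the connection on every exit path, which was the fragile part of the original try/except.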

Flask redirect from a child process - make a waiting page using only Python

Today I am trying to make a "waiting page" using Flask.
I mean: a client makes a request, I want to show him a page like "wait, the process can take a few minutes", and when the process ends on the server, display the result. I want to display "wait" before my function manageBill.teste, but redirect only works when it is returned, right?
@application.route('/teste', methods=['POST', 'GET'])
def test_conf():
    if request.method == 'POST':
        if request.form.get('confList') != None:
            conf_file = request.form.get('confList')
            username = request.form.get('username')
            password = request.form.get('password')
            date = request.form.get('date')
            if date == '' or conf_file == '' or username == '' or password == '':
                return "You forget to provide information"
            newpid = os.fork()
            if newpid == 0:  # in child process
                print('A new child ', os.getpid())
                error = manageBill.teste(conf_file, username, password, date)
                print("Error :" + error)
                return redirect('/tmp/' + error)
            else:  # in parent process
                return redirect('/tmp/wait')
        return error
    return manageBill.manageTest()
My /tmp route:
@application.route('/tmp/<wait>')
def wait_teste(wait):
    return "The process can take a few minutes, you will be redirected when the test is done.<br>" + wait
If you are using the WSGI server (the default), requests are handled by threads. This is likely incompatible with forking.
But even if it wasn't, you have another fundamental issue. A single request can only produce a single response. Once you return redirect('/tmp/wait') that request is done. Over. You can't send anything else.
To support such a feature you have a few choices:
The most common approach is to have AJAX make the request to start a long-running process. Then set up an /is_done Flask endpoint that you can check (via AJAX) periodically (this is called polling). Once your endpoint returns that the work is done, you can update the page (either with JS or by redirecting to a new page).
Have /is_done be a page instead of an API endpoint that is queried from JS. Set an HTTP refresh on it (with some short timeout like 10 seconds). Then your server can send a redirect for the /is_done endpoint to the results page once the task finishes.
Generally you should strive to serve web requests as quickly as possible. You shouldn't leave connections open (to wait for a long task to finish) and you should offload these long running tasks to a queue system running separately from the web process. In this way, you can scale your ability to handle web requests and background processes separately (and one failing does not bring the other down).
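The polling approach from the first option could be sketched like this; the `/start` and `/is_done` endpoints, the in-memory `tasks` dict, and `long_job` (standing in for `manageBill.teste`) are all illustrative names, and a real deployment would use a proper task queue rather than a bare thread:

```python
import threading, time, uuid
from flask import Flask, jsonify

app = Flask(__name__)
tasks = {}  # task_id -> {"done": bool, "result": ...}; stand-in for a real store

def long_job(task_id):
    # Stand-in for the slow work (e.g. manageBill.teste(...)).
    time.sleep(0.2)
    tasks[task_id] = {"done": True, "result": "ok"}

@app.route("/start", methods=["POST"])
def start():
    # Kick off the work and return immediately; the client shows the
    # "please wait" page and polls /is_done/<task_id> via AJAX.
    task_id = str(uuid.uuid4())
    tasks[task_id] = {"done": False, "result": None}
    threading.Thread(target=long_job, args=(task_id,), daemon=True).start()
    return jsonify({"task_id": task_id}), 202

@app.route("/is_done/<task_id>")
def is_done(task_id):
    # Polled by the client until "done" flips to true.
    return jsonify(tasks.get(task_id, {"done": False, "result": None}))
```

Once `/is_done` reports completion, the page's JS redirects to (or renders) the result.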

How to synchronize greenlets from gevent in a WSGIServer / flask environment

In my Flask-based HTTP server, designed to remotely manage some services on an RPi, I've run into a problem I cannot solve alone, hence a kind request to you for a hint.
Concept:
Via Flask and gevent I can stop and run some (two) services running on the RPi. I use gevent and server-sent events with JavaScript in order to listen for the HTML updates.
The HTML page shows the status (on/off/processing) of the services and provides buttons to switch them on/off. Additionally it displays some system parameters (CPU, RAM, HDD, NET).
As long as there is only one user/page open, everything works as desired. As soon as there are more users accessing the Flask server, there is a race between the greenlets serving each user/page and not all pages get reloaded.
Problem:
How can I send a message to all running sse_worker() greenlets and have them process it on top of their regular job?
Below is the high-level code. The complete source can be found here: https://github.com/petervflocke/flasksse_rpi check the sse.py file
def sse_worker():  # neverending task
    while True:
        if there_is_a_change_in_process_status:
            reload_page = True
        else:
            reload_page = False
        # do some other tasks:
        # update some single parameters to be passed to the html page
        yield 'data: ' + json.dumps(all_parameters)
        gevent.sleep(1)

@app.route('/stream/', methods=['GET', 'POST'])
def stream():
    return Response(sse_worker(), mimetype="text/event-stream")

if __name__ == "__main__":
    gevent.signal(signal.SIGTERM, stop)
    http_server = WSGIServer(('', 5000), app)
    http_server.serve_forever()
...on the HTML page the streamed JSON data is processed accordingly. If the status of a service has changed, JavaScript reloads the complete page based on the reload_page variable; code extract below:
<script>
function listen() {
    var source = new EventSource("/stream/");
    var target1 = document.getElementById("time");
    ....
    source.onmessage = function(msg) {
        obj = JSON.parse(msg.data);
        target1.innerHTML = obj.time;
        ....
        if (obj.reload == "1") {
            location.reload();
        }
    }
}
listen();
</script>
My desired solution would be to extend the sse_worker() like this:
def sse_worker():
    while True:
        if there_is_a_change_in_process_status:
            reload_page = True
            # NEW: set up a semaphore/flag that there is a change on the page
            message_set(reload)
        elif message_get(block=False) == reload:  # NEW: check the semaphore
            # issue: message_get must return "reload" for _all_ active sse_workers,
            # so that all of them can push the reload to "their" pages
            reload_page = True
        else:
            reload_page = False
        # do some other tasks:
        # update some single parameters to be passed to the html page
        yield 'data: ' + json.dumps(all_parameters)
        gevent.sleep(1)
I hope I could get my message across. Any idea from your side how I can solve the synchronization? Please note that we have both the producer and the consumer in the same sse_worker function.
Any idea is very welcome!
best regards
Peter
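One way the message_set/message_get flag sketched above could behave so that every worker sees each change exactly once is a generation counter that each sse_worker polls with its own last-seen value. This sketch uses the standard library's threading.Lock purely for illustration; the same idea carries over to gevent greenlets, which share memory within one process (gevent's own lock primitives would be the drop-in equivalent):

```python
import threading

class ReloadSignal:
    """Broadcast flag: every listener observes each set() exactly once."""

    def __init__(self):
        self._generation = 0
        self._lock = threading.Lock()

    def set(self):
        # Called by whichever worker detects the status change.
        with self._lock:
            self._generation += 1

    def check(self, last_seen):
        # Each sse_worker keeps its own last_seen counter; returns
        # (changed?, new counter value) without blocking.
        with self._lock:
            return self._generation != last_seen, self._generation

reload_signal = ReloadSignal()
```

Inside each sse_worker loop this would look like `changed, last_seen = reload_signal.check(last_seen)`, with `reload_page` set from `changed`, so all streams push the reload to their own pages.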

Can I asynchronously duplicate a webapp2.RequestHandler Request to a different url?

For a percentage of production traffic, I want to duplicate the received request to a different version of my application. This needs to happen asynchronously so I don't double service time to the client.
The reason for doing this is so I can compare the responses generated by the prod version and a production candidate version. If their results are appropriately similar, I can be confident that the new version hasn't broken anything. (If I've made a functional change to the application, I'd filter out the necessary part of the response from this comparison.)
So I'm looking for an equivalent to:
class Foo(webapp2.RequestHandler):
    def post(self):
        handle = make_async_call_to('http://other_service_endpoint.com/', self.request)
        # process the user's request in the usual way
        test_response = handle.get_response()
        # compare the locally-prepared response and the remote one, and log the diffs
        # return the locally-prepared response to the caller
UPDATE
google.appengine.api.urlfetch was suggested as a potential solution to my problem, but it's synchronous in the dev_appserver (the request doesn't go out until get_result() is called, and then it blocks), though it behaves the way I wanted in production:
start_time = time.time()
rpcs = []
print 'creating rpcs:'
for _ in xrange(3):
    rpcs.append(urlfetch.create_rpc())
    print time.time() - start_time
print 'making fetch calls:'
for rpc in rpcs:
    urlfetch.make_fetch_call(rpc, 'http://httpbin.org/delay/3')
    print time.time() - start_time
print 'getting results:'
for rpc in rpcs:
    rpc.get_result()
    print time.time() - start_time
creating rpcs:
9.51290130615e-05
0.000154972076416
0.000189065933228
making fetch calls:
0.00029993057251
0.000356912612915
0.000473976135254
getting results:
3.15417003632
6.31326603889
9.46627306938
UPDATE2
So, after playing with some other options, I found a way to make completely non-blocking requests:
start_time = time.time()
rpcs = []
logging.info('creating rpcs:')
for i in xrange(10):
    rpc = urlfetch.create_rpc(deadline=30.0)
    url = 'http://httpbin.org/delay/{}'.format(i)
    urlfetch.make_fetch_call(rpc, url)
    rpc.callback = create_callback(rpc, url)
    rpcs.append(rpc)
logging.info(time.time() - start_time)
logging.info('getting results:')
while rpcs:
    rpc = apiproxy_stub_map.UserRPC.wait_any(rpcs)
    rpcs.remove(rpc)
    logging.info(time.time() - start_time)
...but the important point to note is that none of the async fetch options in urlfetch work in the dev_appserver. Having discovered this, I went back to try @DanCornilescu's solution and found that it only works properly in production, but not in the dev_appserver.
The URL Fetch service supports asynchronous requests. From Issuing an asynchronous request:
HTTP(S) requests are synchronous by default. To issue an asynchronous request, your application must:
1. Create a new RPC object using urlfetch.create_rpc(). This object represents your asynchronous call in subsequent method calls.
2. Call urlfetch.make_fetch_call() to make the request. This method takes your RPC object and the request target's URL as parameters.
3. Call the RPC object's get_result() method. This method returns the result object if the request is successful, and raises an exception if an error occurred during the request.
The following snippets demonstrate how to make a basic asynchronous request from a Python application. First, import the urlfetch library from the App Engine SDK:
from google.appengine.api import urlfetch
Next, use urlfetch to make the asynchronous request:
rpc = urlfetch.create_rpc()
urlfetch.make_fetch_call(rpc, "http://www.google.com/")

# ... do other things ...

try:
    result = rpc.get_result()
    if result.status_code == 200:
        text = result.content
        self.response.write(text)
    else:
        self.response.status_code = result.status_code
        logging.error("Error making RPC request")
except urlfetch.DownloadError:
    logging.error("Error fetching URL")
Note: As per Sniggerfardimungus's experiment mentioned in the question's update, the async calls might not work as expected on the development server, being serialized instead of concurrent, but they do work when deployed on GAE. Personally I haven't used the async calls yet, so I can't really say.
If the intent is to not block at all waiting for the response from the production candidate app, you could push a copy of the original request and the production-prepared response onto a task queue, then answer the original request with negligible delay (that of enqueueing the task).
The handler for the respective task queue would, outside of the original request's critical path, make the request to the staging app using the copy of the original request (async or not, it doesn't really matter from the point of view of impacting the production app's response time), get its response and compare it with the production-prepared response, log the deltas, etc. This can be nicely wrapped in a separate module for minimal changes to the production app and deployed/deleted as needed.
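The comparison step in that task-queue handler could look roughly like this; `diff_responses`, its field-by-field strategy, and the assumption that both bodies are flat JSON objects are illustrative, not taken from the answer:

```python
import json

def diff_responses(prod_body, candidate_body, ignore=()):
    # Compare two JSON response bodies field by field and collect deltas.
    # The ignore parameter filters out fields that changed intentionally
    # (the "functional change" case mentioned in the question).
    prod = json.loads(prod_body)
    cand = json.loads(candidate_body)
    deltas = {}
    for key in set(prod) | set(cand):
        if key in ignore:
            continue
        if prod.get(key) != cand.get(key):
            deltas[key] = {"prod": prod.get(key), "candidate": cand.get(key)}
    return deltas
```

The handler would then log `deltas` (empty means the candidate matched production) instead of failing the request.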

Python - Flask - open a webpage in default browser

I am working on a small project in Python. It is divided into two parts.
The first part is responsible for crawling the web, extracting some information and inserting it into a database.
The second part is responsible for presenting that information with use of the database.
Both parts share the database. In the second part I am using the Flask framework to display the information as HTML with some formatting and styling to make it look cleaner.
Source files of both parts are in the same package, but to run this program properly the user has to run the crawler and the results presenter separately, like this:
python crawler.py
and then
python presenter.py
Everything is all right except one thing. What I want the presenter to do is to create the result in HTML format and open the page with the results in the user's default browser, but it is always opened twice, probably due to the presence of the run() method, which starts Flask in a new thread, and things get cloudy for me. I don't know what I should do to make my presenter.py open only one tab/window after running it.
Here is the snippet of my code:
from flask import Flask, render_template
import os
import sqlite3
import webbrowser

# configuration
DEBUG = True
DATABASE = os.getcwd() + '/database/database.db'

app = Flask(__name__)
app.config.from_object(__name__)
app.config.from_envvar('CRAWLER_SETTINGS', silent=True)

def connect_db():
    """Returns a new connection to the database."""
    try:
        conn = sqlite3.connect(app.config['DATABASE'])
        return conn
    except sqlite3.Error:
        print 'Unable to connect to the database'
        return False

@app.route('/')
def show_entries():
    u"""Loads pages information and emails from the database and
    inserts the results into the show_entries template. If there is a
    database problem, returns the error page.
    """
    conn = connect_db()
    if conn:
        try:
            cur = conn.cursor()
            results = cur.execute('SELECT url, title, doctype, pagesize FROM pages')
            pages = [dict(url=row[0], title=row[1].encode('utf-8'), pageType=row[2], pageSize=row[3])
                     for row in results.fetchall()]
            results = cur.execute('SELECT url, email from emails')
            emails = {}
            for row in results.fetchall():
                emails.setdefault(row[0], []).append(row[1])
            return render_template('show_entries.html', pages=pages, emails=emails)
        except sqlite3.Error, e:
            print ' Exception message %s ' % e
            print 'Could not load data from the database!'
            return render_template('show_error_page.html')
    else:
        return render_template('show_error_page.html')

if __name__ == '__main__':
    url = 'http://127.0.0.1:5000'
    webbrowser.open_new(url)
    app.run()
I use similar code on Mac OS X (with the Safari, Firefox, and Chrome browsers) all the time, and it runs fine. I'm guessing you may be running into Flask's auto-reload feature. Set debug=False and it will not try to auto-reload.
Other suggestions, based on my experience:
Consider randomizing the port you use, as quick edit-run-test loops sometimes find the OS thinking port 5000 is still in use. (Or, if you run the code several times simultaneously, say by accident, the port truly is still in use.)
Give the app a short while to spin up before you start the browser request. I do that through invoking threading.Timer.
Here's my code:
import random, threading, webbrowser
port = 5000 + random.randint(0, 999)
url = "http://127.0.0.1:{0}".format(port)
threading.Timer(1.25, lambda: webbrowser.open(url) ).start()
app.run(port=port, debug=False)
(This is all under the if __name__ == '__main__':, or in a separate "start app" function if you like.)
So this may or may not help, but my issue was with Flask opening in Microsoft Edge when executing my app.py script. NOOB solution: go to Settings, then Default apps, and change Microsoft Edge to Chrome. Now it opens Flask in Chrome every time. I still have the same issue where things just load, though.