Writing output in a sub-process leads to error - python

Keras forces output to sys.stderr (a fix for which was rejected on GitHub), and there seems to be a problem writing to the standard streams from a child process in a Web App. This leads to the error below being thrown when Keras is imported and tries to announce which backend implementation it is using.
AttributeError: 'NoneType' object has no attribute 'write'
I tried to redirect output to os.devnull according to this answer before instantiating a Flask application and starting it with a web.config. However, the error persisted.
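The redirect I tried looked roughly like this (a sketch; the None checks and the devnull streams follow the general shape of the linked answer, and the exact details may have differed):

import os
import sys

# No console is attached when the app is started via web.config, so the
# standard streams may be missing; point them at os.devnull instead.
if sys.stdout is None:
    sys.stdout = open(os.devnull, 'w')
if sys.stderr is None:
    sys.stderr = open(os.devnull, 'w')

Curiously, writing output without multiprocessing worked just fine: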
import sys
from flask import Flask
import keras

app = Flask(__name__)

@app.route('/')
def main():
    print('Hello!')
    sys.stdout.write('test\n')
    sys.stderr.write('emsg\n')
    return 'OK.', 200
Even from keras import backend as k works. That's the statement that originally produced the error. This left me baffled. What could possibly be the matter?
Minimal example
In my application, a sub process is spawned for training models. When trying to write output within the multiprocessing.Process, an error is thrown. Here's some code to reproduce the situation.
import sys
from flask import Flask
from multiprocessing import Process

def write_output():
    sys.stdout.write('hello\n')

def create_app():
    apl = Flask(__name__)
    Process(target=write_output).start()

    @apl.route('/')
    def main():
        return 'OK.', 200

    return apl
This application is then instantiated in another file and called from web.config. Basic logging confirmed the error was still being thrown.
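The other file just instantiates the app; the module name below is a placeholder, not the real project layout:

# run.py -- referenced from web.config
from repro import create_app

app = create_app()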
Almost a fix
Although not a fix, I got the system working using threading. Simply switching multiprocessing.Queue and multiprocessing.Process to queue.Queue and threading.Thread makes the errors above go away. For my use case this is acceptable for now, although of course it is not a solution to the problem of writing output in a child process.
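A sketch of the swap (the surrounding training code is omitted and the names are illustrative):

import queue
import threading

def train_model(job_queue):
    # ... training code that writes output ...
    job_queue.put('done')

job_queue = queue.Queue()  # instead of multiprocessing.Queue
worker = threading.Thread(target=train_model, args=(job_queue,))  # instead of multiprocessing.Process
worker.start()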

I noticed that you mentioned the web.config file for Azure Web App on Windows, and there is a limitation of the Azure Web App sandbox you need to know about. After reviewing it against your scenario, I think your app was being rejected by some of those restrictions. In my experience, a model-training task is not a good fit for Azure Web App, especially on a Windows instance, since the sandbox is CPU-only.
My suggestion is to move your app to a high-performance Azure VM with a GPU, such as the NC-series listed at https://azure.microsoft.com/en-us/pricing/details/virtual-machines/linux/.
Otherwise, if you have other reasons to stay on Azure Web App, I recommend trying Azure Web App for Linux based on Docker.
Hope it helps.

Related

flask API calls scheduling with cron jobs

I have a function which calls several API's and updates the database upon being called. I want to schedule the function to run daily at specific time.
I have already tried flask_apscheduler and APScheduler, which give this error:
This typically means that you attempted to use functionality that needed an active HTTP request. Consult the documentation on testing for information about how to avoid this problem.
Any leads on this will be helpful.
You should:
Post the code where you define your Flask application.
Specify how you try to access the app.
Explain how you're calling the APIs.
State whether those APIs are third-party or part of your blueprint.
However, this is probably a context issue. I have come across a similar one with SQLAlchemy before.
You will need to somehow get access to your app, either by using app_context or by importing current_app from Flask and accessing the config.
Assuming you imported the app where your function is used, try this:
with app.app_context():
    # call your function here
Refer to this document for more information: Flask Documentation
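For example, a background scheduler that runs your function daily inside the application context could look roughly like this (a sketch; the function name, import path, and schedule are placeholders):

from apscheduler.schedulers.background import BackgroundScheduler
from yourapp import app, call_apis_and_update_db  # placeholder imports

def daily_job():
    # Push an application context so the function can use app-bound extensions.
    with app.app_context():
        call_apis_and_update_db()

scheduler = BackgroundScheduler()
scheduler.add_job(daily_job, 'cron', hour=2, minute=0)  # every day at 02:00
scheduler.start()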
Another approach you can try, is passing your app configurations through a config class object.
You can define the jobs you want to schedule and pass a reference to your function inside.
Check this example from flask-apscheduler repository on GitHub.
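A rough outline of that approach, following flask-apscheduler's configuration conventions (the job details below are placeholders):

from flask import Flask
from flask_apscheduler import APScheduler

class Config:
    SCHEDULER_API_ENABLED = True
    JOBS = [
        {
            'id': 'daily_api_sync',
            'func': 'tasks:call_apis_and_update_db',  # 'module:function', placeholder
            'trigger': 'cron',
            'hour': 2,
        }
    ]

app = Flask(__name__)
app.config.from_object(Config())

scheduler = APScheduler()
scheduler.init_app(app)
scheduler.start()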

How do I make my Haskell functions available in my Python Flask app?

I have a machine learning model written in Haskell, but want to use Python and Flask for the front end. How do I pass data to my Haskell function from within my Python Flask app? Some code examples would be helpful - I have looked at Servant so far, but don't know how that would work with a Flask app already in place.
If you are looking for a reasonably fast way to make a web-based interface for your machine learning model as FifthCode has suggested, you might want to consider Scotty. It's a Sinatra-inspired web framework for Haskell.
{-# LANGUAGE OverloadedStrings #-}
import Web.Scotty
import Data.Monoid (mconcat)
main = scotty 3000 $
  get "/:word" $ do
    beam <- param "word"
    html $ mconcat ["<h1>Scotty, ", beam, " me up!</h1>"]
Notably, a call using a REST API like this will block. Depending on how long it takes your machine learning model to run, you may want to use a webhook approach where you submit a job with HTTP/HTTPS and include a webhook URL on your Flask app that the Scotty app will POST to when it has finished running the model.
When your Flask app POSTs to /predict on the Scotty app, it will block until the Scotty app responds. Instead, have the Scotty app spawn a thread for the ML work and respond with 202 Accepted immediately; the ML thread then POSTs to /prediction on the Flask app with the result when it completes.
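In Python terms, the Flask side of that flow could look roughly like this (a sketch; the /predict and /prediction paths come from the description above, while the hosts, ports, and payload shape are assumptions):

import requests
from flask import Flask, request

app = Flask(__name__)

@app.route('/start-prediction', methods=['POST'])
def start_prediction():
    # Submit the job to the Scotty app and return immediately.
    requests.post('http://localhost:3000/predict', json={
        'features': request.json['features'],
        'callback_url': 'http://localhost:5000/prediction',
    })
    return '', 202

@app.route('/prediction', methods=['POST'])
def prediction():
    # The Scotty app POSTs the finished result here once the model has run.
    result = request.json
    # ... store or act on the result ...
    return '', 204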

Heroku RQ (Redis Queue) Django Error: "Apps aren't loaded yet."

I have a functional Django app that has many Google Text-To-Speech API calls and database reads/writes in my view. When testing locally it takes about 3 seconds to load a page, but when I deploy the app live to Heroku it takes about 15 seconds to load the webpage. So I am trying to reduce load time.
I came across this article: https://devcenter.heroku.com/articles/python-rq that suggests I should use background tasks by queueing jobs to workers using an RQ (Redis Queue) library. I followed their steps and included their worker.py file in the same directory as my manage.py file (not sure if that's the right place to put it). I wanted to test it out locally with a dummy function and view to see if it would run without errors.
# views.py
from django.shortcuts import render
from rq import Queue
from worker import conn

def dummy(foo):
    return 2

def my_view(request):
    q = Queue(connection=conn)
    for i in range(10):
        dummy_foo = q.enqueue(dummy, "howdy")
    return render(request, 'dummy.html', {})
In separate terminals I run:
$ python worker.py
$ python manage.py runserver
But when loading the webpage I received many "Apps aren't loaded yet." error messages in the python worker.py terminal. I haven't tried to deploy to Heroku yet, but I'm wondering why I am getting this error message locally.
Better late than never.
django-rq requires Django 2.0; unfortunately, for our project there is no plan to upgrade to the latest version.
So if you are in the same situation, you can still use plain RQ. You just need to add the following two lines to worker.py (worker_django_1_11):
import django
django.setup()
and pass the worker class like:
> DJANGO_SETTINGS_MODULE=YOURPROJECT.settings rq worker --worker-class='worker_django_1_11.Worker'
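Put together, that custom worker module might look like this (a sketch; the class layout is an assumption based on the command above, not code from an actual project):

# worker_django_1_11.py
import django
django.setup()  # configure settings and load apps before any job code is imported

from rq.worker import Worker as RQWorker

class Worker(RQWorker):
    # A plain RQ worker; subclassed only so it can be referenced via --worker-class.
    pass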
You didn't post the code of worker.py, but I'd wager it does not properly initialize Django. Take a look at the contents of manage.py to see an example. So, if worker.py tries to instantiate (or even import) any models, views, etc, you'll get that kind of error. Django needs to resolve settings.py (among other things), then use that to look up database settings, resolve models/relationships, etc.
Simplest path is to use django-rq, a simple library that integrates RQ and Django to handle all this. Your worker.py essentially just becomes python manage.py rqworker.
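With django-rq, the setup is roughly the following (a sketch of its documented configuration; the Redis connection values are placeholders):

# settings.py
INSTALLED_APPS = [
    # ... existing apps ...
    'django_rq',
]

RQ_QUEUES = {
    'default': {
        'HOST': 'localhost',
        'PORT': 6379,
        'DB': 0,
    },
}

The worker is then started with python manage.py rqworker default, which initializes Django before processing jobs.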

App Engine Backend not working

I'm having a hard time getting a backend to run on the GAE servers. The following works locally, but not when deployed:
counter.py:
import logging
from google.appengine.api import logservice

logservice.AUTOFLUSH_ENABLED = False

logging.error("Backend started!")
logservice.flush()
No log message is seen when deployed. I've even tried putting syntax errors in; they are not reported either, so it doesn't seem like the backend is actually running my code. I've also tried infinite loops with sleeps and such, with the same result.
Here is the backends.yaml:
backends:
- name: counter
  start: counter.py
  instances: 1
  class: B1
The backend is listed as running in the management console, but doesn't seem to be actually doing anything.
Anyone able to get a backend running on the GAE servers? Thanks!
There are three ways to call a backend service: a scheduled backend (cron), a tasked backend (task queue), or a browsed backend (a plain HTTP request). Try http://counter.appname.appspot.com/path.
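For the browsed case to do anything, counter.py has to expose a request handler rather than just run code at import time. A sketch, assuming the Python 2.7 runtime and webapp2 (the handler names and routes other than /_ah/start are illustrative):

import logging
import webapp2

class StartHandler(webapp2.RequestHandler):
    def get(self):
        logging.error("Backend started!")

class CountHandler(webapp2.RequestHandler):
    def get(self):
        logging.error("Backend handling /path")
        self.response.write('OK')

app = webapp2.WSGIApplication([
    ('/_ah/start', StartHandler),  # invoked when the backend instance starts
    ('/path', CountHandler),       # browsed: http://counter.appname.appspot.com/path
])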
Sources:
http://www.pdjamez.com/2011/05/google-app-engine-backend-patterns/
http://www.pdjamez.com/2011/05/google-app-engine-backends/comment-page-1/

Keeping concurrency in web.py applications on mod_wsgi

Sorry if this makes no sense. Please comment if clarification is needed.
I'm writing a small file upload app in web.py which I am deploying using mod_wsgi + apache. I have been having a problem with my session management and would like clarification on how the threading works in web.py.
Essentially I embed a code in a hidden field of the html page I render when someone accesses my page. The file upload is then done via a standard POST request containing both the file and the code. Then I retrieve the progress of the file by updating it in the file upload POST method and grabbing it with a GET request to a different class. The 'session' (apologies for it being fairly naive) is stored in a session object like this:
class session:
    def __init__(self):
        self.progress = 0
        self.title = ""
        self.finished = False

    def advance(self):
        self.progress = self.progress + 1
The sessions are all kept in a global dictionary within my app script and then accessed with my code (from earlier) as the key.
For some reason my progress seems to stay at 0 and never increments. I've been debugging for a couple hours now and I've found that the two session objects referenced from the upload class and the progress class are not the same. The two codes, however, are (as far as I can tell) equal. This is driving me mad as it worked without any problems on the web.py test server on my local machine.
EDIT: After some research it seems that the dictionary may get copied for every request. I've tried putting the dictionary in another module and importing it, but this doesn't work. Is there some other way, short of using a database, to 'separate' the sessions dictionary?
Apache/mod_wsgi can run in multiprocess configurations, so it's possible your requests aren't even being serviced by the same process. They never will be if, in that multiprocess configuration, each process is single-threaded, because while the upload is occurring no other requests can be handled by that same process. Read:
http://code.google.com/p/modwsgi/wiki/ProcessesAndThreading
Possibly you should use mod_wsgi daemon mode with a single, multithreaded daemon process.
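That configuration amounts to a couple of Apache directives along these lines (the process group name, thread count, and script path are illustrative):

WSGIDaemonProcess uploadapp processes=1 threads=15
WSGIProcessGroup uploadapp
WSGIScriptAlias / /path/to/app.wsgi

With a single daemon process, all requests share the same interpreter, so the global sessions dictionary is visible to both the upload and the progress handlers.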
From PEP 333, defining WSGI:
Servers that can run multiple requests in parallel, should also provide the option of running an application in a single-threaded fashion, so that applications or frameworks that are not thread-safe may still be used with that server
Check the documentation of your WSGI server.
