App Engine Backend not working - python

I'm having a hard time getting a backend to run on the GAE servers. The following works locally, but not when deployed:
counter.py:
from google.appengine.api import logservice
import logging

logservice.AUTOFLUSH_ENABLED = False
logging.error("Backend started!")
logservice.flush()
No log message is seen when deployed. I've even tried putting syntax errors in; they are not reported either, so it doesn't seem like the backend is actually running my code. I've also tried the same with infinite loops and sleeps; same result.
Here is the backends.yaml:
backends:
- name: counter
  start: counter.py
  instances: 1
  class: B1
The backend is listed as running in the management console, but doesn't seem to be actually doing anything.
Anyone able to get a backend running on the GAE servers? Thanks!

There are three ways to invoke a backend: as a scheduled backend (cron), a tasked backend (task queue), or a browsed backend (plain HTTP). For the browsed case, try hitting http://counter.appname.appspot.com/path directly.
Sources:
http://www.pdjamez.com/2011/05/google-app-engine-backend-patterns/
http://www.pdjamez.com/2011/05/google-app-engine-backends/comment-page-1/
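For the browsed case in particular, the backend must expose a handler that app.yaml routes to. Below is a hypothetical sketch (assuming the Python 2.7 runtime with webapp2, not necessarily the poster's setup) of a counter.py that would log when App Engine sends the /_ah/start request to a newly started backend instance:

# counter.py - hypothetical sketch; assumes the Python 2.7 runtime with webapp2
# and that app.yaml maps /_ah/start (and any other paths) to this module's app.
import logging
import webapp2

class StartHandler(webapp2.RequestHandler):
    def get(self):
        # App Engine requests /_ah/start when it spins up a backend instance.
        logging.error("Backend started!")

app = webapp2.WSGIApplication([('/_ah/start', StartHandler)], debug=True)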

Related

Why does this gRPC call from the Google Secret Manager API hang when run by Apache?

In short:
I have a Django application being served up by Apache on a Google Compute Engine VM.
I want to access a secret from Google Secret Manager in my Python code (when the Django app is initialising).
When I do 'python manage.py runserver', the secret is successfully retrieved. However, when I get Apache to run my application, it hangs when it sends a request to the secret manager.
Too much detail:
I followed the answer to this question GCP VM Instance is not able to access secrets from Secret Manager despite of appropriate Roles. I have created a service account (not the default), and have given it the 'cloud-platform' scope. I also gave it the 'Secret Manager Admin' role in the web console.
After initially running into trouble, I downloaded a JSON key for the service account from the web console and set the GOOGLE_APPLICATION_CREDENTIALS env var to point to it.
When I run the django server directly on the VM, everything works fine. When I let Apache run the application, I can see from the logs that the service account credential json is loaded successfully.
However, when I make my first API call, via google.cloud.secretmanager.SecretManagerServiceClient.list_secret_versions, the application hangs. I don't even get a 500 error in my browser, just an eternal loading icon. I traced the execution as far as:
grpc._channel._UnaryUnaryMultiCallable._blocking, line 926 : 'call = self._channel.segregated_call(...'
It never gets past that line. I couldn't figure out where that call goes, so I couldn't inspect it any further.
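For context, a stripped-down sketch of the kind of call that hangs (the resource name and surrounding code are illustrative, not the actual application; it assumes GOOGLE_APPLICATION_CREDENTIALS is set as described above):

# Hypothetical minimal reproduction of the hanging call.
from google.cloud import secretmanager

client = secretmanager.SecretManagerServiceClient()
parent = "projects/my-project/secrets/my-secret"  # placeholder resource name
for version in client.list_secret_versions(request={"parent": parent}):
    print(version.name)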
Thoughts
I don't understand GCP service accounts / API access very well. I can't understand why this difference is occurring between the django dev server and apache, given that they're both using the same service account credentials from json. I'm also surprised that the application just hangs in the google library rather than throwing an exception. There's even a timeout option when sending a request, but changing this doesn't make any difference.
I wonder if it's somehow related to the fact that I'm running the django server under my own account, but apache is using whatever user account it uses?
Update
I tried changing the user/group that apache runs as to match my own. No change.
I enabled logging for gRPC itself. There is a clear difference between when I run with apache vs the django dev server.
On Django:
secure_channel_create.cc:178] grpc_secure_channel_create(creds=0x17cfda0, target=secretmanager.googleapis.com:443, args=0x7fe254620f20, reserved=(nil))
init.cc:167] grpc_init(void)
client_channel.cc:1099] chand=0x2299b88: creating client_channel for channel stack 0x2299b18
...
timer_manager.cc:188] sleep for a 1001 milliseconds
...
client_channel.cc:1879] chand=0x2299b88 calld=0x229e440: created call
...
call.cc:1980] grpc_call_start_batch(call=0x229daa0, ops=0x20cfe70, nops=6, tag=0x7fe25463c680, reserved=(nil))
call.cc:1573] ops[0]: SEND_INITIAL_METADATA...
call.cc:1573] ops[1]: SEND_MESSAGE ptr=0x21f7a20
...
So, a channel is created, then a call is created, and then we see gRPC start to execute the operations for that call (as far as I read it).
On Apache:
secure_channel_create.cc:178] grpc_secure_channel_create(creds=0x7fd5bc850f70, target=secretmanager.googleapis.com:443, args=0x7fd583065c50, reserved=(nil))
init.cc:167] grpc_init(void)
client_channel.cc:1099] chand=0x7fd5bca91bb8: creating client_channel for channel stack 0x7fd5bca91b48
...
timer_manager.cc:188] sleep for a 1001 milliseconds
...
timer_manager.cc:188] sleep for a 1001 milliseconds
...
So a channel is created... and then nothing. No call, no operations. The Python code is sitting there waiting for gRPC to make the call, which it never does.
The problem appears to be that the forking behaviour of Apache breaks gRPC somehow. I couldn't nail down the precise cause, but after I began to suspect that forking was the issue, I found this old gRPC issue that indicates that forking is a bit of a tricky area.
I tried to reconfigure Apache to use a different 'Multi-processing Module', but as my experience in this is limited, I couldn't get gRPC to work under any of them.
In the end, I switched to nginx/uwsgi instead of Apache/mod_wsgi, and the issue did not recur. If you're trying to solve a problem like this and you have to use Apache, I'd advise investigating Apache's forking behaviour further, how gRPC handles forking, and the different MPMs available for Apache.
I'm facing a similar issue when running my Flask application with eventlet==0.33.0 and gunicorn (https://github.com/benoitc/gunicorn/archive/ff58e0c6da83d5520916bc4cc109a529258d76e1.zip#egg=gunicorn==20.1.0). Calling secret_client.access_secret_version hangs forever.
It used to work fine with an older eventlet version, but we needed to upgrade to the latest eventlet for security reasons.
I experienced a similar issue and was able to solve it with the following:
import grpc.experimental.gevent as grpc_gevent
from gevent import monkey
from google.cloud import secretmanager

# Patch the standard library for gevent, then tell gRPC to cooperate with it.
monkey.patch_all()
grpc_gevent.init_gevent()

client = secretmanager.SecretManagerServiceClient()

My backend calls itself via API - works fine with flask webserver, hangs with gunicorn

My application is a Flask backend that serves an SPA frontend (a React app). The backend actually consists of two layers, called api and bff; frontend, bff, and api all run in the same process.
The frontend calls the bff, which sometimes makes calls to the api layer - a REST API call over HTTP, not an internal function call. E.g. the backend does requests.get("http://localhost/api/foo").
When I run my app locally using flask webserver, it works absolutely fine.
When I run my app locally using gunicorn, the calls from the frontend to the backend work fine but the backend calls to itself do not.
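A stripped-down sketch of the pattern (routes and port are illustrative, not the real application):

# Illustrative sketch only: a bff route that calls the api layer of the same
# process over HTTP.
import requests
from flask import Flask, jsonify

app = Flask(__name__)

@app.route("/api/foo")
def api_foo():
    return jsonify(value=42)

@app.route("/bff/foo")
def bff_foo():
    # With a single synchronous worker, this HTTP call back into the same
    # server can block until the worker times out, because the only worker
    # is already occupied serving the outer request.
    resp = requests.get("http://localhost:5000/api/foo", timeout=5)
    return jsonify(resp.json())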
I don't understand what is different. Here's a typical printout: my debug message showing the URL being called, followed by the gunicorn critical worker-death message 30 seconds later. I note that the session cookies are actually wiped at this point, so the worker really is dying. Why?!
http://localhost:5000/api/requests/
[2020-05-20 21:30:12 +0100] [769] [CRITICAL] WORKER TIMEOUT (pid:771)
I'd be super grateful for any help here, thanks.
It's not clear to me why I wasn't seeing any errors, but it looks like there was an unhandled error. I fixed that and this now works. When I ran locally, I was using a slightly different config which didn't hit the unhandled error, so it wasn't an equivalent scenario.

Writing output in a sub-process leads to error

Keras forces output to sys.stderr (a fix for which was rejected on GitHub). There seems to be a problem writing to the standard output streams from a child process in a Web App. This causes my code to throw the error below when Keras, on import, tries to report which backend implementation it is using.
AttributeError: 'NoneType' object has no attribute 'write'
I tried to redirect output to os.devnull according to this answer before instantiating a Flask application and starting it with a web.config. However, the error persisted. Curiously, writing output without multiprocessing worked just fine.
import sys
from flask import Flask
import keras

app = Flask(__name__)

@app.route('/')
def main():
    print('Hello!')
    sys.stdout.write('test\n')
    sys.stderr.write('emsg\n')
    return 'OK.', 200
Even from keras import backend as k works. That's the statement that originally produced the error. This left me baffled. What could possibly be the matter?
Minimal example
In my application, a sub process is spawned for training models. When trying to write output within the multiprocessing.Process, an error is thrown. Here's some code to reproduce the situation.
import sys
from flask import Flask
from multiprocessing import Process

def write_output():
    sys.stdout.write('hello\n')

def create_app():
    apl = Flask(__name__)
    Process(target=write_output).start()

    @apl.route('/')
    def main():
        return 'OK.', 200

    return apl
This application is then instantiated in another file and called from web.config. Basic logging confirmed the error was still being thrown.
Almost a fix
Although not a fix, I got the system working using threading. By simply switching multiprocessing.Queue and Process to queue.Queue and threading.Thread, no errors like above are thrown. For my use case this is acceptable for now. Of course it's not a solution to the problem of writing output in a child process.
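For reference, a sketch of that threading variant of the minimal example (same behaviour, just Thread in place of Process):

import sys
from threading import Thread
from flask import Flask

def write_output():
    sys.stdout.write('hello\n')

def create_app():
    apl = Flask(__name__)
    # Writing output from a thread instead of a child process avoids the
    # AttributeError seen above.
    Thread(target=write_output).start()

    @apl.route('/')
    def main():
        return 'OK.', 200

    return apl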
I noticed that you mentioned the web.config file for Azure Web App on Windows. There is a limitation of the Azure Web App sandbox you need to know about. Having reviewed it against your scenario, I think your app was hitting some of those restrictions. In my experience, a model-training task is not a good fit for Azure Web App, especially on a Windows instance, since the sandbox is CPU-only.
My suggestion is to move your app to a high-performance Azure VM with a GPU, such as the NC-series listed at https://azure.microsoft.com/en-us/pricing/details/virtual-machines/linux/.
Otherwise, if you have other reasons to stay on Azure Web App, I recommend trying Azure Web App for Linux with Docker.
Hope it helps.

Heroku RQ (Redis Queue) Django Error: "Apps aren't loaded yet."

I have a functional Django app that has many Google Text-To-Speech API calls and database reads/writes in my view. When testing locally it takes about 3 seconds to load a page, but when I deploy the app live to Heroku it takes about 15 seconds to load the webpage. So I am trying to reduce load time.
I came across this article: https://devcenter.heroku.com/articles/python-rq that suggests I should use background tasks by queueing jobs to workers using an RQ (Redis Queue) library. I followed their steps and included their worker.py file in the same directory as my manage.py file (not sure if that's the right place to put it). I wanted to test it out locally with a dummy function and view to see if it would run without errors.
# views.py
from django.shortcuts import render
from rq import Queue
from worker import conn

def dummy(foo):
    return 2

def my_view(request):
    q = Queue(connection=conn)
    for i in range(10):
        dummy_foo = q.enqueue(dummy, "howdy")
    return render(request, 'dummy.html', {})
In separate terminals I run:
$ python worker.py
$ python manage.py runserver
But when loading the webpage I received many "Apps aren't loaded yet." error messages in the python worker.py terminal. I haven't tried to deploy to Heroku yet, but I'm wondering why I am getting this error message locally?
Better late than never.
django-rq requires Django 2.0; unfortunately for our project there is no plan to upgrade to the latest version.
So if you are in the same situation, you can still use plain RQ - you just need to add the two following lines to your worker module (call it worker_django_1_11.py):
import django
django.setup()
and start the worker with that module as the worker class:
$ DJANGO_SETTINGS_MODULE=YOURPROJECT.settings rq worker --worker-class='worker_django_1_11.Worker'
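For clarity, a sketch of what that worker_django_1_11.py module might contain (the Worker re-export is an assumption, so that --worker-class can resolve worker_django_1_11.Worker):

# worker_django_1_11.py - sketch; importing this module runs django.setup()
# before RQ loads the worker class, so jobs can import models safely.
import django

django.setup()

# Re-export RQ's Worker so --worker-class='worker_django_1_11.Worker' resolves.
from rq import Worker  # noqa: E402,F401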
You didn't post the code of worker.py, but I'd wager it does not properly initialize Django. Take a look at the contents of manage.py for an example. If worker.py tries to instantiate (or even import) any models, views, etc., you'll get this kind of error: Django needs to resolve settings.py (among other things) before it can look up database settings, resolve models/relationships, and so on.
The simplest path is to use django-rq, a small library that integrates RQ with Django and handles all of this. Your worker.py essentially just becomes python manage.py rqworker.
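As a sketch of what that looks like (assuming a local Redis instance; hosts and credentials are placeholders):

# settings.py (sketch): register django_rq and point it at Redis.
INSTALLED_APPS = [
    # ... existing apps ...
    'django_rq',
]

RQ_QUEUES = {
    'default': {
        'HOST': 'localhost',
        'PORT': 6379,
        'DB': 0,
    },
}

# Enqueue with django_rq.enqueue(dummy, "howdy") in the view, and run the
# worker with: python manage.py rqworker default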

Flask request.form.get too slow?

I am using Flask for my Web Api service.
Finding that my service sometimes (about 1 in 100 requests) responds really slowly (seconds), I started debugging, which showed me that the service sometimes hangs while reading a request field.
import time
from flask import Flask, request

app = Flask(__name__)

@app.route('/scan', methods=['POST'])
def scan():
    start_time = time.time()
    request_description = request.form.get('requestDescription')
    end_time = time.time()
    app.logger.debug('delay is %s', end_time - start_time)
    return 'OK', 200
Here I found that delay between start_time and end_time can be up to 2 minutes.
I've read that Flask's built-in Werkzeug server isn't meant for production, so I tried Gunicorn as an alternative - same thing.
I feel that my problem is somewhat similar to this one, with the difference that switching servers didn't solve the problem.
I tried to profile the app using cProfile and SnakeViz, but only under the non-production Werkzeug server, as I don't know how to profile Python apps running under Gunicorn. (Maybe anyone here knows how to?)
My POST requests contain description and a file. The file can vary in size, but the logs show that the issue reproduces regardless of the file size.
People also usually say that Flask should sit behind an Nginx-[WSGI server]-Flask combo, but since I run the service inside OpenShift (with HAProxy as the load balancer), I doubt that applies here.
So my settings:
Alpine 3.8.1
Gunicorn:
workers:3
threads:1
What happens under the hood when I call this?
request.form.get('requestDescription')
How can I profile Python code under Gunicorn?
Did anyone else encounter such a problem?
Any help will be appreciated
I did face this issue as well. I was uploading a video file using requests.post(). It turns out the video upload was not the issue:
the timing bottleneck was request.form.get(). While I am still trying to figure out the root cause, you can use Flask Monitoring Dashboard to profile where the time goes.
The profiler shows that under the hood the time is spent in return self._sock.recv_into(b).
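On the profiling sub-question: one option (a sketch, not something from the answer above) is Werkzeug's ProfilerMiddleware, which wraps the WSGI app itself and therefore behaves the same under Gunicorn as under the dev server:

# Sketch: profile every request regardless of which WSGI server runs the app.
# The profile_dir path is illustrative; the .prof files can be opened in SnakeViz.
from werkzeug.middleware.profiler import ProfilerMiddleware

app.wsgi_app = ProfilerMiddleware(app.wsgi_app, profile_dir='/tmp/profiles')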
