I am writing a GAE application that, when it starts, needs to initialise a connection to a third-party service and then run a continuous check in the background (essentially pulling data from the third party and pushing it to a GAE task queue).
I know that backends get a call to /_ah/start which initialises them and lets GAE know the backend has started. Is it safe to start the pull process from StartHandler, i.e.
import urllib2
from google.appengine.ext import deferred

f = urllib2.urlopen(THIRD_PARTY_URL)  # placeholder URL; the original snippet was truncated here
for l in f:
    deferred.defer(doMyStuff, l)
I think the answer is to have a StartHandler along the lines of:
import logging
import webapp2
from google.appengine.api import taskqueue

class StartHandler(webapp2.RequestHandler):
    def get(self):
        logging.info("Handler started")
        key = self.request.get('key')
        taskqueue.add(url='/backend/startdata', params={'key': key}, target='1.backend0')
and then have the handler for /backend/startdata run the loop.
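For reference, a minimal sketch of what that /backend/startdata handler could look like, assuming the same doMyStuff task and the placeholder THIRD_PARTY_URL from above (task queue tasks arrive as POST requests by default):

class StartDataHandler(webapp2.RequestHandler):
    def post(self):
        key = self.request.get('key')
        f = urllib2.urlopen(THIRD_PARTY_URL)  # placeholder URL
        for l in f:
            deferred.defer(doMyStuff, l)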
Advice and comments welcome.
Answer to this question: Google App Engine will not let this work. I gave up and used a different cloud provider, because life's too short, and Python should be Python, anywhere.
Related
Hi, I have a Flask application that is built as a Docker image to serve as an API.
This image is deployed to multiple environments (DEV/QA/PROD).
I want to use a separate Application Insights resource for each environment.
Using a single Application Insights resource works fine.
Here is a code snippet:
app.config['APPINSIGHTS_INSTRUMENTATIONKEY'] = APPINSIGHTS_INSTRUMENTATIONKEY
appinsights = AppInsights(app)

@app.after_request
def after_request(response):
    appinsights.flush()
    return response
But to have multiple Application Insights resources I need to configure app.config with the key of the right one.
I thought of this solution, which throws errors.
Here is a snippet:
app = Flask(__name__)

def monitor(key):
    app.config['APPINSIGHTS_INSTRUMENTATIONKEY'] = key
    appinsights = AppInsights(app)

    @app.after_request
    def after_request(response):
        appinsights.flush()
        return response

@app.route("/")
def hello():
    hostname = urlparse(request.base_url).hostname
    print(hostname)
    if hostname == "dev url":
        print('Dev')
        monitor('3ed57a90-********')
    if hostname == "prod url":
        print('prod')
        monitor('941caeca-********-******')
    return "hello"
This example contains the function monitor, which reads the URL and decides which instrumentation key to use so metrics are sent to the right place, but apparently I can't do this processing after the first request has been handled. (Is there a way a config variable can be changed based on the URL condition?)
Error message:
AssertionError: The setup method 'errorhandler' can no longer be called on the application. It has already handled its first request, any changes will not be applied consistently. Make sure all imports, decorators, functions, etc. needed to set up the application are done before running it.
I hope someone can guide me to a better solution.
Thanks in advance.
AFAIK, normally the Application Insights SDK collects telemetry data and sends it to Azure in batches, so you have to keep a single Application Insights resource for an application. Use staging to use different Application Insights resources for the same application.
From the moment a request starts until its response completes, Application Insights tracks that service's life cycle. It tracks the application from start to end, so we can't use more than one Application Insights resource in a single application.
Application Insights starts collecting telemetry data when the application starts and stops gathering it when the application stops. We were using flush() so that information is sent to Application Insights even if the application stops in between.
I have tried what you have used, and the log confirms the same.
With a single Application Insights resource I was able to collect all the telemetry information.
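Since the image is deployed separately per environment, one hedged workaround sketch is to pick the instrumentation key from an environment variable at startup, so each deployment still sees exactly one Application Insights resource and nothing is reconfigured after the first request (the environment variable name here is an assumption, to be set per DEV/QA/PROD deployment):

import os
from flask import Flask
from applicationinsights.flask.ext import AppInsights

app = Flask(__name__)
# APPINSIGHTS_INSTRUMENTATIONKEY is assumed to be injected per
# environment by the deployment, before the app starts serving
app.config['APPINSIGHTS_INSTRUMENTATIONKEY'] = os.environ['APPINSIGHTS_INSTRUMENTATIONKEY']
appinsights = AppInsights(app)

@app.after_request
def after_request(response):
    appinsights.flush()
    return response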
I have a Python Flask application where a person requests data by sending a query from a form. This data goes through a certain Python script that does some API requests and converts the data into a geographic standard.
The thing is, because this can take some time depending on how many data points there are, it should happen in the background (we are researching this for Azure). There is also another problem: queuing up. If one request is running, another one cannot be started, and the last command cannot be saved:
@app.route('/handle_data', methods=['POST'])
def handle_data():
    sper_year = int(request.form["Speryear"])
    email = request.form["inputEmail"]
    url = request.form["api-url-input"]
    random_string = get_random_string(5)
    # app.route('/request-completed')
    Requested_data = Program_Converter.main(url, sper_year, random_string)
Requested_data = Program_Converter.main(url, sper_year, random_string) is the call that needs to be queued.
How do I do this?
I believe that the most recommended way is to run this task asynchronously. Take a look at Celery and pick a backend (I recommend Redis); with this setup you can give Celery a task that will run your GIPOD_Converter process in the background and store the result somewhere you choose, and then it can be sent back to the user.
Note that Celery will give you a task id, and it is up to your client (web interface or mobile app, I'm not sure what you're working with) to poll an endpoint and wait for the Celery task to come to an end.
There are a couple of examples on how to implement this all over the web, but take a look at the Flask official documentation and check out this article from the mighty Miguel Grinberg; I believe those are the best starting points for you.
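A minimal sketch of what the Celery wiring might look like, assuming Redis runs locally and that Program_Converter is importable from your project (broker URLs and module names are assumptions):

# tasks.py
from celery import Celery
import Program_Converter  # assumed importable from your project

celery_app = Celery('tasks',
                    broker='redis://localhost:6379/0',
                    backend='redis://localhost:6379/0')

@celery_app.task
def convert(url, sper_year, random_string):
    # runs in a Celery worker process, outside the Flask request
    return Program_Converter.main(url, sper_year, random_string)

The Flask view would then enqueue instead of calling the converter directly, e.g. task = convert.delay(url, sper_year, random_string), and return task.id for the client to poll.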
I have an Elastic Beanstalk application which is running a web server environment and a worker tier environment. My goal is to pass some parameters to an endpoint in the web server which submits a request to the worker which will then go off and do a long computation and write the results to an S3 bucket. For now I'm ignoring the "long computation" part and just writing a little hello world application which simulates the workflow. Here's my Flask application:
from flask import Flask, request
import boto3
import json

application = Flask(__name__)

@application.route("/web")
def test():
    data = json.dumps({"file": request.args["file"], "message": request.args["message"]})
    boto3.client("sqs").send_message(
        QueueUrl = "really_really_long_url_for_the_workers_sqs_queue",
        MessageBody = data)
    return data

@application.route("/worker", methods = ["POST"])
def worker():
    data = request.get_json()
    boto3.resource("s3").Bucket("myBucket").put_object(Key = data["file"], Body = data["message"])
    return data["message"]

if __name__ == "__main__":
    application.run(debug = True)
(Note that I changed the worker's HTTP Path from the default / to /worker.) I deployed this application to both the web server and the worker, and it does exactly what I expected. Of course, I had to do the usual IAM configuration.
What I don't like is that I have to hard-code my worker's SQS URL into my web server code. This makes it more complicated to change which queue the worker polls, and more complicated to add additional workers, both of which will be convenient in production. I would like some code that says "send this message to whatever queue worker X is currently polling". It's obviously not a huge deal, but I thought I would see if anyone knows a way to do this.
Given the nature of the queue URLs, you may want to try keeping them in some external storage (an in-memory database or key-value store, perhaps) that associates the URLs with the IDs of the workers currently using them. That way you can update them as needed without having to modify your application. (The downside is that you then have an additional source of data to maintain, and you'd need to write the interfacing code for both the server and workers.)
Sorry if this makes no sense. Please comment if clarification is needed.
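A lighter-weight variant along the same lines: resolve the URL from the queue name at startup with SQS's get_queue_url, supplying the name through configuration (the environment variable name below is hypothetical):

import os
import boto3

sqs = boto3.client("sqs")
# WORKER_QUEUE_NAME is a hypothetical setting, e.g. an Elastic Beanstalk
# environment property, so the queue can change without code changes
queue_url = sqs.get_queue_url(QueueName=os.environ["WORKER_QUEUE_NAME"])["QueueUrl"]

def send_to_worker(message_body):
    sqs.send_message(QueueUrl=queue_url, MessageBody=message_body)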
I'm writing a small file upload app in web.py which I am deploying using mod_wsgi + apache. I have been having a problem with my session management and would like clarification on how the threading works in web.py.
Essentially I embed a code in a hidden field of the html page I render when someone accesses my page. The file upload is then done via a standard POST request containing both the file and the code. Then I retrieve the progress of the file by updating it in the file upload POST method and grabbing it with a GET request to a different class. The 'session' (apologies for it being fairly naive) is stored in a session object like this:
class session:
    def __init__(self):
        self.progress = 0
        self.title = ""
        self.finished = False

    def advance(self):
        self.progress = self.progress + 1
The sessions are all kept in a global dictionary within my app script and then accessed with the code (from earlier) as the key.
For some reason my progress seems to stay at 0 and never increments. I've been debugging for a couple of hours now and I've found that the two session objects referenced from the upload class and the progress class are not the same. The two codes, however, are (as far as I can tell) equal. This is driving me mad, as it worked without any problems on the web.py test server on my local machine.
EDIT: After some research it seems that the dictionary may get copied for every request. I've tried putting the dictionary in another module and importing it, but this doesn't work. Is there some other way, short of using a database, to 'separate' the sessions dictionary?
Apache/mod_wsgi can run in multiprocess configurations, so possibly your requests aren't even being serviced by the same process. They never will be if, in that multiprocess configuration, each process is single-threaded, because while the upload is occurring no other requests can be handled by that same process. Read:
http://code.google.com/p/modwsgi/wiki/ProcessesAndThreading
Possibly you should use mod_wsgi daemon mode with a single multi-threaded daemon process.
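Assuming that setup (one multi-threaded daemon process, so every request sees the same interpreter), a minimal sketch of making the global sessions dictionary thread-safe; the names here are illustrative:

import threading

sessions = {}
sessions_lock = threading.Lock()

def get_session(code):
    # all requests hit the same process, so this dict is shared;
    # the lock guards concurrent access from multiple threads
    with sessions_lock:
        return sessions.setdefault(code, session())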
From PEP 333, defining WSGI:
Servers that can run multiple requests in parallel, should also provide the option of running an application in a single-threaded fashion, so that applications or frameworks that are not thread-safe may still be used with that server
Check the documentation of your WSGI server.
I have a Django web application. I also have a spell server written using Twisted running on the same machine as Django (on localhost:8090). The idea is that when the user does some action, a request comes to Django, which in turn connects to this Twisted server, and the server sends data back to Django. Finally Django puts this data in some HTML template and serves it back to the user.
Here's where I am having a problem. In my Django app, when the request comes in I create a simple Twisted client to connect to the locally run Twisted server.
...
factory = Spell_Factory(query)
reactor.connectTCP(AS_SERVER_HOST, AS_SERVER_PORT, factory)
reactor.run(installSignalHandlers=0)
print factory.results
...
The reactor.run() call is causing a problem, since it's an event loop: the next time this same code is executed by Django, I am unable to connect to the server. How does one handle this?
The above two answers are correct. However, considering that you've already implemented a spelling server, run it as one. You can start by running it on the same machine as a separate process, at localhost:PORT. Right now it seems you have a very simple binary protocol interface already; you can implement an equally simple Python client using the standard library's socket interface in blocking mode.
However, I suggest playing around with twisted.web and exposing a simple web interface. You can use JSON to serialize and deserialize data, which is well supported by Django. Here's a very quick example:
import json
from twisted.web import server, resource
from twisted.python import log

class Root(resource.Resource):
    def getChild(self, path, request):
        # represents / on your web interface
        return self

class WebInterface(resource.Resource):
    isLeaf = True

    def render_GET(self, request):
        log.msg('GOT a GET request.')
        # read request.args if you need to process query args
        # ... call some internal service and get output ...
        return json.dumps(output)

class SpellingSite(server.Site):
    def __init__(self, *args, **kwargs):
        self.root = Root()
        server.Site.__init__(self, self.root, **kwargs)
        self.root.putChild('spell', WebInterface())
And to run it you can use the following skeleton .tac file:
from twisted.application import service, internet
from spellserver import SpellingSite  # placeholder module name for the code above

PORT = 8090  # placeholder port

site = SpellingSite()
application = service.Application('WebSpell')
# attach the service to its parent application
service_collection = service.IServiceCollection(application)
internet.TCPServer(PORT, site).setServiceParent(service_collection)
Running your service as another first-class service allows you to run it on another machine one day if you find the need; exposing a web interface also makes it easy to horizontally scale it behind a reverse-proxying load balancer.
reactor.run() should be called only once in your whole program. Don't think of it as "start this one request I have", think of it as "start all of Twisted".
Running the reactor in a background thread is one way to get around this; your Django application can then use blockingCallFromThread and call a Twisted API as it would any blocking API. You will need a little bit of cooperation from your WSGI container, though, because you will need to make sure that this background Twisted thread is started and stopped at appropriate times (when your interpreter is initialized and torn down, respectively).
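A minimal sketch of that background-reactor approach, reusing Spell_Factory from the question and assuming it exposes a Deferred that fires with the results (the deferred attribute name is an assumption):

import threading
from twisted.internet import reactor
from twisted.internet.threads import blockingCallFromThread

def start_reactor():
    # run once at interpreter startup; the reactor owns this thread
    t = threading.Thread(target=reactor.run,
                         kwargs={'installSignalHandlers': False})
    t.daemon = True
    t.start()

def query_spell_server(query):
    # called from a Django view (any thread except the reactor's)
    factory = Spell_Factory(query)
    blockingCallFromThread(reactor, reactor.connectTCP,
                           AS_SERVER_HOST, AS_SERVER_PORT, factory)
    # blocks until the factory's Deferred fires, then returns its result
    return blockingCallFromThread(reactor, lambda: factory.deferred)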
You could also use Twisted as your WSGI container, and then you don't need to start or stop anything special; blockingCallFromThread will just work immediately. See the command-line help for twistd web --wsgi.
You should stop the reactor after you get results from the Twisted server, or after some error/timeout. So on each Django request that needs to query your Twisted server you would run the reactor and then stop it. But that's not supported by the Twisted library: the reactor is not restartable. Possible solutions:
Use a separate thread for the Twisted reactor, but you will need to deploy your Django app with a server that supports long-running threads (I don't know of any, but you can write your own easily :-)).
Don't use Twisted for implementing the client protocol; just use the plain stdlib socket module.
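For the plain-socket option, a minimal sketch of a blocking client; the line-based protocol here is an assumption, since the question doesn't describe the server's wire format:

import socket

def spell_query(query, host=AS_SERVER_HOST, port=AS_SERVER_PORT):
    s = socket.create_connection((host, port))
    try:
        # assumes a newline-terminated request and a single-chunk reply
        s.sendall(query + '\n')
        return s.recv(4096)
    finally:
        s.close()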