I'm receiving a Heroku timeout error with code H12 when I'm calling an API from my Flask app. The API usually responds within two minutes. I'm calling the API on a separate thread so that the main Flask thread keeps running.
from concurrent.futures import ThreadPoolExecutor

with ThreadPoolExecutor(max_workers=5) as executor:
    future = executor.submit(shub_api, website, merchant.id)
    result = future.result()  # blocks until the API call finishes
There is some documentation on Heroku about running background tasks, but the Python examples use Redis, which I know nothing about. Are there other solutions to this problem?
This is not working because of the way Heroku is architected.
When your web application is deployed to Heroku, it runs on dynos. Dynos are ephemeral webservers that live only for a short time, so any given user request will be handled by a dyno that may only exist for a short while.
Heroku dynos are constantly starting, stopping, and being moved around to other physical hosts. This means that web dynos should not be used to run tasks that take a long time to complete (there are different worker dynos for that).
Furthermore, every web request that is served by a Heroku dyno has a 30-second timeout. What this means is that if someone makes an HTTP request to your app on Heroku, your app must start responding to the client within 30 seconds, otherwise, Heroku's routing layer will issue an H12 TIMEOUT error to you because it thinks your app has frozen or gotten stuck in a loop somewhere.
To sum it up: Heroku is designed from the ground up to follow web best practices, which means having your HTTP requests finish quickly (< 30 seconds) and not relying on your web servers being permanent fixtures you can run code on all the time.
To resolve this issue, use a background worker process (essentially a second type of dyno that runs code to process long-running tasks) and have your web application send a notification to the worker process to start running your task code.
This is typically done via a message queue like Redis, AWS SQS, etc. This Heroku article explains the concept in more detail.
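That hand-off can be sketched with Python's standard library, using an in-process queue as a stand-in for a real broker such as Redis; the job payload and the worker's fake "work" here are hypothetical, just to show the shape of the pattern:

```python
import queue
import threading
import uuid

jobs = queue.Queue()     # stand-in for the Redis/SQS queue
results = {}             # stand-in for a results store

def worker():
    """What the worker dyno runs: pull jobs and process them."""
    while True:
        job_id, payload = jobs.get()
        results[job_id] = payload.upper()   # fake long-running work
        jobs.task_done()

def enqueue(payload):
    """What the web dyno does: enqueue and return immediately."""
    job_id = str(uuid.uuid4())
    jobs.put((job_id, payload))
    return job_id

threading.Thread(target=worker, daemon=True).start()
job_id = enqueue("some slow api call")
jobs.join()   # a real client would poll for the result instead
print(results[job_id])   # -> SOME SLOW API CALL
```

The key point is that the web process returns the job id in well under 30 seconds, and the slow work happens entirely outside the request/response cycle.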
Related
I am using Heroku to host a Django web app. This is just for a fun project that will make no money so paying for the premium service would not make sense.
I am using APScheduler to run a cron job once per day, so I have a clock dyno running. The issue is that the clock dyno keeps going idle after 30 minutes of inactivity. I read that you can ping the app to keep it from idling, but unfortunately this only keeps the web dyno from idling; the clock dyno still idles.
Any recommendations?
I'm essentially looking for a free way to send scheduled emails once a day. I tried using mailchimp but you have to pay to schedule an email.
Okay, so it looks like my original solution does actually work; the issue was the timezone that was set for the cron job.
There is a service called Kaffeine that pings your app to keep it from idling.
http://kaffeine.herokuapp.com/
I'm currently working on a Python web app that needs to use RabbitMQ.
The app is structured like this:
The client connects to an HTTP server
The connection is sent as a message to a queue that is connected to the main service of my app
The main service receives the message and gives the user his information
I understand how to get RabbitMQ working using the documentation and tutorials on its website, but I have trouble seeing how it works with real tasks like displaying a web page or printing a file. How does my service, connected to the message queue, read the received message and say: "oh, I'm going to display this web page"?
Sorry if this is confusing; if you need further explanation of what I'm trying to do, just tell me.
Thanks for reading!
RabbitMQ is good for sending messages to a service that executes a long-running process, e.g. downloading a big file or generating a complex animation. A web server can't (or shouldn't) execute long-running processes itself.
The web page sends a message to RabbitMQ (e.g. with the parameters for the long-running process) and gets back a unique number. When the service has a free worker, it checks whether there is a new message in the queue, takes it (with its unique number), and starts the worker. When the worker finishes the job, the service sends the result back to RabbitMQ under the same unique number.
At the same time, the web page uses JavaScript to run a loop that periodically checks RabbitMQ for a result with that unique number. While there is no result it can display a progress bar; once the result arrives it can display it.
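The unique-number flow can be sketched in plain Python, with a dict standing in for RabbitMQ, a thread standing in for the worker service, and made-up names for the job and its output:

```python
import threading
import time
import uuid

# In-process stand-in for RabbitMQ: results keyed by the unique
# number (correlation id) described above.
results = {}

def worker(corr_id, steps):
    for i in range(steps):                       # fake long-running job
        results[corr_id] = {"progress": (i + 1) / steps}
    results[corr_id]["result"] = "animation.mp4" # hypothetical output

corr_id = str(uuid.uuid4())       # the unique number the page keeps
threading.Thread(target=worker, args=(corr_id, 4)).start()

# The page's JavaScript loop corresponds to this polling loop:
while "result" not in results.get(corr_id, {}):
    time.sleep(0.01)              # could display a progress bar here
print(results[corr_id]["result"])   # -> animation.mp4
```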
Example: Celery - Distributed Task Queue.
Celery can use RabbitMQ to communicate with Django or Flask.
(but it can use other modules ie. Redis)
Using Celery with Django.
Flask - Celery Background Tasks
From the Celery repo:
Celery is usually used with a message broker to send and receive messages.
The RabbitMQ, Redis transports are feature complete, but there's also
experimental support for a myriad of other solutions, including using
SQLite for local development.
I'm currently developing an HTTP REST API server using Flask and Gunicorn. For various reasons, it is not possible to put a reverse proxy server in front of Gunicorn. I don't have any static media, and all URLs are served by @app.route handlers in Flask. Can Flask run on Gunicorn alone?
It could, but it is a very bad idea. Gunicorn does not work well without a proxy in front of it doing request and response buffering for slow clients.
Without buffering, a Gunicorn worker has to wait until the client has sent the whole request, and then wait again until the client has read the whole response.
This can be a serious problem if, for example, there are clients on a slow network.
http://docs.gunicorn.org/en/latest/deploy.html?highlight=buffering
see also: http://blog.etianen.com/blog/2014/01/19/gunicorn-heroku-django/
Because Gunicorn has a relatively small (2x CPU cores) pool of workers, it can only handle a small number of concurrent requests. If all the worker processes become tied up waiting for network traffic, the entire server will become unresponsive. To the outside world, your web application will cease to exist.
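That exhaustion is easy to demonstrate with a toy model of the worker pool; the pool size and client delays below are made up for illustration:

```python
from concurrent.futures import ThreadPoolExecutor
import time

# Toy model of a sync Gunicorn worker pool: 2 workers, each tied up
# for as long as its client is slow to send/read.
pool = ThreadPoolExecutor(max_workers=2)

def handle(client_delay):
    time.sleep(client_delay)   # worker blocked on a slow client
    return "done"

start = time.time()
slow = [pool.submit(handle, 0.2) for _ in range(2)]  # 2 slow clients
fast = pool.submit(handle, 0.0)                      # a fast client...
fast.result()                # ...which still has to wait for a worker
elapsed = time.time() - start
print(elapsed >= 0.2)   # -> True: the fast request was stuck in line
```

A buffering proxy in front of Gunicorn absorbs the slow clients, so each worker is occupied only for the time the application actually spends computing the response.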
I am trying to design a web application that processes large quantities of large mixed-media files coming from asynchronous processes. Each process can take several minutes.
The files are either uploaded as a POST body or pulled by the web server according to a source URL provided. The files can be processed by a variety of external tools in a synchronous or asynchronous way.
I need to be able to load balance this application so I can process multiple large files simultaneously for as much as I can afford to scale.
I think Python is my best choice for this project, but beside this, I am open to any solution. The app can either deliver the file back or rely on a messaging channel to notify the clients about the process completion.
Some approaches I thought I might use:
1) Use a non-blocking web server such as Tornado that keeps the connection open until the file processing is done. The external processing command is launched and the web server waits until the file is ready and pipes the resulting IO stream directly back to the web app that returns it. Since the processes sending requests are asynchronous, they might afford to wait (unless memory or some other issues come up).
2) Use a regular web server like Cherrypy (which I am more confident with) and have the webapp use a messaging channel to report the processing progress. The web server returns a HTTP response as soon as it receives the file, validates it and sends it to a background process. At the same time it sends a message notifying the process start. The background process then takes care of delivering the file to an available location and sending another message to the channel notifying the location of the new file. This solution looks more flexible than 1), but requires writing a separate script to handle the messages outside the web application, as well as a separate storage space for the temp files that have to be cleaned up at a certain point.
3) Use some internal messaging capability of any of the web servers mentioned above, which I am not familiar with...
Edit: something like CherryPy's pub-sub engine (http://cherrypy.readthedocs.org/en/latest/extend.html?highlight=messaging#publish-subscribe-pattern) could be a good solution.
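In Python, that publish-subscribe idea might look something like this in-process sketch (the channel name and message contents are hypothetical):

```python
from collections import defaultdict

# Minimal in-process publish/subscribe, standing in for CherryPy's
# bus or an external message channel.
subscribers = defaultdict(list)

def subscribe(channel, callback):
    subscribers[channel].append(callback)

def publish(channel, message):
    for callback in subscribers[channel]:
        callback(message)

received = []
subscribe("file-processed", received.append)

# The background process publishes when the file is ready:
publish("file-processed", {"file_id": 42, "location": "/tmp/out.bin"})
print(received[0]["location"])   # -> /tmp/out.bin
```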
Any suggestions?
Thank you,
gm
I had a similar situation come up with a really large scale data processing engine that my team implemented. We wanted to build our api calls in Flask, some of which can take many hours to complete, but have a way to notify the user in real time what is going on.
Basically, what I came up with was what you described as option 2. On the same machine that serves the Flask app through Apache, I created a Tornado app that serves a websocket reporting progress to the end user. Once my main page is served, it establishes the websocket connection to the Tornado server; the Flask app periodically sends updates to the Tornado app, which passes them down to the end user. Even if the browser is closed during the long-running job, Apache keeps the request alive and processing, and upon logging back in I can still see the current progress.
I wrote about this solution in some more detail here:
http://jonfeatherstone.com/2013/08/01/mongo-and-websockets-for-application-logging/
Good luck!
I want to make a Google App Engine app that does the following:
Client makes an asynchronous http request
Server starts processing that request
Client makes ajax http requests to get progress
The problem is that the server processing (step #2) may take more than 30 seconds.
I know that you can't have threads on Google App Engine and that all tasks must complete within 30 seconds or they get shut down. Is there some way to work around this?
Also, I'm using python-django as a backend.
You'll want to use the Task Queue API, probably via deferred tasks. The deferred API makes working with Task Queues dramatically simpler.
Essentially, you'll want to spawn a task to start the processing. That task should catch DeadlineExceededError and reschedule itself (again via the deferred API) to continue processing. This requires that your tasks be able to keep track of their own progress. They can also record their status in memcache, which lets you write a view that checks a task's status. That view can then be polled via Ajax.
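A plain-Python sketch of that self-rescheduling pattern, with a dict standing in for memcache and a direct call standing in for `deferred.defer`, might look like this (the task id, item list, and per-run budget are all made up):

```python
progress = {}   # stand-in for memcache; the Ajax view would read this

def process(task_id, items, pos=0, budget=3):
    """Process a few items, record progress, then 're-enqueue' itself."""
    done = 0
    while pos < len(items) and done < budget:  # budget ~ request deadline
        # ... the real per-item work would happen here ...
        pos += 1
        done += 1
    progress[task_id] = pos / len(items)       # status the client polls
    if pos < len(items):
        # On App Engine this would be deferred.defer(process, task_id,
        # items, pos) rather than a direct recursive call:
        process(task_id, items, pos)

process("task-1", list(range(10)))
print(progress["task-1"])   # -> 1.0
```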