I'm trying to build an app in Python with Google App Engine that fetches followers of specific accounts and then their tweets. I'm basing it on this template and changing it to adapt it to what I need.
The issue at the moment is that when I try to fetch followers, I get a DeadlineExceededError due to the Twitter API waiting time.
I have found this post on how to fix the same problem and I think that in my case the best solution would be to use backends, but I noticed that they are deprecated.
Does someone know how I can achieve the same result without the deprecated module?
You have a few options for long-running tasks:
Use GAE Task Queues: GAE provides push and pull queues which allow you to do work asynchronously outside of the individual request.
Use Cloud Pub/Sub: A type of pull queue, this would allow your App Engine app to publish a message every time you want to fetch followers or fetch tweets. The subscriber would then take the message from the queue, perform the long-running task, and put the result into some datastore.
Use GAE Services: This would allow you to create a background service and manually scale it to run as long as you need.
Backends (modules) have been deprecated in favor of Services:
https://cloud.google.com/appengine/docs/flexible/python/an-overview-of-app-engine
For the Service that needs to handle requests longer than 60 seconds, set it to Manual Scaling. A request can then run for up to 24 hours (or until you shut the instance down). See:
https://cloud.google.com/appengine/docs/standard/python/how-instances-are-managed#instance_scaling
Of course, your costs may go up with long-running instances and requests.
Related
So, I am currently working on a django project hosted at pythonanywhere, which includes a feature for notifications, while also receiving data externally from sensors through AWS. I have been thinking of the best practice in order to implement this.
I currently have a simple implementation which is a view that checks all notifications and does the actions as needed if required, with an always-on task (which simply means a script that is running independently) sending a REST request to the server every minute.
Server side:
views.py:
def checkNotifications(request):
    notificationsObject = notifications.objects.order_by('thing').values_list('thing').distinct()
    thingsList = list(notificationsObject)
    for thing in thingsList:
        valuesDic = returnAllField(thing)
        thingNotifications = notifications.objects.filter(thing=thing)
        # Do stuff for each notification
urls:
path('notifications/',views.checkNotifications,name="checkNotification")
and the client just sends a GET request to my URL/notifications/, which works.
Now, while researching I saw some other options such as the ones discussed here with django background tasks and/or celery:
How to initialize repeating tasks using Django Background Tasks?
Celery task best practices in Django/Python
as well as some other options.
My question is: is there a benefit to moving from my first implementation to one of these? The only direct benefit I can see is avoiding abuse from another service hitting my URL to check notifications too often, but I can require (and already have) authentication to prevent that. Also, is there a "best practice" here? Since I am running this repeating check so often, it feels like there should be a cleaner solution. For one, I am not sure whether a repeating task is the best option on PythonAnywhere.
(https://help.pythonanywhere.com/pages/AsyncInWebApps/ suggests using always-on tasks, but it also mentions django background tasks)
Thank you
To use Django background tasks on PythonAnywhere you need to run them from an always-on task, so they are not an alternative, just another use of always-on tasks.
You can also access your Django code in your always-on task directly with some kind of long-running management command, so you do not need to hit your web app with a special request.
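A minimal sketch of such a loop, written as plain Python so it runs on its own. In a Django project the loop would typically sit inside a management command's handle() method and job would call the notification-checking code directly. The name run_periodically, the max_runs and sleep parameters, and the 60-second interval are assumptions for illustration, not part of the original code.

```python
import time


def run_periodically(job, interval_seconds=60, max_runs=None, sleep=time.sleep):
    """Call job() repeatedly, pausing interval_seconds between calls.

    max_runs and sleep are hypothetical knobs added so the loop can be
    stopped and tested; an always-on task would leave max_runs as None.
    """
    runs = 0
    while max_runs is None or runs < max_runs:
        try:
            job()
        except Exception as exc:  # keep the loop alive if one pass fails
            print("notification check failed: %r" % (exc,))
        runs += 1
        # Skip the final pause when a run limit has been reached.
        if max_runs is None or runs < max_runs:
            sleep(interval_seconds)
    return runs
```

Catching the exception inside the loop is a design choice: one failed pass is logged and the next pass still runs a minute later, instead of the whole task dying.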
I'm using a frontend built in angularjs and a backend built in python and webapp2 in app engine.
The backend makes calls to a third party API, fetches data and returns to the frontend.
The API request from the backend may take up to 30s or more. The problem is the frontend can't really progress any further until it gets the data.
I tried running 3 simultaneous requests to the backend using different tabs and 2 of them failed. I'm afraid that this seems to suggest that the app only allows one user at a time.
What's the best way to handle this? One thought I have is:
Use task queues to run the API call to 3rd party in the background
Create a new handler which reads from the queue for the last task sent and let the frontend poll this one at regular intervals
Update the frontend once data is available
Is that the right way? I'm sure this is a problem solved in a frontend+backend kind of world, but I just don't know what to search for.
Thanks!
Requests from the frontend are capped at 30 seconds; after that they time out on the server side. That is part of GAE's design. Requests originating from the task queue get 10 minutes, so your idea is viable. However, you'll want a per-task identifier to poll with, rather than just "the last sent", so that concurrent tasks can be told apart.
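The identifier-based polling could look roughly like this. It is a sketch under assumptions: the _jobs dict stands in for a datastore or memcache entry per task, and start_job, run_job and poll_job are hypothetical names, not App Engine APIs. On App Engine, run_job would be the body of the queued task and poll_job the body of the handler the frontend polls.

```python
import uuid

# In-memory stand-in for one datastore/memcache entry per task (assumption).
_jobs = {}


def start_job(params):
    """Register a job and return the identifier the frontend will poll with."""
    job_id = uuid.uuid4().hex
    _jobs[job_id] = {"status": "pending", "params": params, "result": None}
    return job_id


def run_job(job_id, fetch):
    """Body of the background task: call the slow API and record the outcome."""
    job = _jobs[job_id]
    try:
        job["result"] = fetch(job["params"])
        job["status"] = "done"
    except Exception as exc:
        job["result"] = str(exc)
        job["status"] = "failed"


def poll_job(job_id):
    """What the polling handler would return to the frontend."""
    job = _jobs.get(job_id)
    if job is None:
        return {"status": "unknown"}
    return {"status": job["status"], "result": job["result"]}
```

Because every request gets its own job_id, two users who start a fetch at the same moment never read each other's results.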
What would be the best practice in this scenario?
I have an App Engine Python app, with multiple cron jobs. Instantiated by user requests and cron jobs, push notifications might be sent. This could easily scale up to a total of +- 100 pushes per minute.
Setting up and tearing down a connection to APNs for every batch is not what I want, and Apple advises against it. So I would like to keep the connection alive, even when a user request or a cron job finishes, possibly with a timeout (2 minutes without pushes, then close the connection).
Reading the GAE documentation, I couldn't figure out if there even is such a thing available. Also, I might need this to be available in different apps and/or modules.
You can put the messages in a pull task queue and have a backend instance (or a cron job) process the tasks.
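A rough sketch of the batching idea, using the standard library's queue.Queue as a stand-in for a pull queue. The names drain_batch, send_batch and open_connection, and the send()/close() methods on the connection object, are assumptions for illustration and not the App Engine or APNs API.

```python
import queue


def drain_batch(q, max_items=100):
    """Take up to max_items messages off the queue without blocking."""
    batch = []
    while len(batch) < max_items:
        try:
            batch.append(q.get_nowait())
        except queue.Empty:
            break
    return batch


def send_batch(q, open_connection, max_items=100):
    """Send one batch over a single connection; returns how many were sent."""
    batch = drain_batch(q, max_items)
    if not batch:
        return 0  # nothing queued, so do not open a connection at all
    conn = open_connection()
    try:
        for message in batch:
            conn.send(message)
    finally:
        conn.close()
    return len(batch)
```

The connection is opened once per batch rather than once per push, which is the point of collecting the messages in a queue first.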
First, please take a look at Google Cloud Messaging. It's cool, and it is easier to use than the APNs protocol.
If you cannot use GCM (because of code refactoring, etc.), I think an App Engine Managed VM is suitable for your situation. Managed VMs sit between App Engine and Compute Engine.
You can use the datastore (possibly fronted by memcache for performance) to persist all the necessary APNs (or any other) connection/protocol status and context info, so that multiple related requests can share the same connection as if your app were a long-lived one.
Maybe not trivial, but definitely feasible.
Some requests may need to be postponed temporarily, depending on the shared connection status/context, that's true.
I am currently working on an application for Google App Engine, and I need some advice to detect the number of online users in the application. How can I do this?
I am using a session library. Do I need to override the session methods (create_session and destroy_session, to increment and decrement a value in the datastore), or is there another method I can use?
HTTP is stateless, so there's no inherent definition of "online user". You could count the number of non-destroyed sessions you've created, but unless you've got a cron job that destroys old sessions, this won't give an accurate picture.
You basically need to decide how much time without a new page request counts as "online" and query for the sessions that have been updated in that range of time.
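As a small illustration of that time-window idea, assuming each session records the Unix timestamp of its last request. The function name count_online, the dict layout, and the 5-minute default are hypothetical; on App Engine the same filter would be a datastore query on a last-updated property.

```python
import time


def count_online(last_seen_by_session, window_seconds=300, now=None):
    """Count sessions whose last request falls within the window.

    last_seen_by_session maps a session id to a Unix timestamp; the
    5-minute default is an arbitrary choice, not a recommendation.
    """
    if now is None:
        now = time.time()
    cutoff = now - window_seconds
    return sum(1 for seen in last_seen_by_session.values() if seen >= cutoff)
```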
You may use the Channel API to maintain a connection with the client. "The Channel API creates a persistent connection between your application and Google servers, allowing your application to send messages to JavaScript clients in real time without the use of polling."
http://code.google.com/appengine/docs/java/channel/overview.html
I want to make a Google App Engine app that does the following:
Client makes an asynchronous http request
Server starts processing that request
Client makes ajax http requests to get progress
The problem is that the server processing (step #2) may take more than 30 seconds.
I know that you can't have threads on Google Application Engine and that all tasks must complete within 30 seconds or they get shut down. Is there some way to work around this?
Also, I'm using python-django as a backend.
You'll want to use the Task Queue API, probably via deferred tasks. The deferred API makes working with Task Queues dramatically simpler.
Essentially, you'll want to spawn a task to start the processing. That task should catch DeadlineExceeded exceptions and reschedule itself (again via the deferred API) to continue processing. This requires that your tasks be able to keep track of their own progress. They can also update their own status in memcache, which you can use to write a view that checks a task's status. That view can then be polled via Ajax.
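The reschedule-and-resume pattern can be sketched without App Engine as follows. Everything here is a stand-in under stated assumptions: OutOfTime plays the role of the deadline exception, the state dict plays the role of progress kept in memcache or the datastore, and the loop in run_until_done stands in for the task re-deferring itself. None of these names come from the deferred API.

```python
import time


class OutOfTime(Exception):
    """Stand-in for App Engine's deadline exception (assumption)."""


def process(items, state, budget_seconds, handle, clock=time.monotonic):
    """Handle items from state['next'] on; raise OutOfTime when the budget is spent.

    state is a plain dict standing in for progress stored in memcache/datastore.
    """
    started = clock()
    while state["next"] < len(items):
        if clock() - started >= budget_seconds:
            raise OutOfTime()
        handle(items[state["next"]])
        state["next"] += 1  # record progress only after the item succeeded
    state["done"] = True


def run_until_done(items, budget_seconds, handle, clock=time.monotonic):
    """Re-run process() after each OutOfTime, the way a task would re-defer itself."""
    if budget_seconds <= 0:
        raise ValueError("budget_seconds must be positive")
    state = {"next": 0, "done": False}
    attempts = 0
    while not state["done"]:
        attempts += 1
        try:
            process(items, state, budget_seconds, handle, clock)
        except OutOfTime:
            pass  # a real task would enqueue a follow-up task here
    return attempts
```

A status view polled via Ajax would only need to read state["next"] and state["done"] to report progress.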