Exact mechanics of time limits during Google App Engine Python's urlfetch? - python

I have an HTTP request in my code that takes ~5-10 s to run. Through searching this site, I've found the code to increase the limit before timeout:
from google.appengine.api import urlfetch
urlfetch.set_default_fetch_deadline(60)
My question: What is that number '60'? Seconds or tenths of a second? Most responses seem to imply it's seconds, but that can't be right. When I use 60, I get a time out in less than 10 s while testing on localhost. I have to set the number to at least 100 to avoid the issue - which I worry will invoke the ire of the Google gods.

It's seconds, and you can also pass it directly to the fetch function. Have you tried fetching another website? Are you sure it's a timeout and not another error?
https://developers.google.com/appengine/docs/python/urlfetch/fetchfunction
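For reference, a short sketch of the per-call form (the URL is a placeholder); the deadline is given in seconds:
from google.appengine.api import urlfetch

# deadline is in seconds; 60 allows up to a minute for the remote server to respond
result = urlfetch.fetch("https://example.com/slow-endpoint", deadline=60)
print(result.status_code)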

Related

HTTP Requests in Azure Python Function App

I have an Azure function app with a timer trigger. Inside the app it makes several HTTP requests using the requests library:
requests.get(baseURL, params=params)
When I debug on my computer it runs without error. Requests take anywhere from 2 to 30 seconds to return. When I deploy in Azure, though, the function will hang after sending some requests and you have to restart it to get it to work again. It never throws an exception and never fails. Just hangs.
The number of requests that Azure successfully completes varies between 2 and 6. The requests are always sent in the same order and always return the same data. There doesn't seem to be any clear pattern for when it hangs. Sometimes it's on requests that return little data, sometimes requests that return more data.
Any ideas??
For this problem, first check whether the function is writing logs at all. Use "Monitor" to check the logs rather than the "Logs" window, because logs sometimes do not show up in the "Logs" window. Note that logs in "Monitor" can be delayed by about 5 minutes.
Then check whether your function is timing out. Functions on the consumption plan have a default timeout of 5 minutes. If you make several requests in the function and each one can take tens of seconds, set functionTimeout to 00:10:00 in "host.json" (the maximum value of functionTimeout on the consumption plan is 10 minutes).
If the function still doesn't work, check whether your timer-trigger function calls the URL of another HttpTrigger function that lives in the same function app. If so, that can cause this kind of hang; I ran into a similar problem in the past. To solve it, create another function app so the timer-trigger function and the HTTP-trigger function are separated.
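For reference, a minimal host.json sketch with the longer timeout described above (assuming a v2+ function app; other settings omitted):
{
  "version": "2.0",
  "functionTimeout": "00:10:00"
}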

Put a time limit on a request

I have a program, and to make sure the user doesn't download overly large files via input, I need a time limit on how long each request is allowed to take.
Does anyone know a good way to put a time limit (or lifetime) on each Python requests GET request, so that if it takes more than 10 seconds an exception is thrown?
Thanks
You can define your own timeout like:
requests.get('https://github.com/', timeout=0.001)
You can pass an additional timeout parameter to every request you make. This is always recommended, since it makes your code more robust against hanging indefinitely when the other end never responds.
requests.get('https://github.com/', timeout=0.001)
Read the official Python requests documentation on timeouts for details.
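As a sketch matching the 10-second limit from the question (the URL is just a placeholder), you can catch the timeout explicitly:
import requests

url = "https://example.com/large-file"  # placeholder URL
try:
    # raises requests.exceptions.Timeout if no response data arrives within 10 seconds
    response = requests.get(url, timeout=10)
except requests.exceptions.Timeout:
    print("Request took longer than 10 seconds, aborting.")
Note that the timeout applies to the connection and to gaps between received bytes, not to the total download time, so a slow but steady download can still exceed 10 seconds overall.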

Delaying 1 second per request, not enough for 3600 per hour

The Amazon API limit is apparently 1 req per second or 3600 per hour. So I implemented it like so:
while True:
    # sql stuff
    time.sleep(1)
    result = api.item_lookup(row[0], ResponseGroup='Images,ItemAttributes,Offers,OfferSummary', IdType='EAN', SearchIndex='All')
    # sql stuff
Error:
amazonproduct.errors.TooManyRequests: RequestThrottled: AWS Access Key ID: ACCESS_KEY_REDACTED. You are submitting requests too quickly. Please retry your requests at a slower rate.
Any ideas why?
This code looks correct, and it looks like the 1 request/second limit is still current:
http://docs.aws.amazon.com/AWSECommerceService/latest/DG/TroubleshootingApplications.html#efficiency-guidelines
You want to make sure that no other process is using the same associate account. Depending on where and how you run the code, there may be an old version of the VM, or another instance of your application running, or maybe there is a version on the cloud and other one on your laptop, or if you are using a threaded web server, there may be multiple threads all running the same code.
If you still hit the query limit, you just want to retry, possibly with the TCP-like "additive increase/multiplicative decrease" back-off. You start by setting extra_delay = 0. When a request fails, you set extra_delay += 1 and sleep(1 + extra_delay), then retry. When it finally succeeds, set extra_delay = extra_delay * 0.9.
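A minimal sketch of that back-off around the item_lookup call from the question (api, rows, and the row layout are assumed to match the question's code):
import time
from amazonproduct.errors import TooManyRequests

extra_delay = 0
for row in rows:  # rows assumed to come from the SQL query in the question
    while True:
        time.sleep(1 + extra_delay)
        try:
            result = api.item_lookup(row[0], ResponseGroup='Images,ItemAttributes,Offers,OfferSummary', IdType='EAN', SearchIndex='All')
        except TooManyRequests:
            extra_delay += 1   # additive increase: back off more after a throttle
            continue           # retry the same row with the longer delay
        extra_delay *= 0.9     # multiplicative decrease once a request succeeds
        break
    # sql stuff: store result here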
Computer time is funny
This post is correct in saying "it varies in a non-deterministic manner" (https://stackoverflow.com/a/1133888/5044893). Depending on a whole host of factors, the time measured by a processor can be quite unreliable.
This is compounded by the fact that Amazon's API runs on a different clock than your program does. The two are certainly not in sync, and there's likely some overlap between their "1 second" measurement and your program's. Amazon probably tries to average out this inconsistency, and they likely allow a small margin of error, maybe +/- 5%. Even so, the discrepancy between your clock and theirs is probably what triggers the RequestThrottled error.
Give yourself some buffer
Here are some thoughts to consider.
Do you really need to hit the Amazon API every single second? Would your program work with a 5-second interval? Even a 2-second interval halves your request rate and makes a lockout far less likely. Also, Amazon may be charging you for every service call, so spacing them out could save you money.
This is really a question of "optimization" now. If you use a constant variable to control your API call rate (say, SLEEP = 2), then you can adjust that rate easily. Fiddle with it, increase and decrease it, and see how your program performs.
Push, not pull
Sometimes, hitting an API every second means that you're polling for new data. Polling is notoriously wasteful, which is why the Amazon API has a rate limit.
Instead, could you switch to a queue-based approach? Amazon SQS can fire events at your programs, which is especially easy if you host them on AWS Lambda.
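As a rough sketch (the message format is hypothetical), an SQS-triggered Lambda handler in Python just processes the records handed to it, so there is no polling loop in your own code:
import json

def handler(event, context):
    # Lambda invokes this once per batch of SQS messages; no polling needed
    for record in event["Records"]:
        payload = json.loads(record["body"])
        # process the update here, e.g. look up the item and write to SQL
        print(payload)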

Avoid Python CGI browser timeout

I have a Python CGI that I use along with SQLAlchemy to get data from a database, process it, and return the result in JSON format to my webpage.
The problem is that this process takes about 2 minutes to complete, and the browser times out after 20 or 30 seconds of script execution.
Is there a way in Python (maybe a library?), or a design idea, that would let the script run to completion?
Thanks!
You will have to set the timeout in the HTTP server's configuration (Apache, for example). The default should be more than 120 seconds, if I remember correctly.
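As a hedged example, if the server is Apache the relevant setting is the Timeout directive in the main configuration; the value below is only illustrative and needs to exceed the roughly 2 minutes the script takes:
Timeout 180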

google cloud endpoints /_ah/api/discovery/v1/apis/myapi/v1/rpc takes half a minute to load

Very often, https://myversion-dot-my-module-dot-my-app-id.appspot.com/_ah/api/discovery/v1/apis/archivedash/v1/rpc?fields=methods%2F*%2Fid&pp=0 can take half a minute to load.
This is killing the load time of my app.
Searching the logs under that version & module with path:/_ah/api.* gives no results, so I can't see what's slowing it down. Also, the warmup requests usually take less than a few seconds, so it's probably not that.
I found this related issue here https://code.google.com/p/googleappengine/issues/detail?id=10017
