I have the following function,
import requests

def get_url_type(data):
    x = {}
    for i in range(0, len(data)):
        print i
        try:
            x[i] = requests.head(data['url'][i]).headers.get('content-type')
        except:
            x[i] = 'Not Available'
    return x
This function returns the content type of each URL passed to it; whenever there is no response, it raises an error, which is caught by the except block. My problem is that some of the requests take more than 5-10 minutes, which is far too long in a production environment. I want the function to return "Not Available" when a request takes more than 5 minutes. When I researched this, the suggestion was to convert the function to an asynchronous one. I have been trying to change it without much success.
The following is what I have tried,
import asyncio
import time
from datetime import datetime

async def custom_sleep():
    print('SLEEP', datetime.now())
    time.sleep(5)
My objective is that whenever the request takes more than 5 minutes, the function should return "Not Available" and move on to the next iteration.
Can anybody help me in doing this?
Thanks in advance!
It seems you just want a request to time out after a given time has passed without reply and move on to the next request. For this functionality there is a timeout parameter you can add to your request. The documentation on this: http://docs.python-requests.org/en/master/user/quickstart/#timeouts.
With a 300 second (5 minute) timeout your code becomes:
requests.head(data['url'][i], timeout=300)
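For completeness, a minimal sketch of how this folds back into the original function (this assumes the loop structure from the question; requests raises requests.exceptions.Timeout when the timeout elapses, and the broader RequestException covers connection errors as before):

import requests

def get_url_type(data):
    x = {}
    for i in range(len(data)):
        try:
            # give up if the server does not respond within 5 minutes
            response = requests.head(data['url'][i], timeout=300)
            x[i] = response.headers.get('content-type')
        except requests.exceptions.RequestException:
            # covers Timeout as well as connection errors
            x[i] = 'Not Available'
    return x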
The asynchronous functionality you mention actually has a different objective: it would allow your code to continue executing without waiting out the 5 minutes at all, but I believe that would be a different question.
I have a Python library which must be fast enough for an online application. If a particular request (function call) takes too long, I want to just bypass that request and return an empty result.
The function looks like the following:
def fast_function(text):
    result = mylibrary.process(text)
    ...
If mylibrary.process takes more than a threshold, e.g. 100 milliseconds, I want to bypass the request and proceed to process the next 'text'.
What's the normal way to handle this? Is this a normal scenario? My application can afford to bypass a small number of requests like this if they take too long.
One way is to use a signal timer. As an example:
import signal

def took_too_long(signum, frame):  # signal handlers receive the signal number and current frame
    raise TimeoutError

signal.signal(signal.SIGALRM, took_too_long)
signal.setitimer(signal.ITIMER_REAL, 0.1)  # fire SIGALRM after 0.1 seconds

try:
    result = mylibrary.process(text)
    signal.setitimer(signal.ITIMER_REAL, 0)  # success, reset to 0 to disable the timer
except TimeoutError:
    result = None  # took too long, do something (e.g. return an empty result)
You'll have to experiment to see if this does or does not add too much overhead.
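A minimal sketch of the same idea applied across many texts, assuming a Unix platform (SIGALRM is not available on Windows), that the code runs in the main thread, and that mylibrary and texts come from the question; the helper name and the empty-result value are just placeholders:

import signal

def process_with_deadline(text, seconds=0.1):
    def took_too_long(signum, frame):
        raise TimeoutError

    old_handler = signal.signal(signal.SIGALRM, took_too_long)
    signal.setitimer(signal.ITIMER_REAL, seconds)  # arm the timer
    try:
        return mylibrary.process(text)
    except TimeoutError:
        return None  # bypass this request with an empty result
    finally:
        signal.setitimer(signal.ITIMER_REAL, 0)     # disarm the timer
        signal.signal(signal.SIGALRM, old_handler)  # restore the previous handler

results = [process_with_deadline(text) for text in texts]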
You can add a timeout to your function.
One way to implement it is to use a timeout decorator, which will throw an exception if the function runs for longer than the defined timeout. To move on to the next operation, you can catch the exception thrown by the timeout.
Install this one for example: pip install timeout-decorator
import timeout_decorator

@timeout_decorator.timeout(5)  # timeout of 5 seconds
def fast_function(text):
    result = mylibrary.process(text)
    return result
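A small usage sketch of catching that exception to move on (this assumes the package raises timeout_decorator.TimeoutError by default, and that texts is the iterable of inputs from the question):

results = []
for text in texts:
    try:
        results.append(fast_function(text))
    except timeout_decorator.TimeoutError:
        results.append(None)  # bypass this text and keep going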
I have a Cloud Function calling another Cloud Function in Python. My issue is that when I call the next function, the first one waits for it to finish executing, or times out.
The key point is that this is about Google Cloud Functions, particularly the mismatch between the function timeout and the maximum API call rate. The function's maximum timeout (540 seconds) is shorter than the time I need to make the required API calls, and I don't want to create more triggers.
How can I make the first function (the "caller") finish after calling the second function, which then does its work?
Some sample code:
# main.py
# url: this-particular-cloud-function
# function initiated with a post request containing {"previous_tin_index": 0}
import requests
import time
import logging

final_tin_index = 100

def eat_spam(request):
    started_eating_spam = time.time()

    spam_json = request.get_json()
    spam_to_eat = spam_json["previous_tin_index"]
    previous_tin_index = spam_to_eat  # continue from the index received in the request

    for spam in range(spam_to_eat):
        time.sleep(5)
        previous_tin_index += 1
        logging.info("I hate spam....")

        finished_previous_spam_time = time.time() - started_eating_spam
        if finished_previous_spam_time >= 10:
            logging.info("Make it stop!")
            requests.post("this-particular-cloud-function", json={"previous_tin_index": previous_tin_index})

    return "200"
EDIT: I know that the inherent problem is that the function never reaches the return statement. I am wondering whether this can be fixed other than by, for example, rewriting the code as a JavaScript promise.
P.S. I looked at the Cloud documentation, but the Python examples seem to be lacking for this particular case.
This may solve your issue.
def f1(x):
    print('f1', x)
    return f2, (x + 1,)  # return the next function and its arguments instead of calling it

def f2(x):
    print('f2', x)
    return f1, (x + 1,)

# trampoline loop: the driver calls each returned function, so no function ever waits on another
f, args = f1, (0,)
while True:
    f, args = f(*args)
Credits to this post.
Although I believe the issue in your case is the workflow: you are returning a value, but you never reach that point in your code. Let's assume you have foo1() and inside foo1() you call foo2(). foo2() starts executing, but before control returns to foo1() and reaches its return statement, foo1() has already timed out.
If that doesn't solve your issue, the problem may be in the second function, so you may need to review that function in order to resolve your problem.
Please let me know if this was helpful.
I'm using an API to receive data every second, using a structure like this:
from time import sleep

try:
    data = API(options)
except:
    sleep(0.5)
The whole thing is inside a bigger loop that repeats every second. Sometimes the API returns an error, which this structure handles. But sometimes, I think, it doesn't return any value for a long time, which stalls the code, and I need to run the API call again. How can I handle this in my code?
I'm writing a Twitter application with tweepy that crawls up the tweets by looking at in_reply_to_status_ID.
Everything works fine until I hit the rate limit; after a few minutes, I have to wait another 15 minutes or so.
This is strange, because I used nearly identical code until a few months ago, before API 1.0 was deprecated, and it didn't have this rate limit problem.
Is there a known way I can get rid of, or at least increase the rate limit?
Or is there a workaround?
It seems like a lot of people are having trouble with this, but I can't find a definitive solution.
I will greatly appreciate it if you could help.
import tweepy

auth1 = tweepy.auth.OAuthHandler('consumer_token', 'consumer_secret')
auth1.set_access_token('access_token', 'access_secret')
api = tweepy.API(auth1)

def hasParent(s):
    # return true if s is not None, i.e., s is an in_reply_to_status_id number
    ....

while hasParent(ps):
    try:
        parent = api.get_status(ps)
    except tweepy.error.TweepError:
        print 'tweeperror'
        break
    newparent = parent.in_reply_to_status_id
    ......
    ps = newparent
I put a limit on the number of items and it worked:
def index(request):
    statuses = tweepy.Cursor(api.user_timeline).items(10)
    return TemplateResponse(request, 'index.html', {'statuses': statuses})
This is because you have reached the maximum limit. Just disconnect your internet connection and reconnect again; no need to wait.
Use cursor:
statuses = tweepy.Cursor(api.user_timeline).items(2)
If you get the error again, just reduce the number of items.
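For completeness, a minimal sketch of backing off inside the asker's loop instead of breaking out of it; this assumes the TweepError caught there is the rate-limit error and simply sleeps through the roughly 15-minute window mentioned above before retrying the same status:

import time
import tweepy

while hasParent(ps):
    try:
        parent = api.get_status(ps)
    except tweepy.error.TweepError:
        time.sleep(15 * 60)  # wait out the rate-limit window, then retry the same status id
        continue
    ps = parent.in_reply_to_status_id

Depending on the tweepy version, the API constructor may also accept wait_on_rate_limit=True, which does this waiting automatically.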
Currently I am trying to get the location of every GitHub user. I am using the github3 Python library to get it, but it gives me an over-API-usage error when my API calls exceed 5K. Here is my code.
import github3
from datetime import datetime
import sys

def main(pswd):
    g = github3.login(username="rakeshcusat", password=pswd)
    current_time = datetime.now()
    fhandler = open("githubuser_" + current_time.strftime("%d-%m-%y-%H:%M:%S"), "w")

    for user in g.iter_all_users():
        user.refresh()
        try:
            fhandler.write(" user: {0}, email: {1}, location: {2}\n".format(str(user), str(user.email), str(user.location)))
        except:
            print "Something wrong, user id : {0}".format(user.id)

    fhandler.close()

if __name__ == "__main__":
    if len(sys.argv) == 2:
        main(sys.argv[1])
    else:
        print "Please provide your password"
I could download all the usernames first, which would be only a single API call, and then iteratively download each user's location; if I hit the over-usage limit, wait for one hour and resume the API calls where they left off. But this seems like a lame solution, and it will definitely take more time (almost 25+ hours). Can someone suggest a better way of doing this?
So if you use the development version of github3.py you can use the per_page parameter, e.g.,
for user in g.iter_all_users(per_page=200):
    user.refresh()
    #: other logic
The thing is, you'll save 7 requests by using per_page (1 request now returns 25 users if I remember correctly, so you'll get the equivalent of 8 requests in 1). The problem is that you then burn through your 200 requests rather quickly with User#refresh. What you could do to avoid the rate limit is use sleep in your code to space out your requests. 5000 requests spread over 3600 seconds is about 1.39 requests per second. If each request takes half a second (which I personally think is an underestimation), you could do
import time

for user in g.iter_all_users(per_page=200):
    user.refresh()
    #: other logic
    time.sleep(0.5)
This will make sure roughly one request is made per second, so you never hit the rate limit. Regardless, it's rather lame.
In the future, I would store these values in a database, using the user's id as the database id, and then just look for the max and try to start from there (a minimal sketch of that is at the end of this answer). I'll have to check whether /users supports something akin to the since parameter. Alternatively, you could also work like so
import time

i = g.iter_all_users(per_page=200)
for user in i:
    user.refresh()
    #: other logic
    time.sleep(0.5)

# We have all users
# store i.etag somewhere then later
i = g.iter_all_users(per_page=200, etag=i.etag)
for user in i:
    user.refresh()
    #: etc
The second iterator should give you all the new users created since the last one in your previous request, if I remember correctly, but I'm currently very tired so I could be remembering something wrong.
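A minimal sketch of the database idea mentioned above, using sqlite3 purely for illustration (the table name and columns are placeholders, and g is the github3 login object from the question); the point is to persist each user by id so a later run can look up the max id and resume from there, however the resumption itself ends up being done:

import sqlite3

conn = sqlite3.connect("github_users.db")
conn.execute(
    "CREATE TABLE IF NOT EXISTS users (id INTEGER PRIMARY KEY, login TEXT, email TEXT, location TEXT)"
)

for user in g.iter_all_users(per_page=200):
    user.refresh()
    conn.execute(
        "INSERT OR REPLACE INTO users VALUES (?, ?, ?, ?)",
        (user.id, str(user), str(user.email), str(user.location)),
    )
    conn.commit()

# on a later run, find the highest id stored so far and start from there
last_id = conn.execute("SELECT MAX(id) FROM users").fetchone()[0] or 0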