Accommodating Twitter's rate limit in Python's Twython

I'm pulling data through Twitter's REST API using Twython.
I want the code to automatically rest as long as it needs to when it's reached the Twitter rate limit, then begin querying again.
Here's the code, which takes a list of Twitter IDs and adds their followers' IDs to the list:
import time
from twython import TwythonRateLimitError

for user in first_ids:
    try:
        followers = twitter.get_followers_ids(user_id=user, count=600)
        for individual in followers['ids']:
            if individual not in ids:
                ids.append(individual)
    except TwythonRateLimitError:
        # Sleep until the rate-limit window resets (never a negative duration)
        remainder = float(twitter.get_lastfunction_header(header='x-rate-limit-reset')) - time.time()
        time.sleep(max(remainder, 0))
        continue
When I run it I get the following error: "Connection aborted. Error 10054: An existing connection was forcibly closed by the remote host"
What does the error mean? I imagine it's related to Twitter's rate limit -- is there another way around it?

You're leaving the connection open while your program sleeps; try closing it manually and reconnecting after the sleep timeout. Something like:
except TwythonRateLimitError as error:
    remainder = float(twitter.get_lastfunction_header(header='x-rate-limit-reset')) - time.time()
    twitter.disconnect()
    time.sleep(remainder)
    twitter = Twython(APP_KEY, APP_SECRET, OAUTH_TOKEN, OAUTH_TOKEN_SECRET)
    continue

If you are using the REST API, you can apply the same solution by deleting the API instance instead of calling .disconnect(). Simply use
del twitter
instead of
twitter.disconnect()
I had the same problem and this worked for me.
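Putting the two answers together, here is a minimal sketch (assuming APP_KEY, APP_SECRET, OAUTH_TOKEN and OAUTH_TOKEN_SECRET hold valid credentials and first_ids is the seed list). It reads the reset time, drops the client before sleeping so no idle connection lingers, and builds a fresh one afterwards:

import time
from twython import Twython, TwythonRateLimitError

twitter = Twython(APP_KEY, APP_SECRET, OAUTH_TOKEN, OAUTH_TOKEN_SECRET)
ids = []
for user in first_ids:
    while True:
        try:
            followers = twitter.get_followers_ids(user_id=user, count=600)
            for individual in followers['ids']:
                if individual not in ids:
                    ids.append(individual)
            break
        except TwythonRateLimitError:
            # Read the reset header before discarding the client
            reset = float(twitter.get_lastfunction_header(header='x-rate-limit-reset'))
            del twitter
            time.sleep(max(reset - time.time(), 0))
            twitter = Twython(APP_KEY, APP_SECRET, OAUTH_TOKEN, OAUTH_TOKEN_SECRET)

Note the inner while True retries the same user after the pause; the original continue would have skipped that user entirely.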

Related

How do I handle Thread Errors in Tweepy

I am writing a program which uses Tweepy to get data from Twitter. Tweepy uses another thread, and on occasion this thread throws an exception. However, my error catching logic does not catch the exceptions because they occur in a different thread. Is there any way to catch exceptions that are thrown by other threads without changing the thread's code?
To clarify, I needed to use the extra thread option in Tweepy so that the stream wouldn't block the rest of my program from executing. I get occasional updates from a database regarding which Twitter accounts to track, and the only way I was able to do this while streaming was to stream on a separate thread.
while 1:
    # Create twitter stream
    try:
        # Reconnect to the stream if it was disconnected (or at start)
        if reconnect:
            reconnect = False
            # NEW THREAD CREATED HERE
            tweet_stream.filter(follow=twitter_uids, async=True)
        # Sleep for sleep_interval before checking for new usernames
        time.sleep(sleep_interval)
        users_update = get_user_names(twitter_usernames)
        # Restart the stream if new users were found in DB
        if len(users_update) != 0:
            # Disconnect and set flag for stream to be restarted with new usernames
            twitter_usernames = users_update
            twitter_uids = get_twitter_uids(users_update)
            reconnect = True
            tweet_stream.disconnect()
            tweet_stream._thread.join()
    except Exception as e:
        # ERROR HANDLING CODE
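One general pattern for this, sketched here rather than taken from the post, is to have the background thread hand its exceptions to the main thread through a queue.Queue, since exceptions cannot cross thread boundaries on their own. Assuming Tweepy 3.x, whose StreamListener exposes on_error and on_exception hooks, the listener can push errors for the main loop to inspect between sleeps:

import queue
import time
import tweepy

error_queue = queue.Queue()

class QueueingListener(tweepy.StreamListener):
    def on_error(self, status_code):
        # Hand HTTP errors to the main thread instead of swallowing them
        error_queue.put(RuntimeError('HTTP error %s' % status_code))
        return False  # returning False tears the stream down

    def on_exception(self, exception):
        # Called by Tweepy when the stream thread raises
        error_queue.put(exception)

# Main loop: poll the queue between sleeps
while True:
    time.sleep(sleep_interval)
    try:
        err = error_queue.get_nowait()
    except queue.Empty:
        pass
    else:
        print('Stream thread raised: %r' % err)
        # Reconnect / restart logic goes here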

Python Requests Module - API Calls

I've written a Django web project and am using some API calls. I'd like to build in some mechanisms to handle slow and failed API calls. Specifically, I'd like to try the API call three times with increasing timeouts, breaking out of the loop when the request is successful. What is a good way to handle this, or is what I've put together acceptable? Below is the code I have in place now.
for x in [0.5, 1, 5]:
    try:
        r = requests.get(api_url, headers=headers, timeout=x)
        break
    except:
        pass
You can use the exceptions provided by requests itself to handle failed API calls. You can use the ConnectionError exception if a network problem occurs. Refer to this SO post for more details. I am not pasting a link to the requests docs and explaining every exception in detail, since the SO post mentioned before answers your question. An example code segment is given below:
try:
    r = requests.get(url, params={'key': 'value'})
except requests.exceptions.ConnectionError as e:
    print e
This article outlines the procedure I'm talking about; a single API request can end up being a little flaky:
migrateup.com/making-unreliable-apis-reliable-with-python/#
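Bringing those points back to the asker's loop, here is a minimal sketch with narrower exception handling (api_url and headers are assumed to be defined; the timeout values are the asker's). requests.exceptions.RequestException is the base class that covers ConnectionError, Timeout and HTTPError:

import requests

response = None
for timeout in [0.5, 1, 5]:  # increasing per-attempt timeouts
    try:
        resp = requests.get(api_url, headers=headers, timeout=timeout)
        resp.raise_for_status()  # treat HTTP 4xx/5xx as failures too
        response = resp
        break
    except requests.exceptions.RequestException:
        continue  # retry with a longer timeout

if response is None:
    pass  # all three attempts failed; handle the outage here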

Working around error 104 and Twitter rate limiting

I'm working on a research project that involves analyzing large amounts of data from Twitter. The project is being built in Python using Tweepy. As you might imagine I have to work very closely within the confines of the Twitter rate limiter. As such, my authentication code looks like this.
auth1 = tweepy.OAuthHandler("...", "...")
auth1.set_access_token("...", "...")
api1 = tweepy.API(auth1, wait_on_rate_limit=True, wait_on_rate_limit_notify=True)
Which does a wonderful job of stopping and waiting before I trip my limit on requests for a small scaled down run. However, when I try and run the program on my full data set I eventually get this error while the program is sleeping:
tweepy.error.TweepError: Failed to send request: ('Connection aborted.', error(104, 'Connection reset by peer'))
My research tells me that this is happening because Twitter is disconnecting and I need to catch the error. How would I catch this error, reconnect and have my program pick up where it left off? Any advice would be welcome.
The Twitter disconnection errors are socket exceptions, which are a special case of IOError exceptions. In order to catch them you need to do something like:
auth = tweepy.OAuthHandler(…)  # set up your OAuth here
try:
    stream = tweepy.Stream(auth=auth, listener=SomeListener())  # start the stream
except IOError, ex:
    print 'I just caught the exception: %s' % ex
If it works, wrap it in a while True loop with an increasing backoff, so as to provide some pause between reconnections (reference link).
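A minimal sketch of that while True loop with exponential backoff; auth and SomeListener come from the snippet above, and uids_to_follow is a hypothetical list of user IDs to track:

import time
import tweepy

backoff = 1  # seconds; doubles on each failure
while True:
    try:
        stream = tweepy.Stream(auth=auth, listener=SomeListener())
        stream.filter(follow=uids_to_follow)  # blocks until the stream drops
        backoff = 1  # clean exit: reset the backoff
    except IOError as ex:
        print('Caught the exception: %s; retrying in %ds' % (ex, backoff))
        time.sleep(backoff)
        backoff = min(backoff * 2, 320)  # cap the wait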
I've also tried wrapping Tweepy calls inside a while True loop in the same way, but I also ran into issues with reconnections (in some cases that solution does not solve the problem). Otherwise, I've thought to switch the auth connected to the Tweepy API instance (here "twapi") in case of error, and it seems to work properly:
...
while True:
    try:
        users_stream = twapi.lookup_users(screen_names=[scrname_list_here])
    except tweepy.error.TweepError, ex:
        time.sleep(120)
        global twapi
        global switch_auth
        if switch_auth == False:
            twapi = tweepy.API(auths[auth_id+1])
            switch_auth = True
        elif switch_auth == True:
            twapi = tweepy.API(auths[auth_id])
            switch_auth = False
        continue
    break
...
By using the bool variable switch_auth it is possible, when the Tweepy error related to a failed reconnection arises, to switch the auth input of the Tweepy API instance (the auth handlers can be assumed to be stored in the auths list) to work around the problem. The same technique can be used to switch auth when the rate limit is reached. I hope it will be useful, just try!
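To have the program also pick up where it left off after such a pause, which the asker wanted, one simple approach (not from either answer) is to advance an index only after a successful call; user_ids and twapi are assumed from the surrounding code:

import time
import tweepy

idx = 0
users = []
while idx < len(user_ids):
    batch = user_ids[idx:idx + 100]  # lookup_users accepts at most 100 IDs
    try:
        users.extend(twapi.lookup_users(user_ids=batch))
        idx += 100  # advance only after the call succeeds
    except tweepy.error.TweepError:
        time.sleep(60)  # pause, then retry the same batch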

Have a python function run for an allotted time

I have a Python script that pulls from various internal network sources. With how our systems are set up, we will initiate a urllib pull from a network location, and on certain parts of the network it will get hung up waiting forever for a response. I would like my script to check whether it has finished the pull in, let's say, 5 minutes; if not, it should skip that address, attempt to pull from the next one, and record the failure to a bad-address log (so we can go check out which systems get hung up; there are over 20,000 IP addresses we are checking, some running older scripts that no longer work but will still try to run when requested, and they never stop trying to run).
I'm familiar with having a script pause at a certain point:
import time
time.sleep(300)
Here's what I'm thinking from a pseudocode perspective (not proper Python, just illustrating the idea):
import time
import urllib2

url_dict = ['http://1', 'http://2', 'http://3', ...]
fail_log_path = 'C:/Temp/fail_log.txt'

for addresses in url_dict:
    clock_value = time.start()
    while clock_value <= 300:
        print str(clock_value)
        res = urllib2.retrieve(url)
    if res != []:
        pass
    else:
        fail_log = open(fail_log_path, 'a')
        fail_log.write("Failed to pull from site location: " + str(url) + "\n")
        fail_log.close()
Update: a specific option for dealing with URL timeouts: timeout for urllib2.urlopen() in pre-Python-2.6 versions
Found this answer, which is more in line with the overall problem of my question:
kill a function after a certain time in windows
Your code as-is doesn't seem to describe what you were saying: it seems you want the if/else check inside your while loop. On top of that, you would want to loop over the IP addresses rather than over a time period as your code is currently written (otherwise you will keep requesting the same IP address every time). Instead of keeping track of time yourself, I would suggest reading up on urllib.request.urlopen, specifically the timeout parameter. Once set, that function call will throw a socket.timeout exception when the time limit is reached. Surround it with a try/except block catching that error and then handle it appropriately.
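A minimal sketch of that suggestion, using the question's Python 2 urllib2 (whose urlopen takes the same timeout parameter) and the question's log path; the 300-second limit is the asker's number:

import socket
import urllib2

urls = ['http://1', 'http://2', 'http://3']
fail_log_path = 'C:/Temp/fail_log.txt'

for url in urls:
    try:
        res = urllib2.urlopen(url, timeout=300)  # give up after 5 minutes
        data = res.read()
    except (socket.timeout, urllib2.URLError):
        # Record the unresponsive address and move on to the next one
        with open(fail_log_path, 'a') as fail_log:
            fail_log.write("Failed to pull from site location: %s\n" % url)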

get_rate_status "remaining hits" does not decrease when I make GET calls anymore

I have a script written to iterate through a list of Twitter user IDs and save the lists of follower IDs to a file. I have used it several times with no problems. For long lists, I have added this piece of code to check the rate limit before every GET request and sleep if I'm about to be rate-limited:
rate_limit_json = api.rate_limit_status()
remaining_hits = rate_limit_json["remaining_hits"]
print 'you have', remaining_hits, 'API calls remaining until next hour'

if remaining_hits < 2:
    dtcode = datetime.utcnow()
    unixtime = calendar.timegm(dtcode.utctimetuple())
    sleeptime = rate_limit_json['reset_time_in_seconds'] - unixtime + 10
    print 'waiting ', sleeptime, 'seconds'
    time.sleep(sleeptime)
else:
    pass
I have this OAuth blurb set up at the top of the script:
auth = tweepy.OAuthHandler('xxxx', 'xxxx')
auth.set_access_token('xxxx', 'xxxxx')
api = tweepy.API(auth)
The call I'm repeatedly making is:
follower_cursors = tweepy.Cursor(api.followers_ids)
So now, no matter how many calls I make, my "remaining hits" stays at 150 until Twitter returns a "Rate limit exceeded. Clients may not make more than 350 requests per hour."
It seems like the rate limit checker is reporting my unauthenticated IP address's rate limit (150), while my calls are counting against my app's rate limit (350).
How can I adjust my rate limit checker to actually check the app's rate limit again?
Repeatedly creating a Cursor instance doesn't use up any API quota, as it doesn't actually make a request to Twitter. A Cursor is just a Tweepy wrapper around different pagination methods. Calling the next or prev methods, or iterating over it, causes actual API calls.
Twitter no longer has a concept of unauthenticated API calls (the v1.1 API always requires authentication). I think the problem lies with your use of Cursor. Try creating just one instance and calling the next method repeatedly (make sure to catch a StopIteration exception, as next works like a generator).
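A minimal sketch of that suggestion, mirroring the question's followers_ids call and credential placeholders:

import tweepy

auth = tweepy.OAuthHandler('xxxx', 'xxxx')
auth.set_access_token('xxxx', 'xxxxx')
api = tweepy.API(auth)

# One Cursor instance for the whole walk; a real API call happens only
# when the iterator needs to fetch the next page
cursor = tweepy.Cursor(api.followers_ids).items()
follower_ids = []
while True:
    try:
        follower_ids.append(next(cursor))
    except StopIteration:
        break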
