I'm working on a research project that involves analyzing large amounts of data from Twitter. The project is being built in Python using Tweepy. As you might imagine, I have to work very closely within the confines of the Twitter rate limiter. As such, my authentication code looks like this:
auth1 = tweepy.OAuthHandler("...", "...")
auth1.set_access_token("...", "...")
api1 = tweepy.API(auth1, wait_on_rate_limit=True, wait_on_rate_limit_notify=True)
This does a wonderful job of stopping and waiting before I trip my request limit on a small, scaled-down run. However, when I try to run the program on my full data set, I eventually get this error while the program is sleeping:
tweepy.error.TweepError: Failed to send request: ('Connection aborted.', error(104, 'Connection reset by peer'))
My research tells me that this is happening because Twitter is disconnecting and I need to catch the error. How would I catch this error, reconnect and have my program pick up where it left off? Any advice would be welcome.
The Twitter disconnection errors are socket exceptions, which are a special case of IOError. To catch them you need to do something like:
auth = tweepy.OAuthHandler(…  # set up your OAuth here
try:
    stream = tweepy.Stream(auth=auth, listener=SomeListener())  # start the stream
except IOError as ex:
    print('I just caught the exception: %s' % ex)
If that works, wrap it in a while True loop with an increasing backoff so there is some pause between reconnection attempts. Reference link
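For illustration, here is a minimal sketch of that retry loop with an increasing backoff, assuming the SomeListener class and credentials from the snippet above; the track keyword and the backoff cap are placeholders, and the exact exception types you need to handle may vary.

import time
import tweepy

auth = tweepy.OAuthHandler("...", "...")   # set up your OAuth here
auth.set_access_token("...", "...")

backoff = 1  # seconds; doubled after every failure, capped at 15 minutes
while True:
    try:
        stream = tweepy.Stream(auth=auth, listener=SomeListener())
        stream.filter(track=['keyword'])   # blocks until the connection drops
        backoff = 1                        # reset the backoff after a clean run
    except IOError as ex:
        print('Connection dropped (%s), reconnecting in %d s' % (ex, backoff))
        time.sleep(backoff)
        backoff = min(backoff * 2, 900)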
I've also tried wrapping Tweepy calls inside a while True loop in the same way, but I still had issues with reconnections (in some cases that alone does not solve the problem). Instead, I thought of switching the auth (connected to the Tweepy API instance, here "twapi") in case of error, and it seems to work properly:
...
global twapi
global switch_auth
while True:
    try:
        users_stream = twapi.lookup_users(screen_names=[scrname_list_here])
    except tweepy.error.TweepError:
        time.sleep(120)
        if not switch_auth:
            twapi = tweepy.API(auths[auth_id + 1])
            switch_auth = True
        else:
            twapi = tweepy.API(auths[auth_id])
            switch_auth = False
        continue
    break
...
By using a boolean variable switch_auth, it is possible (when the Tweepy error related to a failed reconnection arises) to "switch" the auth passed to the Tweepy API instance (the auth handlers can be assumed to be stored in the auths list) and work around the problem.
The same technique can be used to switch auth when the rate limit is reached. I hope it will be useful, just try!
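As a rough sketch of that idea applied to rate limits: the helper below cycles round-robin through a list of pre-built API instances whenever a call fails. The auths list, the lookup_users call and the 120-second pause come from the snippet above; the call_with_auth_rotation helper and the round-robin index are hypothetical names of mine.

import time
import tweepy

# auths is assumed to be a list of already configured tweepy.OAuthHandler objects
apis = [tweepy.API(a) for a in auths]
current = 0
twapi = apis[current]

def call_with_auth_rotation(screen_names):
    global twapi, current
    while True:
        try:
            return twapi.lookup_users(screen_names=screen_names)
        except tweepy.error.TweepError:
            # Rate limited (or another transient error): pause, then
            # move to the next set of credentials and retry.
            time.sleep(120)
            current = (current + 1) % len(apis)
            twapi = apis[current]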
Firestore listeners will randomly close after some length of time (possibly due to inactivity), and in python there is no easy way of catching the errors they throw because they throw them in a separate thread. For my case, I want to maintain a long lasting listener that never closes due to inactivity or server side error.
I've tried wrapping everything in a try - except, and then wrapping that all in the while(True) loop, but that doesn't catch the error because the error is thrown in a separate thread.
The error occurs after 10 minutes to 24 hours of inactivity (I'm not sure inactivity is the cause, it could be random, but the shortest interval I ever found was 10 minutes after starting it) on both Linux and Windows devices. I haven't tried Mac or any other devices, but I doubt it's device specific.
Looking at the gRPC spec (the transport listeners use to communicate between client and server), there is no default timeout for the Python API (and a timeout wouldn't explain why it disconnects after different amounts of time), and no timeout is set anywhere in Firestore's listener code.
The specific error that occurs is:
google.api_core.exceptions.InternalServerError: 500 Received RST_STREAM with error code 0
and sometimes
google.api_core.exceptions.InternalServerError: 500 Received RST_STREAM with error code 2
Minimal code to show the problem (left running on a dummy collection called info that only has one document in it for a while):
import time
import traceback

# db is assumed to be an initialized firestore.Client()

class TestWatchInfo():
    def __init__(self):
        self.query_watch = db.collection(u'info').on_snapshot(self.on_snapshot)

    def on_snapshot(self, col_snapshot, changes, read_time):
        try:
            for change in changes:
                pass
        except Exception as err:
            print(err)
            print("Error occurred at " + str(time.ctime()))
            traceback.print_exc()

if __name__ == '__main__':
    try:
        test_object = TestWatchInfo()
        while(True):
            time.sleep(60)
    except Exception as err:
        print(err)
        print("Error occurred at " + str(time.ctime()))
        traceback.print_exc()
Ideally, I would be able to catch the actual error in the main Python thread, but as far as I can tell, since I am not the one spawning the threads, I have no way of adding thread- or gRPC-specific code to catch that error. Alternatively, I would like to be able to auto-restart the gRPC connection after it gets closed from the server side.
In actuality, the Firestore listener just raises an error in the thread it created and closes the listener.
I figured out an alternative method for detecting the listener error and restarting the listener after a server-side close. I have no idea how to catch the actual error, but I figured out how to detect when Firestore just randomly closes the listener connection.
In the Firebase listener code they keep track of a private variable '_closed' that becomes true if the connection ever gets closed for any reason. Therefore, if we periodically check that, we can restart our listener and be on our merry way.
Using the code from before, I added a new method start_snapshot in order to restart our failed listener expression on error, and in my long running code, I added a check against the listener to see if it is closed, and restart it if it is.
class TestWatchInfo():
    def __init__(self):
        self.start_snapshot()

    def start_snapshot(self):
        self.query_watch = db.collection(u'info').on_snapshot(self.on_snapshot)

    def on_snapshot(self, col_snapshot, changes, read_time):
        try:
            for change in changes:
                pass
        except Exception as err:
            print(err)
            print("Error occurred at " + str(time.ctime()))
            traceback.print_exc()

if __name__ == '__main__':
    try:
        test_object = TestWatchInfo()
        while(True):
            # Restart the listener if Firestore closed it server-side
            if test_object.query_watch._closed:
                test_object.start_snapshot()
            # code here
    except Exception as err:
        print(err)
        print("Error occurred at " + str(time.ctime()))
        traceback.print_exc()
I am writing a program which uses Tweepy to get data from Twitter. Tweepy uses another thread, and on occasion this thread throws an exception. However, my error catching logic does not catch the exceptions because they occur in a different thread. Is there any way to catch exceptions that are thrown by other threads without changing the thread's code?
To clarify, I needed to use the extra thread option in Tweepy so that the stream wouldn't block the rest of my program from executing. I get occasional updates from a database regarding which Twitter accounts to track, and the only way I was able to do this while streaming was to stream on a separate thread.
while 1:
    # Create twitter stream
    try:
        # Reconnect to the stream if it was disconnected (or at start)
        if reconnect:
            reconnect = False
            # NEW THREAD CREATED HERE
            tweet_stream.filter(follow=twitter_uids, async=True)
        # Sleep for sleep_interval before checking for new usernames
        time.sleep(sleep_interval)
        users_update = get_user_names(twitter_usernames)
        # Restart the stream if new users were found in DB
        if len(users_update) != 0:
            # Disconnect and set flag for stream to be restarted with new usernames
            twitter_usernames = users_update
            twitter_uids = get_twitter_uids(users_update)
            reconnect = True
            tweet_stream.disconnect()
            tweet_stream._thread.join()
    except Exception as e:
        pass  # ERROR HANDLING CODE
I'm writing a basic program using Python and Tweepy to take a list of Twitter screen names and pull down the corresponding user IDs. I've got the rate limiter implemented and the program works but things fall apart when it hits my exception handling. It's telling me that the screen name in X isn't present after it waits the 15 minutes. I need exception handling as Tweepy often runs into issues while running. What am I doing wrong here?
f = open('output2.txt', 'w')

while True:
    for x in HandleList1:
        try:
            u = api.get_user(id = x)
            print >> f, u.id
        except tweepy.TweepError:
            print "We just hit an error, waiting for 15min and then reconnecting..."
            time.sleep(60*15)
            u = api.get_user(id = x)
            print >> f, u.id
        except StopIteration:
            print "Stopping the iteration and processing the results!"
            break

f.close()
I guess that TweepError covers multiple kinds of errors, including rate-limit errors and query errors. If you are searching for a username that no longer exists you may get the same error.
Check out how to print the exact kind of error you are running into here:
Get the error code from tweepy exception instance
I would add an if-else statement to your except tweepy.TweepError catch that checks whether the error is a rate-limit error or something else, as explained in the link. In the latter case you can just pass (or print the error and the specific query you made).
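As a rough sketch of that idea, in the same Python 2 style as the question's code: the rate-limit check below assumes the error code is exposed as e.api_code (recent Tweepy) or inside e.message[0]['code'] (older Tweepy, as in the linked answer); code 88 is Twitter's "Rate limit exceeded" code.

try:
    u = api.get_user(id=x)
    print >> f, u.id
except tweepy.TweepError as e:
    # Recent Tweepy versions expose the Twitter error code as e.api_code;
    # older ones carry it in e.message[0]['code'] (see the linked answer).
    code = getattr(e, 'api_code', None)
    if code == 88:  # 88 = "Rate limit exceeded"
        print "Rate limit hit, waiting 15 minutes before retrying..."
        time.sleep(60 * 15)
        u = api.get_user(id=x)
        print >> f, u.id
    else:
        # Not a rate-limit problem (e.g. the screen name no longer exists):
        # report it and move on to the next handle.
        print "Skipping %s: %s" % (x, e)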
I'm pulling data through Twitter's REST API using Twython.
I want the code to automatically rest as long as it needs to when it's reached the Twitter rate limit, then begin querying again.
Here's the code, which takes a list of Twitter IDs and adds their followers' IDs to the list:
for user in first_ids:
    try:
        followers = twitter.get_followers_ids(user_id=user, count=600)
        for individual in followers['ids']:
            if individual not in ids:
                ids.append(individual)
    except TwythonRateLimitError as error:
        remainder = float(twitter.get_lastfunction_header(header='x-rate-limit-reset')) - time.time()
        time.sleep(remainder)
        continue
When I run it I get the following error: "Connection aborted. Error 10054: An existing connection was forcibly closed by the remote host"
What does the error mean? I imagine it's related to Twitter's rate limit -- is there another way around it?
You're leaving the connection open while your program sleeps; try closing it manually and then connecting again after the sleep timeout. Something like:
except TwythonRateLimitError as error:
    remainder = float(twitter.get_lastfunction_header(header='x-rate-limit-reset')) - time.time()
    twitter.disconnect()
    time.sleep(remainder)
    twitter = Twython(APP_KEY, APP_SECRET, OAUTH_TOKEN, OAUTH_TOKEN_SECRET)
    continue
If you are using the REST API you can use the same solution, deleting the API instance instead of calling .disconnect(). Simply use
del twitter
instead of
twitter.disconnect()
I had the same problem and it worked for me.
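Putting the two suggestions together, a minimal sketch of the except branch might look like this; the credential names are the usual Twython placeholders from the earlier answer, and re-creating the client is what actually drops the stale connection.

except TwythonRateLimitError:
    remainder = float(twitter.get_lastfunction_header(header='x-rate-limit-reset')) - time.time()
    del twitter                    # drop the old client and its stale connection
    time.sleep(max(remainder, 0))  # guard against a negative sleep if the window already reset
    twitter = Twython(APP_KEY, APP_SECRET, OAUTH_TOKEN, OAUTH_TOKEN_SECRET)
    continue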
Some of you might remember a question very similar to this, as I sought your help writing the original util in C (using libssh2 and OpenSSL).
I'm now trying to port it to Python and got stuck at an unexpected place. I ported about 80% of the core functionality in 30 minutes, then spent 10+ hours and still haven't finished that ONE function, so I'm here again to ask for your help one more time :)
The whole source (~130 lines, should be easily readable, not complex) is available here: http://pastebin.com/Udm6Ehu3
The connecting, switching on SSL, handshaking, authentication and even sending (encrypted) commands all work fine (I can see from my router's log that I log in with the proper user and password).
The problem is with ftp_read in the tunnel scenario (the else branch of self.proxy is None). One attempt was this:
def ftp_read(self, trim=False):
    if self.proxy is None:
        temp = self.s.read(READBUFF)
    else:
        while True:
            try:
                temp = self.sock.bio_read(READBUFF)
            except Exception as e:
                print(type(e))
                if type(e) == SSL.WantReadError:
                    try:
                        self.chan.send(self.sock.bio_read(10240))
                    except Exception as e:
                        print(type(e))
                        self.chan.send(self.sock.bio_read(10240))
                elif type(e) == SSL.WantWriteError:
                    self.chan.send(self.sock.bio_read(10240))
But I end up stuck either blocked waiting for a bio read (or a channel read in the ftp_write function), or with the exception OpenSSL.SSL.WantReadError, which, ironically, is what I'm trying to handle.
If I comment out the ftp_read calls, the proxy scenario works fine (logging in, sending commands, no problem), as mentioned. So of read/write unencrypted and read/write encrypted, I'm only missing the encrypted read through the tunnel.
I've spent 12+ hours now and feel like I'm getting nowhere, so any thoughts are highly appreciated.
EDIT: I'm not asking someone to write the function for me, so if you know a thing or two about SSL (especially BIOs) and you can see an obvious flaw in my interaction between the tunnel and the BIO, that will suffice as an answer :) For example: maybe ftp_write returns more data than the 10240 bytes requested (or just sends two texts ("blabla\n", "command done.\n")), so it isn't properly flushed. Which might be true, but apparently I can't rely on .want_write()/.want_read() from pyOpenSSL to report anything but 0 bytes available.
Okay, so I think I managed to sort it out.
sarnold, you'll like this updated version:
def ftp_read(self, trim=False):
    if self.proxy is None:
        temp = self.s.read(READBUFF)
    else:
        temp = ""
        while True:
            try:
                temp += self.sock.recv(READBUFF)
                break
            except Exception as e:
                if type(e) == SSL.WantReadError:
                    self.ssl_wants_read()
                elif type(e) == SSL.WantWriteError:
                    self.ssl_wants_write()
where ssl_wants_* is:
def ssl_wants_read(self):
    try:
        self.chan.send(self.sock.bio_read(10240))
    except Exception as e:
        chan_output = None
        chan_output = self.chan.recv(10240)
        self.sock.bio_write(chan_output)

def ssl_wants_write(self):
    self.chan.send(self.sock.bio_read(10240))
Thanks for the input, sarnold. It made things a bit clearer and easier to work with. However, my issue turned out to be one missed piece of error handling (I broke out of the SSL.WantReadError case too soon).