I'm writing a Twitter application with tweepy that crawls up a conversation thread by following in_reply_to_status_id.
Everything works fine until I hit the rate limit; after a few minutes of crawling, I have to wait another 15 minutes or so.
This is strange because I used nearly identical code until a few months ago, before API 1.0 was deprecated, and it didn't have this rate limit problem.
Is there a known way to get rid of the rate limit, or at least increase it?
Or is there a workaround?
It seems like a lot of people are having trouble with this, but I can't find a definite solution.
I would greatly appreciate any help.
auth1 = tweepy.OAuthHandler('consumer_token', 'consumer_secret')
auth1.set_access_token('access_token', 'access_secret')
api = tweepy.API(auth1)
def hasParent(s):
    # return True if s is not None, i.e., s is an in_reply_to_status_id number
    ....
while hasParent(ps):
    try:
        parent = api.get_status(ps)
    except tweepy.error.TweepError:
        print('tweeperror')
        break
    newparent = parent.in_reply_to_status_id
    ......
    ps = newparent
I put a limit on the number of items and it worked:
def index(request):
    statuses = tweepy.Cursor(api.user_timeline).items(10)
    return TemplateResponse(request, 'index.html', {'statuses': statuses})
This happens because you reached the maximum limit. Just disconnect your internet connection and reconnect again; there is no need to wait.
Use a Cursor:
statuses = tweepy.Cursor(api.user_timeline).items(2)
If you get the error again, just reduce the number of items.
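Alternatively, tweepy can pause for you until the rate limit window resets. A minimal sketch using tweepy's wait_on_rate_limit option (credentials are placeholders):

import tweepy

auth = tweepy.OAuthHandler('consumer_token', 'consumer_secret')
auth.set_access_token('access_token', 'access_secret')
# wait_on_rate_limit makes tweepy sleep until the rate limit window resets
api = tweepy.API(auth, wait_on_rate_limit=True)

for status in tweepy.Cursor(api.user_timeline).items(10):
    print(status.text)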
I'm getting a ConnectionResetError(104, 'Connection reset by peer'), and it's not really in my control. From the other posts about this I've seen on SO, people add sleeps and it works for them. Here's my code:
import requests
from time import sleep

for i, id in enumerate(id_list):
    base_endpoint = f"https://endpoint.io/v1/resource/{id}/"
    print("i:", i)
    if i % 100 == 0:
        print("sleeping")
        sleep(10)  # told it to sleep every 100 calls
    with requests.Session() as session:
        session.auth = (key, '')
        sleep(1)  # even added this
        r = session.get(base_endpoint)
This is a toy example, and I know I can add better exception handling, but the point is: is there a better way to get around this stingy API? This is a SaaS product that we pay for; the API isn't meant to be used this way, but going to the devs is a several-week haul even to get a meeting.
Is there a different way to do this beyond just increasing the sleep time until it works?
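One alternative to hand-tuned sleeps is to let the HTTP stack retry with exponential backoff. A sketch using requests with urllib3's Retry; the endpoint, key, and id_list are the placeholders from the snippet above:

import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

# retry connection errors and throttling statuses with exponential
# backoff (1s, 2s, 4s, ...) instead of fixed sleeps
retry = Retry(total=5, connect=5, read=5, backoff_factor=1,
              status_forcelist=[429, 500, 502, 503, 504])

session = requests.Session()
session.auth = (key, '')
session.mount("https://", HTTPAdapter(max_retries=retry))

for i, id in enumerate(id_list):
    r = session.get(f"https://endpoint.io/v1/resource/{id}/")
    r.raise_for_status()

Reusing one Session across calls also keeps the TCP connection alive, which by itself can reduce resets.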
I have the following function,
import requests
def get_url_type(data):
    x = {}
    for i in range(len(data)):
        print(i)
        try:
            x[i] = requests.head(data['url'][i]).headers.get('content-type')
        except requests.exceptions.RequestException:
            x[i] = 'Not Available'
    return x
This function returns the content type of each URL passed to it, and whenever there is no response it throws an error, which the except clause catches. My problem is that some requests take more than 5-10 minutes, which is too long in a production environment. I want the function to return "Not Available" when a request takes more than 5 minutes. When I researched this, the suggestion was to convert the function to an asynchronous one. I have been trying to change it, without much success.
The following is what I have tried,
import asyncio
import time
from datetime import datetime

async def custom_sleep():
    print('SLEEP', datetime.now())
    time.sleep(5)
My objective is that whenever the request takes more than 5 minutes, the function should return "Not Available" and move on to the next iteration.
Can anybody help me do this?
Thanks in advance!
It seems you just want a request to time out after a given time has passed without a reply, and then move on to the next request. For this there is a timeout parameter you can add to your request; see the documentation: http://docs.python-requests.org/en/master/user/quickstart/#timeouts.
With a 300-second (5-minute) timeout your call becomes:
requests.head(data['url'][i], timeout=300)
The asynchronous functionality you mention actually has a different objective. It would allow your code to avoid waiting the 5 minutes at all before continuing execution, but I believe that would be a different question.
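Putting that together with the original function, a minimal sketch (the 300-second timeout is the value requested in the question):

import requests

def get_url_type(data, timeout=300):
    x = {}
    for i in range(len(data)):
        try:
            # raises requests.exceptions.Timeout after `timeout` seconds
            x[i] = requests.head(data['url'][i], timeout=timeout).headers.get('content-type')
        except requests.exceptions.RequestException:
            # Timeout is a subclass of RequestException
            x[i] = 'Not Available'
    return x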
I'm working on a research project that involves analyzing large amounts of data from Twitter. The project is being built in Python using Tweepy. As you might imagine, I have to work very closely within the confines of the Twitter rate limiter. As such, my authentication code looks like this:
auth1 = tweepy.OAuthHandler("...", "...")
auth1.set_access_token("...", "...")
api1 = tweepy.API(auth1, wait_on_rate_limit=True, wait_on_rate_limit_notify=True)
This does a wonderful job of stopping and waiting before I trip my request limit on a small, scaled-down run. However, when I try to run the program on my full data set, I eventually get this error while the program is sleeping:
tweepy.error.TweepError: Failed to send request: ('Connection aborted.', error(104, 'Connection reset by peer'))
My research tells me that this is happening because Twitter is disconnecting, and I need to catch the error. How would I catch this error, reconnect, and have my program pick up where it left off? Any advice would be welcome.
The Twitter disconnection errors are socket exceptions, which are a special case of IOError. To catch them you need to do something like:
auth = tweepy.OAuthHandler(…)  # set up your oauth here
try:
    stream = tweepy.Stream(auth=auth, listener=SomeListener())  # start the stream
except IOError as ex:
    print('I just caught the exception: %s' % ex)
If that works, wrap it in a while True loop with an increasing backoff, to provide some pause between reconnections; a sketch follows.
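A minimal sketch of that loop, assuming the SomeListener and auth setup from above ('keyword' is a placeholder track filter):

import time
import tweepy

backoff = 5
while True:
    try:
        stream = tweepy.Stream(auth=auth, listener=SomeListener())
        stream.filter(track=['keyword'])  # blocks until disconnected
        backoff = 5  # clean exit: reset the pause
    except IOError as ex:
        print('I just caught the exception: %s' % ex)
        time.sleep(backoff)
        backoff = min(backoff * 2, 320)  # double the pause, with a cap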
I've also tried wrapping Tweepy calls inside a while True loop in the same way, but I still had issues with reconnections (in some cases this solution does not solve the problem). Instead, I switch auth (connected to the Tweepy API instance, here "twapi") in case of error, and it seems to work properly:
...
global twapi  # declare globals before first use
global switch_auth
while True:
    try:
        users_stream = twapi.lookup_users(screen_names=[scrname_list_here])
    except tweepy.error.TweepError:
        time.sleep(120)
        if switch_auth is False:
            twapi = tweepy.API(auths[auth_id + 1])
            switch_auth = True
        else:
            twapi = tweepy.API(auths[auth_id])
            switch_auth = False
        continue
    break
...
By using the bool variable switch_auth it is possible, when the Tweepy error about a failed reconnection arises, to "switch" the auth input of the Tweepy API instance (the handlers are assumed to be stored in the auths list) and work around the problem.
The same technique can be used to switch auth when the rate limit is reached. I hope it will be useful, just try!
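For completeness, the assumed auths list could be built like this (a sketch; credentials is a hypothetical list of (consumer_key, consumer_secret, access_token, access_secret) tuples):

import tweepy

auths = []
for consumer_key, consumer_secret, access_token, access_secret in credentials:
    handler = tweepy.OAuthHandler(consumer_key, consumer_secret)
    handler.set_access_token(access_token, access_secret)
    auths.append(handler)

auth_id = 0
switch_auth = False
twapi = tweepy.API(auths[auth_id])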
I have a script written to iterate through a list of Twitter user IDs and save the lists of follower_ids to a file. I have used it several times with no problems. For long lists, I have added this piece of code to check the rate limit before every GET request and sleep if I'm about to be rate-limited:
import calendar
import time
from datetime import datetime

rate_limit_json = api.rate_limit_status()
remaining_hits = rate_limit_json["remaining_hits"]
print('you have', remaining_hits, 'API calls remaining until next hour')
if remaining_hits < 2:
    dtcode = datetime.utcnow()
    unixtime = calendar.timegm(dtcode.utctimetuple())
    sleeptime = rate_limit_json['reset_time_in_seconds'] - unixtime + 10
    print('waiting', sleeptime, 'seconds')
    time.sleep(sleeptime)
else:
    pass
I have this OAuth blurb set up at the top of the script:
auth = tweepy.OAuthHandler('xxxx', 'xxxx')
auth.set_access_token('xxxx', 'xxxxx')
api = tweepy.API(auth)
The call I'm repeatedly making is:
follower_cursors = tweepy.Cursor(api.followers_ids)
So now, no matter how many calls I make, my "remaining hits" count stays at 150 until Twitter returns "Rate limit exceeded. Clients may not make more than 350 requests per hour."
It seems like the rate limit checker is reporting the unauthenticated, per-IP rate limit (150), but my calls are counting against my app's rate limit (350).
How can I adjust my rate limit checker to actually check the app's rate limit?
Repeatedly creating a Cursor instance doesn't use up any API quota, as it doesn't actually make a request to Twitter. A Cursor is just a Tweepy wrapper around different pagination methods; calling the next or prev methods, or iterating over it, causes actual API calls.
Twitter no longer has a concept of unauthenticated API calls (the v1.1 API always requires authentication). I think the problem lies with your use of Cursor. Try creating just one instance and calling the next method repeatedly (make sure to catch a StopIteration exception, as next is a generator); a sketch follows.
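A minimal sketch of that pattern, using the api object set up above ('some_user' is a placeholder):

import tweepy

pages = tweepy.Cursor(api.followers_ids, screen_name='some_user').pages()
follower_ids = []
while True:
    try:
        follower_ids.extend(next(pages))  # one API call per page
    except StopIteration:
        break  # no more pages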
Is there a way to get a random track from the SoundCloud API? A workaround I was thinking of is getting the total number of tracks and picking a random number, but I can't find a way to get the total number of tracks either.
At the moment I am just wrapping the thing in a try/except, but then I make useless requests. Is there a way to avoid that?
while not track:
    try:
        track = client.get('/tracks/%s' % random.randint(0, 100000))
    except requests.exceptions.HTTPError as e:
        logger.error(e)
Are there any other requirements for the track you want to pick? A simple GET request on /tracks will return 50 track instances unless you specify a limit. You could just pick a random one out of that set:
import random
import soundcloud
client = soundcloud.Client(access_token='YOUR_ACCESS_TOKEN')
tracks = client.get('/tracks')
track = random.choice(tracks)
Hope that helps! Otherwise comment and I'll edit my answer with more details.
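If you want more variety than the default set, the /tracks endpoint also accepts limit and offset parameters; a sketch (the offset ceiling is a guess, since the API does not report a total track count):

import random
import soundcloud

client = soundcloud.Client(access_token='YOUR_ACCESS_TOKEN')
# fetch a random page of 50 tracks, then pick one from it
tracks = client.get('/tracks', limit=50, offset=random.randint(0, 8000))
track = random.choice(tracks)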