Tweepy and Python: how to list all followers

With tweepy in Python I'm looking for a way to list all followers from one account, with username and number of followers.
Now I can obtain the list of all ids in this way:
ids = []
for page in tweepy.Cursor(api.followers_ids, screen_name="username").pages():
    ids.extend(page)
    time.sleep(1)
but with this list of ids I can't obtain the username and follower count of every id, because the rate limit is exceeded...
How can I complete this code?
Thank you all!

On the REST API, you are allowed 180 queries every 15 minutes, and I guess the Streaming API has a similar limitation. You do not want to come too close to this limit, since your application will eventually get blocked even if you do not strictly hit it.
Since your problem has something to do with the rate limit, you should put a sleep in your for loop. I'd say a sleep(4) should be enough, but it's mostly a matter of trial and error there, try to change the value and see for yourself.
Something like
sleeptime = 4
pages = tweepy.Cursor(api.followers, screen_name="username").pages()
while True:
    try:
        page = next(pages)
        time.sleep(sleeptime)
    except tweepy.TweepError:  # taking extra care of the "rate limit exceeded"
        time.sleep(60 * 15)
        page = next(pages)
    except StopIteration:
        break
    for user in page:
        print(user.id_str)
        print(user.screen_name)
        print(user.followers_count)

Related

rate limit tweepy paginator search_all_tweets

I'm not sure why I am getting rate limited so quickly using:
mentions = []
for tweet in tweepy.Paginator(client.search_all_tweets,
                              query="to:######## lang:nl -is:retweet",
                              start_time="2022-01-01T00:00:00Z",
                              end_time="2022-05-31T00:00:00Z",
                              max_results=500).flatten(limit=10000):
    mention = tweet.text
    mentions.append(mention)
I suppose I could put time.sleep(1) after these lines, but then it would mean I could only process one Tweet every second, whereas with a regular client.search_all_tweets I would get 500 Tweets per request.
Is there anything I'm missing here? How can I process more than one Tweet a second using tweepy.Paginator?
BTW: I have academic access and know the rate limit documentation.
See the FAQ section about this in Tweepy's documentation:
Why am I getting rate-limited so quickly when using Client.search_all_tweets() with Paginator?
The GET /2/tweets/search/all Twitter API endpoint that Client.search_all_tweets() uses has an additional 1 request per second rate limit that is not handled by Paginator.
You can time.sleep() 1 second while iterating through responses to handle this rate limit.
See also the relevant Tweepy issues #1688 and #1871.
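One way to apply that advice without paying the delay on every single tweet is to sleep once per page (i.e. once per request) instead of inside the flatten() loop. A minimal sketch, where throttled is a hypothetical helper (not part of Tweepy) that paces any page iterator:

```python
import time

def throttled(pages, min_interval=1.0):
    # Yield each page no faster than one per min_interval seconds,
    # matching the extra 1 request/second limit on search_all_tweets.
    last = None
    for page in pages:
        if last is not None:
            wait = min_interval - (time.monotonic() - last)
            if wait > 0:
                time.sleep(wait)
        last = time.monotonic()
        yield page
```

You would then iterate over throttled(tweepy.Paginator(client.search_all_tweets, ...)) and read each response's .data, so the one-second pause is paid once per 500-tweet request rather than once per tweet.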

Twitter pagination per page limit in downloading user profile Tweets

Here is the code I am using from this link. I have updated the original code as I need the full .json object. But I am having a problem with pagination as I am not getting the full 3200 Tweets.
api = tweepy.API(auth, parser=tweepy.parsers.JSONParser(), wait_on_rate_limit=True)
jsonFile = open(path + filname + '.json', "a+", encoding='utf-8')
page = 1
max_pages = 3200
result_limit = 2
last_tweet_id = False
while page <= max_pages:
    if last_tweet_id:
        tweet = api.user_timeline(screen_name=user,
                                  count=result_limit,
                                  max_id=last_tweet_id - 1,
                                  tweet_mode='extended',
                                  include_retweets=True)
    else:
        tweet = api.user_timeline(screen_name=user,
                                  count=result_limit,
                                  tweet_mode='extended',
                                  include_retweets=True)
    json_str = json.dumps(tweet, ensure_ascii=False, indent=4)
As per the author, "result_limit and max_pages are multiplied together to get the number of tweets called."
By that definition, shouldn't I get 6400 Tweets? But the problem is that I am getting the same 2 Tweets 3200 times. I also updated the values to
max_pages=3200
result_limit=5000
Call that a generous upper limit, so I should at least get 3200 Tweets. But in this case I got 200 Tweets repeated many times (I terminated the code before it finished).
I just want 3200 Tweets per user profile, nothing fancy. Consider that I have a list of 100 users, so I want this done efficiently. Currently it seems like I am just sending many requests and wasting time and resources.
Even though I updated the code with a smaller value of max_pages, I am still not sure what that value should be. How am I supposed to know how many Tweets one page covers?
Note: that other answer is not useful, as it has an error at .item(), so please don't mark this as a duplicate.
You don't change last_tweet_id after setting it to False, so only the code in the else block is executing. None of the parameters in that method call change while looping, so you're making the same request and receiving the same response back over and over again.
Also, neither page nor max_pages changes within your loop, so this will loop infinitely.
I would recommend looking into using tweepy.Cursor instead, as it handles pagination for you.
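For reference, the two bugs identified above can also be fixed in the manual loop itself. A sketch of the corrected pagination logic, with fetch_page as a hypothetical stand-in for the api.user_timeline call, so the max_id bookkeeping can be seen in isolation:

```python
def fetch_all(fetch_page, max_pages=16):
    # fetch_page(max_id) stands in for api.user_timeline: it returns a
    # list of tweets (dicts with an "id" key), newest first, and
    # max_id=None means "start at the top of the timeline".
    tweets = []
    last_tweet_id = None
    for _ in range(max_pages):  # the page counter now actually advances
        page = fetch_page(None if last_tweet_id is None
                          else last_tweet_id - 1)
        if not page:
            break
        tweets.extend(page)
        # updated every pass, so the next request moves further back
        last_tweet_id = page[-1]["id"]
    return tweets
```

With both the loop counter and last_tweet_id advancing, each request asks for tweets older than the last one received instead of repeating the same request.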

how to set the limit of maxresults upto permissible limit of youtube data api in python?

I created this script to extract all playlist video info from a YouTube channel in Python, but due to the quota limit I'm unable to extract info for more than 10k videos. How can I set the limit to less than 10k, or is there another method to extract the info?
This is my code:
while True:
    res = youtube.playlistItems().list(playlistId=playlist_id,
                                       part='id,snippet',
                                       maxResults=50,
                                       pageToken=next_page_token).execute()
    next_page_token = res.get('nextPageToken')
    if next_page_token is None:
        break
You can use a counter to break out of the loop. Always keep in mind to calculate the quota cost for each kind of request. Yes, the 10K limit is unfortunate, but you can still extract up to 10k. For example, a youtube.search.list call incurs a cost of 5 per page.
count = 1
# ...loop starts
#     make the request
#     if count is within budget and the next page token is not None:
#         increment count
#     else:
#         break
# ...
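That counter idea can be sketched as a quota-aware pagination loop. Here list_page is a hypothetical stand-in for youtube.playlistItems().list(...).execute(), returning the page's items and the next page token, and the per-page cost is a parameter since it differs by endpoint:

```python
def collect_playlist_items(list_page, quota_budget=10000, cost_per_page=1):
    # Stop before the next request would push us past the quota budget.
    items, spent, token = [], 0, None
    while spent + cost_per_page <= quota_budget:
        page, token = list_page(token)
        spent += cost_per_page
        items.extend(page)
        if token is None:  # no more pages
            break
    return items
```

Lowering quota_budget (or raising cost_per_page for more expensive endpoints) makes the loop stop early instead of burning through the daily quota.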

Create dictionary of multiple Twitter users' followers with Tweepy - get around 5000 per page limit

I am working on a project using Tweepy where I need to first grab all followers for a particular Twitter user, and then do the same for all of those followers' followers. I would like to store the latter part in a dictionary where the keys are the first set of followers, and the values are a list of their followers.
Here is my code:
followers_dict = {}
for h in myHandleList:
    try:
        c = tweepy.Cursor(api.followers_ids, id=h)
        for page in c.pages():
            followers_dict[h] = page
    except tweepy.TweepError:
        pass
This code works well for users with under 5000 followers. However, for users with more than 5000 followers, when I run the same code, the code splits their followers into separate lists of no more than 5000 values, and then only adds the second list as values in the dictionary.
For example, one user has 5,400 followers, so when I download their followers, it is formatted as two lists of 5000 and 400. When I use my loop to add their followers to a dictionary, it only adds the second list of 400. I would like to add all 5,400 as values in the dictionary.
I am a noob when it comes to Python, and as someone pointed out in the comments, this is surely an issue with my code. Any suggestions for how to fix it?
Thanks in advance!
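The overwrite described above happens because followers_dict[h] = page replaces the dictionary value on every page, so only the last page survives. A sketch of the fix, accumulating every page into one list; the simulated pages here stand in for real tweepy.Cursor(api.followers_ids, ...) pages:

```python
followers_dict = {}

def add_follower_pages(handle, pages):
    # Extend the existing list so earlier pages are not overwritten.
    for page in pages:
        followers_dict.setdefault(handle, []).extend(page)

# simulated cursor output: a 5000-ID page followed by a 400-ID page
add_follower_pages("someuser", [list(range(5000)), list(range(5000, 5400))])
```

With setdefault plus extend, a user with 5,400 followers ends up with all 5,400 IDs under one key instead of only the final 400.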

Twitter error code 429 with Tweepy

I am trying to create a project that accesses a Twitter account using the Tweepy API, but I am faced with status code 429. Now, I've looked around and I see that it means I have made too many requests. However, I am only ever asking for 10 tweets at a time, and within those, only one should match during my testing.
for tweet in tweepy.Cursor(api.search, q='#realtwitchess ', lang=' ').items(10):
    try:
        text = str(tweet.text)
        textparts = str.split(text)  # convert tweet into string array to dissect
        print(text)
        for x, string in enumerate(textparts):
            if x < len(textparts) - 1:  # prevents error that arises with an incomplete call of the twitter bot to start a game
                if string == "gamestart" and textparts[x + 1][:1] == "#":  # find games
                    otheruser = api.get_user(screen_name=textparts[2][1:])  # drop the # sign (although it might not matter)
                    self.games.append((tweet.user.id, otheruser.id))
                elif len(textparts[x]) == 4:  # find moves
                    newMove = Move(tweet.user.id, string)
                    print(newMove.getMove())
                    self.moves.append(newMove)
        if tweet.user.id == thisBot.id:  # ignore self tweets
            continue
    except tweepy.TweepError as e:
        print(e.reason)
        sleep(900)
        continue
    except StopIteration:  # stop iteration when last tweet is reached
        break
When the error does appear, it is in the first for loop line. The kinda weird part is that it doesn't complain every time, or even in consistent intervals. Sometimes it will work and other times, seemingly randomly, not work.
We have tried adding longer sleep times in the loop and reducing the item count.
Add wait_on_rate_limit=True to the API call like this:
api = tweepy.API(auth, wait_on_rate_limit=True)
This will make the rest of the code obey the rate limit.
You found the correct information about the error code. In fact, a 429 code is returned when a request cannot be served because the application's rate limit has been exhausted for the resource (from the documentation).
I suppose that your problem concerns not the quantity of data but the frequency of requests.
Check the Twitter API rate limits (they are the same for Tweepy).
Rate limits are divided into 15-minute intervals. All endpoints require authentication, so there is no concept of unauthenticated calls and rate limits.
There are two initial buckets available for GET requests: 15 calls every 15 minutes, and 180 calls every 15 minutes.
I think you can try to stay within this range to avoid the problem.
Update
In the latest versions of Tweepy (from 3.2.0 onwards), the wait_on_rate_limit parameter has been introduced.
If set to True, it automatically handles this problem.
From the documentation:
wait_on_rate_limit – Whether or not to automatically wait for rate limits to replenish
api = tweepy.API(auth, wait_on_rate_limit=True, wait_on_rate_limit_notify=True)
This should help with the rate limiting.