Query regarding pagination in tweepy (get_followers) of a particular twitter user - python

I am fairly new to tweepy and to pagination using the Cursor class. I have been trying to use the Cursor class to get all the followers of a particular Twitter user, but I keep getting the error "tweepy.error.TweepError: This method does not perform pagination".
I would really appreciate any help in obtaining all the followers of a particular Twitter user with pagination, using tweepy. The code I have so far is as follows:
import tweepy
consumer_key='xyz'
consumer_secret='xyz'
access_token='abc'
access_token_secret='def'
auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)
api = tweepy.API(auth)
user = api.get_user('somehandle')
print user.name
followers = tweepy.Cursor(user.followers)
temp=[]
for user in followers.items():
    temp.append(user)
print temp
# the following part works fine, but it is without pagination, so I can retrieve at most 100 followers
aDict = user.followers()
for friend in aDict:
    friendDict = friend.__getstate__()
    print friendDict['screen_name']

There is a handy method called followers_ids. It returns up to 5000 follower IDs (a Twitter API limit) for the given screen_name (or id, user_id or cursor).
Then you can paginate these results manually in Python and call lookup_users for every chunk. Since lookup_users can handle only 100 user IDs at a time (another Twitter API limit), it makes sense to set the chunk size to 100.
Here's the code (pagination part was taken from here):
import itertools
import tweepy
def paginate(iterable, page_size):
    while True:
        i1, i2 = itertools.tee(iterable)
        iterable, page = (itertools.islice(i1, page_size, None),
                          list(itertools.islice(i2, page_size)))
        if len(page) == 0:
            break
        yield page
auth = tweepy.OAuthHandler(<consumer_key>, <consumer_secret>)
auth.set_access_token(<key>, <secret>)
api = tweepy.API(auth)
followers = api.followers_ids(screen_name='gvanrossum')
for page in paginate(followers, 100):
    results = api.lookup_users(user_ids=page)
    for result in results:
        print result.screen_name
Hope that helps.
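For reference, the same chunking can be written more compactly in Python 3 with plain list slicing. This is a minimal sketch and not part of the original answer: chunked is a hypothetical helper name, and the ID list is dummy data standing in for the result of followers_ids.

```python
def chunked(items, size):
    """Yield consecutive slices of `items`, each holding at most `size` elements."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(items), size):
        yield items[start:start + size]


# Dummy IDs standing in for the result of api.followers_ids(...)
follower_ids = list(range(1, 251))
pages = list(chunked(follower_ids, 100))
print([len(page) for page in pages])
```

Each page produced this way could then be passed to lookup_users, exactly as in the loop above.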

Related

how to get whole user timeline of a specific twitter user

I came up with this script to get all of the tweets from one Twitter user:
import tweepy
from tweepy import OAuthHandler
import json
def load_api():
    # fill in your own credentials
    consumer_key = ''
    consumer_secret = ''
    access_token = ''
    access_secret = ''
    auth = OAuthHandler(consumer_key, consumer_secret)
    auth.set_access_token(access_token, access_secret)
    return tweepy.API(auth)
api = load_api()
user = 'allkpop'
tweets = api.user_timeline(id=user, count=2000)
print('Found %d tweets' % len(tweets))
tweets_text = [t.text for t in tweets]
filename = 'tweets-'+user+'.json'
# use a context manager so the file is closed after writing
with open(filename, 'w') as f:
    json.dump(tweets_text, f)
print('Saved to file:', filename)
But when I run it I only get 200 tweets per request. Is there a way to get 2000 tweets, or even more than 2000?
Please help me, thank you.
The Twitter API has request limits. The call you're using corresponds to the Twitter statuses/user_timeline endpoint. The maximum number of tweets you can get from this endpoint is documented as 3,200. Also note that there's a maximum number of requests in a 15-minute window, which can further limit how much you retrieve in one run. Here are a couple of observations that might be interesting for you:
Documentation says that the max count is 200.
There's an include_rts (include retweets) parameter that might return more values. While it's part of the Twitter API, I can't see where Tweepy documents that.
You might try Tweepy Cursors to see if that will bring back more items.
Because of the 15 minute limits, you might be able to pause until the next 15 minute window to continue. That said, I don't know enough about your use case to know if this is practical or not.
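To make the numbers above concrete, here is a rough back-of-the-envelope sketch. It assumes only the two limits quoted in this answer (200 tweets per request, 3,200 tweets per timeline); requests_needed is a hypothetical helper, not part of Tweepy.

```python
import math

MAX_PER_REQUEST = 200       # maximum `count` per request, as quoted above
MAX_TIMELINE_TWEETS = 3200  # documented ceiling for statuses/user_timeline


def requests_needed(wanted, per_request=MAX_PER_REQUEST, ceiling=MAX_TIMELINE_TWEETS):
    """Return how many paged requests are needed to fetch `wanted` tweets."""
    reachable = min(wanted, ceiling)  # nothing beyond the ceiling can be fetched
    return math.ceil(reachable / per_request)


print(requests_needed(2000))
print(requests_needed(5000))
```

Under these assumptions, the 2000 tweets asked for would take ten paged requests rather than one.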

Followers in twitter (tweepy)

I'm trying to download my Twitter followers and the followers of my followers.
The code seems to work, but it doesn't download all my followers. It downloads only a portion, and within that portion I think it works well. Why doesn't it download all of them?
# -*- coding: utf-8 -*-
"""
#author: Mik
"""
import csv
import time
import tweepy
# Copy the api key, the api secret, the access token and the access token secret from the relevant page on your Twitter app
api_key = ''
api_secret = ''
access_token = '-'
access_token_secret = ''
# You don't need to make any changes below here
# This bit authorises you to ask for information from Twitter
auth = tweepy.OAuthHandler(api_key, api_secret)
auth.set_access_token(access_token, access_token_secret)
# The api object gives you access to all of the http calls that Twitter accepts
api = tweepy.API(auth)
#User we want to use as initial node
user=''
#This creates a csv file and defines that each new entry will be in a new line
csvfile=open(user+'network2.csv', 'w')
spamwriter = csv.writer(csvfile, delimiter=' ',quotechar='|', quoting=csv.QUOTE_MINIMAL)
#This is the function that takes a node (user) and looks for all its followers
#and prints them into a CSV file... and looks for the followers of each follower...
def fib(n,user,spamwriter):
    if n>0:
        #There is a limit to the traffic you can have with the API, so you need to wait
        #a few seconds per call or after a few calls it will restrict your traffic
        #for 15 minutes. This parameter can be tweaked
        time.sleep(40)
        #This is for private users whose followers we won't be able to see
        try:
            users=tweepy.Cursor(api.followers, screen_name = user, wait_on_rate_limit = True).items()
            for follower in users:
                spamwriter.writerow([user+';'+follower.screen_name])
                fib(n-1,follower.screen_name,spamwriter)
                #n defines the level of autorecurrence
        except tweepy.TweepError:
            print("Failed to run the command on that user, Skipping...")
n=2
fib(n,user,spamwriter)
If I understood correctly, you want to get the IDs of all followers of each of your followers.
Use logic like the following; it will get you the details of 3000 of your followers per 15 minutes (15 requests of 200 followers each):
import tweepy
# Twitter credentials here (fill in your own keys)
auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)
api = tweepy.API(auth)
iter1 = tweepy.Cursor(api.followers, screen_name = 'your_screen_name',count = 200).pages()
for request in range(15):
    your_200_followers = next(iter1)
    for each_follower in your_200_followers:
        # followers_ids is a method, so it has to be called
        variable = each_follower.followers_ids()
        #store the <list> variable somewhere
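The recursive crawl from the question can also be written against an injected lookup function, which makes the depth logic easy to check without touching the API. This is only a sketch under stated assumptions: crawl_followers and get_followers are hypothetical names, and the toy graph stands in for real Twitter data.

```python
def crawl_followers(user, depth, get_followers, seen=None):
    """Return (user, follower) edges found within `depth` levels of `user`."""
    if seen is None:
        seen = set()
    edges = []
    # Stop at the depth limit, and never expand the same user twice
    if depth <= 0 or user in seen:
        return edges
    seen.add(user)
    for follower in get_followers(user):
        edges.append((user, follower))
        edges.extend(crawl_followers(follower, depth - 1, get_followers, seen))
    return edges


# Toy follower graph standing in for API calls
graph = {"me": ["ann", "bob"], "ann": ["cat"], "bob": ["me"], "cat": ["dan"]}
edges = crawl_followers("me", 2, lambda name: graph.get(name, []))
print(edges)
```

In a real run, get_followers would wrap the paginated API call, and the seen set would also save rate-limited requests on users that appear more than once.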

TwitterAPI: how to streaming multiple users by id

I'm streaming all tweets that mention one of the usernames (screen_name) that I have on a list( TRACK_TERM ).
from TwitterAPI import TwitterAPI
api = TwitterAPI(CONSUMER_KEY, CONSUMER_SECRET, ACCESS_KEY, ACCESS_SECRET)
TRACK_TERM = ['#CNN', '#FoxNews', '#FOXTV', '#BBC'... + 500]
r = api.request('statuses/filter', {'track': TRACK_TERM})
My problem is that users might sometimes change their screen_name, and this script will run continuously for a month. So I was wondering if there's a way to track users' mentions by their user ID instead of their screen_name.
I'm using TwitterAPI; I also tried twython.
Instead of the track parameter try using the follow parameter.
USER_IDS = '%d,%d,%d' % (ID1,ID2,ID3)
r = api.request('statuses/filter', {'follow': USER_IDS})
The docs are here.
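The format string above only works for exactly three IDs. A sketch that handles a list of any length, assuming the IDs are held as integers (build_follow_param is a hypothetical helper and the values below are made up):

```python
def build_follow_param(user_ids):
    """Join numeric user IDs into the comma-separated string used for `follow`."""
    return ",".join(str(int(uid)) for uid in user_ids)


# Made-up IDs for illustration only
USER_IDS = build_follow_param([111, 222, 333])
print(USER_IDS)
```

The resulting string can be passed as the follow value in the request shown above.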

Get all friends of a given user on twitter with tweepy

Using tweepy I am able to return all of my friends using a cursor. Is it possible to specify another user and get all of their friends?
user = api.get_user('myTwitter')
print "Retreiving friends for", user.screen_name
for friend in tweepy.Cursor(api.friends).items():
    print "\n", friend.screen_name
Which prints a list of all my friends, however if I change the first line
to another twitter user it still returns my friends. How can I get friends for any given user using tweepy?
#first line is changed to
user = api.get_user('otherUsername') #still returns my friends
Additionally user.screen_name when printed WILL return otherUsername
The question Get All Follower IDs in Twitter by Tweepy does essentially what I am looking for; however, it returns only a count of IDs. If I remove the len() function I can iterate through a list of user IDs, but is it possible to get screen names (#twitter, #stackoverflow, #etc.)?
You can use the ids variable from the answer you referenced to get the IDs of the followers of a given person, and extend it to get the screen names of all of the followers using Tweepy's api.lookup_users method:
import time
import tweepy
auth = tweepy.OAuthHandler(..., ...)
auth.set_access_token(..., ...)
api = tweepy.API(auth)
ids = []
for page in tweepy.Cursor(api.followers_ids, screen_name="McDonalds").pages():
    ids.extend(page)
    time.sleep(60)
# lookup_users accepts at most 100 IDs per call, so look them up in chunks
screen_names = []
for i in range(0, len(ids), 100):
    screen_names.extend(user.screen_name for user in api.lookup_users(user_ids=ids[i:i + 100]))
You can use this:
# import the module
import tweepy
# assign the values accordingly
consumer_key = ""
consumer_secret = ""
access_token = ""
access_token_secret = ""
# authorization of consumer key and consumer secret
auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
# set access to user's access key and access secret
auth.set_access_token(access_token, access_token_secret)
# calling the api
api = tweepy.API(auth)
# the screen_name of the targeted user
screen_name = "TwitterIndia"
# printing the latest 20 friends of the user
for friend in api.friends(screen_name):
    print(friend.screen_name)
For more details see https://www.geeksforgeeks.org/python-api-friends-in-tweepy/

Managing Tweepy API Search

Please forgive me if this is a gross repeat of a question previously answered elsewhere, but I am lost on how to use the tweepy API search function. Is there any documentation available on how to search for tweets using the api.search() function?
Is there any way I can control features such as number of tweets returned, results type etc.?
The results seem to max out at 100 for some reason.
the code snippet I use is as follows
searched_tweets = self.api.search(q=query,rpp=100,count=1000)
I originally worked out a solution based on Yuva Raj's suggestion to use additional parameters in GET search/tweets - the max_id parameter in conjunction with the id of the last tweet returned in each iteration of a loop that also checks for the occurrence of a TweepError.
However, I discovered there is a far simpler way to solve the problem using a tweepy.Cursor (see tweepy Cursor tutorial for more on using Cursor).
The following code fetches the most recent 1000 mentions of 'python'.
import tweepy
# assuming twitter_authentication.py contains each of the 4 oauth elements (1 per line)
from twitter_authentication import API_KEY, API_SECRET, ACCESS_TOKEN, ACCESS_TOKEN_SECRET
auth = tweepy.OAuthHandler(API_KEY, API_SECRET)
auth.set_access_token(ACCESS_TOKEN, ACCESS_TOKEN_SECRET)
api = tweepy.API(auth)
query = 'python'
max_tweets = 1000
searched_tweets = [status for status in tweepy.Cursor(api.search, q=query).items(max_tweets)]
Update: in response to Andre Petre's comment about potential memory consumption issues with tweepy.Cursor, I'll include my original solution, replacing the single statement list comprehension used above to compute searched_tweets with the following:
searched_tweets = []
last_id = -1
while len(searched_tweets) < max_tweets:
    count = max_tweets - len(searched_tweets)
    try:
        new_tweets = api.search(q=query, count=count, max_id=str(last_id - 1))
        if not new_tweets:
            break
        searched_tweets.extend(new_tweets)
        last_id = new_tweets[-1].id
    except tweepy.TweepError as e:
        # depending on TweepError.code, one may want to retry or wait
        # to keep things simple, we will give up on an error
        break
There's a problem in your code. Based on Twitter Documentation for GET search/tweets,
The number of tweets to return per page, up to a maximum of 100. Defaults to 15. This was
formerly the "rpp" parameter in the old Search API.
Your code should be,
CONSUMER_KEY = '....'
CONSUMER_SECRET = '....'
ACCESS_KEY = '....'
ACCESS_SECRET = '....'
auth = tweepy.auth.OAuthHandler(CONSUMER_KEY, CONSUMER_SECRET)
auth.set_access_token(ACCESS_KEY, ACCESS_SECRET)
api = tweepy.API(auth)
search_results = api.search(q="hello", count=100)
for i in search_results:
    # Do whatever you need to print here, for example the tweet text
    print(i.text)
The other questions are old and the API changed a lot.
Easy way, with Cursor (see the Cursor tutorial). Pages returns a list of elements (You can limit how many pages it returns. .pages(5) only returns 5 pages):
for page in tweepy.Cursor(api.search, q='python', count=100, tweet_mode='extended').pages():
    # process status here
    process_page(page)
Here q is the query, count is how many results each request brings back (100 is the maximum per request), and tweet_mode='extended' returns the full text (without it, the text is truncated to 140 characters). More info here. RTs are still truncated, as confirmed by jaycech3n.
If you don't want to use tweepy.Cursor, you need to pass max_id to fetch the next chunk:
last_id = None
while True:
    result = api.search(q='python', count=100, tweet_mode='extended', max_id=last_id)
    # stop when a request comes back empty; indexing result[-1] would fail otherwise
    if not result:
        break
    process_result(result)
    # we subtract one to not have the same again.
    last_id = result[-1]._json['id'] - 1
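The same max_id idea can be exercised offline. In the sketch below, fake_search is a stand-in for api.search that serves made-up tweet IDs newest first, and paginate_by_max_id is a hypothetical helper, not a Tweepy function.

```python
def paginate_by_max_id(fetch, page_size):
    """Yield pages from `fetch`, moving `max_id` below the oldest ID seen so far."""
    max_id = None
    while True:
        page = fetch(count=page_size, max_id=max_id)
        if not page:
            break
        yield page
        # max_id is inclusive, so step one below the oldest ID to avoid a repeat
        max_id = page[-1] - 1


ALL_IDS = list(range(1000, 990, -1))  # made-up tweet IDs, newest first


def fake_search(count, max_id=None):
    """Stand-in for api.search: return up to `count` IDs that are <= max_id."""
    matching = [i for i in ALL_IDS if max_id is None or i <= max_id]
    return matching[:count]


pages = list(paginate_by_max_id(fake_search, 4))
print(pages)
```

Writing the loop as a generator keeps only one page in memory at a time, which addresses the memory concern raised about collecting everything into a single list.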
I am working on extracting Twitter data around a location (here, around India) for all tweets that include a specific keyword or a list of keywords.
import tweepy
import credentials ## all my twitter API credentials are in this file, which should be in the same directory as this script
## set API connection
auth = tweepy.OAuthHandler(credentials.consumer_key,
                           credentials.consumer_secret)
auth.set_access_token(credentials.access_token,
                      credentials.access_secret)
api = tweepy.API(auth, wait_on_rate_limit=True) # set wait_on_rate_limit=True, as twitter may block you from querying if it finds you exceeding some limits
search_words = ["#covid19", "2020", "lockdown"]
date_since = "2020-05-21"
tweets = tweepy.Cursor(api.search, q=search_words,
                       geocode="20.5937,78.9629,3000km",
                       lang="en", since=date_since).items(10)
## the geocode is for India; format for geocode="latitude,longitude,radius"
## radius should be in miles or km
for tweet in tweets:
    print("created_at: {}\nuser: {}\ntweet text: {}\ngeo_location: {}".
          format(tweet.created_at, tweet.user.screen_name, tweet.text, tweet.user.location))
    print("\n")
## tweet.user.location gives the general location of the user, not the location of the tweet itself; as it turns out, most users do not share the exact location of the tweet
Results:
created_at: 2020-05-28 16:48:23
user: XXXXXXXXX
tweet text: RT #Eatala_Rajender: Media Bulletin on status of positive cases #COVID19 in Telangana. (Dated. 28.05.2020)
# TelanganaFightsCorona
# StayHom…
geo_location: Hyderabad, India
You can search for tweets containing specific strings as shown below:
tweets = api.search('Artificial Intelligence', count=200)
