Dataframe Number of Followers for List of Twitter Accounts - python

I'm trying to use Tweepy to get the number of followers for a list of twitter accounts, some of which are inactive/suspended. So far I have imported an excel file where one column is all the twitter usernames and another column would be the number of followers for that given account.
I want to iterate through the twitter handle column, remove any inactive/suspended usernames from the dataframe, then iterate through the revised dataframe and populate the number of followers column with the number of followers for each account using Tweepy. I know this will have to be done with two for loops and am looking for help with the first: I'm stuck on how to iterate through the column and remove these inactive accounts.
Here's the head of the dataframe for reference - I'd like to populate the Twitter Followers column and replace the NaNs with the number of twitter followers.
So far I've written an exception into my code that flags for specific Tweepy errors like inactive/suspended accounts.
I need help writing the first for loop that removes rows from the dataframe for inactive twitter accounts. Currently when I run this I get no output.
My code so far:
import tweepy
import sys
import csv
from urllib.request import urlopen
try:
import json
except ImportError:
import simplejson as json
from datetime import datetime
import pandas as pd
CONSUMER_KEY = ""
CONSUMER_SECRET = ""
ACCESS_TOKEN = ""
ACCESS_TOKEN_SECRET = ""
auth = tweepy.OAuthHandler(CONSUMER_KEY, CONSUMER_SECRET)
auth.set_access_token(ACCESS_TOKEN, ACCESS_TOKEN_SECRET)
# Creation of the actual interface, using authentication
api = tweepy.API(auth, wait_on_rate_limit=True, wait_on_rate_limit_notify=True)
df = pd.read_excel('/Downloads/inc5k_twitterhandles_2.xlsx', sheet_name=0)
usernames = df['Twitter Handle'].tolist()
rows_to_drop = []
for i in df['Twitter Handle']:
try:
user_followers=api.friends_ids(id=i)# this returns the interger value of the user
except tweepy.error.TweepError as t:
if t.message[0]['code'] == 50: #Code for user not found error
rows_to_drop.append(i)
elif t.message[0]['code'] == 88: # The code for the rate limit error
time.sleep(15*5) # Sleep for 15 minutes
df.drop(df.index[rows_to_drop], inplace=True )
print(df.head())

Related

'API' object has no attribute 'search' using Tweepy

I am trying to scrape Twitter profiles for a project I am doing. I have the following code
from tweepy import OAuthHandler
import pandas as pd
"""I like to have my python script print a message at the beginning. This helps me confirm whether everything is set up correctly. And it's nice to get an uplifting message ;)."""
print("You got this!")
access_token = ''
access_token_secret = ''
consumer_key = ''
consumer_secret = ''
auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)
api = tweepy.API(auth, wait_on_rate_limit=True, wait_on_rate_limit_notify=True)
tweets = []
count = 1
"""Twitter will automatically sample the last 7 days of data. Depending on how many total tweets there are with the specific hashtag, keyword, handle, or key phrase that you are looking for, you can set the date back further by adding since= as one of the parameters. You can also manually add in the number of tweets you want to get back in the items() section."""
for tweet in tweepy.Cursor(api.search, q="#BNonnecke", count=450, since='2020-02-28').items(50000):
print(count)
count += 1
try:
data = [tweet.created_at, tweet.id, tweet.text, tweet.user._json['screen_name'], tweet.user._json['name'], tweet.user._json['created_at'], tweet.entities['urls']]
data = tuple(data)
tweets.append(data)
except tweepy.TweepError as e:
print(e.reason)
continue
except StopIteration:
break
df = pd.DataFrame(tweets, columns = ['created_at','tweet_id', 'tweet_text', 'screen_name', 'name', 'account_creation_date', 'urls'])
"""Add the path to the folder you want to save the CSV file in as well as what you want the CSV file to be named inside the single quotations"""
df.to_csv(path_or_buf = '/Users/Name/Desktop/FolderName/FileName.csv', index=False)
however, I keep getting the error "API" object has no attribute "search" from the line "for tweet in tweepy.Cursor(api.search, q="#BNonnecke", count=450, since='2020-02-28').items(50000):" I am not really sure why and don't know how to resolve this issue.
Thanks so much!
The latest version of Tweepy (v4 upwards) now has a search_tweets method instead of a search method. Check the documentation.
API.search_tweets(q, *, geocode, lang, locale, result_type, count, until, since_id, max_id, include_entities)
Also, read the comment in your code :-) The Search API has a 7 day history limit, so searching for Tweets since 2020-02-28 will only return Tweets posted in the 7 days before the date you run your code.

how to get whole user timeline of a specific twitter user

so I came up with this script to get the all of a user tweet from one twitter user
import tweepy
from tweepy import OAuthHandler
import json
def load_api():
Consumer_key = ''
consumer_secret = ''
access_token = ''
access_secret = ''
auth = OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_secret)
return tweepy.API(auth)
api = load_api()
user = 'allkpop'
tweets = api.user_timeline(id=user, count=2000)
print('Found %d tweets' % len(tweets))
tweets_text = [t.text for t in tweets]
filename = 'tweets-'+user+'.json'
json.dump(tweets_text, open(filename, 'w'))
print('Saved to file:', filename)
but when I run it I can only get 200 tweets per request. Is there a way to get 2000 tweets or at least more than 2000 tweets?
please help me, thank you
The Twitter API has request limits. The one you're using corresponds to the Twitter statuses/user_timeline endpoint. The max number that you can get for this endpoint is documented as 3,200. Also note that there's a max number of requests in a 15-minute window, which might explain why you're only getting 2000, instead of the max. Here are a couple observations that might be interesting for you:
Documentation says that the max count is 200.
There's an include_rts (include retweets) parameter that might return more values. While it's part of the Twitter API, I can't see where Tweepy documents that.
You might try Tweepy Cursors to see if that will bring back more items.
Because of the 15 minute limits, you might be able to pause until the next 15 minute window to continue. That said, I don't know enough about your use case to know if this is practical or not.

Getting whole user timeline of a Twitter user

I want to get the all of a user tweets from one Twitter user and so far this is what I came up with:
import twitter
import json
import sys
import tweepy
from tweepy.auth import OAuthHandler
CONSUMER_KEY = ''
CONSUMER_SECRET= ''
OAUTH_TOKEN=''
OAUTH_TOKEN_SECRET = ''
auth = twitter.OAuth(OAUTH_TOKEN,OAUTH_TOKEN_SECRET,CONSUMER_KEY,CONSUMER_SECRET)
twitter_api =twitter.Twitter(auth=auth)
print twitter_api
statuses = twitter_api.statuses.user_timeline(screen_name='#realDonaldTrump')
print [status['text'] for status in statuses]
Please ignore the unnecessary imports. One problem is that this only gets a user's recent tweets (or the first 20 tweets). Is it possible to get all of a users tweet? To my knowledge, the GEt_user_timeline (?) only allows a limit of 3200. Is there a way to get at least 3200 tweets? What am I doing wrong?
There's a few issues with your code, including some superfluous imports. Particularly, you don't need to import twitter and import tweepy - tweepy can handle everything you need. The particular issue you are running into is one of pagination, which can be handled in tweepy using a Cursor object like so:
import tweepy
# Consumer keys and access tokens, used for OAuth
consumer_key = ''
consumer_secret = ''
access_token = ''
access_token_secret = ''
# OAuth process, using the keys and tokens
auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)
# Creation of the actual interface, using authentication
api = tweepy.API(auth)
for status in tweepy.Cursor(api.user_timeline, screen_name='#realDonaldTrump', tweet_mode="extended").items():
print(status.full_text)

Get all friends of a given user on twitter with tweepy

Using tweepy I am able to return all of my friends using a cursor. Is it possible to specify another user and get all of their friends?
user = api.get_user('myTwitter')
print "Retreiving friends for", user.screen_name
for friend in tweepy.Cursor(api.friends).items():
print "\n", friend.screen_name
Which prints a list of all my friends, however if I change the first line
to another twitter user it still returns my friends. How can I get friends for any given user using tweepy?
#first line is changed to
user = api.get_user('otherUsername') #still returns my friends
Additionally user.screen_name when printed WILL return otherUsername
The question Get All Follower IDs in Twitter by Tweepy does essentially what I am looking for however it returns only a count of ID's. If I remove the len() function I will I can iterate through a list of user IDs, but is it possible to get screen names #twitter,#stackoverflow, #etc.....?
You can use the ids variable from the answer you referenced in the other answer to get the the id of the followers of a given person, and extend it to get the screen names of all of the followers using Tweepy's api.lookup_users method:
import time
import tweepy
auth = tweepy.OAuthHandler(..., ...)
auth.set_access_token(..., ...)
api = tweepy.API(auth)
ids = []
for page in tweepy.Cursor(api.followers_ids, screen_name="McDonalds").pages():
ids.extend(page)
time.sleep(60)
screen_names = [user.screen_name for user in api.lookup_users(user_ids=ids)]
You can use this:
# import the module
import tweepy
# assign the values accordingly
consumer_key = ""
consumer_secret = ""
access_token = ""
access_token_secret = ""
# authorization of consumer key and consumer secret
auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
# set access to user's access key and access secret
auth.set_access_token(access_token, access_token_secret)
# calling the api
api = tweepy.API(auth)
# the screen_name of the targeted user
screen_name = "TwitterIndia"
# printing the latest 20 friends of the user
for friend in api.friends(screen_name):
print(friend.screen_name)
for more details see https://www.geeksforgeeks.org/python-api-friends-in-tweepy/

Query regarding pagination in tweepy (get_followers) of a particular twitter user

I am fairly new to tweepy and pagination using the cursor class. I have been trying to user the cursor class to get all the followers of a particular twitter user but I keep getting the error where it says "tweepy.error.TweepError: This method does not perform pagination"
Hence I would really appreciate any help if someone could please help me achieve this task of obtaining all the followers of a particular twitter user with pagination, using tweepy. The code I have so far is as follows:
import tweepy
consumer_key='xyz'
consumer_secret='xyz'
access_token='abc'
access_token_secret='def'
auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)
api = tweepy.API(auth)
user = api.get_user('somehandle')
print user.name
followers = tweepy.Cursor(user.followers)
temp=[]
for user in followers.items():
temp.append(user)
print temp
#the following part works fine but that is without pagination so I will be able to retrieve at #most 100 followers
aDict = user.followers()
for friend in aDict:
friendDict = friend.__getstate__()
print friendDict['screen_name']
There is a handy method called followers_ids. It returns up to 5000 followers (twitter api limit) ids for the given screen_name (or id, user_id or cursor).
Then, you can paginate these results manually in python and call lookup_users for every chunk. As long as lookup_users can handle only 100 user ids at a time (twitter api limit), it's pretty logical to set chunk size to 100.
Here's the code (pagination part was taken from here):
import itertools
import tweepy
def paginate(iterable, page_size):
while True:
i1, i2 = itertools.tee(iterable)
iterable, page = (itertools.islice(i1, page_size, None),
list(itertools.islice(i2, page_size)))
if len(page) == 0:
break
yield page
auth = tweepy.OAuthHandler(<consumer_key>, <consumer_secret>)
auth.set_access_token(<key>, <secret>)
api = tweepy.API(auth)
followers = api.followers_ids(screen_name='gvanrossum')
for page in paginate(followers, 100):
results = api.lookup_users(user_ids=page)
for result in results:
print result.screen_name
Hope that helps.

Categories