I am a beginner in Python. Here is what I have done and what I want to do: I searched for a hashtag on Twitter and collected the users who used it. My question is how I can save these users as a list, since I later want to find the accounts these users follow.
Here is my code:
for i in tweepy.Cursor(api.search, q="#hashtag").items():
    author = i.author.id
    tweet = i.text.replace('\n',' ').replace('\r',' ').replace('\r\n',' ')
Thank you!
For this, you can use a set (https://docs.python.org/3.6/library/stdtypes.html#set), which stores each author only once even if they tweeted the hashtag several times. You can use something like this:
hashtag_search_results = tweepy.Cursor(api.search, q="#hashtag")
authors = set()
for i in hashtag_search_results.items():
    author = i.author.id
    tweet = i.text.replace('\n',' ').replace('\r',' ').replace('\r\n',' ')
    authors.add(author)
    if len(authors) == 100:
        break
# work with the authors set
I am suggesting:
hashtag_search_results = tweepy.Cursor(api.search, q="#hashtag")
authors = []
for i in hashtag_search_results.items():
    author = i.author.id
    tweet = i.text.replace('\n',' ').replace('\r',' ').replace('\r\n',' ')
    authors.append(author)
# work with the authors list
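The two suggestions above trade off differently: the set drops duplicate authors but not in any guaranteed order, while the list keeps the order but also keeps duplicates. If you want a list with each author once, in first-seen order, a minimal sketch could look like this. The helper name unique_in_order is hypothetical, and the sample IDs are made up for illustration:

```python
def unique_in_order(ids):
    """Return the IDs with duplicates removed, keeping first-seen order."""
    # dict keys are unique and keep insertion order (Python 3.7+)
    return list(dict.fromkeys(ids))


# Made-up author IDs standing in for the values of i.author.id
collected = [5, 3, 5, 1, 3]
authors = unique_in_order(collected)
print(authors)
```

You could call this once on the finished authors list, after the loop, before looking up who each author follows.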
Related
Hi all, I need a little wisdom.
I managed to build a scraper using the Twitter API and Tweepy. It scrapes tweets from individual profiles. I have a list of around 100 profiles that I want to scrape tweets from, but I can't figure out how to instruct the scraper to extract data from multiple profiles and how to save the output properly as CSV. I have the following code:
import tweepy
import time
import pandas as pd
import csv
# API keys that you saved earlier
api_key = ''
api_secrets = ''
access_token = ''
access_secret = ''
# Authenticate to Twitter
auth = tweepy.OAuthHandler(api_key,api_secrets)
auth.set_access_token(access_token,access_secret)
# Instantiate the tweepy API
api = tweepy.API(auth, wait_on_rate_limit=True)
username = "markrutte"
no_of_tweets = 3200
try:
    # The number of tweets we want to retrieve from the user
    tweets = api.user_timeline(screen_name=username, count=no_of_tweets)
    # Pulling some attributes from the tweet
    attributes_container = [[tweet.created_at, tweet.favorite_count,tweet.source, tweet.text] for tweet in tweets]
    # Creation of column list to rename the columns in the dataframe
    columns = ["Date Created", "Number of Likes", "Source of Tweet", "Tweet"]
    tweets_df = pd.DataFrame(attributes_container, columns=columns)
except BaseException as e:
    print('Status Failed On,',str(e))
    time.sleep(3)
In my head, I believe I should specify a list with the usernames as values, and then scrape tweets with something like "for username in list". However, I don't really know how to do this and am still learning. Can anyone give me some advice or point me to a tutorial on how I should do this?
I appreciate it.
If you put your scraping code into a function, you can then concat its results into an overall dataframe in a loop:
def get_tweets(username, no_of_tweets):
    # Creation of column list to rename the columns in the dataframe
    columns = ["Date Created", "Number of Likes", "Source of Tweet", "Tweet"]
    try:
        # The number of tweets we want to retrieve from the user
        tweets = api.user_timeline(screen_name=username, count=no_of_tweets)
        # Pulling some attributes from the tweet
        attributes_container = [[tweet.created_at, tweet.favorite_count,tweet.source, tweet.text] for tweet in tweets]
        # return a dataframe
        return pd.DataFrame(attributes_container, columns=columns)
    except BaseException as e:
        print('Status Failed On,',str(e))
        # return an empty dataframe
        return pd.DataFrame(columns=columns)
usernames = ['user1', 'user2', 'user3']
no_of_tweets = 3200
tweets_df = pd.concat([get_tweets(username, no_of_tweets) for username in usernames])
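The question also asked how to save the output as CSV and how to tell the profiles apart afterwards. The sketch below shows the same "one function per user, loop over the usernames" pattern using only the standard csv module, with an extra Username column. The names write_tweets_csv and fake_fetch are hypothetical; fake_fetch is a stand-in for a real call such as the get_tweets function above, so the example runs without Twitter credentials.

```python
import csv
import io

COLUMNS = ["Username", "Date Created", "Number of Likes", "Source of Tweet", "Tweet"]


def write_tweets_csv(usernames, fetch_rows, out_file):
    """Write one CSV row per tweet, tagged with the username it came from.

    fetch_rows(username) is assumed to return a list of
    [date, likes, source, text] rows for that user.
    """
    writer = csv.writer(out_file)
    writer.writerow(COLUMNS)
    total = 0
    for username in usernames:
        for row in fetch_rows(username):
            writer.writerow([username, *row])
            total += 1
    return total


def fake_fetch(username):
    # Stand-in for the real API call; returns a single made-up tweet.
    return [["2023-01-01", 3, "Web", "hello from " + username]]


# Write to an in-memory buffer here; a real script would pass
# open("tweets.csv", "w", newline="") instead.
buffer = io.StringIO()
row_count = write_tweets_csv(["user1", "user2"], fake_fetch, buffer)
print(buffer.getvalue())
```

Writing row by row like this keeps memory use flat for 100 profiles; if you already have the combined dataframe, saving that in one step is the simpler route.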
I am trying to run a sentiment analysis on comments about London's gardens, but I can't restrict the search to London by geolocation and can't build a list of the tweets :(
The code fails at for tweet in tweets: with
TypeError: 'Cursor' object is not iterable
After that, I tried to follow a YouTube tutorial and write an if statement to clean the data. I want to remove the RT marker, #hashtags, @mentions and HTTP links, but I can't find an efficient way to clean the data.
api = tweepy.API(auth, wait_on_rate_limit=True)
#sentiment Analysis
keyword = ["Park","garden"]
noOfTweet = 500
date_since = "2020-01-01T00:00:00Z"
date_until= "2020-12-31T00:00:00Z"
tweets = tweepy.Cursor(api.search_tweets,
                       query = keyword,
                       start_time = date_since,
                       end_time = date_until,
                       tweet_mode ='extend',
                       geocode = "-0.098369,51.513557,70km",
                       lang ='en',
                       count = noOfTweet)
#try to format a list
all.tweets = []
for i in tweets:
    all.tweets.append(i)
for tweet in tweets:
    final_text = tweet.text.replace('RT','')
    if final_text.startswith(' #'):
        position = final_text.index(':')
        final_text = final_text[position+2:]
    elif final_text.startswith('#'):
        position = final_text.index(' ')
        final_text = final_text[position + 2:]
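On the error: a tweepy Cursor is not iterable by itself; the other snippets on this page loop over cursor.items() (or cursor.pages()) instead. For the cleaning step, regular expressions are usually shorter than chained startswith/index checks. The following is only a sketch: clean_tweet is a hypothetical helper name, the patterns are a simple approximation of what counts as a mention, hashtag or link, and the sample text is made up.

```python
import re


def clean_tweet(text):
    """Strip the retweet prefix, links, @mentions and #hashtags from a tweet."""
    text = re.sub(r"^RT\s+@\w+:\s*", "", text)  # leading "RT @user: " marker
    text = re.sub(r"https?://\S+", "", text)    # http and https links
    text = re.sub(r"[@#]\w+", "", text)         # remaining mentions and hashtags
    return re.sub(r"\s+", " ", text).strip()    # collapse leftover whitespace


sample = "RT @parks: Lovely day in #London garden https://t.co/abc @friend"
print(clean_tweet(sample))
```

Note that this removes the hashtag word entirely ("#London" disappears); if the word carries sentiment you may prefer to drop only the "#" character.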
I am trying to scrape tweets from Twitter using Twython, and I want to use the enterprise search API for this because I want to define the fromDate and toDate parameters.
I couldn't find any way to do it, though, and when I try to cursor through tweets from this date range, it only returns tweets from about the last 14 days.
twitter = Twython(consumer_token, access_token=ACCESS_TOKEN)
# Search parameters
def search_query(QUERY_TO_BE_SEARCHED):
    """
    QUERY_TO_BE_SEARCHED : text you want to search for
    """
    df_dict=[]
    results = twitter.cursor(twitter.search, q=QUERY_TO_BE_SEARCHED,fromDate='2019071200',toDate='2019071400',count=100)
    for q in results:
        retweet_count = q['retweet_count']
        favs_count = q['favorite_count']
        date_created = q['created_at']
        text = q['text']
        hashtags = q['entities']['hashtags']
        user_name = '#'+str(q['user']['screen_name'])
        user_mentions = []
        if(len(q['entities']['user_mentions'])!=0):
            for n in q['entities']['user_mentions']:
                user_mentions.append(n['screen_name']) # Mentioned profile names in the tweet
        temp_dict = {'User ID':user_name,'Date':date_created,'Text':text,'Favorites':favs_count,'RTs':retweet_count,
                     'Hashtags':hashtags,'Mentions':user_mentions}
        df_dict.append(temp_dict)
    return pd.DataFrame(df_dict)
That is my code. Can you help me improve it?
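One detail that may be worth checking, stated as an assumption rather than a confirmed fix: as far as I know, the premium and enterprise search endpoints document fromDate and toDate as UTC timestamps with minute granularity in the form yyyyMMddHHmm (12 digits), whereas the strings in the code above have 10 digits. Whether the endpoint reached through twitter.search honours these parameters at all is a separate question that this sketch does not answer. The helper name to_search_timestamp is hypothetical.

```python
from datetime import datetime, timezone


def to_search_timestamp(moment):
    """Format a timezone-aware datetime as a 12-digit UTC yyyyMMddHHmm string."""
    return moment.astimezone(timezone.utc).strftime("%Y%m%d%H%M")


# The date range from the question, expressed as aware UTC datetimes
from_date = to_search_timestamp(datetime(2019, 7, 12, tzinfo=timezone.utc))
to_date = to_search_timestamp(datetime(2019, 7, 14, tzinfo=timezone.utc))
print(from_date, to_date)
```

Building the strings from datetime objects also avoids hand-typing errors when the range changes.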
I want to look up all the friends (meaning the twitter users one is following) of a sample of friends of one twitter account, to see what other friends they have in common. The problem is that I don't know how to handle protected accounts, and I keep running into this error:
tweepy.error.TweepError: Not authorized.
This is the code I have:
...
screen_name = ----
file_name = "followers_data/follower_ids-" + screen_name + ".txt"
with open(file_name) as file:
    ids = file.readlines()
num_samples = 30
ids = [x.strip() for x in ids]
friends = [[] for i in range(num_samples)]
for i in range(0, num_samples):
    id = random.choice(ids)
    for friend in tweepy.Cursor(api.friends_ids, id).items():
        print(friend)
        friends[i].append(friend)
I have a list of all friends from one account screen_name, from which I load the friend ids. I then want to sample a few of those and look up their friends.
I have also tried something like this:
def limit_handled(cursor, name):
    try:
        yield cursor.next()
    except tweepy.TweepError:
        print("Something went wrong... ", name)
        pass
for i in range(0, num_samples):
    id = random.choice(ids)
    items = tweepy.Cursor(api.friends_ids, id).items()
    for friend in limit_handled(items, id):
        print(friend)
        friends[i].append(friend)
But then it seems like only one friend per sample friend is stored before moving on to the next sample. I'm pretty new to Python and Tweepy so if anything looks weird, please let me know.
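A note on why the second attempt stores only one friend: limit_handled contains a single yield and no loop, so the generator produces one item and then finishes. A minimal sketch of a looping version follows. It is written against a generic iterator and a generic exception type so it runs without Tweepy; skip_on_error and flaky are hypothetical names, and with Tweepy you would pass the cursor's items iterator and tweepy.TweepError.

```python
def skip_on_error(iterator, label, error_type=Exception):
    """Yield every item from the iterator; stop quietly if error_type is raised."""
    while True:
        try:
            item = next(iterator)
        except StopIteration:
            # Normal end of the data; must be handled before error_type,
            # because StopIteration is itself a subclass of Exception.
            return
        except error_type as exc:
            print("Something went wrong...", label, exc)
            return
        yield item


def flaky():
    # Stand-in for an API cursor that fails part-way through.
    yield 1
    yield 2
    raise ValueError("not authorized")


collected = list(skip_on_error(flaky(), "sample-user", ValueError))
print(collected)
```

Items fetched before the failure are kept, so a protected account simply ends up with an empty or partial list instead of aborting the whole run.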
First of all, a couple of comments on naming. The names file and id are Python built-ins (file in Python 2, id in every version), so you should avoid reusing them as variable names. I have changed these.
Secondly, when you initialise your tweepy API, it's clever enough to deal with rate limits if you use wait_on_rate_limit=True and will inform you when it's delayed due to rate limits if you use wait_on_rate_limit_notify=True.
You also lose some information when you set friends = [[] for i in range(num_samples)], as you then won't be able to associate the friends you find with the account they relate to. You can instead use a dictionary, which will associate each ID used with the friends found, allowing for better processing.
My corrected code is as follows:
import tweepy
import random
consumer_key = '...'
consumer_secret = '...'
access_token = '...'
access_token_secret = '...'
# OAuth process, using the keys and tokens
auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)
# Creation of the actual interface, using authentication. Use rate limits.
api = tweepy.API(auth, wait_on_rate_limit=True, wait_on_rate_limit_notify=True)
screen_name = '----'
file_name = "followers_data/follower_ids-" + screen_name + ".txt"
with open(file_name) as f:
    ids = [x.strip() for x in f.readlines()]
num_samples = 30
friends = dict()
# Initialise i, the number of users sampled successfully so far
i = 0
# We want to check that i is less than our number of samples, but we also need to make
# sure there are IDs left to choose from.
while i < num_samples and ids:
    current_id = random.choice(ids)
    # remove the ID we're testing from the list, so we don't pick it again.
    ids.remove(current_id)
    try:
        # try to get friends, and add them to our dictionary value if we can
        # use .get() to cope with the first page, when the key does not exist yet.
        for page in tweepy.Cursor(api.friends_ids, current_id).pages():
            friends[current_id] = friends.get(current_id, []) + page
        i += 1
    except tweepy.TweepError:
        # we get a tweep error when we can't view a user - skip them and move onto the next.
        # don't increment i as we want to replace this user with someone else.
        print('Could not view user {}, skipping...'.format(current_id))
The output is a dictionary, friends, whose keys are user IDs and whose values are the lists of friends found for each user.
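Since the stated goal was to see which friends the sampled accounts have in common, the dictionary can be fed into a counter. This is a sketch under the assumption that friends maps each sampled user ID to a list of friend IDs, as described above; common_friends and min_count are hypothetical names, and the IDs below are made up.

```python
from collections import Counter


def common_friends(friends, min_count=2):
    """Return (friend_id, count) pairs for friends shared by at least min_count users."""
    counts = Counter()
    for friend_ids in friends.values():
        # use a set so each sampled user counts a given friend only once
        counts.update(set(friend_ids))
    return [(fid, n) for fid, n in counts.most_common() if n >= min_count]


# Made-up data in the same shape as the friends dictionary
sample_friends = {"a": [1, 2, 3], "b": [2, 3], "c": [3]}
shared = common_friends(sample_friends)
print(shared)
```

The result is sorted from most to least shared, so the first entries are the accounts followed by the largest number of sampled users.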
What I want is to delete my most recent tweet. For that I use the following:
l = len(sys.argv)
if l >= 2:
    twid = sys.argv[1]
else:
    twid = input("ID number of tweet to delete: ")
try:
    tweet = twitter.destroy_status(id=twid)
except TwythonError as e:
    print(e)
It runs perfectly.
As you can see, I need the tweet's ID, but I don't know how to get it.
I hope you can help me, thanks!
I think this can help you.
user_timeline=twitter.get_user_timeline(screen_name="BarackObama", count=20)
for tweet in user_timeline:
    print(tweet["id"])
It prints the IDs of Barack Obama's 20 latest tweets.
Let me know if that is what you are looking for.
You can use get_user_timeline with your own screen name and count=1 to retrieve your most recent tweet, and then access it with tweet[0], since it will be at the first index.
tweet = twitter.get_user_timeline(
    screen_name=YOUR_SCREEN_NAME,
    count=1)
twitter.destroy_status(id=tweet[0]['id'])