I am trying to retrieve tweets from Trump's Twitter account with the Twitter API.
However, I am not getting the maximum of 3200 tweets with the code below. When I try another screen name I do get 3200 tweets, but for this account I only get 100-200 tweets (the number differs each time). The code I am using is as follows:
import tweepy
import json
access_token = xxx
access_token_secret = xxx
consumer_key = xxx
consumer_secret = xxx
auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)
api = tweepy.API(auth)
screen_name = "realdonaldtrump"
data = []
for tweets in tweepy.Cursor(api.user_timeline, screen_name = screen_name).pages():
    for tweet in tweets:
        print(tweet.text)
        data.append(tweet._json)
filename = screen_name + "_tweets.json"
with open(filename, "w") as outfile:
    json.dump(data, outfile)
I have been trying to use tweepy to gather some data concerning the coronavirus. I am using Python 3.8, and from what I have read there is a problem with the bytes type that was not present in Python 2. My error is in this line:
w.writerow(['timestamp', 'tweet_text', 'username', 'all_hashtags', 'followers_count'])
If anyone can help me modify the code so that it works, I would be grateful. PS: This is my first time posting a question on Stack Overflow, so be gentle :)
Code:
def search_for_hashtags(consumer_key, consumer_secret, access_token, access_token_secret,
                        hashtag_phrase):
    #create authentication for accessing Twitter
    auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
    auth.set_access_token(access_token, access_token_secret)
    #initialize Tweepy API
    api = tweepy.API(auth)
    #get the name of the spreadsheet we will write to
    fname = '_'.join(re.findall(r"#(\w+)", hashtag_phrase))
    #open the spreadsheet we will write to
    with open('%s.csv' % (fname), 'wb') as file:
        w = csv.writer(file)
        #write header row to spreadsheet
        w.writerow(['timestamp', 'tweet_text', 'username', 'all_hashtags', 'followers_count'])
        #for each tweet matching our hashtags, write relevant info to the spreadsheet
        for tweet in tweepy.Cursor(api.search, q=hashtag_phrase+' -filter:retweets', \
                                   lang="en", tweet_mode='extended').items(100):
            w.writerow([tweet.created_at, tweet.full_text.replace('\n',' ').encode('utf-8'),
                        tweet.user.screen_name.encode('utf-8'), [e['text'] for e in tweet._json['entities']['hashtags']], tweet.user.followers_count])
if __name__ == '__main__':
    search_for_hashtags(consumer_key, consumer_secret, access_token, access_token_secret, hashtag_phrase)
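For the `w.writerow` error above, a likely cause is that the file is opened in binary mode (`'wb'`): in Python 3, `csv.writer` writes `str`, so it raises a `TypeError` on a binary file. A minimal sketch of the usual fix, assuming plain tuples stand in for the tweet fields (the `write_tweet_rows` helper and its `rows` argument are hypothetical, not part of the original code): open the file in text mode with `newline=''` and an explicit encoding, and drop the `.encode('utf-8')` calls.

```python
import csv

HEADER = ['timestamp', 'tweet_text', 'username', 'all_hashtags', 'followers_count']


def write_tweet_rows(path, rows):
    """Write a header plus one CSV row per tweet (hypothetical helper)."""
    # Text mode ('w'), not 'wb': csv.writer emits str in Python 3.
    # newline='' lets the csv module control line endings itself.
    with open(path, 'w', newline='', encoding='utf-8') as file:
        w = csv.writer(file)
        w.writerow(HEADER)
        for row in rows:
            # Pass str values as they are; calling .encode('utf-8') here
            # would write the bytes repr (b'...') into the cell.
            w.writerow(row)
```

In the original function this would mean changing only the `open(...)` call and removing the two `.encode('utf-8')` calls inside the loop.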
I am new to tweepy. I was able to fetch data from Twitter with the following script:
import tweepy
from tweepy import OAuthHandler
access_token="---------"
access_token_secret="----------"
consumer_key="---------"
consumer_secret="-------"
auth = OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)
api = tweepy.API(auth)
public_tweets = api.home_timeline()
print("public_tweets.text")
Now I want to fetch the username of the person who posted each tweet along with the tweet text, for example:
"USERNAME": " --------------TWEET----------"
Thank you in advance.
public_tweets = api.home_timeline()
for tweet in public_tweets:
    print('From :', tweet.user.screen_name, ', Text :', tweet.text)
I'm trying to retrieve the tweets that a particular account has posted. I use the
user_timeline method from the tweepy library, but it also includes replies from that Twitter user. Does anyone have a clue how to omit them?
Code:
import tweepy
consumer_key = key
consumer_secret = key
access_key = key
access_secret = key
def get_tweets(username):
    auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
    auth.set_access_token(access_key, access_secret)
    api = tweepy.API(auth)
    #set count to however many tweets you want; twitter only allows 200 at once
    number_of_tweets = 20
    #get tweets
    tweets = api.user_timeline(screen_name = username,count = number_of_tweets)
    #create array of tweet information: username, tweet id, date/time, text
    tweets_for_csv = [[username,tweet.id_str, tweet.created_at, tweet.text.encode("utf-8")] for tweet in tweets]
    print(str(tweets_for_csv))
Pass exclude_replies as a keyword argument:
tweets = api.user_timeline(screen_name=username, count=number_of_tweets, exclude_replies=True)
See Twitter's API documentation for a full list of keyword arguments you can pass.
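Twitter's documentation for the user timeline endpoint notes that `count` is applied before replies are filtered out, so a call with `exclude_replies=True` can return fewer than `count` tweets. If that matters, one option is to filter on the client side instead. A minimal sketch, assuming each tweet object exposes `in_reply_to_status_id` as tweepy's Status objects do; `without_replies` is a hypothetical helper and the `SimpleNamespace` objects are stand-ins for real tweets:

```python
from types import SimpleNamespace


def without_replies(tweets):
    """Keep only tweets that are not replies (hypothetical helper)."""
    # A reply carries the id of the tweet it answers; other tweets have None.
    return [t for t in tweets if t.in_reply_to_status_id is None]


# Stand-in objects with made-up ids, shaped like tweepy Status objects.
timeline = [
    SimpleNamespace(id=3, text="original tweet", in_reply_to_status_id=None),
    SimpleNamespace(id=2, text="@someone a reply", in_reply_to_status_id=1),
]
for tweet in without_replies(timeline):
    print(tweet.id, tweet.text)
```

With a real timeline you would request more tweets than you need and pass the result through the filter.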
I am trying to gather the tweets of the user navalny from 01.11.2017 to 31.01.2018 using tweepy. I have the ids of the first and last tweets that I need, so I tried the following code:
import tweepy
consumer_key = ''
consumer_secret = ''
access_token = ''
access_token_secret = ''
auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)
api = tweepy.API(auth)
t = api.user_timeline(screen_name='navalny', since_id = 933000445307518976, max_id = 936533580481814529)
However, the returned value is an empty list.
What is the problem here?
Are there any restrictions on the history of tweets that I can get?
What are possible solutions?
Quick answer:
Using Tweepy you can only retrieve the last 3200 tweets from the Twitter REST API for a given user.
Unfortunately the tweets you are trying to access are older than this.
Detailed answer:
I did a check using the code below:
import tweepy
from tweepy import OAuthHandler
def tweet_check(user):
    """
    Scrapes a user's most recent tweets
    """
    # API keys and initial configuration
    consumer_key = ""
    consumer_secret = ""
    access_token = ""
    access_secret = ""
    # Configure authentication
    authorisation = OAuthHandler(consumer_key, consumer_secret)
    authorisation.set_access_token(access_token, access_secret)
    api = tweepy.API(authorisation)
    # Requests most recent tweets from a user's timeline
    tweets = api.user_timeline(screen_name=user, count=2,
                               max_id=936533580481814529)
    for tweet in tweets:
        tid = tweet.id
        print(tid)
twitter_users = ["#navalny"]
for twitter_user in twitter_users:
    tweet_check(twitter_user)
This test returns nothing before 936533580481814529
Using a separate script I scraped all 3200 tweets, the maximum Twitter will let you scrape, and the oldest tweet id I can find is 943856915536326662
Seems like you have run into Twitter's tweet scraping limit for user timelines here.
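To see how far back a timeline can be walked, one approach is to page backwards with `max_id` until the API stops returning tweets. A minimal sketch of that loop, assuming a `fetch_page(max_id)` callable that returns a list of tweet ids no greater than `max_id` (in real code it would wrap `api.user_timeline`); `oldest_reachable_id`, `fake_fetch`, and the fake ids are hypothetical stand-ins:

```python
def oldest_reachable_id(fetch_page):
    """Page backwards through a timeline and return the lowest tweet id seen.

    fetch_page(max_id) is a hypothetical callable returning a list of tweet
    ids that are all <= max_id (or the newest page when max_id is None).
    """
    oldest = None
    max_id = None
    while True:
        page = fetch_page(max_id)
        if not page:
            # An empty page means nothing older is available.
            return oldest
        oldest = min(page)
        # max_id is inclusive, so step just below the oldest id already seen.
        max_id = oldest - 1


# Fake timeline with made-up ids, standing in for the real API.
FAKE_IDS = [50, 40, 30, 20, 10]


def fake_fetch(max_id, page_size=2):
    ids = [i for i in FAKE_IDS if max_id is None or i <= max_id]
    return ids[:page_size]


print(oldest_reachable_id(fake_fetch))
```

If the `max_id` you want to request is lower than the value this returns for a real account, the tweets fall outside the window the timeline endpoint exposes.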
def get_tweets(api, input_query):
    for tweet in tweepy.Cursor(api.search, q=input_query,lang="en").items():
        yield tweet
if __name__ == "__version__":
    input_query = sys.argv[1]
    access_token = "REPLACE_YOUR_KEY_HERE"
    access_token_secret = "REPLACE_YOUR_KEY_HERE"
    consumer_key = "REPLACE_YOUR_KEY_HERE"
    consumer_secret = "REPLACE_YOUR_KEY_HERE"
    auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
    auth.set_access_token(access_token, access_token_secret)
    api = tweepy.API(auth)
    tweets = get_tweets(api, input_query)
    for tweet in tweets:
        print(tweet.text)
I am trying to download data from Twitter using the command prompt. I have entered my keys (I just recreated them all), saved the script as "print_tweets" and am entering "python print_tweets.py subject" into the command prompt but nothing is happening, no error message or anything.
I thought the problem might have to do with the path environment, but I created another program that prints out "hello world" and this executed without issue using the command prompt.
Can anyone see any obvious errors with my code above? Does this work for you?
I've even tried changing "__version__" to "__main__" but this gives me an error message:
if __name__ == "__version__":
It seems you are running the script in an ipython interpreter, which won't be receiving any command line arguments. Try this:
import tweepy
def get_tweets(api, input_query):
    for tweet in tweepy.Cursor(api.search, q=input_query,lang="en").items():
        yield tweet
input_query = "springbreak" # Change this string to the topic you want to search tweets
access_token = "REPLACE_YOUR_KEY_HERE"
access_token_secret = "REPLACE_YOUR_KEY_HERE"
consumer_key = "REPLACE_YOUR_KEY_HERE"
consumer_secret = "REPLACE_YOUR_KEY_HERE"
auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)
api = tweepy.API(auth)
tweets = get_tweets(api, input_query)
for tweet in tweets:
    print(tweet.text)
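When the script is run from the command prompt instead, note that `__name__` is set to `"__main__"`, never `"__version__"`, so a block guarded by `if __name__ == "__version__":` is silently skipped and nothing is printed. A minimal sketch of the usual guard, with a fallback query for sessions that pass no arguments; the `query_from_args` helper and the `"springbreak"` default are illustrative assumptions, and the `sys` import is needed for `sys.argv`:

```python
import sys


def query_from_args(argv, default="springbreak"):
    """Return the search term from the command line, or a default.

    argv[0] is the script name, so the first real argument is argv[1].
    """
    return argv[1] if len(argv) > 1 else default


if __name__ == "__main__":
    # Runs only when the file is executed as a script,
    # e.g. `python print_tweets.py subject`.
    input_query = query_from_args(sys.argv)
    print("Searching for:", input_query)
```

The authentication and search code from the question would then go inside the guarded block, using `input_query` as the search term.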