Extracting tweets using Python

I'm writing Python code to extract tweets from a Twitter account, but I'm having a bit of trouble at the moment.
Below is my code (I've removed my consumer key and access token):
import csv
import tweepy
consumer_key = ''
consumer_secret = ''
access_token = ''
access_secret = ''
def get_all_tweets(screen_name):
    auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
    auth.set_access_token(access_token, access_secret)
    api = tweepy.API(auth)
    alltweets = []
    new_tweets = api.user_timeline(screen_name=screen_name, count=200)
    alltweets.extend(new_tweets)
    oldest = alltweets[-1].id - 1
    while len(new_tweets) > 0:
        print("getting tweets before %s" % oldest)
        new_tweets = api.user_timeline(screen_name=screen_name, count=200, max_id=oldest)
        alltweets.extend(new_tweets)
        oldest = alltweets[-1].id - 1
        print("...%s tweets downloaded so far" % len(alltweets))
    # keep text as str: the csv module in Python 3 writes str, not bytes
    outtweets = [[tweet.id_str, tweet.created_at, tweet.text] for tweet in alltweets]
    # Python 3: open in text mode; newline='' stops csv from inserting blank rows
    with open('%s_tweets.csv' % screen_name, 'w', newline='', encoding='utf-8') as f:
        writer = csv.writer(f)
        writer.writerow(["id", "created_at", "text"])
        writer.writerows(outtweets)
if __name__ == '__main__':
    get_all_tweets("hello")
When I run it, I get this error:
Does anybody know where I'm going wrong?

Close the file you are going to write to if it is already open in another program; here, that file is hello_tweets.csv.

Check that you have permission to open the file and to read/write in the folder.
I wouldn't recommend it, but if you NEED to run the code and can't find the issue, try running it as an administrator.
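Both answers point at the same failure mode: Python cannot open the output file for writing. A minimal sketch, assuming the error is a `PermissionError` raised by `open()` (the question does not show the actual traceback), that reports the problem instead of crashing. `write_tweets_csv` is a hypothetical helper name, and the rows are made-up placeholders, not real tweets:

```python
import csv
import os
import tempfile


def write_tweets_csv(path, rows):
    """Write tweet rows to a CSV file; return False with a message if the file can't be opened."""
    try:
        # Text mode with newline='' is what the csv module expects in Python 3.
        with open(path, "w", newline="", encoding="utf-8") as f:
            writer = csv.writer(f)
            writer.writerow(["id", "created_at", "text"])
            writer.writerows(rows)
    except PermissionError:
        print("Cannot write %s: close it in other programs or check folder permissions." % path)
        return False
    return True


# Usage with placeholder rows written to a throwaway directory:
out_dir = tempfile.mkdtemp()
written = write_tweets_csv(os.path.join(out_dir, "hello_tweets.csv"),
                           [["1", "2020-01-01 00:00:00", "first"],
                            ["2", "2020-01-02 00:00:00", "ünïcode"]])
print(written)  # True
```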

Related

Collecting tweets using screen names and using Tweepy

I have a list of 100 Twitter screen names and want to collect 3,200 tweets per screen name. However, with the code below I can only collect 3,200 tweets in total when I pass in 100 screen names, because it reaches the tweet-collection limit. Does anyone have a suggestion for collecting 3,200 tweets per screen name? Any advice would be really appreciated. Thank you in advance!
import tweepy
import csv
def get_all_tweets(screen_name):
    consumer_key = "****"
    consumer_secret = "****"
    access_key = "****"
    access_secret = "****"
    # authorize twitter, initialize tweepy
    auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
    auth.set_access_token(access_key, access_secret)
    api = tweepy.API(auth, wait_on_rate_limit=True)
    # initialize a list to hold all the tweepy Tweets & a list with no retweets
    alltweets = []
    noRT = []
    # make initial request for most recent tweets with extended mode enabled to get full tweets
    # (the Twitter parameter that excludes retweets is include_rts, not include_retweets)
    new_tweets = api.user_timeline(screen_name=screen_name, tweet_mode='extended', count=200, include_rts=False)
    # save most recent tweets
    alltweets.extend(new_tweets)
    # save the id of the oldest tweet less one
    oldest = alltweets[-1].id - 1
    # keep grabbing tweets until the 3200 cap is reached or the timeline runs out
    # (without the empty-page check, an account with fewer tweets loops forever)
    while new_tweets and len(alltweets) < 3200:
        print("getting tweets before {}".format(oldest))
        # all subsequent requests use the max_id param to prevent duplicates
        new_tweets = api.user_timeline(screen_name=screen_name, tweet_mode='extended', count=200, max_id=oldest, include_rts=False)
        # save most recent tweets
        alltweets.extend(new_tweets)
        # update the id of the oldest tweet less one
        oldest = alltweets[-1].id - 1
        print("...{} tweets downloaded so far".format(len(alltweets)))
    # remove retweets: a retweet carries a retweeted_status attribute
    # ('RT' in full_text would also drop tweets that merely contain "RT", e.g. "ART")
    for tweet in alltweets:
        if hasattr(tweet, 'retweeted_status'):
            continue
        noRT.append([tweet.id_str, tweet.created_at, tweet.full_text])
    # write to csv
    with open('{}_tweets.csv'.format(screen_name), 'w', newline='', encoding='utf-8') as f:
        writer = csv.writer(f)
        writer.writerow(["id", "created_at", "text"])
        writer.writerows(noRT)
    print('{}_tweets.csv was successfully created.'.format(screen_name))
if __name__ == '__main__':
    # pass in the usernames of the accounts you want to download (the real list has 100 names)
    usernames = ["JLo", "ABC", 'Trump']
    for x in usernames:
        get_all_tweets(x)
First of all, to iterate through a timeline you need pagination. I recommend tweepy's Cursor, because it's much easier than managing max_id and so on yourself.
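# pages() takes the page limit positionally (its keyword is limit, not num_pages);
# 32 pages x 100 tweets = 3,200 tweets per user
for page in tweepy.Cursor(api.user_timeline,
                          screen_name=screen_name,
                          tweet_mode="extended",
                          include_rts=False,
                          count=100).pages(32):
    for status in page:
        # do your processing on status here
        print(status.full_text)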
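If you prefer to keep the manual max_id loop from the question instead of Cursor, it needs two stopping conditions: an empty page and a per-user cap. A minimal sketch, assuming a hypothetical `fetch_page(max_id)` callable that stands in for `api.user_timeline(screen_name=..., count=200, max_id=...)` and returns newest-first objects with an `.id`; the fake timeline below replaces the real API so the logic can be run offline:

```python
from types import SimpleNamespace


def collect_timeline(fetch_page, cap=3200):
    """Page backwards with max_id until a page comes back empty or `cap` tweets are collected."""
    collected = []
    max_id = None  # first request: no max_id, i.e. start from the newest tweet
    while len(collected) < cap:
        page = fetch_page(max_id)
        if not page:               # empty page: the timeline is exhausted
            break
        collected.extend(page)
        max_id = page[-1].id - 1   # next request starts just below the oldest id seen
    return collected[:cap]


# Fake timeline of 450 tweets with ids 450..1, served newest first, 200 per page.
FAKE_IDS = list(range(450, 0, -1))


def fake_fetch(max_id):
    ids = [i for i in FAKE_IDS if max_id is None or i <= max_id]
    return [SimpleNamespace(id=i) for i in ids[:200]]


print(len(collect_timeline(fake_fetch)))           # 450
print(len(collect_timeline(fake_fetch, cap=300)))  # 300
```

Because the cap is applied inside one call, each screen name gets its own 3,200-tweet budget when the function is called once per user.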
Secondly, there is indeed a rate limit, documented in the FAQ below, so a warning that you have reached it is not unusual:
https://developer.twitter.com/en/docs/twitter-api/v1/tweets/timelines/faq
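Counting requests shows why 100 accounts hit the limit. A rough sketch, assuming 200 tweets per request (the maximum `count`) and a placeholder budget of `REQUESTS_PER_WINDOW = 900` per 15-minute window; that figure is an assumption, so check the FAQ for the value that applies to your authentication type:

```python
import math

TWEETS_PER_USER = 3200     # per-user cap discussed in the question
TWEETS_PER_REQUEST = 200   # maximum count for one user_timeline request
REQUESTS_PER_WINDOW = 900  # ASSUMPTION: placeholder request budget per 15-minute window


def windows_needed(num_users):
    """Return (total requests, number of 15-minute windows) to page through num_users timelines."""
    requests_per_user = math.ceil(TWEETS_PER_USER / TWEETS_PER_REQUEST)  # 16
    total = requests_per_user * num_users
    return total, math.ceil(total / REQUESTS_PER_WINDOW)


print(windows_needed(100))  # (1600, 2)
```

With `wait_on_rate_limit=True`, as in the question's code, tweepy sleeps until the window resets instead of failing, so a run like this simply takes longer.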

How to get full text of tweets using Tweepy in python

I am collecting tweets with the tweepy API and I want the full text of each tweet. Following the examples in https://github.com/tweepy/tweepy/issues/974 and the questions "tweepy Streaming API: full text" and "Tweepy Truncated Status", I tried using extended mode (tweet_mode='extended'). However, I get the error AttributeError: 'Status' object has no attribute 'full_text'.
From those examples I know that if a tweet is no longer than 140 characters, I just have to read its text as usual. However, those examples use a StreamListener, and I am not using one. How can I use try/except blocks like the ones in "tweepy Streaming API: full text" to get around this error and get the full_text of the tweets? How should I modify the code below?
getData.py
import tweepy
import csv
# Twitter API credentials
consumer_key = ""
consumer_secret = ""
access_key = ""
access_secret = ""
def get_all_tweets(screen_name):
    # Twitter only allows access to a user's most recent 3240 tweets with this method
    # authorize twitter, initialize tweepy
    auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
    auth.set_access_token(access_key, access_secret)
    api = tweepy.API(auth)
    # initialize a list to hold all the tweepy Tweets
    alltweets = []
    # make initial request for most recent tweets (200 is the maximum allowed count);
    # tweet_mode='extended' is needed here too, otherwise these tweets have no full_text
    new_tweets = api.user_timeline(screen_name=screen_name, count=200, include_entities=True,
                                   tweet_mode='extended')
    # save most recent tweets
    alltweets.extend(new_tweets)
    # save the id of the oldest tweet less one
    oldest = alltweets[-1].id - 1
    # keep grabbing tweets until there are no tweets left to grab
    while len(new_tweets) > 0:
        print("getting tweets before %s" % oldest)
        # all subsequent requests use the max_id param to prevent duplicates
        new_tweets = api.user_timeline(screen_name=screen_name, count=200, max_id=oldest, include_entities=True,
                                       tweet_mode='extended')
        # save most recent tweets
        alltweets.extend(new_tweets)
        # update the id of the oldest tweet less one
        oldest = alltweets[-1].id - 1
        print("...%s tweets downloaded so far" % len(alltweets))
    user = api.get_user(screen_name=screen_name)
    followers_count = user.followers_count
    # transform the tweepy tweets into a 2D array that will populate the csv
    # (full_text stays a str; .encode() would write b'...' into the csv under Python 3)
    outtweets = [[tweet.id_str, tweet.created_at, tweet.full_text, 1 if 'media' in tweet.entities else 0,
                  1 if tweet.entities.get('hashtags') else 0, followers_count, tweet.retweet_count, tweet.favorite_count]
                 for tweet in alltweets]
    # write the csv (append mode: the header row is written again on every run)
    with open('tweets.csv', mode='a', newline='', encoding='utf-8') as f:
        writer = csv.writer(f)
        writer.writerow(["id", "created_at", "text", "hasMedia", "hasHashtag", "followers_count", "retweet_count", "favourite_count"])
        writer.writerows(outtweets)
def main():
    # screen names do not start with '#'
    get_all_tweets("MACcosmetics")
if __name__ == '__main__':
    main()
Use a Cursor to iterate over the tweets in extended mode (tweepy documentation 3.6.0: https://media.readthedocs.org/pdf/tweepy/latest/tweepy.pdf) and change your uses of .text to .full_text:
for status in tweepy.Cursor(api.user_timeline, screen_name='MACcosmetics', tweet_mode='extended').items():
    print(status.full_text)
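The try/except approach the question asks about also works outside a StreamListener. A minimal sketch: `tweet_text` is a hypothetical helper, and the `SimpleNamespace` objects only stand in for tweepy `Status` objects fetched with and without `tweet_mode='extended'`. Note that the fallback hides the symptom rather than fixing it, since a compatibility-mode `text` may be truncated; requesting extended mode on every call is still the real fix.

```python
from types import SimpleNamespace


def tweet_text(status):
    """Return the untruncated text when available, else the (possibly truncated) text."""
    try:
        return status.full_text  # present when the request used tweet_mode='extended'
    except AttributeError:
        return status.text       # compatibility-mode status: only .text exists


extended = SimpleNamespace(full_text="a long tweet, not truncated")
compat = SimpleNamespace(text="a short tweet")
print(tweet_text(extended))  # a long tweet, not truncated
print(tweet_text(compat))    # a short tweet
```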
In the end, this is what worked for me:
status = tweet
if 'extended_tweet' in status._json:
    status_json = status._json['extended_tweet']['full_text']
elif 'retweeted_status' in status._json and 'extended_tweet' in status._json['retweeted_status']:
    status_json = status._json['retweeted_status']['extended_tweet']['full_text']
elif 'retweeted_status' in status._json:
    status_json = status._json['retweeted_status']['full_text']
else:
    status_json = status._json['full_text']
print(status_json)

How to resolve this error in python script for extracting twitter tweets

I'm using Python 3, but the code below was written for Python 2, and I get an error when I run it. I need to get tweets from Twitter.
I have installed tweepy but still get the error.
My access key, consumer key and consumer secret are correct. How can I resolve this problem? Is there an alternative way to do this?
import tweepy  # https://github.com/tweepy/tweepy
import csv
# Twitter API credentials
consumer_key = "xxxx"
consumer_secret = "xxxx"
access_key = "xxxx"
access_secret = "xxxx"
def get_all_tweets(screen_name):
    # Twitter only allows access to a user's most recent 3240 tweets with this method
    # authorize twitter, initialize tweepy
    auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
    auth.set_access_token(access_key, access_secret)
    api = tweepy.API(auth)
    # initialize a list to hold all the tweepy Tweets
    alltweets = []
    # make initial request for most recent tweets (200 is the maximum allowed count)
    new_tweets = api.user_timeline(screen_name, count=200)
    # save most recent tweets
    alltweets.extend(new_tweets)
    # save the id of the oldest tweet less one
    oldest = alltweets[-1].id - 1
    # keep grabbing tweets until there are no tweets left to grab
    while len(new_tweets) > 0:
        print("getting tweets before %s" % oldest)
        # all subsequent requests use the max_id param to prevent duplicates
        new_tweets = api.user_timeline(screen_name=screen_name, count=200, max_id=oldest)
        # save most recent tweets
        alltweets.extend(new_tweets)
        # update the id of the oldest tweet less one
        oldest = alltweets[-1].id - 1
        print("...%s tweets downloaded so far" % len(alltweets))
    # transform the tweepy tweets into a 2D array that will populate the csv
    outtweets = [[tweet.id_str, tweet.created_at, tweet.text] for tweet in alltweets]
    # write the csv (Python 3: text mode with newline='', not the Python 2 'wb' mode)
    with open('%s_tweets.csv' % screen_name, 'w', newline='', encoding='utf-8') as f:
        writer = csv.writer(f)
        writer.writerow(["id", "created_at", "text"])
        writer.writerows(outtweets)
if __name__ == '__main__':
    # pass in the username of the account you want to download
    get_all_tweets("ArsalanBajwa")
---------------------------------------------------------------------------
NameError Traceback (most recent call last)
<ipython-input-13-e2a778291283> in <module>()
21
22 #make initial request for most recent tweets (200 is the maximum allowed count)
---> 23 new_tweets = api.user_timeline(screen_name,count=200)
24
25 #save most recent tweets
NameError: name 'api' is not defined
How can I resolve it?
Try writing api in caps (API); I've had this error, and that's how I fixed it:
new_tweets = API.user_timeline(screen_name,count=200)
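For comparison, a NameError like this usually means the line that uses `api` runs in a scope where `api` was never assigned; the traceback shows the call executing at module level (`in <module>`). A minimal sketch of that scoping rule, using a hypothetical `make_api()` that stands in for `tweepy.API(auth)` so it runs without credentials: a name assigned inside a function is local to it, so create the client and use it in the same function, or return it to the caller.

```python
def make_api():
    """Hypothetical stand-in for tweepy.API(auth); returns a placeholder object."""
    return object()


def setup():
    api = make_api()  # 'api' is local to setup() and is discarded when it returns


setup()
try:
    api  # module level: no 'api' has been assigned here yet
except NameError as err:
    message = str(err)
print(message)  # name 'api' is not defined


def get_client():
    return make_api()  # return the client so the caller keeps a reference


api = get_client()  # now 'api' exists at module level
print(api is not None)  # True
```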

Python tweepy api.user_timeline for list of multiple users error

I've modified the code below to run it for a list of multiple user names, but I keep getting the following error:
File "testing1.py", line 19
    def get_all_tweets(screen_name):
    ^
SyntaxError: invalid syntax
If I move
with open('list.txt', 'r') as targets_file:
    targets_list = targets_file.readlines()
usernames = []
for item in targets_list:
    usernames.append(item.strip('\n')
after "pass", I get the following error instead:
File "testing1.py", line 65
    if __name__ == '__main__':
    ^
SyntaxError: invalid syntax
Any help is much appreciated!
import tweepy
import csv
#Twitter API credentials
consumer_key = ""
consumer_secret = ""
access_token = ""
access_token_secret = ""
with open('list.txt', 'r') as targets_file:
    targets_list = targets_file.readlines()
usernames = []
for item in targets_list:
    usernames.append(item.strip('\n')
def get_all_tweets(screen_name):
    #Twitter only allows access to a user's most recent 3240 tweets with this method
    #authorize twitter, initialize tweepy
    auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
    auth.set_access_token(access_token, access_token_secret)
    api = tweepy.API(auth)
    #initialize a list to hold all the tweepy Tweets
    alltweets = []
    #make initial request for most recent tweets (200 is the maximum allowed count)
    new_tweets = api.user_timeline(screen_name = screen_name,count=200)
    #save most recent tweets
    alltweets.extend(new_tweets)
    #save the id of the oldest tweet less one
    oldest = alltweets[-1].id - 1
    #keep grabbing tweets until there are no tweets left to grab
    while len(new_tweets) > 0:
        print "getting tweets before %s" % (oldest)
        #all subsequent requests use the max_id param to prevent duplicates
        new_tweets = api.user_timeline(screen_name = screen_name,count=200,max_id=oldest)
        #save most recent tweets
        alltweets.extend(new_tweets)
        #update the id of the oldest tweet less one
        oldest = alltweets[-1].id - 1
        print "...%s tweets downloaded so far" % (len(alltweets))
    #transform the tweepy tweets into a 2D array that will populate the csv
    outtweets = [[tweet.id_str, tweet.created_at, tweet.text.encode("utf-8")] for tweet in alltweets]
    #write the csv
    with open('%s_tweets.csv' % screen_name, 'wb') as f:
        writer = csv.writer(f)
        writer.writerow(["id","created_at","text"])
        writer.writerows(outtweets)
    pass
if __name__ == '__main__':
    #pass in the username of the account you want to download
    for x in usernames:
        get_all_tweets(x)
It looks like you forgot to close a parenthesis:
for item in targets_list:
    usernames.append(item.strip('\n'))
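A slightly more defensive way to build the list, sketched under the assumption that list.txt holds one screen name per line: strip all surrounding whitespace (such as stray trailing spaces) and skip blank lines, which would otherwise end up in usernames as empty screen names. `read_usernames` is a hypothetical helper name:

```python
import os
import tempfile


def read_usernames(path):
    """Return one screen name per non-blank line, with surrounding whitespace removed."""
    with open(path, "r", encoding="utf-8") as targets_file:
        return [line.strip() for line in targets_file if line.strip()]


# Usage with a throwaway file standing in for list.txt:
demo_path = os.path.join(tempfile.mkdtemp(), "list.txt")
with open(demo_path, "w", encoding="utf-8", newline="") as f:
    f.write("JLo\r\nABC\n\n  Trump \n")
print(read_usernames(demo_path))  # ['JLo', 'ABC', 'Trump']
```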

Getting TweepError 34 when switching from a single user to a range

I am trying to scrape the last 1-10 tweets from approximately 500 user names on Twitter.
The code works perfectly when grabbing one user, but falls over when I introduce a range of users.
The first script handles a single user: it grabs the last 7 tweets from GavinFree and writes them to a CSV file.
import tweepy
import csv
#Twitter API credentials
consumer_key = "secretcode"
consumer_secret = "secretcode"
access_key = "secretcode"
access_secret = "secretcode"
def get_all_tweets(GavinFree):
    auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
    auth.set_access_token(access_key, access_secret)
    api = tweepy.API(auth)
    alltweets = []
    new_tweets = api.user_timeline(screen_name = GavinFree,count=7)
    alltweets.extend(new_tweets)
    oldest = alltweets[-1].id - 1
    while len(new_tweets) > 0:
        print "getting tweets before %s" % (oldest)
        new_tweets = api.user_timeline(screen_name = GavinFree,count=7,max_id=10)
        alltweets.extend(new_tweets)
        oldest = alltweets[-1].id - 1
        print "...%s tweets downloaded so far" % (len(alltweets))
    outtweets = [[tweet.created_at, tweet.text.encode("utf-8")] for tweet in alltweets]
    with open('%s_tweets.csv' % GavinFree ,'wb') as f:
        writer = csv.writer(f)
        writer.writerow(["created_at","text"])
        writer.writerows(outtweets)
    pass
if __name__ == '__main__':
    #pass in the username of the account you want to download
    get_all_tweets("GavinFree")
The second script handles a range of users: it will grab 7 tweets from each user and write them to CSV, and apart from the range it is completely identical.
import tweepy
import csv
#Twitter API credentials
consumer_key = "secretcode"
consumer_secret = "secretcode"
access_key = "secretcode"
access_secret = "secretcode"
handles_list = ["gavinFree","bdunkelman","burnie","ashleyj",]
def get_all_tweets(handles_list):
    auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
    auth.set_access_token(access_key, access_secret)
    api = tweepy.API(auth)
    alltweets = []
    new_tweets = api.user_timeline(screen_name = handles_list,count=10)
    alltweets.extend(new_tweets)
    oldest = alltweets[-1].id - 1
    while len(new_tweets) > 0:
        print "getting tweets before %s" % (oldest)
        new_tweets = api.user_timeline(screen_name = handles_list,count=10,max_id=10)
        alltweets.extend(new_tweets)
        oldest = alltweets[-1].id - 1
        print "...%s tweets downloaded so far" % (len(alltweets))
    outtweets = [[tweet.id_str, tweet.created_at, tweet.text.encode("utf-8")] for tweet in alltweets]
    with open('%s_tweets.csv' % handles_list, 'wb') as f:
        writer = csv.writer(f)
        writer.writerow(["id","created_at","text"])
        writer.writerows(outtweets)
    pass
if __name__ == '__main__':
    #pass in the username of the account you want to download
    get_all_tweets("handles_list")
The error I receive is: tweepy.error.TweepError: [{u'message': u'sorry, that page does not exist.', u'code': 34}]
I have checked the user names and have tried both with # and without.
I'm just wondering what the issue could be, as code 34 indicates a 404 error from the Twitter API, yet the error only appears when the range is added.
Any insights would be greatly appreciated.
You're passing the string literal "handles_list" rather than the list itself, and the function doesn't seem to have been modified to handle a list.
Try this:
if __name__ == '__main__':
    for handle in handles_list:
        get_all_tweets(handle)
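With around 500 handles, a single handle that no longer exists can still make the API return an error such as code 34 and stop the whole loop. A minimal sketch of isolating failures per handle: `collect_all` is a hypothetical helper, `fake_get_all_tweets` stands in for `get_all_tweets`, and the broad `except Exception` is an assumption made because tweepy's exception class differs between versions (`TweepError` in 3.x, `TweepyException` in 4.x); narrow it to the one your version raises.

```python
def collect_all(handles, fetch):
    """Call fetch(handle) for each handle; record failures instead of stopping the loop."""
    results, failures = {}, {}
    for handle in handles:
        try:
            results[handle] = fetch(handle)
        except Exception as err:  # ASSUMPTION: broad catch; narrow to tweepy's exception class
            failures[handle] = str(err)
    return results, failures


# Usage with a fake fetch; "ghost_account" plays a handle that no longer exists.
def fake_get_all_tweets(handle):
    if handle == "ghost_account":
        raise RuntimeError("sorry, that page does not exist. (code 34)")
    return ["tweet from " + handle]


fetched, failed = collect_all(["gavinFree", "ghost_account", "burnie"], fake_get_all_tweets)
print(sorted(fetched))  # ['burnie', 'gavinFree']
print(list(failed))     # ['ghost_account']
```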
