I'm using python-twitter in my Web Application to post tweets like this:
import twitter
twitter_api = twitter.Api(
consumer_key="BlahBlahBlah",
consumer_secret="BlahBlahBlah",
access_token_key="BlahBlahBlah",
access_token_secret="BlahBlahBlah",
)
twitter_api.PostUpdate("Hello World")
How do I retrieve all tweets posted to this account (including tweets that were previously posted to this account from other Twitter clients)? I want to do this so that I can delete them all by calling twitter_api.destroyStatus() on each tweet.
One approach could be like the following:
import twitter
api = twitter.Api(consumer_key='consumer_key',
consumer_secret='consumer_secret',
access_token_key='access_token',
access_token_secret='access_token_secret')
# get user data from credentials
user_data = api.VerifyCredentials()
user_id = long(user_data.id)
max_status_id = 0
# repeat until all tweets are deleted
while True:
# let us get 200 statuses per API call.
# trim_user helps improve performance by reducing size of return value
timeline_args = {'user_id': user_id, 'count': 200, 'trim_user': 'true'}
# if not first iteration, use max_status_id seen so far
if max_status_id != 0:
timeline_args['max_id'] = max_status_id
# Get statuses from user timeline
statuses = api.GetUserTimeline(**timeline_args)
#if no more tweets are left, then break the loop
if statuses is None or len(statuses) == 0:
break
for status in statuses:
# remember max_status_id seen so far
max_status_id = long(status.id) - 1
# delete the tweet with current status[id]
api.DestroyStatus(status.id)
Related
I am trying to create a bot to retweet tweets that have a certain keyword in them. The code until now is this:
import time
import tweepy
import config
# Search/ Like/ Retweet
def get_client():
client = tweepy.Client(bearer_token=config.BEARER_TOKEN,
consumer_key=config.CONSUMER_KEY,
consumer_secret=config.CONSUMER_SECRET,
access_token=config.ACCESS_TOKEN,
access_token_secret=config.ACCESS_TOKEN_SECRET, )
return client
def search_tweets(query):
client = get_client()
tweets = client.search_recent_tweets(query=query, max_results=10)
tweet_data = tweets.data
results = []
if tweet_data is not None and len(tweet_data) > 0:
for tweet in tweet_data:
obj = {'id': tweet.id, 'text': tweet.text}
results.append(obj)
else:
return ''
return results
client = get_client()
tweets = search_tweets('#save the earth')
for tweet in tweets:
client.retweet(tweet["id"])
It works as intended , but i want to create an if statement on the for loop to check if i already retweed that tweet or not. If not to retweet it. I cant find how to do this. Please help me out.
EDITan answer relevant for Tweepy V2
unfortunately, the solution to your problem in the new version is a bit more "pricy".
you can get the list of users that retweeted a tweet using Client.get_retweeters(tweet_id)
and then check your client id against this list.
that might take you more resources than the previous solution, but that's what tweepy gives us.
you can read more on tweepy docs
using the tweet id and your user key you can check if you retweeted.
# fetching the status
status = api.get_status(id)
# fetching the retweeted attribute
retweeted = status.retweeted
your code would be this:
import time
import tweepy
import config
# Search/ Like/ Retweet
def get_client():
client = tweepy.Client(bearer_token=config.BEARER_TOKEN,
consumer_key=config.CONSUMER_KEY,
consumer_secret=config.CONSUMER_SECRET,
access_token=config.ACCESS_TOKEN,
access_token_secret=config.ACCESS_TOKEN_SECRET, )
return client
def search_tweets(query):
client = get_client()
tweets = client.search_recent_tweets(query=query, max_results=10)
tweet_data = tweets.data
results = []
if tweet_data is not None and len(tweet_data) > 0:
for tweet in tweet_data:
status = tweepy.api(client.access_token).get_status(tweet.id)
if status.retweeted:
continue
else:
obj = {'id': tweet.id, 'text': tweet.text}
results.append(obj)
else:
return ''
return results
client = get_client()
tweets = search_tweets('#save the earth')
for tweet in tweets:
client.retweet(tweet["id"])
Through the basic Academic Research Developer Account, I'm using the Tweepy API to collect tweets containing specified keywords or hashtags. This enables me to collect 10,000,000 tweets per month. Using the entire archive search, I'm trying to collect tweets from one whole calendar date at a time. I've gotten a rate limit error (despite the wait_on_rate_limit flag being set to true) Now there's an error with the request limit.
here is the code
import pandas as pd
import tweepy
# function to display data of each tweet
def printtweetdata(n, ith_tweet):
print()
print(f"Tweet {n}:")
print(f"Username:{ith_tweet[0]}")
print(f"tweet_ID:{ith_tweet[1]}")
print(f"userID:{ith_tweet[2]}")
print(f"creation:{ith_tweet[3]}")
print(f"location:{ith_tweet[4]}")
print(f"Total Tweets:{ith_tweet[5]}")
print(f"likes:{ith_tweet[6]}")
print(f"retweets:{ith_tweet[7]}")
print(f"hashtag:{ith_tweet[8]}")
# function to perform data extraction
def scrape(words, numtweet, since_date, until_date):
# Creating DataFrame using pandas
db = pd.DataFrame(columns=['username', 'tweet_ID', 'userID',
'creation', 'location', 'text','likes','retweets', 'hashtags'])
# We are using .Cursor() to search through twitter for the required tweets.
# The number of tweets can be restricted using .items(number of tweets)
tweets = tweepy.Cursor(api.search_full_archive,'research',query=words,
fromDate=since_date, toDate=until_date).items(numtweet)
# .Cursor() returns an iterable object. Each item in
# the iterator has various attributes that you can access to
# get information about each tweet
list_tweets = [tweet for tweet in tweets]
# Counter to maintain Tweet Count
i = 1
# we will iterate over each tweet in the list for extracting information about each tweet
for tweet in list_tweets:
username = tweet.user.screen_name
tweet_ID = tweet.id
userID= tweet.author.id
creation = tweet.created_at
location = tweet.user.location
likes = tweet.favorite_count
retweets = tweet.retweet_count
hashtags = tweet.entities['hashtags']
# Retweets can be distinguished by a retweeted_status attribute,
# in case it is an invalid reference, except block will be executed
try:
text = tweet.retweeted_status.full_text
except AttributeError:
text = tweet.text
hashtext = list()
for j in range(0, len(hashtags)):
hashtext.append(hashtags[j]['text'])
# Here we are appending all the extracted information in the DataFrame
ith_tweet = [username, tweet_ID, userID,
creation, location, text, likes,retweets,hashtext]
db.loc[len(db)] = ith_tweet
# Function call to print tweet data on screen
printtweetdata(i, ith_tweet)
i = i+1
filename = 'C:/Users/USER/Desktop/الجامعة الالمانية/output/twitter.csv'
# we will save our database as a CSV file.
db.to_csv(filename)
if __name__ == '__main__':
consumer_key = "####"
consumer_secret = "###"
access_token = "###"
access_token_secret = "###"
auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)
api = tweepy.API(auth,wait_on_rate_limit=True)
since_date = '200701010000'
until_date = '202101012359'
words = "#USA"
# number of tweets you want to extract in one run
numtweet = 1000
scrape(words, numtweet, since_date, until_date)
print('Scraping has completed!')
I got this error:
TooManyRequests: 429 Too Many Requests
Request exceeds account’s current package request limits. Please upgrade your package and retry or contact Twitter about enterprise access.
Unfortunately, I believe this is due to the Sandbox quota. For a premium account it would be more.
Tweepy API Documentation
You may check out this answer here - Limit
I am using code which is working fine. I have added the whole code as taken from geeks for geeks. But I want to modify it to add referenced_tweets.type. I am new to APIs and really want to understand how to fix this.
import pandas as pd
import tweepy
# function to display data of each tweet
def printtweetdata(n, ith_tweet):
print()
print(f"Tweet {n}:")
print(f"Username:{ith_tweet[0]}")
print(f"likes:{ith_tweet[1]}")
print(f"Location:{ith_tweet[2]}")
print(f"Following Count:{ith_tweet[3]}")
print(f"Follower Count:{ith_tweet[4]}")
print(f"Total Tweets:{ith_tweet[5]}")
print(f"Retweet Count:{ith_tweet[6]}")
print(f"Tweet Text:{ith_tweet[7]}")
print(f"Hashtags Used:{ith_tweet[8]}")
# function to perform data extraction
def scrape(words, date_since, numtweet):
# Creating DataFrame using pandas
db = pd.DataFrame(columns=['username', 'likes', 'location', 'following',
'followers', 'totaltweets', 'retweetcount', 'text', 'hashtags'])
# We are using .Cursor() to search through twitter for the required tweets.
# The number of tweets can be restricted using .items(number of tweets)
tweets = tweepy.Cursor(api.search, q=words, lang="en",
since=date_since, tweet_mode='extended').items(numtweet)
# .Cursor() returns an iterable object. Each item in
# the iterator has various attributes that you can access to
# get information about each tweet
list_tweets = [tweet for tweet in tweets]
# Counter to maintain Tweet Count
i = 1
# we will iterate over each tweet in the list for extracting information about each tweet
for tweet in list_tweets:
username = tweet.user.screen_name
likes = tweet.favorite_count
location = tweet.user.location
following = tweet.user.friends_count
followers = tweet.user.followers_count
totaltweets = tweet.user.statuses_count
retweetcount = tweet.retweet_count
hashtags = tweet.entities['hashtags']
# Retweets can be distinguished by a retweeted_status attribute,
# in case it is an invalid reference, except block will be executed
try:
text = tweet.retweeted_status.full_text
except AttributeError:
text = tweet.full_text
hashtext = list()
for j in range(0, len(hashtags)):
hashtext.append(hashtags[j]['text'])
# Here we are appending all the extracted information in the DataFrame
ith_tweet = [username, likes, location, following,
followers, totaltweets, retweetcount, text, hashtext]
db.loc[len(db)] = ith_tweet
# Function call to print tweet data on screen
printtweetdata(i, ith_tweet)
i = i+1
filename = 'etihad.csv'
# we will save our database as a CSV file.
db.to_csv(filename)
if __name__ == '__main__':
# Enter your own credentials obtained
# from your developer account
consumer_key =
consumer_secret =
access_key =
access_secret =
auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_key, access_secret)
api = tweepy.API(auth)
# Enter Hashtag and initial date
print("Enter Twitter HashTag to search for")
words = input()
print("Enter Date since The Tweets are required in yyyy-mm--dd")
date_since = input()
# number of tweets you want to extract in one run
numtweet = 100
scrape(words, date_since, numtweet)
print('Scraping has completed!')
I now want to add referenced_tweets.type in order to get if the Tweet is a Retweet or not but I'm not sure how to do it. Can someone help?
API.search uses the standard search API, part of Twitter API v1.1.
referenced_tweets is a value that can be set for tweet.fields, a Twitter API v2 fields parameter.
Currently, if you want to use Twitter API v2 through Tweepy, you'll have to use the development version of Tweepy on the master branch and its Client class. Otherwise, you'll need to wait until Tweepy v4.0 is released.
Alternatively, if your only goal is to determine whether a Status/Tweet object is a Retweet or not, you can simply check for the retweeted_status attribute.
1.The api: stream.filter(). I read the documentation which said that all parameters can be optional. However, when I left it empty, it won't work.
Still the question with api. It is said that if I write code like below:
twitter_stream.filter(locations = [-180,-90, 180, 90])
It can filter all tweets with geological information. However, when I check the json data, I still find many tweets, the value of their attribute geo are still null.
3.I tried to use stream to get as many tweets as possible. However, it is said that it can get tweets in real time. will there be any parameters to set the time
like to collect tweets from 2013 to 2015
4.I tried to collect data through users and their followers and continue the same step until I get as many tweets as I want. So my code is like below:
import tweepy
import chardet
import json
import sys
#set one global list to store all user_names
users_unused = ["Raithan8"]
users_used = []
def process_or_store(tweet):
print(json.dumps(tweet))
consumer_key =
consumer_secret =
access_token =
access_token_secret =
auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)
api = tweepy.API(auth, wait_on_rate_limit=True)
def getAllTweets():
#initialize one empty list tw store all tweets
screen_name = users_unused[0]
users_unused.remove(screen_name)
users_used.append(screen_name)
print("this is the current user: " + screen_name)
for friend in tweepy.Cursor(api.friends, screen_name = screen_name).items():
if friend not in users_unused and friend not in users_used:
users_unused.append(friend.screen_name)
for follower in tweepy.Cursor(api.followers, screen_name = screen_name).items():
if follower not in users_unused and follower not in users_used:
users_unused.append(follower.screen_name)
print(users_unused)
print(users_used)
alltweets = []
#tweepy limits at most 200 tweets each time
new_tweets = api.user_timeline(screen_name = screen_name, count = 200)
alltweets.extend(new_tweets)
if not alltweets:
return alltweets
oldest = alltweets[-1].id - 1
while(len(new_tweets) <= 0):
new_tweets = api.user_timeline(screen_name = screen_name, count = 200, max_id = oldest)
alltweets.extend(new_tweets)
oldest = alltweets[-1].id - 1
return alltweets
def storeTweets(alltweets, file_name = "tweets.json"):
for tweet in alltweets:
json_data = tweet._json
data = json.dumps(tweet._json)
with open(file_name, "a") as f:
if json_data['geo'] is not None:
f.write(data)
f.write("\n")
if __name__ == "__main__":
while(1):
if not users_unused:
break
storeTweets(getAllTweets())
I don't why it runs so slow. Maybe it is mainly because I initialize tweepy API as below
api = tweepy.API(auth, wait_on_rate_limit=True)
But if I don't initialize it in this way, it will raise error below:
raise RateLimitError(error_msg, resp)
tweepy.error.RateLimitError: [{'message': 'Rate limit exceeded', 'code': 88}]
2) There's a difference between a tweet with coordinates and filtering by location.
Filtering by location means that the sender is located in the range of your filter. If you set it globally twitter_stream.filter(locations = [-180,-90, 180, 90]) it will return tweets for people who set their country name in their preferences.
If you need to filter by coordinates (a tweet that has a coordinates) you can take a look at my blog post. But basically you need to set a listener and then check if the tweet have some coordinates.
3 and 4) Twitter's Search API and Twitter's Streaming API are different in many ways and restrictions about rate limits (Tweepy) and Twitter rate limit.
You have a limitation about how many tweets you want to get (in the past).
Check again Tweepy API because wait_on_rate_limit set as true just wait that your current limit window is available again. That's why it's "slow" as you said.
However using streaming API doesn't have such restrictions.
I'm trying to take every open tweets in a hashtag but my code does not go further than 299 tweets.
I also trying to take tweets from a specific time line like tweets only in May 2015 and July 2016. Are there any way to do it in the main process or should I write a little code for it?
Here is my code:
# if this is the first time, creates a new array which
# will store max id of the tweets for each keyword
if not os.path.isfile("max_ids.npy"):
max_ids = np.empty(len(keywords))
# every value is initialized as -1 in order to start from the beginning the first time program run
max_ids.fill(-1)
else:
max_ids = np.load("max_ids.npy") # loads the previous max ids
# if there is any new keywords added, extends the max_ids array in order to correspond every keyword
if len(keywords) > len(max_ids):
new_indexes = np.empty(len(keywords) - len(max_ids))
new_indexes.fill(-1)
max_ids = np.append(arr=max_ids, values=new_indexes)
count = 0
for i in range(len(keywords)):
since_date="2015-01-01"
sinceId = None
tweetCount = 0
maxTweets = 5000000000000000000000 # maximum tweets to find per keyword
tweetsPerQry = 100
searchQuery = "#{0}".format(keywords[i])
while tweetCount < maxTweets:
if max_ids[i] < 0:
if (not sinceId):
new_tweets = api.search(q=searchQuery, count=tweetsPerQry)
else:
new_tweets = api.search(q=searchQuery, count=tweetsPerQry,
since_id=sinceId)
else:
if (not sinceId):
new_tweets = api.search(q=searchQuery, count=tweetsPerQry,
max_id=str(max_ids - 1))
else:
new_tweets = api.search(q=searchQuery, count=tweetsPerQry,
max_id=str(max_ids - 1),
since_id=sinceId)
if not new_tweets:
print("Keyword: {0} No more tweets found".format(searchQuery))
break
for tweet in new_tweets:
count += 1
print(count)
file_write.write(
.
.
.
)
item = {
.
.
.
.
.
}
# instead of using mongo's id for _id, using tweet's id
raw_data = tweet._json
raw_data["_id"] = tweet.id
raw_data.pop("id", None)
try:
db["Tweets"].insert_one(item)
except pymongo.errors.DuplicateKeyError as e:
print("Already exists in 'Tweets' collection.")
try:
db["RawTweets"].insert_one(raw_data)
except pymongo.errors.DuplicateKeyError as e:
print("Already exists in 'RawTweets' collection.")
tweetCount += len(new_tweets)
print("Downloaded {0} tweets".format(tweetCount))
max_ids[i] = new_tweets[-1].id
np.save(arr=max_ids, file="max_ids.npy") # saving in order to continue mining from where left next time program run
Have a look at this: https://tweepy.readthedocs.io/en/v3.5.0/cursor_tutorial.html
And try this:
import tweepy
auth = tweepy.OAuthHandler(CONSUMER_TOKEN, CONSUMER_SECRET)
api = tweepy.API(auth)
for tweet in tweepy.Cursor(api.search, q='#python', rpp=100).items():
# Do something
pass
In your case you have a max number of tweets to get, so as per the linked tutorial you could do:
import tweepy
MAX_TWEETS = 5000000000000000000000
auth = tweepy.OAuthHandler(CONSUMER_TOKEN, CONSUMER_SECRET)
api = tweepy.API(auth)
for tweet in tweepy.Cursor(api.search, q='#python', rpp=100).items(MAX_TWEETS):
# Do something
pass
If you want tweets after a given ID, you can also pass that argument.
Sorry, I can't answer in comment, too long. :)
Sure :) Check this example:
Advanced searched for #data keyword 2015 may - 2016 july
Got this url: https://twitter.com/search?l=&q=%23data%20since%3A2015-05-01%20until%3A2016-07-31&src=typd
session = requests.session()
keyword = 'data'
date1 = '2015-05-01'
date2 = '2016-07-31'
session.get('https://twitter.com/search?l=&q=%23+keyword+%20since%3A+date1+%20until%3A+date2&src=typd', streaming = True)
Now we have all the requested tweets,
Probably you could have problems with 'pagination'
Pagination url ->
https://twitter.com/i/search/timeline?vertical=news&q=%23data%20since%3A2015-05-01%20until%3A2016-07-31&src=typd&include_available_features=1&include_entities=1&max_position=TWEET-759522481271078912-759538448860581892-BD1UO2FFu9QAAAAAAAAETAAAAAcAAAASAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA&reset_error_state=false
Probably you could put a random tweet id, or you can parse first, or requests some data from twitter. It can be done.
Use Chrome's networking tab to find all the requested information :)
This code worked for me.
import tweepy
import pandas as pd
import os
#Twitter Access
auth = tweepy.OAuthHandler( 'xxx','xxx')
auth.set_access_token('xxx-xxx','xxx')
api = tweepy.API(auth,wait_on_rate_limit = True)
df = pd.DataFrame(columns=['text', 'source', 'url'])
msgs = []
msg =[]
for tweet in tweepy.Cursor(api.search, q='#bmw', rpp=100).items(10):
msg = [tweet.text, tweet.source, tweet.source_url]
msg = tuple(msg)
msgs.append(msg)
df = pd.DataFrame(msgs)
Check twitter api documentation, probably it allows just 300 tweets to parse.
I would recommend to forget api, make it with requests with streaming. The api is an implementation of requests with limitations.