TooManyRequests: 429 Too Many Requests while running tweepy - python

Through the basic Academic Research developer account, I'm using Tweepy to collect tweets containing specified keywords or hashtags. This access level allows me to collect 10,000,000 tweets per month. Using the full-archive search, I'm trying to collect tweets from one whole calendar date at a time. Despite the wait_on_rate_limit flag being set to true, I'm now getting a rate-limit error about the request limit.
Here is the code:
import pandas as pd
import tweepy

# function to display data of each tweet
def printtweetdata(n, ith_tweet):
    print()
    print(f"Tweet {n}:")
    print(f"Username:{ith_tweet[0]}")
    print(f"tweet_ID:{ith_tweet[1]}")
    print(f"userID:{ith_tweet[2]}")
    print(f"creation:{ith_tweet[3]}")
    print(f"location:{ith_tweet[4]}")
    print(f"Total Tweets:{ith_tweet[5]}")
    print(f"likes:{ith_tweet[6]}")
    print(f"retweets:{ith_tweet[7]}")
    print(f"hashtag:{ith_tweet[8]}")

# function to perform data extraction
def scrape(words, numtweet, since_date, until_date):
    # Creating DataFrame using pandas
    db = pd.DataFrame(columns=['username', 'tweet_ID', 'userID',
                               'creation', 'location', 'text', 'likes',
                               'retweets', 'hashtags'])
    # We are using .Cursor() to search through Twitter for the required tweets.
    # The number of tweets can be restricted using .items(number of tweets)
    tweets = tweepy.Cursor(api.search_full_archive, 'research', query=words,
                           fromDate=since_date, toDate=until_date).items(numtweet)
    # .Cursor() returns an iterable object. Each item in the iterator has
    # various attributes that you can access to get information about each tweet
    list_tweets = [tweet for tweet in tweets]
    # Counter to maintain tweet count
    i = 1
    # iterate over each tweet in the list, extracting information about each one
    for tweet in list_tweets:
        username = tweet.user.screen_name
        tweet_ID = tweet.id
        userID = tweet.author.id
        creation = tweet.created_at
        location = tweet.user.location
        likes = tweet.favorite_count
        retweets = tweet.retweet_count
        hashtags = tweet.entities['hashtags']
        # Retweets can be distinguished by a retweeted_status attribute;
        # if that attribute is missing, the except block runs
        try:
            text = tweet.retweeted_status.full_text
        except AttributeError:
            text = tweet.text
        hashtext = [tag['text'] for tag in hashtags]
        # Append all the extracted information to the DataFrame
        ith_tweet = [username, tweet_ID, userID, creation, location,
                     text, likes, retweets, hashtext]
        db.loc[len(db)] = ith_tweet
        # Function call to print tweet data on screen
        printtweetdata(i, ith_tweet)
        i = i + 1
    filename = 'C:/Users/USER/Desktop/الجامعة الالمانية/output/twitter.csv'
    # save our database as a CSV file
    db.to_csv(filename)

if __name__ == '__main__':
    consumer_key = "####"
    consumer_secret = "###"
    access_token = "###"
    access_token_secret = "###"
    auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
    auth.set_access_token(access_token, access_token_secret)
    api = tweepy.API(auth, wait_on_rate_limit=True)
    since_date = '200701010000'
    until_date = '202101012359'
    words = "#USA"
    # number of tweets you want to extract in one run
    numtweet = 1000
    scrape(words, numtweet, since_date, until_date)
    print('Scraping has completed!')
I got this error:
TooManyRequests: 429 Too Many Requests
Request exceeds account’s current package request limits. Please upgrade your package and retry or contact Twitter about enterprise access.

Unfortunately, I believe this is due to the Sandbox quota; a premium account would allow more requests. See the Tweepy API documentation and the linked answer on the request limit.
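If you still want the script to survive an occasional 429 instead of crashing, one common approach is to catch the error and retry with exponential backoff. This is not part of the original code, just a generic sketch; the exception class is kept generic here because the exact tweepy exception type depends on your tweepy version:

```python
import time

def with_backoff(fetch, retries=3, base_delay=1):
    """Call fetch(); on failure, sleep with exponential backoff and retry.

    base_delay seconds before the first retry, doubling each attempt.
    Re-raises the last error if all retries are exhausted.
    """
    for attempt in range(retries):
        try:
            return fetch()
        except Exception:
            if attempt == retries - 1:
                raise
            time.sleep(base_delay * (2 ** attempt))

# Hypothetical usage, wrapping one page request:
# page = with_backoff(lambda: api.search_full_archive('research', query='#USA'))
```

Note this only papers over transient rate limits; it cannot raise a hard monthly package quota.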

Related

How can I fix forbidden 403 error I get using tweepy?

I am trying to extract tweets that include a specific hashtag (#nike), and I have Essential-level access on my Twitter developer account. I only want to collect 100 tweets. Help, please! Thank you!
I keep getting this error:
Forbidden Traceback (most recent call last)
Forbidden: 403 Forbidden
453 - You currently have Essential access which includes access to Twitter API v2 endpoints only. If you need access to this endpoint, you’ll need to apply for Elevated access via the Developer Portal. You can learn more here: https://developer.twitter.com/en/docs/twitter-api/getting-started/about-twitter-api#v2-access-leve
import pandas as pd
import tweepy

# function to display data of each tweet
def printtweetdata(n, ith_tweet):
    print()
    print(f"Tweet {n}:")
    print(f"Username:{ith_tweet[0]}")
    print(f"Description:{ith_tweet[1]}")
    print(f"Location:{ith_tweet[2]}")
    print(f"Following Count:{ith_tweet[3]}")
    print(f"Follower Count:{ith_tweet[4]}")
    print(f"Total Tweets:{ith_tweet[5]}")
    print(f"Retweet Count:{ith_tweet[6]}")
    print(f"Tweet Text:{ith_tweet[7]}")
    print(f"Hashtags Used:{ith_tweet[8]}")

# function to perform data extraction
def scrape(words, date_since, numtweet):
    # Creating DataFrame using pandas
    db = pd.DataFrame(columns=['username', 'description', 'location',
                               'following', 'followers', 'totaltweets',
                               'retweetcount', 'text', 'hashtags'])
    # We are using .Cursor() to search through Twitter for the required tweets.
    # The number of tweets can be restricted using .items(number of tweets)
    tweets = tweepy.Cursor(api.search_tweets,
                           words, lang="en",
                           since_id=date_since,
                           tweet_mode='extended').items(numtweet)
    # .Cursor() returns an iterable object. Each item in the iterator has
    # various attributes that you can access to get information about each tweet
    list_tweets = [tweet for tweet in tweets]
    # Counter to maintain tweet count
    i = 1
    # iterate over each tweet in the list, extracting information about each one
    for tweet in list_tweets:
        username = tweet.user.screen_name
        description = tweet.user.description
        location = tweet.user.location
        following = tweet.user.friends_count
        followers = tweet.user.followers_count
        totaltweets = tweet.user.statuses_count
        retweetcount = tweet.retweet_count
        hashtags = tweet.entities['hashtags']
        # Retweets can be distinguished by a retweeted_status attribute;
        # if that attribute is missing, the except block runs
        try:
            text = tweet.retweeted_status.full_text
        except AttributeError:
            text = tweet.full_text
        hashtext = [tag['text'] for tag in hashtags]
        # Append all the extracted information to the DataFrame
        ith_tweet = [username, description, location, following, followers,
                     totaltweets, retweetcount, text, hashtext]
        db.loc[len(db)] = ith_tweet
        # Function call to print tweet data on screen
        printtweetdata(i, ith_tweet)
        i = i + 1
    filename = 'scraped_tweets.csv'
    # save our database as a CSV file
    db.to_csv(filename)

if __name__ == '__main__':
    # Enter your own credentials obtained from your developer account
    consumer_key = ""
    consumer_secret = ""
    access_key = ""
    access_secret = ""
    auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
    auth.set_access_token(access_key, access_secret)
    api = tweepy.API(auth)
    # Enter hashtag and initial date
    print("Enter the hashtag to search for, e.g. #nike")
    words = input()
    print("Enter the start date, e.g. 2022-01-01")
    date_since = input()
    # number of tweets you want to extract in one run
    numtweet = 10
    scrape(words, date_since, numtweet)
    print('Scraping has completed!')

How do I pull tweets from a user timeline for specific covid related keyword on python?

I want to retrieve at least 1000 tweets from a {user}'s timeline, replies included.
● At least 100 of those 1000 tweets should be related to a Covid-19 keyword like ["covid19", "wuhan", "mask", "lockdown", "quarantine", "sars-cov-2"] etc.
I wrote the function to retrieve the tweets:
def get_tweets_by_user(self, screen_name):
    '''
    Use the user_timeline API to fetch POI-related tweets; some postprocessing may be required.
    :return: List
    '''
    result = []
    tweets = api.user_timeline(screen_name=screen_name,
                               # 200 is the maximum allowed count
                               count=200,
                               include_rts=True,
                               # Necessary to keep the full text; otherwise
                               # only the first 140 characters are returned
                               tweet_mode='extended')
    for tw in tweets:
        result.append(tw)
    return result
Now how do I retrieve 100 tweets related to covid-19 keywords from user timeline?
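The keyword matching itself can be done locally after fetching: keep only the tweets whose text mentions one of the Covid terms. A minimal sketch, shown here on plain strings rather than Status objects (for real tweets you would pass each tweet's full_text):

```python
COVID_KEYWORDS = ["covid19", "wuhan", "mask", "lockdown", "quarantine", "sars-cov-2"]

def filter_by_keywords(texts, keywords=COVID_KEYWORDS):
    """Return only the texts that mention at least one keyword (case-insensitive)."""
    lowered = [k.lower() for k in keywords]
    return [t for t in texts if any(k in t.lower() for k in lowered)]

sample = ["Lockdown extended again", "Nice weather today", "Get your mask on"]
filter_by_keywords(sample)  # keeps the first and third entries
```

You would keep paging the timeline (with max_id) until the filtered list reaches 100 matches or the timeline is exhausted.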
Register for the Twitter developer API; you'll need a couple of consumer keys. Tell them you're a student.
import json
import twitter  # install this library (python-twitter) to work with the Twitter API

consumer_key = "your key"
consumer_secret = "your key"
access_token = "your key"
access_token_secret = "your key"

api = twitter.Api(consumer_key=consumer_key,
                  consumer_secret=consumer_secret,
                  access_token_key=access_token,
                  access_token_secret=access_token_secret)

FILTER = ["covid-19 string here"]  # PUT YOUR COVID-19 STRING HERE
LANGUAGES = ['en']
store_file = "outputfileforcovidtweets.txt"
_location = ["put coordinates here"]

def main():
    with open(store_file, 'a') as z:
        for line in api.GetStreamFilter(track=FILTER, languages=LANGUAGES, locations=_location):
            z.write(json.dumps(line))
            z.write('\n')

main()
This will collect real-time tweets to your output file. :)
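Since the script above writes one JSON object per line, reading the collected tweets back is a line-by-line json.loads; a small sketch (the helper name is mine, not from the answer):

```python
import json

def load_jsonl(path):
    """Read one JSON object per line, skipping blank lines."""
    tweets = []
    with open(path) as f:
        for line in f:
            line = line.strip()
            if line:
                tweets.append(json.loads(line))
    return tweets
```

This "JSON Lines" layout is convenient for streams because a crash mid-run only loses the last partial line, not the whole file.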

Python: Hashtag search with Tweepy

I'd like to get Tweets with #MeTooMen using Tweepy.
There are many Tweets using this hashtag as far as I can see on Twitter, but I get 0 results when I try to fetch them with Tweepy. Do you have any idea how I can improve this code?
import os
import tweepy as tw
import pandas as pd

api_key = '*'
api_secret_key = '*'
access_token = '*'
access_token_secret = '*'

auth = tw.OAuthHandler(api_key, api_secret_key)
auth.set_access_token(access_token, access_token_secret)
api = tw.API(auth, wait_on_rate_limit=True)

# Define the search term and the date_since date as variables
search_words = "#metoomen"
date_since = "2017-10-17"
date_until = "2018-01-31"

tweets = tw.Cursor(api.search,
                   q=search_words,
                   lang="en",
                   since=date_since,
                   until=date_until).items(5)
users_locs = [[tweet.user.screen_name, tweet.user.location, tweet.text] for tweet in tweets]
users_locs
>>> []
API.search uses Twitter's standard search API, which cannot search dates that far back:
Keep in mind that the search index has a 7-day limit. In other words, no tweets will be found for a date older than one week.
https://developer.twitter.com/en/docs/twitter-api/v1/tweets/search/guides/standard-operators also says:
[It] is not a complete index of all Tweets, but instead an index of recent Tweets. The index includes between 6-9 days of Tweets.
You'll need to use the Full-archive premium search API endpoint, with API.search_full_archive, instead.
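One practical detail when switching to search_full_archive: its fromDate/toDate arguments use the yyyyMMddHHmm format rather than yyyy-mm-dd. A small conversion helper (the function name is mine, for illustration):

```python
from datetime import datetime

def to_premium_date(iso_date, end_of_day=False):
    """Convert 'YYYY-MM-DD' to the 'yyyyMMddHHmm' form used by premium search."""
    d = datetime.strptime(iso_date, "%Y-%m-%d")
    return d.strftime("%Y%m%d") + ("2359" if end_of_day else "0000")

to_premium_date("2017-10-17")                  # '201710170000'
to_premium_date("2018-01-31", end_of_day=True) # '201801312359'
```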

How do I use the "referenced_tweets.type" response field in the Twitter API in Python code where I'm trying to extract Twitter hashtag data?

I am using code which works fine; I took it in full from GeeksforGeeks. I want to modify it to add referenced_tweets.type. I am new to APIs and really want to understand how to do this.
import pandas as pd
import tweepy

# function to display data of each tweet
def printtweetdata(n, ith_tweet):
    print()
    print(f"Tweet {n}:")
    print(f"Username:{ith_tweet[0]}")
    print(f"likes:{ith_tweet[1]}")
    print(f"Location:{ith_tweet[2]}")
    print(f"Following Count:{ith_tweet[3]}")
    print(f"Follower Count:{ith_tweet[4]}")
    print(f"Total Tweets:{ith_tweet[5]}")
    print(f"Retweet Count:{ith_tweet[6]}")
    print(f"Tweet Text:{ith_tweet[7]}")
    print(f"Hashtags Used:{ith_tweet[8]}")

# function to perform data extraction
def scrape(words, date_since, numtweet):
    # Creating DataFrame using pandas
    db = pd.DataFrame(columns=['username', 'likes', 'location', 'following',
                               'followers', 'totaltweets', 'retweetcount',
                               'text', 'hashtags'])
    # We are using .Cursor() to search through Twitter for the required tweets.
    # The number of tweets can be restricted using .items(number of tweets)
    tweets = tweepy.Cursor(api.search, q=words, lang="en",
                           since=date_since, tweet_mode='extended').items(numtweet)
    # .Cursor() returns an iterable object. Each item in the iterator has
    # various attributes that you can access to get information about each tweet
    list_tweets = [tweet for tweet in tweets]
    # Counter to maintain tweet count
    i = 1
    # iterate over each tweet in the list, extracting information about each one
    for tweet in list_tweets:
        username = tweet.user.screen_name
        likes = tweet.favorite_count
        location = tweet.user.location
        following = tweet.user.friends_count
        followers = tweet.user.followers_count
        totaltweets = tweet.user.statuses_count
        retweetcount = tweet.retweet_count
        hashtags = tweet.entities['hashtags']
        # Retweets can be distinguished by a retweeted_status attribute;
        # if that attribute is missing, the except block runs
        try:
            text = tweet.retweeted_status.full_text
        except AttributeError:
            text = tweet.full_text
        hashtext = [tag['text'] for tag in hashtags]
        # Append all the extracted information to the DataFrame
        ith_tweet = [username, likes, location, following, followers,
                     totaltweets, retweetcount, text, hashtext]
        db.loc[len(db)] = ith_tweet
        # Function call to print tweet data on screen
        printtweetdata(i, ith_tweet)
        i = i + 1
    filename = 'etihad.csv'
    # save our database as a CSV file
    db.to_csv(filename)

if __name__ == '__main__':
    # Enter your own credentials obtained from your developer account
    consumer_key = ""
    consumer_secret = ""
    access_key = ""
    access_secret = ""
    auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
    auth.set_access_token(access_key, access_secret)
    api = tweepy.API(auth)
    # Enter hashtag and initial date
    print("Enter Twitter hashtag to search for")
    words = input()
    print("Enter the date since the tweets are required, in yyyy-mm-dd")
    date_since = input()
    # number of tweets you want to extract in one run
    numtweet = 100
    scrape(words, date_since, numtweet)
    print('Scraping has completed!')
I now want to add referenced_tweets.type in order to tell whether a Tweet is a Retweet or not, but I'm not sure how to do it. Can someone help?
API.search uses the standard search API, part of Twitter API v1.1.
referenced_tweets is a value that can be set for tweet.fields, a Twitter API v2 fields parameter.
Currently, if you want to use Twitter API v2 through Tweepy, you'll have to use the development version of Tweepy on the master branch and its Client class. Otherwise, you'll need to wait until Tweepy v4.0 is released.
Alternatively, if your only goal is to determine whether a Status/Tweet object is a Retweet or not, you can simply check for the retweeted_status attribute.
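For v1.1 Status objects that check can be as simple as hasattr; a minimal sketch using stand-in objects (real tweepy Status objects behave the same way for this attribute):

```python
from types import SimpleNamespace

def is_retweet(status):
    """A v1.1 Status object carries retweeted_status only when it is a Retweet."""
    return hasattr(status, "retweeted_status")

plain = SimpleNamespace(full_text="hello world")
rt = SimpleNamespace(full_text="RT @someone: hello world", retweeted_status=plain)

is_retweet(plain)  # False
is_retweet(rt)     # True
```

This is also exactly what the try/except AttributeError around tweet.retweeted_status.full_text in the scraper above relies on.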

Questions about the Tweepy API

1. The stream.filter() API: I read the documentation, which said that all parameters can be optional. However, when I leave them all empty, it doesn't work.
2. Still on the same API: it is said that if I write code like below:
twitter_stream.filter(locations=[-180, -90, 180, 90])
it will filter all tweets that carry geographical information. However, when I check the JSON data, I still find many tweets whose geo attribute is null.
3. I tried to use the stream to get as many tweets as possible, but it only returns tweets in real time. Is there any parameter to set a time range, for example to collect tweets from 2013 to 2015?
4. I tried to collect data through users and their followers, repeating the same step until I get as many tweets as I want. My code is below:
import tweepy
import json

# one global list to store all user names
users_unused = ["Raithan8"]
users_used = []

def process_or_store(tweet):
    print(json.dumps(tweet))

consumer_key = ""
consumer_secret = ""
access_token = ""
access_token_secret = ""

auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)
api = tweepy.API(auth, wait_on_rate_limit=True)

def getAllTweets():
    screen_name = users_unused[0]
    users_unused.remove(screen_name)
    users_used.append(screen_name)
    print("this is the current user: " + screen_name)
    for friend in tweepy.Cursor(api.friends, screen_name=screen_name).items():
        if friend.screen_name not in users_unused and friend.screen_name not in users_used:
            users_unused.append(friend.screen_name)
    for follower in tweepy.Cursor(api.followers, screen_name=screen_name).items():
        if follower.screen_name not in users_unused and follower.screen_name not in users_used:
            users_unused.append(follower.screen_name)
    print(users_unused)
    print(users_used)
    alltweets = []
    # tweepy returns at most 200 tweets per request
    new_tweets = api.user_timeline(screen_name=screen_name, count=200)
    alltweets.extend(new_tweets)
    if not alltweets:
        return alltweets
    oldest = alltweets[-1].id - 1
    while len(new_tweets) > 0:
        new_tweets = api.user_timeline(screen_name=screen_name, count=200, max_id=oldest)
        alltweets.extend(new_tweets)
        oldest = alltweets[-1].id - 1
    return alltweets

def storeTweets(alltweets, file_name="tweets.json"):
    for tweet in alltweets:
        json_data = tweet._json
        data = json.dumps(json_data)
        with open(file_name, "a") as f:
            if json_data['geo'] is not None:
                f.write(data)
                f.write("\n")

if __name__ == "__main__":
    while True:
        if not users_unused:
            break
        storeTweets(getAllTweets())
I don't know why it runs so slowly. Maybe it is mainly because I initialize the Tweepy API as below:
api = tweepy.API(auth, wait_on_rate_limit=True)
But if I don't initialize it this way, it raises the error below:
raise RateLimitError(error_msg, resp)
tweepy.error.RateLimitError: [{'message': 'Rate limit exceeded', 'code': 88}]
2) There's a difference between a tweet that has coordinates and filtering by location.
Filtering by location means that the sender is located within the range of your filter. If you set it globally, twitter_stream.filter(locations=[-180, -90, 180, 90]) will return tweets from people who set a location in their preferences.
If you need to filter by coordinates (tweets that actually carry coordinates), you can take a look at my blog post; but basically you need to set up a listener and then check whether the tweet has coordinates.
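The check itself is just a null test on the parsed tweet JSON; a minimal sketch on plain dicts (the coordinate values below are made up for illustration):

```python
def has_coordinates(tweet):
    """True only when the tweet JSON carries an actual point, not a null field."""
    return tweet.get("coordinates") is not None

has_coordinates({"text": "hi", "coordinates": None})  # False
has_coordinates({"text": "hi",
                 "coordinates": {"type": "Point",
                                 "coordinates": [2.35, 48.85]}})  # True
```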
3 and 4) Twitter's Search API and Twitter's Streaming API differ in many ways, including their rate-limit restrictions (see Tweepy's rate-limit handling and Twitter's rate-limit documentation).
With the Search API, you are limited in how many tweets from the past you can get.
Check the Tweepy API documentation again: wait_on_rate_limit set to true just waits until your current limit window is available again. That's why it's "slow", as you said.
The Streaming API, however, doesn't have such restrictions.
