I am using the code below to fetch replies to the tweet with id 1504510003201265684, but I keep getting a rate limit warning after a certain number of replies has been fetched, and it also seems that some replies are skipped. How can I get around that?
name = 'netflix'
tweet_id = '1504510003201265684'
count = 0
replies = []
for tweet in tweepy.Cursor(api.search, q='to:' + name, result_type='mixed',
                           tweet_mode='extended', timeout=999999).items():
    count += 1
    if hasattr(tweet, 'in_reply_to_status_id_str'):
        if tweet.in_reply_to_status_id_str == tweet_id:
            replies.append(tweet)
Add a function that handles tweepy.RateLimitError; in this case it sleeps for 15 minutes whenever the rate limit is hit and then resumes:
import time

def limit_handled(cursor):
    while True:
        try:
            yield cursor.next()
        except tweepy.RateLimitError:
            # wait out the 15-minute rate-limit window, then retry
            time.sleep(15 * 60)
        except StopIteration:
            # end of results (needed so the generator ends cleanly on Python 3.7+)
            return
name = 'netflix'
tweet_id = '1504510003201265684'
count = 0
replies = []
for tweet in limit_handled(tweepy.Cursor(api.search, q='to:' + name, result_type='mixed',
                                         tweet_mode='extended', timeout=999999).items()):
    count += 1
    if hasattr(tweet, 'in_reply_to_status_id_str'):
        if tweet.in_reply_to_status_id_str == tweet_id:
            replies.append(tweet)
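Alternatively, if you are on Tweepy 3.x you can let the library wait through rate limits for you by building the client with wait_on_rate_limit, which makes the custom handler unnecessary (a minimal sketch, assuming the usual OAuthHandler auth object):
# Tweepy sleeps until the rate-limit window resets and prints a notice while waiting
api = tweepy.API(auth, wait_on_rate_limit=True, wait_on_rate_limit_notify=True)
As for the replies that seem to be skipped: the standard search endpoint only covers roughly the last 7 days and is focused on relevance rather than completeness, so some replies will simply never show up in search results.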
Scenario:
I want to retrieve items from a DynamoDB table that has 200k records, and I am trying to get them in multiple requests:
For the first request I want only 100 records.
For the second request I want the next 100 records, excluding the records already returned by the first request.
My implementation:
scan_kwargs = None
if scan_kwargs is None:
    scan_kwargs = {}

totalRecords = 0
complete = False
while not complete:
    try:
        response = table.scan(Limit=10000, **scan_kwargs,
                              FilterExpression=Key('timestamp').between(dateFrom, dateTo))
    except botocore.exceptions.ClientError as error:
        raise Exception('Error')

    next_key = response.get('LastEvaluatedKey')
    scan_kwargs['ExclusiveStartKey'] = next_key
    complete = True if next_key is None else False

    if response['Items']:
        for record in response['Items']:
            print(record)
            totalRecords = totalRecords + 1
            if totalRecords > 100:
                break
    if totalRecords > 100:
        break
With the above code I am only able to get the first 100 records on every request.
But my requirement is to get records 101 to 200 and ignore the first 100 records.
Can anyone help with a working example for this requirement?
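A minimal sketch of one way to do this (assuming a boto3 Table object named table and the same timestamp filter as above; Attr is used here because filter expressions normally target non-key attributes): count the matching items as you scan, throw away the first 100, and collect the next 100 while following LastEvaluatedKey across pages.
from boto3.dynamodb.conditions import Attr

def get_records_101_to_200(table, date_from, date_to):
    skipped = 0
    collected = []
    scan_kwargs = {
        'FilterExpression': Attr('timestamp').between(date_from, date_to),
    }
    while True:
        response = table.scan(**scan_kwargs)
        for record in response.get('Items', []):
            if skipped < 100:
                skipped += 1                 # ignore the first 100 matches
            elif len(collected) < 100:
                collected.append(record)     # keep matches 101-200
            else:
                return collected
        next_key = response.get('LastEvaluatedKey')
        if next_key is None:                 # no more pages
            return collected
        scan_kwargs['ExclusiveStartKey'] = next_key
The counters have to span pages because Scan applies Limit before the filter expression, so any single page can contain anywhere from zero matches up to the page size.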
I am coding a Twitter bot which joins giveaways of users that I follow.
The problem is that when I use a for loop to iterate over an ItemIterator Cursor of 50 items, it stops before finishing. It usually does 20 or 39-40 iterations.
My main function is:
from funciones import *
from config import *

api = login(user)
i = 0
while 1 > i:
    tweets = get_tweets(api, 50, True, None, None)
    file = start_stats()
    for tweet in tweets:
        try:
            i = i + 1
            tweet = is_RT(tweet)
            show(tweet)
            check(api, tweet, file)
            print(f'{i}) 1.5 - 2m tweets cd')
            sleep(random.randrange(40, 60, 1))
        except Exception as e:
            print(str(e))
            st.append(e)
            print('15-20 min cooldown')
            sleep(random.randrange(900, 1200, 1))
So when the loop does, for example, 39 iterations, the code jumps into the 15 min. cooldown with the Cursor in this state:
len(tweets.current_page) - 1
Out[251]: 19
tweets.page_index
Out[252]: 19
tweets.limit
Out[253]: 50
tweets.num_tweets
Out[254]: 20
I've seen this in the Tweepy cursor.py but I still don't know how to fix it.
def next(self):
    if self.limit > 0:
        if self.num_tweets == self.limit:
            raise StopIteration
    if self.current_page is None or self.page_index == len(self.current_page) - 1:
        # Reached end of current page, get the next page...
        self.current_page = self.page_iterator.next()
        self.page_index = -1
    self.page_index += 1
    self.num_tweets += 1
    return self.current_page[self.page_index]
The function I use in my main function to get the cursor is this:
def get_tweets(api, count=1, cursor=False, user=None, id=None):
    if id is not None:
        tweets = api.get_status(id=id, tweet_mode='extended')
        return tweets
    if cursor:
        if user is not None:
            if count > 0:
                tweets = tp.Cursor(api.user_timeline, screen_name=user, tweet_mode='extended').items(count)
            else:
                tweets = tp.Cursor(api.user_timeline, screen_name=user, tweet_mode='extended').items()
        else:
            if count > 0:
                tweets = tp.Cursor(api.home_timeline, tweet_mode='extended').items(count)
            else:
                tweets = tp.Cursor(api.home_timeline, tweet_mode='extended').items()
    else:
        if user is not None:
            tweets = api.user_timeline(screen_name=user, count=count, tweet_mode='extended')
        else:
            tweets = api.home_timeline(count=count, tweet_mode='extended')
    return tweets
When I've tried test code like:
j = 0
tweets = get_tweets(api, 50, True)
for i in tweets:
    j = j + 1
print(j)
j and tweets.num_tweets are almost always 50. I think that when it is not 50 it is because I don't wait between requests, since I've reached j=300 this way, so maybe the problem is in the check function:
(This is a previous version of the check function which has the same problem; I noticed it when I started collecting stats. The only difference is that the newer one returns values when the tweet has been liked, retweeted, etc.)
def check(tweet):
    if (bool(is_seen(tweet))
            + bool(age_check(tweet, 3))
            + bool(ignore_check(tweet)) == 0):
        rt_check(tweet)
        like_check(tweet)
        follow_check(tweet)
        tag_n_drop_check(tweet)
        quoted_check(tweet)
This is the first time I've asked for help, so I don't know whether I've posted all the info needed. This has been driving me mad since last week and I don't know who else to ask :(
Thanks in advance!
The IdIterator that Cursor returns when used with API.home_timeline stops when it receives a page with no results. This is most likely what's happening, since the default count for the endpoint is 20 and:
The value of count is best thought of as a limit to the number of tweets to return because suspended or deleted content is removed after the count has been applied.
https://developer.twitter.com/en/docs/twitter-api/v1/tweets/timelines/api-reference/get-statuses-home_timeline
This is a limitation of this Twitter API endpoint, as there's not another good way to determine when to stop paginating.
However, you can pass a higher count (e.g. 100 if that works for you, up to 200) to the endpoint while using Cursor with it and you'll be less likely to receive a premature empty page.
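In get_tweets above, that could look roughly like this (count here is a keyword argument that Cursor forwards to home_timeline as the per-page size, separate from the limit given to .items()):
# Ask the endpoint for up to 200 statuses per page so that deleted or suspended
# content is less likely to leave a page completely empty and stop the iteration early.
tweets = tp.Cursor(api.home_timeline, count=200, tweet_mode='extended').items(50)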
I'm building a program that collects a specified number of tweets (no specific hashtags, just random posts) from a specific country (based on coordinates) over the span of 1-2 months.
For example, I'm collecting 200 tweets/status updates from the United States which were posted anywhere between September and October.
The reason I'm doing this is because I want to gather these tweets and perform sentiment analysis on them to see whether or not the average tweet from a specified country is negative/positive.
The problem I'm having is that I don't know how to "filter" for random tweets/status updates, because these kinds of tweets don't have hashtags. Furthermore, I'm not sure whether Twitter allows me to collect tweets that are 2 months old. Any suggestions?
code
import tweepy
from tweepy import OAuthHandler
import json
import datetime as dt
import time
import os
import sys
'''
I created a twitter account for anyone to use if they want to test the code!
I used Python 3 and tweepy version 3.5.0.
'''
def load_api():
    ''' Function that loads the twitter API after authorizing the user. '''
    consumer_key = 'nn'
    consumer_secret = 'nn'
    access_token = 'nn'
    access_secret = 'nnn'
    auth = OAuthHandler(consumer_key, consumer_secret)
    auth.set_access_token(access_token, access_secret)
    api = tweepy.API(auth, wait_on_rate_limit=True)
    # load the twitter API via tweepy
    return tweepy.API(auth)
def tweet_search(api, query, max_tweets, max_id, since_id, geocode):
    ''' Function that takes in a search string 'query', the maximum
        number of tweets 'max_tweets', and the minimum (i.e., starting)
        tweet id. It returns a list of tweepy.models.Status objects. '''
    searched_tweets = []
    while len(searched_tweets) < max_tweets:
        remaining_tweets = max_tweets - len(searched_tweets)
        try:
            new_tweets = api.search(q=query, count=remaining_tweets,
                                    since_id=str(since_id),
                                    max_id=str(max_id - 1))
                                    # geocode=geocode)
            print('found', len(new_tweets), 'tweets')
            if not new_tweets:
                print('no tweets found')
                break
            searched_tweets.extend(new_tweets)
            max_id = new_tweets[-1].id
        except tweepy.TweepError:
            print('exception raised, waiting 15 minutes')
            print('(until:', dt.datetime.now() + dt.timedelta(minutes=15), ')')
            time.sleep(15 * 60)
            break  # stop the loop
    return searched_tweets, max_id
def get_tweet_id(api, date='', days_ago=9, query='a'):
    ''' Function that gets the ID of a tweet. This ID can then be
        used as a 'starting point' from which to search. The query is
        required and has been set to a commonly used word by default.
        The variable 'days_ago' has been initialized to the maximum
        amount we are able to search back in time (9).'''
    if date:
        # return an ID from the start of the given day
        td = date + dt.timedelta(days=1)
        tweet_date = '{0}-{1:0>2}-{2:0>2}'.format(td.year, td.month, td.day)
        tweet = api.search(q=query, count=1, until=tweet_date)
    else:
        # return an ID from __ days ago
        td = dt.datetime.now() - dt.timedelta(days=days_ago)
        tweet_date = '{0}-{1:0>2}-{2:0>2}'.format(td.year, td.month, td.day)
        # get list of up to 10 tweets
        tweet = api.search(q=query, count=10, until=tweet_date)
        print('search limit (start/stop):', tweet[0].created_at)
    # return the id of the first tweet in the list
    return tweet[0].id
def write_tweets(tweets, filename):
    ''' Function that appends tweets to a file. '''
    with open(filename, 'a') as f:
        for tweet in tweets:
            json.dump(tweet._json, f)
            f.write('\n')
def main():
    ''' This is a script that continuously searches for tweets
        that were created over a given number of days. The search
        dates and search phrase can be changed below. '''

    ''' search variables: '''
    search_phrases = ['#PythonPleaseWork']
    time_limit = 1.0                      # runtime limit in hours
    max_tweets = 20                       # number of tweets per search but it doesn't seem to be working
    min_days_old, max_days_old = 1, 1     # search limits e.g., from 7 to 8
                                          # gives current weekday from last week,
                                          # min_days_old=0 will search from right now
    USA = '39.8,-95.583068847656,2500km'  # this geocode includes nearly all American
                                          # states (and a large portion of Canada)
                                          # but it still fetches from outside the USA

    # loop over search items,
    # creating a new file for each
    for search_phrase in search_phrases:
        print('Search phrase =', search_phrase)

        ''' other variables '''
        name = search_phrase.split()[0]
        json_file_root = name + '/' + name
        os.makedirs(os.path.dirname(json_file_root), exist_ok=True)
        read_IDs = False

        # open a file in which to store the tweets
        if max_days_old - min_days_old == 1:
            d = dt.datetime.now() - dt.timedelta(days=min_days_old)
            day = '{0}-{1:0>2}-{2:0>2}'.format(d.year, d.month, d.day)
        else:
            d1 = dt.datetime.now() - dt.timedelta(days=max_days_old - 1)
            d2 = dt.datetime.now() - dt.timedelta(days=min_days_old)
            day = '{0}-{1:0>2}-{2:0>2}_to_{3}-{4:0>2}-{5:0>2}'.format(
                d1.year, d1.month, d1.day, d2.year, d2.month, d2.day)
        json_file = json_file_root + '_' + day + '.json'
        if os.path.isfile(json_file):
            print('Appending tweets to file named: ', json_file)
            read_IDs = True

        # authorize and load the twitter API
        api = load_api()

        # set the 'starting point' ID for tweet collection
        if read_IDs:
            # open the json file and get the latest tweet ID
            with open(json_file, 'r') as f:
                lines = f.readlines()
                max_id = json.loads(lines[-1])['id']
                print('Searching from the bottom ID in file')
        else:
            # get the ID of a tweet that is min_days_old
            if min_days_old == 0:
                max_id = -1
            else:
                max_id = get_tweet_id(api, days_ago=(min_days_old - 1))

        # set the smallest ID to search for
        since_id = get_tweet_id(api, days_ago=(max_days_old - 1))
        print('max id (starting point) =', max_id)
        print('since id (ending point) =', since_id)

        ''' tweet gathering loop '''
        start = dt.datetime.now()
        end = start + dt.timedelta(hours=time_limit)
        count, exitcount = 0, 0
        while dt.datetime.now() < end:
            count += 1
            print('count =', count)

            # collect tweets and update max_id
            tweets, max_id = tweet_search(api, search_phrase, max_tweets,
                                          max_id=max_id, since_id=since_id,
                                          geocode=USA)

            # write tweets to file in JSON format
            if tweets:
                write_tweets(tweets, json_file)
                exitcount = 0
            else:
                exitcount += 1
                if exitcount == 3:
                    if search_phrase == search_phrases[-1]:
                        sys.exit('Maximum number of empty tweet strings reached - exiting')
                    else:
                        print('Maximum number of empty tweet strings reached - breaking')
                        break


if __name__ == "__main__":
    main()
You cannot get 2 months of historical data with the Search API.
"The Twitter Search API searches against a sampling of recent Tweets published in the past 7 days.
Before getting involved, it’s important to know that the Search API is focused on relevance and not completeness. This means that some Tweets and users may be missing from search results."
https://developer.twitter.com/en/docs/tweets/search/overview/basic-search
You can use the Streaming API with a country (location) filter, and instead of hashtags you can track a few common stop words. For example, for the US you can use "the,and", for France "le,la,et", etc.
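A minimal sketch of that idea with Tweepy 3.x (the listener class name and the bounding box below are illustrative assumptions); note that the streaming API combines track and locations with OR, so you may still want to check each tweet's place or coordinates on your side:
class CountryListener(tweepy.StreamListener):
    def on_status(self, status):
        # collect or analyse the tweet here
        print(status.text)

    def on_error(self, status_code):
        if status_code == 420:   # being rate limited: disconnect the stream
            return False

# rough bounding box for the continental US: (west, south, east, north)
usa_bbox = [-125.0, 24.0, -66.0, 50.0]

# reuse the auth from the API object returned by load_api()
stream = tweepy.Stream(auth=api.auth, listener=CountryListener())
stream.filter(locations=usa_bbox, track=['the', 'and'])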
In addition, it is not a good idea to share your access tokens.
I've been using the example in this post
to create a system that searches and gets a large number of Tweets in a short time period. However, each time I switch to a new API key (make a new cursor) the search starts all over from the beginning and gets me repeated Tweets. How do I get each cursor to start where the other left off? What am I missing? Here's the code I am using:
currentAPI = 0
a = 0
currentCursor = tweepy.Cursor(apis[currentAPI].search, q='%40deltaKshatriya')
c = currentCursor.items()
mentions = []
onlyMentions = []
while True:
    try:
        tweet = c.next()
        if a > 100000:
            break
        else:
            onlyMentions.append(tweet.text)
            for t in tTweets:
                if tweet.in_reply_to_status_id == t.id:
                    print str(a) + tweet.text
                    mentions.append(tweet.text)
            a = a + 1
    except tweepy.TweepError:
        print "Rate limit hit"
        if (currentAPI < 9):
            print "Switching to next sat in constellation"
            currentAPI = currentAPI + 1
            #currentCursor = c.iterator.next_cursor
            currentCursor = tweepy.Cursor(apis[currentAPI].search, q='%40deltaKshatriya', cursor=currentCursor)
            c = currentCursor.items()
        else:
            print "All sats maxed out, waiting and will try again"
            currentAPI = 0
            currentCursor = tweepy.Cursor(apis[currentAPI].search, q='%40deltaKshatriya', cursor=currentCursor)
            c = currentCursor.items()
            time.sleep(60 * 15)
            continue
    except StopIteration:
        break
I found a workaround that I think works, although I still encounter some issues. The idea is to add max_id when recreating the Cursor:
currentCursor = tweepy.Cursor(apis[currentAPI].search, q='%40deltaKshatriya', cursor=currentCursor, max_id=max_id)
where max_id is the id of the last tweet fetched before the rate limit was hit. The only issue I've encountered is with StopIteration being raised really early (before I get the full 100,000 Tweets), but I think that is a different SO question.
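A sketch of how that can be wired into the loop above (illustrative only; it simply remembers the lowest tweet id seen so far and passes it when the Cursor is rebuilt for the next API key):
max_id = None
while True:
    try:
        tweet = c.next()
        max_id = tweet.id - 1   # the next search should only return older tweets
        # ... process the tweet as before ...
    except tweepy.TweepError:
        currentAPI = currentAPI + 1
        currentCursor = tweepy.Cursor(apis[currentAPI].search,
                                      q='%40deltaKshatriya',
                                      max_id=max_id)
        c = currentCursor.items()
    except StopIteration:
        break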
I know that the Twitter Search API has its own limitations and returns far fewer results than actually exist, but I was searching a popular hashtag and it only returns 60 results, which is not acceptable at all!
Here is my code, in which I've used the twython module.
results = {}
last_id = None
count = 0
while(len(results.keys()) != min_count):
    if(last_id):
        tmp_results = self.api.search(q="#mentionsomeoneimportantforyou", count=100, max_id=last_id)
    else:
        tmp_results = self.api.search(q="#mentionsomeoneimportantforyou", count=100)
    count += len(tmp_results['statuses'])
    print("new len: ", count)
    last_id = get_max_id(tmp_results)

def get_max_id(results):
    next_results_url_params = results['search_metadata']['next_results']
    next_max_id = next_results_url_params.split('max_id=')[1].split('&')[0]
    return next_max_id
Is there anything wrong with this code? If not, isn't 60 out of so many a joke?
The twython docs suggest not doing it that way, using the cursor approach instead:
twitter = Twython(APP_KEY, APP_SECRET,
                  OAUTH_TOKEN, OAUTH_TOKEN_SECRET)

results = twitter.cursor(twitter.search, q='python')

count = 0
for result in results:
    print result['id_str']
    count += 1
print count
prints:
... many here ...
561918886380322816
561918859050229761
561919180480737282
561919151162130434
561919142450581504
561919113812246529
561919107134922753
561919103867559938
561919077481218049
561918994454556672
561918971755372546
561918962381127680
561918948288258048
561918911751655425
561918904126042112
561918886380322816
561918859050229761
645
I think I found the reason. According to this link, Twitter doesn't return tweets older than a week through the Search API.