Return the number of remaining hits tweepy - python

EDIT: I am trying the following code in order to read a list of IDs and get their corresponding names. I am using remain_search_limits in order to avoid rate limit errors.
limits = api.rate_limit_status()
remain_search_limits = limits['resources']['search']['/search/tweets']['remaining']
stream = open('myfile', 'w')
ss = open('userNames', 'w')
for ids in content:
    try:
        limits = api.rate_limit_status()
        remain_search_limits = limits['resources']['search']['/search/tweets']['remaining']
        print 'you have', remain_search_limits, 'API calls remaining until next hour'
        if remain_search_limits < 2:
            dtcode = datetime.utcnow()
            unixtime = calendar.timegm(dtcode.utctimetuple())
            sleeptime = rate_limit_json.get('reset_time_in_seconds') - unixtime + 10
            print 'waiting ', sleeptime, 'seconds'
            sleep(sleeptime)
        else:
            user = api.get_user(ids)
            stream.write(str(user.id) + "\n")
            ss.write(str(user.name) + "\n")
    except tweepy.TweepError as e:
        print e
stream.close()
ss.close()
Every time remain_search_limits is printed it returns 180, until I get a TweepError exception.

This example shows you how to access how many search requests you have remaining.
print rate_limit_json["resources"]["search"]['/search/tweets']['remaining']
180
"resources" is the key you should be using to access the information inside.
If you want to update the value, put it in a loop reassigning the value after your time.sleep().
Put all the code inside a while loop, something like this:
limits = api.rate_limit_status()
remain_search_limits = limits['resources']['search']['/search/tweets']['remaining']
while remain_search_limits > 2:
    limits = api.rate_limit_status()
    remain_search_limits = limits['resources']['search']['/search/tweets']['remaining']
else:
    dtcode = datetime.utcnow()
    unixtime = calendar.timegm(dtcode.utctimetuple())
    # 'reset' is the epoch time at which the rate limit window resets
    sleeptime = limits['resources']['search']['/search/tweets']['reset'] - unixtime + 10
    print 'waiting ', sleeptime, 'seconds'
    sleep(sleeptime)
I have not tested the code, but it should be close to what you need. You may also want to sleep between calls; I am unfamiliar with the API, so I am not sure exactly what you are doing.
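As a rough, untested sketch of that check-and-sleep idea (the helper name wait_for_search_window is just illustrative; it assumes the rate limit payload exposes 'remaining' and 'reset' for /search/tweets, with 'reset' as an epoch timestamp, as Twitter's rate_limit_status documents):
import time
import tweepy

def wait_for_search_window(api):
    # Poll the search bucket; if we are nearly out of calls, sleep until it resets.
    status = api.rate_limit_status()
    bucket = status['resources']['search']['/search/tweets']
    if bucket['remaining'] < 2:
        sleeptime = bucket['reset'] - int(time.time()) + 10
        print('waiting', sleeptime, 'seconds')
        time.sleep(max(sleeptime, 0))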

Related

Tweepy Cursor not reaching its limit

I am coding a Twitter bot which joins giveaways of users that I follow.
The problem is that when I use a for loop to iterate over an ItemIterator Cursor of 50 items, it breaks before finishing. It usually does 20 or 39-40 iterations.
My main function is:
from funciones import *
from config import *

api = login(user)
i = 0
while 1 > i:
    tweets = get_tweets(api, 50, True, None, None)
    file = start_stats()
    for tweet in tweets:
        try:
            i = i + 1
            tweet = is_RT(tweet)
            show(tweet)
            check(api, tweet, file)
            print(f'{i}) 1.5 - 2m tweets cd')
            sleep(random.randrange(40, 60, 1))
        except Exception as e:
            print(str(e))
            st.append(e)
    print('15-20 min cooldown')
    sleep(random.randrange(900, 1200, 1))
So when the loop does, say, 39 iterations, the code jumps into the 15 min. cooldown, and the Cursor state shows these tweet counts:
len(tweets.current_page) - 1
Out[251]: 19
tweets.page_index
Out[252]: 19
tweets.limit
Out[253]: 50
tweets.num_tweets
Out[254]: 20
I've seen this in the Tweepy cursor.py but I still don't know how to fix it.
def next(self):
    if self.limit > 0:
        if self.num_tweets == self.limit:
            raise StopIteration
    if self.current_page is None or self.page_index == len(self.current_page) - 1:
        # Reached end of current page, get the next page...
        self.current_page = self.page_iterator.next()
        self.page_index = -1
    self.page_index += 1
    self.num_tweets += 1
    return self.current_page[self.page_index]
The function I use in my main function to get the cursor is this:
def get_tweets(api, count=1, cursor=False, user=None, id=None):
    if id is not None:
        tweets = api.get_status(id=id, tweet_mode='extended')
        return tweets
    if cursor:
        if user is not None:
            if count > 0:
                tweets = tp.Cursor(api.user_timeline, screen_name=user, tweet_mode='extended').items(count)
            else:
                tweets = tp.Cursor(api.user_timeline, screen_name=user, tweet_mode='extended').items()
        else:
            if count > 0:
                tweets = tp.Cursor(api.home_timeline, tweet_mode='extended').items(count)
            else:
                tweets = tp.Cursor(api.home_timeline, tweet_mode='extended').items()
    else:
        if user is not None:
            tweets = api.user_timeline(screen_name=user, count=count, tweet_mode='extended')
        else:
            tweets = api.home_timeline(count=count, tweet_mode='extended')
    return tweets
When I've tried test code like
j = 0
tweets = get_tweets(api, 50, True)
for i in tweets:
    j = j + 1
    print(j)
j and tweets.num_tweets are almost always 50, but I think that when it is not 50 it's because I don't wait between requests (I've reached j=300 with this), so maybe the problem is in the check function:
(This is a previous check function which also has the same problem; I noticed it when I started gathering stats. The only difference is that the newer one returns values if the tweet has been liked, retweeted, etc.)
def check(tweet):
    if (bool(is_seen(tweet))
            + bool(age_check(tweet, 3))
            + bool(ignore_check(tweet)) == 0):
        rt_check(tweet)
        like_check(tweet)
        follow_check(tweet)
        tag_n_drop_check(tweet)
        quoted_check(tweet)
This is the first time I've asked for help, so I don't know if I've posted all the info needed. This has been driving me mad since last week and I don't know who to ask :(
Thanks in advance!
The IdIterator that Cursor returns when used with API.home_timeline stops when it receives a page with no results. This is most likely what's happening, since the default count for the endpoint is 20 and:
The value of count is best thought of as a limit to the number of tweets to return because suspended or deleted content is removed after the count has been applied.
https://developer.twitter.com/en/docs/twitter-api/v1/tweets/timelines/api-reference/get-statuses-home_timeline
This is a limitation of this Twitter API endpoint, as there's not another good way to determine when to stop paginating.
However, you can pass a higher count (e.g. 100 if that works for you, up to 200) to the endpoint while using Cursor with it and you'll be less likely to receive a premature empty page.
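For example (hypothetical usage sketch; api is assumed to be an authenticated tweepy.API instance, and Cursor forwards the extra count keyword to home_timeline as the per-page size):
import tweepy as tp

# Ask for 200 statuses per page so a page is less likely to come back empty
# after suspended/deleted tweets are removed, letting items(50) complete.
tweets = tp.Cursor(api.home_timeline, count=200, tweet_mode='extended').items(50)
for tweet in tweets:
    print(tweet.full_text)  # placeholder for the per-tweet processing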

Python: Splitting a "for obj in response.json() loop" to request from the xth object forth

Here is a bit of context: I have a program that gets data from an API. It does this in two stages: one request for the full list of points, and then a request for the details of each point. These get appended to an array.
def fetch_details(url: str):
    response = requests.get(url)
    # Makes request call to get the data of detail
    # save_file(folder_path,GipodId,text2)
    # any other processes
    return response.json()

def fetch_data_points(url: str):
    limit_request = 1000
    # Placeholder for limit: please do not remove = 1000000000 -JJ
    folder_path_reset("api_request_jsons", "csv", "Geographic_information")
    total_start_time = start_time_measure()
    start_time = start_time_measure(
        'Starting Phase 1: First request from API: Data Points')
    response = requests.get(url, params={"limit": limit_request})
    end_time = end_time_measure(start_time, "Request completed: ")
    print(len(response.json()))
    time_estimated = end_time / len(response.json())
    print(time_estimated)
    end_time_measure(total_start_time, "End of Phase 1, completed in: ")
    return response.json()

def fetch_details_of_data_points(url: str):
    input_json = fetch_data_points(url)
    fetch_points_save(input_json)
    all_location_points_details = []
    amount_of_objects = len(input_json)
    total_start_time = start_time_measure()
    start_time = start_time_measure(f'Starting Phase 2: Second request from API: {str(amount_of_objects)} requested')
    # for i in tqdm(range(amount_of_objects), miniters=0.000000001):
    #     for obj in input_json:
    #         all_location_points_details.append(fetch_details(obj.get("detail")))
    with tqdm(total=amount_of_objects) as pbar:
        for obj in input_json:
            all_location_points_details.append(fetch_details(obj.get("detail")))
            pbar.update(1)
However, I have noticed a flaw in my program that I may have a solution for, but I do not know how to implement it. When the amount of data requested is massive (more than 10,000 points), a disconnect can always happen, causing my program to fail. So as a solution I would like this loop:
with tqdm(total=amount_of_objects) as pbar:
    for obj in input_json:
        all_location_points_details.append(fetch_details(obj.get("detail")))
        pbar.update(1)
To be split into chunks of size y, where the number of sessions x is calculated by the following:
value y = 1000
value x = round(amount of objects / y) --> rounded up, no matter what.
So let's say I have 145862 objects to request details from; by my formula that is 145.862, rounded up to 146 sessions.
So session 1 requests the first 1000 objects, starting from obj 1 and ending at obj 1000; the next session starts requesting from obj 1001, the one after that from obj 2001, and so on.
So this is technically this:
i = 0
while i < x:
    for obj (starting from the (i*y + 1)th object, ending at the (i+1)*y-th) in input_json:
        all_location_points_details.append(fetch_details(obj.get("detail")))
    i += 1
The thing is, this is the part I do not know how to program. Can anyone help me with this?
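For what it's worth, a minimal sketch of the batching idea described above (the helper name fetch_in_batches and the default batch size are assumptions; fetch_details and input_json are the ones from the question):
import math

def fetch_in_batches(input_json, batch_size=1000):
    # Request details in batches so a disconnect only costs the current batch.
    all_details = []
    num_batches = math.ceil(len(input_json) / batch_size)  # the rounded-up "x" from above
    for batch_index in range(num_batches):
        start = batch_index * batch_size
        for obj in input_json[start:start + batch_size]:
            all_details.append(fetch_details(obj.get("detail")))
    return all_details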

Nested loop in Python when using multiple parameters in API calls

I have a program to call an API, return the JSON data and write it to a CSV.
The program loops through a list of entities as the first parameter in the API call, but also now needs to loop through a second parameter set (start and end times in epoch) because the API has a max of pulling a week of data at a time.
Example:
API call: ex.com/api/entity/timecards?start_time=1531306800&end_time=1531846800&pretty=1
So I need to loop through all of the entities, and then loop through an entire year's worth of data, a week at a time.
code example so far for the API call function:
def callAPI(entities):
    for j in range(len(entities)):
        locnum = entities[j][:5]
        locnumv = entities[j]
        startTime =
        endTime =
        url = "http://ex.com/api/entity/" + entities[j] + "/timecards?start_time=" + startTime + "&end_time=" + endTime
        querystring = {"pretty": "1"}
        headers = {
            'Api-Key': ""
        }
        r = requests.request("GET", url, headers=headers, params=querystring)
        d = r.json()
The program then goes on to write the data to rows in a CSV, which is all successful when tested with looping through the entities with static time parameters.
So I just need to figure out how I would create another nested for loop to step the start time/end time forward by 518400 seconds (6 days instead of 7 to be safe), and factor in a timeout, since this is effectively going to be 20,000+ API calls by the time it's all said and done.
First of all, because you are just using j to get the current entity, you could replace for j in range(len(entities)) with for entity in entities; it reads better. As for the question, you can just use an inner for loop to iterate over each week. The whole code would be:
def callAPI(entities):
    for entity in entities:
        locnum = entity[:5]
        locnumv = entity  # This is redundant
        START = 1531306800  # starting time, 1 year ago
        END = START + 31536000  # final time, e.g. the current time
        TIME_STEP = 518400  # step size in seconds (here 6 days; could be 1 day, 1 week, 1 month)
        for start_time in range(START, END, TIME_STEP):
            end_time = start_time + TIME_STEP - 1  # subtract 1 to prevent overlap of times
            url = "http://ex.com/api/entity/%s/timecards?start_time=%d&end_time=%d" % (entity, start_time, end_time)
            querystring = {"pretty": "1"}
            headers = {'Api-Key': ""}
            try:
                r = requests.request("GET", url, headers=headers, params=querystring)
            except:
                break
            d = r.json()
            # Do something with the data
I hope this can help you!!
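On the "factor in a timeout" part of the question, a hedged sketch of one way to add it (the helper name, the 30-second timeout, and the one-second pause are arbitrary examples, not anything this API requires):
import time
import requests

def fetch_week(url, headers, querystring, pause_seconds=1.0):
    # Fetch one window of data with an explicit timeout, then pause briefly so
    # a long run of 20,000+ calls stays gentle on the API.
    r = requests.request("GET", url, headers=headers, params=querystring, timeout=30)
    time.sleep(pause_seconds)
    return r.json()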
First off, you can just do:
for entity in entities:
instead of:
for j in range(len(entities)):
and then use entity instead of entities[j]
When it comes to looping through your epoch times, you will have to set your start time and then set your end time to start_time + 540000 inside another loop:
start_time = 1531306800
i = 0
while True:
    if i != 0:
        start_time = end_time
    end_time = start_time + 540000
    url = "http://ex.com/api/entity/" + entity + "/timecards?start_time=" + str(start_time) + "&end_time=" + str(end_time)
    querystring = {"pretty": "1"}
    headers = {'Api-Key': ""}
    try:
        r = requests.request("GET", url, headers=headers, params=querystring)
    except:
        break
    d = r.json()
    i += 1
Basically, you are going to loop through all of the epoch times until the request fails. Once it does, you will exit the loop and go to your next entity. The new entity's url will start at the same epoch time as the entity before it and so on.
I hope that helps!

Tweepy use multiple API keys with cursor to search Twitter

I've been using the example in this post
to create a system that searches and gets a large number of Tweets in a short time period. However, each time I switch to a new API key (make a new cursor) the search starts all over from the beginning and gets me repeated Tweets. How do I get each cursor to start where the other left off? What am I missing? Here's the code I am using:
currentAPI = 0
a = 0
currentCursor = tweepy.Cursor(apis[currentAPI].search, q='%40deltaKshatriya')
c = currentCursor.items()
mentions = []
onlyMentions = []
while True:
    try:
        tweet = c.next()
        if a > 100000:
            break
        else:
            onlyMentions.append(tweet.text)
            for t in tTweets:
                if tweet.in_reply_to_status_id == t.id:
                    print str(a) + tweet.text
                    mentions.append(tweet.text)
            a = a + 1
    except tweepy.TweepError:
        print "Rate limit hit"
        if currentAPI < 9:
            print "Switching to next sat in constellation"
            currentAPI = currentAPI + 1
            #currentCursor = c.iterator.next_cursor
            currentCursor = tweepy.Cursor(apis[currentAPI].search, q='%40deltaKshatriya', cursor=currentCursor)
            c = currentCursor.items()
        else:
            print "All sats maxed out, waiting and will try again"
            currentAPI = 0
            currentCursor = tweepy.Cursor(apis[currentAPI].search, q='%40deltaKshatriya', cursor=currentCursor)
            c = currentCursor.items()
            time.sleep(60 * 15)
        continue
    except StopIteration:
        break
I found a workaround that I think works, although I still encounter some issues. The idea is to change the cursor creation to:
currentCursor = tweepy.Cursor(apis[currentAPI].search, q = '%40deltaKshatriya', cursor = currentCursor, max_id = max_id)
Where max_id is the id of the last tweet fetched before the rate limit was hit. The only issue I've encountered is with StopIteration being raised really early (before I get the full 100,000 Tweets) but that I think is a different SO question.
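A hedged sketch of that max_id bookkeeping (untested; it assumes Cursor forwards max_id to the search endpoint and that tweet IDs decrease as you page back through results):
max_id = None
try:
    for tweet in tweepy.Cursor(apis[currentAPI].search, q='%40deltaKshatriya').items():
        max_id = tweet.id - 1  # resume strictly below the last tweet processed
        onlyMentions.append(tweet.text)
except tweepy.TweepError:
    # Switch keys, then build a fresh Cursor that starts where the old one stopped.
    currentAPI = currentAPI + 1
    c = tweepy.Cursor(apis[currentAPI].search, q='%40deltaKshatriya', max_id=max_id).items()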

How do I join results of looping script into a single variable?

I have a looping script returning different filtered results. I can make this data return as an array for each of the different filter classes; however, I am unsure of the best method to join all of these arrays together.
import mechanize
import urllib
import json
import re
import random
import datetime
from sched import scheduler
from time import time, sleep
from sets import Set

##### Code to loop the script and set up scheduling time
s = scheduler(time, sleep)
random.seed()

##### Code to stop duplicates part 1
userset = set()

def run_periodically(start, end, interval, func):
    event_time = start
    while event_time < end:
        s.enterabs(event_time, 0, func, ())
        event_time += interval + random.randrange(-5, 10)
    s.run()

##### Code to get the data required from the URL desired
def getData():
    post_url = "URL OF INTEREST"
    browser = mechanize.Browser()
    browser.set_handle_robots(False)
    browser.addheaders = [('User-agent', 'Firefox')]
    ##### These are the parameters you've got from checking with the aforementioned tools
    parameters = {'page': '1',
                  'rp': '250',
                  'sortname': 'race_time',
                  'sortorder': 'asc'
                  }
    ##### Encode the parameters
    data = urllib.urlencode(parameters)
    trans_array = browser.open(post_url, data).read().decode('UTF-8')
    xmlload1 = json.loads(trans_array)
    pattern2 = re.compile('/control/profile/view/(.*)\' title=')
    pattern4 = re.compile('title=\'posted: (.*) strikes:')
    pattern5 = re.compile('strikes: (.*)\'><img src=')
    for row in xmlload1['rows']:
        cell = row["cell"]
        ##### defining the Keys (key is the area from which data is pulled in the XML) for use in the pattern finding/regex
        user_delimiter = cell['username']
        selection_delimiter = cell['race_horse']
        user_numberofselections = float(re.findall(pattern4, user_delimiter)[0])
        user_numberofstrikes = float(re.findall(pattern5, user_delimiter)[0])
        strikeratecalc1 = user_numberofstrikes / user_numberofselections
        strikeratecalc2 = strikeratecalc1 * 100
        userid_delimiter_results = (re.findall(pattern2, user_delimiter)[0])
        ##### Code to stop duplicates throughout the day part 2 (skips if the id is already in the userset)
        if userid_delimiter_results in userset:
            continue
        userset.add(userid_delimiter_results)
        arraym = ""
        arrayna = ""
        if strikeratecalc2 > 50 and strikeratecalc2 < 100:
            arraym0 = "System M"
            arraym1 = "user id = ", userid_delimiter_results
            arraym2 = "percantage = ", strikeratecalc2, "%"
            arraym3 = ""
            arraym = [arraym0, arraym1, arraym2, arraym3]
        if strikeratecalc2 > 0 and strikeratecalc2 < 50:
            arrayna0 = "System NA"
            arrayna1 = "user id = ", userid_delimiter_results
            arrayna2 = "percantage = ", strikeratecalc2, "%"
            arrayna3 = ""
            arrayna = [arrayna0, arrayna1, arrayna2, arrayna3]

getData()
run_periodically(time() + 5, time() + 1000000, 10, getData)
What I want to be able to do is return both 'arraym' and 'arrayna' as one final array. However, due to the looping nature of the script, the old 'arraym'/'arrayna' are overwritten on each iteration; currently my attempts to yield one array containing all of the data have resulted in only the last user id for 'System M' and the last user id for 'System NA'. This is obviously because each run of the loop overwrites the old 'arraym' and 'arrayna', but I do not know of a way around this so that all of my data can be accumulated in one array. Please note, I have been coding for cumulatively two weeks now, so there may well be some simple function to overcome this problem.
Kind regards, AEA
Without looking at that huge code segment, typically you can do something like:
my_array = []  # Create an empty list
for <some loop>:
    my_array.append(some_value)
# At this point, my_array is a list containing some_value for each loop iteration
print(my_array)
Look into python's list.append()
So your code might look something like:
#...
arraym = []
arrayna = []
for row in xmlload1['rows']:
    #...
    if strikeratecalc2 > 50 and strikeratecalc2 < 100:
        arraym.append("System M")
        arraym.append("user id = %s" % userid_delimiter_results)
        arraym.append("percantage = %s%%" % strikeratecalc2)
        arraym.append("")
    if strikeratecalc2 > 0 and strikeratecalc2 < 50:
        arrayna.append("System NA")
        arrayna.append("user id = %s" % userid_delimiter_results)
        arrayna.append("percantage = %s%%" % strikeratecalc2)
        arrayna.append("")
#...
#...
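And if the goal is literally one final variable, a hedged sketch of how the two lists could be merged and kept across the scheduled runs (the names all_results and combined are illustrative, not from the original script):
all_results = []  # module-level accumulator; survives across run_periodically calls

def getData():
    arraym = []
    arrayna = []
    # ... the row loop from above appends into arraym / arrayna ...
    combined = arraym + arrayna  # one list: System M entries followed by System NA entries
    all_results.extend(combined)
    return combined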
