Nested loop in Python when using multiple parameters in API calls

I have a program to call an API, return the JSON data and write it to a CSV.
The program loops through a list of entities as the first parameter in the API call, but now also needs to loop through a second parameter set (start and end times in epoch), because the API can only pull a week of data at a time.
Example:
API call: ex.com/api/entity/timecards?start_time=1531306800&end_time=1531846800&pretty=1
So I need to loop through all of the entities, and then loop through an entire year's worth of data, a week at a time.
code example so far for the API call function:
def callAPI(entities):
    for j in range(len(entities)):
        locnum = entities[j][:5]
        locnumv = entities[j]
        startTime =   # not set yet; this is the part I need to loop over
        endTime =     # not set yet
        url = "http://ex.com/api/entity/" + entities[j] + "/timecards?start_time=" + startTime + "&end_time=" + endTime
        querystring = {"pretty":"1"}
        headers = {
            'Api-Key': ""
        }
        r = requests.request("GET", url, headers=headers, params=querystring)
        d = r.json()
The program then goes on to write the data to rows in a CSV, which all works when tested by looping through the entities with static time parameters.
So I just need to figure out how to create another nested for loop that steps through the start/end times in increments of 518400 seconds (6 days instead of 7, to be safe), and how to factor in a timeout, since this will effectively be 20,000+ API calls by the time it's all said and done.

First of all, because you are only using j to get the current entity, you could replace for j in range(len(entities)) with for entity in entities; it reads better. As for the question, you can just use an inner for loop to iterate over each week. The whole code would be:
def callAPI(entities):
    START = 1531306800          # starting epoch time, e.g. 1 year ago
    END = START + 31536000      # final epoch time, e.g. the current time (START + 365 days)
    TIME_STEP = 518400          # 6 days, in seconds
    for entity in entities:
        locnum = entity[:5]
        locnumv = entity  # This is redundant
        for start_time in range(START, END, TIME_STEP):
            end_time = start_time + TIME_STEP - 1  # subtract 1 to prevent overlapping windows
            url = "http://ex.com/api/entity/%s/timecards?start_time=%d&end_time=%d" % (entity, start_time, end_time)
            querystring = {"pretty":"1"}
            headers = {'Api-Key': ""}
            try:
                r = requests.request("GET", url, headers=headers, params=querystring)
            except requests.RequestException:
                break
            d = r.json()
            # Do something with the data
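Since this will add up to 20,000+ calls, you may also want a per-request timeout and a short pause between calls. Here is a minimal sketch of that idea; the helper name, the 0.1-second pause and the 10-second timeout are my own assumptions, so tune them to whatever limits your API documents:

import time
import requests

def call_with_pacing(url, headers, querystring, pause=0.1, timeout=10):
    # Wait briefly before each call so thousands of requests do not hammer the API,
    # and give up on any single request after `timeout` seconds instead of hanging.
    time.sleep(pause)
    try:
        r = requests.request("GET", url, headers=headers, params=querystring, timeout=timeout)
        r.raise_for_status()
        return r.json()
    except requests.RequestException as exc:
        print("Request failed for %s: %s" % (url, exc))
        return None

You could then call call_with_pacing(url, headers, querystring) inside the inner loop instead of calling requests directly.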
I hope this can help you!!

First off, you can just do:
for entity in entities:
instead of:
for j in range(len(entities)):
and then use entity instead of entities[j]
When it comes to looping through your epoch times, you will have to set your start time and then set your end time to start_time + 540000 inside another loop:
start_time = 1531306800
i = 0
while True:
    if i != 0:
        start_time = end_time
    end_time = start_time + 540000
    url = "http://ex.com/api/entity/" + entity + "/timecards?start_time=" + str(start_time) + "&end_time=" + str(end_time)
    querystring = {"pretty":"1"}
    headers = {'Api-Key': ""}
    try:
        r = requests.request("GET", url, headers=headers, params=querystring)
    except requests.RequestException:
        break
    d = r.json()
    i += 1
Basically, you are going to loop through all of the epoch times until the request fails. Once it does, you will exit the loop and go to your next entity. The new entity's url will start at the same epoch time as the entity before it and so on.
I hope that helps!

Related

How to check if 'next' field in loop is the same?

I am paginating through an API, which uses two different methods - next token & next timestamp.
For reasons unknown to me, sometimes at the end of a call the next token will be the same, which leaves me stuck in an endless loop. The same happens if I use the next timestamp.
However, I have noticed that this could be avoided if I use a combination of the two.
This is the loop I am currently running:
while int(JSONContent['next'][0:10]) > unixtime_yesterday:
    try:
        url = 'www.website.com?next' + JSONContent['next'][0:10] + 'api_key'
        JSONContent = requests.request("GET", url, headers=headers).json()
        temp_df = json_normalize(JSONContent['data'])
        df = df.append(temp_df, ignore_index=True, sort=False)
    except ValueError:
        print('There was a JSONDecodeError')
This is a normal result of the JSONContent['next'] field. The first 10 characters are the timestamp and the last 10 are the other token:
'1650377727-3feWs8592va'
How can I check if the next timestamp is the same as the current one, so that I can then use the token instead of the timestamp?
In layman terms I want to do the following:
if JSONContent['next'][0:10][current_cycle] == JSONContent['next'][0:10][next_cycle]:
    token = JSONContent['next'][11:22][next_cycle]
else:
    token = JSONContent['next'][0:10][next_cycle]
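For what it's worth, here is a rough sketch of that comparison as a small helper; choose_cursor and previous_timestamp are names I made up for illustration, not part of the original code:

def choose_cursor(next_field, previous_timestamp):
    # next_field looks like '1650377727-3feWs8592va': a timestamp, a dash, then the token.
    timestamp, token = next_field.split('-', 1)
    if timestamp == previous_timestamp:
        # The timestamp did not advance, so paginate by the token instead.
        return token, timestamp
    # The timestamp advanced, so keep paginating by the timestamp.
    return timestamp, timestamp

# Example: remember the previous timestamp across iterations of the while loop.
cursor, previous_timestamp = choose_cursor('1650377727-3feWs8592va', None)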
If you just want the "next next" before passing on to the next iteration, send another request and check whether the next and the "next next" are equal.
import time

time_stamp = time.time()
using_token = False
while using_token or int(time_stamp) > unixtime_yesterday:
    try:
        url = 'www.website.com?next' + JSONContent['next'][0:10] + 'api_key'
        JSONContent = requests.request("GET", url, headers=headers).json()
        next_url = 'www.website.com?next' + JSONContent['next'][0:10] + 'api_key'
        next_JSONContent = requests.request("GET", next_url, headers=headers).json()
        if JSONContent['next'][0:10] == next_JSONContent['next'][0:10]:
            using_token = True
        else:
            time_stamp = JSONContent['next'][0:10]
            using_token = False
        temp_df = json_normalize(JSONContent['data'])
        df = df.append(temp_df, ignore_index=True, sort=False)
    except ValueError:
        print('There was a JSONDecodeError')
You could also initialize using_token to True, but that breaks the clean-code convention for naming booleans.

How to iterate over dataframe rows for individual API calls

I'm trying to set up a loop to pull in weather data for an entire year for about 500 weather stations, which I have listed in my dataframe. The base URL stays the same, and the only part that changes is the weather station ID.
I'd like to create a dataframe with the results. I believe I'd use requests.get to pull in data for all the weather stations in my list; the IDs to use in the URL are in a column called "API ID" in my dataframe. I am a python beginner, so any help would be appreciated! My code is below but doesn't work and returns an error:
"InvalidSchema: No connection adapters were found for '0 " http://www.ncei.noaa.gov/access/services/data/...\nName: API ID, Length: 497, dtype: object'
def callAPI(API_id):
    for IDs in range(len(API_id)):
        url = ('http://www.ncei.noaa.gov/access/services/data/v1?dataset=daily-summaries&dataTypes=PRCP,SNOW,TMAX,TMIN&stations=' + distances['API ID'] + '&startDate=2020-01-01&endDate=2020-12-31&includeAttributes=0&includeStationName=true&units=standard&format=json')
        r = requests.request('GET', url)
        d = r.json()

ll = []
for index1, rows1 in distances.iterrows():
    station = rows1['Closest Station']
    API_id = rows1['API ID']
    data = callAPI(API_id)
    ll.append([(data)])
I am not sure about your whole code base, but this is a function that will return the data from the API. If you have multiple station IDs in a single dataframe column then you can use a for loop, otherwise there is no need for one.
Also, you were not returning the result from the function; note the return keyword at the end of the function.
Working code:
import requests

def callAPI(API_id):
    url = ('http://www.ncei.noaa.gov/access/services/data/v1?dataset=daily-summaries&dataTypes=PRCP,SNOW,TMAX,TMIN&stations=' + API_id + '&startDate=2020-01-01&endDate=2020-12-31&includeAttributes=0&includeStationName=true&units=standard&format=json')
    r = requests.request('GET', url)
    d = r.json()
    return d

print(callAPI('USC00457180'))
So your full code will be something like this,
def callAPI(API_id):
    url = ('http://www.ncei.noaa.gov/access/services/data/v1?dataset=daily-summaries&dataTypes=PRCP,SNOW,TMAX,TMIN&stations=' + API_id + '&startDate=2020-01-01&endDate=2020-12-31&includeAttributes=0&includeStationName=true&units=standard&format=json')
    r = requests.request('GET', url)
    d = r.json()
    return d

ll = []
for index1, rows1 in distances.iterrows():
    station = rows1['Closest Station']
    API_id = rows1['API ID']
    data = callAPI(API_id)
    ll.append([(data)])
Note: Even better, use asynchronous calls to the API to make the process faster. Something like this: https://stackoverflow.com/a/56926297/1138192
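As a rough illustration of that idea (not the code from the linked answer), a thread pool from the standard library can issue several of these calls concurrently; the worker count of 8 is an arbitrary assumption:

from concurrent.futures import ThreadPoolExecutor

# Collect the station IDs first, then fetch them concurrently with the callAPI function above.
api_ids = [row['API ID'] for _, row in distances.iterrows()]

with ThreadPoolExecutor(max_workers=8) as pool:  # 8 workers is an arbitrary choice
    results = list(pool.map(callAPI, api_ids))

ll = [[data] for data in results]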

Python: Splitting a "for obj in response.json() loop" to request from the xth object onward

Here is a bit of context: I have a program that gets data from an API. It does this in two kinds of requests: one for the total set of points, and then a request for each point in the data. The results get appended into an array.
def fetch_details(url: str):
    response = requests.get(url)
    # Makes request call to get the data of detail
    # save_file(folder_path,GipodId,text2)
    # any other processe
    return response.json()

def fetch_data_points(url: str):
    limit_request = 1000
    # Placeholder for limit: please do not remove = 1000000000 -JJ
    folder_path_reset("api_request_jsons", "csv", "Geographic_information")
    total_start_time = start_time_measure()
    start_time = start_time_measure(
        'Starting Phase 1: First request from API: Data Points')
    response = requests.get(url, params={"limit": limit_request})
    end_time = end_time_measure(start_time, "Request completed: ")
    print(len(response.json()))
    time_estimated = end_time / len(response.json())
    print(time_estimated)
    end_time_measure(total_start_time, "End of Phase 1, completed in: ")
    return response.json()

def fetch_details_of_data_points(url: str):
    input_json = fetch_data_points(url)
    fetch_points_save(input_json)
    all_location_points_details = []
    amount_of_objects = len(input_json)
    total_start_time = start_time_measure()
    start_time = start_time_measure(f'Starting Phase 2: Second request from API: {str(amount_of_objects)} requested')
    #for i in tqdm(range(amount_of_objects),miniters=0.000000001):
    #    for obj in input_json:
    #        all_location_points_details.append(fetch_details(obj.get("detail")))
    with tqdm(total=amount_of_objects) as pbar:
        for obj in input_json:
            all_location_points_details.append(fetch_details(obj.get("detail")))
            pbar.update(1)
However, I have noticed a certain flaw in my program that I may have a solution for, but I do not know how to implement it. When the amount of data requested is massive (more than 10,000 points) a disconnect can always happen, causing my program to fail. So as a solution I would like this loop:
with tqdm(total=amount_of_objects) as pbar:
    for obj in input_json:
        all_location_points_details.append(fetch_details(obj.get("detail")))
        pbar.update(1)
To be split into a number of sessions i (or x), calculated as follows:
value y = 1000
value x = round(amount of objects / y) --> rounded, because this needs to be rounded up no matter what.
So let's say I have 145862 objects to request details for; by my formula that is 145.862, rounded up to 146 sessions.
So session 1 requests the first 1000 objects, starting at obj 1 and ending at obj 1000. The next session starts requesting from obj 1001, and so on.
So this is technically:
i = 0
while i < x:
    for obj (starting from object i*y + 1, ending at object (i+1)*y) in input_json:
        all_location_points_details.append(fetch_details(obj.get("detail")))
    i += 1
The thing is, I do not know how to program this part. Can anyone help me with this?
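For reference, a rough sketch of the chunked requesting described above, assuming input_json is the list returned by fetch_data_points and fetch_details works as shown; run_batches and chunk_size are names I made up:

import math
from tqdm import tqdm

def run_batches(input_json, chunk_size=1000):
    # Split the full list of points into sessions of `chunk_size` objects each,
    # so one disconnect does not throw away the work of every earlier session.
    all_location_points_details = []
    sessions = math.ceil(len(input_json) / chunk_size)
    for i in range(sessions):
        batch = input_json[i * chunk_size:(i + 1) * chunk_size]
        with tqdm(total=len(batch), desc=f"Session {i + 1}/{sessions}") as pbar:
            for obj in batch:
                all_location_points_details.append(fetch_details(obj.get("detail")))
                pbar.update(1)
    return all_location_points_details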

Python - Using a While Loop, how can I update a variable string to be used in a function

Background: I'm attempting to create a dataframe using data called from Twitch's API. They only allow 100 records per call so with each pull a new Pagination Cursor is offered in order to move on to the next page. I'm using the following code to try and efficiently pull this data rather than manually adjusting the after=(pagination value) in the get response. Right now the variable I'm trying to make dynamic is the 'Pagination' variable but it only gets updated once the loop finishes - not helpful! Take a look below and see if you notice anything I can change to achieve this goal. Any help is appreciated!
TwitchTopGamesDataFrame = [] #This is our Data List
BaseURL = 'https://api.twitch.tv/helix/games/top?first=100'
Headers = {'client-id':'lqctse0orgdbs5gdf5faz665api03r','Authorization': 'Bearer a1yl09mwmnwetp6ovocilheias8pzt'}
Indent = 2
Pagination = ''
FullURL = BaseURL + Pagination
Response = requests.get(FullURL, headers=Headers)
iterations = 1 # Data records returned are equivalent to iterations x100

#Loop: Response, Convert JSON data, Append to Data List, Get Pagination & Replace String in Variable - Iterate until 300 records
while count <= 3:
    #Grab JSON Data, Convert, & Append
    ResponseJSONData = Response.json()
    #print(pgn) - Debug
    pd.set_option('display.max_rows', None)
    TopGamesDF = pd.DataFrame(ResponseJSONData['data'])
    TopGamesDF = TopGamesDF[['id','name']]
    TopGamesDF = TopGamesDF.rename(columns={'id':'GameID','name':'GameName'})
    TopGamesDF['Rank'] = TopGamesDF.index + 1
    TwitchTopGamesDataFrame.append(TopGamesDF)
    #print(FullURL) - Debug
    #Grab & Replace Pagination Value
    RPagination = pd.DataFrame(ResponseJSONData['pagination'], index=[0])
    pgn = str('&after=' + RPagination.to_string(index=False, header=False).strip())
    Pagination = pgn
    #print(FullURL) - Debug
    iterations += 1

TwitchTopGamesDataFrame
Figured it out:
def top_games(page_count):
    from time import gmtime, strftime
    print("Time of Execution:", strftime("%Y-%m-%d %H:%M:%S", gmtime()))
    #In order to condense the code above and be more efficient, a while/for loop would work great.
    #Goal: Run a While Loop to create a larger DataFrame through Pagination as the Twitch API only allows for 100 records per call.
    baseURL = 'https://api.twitch.tv/helix/games/top?first=100' #Base URL
    Headers = {'client-id':'lqctse0orgdbs5gdf5faz665api03r','Authorization': 'Bearer a1yl09mwmnwetp6ovocilheias8pzt'}
    Indent = 2
    Pagination = ''
    start_count = 0
    count = 0 # Data records returned are equivalent to iterations x100
    max_count = page_count
    #Loop: Response, Convert JSON data, Append to Data List, Get Pagination & Replace String in Variable
    while count <= max_count:
        #Grab JSON Data, Extend List
        FullURL = baseURL + Pagination
        Response = requests.get(FullURL, headers=Headers)
        ResponseJSONData = Response.json()
        pd.set_option('display.max_rows', None)
        if count == start_count:
            TopGamesDFL = ResponseJSONData['data']
        if count > start_count:
            i = ResponseJSONData['data']
            TopGamesDFL.extend(i)
        #Grab & Replace Pagination Value
        RPagination = pd.DataFrame(ResponseJSONData['pagination'], index=[0])
        pgn = str('&after=' + RPagination.to_string(index=False, header=False).strip())
        Pagination = pgn
        count += 1
        if count == max_count:
            FinalDataFrame = pd.DataFrame(TopGamesDFL)
            FinalDataFrame = FinalDataFrame[['id','name']]
            FinalDataFrame = FinalDataFrame.rename(columns={'id':'GameID','name':'GameName'})
            FinalDataFrame['Rank'] = FinalDataFrame.index + 1
            return FinalDataFrame

Determine the rate limit for requests

I have a question about rate limits.
I take data from the CSV, put it into the query, and store the output in a list.
I get an error because I make too many requests at once
(I can only make 20 requests per second). How can I stay within the rate limit?
import requests
import pandas as pd

df = pd.read_csv("Data_1000.csv")
list = []

def requestSummonerData(summonerName, APIKey):
    URL = "https://euw1.api.riotgames.com/lol/summoner/v3/summoners/by-name/" + summonerName + "?api_key=" + APIKey
    response = requests.get(URL)
    return response.json()

def main():
    APIKey = (str)(input('Copy and paste your API Key here: '))
    for index, row in df.iterrows():
        summonerName = row['Player_Name']
        responseJSON = requestSummonerData(summonerName, APIKey)
        ID = responseJSON['accountId']
        ID = int(ID)
        list.insert(index, ID)
    df["accountId"] = list
If you already know you can only make 20 requests per second, you just need to work out how long to wait between each request:
Divide 1 second by 20, which gives you 0.05. So you just need to sleep for 0.05 seconds between each request and you shouldn't hit the limit (maybe increase it a bit if you want to be safe).
Put import time at the top of your file and then call time.sleep(0.05) inside your for loop (you could also just do time.sleep(1/20)).
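Applied to the loop above, that might look roughly like this (a sketch that keeps the original structure but renames the list variable to ids so it does not shadow the built-in list):

import time

def main():
    APIKey = str(input('Copy and paste your API Key here: '))
    ids = []
    for index, row in df.iterrows():
        summonerName = row['Player_Name']
        responseJSON = requestSummonerData(summonerName, APIKey)
        ids.insert(index, int(responseJSON['accountId']))
        time.sleep(0.05)  # wait 1/20 of a second so we stay under 20 requests per second
    df["accountId"] = ids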
