I am trying to make a call to an API and then grab the event_ids from the response. I then want to use those event IDs as variables in another request, parse that data, and loop back to make another request for each remaining ID in event_ids.
So far I have the following:
import requests

def nba_odds():
    url = "https://xxxxx.com.au/sports/summary/basketball?api_key=xxxxx"
    response = requests.get(url)
    data = response.json()
    event_ids = []
    for event in data['Events']:
        if event['Country'] == 'USA' and event['League'] == 'NBA':
            event_ids.append(event['EventID'])
    # print(event_ids)
    game_url = f'https://xxxxx.com.au/sports/detail/{event_ids}?api_key=xxxxx'
    game_response = requests.get(game_url)
    game_data = game_response.json()
    print(game_url)
That gives me the result below in the terminal:
https://xxxxx.com.au/sports/detail/['dbx-1425135', 'dbx-1425133', 'dbx-1425134', 'dbx-1425136', 'dbx-1425137', 'dbx-1425138', 'dbx-1425139', 'dbx-1425140', 'anyvsany-nba01-1670043600000000000', 'dbx-1425141', 'dbx-1425142', 'dbx-1425143', 'dbx-1425144', 'dbx-1425145', 'dbx-1425148', 'dbx-1425149', 'dbx-1425147', 'dbx-1425146', 'dbx-1425150', 'e95270f6-661b-46dc-80b9-cd1af75d38fb', '0c989be7-0802-4683-8bb2-d26569e6dcf9']?api_key=779ac51a-2fff-4ad6-8a3e-6a245a0a4cbb
Each URL should instead look like:
https://xxxx.com.au/sports/detail/dbx-1425135
If anyone can point me in the right direction it would be appreciated.
Thanks.
You need to loop over the event IDs and call the API once per event_id, since the endpoint does not support multiple event IDs in one request:
all_events_response = []
for event_id in event_ids:
    game_url = f'https://xxxxx.com.au/sports/detail/{event_id}?api_key=xxxxx'
    game_response = requests.get(game_url)
    game_data = game_response.json()
    all_events_response.append(game_data)
    print(game_url)
You will then find the list of JSON responses in all_events_response.
event_ids is an entire list of event ids. You make a single URL with the full list converted to its string view (['dbx-1425135', 'dbx-1425133', ...]). But it looks like you want to get information on each event in turn. To do that, put the second request in the loop so that it runs for every event you find interesting.
import requests

def nba_odds():
    url = "https://xxxxx.com.au/sports/summary/basketball?api_key=xxxxx"
    response = requests.get(url)
    data = response.json()
    for event in data['Events']:
        if event['Country'] == 'USA' and event['League'] == 'NBA':
            event_id = event['EventID']
            # print(event_id)
            game_url = f'https://xxxxx.com.au/sports/detail/{event_id}?api_key=xxxxx'
            game_response = requests.get(game_url)
            game_data = game_response.json()
            # do something with game_data - it will be overwritten
            # on the next round of the loop
            print(game_url)
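The core of the mistake can be shown in isolation: interpolating the whole list into an f-string embeds its repr, while building one URL per ID gives the shape the endpoint expects. A minimal sketch (the base URL and key below are placeholders, not the real service):

```python
def detail_urls(event_ids, base='https://example.com.au', api_key='XXXX'):
    """Build one detail URL per event ID; putting the whole list into a
    single f-string would embed its repr ("['dbx-...', ...]") instead."""
    return [f'{base}/sports/detail/{event_id}?api_key={api_key}'
            for event_id in event_ids]
```

Each URL in the returned list can then be fetched with its own requests.get call inside the loop.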
Related
I am paginating through an API, which uses two different methods - next token & next timestamp.
For reasons unknown to me, sometimes at the end of a call the next token will be the same, which leaves me stuck in an endless loop. The same happens if I use the next timestamp.
However, I have noticed that this could be avoided if I use a combination of the two.
This is the loop I am currently running:
while int(JSONContent['next'][0:10]) > unixtime_yesterday:
    try:
        url = 'www.website.com?next' + JSONContent['next'][0:10] + 'api_key'
        JSONContent = requests.request("GET", url, headers=headers).json()
        temp_df = json_normalize(JSONContent['data'])
        df = df.append(temp_df, ignore_index=True, sort=False)
    except ValueError:
        print('There was a JSONDecodeError')
This is a normal result of the JSONContent['next'] field. The first 10 characters are the timestamp and the last 10 are the other token:
'1650377727-3feWs8592va'
How can I check if the next timestamp is the same as the current one, so that I can then use the token instead of the timestamp?
In layman's terms I want to do the following:

if JSONContent['next'][0:10][current_cycle] == JSONContent['next'][0:10][next_cycle]:
    token = JSONContent['next'][11:22][next_cycle]
else:
    token = JSONContent['next'][0:10][next_cycle]
If you want the "next next" before passing on to the next iteration, send another request and check for equality between the next and the next next.
import time

time_stamp = time.time()
using_token = False
while using_token or int(time_stamp) > unixtime_yesterday:
    try:
        url = 'www.website.com?next' + JSONContent['next'][0:10] + 'api_key'
        JSONContent = requests.request("GET", url, headers=headers).json()
        next_url = 'www.website.com?next' + JSONContent['next'][0:10] + 'api_key'
        next_JSONContent = requests.request("GET", next_url, headers=headers).json()
        if JSONContent['next'][0:10] == next_JSONContent['next'][0:10]:
            using_token = True
        else:
            time_stamp = JSONContent['next'][0:10]
            using_token = False
        temp_df = json_normalize(JSONContent['data'])
        df = df.append(temp_df, ignore_index=True, sort=False)
    except ValueError:
        print('There was a JSONDecodeError')
You could also initialize using_token to True, but that would make the variable's name misleading.
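An alternative sketch that avoids the extra lookahead request: keep the previous 'next' value and compare it with the new one before building the following URL. Function and variable names here are illustrative, and the '<10-digit timestamp>-<token>' layout is taken from the example value above:

```python
def next_cursor(prev_next, new_next):
    """Pick the cursor for the following request from two consecutive
    'next' values shaped like '<10-digit timestamp>-<token>'."""
    prev_ts = prev_next[0:10]
    new_ts = new_next[0:10]
    new_token = new_next[11:]
    # Fall back to the token only when the timestamp has not advanced,
    # which is exactly the situation that caused the endless loop.
    if new_ts == prev_ts:
        return new_token
    return new_ts
```

Inside the loop you would remember the previous 'next' value, call next_cursor with the newly returned one, and use the result in the URL.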
Hello Community Members,
I am very new to the Python language and to programming. I am currently working on a news API that shows the news it returns, and I want the program to check for and react to any update to the API. Please help with what I can do to complete this.
CODE:
import requests

url = 'https://cryptopanic.com/api/v1/posts/?auth_token=<my token>&filter=hot'
html_link = requests.get(url)
datatype = html_link.json()
news_info = datatype['results']
latest_news = news_info[0]['title']
source = news_info[0]['source']['title']
print(latest_news)
I want the latest_news variable, which stores the news title, to print whenever there is a new item in the list. I have tried a comparison approach but haven't found anything that works so far.
Does this fit your criteria? You have to run it every 5 minutes, or any time you want, and you will get the latest titles.
import requests, json

old_news_info = {"news": []}
try:
    old_news_info = json.load(open("old_news_info.json", "r"))
except (FileNotFoundError, json.JSONDecodeError):
    pass

url = 'https://cryptopanic.com/api/v1/posts/?auth_token=<token>&filter=hot'
print("waiting for response")
html_link = requests.get(url)
datatype = html_link.json()

if datatype != {'status': 'Incomplete', 'info': 'Token not found'}:
    news_info = datatype['results']
    if not news_info[0] in old_news_info["news"]:
        for news in news_info:
            if news in old_news_info["news"]:
                break
            else:
                old_news_info["news"].append(news)
                print(news["source"]['title'])
        json.dump(old_news_info, open("old_news_info.json", "w"), indent=4)
else:
    print("Token not found")
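If you want the script itself to keep checking rather than being re-run every 5 minutes, a simple polling loop works. A hedged sketch, where fetch_latest is a placeholder for any function that returns the newest title (for example by requesting the API as above):

```python
import time

def watch(fetch_latest, interval_seconds=300, max_polls=None):
    """Yield the latest title each time it changes, polling at a fixed
    interval. max_polls limits iterations (None means run forever)."""
    last = None
    polls = 0
    while max_polls is None or polls < max_polls:
        current = fetch_latest()
        if current != last:
            # Only emit when the title actually changed.
            yield current
            last = current
        polls += 1
        time.sleep(interval_seconds)
```

For example, `for title in watch(get_title): print(title)` would print each new title as it appears.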
I'm trying to set up a loop to pull in weather data for about 500 weather stations for an entire year which I have in my dataframe. The base URL stays the same, and the only part that changes is the weather station ID.
I'd like to create a dataframe with the results. I believe I'd use requests.get to pull in data for all the weather stations in my list; the IDs to use in the URL are in a column called "API ID" in my dataframe. I am a Python beginner, so any help would be appreciated! My code is below but doesn't work and returns an error:
"InvalidSchema: No connection adapters were found for '0 " http://www.ncei.noaa.gov/access/services/data/...\nName: API ID, Length: 497, dtype: object'
def callAPI(API_id):
    for IDs in range(len(API_id)):
        url = ('http://www.ncei.noaa.gov/access/services/data/v1?dataset=daily-summaries&dataTypes=PRCP,SNOW,TMAX,TMIN&stations=' + distances['API ID'] + '&startDate=2020-01-01&endDate=2020-12-31&includeAttributes=0&includeStationName=true&units=standard&format=json')
        r = requests.request('GET', url)
        d = r.json()

ll = []
for index1, rows1 in distances.iterrows():
    station = rows1['Closest Station']
    API_id = rows1['API ID']
    data = callAPI(API_id)
    ll.append([(data)])
I am not sure about your whole code base, but this is the function that will return the data from the API. If you have multiple station IDs in a single dataframe column then you can use a for loop; otherwise there is no need.
Also, you are not returning the result from the function. Note the return keyword at the end of the function.
Working code:
import requests

def callAPI(API_id):
    url = ('http://www.ncei.noaa.gov/access/services/data/v1?dataset=daily-summaries&dataTypes=PRCP,SNOW,TMAX,TMIN&stations=' + API_id + '&startDate=2020-01-01&endDate=2020-12-31&includeAttributes=0&includeStationName=true&units=standard&format=json')
    r = requests.request('GET', url)
    d = r.json()
    return d

print(callAPI('USC00457180'))
So your full code will be something like this:

def callAPI(API_id):
    url = ('http://www.ncei.noaa.gov/access/services/data/v1?dataset=daily-summaries&dataTypes=PRCP,SNOW,TMAX,TMIN&stations=' + API_id + '&startDate=2020-01-01&endDate=2020-12-31&includeAttributes=0&includeStationName=true&units=standard&format=json')
    r = requests.request('GET', url)
    d = r.json()
    return d

ll = []
for index1, rows1 in distances.iterrows():
    station = rows1['Closest Station']
    API_id = rows1['API ID']
    data = callAPI(API_id)
    ll.append([data])
Note: Even better, use asynchronous calls to the API to make the process faster. Something like this: https://stackoverflow.com/a/56926297/1138192
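Since the ~500 requests are independent of each other, a thread pool from the standard library is often enough to run them concurrently. A sketch, where fetch_one stands in for the callAPI function above:

```python
from concurrent.futures import ThreadPoolExecutor

def fetch_all(api_ids, fetch_one, max_workers=8):
    """Apply fetch_one to many station IDs concurrently.

    pool.map preserves input order, so results line up with api_ids."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fetch_one, api_ids))
```

Called as `fetch_all(distances['API ID'], callAPI)`, this would return one JSON result per station; max_workers caps how many requests are in flight at once, which also keeps you polite to the API.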
I have the following program to scrape data from a website. I want to improve the code below by using a generator with yield instead of calling generate_url and callme multiple times sequentially. The purpose of this exercise is to properly understand yield and the contexts in which it can be used.
import requests
import shutil

start_date = '03-03-1997'
end_date = '10-04-2015'
yf_base_url = 'http://real-chart.finance.yahoo.com/table.csv?s=%5E'
index_list = ['BSESN', 'NSEI']

def generate_url(index, start_date, end_date):
    s_day = start_date.split('-')[0]
    s_month = start_date.split('-')[1]
    s_year = start_date.split('-')[2]
    e_day = end_date.split('-')[0]
    e_month = end_date.split('-')[1]
    e_year = end_date.split('-')[2]
    if (index == 'BSESN') or (index == 'NSEI'):
        url = yf_base_url + index + '&a={}&b={}&c={}&d={}&e={}&f={}'.format(s_day, s_month, s_year, e_day, e_month, e_year)
        return url

def callme(url, index):
    print('URL {}'.format(url))
    r = requests.get(url, verify=False, stream=True)
    if r.status_code != 200:
        print("Failure!!")
        exit()
    else:
        r.raw.decode_content = True
        with open(index + "file.csv", 'wb') as f:
            shutil.copyfileobj(r.raw, f)
        print("Success")

if __name__ == '__main__':
    url = generate_url(index_list[0], start_date, end_date)
    callme(url, index_list[0])
    url = generate_url(index_list[1], start_date, end_date)
    callme(url, index_list[1])
There are multiple options. You could use yield to iterate over URLs, or over request objects.
If your index_list were long, I would suggest yielding URLs.
Because then you could use multiprocessing.Pool to map a function that does a request and saves the output over these URLs. That would execute them in parallel, potentially making it a lot faster (assuming that you have enough network bandwidth, and that yahoo finance doesn't throttle connections).
import multiprocessing

import requests

yf = ('http://real-chart.finance.yahoo.com/table.csv?s=%5E'
      '{}&a={}&b={}&c={}&d={}&e={}&f={}')
index_list = ['BSESN', 'NSEI']

def genurl(symbols, start_date, end_date):
    # assemble the URLs
    s_day, s_month, s_year = start_date.split('-')
    e_day, e_month, e_year = end_date.split('-')
    for s in symbols:
        url = yf.format(s, s_day, s_month, s_year, e_day, e_month, e_year)
        yield url

def download(url):
    # Do the request, save the file
    ...

p = multiprocessing.Pool()
rv = p.map(download, genurl(index_list, '03-03-1997', '10-04-2015'))
If I understand you correctly, what you want to know is how to change the code so that you can replace the last part by
if __name__ == '__main__':
for url in generate_url(index_list,start_date,end_date):
callme(url,index)
If this is correct, you need to change generate_url, but not callme. Changing generate_url is rather mechanical. Make the first parameter index_list instead of index, wrap the function body in a for index in index_list loop, and change return url to yield url.
You don't need to change callme, because you never want to write something like for call in callme(...); you will only ever use it as a normal function call.
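Applied mechanically to the original function, that change looks like this sketch (same body, wrapped in a loop, with return changed to yield):

```python
yf_base_url = 'http://real-chart.finance.yahoo.com/table.csv?s=%5E'

def generate_url(index_list, start_date, end_date):
    # First parameter is now the whole list; one URL is yielded per index.
    s_day, s_month, s_year = start_date.split('-')
    e_day, e_month, e_year = end_date.split('-')
    for index in index_list:
        if index in ('BSESN', 'NSEI'):
            yield (yf_base_url + index +
                   '&a={}&b={}&c={}&d={}&e={}&f={}'.format(
                       s_day, s_month, s_year, e_day, e_month, e_year))
```

The __main__ block then becomes the for loop shown above, calling callme once per yielded URL.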
I've got a script that grabs every event off of a Google Calendar and then searches through those events, printing the ones that contain a search term to a file.
The problem I'm having is that I need them to be put in order by date, and this doesn't seem to do that.
while True:
    events = calendar_acc.events().list(calendarId='myCal', pageToken=page_token).execute()
    for event in events['items']:
        if 'Search Term' in event['summary']:
            # assign names to hit, and date
            find = event['summary']
            date = event['start'][u'dateTime']
            # only print the first ten digits of the date
            month = date[5:7]
            day = date[8:10]
            year = date[0:4]
            formatted_date = month + "/" + day + "/" + year
            # write a line
            messy.write(formatted_date + " " + event['summary'] + "\n\n")
I think there is a way to do this with the time module maybe, but I'm not sure. Any help is appreciated.
Just in case anyone else needs to do this, here is what I ended up with, with the help of jedwards.
I created an empty list, hits, and then appended the ['start']['dateTime'] (parsed as a datetime.datetime object) and ['summary'] to the list for each event that contained my "Search Term". Like so:
import dateutil.parser

hits = []
while True:
    events = calendar_acc.events().list(calendarId='My_Calendar_ID', pageToken=page_token).execute()
    for event in events['items']:
        if "Search Term" in event['summary']:
            hits.append((dateutil.parser.parse(event['start']['dateTime']), event['summary']))
    page_token = events.get('nextPageToken')
    if not page_token:
        break
Then you just sort the list. In my case, I cut the datetime object down to just the date and wrote the whole line to a file, but this code just prints it to the console:
hits.sort()
for d, summary in hits:
    date = "%d/%d/%d" % (d.month, d.day, d.year)
    final = str(date) + "\t\t" + str(summary)
    print(final)
Thanks again to jedwards in the comments!
You can return a list sorted by start time ascending from the API by setting the "orderBy" parameter to "startTime" (this also requires singleEvents=True); setting it to "updated" would instead order by last-modification time.

page_token = None
while True:
    events = service.events().list(calendarId=myID, orderBy='startTime', singleEvents=True, pageToken=page_token).execute()
    for event in events['items']:
        print(event)
    page_token = events.get('nextPageToken')
    if not page_token:
        break
For more information see: https://developers.google.com/calendar/v3/reference/events/list
Hope this helps.
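If you would rather keep the API call unchanged, the fetched events can also be sorted client-side. A sketch, assuming offset-style RFC 3339 timestamps such as 2023-05-01T10:00:00+00:00 (note that datetime.fromisoformat before Python 3.11 does not accept a trailing 'Z'):

```python
from datetime import datetime

def sort_events(events):
    """Sort Calendar event dicts by their start dateTime, client-side.

    Assumes each event has event['start']['dateTime'] with a UTC offset."""
    return sorted(events,
                  key=lambda e: datetime.fromisoformat(e['start']['dateTime']))
```

This replaces the hits.sort() approach above when you want to keep the full event dicts rather than (datetime, summary) tuples.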