Nutritionix API only returns first 10 results? - python

I'm trying to extract nutritional information using the Nutritionix API/database in Python. I was able to run a successful query and load the result into a pandas DataFrame. However, I'm confused because the resulting JSON claims that there are several thousand 'hits' for my query, but at most 10 are ever returned. For instance, when I query for Garbanzo, the JSON says that there are 513 total_hits, but only 10 are actually returned. Does anyone know what is causing this? The code I'm using is below.
import requests
import json
import pandas as pd
from nutritionix import Nutritionix
nix_apikey = ''
nix_appid = ''
nix = Nutritionix(app_id = nix_appid, api_key = nix_apikey)
results = nix.search('Garbanzo').json()
df = pd.json_normalize(results, record_path = ['hits'])
I'm not including my api_key or app_id for obvious reasons. Here's a link to the Nutritionix Python wrapper I'm using: https://github.com/leetrout/python-nutritionix
Thanks for any suggestions!
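A possible direction, offered as a guess rather than a confirmed answer: search endpoints like this usually page their results, and 10 looks like a default page size. The sketch below only builds "start:end" windows that together cover all 513 hits. The helper name result_windows, the page size of 50, and the idea of passing each window to nix.search() through a results keyword are all assumptions to check against the wrapper's README.

```python
def result_windows(total_hits, page_size=50):
    """Return "start:end" strings that together cover total_hits results."""
    return [
        f"{start}:{min(start + page_size, total_hits)}"
        for start in range(0, total_hits, page_size)
    ]

windows = result_windows(513)
print(windows[0], windows[-1], len(windows))

# Assumed usage, not verified against the library:
# for window in windows:
#     page = nix.search('Garbanzo', results=window).json()
```

If the keyword does exist, the 'hits' lists from each page could be concatenated before calling pd.json_normalize.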

Related

Requests Library in Python - Params Structure to Merge two json bodies

I am building a project to import crypto prices in json format that I can later parse out.
I have been researching, but I haven't found a reliable way to combine these two requests so that I make one call instead of two. I want to fetch both BTC and ETH with a single "req_params", but I don't know the right syntax, and a lot of sites have confusing material.
import requests
import json
from datetime import datetime,timedelta
base_url = 'https://api.kucoin.com/api/v1/market/candles'
start_timestmp = datetime.strptime("2021-11-11T11:11:11","%Y-%m-%dT%H:%M:%S")
end_timestmp = start_timestmp + timedelta(minutes=1)
# How do I combine these two statements into one call?
req_params = {'symbol':'BTC-USDT','type':'1min',
'startAt':int(datetime.timestamp(start_timestmp)),
'endAt': int(datetime.timestamp(end_timestmp))
}
req_params2 = {'symbol':'ETH-USDT','type':'1min',
'startAt':int(datetime.timestamp(start_timestmp)),
'endAt': int(datetime.timestamp(end_timestmp))
}
url_response = requests.get(url=base_url,params=req_params)
url_response2 = requests.get(url=base_url,params=req_params2)
response_json = url_response.json()
response_json2 = url_response2.json()
# pull the closing price out of the first BTC candle
closing_price = float(response_json['data'][0][2])
print(response_json)
print(response_json2)
print(closing_price)
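As far as I can tell, the candles endpoint takes a single symbol per request, so rather than merging the two dicts, one option is to build the parameters in a helper and loop over the symbols. This is a sketch under that assumption. build_params and fetch_candles are names I made up, and http can be the requests module itself or anything else with a compatible get().

```python
from datetime import datetime, timedelta

BASE_URL = 'https://api.kucoin.com/api/v1/market/candles'

def build_params(symbol, start, minutes=1):
    """Build the query parameters for one symbol and a short time window."""
    end = start + timedelta(minutes=minutes)
    return {
        'symbol': symbol,
        'type': '1min',
        'startAt': int(start.timestamp()),
        'endAt': int(end.timestamp()),
    }

def fetch_candles(symbols, start, http):
    """Issue one GET per symbol; http must offer a requests-style get()."""
    return {
        symbol: http.get(BASE_URL, params=build_params(symbol, start)).json()
        for symbol in symbols
    }

start = datetime.strptime("2021-11-11T11:11:11", "%Y-%m-%dT%H:%M:%S")
all_params = [build_params(s, start) for s in ('BTC-USDT', 'ETH-USDT')]
print(all_params)
```

With requests installed, fetch_candles(['BTC-USDT', 'ETH-USDT'], start, requests) would return one JSON body per symbol, keyed by symbol.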

What is the most effective method for adding nested data to a pandas dataframe?

I have been trying to add the JSON data from this API to a pandas data frame. Here is the code I have tried:
url = 'https://api.covid19api.com/summary'
df = pd.read_json(url)
print(df.head())
When running this code, I receive the following error:
ValueError: Mixing dicts with non-Series may lead to ambiguous ordering.
Any advice on this would be helpful. Thanks in advance.
Hi Matt, and welcome to SO. Whenever you work with JSON, it's better to fetch the data first and have a look at it. In your particular case, the value under the key Global has a different structure from the entries in Countries, which is why you get that error.
import urllib.request
import json
import pandas as pd
url = 'https://api.covid19api.com/summary'
response = urllib.request.urlopen(url)
# the following is the data you should explore
data = json.loads(response.read())
df = pd.DataFrame(data["Countries"])
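A quick way to "have a look" is to print the type of each top-level value before handing anything to pandas. The payload below is a made-up sample in roughly the same shape as the response (the numbers and country names are invented), and describe is a hypothetical helper.

```python
import json

# Invented sample: one dict, one list of dicts, and one plain string.
sample = json.loads('''
{
  "Global": {"NewConfirmed": 100, "TotalConfirmed": 2000},
  "Countries": [
    {"Country": "A", "TotalConfirmed": 1200},
    {"Country": "B", "TotalConfirmed": 800}
  ],
  "Date": "2020-04-05T06:37:00Z"
}
''')

def describe(payload):
    """Map each top-level key to the type name of its value."""
    return {key: type(value).__name__ for key, value in payload.items()}

shape = describe(sample)
print(shape)
```

Seeing a dict, a list and a string side by side makes it clear why a single read_json call cannot line them up as columns.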
The JSON has three top-level elements ('Global', 'Countries' and 'Date'), so it makes sense to split it into separate dataframes, which is not easy to do with pandas.read_json().
import pandas as pd
import requests
url = 'https://api.covid19api.com/summary'
r = requests.get(url)
data = r.json()
# 'Global' is a single dict of scalar values, so wrap it in a list to get a one-row frame
global_data = pd.DataFrame([data['Global']])
countries = pd.DataFrame(data['Countries'])

Json problems to csv

I'm trying to get some stats from the NBA stats page. I'm following the idea in this tutorial:
https://towardsdatascience.com/using-python-pandas-and-plotly-to-generate-nba-shot-charts-e28f873a99cb
The basic idea is to put the data into a CSV file.
So I tried this code to fetch the JSON from the NBA site and then convert it to a CSV:
import requests
import json
import pandas as pd
from pandas import DataFrame as df
import urllib.request
shot_data_url_start="https://stats.nba.com/events/?flag=3&CFID=33&CFPARAMS=2017-18&PlayerID="
player_id="202695"
shot_data_url_end="&ContextMeasure=FGA&Season=2017-18&section=player&sct=plot"
def shoy_chart(player_id):
    full_url = shot_data_url_start + str(player_id) + shot_data_url_end
    json = requests.get(full_url, headers=headers).json()
    return(json)
data = json['resultSets'][0]['rowSets']
columns = json['resultSets'][0]['headers']
df = pd.DataFrame.from_records(data, columns=columns)
And this is the error that the notebook shows me:
TypeError Traceback (most recent call last)
<ipython-input-42-a3452c3a4fc8> in <module>
18
19
---> 20 data = json['resultSets'][0]['rowSets']
21 columns = json['resultSets'][0]['headers']
22
TypeError: 'module' object is not subscriptable
Can anyone help me, or does anyone know another way to get the data into a .csv or Excel file?
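When imported with import json, the name json refers to the JSON module of the Python standard library. In your code, the variable called json is assigned only inside the function, so at module level json is still the module, and a module cannot be indexed with square brackets. If you rename the variable to something else, such as response_json, and keep the result that the function returns, this part of your code will work.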
Regarding the rest of the code, the page https://stats.nba.com/events/ doesn't return any JSON text, it is a regular web page with images, menus, a video player, etc... If you want to access the API that returns the shots in JSON format, you will have to use the https://stats.nba.com/stats/shotchartdetail (with the right query string). This API endpoint is mentioned in the tutorial, in the "Chrome XHR tab and resulting json linked by url" image.
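To make the point about the json name concrete, here is a small offline sketch. The payload is a made-up stand-in for the API response, and load_payload is a hypothetical helper. Assigning to a name inside a function does not change what json means at module level, so indexing json outside the function still hits the module.

```python
import json

def load_payload():
    # A distinct local name keeps the json module usable inside the function.
    response_json = json.loads('{"resultSets": [{"headers": ["A"], "rowSet": [[1]]}]}')
    return response_json

# Keep the returned value; this is what should be indexed.
payload = load_payload()
headers = payload['resultSets'][0]['headers']

try:
    json['resultSets']  # json is still the module here
except TypeError as exc:
    error_message = str(exc)

print(headers, error_message)
```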
Ok I've changed the code like this:
import requests
import json
import pandas as pd
from pandas import DataFrame as df
import urllib.request
def shot_chart(player_id):
    full_url = "https://stats.nba.com/stats/shotchartdetail?AheadBehind=&CFID=33&CFPARAMS=2017-18&ClutchTime=&Conference=&ContextFilter=&ContextMeasure=FGA&DateFrom=&DateTo=&Division=&EndPeriod=10&EndRange=28800&GROUP_ID=&GameEventID=&GameID=&GameSegment=&GroupID=&GroupMode=&GroupQuantity=5&LastNGames=0&LeagueID=00&Location=&Month=0&OnOff=&OpponentTeamID=0&Outcome=&PORound=0&Period=0&PlayerID=202695&PlayerID1=&PlayerID2=&PlayerID3=&PlayerID4=&PlayerID5=&PlayerPosition=&PointDiff=&Position=&RangeType=0&RookieYear=&Season=2017-18&SeasonSegment=&SeasonType=Regular+Season&ShotClockRange=&StartPeriod=1&StartRange=0&StarterBench=&TeamID=0&VsConference=&VsDivision=&VsPlayerID1=&VsPlayerID2=&VsPlayerID3=&VsPlayerID4=&VsPlayerID5=&VsTeamID="
    response_json = requests.get(full_url, headers=headers)
    return(response_json)
data = response_json['resultSets'][0]['rowSets']
columns = response_json['resultSets'][0]['headers']
df = pd.DataFrame.from_records(data, columns=columns)
import requests
import json
import pandas as pd
from pandas import DataFrame as df
import urllib.request
shot_data_url_start="https://stats.nba.com/stats/shotchartdetail?AheadBehind=&CFID=33&CFPARAMS=2019-20&ClutchTime=&Conference=&ContextFilter=&ContextMeasure=FGA&DateFrom=&DateTo=&Division=&EndPeriod=10&EndRange=28800&GROUP_ID=&GameEventID=&GameID=&GameSegment=&GroupID=&GroupMode=&GroupQuantity=5&LastNGames=0&LeagueID=00&Location=&Month=0&OnOff=&OpponentTeamID=0&Outcome=&PORound=0&Period=0&PlayerID="
player_id="202330"
shot_data_url_end="&PlayerID1=&PlayerID2=&PlayerID3=&PlayerID4=&PlayerID5=&PlayerPosition=&PointDiff=&Position=&RangeType=0&RookieYear=&Season=2019-20&SeasonSegment=&SeasonType=Regular+Season&ShotClockRange=&StartPeriod=1&StartRange=0&StarterBench=&TeamID=0&VsConference=&VsDivision=&VsPlayerID1=&VsPlayerID2=&VsPlayerID3=&VsPlayerID4=&VsPlayerID5=&VsTeamID="
def shot_chart(player_id):
    full_url = shot_data_url_start + str(player_id) + shot_data_url_end
    response_json = requests.get(full_url).json()
    return(response_json)
data = response_json['resultSets'][0]['rowSets']
columns = response_json['resultSets'][0]['headers']
df = pd.DataFrame.from_records(data, columns=columns)
shot_chart("202330")
What is going on now? The notebook is stuck right now.
Try this out
import requests
import pandas as pd
shot_data_url_start = "https://stats.nba.com/stats/shotchartdetail?AheadBehind=&CFID=33&CFPARAMS=2017-18&ClutchTime=&Conference=&ContextFilter=&ContextMeasure=FGA&DateFrom=&DateTo=&Division=&EndPeriod=10&EndRange=28800&GROUP_ID=&GameEventID=&GameID=&GameSegment=&GroupID=&GroupMode=&GroupQuantity=5&LastNGames=0&LeagueID=00&Location=&Month=0&OnOff=&OpponentTeamID=0&Outcome=&PORound=0&Period=0&PlayerID="
player_id = "204001"
shot_data_url_end = "&PlayerID1=&PlayerID2=&PlayerID3=&PlayerID4=&PlayerID5=&PlayerPosition=&PointDiff=&Position=&RangeType=0&RookieYear=&Season=2017-18&SeasonSegment=&SeasonType=Regular+Season&ShotClockRange=&StartPeriod=1&StartRange=0&StarterBench=&TeamID=0&VsConference=&VsDivision=&VsPlayerID1=&VsPlayerID2=&VsPlayerID3=&VsPlayerID4=&VsPlayerID5=&VsTeamID="
def get_shot_data(player_id):
    full_url = shot_data_url_start + player_id + shot_data_url_end
    data = requests.get(
        full_url,
        headers = {
            "User-Agent": "PostmanRuntime/7.4.0"
        }
    )
    return data.json()
shot_results = get_shot_data(player_id)
result_sets = shot_results['resultSets']
first_result_set = result_sets[0]
row_set = first_result_set['rowSet']
set_headers = first_result_set['headers']
df = pd.DataFrame.from_records(row_set, columns=set_headers)
I see how you got confused by that Medium post. You were missing the headers, and the URL for the NBA API wasn't right. That's what @pierre was trying to say in his response. If you reread the post you were following, you'll see that the author said he had to dig into the browser dev tools to find the actual URL that returns the JSON.
Edit: I forgot to mention that when I didn't pass a User-Agent in the headers, the request would time out. If you don't pass one, you won't get a successful response.
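Since the original goal was a CSV file, here is a minimal offline sketch of that last step using only the standard library. The payload below is invented sample data in the same headers/rowSet shape, and rows_to_csv is a hypothetical helper name.

```python
import csv
import io

# Invented sample in the same shape as one entry of 'resultSets'.
sample = {
    'resultSets': [{
        'headers': ['PLAYER_NAME', 'LOC_X', 'LOC_Y', 'SHOT_MADE_FLAG'],
        'rowSet': [['Player A', -12, 45, 1], ['Player B', 130, 88, 0]],
    }]
}

def rows_to_csv(result_set):
    """Serialise one result set (headers plus rows) to CSV text."""
    buffer = io.StringIO(newline='')
    writer = csv.writer(buffer)
    writer.writerow(result_set['headers'])
    writer.writerows(result_set['rowSet'])
    return buffer.getvalue()

csv_text = rows_to_csv(sample['resultSets'][0])
print(csv_text)
```

If you already have the DataFrame built from row_set and set_headers, df.to_csv('shots.csv', index=False) writes the same table to disk.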

Search through JSON query from Valve API in Python

I am looking to find various statistics about players in games such as CS:GO from the Steam Web API, but cannot work out how to search through the JSON returned from the query (e.g. here) in Python.
I just need to get a specific part of the list that is returned, e.g. finding total_kills from the link above. If I had a way to search through all of the information and filter it down to just that one value (in this case total_kills), that would help a lot!
The code I have at the moment to turn it into something Python can read is:
url = "http://api.steampowered.com/IPlayerService/GetOwnedGames/v0001/?key=FE3C600EB76959F47F80C707467108F2&steamid=76561198185148697&include_appinfo=1"
data = requests.get(url).text
data = json.loads(data)
If you are looking for a way to search through the stats list then try this:
import requests
import json
def findstat(data, stat_name):
    for stat in data['playerstats']['stats']:
        if stat['name'] == stat_name:
            return stat['value']
url = "http://api.steampowered.com/ISteamUserStats/GetUserStatsForGame/v0002/?appid=730&key=FE3C600EB76959F47F80C707467108F2&steamid=76561198185148697"
data = requests.get(url).text
data = json.loads(data)
total_kills = findstat(data, 'total_kills') # change 'total_kills' to your desired stat name
print(total_kills)
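If you need more than one stat, a variation on the same idea is to turn the list into a dict once and look values up by name. This is only a sketch: the payload is invented sample data in the same playerstats/stats shape, and stats_by_name is a hypothetical helper.

```python
# Invented sample in the same shape as the GetUserStatsForGame response.
sample = {
    'playerstats': {
        'stats': [
            {'name': 'total_kills', 'value': 1234},
            {'name': 'total_deaths', 'value': 1100},
        ]
    }
}

def stats_by_name(data):
    """Index the list of {'name', 'value'} entries by stat name."""
    return {stat['name']: stat['value'] for stat in data['playerstats']['stats']}

stats = stats_by_name(sample)
total_kills = stats.get('total_kills')
missing = stats.get('no_such_stat')  # .get() returns None for unknown names
print(total_kills, missing)
```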

Parse/split URLs in a pandas dataframe using urllib

I'm trying to split URLs and put the fragments in a dataframe. I found this thread, "pythonic way to parse/split URLs in a pandas dataframe", and tried to apply it, but for some reason it gives me an error.
I am on Python 3.x, so I used the following:
import pandas
import urllib
urls = ['https://www.google.com/something','https://mail.google.com/anohtersomething', 'https://www.amazon.com/yetanotherthing']
df['protocol'],df['domain'],df['path'],df['query'],df['fragment'] = zip(*df['url'].map(urllib.parse.urlsplit))
I get an error saying KeyError: 'urls', and I'm not sure what it means.
If someone could help, that would be great. Thanks.
The example you used assumes that the links are already in a dataframe column. Here's a version that works from a plain list:
import urllib.parse
import pandas as pd
df = pd.DataFrame()
urls = ['https://www.google.com/something','https://mail.google.com/anohtersomething', 'https://www.amazon.com/yetanotherthing']
df['protocol'],df['domain'],df['path'],df['query'],df['fragment'] = zip(*[urllib.parse.urlsplit(x) for x in urls])
Result
  protocol           domain               path query fragment
0    https   www.google.com         /something
1    https  mail.google.com  /anohtersomething
2    https   www.amazon.com   /yetanotherthing
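To see what ends up in each column, here is a small sketch that uses only the standard library. FIELDS and split_url are names I made up, and the second URL is a hypothetical one chosen so that the query and fragment columns are not empty.

```python
from urllib.parse import urlsplit

# Column names used in the answer, in the order urlsplit returns its parts.
FIELDS = ('protocol', 'domain', 'path', 'query', 'fragment')

def split_url(url):
    """Pair each part of the split URL with its column name."""
    return dict(zip(FIELDS, urlsplit(url)))

rows = [split_url(u) for u in [
    'https://www.google.com/something',
    'https://shop.example.com/search?q=shoes#top',  # hypothetical URL
]]
print(rows)
```

pd.DataFrame(rows) would then give one column per field.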
