I'm trying to get some stats from the NBA stats page, following the idea of this tutorial:
https://towardsdatascience.com/using-python-pandas-and-plotly-to-generate-nba-shot-charts-e28f873a99cb
The basic idea is to put the data into a CSV file.
So I tried this code to get the data from the NBA website, fetching the JSON and then converting it to a CSV:
import requests
import json
import pandas as pd
from pandas import DataFrame as df
import urllib.request
shot_data_url_start="https://stats.nba.com/events/?flag=3&CFID=33&CFPARAMS=2017-18&PlayerID="
player_id="202695"
shot_data_url_end="&ContextMeasure=FGA&Season=2017-18&section=player&sct=plot"
def shoy_chart(player_id):
    full_url = shot_data_url_start + str(player_id) + shot_data_url_end
    json = requests.get(full_url, headers=headers).json()
    return(json)
data = json['resultSets'][0]['rowSets']
columns = json['resultSets'][0]['headers']
df = pd.DataFrame.from_records(data, columns=columns)
And this is the error the notebook shows:
TypeError Traceback (most recent call last)
<ipython-input-42-a3452c3a4fc8> in <module>
18
19
---> 20 data = json['resultSets'][0]['rowSets']
21 columns = json['resultSets'][0]['headers']
22
TypeError: 'module' object is not subscriptable
Can anyone help me, or does anyone know another way to get the data into a .csv or Excel file?
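After import json, the name json refers to the JSON module of the Python standard library. The json = ... assignment inside shoy_chart only creates a local variable (and the function is never even called), so at the top level json['resultSets'] tries to subscript the module itself, which is exactly what the error says. Don't reuse the module's name as a variable name: if you rename your variable to something else such as response_json, and use the value returned by the function, this part of your code will work.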
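A small, network-free sketch of the name clash described above. The FAKE_RESPONSE dict is a made-up stand-in for the API response, and shot_chart_buggy / shot_chart_fixed are hypothetical names. The assignment inside the function only binds a local name, so the module-level json is still the module and subscripting it raises the TypeError from the question.

```python
import json

# Made-up stand-in for the API response (not real NBA data).
FAKE_RESPONSE = {"resultSets": [{"headers": ["PLAYER_ID"], "rowSet": [[202695]]}]}


def shot_chart_buggy():
    # Binds a *local* name json; the module-level json is untouched.
    json = FAKE_RESPONSE
    return json


def shot_chart_fixed():
    # A distinct name cannot be confused with the json module.
    response_json = FAKE_RESPONSE
    return response_json


shot_chart_buggy()
try:
    json["resultSets"]  # json is still the standard-library module here
except TypeError as exc:
    error_message = str(exc)

print(error_message)  # 'module' object is not subscriptable

# Using the value the function returns works as intended.
response_json = shot_chart_fixed()
columns = response_json["resultSets"][0]["headers"]
```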
Regarding the rest of the code, the page https://stats.nba.com/events/ doesn't return JSON; it is a regular web page with images, menus, a video player, etc. If you want to access the API that returns the shots in JSON format, you have to use the https://stats.nba.com/stats/shotchartdetail endpoint (with the right query string). This API endpoint is mentioned in the tutorial, in the "Chrome XHR tab and resulting json linked by url" image.
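Rather than pasting one very long URL, the query string for shotchartdetail can be built from a dict. This is a sketch under assumptions: the parameter values are the non-empty ones copied from the long URLs in this thread, build_shot_chart_url is a hypothetical helper, and the API may also expect the many empty parameters (GameID=, Outcome=, ...) that those URLs send.

```python
from urllib.parse import urlencode

SHOT_CHART_URL = "https://stats.nba.com/stats/shotchartdetail"


def build_shot_chart_url(player_id, season="2017-18"):
    """Build a shotchartdetail URL from a dict of query parameters.

    Assumption: these are the non-empty values from the long URL used in
    this thread; the endpoint may require the empty ones as well.
    """
    params = {
        "CFID": 33,
        "CFPARAMS": season,
        "ContextMeasure": "FGA",
        "EndPeriod": 10,
        "EndRange": 28800,
        "GroupQuantity": 5,
        "LastNGames": 0,
        "LeagueID": "00",
        "Month": 0,
        "OpponentTeamID": 0,
        "PORound": 0,
        "Period": 0,
        "PlayerID": player_id,
        "RangeType": 0,
        "Season": season,
        "SeasonType": "Regular Season",
        "StartPeriod": 1,
        "StartRange": 0,
        "TeamID": 0,
    }
    # urlencode percent-encodes the values; spaces become '+'.
    return f"{SHOT_CHART_URL}?{urlencode(params)}"


url = build_shot_chart_url("202695")
```

With requests you can skip the manual encoding and pass the same dict as requests.get(SHOT_CHART_URL, params=params, headers=...), which builds the query string for you.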
OK, I've changed the code like this:
import requests
import json
import pandas as pd
from pandas import DataFrame as df
import urllib.request
def shot_chart(player_id):
    full_url = "https://stats.nba.com/stats/shotchartdetail?AheadBehind=&CFID=33&CFPARAMS=2017-18&ClutchTime=&Conference=&ContextFilter=&ContextMeasure=FGA&DateFrom=&DateTo=&Division=&EndPeriod=10&EndRange=28800&GROUP_ID=&GameEventID=&GameID=&GameSegment=&GroupID=&GroupMode=&GroupQuantity=5&LastNGames=0&LeagueID=00&Location=&Month=0&OnOff=&OpponentTeamID=0&Outcome=&PORound=0&Period=0&PlayerID=202695&PlayerID1=&PlayerID2=&PlayerID3=&PlayerID4=&PlayerID5=&PlayerPosition=&PointDiff=&Position=&RangeType=0&RookieYear=&Season=2017-18&SeasonSegment=&SeasonType=Regular+Season&ShotClockRange=&StartPeriod=1&StartRange=0&StarterBench=&TeamID=0&VsConference=&VsDivision=&VsPlayerID1=&VsPlayerID2=&VsPlayerID3=&VsPlayerID4=&VsPlayerID5=&VsTeamID="
    response_json = requests.get(full_url, headers=headers)
    return(response_json)
data = response_json['resultSets'][0]['rowSets']
columns = response_json['resultSets'][0]['headers']
df = pd.DataFrame.from_records(data, columns=columns)
import requests
import json
import pandas as pd
from pandas import DataFrame as df
import urllib.request
shot_data_url_start="https://stats.nba.com/stats/shotchartdetail?AheadBehind=&CFID=33&CFPARAMS=2019-20&ClutchTime=&Conference=&ContextFilter=&ContextMeasure=FGA&DateFrom=&DateTo=&Division=&EndPeriod=10&EndRange=28800&GROUP_ID=&GameEventID=&GameID=&GameSegment=&GroupID=&GroupMode=&GroupQuantity=5&LastNGames=0&LeagueID=00&Location=&Month=0&OnOff=&OpponentTeamID=0&Outcome=&PORound=0&Period=0&PlayerID="
player_id="202330"
shot_data_url_end="&PlayerID1=&PlayerID2=&PlayerID3=&PlayerID4=&PlayerID5=&PlayerPosition=&PointDiff=&Position=&RangeType=0&RookieYear=&Season=2019-20&SeasonSegment=&SeasonType=Regular+Season&ShotClockRange=&StartPeriod=1&StartRange=0&StarterBench=&TeamID=0&VsConference=&VsDivision=&VsPlayerID1=&VsPlayerID2=&VsPlayerID3=&VsPlayerID4=&VsPlayerID5=&VsTeamID="
def shot_chart(player_id):
    full_url = shot_data_url_start + str(player_id) + shot_data_url_end
    response_json = requests.get(full_url).json()
    return(response_json)
data = response_json['resultSets'][0]['rowSets']
columns = response_json['resultSets'][0]['headers']
df = pd.DataFrame.from_records(data, columns=columns)
shot_chart("202330")
What is going on now? The notebook is stuck.
Try this out:
import requests
import pandas as pd
shot_data_url_start = "https://stats.nba.com/stats/shotchartdetail?AheadBehind=&CFID=33&CFPARAMS=2017-18&ClutchTime=&Conference=&ContextFilter=&ContextMeasure=FGA&DateFrom=&DateTo=&Division=&EndPeriod=10&EndRange=28800&GROUP_ID=&GameEventID=&GameID=&GameSegment=&GroupID=&GroupMode=&GroupQuantity=5&LastNGames=0&LeagueID=00&Location=&Month=0&OnOff=&OpponentTeamID=0&Outcome=&PORound=0&Period=0&PlayerID="
player_id = "204001"
shot_data_url_end = "&PlayerID1=&PlayerID2=&PlayerID3=&PlayerID4=&PlayerID5=&PlayerPosition=&PointDiff=&Position=&RangeType=0&RookieYear=&Season=2017-18&SeasonSegment=&SeasonType=Regular+Season&ShotClockRange=&StartPeriod=1&StartRange=0&StarterBench=&TeamID=0&VsConference=&VsDivision=&VsPlayerID1=&VsPlayerID2=&VsPlayerID3=&VsPlayerID4=&VsPlayerID5=&VsTeamID="
def get_shot_data(player_id):
    full_url = shot_data_url_start + player_id + shot_data_url_end
    # Without a User-Agent header, the request tends to time out.
    data = requests.get(
        full_url,
        headers={
            "User-Agent": "PostmanRuntime/7.4.0"
        }
    )
    return data.json()
shot_results = get_shot_data(player_id)
result_sets = shot_results['resultSets']
first_result_set = result_sets[0]
row_set = first_result_set['rowSet']
set_headers = first_result_set['headers']
df = pd.DataFrame.from_records(row_set, columns=set_headers)
I can see how you got confused by that Medium post. You were missing the headers, and the URL for the NBA API wasn't right; that's what @pierre was trying to say in his answer. If you reread the post you were following, you'll see that the author had to dig into the browser dev tools to find the actual URL that returns the JSON. Also note that each result set stores its rows under the key rowSet, not rowSets.
Edit: I forgot to mention that when I didn't pass a User-Agent in the headers, the request would time out, which would explain why your notebook seems stuck. If you don't pass one, you won't get a successful response.
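Since the end goal was a CSV or Excel file, here is a sketch of the last step once you have the parsed JSON. The sample_payload below is a hypothetical, heavily trimmed stand-in with the same resultSets / headers / rowSet shape; in practice you would pass the result of get_shot_data(player_id) instead. result_set_to_frame is a hypothetical helper name.

```python
import pandas as pd


def result_set_to_frame(payload, index=0):
    """Turn one entry of payload['resultSets'] into a DataFrame."""
    result_set = payload["resultSets"][index]
    # Rows live under 'rowSet' (singular); column names under 'headers'.
    return pd.DataFrame(result_set["rowSet"], columns=result_set["headers"])


# Hypothetical, trimmed payload with the same shape as the API response.
sample_payload = {
    "resultSets": [
        {
            "name": "Shot_Chart_Detail",
            "headers": ["PLAYER_ID", "LOC_X", "LOC_Y", "SHOT_MADE_FLAG"],
            "rowSet": [[204001, -12, 85, 1], [204001, 230, 40, 0]],
        }
    ]
}

shots = result_set_to_frame(sample_payload)
# Write without the pandas index so the CSV holds only the API columns.
shots.to_csv("shots.csv", index=False)
# For Excel instead (needs an engine such as openpyxl installed):
# shots.to_excel("shots.xlsx", index=False)
```

Passing timeout= (in seconds) to requests.get is also worth considering, so that a stalled request raises an exception instead of hanging the notebook indefinitely.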
Related
I'm trying to extract nutritional information using the Nutritionix API/database in Python. I was able to get a successful query and put it into a pandas dataframe. However, I'm a bit confused: the resulting JSON claims that there are several thousand 'hits' for my query, but at most 10 are ever returned. For instance, when I query for Garbanzo, the JSON says that there are 513 total_hits, but only 10 are actually returned. Does anyone know what is causing this? The code I'm using is below.
import requests
import json
import pandas as pd
from nutritionix import Nutritionix
nix_apikey = ''
nix_appid = ''
nix = Nutritionix(app_id = nix_appid, api_key = nix_apikey)
results = nix.search('Garbanzo').json()
df = pd.json_normalize(results, record_path = ['hits'])
I'm not including my api_key or app_id, for obvious reasons. Here's a link to the Python Nutritionix library I'm using: https://github.com/leetrout/python-nutritionix
Thanks for any suggestions!
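A total_hits count much larger than the number of hits returned usually means the API pages its results and returns one fixed-size page per request. The sketch below shows a generic way to collect every page; it assumes a hypothetical fetch_page(offset, limit) callable returning a dict with total_hits and hits keys, and fake_fetch is a local stand-in for a real call. The actual Nutritionix paging parameters need to be checked in its documentation.

```python
def fetch_all_hits(fetch_page, page_size=10, max_hits=None):
    """Collect hits from a paged search API.

    fetch_page is a hypothetical callable taking (offset, limit) and
    returning a dict with 'total_hits' and 'hits' keys.
    """
    hits = []
    offset = 0
    while True:
        page = fetch_page(offset, page_size)
        batch = page["hits"]
        hits.extend(batch)
        offset += len(batch)
        total = page["total_hits"]
        if max_hits is not None:
            total = min(total, max_hits)
        # Stop once we have enough hits or the API returns an empty page.
        if not batch or offset >= total:
            break
    return hits[:total]


def fake_fetch(offset, limit):
    """Stand-in for a real API call: 25 hits, served in pages."""
    items = list(range(25))
    return {"total_hits": len(items), "hits": items[offset:offset + limit]}


all_hits = fetch_all_hits(fake_fetch)
```

Stopping on an empty page as well as on total_hits guards against an API that reports more hits than it will actually serve.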
I am building a project to import crypto prices in JSON format that I can later parse.
I have been researching, but I haven't found a reliable way to combine these two requests so that I make one call instead of two. I want to request both BTC and ETH, as you can see below, with a single "req_params"; I just don't know the right syntax, and a lot of the sites have confusing material.
import requests
import json
from datetime import datetime,timedelta
base_url = 'https://api.kucoin.com/api/v1/market/candles'
start_timestmp = datetime.strptime("2021-11-11T11:11:11","%Y-%m-%dT%H:%M:%S")
end_timestmp = start_timestmp + timedelta(minutes=1)
# How do I combine these two statements into one call?
req_params = {'symbol':'BTC-USDT','type':'1min',
'startAt':int(datetime.timestamp(start_timestmp)),
'endAt': int(datetime.timestamp(end_timestmp))
}
req_params2 = {'symbol':'ETH-USDT','type':'1min',
'startAt':int(datetime.timestamp(start_timestmp)),
'endAt': int(datetime.timestamp(end_timestmp))
}
url_response = requests.get(url=base_url,params=req_params)
url_response2 = requests.get(url=base_url,params=req_params2)
response_json = url_response.json()
response_json2 = url_response2.json()
# put the response in json format
closing_price = float(response_json['data'][0][2])
print(response_json)
print(response_json2)
print(closing_price)
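Assuming the candles endpoint accepts only one symbol per request (which is how it is used above), a common approach is to build one parameter dict per trading pair and loop, reusing a single requests.Session. In this sketch, build_params and fetch_candles are hypothetical helper names, and the 10-second timeout is an illustrative choice.

```python
from datetime import datetime, timedelta

import requests

BASE_URL = "https://api.kucoin.com/api/v1/market/candles"


def build_params(symbols, start, end, candle_type="1min"):
    """Return one query-parameter dict per trading pair."""
    return {
        symbol: {
            "symbol": symbol,
            "type": candle_type,
            "startAt": int(start.timestamp()),
            "endAt": int(end.timestamp()),
        }
        for symbol in symbols
    }


def fetch_candles(symbols, start, end):
    """Request candles for each symbol in turn over one shared session."""
    results = {}
    with requests.Session() as session:
        for symbol, params in build_params(symbols, start, end).items():
            response = session.get(BASE_URL, params=params, timeout=10)
            response.raise_for_status()
            results[symbol] = response.json()
    return results


start = datetime.strptime("2021-11-11T11:11:11", "%Y-%m-%dT%H:%M:%S")
end = start + timedelta(minutes=1)
params_by_symbol = build_params(["BTC-USDT", "ETH-USDT"], start, end)
# candles = fetch_candles(["BTC-USDT", "ETH-USDT"], start, end)  # network call
```

This still makes one HTTP request per pair, but the calling code deals with a single function and a single dict of results keyed by symbol.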
I have been trying to add the JSON data from this API to a pandas data frame. Here is the code I have tried:
import pandas as pd
url = 'https://api.covid19api.com/summary'
df = pd.read_json(url)
print(df.head())
When running this code, I receive the following error:
ValueError: Mixing dicts with non-Series may lead to ambiguous ordering.
Any advice on this would be helpful. Thanks in advance.
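Hi Matt, and welcome to SO. Whenever you work with JSON, it's better to first fetch the data and have a look at it. In your particular case, the value under the Global key has a different structure from the one under Countries (a single dict versus a list of dicts), and pandas can't line them up as columns of one DataFrame; that's why you get that error.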
import urllib.request
import json
import pandas as pd
url = 'https://api.covid19api.com/summary'
response = urllib.request.urlopen(url)
# the following is the data you should explore
data = json.loads(response.read())
df = pd.DataFrame(data["Countries"])
The JSON has several top-level elements ('Global', 'Countries' and 'Date'), so it makes sense to split it into separate dataframes, which is not easy to do with pandas.read_json().
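import requests
import pandas as pd
url = 'https://api.covid19api.com/summary'
r = requests.get(url)
data = r.json()
# 'Global' is a single dict of scalars, so wrap it in a list to get one row
global_data = pd.DataFrame([data['Global']])
# 'Countries' is a list of dicts: one row per country
countries = pd.DataFrame(data['Countries'])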
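An offline sketch of why the split is needed. The summary dict is a hypothetical, trimmed payload that only mimics the top-level layout described above (a dict under Global, a list of dicts under Countries, a string under Date); the country names and figures are made up.

```python
import pandas as pd

# Hypothetical, trimmed payload mimicking the /summary top-level layout.
summary = {
    "Global": {"NewConfirmed": 100, "TotalConfirmed": 5000},
    "Countries": [
        {"Country": "Country A", "TotalConfirmed": 10},
        {"Country": "Country B", "TotalConfirmed": 20},
    ],
    "Date": "2020-01-01T00:00:00Z",
}

# Building one frame from the whole dict reproduces the question's error.
try:
    pd.DataFrame(summary)
except ValueError as exc:
    error_message = str(exc)

# A dict of scalars becomes a single row when wrapped in a list.
global_df = pd.DataFrame([summary["Global"]])
# A list of dicts gives one row per country.
countries_df = pd.DataFrame(summary["Countries"])
# The date is a single string, so it can simply be kept as a value.
report_date = summary["Date"]
```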
I'm trying to split URLs and put the fragments in a dataframe. I found the thread "pythonic way to parse/split URLs in a pandas dataframe" and tried to apply it, but for some reason it gives me an error.
I am on Python 3.x, so I used the following:
import pandas
import urllib
urls = ['https://www.google.com/something','https://mail.google.com/anohtersomething', 'https://www.amazon.com/yetanotherthing']
df['protocol'],df['domain'],df['path'],df['query'],df['fragment'] = zip(*df['url'].map(urllib.parse.urlsplit))
I get an error saying KeyError: 'urls', and I'm not sure what it means.
If someone could help, that would be great. Thanks.
The example you used assumes that the links are already in a column of a dataframe. Here's a working solution:
import urllib.parse
import pandas as pd
df = pd.DataFrame()
urls = ['https://www.google.com/something','https://mail.google.com/anohtersomething', 'https://www.amazon.com/yetanotherthing']
df['protocol'],df['domain'],df['path'],df['query'],df['fragment'] = zip(*[urllib.parse.urlsplit(x) for x in urls])
Result:
   protocol           domain               path  query  fragment
0     https   www.google.com         /something
1     https  mail.google.com  /anohtersomething
2     https   www.amazon.com   /yetanotherthing
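If the URLs already live in a dataframe column (the situation the linked thread assumes, with the column named url rather than urls), one alternative sketch is to build the parts frame straight from the urlsplit results and join it back. The column names protocol, domain, etc. are just the labels chosen in this answer; urlsplit itself calls them scheme, netloc, path, query and fragment.

```python
from urllib.parse import urlsplit

import pandas as pd

df = pd.DataFrame({"url": [
    "https://www.google.com/something",
    "https://mail.google.com/anohtersomething",
    "https://www.amazon.com/yetanotherthing",
]})

# Each urlsplit result is a 5-field named tuple; one row per URL.
parts = pd.DataFrame(
    [urlsplit(u) for u in df["url"]],
    columns=["protocol", "domain", "path", "query", "fragment"],
    index=df.index,
)
# Attach the new columns next to the original url column.
df = df.join(parts)
```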
I have the following CSV URL, which works correctly when simply pasted into a browser:
http://www.google.com/finance/historical?q=JSE%3AMTN&startdate=Nov 1, 2011&enddate=Nov 30, 2011&output=csv
However I can't seem to download the csv using pandas. I get the error:
urllib.error.HTTPError: HTTP Error 400: Bad Request
Code:
import pandas as pd
def main():
    url = 'http://www.google.com/finance/historical?q=JSE%3AMTN&startdate=Nov 1, 2011&enddate=Nov 30, 2011&output=csv'
    df = pd.read_csv(url)
    print(df)
Please could someone point me in the right direction.
That URL is not properly encoded. Your browser automagically replaces the spaces ' ' with '%20', but the underlying urllib request from the Python standard library doesn't. Replace all spaces with '%20' and you'll be fine.
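Instead of replacing spaces by hand, the query string can be encoded with the standard library. A minimal sketch: quote_via=quote makes urlencode emit %20 for spaces (its default, quote_plus, emits '+'), and with its default safe='' it also encodes ':' and ','. This only shows the encoding step; the Google Finance endpoint itself may no longer serve this data.

```python
from urllib.parse import quote, urlencode

base = "http://www.google.com/finance/historical"
params = {
    "q": "JSE:MTN",
    "startdate": "Nov 1, 2011",
    "enddate": "Nov 30, 2011",
    "output": "csv",
}
# quote_via=quote encodes spaces as %20 instead of '+'.
url = f"{base}?{urlencode(params, quote_via=quote)}"
print(url)
# http://www.google.com/finance/historical?q=JSE%3AMTN&startdate=Nov%201%2C%202011&enddate=Nov%2030%2C%202011&output=csv
```

The resulting url can then be passed to pd.read_csv(url) as in the question.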
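Also, if you are using pandas 0.16, you can skip all of this, since support for Google Finance data is built in (see http://pandas.pydata.org/pandas-docs/stable/remote_data.html#remote-data-google). From pandas 0.17 on, pandas.io.data is deprecated in favor of the separate pandas-datareader package, and the Google Finance source has since been deprecated there as well, so this shortcut only applies to that older setup: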
import pandas.io.data as web
# DataReader(name, data_source, start, end): the ticker comes first, then the source
df = web.DataReader("JSE:MTN", "google", "2011-11-01", "2011-11-30")