I'm attempting to take dates from a dataframe and loop through them inside URLs. I've managed to print the URLs (1st code), but when I try to turn each URL's JSON into a dataframe (2nd code) I get this response:
AttributeError: 'str' object has no attribute 'json'
#1st code
import requests
import pandas as pd
df = pd.read_csv('NBADates.csv')
df.to_dict('series')
# url_template here is the same template defined in the 2nd code below
for row in df.loc[:, "Date"]:
    url = url_template.format(row=row)
    print(url)
Any ideas on what I'm doing wrong?
#2nd code
import requests
import csv
import pandas as pd
url_template = "https://stats.nba.com/stats/leaguedashptstats?College=&Conference=&Country=&DateFrom={row}&DateTo={row}&Division=&DraftPick=&DraftYear=&GameScope=&Height=&LastNGames=0&LeagueID=00&Location=&Month=0&OpponentTeamID=0&Outcome=&PORound=0&PerMode=Totals&PlayerExperience=&PlayerOrTeam=Player&PlayerPosition=&PtMeasureType=SpeedDistance&Season=2017-18&SeasonSegment=&SeasonType=Regular+Season&StarterBench=&TeamID=0&VsConference=&VsDivision=&Weight="
df = pd.read_csv('NBADates.csv')
df.to_dict('series')
for row in df.loc[ : ,"Date"]:
    url = url_template.format(row=row)
    stats = url.json()['resultSets'][0]['rowSet']
    headers = url.json()['resultSets'][0]['headers']
    stats_df = pd.DataFrame(stats, columns=headers)
    # Append to the big dataframe
    lineup_df = lineup_df.append(stats_df, ignore_index=True)
lineup_df.to_csv("Stats.csv")
I think you forgot to actually request the URL. You need to send a request, and if the response is JSON, parse the response object rather than the URL string.
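For example, a minimal sketch of the corrected loop (it reuses url_template and NBADates.csv from the question; note that stats.nba.com may additionally require browser-like request headers, which are not shown here):
import requests
import pandas as pd
df = pd.read_csv('NBADates.csv')
lineup_df = pd.DataFrame()
for row in df.loc[:, "Date"]:
    url = url_template.format(row=row)      # url_template as defined in the 2nd code
    response = requests.get(url)            # actually send the request
    payload = response.json()               # parse the response body, not the URL string
    stats = payload['resultSets'][0]['rowSet']
    headers = payload['resultSets'][0]['headers']
    stats_df = pd.DataFrame(stats, columns=headers)
    lineup_df = pd.concat([lineup_df, stats_df], ignore_index=True)
lineup_df.to_csv("Stats.csv")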
I am unable to flatten the JSON data from this API into a dataframe. I have tried using json_normalize but it gives a NotImplementedError. Can someone help me with it? I need the columns stationId, start, timeStep, and temperature, where temperature has several values and the rest of the columns should repeat the same value.
import requests
import json
import pandas as pd
response_API = requests.get('https://dwd.api.proxy.bund.dev/v30/stationOverviewExtended?stationIds=10865,G005')
print(response_API.status_code)
data = response_API.text
json.loads(data)
df= ?
You can do it many ways, but with your current approach you should use json() instead of text:
import requests
import json
import pandas as pd
response_API = requests.get('https://dwd.api.proxy.bund.dev/v30/stationOverviewExtended?stationIds=10865,G005')
print(response_API.status_code)
data = response_API.json()  # <-- it should be json(), not text
print(data)
OR directly read the JSON into a df from the URL using read_json():
df = pd.read_json("https://dwd.api.proxy.bund.dev/v30/stationOverviewExtended?stationIds=10865,G005")
print(df)
Edit:
import requests
import json
import pandas as pd
response_API = requests.get('https://dwd.api.proxy.bund.dev/v30/stationOverviewExtended?stationIds=10865,G005')
print(response_API.status_code)
data = response_API.json()
result = []
for station, value in data.items():
    for forecast, val in value.items():
        if forecast in ['forecast1', 'forecast2']:
            result.append(val)
df = pd.DataFrame(result)
print(df)
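If you then need only the columns asked for, here is a hedged sketch that keeps stationId, start and timeStep and explodes the list-valued temperature into one row per value. The key names are assumptions based on the question, so adjust them to whatever print(data) actually shows:
import requests
import pandas as pd
data = requests.get('https://dwd.api.proxy.bund.dev/v30/stationOverviewExtended?stationIds=10865,G005').json()
rows = []
for station, value in data.items():
    for forecast, val in value.items():
        if forecast in ['forecast1', 'forecast2']:
            rows.append({
                'stationId': val.get('stationId', station),  # fall back to the outer key
                'start': val.get('start'),
                'timeStep': val.get('timeStep'),
                'temperature': val.get('temperature'),       # list of forecast values
            })
df = pd.DataFrame(rows).explode('temperature', ignore_index=True)
print(df)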
Can't get Pandas Groupby operation to work.
I suspect I need to convert the data to a pandas dataframe first? However, I can't seem to get that to work either.
import requests
import json
import pandas as pd
baseurl = "https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBM-DA0321EN-SkillsNetwork/labs/module%201/datasets/githubposting.json"
response = requests.get(baseurl)
data = response.json()
print(data)
def get_number_of_jobs(technology):
    number_of_jobs = 0
    number_of_jobs = data.groupby('technology').sum().loc[technology, :][0]
    return technology, number_of_jobs
print(get_number_of_jobs('python'))
Thanks
data is a list of dictionaries, not a DataFrame, so it doesn't have groupby. You don't really need it anyway: you can build the DataFrame using the first entry of the JSON response as the column names (replacing the A and B keys) and then look up 'Python' directly, since each technology already appears as a single row:
baseurl = "https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBM-DA0321EN-SkillsNetwork/labs/module%201/datasets/githubposting.json"
response = requests.get(baseurl)
data = response.json()
df = pd.DataFrame(columns=list(data[0].values()), data=[d.values() for d in data[1:]])
number_of_jobs = df.loc[df['technology'] == 'Python', 'number of job posting'].iloc[0]
print(number_of_jobs) # 51
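And if you do want to keep the groupby from the question, a short sketch under the same assumption that data[0] is the header row (the cast is there in case the counts come back as strings):
df = pd.DataFrame(columns=list(data[0].values()), data=[list(d.values()) for d in data[1:]])
df['number of job posting'] = df['number of job posting'].astype(int)  # ensure numeric counts
print(df.groupby('technology')['number of job posting'].sum().loc['Python'])  # 51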
Trying to extract the table from this page: "https://www.hkex.com.hk/Market-Data/Statistics/Consolidated-Reports/Monthly-Bulletin?sc_lang=en#select1=0&select2=28". Using Chrome's Inspect/Network tool, the data request link is "https://www.hkex.com.hk/eng/stat/smstat/mthbull/rpt_turnover_short_selling_current_month_1910.json?_=1574650413485", which looks like JSON when accessed directly. However, the code using this link does not work.
My code:
import pandas as pd
url="https://www.hkex.com.hk/eng/stat/smstat/mthbull/rpt_turnover_short_selling_current_month_1910.json?_=1574650413485"
df = pd.read_json(url)
print(df.info(verbose=True))
print(df)
also tried:
url="https://www.hkex.com.hk/eng/stat/smstat/mthbull/rpt_turnover_short_selling_current_month_1910.json?"
You can try downloading the JSON first and then converting it to a DataFrame (the rows here sit nested inside data['tables'][0]['body'], which read_json won't unpack into a flat table on its own):
import pandas as pd
import urllib.request, json

url = 'https://www.hkex.com.hk/eng/stat/smstat/mthbull/rpt_turnover_short_selling_current_month_1910.json?_=1574650413485'
with urllib.request.urlopen(url) as r:
    data = json.loads(r.read().decode())

# The cells live in data['tables'][0]['body']; reshape them into one row per table row
df = pd.DataFrame(data['tables'][0]['body'])
columns = [item['text'] for item in data['tables'][0]['header']]
row_count = max(df['row'])
new_df = pd.DataFrame(df.text.values.reshape((row_count, -1)), columns=columns)
print(new_df)
I am new to Python and am stuck. I can't figure out how to output only one of the tables given. The output includes the desired table, but three versions of it: the first two are awfully formatted, and the last one is the table I want.
I have tried running a for loop with a counter so that only the third table is printed.
import pandas as pd
from bs4 import BeautifulSoup
import requests
url = 'https://www.espn.com/golf/leaderboard'
dfs = pd.read_html(url, header = 0)
for df in dfs:
    print(df[0:])
Just use the index to print the table you want:
import pandas as pd
url = 'https://www.espn.com/golf/leaderboard'
dfs = pd.read_html(url, header = 0)
print(dfs[2])
OR
print(dfs[-1])
OR, if you want to use a loop, try this:
import pandas as pd
url = 'https://www.espn.com/golf/leaderboard'
dfs = pd.read_html(url, header = 0)
for df in range(len(dfs)):
    if df == 2:
        print(dfs[df])
I am trying to make a DataFrame with all the values from this address: https://www.ebi.ac.uk/pdbe/api/pisa/interfacecomponent/3gcb/0/1/energetics. But the DataFrame I get is very messy and it doesn't provide all the information contained in the JSON dictionary. I am using this code but the result is bad:
import numpy as np
import pandas as pd
import requests
import json
url = 'https://www.ebi.ac.uk/pdbe/api/pisa/interfacecomponent/3gcb/0/1/energetics'
JSONContent = requests.get(url).json()
content = json.dumps(JSONContent, indent = 4, sort_keys=True)
data = json.loads(content)
df = pd.io.json.json_normalize(data)
print(df)
Can someone help please?
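One hedged way to approach it, since the exact layout of that endpoint isn't shown above: fetch the JSON, inspect the top level, and normalize only the nested part you need. The '3gcb' key below is an assumption (PDBe responses are typically keyed by the PDB id); pass record_path/meta to json_normalize once you know which inner list holds the per-row records:
import requests
import pandas as pd
url = 'https://www.ebi.ac.uk/pdbe/api/pisa/interfacecomponent/3gcb/0/1/energetics'
payload = requests.get(url).json()
print(list(payload))   # top-level keys if it's a dict, elements if it's a list
# Hypothetical: the data of interest may sit under the PDB id key
inner = payload.get('3gcb', payload) if isinstance(payload, dict) else payload
df = pd.json_normalize(inner)   # flattens nested dicts into columns
print(df)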