Convert json data into dataframe

Convert json data into dataframe - python

I am unable to flatten the json data from this API into a dataframe. I have tried using json_normalize but I gives a NotImplemented error. Can someone help me with it? I need the columns: stationId, start, timestep, temperature where there are several values for temperature and rest of the columns should have same values.
import requests
import json
import pandas as pd
response_API = requests.get('https://dwd.api.proxy.bund.dev/v30/stationOverviewExtended?stationIds=10865,G005')
print(response_API.status_code)
data = response_API.text
json.loads(data)
df= ?

You can do it many ways, but your current approach should use json() instead of text
import requests
import json
import pandas as pd
response_API = requests.get('https://dwd.api.proxy.bund.dev/v30/stationOverviewExtended?stationIds=10865,G005')
print(response_API.status_code)
data = response_API.json() <--- it should be json()
print(data)
OR directly read json to df from the URL using read_json()
df = pd.read_json("https://dwd.api.proxy.bund.dev/v30/stationOverviewExtended?stationIds=10865,G005")
print(df)
Edit:
import requests
import json
import pandas as pd
response_API = requests.get('https://dwd.api.proxy.bund.dev/v30/stationOverviewExtended?stationIds=10865,G005')
print(response_API.status_code)
data = response_API.json()
result = []
for station, value in data.items():
for forecast, val in value.items():
if forecast in ['forecast1', 'forecast2']:
result.append(val)
df = pd.DataFrame(result)
print(df)

Related

How do I get the groupby operation to work?

Can't get Pandas Groupby operation to work.
I suspect I need to convert the data to a pandas dataframe first? However, I can't seem to get that to work either.
import requests
import json
import pandas as pd
baseurl = "https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBM-DA0321EN-SkillsNetwork/labs/module%201/datasets/githubposting.json"
response = requests.get(baseurl)
data = response.json()
print(data)
def get_number_of_jobs(technology):
number_of_jobs = 0
number_of_jobs=data.groupby('technology').sum().loc[technology,:][0]
return technology,number_of_jobs
print(get_number_of_jobs('python'))
Thanks

data is a list of dictionaries, not DataFrame, so it doesn't have groupby. You don't really need it anyway, you can create the DataFrame while replacing the A and B columns with the first values in the json response and search for 'Python' there, it's already a single entry
baseurl = "https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBM-DA0321EN-SkillsNetwork/labs/module%201/datasets/githubposting.json"
response = requests.get(baseurl)
data = response.json()
df = pd.DataFrame(columns=list(data[0].values()), data=[d.values() for d in data[1:]])
number_of_jobs = df.loc[df['technology'] == 'Python', 'number of job posting'].iloc[0]
print(number_of_jobs) # 51

Removing Values from Pandas Read Excel

I am trying to read values from an excel and change them to json to use in my API.
I am getting:
{"Names":{"0":"Tom","1":"Bill","2":"Sally","3":"Cody","4":"Betty"}}
I only want to see the values. What I would like to get is this:
{"Names":{"Tom", "Bill", "Sally", "Cody", "Betty"}}
I haven't figured out how to remove the numbers before the values.
The code I am using is as follows:
import pandas as pd
df = pd.read_excel(r'C:\Users\User\Desktop\Names.xlsx')
json_str = df.to_json()
print(json_str)

As mentioned in the comments your desired result is not valid json.
maybe you can do this:
import json
import pandas as pd
df = pd.read_excel(r'C:\Users\User\Desktop\Names.xlsx')
json_str = df.to_json()
temp = json.loads(json_str)
temp['Names'] = list(temp['Names'].values())
print(json.dumps(temp))

Extract json data in web page using pd.read_json()?

Trying to extract the table from this page "https://www.hkex.com.hk/Market-Data/Statistics/Consolidated-Reports/Monthly-Bulletin?sc_lang=en#select1=0&select2=28". By inspect/network function of chorme, the data request link is "https://www.hkex.com.hk/eng/stat/smstat/mthbull/rpt_turnover_short_selling_current_month_1910.json?_=1574650413485". This links looks like json format when access directly. However, the codes using this link does not work.
My codes:
import pandas as pd
url="https://www.hkex.com.hk/eng/stat/smstat/mthbull/rpt_turnover_short_selling_current_month_1910.json?_=1574650413485"
df = pd.read_json(url)
print(df.info(verbose=True))
print(df)
also tried:
url="https://www.hkex.com.hk/eng/stat/smstat/mthbull/rpt_turnover_short_selling_current_month_1910.json?"

You can try downloading the json first and then convert it back to DataFrame
import pandas as pd
url='https://www.hkex.com.hk/eng/stat/smstat/mthbull/rpt_turnover_short_selling_current_month_1910.json?_=1574650413485'
import urllib.request, json
with urllib.request.urlopen(url) as r:
data = json.loads(r.read().decode())
df = pd.DataFrame(data['tables'][0]['body'])
columns = [item['text'] for item in data['tables'][0]['header']]
row_count = max(df['row'])
new_df = pd.DataFrame(df.text.values.reshape((row_count,-1)),columns = columns)

How to make a DataFrame from the nested JSON dictionary

I am trying to make a DataFrame with all values from this address: https://www.ebi.ac.uk/pdbe/api/pisa/interfacecomponent/3gcb/0/1/energetics. But The DataFrame I get is very messy and it doesnt provide all the information contained in the JSON dictionary. I am using this code but the result is bad:
import numpy as np
import pandas as pd
import requests
import json
url = 'https://www.ebi.ac.uk/pdbe/api/pisa/interfacecomponent/3gcb/0/1/energetics'
JSONContent = requests.get(url).json()
content = json.dumps(JSONContent, indent = 4, sort_keys=True)
data = json.loads(content)
df = pd.io.json.json_normalize(data)
print df
Can someone help please?

Python loop through URL's

I'm attempting take dates from a dataframe and loop through them within URL's. I've managed to print the URL's (1st code), but when I attempt to turn the URL's json into a dataframe (2nd code) I get this response.
AttributeError: 'str' object has no attribute 'json'
#1st code
import requests
import pandas as pd
df = pd.read_csv('NBADates.csv')
df.to_dict('series')
for row in df.loc[ : ,"Date"]:
url = url_template.format(row=row)
print(url)
Any ideas on what I'm doing wrong?
#2nd code
import requests
import csv
import pandas as pd
url_template = "https://stats.nba.com/stats/leaguedashptstats?College=&Conference=&Country=&DateFrom={row}&DateTo={row}&Division=&DraftPick=&DraftYear=&GameScope=&Height=&LastNGames=0&LeagueID=00&Location=&Month=0&OpponentTeamID=0&Outcome=&PORound=0&PerMode=Totals&PlayerExperience=&PlayerOrTeam=Player&PlayerPosition=&PtMeasureType=SpeedDistance&Season=2017-18&SeasonSegment=&SeasonType=Regular+Season&StarterBench=&TeamID=0&VsConference=&VsDivision=&Weight="
df = pd.read_csv('NBADates.csv')
df.to_dict('series')
for row in df.loc[ : ,"Date"]:
url = url_template.format(row=row)
stats = url.json()['resultSets'][0]['rowSet']
headers = url.json()['resultSets'][0]['headers']
stats_df = pd.DataFrame(stats, columns=headers)
# Append to the big dataframe
lineup_df = lineup_df.append(stats_df, ignore_index=True)
lineup_df.to_csv("Stats.csv")

I think you forgot to request the URL. you should send a request and if the response is a json, you should parse it

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.

Convert json data into dataframe - python

Related

How do I get the groupby operation to work?

Removing Values from Pandas Read Excel

Extract json data in web page using pd.read_json()?

How to make a DataFrame from the nested JSON dictionary

Python loop through URL's

Categories

Resources