Creating a Pandas Dataframe from an API Endpoint in a Jupyter Notebook - python

I am trying to convert an API response into a pandas DataFrame.
Sample API: https://api.fda.gov/drug/event.json?search=(receivedate:[20040101+TO+20210629])+AND+PREDNISOLONE
Here is my code:
import json
import requests
import pandas as pd

def callAPI(drug_name, recievedate_from, recievedate_to):
    url = 'https://api.fda.gov/drug/event.json?search=(receivedate:[' + str(recievedate_from) + '+TO+' + str(recievedate_to) + '])+AND+' + str(drug_name)
    r = requests.get(url).json()
    data = json.load(open(r))
    df = pd.DataFrame(data["results"])
    print(df)

callAPI('PREDNISOLONE', 20040101, 20210629)
I am getting an error:
TypeError: expected str, bytes or os.PathLike object, not dict
How do I get it right?

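response.json() has already parsed the body (it calls json.loads() internally), so r is already a dict. You should not parse it again yourself; passing that dict to open() is what raises the TypeError.
The simplest way to get JSON into a DataFrame is pd.json_normalize().
I've also shown how you can expand the embedded lists in the returned structure:
import pandas as pd
import requests

res = requests.get("https://api.fda.gov/drug/event.json?search=(receivedate:[20040101+TO+20210629])+AND+PREDNISOLONE")
# Flatten nested objects into dotted column names such as "patient.reaction"
df = pd.json_normalize(res.json()["results"])
# One row per reaction / per drug; the parent row's index is repeated
dfpr = df["patient.reaction"].explode().apply(pd.Series)
dfpd = df["patient.drug"].explode().apply(pd.Series)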
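Applied back to the question's function, the fix is to use the dict returned by response.json() directly. Below is a minimal sketch, not run against the live API: build_fda_url, results_to_frame, expand_list_column and call_api are hypothetical helper names, and expand_list_column swaps the answer's .apply(pd.Series) for pd.json_normalize on the exploded values. The sample payload at the end is made up and far smaller than a real openFDA result.

```python
import pandas as pd

BASE_URL = "https://api.fda.gov/drug/event.json"


def build_fda_url(drug_name, receivedate_from, receivedate_to):
    # Build the query by hand so openFDA's '+' separators and the
    # [FROM+TO+TO] range syntax are sent exactly as in the question's URL
    return (
        f"{BASE_URL}?search=(receivedate:[{receivedate_from}+TO+{receivedate_to}])"
        f"+AND+{drug_name}"
    )


def results_to_frame(payload):
    # payload is the dict returned by response.json(); it is already parsed,
    # so there is no json.load()/open() step here
    return pd.json_normalize(payload["results"])


def expand_list_column(df, column):
    # explode() yields one row per list element and repeats the parent index;
    # missing or empty lists become NaN and are dropped
    exploded = df[column].explode().dropna()
    expanded = pd.json_normalize(exploded.tolist())
    expanded.index = exploded.index  # keep the link back to the parent row
    return expanded


def call_api(drug_name, receivedate_from, receivedate_to):
    import requests  # imported here so the helpers above work without requests

    response = requests.get(
        build_fda_url(drug_name, receivedate_from, receivedate_to), timeout=30
    )
    response.raise_for_status()
    return results_to_frame(response.json())


# Offline demo with a made-up payload shaped like an openFDA response
sample = {
    "results": [
        {"safetyreportid": "1",
         "patient": {"reaction": [{"reactionmeddrapt": "A"},
                                  {"reactionmeddrapt": "B"}]}}
    ]
}
df = results_to_frame(sample)
reactions = expand_list_column(df, "patient.reaction")
print(reactions)
```

With network access, call_api('PREDNISOLONE', 20040101, 20210629) should request the same URL as the sample API link above.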

Related

ValueError: DataFrame constructor not properly called! when converting a list of dictionaries to a pandas DataFrame

I want to convert a list of dictionaries to a pandas DataFrame, but I get ValueError: DataFrame constructor not properly called!
Below is an example and how I got the data:
import requests
import pandas as pd
# Send an HTTP GET request to the URL
response = requests.get(url)
# Decode the JSON data into a dictionary
scrapped_data = response.text
Content of response.text is:
[{"id":123456,"date":"12-12-2022","value":37},{"id":123456,"date":"13-12-2022","value":38}]
I want to convert it to a dataframe format like the following:
id       date        value
123456   12-12-2022  37
123456   13-12-2022  38
I tried the following methods:
df = pd.DataFrame(scrapped_data)
df = pd.DataFrame.from_dict(scrapped_data)
df = pd.DataFrame(scrapped_data, orient='columns')
All of them raised the same ValueError.
I also tried:
df = pd.json_normalize(scrapped_data)
but got NotImplementedError
scrapped_data is a string (type str).
Thanks for your help, let me know if you have any questions
One common cause of this error is passing a str as the data. I think your data comes in as a str; if so, try this:
import json
import pandas as pd

original_data = '[{"id":"123456","date":"12-12-2022","value":"37"}, {"id":"123456","date":"13-12-2022","value":"38"}]'
# Parse the JSON text into a list of dicts
scraped_data = json.loads(original_data)
df = pd.DataFrame(data=scraped_data)
df  # displays the frame in Jupyter; use print(df) in a script
As you said, scrapped_data is a string, so you first need to parse it into Python objects (for example with loads from the json library), which gives you a list of dictionaries.
If scrapped_data were already a list rather than a string, e.g. scrapped_data = [{"id":"123456","date":"12-12-2022","value":"37"}, {"id":"123456","date":"13-12-2022","value":"38"}] (no surrounding quotes),
then you could just do df = pd.DataFrame(scrapped_data).
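Both answers come down to parsing the text before building the frame. A minimal sketch using the question's sample response as a literal string (in real code this would be response.text; if you still have the response object, response.json() does the parsing step for you). The pd.read_json variant is an alternative the answers don't mention; convert_dates=False is passed because read_json otherwise tries to parse a column named "date" as dates.

```python
import io
import json

import pandas as pd

# Same text as response.text in the question
scrapped_data = '[{"id":123456,"date":"12-12-2022","value":37},{"id":123456,"date":"13-12-2022","value":38}]'

# Option 1: parse the string into a list of dicts, then build the frame
records = json.loads(scrapped_data)
df = pd.DataFrame(records)

# Option 2: let pandas parse the JSON text; wrap it in StringIO because
# passing a literal JSON string to read_json is deprecated in recent pandas
df2 = pd.read_json(io.StringIO(scrapped_data), orient="records", convert_dates=False)

print(df)
```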

Convert Pandas Data Table to Background Gradient

I'm trying to apply a background-gradient style to my data table, but every time I run the script the output is unchanged. I suspect some of the values aren't being converted because they have the wrong data type. Does anyone know how to fix this?
try:
    # For Python 3.0 and later
    from urllib.request import urlopen
except ImportError:
    print("Wrong version")

import json

def get_jsonparsed_data(url):
    """
    Receive the content of ``url``, parse it as JSON and return the object.

    Parameters
    ----------
    url : str

    Returns
    -------
    dict
    """
    response = urlopen(url)
    data = response.read().decode("utf-8")
    return json.loads(data)

url = ("https://financialmodelingprep.com/api/v3/income-statement/AAPL?apikey=*******************")
print(get_jsonparsed_data(url))
data = get_jsonparsed_data(url)

import pandas as pd
import numpy as np

# Sets the pandas dataframe wide for visualization
desired_width = 1000
pd.set_option('display.width', desired_width)
np.set_printoptions(linewidth=desired_width)
pd.set_option('display.max_columns', 100)

# Gradient color
df = pd.DataFrame(data)
df.info()
df.style.background_gradient(cmap='Blues',
                             low=0,
                             high=0,
                             axis=0,
                             subset=None,
                             text_color_threshold=0.408,
                             vmin=None,
                             vmax=None)
print(df)
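Calling .style.* doesn't convert anything: df.style.background_gradient(...) returns a new Styler object and leaves df unchanged.
Your script discards that return value, so the print(df) at the end just prints the original, unstyled DataFrame.
If you want to "convert your DataFrame" (you actually can't), assign the result to a new variable:
df_styled = df.style.background_gradient(...)
But note that:
df is a DataFrame;
df_styled is a Styler, which renders as an HTML representation of the DataFrame (for example when displayed in a Jupyter notebook).
They are really different objects.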
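If, as the question suspects, some numbers arrive as strings, that matters too: by default background_gradient only colours numeric columns. A minimal sketch under that assumption; to_numeric_where_possible and gradient_html are hypothetical helper names, and the rows are made-up stand-ins for the API-key-protected response.

```python
import pandas as pd


def to_numeric_where_possible(df):
    # Convert a column only if every non-missing value parses as a number,
    # so text columns such as dates are left untouched
    out = df.copy()
    for col in out.columns:
        converted = pd.to_numeric(out[col], errors="coerce")
        if converted.notna().sum() == out[col].notna().sum():
            out[col] = converted
    return out


def gradient_html(df, cmap="Blues"):
    # df.style returns a new Styler and leaves df unchanged, so keep the result.
    # Rendering requires jinja2 and matplotlib to be installed.
    styled = df.style.background_gradient(cmap=cmap, axis=0)
    return styled.to_html()


# Made-up rows standing in for the API response; numbers arrive as strings
data = [
    {"date": "2021-01-01", "revenue": "100", "netIncome": "10"},
    {"date": "2022-01-01", "revenue": "150", "netIncome": "30"},
]
df = to_numeric_where_possible(pd.DataFrame(data))
df.info()
```

In a Jupyter notebook, ending a cell with df.style.background_gradient(cmap="Blues") (or passing it to display()) shows the colours; print() only ever shows the plain text table. Outside Jupyter, gradient_html(df) can be written to an .html file and opened in a browser.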

How to create a dataframe from urlopen (csv)

My code:
# parse json returned from the API to Pandas DF
openUrl = urlopen(url)
r = openUrl.read()
openUrl.close()
#d = json.loads(r.decode())
#df = pd.DataFrame(d, index=[0])
df = pd.DataFrame(r, index=[0])
The error:
ValueError: DataFrame constructor not properly called!
Any help would be appreciated.
The DataFrame constructor requires array-like input (an ndarray, dict, list, or other iterable of rows); the raw bytes returned by read() are not accepted.
If the URL serves CSV, use pandas.read_csv, which can read it directly and return a DataFrame.
Try printing r to see what is actually inside the response.
pandas.read_csv has many optional parameters for handling different CSV layouts; which ones you need depends on what you're getting from the URL.
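A minimal sketch of both cases, starting from the bytes that openUrl.read() returns: wrap CSV bytes in a file-like object for read_csv, or decode and parse JSON bytes before calling DataFrame (read_csv also accepts the URL string itself). The helper names are hypothetical, the sample bytes are made up, and json_bytes_to_frame assumes the payload is either a list of records or one flat record, which is why the question's commented-out code needed index=[0].

```python
import io
import json

import pandas as pd


def csv_bytes_to_frame(raw):
    # urlopen(...).read() returns bytes; read_csv accepts a file-like object
    return pd.read_csv(io.BytesIO(raw))


def json_bytes_to_frame(raw):
    # Decode and parse first: DataFrame() cannot take the raw bytes.
    # Assumes a list of records or a single flat record.
    parsed = json.loads(raw.decode("utf-8"))
    records = parsed if isinstance(parsed, list) else [parsed]
    return pd.DataFrame(records)


# Offline stand-ins for what r might contain (made-up sample data)
csv_df = csv_bytes_to_frame(b"id,value\n1,10\n2,20\n")
json_df = json_bytes_to_frame(b'{"id": 1, "value": 10}')
print(csv_df)
print(json_df)
```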
This snippet might help you.
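import urllib.request
import pandas as pd

r = urllib.request.urlopen('HERE GOES YOUR LINK')
x = r.read()
print(type(x))  # <class 'bytes'>
# Decode the bytes; str(x) would give the repr "b'...'" instead of the text
y = x.decode('utf-8')
# Stores the whole response as a single string cell
df = pd.DataFrame([y], columns=['string_values'])
print(df)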

Extract json data in web page using pd.read_json()?

I'm trying to extract the table from this page: "https://www.hkex.com.hk/Market-Data/Statistics/Consolidated-Reports/Monthly-Bulletin?sc_lang=en#select1=0&select2=28". Using the Network tab in Chrome's developer tools, I found that the data request link is "https://www.hkex.com.hk/eng/stat/smstat/mthbull/rpt_turnover_short_selling_current_month_1910.json?_=1574650413485". This link returns JSON when opened directly. However, my code using this link does not work.
My code:
import pandas as pd
url="https://www.hkex.com.hk/eng/stat/smstat/mthbull/rpt_turnover_short_selling_current_month_1910.json?_=1574650413485"
df = pd.read_json(url)
print(df.info(verbose=True))
print(df)
I also tried:
url="https://www.hkex.com.hk/eng/stat/smstat/mthbull/rpt_turnover_short_selling_current_month_1910.json?"
You can try downloading the JSON first and then converting it to a DataFrame:
import json
import urllib.request

import pandas as pd

url = 'https://www.hkex.com.hk/eng/stat/smstat/mthbull/rpt_turnover_short_selling_current_month_1910.json?_=1574650413485'
with urllib.request.urlopen(url) as r:
    data = json.loads(r.read().decode())

# Assumes each body entry is one cell with 'row' and 'text' keys
df = pd.DataFrame(data['tables'][0]['body'])
columns = [item['text'] for item in data['tables'][0]['header']]
row_count = max(df['row'])
# Cells are listed row by row, so reshape the flat list into row_count rows
new_df = pd.DataFrame(df.text.values.reshape((row_count, -1)), columns=columns)
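The reshape step can be tried offline. A minimal sketch, assuming the payload layout the answer's code relies on (tables[0].header[*].text and a flat tables[0].body list of cells with row and text keys); table_to_frame is a hypothetical name and the sample is made up, not real HKEX data. Unlike the answer, it counts distinct row numbers instead of taking the maximum and checks that the cells fill a complete grid.

```python
import pandas as pd


def table_to_frame(data, table_index=0):
    # Assumes a 'header' list of {'text': ...} and a flat 'body' list of
    # cells with 'row' and 'text' keys, listed left to right within a row
    table = data["tables"][table_index]
    cells = pd.DataFrame(table["body"])
    columns = [item["text"] for item in table["header"]]
    # mergesort is stable, so cells keep their order within each row
    cells = cells.sort_values("row", kind="mergesort")
    row_count = cells["row"].nunique()
    if len(cells) != row_count * len(columns):
        raise ValueError("body does not form a complete rows x columns grid")
    grid = cells["text"].to_numpy().reshape(row_count, len(columns))
    return pd.DataFrame(grid, columns=columns)


# Made-up payload in the assumed shape
sample = {
    "tables": [{
        "header": [{"text": "Stock"}, {"text": "Turnover"}],
        "body": [
            {"row": 1, "text": "A"}, {"row": 1, "text": "100"},
            {"row": 2, "text": "B"}, {"row": 2, "text": "200"},
        ],
    }]
}
result = table_to_frame(sample)
print(result)
```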

How to make a DataFrame from the nested JSON dictionary

I am trying to make a DataFrame with all the values from this address: https://www.ebi.ac.uk/pdbe/api/pisa/interfacecomponent/3gcb/0/1/energetics. But the DataFrame I get is very messy and doesn't include all the information contained in the JSON. I am using this code, but the result is bad:
import json

import numpy as np
import pandas as pd
import requests

url = 'https://www.ebi.ac.uk/pdbe/api/pisa/interfacecomponent/3gcb/0/1/energetics'
JSONContent = requests.get(url).json()
# JSONContent is already a dict; this dumps/loads round trip does not change it
content = json.dumps(JSONContent, indent=4, sort_keys=True)
data = json.loads(content)
df = pd.json_normalize(data)
print(df)
Can someone help please?
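One common approach for nested JSON is pd.json_normalize with record_path (the list you want one row per element of) and meta (parent fields repeated on every row). This is only a sketch: the payload below is a made-up shape, not the actual PDBe PISA response, so inspect the real structure first (for example with print(json.dumps(data, indent=2))) and substitute the real key names.

```python
import pandas as pd

# Hypothetical nested payload; the real API's key names will differ
payload = [
    {"interface_id": 1,
     "residues": [{"name": "ARG", "energy": -1.5},
                  {"name": "LYS", "energy": 0.3}]},
    {"interface_id": 2,
     "residues": [{"name": "GLU", "energy": -0.8}]},
]

# One row per residue; interface_id is copied from the parent record
df = pd.json_normalize(payload, record_path="residues", meta=["interface_id"])
print(df)
```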
