How to create a dataframe from urlopen (csv) - python

My code:
# parse json returned from the API to Pandas DF
openUrl = urlopen(url)
r = openUrl.read()
openUrl.close()
#d = json.loads(r.decode())
#df = pd.DataFrame(d, index=[0])
df = pd.DataFrame(r, index=[0])
The error:
ValueError: DataFrame constructor not properly called!
Help would be appreciated.

The DataFrame constructor requires an nd-array like input (or dict, iterable).
You can use pandas.read_csv if you want to directly input a csv and get a DataFrame.
Try printing r to see what is actually inside the response.
pandas.read_csv has many optional parameters for handling different kinds of CSV; which ones you need depends on what you're getting from the URL.
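As a rough sketch of the read_csv route: the bytes returned by urlopen(url).read() can be decoded and wrapped in a file-like object before parsing. The helper name csv_bytes_to_frame and the sample bytes are made up for illustration, and UTF-8 is an assumption about the response encoding. pandas.read_csv also accepts a URL string directly, which may be simpler if no custom request handling is needed.

```python
import io

import pandas as pd


def csv_bytes_to_frame(raw, encoding="utf-8"):
    """Parse CSV bytes (e.g. the result of urlopen(url).read()) into a DataFrame."""
    # read_csv expects a path, URL, or file-like object, so wrap the decoded text.
    return pd.read_csv(io.StringIO(raw.decode(encoding)))


# Stand-in for r = urlopen(url).read(); the contents are invented for this example.
sample_bytes = b"id,value\n1,10\n2,20\n"
csv_df = csv_bytes_to_frame(sample_bytes)
print(csv_df)
```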

This snippet might help you.
import urllib.request
import pandas as pd
r = urllib.request.urlopen('HERE GOES YOUR LINK')
x = r.read()
print(type(x))
y = x.decode('utf-8')  # decode the bytes; str(x) would keep the b'...' wrapper
df = pd.DataFrame([y], columns=['string_values'])
print(df)

Related

ValueError: DataFrame constructor not properly called! when converting dictionaries within list to pandas dataframe

I want to convert a list of dictionaries to a pandas dataframe, however, I got ValueError: DataFrame constructor not properly called!
Below is an example and how I got the data:
import requests
import pandas as pd
# Send an HTTP GET request to the URL
response = requests.get(url)
# Decode the JSON data into a dictionary
scrapped_data = response.text
Content of response.text is:
[{"id":123456,"date":"12-12-2022","value":37},{"id":123456,"date":"13-12-2022","value":38}]
I want to convert it to a dataframe format like the following:
id      date        value
123456  12-12-2022  37
123456  13-12-2022  38
I tried the following methods:
df = pd.DataFrame(scrapped_data)
df = pd.DataFrame_from_dict(scrapped_data)
df = pd.DataFrame(scrapped_data, orient='columns')
all got the same value errors.
I also tried:
df = pd.json_normalize(scrapped_data)
but got NotImplementedError
The type of scrapped_data is str.
Thanks for your help, let me know if you have any questions
One reason for receiving this error from pandas is passing a str as data. I think your data comes in as a str; if that is the case, try this:
import json
import pandas as pd
original_data='[{"id":"123456","date":"12-12-2022","value":"37"}, {"id":"123456","date":"13-12-2022","value":"38"}]'
scraped_data = json.loads(original_data)
df = pd.DataFrame(data=scraped_data)
df
As you said, scrapped_data is a string, so you need to convert it into a list of dictionaries first (with the loads method from the json library, for example).
Once it is parsed, i.e. scrapped_data = [{"id":"123456","date":"12-12-2022","value":"37"}, {"id":"123456","date":"13-12-2022","value":"38"}],
you can just do df = pd.DataFrame(scrapped_data).
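To make the difference concrete, here is a small sketch using the response text quoted in the question. The variable names scrapped_text, raised, records and parsed_df are only for illustration. With requests, response.json() performs the same parsing step as json.loads(response.text).

```python
import json

import pandas as pd

# The response text quoted in the question: a JSON array serialized as a str.
scrapped_text = '[{"id":123456,"date":"12-12-2022","value":37},{"id":123456,"date":"13-12-2022","value":38}]'

# Passing the raw str to the constructor reproduces the ValueError.
try:
    pd.DataFrame(scrapped_text)
    raised = False
except ValueError:
    raised = True

# Parsing first yields a list of dicts, which the constructor accepts.
records = json.loads(scrapped_text)
parsed_df = pd.DataFrame(records)
print(parsed_df)
```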

How do I get the groupby operation to work?

Can't get Pandas Groupby operation to work.
I suspect I need to convert the data to a pandas dataframe first? However, I can't seem to get that to work either.
import requests
import json
import pandas as pd
baseurl = "https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBM-DA0321EN-SkillsNetwork/labs/module%201/datasets/githubposting.json"
response = requests.get(baseurl)
data = response.json()
print(data)
def get_number_of_jobs(technology):
    number_of_jobs = 0
    number_of_jobs=data.groupby('technology').sum().loc[technology,:][0]
    return technology,number_of_jobs
print(get_number_of_jobs('python'))
Thanks
data is a list of dictionaries, not a DataFrame, so it doesn't have groupby. You don't really need groupby anyway: the first dictionary in the JSON response holds the real column names, so you can build the DataFrame with those names in place of the A and B keys and then look up 'Python' directly, since it is already a single entry.
baseurl = "https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBM-DA0321EN-SkillsNetwork/labs/module%201/datasets/githubposting.json"
response = requests.get(baseurl)
data = response.json()
df = pd.DataFrame(columns=list(data[0].values()), data=[d.values() for d in data[1:]])
number_of_jobs = df.loc[df['technology'] == 'Python', 'number of job posting'].iloc[0]
print(number_of_jobs) # 51
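The same idea can be wrapped back into the function from the question. This is only a sketch: the sample list stands in for response.json(), its A/B layout is inferred from the answer above, and every value in it except the Python count quoted above is invented. Converting the count column with pd.to_numeric is an assumption in case the counts arrive as strings.

```python
import pandas as pd

# Stand-in for response.json(); layout assumed, values illustrative.
sample = [
    {"A": "technology", "B": "number of job posting"},
    {"A": "Java", "B": "92"},
    {"A": "Python", "B": "51"},
]


def to_frame(records):
    """Use the first record's values as column names and the rest as rows."""
    header = list(records[0].values())
    rows = [list(r.values()) for r in records[1:]]
    frame = pd.DataFrame(rows, columns=header)
    # Counts may arrive as strings, so convert them to numbers.
    frame["number of job posting"] = pd.to_numeric(frame["number of job posting"])
    return frame


def get_number_of_jobs(frame, technology):
    """Return (technology, count) for the row matching the given technology."""
    match = frame.loc[frame["technology"] == technology, "number of job posting"]
    return technology, int(match.iloc[0])


jobs_df = to_frame(sample)
python_jobs = get_number_of_jobs(jobs_df, "Python")
print(python_jobs)
```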

Creating a Pandas Dataframe from an API Endpoint in a Jupyter Notebook

I am trying to convert an API response into a pandas DataFrame.
sample API : https://api.fda.gov/drug/event.json?search=(receivedate:[20040101+TO+20210629])+AND+PREDNISOLONE
Here is my code:
import json
import requests
import pandas as pd
def callAPI(drug_name, recievedate_from, recievedate_to):
    url='https://api.fda.gov/drug/event.json?search=(receivedate:['+str(recievedate_from)+'+TO+'+str(recievedate_to)+'])+AND+'+str(drug_name)
    r = requests.get(url).json()
    data = json.load(open(r))
    df = pd.DataFrame(data["results"])
    print(df)
callAPI('PREDNISOLONE', 20040101, 20210629)
I am getting an error:
TypeError: expected str, bytes or os.PathLike object, not dict
How do I get it right?
Given that response.json() has already parsed the body (it does the json.loads() step for you), you should not call json.load() on the result yourself.
The simplest way to get nested JSON into a DataFrame is pd.json_normalize().
I've also shown how you can expand embedded lists in the returned structure.
import requests
import pandas as pd
res = requests.get("https://api.fda.gov/drug/event.json?search=(receivedate:[20040101+TO+20210629])+AND+PREDNISOLONE")
df = pd.json_normalize(res.json()["results"])
dfpr = df["patient.reaction"].explode().apply(pd.Series)
dfpd = df["patient.drug"].explode().apply(pd.Series)
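For readers who want to see what json_normalize does without calling the API, here is a small offline sketch. The records below are hypothetical: only the patient.reaction nesting mirrors the answer above, while report_id, name and all values are invented. It also shows record_path/meta as one alternative way to expand an embedded list while keeping a parent field.

```python
import pandas as pd

# Hypothetical records loosely shaped like the "results" list; values are made up.
results = [
    {"report_id": "1", "patient": {"reaction": [{"name": "nausea"}, {"name": "rash"}]}},
    {"report_id": "2", "patient": {"reaction": [{"name": "headache"}]}},
]

# Nested dicts become dotted column names; embedded lists stay as list objects.
flat = pd.json_normalize(results)

# One row per embedded reaction, with the parent's report_id carried along.
reactions = pd.json_normalize(
    results,
    record_path=["patient", "reaction"],
    meta=["report_id"],
)
print(flat.columns.tolist())
print(reactions)
```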

convert json to dataframe in for loops in python

I'm trying to fetch data from an API and build a dataframe from the returned JSON inside a for loop. I am able to create the first dataframe, but my for loop only returns the first json -> dataframe. After struggling for a few days, I decided to ask the experts here for guidance.
import requests
import json
import pandas as pd
# create an Empty DataFrame object
df = pd.DataFrame()
# api header
headers = {"Accept": "application/json","Authorization": "api_secret"}
#email for loops
email_list = ["abc#gmail.com", "xyz#gmail.com"]
#supposed to read 2 emails in the list and append each df but only reads the first one...#
for i in email_list:
    querystring = {"where":i}
    response = requests.request("GET", "https://example.com/api/2.0/export", headers=headers, params=querystring)
    with open('test.jsonl', 'w') as writefile:
        writefile.write(response.text)
    data = [json.loads(line) for line in open('test.jsonl', 'r')]
    FIELDS = ["event"]
    df = pd.json_normalize(data)[FIELDS]
    df = df.append(df)
I wonder if I need to change something in df append but I can't pinpoint where needs to be changed. thank you so much in advance!
df = pd.json_normalize(data)[FIELDS]
df = df.append(df)
overwrites the accumulated DataFrame on each iteration. Instead, give the new frame its own name before appending:
df2 = pd.json_normalize(data)[FIELDS]
df = df.append(df2)
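Note that DataFrame.append was removed in pandas 2.0, so on current versions the usual pattern is to collect the per-request frames in a list and combine them once with pd.concat. The sketch below assumes that approach; the responses dict is a made-up stand-in for the response.text of each API call, and frame_from_jsonl is a hypothetical helper that skips the temporary file.

```python
import json

import pandas as pd

FIELDS = ["event"]


def frame_from_jsonl(text):
    """Parse JSON Lines text (one JSON object per line) into a DataFrame."""
    records = [json.loads(line) for line in text.splitlines() if line.strip()]
    return pd.json_normalize(records)[FIELDS]


# Hypothetical stand-ins for response.text, one entry per email in email_list.
responses = {
    "first": '{"event": "open", "id": 1}\n{"event": "click", "id": 2}',
    "second": '{"event": "open", "id": 3}',
}

# Build one frame per response, then concatenate them in a single step.
frames = [frame_from_jsonl(text) for text in responses.values()]
combined = pd.concat(frames, ignore_index=True)
print(combined)
```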

Convert Pandas Data Table to Background Gradient

I have an issue with trying to convert my data table to a background gradient style. Every time I run the script, the table is not converted. I think it has to do with some data values in Python not converting correctly because they are in the wrong data type. Does anyone know how to help me with this issue?
try:
    # For Python 3.0 and later
    from urllib.request import urlopen
except ImportError:
    print("Wrong version")
import json
def get_jsonparsed_data(url):
    """
    Receive the content of ``url``, parse it as JSON and return the object.
    Parameters
    ----------
    url : str
    Returns
    -------
    dict
    """
    response = urlopen(url)
    data = response.read().decode("utf-8")
    return json.loads(data)
url = ("https://financialmodelingprep.com/api/v3/income-statement/AAPL?apikey=*******************")
print(get_jsonparsed_data(url))
data = get_jsonparsed_data(url)
import pandas as pd
import numpy as np
# Sets the pandas dataframe wide for visualization
desired_width=1000
pd.set_option('display.width', desired_width)
np.set_printoptions(linewidth=desired_width)
pd.set_option('display.max_columns',100)
# Gradient color
df = pd.DataFrame(data)
df.info()
df.style.background_gradient(cmap='Blues',
low=0,
high=0,
axis=0,
subset=None,
text_color_threshold=0.408,
vmin=None,
vmax=None)
print(df)
Calling .style.* doesn't convert anything: it returns a new Styler object and leaves df unchanged.
Your script discards that return value, so the print(df) at the end just prints the plain DataFrame.
If you want to "convert your DataFrame" (you can't actually), store the result in a new variable:
df_styled = df.style.background_gradient(...)
But note that
df is a DataFrame,
df_styled is a Styler, which renders an HTML representation of the DataFrame (for example in a Jupyter notebook).
They are really different objects.
