Convert API Response to Pandas DataFrame - Python

I am making an API call with the following code:
req = urllib.request.Request(url, body, headers)
try:
    response = urllib.request.urlopen(req)
    string = response.read().decode('utf-8')
    json_obj = json.loads(string)
Which returns the following:
{"forecast": [17.588294043898163, 17.412641963452206],
 "index": [
     {"SaleDate": 1629417600000, "Type": "Type 1"},
     {"SaleDate": 1629504000000, "Type": "Type 2"}
 ]
}
How can I convert this API response to a pandas DataFrame in the following format?
Forecast            SaleDate    Type
17.588294043898163  2021-08-20  Type 1
17.412641963452206  2021-08-21  Type 2

You can use the following. It uses pandas.Series to convert the dictionary to columns and pandas.to_datetime to map the correct date from the millisecond timestamp:
import pandas as pd

d = {"forecast": [17.588294043898163, 17.412641963452206],
     "index": [
         {"SaleDate": 1629417600000, "Type": "Type 1"},
         {"SaleDate": 1629504000000, "Type": "Type 2"}
     ]}
df = pd.DataFrame(d)
df = pd.concat([df['forecast'], df['index'].apply(pd.Series)], axis=1)
df['SaleDate'] = pd.to_datetime(df['SaleDate'], unit='ms')
Output:
forecast SaleDate Type
0 17.588294 2021-08-20 Type 1
1 17.412642 2021-08-21 Type 2
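The same result can also be had without the intermediate apply(pd.Series), by building the columns directly from the list of index dicts. A minimal sketch, using the dict from the question:

```python
import pandas as pd

json_obj = {"forecast": [17.588294043898163, 17.412641963452206],
            "index": [{"SaleDate": 1629417600000, "Type": "Type 1"},
                      {"SaleDate": 1629504000000, "Type": "Type 2"}]}

# Each dict in "index" becomes a row; then prepend the forecast values as a column
df = pd.DataFrame(json_obj["index"])
df.insert(0, "forecast", json_obj["forecast"])
# Convert the epoch-millisecond timestamps to datetimes
df["SaleDate"] = pd.to_datetime(df["SaleDate"], unit="ms")
```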

Here is a solution you can try, using a list comprehension to flatten the data (resp is the parsed JSON response):
import pandas as pd

flatten = [
    {"forecast": j, **resp['index'][i]} for i, j in enumerate(resp['forecast'])
]
pd.DataFrame(flatten)
forecast SaleDate Type
0 17.588294 1629417600000 Type 1
1 17.412642 1629504000000 Type 2
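Note that the SaleDate column here still holds the raw epoch-millisecond values; the same pd.to_datetime step from the previous answer converts them. A sketch, assuming resp is the parsed response dict:

```python
import pandas as pd

resp = {"forecast": [17.588294043898163, 17.412641963452206],
        "index": [{"SaleDate": 1629417600000, "Type": "Type 1"},
                  {"SaleDate": 1629504000000, "Type": "Type 2"}]}

flatten = [
    {"forecast": j, **resp["index"][i]} for i, j in enumerate(resp["forecast"])
]
df = pd.DataFrame(flatten)
# Convert the epoch-millisecond timestamps to datetimes
df["SaleDate"] = pd.to_datetime(df["SaleDate"], unit="ms")
```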

Related

Normalizing json using pandas with inconsistent nested lists/dictionaries

I've been using pandas' json_normalize for a bit but ran into a problem with specific json file, similar to the one seen here: https://github.com/pandas-dev/pandas/issues/37783#issuecomment-1148052109
I'm trying to find a way to retrieve the data within the Ats -> Ats dict and return any null values (like the one seen in the ID:101 entry) as NaN values in the dataframe. Ignoring errors within the json_normalize call doesn't prevent the TypeError that stems from trying to iterate through a null value.
Any advice or methods to get a valid dataframe out of data with this structure are greatly appreciated!
import json
import pandas as pd
data = """[
    {
        "ID": "100",
        "Ats": {
            "Ats": [
                {
                    "Name": "At1",
                    "Desc": "Lazy At"
                }
            ]
        }
    },
    {
        "ID": "101",
        "Ats": null
    }
]"""
data = json.loads(data)
df = pd.json_normalize(data, ["Ats", "Ats"], "ID", errors='ignore')
df.head()
TypeError: 'NoneType' object is not iterable
I tried to iterate through the Ats dictionary, which would work normally for the data with ID 100 but not with ID 101. I expected ignoring errors within the function to return a NaN value in a dataframe but instead received a TypeError for trying to iterate through a null value.
The desired output would look like this:
  Name     Desc   ID
0  At1  Lazy At  100
1  NaN      NaN  101
This approach, which maps json_normalize over each record, can be more efficient when dealing with large datasets.
import json
import numpy as np
import pandas as pd

data = json.loads(data)
desired_data = list(
    map(lambda x: pd.json_normalize(x, ["Ats", "Ats"], "ID").to_dict(orient="records")[0]
        if x["Ats"] is not None
        else {"ID": x["ID"], "Name": np.nan, "Desc": np.nan}, data))
df = pd.DataFrame(desired_data)
Output:
Name Desc ID
0 At1 Lazy At 100
1 NaN NaN 101
You might want to consider this simple try/except approach when working with small datasets: whenever an error is raised, a row of NaN values is added for that record instead.
Example:
import json
import numpy as np
import pandas as pd

data = json.loads(data)
# DataFrame.append was removed in pandas 2.0, so collect frames and concat once
frames = []
for item in data:
    try:
        frames.append(pd.json_normalize(item, ["Ats", "Ats"], "ID"))
    except TypeError:
        # json_normalize raises TypeError on the null "Ats"; add a NaN row instead
        frames.append(pd.DataFrame([{"ID": item["ID"], "Name": np.nan, "Desc": np.nan}]))
df = pd.concat(frames, ignore_index=True)
print(df)
Output:
Name Desc ID
0 At1 Lazy At 100
1 NaN NaN 101
Maybe you can create a DataFrame from the data normally (without pd.json_normalize) and then transform it to the requested form afterwards:
import json
import pandas as pd
data = """\
[
    {
        "ID": "100",
        "Ats": {
            "Ats": [
                {
                    "Name": "At1",
                    "Desc": "Lazy At"
                }
            ]
        }
    },
    {
        "ID": "101",
        "Ats": null
    }
]"""
data = json.loads(data)
df = pd.DataFrame(data)
df["Ats"] = df["Ats"].str["Ats"]
df = df.explode("Ats")
df = pd.concat([df, df.pop("Ats").apply(pd.Series, dtype=object)], axis=1)
print(df)
Prints:
ID Name Desc
0 100 At1 Lazy At
1 101 NaN NaN
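Another option that keeps json_normalize: pre-process the parsed list and substitute an empty record for each null "Ats", so the record path is always iterable. This is a sketch; the placeholder record is an assumption, not part of the original answers:

```python
import json
import pandas as pd

data = json.loads("""[
    {"ID": "100", "Ats": {"Ats": [{"Name": "At1", "Desc": "Lazy At"}]}},
    {"ID": "101", "Ats": null}
]""")

# Replace a null "Ats" with a single empty record so json_normalize can iterate it;
# the missing keys then come out as NaN automatically
for item in data:
    if item["Ats"] is None:
        item["Ats"] = {"Ats": [{}]}

df = pd.json_normalize(data, ["Ats", "Ats"], "ID")
```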

Convert JSON into dataframe [duplicate]

This question already has an answer here:
Python converting URL JSON response to pandas dataframe
(1 answer)
Closed 12 months ago.
I am working with Python and I have the following JSON which I need to convert to a Dataframe:
JSON:
{"Results":
 {"forecast": [2.1632421537363355, 16.35421956127545],
  "prediction_interval": ["[-114.9747272420262, 119.30121154949884]",
                          "[-127.10990770140964, 159.8183468239605]"],
  "index": [{"SaleDate": 1644278400000, "OfferingGroupId": 0},
            {"SaleDate": 1644364800000, "OfferingGroupId": 1}]
 }
}
Expected Dataframe output:
Forecast SaleDate OfferingGroupId
2.1632421537363355 2022-02-08 0
16.35421956127545 2022-02-09 1
I have tried a few things but not getting anywhere close, my last attempt was:
string = '{"Results": {"forecast": [2.1632421537363355, 16.35421956127545], "prediction_interval": ["[-114.9747272420262, 119.30121154949884]", "[-127.10990770140964, 159.8183468239605]"], "index": [{"SaleDate": 1644278400000, "OfferingGroupId": 0}, {"SaleDate": 1644364800000, "OfferingGroupId": 1}]}}'
json_obj = json.loads(string)
df = pd.DataFrame(json_obj)
print(df)
df = pd.concat([df['Results']], axis=0)
df = pd.concat([df['forecast'], df['index'].apply(pd.Series)], axis=1)
which resulted in an error:
AttributeError: 'list' object has no attribute 'apply'
One possible approach is to create a DataFrame from the value under "Results" (this will create a column named "index") and build another DataFrame with the "index" column and join it back to the original DataFrame:
df = pd.DataFrame(json_obj['Results'])
df = df.join(pd.DataFrame(df['index'].tolist())).drop(columns=['prediction_interval', 'index'])
df['SaleDate'] = pd.to_datetime(df['SaleDate'], unit='ms')
Output:
forecast SaleDate OfferingGroupId
0 2.163242 2022-02-08 0
1 16.354220 2022-02-09 1
Not very pretty, but you can throw out all the nesting that makes this complicated by forcing the data into an aligned list of tuples, and then build the DataFrame from that:
import json
import pandas as pd
string = '{"Results": {"forecast": [2.1632421537363355, 16.35421956127545], "prediction_interval": ["[-114.9747272420262, 119.30121154949884]", "[-127.10990770140964, 159.8183468239605]"], "index": [{"SaleDate": 1644278400000, "OfferingGroupId": 0}, {"SaleDate": 1644364800000, "OfferingGroupId": 1}]}}'
results_dict = json.loads(string)["Results"]
results_tuples = zip(results_dict["forecast"],
                     [d["SaleDate"] for d in results_dict["index"]],
                     [d["OfferingGroupId"] for d in results_dict["index"]])
df = pd.DataFrame(results_tuples, columns=["Forecast", "SaleDate", "OfferingGroupId"])
df['SaleDate'] = pd.to_datetime(df['SaleDate'], unit='ms')
print(df)
> Forecast SaleDate OfferingGroupId
0 2.163242 2022-02-08 0
1 16.354220 2022-02-09 1
Or the same idea but forcing it into an aligned dict format:
string = '{"Results": {"forecast": [2.1632421537363355, 16.35421956127545], "prediction_interval": ["[-114.9747272420262, 119.30121154949884]", "[-127.10990770140964, 159.8183468239605]"], "index": [{"SaleDate": 1644278400000, "OfferingGroupId": 0}, {"SaleDate": 1644364800000, "OfferingGroupId": 1}]}}'
results_dict = json.loads(string)["Results"]
results_dict = {"Forecast": results_dict["forecast"],
                "SaleDate": [d["SaleDate"] for d in results_dict["index"]],
                "OfferingGroupId": [d["OfferingGroupId"] for d in results_dict["index"]]}
df = pd.DataFrame.from_dict(results_dict)
df['SaleDate'] = pd.to_datetime(df['SaleDate'], unit='ms')
print(df)
> Forecast SaleDate OfferingGroupId
0 2.163242 2022-02-08 0
1 16.354220 2022-02-09 1
Generally, in my experience, letting pandas read an unintended input format and then fixing it with pandas methods causes much more of a headache than creating a dict or tuple list as a middle step and reading that. But that might just be personal preference.
Just load the index as a column, then use tolist() to export it as two columns and create a new DataFrame. Combine the new dataframe with the original via pd.concat().
In this example, I also included columns for prediction_interval because I figured you might want that, too.
import json
import pandas as pd

d = {"Results":
     {"forecast": [2.1632421537363355, 16.35421956127545],
      "prediction_interval": ["[-114.9747272420262, 119.30121154949884]", "[-127.10990770140964, 159.8183468239605]"],
      "index": [{"SaleDate": 1644278400000, "OfferingGroupId": 0}, {"SaleDate": 1644364800000, "OfferingGroupId": 1}]
     }
    }
res = pd.DataFrame(d['Results'])
sd = pd.DataFrame(res['index'].tolist())
sd['SaleDate'] = pd.to_datetime(sd['SaleDate'], unit='ms')
# Each prediction_interval entry is itself a JSON-encoded list, so parse it into two columns
pi = pd.DataFrame(res['prediction_interval'].map(json.loads).tolist(), columns=['pi_start', 'pi_end'])
df = pd.concat((res, pi, sd), axis=1).drop(columns=['index', 'prediction_interval'])
If the JSON is stored in a file, you can load it directly with pandas:
import pandas as pd

df = pd.read_json('data.json')
print(df)
You will still need to flatten the nested "index" column, as shown in the other answers.

How to create a single json file from two DataFrames?

I have two DataFrames, and I want to post these DataFrames as json (to the web service) but first I have to concatenate them as json.
#first df
input_df = pd.DataFrame()
input_df['first'] = ['a', 'b']
input_df['second'] = [1, 2]
#second df
customer_df = pd.DataFrame()
customer_df['first'] = ['c']
customer_df['second'] = [3]
To convert to JSON, I used the following code for each DataFrame:
df.to_json(
    path_or_buf='out.json',
    orient='records',  # other options: 'split', 'records', 'index', 'columns', 'values', 'table'
    date_format='iso',
    force_ascii=False,
    default_handler=None,
    lines=False,
    indent=2
)
This code gives me output like this (for example, the exported JSON for input_df):
[
    {
        "first": "a",
        "second": 1
    },
    {
        "first": "b",
        "second": 2
    }
]
My desired output is like this:
{
    "input": [
        {
            "first": "a",
            "second": 1
        },
        {
            "first": "b",
            "second": 2
        }
    ],
    "customer": [
        {
            "first": "c",
            "second": 3
        }
    ]
}
How can I get output like this? I couldn't find the way :(
You can concatenate the DataFrames with appropriate key names, then group by the keys and build a list of record dicts for each group; finally build a JSON string from the whole thing:
out = (
    pd.concat([input_df, customer_df], keys=['input', 'customer'])
      .droplevel(1)
      .groupby(level=0).apply(lambda x: x.to_dict('records'))
      .to_json()
)
Output:
'{"customer":[{"first":"c","second":3}],"input":[{"first":"a","second":1},{"first":"b","second":2}]}'
or get a dict by changing the final to_json() to to_dict().
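If you prefer to skip the concat/groupby machinery, the same JSON can be assembled directly from each frame's to_dict('records'). A sketch, using the DataFrames from the question:

```python
import json
import pandas as pd

input_df = pd.DataFrame({"first": ["a", "b"], "second": [1, 2]})
customer_df = pd.DataFrame({"first": ["c"], "second": [3]})

# to_dict('records') gives each frame as a list of row dicts,
# which maps straight onto the desired nested structure
combined = {"input": input_df.to_dict(orient="records"),
            "customer": customer_df.to_dict(orient="records")}
out = json.dumps(combined, indent=2)
```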

How to import list stored as txt into pandas?

I have the following problem: I need to load a txt file as a pandas DataFrame. Here is what my txt file looks like:
[
    {
        "type": "VALUE",
        "value": 93.5,
        "from": "2020-12-01T00:00:00.000+01",
        "to": "2020-12-01T00:00:01.000+01"
    },
    {
        "type": "VALUE",
        "value": 75,
        "from": "2020-12-01T00:00:01.000+01",
        "to": "2020-12-01T00:05:01.000+01"
    },
    {
        "type": "WARNING",
        "from": "2020-12-01T00:00:01.000+01",
        "to": "2020-12-01T00:05:01.000+01"
    }
]
In other words, I need to read the txt file as a list of dictionaries and then turn it into a pandas DataFrame. The desired output is:
type value from to
0 VALUE 93.5 2020-12-01T00:00:00.000+01 2020-12-01T00:00:01.000+01
1 VALUE 75 2020-12-01T00:00:01.000+01 2020-12-01T00:05:01.000+01
2 WARNING NaN 2020-12-01T00:00:01.000+01 2020-12-01T00:05:01.000+01
How can I do this, please? I looked at these questions, but they weren't helpful: Pandas how to import a txt file as a list, How to read a file line-by-line into a list?
import json
import pandas as pd

with open('inp_file.txt', 'r') as f:
    content = json.load(f)
df = pd.DataFrame(content)
Use read_json; even though the extension is .txt, it is still a JSON file:
df = pd.read_json('file.txt')
print (df)
type value from to
0 VALUE 93.5 2020-12-01T00:00:00.000+01 2020-12-01T00:00:01.000+01
1 VALUE 75.0 2020-12-01T00:00:01.000+01 2020-12-01T00:05:01.000+01
2 WARNING NaN 2020-12-01T00:00:01.000+01 2020-12-01T00:05:01.000+01

Python - JSON array to DataFrame

I have this following JSON array.
[
    {
        "foo"=1
    },
    {
        "foo"=2
    },
    ...
]
I would like to convert it to DataFrame object using pd.read_json() command like below.
df = pd.read_json(my_json) #my_json is JSON array above
However, I got the error, since my_json is a list/array of json. The error is ValueError: Invalid file path or buffer object type: <class 'list'>.
Besides iterating through the list, is there any efficient way to extract/convert the JSON to DataFrame object?
Use df = pd.DataFrame(YourList)
Ex:
import pandas as pd
d = [
    {
        "foo": 1
    },
    {
        "foo": 2
    }
]
df = pd.DataFrame(d)
print(df)
Output:
foo
0 1
1 2
There are two problems in your question:
You called read_json on a list, but it expects a file path, URL, or JSON string.
The JSON was illegal, as it contained = signs instead of :
This works for me:
import json
import pandas as pd

pd.DataFrame(json.loads("""[
    {
        "foo": 1
    },
    {
        "foo": 2
    }
]"""))
foo
0 1
1 2
You can also call read_json directly.
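One caveat when calling read_json on a string: recent pandas versions (2.1+) deprecate passing a literal JSON string, so wrap it in StringIO first:

```python
import io
import pandas as pd

json_str = '[{"foo": 1}, {"foo": 2}]'
# read_json treats the StringIO like a file handle
df = pd.read_json(io.StringIO(json_str))
```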
