I have this JSON dataset. From it I only want the "column_names" key and its values and the "data" key and its values. Each value in column_names corresponds to a value in each row of data. How do I combine only these two keys in Python for analysis?
"frequency":"daily","type":"Time Series",
for cnames in data['dataset']['column_names']:
for cdata in data['dataset']['data']:
The for loops give me the column names and data values I want, but I am not sure how to combine them into a pandas DataFrame for analysis.
Ref: The above piece of code is from the Quandl website.
data = {
"dataset": {
"column_names": ["Date","Open","High","Low","Close","Volume","Dividend","Split","Adj_Open","Adj_High","Adj_Low","Adj_Close","Adj_Volume"],
"type":"Time Series",
["2017-12-28",85.9,85.93,85.55,85.72,10594344.0,0.0,1.0,83.1976157998082, 83.22667201021558,82.85862667838872,83.0232785373639,10594344.0],
Does the following code do what you want?
import pandas as pd
# Start from an empty frame that only carries the column labels
df = pd.DataFrame(columns=data['dataset']['column_names'])
# Append the rows one by one
for i, data_row in enumerate(data['dataset']['data']):
    df.loc[i] = data_row
cols = data['dataset']['column_names']
data = data['dataset']['data']
It's quite simple
labeled_data = [dict(zip(cols, d)) for d in data]
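To get from the labeled dicts to a DataFrame, something like the sketch below should do. The two sample rows are a made-up excerpt in the same shape as the question's data, not the full dataset.

```python
import pandas as pd

# Hypothetical two-row sample in the same shape as data['dataset'] above.
cols = ["Date", "Open", "Close"]
rows = [
    ["2017-12-28", 85.90, 85.72],
    ["2017-12-27", 85.65, 85.71],
]

# Pair each row with the column names, giving one dict per row.
labeled_data = [dict(zip(cols, row)) for row in rows]

# A list of dicts maps directly onto DataFrame rows.
df = pd.DataFrame(labeled_data)
print(df)
```

Each dict becomes one row, and the dict keys become the column labels.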
The following snippet should work for you
import pandas as pd
df = pd.DataFrame(data['dataset']['data'],columns=data['dataset']['column_names'])
Check the following link to learn more
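As a small follow-up for the analysis part, here is a sketch that builds the frame in one call and then turns the Date column into a sorted index. The sample dict is a shortened, hypothetical excerpt of the response, and indexing by date is my assumption about what the analysis needs.

```python
import pandas as pd

# Hypothetical excerpt shaped like the response in the question.
data = {
    "dataset": {
        "column_names": ["Date", "Open", "High", "Low", "Close"],
        "data": [
            ["2017-12-28", 85.90, 85.93, 85.55, 85.72],
            ["2017-12-27", 85.65, 85.98, 85.22, 85.71],
        ],
    }
}

dataset = data["dataset"]
df = pd.DataFrame(dataset["data"], columns=dataset["column_names"])

# Parse the date strings and use them as a sorted index for time-series work.
df["Date"] = pd.to_datetime(df["Date"])
df = df.set_index("Date").sort_index()
print(df)
```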
Hi I have code which looks like this:
with open("file123.json") as json_file:
    data = json.load(json_file)
df_1 = pd.DataFrame(dict([(k,pd.Series(v)) for k,v in data["spt"][1].items()]))
df_1_made =pd.json_normalize(json.loads(df_1.to_json(orient="records"))).T.drop(["content.id","shortname","name"])
df_2 = pd.DataFrame(dict([(k,pd.Series(v)) for k,v in data["spt"][2].items()]))
df_2_made = pd.json_normalize(json.loads(df_2.to_json(orient="records"))).T.drop(["content.id","shortname","name"])
df_3 = pd.DataFrame(dict([(k,pd.Series(v)) for k,v in data["spt"][3].items()]))
df_3_made = pd.json_normalize(json.loads(df_3.to_json(orient="records"))).T.drop(["content.id","shortname","name"])
where each dataframe is built from a JSON file.
The problem is that I am dealing with different JSON files, and each one can lead to a different number of dataframes. The code above builds 3, but another file may need 7. Is there any way to write a for loop that uses the length of the data:
length = len(data["spt"])
and makes the correct number of dataframes from it, so I do not need to do it manually?
The simplest option here will be to put all your dataframes into a dictionary or a list. First define a function that creates the dataframe and then use a list comprehension.
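def create_df(item):
    # item is one entry of data["spt"]; same steps as in the question
    df = pd.DataFrame(
        dict([(k, pd.Series(v)) for k, v in item.items()])
    )
    df = pd.json_normalize(
        json.loads(df.to_json(orient="records"))
    ).T.drop(["content.id", "shortname", "name"])
    return df
# One dataframe per entry, however many entries the file has
my_list_of_dfs = [create_df(x) for x in data["spt"]]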
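If you would rather look the frames up by position than keep them in a list, a dictionary works the same way. This is only a sketch: the real "spt" entries are not shown in the question, so the sample structure and the simplified create_df below are assumptions.

```python
import pandas as pd

# Hypothetical stand-in for the loaded JSON; the real "spt" entries are
# not shown in the question, so this shape is an assumption.
data = {
    "spt": [
        {"name": "a", "content": {"id": 1, "score": 0.5}},
        {"name": "b", "content": {"id": 2, "score": 0.7}},
        {"name": "c", "content": {"id": 3, "score": 0.9}},
    ]
}

def create_df(item):
    """Flatten one "spt" entry into a single-row DataFrame."""
    return pd.json_normalize(item)

# One DataFrame per entry, keyed by position, however many entries exist.
dfs = {i: create_df(item) for i, item in enumerate(data["spt"])}
print(len(dfs))
print(dfs[1])
```

The number of frames now follows len(data["spt"]) automatically, so nothing has to be edited when a file has 7 entries instead of 3.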
I am using the C3.ai's APIs for analyzing unified COVID-19 data. For generating a time series of confirmed cases and deaths across locations of COVID outbreaks, I successfully called the evalMetrics API, but the response received is JSON.
How can I best convert this to a pandas dataframe in python so that I can easily perform my analyses on this data?
Here is the code I have used to call the evalMetrics API successfully:
import json, requests
locations_to_evaluate = ["China","Italy"]
expressions_to_evaluate = ["JHU_ConfirmedCases","JHU_ConfirmedDeaths"]
url = "https://api.c3.ai/covid/api/1/outbreaklocation/evalmetrics/"
request_data = {
    "spec": {
        "ids": locations_to_evaluate,
        "expressions": expressions_to_evaluate,
        "start": "2020-02-01",
        "end": "2020-03-01",
        "interval": "DAY"
    }
}
headers = {
    "Accept": "application/json",
    "Content-Type": "application/json"
}
response = requests.post(url=url, json=request_data, headers=headers)
eval_metrics_result = json.loads(response.text)
I want to convert eval_metrics_result to a pandas dataframe. Is there a generic function I can use to convert any eval_metrics_result to a pandas dataframe?
One way to do this would be as follows:
import pandas as pd
def convert_evalMetrics_to_Pandas(eval_metrics_result):
    evaluate_ids = list(eval_metrics_result["result"].keys())
    evaluate_metrics = list(eval_metrics_result["result"][evaluate_ids[0]].keys())
    timestamps = eval_metrics_result["result"][evaluate_ids[0]][evaluate_metrics[0]]["dates"]
    df = pd.DataFrame(
        columns = ["Evaluate_ID"]+evaluate_metrics,
        index = ["{}#{}".format(evaluate_id,timestamp) for evaluate_id in evaluate_ids for timestamp in timestamps]
    )
    df["Evaluate_ID"] = df.index.str.split("#").str[0]
    for evaluate_id in evaluate_ids:
        for evaluate_metric in evaluate_metrics:
            # Assign through a single .loc call; chained indexing may write to a copy
            df.loc[df["Evaluate_ID"]==evaluate_id, evaluate_metric] = eval_metrics_result["result"][evaluate_id][evaluate_metric]["data"]
    df.drop("Evaluate_ID", axis=1, inplace=True)
    return df
Note that in this case, the index of the dataframe will be in the format: id#timestamp
All timestamps for one id are listed before the index moves on to the next id, which then repeats the same timestamps.
Here are two other ways to store / grab as a dataframe:
Wide format with evaluate_id, expression, dates array, data array.
import pandas as pd
# get as a wide df with evaluate_id, expression, dates array, data array
evaluate_ids = list(eval_metrics_result['result'].keys())
expressions = list(eval_metrics_result['result'][evaluate_ids[0]].keys())
formatted_list = [{'evaluate_id': evaluate_id,
                   'expression': expression,
                   'dates': pd.to_datetime(eval_metrics_result['result'][evaluate_id][expression]['dates']),
                   'data': eval_metrics_result['result'][evaluate_id][expression]['data']}
                  for evaluate_id in evaluate_ids for expression in expressions]
wide_df = pd.DataFrame(formatted_list)
Long format - basically the same as Suraj's, but evaluate_id and expression are stored in their own columns. This helps the df play well with groupby.
# convert from wide to long format
# materialize the (date, value) pairs as a list so they can be expanded later
wide_df['dates-data'] = wide_df.apply(lambda row: list(zip(row['dates'], row['data'])), axis = 1)
wide_df = wide_df.drop(columns = ['dates', 'data'])  # drop returns a new frame, so assign it back
def wide2long(df, list_col):
    a = pd.DataFrame(df[list_col].tolist()).stack().reset_index(level = 1, drop = True).rename(list_col)
    return df.drop(list_col, axis = 1).join(a).reset_index(drop = True)[df.columns]
long_df = wide2long(wide_df, 'dates-data')
long_df['date'] = long_df['dates-data'].apply(lambda date_data: date_data[0])
long_df['data'] = long_df['dates-data'].apply(lambda date_data: date_data[1])
long_df.drop(columns = ['dates-data'], inplace = True)
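For what it's worth, the long format can also be built in one pass without going through the wide frame first. This is a sketch that assumes the response has the result -> id -> expression -> {"dates", "data"} layout that the answers above index into; the numbers are made up for illustration.

```python
import pandas as pd

# Hypothetical miniature response with made-up numbers, shaped like the
# structure used above: result -> id -> expression -> {"dates", "data"}.
dates = ["2020-02-01", "2020-02-02"]
eval_metrics_result = {
    "result": {
        "China": {
            "JHU_ConfirmedCases": {"dates": dates, "data": [10, 12]},
            "JHU_ConfirmedDeaths": {"dates": dates, "data": [1, 2]},
        },
        "Italy": {
            "JHU_ConfirmedCases": {"dates": dates, "data": [3, 5]},
            "JHU_ConfirmedDeaths": {"dates": dates, "data": [0, 1]},
        },
    }
}

# One record per (id, expression, date) combination.
records = [
    {"evaluate_id": evaluate_id, "expression": expression, "date": date, "data": value}
    for evaluate_id, metrics in eval_metrics_result["result"].items()
    for expression, series in metrics.items()
    for date, value in zip(series["dates"], series["data"])
]
long_df = pd.DataFrame(records)
long_df["date"] = pd.to_datetime(long_df["date"])

# Optional: one column per expression, indexed by (evaluate_id, date).
per_expression = (
    long_df.set_index(["evaluate_id", "date", "expression"])["data"]
    .unstack("expression")
)
print(per_expression)
```

The long frame is the one that works well with groupby; the unstacked view is handy when you want cases and deaths side by side.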
You can use c3covid19. It's a simple wrapper for connecting to the C3.ai COVID-19 data lake from Python.
pip install c3covid19
from c3covid19 import c3api
locations_to_evaluate = ["China","Italy"]
expressions_to_evaluate = ["JHU_ConfirmedCases","JHU_ConfirmedDeaths"]
request_data = {
    "spec": {
        "ids": locations_to_evaluate,
        "expressions": expressions_to_evaluate,
        "start": "2020-02-01",
        "end": "2020-03-01",
        "interval": "DAY"
    }
}
This is a simple conversion to pandas though. It works great with a list of dictionaries, but struggles with a more nested structure. You should probably be converting to a properly formatted pandas df.
Instead, use the following to get a dictionary:
Then follow the directions in one of the more targeted time series answers by Suraj or Jac.
I have a JSON file which resulted from YouTube's iframe API and I want to put this JSON data into a pandas dataframe, where each JSON key will be a column, and each record should be a new row.
Normally I would use a loop and iterate over the rows of the JSON, but this particular JSON looks like this:
In this JSON not every key is written as a new line. How can I extract the keys in this case, and express them as columns?
A Pythonic solution would be to use the keys() and values() methods of the Python dictionary.
It should be something like this:
ls = [
ls = [json.loads(j) for j in ls]
keys = [j.keys() for j in ls] # this will get you all the keys
vals = [j.values() for j in ls] # this will get the values and then you can do something with it
The easiest way is to leverage json_normalize from pandas.
import json
import pandas as pd
input_dict = [
input_json = [json.loads(j) for j in input_dict]
# pd.json_normalize needs pandas >= 1.0; older versions import json_normalize from pandas.io.json
df = pd.json_normalize(input_json)
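Here is a runnable sketch of the json_normalize approach. The two strings are a shortened excerpt of the player log records from this thread (only four of the keys are kept), so treat them as sample input rather than the full data.

```python
import json
import pandas as pd

# Shortened sample records; each list element is one JSON document as a string.
input_dict = [
    '{"timemillis": 1563467467703, "videoId": "0HJx2JhQKQk", '
    '"playerStateVerbose": "Playing", "playoutLevelPercent": 0.3}',
    '{"timemillis": 1563467468705, "videoId": "0HJx2JhQKQk", '
    '"playerStateVerbose": "Playing", "playoutLevelPercent": 0.5}',
]

# Decode each JSON string into a dict, then let pandas build one row per dict.
input_json = [json.loads(j) for j in input_dict]
df = pd.json_normalize(input_json)
print(df)
```

Every key becomes a column and every record becomes a row, regardless of whether the keys were written on separate lines in the file.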
I think you are asking to break down your keys and values, with the keys as columns and the values as a row.
This is my approach. Please always show what your expected output should look like.
ChainMap flattens your dicts into keys and values and is pretty much self-explanatory.
data = ["{\"timemillis\":1563467467703,\"date\":\"18.7.2019\",\"time\":\"18:31:07,703\",\"videoId\":\"0HJx2JhQKQk\",\"startSecond\":\"0\",\"stopSecond\":\"90\",\"playerStateNumeric\":1,\"playerStateVerbose\":\"Playing\",\"curTimeFormatted\":\"0:02\",\"totalTimeFormatted\":\"9:46\",\"playoutLevelPercent\":0.3,\"bufferLevelPercent\":1.4,\"qual\":\"large\",\"qualLevels\":[\"hd720\",\"large\",\"medium\",\"small\",\"tiny\",\"auto\"],\"playbackRate\":1,\"playbackRates\":[0.25,0.5,0.75,1,1.25,1.5,1.75,2],\"playerErrorNumeric\":\"\",\"playerErrorVerbose\":\"\"}","{\"timemillis\":1563467468705,\"date\":\"18.7.2019\",\"time\":\"18:31:08,705\",\"videoId\":\"0HJx2JhQKQk\",\"startSecond\":\"0\",\"stopSecond\":\"90\",\"playerStateNumeric\":1,\"playerStateVerbose\":\"Playing\",\"curTimeFormatted\":\"0:03\",\"totalTimeFormatted\":\"9:46\",\"playoutLevelPercent\":0.5,\"bufferLevelPercent\":1.4,\"qual\":\"large\",\"qualLevels\":[\"hd720\",\"large\",\"medium\",\"small\",\"tiny\",\"auto\"],\"playbackRate\":1,\"playbackRates\":[0.25,0.5,0.75,1,1.25,1.5,1.75,2],\"playerErrorNumeric\":\"\",\"playerErrorVerbose\":\"\"}"]
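import json
import pandas as pd
from collections import ChainMap
data = [json.loads(i) for i in data]
# ChainMap merges the dicts; for duplicate keys the first dict wins
data = dict(ChainMap(*data))
keys = []
vals = []
for k,v in data.items():
    keys.append(k)
    vals.append(v)
data = pd.DataFrame(zip(keys,vals)).T
# promote the first row (the keys) to the header
new_header = data.iloc[0]
data = data[1:]
data.columns = new_header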
#startSecond playbackRates playbackRate qual totalTimeFormatted timemillis playerStateNumeric playerStateVerbose playerErrorNumeric date time stopSecond bufferLevelPercent playerErrorVerbose qualLevels videoId curTimeFormatted playoutLevelPercent
#0 [0.25, 0.5, 0.75, 1, 1.25, 1.5, 1.75, 2] 1 large 9:46 1563467467703 1 Playing 18.7.2019 18:31:07,703 90 1.4 [hd720, large, medium, small, tiny, auto] 0HJx2JhQKQk 0:02 0.3
I'm trying to put Pyomo model output into pandas.DataFrame rows. I'm accomplishing it now by saving data as a .csv, then reading the .csv file as a DataFrame. I would like to skip the .csv step and put output directly into a DataFrame.
When I accomplish an optimization solution with Pyomo, the optimal assignments are 1 in the model.x[i] output data (0 otherwise). model.x[i] is indexed by dict keys in v. model.x is specific syntax to Pyomo
Pyomo assigns a timeItem[i], platItem[i], payItem[i], demItem[i], v[i] for each value that presents an optimal solution. The 0807results.csv file produces an accurate file of the optimal assignments showing the value of timeItem[i], platItem[i], payItem[i], demItem[i], v[i] for each valid assignment in the optimal solution.
When model.x[i] is 1, how can I get timeItem[i], platItem[i], payItem[i], demItem[i], v[i] directly into a DataFrame? Your assistance is greatly appreciated. My current code is below.
with open('0807results.csv', 'w') as f:
    for i in index:
        if value(model.x[i])>0:
            f.write("%s,%s,%s,%s,%s\n"%(timeItem[i],platItem[i],payItem[i], demItem[i],v[i]))
from pandas import read_csv
now = datetime.datetime.now()
df = read_csv('0807results.csv')
df.columns = ['Time', 'Platform','Payload','DemandType','Value']
# convert payload types to string so not summed
df['Payload'] = df['Payload'].astype(str)
df = df.sort_values('Time')
# do stats & visualization with pandas df
I have no idea what is in the timeItem etc iterables from the code you've posted. However, I suspect that something similar to:
import pandas as pd
results = pd.DataFrame([timeItem, platItem, payItem, demItem, v], index=["time", "plat", "pay", "dem", "v"]).T
Will work.
If you want to filter on 1s in model.x, you might add it as a column as well, and do a filter with pandas directly:
import pandas as pd
results = pd.DataFrame([timeItem, platItem, payItem, demItem, v, model.x], index=["time", "plat", "pay", "dem", "v", "x"]).T
filtered_results = results[results["x"]>0]
You can also use the DataFrame.from_records() function:
def record_generator():
    for i in sorted(v.keys()):
        if value(model.x[i] > 1E-6): # integer tolerance
            yield (timeItem[i], platItem[i], payItem[i], demItem[i], v[i])
df = pandas.DataFrame.from_records(
    record_generator(), columns=['Time', 'Platform', 'Payload', 'DemandType', 'Value'])
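To try the from_records idea without a solved Pyomo model at hand, here is a self-contained sketch. Plain dicts stand in for the model data: x_values is a hypothetical replacement for value(model.x[i]), and the sample entries are made up.

```python
import pandas as pd

# Hypothetical stand-ins for the Pyomo objects: x_values plays the role of
# value(model.x[i]); the other dicts mirror timeItem, platItem, and v.
x_values = {"a": 1.0, "b": 0.0, "c": 1.0}
timeItem = {"a": 3, "b": 1, "c": 2}
platItem = {"a": "P1", "b": "P2", "c": "P3"}
v = {"a": 10.0, "b": 7.5, "c": 4.0}

def record_generator():
    """Yield one tuple per index whose assignment variable is switched on."""
    for i in sorted(v.keys()):
        if x_values[i] > 1e-6:  # integer tolerance
            yield (timeItem[i], platItem[i], v[i])

# Materialize the records and build the frame without a CSV round trip.
df = pd.DataFrame.from_records(
    list(record_generator()), columns=["Time", "Platform", "Value"]
)
df = df.sort_values("Time").reset_index(drop=True)
print(df)
```

Only the rows with a nonzero assignment end up in the frame, which is the same filter the CSV-writing loop in the question applies.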
I am trying to import a json file using the function:
sku = pandas.read_json('https://cws01.worldstores.co.uk/api/product.php?product_sku=125T:FT0111')
However, I keep getting the following error:
ValueError: arrays must all be same length
What should I do to import it correctly into a dataframe?
This is the structure of the JSON:
"id": "5",
"sku": "JOSH:BECO-BRN",
"last_updated": "2013-06-10 15:46:22",
"propertyType1": [
"category": [
"category_id": "10",
"category_name": "All Products"
"category_id": "238",
"category_name": "All Sofas"
"root_categories": [
"url": "/p/Beco Suede Sofa Bed?product_id=5",
"item": [
"image_names": "[\"https:\\/\\/cdn.worldstores.co.uk\\/images\\/products\\/L\\/19\\/Beco_Suede_Sofa_Bed-1.jpg\",\"https:\\/\\/cdn.worldstores.co.uk\\/images\\/products\\/P\\/19\\/Beco_Suede_Sofa_Bed-1.jpg\",\"https:\\/\\/cdn.worldstores.co.uk\\/images\\/products\\/SP\\/19\\/Beco_Suede_Sofa_Bed-1.jpg\",\"https:\\/\\/cdn.worldstores.co.uk\\/images\\/products\\/SS\\/19\\/Beco_Suede_Sofa_Bed-1.jpg\",\"https:\\/\\/cdn.worldstores.co.uk\\/images\\/products\\/ST\\/19\\/Beco_Suede_Sofa_Bed-1.jpg\",\"https:\\/\\/cdn.worldstores.co.uk\\/images\\/products\\/WP\\/19\\/Beco_Suede_Sofa_Bed-1.jpg\",\"https:\\/\\/cdn.worldstores.co.uk\\/images\\/products\\/L\\/19\\/Beco_Suede_Sofa_Bed-2.jpg\",\"https:\\/\\/cdn.worldstores.co.uk\\/images\\/products\\/P\\/19\\/Beco_Suede_Sofa_Bed-2.jpg\",\"https:\\/\\/cdn.worldstores.co.uk\\/images\\/products\\/SP\\/19\\/Beco_Suede_Sofa_Bed-2.jpg\",\"https:\\/\\/cdn.worldstores.co.uk \\/images\\/products\\/SS\\/19\\/Beco_Suede_Sofa_Bed-2.jpg\",\"https:\\/\\/cdn.worldstores.co.uk\\/images\\/products\\/ST\\/19\\/Beco_Suede_Sofa_Bed-2.jpg\",\"https:\\/\\/cdn.worldstores.co.uk\\/images\\/products\\/WP\\/19\\/Beco_Suede_Sofa_Bed-2.jpg\"]"
The pandas.read_json function takes multiple formats.
Since you did not specify which format your JSON file is in (the orient= argument), pandas will default to believing your data is columnar. The different formats pandas expects are discussed below.
The data that you are trying to parse from https://cws01.worldstores.co.uk/api/product.php?product_sku=125T:FT0111
does not seem to conform to any of the supported formats, as it seems to be only a single "record". Pandas expects some kind of collection.
You should probably try to collect multiple entries into a single file, then parse it with the read_json function.
Simple way of getting multiple rows and parsing it with the pandas.read_json function:
import urllib2
import pandas as pd
url_base = "https://cws01.worldstores.co.uk/api/product.php?product_sku={}"
products = ["125T:FT0111", "125T:FT0111", "125T:FT0111"]
raw_data_list = []
for sku in products:
    url = url_base.format(sku)
    # fetch the raw JSON text for this product
    raw_data_list.append(urllib2.urlopen(url).read())
data = "[" + (",".join(raw_data_list)) + "]"
data = pd.read_json(data, orient='records')
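If you only ever have the one record, you can also skip read_json and wrap the decoded dict yourself. A minimal sketch, assuming a response shaped like the structure in the question (the string below is a cut-down, hypothetical excerpt):

```python
import json
import pandas as pd

# Hypothetical cut-down excerpt of a single-product response.
raw = (
    '{"id": "5", "sku": "JOSH:BECO-BRN", "category": ['
    '{"category_id": "10", "category_name": "All Products"}, '
    '{"category_id": "238", "category_name": "All Sofas"}]}'
)

record = json.loads(raw)

# Wrapping the record in a list yields one row; list-valued fields stay
# in a single cell, so their differing lengths no longer matter.
df = pd.DataFrame([record])

# A nested list can be expanded into its own table, carrying over
# selected top-level fields as extra columns.
categories = pd.json_normalize(record, record_path="category", meta=["id", "sku"])
print(df)
print(categories)
```

This avoids the "arrays must all be same length" error because pandas is no longer asked to line up the nested lists as columns.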
My take on the pandas.read_json function formats.
The pandas.read_json function is yet another shining example of pandas trying to jam as much functionality as possible into a single function. This leads of course to a very very complicated function.
If your data is a Series, pandas.read_json(orient=) defaults to 'index'
The values allowed for orient while parsing a Series are: {'split','records','index'}
Note that the Series index must be unique for orient='index'.
If your data is a DataFrame, pandas.read_json(orient=) defaults to 'columns'
The values allowed for orient while parsing a DataFrame are: {'split','records','index','columns','values'} (newer pandas versions also accept 'table').
Note that the DataFrame index must be unique for orient='index' and orient='columns', and the DataFrame columns must be unique for orient='index', orient='columns', and orient='records'.
No matter if your data is a DataFrame or a Series, each orient= value expects data in the same format:
orient='split' expects a string representation of a dict like what the DataFrame constructor takes:
{"index":[1,2,3,4], "columns":["col1","col2"], "data":[[8,5],[7,6],[6,7],[5,8]]}
orient='records' expects a string representation of a list of dicts, one dict per row.
Note there is no index set here.
orient='index' expects a string representation of a nested dict, keyed by index label first and by column name second.
Good to note is that it won't accept indices of other types than strings. May be fixed in later versions.
orient='columns' expects a string representation of a nested dict, keyed by column name first and by index label second.
orient='values' expects a string representation of a list of rows like:
[[8, 5],[7, 6],[6, 7],[5, 8]]
Resulting dataframe
In most cases, the dataframe you get will look like this, with the json strings above:
col1 col2
1 8 5
2 7 6
3 6 7
4 5 8
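To make the five layouts concrete, here is a sketch that feeds one payload per orient value to pandas.read_json. The JSON strings are my own reconstruction from the resulting dataframe shown above, so treat them as illustrative; wrapping them in StringIO is just a way to pass literal text as a file-like object.

```python
from io import StringIO
import pandas as pd

# One payload per orient value, all describing the same 4x2 table.
payloads = {
    "split": '{"index":[1,2,3,4],"columns":["col1","col2"],'
             '"data":[[8,5],[7,6],[6,7],[5,8]]}',
    "records": '[{"col1":8,"col2":5},{"col1":7,"col2":6},'
               '{"col1":6,"col2":7},{"col1":5,"col2":8}]',
    "index": '{"1":{"col1":8,"col2":5},"2":{"col1":7,"col2":6},'
             '"3":{"col1":6,"col2":7},"4":{"col1":5,"col2":8}}',
    "columns": '{"col1":{"1":8,"2":7,"3":6,"4":5},'
               '"col2":{"1":5,"2":6,"3":7,"4":8}}',
    "values": '[[8,5],[7,6],[6,7],[5,8]]',
}

# Parse each payload with the matching orient.
frames = {
    orient: pd.read_json(StringIO(text), orient=orient)
    for orient, text in payloads.items()
}

for orient, frame in frames.items():
    print(orient)
    print(frame)
```

The cell values come out the same for every orient. Only the labels differ: 'records' and 'values' carry no index, and 'values' carries no column names either.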
Maybe this is not the most elegant solution, but it gives me back what I want, or at least I believe so. Feel free to warn me if something is wrong.
url = "https://cws01.worldstores.co.uk/api/product.php?product_sku=125T:FT0111"
data = urllib2.urlopen(url).read()
data = json.loads(data)
data = pd.DataFrame(data.items())
data = data.transpose()
Another solution is to use a try except.
try: a=pd.read_json(json_path)
except ValueError: a=pd.read_json("["+json_path+"]")
Iterating on @firelynx's answer:
#! /usr/bin/env python3
from urllib.request import urlopen
import pandas as pd
products = ["125T:FT0111", "125T:FT0111", "125T:FT0111"]
raw_lines = ""
for sku in products:
    url = f"https://cws01.worldstores.co.uk/api/product.php?product_sku={sku}"
    # read() returns bytes in Python 3, so decode before joining with "\n"
    raw_lines += urlopen(url).read().decode() + "\n"
data = pd.read_json(raw_lines, lines=True)
This would support any source returning a single JSON object or a bunch of newline ('\n') separated ones.
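A quick offline check of the lines=True behaviour, as a sketch. The two payloads are hypothetical stand-ins for the HTTP responses (the second SKU and both names are invented), and StringIO is used so the text is passed as a file-like object.

```python
from io import StringIO
import pandas as pd

# Hypothetical payloads standing in for two HTTP responses.
responses = [
    '{"sku": "JOSH:BECO-BRN", "name": "Beco Suede Sofa Bed"}',
    '{"sku": "EXAMPLE:SKU-2", "name": "Example Product"}',
]

# One JSON object per line, the same layout the loop above builds.
raw_lines = "\n".join(responses) + "\n"

# lines=True parses each line as its own record, so every object becomes a row.
data = pd.read_json(StringIO(raw_lines), lines=True)
print(data)
```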
Or this one-liner(ish) should work the same:
#! /usr/bin/env python3
import pandas as pd
products = ["125T:FT0111", "125T:FT0111", "125T:FT0111"]
data = pd.concat(
) for sku in products
PS: Python 3 is only needed here for f-string support, so you should use str.format for Python 2 compatibility.