So, I have a huge CSV file that looks like:
PN,PCA Code,MPN Code,DATE_CODE,Supplier Code,CM Code,Fiscal YEAR,Fiscal MONTH,Usage,Defects
13-1668-01,73-2590,MPN148,1639,S125,CM1,2017,5,65388,0
20-0127-02,73-2171,MPN170,1707,S125,CM1,2017,9,11895,0
19-2472-01,73-2302,MPN24,1711,S119,CM1,2017,10,4479,0
20-0127-02,73-2169,MPN170,1706,S125,CM1,2017,9,7322,0
20-0127-02,73-2296,MPN170,1822,S125,CM1,2018,12,180193,0
15-14399-01,73-2590,MPN195,1739,S133,CM6,2018,11,1290,0
What I want to do is group all the data by PCA Code. Each PCA Code will have a certain number of parts, each part is manufactured under an MPN Code, and the final nested JSON structure that I want looks like:
[
    {
        "PCA": {
            "code": "73-2590",
            "CM": ["CM1", "CM6"],
            "parts": [
                {
                    "number": "13-1668-01",
                    "manufacturer": [
                        {
                            "id": "MPN148",
                            "info": [
                                {
                                    "date_code": 1639,
                                    "supplier": {
                                        "id": "S125",
                                        "FYFM": "2017-5",
                                        "usage": 65388,
                                        "defects": 0
                                    }
                                }
                            ]
                        }
                    ]
                }
            ]
        }
    }
]
So, I want this structure for multiple part numbers (PNs) having different MPNs with different Date Codes and so on.
I am currently using Pandas to do this but I'm stuck on how to proceed with the nesting.
My code so far:
import json

import pandas as pd

dataframe = pd.read_csv('files/dppm_wc.csv')
data = {'PCAs': []}
for key, group in dataframe.groupby('PCA Code'):
    for index, row in group.iterrows():
        temp_dict = {'PCA Code': key, 'CM Code': row['CM Code'], 'parts': []}

with open('output.txt', 'w') as file:
    file.write(json.dumps(data, indent=4))
How do I proceed to achieve the nested JSON format that I want? Is there a better way to do this than what I am doing?
I don't fully understand what you intend to do with that structure, but I guess it could be achieved with something like this:
data = {'PCAs': []}
for key, group in df.groupby('PCA Code'):
    temp_dict = {'PCA Code': key, 'CM Code': [], 'parts': []}
    for index, row in group.iterrows():
        temp_dict['CM Code'].append(row['CM Code'])
        temp_dict['parts'].append(
            {'number': row['PN'],
             'manufacturer': [
                 {
                     'id': row['MPN Code'],
                     'info': [
                         {
                             'date_code': row['DATE_CODE'],
                             'supplier': {'id': row['Supplier Code'],
                                          'FYFM': '%s-%s' % (row['Fiscal YEAR'], row['Fiscal MONTH']),
                                          'usage': row['Usage'],
                                          'defects': row['Defects']}
                         }
                     ]
                 }]
             }
        )
    data['PCAs'].append(temp_dict)
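One caveat: appending `row['CM Code']` on every row repeats CM codes, while the target structure lists each CM only once (`["CM1", "CM6"]`). A minimal sketch of one way to deduplicate while keeping first-seen order (the two-row sample frame here is hypothetical, mirroring the CSV columns, and the `parts` entries are abbreviated):

```python
import pandas as pd

# Hypothetical two-row sample with the same columns as the CSV above
df = pd.DataFrame({
    'PN': ['13-1668-01', '15-14399-01'],
    'PCA Code': ['73-2590', '73-2590'],
    'MPN Code': ['MPN148', 'MPN195'],
    'DATE_CODE': [1639, 1739],
    'Supplier Code': ['S125', 'S133'],
    'CM Code': ['CM1', 'CM6'],
    'Fiscal YEAR': [2017, 2018],
    'Fiscal MONTH': [5, 11],
    'Usage': [65388, 1290],
    'Defects': [0, 0],
})

data = {'PCAs': []}
for key, group in df.groupby('PCA Code'):
    temp_dict = {
        'PCA Code': key,
        # dict.fromkeys drops duplicates but keeps first-seen order
        'CM Code': list(dict.fromkeys(group['CM Code'])),
        'parts': [],
    }
    for _, row in group.iterrows():
        temp_dict['parts'].append({'number': row['PN']})  # abbreviated part entry
    data['PCAs'].append(temp_dict)
```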
I am trying to build a script that pulls offline endpoints from the list of dictionaries below:
[
    {
        "name": "My AP",
        "serial": "Q234-ABCD-5678",
        "mac": "00:11:22:33:44:55",
        "status": "online",
        "lanIp": "1.2.3.4",
        "publicIp": "123.123.123.1",
        "networkId": "N_24329156"
    }
]
and then populate a dictionary and export the output to xlsx with pandas:
# Build dictionary to organize endpoints
endpoint = {'Name': [], 'Serial': [], 'MAC': [], 'Public IP': [], 'Network ID': [], 'Status': [],
            'Last Reported': [], 'Cellular': [], 'WAN 1': [], 'WAN 2': [], 'LAN': []}

# Iterate over the endpoints to fill the dictionary
for i in range(len(response_data)):
    if response_data[i]['status'] == 'offline':
        endpoint['Name'].append(response_data[i]['name'])
        endpoint['Serial'].append(response_data[i]['serial'])
        endpoint['MAC'].append(response_data[i]['mac'])
        endpoint['Public IP'].append(response_data[i]['publicIp'])
        endpoint['Network ID'].append(response_data[i]['networkId'])
        endpoint['Status'].append(response_data[i]['status'])
        endpoint['Last Reported'].append(response_data[i]['lastReportedAt'])
        endpoint['Cellular'].append(response_data[i]['usingCellularFailover'])
        endpoint['WAN 1'].append(response_data[i]['wan1Ip'])
        endpoint['WAN 2'].append(response_data[i]['wan2Ip'])
        endpoint['LAN'].append(response_data[i]['lanIp'])

df = pd.DataFrame.from_dict(endpoint)
df.to_excel("output.xlsx", index=False)
I am pretty sure there's a more efficient way to do this, maybe loading the output into pandas first and then filtering the data, but I am still a noob.
You could convert a list of dictionaries into a Pandas dataframe directly.
If your list of dictionaries is called "response_data" then you can convert that list to a DataFrame directly like so:
df = pd.DataFrame(response_data)
df.to_excel("output.xlsx", index=False)
You can build the DataFrame directly and then rename columns and filter the data.
response_data = [
{
"name": "My AP",
"serial": "Q234-ABCD-5678",
"mac": "00:11:22:33:44:55",
"status": "online",
"lanIp": "1.2.3.4",
"publicIp": "123.123.123.1",
"networkId": "N_24329156"
},
{
"name": "My AP",
"serial": "Q234-ABCD-5678",
"mac": "00:11:22:33:44:55",
"status": "offline",
"lanIp": "1.2.3.4",
"publicIp": "123.123.123.1",
"networkId": "N_24329156"
}
]
import pandas as pd
df = pd.DataFrame(response_data)
df = df.rename(columns={
    'name': 'Name',
    'serial': 'Serial',
    'mac': 'MAC',
    'status': 'Status',
    'publicIp': 'Public IP',
    'networkId': 'Network ID',
    'lastReportedAt': 'Last Reported',
    'usingCellularFailover': 'Cellular',
    'wan1Ip': 'WAN 1',
    'wan2Ip': 'WAN 2',
    'lanIp': 'LAN',
})
df = df[df['Status'] == 'offline']
print(df)
df.to_excel("output.xlsx", index=False)
{
    "currency": {
        "Wpn": {
            "units": "KB_per_sec",
            "type": "scalar",
            "value": 528922.0,
            "direction": "up"
        }
    },
    "catalyst": {
        "Wpn": {
            "units": "ns",
            "type": "scalar",
            "value": 70144.0,
            "direction": "down"
        }
    },
    "common": {
        "Wpn": {
            "units": "ns",
            "type": "scalar",
            "value": 90624.0,
            "direction": "down"
        }
    }
}
So I basically have to convert nested JSON to Excel. My approach was to flatten the JSON file using `json_normalize`, but as I am new to all this, I always seem to end up with a KeyError.
Here's my code so far, assuming that the file is named json.json:
import json

import pandas as pd

with open('json.json', 'r') as f:
    data = json.load(f)

df = pd.DataFrame(sum([i[['Wpn'], ['value']] for i in data], []))  # <-- this is the line that fails
df.to_excel('Ai.xlsx')
I'm trying to get an Excel sheet consisting of currency and common along with their respective values as output.
I know there are a lot of similar questions, but trust me, I have tried most of them and still didn't get the desired output. Please help me with this.
Try:
import json

import pandas as pd

with open('json.json', 'r') as f:
    data = json.load(f)

data = [{'key': k, 'wpn_value': v['Wpn']['value']} for k, v in data.items()]
print(data)
# here, the variable data looks like:
# [{'key': 'currency', 'wpn_value': 528922.0}, {'key': 'catalyst', 'wpn_value': 70144.0}, {'key': 'common', 'wpn_value': 90624.0}]

df = pd.DataFrame(data).set_index('key')  # set_index() is optional
df.to_excel('Ai.xlsx')
The result looks like:

          wpn_value
key
currency     528922
catalyst      70144
common        90624
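Since the question mentions `json_normalize`, here is an alternative sketch using it (in pandas ≥ 1.0 it is available top-level as `pd.json_normalize`; the column-selection approach and variable names here are mine, not from the question):

```python
import pandas as pd

# the parsed json.json dict from above
data = {
    "currency": {"Wpn": {"units": "KB_per_sec", "type": "scalar", "value": 528922.0, "direction": "up"}},
    "catalyst": {"Wpn": {"units": "ns", "type": "scalar", "value": 70144.0, "direction": "down"}},
    "common": {"Wpn": {"units": "ns", "type": "scalar", "value": 90624.0, "direction": "down"}},
}

# json_normalize flattens the nested dict into a single row
# with dotted column names such as 'currency.Wpn.value'
flat = pd.json_normalize(data)

# keep only the *.value columns and pivot to one row per top-level key
value_cols = [c for c in flat.columns if c.endswith('.value')]
values = flat[value_cols].T
values.index = [c.split('.')[0] for c in values.index]
values.columns = ['wpn_value']
```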
I have a dataframe that I want to convert to a hierarchical flare JSON to be used in a D3 visualization like this: D3 sunburst
My dataframe contains hierarchical data such as this:
And the output I want should look like this:
{"name": "flare", "children":
    [
        {"name": "Animal", "children":
            [
                {"name": "Mammal", "children":
                    [
                        {"name": "Fox", "value": 35000},
                        {"name": "Lion", "value": 25000}
                    ]
                },
                {"name": "Fish", "children":
                    [
                        {"name": "Cod", "value": 15000}
                    ]
                }
            ]
        },
        {"name": "Plant", "children":
            [
                {"name": "Tree", "children":
                    [
                        {"name": "Oak", "value": 1500}
                    ]
                }
            ]
        }
    ]
}
I have tried several approaches, but can't get it right. Here is my non-working code, inspired by this post: Pandas to D3. Serializing dataframes to JSON
import json
from collections import defaultdict

import pandas as pd

df = pd.DataFrame({'group1': ["Animal", "Animal", "Animal", "Plant"],
                   'group2': ["Mammal", "Mammal", "Fish", "Tree"],
                   'group3': ["Fox", "Lion", "Cod", "Oak"],
                   'value': [35000, 25000, 15000, 1500]})

tree = lambda: defaultdict(tree)
d = tree()
for _, (group1, group2, group3, value) in df.iterrows():
    d['name'][group1]['children'] = group2
    d['name'][group2]['children'] = group3
    d['name'][group3]['children'] = value
json.dumps(d)
I am working on a similar visualization project that requires moving data from a Pandas DataFrame to a JSON file that works with D3.
I came across your post while looking for a solution and ended up writing something based on this GitHub repository and with input from the link you provided in this post.
The code is not pretty and is a bit hacky and slow. But based on my project, it seems to work just fine for any amount of data as long as it has three levels and a value field. You should be able to simply fork the D3 Starburst notebook and replace the flare.json file with this code's output.
The modification I made here, based on the original GitHub post, is to handle three levels of data. If the name of the level 0 node exists, append from level 1 on; likewise, if the name of the level 1 node exists, append the level 2 node (the third level). Otherwise, append the full path of data. If you need more levels, some kind of recursion might do the trick, or you can keep hacking it to add more levels.
# code snippet to format a Pandas DataFrame as JSON for a D3 Starburst chart

# libraries
import json

import pandas as pd

# example data with three levels and a single value field
data = {'group1': ['Animal', 'Animal', 'Animal', 'Plant'],
        'group2': ['Mammal', 'Mammal', 'Fish', 'Tree'],
        'group3': ['Fox', 'Lion', 'Cod', 'Oak'],
        'value': [35000, 25000, 15000, 1500]}
df = pd.DataFrame.from_dict(data)
print(df)
""" The sample dataframe
   group1  group2 group3  value
0  Animal  Mammal    Fox  35000
1  Animal  Mammal   Lion  25000
2  Animal    Fish    Cod  15000
3   Plant    Tree    Oak   1500
"""

# initialize a flare dictionary
flare = {"name": "flare", "children": []}

# iterate through dataframe values
for row in df.values:
    level0 = row[0]
    level1 = row[1]
    level2 = row[2]
    value = row[3]

    # create a dictionary with all the row data
    d = {'name': level0,
         'children': [{'name': level1,
                       'children': [{'name': level2,
                                     'value': value}]}]}

    # initialize key lists
    key0 = []
    key1 = []

    # iterate through first-level node names
    for i in flare['children']:
        key0.append(i['name'])

        # iterate through next-level node names
        key1 = []
        for _, v in i.items():
            if isinstance(v, list):
                for x in v:
                    key1.append(x['name'])

    # add the full row of data if the root is not in key0
    if level0 not in key0:
        d = {'name': level0,
             'children': [{'name': level1,
                           'children': [{'name': level2,
                                         'value': value}]}]}
        flare['children'].append(d)
    elif level1 not in key1:
        # if the root exists, then append only the next-level children
        d = {'name': level1,
             'children': [{'name': level2,
                           'value': value}]}
        flare['children'][key0.index(level0)]['children'].append(d)
    else:
        # if the root and next level exist, then append only the leaf
        d = {'name': level2,
             'value': value}
        flare['children'][key0.index(level0)]['children'][key1.index(level1)]['children'].append(d)

# uncomment the next three lines to save as a json file
# save to some file
# with open('filename_here.json', 'w') as outfile:
#     json.dump(flare, outfile)

print(json.dumps(flare, indent=2))
""" the expected output of this json data
{
  "name": "flare",
  "children": [
    {
      "name": "Animal",
      "children": [
        {
          "name": "Mammal",
          "children": [
            {
              "name": "Fox",
              "value": 35000
            },
            {
              "name": "Lion",
              "value": 25000
            }
          ]
        },
        {
          "name": "Fish",
          "children": [
            {
              "name": "Cod",
              "value": 15000
            }
          ]
        }
      ]
    },
    {
      "name": "Plant",
      "children": [
        {
          "name": "Tree",
          "children": [
            {
              "name": "Oak",
              "value": 1500
            }
          ]
        }
      ]
    }
  ]
}
"""
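As a follow-up to the note above that recursion could handle an arbitrary number of levels, here is a minimal recursive sketch; the function name and its arguments are my own, not from the original repository:

```python
import json

import pandas as pd

def to_flare(df, levels, value_col):
    """Recursively nest a dataframe into flare-style children lists."""
    children = []
    for name, group in df.groupby(levels[0], sort=False):
        if len(levels) == 1:
            # innermost level: one leaf per row
            children.extend({'name': name, 'value': int(v)} for v in group[value_col])
        else:
            children.append({'name': name,
                             'children': to_flare(group, levels[1:], value_col)})
    return children

df = pd.DataFrame({'group1': ['Animal', 'Animal', 'Animal', 'Plant'],
                   'group2': ['Mammal', 'Mammal', 'Fish', 'Tree'],
                   'group3': ['Fox', 'Lion', 'Cod', 'Oak'],
                   'value': [35000, 25000, 15000, 1500]})

flare = {'name': 'flare',
         'children': to_flare(df, ['group1', 'group2', 'group3'], 'value')}
print(json.dumps(flare, indent=2))
```

`sort=False` keeps the groups in first-seen order, so the output matches the row order of the dataframe; adding a fourth level is just one more column name in the `levels` list.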
I have a simple JSON file, input.json:
[
    {
        "title": "Person",
        "type": "object",
        "required": "firstName",
        "min_max": "200/600"
    },
    {
        "title": "Person1",
        "type": "object2",
        "required": "firstName1",
        "min_max": "230/630"
    },
    {
        "title": "Person2",
        "type": "object2",
        "required": "firstName2",
        "min_max": "201/601"
    },
    {
        "title": "Person3",
        "type": "object3",
        "required": "firstName3",
        "min_max": "2000/6000"
    },
    {
        "title": "Person4",
        "type": "object4",
        "required": "firstName4",
        "min_max": "null"
    },
    {
        "title": "Person4",
        "type": "object4",
        "required": "firstName4",
        "min_max": "1024 / 256"
    },
    {
        "title": "Person4",
        "type": "object4",
        "required": "firstName4",
        "min_max": "0"
    }
]
I am trying to create a new JSON file with new data. I would like to split "min_max" into two different fields, i.e., min and max. Below is my Python code:
import json

input = open('input.json', 'r')
output = open('test.json', 'w')
json_decode = json.load(input)
result = []
for item in json_decode:
    my_dict = {}
    my_dict['title'] = item.get('title')
    my_dict['min'] = item.get('min_max')
    my_dict['max'] = item.get('min_max')
    result.append(my_dict)
data = json.dumps(result, output)
output.write(data)
output.close()
How do I split the string into two different values? Also, is there any way to print the JSON output in order?
You aren't using json.dumps properly: it serializes an object to a string and takes only that object (writing to a file is json.dump). It's also easier to create each dictionary inline, and your code never actually splits min_max.
Here's a minimal example input:
[{
    "title": "Person",
    "type": "object",
    "required": "firstName",
    "min_max": "20/60"
}]
Here's your new code:
import json
with open('input.json', 'r') as inp, open('test.json', 'w') as outp:
json_decode=json.load(inp)
result = []
for temp in json_decode:
minMax = temp["min_max"].split("/")
result.append({
"title":temp["title"],
"min":minMax[0],
"max":minMax[1]
})
data=json.dumps(result)
outp.write(data)
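Note that the full input.json above contains values like "0" and "null" that have no "/", so indexing minMax[1] would raise an IndexError on them. A defensive variant (treating unsplittable values as missing is my assumption; a three-row sample stands in for the file):

```python
import json

# three rows from the input above, inlined instead of read from a file
rows = [
    {"title": "Person", "type": "object", "required": "firstName", "min_max": "200/600"},
    {"title": "Person4", "type": "object4", "required": "firstName4", "min_max": "0"},
    {"title": "Person4", "type": "object4", "required": "firstName4", "min_max": "1024 / 256"},
]

result = []
for temp in rows:
    parts = [p.strip() for p in temp["min_max"].split("/")]
    # only a clean two-way split yields min/max; anything else becomes None
    min_v, max_v = parts if len(parts) == 2 else (None, None)
    result.append({"title": temp["title"], "min": min_v, "max": max_v})

print(json.dumps(result, indent=2))
```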
Table + Python == Pandas
import pandas as pd

# Read the old json into a dataframe
df = pd.read_json("input.json")

# Create two new columns based on min_max:
# strip() removes surrounding spaces,
# and rows whose split length is not 2 get [None, None]
df['min'], df['max'] = zip(*df['min_max'].apply(
    lambda x: [i.strip() for i in x.split("/")]
    if len(x.split("/")) == 2 else [None, None]))

# 'delete' (drop) the min_max column
df.drop('min_max', axis=1, inplace=True)

# output to json again
df.to_json("test.json", orient='records')
Result:
[{'max': '600',
'min': '200',
'required': 'firstName',
'title': 'Person',
'type': 'object'},
{'max': '630',
'min': '230',
'required': 'firstName1',
'title': 'Person1',
'type': 'object2'},
{'max': '601',
'min': '201',
'required': 'firstName2',
'title': 'Person2',
'type': 'object2'},
{'max': '6000',
'min': '2000',
'required': 'firstName3',
'title': 'Person3',
'type': 'object3'},
{'max': None,
'min': None,
...
You can do something like this:

import json

nl = []
for di in json.loads(js):
    min_, sep, max_ = map(lambda s: s.strip(), di['min_max'].partition('/'))
    if sep == '/':
        del di['min_max']
        di['min'] = min_
        di['max'] = max_
    nl.append(di)

print(json.dumps(nl))
This keeps the "min_max" values that cannot be separated into two values unchanged.
I have a JSON output from which I want to create a CSV file that contains two columns. The first column should contain the userId and the second column the value of videoSeries. The output looks like this:
{
    "start": 1490383076,
    "stop": 1492975076,
    "events": [
        {
            "time": 1491294219,
            "customParameters": [
                {
                    "group": "channelId",
                    "item": "dr3"
                },
                {
                    "group": "videoGenre",
                    "item": "unknown"
                },
                {
                    "group": "videoSeries",
                    "item": "min-mor-er-pink"
                },
                {
                    "group": "videoSlug",
                    "item": "min-mor-er-pink"
                }
            ],
            "userId": "cx:hr1y0kcbhhr61qj7kspglu767:344xy3wb5bz16"
        }
    ]
}
My csv should look like this:
--------------------------------------------------------------
User ID videoSeries
--------------------------------------------------------------
cx:hr1y0kcbhhr61qj7kspglu767:344xy3wb5bz16 min-mor-er-pink
--------------------------------------------------------------
I have tried using ijson and pandas to get the desired output, but I am unable to get values from two different arrays into a single csv
import ijson

import pandas as pd

with open('MY JSON FILE', 'r') as f:
    objects = ijson.items(f, 'events.item')
    pandaReadable = list(objects)

df = pd.DataFrame(pandaReadable, columns=['userId', 'customParameters'])
df.to_csv('C:/Users/.../Desktop/output.csv', columns=['userId', 'customParameters'], index=False)
Try this approach:
d is a dictionary built from your JSON:
In [150]: d
Out[150]:
{'events': [{'customParameters': [{'group': 'channelId', 'item': 'dr3'},
{'group': 'videoGenre', 'item': 'unknown'},
{'group': 'videoSeries', 'item': 'min-mor-er-pink'},
{'group': 'videoSlug', 'item': 'min-mor-er-pink'}],
'time': 1491294219,
'userId': 'cx:hr1y0kcbhhr61qj7kspglu767:344xy3wb5bz16'}],
'start': 1490383076,
'stop': 1492975076}
Solution:
In [153]: pd.io.json.json_normalize(d['events'], 'customParameters', ['userId']) \
...: .query("group in ['videoSeries']")[['userId','item']]
...:
Out[153]:
userId item
2 cx:hr1y0kcbhhr61qj7kspglu767:344xy3wb5bz16 min-mor-er-pink
if you need to have videoSeries as a column name:
In [154]: pd.io.json.json_normalize(d['events'], 'customParameters', ['userId']) \
...: .query("group in ['videoSeries']")[['userId','item']] \
...: .rename(columns={'item':'videoSeries'})
...:
Out[154]:
userId videoSeries
2 cx:hr1y0kcbhhr61qj7kspglu767:344xy3wb5bz16 min-mor-er-pink
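To get from that frame to the requested CSV file, one more step is needed. A sketch of the full pipeline (in pandas ≥ 1.0, json_normalize is available top-level as `pd.json_normalize`; the dict `d` here is the single-event example from above):

```python
import pandas as pd

d = {'start': 1490383076, 'stop': 1492975076,
     'events': [{'time': 1491294219,
                 'customParameters': [{'group': 'channelId', 'item': 'dr3'},
                                      {'group': 'videoGenre', 'item': 'unknown'},
                                      {'group': 'videoSeries', 'item': 'min-mor-er-pink'},
                                      {'group': 'videoSlug', 'item': 'min-mor-er-pink'}],
                 'userId': 'cx:hr1y0kcbhhr61qj7kspglu767:344xy3wb5bz16'}]}

# one row per custom parameter, with userId carried down from the parent event
df = pd.json_normalize(d['events'], 'customParameters', ['userId'])

# keep only the videoSeries rows and rename the column for the CSV header
out = (df[df['group'] == 'videoSeries'][['userId', 'item']]
       .rename(columns={'item': 'videoSeries'}))
out.to_csv('output.csv', index=False)
```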