Write JSON values from array and nested array to single CSV - python

I have a JSON output from which I want to create a CSV file with two columns. The first column should contain the userId and the second column the value of videoSeries. The output looks like this:
{
    "start": 1490383076,
    "stop": 1492975076,
    "events": [
        {
            "time": 1491294219,
            "customParameters": [
                {
                    "group": "channelId",
                    "item": "dr3"
                },
                {
                    "group": "videoGenre",
                    "item": "unknown"
                },
                {
                    "group": "videoSeries",
                    "item": "min-mor-er-pink"
                },
                {
                    "group": "videoSlug",
                    "item": "min-mor-er-pink"
                }
            ],
            "userId": "cx:hr1y0kcbhhr61qj7kspglu767:344xy3wb5bz16"
        }
    ]
}
My csv should look like this:
--------------------------------------------------------------
User ID videoSeries
--------------------------------------------------------------
cx:hr1y0kcbhhr61qj7kspglu767:344xy3wb5bz16 min-mor-er-pink
--------------------------------------------------------------
I have tried using ijson and pandas to get the desired output, but I am unable to get values from two different arrays into a single CSV file:
import ijson
import pandas as pd

with open('MY JSON FILE', 'r') as f:
    objects = ijson.items(f, 'events.item')
    pandaReadable = list(objects)

df = pd.DataFrame(pandaReadable, columns=['userId', 'customParameters'])
df.to_csv('C:/Users/.../Desktop/output.csv', columns=['userId', 'customParameters'], index=False)

Try this approach:
d is a dictionary built from your JSON:
In [150]: d
Out[150]:
{'events': [{'customParameters': [{'group': 'channelId', 'item': 'dr3'},
{'group': 'videoGenre', 'item': 'unknown'},
{'group': 'videoSeries', 'item': 'min-mor-er-pink'},
{'group': 'videoSlug', 'item': 'min-mor-er-pink'}],
'time': 1491294219,
'userId': 'cx:hr1y0kcbhhr61qj7kspglu767:344xy3wb5bz16'}],
'start': 1490383076,
'stop': 1492975076}
Solution:
In [153]: pd.io.json.json_normalize(d['events'], 'customParameters', ['userId']) \
...: .query("group in ['videoSeries']")[['userId','item']]
...:
Out[153]:
userId item
2 cx:hr1y0kcbhhr61qj7kspglu767:344xy3wb5bz16 min-mor-er-pink
If you need to have videoSeries as the column name:
In [154]: pd.io.json.json_normalize(d['events'], 'customParameters', ['userId']) \
...: .query("group in ['videoSeries']")[['userId','item']] \
...: .rename(columns={'item':'videoSeries'})
...:
Out[154]:
userId videoSeries
2 cx:hr1y0kcbhhr61qj7kspglu767:344xy3wb5bz16 min-mor-er-pink
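Note that `pd.io.json.json_normalize` was deprecated in pandas 1.0 in favor of the top-level `pd.json_normalize`. A minimal sketch of the same approach with the modern API, using a trimmed-down copy of the question's data (the `to_csv` path is a placeholder):

```python
import pandas as pd

# Abbreviated version of the JSON from the question
d = {
    "events": [
        {
            "time": 1491294219,
            "customParameters": [
                {"group": "channelId", "item": "dr3"},
                {"group": "videoSeries", "item": "min-mor-er-pink"},
            ],
            "userId": "cx:hr1y0kcbhhr61qj7kspglu767:344xy3wb5bz16",
        }
    ],
}

# Flatten customParameters; carry userId down from the enclosing event
df = pd.json_normalize(d["events"], "customParameters", ["userId"])

# Keep only the videoSeries rows and rename the column for the CSV header
out = (df.query("group in ['videoSeries']")[["userId", "item"]]
         .rename(columns={"item": "videoSeries"}))

# out.to_csv("output.csv", index=False)  # path is a placeholder
```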

Related

Filter Json with Ids contained in csv sheet using python

I have a csv file with some "id" values. I imported a json file, and I need to filter from this json only the ids that are in the worksheet.
Does anyone know how to do that? I have no idea; I am very new to python and am using a Jupyter notebook.
How do I filter the data using the ids fetched into the variable var_filter?
import json

import pandas as pd
from IPython.display import display

# read csv with ids
var_filter = pd.read_csv('file.csv')
display(var_filter)

# Load json
with open('file.json') as f:
    data = json.load(f)
print(data)
The json structure is:
[
    {
        "id": "179328741654819",
        "t_values": [
            {
                "t_id": "963852456741",
                "value": "499.66",
                "date_timestamp": "2020-09-22T15:18:17",
                "type": "in"
            },
            {
                "t_id": "852951753456",
                "value": "1386.78",
                "date_timestamp": "2020-10-31T14:46:44",
                "type": "in"
            }
        ]
    },
    {
        "id": "823971648264792",
        "t_values": [
            {
                "t_id": "753958561456",
                "value": "672.06",
                "date_timestamp": "2020-03-16T22:41:16",
                "type": "in"
            },
            {
                "t_id": "321147951753",
                "value": "773.88",
                "date_timestamp": "2020-05-08T18:29:31",
                "type": "out"
            },
            {
                "t_id": "258951753852",
                "value": "733.13",
                "date_timestamp": null,
                "type": "in"
            }
        ]
    }
]
You can iterate over the elements in the data variable and check whether each element's id value is in the dataframe's id column. A simple method is below; see this article for other membership-testing approaches.
Note that I convert the JSON's id value to an int, since that is the dtype pandas infers for the column.
code
import json
from pprint import pprint

import pandas as pd

var_filter = pd.read_csv("id.csv")

# Load json
with open("data.json") as f:
    data = json.load(f)

result = []
for elem in data:
    if int(elem["id"]) in var_filter["id"].values:
        result.append(elem)
pprint(result)
id.csv
id
823971648264792
output
[{'id': '823971648264792',
't_values': [{'date_timestamp': '2020-03-16T22:41:16',
't_id': '753958561456',
'type': 'in',
'value': '672.06'},
{'date_timestamp': '2020-05-08T18:29:31',
't_id': '321147951753',
'type': 'out',
'value': '773.88'},
{'date_timestamp': None,
't_id': '258951753852',
'type': 'in',
'value': '733.13'}]}]
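Checking `in var_filter["id"].values` scans the whole array on every iteration; for a large CSV it can be faster to build a set of the ids once and test membership against that. A minimal sketch under the same assumptions (the literals below stand in for `var_filter["id"]` and the loaded JSON):

```python
# Hypothetical stand-ins for set(var_filter["id"]) and the loaded JSON data
wanted_ids = {823971648264792}
data = [
    {"id": "179328741654819", "t_values": []},
    {"id": "823971648264792", "t_values": []},
]

# int() bridges the string ids in the JSON and the integer ids from the CSV;
# set membership is O(1) per lookup
result = [elem for elem in data if int(elem["id"]) in wanted_ids]
```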

How to use for loop along with if inside lambda python?

I have a dataframe df that has a column tags. Each element of the tags column is a list of dictionaries and looks like this:
[
    {
        "id": "leena123",
        "name": "LeenaShaw",
        "slug": null,
        "type": "UserTag",
        "endIndex": 0,
        "startIndex": 0
    },
    {
        "id": "1234",
        "name": "abc ltd.",
        "slug": "5678",
        "type": "StockTag",
        "endIndex": 0,
        "startIndex": 0
    }
]
The list can have any number of elements.
Sample dataset:
0 some_data [{'id': 'leena123', 'name': 'leenaShaw', 'slug': None, 'type...
1 some data [{'id': '6', 'name': 'new', 'slug': None, 'type...
I want to create a list of all the ids from the tags column where the type is UserTag
sample output:
['leena123', 'saily639', ...]
I am trying with this:
list(df['tags'].apply(lambda x: d['name'] if any(d['type'] == 'UserTag' for d in x)))
but it doesn't work. Kindly help on this.
Use List Comprehension with df.apply:
df['id'] = df.tags.apply(lambda x: [i['id'] for i in x if i.get('type') == 'UserTag'])
Create a list from id column:
import itertools
l = df['id'].values.tolist()
output_id_list = list(itertools.chain(*l))
If you want to drop the id column from df, do:
df.drop(columns='id', inplace=True)
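The two steps above (per-row extraction, then flattening with itertools.chain) can also be collapsed into a single pass without the temporary id column. A sketch under the same assumptions about the tags structure (the sample frame below is hypothetical):

```python
import pandas as pd

# Hypothetical frame mirroring the question's `tags` column
df = pd.DataFrame({
    "tags": [
        [{"id": "leena123", "name": "LeenaShaw", "type": "UserTag"},
         {"id": "1234", "name": "abc ltd.", "type": "StockTag"}],
        [{"id": "saily639", "name": "Saily", "type": "UserTag"}],
    ]
})

# One pass: flatten every row's tag list and keep only UserTag ids
user_ids = [tag["id"]
            for tags in df["tags"]
            for tag in tags
            if tag.get("type") == "UserTag"]
```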

Convert Pandas Dataframe into multi level nested JSON

I have a dataframe that I need to convert into a nested JSON format. I can get one level of grouping done, but I don't know how to do a second grouping as well as the nesting beneath it.
I have looked at a lot of different examples, but nothing really gets me to the output I posted below.
import json

import pandas as pd

data = {'Name': ['TEST01', 'TEST02'],
        'Type': ['Tent', 'Tent'],
        'Address': ['123 Happy', '456 Happy'],
        'City': ['Happytown', 'Happytown'],
        'State': ['WA', 'NY'],
        'PostalCode': ['89985', '85542'],
        'Spot': ['A', 'A'],
        'SpotAssigment': ['123', '456'],
        'Cost': [900, 500]
        }
df = pd.DataFrame(data)

j = (df.groupby(['Name', 'Type'])
       .apply(lambda x: x[['Address', 'City', 'State', 'PostalCode']].to_dict('records'))
       .reset_index(name='addresses')
       .to_json(orient='records'))
print(json.dumps(json.loads(j), indent=2, sort_keys=True))
I want it to look like the below.
[
    {
        "Name": "TEST01",
        "Type": "Tent",
        "addresses": [
            {
                "Address": "123 Happy",
                "City": "Happytown",
                "PostalCode": "89985",
                "State": "WA"
            }
        ],
        "spots": [
            {
                "Spot": "A",
                "SpotAssignments": [
                    {"SpotAssignment": "123", "Cost": 900}
                ]
            }
        ]
    },
    {
        "Name": "TEST02",
        "Type": "Tent",
        "addresses": [
            {
                "Address": "456 Happy",
                "City": "Happytown",
                "PostalCode": "85542",
                "State": "NY"
            }
        ],
        "spots": [
            {
                "Spot": "A",
                "SpotAssignments": [
                    {"SpotAssignment": "456", "Cost": 500}
                ]
            }
        ]
    }
]
Try this:
j = (df.groupby(['Name', 'Type'])
       .apply(lambda x: x[['Address', 'City', 'State', 'PostalCode']].to_dict('records'))
       .reset_index(name='addresses'))
k = (df.groupby(['Name', 'Type', 'Spot'])
       .apply(lambda x: x[['SpotAssigment', 'Cost']].to_dict('records'))
       .reset_index(name='SpotAssignments'))
h = (k.groupby(['Name', 'Type'])
       .apply(lambda x: x[['Spot', 'SpotAssignments']].to_dict('records'))
       .reset_index(name='spots'))
m = j.merge(h, how='inner', on=['Name', 'Type'])
result = m.to_dict(orient='records')

from pprint import pprint as pp
pp(result)
This result is a Python list of dicts in the same format that you want; you should be able to dump it as JSON directly.
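Since `m.to_dict(orient='records')` returns plain Python lists and dicts, `json.dumps` serializes it with no extra conversion. A small sketch, with `result` abbreviated here to one hypothetical record in the shape the merge produces:

```python
import json

# `result` as returned by m.to_dict(orient='records'), abbreviated to one record
result = [{
    "Name": "TEST01",
    "Type": "Tent",
    "addresses": [{"Address": "123 Happy", "City": "Happytown",
                   "State": "WA", "PostalCode": "89985"}],
    "spots": [{"Spot": "A",
               "SpotAssignments": [{"SpotAssigment": "123", "Cost": 900}]}],
}]

# Plain dicts and lists serialize directly
out = json.dumps(result, indent=2)
```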

csv to complex nested json

So, I have a huge CSV file that looks like:
PN,PCA Code,MPN Code,DATE_CODE,Supplier Code,CM Code,Fiscal YEAR,Fiscal MONTH,Usage,Defects
13-1668-01,73-2590,MPN148,1639,S125,CM1,2017,5,65388,0
20-0127-02,73-2171,MPN170,1707,S125,CM1,2017,9,11895,0
19-2472-01,73-2302,MPN24,1711,S119,CM1,2017,10,4479,0
20-0127-02,73-2169,MPN170,1706,S125,CM1,2017,9,7322,0
20-0127-02,73-2296,MPN170,1822,S125,CM1,2018,12,180193,0
15-14399-01,73-2590,MPN195,1739,S133,CM6,2018,11,1290,0
What I want to do is group all the data by PCA Code. A PCA Code will have a certain number of parts, those parts are manufactured under certain MPN Codes, and the final nested JSON structure that I want looks like:
[
    {
        "PCA": {
            "code": "73-2590",
            "CM": ["CM1", "CM6"],
            "parts": [
                {
                    "number": "13-1668-01",
                    "manufacturer": [
                        {
                            "id": "MPN148",
                            "info": [
                                {
                                    "date_code": 1639,
                                    "supplier": {
                                        "id": "S125",
                                        "FYFM": "2020-9",
                                        "usage": 65388,
                                        "defects": 0
                                    }
                                }
                            ]
                        }
                    ]
                }
            ]
        }
    }
]
So, I want this structure for multiple part numbers (PNs) having different MPNs with different Date Codes and so on.
I am currently using Pandas to do this but I'm stuck on how to proceed with the nesting.
My code so far:
import json

import pandas as pd

dataframe = pd.read_csv('files/dppm_wc.csv')
data = {'PCAs': []}
for key, group in dataframe.groupby('PCA Code'):
    for index, row in group.iterrows():
        temp_dict = {'PCA Code': key, 'CM Code': row['CM Code'], 'parts': []}

with open('output.txt', 'w') as file:
    file.write(json.dumps(data, indent=4))
How do I proceed to achieve the nested JSON format that I want? Is there a better way to do this than what I am doing?
I don't really understand what you wish to do with that structure, but I guess it could be achieved with something like this:
data = {'PCAs': []}
for key, group in df.groupby('PCA Code'):
    temp_dict = {'PCA Code': key, 'CM Code': [], 'parts': []}
    for index, row in group.iterrows():
        temp_dict['CM Code'].append(row['CM Code'])
        temp_dict['parts'].append(
            {'number': row['PN'],
             'manufacturer': [
                 {
                     'id': row['MPN Code'],
                     'info': [
                         {
                             'date_code': row['DATE_CODE'],
                             'supplier': {'id': row['Supplier Code'],
                                          'FYFM': '%s-%s' % (row['Fiscal YEAR'], row['Fiscal MONTH']),
                                          'usage': row['Usage'],
                                          'defects': row['Defects']}
                         }
                     ]
                 }]
             }
        )
    data['PCAs'].append(temp_dict)
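One caveat: `temp_dict['CM Code']` collects one entry per row, so a CM code repeated within a PCA group appears multiple times, whereas the target structure shows unique codes (`"CM": ["CM1", "CM6"]`). A small order-preserving de-duplication helper (`unique_in_order` is a name I'm introducing here, not from the original answer) could be applied after the inner loop:

```python
# Order-preserving de-duplication; dict keys keep insertion order (Python 3.7+)
def unique_in_order(values):
    return list(dict.fromkeys(values))

# e.g. what temp_dict['CM Code'] might hold after the inner loop
cm_codes = ["CM1", "CM1", "CM6"]
deduped = unique_in_order(cm_codes)
```

In the loop above this would read `temp_dict['CM Code'] = unique_in_order(temp_dict['CM Code'])` just before the append.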

Splitting a string in json using python

I have a simple Json file
input.json
[
    {
        "title": "Person",
        "type": "object",
        "required": "firstName",
        "min_max": "200/600"
    },
    {
        "title": "Person1",
        "type": "object2",
        "required": "firstName1",
        "min_max": "230/630"
    },
    {
        "title": "Person2",
        "type": "object2",
        "required": "firstName2",
        "min_max": "201/601"
    },
    {
        "title": "Person3",
        "type": "object3",
        "required": "firstName3",
        "min_max": "2000/6000"
    },
    {
        "title": "Person4",
        "type": "object4",
        "required": "firstName4",
        "min_max": "null"
    },
    {
        "title": "Person4",
        "type": "object4",
        "required": "firstName4",
        "min_max": "1024 / 256"
    },
    {
        "title": "Person4",
        "type": "object4",
        "required": "firstName4",
        "min_max": "0"
    }
]
I am trying to create a new json file with new data. I would like to split "min_max" into two different fields ie., min and max. Below is the code written in python.
import json

input = open('input.json', 'r')
output = open('test.json', 'w')
json_decode = json.load(input)
result = []
for item in json_decode:
    my_dict = {}
    my_dict['title'] = item.get('title')
    my_dict['min'] = item.get('min_max')
    my_dict['max'] = item.get('min_max')
    result.append(my_dict)
data = json.dumps(result, output)
output.write(data)
output.close()
How do I split the string into two different values? Also, is it possible to print the JSON output in order?
You aren't using json.dumps properly: its first positional argument is the object to serialize, and it does not take a file object (that is json.dump). You also never actually split min_max; you assign the whole unsplit string to both min and max. I also figured it would be easier to just create each dictionary inline. Note that values such as "null" and "0" contain no "/", so the code below assumes entries that do, e.g.:
[
    {
        "title": "Person",
        "type": "object",
        "required": "firstName",
        "min_max": "20/60"
    }
]
Here's your new code:
import json

with open('input.json', 'r') as inp, open('test.json', 'w') as outp:
    json_decode = json.load(inp)
    result = []
    for temp in json_decode:
        minMax = temp["min_max"].split("/")
        result.append({
            "title": temp["title"],
            "min": minMax[0],
            "max": minMax[1]
        })
    data = json.dumps(result)
    outp.write(data)
Table + Python == Pandas
import pandas as pd

# Read old json to a dataframe
df = pd.read_json("input.json")

# Create two new columns based on min_max
# Removes empty spaces with strip()
# Returns [None, None] if length of split is not equal to 2
df['min'], df['max'] = zip(*df['min_max'].apply(
    lambda x: [i.strip() for i in x.split("/")]
              if len(x.split("/")) == 2 else [None, None]))

# 'delete' (drop) min_max column
df.drop('min_max', axis=1, inplace=True)

# output to json again
df.to_json("test.json", orient='records')
Result:
[{'max': '600',
'min': '200',
'required': 'firstName',
'title': 'Person',
'type': 'object'},
{'max': '630',
'min': '230',
'required': 'firstName1',
'title': 'Person1',
'type': 'object2'},
{'max': '601',
'min': '201',
'required': 'firstName2',
'title': 'Person2',
'type': 'object2'},
{'max': '6000',
'min': '2000',
'required': 'firstName3',
'title': 'Person3',
'type': 'object3'},
{'max': None,
'min': None,
...
You can do something like this:
import json

nl = []
for di in json.loads(js):  # js holds the JSON text from input.json
    min_, sep, max_ = map(lambda s: s.strip(), di['min_max'].partition('/'))
    if sep == '/':
        del di['min_max']
        di['min'] = min_
        di['max'] = max_
    nl.append(di)
print(json.dumps(nl))
This keeps the "min_max" values that cannot be separated into two values unchanged.
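A quick check of the partition-based approach against the edge cases from the input file ("null" has no separator, "1024 / 256" has surrounding spaces), with the JSON text inlined here for the demonstration:

```python
import json

# Inlined stand-in for the js variable (a JSON string), covering the edge cases
js = json.dumps([
    {"title": "Person", "min_max": "200/600"},
    {"title": "Person4", "min_max": "null"},
    {"title": "Person4", "min_max": "1024 / 256"},
])

nl = []
for di in json.loads(js):
    # partition always returns a 3-tuple; sep is '' when there is no '/'
    min_, sep, max_ = (s.strip() for s in di["min_max"].partition("/"))
    if sep == "/":
        del di["min_max"]
        di["min"], di["max"] = min_, max_
    nl.append(di)
```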
