Splitting a string in json using python

I have a simple JSON file, input.json:
[
{
"title": "Person",
"type": "object",
"required": "firstName",
"min_max": "200/600"
},
{
"title": "Person1",
"type": "object2",
"required": "firstName1",
"min_max": "230/630"
},
{
"title": "Person2",
"type": "object2",
"required": "firstName2",
"min_max": "201/601"
},
{
"title": "Person3",
"type": "object3",
"required": "firstName3",
"min_max": "2000/6000"
},
{
"title": "Person4",
"type": "object4",
"required": "firstName4",
"min_max": "null"
},
{
"title": "Person4",
"type": "object4",
"required": "firstName4",
"min_max": "1024 / 256"
},
{
"title": "Person4",
"type": "object4",
"required": "firstName4",
"min_max": "0"
}
]
I am trying to create a new JSON file with new data. I would like to split "min_max" into two different fields, i.e. min and max. Below is the code I have written in Python.
import json
input=open('input.json', 'r')
output=open('test.json', 'w')
json_decode=json.load(input)
result = []
for item in json_decode:
    my_dict={}
    my_dict['title']=item.get('title')
    my_dict['min']=item.get('min_max')
    my_dict['max']=item.get('min_max')
    result.append(my_dict)
data=json.dumps(result, output)
output.write(data)
output.close()
How do I split the string into two different values? Also, is there any way to print the JSON output in order?

Your example JSON file seems to be malformed. It is not a list; it is just a single associative array (a dictionary, in Python). Additionally, you are not using json.dumps correctly: it takes the object to serialize and returns a string, so it should not be given the output file (json.dump is the variant that writes to a file object). I also figured it would be easier to just create the dictionary inline. Finally, your code never actually splits min_max: it assigns the whole string to both min and max.
Here's the correct input:
[{
"title": "Person",
"type": "object",
"required": "firstName",
"min_max": "20/60"
}]
Here's your new code:
import json
with open('input.json', 'r') as inp, open('test.json', 'w') as outp:
    json_decode=json.load(inp)
    result = []
    for temp in json_decode:
        minMax = temp["min_max"].split("/")
        result.append({
            "title":temp["title"],
            "min":minMax[0],
            "max":minMax[1]
        })
    data=json.dumps(result)
    outp.write(data)
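The code above assumes every "min_max" value contains a "/". The question's input also has values such as "null" and "0", for which minMax[1] would raise an IndexError. Below is a minimal sketch of one way to tolerate those values; the helper name split_min_max and the choice of None for missing parts are assumptions, not part of the original answer. Passing indent=4 to json.dumps also makes the output easier to read, and in Python 3.7+ the keys come out in the order they were inserted.

```python
import json

def split_min_max(value):
    """Split 'min/max' into (min, max); return (None, None) if there is no '/'."""
    left, sep, right = value.partition("/")
    if not sep:
        # No separator found, e.g. "null" or "0".
        return None, None
    return left.strip(), right.strip()

# A few records taken from the question's input.json.
records = [
    {"title": "Person", "min_max": "200/600"},
    {"title": "Person4", "min_max": "null"},
    {"title": "Person4", "min_max": "1024 / 256"},
    {"title": "Person4", "min_max": "0"},
]

result = []
for record in records:
    low, high = split_min_max(record["min_max"])
    result.append({"title": record["title"], "min": low, "max": high})

# indent=4 pretty-prints; key order follows insertion order.
print(json.dumps(result, indent=4))
```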

Table + Python == Pandas
import pandas as pd
# Read old json to a dataframe
df = pd.read_json("input.json")
# Create two new columns based on min_max
# Removes empty spaces with strip()
# Returns [None,None] if length of split is not equal to 2
df['min'], df['max'] = zip(*df['min_max'].apply(
    lambda x: [i.strip() for i in x.split("/")]
    if len(x.split("/")) == 2 else [None, None]))
# 'delete' (drop) min_max column
df.drop('min_max', axis=1, inplace=True)
# output to json again
df.to_json("test.json",orient='records')
Result:
[{'max': '600',
'min': '200',
'required': 'firstName',
'title': 'Person',
'type': 'object'},
{'max': '630',
'min': '230',
'required': 'firstName1',
'title': 'Person1',
'type': 'object2'},
{'max': '601',
'min': '201',
'required': 'firstName2',
'title': 'Person2',
'type': 'object2'},
{'max': '6000',
'min': '2000',
'required': 'firstName3',
'title': 'Person3',
'type': 'object3'},
{'max': None,
'min': None,
...

You can do something like this:
import json
# js is assumed to hold the JSON text, e.g. the contents of input.json
nl=[]
for di in json.loads(js):
    min_,sep,max_=map(lambda s: s.strip(), di['min_max'].partition('/'))
    if sep=='/':
        del di['min_max']
        di['min']=min_
        di['max']=max_
    nl.append(di)
print(json.dumps(nl))
This keeps the "min_max" values that cannot be separated into two values unchanged.

Related

Properly form JSON with df.to_json with dataframe containing nested json

I have the following situation:
id items
3b68b7b2-f42c-418b-aa88-02450d66b616 [{quantity=3.0, item_id=210defdb-de69-4d03-bddd-7db626cd501b, description=Abc}, {quantity=1.0, item_id=ff457660-5f30-4432-a5af-564a9dee0029, description=xyz . 23}, {quantity=10.0, item_id=8dbd22f2-cc13-4776-b58c-4d6fe0f3463e, description=abc def}]
where one of my columns has a nested JSON list inside of it.
I wish to output the data of this dataframe as proper JSON, including the nested list.
So, for example, calling df.to_json(orient='records', indent=4) on the above dataframe yields:
[
{
"id": "3b68b7b2-f42c-418b-aa88-02450d66b616",
"items": "[{quantity=3.0, item_id=210defdb-de69-4d03-bddd-7db626cd501b, description=Abc}, {quantity=1.0, item_id=ff457660-5f30-4432-a5af-564a9dee0029, description=xyz . 23}, {quantity=10.0, item_id=8dbd22f2-cc13-4776-b58c-4d6fe0f3463e, description=abc def}]"
}
]
whereas I want:
[
{
"id": "3b68b7b2-f42c-418b-aa88-02450d66b616",
"items": [
{
"quantity": 3.0,
"item_id": "210defdb-de69-4d03-bddd-7db626cd501b",
"description": "Abc"
},
{
"quantity": 1.0,
"item_id": "ff457660-5f30-4432-a5af-564a9dee0029",
"description": "xyz . 23"
},
{
"quantity": 10.0,
"item_id": "8dbd22f2-cc13-4776-b58c-4d6fe0f3463e",
"description": "abc def"
}
]
}
]
Is this possible using df.to_json()? I have tried to use regex to parse the resulting string, but due to the data contained therein, it is unfortunately extremely difficult to "jsonify" the fields I want.

You don't have a list but a string, and this string is not valid json, so you need a bit of pre-processing.
Assuming a non-nested structure, you can use:
import json
out = (df.assign(items=df['items'].str.replace(r'(\w+)=([^,}]+)', r'"\1": "\2"', regex=True).apply(json.loads))
         .to_dict(orient='records')
      )
Output:
[{'id': '3b68b7b2-f42c-418b-aa88-02450d66b616',
'items': [{'description': 'Abc',
'item_id': '210defdb-de69-4d03-bddd-7db626cd501b',
'quantity': '3.0'},
{'description': 'xyz . 23',
'item_id': 'ff457660-5f30-4432-a5af-564a9dee0029',
'quantity': '1.0'},
{'description': 'abc def',
'item_id': '8dbd22f2-cc13-4776-b58c-4d6fe0f3463e',
'quantity': '10.0'}]}]
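Note that in the output above every value is a string, including quantity, whereas the question asks for numeric quantities. Here is a minimal sketch of one way to handle that with the same regex, shown on a plain string so it runs without pandas. The helper name parse_items and the numeric_keys parameter are hypothetical, and the sketch keeps the answer's assumption that the structure is not nested and that values contain no commas or closing braces.

```python
import json
import re

def parse_items(raw, numeric_keys=("quantity",)):
    """Turn '[{k=v, ...}, ...]' into a list of dicts, converting chosen keys to float."""
    # Quote every key=value pair so the string becomes valid JSON.
    quoted = re.sub(r'(\w+)=([^,}]+)', r'"\1": "\2"', raw)
    items = json.loads(quoted)
    for item in items:
        for key in numeric_keys:
            if key in item:
                item[key] = float(item[key])
    return items

raw = ("[{quantity=3.0, item_id=210defdb-de69-4d03-bddd-7db626cd501b, description=Abc}, "
       "{quantity=1.0, item_id=ff457660-5f30-4432-a5af-564a9dee0029, description=xyz . 23}]")

items = parse_items(raw)
print(json.dumps(items, indent=4))
```

With a dataframe, the same function could presumably be applied per row with df['items'].apply(parse_items) before serializing.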

Filter Json with Ids contained in csv sheet using python

I have a CSV file with some ids. I imported a JSON file and need to keep only the entries whose ids appear in the worksheet.
Does anyone know how to do that? I am very new to Python and I am using a Jupyter notebook.
How can I filter the data using the ids held in the variable var_filter?
import json
import pandas as pd
from IPython.display import display
# read csv with ids
var_filter = pd.read_csv('file.csv')
display(var_filter)
# Load json
with open('file.json') as f:
    data = json.load(f)
print(data)
The json structure is:
[
{
"id": "179328741654819",
"t_values": [
{
"t_id": "963852456741",
"value": "499.66",
"date_timestamp": "2020-09-22T15:18:17",
"type": "in"
},
{
"t_id": "852951753456",
"value": "1386.78",
"date_timestamp": "2020-10-31T14:46:44",
"type": "in"
}
]
},
{
"id": "823971648264792",
"t_values": [
{
"t_id": "753958561456",
"value": "672.06",
"date_timestamp": "2020-03-16T22:41:16",
"type": "in"
},
{
"t_id": "321147951753",
"value": "773.88",
"date_timestamp": "2020-05-08T18:29:31",
"type": "out"
},
{
"t_id": "258951753852",
"value": "733.13",
"date_timestamp": null,
"type": "in"
}
]
}
]

You can iterate over the elements in the data variable and check whether each element's id value is in the dataframe's id column. A simple method is shown below; see this article for other methods.
Note that I convert the JSON id value to an int, because that is the type pandas uses for the values in the id column.
code
import json
from pprint import pprint
import pandas as pd
var_filter = pd.read_csv("id.csv")
# Load json
with open("data.json") as f:
    data = json.load(f)
result = []
for elem in data:
    if int(elem["id"]) in var_filter["id"].values:
        result.append(elem)
pprint(result)
id.csv
id
823971648264792
output
[{'id': '823971648264792',
't_values': [{'date_timestamp': '2020-03-16T22:41:16',
't_id': '753958561456',
'type': 'in',
'value': '672.06'},
{'date_timestamp': '2020-05-08T18:29:31',
't_id': '321147951753',
'type': 'out',
'value': '773.88'},
{'date_timestamp': None,
't_id': '258951753852',
'type': 'in',
'value': '733.13'}]}]
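As an alternative to converting each id to an int, the ids can be compared as strings against a set. The following is a minimal sketch using only the standard library; it swaps pandas for the csv module, and the inline CSV_TEXT and JSON_TEXT values are stand-ins for id.csv and data.json.

```python
import csv
import io
import json

# Stand-ins for the files used above (assumed contents).
CSV_TEXT = "id\n823971648264792\n"
JSON_TEXT = (
    '[{"id": "179328741654819", "t_values": []},'
    ' {"id": "823971648264792", "t_values": []}]'
)

# Collect the wanted ids as strings; set membership tests are fast.
wanted_ids = {row["id"] for row in csv.DictReader(io.StringIO(CSV_TEXT))}

data = json.loads(JSON_TEXT)

# Keep only the elements whose id appears in the CSV.
result = [elem for elem in data if elem["id"] in wanted_ids]
print(result)
```

Keeping the ids as strings avoids the type mismatch between the JSON values and the CSV column, at the cost of requiring that both files format the ids identically.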

csv to complex nested json

So, I have a huge CSV file that looks like:
PN,PCA Code,MPN Code,DATE_CODE,Supplier Code,CM Code,Fiscal YEAR,Fiscal MONTH,Usage,Defects
13-1668-01,73-2590,MPN148,1639,S125,CM1,2017,5,65388,0
20-0127-02,73-2171,MPN170,1707,S125,CM1,2017,9,11895,0
19-2472-01,73-2302,MPN24,1711,S119,CM1,2017,10,4479,0
20-0127-02,73-2169,MPN170,1706,S125,CM1,2017,9,7322,0
20-0127-02,73-2296,MPN170,1822,S125,CM1,2018,12,180193,0
15-14399-01,73-2590,MPN195,1739,S133,CM6,2018,11,1290,0
What I want to do is group all the data by PCA Code. A PCA Code has a certain number of parts, and each part is manufactured under a certain MPN Code. The final nested JSON structure that I want looks like:
[
    {
        "PCA": {
            "code": "73-2590",
            "CM": ["CM1", "CM6"],
            "parts": [
                {
                    "number": "13-1668-01",
                    "manufacturer": [
                        {
                            "id": "MPN148",
                            "info": [
                                {
                                    "date_code": 1639,
                                    "supplier": {
                                        "id": "S125",
                                        "FYFM": "2020-9",
                                        "usage": 65388,
                                        "defects": 0
                                    }
                                }
                            ]
                        }
                    ]
                }
            ]
        }
    }
]
So, I want this structure for multiple part numbers (PNs) having different MPNs with different Date Codes and so on.
I am currently using Pandas to do this but I'm stuck on how to proceed with the nesting.
My code so far:
import json
import pandas as pd
dataframe = pd.read_csv('files/dppm_wc.csv')
data = {'PCAs': []}
for key, group in dataframe.groupby('PCA Code'):
    for index, row in group.iterrows():
        temp_dict = {'PCA Code': key, 'CM Code': row['CM Code'], 'parts': []}
with open('output.txt', 'w') as file:
    file.write(json.dumps(data, indent=4))
How do I proceed to achieve the nested JSON format that I want? Is there a better way to do this than what I am doing?

I don't really understand what you wish to do with that structure, but I guess it could be achieved with something like this
data = {'PCAs': []}
for key, group in df.groupby('PCA Code'):
    temp_dict = {'PCA Code': key, 'CM Code': [], 'parts': []}
    for index, row in group.iterrows():
        temp_dict['CM Code'].append(row['CM Code'])
        temp_dict['parts'].append(
            {'number': row['PN'],
             'manufacturer': [
                 {
                     'id': row['MPN Code'],
                     'info': [
                         {
                             'date_code': row['DATE_CODE'],
                             'supplier': {'id': row['Supplier Code'],
                                          'FYFM': '%s-%s' % (row['Fiscal YEAR'], row['Fiscal MONTH']),
                                          'usage': row['Usage'],
                                          'defects': row['Defects']}
                         }
                     ]
                 }]
             }
        )
    data['PCAs'].append(temp_dict)
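The loop above creates one parts entry per CSV row and repeats CM codes, so rows sharing a part number or manufacturer are not merged. Below is a rough sketch of one way to merge them, grouping by PCA Code, then PN, then MPN Code. It uses the standard csv module instead of pandas so it runs on its own; the function name nest_rows and the inline CSV_TEXT sample (three rows from the question) are assumptions for illustration.

```python
import csv
import io
import json

CSV_TEXT = """PN,PCA Code,MPN Code,DATE_CODE,Supplier Code,CM Code,Fiscal YEAR,Fiscal MONTH,Usage,Defects
13-1668-01,73-2590,MPN148,1639,S125,CM1,2017,5,65388,0
20-0127-02,73-2171,MPN170,1707,S125,CM1,2017,9,11895,0
15-14399-01,73-2590,MPN195,1739,S133,CM6,2018,11,1290,0
"""

def nest_rows(rows):
    """Group flat rows into PCA -> parts -> manufacturer -> info."""
    pcas = {}  # PCA code -> {"cms": [...], "parts": {PN: {MPN: [info, ...]}}}
    for row in rows:
        pca = pcas.setdefault(row["PCA Code"], {"cms": [], "parts": {}})
        if row["CM Code"] not in pca["cms"]:
            pca["cms"].append(row["CM Code"])  # keep each CM code once
        manufacturers = pca["parts"].setdefault(row["PN"], {})
        manufacturers.setdefault(row["MPN Code"], []).append({
            "date_code": int(row["DATE_CODE"]),
            "supplier": {
                "id": row["Supplier Code"],
                "FYFM": "%s-%s" % (row["Fiscal YEAR"], row["Fiscal MONTH"]),
                "usage": int(row["Usage"]),
                "defects": int(row["Defects"]),
            },
        })
    # Convert the intermediate dicts into the list-based target layout.
    return [
        {"PCA": {
            "code": code,
            "CM": pca["cms"],
            "parts": [
                {"number": pn,
                 "manufacturer": [{"id": mpn, "info": info}
                                  for mpn, info in mpns.items()]}
                for pn, mpns in pca["parts"].items()
            ],
        }}
        for code, pca in pcas.items()
    ]

nested = nest_rows(csv.DictReader(io.StringIO(CSV_TEXT)))
print(json.dumps(nested, indent=4))
```

Using dictionaries keyed by code for the intermediate steps avoids searching lists for an existing part or manufacturer on every row.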

Why does updating a dictionary remove the rest of the dictionaries from my nested array?

I have a json file with players structured as so
[
{
"Player_Name": "Rory McIlroy",
"Tournament": [
{
"Name": "Arnold Palmer Invitational presented by Mastercard",
"Points": "68.10",
"Salary": "12200.00"
},
{
"Name": "World Golf Championships-Mexico Championship",
"Points": "103.30",
"Salary": "12200.00"
},
{
"Name": "The Genesis Invitational",
"Points": "88.60",
"Salary": "12200.00"
},
{
"Name": "Farmers Insurance Open",
"Points": "107.30",
"Salary": "12200.00"
},
{
"Name": "World Golf Championships-HSBC Champions",
"Points": "138.70",
"Salary": "12400.00"
},
{
"Name": "The ZOZO Championship",
"Points": "103.40",
"Salary": "12300.00"
}
]
}]
When I run this code
import json
import numpy as np
import pandas as pd
from itertools import groupby
# using json open the player objects file and set it equal to data
with open('Active_PGA_Player_Objects.json') as json_file:
    data = json.load(json_file)
with open('Players_DK.json') as json_file:
    Players_DK = json.load(json_file)
results = []
for k,g in groupby(sorted(data, key=lambda x:x['Player_Name']), lambda x:x['Player_Name']):
    results.append({'Player_Name':k, 'Tournament':[i['Tournament'][0] for i in g]})
for obj in results:
    for x in Players_DK:
        if obj['Player_Name'] == x['Name']:
            obj['Average'] = x['AvgPointsPerGame']
i = 0
points_results = []
while i < len(results):
    j = 0
    while j < len(results[i]['Tournament']):
        difference = (int(float(results[i]['Tournament'][j]['Points'])) - (results[i]['Average']))
        points_results.append(round(difference,2))
        j += 1
    i += 1
with open('PGA_Player_Objects_w_Average.json', 'w') as my_file:
    json.dump(results, my_file)
my list comes back like this
[{
"Player_Name": "Rory McIlroy",
"Tournament": [
{
"Name": "Arnold Palmer Invitational presented by Mastercard",
"Points": "68.10",
"Salary": "12200.00"
}
],
"Average": 96.19
}]
Can someone explain to me why when I update the specific dictionary it deletes all but the first value from the nested Tournament list? My goal here is to add each players average to their corresponding dictionary so that I can take each average and subtract it from each score. When I try to do this though I'm only able to perform it on the one value left in the list.

Just for what it's worth, I'd go back and really think about what each line is really doing. You're also making things harder on yourself by calling variables obj or x. Calculating the average can be done like:
for player in data: # data is poorly named, try players or players_data
    player['Average'] = sum(float(tourny['Points']) for tourny in player['Tournament']) / len(player['Tournament'])
    for tourny in player['Tournament']:
        tourny['Difference'] = float(tourny['Points']) - float(player['Average'])
leaving you with:
{'Player_Name': 'Rory McIlroy',
'Tournament': [{
'Name': 'Arnold Palmer Invitational presented by Mastercard',
'Points': '68.10',
'Salary': '12200.00',
'Difference': -33.46666666666667},
{
'Name': 'World Golf Championships-Mexico Championship',
'Points': '103.30',
'Salary': '12200.00',
'Difference': 1.7333333333333343}, # .....etc
'Average': 101.566666666666666
}
When you use names in your code that describe what they're representing, a huge number of optimizations become immediately obvious. Give it a go!
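As for why the other tournaments disappear: in the question's code, the groupby step builds each Tournament list with i['Tournament'][0], which takes only the first tournament from every grouped record, so the rest are dropped before the average is ever added. The following small sketch reproduces that with trimmed sample data and shows one possible way to keep every tournament; the variable names are illustrative only.

```python
from itertools import groupby

players = [
    {"Player_Name": "Rory McIlroy",
     "Tournament": [
         {"Name": "The Genesis Invitational", "Points": "88.60"},
         {"Name": "Farmers Insurance Open", "Points": "107.30"},
     ]},
]

def by_name(player):
    return player["Player_Name"]

# As in the question: [0] keeps only the first tournament of each record.
first_only = [
    {"Player_Name": name,
     "Tournament": [p["Tournament"][0] for p in group]}
    for name, group in groupby(sorted(players, key=by_name), by_name)
]

# One fix: flatten the whole Tournament list of every grouped record.
all_kept = [
    {"Player_Name": name,
     "Tournament": [t for p in group for t in p["Tournament"]]}
    for name, group in groupby(sorted(players, key=by_name), by_name)
]

print(len(first_only[0]["Tournament"]))  # 1
print(len(all_kept[0]["Tournament"]))    # 2
```

If each player appears only once in the file, the groupby step is not needed at all and the loop shown in the answer can work on the loaded data directly.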

Append element to json dict (geojson) using Python

I would like to add style to my geojson through Python. The features currently do not have any style elements. I want to append style and then fill. However, when I do, nothing is added to the file; it is the same as before.
import json
with open('test.json') as f:
    data = json.load(f)
for feature in data['features']:
    feature.append("style")
    feature["style"].append({"fill":color})
Sample GeoJson
{
"type": "FeatureCollection",
"crs": { "type": "name", "properties": { "name": "urn:ogc:def:crs:OGC:1.3:CRS84" } },
"features": [
{ "type": "Feature", "properties": { "STATEFP": "17", "COUNTYFP": "019", "TRACTCE": "005401", "BLKGRPCE": "2", "GEOID": "170190054012", "NAMELSAD": "Block Group 2", "MTFCC": "G5030", "FUNCSTAT": "S", "ALAND": 574246.000000, "AWATER": 4116.000000, "INTPTLAT": "+40.1238204", "INTPTLON": "-088.2038105", "GISJOIN": "G17001900054012", "STUSPS": "IL", "SHAPE_AREA": 578361.706954, "SHAPE_LEN": 3489.996273, "census_block_income_YEAR": "2009-2013", "census_block_income_STATE": "Illinois", "census_block_income_STATEA": 17, "census_block_income_COUNTY": "Champaign County"}}]}
I'm trying to get the end results to be:
{
"type": "FeatureCollection",
"crs": { "type": "name", "properties": { "name": "urn:ogc:def:crs:OGC:1.3:CRS84" } },
"features": [
{ "type": "Feature", "properties": { "STATEFP": "17", "COUNTYFP": "019", "TRACTCE": "005401", "BLKGRPCE": "2", "GEOID": "170190054012", "NAMELSAD": "Block Group 2", "MTFCC": "G5030", "FUNCSTAT": "S", "ALAND": 574246.000000, "AWATER": 4116.000000, "INTPTLAT": "+40.1238204", "INTPTLON": "-088.2038105", "GISJOIN": "G17001900054012", "STUSPS": "IL", "SHAPE_AREA": 578361.706954, "SHAPE_LEN": 3489.996273, "census_block_income_YEAR": "2009-2013", "census_block_income_STATE": "Illinois", "census_block_income_STATEA": 17, "census_block_income_COUNTY": "Champaign County"}, "style": {"fill": "red"}}]}

When you type
for feature in data['features']:
every feature will be an item of the list that is data['features']. Each item there is a dictionary, so you are calling the wrong method (append is a method of lists).
You could write
for feature in data['features']:
    feature.update({"style": {"fill": "red"}})
Finally, if you want the file from which you got the initial json structure to be altered, make sure to write the now updated data structure back to a file:
with open('output2.json', 'w') as f:
    json.dump(data, f)

You are working with a list of dictionaries here, and a dictionary has no append method. You can create a new key like this:
for feature in data['features']:
    feature["style"] = {"fill":color}
It seems you also need to rewrite the file with the updated JSON:
with open('test.json', 'w') as f:
    json.dump(data, f)

There is no append method in a dictionary. One should use update.
import pprint as pp
for feature in data['features']:
    feature.update({'style':{'fill': 'red'}})
pp.pprint(data)
Output:
{'crs': {'properties': {'name': 'urn:ogc:def:crs:OGC:1.3:CRS84'},
'type': 'name'},
'features': [{'properties': {'ALAND': 574246.0,
'AWATER': 4116.0,
'BLKGRPCE': '2',
'COUNTYFP': '019',
'FUNCSTAT': 'S',
'GEOID': '170190054012',
'GISJOIN': 'G17001900054012',
'INTPTLAT': '+40.1238204',
'INTPTLON': '-088.2038105',
'MTFCC': 'G5030',
'NAMELSAD': 'Block Group 2',
'SHAPE_AREA': 578361.706954,
'SHAPE_LEN': 3489.996273,
'STATEFP': '17',
'STUSPS': 'IL',
'TRACTCE': '005401',
'census_block_income_COUNTY': 'Champaign County',
'census_block_income_STATE': 'Illinois',
'census_block_income_STATEA': 17,
'census_block_income_YEAR': '2009-2013'},
'style': {'fill': 'red'},
'type': 'Feature'}],
'type': 'FeatureCollection'}

You never write your changes back to the file. Add the following to the end of your code:
with open('test.json','w') as f:
    json.dump(data, f)
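Putting the pieces from the answers together, a minimal end-to-end sketch might look like the following. It writes a tiny stand-in GeoJSON to a temporary file first so that it can run on its own; the temporary path, the trimmed feature, and the fill_color variable are assumptions for illustration.

```python
import json
import os
import tempfile

# Tiny stand-in for the GeoJSON in the question.
geojson = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "properties": {"GEOID": "170190054012"}},
    ],
}

path = os.path.join(tempfile.mkdtemp(), "test.json")
with open(path, "w") as f:
    json.dump(geojson, f)

# 1. Load the file.
with open(path) as f:
    data = json.load(f)

# 2. Each feature is a dict, so assign a key instead of calling append.
fill_color = "red"  # hypothetical value for the undefined `color` in the question
for feature in data["features"]:
    feature.setdefault("style", {})["fill"] = fill_color

# 3. Write the modified structure back, otherwise the file is unchanged.
with open(path, "w") as f:
    json.dump(data, f, indent=2)

# Reload to confirm the change was persisted.
with open(path) as f:
    reloaded = json.load(f)
print(reloaded["features"][0]["style"])
```

setdefault is used here so that an existing style dictionary, if a feature already has one, is extended rather than replaced.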
