How do I append a dictionary to a JSON file in Python? - python

I have a JSON that looks like this:
{'data': [], 'directed': False, 'multigraph': False, 'elements': {'nodes': [{'data': {'id': 'B2', 'value': 'B2', 'name': 'B2'}}, {'data': {'id': 'SCHROEDER PLZ', 'value': 'SCHROEDER PLZ', 'name': 'SCHROEDER PLZ'}}, {'data': {'id': 'D4', 'value': 'D4', 'name': 'D4'}}, {'data': {'id': 'BLAB PLZ', 'value': 'BLAB PLZ', 'name': 'BLAB PLZ'}}], 'edges': [{'data': {'source': 'B2', 'target': 'SCHROEDER PLZ'}}, {'data': {'source': 'D4', 'target': 'BLAB PLZ'}}]}}
The JSON is the result of the json.loads call in my code:
import pandas as pd
import networkx as nx
import json
df= pd.read_csv('.../graph.csv')
g = nx.from_pandas_edgelist(df, source='DISTRICT', target='STREET')
x = nx.cytoscape_data(g)
dump = json.dumps(x)
loads = json.loads(dump)
And this is my CSV file structure (the first record is the header row):
OFFENSE_DESCRIPTION,DISTRICT,DAY_OF_WEEK,STREET,INCIDENT_NUMBER,size
INVESTIGATE PERSON,B2,Thursday,SCHROEDER PLZ,854652314,10
INVESTIGATE PERSON,D4,Friday,BLAB PLZ,457856954,3
I want to append the "size" values located in my CSV file.
In fact, the result must be like the JSON below: in each 'nodes' entry, inside 'data', I want to add a 'size' field with its value.
{'data': [], 'directed': False, 'multigraph': False, 'elements': {'nodes': [{'data': {'id': 'B2', 'value': 'B2', 'name': 'B2','size':10}}, {'data': {'id': 'SCHROEDER PLZ', 'value': 'SCHROEDER PLZ', 'name': 'SCHROEDER PLZ','size':10}}, {'data': {'id': 'D4', 'value': 'D4', 'name': 'D4','size':3}}, {'data': {'id': 'BLAB PLZ', 'value': 'BLAB PLZ', 'name': 'BLAB PLZ','size':3}}], 'edges': [{'data': {'source': 'B2', 'target': 'SCHROEDER PLZ'}}, {'data': {'source': 'D4', 'target': 'BLAB PLZ'}}]}}

An elegant solution is to update node attributes in networkx rather than the output dict. Use nx.set_node_attributes:
df = pd.read_csv('.../graph.csv')
size = dict(df[['DISTRICT', 'size']].values.tolist()
            + df[['STREET', 'size']].values.tolist())
g = nx.from_pandas_edgelist(df, source='DISTRICT', target='STREET')
nx.set_node_attributes(g, size, 'size')
x = nx.cytoscape_data(g)
>>> print(json.dumps(x['elements']['nodes'], indent=4))
[
    {
        "data": {
            "size": 10,
            "id": "B2",
            "value": "B2",
            "name": "B2"
        }
    },
    {
        "data": {
            "size": 10,
            "id": "SCHROEDER PLZ",
            "value": "SCHROEDER PLZ",
            "name": "SCHROEDER PLZ"
        }
    },
    {
        "data": {
            "size": 3,
            "id": "D4",
            "value": "D4",
            "name": "D4"
        }
    },
    {
        "data": {
            "size": 3,
            "id": "BLAB PLZ",
            "value": "BLAB PLZ",
            "name": "BLAB PLZ"
        }
    }
]
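If you would rather patch the already-loaded dict than the graph, a plain loop over the node entries also works. A minimal sketch, assuming a size lookup built from the CSV the same way as above (the dict here is a trimmed stand-in for the real json.loads result):

```python
size = {'B2': 10, 'SCHROEDER PLZ': 10, 'D4': 3, 'BLAB PLZ': 3}

# trimmed stand-in for the dict produced by json.loads above
loads = {'elements': {'nodes': [
    {'data': {'id': 'B2', 'value': 'B2', 'name': 'B2'}},
    {'data': {'id': 'SCHROEDER PLZ', 'value': 'SCHROEDER PLZ', 'name': 'SCHROEDER PLZ'}},
]}}

# add a 'size' field to every node's data dict, keyed by node id
for node in loads['elements']['nodes']:
    node['data']['size'] = size[node['data']['id']]

print(loads['elements']['nodes'][0]['data'])
# → {'id': 'B2', 'value': 'B2', 'name': 'B2', 'size': 10}
```

This mutates the dict in place, so a final json.dump of loads writes the enriched structure back to a file.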

Related

Python: Change a JSON value

Let's say I have the following JSON file named output:
{'fields': [{'name': 2, 'type': 'Int32'},
            {'name': 12, 'type': 'string'},
            {'name': 9, 'type': 'datetimeoffset'}],
 'type': 'struct'}
If the type key has the value datetimeoffset, I would like to change it to dateTime, and if the type key has the value Int32, I would like to change it to integer, and so on; I have multiple values to replace.
The expected output is
{'fields': [{'name': 2, 'type': 'integer'},
            {'name': 12, 'type': 'string'},
            {'name': 9, 'type': 'dateTime'}],
 'type': 'struct'}
Can anyone help with this in Python?
You can try this out:
substitute = {"Int32": "integer", "datetimeoffset": "dateTime"}
x = {'fields': [
{'name': 2, 'type': 'Int32'},
{'name': 12, 'type': 'string'},
{'name': 9, 'type': 'datetimeoffset'}
],'type': 'struct'}
for i in range(len(x['fields'])):
    if x['fields'][i]["type"] in substitute:
        x['fields'][i]['type'] = substitute[x['fields'][i]['type']]
print(x)
You can use the following code. Include in the equivalences dict the values you want to replace:
# avoid naming the dict json, which would shadow the json module
doc = {
    'fields': [
        {'name': 2, 'type': 'Int32'},
        {'name': 12, 'type': 'string'},
        {'name': 9, 'type': 'datetimeoffset'},
    ],
    'type': 'struct'
}
equivalences = {"datetimeoffset": "dateTime", "Int32": "integer"}
# replace values based on the equivalences dict
for i, data in enumerate(doc["fields"]):
    if data["type"] in equivalences:
        doc["fields"][i]["type"] = equivalences[data["type"]]
print(doc)
The output is:
{
    "fields": [
        {
            "name": 2,
            "type": "integer"
        },
        {
            "name": 12,
            "type": "string"
        },
        {
            "name": 9,
            "type": "dateTime"
        }
    ],
    "type": "struct"
}
A simple but ugly way:
json_ = {'fields': [{'name': 2, 'type': 'Int32'},
                    {'name': 12, 'type': 'string'},
                    {'name': 9, 'type': 'datetimeoffset'}], 'type': 'struct'}
result = json.loads(json.dumps(json_).replace("datetimeoffset", "dateTime").replace("Int32", "integer"))
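The string-replace trick really is "ugly": str.replace rewrites every occurrence of the substring anywhere in the serialized text, not just in type values. A contrived extra field (the comment key is hypothetical, added here only to illustrate) shows the hazard:

```python
import json

json_ = {'fields': [{'name': 2, 'type': 'Int32'}],
         'comment': 'stored as Int32 on disk'}  # hypothetical extra field
result = json.loads(json.dumps(json_).replace("Int32", "integer"))

# the type value was converted as intended...
print(result['fields'][0]['type'])
# → integer

# ...but the unrelated comment text was rewritten too
print(result['comment'])
# → stored as integer on disk
```

The dict-based approaches above only touch the type keys, so they avoid this class of bug.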

How to extract JSON data scraped from a website

I used Beautiful Soup to extract data from a website. The content is JSON and I need to extract all the display_name values. I have no clue how to navigate it and print the values I need to save in my CSV.
I tried using some array examples like this one
for productoslvl in soup2.findAll('script', {'id': 'searchResult'}):
    element = jsons[0]['display_name']
    print(element)
but I keep getting a KeyError.
This is the JSON data:
{
'page_size': -1,
'refinements': [{
'display_name': 'Brand',
'values': [{
'display_name': 'Acqua Di Parma',
'status': 4,
'value': 900096
}],
'type': 'checkboxes'
}, {
'display_name': 'Bristle Type',
'values': [{
'display_name': 'Addictive',
'status': 1,
'value': 14578019
}, {
'display_name': 'Casual',
'status': 1,
'value': 14578020
}, {
'display_name': 'Chic',
'status': 1,
'value': 14301148
}, {
'display_name': 'Polished',
'status': 1,
'value': 14578022
}],
'type': 'checkboxes'
}, {
'display_name': 'Coverage',
'values': [{
'display_name': 'Balanced',
'status': 1,
'value': 14301025
}, {
'display_name': 'Light',
'status': 1,
'value': 14577894
}, {
'display_name': 'Rich',
'status': 1,
'value': 14577895
}],
'type': 'checkboxes'
}, {
'display_name': 'Formulation',
'values': [{
'display_name': 'Cream',
'status': 1,
'value': 100069
}, {
'display_name': 'Spray',
'status': 1,
'value': 100072
}],
'type': 'checkboxes'
}]
}
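Since display_name appears at several nesting depths, one approach is a recursive walk over the parsed object that collects every occurrence. The collect_display_names helper below is an illustrative sketch, not from the original post; data is a trimmed stand-in for the structure above:

```python
def collect_display_names(obj):
    """Recursively gather every 'display_name' value from nested dicts/lists."""
    found = []
    if isinstance(obj, dict):
        for key, value in obj.items():
            if key == 'display_name':
                found.append(value)
            else:
                found.extend(collect_display_names(value))
    elif isinstance(obj, list):
        for item in obj:
            found.extend(collect_display_names(item))
    return found

# trimmed stand-in for the scraped structure above
data = {'page_size': -1,
        'refinements': [{'display_name': 'Brand',
                         'values': [{'display_name': 'Acqua Di Parma',
                                     'status': 4, 'value': 900096}],
                         'type': 'checkboxes'}]}

print(collect_display_names(data))
# → ['Brand', 'Acqua Di Parma']
```

In practice you would first run json.loads on the text of the matched script tag, then pass the resulting object to this helper and write the returned list to your CSV.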

Creating a df to generate JSON in the given format

I am trying to generate a df that produces the JSON below.
JSON data:
{
"name": "flare",
"children": [
{
"name": "K1",
"children": [
{"name": "Exact", "size": 4},
{"name": "synonyms", "size": 14}
]
},
{
"name": "K2",
"children": [
{"name": "Exact", "size": 10},
{"name": "synonyms", "size": 20}
]
},
{
"name": "K3",
"children": [
{"name": "Exact", "size": 0},
{"name": "synonyms", "size": 5}
]
},
{
"name": "K4",
"children": [
{"name": "Exact", "size": 13},
{"name": "synonyms", "size": 15}
]
},
{
"name": "K5",
"children": [
{"name": "Exact", "size": 0},
{"name": "synonyms", "size": 0}
]
}
]
}
input data:
name Exact synonyms
K1 4 14
K2 10 20
K3 0 5
K4 13 15
K5 0 0
I tried creating a df with the values in the JSON, but I was not able to get the desired JSON from df.to_json. Please help.
You need to reshape the data with set_index + stack, then use groupby with apply to build the nested lists of dicts:
import json
df = (df.set_index('name')
        .stack()
        .reset_index(level=1)
        .rename(columns={'level_1': 'name', 0: 'size'})
        .groupby(level=0).apply(lambda x: x.to_dict(orient='records'))
        .reset_index(name='children'))
print (df)
name children
0 K1 [{'name': 'Exact', 'size': 4}, {'name': 'synon...
1 K2 [{'name': 'Exact', 'size': 10}, {'name': 'syno...
2 K3 [{'name': 'Exact', 'size': 0}, {'name': 'synon...
3 K4 [{'name': 'Exact', 'size': 13}, {'name': 'syno...
4 K5 [{'name': 'Exact', 'size': 0}, {'name': 'synon...
#convert output to dict
j = { "name": "flare", "children": df.to_dict(orient='records')}
#for nice output - easier check
import pprint
pp = pprint.PrettyPrinter(indent=4)
pp.pprint(j)
{ 'children': [ { 'children': [ {'name': 'Exact', 'size': 4},
{'name': 'synonyms', 'size': 14}],
'name': 'K1'},
{ 'children': [ {'name': 'Exact', 'size': 10},
{'name': 'synonyms', 'size': 20}],
'name': 'K2'},
{ 'children': [ {'name': 'Exact', 'size': 0},
{'name': 'synonyms', 'size': 5}],
'name': 'K3'},
{ 'children': [ {'name': 'Exact', 'size': 13},
{'name': 'synonyms', 'size': 15}],
'name': 'K4'},
{ 'children': [ {'name': 'Exact', 'size': 0},
{'name': 'synonyms', 'size': 0}],
'name': 'K5'}],
'name': 'flare'}
#convert data to json and write to file
with open('data.json', 'w') as outfile:
    json.dump(j, outfile)

Reading un-even JSON with Python

I have a program that takes a LARGE JSON file, reads through the structure, grabs everything where the key matches something, and then stores a number of items from that structure into the database. The problem is that the structure is sometimes off when there is only one item, as follows:
"stats": {
"first": [
{
"name": "Name1",
"context": "open",
"number": "139"
},
{
"name": "Name2",
"context": "opener",
"number": "135"
}
],
"second": {
"name": "Name1",
"context": "opener",
"amount": "1.5",
"number": "-125"
},
"third": [
{
"name": "Name1",
"context": "open",
"amount": "8.5",
"number": "-110"
},
{
"name": "Name2",
"context": "open",
"amount": "9.0",
"number": "-120"
}
]
}
},
So, you'll notice that second only has one entry, so it's structured differently... I've tried more conditionals than I can think of. How do I check whether it's a single entry and move forward? This is probably REALLY simple; I'm just at a loss and not the best at Python data structures (admittedly).
What I'm doing afterwards is grabbing e.g. third[0]['name'] and putting it into a database, so I get an index error when I try that on the second node. Also, in some records second WILL have more than one entry; in others it won't. It totally depends on the record.
I would first parse it with json.loads, and then update the dictionary you describe that has keys like "first", "second", etc. as follows:
def repair_dict(d):
    for k in list(d):
        v = d[k]
        if not isinstance(v, list):
            d[k] = [v]
It thus repairs the data like:
>>> d = json.loads(data)
>>> d
{'stats': {'third': [{'context': 'open', 'name': 'Name1', 'number': '-110', 'amount': '8.5'}, {'context': 'open', 'name': 'Name2', 'number': '-120', 'amount': '9.0'}], 'second': {'context': 'opener', 'name': 'Name1', 'number': '-125', 'amount': '1.5'}, 'first': [{'context': 'open', 'name': 'Name1', 'number': '139'}, {'context': 'opener', 'name': 'Name2', 'number': '135'}]}}
>>> repair_dict(d['stats'])
>>> d
{'stats': {'third': [{'context': 'open', 'name': 'Name1', 'number': '-110', 'amount': '8.5'}, {'context': 'open', 'name': 'Name2', 'number': '-120', 'amount': '9.0'}], 'second': [{'context': 'opener', 'name': 'Name1', 'number': '-125', 'amount': '1.5'}], 'first': [{'context': 'open', 'name': 'Name1', 'number': '139'}, {'context': 'opener', 'name': 'Name2', 'number': '135'}]}}
Or when pretty printing:
>>> pprint.pprint(d)
{'stats': {'first': [{'context': 'open', 'name': 'Name1', 'number': '139'},
{'context': 'opener', 'name': 'Name2', 'number': '135'}],
'second': [{'amount': '1.5',
'context': 'opener',
'name': 'Name1',
'number': '-125'}],
'third': [{'amount': '8.5',
'context': 'open',
'name': 'Name1',
'number': '-110'},
{'amount': '9.0',
'context': 'open',
'name': 'Name2',
'number': '-120'}]}}

How to convert/update the key-values information in defaultdict?

How do I convert the following defaultdict()?
defaultdict(<class 'dict'>, {
'key1_A': {
'id': 'key1',
'length': '663',
'type': 'A'},
'key1_B': {
'id': 'key1',
'length': '389',
'type': 'B'},
'key2_A': {
'id': 'key2',
'length': '865',
'type': 'A'},
'key2_B': {
'id': 'key2',
'length': '553',
'type': 'B' ........}})
The value of id (i.e. key1) becomes the new key, and the length key is renamed length_A or length_B, with the suffix taken from the corresponding type.
defaultdict(<class 'dict'>, {
'key1': {
'length_A': '663',
'length_B': '389'},
'key2': {
'length_A': '865',
'length_B': '553'}})
Thanks,
I think this does what you want:
from collections import defaultdict
import pprint
d = {
'key1_A': {
'id': 'key1',
'length': '663',
'type': 'A',
},
'key1_B': {
'id': 'key1',
'length': '389',
'type': 'B',
},
'key2_A': {
'id': 'key2',
'length': '865',
'type': 'A',
},
'key2_B': {
'id': 'key2',
'length': '553',
'type': 'B',
},
}
transformed = defaultdict(dict)
for v in d.values():
    transformed[v["id"]]["length_{}".format(v["type"])] = v["length"]
pprint.pprint(transformed)
# Output:
# defaultdict(<class 'dict'>,
# {'key1': {'length_A': '663', 'length_B': '389'},
# 'key2': {'length_A': '865', 'length_B': '553'}})