I have a large JSON file that needs trimming; I'm trying to delete the keys "owner", "ticker", "comment", and "ptr_link" throughout.
JSON file:
{
"transactions": {
"0": [
{
"transaction_date": "11/29/2022",
"owner": "Spouse",
"ticker": "WIW",
"asset_description": "Western Asset Inflation-Linked Opportunities & Inc",
"asset_type": "Stock",
"type": "Sale (Full)",
"amount": "$1,001 - $15,000",
"comment": "--",
"ptr_link": "https://efdsearch.senate.gov/search/view/ptr/5ac4d053-0258-4531-af39-8a8067f0d085/"
},
{
"transaction_date": "11/29/2022",
"owner": "Spouse",
"ticker": "GBIL",
"asset_description": "Goldman Sachs Access Treasury 0-1 Year ETF",
"asset_type": "Other Securities",
"type": "Purchase",
"amount": "$1,001 - $15,000",
"comment": "--",
"ptr_link": "https://efdsearch.senate.gov/search/view/ptr/5ac4d053-0258-4531-af39-8a8067f0d085/"
}
]
}
}
The "0" that holds this list can range up to the 60s, so I need to access all of them collectively rather than writing code specifically for this list. The same applies to the dictionaries that hold the keys/values, as there could be numerous of them, so I can't hard-code [0] or [1], etc.
This is my code. I'm trying to filter down to the relevant object and simply delete the keys, although I need to do this collectively, as mentioned.
import json

data = json.load(open("xxxtester.json"))
data1 = data['transactions']
data2 = data1['0'][0]

for i in data2:
    del data2['owner']
for i in data2:
    del data2['ticker']
for i in data2:
    del data2['comment']
for i in data2:
    del data2['ptr_link']

open("xxxtester.json", "w").write(json.dumps(data, indent=4))
Try:
import json

with open("your_data.json", "r") as f_in:
    data = json.load(f_in)

to_delete = {"owner", "ticker", "comment", "ptr_link"}
for k in data["transactions"]:
    data["transactions"][k] = [
        {kk: vv for kk, vv in d.items() if kk not in to_delete}
        for d in data["transactions"][k]
    ]

print(json.dumps(data, indent=4))
Prints:
{
"transactions": {
"0": [
{
"transaction_date": "11/29/2022",
"asset_description": "Western Asset Inflation-Linked Opportunities & Inc",
"asset_type": "Stock",
"type": "Sale (Full)",
"amount": "$1,001 - $15,000"
},
{
"transaction_date": "11/29/2022",
"asset_description": "Goldman Sachs Access Treasury 0-1 Year ETF",
"asset_type": "Other Securities",
"type": "Purchase",
"amount": "$1,001 - $15,000"
}
]
}
}
To save back as JSON:
with open("output.json", "w") as f_out:
    json.dump(data, f_out, indent=4)
If you just want to remove some keys from each dictionary in the lists, try this (iterating over all the numbered keys, since "0" can range up to the 60s):
import json

data = json.load(open("xxxtester.json"))
for_delete = ["owner", "ticker", "comment", "ptr_link"]

for transactions in data['transactions'].values():
    for d in transactions:
        for key in for_delete:
            if key in d:
                d.pop(key)

open("xxxtester.json", "w").write(json.dumps(data, indent=4))
Related
I want to append a column name together with its data type to a JSON file. I can't seem to figure out how to get the data type of the value based on the name, nor how to append it correctly in the for loop for data['type'].
Excel Spreadsheet
Code
import xlrd
from collections import OrderedDict
import json

wb = xlrd.open_workbook('./file1.xlsx')
sh = wb.sheet_by_index(0)
data_list = []
data = OrderedDict()

for colnum in range(0, sh.ncols):
    data['name'] = sh.row_values(0)[colnum]
    data['description'] = sh.row_values(1)[colnum]
    data_list.append(data.copy())

data_list = {'columns': data_list}
j = json.dumps(data_list)
with open('seq1.json', 'w') as f:
    f.write(j)
Output
{
"columns": [
{
"name": "FILEID",
"description": "FILEID"
},
{
"name": "FILETYPE",
"description": "FILETYPE"
}
]
}
Expected output
{
"columns": [
{
"name": "fileid",
"description": "FILEID",
"type": "keyword"
},
{
"name": "filetype",
"description": "FILETYPE",
"type": "keyword"
}
]
}
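A minimal sketch of one way to produce the expected output (assuming, per the sample, that every column's type is the literal string "keyword" and names are lowercased; spreadsheet values are stubbed inline here rather than read with xlrd):

```python
import json
from collections import OrderedDict

# Hypothetical stand-in for the spreadsheet rows read via xlrd:
# row 0 holds the column names, row 1 the descriptions.
names = ["FILEID", "FILETYPE"]
descriptions = ["FILEID", "FILETYPE"]

data_list = []
for name, desc in zip(names, descriptions):
    data = OrderedDict()
    data['name'] = name.lower()        # lowercased, per the expected output
    data['description'] = desc
    data['type'] = "keyword"           # assumed constant, per the expected output
    data_list.append(data)

print(json.dumps({'columns': data_list}, indent=4))
```

If the type actually varies per column, the `"keyword"` constant would be replaced by a lookup based on the cell's value or `xlrd` cell type.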
I am creating a multi-level nested dictionary by reading from a large CSV file. The contents of the file are in the following format, storing relevant information pertaining to a unique book. We can assume each row has 6 columns (author, title, year, category, url, citations); all column entries have identical formatting. For example:
Author,Title,Year,Category,Url,Citations
"jk rowling, etc....",goblet of fire,1973,magic: fantasy: english literature,http://doi.acm.org/10.1145/800010.808066,6
"Weiner, Leonard H.",cracking the coding interview,1973,LA: assessment: other,http://doi.acm.org/10.1145/800010.808105,2
"Tolkien",hobbit,1953,magic: fantasy: medieval,http://doi.acm.org/10.1145/800010.808066,6
I want the output to match how each row in the csv file is parsed, similar to the following:
*(Note: the # of nested dictionaries depends on the book categories under the category header of the CSV. Keys are based on successive categories (order matters), separated by the ':' delimiter. Think of the ordering of categories per row in the CSV file as a directory path; multiple files can have the same directory path up to a certain point, or they can have the exact same path and be placed in the same folder.)
results = {'1973':{
"magic": {
"fantasy": {
"English literature": {
"name": "goblet of fire",
"citations": 6,
"url": "http://doi.acm.org/10.1145/800010.808066"
}
},
"medieval": {
"name": "The Hobbit",
"citations": 7,
"url": "http://doi.acm.org/10.1145/800fdfdffd010.808066"
}
}
},
'1953':{
"la": {
"assessment": {
"other": {
"name": "cracking the coding interview",
"citations": 6,
"url": "http://doi.acm.org/10.1145/800010.808105"
}
}
}
}
}
Obviously some books will share common successive categories, like in the example I showed above. Some books might also share the exact same successive categories. I think I should recursively iterate through the string of categories per row in the CSV, either creating new sub-dicts where a path deviates from a preexisting category order, or creating a dictionary representation of the book once there are no more successive categories to check. I'm just not sure exactly how to start.
Here's what I have so far, it's just a standard setup of reading csv files:
with open(DATA_FILE, 'r') as data_file:
    data = csv.reader(data_file)
Essentially, I want to create a tree representation of this CSV using nested dictionaries, with the relative category path (i.e. magic:fantasy:etc...) determining which subtree to traverse/create. If two or more books have the same consecutive path, I want to make all those books leaves of their respective key, instead of overriding each book (leaf) whenever a new book has an identical category path. Leaves represent a dictionary representation of the books mentioned per row in the CSV.
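The path-insertion idea described here can be sketched with a plain setdefault loop (illustrative data inlined; note this simple version overwrites the leaf when two books share the exact same path, unlike the grouping the question ultimately wants):

```python
import csv
import io

# Inline stand-in for the CSV file described above.
CSV_TEXT = """Author,Title,Year,Category,Url,Citations
Tolkien,hobbit,1953,magic: fantasy: medieval,http://doi.acm.org/10.1145/800010.808066,6
"""

results = {}
rows = list(csv.reader(io.StringIO(CSV_TEXT)))
for author, title, year, category, url, citations in rows[1:]:
    node = results.setdefault(year, {})
    for part in category.split(': '):   # walk/create the category path
        node = node.setdefault(part, {})
    node.update({"name": title, "citations": int(citations), "url": url})
```

To collect multiple books under one path instead of overwriting, the leaf would hold a list (e.g. a "books" key) rather than a single dict, as in the answer that follows.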
You can group your data by category (using a simple dictionary, as you mentioned that you cannot use any modules other than csv) and then apply recursion:
import csv
import json

_, *data = csv.reader(open('filename.csv'))
new_data = [[i[3].split(': '), *i[4:], *i[:3]] for i in data]

def group(d):
    _d = {}
    for a, *b in d:
        if a[0] not in _d:
            _d[a[0]] = [[a[1:], *b]]
        else:
            _d[a[0]].append([a[1:], *b])
    r = {a: {'books': [{'name': c[-2], 'citations': c[2], 'url': c[1], 'author': c[3]}
                       for c in b if not c[0]],
             **(lambda x: {} if not x else group(x))([c for c in b if c[0]])}
         for a, b in _d.items()}
    return {a: {c: d for c, d in b.items() if d} for a, b in r.items()}

print(json.dumps(group(new_data), indent=4))
Output:
{
"magic": {
"fantasy": {
"english literature": {
"books": [
{
"name": "goblet of fire",
"citations": "6",
"url": "http://doi.acm.org/10.1145/800010.808066",
"author": "jk rowling, etc...."
}
]
},
"medieval": {
"books": [
{
"name": "hobbit",
"citations": "6",
"url": "http://doi.acm.org/10.1145/800010.808066",
"author": "Tolkien"
}
]
}
}
},
"LA": {
"assessment": {
"other": {
"books": [
{
"name": "cracking the coding interview",
"citations": "2",
"url": "http://doi.acm.org/10.1145/800010.808105",
"author": "Weiner, Leonard H."
}
]
}
}
}
}
Edit: grouping by publication date:
import csv

_, *data = csv.reader(open('filename.csv'))
new_data = [[i[3].split(': '), *i[4:], *i[:3]] for i in data]

_data = {}
for i in new_data:
    if i[-1] not in _data:
        _data[i[-1]] = [i]
    else:
        _data[i[-1]].append(i)

final_result = {a: group(b) for a, b in _data.items()}
Output:
{
"1973": {
"magic": {
"fantasy": {
"english literature": {
"books": [
{
"name": "goblet of fire",
"citations": "6",
"url": "http://doi.acm.org/10.1145/800010.808066",
"author": "jk rowling, etc...."
}
]
}
}
},
"LA": {
"assessment": {
"other": {
"books": [
{
"name": "cracking the coding interview",
"citations": "2",
"url": "http://doi.acm.org/10.1145/800010.808105",
"author": "Weiner, Leonard H."
}
]
}
}
}
},
"1953": {
"magic": {
"fantasy": {
"medieval": {
"books": [
{
"name": "hobbit",
"citations": "6",
"url": "http://doi.acm.org/10.1145/800010.808066",
"author": "Tolkien"
}
]
}
}
}
}
}
Separate categories by their nesting
Parse the CSV into a pandas DataFrame
Group by category in a loop
Use to_dict() to convert each group to a dict in the groupby loop
You can do something like the following:
import pandas as pd
df = pd.read_csv('yourcsv.csv', sep=',')
Next, you want to isolate the Category column and split its content into columns:
cols_no_categ = list(df.columns)
cols_no_categ.remove('Category')
category = df['Category']

DICT = {}
for c in category:
    # .loc is needed to combine a boolean mask with column selection
    dicto = df.loc[df.Category == c, cols_no_categ].to_dict()
    s = c.split(': ')
    # create the intermediate levels before assigning the leaf
    DICT.setdefault(s[0], {}).setdefault(s[1], {})[s[2]] = dicto
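The groupby step from the outline above could alternatively look like this (a sketch, with an inline DataFrame standing in for the CSV and only a few of its columns):

```python
import pandas as pd

# Small illustrative frame standing in for the CSV described above.
df = pd.DataFrame({
    'Title': ['hobbit', 'goblet of fire'],
    'Category': ['magic: fantasy: medieval', 'magic: fantasy: english literature'],
    'Citations': [6, 6],
})

DICT = {}
for cat, grp in df.groupby('Category'):
    node = DICT
    parts = cat.split(': ')
    for part in parts[:-1]:            # walk/create intermediate category levels
        node = node.setdefault(part, {})
    # each leaf holds the group's records (minus the Category column)
    node[parts[-1]] = grp.drop(columns='Category').to_dict('records')
```

This also handles an arbitrary category depth, rather than assuming exactly three levels.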
I have a CSV file
group, first, last
fans, John, Smith
fans, Alice, White
students, Ben, Smith
students, Joan, Carpenter
...
The Output JSON file needs this format:
[
{
"group" : "fans",
"user" : [
{
"first" : "John",
"last" : "Smith"
},
{
"first" : "Alice",
"last" : "White"
}
]
},
{
"group" : "students",
"user" : [
{
"first" : "Ben",
"last" : "Smith"
},
{
"first" : "Joan",
"last" : "Carpenter"
}
]
}
]
Short answer
Use itertools.groupby, as described in the documentation.
Long answer
This is a multi-step process.
Start by getting your CSV into a list of dict:
from csv import DictReader

with open('data.csv') as csvfile:
    r = DictReader(csvfile, skipinitialspace=True)
    data = [dict(d) for d in r]
groupby needs sorted data, so define a function to get the key, and pass it in like so:
def keyfunc(x):
    return x['group']

data = sorted(data, key=keyfunc)
Last, call groupby, providing your sorted data and your key function:
from itertools import groupby

groups = []
for k, g in groupby(data, keyfunc):
    groups.append({
        "group": k,
        "user": [{k: v for k, v in d.items() if k != 'group'} for d in list(g)]
    })
This will iterate over your data, and every time the key changes, it drops into the for block and executes that code, providing k (the key for that group) and g (the dict objects that belong to it). Here we just store those in a list for later.
In this example, the user key uses some pretty dense comprehensions to remove the group key from every row of user. If you can live with that little bit of extra data, that whole line can be simplified as:
"user": list(g)
The result looks like this:
[
{
"group": "fans",
"user": [
{
"first": "John",
"last": "Smith"
},
{
"first": "Alice",
"last": "White"
}
]
},
{
"group": "students",
"user": [
{
"first": "Ben",
"last": "Smith"
},
{
"first": "Joan",
"last": "Carpenter"
}
]
}
]
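Putting the steps together, including the file write the question's output format implies (input rows inlined here; output filename assumed):

```python
import json
from itertools import groupby

# Rows as parsed from the CSV above.
data = [
    {'group': 'fans', 'first': 'John', 'last': 'Smith'},
    {'group': 'fans', 'first': 'Alice', 'last': 'White'},
    {'group': 'students', 'first': 'Ben', 'last': 'Smith'},
    {'group': 'students', 'first': 'Joan', 'last': 'Carpenter'},
]

# groupby needs sorted input; sort and group on the same key.
data = sorted(data, key=lambda x: x['group'])
groups = [
    {'group': k,
     'user': [{kk: vv for kk, vv in d.items() if kk != 'group'} for d in g]}
    for k, g in groupby(data, key=lambda x: x['group'])
]

with open('output.json', 'w') as f:   # output filename assumed
    json.dump(groups, f, indent=4)
```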
I have two dictionaries in Python 3, called a and b, which when dumped as JSON come out as below:
a = {"person": 26.94, "car": 99.49, "dog": 50.56}
b = {"filename": "1234.jpg", "model": "model1", "prototxt": "prototxt.txt"}
I need to combine these into the JSON format below but I am a bit lost on the approach in python, so welcome any pointers!
{
"payload": {
"config": [{
"model": "model1",
"filename": "1234.jpg",
"prototxt": "prototxt.txt"
}],
"results": [{
"object": "person",
"value": 26.94
},
{
"object": "car",
"value": 99.49
},
{
"object": "dog",
"value": 50.56
}
]
}
}
You can achieve this by the following code:
import json

a = {"person": 26.94, "car": 99.49, "dog": 50.56}
b = {"filename": "1234.jpg", "model": "model1", "prototxt": "prototxt.txt"}

a_list = []
for record in a:
    a_list.append({'object': record, 'value': a[record]})

payload = {'config': [b], 'results': a_list}
data = {"payload": payload}

# You can print this in the terminal/notebook with
print(json.dumps(data, indent=4))

# Or save with
with open('/path/to/file/payload.json', 'w') as outfile:
    json.dump(data, outfile)
It's a matter of making entries in dictionaries and packaging them up into new, higher-level dictionaries.
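For what it's worth, the same result can be built in a single expression with a comprehension (a sketch, equivalent to the loop above):

```python
import json

a = {"person": 26.94, "car": 99.49, "dog": 50.56}
b = {"filename": "1234.jpg", "model": "model1", "prototxt": "prototxt.txt"}

# Build the whole payload in one literal; the comprehension turns
# each key/value pair of `a` into an {"object": ..., "value": ...} entry.
data = {
    "payload": {
        "config": [b],
        "results": [{"object": k, "value": v} for k, v in a.items()],
    }
}

print(json.dumps(data, indent=4))
```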
I have data in JSON format:
data = {"outfit":{"shirt":"red","pants":{"jeans":"blue","trousers":"khaki"}}}
I'm attempting to plot this data into a decision tree using InfoVis, because it looks pretty and interactive. The problem is that their graph takes JSON data in this format:
data = {id:"nodeOutfit",
name:"outfit",
data:{},
children:[{
id:"nodeShirt",
name:"shirt",
data:{},
children:[{
id:"nodeRed",
name:"red",
data:{},
children:[]
}]
}, {
id:"nodePants",
name:"pants",
data:{},
children:[{
id:"nodeJeans",
name:"jeans",
data:{},
children:[{
id:"nodeBlue",
name:"blue",
data:{},
children:[]
}]
}, {
id:"nodeTrousers",
name:"trousers",
data:{},
children:[{
id:"nodeKhaki",
name:"khaki",
data:{},
children:[]
}]
}]
}]
}
Note the addition of 'id', 'data' and 'children' to every key and value and calling every key and value 'name'. I feel like I have to write a recursive function to add these extra values. Is there an easy way to do this?
Here's what I want to do but I'm not sure if it's the right way. Loop through all the keys and values and replace them with the appropriate:
for name, list in data.iteritems():
    for dict in list:
        for key, value in dict.items():
            # Need something here which changes the value for each key and value
            # Not sure about the syntax to change "outfit" to name:"outfit" as well as
            # adding id:"nodeOutfit", data:{}, and 'children' before the value
Let me know if I'm way off.
Here is their example http://philogb.github.com/jit/static/v20/Jit/Examples/Spacetree/example1.html
And here's the data http://philogb.github.com/jit/static/v20/Jit/Examples/Spacetree/example1.code.html
A simple recursive solution:
data = {"outfit":{"shirt":"red","pants":{"jeans":"blue","trousers":"khaki"}}}
import json
from collections import OrderedDict

def node(name, children):
    n = OrderedDict()
    n['id'] = 'node' + name.capitalize()
    n['name'] = name
    n['data'] = {}
    n['children'] = children
    return n

def convert(d):
    if type(d) == dict:
        return [node(k, convert(v)) for k, v in d.items()]
    else:
        return [node(d, [])]

print(json.dumps(convert(data), indent=True))
Note that convert returns a list, not a dict, as data could also have more than one top-level key, not just 'outfit'.
output:
[
{
"id": "nodeOutfit",
"name": "outfit",
"data": {},
"children": [
{
"id": "nodeShirt",
"name": "shirt",
"data": {},
"children": [
{
"id": "nodeRed",
"name": "red",
"data": {},
"children": []
}
]
},
{
"id": "nodePants",
"name": "pants",
"data": {},
"children": [
{
"id": "nodeJeans",
"name": "jeans",
"data": {},
"children": [
{
"id": "nodeBlue",
"name": "blue",
"data": {},
"children": []
}
]
},
{
"id": "nodeTrousers",
"name": "trousers",
"data": {},
"children": [
{
"id": "nodeKhaki",
"name": "khaki",
"data": {},
"children": []
}
]
}
]
}
]
}
]