How to remove elements from json file using python?

I have a JSON file that looks like this:
{
"name1": {
"name": "xyz",
"roll": "123",
"id": "A1"
},
"name2":{
"name": "abc",
"roll": "456",
"id": "A2"
},
"name3":{
"name": "def",
"roll": "789",
"id": "B1"
}
}
I want to remove name1, name2 and name3 so that it looks like this:
{
{
"name": "xyz",
"roll": "123",
"id": "A1"
},
{
"name": "abc",
"roll": "456",
"id": "A2"
},
{
"name": "def",
"roll": "789",
"id": "B1"
}
}
Is there any way to do this in Python? If this has been asked before, please point me to it, because I can't find it. Any help is appreciated.

Since you want the outermost (root) level of the JSON document to be an object (curly braces), you need to specify at least one key to start with, because every member of a JSON object must have a key. See the example below.
Python code Example
import json
# Read the original object whose top-level keys are name1, name2, name3
with open("/Input/path/to/sample.json", "r") as in_file:
    input_json = json.load(in_file)
print(input_json)
# Keep only the inner objects and put them in a list under a single root key
converted_json = {"names": [input_json[key] for key in input_json]}
print(converted_json)
with open("/Output/path/to/converted_json.json", "w") as out_file:
    json.dump(converted_json, out_file, indent=4)
The output will be a JSON file named converted_json.json with the following content:
{
"names": [{
"name": "xyz",
"roll": "123",
"id": "A1"
},
{
"name": "abc",
"roll": "456",
"id": "A2"
},
{
"name": "def",
"roll": "789",
"id": "B1"
}
]
}
If you want to avoid this root key, an alternative is to use an array (square brackets) as the outermost level. See the example below.
Python Code
import json
# Read the original object whose top-level keys are name1, name2, name3
with open("/Input/path/to/sample.json", "r") as in_file:
    input_json = json.load(in_file)
print(input_json)
# Keep only the inner objects, as a top-level array
converted_json = [input_json[key] for key in input_json]
print(converted_json)
with open("/Output/path/to/converted_json.json", "w") as out_file:
    json.dump(converted_json, out_file, indent=4)
The output will be a JSON file named converted_json.json with the following content:
[
{
"name": "xyz",
"roll": "123",
"id": "A1"
},
{
"name": "abc",
"roll": "456",
"id": "A2"
},
{
"name": "def",
"roll": "789",
"id": "B1"
}
]
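Note: to hold several records, the outermost level of the JSON document has to be either an "object" (curly braces, where every member needs a key) or an "array" (square brackets). Both outputs above are valid JSON; the layout requested in the question is not.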
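Both points can be checked in memory without any files. The sketch below is illustrative (the variable names and the is_valid_json helper are not part of the answer): it builds both output shapes with dict.values() and confirms that json.loads rejects an object whose members have no keys.

```python
import json

# The input from the question, inlined so the example needs no files
source = """
{
    "name1": {"name": "xyz", "roll": "123", "id": "A1"},
    "name2": {"name": "abc", "roll": "456", "id": "A2"},
    "name3": {"name": "def", "roll": "789", "id": "B1"}
}
"""
data = json.loads(source)

# Drop the outer keys; dicts keep insertion order (Python 3.7+), so record order is preserved
as_array = list(data.values())
as_object = {"names": as_array}


def is_valid_json(text):
    """Return True if `text` parses as JSON, False otherwise."""
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return False
    return True


# The layout requested in the question: an object containing objects without keys
requested = '{ {"name": "xyz"}, {"name": "abc"} }'

print(json.dumps(as_array, indent=4))
print(is_valid_json(requested))  # False
```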

You can use a simple Python script. First, read the JSON file and parse it into a Python object. Then iterate over that object to reshape it into whatever format you need.
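import json
# Only reading is needed, so "r" is enough here
with open("file.json", "r") as file:
    s = file.read()
jfile = json.loads(s)
# Write the inner objects only, as a top-level array
with open("ss.json", "w") as file:
    json.dump([jfile[i] for i in jfile], file)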

Related

Getting data into a json skeleton through user inputs in python

Say I have an input skeleton like this
"thing": {
"name": "",
"date": ""
},
"anotherThing": [
{
"name": "",
"description": "",
"expirationDate": ""
}
],
How would I go about filling in the blanks through user inputs so it could look something like this?
"thing": {
"name": "Water",
"date": "07/27/2022"
},
"anotherThing": [
{
"name": "Fire",
"description": "is hot",
"expirationDate": "05/22/2026"
}
],
or something similar, depending on what the user enters?
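You would need to write the JSON to a file. Assuming converted holds the values you collected from the user:
import json
data = {
    'message': converted,  # converted: the values collected from the user
}
with open('data.json', 'w', encoding='utf-8') as f:
    json.dump(data, f, ensure_ascii=False, indent=4)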
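To actually collect the values from the user, one option is to walk the skeleton recursively and call input() for every empty string. This is a sketch, not the answerer's code: fill_skeleton, save_user_input and the ask parameter are hypothetical names. ask defaults to the built-in input but can be replaced by any function that takes a prompt and returns a string. The skeleton in the question is not wrapped in braces, so it is assumed here to be the body of a top-level object.

```python
import json


def fill_skeleton(node, ask=input, path=""):
    """Return a copy of `node` where every empty-string value is replaced by ask(prompt).

    `path` describes the current position (e.g. "anotherThing[0].name") and is used as the prompt.
    """
    if isinstance(node, dict):
        return {
            key: fill_skeleton(value, ask, f"{path}.{key}" if path else key)
            for key, value in node.items()
        }
    if isinstance(node, list):
        return [fill_skeleton(item, ask, f"{path}[{i}]") for i, item in enumerate(node)]
    if node == "":
        return ask(f"{path}: ")
    return node  # non-empty values are kept unchanged


# The skeleton from the question, assumed to be the body of a top-level object
skeleton = {
    "thing": {"name": "", "date": ""},
    "anotherThing": [{"name": "", "description": "", "expirationDate": ""}],
}


def save_user_input(filename="data.json"):
    """Prompt for every blank field and write the filled-in document to `filename`."""
    filled = fill_skeleton(skeleton)
    with open(filename, "w", encoding="utf-8") as f:
        json.dump(filled, f, ensure_ascii=False, indent=4)
```

Calling save_user_input() prompts for thing.name, thing.date, anotherThing[0].name, anotherThing[0].description and anotherThing[0].expirationDate in that order. The skeleton itself is not modified, so it can be reused for the next entry.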

Why am I getting TypeError on code that worked previously?

I have this code to iterate through a json file. The user specifies tiers to be extracted, the names of which are then saved in inputLabels, and this for loop extracts the data from those tiers:
with open(inputfilename, 'r', encoding='utf8', newline='\r\n') as f:
    data = json.load(f)
    for line in data:
        if line['label'] in inputLabels:
            elements = [(e['body']['value']).replace(" ", "_") + "\t" for e in line['first']['items']]
            outputData.append(elements)
I wrote this code a year ago and have run it multiple times since then with no issues, but running it today I received a TypeError.
if line['label'] in inputLabels:
TypeError: string indices must be integers
I don't understand why my code was able to work before if this is a true TypeError. Why is this only a problem in the code now, and how can I fix it?
EDIT: Pasted part of the json:
{
"contains": [
{
"total": 118,
"generated": "ELAN Multimedia Annotator 6.2",
"id": "xxx",
"label": "BAR001_TEXT",
"type": "AnnotationCollection",
"#context": "http://www.w3.org/ns/ldp.jsonld",
"first": {
"startIndex": "0",
"id": "xxx",
"type": "AnnotationPage",
"items": [
{
"id": "xxx",
"type": "Annotation",
"body": {
"purpose": "transcribing",
"format": "text/plain",
"language": "",
"type": "TextualBody",
"value": ""
},
"#context": "http://www.w3.org/ns/anno.jsonld",
"target": {
"format": "audio/x-wav",
"id": "xxx",
"type": "Audio"
}
},
{
"id": "xxx",
"type": "Annotation",
"body": {
"purpose": "transcribing",
"format": "text/plain",
"language": "",
"type": "TextualBody",
"value": "Dobar vam"
},
"#context": "http://www.w3.org/ns/anno.jsonld",
"target": {
"format": "audio/x-wav",
"id": "xxx",
"type": "Audio"
}
},
{
"id": "xxx",
"type": "Annotation",
"body": {
"purpose": "transcribing",
"format": "text/plain",
"language": "",
"type": "TextualBody",
"value": "Je"
},
"#context": "http://www.w3.org/ns/anno.jsonld",
"target": {
"format": "audio/x-wav",
"id": "xxx",
"type": "Audio"
}
},
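Your code would probably work if you replaced for line in data: with for line in data['contains']:. When the top level of the file is an object, json.load returns a dict, and iterating over a dict yields its keys, which are strings; line['label'] then indexes a string with a string, which raises "TypeError: string indices must be integers".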
Maybe the JSON schema didn't have the "contains" level previously.
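If older files really were a bare list and newer ones wrap it under "contains", a loader that accepts both layouts avoids editing the files by hand. This is a sketch under that assumption; get_collections and extract_tiers are hypothetical names, and the extraction logic is the question's loop moved into a function.

```python
def get_collections(data):
    """Return the list of annotation collections for either file layout.

    Assumption: older files were a bare JSON array of collections, newer ones
    wrap that array in an object under the "contains" key.
    """
    if isinstance(data, dict):
        return data["contains"]
    return data


def extract_tiers(data, input_labels):
    """Collect the body values of every collection whose label is in input_labels."""
    output_data = []
    for collection in get_collections(data):
        if collection["label"] in input_labels:
            output_data.append(
                [item["body"]["value"].replace(" ", "_") + "\t"
                 for item in collection["first"]["items"]]
            )
    return output_data
```

With the file already loaded by json.load(f), outputData = extract_tiers(data, inputLabels) then works for both the old and the new layout.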
A fairly Pythonic approach is to use exceptions:
with open(inputfilename, 'r', encoding='utf8', newline='\r\n') as f:
    data = json.load(f)
    for line in data:
        try:
            if line['label'] in inputLabels:
                elements = [(e['body']['value']).replace(" ", "_") + "\t" for e in line['first']['items']]
                outputData.append(elements)
        except Exception as e:
            # Report what failed for this entry instead of stopping at the first error
            print(f"{type(e)} : {e} when trying to use {line}")
Your code will then run to completion and print a hint about what failed for each problematic entry.
It turns out it was a pretty simple fix. The whole JSON file was wrapped in a container (see the portion I posted in the question: it's the second line, "contains":). I removed that container and its opening/closing brackets, and the code ran successfully after that. Thanks all for your help.

Ignore specific JSON keys when extracting data in Python

I'm extracting certain keys from several JSON files and then converting them to CSV in Python. I'm able to define a key list when I run my code and get the information I need.
However, there are certain sub-keys that I want to ignore from the JSON file. For example, if we look at the following snippet:
JSON Sample
[
{
"callId": "abc123",
"errorCode": 0,
"apiVersion": 2,
"statusCode": 200,
"statusReason": "OK",
"time": "2020-12-14T12:00:32.744Z",
"registeredTimestamp": 1417731582000,
"UID": "_guid_abc123==",
"created": "2014-12-04T22:19:42.894Z",
"createdTimestamp": 1417731582000,
"data": {},
"preferences": {},
"emails": {
"verified": [],
"unverified": []
},
"identities": [
{
"provider": "facebook",
"providerUID": "123",
"allowsLogin": true,
"isLoginIdentity": true,
"isExpiredSession": true,
"lastUpdated": "2014-12-04T22:26:37.002Z",
"lastUpdatedTimestamp": 1417731997002,
"oldestDataUpdated": "2014-12-04T22:26:37.002Z",
"oldestDataUpdatedTimestamp": 1417731997002,
"firstName": "John",
"lastName": "Doe",
"nickname": "John Doe",
"profileURL": "https://www.facebook.com/John.Doe",
"age": 50,
"birthDay": 31,
"birthMonth": 12,
"birthYear": 1969,
"city": "City, State",
"education": [
{
"school": "High School Name",
"schoolType": "High School",
"degree": null,
"startYear": 0,
"fieldOfStudy": null,
"endYear": 0
}
],
"educationLevel": "High School",
"favorites": {
"music": [
{
"name": "Music 1",
"id": "123",
"category": "Musician/band"
},
{
"name": "Music 2",
"id": "123",
"category": "Musician/band"
}
],
"movies": [
{
"name": "Movie 1",
"id": "123",
"category": "Movie"
},
{
"name": "Movie 2",
"id": "123",
"category": "Movie"
}
],
"television": [
{
"name": "TV 1",
"id": "123",
"category": "Tv show"
}
]
},
"followersCount": 0,
"gender": "m",
"hometown": "City, State",
"languages": "English",
"likes": [
{
"name": "Like 1",
"id": "123",
"time": "2014-10-31T23:52:53.0000000Z",
"category": "TV",
"timestamp": "1414799573"
},
{
"name": "Like 2",
"id": "123",
"time": "2014-09-16T08:11:35.0000000Z",
"category": "Music",
"timestamp": "1410855095"
}
],
"locale": "en_US",
"name": "John Doe",
"photoURL": "https://graph.facebook.com/123/picture?type=large",
"timezone": "-8",
"thumbnailURL": "https://graph.facebook.com/123/picture?type=square",
"username": "john.doe",
"verified": "true",
"work": [
{
"companyID": null,
"isCurrent": null,
"endDate": null,
"company": "Company Name",
"industry": null,
"title": "Company Title",
"companySize": null,
"startDate": "2010-12-31T00:00:00"
}
]
}
],
"isActive": true,
"isLockedOut": false,
"isRegistered": true,
"isVerified": false,
"lastLogin": "2014-12-04T22:26:33.002Z",
"lastLoginTimestamp": 1417731993000,
"lastUpdated": "2014-12-04T22:19:42.769Z",
"lastUpdatedTimestamp": 1417731582769,
"loginProvider": "facebook",
"loginIDs": {
"emails": [],
"unverifiedEmails": []
},
"rbaPolicy": {
"riskPolicyLocked": false
},
"oldestDataUpdated": "2014-12-04T22:19:42.894Z",
"oldestDataUpdatedTimestamp": 1417731582894,
"registered": "2014-12-04T22:19:42.956Z",
"regSource": "",
"socialProviders": "facebook"
}
]
I want to extract data from created and identities, but ignore identities.favorites and identities.likes as well as all the data underneath them.
This is what I have so far, below. I defined the JSON keys that I want to extract in the key_list variable:
Current Code
import json, pandas
from flatten_json import flatten
# Enter the path to the JSON and the filename without appending '.json'
file_path = r'C:\Path\To\file_name'
# Open and load the JSON file
json_list = json.load(open(file_path + '.json', 'r', encoding='utf-8', errors='ignore'))
# Extract data from the defined key names
key_list = ['created', 'identities']
json_list = [{k:d[k] for k in key_list} for d in json_list]
# Flatten and convert to a data frame
json_list_flattened = (flatten(d, '.') for d in json_list)
df = pandas.DataFrame(json_list_flattened)
# Export to CSV in the same directory with the original file name
export_csv = df.to_csv (file_path + r'.csv', sep=',', encoding='utf-8', index=None, header=True)
Similar to the key_list, I suspect that I would make an ignore list and factor that in the json_list for loop that I have? Something like:
key_ignore = ['identities.favorites', 'identities.likes']
Then use dict.pop(), which looks like it would remove the unwanted sub-keys if they match? I'm just not sure how to implement that correctly.
Expected Output
As a result, the code should extract data from the keys defined in key_list and ignore the sub-keys defined in key_ignore, which are identities.favorites and identities.likes. Then the rest of the code will continue to convert it into a CSV:
created,identities.0.provider,identities.0.providerUID,identities...
2014-12-04T19:23:05.191Z,site,cb8168b0cf734b70ad541f0132763761,...
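If the keys are always there, you can delete them before flattening. Here d stands for the loaded list (json_list in the question's code), so this only affects the first identity of the first record:
del d[0]['identities'][0]['likes']
del d[0]['identities'][0]['favorites']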
Alternatively, to remove the columns from the DataFrame after reading in all the JSON data, you can use the following (the pattern matches every identity index, not only 0):
df.drop(df.filter(regex=r'^identities\.\d+\.(favorites|likes)(\.|$)').columns, axis=1, inplace=True)
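To keep an ignore list of dotted paths like key_ignore from the question and apply it to every record and every identity, not just the first, one option is to follow each path and call dict.pop on the last key. drop_paths is a hypothetical helper, and the rule that a path step landing on a list applies to every element of that list is an assumption chosen to match how identities is structured.

```python
import copy


def drop_paths(record, ignore):
    """Return a deep copy of `record` with every dotted path in `ignore` removed.

    When a step of the path reaches a list, the rest of the path is applied
    to every element, so 'identities.likes' removes 'likes' from each identity.
    """
    record = copy.deepcopy(record)
    for path in ignore:
        _drop(record, path.split("."))
    return record


def _drop(node, parts):
    if isinstance(node, list):
        for item in node:
            _drop(item, parts)
    elif isinstance(node, dict):
        if len(parts) == 1:
            node.pop(parts[0], None)  # a default makes pop ignore missing keys
        elif parts[0] in node:
            _drop(node[parts[0]], parts[1:])
```

In the question's code this would slot in before flattening, for example json_list = [drop_paths({k: d[k] for k in key_list}, key_ignore) for d in json_list].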

Convert json data to dictionary

I am trying to convert my JSON data into a dictionary keyed by the id of each record. For example, say I have the following JSON:
{
"id": "1",
"name": "John",
"surname": "Smith"
},
{
"id": "2",
"name": "Steve",
"surname": "Ger"
}
I want to construct a new dictionary that includes the id as a key and save it to a file, so I wrote the following code:
json_dict = []
request = requests.get('http://example.com/...')
with open("data.json", "w") as out:
    loaded_data = json.loads(request.text)
    for list_item in loaded_data:
        json_dict.append({"id": list_item["id"], "data": list_item})
    out.write(json.dumps(json_dict))
In the file i get the following output:
[{"data": {"id":"1",
"name":"John",
"Surname":"Smith"
}
},
{"data": {"id":"2",
"name":"Steve",
"Surname":"Ger"
}
},
]
Why is the id not included in my dict before data?
I'm pretty sure you're looking at a ghost here. You probably tested the wrong thing. It will go away when you try to create a minimal, complete, and verifiable example for us (i.e. with a fixed string as input instead of a request call, with a print instead of an out.write, etc.).
This is my test in which I could not reproduce the problem:
import json
entries = [
    {
        "id": "1",
        "name": "John",
        "surname": "Smith"
    },
    {
        "id": "2",
        "name": "Steve",
        "surname": "Ger"
    }
]
json_dict = []
for i in entries:
    json_dict.append({"id": i["id"], "data": i})
print(json.dumps(json_dict, indent=2))
This prints as expected:
[
{
"id": "1",
"data": {
"id": "1",
"surname": "Smith",
"name": "John"
}
},
{
"id": "2",
"data": {
"id": "2",
"surname": "Ger",
"name": "Steve"
}
}
]
You could try this instead:
import json
json_dict = []
with open('data.json', 'w') as f:
    loaded_data = json.loads(request.text)
    for list_item in loaded_data:
        json_dict.append({list_item['id']: list_item})
    # Write the result, otherwise data.json stays empty
    json.dump(json_dict, f)
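If the goal is a single dictionary keyed by id, as the question title suggests, rather than a list of one-entry dicts, a dict comprehension builds it directly. A minimal sketch using the sample records from the question; index_by_id is a hypothetical name, and if two records share an id, the later one silently overwrites the earlier one.

```python
import json


def index_by_id(records):
    """Map each record's "id" to the record itself (later duplicates win)."""
    return {record["id"]: record for record in records}


entries = [
    {"id": "1", "name": "John", "surname": "Smith"},
    {"id": "2", "name": "Steve", "surname": "Ger"},
]

by_id = index_by_id(entries)
print(json.dumps(by_id, indent=2))
```

To save it, replace the print with json.dump(by_id, f) inside a with open(...) block as in the code above.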

Parsing JSON in python for second object

I have a sample JSON in this format:
JSON FILE:
{
"Name": "ABC",
"Phone":"123",
"Address":[{"City":"City-1"},{"Country":"Country-1"}]
}
{
"Name": "ABC-1",
"Phone":"123-1",
"Address":[{"City":"City-2"},{"Country":"Country-2"}]
}
Is there a way to parse this JSON, loop through the file and print each key-value pair?
The approach I tried was:
json_open = open(json_file)
json_data = json.load(json_open)
print(json_data['Name']) ##should give ABC
print(json_data['Name']) ##should give ABC-1 - unsure about the syntax and format
But I can currently print only the first object's values, i.e. name=ABC and not name=ABC-1.
There is an error in your JSON file. I modified your JSON and wrote code to traverse each element in it.
Error:
Error: Parse error on line 9:
... "Country-1" }]}{ "Name": "ABC-1",
-------------------^
Expecting 'EOF', '}', ',', ']', got '{'
sample.json
{
"data": [
{
"Name": "ABC",
"Phone": "123",
"Address": [
{
"City": "City-1"
},
{
"Country": "Country-1"
}
]
},
{
"Name": "ABC-1",
"Phone": "123-1",
"Address": [
{
"City": "City-2"
},
{
"Country": "Country-2"
}
]
}
]
}
sample.py
import json
json_file = 'sample.json'
with open(json_file, 'r') as json_data:
    data = json.load(json_data)
jin = data['data']
for emp in jin:
    print("Name :" + emp["Name"])
    print("Phone :" + emp["Phone"])
    print("City :" + emp["Address"][0]["City"])
    print("Country :" + emp["Address"][1]["Country"])
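The loop above prints fixed fields. To print every key-value pair without knowing the field names in advance, as the question asks, a small recursive walk handles any nesting. iter_pairs is a hypothetical helper, and naming nested values as Address[0].City is just one possible convention.

```python
def iter_pairs(node, prefix=""):
    """Yield (path, value) for every scalar value in a parsed JSON structure."""
    if isinstance(node, dict):
        for key, value in node.items():
            yield from iter_pairs(value, f"{prefix}.{key}" if prefix else key)
    elif isinstance(node, list):
        for i, value in enumerate(node):
            yield from iter_pairs(value, f"{prefix}[{i}]")
    else:
        yield prefix, node


record = {
    "Name": "ABC",
    "Phone": "123",
    "Address": [{"City": "City-1"}, {"Country": "Country-1"}],
}
for path, value in iter_pairs(record):
    print(f"{path} : {value}")
```

With the corrected sample.json, calling iter_pairs(emp) inside the for emp in jin: loop prints every field of each record, including ones the code does not name explicitly.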
Records in a JSON file must be separated by commas inside an enclosing array, so your file should look like:
[{
"Name": "ABC",
"Phone": "123",
"Address": [{
"City": "City-1"
}, {
"Country": "Country-1"
}]
},
{
"Name": "ABC-1",
"Phone": "123-1",
"Address": [{
"City": "City-2"
}, {
"Country": "Country-2"
}]
}
]
You can then read the file and print its contents as follows:
import json
my_file = 'test.json'
with open(my_file, 'r') as my_data:
    data = json.load(my_data)
print(data)
for elm in data:
    print(elm['Phone'])
    print(elm['Name'])
    print(elm['Address'][0]['City'])
    print(elm['Address'][1]['Country'])
First of all, this is not valid JSON; you can check your file with a JSON validator.
You should use this try.json file instead:
{
"data":[{
"Name": "ABC",
"Phone":"123",
"Address":[{"City":"City-1"},{"Country":"Country-1"}]
}, {
"Name": "ABC-1",
"Phone":"123-1",
"Address":[{"City":"City-2"},{"Country":"Country-2"}]
}]
}
Here is try.py:
import json
with open("try.json") as json_open:
    json_data = json.load(json_open)
for record in json_data['data']:
    print(record["Name"])
Note: the code is written for Python 3.
The output looks like this:
ABC
ABC-1
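If you cannot change how the file is produced, the standard library can also read several JSON objects written back to back, as in the original file. json.JSONDecoder.raw_decode parses one value starting at a given index and returns it together with the index where it stopped. A sketch; iter_json_objects is a hypothetical name, and it assumes only whitespace separates the objects.

```python
import json


def iter_json_objects(text):
    """Yield each JSON value from a string of values written one after another."""
    decoder = json.JSONDecoder()
    pos = 0
    length = len(text)
    while True:
        # raw_decode does not skip leading whitespace, so skip it here
        while pos < length and text[pos].isspace():
            pos += 1
        if pos == length:
            return
        obj, pos = decoder.raw_decode(text, pos)
        yield obj
```

For the file from the question, reading it with f.read() and looping over iter_json_objects(...) yields both records, so record["Name"] gives ABC and then ABC-1.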
