I have a JSON file with lots of data, and I want to keep only specific data.
I thought reading the file, get all the data I want and save as a new JSON.
The JSON is like this:
{
"event": [
{
"date": "2019-01-01",
"location": "world",
"url": "www.com",
"comments": "null",
"country": "china",
"genre": "blues"
},
{
"date": "2000-01-01",
"location": "street x",
"url": "www.cn",
"comments": "null",
"country":"turkey",
"genre": "reds"
},
{...
and I want it to be like this (with just date and url from each event.
{
"event": [
{
"date": "2019-01-01",
"url": "www.com"
},
{
"date": "2000-01-01",
"url": "www.cn"
},
{...
I can open the JSON and read from it using
with open('xx.json') as f:
data = json.load(f)
data2=data["events"]["date"]
But I still need to understand how to save the data I want in a new JSON keeping it's structure
You can use loop comprehension to loop over the events in and return a dictionary containing only the keys that you want.
data = { "event": [
{
"date": "2019-01-01",
"location": "world",
"url": "www.com",
"comments": None,
"country": "china",
"genre": "blues",
},
{
"date": "2000-01-01",
"location": "street x",
"url": "www.cn",
"comments": None,
"country" :"turkey",
"genre":"reds",
}
]}
# List comprehension
data["event"] = [{"date": x["date"], "url": x["url"]} for x in data["event"]]
Alternatively, you can map a function over the events list
keys_to_keep = ["date", "url"]
def subset_dict(d):
return {x: d[x] for x in keys_to_keep}
data["event"] = list(map(subset_dict, data["event"]))
Related
I'm extracting certain keys in several JSON files and then converting it to a CSV in Python. I'm able to define a key list when I run my code and get the information I need.
However, there are certain sub-keys that I want to ignore from the JSON file. For example, if we look at the following snippet:
JSON Sample
[
{
"callId": "abc123",
"errorCode": 0,
"apiVersion": 2,
"statusCode": 200,
"statusReason": "OK",
"time": "2020-12-14T12:00:32.744Z",
"registeredTimestamp": 1417731582000,
"UID": "_guid_abc123==",
"created": "2014-12-04T22:19:42.894Z",
"createdTimestamp": 1417731582000,
"data": {},
"preferences": {},
"emails": {
"verified": [],
"unverified": []
},
"identities": [
{
"provider": "facebook",
"providerUID": "123",
"allowsLogin": true,
"isLoginIdentity": true,
"isExpiredSession": true,
"lastUpdated": "2014-12-04T22:26:37.002Z",
"lastUpdatedTimestamp": 1417731997002,
"oldestDataUpdated": "2014-12-04T22:26:37.002Z",
"oldestDataUpdatedTimestamp": 1417731997002,
"firstName": "John",
"lastName": "Doe",
"nickname": "John Doe",
"profileURL": "https://www.facebook.com/John.Doe",
"age": 50,
"birthDay": 31,
"birthMonth": 12,
"birthYear": 1969,
"city": "City, State",
"education": [
{
"school": "High School Name",
"schoolType": "High School",
"degree": null,
"startYear": 0,
"fieldOfStudy": null,
"endYear": 0
}
],
"educationLevel": "High School",
"favorites": {
"music": [
{
"name": "Music 1",
"id": "123",
"category": "Musician/band"
},
{
"name": "Music 2",
"id": "123",
"category": "Musician/band"
}
],
"movies": [
{
"name": "Movie 1",
"id": "123",
"category": "Movie"
},
{
"name": "Movie 2",
"id": "123",
"category": "Movie"
}
],
"television": [
{
"name": "TV 1",
"id": "123",
"category": "Tv show"
}
]
},
"followersCount": 0,
"gender": "m",
"hometown": "City, State",
"languages": "English",
"likes": [
{
"name": "Like 1",
"id": "123",
"time": "2014-10-31T23:52:53.0000000Z",
"category": "TV",
"timestamp": "1414799573"
},
{
"name": "Like 2",
"id": "123",
"time": "2014-09-16T08:11:35.0000000Z",
"category": "Music",
"timestamp": "1410855095"
}
],
"locale": "en_US",
"name": "John Doe",
"photoURL": "https://graph.facebook.com/123/picture?type=large",
"timezone": "-8",
"thumbnailURL": "https://graph.facebook.com/123/picture?type=square",
"username": "john.doe",
"verified": "true",
"work": [
{
"companyID": null,
"isCurrent": null,
"endDate": null,
"company": "Company Name",
"industry": null,
"title": "Company Title",
"companySize": null,
"startDate": "2010-12-31T00:00:00"
}
]
}
],
"isActive": true,
"isLockedOut": false,
"isRegistered": true,
"isVerified": false,
"lastLogin": "2014-12-04T22:26:33.002Z",
"lastLoginTimestamp": 1417731993000,
"lastUpdated": "2014-12-04T22:19:42.769Z",
"lastUpdatedTimestamp": 1417731582769,
"loginProvider": "facebook",
"loginIDs": {
"emails": [],
"unverifiedEmails": []
},
"rbaPolicy": {
"riskPolicyLocked": false
},
"oldestDataUpdated": "2014-12-04T22:19:42.894Z",
"oldestDataUpdatedTimestamp": 1417731582894,
"registered": "2014-12-04T22:19:42.956Z",
"regSource": "",
"socialProviders": "facebook"
}
]
I want to extract data from created and identities but ignore identities.favorites and identities.likes as well as their data underneath it.
This is what I have so far, below. I defined the JSON keys that I want to extract in the key_list variable:
Current Code
import json, pandas
from flatten_json import flatten
# Enter the path to the JSON and the filename without appending '.json'
file_path = r'C:\Path\To\file_name'
# Open and load the JSON file
json_list = json.load(open(file_path + '.json', 'r', encoding='utf-8', errors='ignore'))
# Extract data from the defined key names
key_list = ['created', 'identities']
json_list = [{k:d[k] for k in key_list} for d in json_list]
# Flatten and convert to a data frame
json_list_flattened = (flatten(d, '.') for d in json_list)
df = pandas.DataFrame(json_list_flattened)
# Export to CSV in the same directory with the original file name
export_csv = df.to_csv (file_path + r'.csv', sep=',', encoding='utf-8', index=None, header=True)
Similar to the key_list, I suspect that I would make an ignore list and factor that in the json_list for loop that I have? Something like:
key_ignore = ['identities.favorites', 'identities.likes']`
Then utilize the dict.pop() which looks like it will remove the unwanted sub-keys if it matches? Just not sure how to implement that correctly.
Expected Output
As a result, the code should extract data from the defined keys in key_list and ignore the sub keys defined in key_ignore, which is identities.favorites and identities.likes. Then the rest of the code will continue to convert it into a CSV:
created
identities.0.provider
identities.0.providerUID
identities...
2014-12-04T19:23:05.191Z
site
cb8168b0cf734b70ad541f0132763761
...
If the keys are always there, you can use
del d[0]['identities'][0]['likes']
del d[0]['identities'][0]['favorites']
or if you want to remove the columns from the dataframe after reading all the json data in you can use
df.drop(df.filter(regex='identities.0.favorites|identities.0.likes').columns, axis=1, inplace=True)
Hi im am trying to parse json data and gets this error every time the element
if ['fields']['assignee'] in each:
TypeError: list indices must be integers or slices, not str
>>>
My json is this
{
"expand": "schema,names",
"startAt": 1,
"maxResults": 50,
"total": 7363,
"issues": [
{
"expand": "operations,versionedRepresentations,editmeta,changelog,renderedFields",
"id": "591838",
"self": "https://jira.mynet.com/rest/api/2/issue/591838",
"key": "TEST-8564",
"fields": {
"summary": "delete tables 31-03-2020 ",
"customfield_10006": 2.0,
"created": "2020-02-27T10:29:12.000+0100",
"description": "A LOT OF TEXT",
"assignee": null,
"labels": [
"DATA",
"Refined"
],
"status": {
"self": "https://jira.mynet.com/rest/api/2/status/10000",
"description": "",
"iconUrl": "https://jira.mynet.com/",
"name": "To Do",
"id": "10000",
"statusCategory": {
"self": "https://jira.mynet.com/rest/api/2/statuscategory/2",
"id": 2,
"key": "new",
"colorName": "blue-gray",
"name": "To Do"
}
}
}
}
]
}
The element in ['fields']['assignee'] is NULL in this example
sometimes it is like this
"assignee": : {
"self": "https://mynet.com/rest/api/2/user?username=xxxxxx",
"name": "sij",
"key": "x",
"emailAddress": xx#mynet.com",
"avatarUrls": {
"48x48": "https://mynet.com/secure/useravatar?ownerId=bdysdh&avatarId=16743",
"24x24": "https://mynet.com/secure/useravatar?size=small&ownerId=bdysdh&avatarId=16743",
"16x16": "https://mynet.com/secure/useravatar?size=xsmall&ownerId=bdysdh&avatarId=16743",
"32x32": "https://mynet.com/secure/useravatar?size=medium&ownerId=bdysdh&avatarId=16743"
},
"displayName": "Bruce Springsteen",
"active": true,
"timeZone": "Arctic/Longyearbyen"
},
I am trying to check of assignee is null and if so print null
my code looks like this
with open('C:\\TEMP\\testdata.json') as json_file:
data = json.load(json_file)
for each in data['issues']:
if ['fields']['assignee'] in each:
print (['fields']['assignee']['name'])
else:
print ('null')
I have tried to put in [0] between ['fields']['assignee']['name'] but nothing seems to help.
Try with
if 'fields' in each and 'assignee' in each['fields']:
Note that you need the name of the key, not surrounded by square brackets.
Perhaps better:
for each in data['issues']:
print(each.get('fields', {}).get('assignee', {}).get('name', 'null'))
and if you can't guarantee that 'issues' exists in data either:
for each in data.get('issues', []):
<as before>
data.get('issues', []) returns an empty list if data['issuess'] doesn't exist.
I have the following json:
{
"request": {
"id": "123",
"url": "/aa/bb/cc",
"method": "GET",
"timestamp": "2018-08-09T08:41:38.432Z"
},
"response": {
"status": {
"code": 200,
"message": "OK"
},
"items": [
{
"id": "aaa",
"name": "w1"
},
{
"id": "bbb",
"name": "w2"
},
{
"id": "ccc",
"name": "w3"
}
]
}
}
I need to loop over items and print each name. I've tried the following code which doesn't work.
response = requests.get(url)
data = json.loads(response.content)
for group in data['response']['items']:
print data['response']['items'][group]['name']
When i replace group with 0 for example, I can access the first name:
data['response']['items'][0]['name']
However, I don't know in advanced how many elements are in the array.
As Joel mentioned, in the for loop,
for group in data['response']['items']:
you are assigning group the value from data['response']['items']. Hence group contains the value :
[
{
"id": "aaa",
"name": "w1"
},
{
"id": "bbb",
"name": "w2"
},
{
"id": "ccc",
"name": "w3"
}
]
So all you need to do is
print group['name']
You can use Pandas module and call read_json function.
import pandas as pd
df = pd.read_json(your_json_file.json)
for i in df.response['items']:
print(i['name'])
# w1
# w2
# w3
You could try this:
for i in range (0,len(d['response']['items'])):
print(d['response']['items'][i]['name'])
Output:
w1
w2
w3
returns
"messages": [
{
"text": "testing",
"ts": "1479967441.000004",
"user": "ray",
"type": "message",
"bot_id": "B379PT5AT"
},
{
"text": "SWAT start",
"type": "message",
"user": "john",
"ts": "1479967379.000003"
},
{
"text": "SWAT close",
"type": "message",
"user": "ray",
"ts": "1479967379.000003"
},
I would like to add another parameter
"messages": [
{
"text": "testing",
"ts": "1479967441.000004",
"user": "ray",
"type": "message",
"bot_id": "B379PT5AT"
"icon": "URL...."
},
{
"text": "SWAT start",
"type": "message",
"user": "john",
"ts": "1479967379.000003"
"icon": "URL...."
},
{
"text": "SWAT close",
"type": "message",
"user": "ray",
"ts": "1479967379.000003"
"icon": "URL...."
},
Code:
import simplejson
with open ('automation.json')as json_data:
data = simplejson.loads(json_data)
for r in data['messages']:
key = messages['icon']
messages["icon"] = ("testing")
outdata = simplejson.dumps(data)
You almost had it. Looping over data['messages'] accesses a list of dictionaries, which is why your assignment didn't quite work. You have to access the dictionary first, and then assign.
import json
with open('automation.json') as data_file:
data = json.load(data_file)['messages'] # you now have a the json as a python object.
# If you want to always add the same value to each dictionary
data = {d['icon'] = 'A Value you want' for d in data} # d represents a dictionary in data['messages']
# other wise, lets say the icon is linked to the user:
icon_urls = {'john': 'http://someurl.com', 'ray': 'http://someotherurl.com'}
data = {d['icon'] = icon_urls[d['user']] for d in data}
I have the following json
{
"response": {
"message": null,
"exception": null,
"context": [
{
"headers": null,
"name": "aname",
"children": [
{
"type": "cluster-connectivity",
"name": "cluster-connectivity"
},
{
"type": "consistency-groups",
"name": "consistency-groups"
},
{
"type": "devices",
"name": "devices"
},
{
"type": "exports",
"name": "exports"
},
{
"type": "storage-elements",
"name": "storage-elements"
},
{
"type": "system-volumes",
"name": "system-volumes"
},
{
"type": "uninterruptible-power-supplies",
"name": "uninterruptible-power-supplies"
},
{
"type": "virtual-volumes",
"name": "virtual-volumes"
}
],
"parent": "/clusters",
"attributes": [
{
"value": "true",
"name": "allow-auto-join"
},
{
"value": "0",
"name": "auto-expel-count"
},
{
"value": "0",
"name": "auto-expel-period"
},
{
"value": "0",
"name": "auto-join-delay"
},
{
"value": "1",
"name": "cluster-id"
},
{
"value": "true",
"name": "connected"
},
{
"value": "synchronous",
"name": "default-cache-mode"
},
{
"value": "true",
"name": "default-caw-template"
},
{
"value": "blah",
"name": "default-director"
},
{
"value": [
"blah",
"blah"
],
"name": "director-names"
},
{
"value": [
],
"name": "health-indications"
},
{
"value": "ok",
"name": "health-state"
},
{
"value": "1",
"name": "island-id"
},
{
"value": "blah",
"name": "name"
},
{
"value": "ok",
"name": "operational-status"
},
{
"value": [
],
"name": "transition-indications"
},
{
"value": [
],
"name": "transition-progress"
}
],
"type": "cluster"
}
],
"custom-data": null
}
}
which im trying to parse using the json module in python. I am only intrested in getting the following information out of it.
Name Value
operational-status Value
health-state Value
Here is what i have tried.
in the below script data is the json returned from a webpage
json = json.loads(data)
healthstate= json['response']['context']['operational-status']
operationalstatus = json['response']['context']['health-status']
Unfortunately i think i must be missing something as the above results in an error that indexes must be integers not string.
if I try
healthstate= json['response'][0]
it errors saying index 0 is out of range.
Any help would be gratefully received.
json['response']['context'] is a list, so that object requires you to use integer indices.
Each item in that list is itself a dictionary again. In this case there is only one such item.
To get all "name": "health-state" dictionaries out of that structure you'd need to do a little more processing:
[attr['value'] for attr in json['response']['context'][0]['attributes'] if attr['name'] == 'health-state']
would give you a list of of matching values for health-state in the first context.
Demo:
>>> [attr['value'] for attr in json['response']['context'][0]['attributes'] if attr['name'] == 'health-state']
[u'ok']
You have to follow the data structure. It's best to interactively manipulate the data and check what every item is. If it's a list you'll have to index it positionally or iterate through it and check the values. If it's a dict you'll have to index it by it's keys. For example here is a function that get's the context and then iterates through it's attributes checking for a particular name.
def get_attribute(data, attribute):
for attrib in data['response']['context'][0]['attributes']:
if attrib['name'] == attribute:
return attrib['value']
return 'Not Found'
>>> data = json.loads(s)
>>> get_attribute(data, 'operational-status')
u'ok'
>>> get_attribute(data, 'health-state')
u'ok'
json['reponse']['context'] is a list, not a dict. The structure is not exactly what you think it is.
For example, the only "operational status" I see in there can be read with the following:
json['response']['context'][0]['attributes'][0]['operational-status']