Creating a custom json from existing json - python

I have Json something like below
Input Json
"blocks": [
{
"key": "",
"text": "Contents",
"type": "unstyled",
"depth": 0,
"inlineStyleRanges": [
{
"offset": 0,
"length": 8,
"style": "BOLD"
}
],
"entityRanges": [],
"data": {}
},
{
"key": "",
"text": "1.\u00a0\u00a0\u00a0\u00a0 Completed\n Activities & Accomplishments\u00a0 1",
"type": "unstyled",
"depth": 0,
"inlineStyleRanges": [],
"entityRanges": [
{
"offset": 0,
"length": 49,
"key": 0
}
],
"data": {}
},
{
"key": "",
"text": "2.\u00a0\u00a0\u00a0\u00a0 Planned Activities for\n Next Reporting Period\u00a0 3",
"type": "unstyled",
"depth": 0,
"inlineStyleRanges": [],
"entityRanges": [
{
"offset": 0,
"length": 55,
"key": 1
}
],
"data": {}
},
I am trying to extract "text" key data and want to convert it into new json in the key value format
I am able to extract text properly now I just wanna know how to separate numeric from letters successfully
def jsontojson():
with open('C:\Extraction\Docs\TMS TO 692M15-22-F-00073 Monthly Status _ Financial Report July 2022.json') as json_file:
# json1=json_file.read()
json1=json.load(json_file)
value=""
for dict1 in json1["blocks"]:
# print(dict1)
for key in dict1:
if key=="text":
value+=dict1[key]
dict2={}
d=value.split()
print("Value of d",d)
if str(d).isdigit():
dict2['key1']=d
else:
dict2['desciption']=d
print("Dictionary is",dict2['key'])
For the above code it gives me Key error : key1
Let me know where i am wrong or what I need to do so that i can get the Output
Expected OUTPUT
[
{
"key": "",
"text": "Contents"
},
{
"key": "1.",
"text": "Completed Activities & Accomplishments"
},
{
"key": "2.",
"text": "Planned Activities for Next Reporting Period"
},
]

Try (json1 contains the data from your question):
import re
out = []
for d in json1["blocks"]:
num = re.search(r"^(\d+\.)\s*(.*)", d["text"].replace("\n", ""))
out.append(
{
"key": num.group(1) if num else "",
"text": (num.group(2) if num else d["text"]).rsplit(maxsplit=1)[0],
}
)
print(out)
Prints:
[
{"key": "", "text": "Contents"},
{"key": "1.", "text": "Completed Activities & Accomplishments"},
{"key": "2.", "text": "Planned Activities for Next Reporting Period"},
]

Another and easy way is (if your first list is called "Mylist" :
new_list =[]
for dic in Mylist:
new_list.append({'key':dic['key'],
'text':dic['text']})

Related

How to create an automatic mapping of possible JSON data options to be collected?

I've never heard of or found an option for what I'm looking for, but maybe someone knows a way:
To collect the data from a JSON I need to map manually it like this:
events = response['events']
for event in events:
tournament_name = event['tournament']['name']
tournament_slug = event['tournament']['slug']
tournament_category_name = event['tournament']['category']['name']
tournament_category_slug = event['tournament']['category']['slug']
tournament_category_sport_name = event['tournament']['category']['sport']['name']
tournament_category_sport_slug = event['tournament']['category']['sport']['slug']
tournament_category_sport_id = event['tournament']['category']['sport']['id']
The complete model is this:
{
"events": [
{
"tournament": {
"name": "Serie A",
"slug": "serie-a",
"category": {
"name": "Italy",
"slug": "italy",
"sport": {
"name": "Football",
"slug": "football",
"id": 1
},
"id": 31,
"flag": "italy",
"alpha2": "IT"
},
"uniqueTournament": {
"name": "Serie A",
"slug": "serie-a",
"category": {
"name": "Italy",
"slug": "italy",
"sport": {
"name": "Football",
"slug": "football",
"id": 1
},
"id": 31,
"flag": "italy",
"alpha2": "IT"
},
"userCount": 586563,
"id": 23,
"hasEventPlayerStatistics": true
},
"priority": 254,
"id": 33
},
"roundInfo": {
"round": 24
},
"customId": "Kdbsfeb",
"status": {
"code": 7,
"description": "2nd half",
"type": "inprogress"
},
"winnerCode": 0,
"homeTeam": {
"name": "Bologna",
"slug": "bologna",
"shortName": "Bologna",
"gender": "M",
"userCount": 39429,
"nameCode": "BOL",
"national": false,
"type": 0,
"id": 2685,
"subTeams": [
],
"teamColors": {
"primary": "#003366",
"secondary": "#cc0000",
"text": "#cc0000"
}
},
"awayTeam": {
"name": "Empoli",
"slug": "empoli",
"shortName": "Empoli",
"gender": "M",
"userCount": 31469,
"nameCode": "EMP",
"national": false,
"type": 0,
"id": 2705,
"subTeams": [
],
"teamColors": {
"primary": "#0d5696",
"secondary": "#ffffff",
"text": "#ffffff"
}
},
"homeScore": {
"current": 0,
"display": 0,
"period1": 0
},
"awayScore": {
"current": 0,
"display": 0,
"period1": 0
},
"coverage": 1,
"time": {
"initial": 2700,
"max": 5400,
"extra": 540,
"currentPeriodStartTimestamp": 1644159735
},
"changes": {
"changes": [
"status.code",
"status.description",
"time.currentPeriodStart"
],
"changeTimestamp": 1644159743
},
"hasGlobalHighlights": false,
"hasEventPlayerStatistics": true,
"hasEventPlayerHeatMap": true,
"id": 9645399,
"statusTime": {
"prefix": "",
"initial": 2700,
"max": 5400,
"timestamp": 1644159735,
"extra": 540
},
"startTimestamp": 1644156000,
"slug": "empoli-bologna",
"lastPeriod": "period2",
"finalResultOnly": false
}
]
}
In my example I am collecting 7 values.
But there are 83 possible values to be collected.
In case I want to get all the values options that exist in this JSON, is there any way to make this map sequence automatically to print so I can copy it to the code?
Because manually it takes too long to do and it's very tiring.
And the results of texts like print() in terminal would be something like:
tournament_name = event['tournament']['name']
tournament_slug = event['tournament']['slug']
...
...
...
And so on until delivering the 83 object paths with values to collect...
Then I could copy all the prints and paste into my Python file to retrieve the values or any other way to make the work easier.
If the elements in the events arrays are the same, this code works without errors.
def get_prints(recode: dict):
for key in recode.keys():
if type(recode[key]) == dict:
for sub_print in get_prints(recode[key]):
yield [key] + sub_print
else:
yield [key]
class Automater:
def __init__(self,name: str):
"""
Params:
name: name of json
"""
self.name = name
def get_print(self,*args):
"""
Params:
*args: keys json
"""
return '_'.join(args) + ' = ' + self.name + ''.join([f"['{arg}']" for arg in args])
For example, this code:
dicts = {
'tournament':{
'name':"any name",
'slug':'somthing else',
'sport':{
'name':'sport',
'anotherdict':{
'yes':True
}
}
}
}
list_names = get_prints(dicts)
for name in list_names:
print(auto.get_print(*name))
Gives this output:
tournament_name = event['tournament']['name']
tournament_slug = event['tournament']['slug']
tournament_sport_name = event['tournament']['sport']['name']
tournament_sport_anotherdict_yes = event['tournament']['sport']['anotherdict']['yes']

Find a value in a list of dictionaries

I have the following list:
{
"id":1,
"name":"John",
"status":2,
"custom_attributes":[
{
"attribute_code":"address",
"value":"st"
},
{
"attribute_code":"city",
"value":"st"
},
{
"attribute_code":"job",
"value":"test"
}]
}
I need to get the value from the attribute_code that is equal city
I've tried this code:
if list["custom_attributes"]["attribute_code"] == "city" in list:
var = list["value"]
But this gives me the following error:
TypeError: list indices must be integers or slices, not str
What i'm doing wrong here? I've read this solution and this solution but din't understood how to access each value.
Another solution, using next():
dct = {
"id": 1,
"name": "John",
"status": 2,
"custom_attributes": [
{"attribute_code": "address", "value": "st"},
{"attribute_code": "city", "value": "st"},
{"attribute_code": "job", "value": "test"},
],
}
val = next(d["value"] for d in dct["custom_attributes"] if d["attribute_code"] == "city")
print(val)
Prints:
st
Your data is a dict not a list.
You need to scan the attributes according the criteria you mentioned.
See below:
data = {
"id": 1,
"name": "John",
"status": 2,
"custom_attributes": [
{
"attribute_code": "address",
"value": "st"
},
{
"attribute_code": "city",
"value": "st"
},
{
"attribute_code": "job",
"value": "test"
}]
}
for attr in data['custom_attributes']:
if attr['attribute_code'] == 'city':
print(attr['value'])
break
output
st

Ignore specific JSON keys when extracting data in Python

I'm extracting certain keys in several JSON files and then converting it to a CSV in Python. I'm able to define a key list when I run my code and get the information I need.
However, there are certain sub-keys that I want to ignore from the JSON file. For example, if we look at the following snippet:
JSON Sample
[
{
"callId": "abc123",
"errorCode": 0,
"apiVersion": 2,
"statusCode": 200,
"statusReason": "OK",
"time": "2020-12-14T12:00:32.744Z",
"registeredTimestamp": 1417731582000,
"UID": "_guid_abc123==",
"created": "2014-12-04T22:19:42.894Z",
"createdTimestamp": 1417731582000,
"data": {},
"preferences": {},
"emails": {
"verified": [],
"unverified": []
},
"identities": [
{
"provider": "facebook",
"providerUID": "123",
"allowsLogin": true,
"isLoginIdentity": true,
"isExpiredSession": true,
"lastUpdated": "2014-12-04T22:26:37.002Z",
"lastUpdatedTimestamp": 1417731997002,
"oldestDataUpdated": "2014-12-04T22:26:37.002Z",
"oldestDataUpdatedTimestamp": 1417731997002,
"firstName": "John",
"lastName": "Doe",
"nickname": "John Doe",
"profileURL": "https://www.facebook.com/John.Doe",
"age": 50,
"birthDay": 31,
"birthMonth": 12,
"birthYear": 1969,
"city": "City, State",
"education": [
{
"school": "High School Name",
"schoolType": "High School",
"degree": null,
"startYear": 0,
"fieldOfStudy": null,
"endYear": 0
}
],
"educationLevel": "High School",
"favorites": {
"music": [
{
"name": "Music 1",
"id": "123",
"category": "Musician/band"
},
{
"name": "Music 2",
"id": "123",
"category": "Musician/band"
}
],
"movies": [
{
"name": "Movie 1",
"id": "123",
"category": "Movie"
},
{
"name": "Movie 2",
"id": "123",
"category": "Movie"
}
],
"television": [
{
"name": "TV 1",
"id": "123",
"category": "Tv show"
}
]
},
"followersCount": 0,
"gender": "m",
"hometown": "City, State",
"languages": "English",
"likes": [
{
"name": "Like 1",
"id": "123",
"time": "2014-10-31T23:52:53.0000000Z",
"category": "TV",
"timestamp": "1414799573"
},
{
"name": "Like 2",
"id": "123",
"time": "2014-09-16T08:11:35.0000000Z",
"category": "Music",
"timestamp": "1410855095"
}
],
"locale": "en_US",
"name": "John Doe",
"photoURL": "https://graph.facebook.com/123/picture?type=large",
"timezone": "-8",
"thumbnailURL": "https://graph.facebook.com/123/picture?type=square",
"username": "john.doe",
"verified": "true",
"work": [
{
"companyID": null,
"isCurrent": null,
"endDate": null,
"company": "Company Name",
"industry": null,
"title": "Company Title",
"companySize": null,
"startDate": "2010-12-31T00:00:00"
}
]
}
],
"isActive": true,
"isLockedOut": false,
"isRegistered": true,
"isVerified": false,
"lastLogin": "2014-12-04T22:26:33.002Z",
"lastLoginTimestamp": 1417731993000,
"lastUpdated": "2014-12-04T22:19:42.769Z",
"lastUpdatedTimestamp": 1417731582769,
"loginProvider": "facebook",
"loginIDs": {
"emails": [],
"unverifiedEmails": []
},
"rbaPolicy": {
"riskPolicyLocked": false
},
"oldestDataUpdated": "2014-12-04T22:19:42.894Z",
"oldestDataUpdatedTimestamp": 1417731582894,
"registered": "2014-12-04T22:19:42.956Z",
"regSource": "",
"socialProviders": "facebook"
}
]
I want to extract data from created and identities but ignore identities.favorites and identities.likes as well as their data underneath it.
This is what I have so far, below. I defined the JSON keys that I want to extract in the key_list variable:
Current Code
import json, pandas
from flatten_json import flatten
# Enter the path to the JSON and the filename without appending '.json'
file_path = r'C:\Path\To\file_name'
# Open and load the JSON file
json_list = json.load(open(file_path + '.json', 'r', encoding='utf-8', errors='ignore'))
# Extract data from the defined key names
key_list = ['created', 'identities']
json_list = [{k:d[k] for k in key_list} for d in json_list]
# Flatten and convert to a data frame
json_list_flattened = (flatten(d, '.') for d in json_list)
df = pandas.DataFrame(json_list_flattened)
# Export to CSV in the same directory with the original file name
export_csv = df.to_csv (file_path + r'.csv', sep=',', encoding='utf-8', index=None, header=True)
Similar to the key_list, I suspect that I would make an ignore list and factor that in the json_list for loop that I have? Something like:
key_ignore = ['identities.favorites', 'identities.likes']`
Then utilize the dict.pop() which looks like it will remove the unwanted sub-keys if it matches? Just not sure how to implement that correctly.
Expected Output
As a result, the code should extract data from the defined keys in key_list and ignore the sub keys defined in key_ignore, which is identities.favorites and identities.likes. Then the rest of the code will continue to convert it into a CSV:
created
identities.0.provider
identities.0.providerUID
identities...
2014-12-04T19:23:05.191Z
site
cb8168b0cf734b70ad541f0132763761
...
If the keys are always there, you can use
del d[0]['identities'][0]['likes']
del d[0]['identities'][0]['favorites']
or if you want to remove the columns from the dataframe after reading all the json data in you can use
df.drop(df.filter(regex='identities.0.favorites|identities.0.likes').columns, axis=1, inplace=True)

Extracting elements from json in python

I have the following json:
{
"request": {
"id": "123",
"url": "/aa/bb/cc",
"method": "GET",
"timestamp": "2018-08-09T08:41:38.432Z"
},
"response": {
"status": {
"code": 200,
"message": "OK"
},
"items": [
{
"id": "aaa",
"name": "w1"
},
{
"id": "bbb",
"name": "w2"
},
{
"id": "ccc",
"name": "w3"
}
]
}
}
I need to loop over items and print each name. I've tried the following code which doesn't work.
response = requests.get(url)
data = json.loads(response.content)
for group in data['response']['items']:
print data['response']['items'][group]['name']
When i replace group with 0 for example, I can access the first name:
data['response']['items'][0]['name']
However, I don't know in advanced how many elements are in the array.
As Joel mentioned, in the for loop,
for group in data['response']['items']:
you are assigning group the value from data['response']['items']. Hence group contains the value :
[
{
"id": "aaa",
"name": "w1"
},
{
"id": "bbb",
"name": "w2"
},
{
"id": "ccc",
"name": "w3"
}
]
So all you need to do is
print group['name']
You can use Pandas module and call read_json function.
import pandas as pd
df = pd.read_json(your_json_file.json)
for i in df.response['items']:
print(i['name'])
# w1
# w2
# w3
You could try this:
for i in range (0,len(d['response']['items'])):
print(d['response']['items'][i]['name'])
Output:
w1
w2
w3

Unable to pull data from json using python

I have the following json
{
"response": {
"message": null,
"exception": null,
"context": [
{
"headers": null,
"name": "aname",
"children": [
{
"type": "cluster-connectivity",
"name": "cluster-connectivity"
},
{
"type": "consistency-groups",
"name": "consistency-groups"
},
{
"type": "devices",
"name": "devices"
},
{
"type": "exports",
"name": "exports"
},
{
"type": "storage-elements",
"name": "storage-elements"
},
{
"type": "system-volumes",
"name": "system-volumes"
},
{
"type": "uninterruptible-power-supplies",
"name": "uninterruptible-power-supplies"
},
{
"type": "virtual-volumes",
"name": "virtual-volumes"
}
],
"parent": "/clusters",
"attributes": [
{
"value": "true",
"name": "allow-auto-join"
},
{
"value": "0",
"name": "auto-expel-count"
},
{
"value": "0",
"name": "auto-expel-period"
},
{
"value": "0",
"name": "auto-join-delay"
},
{
"value": "1",
"name": "cluster-id"
},
{
"value": "true",
"name": "connected"
},
{
"value": "synchronous",
"name": "default-cache-mode"
},
{
"value": "true",
"name": "default-caw-template"
},
{
"value": "blah",
"name": "default-director"
},
{
"value": [
"blah",
"blah"
],
"name": "director-names"
},
{
"value": [
],
"name": "health-indications"
},
{
"value": "ok",
"name": "health-state"
},
{
"value": "1",
"name": "island-id"
},
{
"value": "blah",
"name": "name"
},
{
"value": "ok",
"name": "operational-status"
},
{
"value": [
],
"name": "transition-indications"
},
{
"value": [
],
"name": "transition-progress"
}
],
"type": "cluster"
}
],
"custom-data": null
}
}
which im trying to parse using the json module in python. I am only intrested in getting the following information out of it.
Name Value
operational-status Value
health-state Value
Here is what i have tried.
in the below script data is the json returned from a webpage
json = json.loads(data)
healthstate= json['response']['context']['operational-status']
operationalstatus = json['response']['context']['health-status']
Unfortunately i think i must be missing something as the above results in an error that indexes must be integers not string.
if I try
healthstate= json['response'][0]
it errors saying index 0 is out of range.
Any help would be gratefully received.
json['response']['context'] is a list, so that object requires you to use integer indices.
Each item in that list is itself a dictionary again. In this case there is only one such item.
To get all "name": "health-state" dictionaries out of that structure you'd need to do a little more processing:
[attr['value'] for attr in json['response']['context'][0]['attributes'] if attr['name'] == 'health-state']
would give you a list of of matching values for health-state in the first context.
Demo:
>>> [attr['value'] for attr in json['response']['context'][0]['attributes'] if attr['name'] == 'health-state']
[u'ok']
You have to follow the data structure. It's best to interactively manipulate the data and check what every item is. If it's a list you'll have to index it positionally or iterate through it and check the values. If it's a dict you'll have to index it by it's keys. For example here is a function that get's the context and then iterates through it's attributes checking for a particular name.
def get_attribute(data, attribute):
for attrib in data['response']['context'][0]['attributes']:
if attrib['name'] == attribute:
return attrib['value']
return 'Not Found'
>>> data = json.loads(s)
>>> get_attribute(data, 'operational-status')
u'ok'
>>> get_attribute(data, 'health-state')
u'ok'
json['reponse']['context'] is a list, not a dict. The structure is not exactly what you think it is.
For example, the only "operational status" I see in there can be read with the following:
json['response']['context'][0]['attributes'][0]['operational-status']

Categories