Python - normalizing nested JSON file

I have a nested JSON file and I am trying to get the data into a DataFrame. I have to extract sensor-time, then the elements, and finally the sensor info.
Here is what the JSON file looks like:
{
    "sensor-time": {
        "timezone": "America/Los_Angeles",
        "time": "2019-11-21T01:00:04-08:00"
    },
    "status": {
        "code": "OK"
    },
    "content": {
        "element": [{
            "element-id": 0,
            "element-name": "Line 0",
            "sensor-type": "SINGLE_SENSOR",
            "data-type": "LINE",
            "from": "2019-11-21T00:00:00-08:00",
            "to": "2019-11-21T01:00:00-08:00",
            "resolution": "ONE_HOUR",
            "measurement": [{
                "from": "2019-11-21T00:00:00-08:00",
                "to": "2019-11-21T01:00:00-08:00",
                "value": [{
                    "value": 0,
                    "label": "fw"
                }, {
                    "value": 0,
                    "label": "bw"
                }]
            }]
        }]
    },
    "sensor-info": {
        "serial-number": "D8:80:39:D9:6B:9B",
        "ip-address": "192.168.0.3",
        "name": "XD01",
        "group": "Boost Mobile",
        "device-type": "PC2"
    }
}
And here is my code so far:
import json
from pandas.io.json import json_normalize  # deprecated since pandas 1.0; use pd.json_normalize instead
import glob
import urllib
import sqlalchemy as sa

# Drill through each file with a json extension in the folder,
# open it, load it and parse it into a dataframe
file = 'C:/Test/Loading/testfile.json'
with open(file) as json_file:
    json_data = json.load(json_file)
df = json_normalize(json_data, meta=['sensor-time'])
df
and here is the output when I run my code:
I tried using the flatten_json library, and the best I can get is with this code:
with open(file) as json_file:
    json_data = json.load(json_file)
flat = flatten_json(json_data)
df = json_normalize(flat)
And I get output with one row and 33 columns. Since I have multiple values under the measurement part of the JSON file, I am getting a column for each of the measurements. What I need is 3 rows with 24 columns: one row for each measurement.
So how do I modify this?

The simplest way, I think, would be to use pandas.DataFrame(json_data); then you can access the information with:
pandas.DataFrame(json_data)['sensor-time']['time']
pandas.DataFrame(json_data)['content']['element']
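For the normalization question itself, pd.json_normalize (top-level since pandas 1.0) can expand one row per value entry via record_path. A sketch against a trimmed copy of the sample file; note that meta paths in a nested record_path are resolved relative to that path, so the top-level scalars are attached afterwards as plain columns:

```python
import pandas as pd

json_data = {
    "sensor-time": {"timezone": "America/Los_Angeles",
                    "time": "2019-11-21T01:00:04-08:00"},
    "content": {"element": [{
        "element-id": 0,
        "element-name": "Line 0",
        "measurement": [{
            "from": "2019-11-21T00:00:00-08:00",
            "to": "2019-11-21T01:00:00-08:00",
            "value": [{"value": 0, "label": "fw"},
                      {"value": 0, "label": "bw"}],
        }],
    }]},
    "sensor-info": {"serial-number": "D8:80:39:D9:6B:9B", "name": "XD01"},
}

# One row per entry of content.element[*].measurement[*].value[*]
df = pd.json_normalize(
    json_data,
    record_path=["content", "element", "measurement", "value"],
    meta=[["content", "element", "element-name"],
          ["content", "element", "measurement", "from"],
          ["content", "element", "measurement", "to"]],
)

# Top-level scalars are constant per file, so add them as ordinary columns
df["sensor-time.time"] = json_data["sensor-time"]["time"]
df["sensor-info.serial-number"] = json_data["sensor-info"]["serial-number"]
```

With this trimmed sample the result has two rows (labels fw and bw); on the full file every value entry similarly becomes its own row.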

Related

Iterate through a folder of .json files and update a value in Python

I've struck out trying to find a suitable script to iterate through a folder of .json files and update a single line.
Below is an example JSON file located in a path among others. I would like to iterate through the JSON files in a folder containing several files like this with various information, update the "seller_fee_basis_points" from 0 to, say, 500, and save.
Would really appreciate the assistance.
{
    "name": "Solflare X NFT",
    "symbol": "",
    "description": "Celebratory Solflare NFT for the Solflare X launch",
    "seller_fee_basis_points": 0,
    "image": "https://www.arweave.net/abcd5678?ext=png",
    "animation_url": "https://www.arweave.net/efgh1234?ext=mp4",
    "external_url": "https://solflare.com",
    "attributes": [
        {
            "trait_type": "web",
            "value": "yes"
        },
        {
            "trait_type": "mobile",
            "value": "yes"
        },
        {
            "trait_type": "extension",
            "value": "yes"
        }
    ],
    "collection": {
        "name": "Solflare X NFT",
        "family": "Solflare"
    },
    "properties": {
        "files": [
            {
                "uri": "https://www.arweave.net/abcd5678?ext=png",
                "type": "image/png"
            },
            {
                "uri": "https://watch.videodelivery.net/9876jkl",
                "type": "unknown",
                "cdn": true
            },
            {
                "uri": "https://www.arweave.net/efgh1234?ext=mp4",
                "type": "video/mp4"
            }
        ],
        "category": "video",
        "creators": [
            {
                "address": "SOLFLR15asd9d21325bsadythp547912501b",
                "share": 100
            }
        ]
    }
}
Updated with an answer thanks to @JCaesar's help:
import json
import glob
import os

SOURCE_DIRECTORY = r'my_favourite_directory'
KEY = 'seller_fee_basis_points'
NEW_VALUE = 500

for file in glob.glob(os.path.join(SOURCE_DIRECTORY, '*.json')):
    with open(file, encoding="utf8") as f:
        json_data = json.load(f)
    # note that using the update method means
    # that if KEY does not exist then it will be created
    # which may not be what you want
    json_data.update({KEY: NEW_VALUE})
    with open(file, 'w', encoding="utf8") as f:
        json.dump(json_data, f, indent=4)
I recommend using glob to find the files you're interested in. Then utilise the json module for reading and writing the JSON content.
This is very concise and has no sanity checking / exception handling but you should get the idea:
import json
import glob
import os

SOURCE_DIRECTORY = 'my_favourite_directory'
KEY = 'seller_fee_basis_points'
NEW_VALUE = 500

for file in glob.glob(os.path.join(SOURCE_DIRECTORY, '*.json')):
    with open(file) as f:
        json_data = json.load(f)
    # note that using the update method means
    # that if KEY does not exist then it will be created
    # which may not be what you want
    json_data.update({KEY: NEW_VALUE})
    with open(file, 'w') as f:
        json.dump(json_data, f, indent=4)
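If creating the key in files that lack it is not wanted (the caveat flagged in the comment above), the update can be guarded. A minimal sketch, wrapped in a function so the directory is a parameter:

```python
import json
import glob
import os

def update_key_in_folder(source_dir, key, new_value):
    """Rewrite every *.json file in source_dir whose top level already has key."""
    for path in glob.glob(os.path.join(source_dir, '*.json')):
        with open(path, encoding='utf8') as f:
            data = json.load(f)
        if key in data:  # skip files that do not already contain the key
            data[key] = new_value
            with open(path, 'w', encoding='utf8') as f:
                json.dump(data, f, indent=4)
```

Usage would then be update_key_in_folder('my_favourite_directory', 'seller_fee_basis_points', 500).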

How to access uploaded json file google colab

I'm stuck trying to read the files in Google Colab. It should read the file as simple JSON, but I can't even do a json.dumps(file) without getting hundreds of errors.
Uploading the file:
import json
import csv
from google.colab import files
uploaded = files.upload()
Printing works; it shows the content of the file:
print(uploaded)
data = json.dumps(uploaded)
But I get Object of type 'bytes' is not JSON serializable when trying to do json.dumps(uploaded).
Shouldn't the file be read as JSON and not bytes? In some other cases I tested, it was also read as a dictionary.
JSON file:
[
    {
        "type": "message",
        "subtype": "channel_join",
        "ts": "123",
        "user": "DWADAWD",
        "text": "<#DWADAWD> has joined the channel"
    },
    {
        "type": "message",
        "subtype": "channel_join",
        "ts": "123",
        "user": "DWADAWD",
        "text": "<#DWADAWD> has joined the channel"
    },
    {
        "text": "Let's chat",
        "user_profile": {
            "display_name": "XASD",
            "team": "TDF31231",
            "name": "XASD",
            "is_restricted": false,
            "is_ultra_restricted": false
        },
        "blocks": [
            {
                "type": "rich_text",
                "block_id": "2N1",
                "elements": [
                    {
                        "type": "rich_text_section",
                        "elements": [
                            {
                                "type": "text",
                                "text": "Let's chat"
                            }
                        ]
                    }
                ]
            }
        ]
    }
]
If you upload just one file, you can get its content from the dict's values():
data = next(iter(uploaded.values()))
Then you can convert the JSON string to a dict:
d = json.loads(data.decode())
Here's an example notebook
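This approach can be tried outside Colab by simulating the dict that files.upload() returns (the filename and content below are made up for illustration):

```python
import json

# files.upload() returns a dict mapping filename -> raw bytes; simulated here
uploaded = {"file.json": b'[{"type": "message", "ts": "123"}]'}

data = next(iter(uploaded.values()))      # bytes of the single uploaded file
messages = json.loads(data.decode("utf-8"))
print(messages[0]["ts"])                  # prints 123
```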
JSON handles Unicode strings, not byte sequences. Note that uploaded is a dict mapping filenames to bytes, so decode the file's bytes and parse those rather than passing the dict itself:
json.loads(uploaded['file.json'].decode("utf-8"))
I prefer to use io and files.
First, I import them (and pandas):
import io
import pandas as pd
from google.colab import files
Then, I use a file widget to upload the file:
uploaded = files.upload()
To load the data into a dataframe:
df = pd.read_json(io.StringIO(uploaded.get('file.json').decode('utf-8')))
The dataframe df now holds all the JSON data.
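A small variant of the same idea: pandas.read_json also accepts a binary file-like object, so the explicit decode can be skipped by wrapping the bytes in io.BytesIO (the raw content below stands in for uploaded.get('file.json')):

```python
import io
import pandas as pd

raw = b'[{"user": "XASD", "text": "hello"}, {"user": "DWADAWD", "text": "hi"}]'
df = pd.read_json(io.BytesIO(raw))
print(df.shape)  # (2, 2)
```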

CSV file to JSON for nested array generic template using python (for csv to mongodb insert)

I want to create a JSON file from a CSV file using a generic Python script. I found the hone package on GitHub, but some of the functionality is missing from that code.
csv to json
I want the code to be a generic CSV-to-JSON template.
[
    {
        "birth": {
            "day": "7",
            "month": "May",
            "year": "1985"
        },
        "name": "Bob",
        "reference": "TRUE",
        "reference name": "Smith"
    }
]
The package only handles the above type of JSON.
[
    {
        "Type": "AwsEc2Instance",
        "Id": "i-cafebabe",
        "Partition": "aws",
        "Region": "us-west-2",
        "Tags": {
            "billingCode": "Lotus-1-2-3",
            "needsPatching": "true"
        },
        "Details": {
            "AwsEc2Instance": {
                "Type": "i3.xlarge",
                "ImageId": "ami-abcd1234",
                "IpV4Addresses": ["54.194.252.215", "192.168.1.88"],
                "IpV6Addresses": ["2001:db812341a2b::123"],
                "KeyName": "my_keypair",
                "VpcId": "vpc-11112222",
                "SubnetId": "subnet-56f5f633",
                "LaunchedAt": "2018-05-08T16:46:19.000Z"
            }
        }
    }
]
I want to handle nested arrays ([]) and objects ({}) as well.
I have done something like this before; the code below may need modification, as I have not seen your dataset.
import json
import pandas as pd

dataframe = pd.read_excel('dataframefilepath', encoding='utf-8', header=0)

# Adding to a list to finally save it as JSON
# (df_to_Save is assumed to be a second DataFrame prepared elsewhere)
df = []
for (columnName, columnData) in dataframe.items():
    if dataframe.columns.get_loc(columnName) > 0:
        for indata, rwdata in dataframe.iterrows():
            for insav, rwsave in df_to_Save.iterrows():
                if rwdata.Selected_Prediction == rwsave.Selected_Prediction:
                    df_to_Save.loc[insav, 'Value_to_Save'] = rwdata[dataframe.columns.get_loc(columnName)]
        df.append(df_to_Save.set_index('Selected_Prediction').T.to_dict('records'))

# Saving in JSON format
path_to_save = '\\your path'
with open(path_to_save, 'w') as json_file:
    json.dump(df, json_file)
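Since the answer above is tied to a specific dataset, here is a more generic sketch of flat-to-nested conversion. It assumes a made-up convention (not part of the hone package) that CSV headers use dot-separated paths such as birth.day, and rebuilds the nesting with setdefault:

```python
import csv
import io
import json

def rows_to_nested(csv_text):
    """Turn each CSV row into a nested dict, splitting headers on '.'."""
    records = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        obj = {}
        for key, value in row.items():
            parts = key.split(".")
            node = obj
            for part in parts[:-1]:       # walk/create intermediate dicts
                node = node.setdefault(part, {})
            node[parts[-1]] = value
        records.append(obj)
    return records

csv_text = "name,birth.day,birth.month,birth.year\nBob,7,May,1985\n"
print(json.dumps(rows_to_nested(csv_text), indent=2))
```

This reproduces the first JSON shape above; arrays would need an additional convention (for example numbered headers) that the sketch does not cover.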

CSV to JSON with Specific format

I have a CSV file with this data, and using Python I would like to convert it to JSON format.
I would like to convert it to the JSON format below. Can you tell me which library I should use, or give any suggestions or pseudocode?
I am able to convert to standard JSON with key-value pairs, but I don't know how to produce the JSON format below.
"T-shirt","Long-tshirt",18
"T-shirt","short-tshirt"19
"T-shirt","half-tshirt",20
"top","very-nice",45
"top","not-nice",56
{
    "T-shirts": [
        {
            "name": "Long-tshirt",
            "size": "18"
        },
        {
            "name": "short-tshirt",
            "size": "19"
        },
        {
            "name": "half-tshirt",
            "size": "20"
        }
    ],
    "top": [
        {
            "name": "very-nice",
            "size": 45
        },
        {
            "name": "not-nice",
            "size": 45
        }
    ]
}
In this code, I put your CSV into a test.csv file (as a heads up, the provided CSV was missing a comma before the 19):
"T-shirt","Long-tshirt",18
"T-shirt","short-tshirt",19
"T-shirt","half-tshirt",20
"top","very-nice",45
"top","not-nice",56
Then, using the built-in csv and json modules, you can iterate over each row and add the rows to a dictionary. I used a defaultdict to save time, then wrote that data out to a JSON file.
import csv, json
from collections import defaultdict

my_data = defaultdict(list)

with open("test.csv") as csv_file:
    reader = csv.reader(csv_file)
    for row in reader:
        if row:  # To ignore blank lines
            my_data[row[0]].append({"name": row[1], "size": row[2]})

with open("out.json", "w") as out_file:
    json.dump(my_data, out_file, indent=2)
Generated out.json file:
{
  "T-shirt": [
    {
      "name": "Long-tshirt",
      "size": "18"
    },
    {
      "name": "short-tshirt",
      "size": "19"
    },
    {
      "name": "half-tshirt",
      "size": "20"
    }
  ],
  "top": [
    {
      "name": "very-nice",
      "size": "45"
    },
    {
      "name": "not-nice",
      "size": "56"
    }
  ]
}
import json
json_string = json.dumps(your_dict)
You now have a string containing JSON-formatted data from your original dictionary. Is that what you wanted?

Python: How to search and replace parts of a json file?

I'm new to Python and I would like to search and replace titles of IDs in a JSON file. Normally I would use R for this task, but how do I do it in Python? Here is a sample of my JSON (with a service ID and layer IDs). I'm interested in replacing the titles in the layer IDs:
...{
    "services": [
        {
            "id": "service",
            "url": "http://...",
            "title": "GEW",
            "layers": [
                {
                    "id": "0",
                    "title": "wrongTitle"
                },
                {
                    "id": "1",
                    "title": "againTitleWrong"
                }
            ],
            "options": {}
        }
    ]
}...
For the replace I would use a table/csv like this:
serviceID layerID oldTitle newTitle
service 0 wrongTitle newTitle1
service 1 againTitleWrong newTitle2
....
Do you have ideas? Thanks
Here's a working example on repl.it.
Code:
import json
import io
import csv

### json input (renamed from `input` to avoid shadowing the builtin)
json_input = """
{
    "layers": [
        {
            "id": "0",
            "title": "wrongTitle"
        },
        {
            "id": "1",
            "title": "againTitleWrong"
        }
    ]
}
"""

### parse the json
parsed_json = json.loads(json_input)

#### csv input
csv_input = """serviceID,layerID,oldTitle,newTitle
service,0,wrongTitle,newTitle1
service,1,againTitleWrong,newTitle2
"""

### parse csv and generate a correction lookup
parsed_csv = csv.DictReader(io.StringIO(csv_input))
lookup = {}
for row in parsed_csv:
    lookup[row["layerID"]] = row["newTitle"]

# correct and print json
layers = parsed_json["layers"]
for layer in layers:
    layer["title"] = lookup[layer["id"]]
parsed_json["layers"] = layers
print(json.dumps(parsed_json))
You don't say which version of Python you are using, but there are built-in JSON parsers for the language.
For 2.x: https://docs.python.org/2.7/library/json.html
For 3.x: https://docs.python.org/3.4/library/json.html
These should help you parse the JSON and replace what you want.
As other users suggested, the json module will be helpful.
Here is a basic example on Python 2.7:
import json

j = '''{
    "services": [{
        "id": "service",
        "url": "http://...",
        "title": "GEW",
        "options": {},
        "layers": [
            {
                "id": "0",
                "title": "wrongTitle"
            },
            {
                "id": "1",
                "title": "againTitleWrong"
            }
        ]
    }]
}'''

s = json.loads(j)
s["services"][0]["layers"][0]["title"] = "new title"

# save json object to file
with open('file.json', 'w') as f:
    json.dump(s, f)
You can index the elements and change their titles according to your CSV file, which requires the csv module.
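Putting the two answers together with the correction table from the question: build a (serviceID, layerID) lookup and walk the full services structure. The lookup values below are taken from the question's table; only layers found in the lookup are changed:

```python
import json

# newTitle values from the question's correction table
lookup = {("service", "0"): "newTitle1",
          ("service", "1"): "newTitle2"}

doc = {"services": [{
    "id": "service",
    "url": "http://...",
    "title": "GEW",
    "layers": [{"id": "0", "title": "wrongTitle"},
               {"id": "1", "title": "againTitleWrong"}],
    "options": {},
}]}

for service in doc["services"]:
    for layer in service["layers"]:
        key = (service["id"], layer["id"])
        if key in lookup:               # leave unknown layers untouched
            layer["title"] = lookup[key]

print(json.dumps(doc["services"][0]["layers"], indent=2))
```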
