Parse JSON data in Python to CSV file - python

I need to parse the JSON data below using Python and write it to a CSV file. I have included only 2 server names here, but my list is big. Please help with sample code to get the desired output.
Below is my JSON data in a file server_info.json:
{
"dev-server":
{
"hoststatus":
{
"host_name":"dev-server",
"current_state":"2",
"last_time_up":"1482525184"
},
"servicestatus":
{
"/ Filesystem Check":
{
"host_name":"dev-server",
"service_description":"/ Filesystem Check",
"current_state":"1",
"state_type":"1"
},
"/home Filesystem Check":
{
"host_name":"dev-server",
"service_description":"/home Filesystem Check",
"current_state":"2",
"state_type":"2"
}
}
},
"uat-server":
{
"hoststatus":
{
"host_name":"uat-server",
"current_state":"0",
"last_time_up":"1460000000"
},
"servicestatus":
{
"/ Filesystem Check":
{
"host_name":"uat-server",
"service_description":"/ Filesystem Check",
"current_state":"0",
"state_type":"1"
},
"/home Filesystem Check":
{
"host_name":"uat-server",
"service_description":"/home Filesystem Check",
"current_state":"1",
"state_type":"2"
}
}
}
}
Expected Output:
output format:
hoststatus.host_name,hoststatus.current_state,hoststatus.last_time_up
-------------------------------------------------------------
dev-server,2,1482525184
uat-server,0,1460000000
and
output format:
servicestatus.host_name,servicestatus.service_description,servicestatus.current_state,servicestatus.state_type
--------------------------------------------------------------------------------
dev-server,/ Filesystem Check,1,1
dev-server,/home Filesystem Check,2,2
uat-server,/ Filesystem Check,0,1
uat-server,/home Filesystem Check,1,2

Elaborating on what Jean-François Fabre mentioned: json.load() can be used to read a JSON file and parse it into a Python object representation of the JSON. json.loads() does the same, except that the input is a string instead of a file (see the json module for more details).
Bearing this in mind, if you have your server data in a file, you can start with the following:
import json

with open('logs.txt') as file:
    data = json.load(file)  # now the JSON object is represented as a Python dict

for key in data.keys():  # dev-server and uat-server are keys
    service_status = data[key]['servicestatus']  # the servicestatus dict for this server
    host_status = data[key]['hoststatus']        # the hoststatus dict for this server
With this, you can use the csv module to write it as a CSV file in the format you desire.
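For instance, a minimal sketch of that CSV step (assuming the input file is named server_info.json as in the question, and picking hoststatus.csv and servicestatus.csv as output names purely for illustration):
import csv
import json

with open('server_info.json') as f:
    data = json.load(f)

# One row per server from the hoststatus section.
with open('hoststatus.csv', 'w', newline='') as out:
    writer = csv.writer(out)
    writer.writerow(['host_name', 'current_state', 'last_time_up'])
    for server in data.values():
        hs = server['hoststatus']
        writer.writerow([hs['host_name'], hs['current_state'], hs['last_time_up']])

# One row per service check from the servicestatus section.
with open('servicestatus.csv', 'w', newline='') as out:
    writer = csv.writer(out)
    writer.writerow(['host_name', 'service_description', 'current_state', 'state_type'])
    for server in data.values():
        for svc in server['servicestatus'].values():
            writer.writerow([svc['host_name'], svc['service_description'],
                             svc['current_state'], svc['state_type']])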

An example using list comprehensions.
import json

with open('server_info.json') as f:  # the question's input file
    d = json.load(f)

# hoststatus: one line per server
print("\n".join([','.join((hstat['host_name'], hstat['current_state'], hstat['last_time_up']))
                 for g in d.values()
                 for k, hstat in g.items() if k == 'hoststatus']))
# servicestatus: one line per service check
print("\n".join([','.join((v['host_name'], v['service_description'], v['current_state'], v['state_type']))
                 for g in d.values()
                 for k, sstat in g.items() if k == 'servicestatus'
                 for v in sstat.values()]))

Related

Inconsistent error: json.decoder.JSONDecodeError: Extra data: line 30 column 2 (char 590)

I have .json documents generated from the same code. Multiple nested dicts are being dumped to the JSON documents. While loading with json.load(opened_json), I get a json.decoder.JSONDecodeError: Extra data: line 30 column 2 (char 590) error for some of the files but not for others, and it is not clear why. What is the proper way to dump multiple (possibly nested) dicts into JSON docs, and in my current case, what is the way to read them all? (Extra: the dicts can span multiple lines, so line splitting probably does not work.)
Ex: Say I do json.dump(data, file) with data = {'meta_data': {some_data}, 'real_data': {more_data}}.
Let us take these two fake files:
{
"meta_data": {
"id": 0,
"start": 1238397024.0,
"end": 1238397056.0,
"best": []
},
"real_data": {
"YAS": {
"t1": [
1238397047.2182617
],
"v1": [
5.0438767766574255
],
"v2": [
4.371670270544587
]
}
}
}
and
{
"meta_data": {
"id": 0,
"start": 1238397056.0,
"end": 1238397088.0,
"best": []
},
"real_data": {
"XAS": {
"t1": [
1238397047.2182617
],
"v1": [
5.0438767766574255
],
"v2": [
4.371670270544587
]
}
}
}
and try to load them using json.load(open(file_path)) to duplicate the problem.
You chose not to offer a reprex. Here is the code I'm running, which is intended to represent what you're running. If there is some discrepancy, update the original question to clarify the details.
import json
from io import StringIO
some_data = dict(a=1)
more_data = dict(b=2)
data = {"meta_data": some_data, "real_data": more_data}
file = StringIO()
json.dump(data, file)
file.seek(0)
d = json.load(file)
print(json.dumps(d, indent=4))
output
{
"meta_data": {
"a": 1
},
"real_data": {
"b": 2
}
}
As is apparent, under the circumstances you have described, the JSON library does exactly what we would expect of it.
EDIT
Your screenshot makes it pretty clear that a bunch of ASCII NUL characters are appended to the 1st file. We can easily reproduce that JSONDecodeError: Extra data symptom by adding a single line:
json.dump(data, file)
file.write(chr(0))
(Or perhaps chr(0) * 80 more closely matches the truncated screenshot.)
If your file ends with extraneous characters, such as NUL, then it will no longer be valid JSON, and compliant parsers will report a diagnostic message when they attempt to read it. And there's nothing special about NUL; a simple file.write("X") suffices to produce the same diagnostic. You will need to trim those NULs from the file's end before attempting to parse it.
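A minimal sketch of that cleanup, assuming the junk really is only trailing NULs and whitespace (foo.json is a placeholder name):
import json

path = "foo.json"  # placeholder name for the affected file

# Read the raw text and strip any trailing NUL characters and whitespace.
with open(path, encoding="utf-8") as f:
    text = f.read().rstrip("\x00 \t\r\n")

data = json.loads(text)  # should now parse without the "Extra data" error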
For best results, use UTF-8 Unicode encoding with no BOM. Your editor should have settings for switching to UTF-8. Use $ file foo.json to verify encoding details, and $ iconv --to-code=UTF-8 < foo.json to alter an unfortunate encoding.
You need to read the file; either of these works:
data = json.loads(open("data.json").read())
or
with open("data.json", "r") as file:
    data = json.load(file)

How to add data into a json key from a csv file using python

I am trying to add data into a JSON key from a CSV file and maintain the original structure as is. The JSON file looks like this:
{
"inputDocuments": {
"gcsDocuments": {
"documents": [
{
"gcsUri": "gs://test/.PDF",
"mimeType": "application/pdf"
}
]
}
},
"documentOutputConfig": {
"gcsOutputConfig": {
"gcsUri": "gs://test"
}
},
"skipHumanReview": false
The CSV file I am trying to load has the following structure; note that the mimeType is not included in the CSV file.
I already have code that can do this; however, it's a bit manual, and I am looking for a simpler approach that would just require a CSV file with the values, which would then be added into the JSON structure. The expected outcome should look like this:
{
"inputDocuments": {
"gcsDocuments": {
"documents": [
{
"gcsUri": "gs://sampleinvoices/Handwritten/1.pdf",
"mimeType": "application/pdf"
},
{
"gcsUri": "gs://sampleinvoices/Handwritten/2.pdf",
"mimeType": "application/pdf"
}
]
}
},
"documentOutputConfig": {
"gcsOutputConfig": {
"gcsUri": "gs://test"
}
},
"skipHumanReview": false
The code that I am currently using, which is a bit manual, looks like this:
import json

# function to add to JSON
def write_json(new_data, filename='keyvalue.json'):
    with open(filename, 'r+') as file:
        # load existing data into a dict
        file_data = json.load(file)
        # join new_data with file_data inside documents
        file_data["inputDocuments"]["gcsDocuments"]["documents"].append(new_data)
        # set the file's current position at offset
        file.seek(0)
        # convert back to json
        json.dump(file_data, file, indent=4)

# python object to be appended
y = {
    "gcsUri": "gs://test/.PDF",
    "mimeType": "application/pdf"
}
write_json(y)
I would suggest something like this:
import pandas as pd
import json
from pathlib import Path

df_csv = pd.read_csv("your_data.csv")

json_file = Path("your_data.json")
json_data = json.loads(json_file.read_text())

documents = [
    {
        "gcsUri": cell,
        "mimeType": "application/pdf"
    }
    for cell in df_csv["column_name"]
]

json_data["inputDocuments"]["gcsDocuments"]["documents"] = documents
json_file.write_text(json.dumps(json_data))
Probably you should split this into separate functions, but it should communicate the general idea.
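If you do want it split up, here is one possible refactor of the same idea into small functions (a sketch; the file names and "column_name" are still placeholders):
import json
from pathlib import Path

import pandas as pd


def load_documents_from_csv(csv_path, column="column_name"):
    """Build the documents list from one CSV column of gcsUri values."""
    df = pd.read_csv(csv_path)
    return [{"gcsUri": cell, "mimeType": "application/pdf"} for cell in df[column]]


def update_json(json_path, documents):
    """Replace the documents list inside the existing JSON structure."""
    path = Path(json_path)
    data = json.loads(path.read_text())
    data["inputDocuments"]["gcsDocuments"]["documents"] = documents
    path.write_text(json.dumps(data, indent=4))


update_json("your_data.json", load_documents_from_csv("your_data.csv"))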

Python json nested

I have a problem with a nested JSON in a Python script; I need to reproduce the following jq query:
cat inventory.json | jq '.hostvars[] | [.openstack.hostname, .openstack.accessIPv4]'
the json file has a structure like this:
{
"hostvars": {
"096b430e-20f0-4655-bb97-9bb3ab2db73c": {
"openstack": {
"accessIPv4": "192.168.3.6",
"hostname": "vm-1"
}
},
"8fb7b9b7-5ccc-47c8-addf-64563fdd0d4c": {
"openstack": {
"accessIPv4": "192.168.3.7",
"hostname": "vm-2"
}
}
}
}
and the query with jq gives me the correct output:
# cat test.json | jq '.hostvars[] | [.openstack.hostname, .openstack.accessIPv4]'
[
"vm-1",
"192.168.3.6"
]
[
"vm-2",
"192.168.3.7"
]
Now I want to reproduce this in Python, to handle the individual values in variables, but I can't parse the contents of each id, which is what I do with .hostvars[] in jq.
import json

with open('inventory.json', 'r') as inv:
    data = inv.read()

obj = json.loads(data)
objh = obj['hostvars']['096b430e-20f0-4655-bb97-9bb3ab2db73c']['openstack']
print(objh)
Calling the id works, but if I replace it with 0 or [] I have a syntax error.
Serializing JSON Data
I think when you are dealing with JSON in Python you should use serialization:
The json module exposes two methods for serializing Python objects into JSON format.
dump() will write Python data to a file-like object. We use this when we want to serialize our Python data to an external JSON file.
dumps() will write Python data to a string in JSON format. This is useful if we want to use the JSON elsewhere in our program, or if we just want to print it to the console to check that it’s correct.
Both the dump() and dumps() methods allow us to specify an optional indent argument. This will change how many spaces are used for indentation, which can make our JSON easier to read.
json_str = json.dumps(data, indent=4)
For example:
import json

data = {
    "user": {
        "name": "CodeView",
        "age": 29
    }
}

with open("data_file.json", "w") as write_file:
    json.dump(data, write_file)

json_str = json.dumps(data)
print(json_str)
json_data = {
    "hostvars": {
        "096b430e-20f0-4655-bb97-9bb3ab2db73c": {
            "openstack": {
                "accessIPv4": "192.168.3.6",
                "hostname": "vm-1"
            }
        },
        "8fb7b9b7-5ccc-47c8-addf-64563fdd0d4c": {
            "openstack": {
                "accessIPv4": "192.168.3.7",
                "hostname": "vm-2"
            }
        }
    }
}

result = [[value['openstack']['hostname'], value['openstack']['accessIPv4']]
          for value in json_data['hostvars'].values()]
print(result)
output
[['vm-1', '192.168.3.6'], ['vm-2', '192.168.3.7']]
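If you also want the printed output to look like the jq result (one small JSON array per host), a minimal variation on the same loop is:
import json

# Print each [hostname, accessIPv4] pair as its own indented JSON array,
# matching the shape of the jq output shown in the question.
for value in json_data['hostvars'].values():
    pair = [value['openstack']['hostname'], value['openstack']['accessIPv4']]
    print(json.dumps(pair, indent=2))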

Python modify structure (to be accepted by BQ) of JSON with python and save it

I would like to ask if there is an easy way to modify JSON using Python. I have found a related topic, How to update json file with python, but could not figure out the solution for my current issue.
Currently, JSON looks like this:
{
"X": [
{
"sample_topic_x":"sample_content_x_1",
...
}
{
"sample_topic_x":"sample_content_x_2",
...
}
......
]
"Y": [
{
"sample_topic_y":"sample_content_y_1",
...
}
{
"sample_topic_y":"sample_content_y_2",
...
}
......
]
}
Required: to be accepted by BQ, I need to remove "Y" and keep only "X", in this format:
{"sample_topic_x":"sample_content_x_1",.....}
{"sample_topic_x":"sample_content_x_2",.....}
{"sample_topic_x":"sample_content_x_3",.....}
Any relevant documentation, topics?
P.S. Update 1.0
import json

json_path = 'C:\XXX\exportReport.json'

def updateJsonFile():
    jsonFile = open(json_path, "r")  # Open the JSON file for reading
    data = json.load(jsonFile)       # Read the JSON into the buffer
    jsonFile.close()                 # Close the JSON file

updateJsonFile()
Solution:
import json

json_path = 'C:\XXX\exportReport.json'
output_path = 'C:\XXX\your_output_file.txt'

with open(json_path) as f:
    data = json.loads(f.read())

# Open the output file in append mode
# Note: the output file is not JSON, as the required format is not valid JSON
with open(output_path, "a+") as op:
    for element in data.get('X'):
        op.write(json.dumps(element) + "\n")
Explanation:
Load the input JSON file using json.loads() and store the result in data. The output file will be a plain text file and not a JSON file, as the required output format is not valid JSON; use a .txt file for storing the output. To get the inner element X, which is a list of dictionaries, use data.get('X'), which returns a list. Iterate over it and write json.dumps() of each element to the output file, one element per line.
C:\XXX\exportReport.json
{
"X": [
{
"sample_topic_x":"sample_content_x_1",
...
}
{
"sample_topic_x":"sample_content_x_2",
...
}
......
]
"Y": [
{
"sample_topic_y":"sample_content_y_1",
...
}
{
"sample_topic_y":"sample_content_y_2",
...
}
......
]
}
C:\XXX\your_output_file.txt
{"sample_topic_x":"sample_content_x_1",.....}
{"sample_topic_x":"sample_content_x_2",.....}
{"sample_topic_x":"sample_content_x_3",.....}
You need to extract the parent data first and assign it to a variable, then look up the "X" data in that variable.
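A short sketch of that idea (re-using the exportReport.json file from the question, here opened from the current directory, with output.txt as a placeholder output name):
import json

with open('exportReport.json') as f:
    parent = json.load(f)        # the parent data: a dict with "X" and "Y" keys

x_records = parent['X']          # keep only the "X" list of dicts

with open('output.txt', 'w') as out:
    for record in x_records:
        out.write(json.dumps(record) + "\n")   # one JSON object per line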

Loading JSON file for reading and selecting data

I have a JSON file that I load into Python. I want to take a keyword from the file (which is very big), like country rank or review, from info taken from the internet. I tried
json.load('filename.json')
but I am getting an error:
AttributeError: 'str' object has no attribute 'read'
What am I doing wrong?
Additionally, how do I select part of a json file if it is very big?
I think you need to open the file and then pass that to json.load(), like this:
import json
from pprint import pprint

with open('filename.json') as data:
    output = json.load(data)
    pprint(output)
Try the following:
import json
json_data_file = open("json_file_path", 'r').read() # r for reading the file
json_data = json.loads(json_data_file)
Access the data using the keys as follows:
json_data['key']
json.load() expects the file handle after it has been opened:
with open('filename.json') as datafile:
    data = json.load(datafile)
For example if your json data looked like this:
{
"maps": [
{
"id": "blabla",
"iscategorical": "0"
},
{
"id": "blabla",
"iscategorical": "0"
}
],
"masks": {
"id": "valore"
},
"om_points": "value",
"parameters": {
"id": "valore"
}
}
To access parts of the data, use:
data["maps"][0]["id"]
data["masks"]["id"]
data["om_points"]
That code can be found in this SO answer:
Parsing values from a JSON file using Python?
