Python: read XML with a one-to-many (1-M) relation and create JSON

I have developed a Python script that reads tags from an XML file and then converts the result to JSON.
The problem is that in the XML each element can contain a repeated tag (a one-to-many relation):
<idpoint>1021</idpoint>
<tipopoint>1</tipopoint>
<latitude>45.188377380</latitude>
<longitude>8.612257004</longitude>
<previsione time="2015-07-11T12:00:00">
<id_tempo>1</id_tempo>
<desc_tempo>sereno</desc_tempo>
<symbol_day>1</symbol_day>
<temp>33</temp>
</previsione>
<previsione time="2015-07-11T18:00:00">
<id_tempo>1</id_tempo>
<desc_tempo>sereno</desc_tempo>
<symbol_day>1</symbol_day>
<temp>29</temp>
</previsione>
My Python code reads the first tags, but when it reaches the previsione tag, which is repeated twice for the same point, it takes the values of the first previsione and ignores the second.
I would like to create a second record for the same point, this time with the values of the second previsione tag.
This is a snippet of my Python code:
json_array = []
for path in files:
    with open(path, 'r') as fr:
        print "Parsing xmldoc %s" % path
        xmldoc = minidom.parse(fr)
        if tipo == "allerte":
            items = xmldoc.getElementsByTagName("point")
        else:
            items = xmldoc.getElementsByTagName("localita")
        for item in items:
            obj = dict()
            if tipo == "allerte":
                obj['id'] = item.getElementsByTagName("idpoint")[0].firstChild.nodeValue
            else:
                obj['id'] = item.getElementsByTagName("idpoint")[0].firstChild.nodeValue
            obj['latitude'] = float(item.getElementsByTagName("latitude")[0].firstChild.nodeValue)
            obj['longitude'] = float(item.getElementsByTagName("longitude")[0].firstChild.nodeValue)
            # TODO: the symbol code must be taken from the first previsione (forecast)
            obj['symbolcode'] = int(item.getElementsByTagName("id_tempo")[0].firstChild.nodeValue)
            json_array.append(obj)
return json.dumps(json_array)
How can I extend this code so that the JSON contains two elements, one for each previsione tag?
Thanks

There is a quick way to get JSON from XML: xmltodict. This module builds a dict from your XML, which you can then manipulate as if it were plain JSON data.
Let's assume that your XML sample is saved as t2.xml, wrapped in <xml>...</xml> tags.
Then this script
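#!/usr/bin/env python
# coding: utf-8
import json
import xmltodict
# open in binary mode so the XML parser can read the file object directly
with open('t2.xml', 'rb') as data:
    print("Parsing xmldoc t2.xml")
    # avoid naming the result "dict", which would shadow the built-in type
    doc = xmltodict.parse(data)
print(json.dumps(doc, indent=4, sort_keys=True))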
will produce json as following:
{
    "xml": {
        "idpoint": "1021",
        "latitude": "45.188377380",
        "longitude": "8.612257004",
        "previsione": [
            {
                "@time": "2015-07-11T12:00:00",
                "desc_tempo": "sereno",
                "id_tempo": "1",
                "symbol_day": "1",
                "temp": "33"
            },
            {
                "@time": "2015-07-11T18:00:00",
                "desc_tempo": "sereno",
                "id_tempo": "1",
                "symbol_day": "1",
                "temp": "29"
            }
        ],
        "tipopoint": "1"
    }
}
In particular, you get both of your previsione elements properly in an array and can use them as you need.
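If you would rather stay with the standard library and emit one JSON record per previsione, as the question asks, something along these lines may work. It is only a sketch: the <point> wrapper element, the point_to_records helper and the output field names (time, symbolcode, temp) are assumptions made for illustration, not part of the original script.

```python
import json
import xml.etree.ElementTree as ET

# Sample from the question, wrapped in a hypothetical <point> root element.
XML = """<point>
  <idpoint>1021</idpoint>
  <tipopoint>1</tipopoint>
  <latitude>45.188377380</latitude>
  <longitude>8.612257004</longitude>
  <previsione time="2015-07-11T12:00:00">
    <id_tempo>1</id_tempo>
    <desc_tempo>sereno</desc_tempo>
    <symbol_day>1</symbol_day>
    <temp>33</temp>
  </previsione>
  <previsione time="2015-07-11T18:00:00">
    <id_tempo>1</id_tempo>
    <desc_tempo>sereno</desc_tempo>
    <symbol_day>1</symbol_day>
    <temp>29</temp>
  </previsione>
</point>"""


def point_to_records(point):
    """Return one dict per <previsione> child, repeating the point data."""
    base = {
        "id": point.findtext("idpoint"),
        "latitude": float(point.findtext("latitude")),
        "longitude": float(point.findtext("longitude")),
    }
    records = []
    for prev in point.findall("previsione"):
        rec = dict(base)  # copy the shared point fields into each record
        rec["time"] = prev.get("time")
        rec["symbolcode"] = int(prev.findtext("id_tempo"))
        rec["temp"] = int(prev.findtext("temp"))
        records.append(rec)
    return records


records = point_to_records(ET.fromstring(XML))
print(json.dumps(records, indent=2))
```

The same loop over the repeated child elements can be written with minidom by calling item.getElementsByTagName("previsione") and building one obj per result.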

Related

Converting a txt file into JSON using Python?

I have a log file that has the format as follows:
Nov 28 06:26:45 server-01 dhcpd: DHCPDISCOVER from cc:d3:e2:7a:af:40 via 10.39.192.1
Nov 28 06:26:45 server-01 dhcpd: DHCPOFFER on 10.39.255.253 to cc:d3:e2:7a:af:40 via 10.39.192.1
The next step is to convert the text data into JSON using Python. So far, I have the following Python script:
# Python program to convert text
# file to JSON
import json
# the file to be converted
filename = 'Logs.txt'
# resultant dictionary
dict1 = {}
# fields in the sample file
fields = ['timestamp', 'Server', 'Service', 'Message']
with open(filename) as fh:
    # count variable for employee id creation
    l = 1
    for line in fh:
        # reading line by line from the text file
        description = list(line.strip().split(None, 4))
        # for output see below
        print(description)
        # for automatic creation of id for each employee
        sno = 'emp' + str(l)
        # loop variable
        i = 0
        # intermediate dictionary
        dict2 = {}
        while i < len(fields):
            # creating dictionary for each employee
            dict2[fields[i]] = description[i]
            i = i + 1
        # appending the record of each employee to
        # the main dictionary
        dict1[sno] = dict2
        l = l + 1
# creating json file
out_file = open("test5.json", "w")
json.dump(dict1, out_file, indent=4)
out_file.close()
which gives the following output:
{
"emp1": { "timestamp": "Nov", "Server": "28", "Service": "06:26:45", "Message": "server-01" },
"emp2": { "timestamp": "Nov", "Server": "28", "Service": "06:26:45", "Message": "server-01" }
}
But I need an output like:
{
    "timestamp": "Nov 28 06:26:26",
    "Server": "server-01",
    "Service": "dhcpd",
    "Message": "DHCPOFFER on 10.45.45.31 to cc:d3:e2:7a:b9:6b via 10.45.0.1"
}
I don't know why it's not printing the whole data. Can anyone help me with this?
The problem with your code is that you did .split(None, 4), which allows only 4 splits on the input string. Since the date contains spaces too, the result of this will be (e.g. for the first line of your input):
['Nov', # timestamp
'28', # Server
'06:26:45', # Service
'server-01', # Message
'dhcpd: DHCPDISCOVER from cc:d3:e2:7a:af:40 via 10.39.192.1']
You even printed this, so I'm surprised you didn't notice something is wrong.
Now, the first element of the list is assigned to the key 'timestamp', the second element to the key 'Server', and so on. This is how you get a dict that looks like:
{ "timestamp": "Nov", "Server": "28", "Service": "06:26:45", "Message": "server-01" }
Instead, you want to split a maximum of five times. The first three elements of the resultant split are the timestamp.
# Don't need that extra list(), since .split() already returns a list
description = line.strip().split(None, 5)
# Join the first three elements,
joined_timestamp = " ".join(description[:3])
# and replace them in the list
# Setting a slice of a list: See https://stackoverflow.com/q/10623302/843953
description[:3] = [joined_timestamp]
Then, your description looks like this:
['Nov 28 06:26:45',
'server-01',
'dhcpd:',
'DHCPDISCOVER from cc:d3:e2:7a:af:40 via 10.39.192.1']
and the elements of fields now correspond to the values in description.
Note that you could replace that entire while i < len(fields)... loop with simply dict2 = dict(zip(fields, description))
P.S.: You might want to clean up other elements of description, such as description[2] = description[2].rstrip(":") to remove the trailing colon in 'dhcpd:'
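Putting the pieces above together, a minimal sketch could look like the following. The parse_line helper and the in-memory sample list are assumptions added for illustration; the original script reads from Logs.txt and keys each record by an id instead of building a list.

```python
import json

FIELDS = ["timestamp", "Server", "Service", "Message"]


def parse_line(line):
    """Split one syslog-style line into the four fields."""
    # Five splits give six parts: month, day, time, server, service, message.
    parts = line.strip().split(None, 5)
    # Merge month, day and time back into a single timestamp element.
    parts[:3] = [" ".join(parts[:3])]
    # Drop the trailing colon from the service name ('dhcpd:' -> 'dhcpd').
    parts[2] = parts[2].rstrip(":")
    return dict(zip(FIELDS, parts))


sample = [
    "Nov 28 06:26:45 server-01 dhcpd: DHCPDISCOVER from cc:d3:e2:7a:af:40 via 10.39.192.1",
    "Nov 28 06:26:45 server-01 dhcpd: DHCPOFFER on 10.39.255.253 to cc:d3:e2:7a:af:40 via 10.39.192.1",
]
records = [parse_line(line) for line in sample]
print(json.dumps(records, indent=4))
```

A list of records is used here because the desired output in the question shows plain objects without the emp1, emp2 keys; keep the dictionary if you need those ids.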

Appending "size" element to last json child element for a sunburst diagram

I have a working python script that takes my csv columns data and converts it to a json file to be read by my d3 sunburst visualization. Problem is that there is no "size" element at the final child element which is needed to populate the sunburst diagram correctly.
Below is the script that reads the CSV into JSON the way I need it. I've tried modifying it with an if/else check that finds nodes with no children (the last elements) and appends "size": 1 to them, but nothing happens.
This is example csv data. The code should work for anything though.
Energy, Grooming, Shedding, Trainability, Group, Breed
Regular Exercise, 2-3 Times a Week Brushing, Seasonal, Easy Training, Toy Group, Affenpinscher
import csv
from collections import defaultdict
def ctree():
    return defaultdict(ctree)
def build_leaf(name, leaf):
    res = {"name": name}
    # add children node if the leaf actually has any children
    if len(leaf.keys()) > 0:
        res["children"] = [build_leaf(k, v) for k, v in leaf.items()]
    return res
def main():
    tree = ctree()
    # NOTE: you need to have test.csv file as neighbor to this file
    with open('test.csv') as csvfile:
        reader = csv.reader(csvfile)
        for rid, row in enumerate(reader):
            if rid == 0:
                continue
            # usage of python magic to construct dynamic tree structure and
            # basically grouping csv values under their parents
            leaf = tree[row[0]]
            for cid in range(1, len(row)):
                leaf = leaf[row[cid]]
    # building a custom tree structure
    res = []
    for name, leaf in tree.items():
        res.append(build_leaf(name, leaf))
    # this is what I tried to append the size element
    def parseTree(leaf):
        if len(leaf["children"]) == 0:
            return obj["size"] == 1
        else:
            for child in leaf["children"]:
                leaf['children'].append(parseTree(child))
    # printing results into the terminal
    import json
    import uuid
    from IPython.display import display_javascript, display_html, display
    print(json.dumps(res, indent=2))
main()
The final child element needs to read something like this:
[
  {
    "name": "Regular Exercise",
    "children": [
      {
        "name": "2-3 Times a Week Brushing",
        "children": [
          {
            "name": "Seasonal",
            "children": [
              {
                "name": "Easy Training",
                "children": [
                  {
                    "name": "Toy Group",
                    "children": [
                      {
                        "name": "Affenpinscher",
                        "size": 1
                      }
                    ]
                  }
                ]
              }
            ]
          }
        ]
      }
    ]
  }
]
To add size to the last entry:
import csv
from collections import defaultdict
import json
#import uuid
#from IPython.display import display_javascript, display_html, display
def ctree():
    return defaultdict(ctree)
def build_leaf(name, leaf):
    res = {"name": name}
    # add children node if the leaf actually has any children
    if leaf.keys():
        res["children"] = [build_leaf(k, v) for k, v in leaf.items()]
    else:
        res['size'] = 1
    return res
def main():
    tree = ctree()
    # NOTE: you need to have test.csv file as neighbor to this file
    with open('test.csv') as csvfile:
        reader = csv.reader(csvfile)
        header = next(reader)  # read the header row
        for row in reader:
            # usage of python magic to construct dynamic tree structure and
            # basically grouping csv values under their parents
            leaf = tree[row[0]]
            for value in row[1:]:
                leaf = leaf[value]
    # building a custom tree structure
    res = []
    for name, leaf in tree.items():
        res.append(build_leaf(name, leaf))
    # printing results into the terminal
    print(json.dumps(res, indent=2))
main()
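To try the idea without a test.csv file on disk, here is a small self-contained sketch of the same approach. The in-memory CSV_TEXT and the use of skipinitialspace=True (which strips the space after each comma in the sample data) are assumptions added for illustration.

```python
import csv
import io
import json
from collections import defaultdict


def ctree():
    """A dict whose missing keys are created as nested ctree dicts."""
    return defaultdict(ctree)


def build_leaf(name, leaf):
    node = {"name": name}
    if leaf:
        # Inner node: recurse into every child.
        node["children"] = [build_leaf(k, v) for k, v in leaf.items()]
    else:
        # No children means this is the last element, so it gets a size.
        node["size"] = 1
    return node


CSV_TEXT = """Energy, Grooming, Shedding, Trainability, Group, Breed
Regular Exercise, 2-3 Times a Week Brushing, Seasonal, Easy Training, Toy Group, Affenpinscher
"""

tree = ctree()
reader = csv.reader(io.StringIO(CSV_TEXT), skipinitialspace=True)
next(reader)  # skip the header row
for row in reader:
    leaf = tree
    for value in row:
        leaf = leaf[value]  # descend, creating the level if needed

result = [build_leaf(name, leaf) for name, leaf in tree.items()]
print(json.dumps(result, indent=2))
```

The key point is that the size is assigned while the output tree is being built, so no second pass over the finished structure is needed.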

Syntax to load nested in nested keys of JSON files

I have a big tree in a JSON file and I'm looking for the Python syntax to read values from keys nested several levels deep.
Assume I have this:
{
    "FireWall": {
        "eth0": {
            "INPUT": {
                "PING": 1
            }
        }
    }
}
According to the man page and some questions on Stack Overflow, I tried this (and some variations):
import json
config = open('config.json', 'r')
data = json.load('config')
config.close()
if data['{"FireWall", {"eth0", {"INPUT", {"Ping"}}}}'] == 1:
    print('This is working')
With no result. What is the right way to do this (as simple as possible) ? Thank you !
You are passing the string 'config' to json.load instead of a file object, and data['{"FireWall", {"eth0", {"INPUT", {"Ping"}}}}'] is not the right way to access a value in a nested dictionary. Index one level at a time, and note that the key in your file is "PING", not "Ping":
import json
with open('config.json', 'r') as f:
    data = json.load(f)
if data["FireWall"]["eth0"]["INPUT"]["PING"] == 1:
    print('This is working')
data is a nested dictionary, so:
data["FireWall"]["eth0"]["INPUT"]["PING"]
will be equal to 1; or at least it will once you fix your call to json.load.
Dictionary keys are case-sensitive, and the key in your file is "PING", not "Ping". Try this:
data["FireWall"]["eth0"]["INPUT"]["PING"]
This will give you the value stored under PING.
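When the path may not exist (for example because of a case mismatch such as "Ping" versus "PING"), a small helper can return a default instead of raising KeyError. This is only a sketch: get_nested is a hypothetical helper, not part of the json module, and the config is inlined as a string so the example runs without a config.json file.

```python
import json
from functools import reduce

CONFIG_TEXT = '{"FireWall": {"eth0": {"INPUT": {"PING": 1}}}}'


def get_nested(mapping, keys, default=None):
    """Follow keys one level at a time; return default if a level is missing."""
    try:
        return reduce(lambda node, key: node[key], keys, mapping)
    except (KeyError, TypeError):
        # KeyError: the key is absent; TypeError: we indexed into a non-dict.
        return default


data = json.loads(CONFIG_TEXT)
ping = get_nested(data, ["FireWall", "eth0", "INPUT", "PING"])
missing = get_nested(data, ["FireWall", "eth0", "INPUT", "Ping"])
print(ping, missing)
```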

Writing JSON data in python. Format

I have this method that writes JSON data to a file. The key is the book title and the data is the book's publisher, date, author, etc. The method works fine when I add a single book.
Code
import json
def createJson(title,firstName,lastName,date,pageCount,publisher):
    print "\n*** Inside createJson method for " + title + "***\n";
    data = {}
    data[title] = []
    data[title].append({
        'firstName:', firstName,
        'lastName:', lastName,
        'date:', date,
        'pageCount:', pageCount,
        'publisher:', publisher
    })
    with open('data.json','a') as outfile:
        json.dump(data,outfile , default = set_default)
def set_default(obj):
    if isinstance(obj,set):
        return list(obj)
if __name__ == '__main__':
    createJson("stephen-king-it","stephen","king","1971","233","Viking Press")
JSON File with one book/one method call
{
"stephen-king-it": [
["pageCount:233", "publisher:Viking Press", "firstName:stephen", "date:1971", "lastName:king"]
]
}
However, if I call the method multiple times to add more books to the JSON file, the format is all wrong. For instance, if I call the method twice with this main block:
if __name__ == '__main__':
    createJson("stephen-king-it","stephen","king","1971","233","Viking Press")
    createJson("william-golding-lord of the flies","william","golding","1944","134","Penguin Books")
my JSON file looks like this:
{
"stephen-king-it": [
["pageCount:233", "publisher:Viking Press", "firstName:stephen", "date:1971", "lastName:king"]
]
} {
"william-golding-lord of the flies": [
["pageCount:134", "publisher:Penguin Books", "firstName:william","lastName:golding", "date:1944"]
]
}
This is obviously wrong. Is there a simple change to my method that produces correctly formatted JSON? I looked at many simple examples online of writing JSON data in Python, but all of them gave me format errors when I checked the result on JSONLint.com. I have been racking my brain trying to fix this problem, to no avail. Any help is appreciated. Thank you very much.
Simply appending new objects to your file doesn't create valid JSON. You need to add your new data inside the top-level object, then rewrite the entire file.
This should work:
def createJson(title,firstName,lastName,date,pageCount,publisher):
    print("\n*** Inside createJson method for " + title + "***\n")
    # Load any existing json data,
    # or create an empty object if the file is not found,
    # or is empty
    try:
        with open('data.json') as infile:
            data = json.load(infile)
    except (FileNotFoundError, ValueError):
        # ValueError also covers the decode error raised for an empty file
        data = {}
    if not data:
        data = {}
    data[title] = []
    data[title].append({
        'firstName:', firstName,
        'lastName:', lastName,
        'date:', date,
        'pageCount:', pageCount,
        'publisher:', publisher
    })
    with open('data.json','w') as outfile:
        json.dump(data,outfile , default = set_default)
A JSON document has a single top-level value, typically an array or an object (a dictionary in Python). Your file ends up with two top-level objects, one with the key stephen-king-it and another with william-golding-lord of the flies. Either of these on its own would be valid, but two objects written back to back are not.
Using an array you could do this:
[
    { "stephen-king-it": [] },
    { "william-golding-lord of the flies": [] }
]
Or a dictionary-style format (I would recommend this):
{
    "stephen-king-it": [],
    "william-golding-lord of the flies": []
}
Also, the data you are appending looks like it should be formatted as key-value pairs in a dictionary (which would be ideal). As written, the braces contain comma-separated values, so Python builds a set. You need to change it to this:
data[title].append({
    'firstName': firstName,
    'lastName': lastName,
    'date': date,
    'pageCount': pageCount,
    'publisher': publisher
})
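Combining both suggestions (reload and rewrite the whole file, and store each book as a dictionary), a minimal sketch might look like this. The add_book name, the keyword-argument interface and the temporary file path are assumptions made for illustration; the original method is createJson and writes to data.json.

```python
import json
import os
import tempfile


def add_book(path, title, **fields):
    """Merge one book into the JSON object stored at path, then rewrite it."""
    try:
        with open(path) as infile:
            data = json.load(infile)
    except (FileNotFoundError, ValueError):
        # Missing file, or a file that does not contain valid JSON yet.
        data = {}
    # Keep one list of entries per title, as in the original layout.
    data.setdefault(title, []).append(fields)
    with open(path, "w") as outfile:
        json.dump(data, outfile, indent=2)
    return data


# A throwaway location so the example does not touch a real data.json.
path = os.path.join(tempfile.mkdtemp(), "data.json")
add_book(path, "stephen-king-it", firstName="stephen", lastName="king",
         date="1971", pageCount="233", publisher="Viking Press")
add_book(path, "william-golding-lord of the flies", firstName="william",
         lastName="golding", date="1944", pageCount="134",
         publisher="Penguin Books")

with open(path) as infile:
    books = json.load(infile)
print(sorted(books))
```

Because every value is now a plain dict of strings, the default=set_default hook is no longer needed.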

KeyError In Python With json.dumps

I'm trying to work with dictionaries inside a list in a JSON file. The data imports fine and reads fine, but for the life of me I can't figure out how to print out the "member_id" keys. I just want to print the list of "member_id" numbers. I was initially using json.loads, then switched to json.dumps. Any help would really be appreciated.
import urllib2
import json
nyt_api_key = '72c9a68bbc504e91a3919efda17ae621%3A7%3A70586819'
url= 'http://api.nytimes.com/svc/politics/v3/us/legislative/congress/113'
json_obj = urllib2.urlopen(url)
data = json.load(json_obj)
data2 = json.dumps(data, sort_keys=True, indent=True, skipkeys = True)
print data2
Output from print data2: (The list goes on and on so it is truncated. There is a closing bracket at the bottom of the list. So it's dictionaries within a list.)
"positions": [
{
"dw_nominate": "0.466",
"member_id": "A000055",
"vote_position": "Yes"
},
{
"dw_nominate": "0.995",
"member_id": "A000367",
"vote_position": "Yes"
},
{
"dw_nominate": "0.666",
"member_id": "A000369",
"vote_position": "Yes"
},
Output from print data2['member_id'], output is the same if using 'positions', 'vote_position', etc.:
Traceback (most recent call last):
File "/Users/Owner/PycharmProjects/untitled2/1", line 9, in <module>
print data2["positions"]
TypeError: string indices must be integers, not str
Output from print data:
u'positions': [{u'dw_nominate': u'0.466', u'vote_position': u'Yes', u'member_id': u'A000055'}, {u'dw_nominate': u'0.995', u'vote_position': u'Yes', u'member_id': u'A000367'}, {u'dw_nominate': u'0.666', u'vote_position': u'Yes', u'member_id': u'A000369'}
Output from print data['positions']:
print data["positions"]
KeyError: 'positions'
Output from print data.keys():
[u'status', u'results', u'copyright']
Process finished with exit code 0
I just want to print the list of "member_id" numbers.
So you need to loop over positions and access the member_id in each dict:
# trimmed sample, nested under "results" to match the access below
data = {"results": {"positions": [
    {
        "dw_nominate": "0.466",
        "member_id": "A000055",
        "vote_position": "Yes"
    },
    {
        "dw_nominate": "0.995",
        "member_id": "A000367",
        "vote_position": "Yes"
    },
    {
        "dw_nominate": "0.666",
        "member_id": "A000369",
        "vote_position": "Yes"
    }]}}
print([d["member_id"] for d in data["results"]["positions"]])
['A000055', 'A000367', 'A000369']
If you look at the API documentation there are examples of each json response.
data2 is a string value, it doesn't have keys. I think what you want to print is data["positions"]
That's a weird output from data, you don't even have the braces. Try printing the type(data), it should be dict
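The TypeError in the question comes from indexing the string returned by json.dumps. A short sketch of the difference follows; the nesting of "positions" under "results" is an assumption based on the keys shown above, and the real API response may be shaped differently.

```python
import json

# Hypothetical, trimmed response; the real payload has more fields.
data = {"results": {"positions": [
    {"member_id": "A000055", "vote_position": "Yes"},
    {"member_id": "A000367", "vote_position": "Yes"},
    {"member_id": "A000369", "vote_position": "Yes"},
]}}

# json.dumps serializes to text: good for printing, not for lookups.
text = json.dumps(data, sort_keys=True, indent=2)

error = None
try:
    text["results"]  # indexing a str with a str key fails
except TypeError as exc:
    error = exc

# json.loads turns the text back into dicts and lists.
parsed = json.loads(text)
member_ids = [p["member_id"] for p in parsed["results"]["positions"]]
print(member_ids)
```

In other words, keep working with data (or the result of json.load) for lookups, and use json.dumps only when you want to display or store the structure.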
So I should change the heading of this to Scrapping JSON for XML in Python. I'm sure not everyone else would have the same issues I did with JSON but after many frustrating hours I decided to go down path #2... the xml version. The xml version was much easier to work with right out of the gate. In about 1/10 the time I got what I was looking for.
from urllib2 import urlopen
from xml.dom import minidom
# the rest of the query string (including the API key) was removed from this post
feed = urlopen("http://api.nytimes.com/svc/politics/v3/us/legislative.xml?...")
doc = minidom.parse(feed)
id_element = doc.getElementsByTagName("member_id")
id_number0 = id_element[0].childNodes[0].nodeValue #just a sample
id_number1 = id_element[1].childNodes[0].nodeValue #just a sample
id_number2 = id_element[2].childNodes[0].nodeValue #just a sample
print len(id_element) #to see how many items were in the variable
count = 0
for item in id_element:
    print id_element[count].childNodes[0].nodeValue
    count = count + 1
    if count == 434:
        break
This is definitely not the cleanest loop. I'm still working on that. But the code solves the problem that I had originally posted. The API key is not the actual one, formatting in the answer window was throwing it off so I just erased a bunch of it. You can find the API at the NYT developer website.
Thanks to everyone who posted.
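For reference, the loop can be written without a manual counter by iterating over the nodes directly. The sketch below uses an inline XML string whose structure is made up for illustration (the real feed is not shown in the post) and is written for Python 3.

```python
from xml.dom import minidom

# Hypothetical stand-in for the API response.
XML = """<results>
  <positions>
    <position><member_id>A000055</member_id><vote_position>Yes</vote_position></position>
    <position><member_id>A000367</member_id><vote_position>Yes</vote_position></position>
    <position><member_id>A000369</member_id><vote_position>Yes</vote_position></position>
  </positions>
</results>"""

doc = minidom.parseString(XML)
# Each <member_id> element has a single text node holding the id.
member_ids = [node.firstChild.nodeValue
              for node in doc.getElementsByTagName("member_id")]

print(len(member_ids))
for member_id in member_ids:
    print(member_id)
```

Iterating over the node list stops on its own at the last element, so no break on a fixed count is needed.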
