Python to dynamically build JSON with sub-arrays

I can build JSON from a simple dictionary {} and list [], but when I try to build more complex structures, I get '\' embedded in the output JSON.
The structure I want:
{"name": "alpha",
"results": [{"entry1":
[
{"sub1": "one"},
{"sub2": "two"}
]
},
{"entry2":
[
{"sub1": "one"},
{"sub2": "two"}
]
}
]
}
This is what I get:
{'name': 'alpha',
'results': '[{"entry1": "[{\\\\"sub1\\": \\\\"one\\\\"}, {\\\\"sub2\\\\": '
'\\\\"two\\\\"}]"}, {"entry2": "[{\\\\"sub1\\\\": \\\\"one\\\\"},
{\\\\"sub2\\\\": '
'\\\\"two\\\\"}]"}]'}
Note the embedded \\. Every time the data goes through json.dumps, another level of \ escaping is added.
Here's code that almost works, but doesn't:
import json
import pprint
testJSON = {}
testJSON["name"] = "alpha"
#build sub entry List
entry1List = []
entry2List = []
topList = []
a1 = {}
a2 = {}
a1["sub1"] = "one"
a2["sub2"] = "two"
entry1List.append(a1)
entry1List.append(a2)
entry2List.append(a1)
entry2List.append(a2)
# build sub entry JSON values for Top List
tmpDict1 = {}
tmpDict2 = {}
tmpDict1["entry1"] = json.dumps(entry1List)
tmpDict2["entry2"] = json.dumps(entry2List)
topList.append(tmpDict1)
topList.append(tmpDict2)
# Now let's add the List with 2 sub Lists to the JSON
testJSON["results"] = json.dumps(topList)
pprint.pprint(testJSON)

Look at this line:
tmpDict1["entry1"] = json.dumps(entry1List)
This is specifying that key entry1 has the value of the string output of converting entry1List to JSON. In essence, it's putting JSON inside a JSON string, so it gets escaped. To nest the data structure, I'd go with:
tmpDict1["entry1"] = entry1List
Same with the other places. Once there is a tree of lists and dicts, you should only need to call json.dumps() once, on the root container (either a dict or a list).
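Putting that together, here is a minimal sketch of the question's code with the inner json.dumps calls removed. It keeps the question's variable names but builds the lists with literals for brevity, and serializes only once, at the root:

```python
import json

testJSON = {"name": "alpha"}

# Inner lists of small dicts, kept as Python objects
entry1List = [{"sub1": "one"}, {"sub2": "two"}]
entry2List = [{"sub1": "one"}, {"sub2": "two"}]

# Nest the lists directly; do NOT json.dumps them at this level
topList = [{"entry1": entry1List}, {"entry2": entry2List}]
testJSON["results"] = topList

# Serialize once, at the root container
output = json.dumps(testJSON)
print(output)
```

Because nothing inside the tree is already a JSON string, json.dumps has no string contents to escape, so no backslashes appear.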

Related

Read a file and match lines above or below from the matching pattern

I'm reading an input JSON file and capturing the array values into a dictionary by matching tar.gz and taking the line above it (essentially the yaml file).
{"Windows": [
"/home/windows/work/input.yaml",
"/home/windows/work/windows.tar.gz"
],
"Mac": [
"/home/macos/required/utilities/input.yaml",
"/home/macos/required/utilities.tar.gz"
],
"Unix": [
"/home/unix/functional/plugins/input.yaml",
"/home/unix/functional/plugins/Plugin.tar.gz"
]
goes on..
}
Output of the dictionary:
{'/home/windows/work/windows.tar.gz': '/home/windows/work/input.yaml',
'/home/macos/required/utilities/utilities.tar.gz' : '/home/macos/required/input.yaml'
......
}
The problem is that the order of the entries in the JSON can change: A) the tar.gz entry can come first in the list of values, or B) the order can be mixed from one key to the next.
Regardless of the order, how can I get the output dictionary in the format shown above? Scenario A (tar.gz first):
{ "Windows": [
"/home/windows/work/windows.tar.gz",
"/home/windows/work/input.yaml"
],
"Mac": [
"/home/macos/required/utilities/utilities.tar.gz",
"/home/macos/required/input.yaml"
],
"Unix": [
"/home/unix/functional/plugins/Plugin.tar.gz",
"/home/unix/functional/plugins/input.yaml"
]
goes on.. }
Scenario B (mix and match):
{ "Windows": [
"/home/windows/work/windows.tar.gz",
"/home/windows/work/input.yaml"
],
"Mac": [
"/home/macos/required/utilities/input.yaml",
"/home/macos/required/utilities.tar.gz"
],
"Unix": [
"/home/unix/functional/plugins/Plugin.tar.gz",
"/home/unix/functional/plugins/input.yaml"
] }
My code snippet:
import re
def read_input():
    files_to_be_processed = {}
    with open('input.json', 'r') as f:
        lines = f.read().splitlines()
    lines = [line.replace('"', '').replace(" ", '').replace(',', '') for line in lines]
    for i, line in enumerate(lines):
        match = re.match(r".*\.tar\.gz", line)
        if match:
            # take the line just above the tar.gz entry
            j = i-1 if i > 1 else 0
            for k in range(j, i):
                files_to_be_processed[match.string] = lines[k]
    print(files_to_be_processed)
A method here is the following:
1- Using Python's json module makes the whole process much easier.
2- After loading the data with the json module, you can check each object (i.e. Windows/Mac/Unix) for both the tar.gz and the yaml file.
3- Assign them to a new dictionary.
Here is some quick code:
import json
def read_input():
    files_to_be_processed = {}
    with open('input.json','r') as f:
        jsonObject = json.load(f)
    for value in jsonObject.items():
        tarGz = ""
        Yaml = ""
        for line in value[1]: #value[0] contains the key (e.g. Windows)
            if line.endswith('.tar.gz'):
                tarGz = line
            elif line.endswith('.yaml'):
                Yaml = line
        files_to_be_processed[tarGz] = Yaml
    print(files_to_be_processed)
read_input()
This code can be shortened and optimised using things like list comprehensions, but it should be a good place to get started.
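As one possible shortened form, here is a sketch using a dict comprehension. The function name pair_files is hypothetical, the "mix and match" input is inlined instead of being read from input.json, and it assumes every list holds exactly one .tar.gz and one .yaml path (next() raises StopIteration otherwise):

```python
import json

def pair_files(json_object):
    """Map each .tar.gz path to the .yaml path listed under the same key.

    The position of each path inside its list does not matter.
    """
    return {
        next(p for p in paths if p.endswith('.tar.gz')):
            next(p for p in paths if p.endswith('.yaml'))
        for paths in json_object.values()
    }

# The question's "mix and match" scenario, inlined for the example
sample = json.loads('''{
  "Windows": ["/home/windows/work/windows.tar.gz", "/home/windows/work/input.yaml"],
  "Mac": ["/home/macos/required/utilities/input.yaml", "/home/macos/required/utilities.tar.gz"],
  "Unix": ["/home/unix/functional/plugins/Plugin.tar.gz", "/home/unix/functional/plugins/input.yaml"]
}''')
print(pair_files(sample))
```

Matching on the file extension rather than the list index is what makes the order irrelevant.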
One way could be for you to transform the list within your input json_dict (the parsed input file) into a dict that has a key for "yaml" and "gz":
import json
with open('input.json') as f:
    json_dict = json.load(f)
# One new inner dict per key: dict.fromkeys(json_dict, dict()) would make
# every key share the SAME inner dict, so the last entries would overwrite all keys
json_dict_1 = {key: {} for key in json_dict}
for key in json_dict:
    list_val = json_dict[key]
    for entry in list_val:
        entry_key = 'yaml' if 'yaml' in entry[-4:] else 'gz'
        json_dict_1[key][entry_key] = entry
print(json_dict_1)
#{'Windows': {'yaml': '/home/windows/work/input.yaml',
#             'gz': '/home/windows/work/windows.tar.gz'},
# 'Mac': {'yaml': '/home/macos/required/utilities/input.yaml',
#         'gz': '/home/macos/required/utilities.tar.gz'},
# 'Unix': {'yaml': '/home/unix/functional/plugins/input.yaml',
#          'gz': '/home/unix/functional/plugins/Plugin.tar.gz'}}

Remove dictionary from list of dictionaries if value has empty list

I have a list of dictionaries, and within each dictionary is a list.
{
"Credentials": [
{
"realName": "Mark Toga",
"toolsOut": [
"TL-482940",
"TL-482940"
],
"username": "291F"
},
{
"realName": "Burt Mader",
"toolsOut": [],
"username": "R114"
},
{
"realName": "Tim Johnson",
"toolsOut": [
"TL-482940"
],
"username": "E188"
}
]
}
I am attempting to parse this file so that it shows something like this:
Mark Toga: TL-482940, TL482940
Tim Johnson: TL-482940
Omitting Burt Mader, as he has no tools out.
I have it to the point where it displays the above, but Burt Mader is still included (GUI output).
Edit: Here is a printout of newstr6 rather than the GUI image. I do want the GUI for my application, but for ease of reading:
Mark Toga: 'TL-482940', 'TL-482940',
Burt Mader: ,
Tim Johnson: 'TL-482940'
Here is my current code (I'm sure there are many efficiency improvements, but I mostly care about omitting the dictionary with the empty list):
## importing libraries
import json
from tkinter import *
from tkinter import ttk
from functools import partial
import pprint
mainWin = Tk()
mainWin.geometry('400x480')
mainWin.title('Select Tooling')
with open('Inventory.json','r+') as json_file:
    data=json.load(json_file)
credData = data['Credentials']
noSID = [{k: v for k, v in d.items() if k != 'username'} for d in credData]
print(noSID)
pp = pprint.pformat(noSID)
ps = str(pp)
newstr1 = ps.replace('[','')
newstr2 = newstr1.replace(']','')
newstr3 = newstr2.replace('{','')
newstr4 = newstr3.replace('}','')
newstr5 = newstr4.replace("'realName': '","")
newstr6 = newstr5.replace("', 'toolsOut'","")
text = Label(mainWin,text=newstr6)
text.pack()
quitButton = Button(mainWin,text="Log Out",command=lambda:mainWin.destroy())
quitButton.pack()
mainWin.mainloop()
This smells like an X-Y Problem. You don't want to display the person that has no tools checked out, but you don't really need to remove them from the list to do that. You're relying on pprint to convert a dictionary to a string, and then messing with that string. Instead, just build the string from scratch, and don't include the people with no tools checked out.
data=json.load(json_file)
credData = data['Credentials']
# Since you're the one creating the string, you can choose what you want to put in it
# No need to create a NEW dictionary without the username keys
outstr = ""
for person in credData:
    outstr += person['realName'] + ": " + ", ".join(person['toolsOut']) + "\n"
print(outstr)
This prints:
Mark Toga: TL-482940, TL-482940
Burt Mader:
Tim Johnson: TL-482940
Now, since you want to ignore the people who don't have any tools, add that condition.
outstr = ""
for person in credData:
    if person['toolsOut']:
        outstr += person['realName'] + ": " + ", ".join(person['toolsOut']) + "\n"
print(outstr)
And you get:
Mark Toga: TL-482940, TL-482940
Tim Johnson: TL-482940
if person['toolsOut'] is equivalent to if len(person['toolsOut']) != 0, because empty lists are falsy.
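A quick way to check the truthiness rule is to compare bool() of an empty and a non-empty toolsOut-style list with an explicit length test (the sample lists here are made up for the illustration):

```python
# An empty list is falsy and a non-empty list is truthy, so
# `if tools:` behaves exactly like `if len(tools) != 0:`.
samples = [[], ["TL-482940"], ["TL-482940", "TL-482940"]]
for tools in samples:
    print(repr(tools), bool(tools), len(tools) != 0)
```

Each row prints the same value in the last two columns: False for the empty list and True for the others.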
If you really want to remove the elements of credData that have empty toolsOut keys, you can use this same condition in a list comprehension.
credData2 = [person for person in credData if person['toolsOut']]
You could filter out the unwanted items when you load the "Credentials" list:
credData = [d for d in data['Credentials'] if d.get("toolsOut")]
or you could have a separate variable for the filtered credentials
credWithTools = [d for d in credData if d.get("toolsOut")]
Just filter your list of dictionaries by applying a condition. In this case, keep only the items whose toolsOut value is truthy (a non-empty list):
def process_data(list_of_dicts, field):
    res = []
    for item in list_of_dicts:
        if item[field]:
            res.append(item)
    return res
credData = process_data(data["Credentials"], "toolsOut")

Creating a json output in python

I am trying to return a response for a function in json form. The output is a list with each element being a dictionary. I don't see any mistake when I print the output. The problem arises when I iterate through the output. I get all the characters in the output one by one. See the sample code and sample output for proper understanding.
code:
import requests
import json
import sys
from bs4 import BeautifulSoup
from collections import OrderedDict
class Cricbuzz():
    url = "http://synd.cricbuzz.com/j2me/1.0/livematches.xml"
    def __init__(self):
        pass
    def getxml(self,url):
        try:
            r = requests.get(url)
        except requests.exceptions.RequestException as e:
            print e
            sys.exit(1)
        soup = BeautifulSoup(r.text,"html.parser")
        return soup
    def matchinfo(self,match):
        d = OrderedDict()
        d['id'] = match['id']
        d['srs'] = match['srs']
        d['mchdesc'] = match['mchdesc']
        d['mnum'] = match['mnum']
        d['type'] = match['type']
        d['mchstate'] = match.state['mchstate']
        d['status'] = match.state['status']
        return d
    def matches(self):
        xml = self.getxml(self.url)
        matches = xml.find_all('match')
        info = []
        for match in matches:
            info.append(self.matchinfo(match))
        data = json.dumps(info)
        return data
c = Cricbuzz()
matches = c.matches()
print matches #print matches - output1
for match in matches:
    print match #print match - output2
"print matches" (output1 in the code above) gives me the following output:
[
{
"status": "Coming up on Dec 24 at 01:10 GMT",
"mchstate": "nextlive",
"mchdesc": "AKL vs WEL",
"srs": "McDonalds Super Smash, 2016-17",
"mnum": "18TH MATCH",
"type": "ODI",
"id": "0"
},
{
"status": "Ind U19 won by 34 runs",
"mchstate": "Result",
"mchdesc": "INDU19 vs SLU19",
"srs": "Under 19 Asia Cup, 2016",
"mnum": "Final",
"type": "ODI",
"id": "17727"
},
{
"status": "PRS won by 48 runs",
"mchstate": "Result",
"mchdesc": "PRS vs ADS",
"srs": "Big Bash League, 2016-17",
"mnum": "5th Match",
"type": "T20",
"id": "16729"
}
]
But "print match" (output2 in the code above, inside the for loop) gives this output:
[
{
"
i
d
"
:
"
0
"
,
"
s
r
s
"
:
"
M
c
D
o
n
a
l
d
s
S
u
p
e
r
S
m
a
s
h
,
2
0
1
6
-
1
7
"
,
"
m
c
h
d
e
s
As you can see, one character from matches gets printed on each line. I would like to get the dictionary object when printing match.
def matches(self):
    xml = self.getxml(self.url)
    matches = xml.find_all('match')
    info = []
    for match in matches:
        info.append(self.matchinfo(match))
    data = json.dumps(info) # This is a string
    return data # This is a string
c = Cricbuzz()
matches = c.matches() # This is a string
print matches
for match in matches: # Looping over all characters of a string
    print match
I think you just want return info, which is a list. You can json.dumps() outside of that function at a later point when you actually do need JSON.
Or if you do want that function to return a JSON string, then you have to parse it back into a list.
for match in json.loads(matches):
If you call json.dumps on info before returning data, as you do, the value is converted to a JSON string. If you want to iterate over the structure that the JSON string represents, you have to load the data back out of the JSON.
Consider:
import json
info = [ { "a": 1}, { "b": 2} ]
data = json.dumps(info,indent=2)
print data
for i in data:
    print i
for i in json.loads(data):
    print i
$ python t.py
[
{
"a": 1
},
{
"b": 2
}
]
[
{
"
a
"
:
1
}
,
{
"
b
"
:
2
}
]
{u'a': 1}
{u'b': 2}
matches is a JSON string, not a dictionary, so for match in matches: iterates over the characters in the string.
If you want the dictionaries, the function should return info rather than json.dumps(info). Or you could do:
for match in json.loads(matches):
to parse the JSON back into a list of dictionaries.
Normally you should move data around in the program as structured types like dictionaries and lists, and only convert them to/from JSON when you're sending over a network or storing into a file.
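A minimal sketch of that advice: the list of dicts stays a list inside the program and is converted only when it crosses a file boundary, via json.dump and json.load. The helper names save_matches/load_matches and the matches.json file location are hypothetical, and the two records are trimmed copies of the question's output:

```python
import json
import os
import tempfile

def save_matches(info, path):
    """Write a list of dicts to disk as JSON (the only place it becomes a string)."""
    with open(path, "w") as f:
        json.dump(info, f, indent=2)

def load_matches(path):
    """Read the JSON file back into a list of dicts."""
    with open(path) as f:
        return json.load(f)

info = [{"id": "0", "mchdesc": "AKL vs WEL"},
        {"id": "17727", "mchdesc": "INDU19 vs SLU19"}]

for match in info:   # iterating a list yields dicts, not characters
    print(match["mchdesc"])

path = os.path.join(tempfile.mkdtemp(), "matches.json")  # hypothetical location
save_matches(info, path)
print(load_matches(path) == info)
```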
json.dumps returns a string.
If you want each dict from the list while iterating, you can parse the response back into a list first:
matches = json.loads(matches)
By the way, it is still worth calling json.dumps beforehand as a simple JSON validation: it only ever produces valid JSON (for example, double-quoted strings instead of the single quotes of Python's repr) and raises a TypeError for objects that can't be serialized. That's why I suggest not skipping json.dumps as you're trying to do.
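To illustrate the validation point, here is a small sketch; to_json_or_none is a hypothetical helper that returns the JSON text when the object is serializable and None when json.dumps rejects it:

```python
import json

def to_json_or_none(obj):
    """Return obj serialized as JSON, or None if it holds non-JSON types."""
    try:
        return json.dumps(obj)
    except TypeError:   # e.g. sets, datetime objects, custom classes
        return None

print(str({'a': 'b'}))                 # Python repr, single quotes: {'a': 'b'}
print(to_json_or_none({'a': 'b'}))     # JSON text, double quotes: {"a": "b"}
print(to_json_or_none({'a': {1, 2}}))  # None: a set is not JSON-serializable
```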

Datastructure for csv data

As input I get, for example, the following (CSV file):
1;data;data;data
1.1;data;data;data
1.1.1;data;data;data
1.1.2;data;data;data
2;data;data;data
2.1;data;data;data
etc...
I have tried to use a Tree data structure:
import collections
def Tree():
    return collections.defaultdict(Tree)
But the problem is that I would like to store data as follows:
t = Tree()
t[1][1] = [data,data,...,data]
...
t[1][1][1] = [data,data,...,data]
And this won't work with the data-structure defined above.
You cannot store both a value and children in a single dict entry:
tree[1] = [myData]
tree[1][5] = [myOtherData]
In this case, tree[1] would have to be both [myData] and {5: [myOtherData]}.
To work around it, you could store the data as a separate element in the dict:
tree = {
    1: {
        'data': [myData],
        5: {
            'data': [myOtherData]
        }
    }
}
Now you can use:
tree[1]['data'] = [myData]
tree[1][5]['data'] = [myOtherData]
Or if you want, you can use another magic index like 0 (if 0 never occurs), but 'data' is probably clearer.
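Here is a sketch of that idea applied to the question's CSV, reusing the recursive defaultdict Tree and storing each row's fields under a 'data' key. It assumes each line looks like `<dotted id>;<field>;<field>;...`; build_tree is a hypothetical helper, and the letters stand in for the real "data" fields:

```python
import collections
import csv
import io

def Tree():
    # Same recursive defaultdict as in the question
    return collections.defaultdict(Tree)

def build_tree(csv_text):
    tree = Tree()
    for row in csv.reader(io.StringIO(csv_text), delimiter=';'):
        if not row:
            continue
        node = tree
        # One level per component of the dotted id: "1.1.2" -> 1, 1, 2
        for part in row[0].split('.'):
            node = node[int(part)]
        node['data'] = row[1:]   # payload sits beside the child keys
    return tree

sample = """1;a;b;c
1.1;d;e;f
1.1.1;g;h;i
1.1.2;j;k;l
2;m;n;o
2.1;p;q;r
"""
t = build_tree(sample)
print(t[1]['data'])        # ['a', 'b', 'c']
print(t[1][1][2]['data'])  # ['j', 'k', 'l']
```

Because children use integer keys and the payload uses the string key 'data', the two never collide.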

python + json: parse to list

I'm somewhat new to parsing JSON data with Python (using Python 2.7). There is a service that I have to send API calls to, and the JSON response is something like what I have below. The number of items in 'row' can vary. What I need to do is take only the 'content' from the second entry of 'FL', IF there is a second entry, and put it into a list. Essentially, it is a list of only the 'campaign confirmation numbers' and nothing else. The number will also always be exactly 9 digits, if that helps. Any advice would be very much appreciated.
{"response":
{"result":
{"Potentials":
{"row":
[
{"no":"1","FL":
{"content":"523836000004148171","val":"POTENTIALID"}
},
{"no":"2","FL":
{"content":"523836000004924051","val":"POTENTIALID"}
},
{"no":"3","FL":
[
{"content":"523836000005318448","val":"POTENTIALID"},
{"content":"694275295","val":"Campaign Confirmation Number"}
]
},
{"no":"4","FL":
[
{"content":"523836000005318662","val":"POTENTIALID"},
{"content":"729545274","val":"Campaign Confirmation Number"}
]
},
{"no":"5","FL":
[
{"content":"523836000005318663","val":"POTENTIALID"},
{"content":"903187021","val":"Campaign Confirmation Number"}
]
},
{"no":"6","FL":
{"content":"523836000005322387","val":"POTENTIALID"}
},
{"no":"7","FL":
[
{"content":"523836000005332558","val":"POTENTIALID"},
{"content":"729416761","val":"Campaign Confirmation Number"}
]
}
]
}
},
"uri":"/crm/private/json/Potentials/getSearchRecords"}
}
EDIT: an example of the output for this example would be:
confs = [694275295, 729545274, 903187021, 729416761]
or
confs = ['694275295', '729545274', '903187021', '729416761']
It really doesn't matter if they're stored as strings or ints.
EDIT 2: here's my code snip:
import urllib
import urllib2
import datetime
import json
key = '[removed]'
params = {
'[removed]'
}
final_URL = 'https://[removed]'
data = urllib.urlencode(params)
request = urllib2.Request(final_URL,data)
response = urllib2.urlopen(request)
content = response.read()
j = json.loads(content) # content is a string, so loads rather than load
confs = []
for no in j["response"]["result"]["Potentials"]["row"]:
    data = no["FL"]
    if isinstance(data, list) and len(data) > 1:
        confs.append(int(data[1]["content"]))
print confs
Assuming j is your JSON object which the above structure has been parsed into:
>>> results = []
>>> for no in j["response"]["result"]["Potentials"]["row"]:
...     data = no["FL"]
...     if isinstance(data, list) and len(data) > 1:
...         results.append(int(data[1]["content"]))
...
>>> results
[694275295, 729545274, 903187021, 729416761]
Assuming that 'response' holds the json string:
import json
data = json.loads(response)
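rows = data['response']['result']['Potentials']['row']
output = []
for row in rows:
    contents = row['FL']
    # FL is a single dict when there is only one field, so check for a list first
    if isinstance(contents, list) and len(contents) > 1:
        output.append(contents[1]['content'])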
That should do it.
EDIT:
I finally got some time to test this "one liner". It's fun to use Python's functional features:
import json
#initialize response to your string
data = json.loads(response)
rows = data['response']['result']['Potentials']['row']
output = [x['FL'][1]['content'] for x in rows if isinstance(x['FL'], list) and len(x['FL']) > 1]
print output
['694275295', '729545274', '903187021', '729416761']
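Another option, sketched here rather than taken from the answers above, is to select fields by their 'val' label instead of by position. FL is normalised to a list first, since it is a bare dict when a row has only one field. confirmation_numbers is a hypothetical helper, and the response string is a trimmed copy of the one in the question:

```python
import json

def confirmation_numbers(data):
    """Collect 'content' of every FL entry labelled 'Campaign Confirmation Number'."""
    confs = []
    for row in data['response']['result']['Potentials']['row']:
        fields = row['FL']
        if isinstance(fields, dict):   # a single field comes back as a bare dict
            fields = [fields]
        for field in fields:
            if field.get('val') == 'Campaign Confirmation Number':
                confs.append(field['content'])
    return confs

# Trimmed copy of the response from the question
response = '''{"response": {"result": {"Potentials": {"row": [
  {"no": "1", "FL": {"content": "523836000004148171", "val": "POTENTIALID"}},
  {"no": "3", "FL": [
    {"content": "523836000005318448", "val": "POTENTIALID"},
    {"content": "694275295", "val": "Campaign Confirmation Number"}]},
  {"no": "4", "FL": [
    {"content": "523836000005318662", "val": "POTENTIALID"},
    {"content": "729545274", "val": "Campaign Confirmation Number"}]}
]}}, "uri": "/crm/private/json/Potentials/getSearchRecords"}}'''
print(confirmation_numbers(json.loads(response)))
```

This keeps working if the service ever reorders the fields inside FL, at the cost of depending on the exact label text.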
