Converting a list to JSON in Python

Here is my code. I have a list which I want to convert to JSON with dynamic keys.
>>> print (list) #list
['a', 'b', 'c', 'd']
>>> outfile = open('c:\\users\\fawads\desktop\csd\\Test44.json','w')#writing data to file
>>> for entry in list:
...     data={'key'+str(i):entry}
...     i+=1
...     json.dump(data,outfile)
...
>>> outfile.close()
The result is as follows:
{"key0": "a"}{"key1": "b"}{"key2": "c"}{"key3": "d"}
which is not valid JSON.

Enumerate your list (which, by the way, you should not call list, because that shadows the built-in list):
>>> import json
>>> lst = ['a', 'b', 'c', 'd']
>>> jso = {'key{}'.format(k):v for k, v in enumerate(lst)}
>>> json.dumps(jso)
'{"key3": "d", "key2": "c", "key1": "b", "key0": "a"}'
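A minimal sketch of the same idea writing to a file, assuming Python 3.7+ (where dicts, and therefore json.dump, keep insertion order, so the keys come out as key0 to key3 rather than in the scrambled order shown above). The temporary directory is an assumption that stands in for the question's Windows path.

```python
import json
import os
import tempfile

lst = ['a', 'b', 'c', 'd']
data = {'key{}'.format(k): v for k, v in enumerate(lst)}

with tempfile.TemporaryDirectory() as tmp:
    # Stand-in for the question's Test44.json path
    path = os.path.join(tmp, 'Test44.json')
    with open(path, 'w') as outfile:
        json.dump(data, outfile)  # one call writes one valid JSON object
    with open(path) as infile:
        loaded = json.load(infile)  # parses back without errors

print(loaded)
```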

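data = []
for i, entry in enumerate(lst):  # lst.index(entry) would return the first match if lst has duplicates
    data.append({'key' + str(i): entry})
json.dump(data, outfile)  # outfile opened as in the question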

As a minimal change which I originally posted in a comment:
import json

outfile = open('c:\\users\\fawads\\desktop\\csd\\Test44.json', 'w')  # writing data to file
all_data = []  # keep a list of all the entries
i = 0
for entry in list:
    data = {'key' + str(i): entry}
    i += 1
    all_data.append(data)  # add the data to the list
json.dump(all_data, outfile)  # write the list to the file once
outfile.close()
Calling json.dump on the same file multiple times is very rarely useful: it writes several separate JSON documents back to back, which have to be separated again before they can be parsed. It makes much more sense to call it only once, when you are done constructing the data.
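To see the problem concretely, here is a short Python 3 sketch: json.loads rejects the back-to-back objects the question produced. If you already have such a file, one option, assuming the objects are separated only by optional whitespace, is to walk through it with json.JSONDecoder.raw_decode; iter_concatenated_json is a hypothetical helper name.

```python
import json

bad = '{"key0": "a"}{"key1": "b"}{"key2": "c"}{"key3": "d"}'

try:
    json.loads(bad)
    parsed_ok = True
except json.JSONDecodeError:
    parsed_ok = False  # "Extra data": only one top-level value is allowed


def iter_concatenated_json(text):
    """Yield each JSON value from text holding several values back to back (hypothetical helper)."""
    decoder = json.JSONDecoder()
    idx = 0
    while idx < len(text):
        # raw_decode does not skip leading whitespace, so do it here
        while idx < len(text) and text[idx].isspace():
            idx += 1
        if idx == len(text):
            break
        # raw_decode returns (value, index where the value ended) within the slice
        obj, end = decoder.raw_decode(text[idx:])
        idx += end
        yield obj


objects = list(iter_concatenated_json(bad))
print(parsed_ok, objects)
```

This is a repair tool for existing files; for new output, dumping one list or dict once is the simpler fix.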
I'd also suggest using enumerate to handle the i variable, and a with statement to deal with the file I/O:
all_data = []  # keep a list of all the entries
for i, entry in enumerate(list):
    data = {'key' + str(i): entry}
    all_data.append(data)
with open('c:\\users\\fawads\\desktop\\csd\\Test44.json', 'w') as outfile:
    json.dump(all_data, outfile)
# the file is automatically closed at the end of the with block (even if there is an error)
The loop could be shortened even further with a list comprehension:
all_data = [{'key' + str(i): entry}
            for i, entry in enumerate(list)]
Which (if you really want) could be put directly into the json.dump:
with open('c:\\users\\fawads\\desktop\\csd\\Test44.json', 'w') as outfile:
    json.dump([{'key' + str(i): entry}
               for i, entry in enumerate(list)],
              outfile)
although then you start to lose readability so I don't recommend going that far.

Here is what you need to do:
mydict = {}
i = 0
for entry in list:
    dict_key = "key" + str(i)
    mydict[dict_key] = entry
    i = i + 1
json.dump(mydict, outfile)
Currently you create a new dict and dump it on every iteration of the loop, so the file ends up holding several JSON objects written back to back, which is not valid JSON. Build one dict first and dump it once.

Related

Store each file in a sublist based on subfolder

There is a list, list_1, which holds the paths of JSON files in many subfolders.
list_1
which gives:
['C:\\Users\\user\\Downloads\\problem00001\\ground_truth.json',
'C:\\Users\\user\\Downloads\\problem00002\\ground_truth.json',
'C:\\Users\\user\\Downloads\\problem00003\\ground_truth.json']
Purpose
In the gt2 list there should be one sublist for the JSON file from problem1, another sublist for the JSON file from problem2, and so on.
The attempted code below stores all the json files in the gt2 list.
gt2 = []
for k in list_1:
    with open(k, 'r') as f:
        gt = {}
        for i in json.load(f)['ground_truth']:
            gt[i['unknown-text']] = i['true-author']
        gt2.append(gt)
The end result should be a gt2 list with 3 sublists:
one for the file from problem1,
another from problem2 and
another from problem3
Assuming the list is sorted, use enumerate over list_1 and make gt2 a dict to store the JSON data:
import json

gt2 = {}
for k, v in enumerate(list_1):
    gt = {}
    with open(v, 'r') as f:
        for i in json.load(f)['ground_truth']:  # same layout as in the question
            gt[i['unknown-text']] = i['true-author']
    gt2[f'problem{k + 1}'] = gt
# access values of the dict here
print(gt2['problem1'])
Edit
gt2 = []
for fi in list_1:
    with open(fi, 'r') as f:
        gt2.append([
            {i['unknown-text']: i['true-author']} for i in json.load(f)['ground_truth']
        ])
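As a self-contained sketch of the "one container per problem folder" idea, assuming each ground_truth.json has the layout the question's code implies ({"ground_truth": [{"unknown-text": ..., "true-author": ...}, ...]}): the helper name load_ground_truth and the sample file contents are hypothetical. Sorting on the parent folder name keeps problem00001 before problem00002 even if list_1 is not sorted.

```python
import json
import tempfile
from pathlib import Path


def load_ground_truth(paths):
    """Return one {unknown-text: true-author} dict per file (hypothetical helper)."""
    result = []
    # Sort on the parent folder name so problem00001 comes before problem00002
    for path in sorted(paths, key=lambda p: Path(p).parent.name):
        with open(path, 'r') as f:
            entries = json.load(f)['ground_truth']  # layout assumed from the question
        result.append({e['unknown-text']: e['true-author'] for e in entries})
    return result


# Hypothetical sample files laid out like the question's list_1
with tempfile.TemporaryDirectory() as tmp:
    list_1 = []
    for n, author in [(2, 'candidate2'), (1, 'candidate1')]:
        folder = Path(tmp) / 'problem{:05d}'.format(n)
        folder.mkdir()
        path = folder / 'ground_truth.json'
        payload = {'ground_truth': [{'unknown-text': 'unknown00001.txt', 'true-author': author}]}
        path.write_text(json.dumps(payload))
        list_1.append(str(path))
    gt2 = load_ground_truth(list_1)

print(gt2)
```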

Print out dictionary from file

E;Z;X;Y
I tried
dl = defaultdict(list)
for line in file:
    line = line.strip().split(';')
    for x in line:
        dl[line[0]].append(line[1:4])
dl = dict(dl)
print (votep)
It prints out too many results. I have an init that reads the file.
How can I edit it to make it work?
The csv module could be really handy here: just use a semicolon as your delimiter, and a simple dict comprehension will suffice:
import csv

with open('filename.txt') as file:
    reader = csv.reader(file, delimiter=';')
    votep = {k: vals for k, *vals in reader}
print(votep)
Without using csv you can just use str.split:
with open('filename.txt') as file:
    # rstrip('\n') keeps the newline from ending up in the last value
    votep = {k: vals for k, *vals in (s.rstrip('\n').split(';') for s in file)}
print(votep)
Further simplified without the comprehension this would look as follows:
votep = {}
for line in file:
    key, *vals = line.rstrip('\n').split(';')
    votep[key] = vals
And FYI, key, *vals = line.split(';') is just multiple assignment combined with iterable unpacking: the star means "put whatever is left in the iterable into vals" after the first value has been assigned to key.
If you have already read the file into a list, here is a simple function that iterates over it and converts it to the dictionary you expect:
a = [
    'A;X;Y;Z',
    'B;Y;Z;X',
    'C;Y;Z;X',
    'D;Z;X;Y',
    'E;Z;X;Y',
]

def vp(a):
    dl = {}
    for i in a:
        split_keys = i.split(';')
        dl[split_keys[0]] = split_keys[1:]
    print(dl)

vp(a)
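A sketch combining the ideas above into one reusable function (the name parse_votes is hypothetical): csv.reader accepts any iterable of strings, so the same code works on an open file or on a list like a, and it skips the empty rows that blank lines produce.

```python
import csv
import io


def parse_votes(lines):
    """Map the first ';'-separated field of each line to the remaining fields."""
    votes = {}
    for row in csv.reader(lines, delimiter=';'):
        if not row:  # csv.reader yields [] for a blank line
            continue
        key, *vals = row
        votes[key] = vals
    return votes


from_list = parse_votes(['A;X;Y;Z', 'B;Y;Z;X'])
# io.StringIO stands in for open('filename.txt')
from_file = parse_votes(io.StringIO('A;X;Y;Z\n\nE;Z;X;Y\n'))
print(from_list)
print(from_file)
```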

Making python dictionary from a text file with multiple keys

I have a text file named file.txt with some numbers like the following :
1 79 8.106E-08 2.052E-08 3.837E-08
1 80 -4.766E-09 9.003E-08 4.812E-07
1 90 4.914E-08 1.563E-07 5.193E-07
2 2 9.254E-07 5.166E-06 9.723E-06
2 3 1.366E-06 -5.184E-06 7.580E-06
2 4 2.966E-06 5.979E-07 9.702E-08
2 5 5.254E-07 0.166E-02 9.723E-06
3 23 1.366E-06 -5.184E-03 7.580E-06
3 24 3.244E-03 5.239E-04 9.002E-08
I want to build a Python dictionary where the first number in each row is the key, the second number is always ignored, and the last three numbers are the values. But a key cannot be repeated in a dictionary, so when I run my code (attached at the end of the question), what I get is
'1' : [ '90' '4.914E-08' '1.563E-07' '5.193E-07' ]
'2' : [ '5' '5.254E-07' '0.166E-02' '9.723E-06' ]
'3' : [ '24' '3.244E-03' '5.239E-04' '9.002E-08' ]
All the other numbers are removed, and only the last row is kept as the values. What I need is to have all the numbers against a key, say 1, to be appended in the dictionary. For example, what I need is :
'1' : ['8.106E-08' '2.052E-08' '3.837E-08' '-4.766E-09' '9.003E-08' '4.812E-07' '4.914E-08' '1.563E-07' '5.193E-07']
Is it possible to do it elegantly in python? The code I have right now is the following :
diction = {}
with open("file.txt") as f:
    for line in f:
        pa = line.split()
        diction[pa[0]] = pa[1:]
with open('file.txt') as f:
    diction = {pa[0]: pa[1:] for pa in map(str.split, f)}
You can use a defaultdict.
from collections import defaultdict
data = defaultdict(list)
with open("file.txt", "r") as f:
    for line in f:
        line = line.split()
        data[line[0]].extend(line[2:])
Try this:
from collections import defaultdict
diction = defaultdict(list)
with open("file.txt") as f:
    for line in f:
        key, _, *values = line.strip().split()
        diction[key].extend(values)
print(diction)
This is a solution for Python 3, because a statement like a, *b = tuple1 is invalid in Python 2. Look at cha0site's solution if you are using Python 2.
Make the value of each key in diction a list and extend that list on each iteration. As your code is written now, diction[pa[0]] = pa[1:] overwrites the value in diction[pa[0]] every time the key appears, which explains the behavior you're seeing.
diction = {}
with open("file.txt") as f:
    for line in f:
        pa = line.split()
        try:
            diction[pa[0]].extend(pa[2:])  # pa[1] is the column to ignore
        except KeyError:
            diction[pa[0]] = pa[2:]
In this code each value of diction will be a list. In each iteration if the key exists that list will be extended with new values from pa giving you a list of all the values for each key.
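A minimal alternative sketch using dict.setdefault instead of try/except, assuming whitespace-separated columns as in the sample file; group_values is a hypothetical name and io.StringIO stands in for the real file. Like the code above, it drops the second column, which the question says should always be ignored.

```python
import io


def group_values(lines):
    """Collect columns 3 onward of each line under the key in column 1."""
    grouped = {}
    for line in lines:
        parts = line.split()
        if not parts:  # skip blank lines
            continue
        # parts[0] is the key, parts[1] is ignored, parts[2:] are the values
        grouped.setdefault(parts[0], []).extend(parts[2:])
    return grouped


sample = io.StringIO(
    "1 79 8.106E-08 2.052E-08 3.837E-08\n"
    "1 80 -4.766E-09 9.003E-08 4.812E-07\n"
    "1 90 4.914E-08 1.563E-07 5.193E-07\n"
    "2 2 9.254E-07 5.166E-06 9.723E-06\n"
)
diction = group_values(sample)
print(diction['1'])
```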
To do this in a very simple for loop:
def read_dict(path='file.txt'):  # hypothetical wrapper so that return is valid
    with open(path) as f:
        return_dict = {}
        for item_list in map(str.split, f):
            if item_list[0] not in return_dict:
                return_dict[item_list[0]] = []
            return_dict[item_list[0]].extend(item_list[2:])  # skip the ignored second column
    return return_dict
Or, if you want to use defaultdict in a one-liner-ish way:
from collections import defaultdict

def read_dict(path='file.txt'):  # hypothetical wrapper so that return is valid
    with open(path) as f:
        return_dict = defaultdict(list)
        for item_list in map(str.split, f):
            return_dict[item_list[0]].extend(item_list[2:])
    return return_dict

Python list write to CSV without the square brackets

I have this main function:
def main():
    subprocess.call("cls", shell=True)
    ipList,hostList,manfList,masterList,temp = [],[],[],[],[]
    ipList,hostList,manfList, = getIPs(),getHosts(),getManfs()
    entries = len(hostList)
    i = 0
    for i in xrange(i, entries):
        temp = [[hostList[i]],[manfList[i]],[ipList[i]]]
        masterList.append(temp)
    with open("output.csv", "wb") as f:
        writer = csv.writer(f, delimiter=',')
        writer.writerows(masterList)
My current output is that it successfully writes to CSV but my objective is to remove the square brackets.
I tried using the .join() method; however, I understand that it only takes flat lists, not nested lists.
How can I achieve this given that I'm using a 3 dimensional list? Note, I intend to add more columns of data in the future.
Edit:
My current output for 1 row is similar to:
['Name1,'] ['Brand,'] ['1.1.1.1,']
I would like it to be:
Name1, Brand, 1.1.1.1,
Remove the brackets around the values in temp when creating masterList; otherwise each cell is a list of its own and gets written with brackets. The code should be:
def main():
    subprocess.call("cls", shell=True)
    ipList,hostList,manfList,masterList,temp = [],[],[],[],[]
    ipList,hostList,manfList, = getIPs(),getHosts(),getManfs()
    entries = len(hostList)
    i = 0
    for i in xrange(i, entries):
        temp = [hostList[i], manfList[i], ipList[i]]
        masterList.append(temp)
    with open("output.csv", "wb") as f:
        writer = csv.writer(f, delimiter=',')
        writer.writerows(masterList)
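For Python 3, a sketch of the same fix that builds the flat rows with zip instead of an index loop. The sample lists are hypothetical stand-ins for getHosts(), getManfs() and getIPs(), and io.StringIO stands in for the output file; for a real file in Python 3 you would open it with open("output.csv", "w", newline="") rather than "wb".

```python
import csv
import io

# Hypothetical sample data standing in for getHosts(), getManfs(), getIPs()
hostList = ['Name1', 'Name2']
manfList = ['Brand', 'Brand2']
ipList = ['1.1.1.1', '2.2.2.2']

buf = io.StringIO()  # stands in for the opened output.csv
writer = csv.writer(buf)
# zip pairs up the i-th element of each list, giving one flat row per host
writer.writerows(zip(hostList, manfList, ipList))
print(buf.getvalue())
```

Adding a column later is then just a matter of passing one more list to zip.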
What you could maybe do is strip the brackets and quotes from the string representation of the data:
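import string
# Python 2 only: string.maketrans and the two-argument str.translate do not exist in Python 3
for row in masterList:
    f.write(str(row).translate(string.maketrans('', ''), '[]\'') + '\n')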
E.g.
>>> import string
>>> temp = [['1.1.1'], ['Name1'], ['123']]
>>> str(temp).translate(string.maketrans('', ''), '[]\'')
'1.1.1, Name1, 123'
In Python 3.6:
>>> temp = [['1.1.1'], ['Name1'], ['123']]
>>> str(temp).translate({ord('['): '', ord(']'): '', ord('\''): ''})
'1.1.1, Name1, 123'
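In Python 3 the same deletion can also be expressed with the three-argument form of str.maketrans, whose third argument lists the characters to remove. As with the answer above, this is a sketch that only works if the values themselves never contain brackets or quotes.

```python
temp = [['1.1.1'], ['Name1'], ['123']]
delete_chars = str.maketrans('', '', "[]'")  # map [, ] and ' to None
cleaned = str(temp).translate(delete_chars)
print(cleaned)  # 1.1.1, Name1, 123
```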
Try to change this:
temp = [[hostList[i]],[manfList[i]],[ipList[i]]]
to this:
temp = [hostList[i],manfList[i],ipList[i]]
I agree with the answers above about removing the brackets. However, if keeping the nested lists is crucial to you for some reason, here is a function that takes a list as input and returns a flat list that is acceptable as a CSV row.
def output_list(masterList):
    output = []
    for item in masterList:
        if isinstance(item, list):  # if item is a list
            # call this function on it and append each of its values separately;
            # if it contains more lists, the function calls itself again
            for i in output_list(item):
                output.append(i)
        else:
            output.append(item)
    return output
You can use it in the line masterList.append(temp) as masterList.append(output_list(temp)), or even like this:
# in the end
with open("output.csv", "wb") as f:
    writer = csv.writer(f, delimiter=',')
    for i in masterList:
        writer.writerow(output_list(i))
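The same recursive flattening can be written as a generator with yield from (Python 3.3+). flatten is a hypothetical name; like output_list, it only descends into list objects and leaves strings and other values untouched.

```python
def flatten(items):
    """Yield the non-list values of a nested list, depth first."""
    for item in items:
        if isinstance(item, list):
            yield from flatten(item)  # recurse into nested lists
        else:
            yield item


row = list(flatten([['Name1'], ['Brand'], ['1.1.1.1', ['extra']]]))
print(row)
```

Because csv.writer.writerow accepts any iterable, writer.writerow(flatten(i)) would also work without the list() call.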

python: extract items of different lists and put them in one set

I have a file like this:
93.93.203.11|["['vmit.it', 'umbertominnella.it', 'studioguizzardi.it', 'telestreet.it', 'maurominnella.com']"]
168.144.9.16|["['iipmalumni.com','webdesignhostingindia.com', 'iipmstudents.in', 'iipmclubs.in']"]
195.211.72.88|["['tcmpraktijk-jingshen.nl', 'ellen-siemer.nl'']"]
129.35.210.118|["['israelinnovation.co.il', 'watec-peru.com', 'bsacimeeting.org', 'wsava2015.com', 'picsmeeting.com']"]
I want to extract the domains from all the lists and add them to one set. Ultimately, I would like to have a file with each unique domain on its own line. Here is the code I have written:
set_d = set()
f = open(file,'r')
for line in f:
    line = line.strip('\n')
    ip,list = line.split('|')
    l = json.loads(list)
    for e in l:
        domain = e.split(',')
        set_d.add(domain)
print set_d
but it gives the below error:
set_d.add(domain)
TypeError: unhashable type: 'list'
Can anybody help me out?
Use str.translate to clean the text and add to the set using update:
set_d = set()
with open(file,'r') as f:
    for line in f:
        lst = (x.strip() for x in line.split("|")[1].translate(None,"\"'[]").split(","))
        set_d.update(lst)
outputs a unique set of individual domains:
set(['vmit.it', 'tcmpraktijk-jingshen.nl', 'umbertominnella.it', 'studioguizzardi.it', 'telestreet.it', 'watec-peru.com', 'bsacimeeting.org', 'webdesignhostingindia.com', 'wsava2015.com', 'iipmstudents.in', 'maurominnella.com', 'ellen-siemer.nl', 'picsmeeting.com', 'iipmalumni.com', 'iipmclubs.in', 'israelinnovation.co.il'])
which you can write to a new file:
set_d = set()
with open(file,'r') as f, open("out.txt","w") as out:
    for line in f:
        lst = (x.strip() for x in line.split("|")[1].translate(None,"\"'[]").split(","))
        set_d.update(lst)
    for line in set_d:
        out.write("{}\n".format(line))
The output:
$ cat out.txt
vmit.it
tcmpraktijk-jingshen.nl
umbertominnella.it
studioguizzardi.it
telestreet.it
watec-peru.com
bsacimeeting.org
webdesignhostingindia.com
wsava2015.com
iipmstudents.in
maurominnella.com
ellen-siemer.nl
picsmeeting.com
iipmalumni.com
iipmclubs.in
israelinnovation.co.il
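translate(None, ...) only exists on Python 2 strings. A Python 3 sketch of the same cleaning step uses str.maketrans to build the deletion table; domains_from_lines is a hypothetical name, and it assumes every line has the ip|payload shape from the question.

```python
def domains_from_lines(lines):
    """Collect unique domains from 'ip|payload' lines like the ones in the question."""
    delete_chars = str.maketrans('', '', '"\'[]')  # remove quotes and brackets
    domains = set()
    for line in lines:
        _, _, payload = line.partition('|')  # drop the IP address
        cleaned = payload.translate(delete_chars)
        domains.update(d.strip() for d in cleaned.split(',') if d.strip())
    return domains


sample = [
    '93.93.203.11|["[\'vmit.it\', \'umbertominnella.it\']"]\n',
    '195.211.72.88|["[\'tcmpraktijk-jingshen.nl\', \'ellen-siemer.nl\'\']"]\n',
]
domains = domains_from_lines(sample)
print('\n'.join(sorted(domains)))  # one domain per line, as in out.txt
```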
Your code will not separate the text into individual domains; your json call does not really help. Just changing add to update in your code will output something like the following:
{" 'maurominnella.com']", " 'wsava2015.com'", "'webdesignhostingindia.com'", " 'iipmclubs.in']", " 'ellen-siemer.nl'']", " 'umbertominnella.it'", " 'picsmeeting.com']", "['israelinnovation.co.il'", "['vmit.it'", " 'iipmstudents.in'", "['tcmpraktijk-jingshen.nl'", " 'studioguizzardi.it'", "['iipmalumni.com'", " 'watec-peru.com'", " 'bsacimeeting.org'", " 'telestreet.it'"}
Also, don't use list as a variable name: it shadows the built-in list.
You should call update instead of add:
set_d.update(domain)
Example:
>>> set_d = {'a', 'b', 'c'}
>>> set_d.update(['c', 'd', 'e'])
>>> sorted(set_d)
['a', 'b', 'c', 'd', 'e']
The result of split (domain = e.split(',')) is a list, and lists are unhashable, so you can't add them to a set. Instead, you can add their elements to your set with set.update(). But you don't need json here, because it doesn't separate your domains and doesn't give you the desired result; instead you can use ast.literal_eval to parse your list:
import ast
set_d = set()
f = open(file,'r')
for line in f:
    line = line.strip('\n')
    ip,li = line.split('|')
    l = ast.literal_eval(ast.literal_eval(li)[0])
    for e in l:
        domain = e.split(',')
        set_d.update(domain)
print set_d
Note: don't use Python built-in function or type names (such as list) as your variable names!
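Since each field is a JSON list holding one string that is itself a Python list literal, a Python 3 sketch can let json.loads handle the outer layer and ast.literal_eval the inner one. Note that the question's third line ('ellen-siemer.nl'' with a stray quote) is not a valid literal, so literal_eval raises SyntaxError on it; the sketch below collects such lines instead of crashing. parse_line and the sample data are hypothetical.

```python
import ast
import json


def parse_line(line):
    """Return the list of domains from one 'ip|payload' line."""
    _, _, payload = line.strip().partition('|')
    inner = json.loads(payload)[0]  # a string holding a Python list literal
    return ast.literal_eval(inner)  # safely evaluates the list literal


sample = [
    '93.93.203.11|["[\'vmit.it\', \'umbertominnella.it\']"]',
    '195.211.72.88|["[\'tcmpraktijk-jingshen.nl\', \'ellen-siemer.nl\'\']"]',
]

set_d = set()
bad_lines = []
for line in sample:
    try:
        set_d.update(parse_line(line))
    except (ValueError, SyntaxError):  # malformed literal on this line
        bad_lines.append(line)

print(set_d, bad_lines)
```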
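And as a more efficient way, you can just use a regex to grab your domains. Note that this simple pattern does not match digits and stops after the first dot-separated suffix, so the result below misses wsava2015.com and cuts israelinnovation.co.il down to israelinnovation.co: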
f = open(file,'r').read()
import re
print set(re.findall(r'[a-zA-Z\-]+\.[a-zA-Z]+',f))
result:
set(['vmit.it', 'tcmpraktijk-jingshen.nl', 'umbertominnella.it', 'studioguizzardi.it', 'telestreet.it', 'israelinnovation.co', 'bsacimeeting.org', 'webdesignhostingindia.com', 'iipmstudents.in', 'maurominnella.com', 'ellen-siemer.nl', 'picsmeeting.com', 'watec-peru.com', 'iipmalumni.com', 'iipmclubs.in'])
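If you do go the regex route, a slightly wider pattern can pick up digits and multi-part suffixes. This is only a sketch, assuming every domain ends in an alphabetic label of at least two letters (which also keeps the IP addresses out); it is not a full hostname validator.

```python
import re

# Labels may contain letters, digits and hyphens; the last label must be alphabetic
DOMAIN_RE = re.compile(r"\b[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)*\.[A-Za-z]{2,}\b")

text = (
    "93.93.203.11|[\"['vmit.it', 'maurominnella.com']\"]\n"
    "129.35.210.118|[\"['israelinnovation.co.il', 'wsava2015.com']\"]\n"
)
found = set(DOMAIN_RE.findall(text))
print(sorted(found))
```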
