Store each file in a sublist based on subfolder - python

I have a list, list_1, that holds the paths of JSON files in several subfolders.
list_1
which gives:
['C:\\Users\\user\\Downloads\\problem00001\\ground_truth.json',
'C:\\Users\\user\\Downloads\\problem00002\\ground_truth.json',
'C:\\Users\\user\\Downloads\\problem00003\\ground_truth.json']
Purpose
In the gt2 list there should be one sublist for the JSON file from problem1, another sublist for the JSON file from problem2, and so on.
My attempt below stores the data from all the JSON files in the gt2 list.
gt2=[]
for k in list_1:
    with open(k, 'r') as f:
        gt = {}
        for i in json.load(f)['ground_truth']:
            gt[i['unknown-text']] = i['true-author']
        gt2.append(gt)
The end result should be a gt2 list that contains 3 sublists:
one for the file from problem1,
another from problem2 and
another from problem3

Assuming the list is sorted, use enumerate over list_1 and make gt2 a dict to store the JSON data.
import json

gt2 = {}
for k, v in enumerate(list_1):
    gt = {}
    with open(v, 'r') as f:
        # the records sit under the 'ground_truth' key, as in the question's code
        for i in json.load(f)['ground_truth']:
            gt[i['unknown-text']] = i['true-author']
    gt2[f'problem{k + 1}'] = gt
# access values of dict here
print(gt2['problem1'])
Edit
gt2 = []
for fi in list_1:
    with open(fi, 'r') as f:
        gt2.append([
            {i['unknown-text']: i['true-author']} for i in json.load(f)['ground_truth']
        ])
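The approaches above can also be wrapped in a small helper. This is only a sketch: load_ground_truths is a hypothetical name, and it assumes every file keeps its records in a top-level 'ground_truth' list, as the question's code suggests.

```python
import json


def load_ground_truths(paths):
    """Return one dict per file, in the same order as `paths`."""
    per_file = []
    for path in paths:
        with open(path, 'r') as f:
            # assumption: records live under the 'ground_truth' key
            records = json.load(f)['ground_truth']
        per_file.append({r['unknown-text']: r['true-author'] for r in records})
    return per_file
```

Calling load_ground_truths(list_1) would then give a list whose first element belongs to the first path, the second element to the second path, and so on.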

Related

how to add file names into dictionary based on their prefix?

I have the following problem. I have a list containing file names:
list_files = ["12_abc.txt", "12_ddd_xxx.pdf", "23_sss.xml", "23_adc.txt", "23_axx_yyy.pdf"]
I need to add them into dictionary based on their prefix number, i.e. 12 and 23. Each value of the dictionary should be a list containing all files with the same prefix. Desired output is:
dictionary = {"12": ["12_abc.txt", "12_ddd_xxx.pdf"], "23": ["23_sss.xml", "23_adc.txt", "23_axx_yyy.pdf"]}
What I tried so far:
dictionary = {}
for elem in list_files:
    prefix = elem.split("_")[0]
    dictionary[prefix] = elem
However this gives me the result {'12': '12_ddd_xxx.pdf', '23': '23_axx_yyy.pdf'}. How can I add my elem into a list within the loop, please?
Try:
list_files = ["12_abc.txt", "12_ddd_xxx.pdf", "23_sss.xml", "23_adc.txt", "23_axx_yyy.pdf"]
dictionary = {}
for f in list_files:
    prefix = f.split('_')[0] # or prefix, _ = f.split('_', maxsplit=1)
    dictionary.setdefault(prefix, []).append(f)
print(dictionary)
Prints:
{'12': ['12_abc.txt', '12_ddd_xxx.pdf'], '23': ['23_sss.xml', '23_adc.txt', '23_axx_yyy.pdf']}
EDIT: Added maxsplit=1 variant, thanks @Ma0
This is a good place to use defaultdict from collections.
from collections import defaultdict
d = defaultdict(list)
list_files = ["12_abc.txt", "12_ddd_xxx.pdf", "23_sss.xml", "23_adc.txt", "23_axx_yyy.pdf"]
for elem in list_files:
    prefix = elem.split("_")[0]
    d[prefix].append(elem)
yields:
{'12': ['12_abc.txt', '12_ddd_xxx.pdf'],
'23': ['23_sss.xml', '23_adc.txt', '23_axx_yyy.pdf']}
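Either approach can be wrapped in a reusable helper. The following is a sketch only: group_by_prefix is a hypothetical name, and grouping a name that has no underscore under its full name is an assumption about the desired behaviour, not something the question specifies.

```python
from collections import defaultdict


def group_by_prefix(names, sep="_"):
    """Group file names by the text before the first `sep`."""
    groups = defaultdict(list)
    for name in names:
        # partition() returns the whole name as the prefix when `sep` is absent
        prefix, _, _ = name.partition(sep)
        groups[prefix].append(name)
    return dict(groups)  # plain dict, so a missing key raises KeyError again


list_files = ["12_abc.txt", "12_ddd_xxx.pdf", "23_sss.xml", "23_adc.txt", "23_axx_yyy.pdf"]
print(group_by_prefix(list_files))
```

Converting back to a plain dict at the end is a design choice: it avoids silently creating empty lists when a prefix that does not exist is looked up later.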

Reading values from file and storing them in dictionary

I want to write a function read_file() that reads the values in a file and stores the data in a dictionary. The dictionary should look something like this:
product = {'d01':['pencil', 5], 'd02':['highlighter', 7], 'd03':['sharpener', 10]....}
What the items in the file look like:
input:
d={}
file=r"C:\Users\Public\Documents\Folder\products.dat"
with open(file,'r') as f:
    for items in f:
        print(items)
results:
d01,pencil,5
d02,highlighter,7
d03,sharpener, 10
d04,pen,3
Here is my code:
def read_file():
    d={}
    file=r"C:\Users\Public\Documents\Folder\products.dat"
    with open(file,'r') as f:
        for items in f:
            stuff = items.split(",")
            quantity = int(stuff[2].rstrip())
            a = stuff[0]
            b = [stuff[1], quantity]
            d = {a:b}
            print(d)
read_file()
The results I currently get:
{'d01': ['pencil', 5]}
{'d02': ['highlighter', 7]}
{'d03': ['sharpener', 10]}
{'d04': ['pen', 3]}
How do I get a single dictionary like the one shown at the top?
Don't create a new dictionary for each line; add an element to the same dictionary.
Change
d = {a:b}
to
d[a] = b
And put print(d) after the loop is done, not inside the loop.
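With both changes applied, the logic might look like the sketch below. To keep it runnable without the original file, it takes any iterable of lines (an open file works too); parse_products and the sample list are hypothetical stand-ins for read_file() and products.dat.

```python
def parse_products(lines):
    """Build one dictionary from lines such as 'd01,pencil,5'."""
    products = {}
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        code, name, quantity = line.strip().split(",")
        # int() tolerates surrounding whitespace, e.g. ' 10'
        products[code] = [name, int(quantity)]
    return products


sample = ["d01,pencil,5\n", "d02,highlighter,7\n", "d03,sharpener, 10\n", "d04,pen,3\n"]
print(parse_products(sample))
# {'d01': ['pencil', 5], 'd02': ['highlighter', 7], 'd03': ['sharpener', 10], 'd04': ['pen', 3]}
```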
To read and parse a csv file into a dictionary of lists, using the first item on each line as a key and the remaining items on each line as a value list:
import csv
def parse(csvfilename):
    dic = {}
    with open(csvfilename, "r") as csvfile:
        csvreader = csv.reader(csvfile, skipinitialspace=True)
        for row in csvreader:
            dic[row[0]] = row[1:]
    return dic
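To see what skipinitialspace=True does with a line like "d03,sharpener, 10", here is a small sketch that feeds csv.reader from an in-memory buffer instead of a file. The io.StringIO sample is an assumption standing in for products.dat. Note that the csv module leaves every value as a string, so the quantity would still need int() if a number is wanted.

```python
import csv
import io

# in-memory stand-in for the data file
sample = io.StringIO("d01,pencil,5\nd03,sharpener, 10\n")

table = {}
for row in csv.reader(sample, skipinitialspace=True):
    if row:  # csv.reader yields an empty list for a blank line
        table[row[0]] = row[1:]

print(table)  # {'d01': ['pencil', '5'], 'd03': ['sharpener', '10']}
```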

Print out dictionary from file

E;Z;X;Y
I tried
dl= defaultdict(list)
for line in file:
    line = line.strip().split(';')
    for x in line:
        dl[line[0]].append(line[1:4])
dl=dict(dl)
print (votep)
It prints out too many results. I have an init that reads the file.
How can I change it to make it work?
The csv module could be really handy here, just use a semicolon as your delimiter and a simple dict comprehension will suffice:
import csv
with open('filename.txt') as file:
    reader = csv.reader(file, delimiter=';')
    votep = {k: vals for k, *vals in reader}
print(votep)
Without using csv you can just use str.split:
with open('filename.txt') as file:
    # strip() drops the trailing newline so it does not end up in the last value
    votep = {k: vals for k, *vals in (s.strip().split(';') for s in file)}
print(votep)
Further simplified without the comprehension this would look as follows:
votep = {}
for line in file:
    key, *vals = line.strip().split(';')
    votep[key] = vals
And FYI, key, *vals = line.strip().split(';') is just multiple variable assignment coupled with iterable unpacking. The star means: put whatever is left in the iterable into vals after assigning the first value to key.
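To make the starred assignment concrete, here is a tiny sketch. The sample strings are made up for illustration, modelled on the semicolon-separated format in the question.

```python
line = "A;X;Y;Z\n"

# the first field goes to key, everything that is left goes to vals
key, *vals = line.strip().split(";")
print(key)   # A
print(vals)  # ['X', 'Y', 'Z']

# with only one field, the starred name receives an empty list
only_key, *rest = "E".split(";")
print(rest)  # []
```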
If you have read the file into a list, a simple function can iterate over it and convert it to the dictionary you expect:
a = [
    'A;X;Y;Z',
    'B;Y;Z;X',
    'C;Y;Z;X',
    'D;Z;X;Y',
    'E;Z;X;Y',
]
def vp(a):
    dl = {}
    for i in a:
        split_keys = i.split(';')
        dl[split_keys[0]] = split_keys[1:]
    print(dl)

Making python dictionary from a text file with multiple keys

I have a text file named file.txt with some numbers like the following:
1 79 8.106E-08 2.052E-08 3.837E-08
1 80 -4.766E-09 9.003E-08 4.812E-07
1 90 4.914E-08 1.563E-07 5.193E-07
2 2 9.254E-07 5.166E-06 9.723E-06
2 3 1.366E-06 -5.184E-06 7.580E-06
2 4 2.966E-06 5.979E-07 9.702E-08
2 5 5.254E-07 0.166E-02 9.723E-06
3 23 1.366E-06 -5.184E-03 7.580E-06
3 24 3.244E-03 5.239E-04 9.002E-08
I want to build a python dictionary, where the first number in each row is the key, the second number is always ignored, and the last three numbers are put as values. But in a dictionary, a key can not be repeated, so when I write my code (attached at the end of the question), what I get is
'1' : [ '90' '4.914E-08' '1.563E-07' '5.193E-07' ]
'2' : [ '5' '5.254E-07' '0.166E-02' '9.723E-06' ]
'3' : [ '24' '3.244E-03' '5.239E-04' '9.002E-08' ]
All the other numbers are removed, and only the last row is kept as the values. What I need is to have all the numbers against a key, say 1, to be appended in the dictionary. For example, what I need is :
'1' : ['8.106E-08' '2.052E-08' '3.837E-08' '-4.766E-09' '9.003E-08' '4.812E-07' '4.914E-08' '1.563E-07' '5.193E-07']
Is it possible to do it elegantly in Python? The code I have right now is the following:
diction = {}
with open("file.txt") as f:
    for line in f:
        pa = line.split()
        diction[pa[0]] = pa[1:]
with open('file.txt') as f:
    diction = {pa[0]: pa[1:] for pa in map(str.split, f)}
You can use a defaultdict.
from collections import defaultdict
data = defaultdict(list)
with open("file.txt", "r") as f:
    for line in f:
        line = line.split()
        data[line[0]].extend(line[2:])
Try this:
from collections import defaultdict
diction = defaultdict(list)
with open("file.txt") as f:
    for line in f:
        key, _, *values = line.strip().split()
        diction[key].extend(values)
print(diction)
This is a solution for Python 3, because the statement a, *b = tuple1 is invalid in Python 2. Look at the solution of @cha0site if you are using Python 2.
Make the value of each key in diction a list and extend that list with each iteration. With your code as it is written now, diction[pa[0]] = pa[1:] overwrites the value in diction[pa[0]] each time the key appears, which is the behavior you're seeing.
diction = {}
with open("file.txt") as f:
    for line in f:
        pa = line.split()
        try:
            diction[pa[0]].extend(pa[1:])
        except KeyError:
            diction[pa[0]] = pa[1:]
In this code each value of diction will be a list. In each iteration if the key exists that list will be extended with new values from pa giving you a list of all the values for each key.
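The difference between overwriting and extending can be seen side by side in a short sketch. It uses three rows copied from the question and an in-memory io.StringIO buffer as a stand-in for file.txt (an assumption, so the example runs without the file). It slices with pa[2:] to skip the second column, as the question asks.

```python
import io

# three rows copied from the question; StringIO stands in for file.txt
sample = io.StringIO(
    "1 79 8.106E-08 2.052E-08 3.837E-08\n"
    "1 80 -4.766E-09 9.003E-08 4.812E-07\n"
    "2 2 9.254E-07 5.166E-06 9.723E-06\n"
)

overwritten = {}
accumulated = {}
for line in sample:
    pa = line.split()
    overwritten[pa[0]] = pa[2:]                       # replaces the earlier row
    accumulated.setdefault(pa[0], []).extend(pa[2:])  # keeps every row

print(len(overwritten["1"]))  # 3
print(len(accumulated["1"]))  # 6
```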
To do this in a very simple for loop:
def build_dict():  # hypothetical wrapper so that `return` is valid
    with open('file.txt') as f:
        return_dict = {}
        for item_list in map(str.split, f):
            if item_list[0] not in return_dict:
                return_dict[item_list[0]] = []
            return_dict[item_list[0]].extend(item_list[1:])
        return return_dict
Or, if you wanted to use defaultdict in a one-liner-ish way:
from collections import defaultdict
def build_dict_defaultdict():  # hypothetical wrapper so that `return` is valid
    with open('file.txt') as f:
        return_dict = defaultdict(list)
        [return_dict[item_list[0]].extend(item_list[1:]) for item_list in map(str.split, f)]
        return return_dict

Converting a list to json in python

I have a list that I want to convert to JSON with dynamic keys. Here is the code:
>>> print (list) #list
['a', 'b', 'c', 'd']
>>> outfile = open('c:\\users\\fawads\desktop\csd\\Test44.json','w')#writing data to file
>>> for entry in list:
...     data={'key'+str(i):entry}
...     i+=1
...     json.dump(data,outfile)
...
>>> outfile.close()
The result is as follows:
{"key0": "a"}{"key1": "b"}{"key2": "c"}{"key3": "d"}
which is not valid JSON.
Enumerate your list (which you should not call list, by the way, because that shadows the built-in list):
>>> import json
>>> lst = ['a', 'b', 'c', 'd']
>>> jso = {'key{}'.format(k):v for k, v in enumerate(lst)}
>>> json.dumps(jso)
'{"key3": "d", "key2": "c", "key1": "b", "key0": "a"}'
data = []
for entry in lst:
    data.append({'key'+str(lst.index(entry)):entry})
json.dump(data, outfile)
As a minimal change which I originally posted in a comment:
outfile = open('c:\\users\\fawads\desktop\csd\\Test44.json','w')#writing data to file
all_data = [] #keep a list of all the entries
i = 0
for entry in list:
    data={'key'+str(i):entry}
    i+=1
    all_data.append(data) #add the data to the list
json.dump(all_data,outfile) #write the list to the file
outfile.close()
Calling json.dump on the same file multiple times is very rarely useful, because it creates multiple segments of JSON data that need to be separated before they can be parsed. It makes much more sense to call it only once, when you are done constructing the data.
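The effect is easy to reproduce without touching the file system. In this sketch, io.StringIO buffers are used in place of the real output file (an assumption made so the example is self-contained), and the entries list mirrors the one in the question.

```python
import io
import json

entries = ['a', 'b', 'c', 'd']

# calling json.dump once per entry concatenates four separate documents
many = io.StringIO()
for i, entry in enumerate(entries):
    json.dump({'key' + str(i): entry}, many)

try:
    json.loads(many.getvalue())
    parsed_many = True
except json.JSONDecodeError:
    parsed_many = False

# calling json.dump once on the finished list gives a single document
once = io.StringIO()
json.dump([{'key' + str(i): entry} for i, entry in enumerate(entries)], once)
round_trip = json.loads(once.getvalue())

print(parsed_many)  # False
print(round_trip)   # [{'key0': 'a'}, {'key1': 'b'}, {'key2': 'c'}, {'key3': 'd'}]
```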
I'd also suggest using enumerate to handle the i variable, as well as a with statement to deal with the file IO:
all_data = [] #keep a list of all the entries
for i,entry in enumerate(list):
    data={'key'+str(i):entry}
    all_data.append(data)
with open('c:\\users\\fawads\desktop\csd\\Test44.json','w') as outfile:
    json.dump(all_data,outfile)
#file is automatically closed at the end of the with block (even if there is an error)
The loop could be shortened even further with a list comprehension:
all_data = [{'key'+str(i):entry}
            for i,entry in enumerate(list)]
Which (if you really want) could be put directly into the json.dump:
with open('c:\\users\\fawads\desktop\csd\\Test44.json','w') as outfile:
    json.dump([{'key'+str(i):entry}
               for i,entry in enumerate(list)],
              outfile)
although then you start to lose readability so I don't recommend going that far.
Here is what you need to do:
mydict = {}
i = 0
for entry in list:
    dict_key = "key" + str(i)
    mydict[dict_key] = entry
    i = i + 1
json.dump(mydict, outfile)
Currently you are creating a new dict in every iteration of the loop and dumping each one separately, hence the result is not valid JSON.
