How to add file names to a dictionary based on their prefix? - python

I have the following problem. I have a list containing file names:
list_files = ["12_abc.txt", "12_ddd_xxx.pdf", "23_sss.xml", "23_adc.txt", "23_axx_yyy.pdf"]
I need to add them to a dictionary based on their prefix number, i.e. 12 and 23. Each value of the dictionary should be a list containing all files with the same prefix. The desired output is:
dictionary = {"12": ["12_abc.txt", "12_ddd_xxx.pdf"], "23": ["23_sss.xml", "23_adc.txt", "23_axx_yyy.pdf"]}
What I tried so far:
dictionary = {}
for elem in list_files:
    prefix = elem.split("_")[0]
    dictionary[prefix] = elem
However this gives me the result {'12': '12_ddd_xxx.pdf', '23': '23_axx_yyy.pdf'}. How can I add my elem into a list within the loop, please?

Try:
list_files = ["12_abc.txt", "12_ddd_xxx.pdf", "23_sss.xml", "23_adc.txt", "23_axx_yyy.pdf"]
dictionary = {}
for f in list_files:
    prefix = f.split('_')[0]  # or prefix, _ = f.split('_', maxsplit=1)
    dictionary.setdefault(prefix, []).append(f)
print(dictionary)
Prints:
{'12': ['12_abc.txt', '12_ddd_xxx.pdf'], '23': ['23_sss.xml', '23_adc.txt', '23_axx_yyy.pdf']}
EDIT: Added maxsplit=1 variant, thanks @Ma0
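A possible variation on the loop above, offered as a sketch rather than as part of the original answer: str.partition splits at the first separator only and does not raise when the separator is missing. The helper name group_by_prefix and its sep parameter are made up for illustration.

```python
def group_by_prefix(names, sep="_"):
    """Group names by the text before the first `sep` (hypothetical helper)."""
    groups = {}
    for name in names:
        # partition returns (head, sep, tail); head is the whole name
        # when `sep` does not occur in it.
        prefix, _, _ = name.partition(sep)
        groups.setdefault(prefix, []).append(name)
    return groups


list_files = ["12_abc.txt", "12_ddd_xxx.pdf", "23_sss.xml", "23_adc.txt", "23_axx_yyy.pdf"]
grouped = group_by_prefix(list_files)
print(grouped)
```

A name without any underscore ends up under a key equal to the full name, which may or may not be what you want for your files.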

This is a good place to use defaultdict from collections.
from collections import defaultdict
d = defaultdict(list)
list_files = ["12_abc.txt", "12_ddd_xxx.pdf", "23_sss.xml", "23_adc.txt", "23_axx_yyy.pdf"]
for elem in list_files:
    prefix = elem.split("_")[0]
    d[prefix].append(elem)
yields:
{'12': ['12_abc.txt', '12_ddd_xxx.pdf'],
'23': ['23_sss.xml', '23_adc.txt', '23_axx_yyy.pdf']}
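One thing worth knowing about defaultdict, shown here as a small sketch with shortened sample data: merely reading a missing key creates it. If that matters, you can convert the result to a plain dict once the grouping is done.

```python
from collections import defaultdict

d = defaultdict(list)
for elem in ["12_abc.txt", "23_sss.xml", "12_ddd_xxx.pdf"]:
    d[elem.split("_")[0]].append(elem)

# Indexing a missing key on a defaultdict silently creates an empty list.
_ = d["99"]
has_phantom_key = "99" in d

# Build a plain dict without the empty entry created above; a plain dict
# raises KeyError for missing keys instead of inserting them.
plain = {k: v for k, v in d.items() if v}
print(plain)
```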

Related

python how to iterate a variable which consists of multiple lists

I have a variable that consists of one list after another.
my code:
>>> text = File(txt) #creates text object from text name
>>> names = text.name_parser() #invokes parser method to extract names from text object
My name_parser() stores names into a list self.names=[]
example:
>>> variable = my_method(txt)
output:
>>> variable
>>> [jacob, david], [jacob, hailey], [judy, david], ...
I want to combine them into a single list while retaining the duplicate values.
desired output:
>>> [jacob, david, jacob, hailey, judy, david, ...]
Here's a very simple approach to this.
variable = [['a','b','c'], ['d','e','f'], ['g','h','i']]
fileNames = ['one.txt','two.txt','three.txt']
letter_to_file = {}  # not named `dict`, which would shadow the built-in
count = 0
for lset in variable:
    for letters in lset:
        letter_to_file[letters] = fileNames[count]
    count += 1
print(letter_to_file)
I hope this helps
#!/usr/bin/python3
# Function to iterate through a collection of lists of file names
def fun(a):
    for i in a:
        for ls in i:
            # `with` closes each file once it has been read
            with open(ls) as f:
                for x in f:
                    print(x)
variable = {"a": "text.txt", "b": "text1.txt", "c": "text2.txt", "d": "text3.txt"}
myls = [variable["a"], variable["b"]], [variable["c"], variable["d"]]
fun(myls)
print("Execution Completed")
You can use the itertools module, which lets you transform your list of lists into a flat list:
import itertools
foo = list(itertools.chain.from_iterable(variable))
After that you can iterate over the new variable however you like.
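For reference, here is a self-contained sketch of the flattening step. The sample names stand in for the question's data, and the nested comprehension is an equivalent alternative that needs no import.

```python
import itertools

# Sample data standing in for the question's `variable` (illustrative only).
variable = [["jacob", "david"], ["jacob", "hailey"], ["judy", "david"]]

# chain.from_iterable walks each sublist in order, so duplicates are kept.
flat = list(itertools.chain.from_iterable(variable))

# Equivalent nested comprehension.
flat_alt = [name for sublist in variable for name in sublist]
print(flat)
```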
Well, if your variable is a list of lists, then you can try something like this:
file_dict = {}
for idx, files in enumerate(variable):
    # you can create some dictionary to bind indices to words
    # or use any library for this, I believe there are a few
    file_name = f'{idx+1}.txt'
    for file in files:
        file_dict[file] = [file_name]

Store each file in a sublist based on subfolder

There is a list_1 which has paths of many subfolders.
list_1
which gives:
['C:\\Users\\user\\Downloads\\problem00001\\ground_truth.json',
'C:\\Users\\user\\Downloads\\problem00002\\ground_truth.json',
'C:\\Users\\user\\Downloads\\problem00003\\ground_truth.json']
Purpose
The gt2 list should contain one sublist for the JSON file from problem1, another sublist for the JSON file from problem2, and so on.
The attempted code below stores the data from all the JSON files in the gt2 list, without a separate sublist per file.
gt2 = []
for k in list_1:
    with open(k, 'r') as f:
        gt = {}
        for i in json.load(f)['ground_truth']:
            gt[i['unknown-text']] = i['true-author']
    gt2.append(gt)
The end result should be: inside the gt2 list to have 3 sublists:
one for the file from problem1,
another from problem2 and
another from problem3
Assuming the list is sorted, use enumerate over list_1 and make gt2 a dict to store the JSON data.
gt2 = {}
for k, v in enumerate(list_1):
    gt = {}
    with open(v, 'r') as f:
        # 'ground_truth' is the top-level key used in the question's code
        for i in json.load(f)['ground_truth']:
            gt[i['unknown-text']] = i['true-author']
    gt2[f'problem{k + 1}'] = gt
# access values of dict here
print(gt2['problem1'])
Edit
gt2 = []
for fi in list_1:
    with open(fi, 'r') as f:
        # 'ground_truth' is the top-level key used in the question's code
        gt2.append([
            {i['unknown-text']: i['true-author']} for i in json.load(f)['ground_truth']
        ])
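To try the list-of-sublists idea without the real files, here is a runnable sketch. It assumes each file has a top-level 'ground_truth' list, as in the question's code. The function name load_pairs, the temporary files, and the sample authors are all made up for illustration.

```python
import json
import os
import tempfile


def load_pairs(path, key="ground_truth"):
    """Return one sublist of {unknown-text: true-author} dicts for a file.

    `key` mirrors the 'ground_truth' key from the question; the function
    name and parameter are hypothetical.
    """
    with open(path, "r") as f:
        records = json.load(f)[key]
    return [{r["unknown-text"]: r["true-author"]} for r in records]


# Build two throwaway files so the sketch runs anywhere (made-up data).
tmp_dir = tempfile.mkdtemp()
list_1 = []
for n, author in enumerate(["candidate01", "candidate02"], start=1):
    path = os.path.join(tmp_dir, f"problem{n:05d}.json")
    record = {"unknown-text": f"unknown{n}.txt", "true-author": author}
    with open(path, "w") as f:
        json.dump({"ground_truth": [record]}, f)
    list_1.append(path)

# One sublist per file, in the same order as list_1.
gt2 = [load_pairs(p) for p in list_1]
print(gt2)
```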

Print out dictionary from file

E;Z;X;Y
I tried
dl = defaultdict(list)
for line in file:
    line = line.strip().split(';')
    for x in line:
        dl[line[0]].append(line[1:4])
dl = dict(dl)
print (votep)
It prints out too many results. I have an init that reads the file.
How can I edit it to make it work?
The csv module could be really handy here: just use a semicolon as your delimiter, and a simple dict comprehension will suffice:
import csv
with open('filename.txt', newline='') as file:
    reader = csv.reader(file, delimiter=';')
    votep = {k: vals for k, *vals in reader}
print(votep)
Without using csv you can just use str.split:
with open('filename.txt') as file:
    # strip() drops the trailing newline so it does not end up in the last value
    votep = {k: vals for k, *vals in (s.strip().split(';') for s in file)}
print(votep)
Further simplified without the comprehension this would look as follows:
votep = {}
for line in file:
    # strip() drops the trailing newline before splitting
    key, *vals = line.strip().split(';')
    votep[key] = vals
And FYI, key, *vals = line.strip().split(';') is just multiple variable assignment coupled with iterable unpacking. The star just means: put whatever’s left in the iterable into vals after assigning the first value to key.
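A tiny standalone illustration of the starred assignment, using a made-up sample line:

```python
line = "A;X;Y;Z\n"

# The first field goes to `key`, everything else is collected into `vals`.
key, *vals = line.strip().split(";")
print(key, vals)

# With a single field, the starred name simply receives an empty list.
only_key, *rest = "A".split(";")
print(only_key, rest)
```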
If you have read the file into a list, a simple function can iterate over it and convert it into the dictionary you expect:
a = [
    'A;X;Y;Z',
    'B;Y;Z;X',
    'C;Y;Z;X',
    'D;Z;X;Y',
    'E;Z;X;Y',
]
def vp(a):
    dl = {}
    for i in a:
        split_keys = i.split(';')
        dl[split_keys[0]] = split_keys[1:]
    print(dl)

Append list based on another element in list and remove lists that contained the items

Let's say I have a list like this:
list_all = [[['some_item'],'Robert'] ,[['another_item'],'Robert'],[['itemx'],'Adam'],[['item2','item3'],'Maurice']]
I want to combine the items by their holder (e.g. 'Robert') only when they are in separate lists. In the end, list_all should contain:
list_all = [[['some_item','another_item'],'Robert'],[['itemx'],'Adam'],[['item2','item3'],'Maurice']]
What is a fast and effective way of doing it?
I've tried different ways, but I'm looking for something more elegant and simpler.
Thank you
Here is one solution. It is often better to store your data in a more structured form, e.g. a dictionary, rather than manipulate from one list format to another.
from collections import defaultdict
list_all = [[['some_item'],'Robert'],
            [['another_item'],'Robert'],
            [['itemx'],'Adam'],
            [['item2','item3'],'Maurice']]
d = defaultdict(list)
for i in list_all:
    d[i[1]].extend(i[0])
# defaultdict(list,
#             {'Adam': ['itemx'],
#              'Maurice': ['item2', 'item3'],
#              'Robert': ['some_item', 'another_item']})
d2 = [[v, k] for k, v in d.items()]
# [[['some_item', 'another_item'], 'Robert'],
#  [['itemx'], 'Adam'],
#  [['item2', 'item3'], 'Maurice']]
You can try this. It is quite similar to the answer above, but it works without importing anything.
list_all = [[['some_item'], 'Robert'], [['another_item'], 'Robert'], [['itemx'], 'Adam'], [['item2', 'item3'], 'Maurice']]
x = {}  # initializing a dictionary to store the data
for i in list_all:
    try:
        x[i[1]].extend(i[0])
    except KeyError:
        # copy the list so a later extend() does not modify list_all
        x[i[1]] = list(i[0])
list2 = [[j, i] for i, j in x.items()]
list_all = [[['some_item'],'Robert'] ,[['another_item'],'Robert'],[['itemx'],'Adam'],[['item2','item3'],'Maurice']]
dict_value = {}
for val in list_all:
    list_, name = val
    if name in dict_value:
        dict_value[name][0].extend(list_)
    else:
        # copy list_ so the extend() above never alters list_all
        dict_value[name] = [list(list_), name]
print(list(dict_value.values()))
>>>[[['some_item', 'another_item'], 'Robert'],
    [['itemx'], 'Adam'],
    [['item2', 'item3'], 'Maurice']]
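If you want to be sure the original list_all is left untouched, the same idea can be wrapped in a small function. This is only a sketch, and merge_by_holder is a made-up name. It relies on dicts preserving insertion order (Python 3.7+), so holders come out in order of first appearance.

```python
def merge_by_holder(pairs):
    """Merge [items, holder] pairs by holder (hypothetical helper)."""
    merged = {}
    for items, holder in pairs:
        # A fresh list is created per holder, so the caller's lists
        # are read but never modified.
        merged.setdefault(holder, []).extend(items)
    return [[items, holder] for holder, items in merged.items()]


list_all = [[['some_item'], 'Robert'], [['another_item'], 'Robert'],
            [['itemx'], 'Adam'], [['item2', 'item3'], 'Maurice']]
result = merge_by_holder(list_all)
print(result)
```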

Create a list dynamically and store all the values matching with current value in python 3.x

I have a text file which has data created dynamically like
1000L 00V
2000L -10V
3500L -15V
1250L -05V
1000L -05V
2000L -05V
6000L -10V
1010L 00V
and so on...
The numbers before V could vary from -160 to +160
I want to create lists dynamically (without using a dictionary) and store the values in them according to the matching numbers before V.
In this case I want to create a set of lists as follows:
00 = ["1000", "1010"]
-10 = ["2000", "6000"]
-15 = ["3500"]
-05 = ["1250", "1000", "2000"]
Tried code:
if name.split()[1] != "":
    gain_value = name.split()[1]
    gain_value = int(gain_value.replace("V", ""))
    if gain_value not in gain_list:
        gain_list.append(gain_value)
        gain_length = len(gain_list)
        print(gain_length)
        g['gain_{0}'.format(gain_length)] = []
        'gain_{0}'.format(gain_length).append(L_value)
    else:
        index_value = gain_list.index(gain_value)
        g[index_value].append(L_value)
for x in range(0, len(gain_list)):
    print(str(gain_list[x]) + "=" + 'gain_{0}'.format(x))
But the above code doesn't work: I get an error at 'gain_{0}'.format(gain_length).append(L_value), and I am unsure how to print the lists dynamically once they are created, as shown in my required output.
I can't use a dictionary here because I want to pass the lists dynamically as input to the pygal module, like this:
for x in range(0, gain_length):
    bar_chart.x_labels = k_list
    bar_chart.add(str(gain_length[x]),'gain_{0}'.format(x))
Here I can add the values only from a list, not from a dictionary.
You can use collections.defaultdict:
import collections
my_dict = collections.defaultdict(list)
with open('your_file') as f:
    for x in f:
        x = x.strip().split()
        # [:-1] drops the trailing unit letter: "-10V" -> "-10", "1000L" -> "1000"
        my_dict[x[1][:-1]].append(x[0][:-1])
output:
defaultdict(<class 'list'>, {'00': ['1000', '1010'],
                             '-10': ['2000', '6000'],
                             '-15': ['3500'],
                             '-05': ['1250', '1000', '2000']})
for your desired output:
for x, y in my_dict.items():
    print("{} = {}".format(x, y))
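Regarding the pygal concern: each value stored in the dictionary is an ordinary list, so it can be passed anywhere a list is expected. The sketch below uses the sample lines from the question in place of the file and only builds the (label, values) pairs. Passing them to something like bar_chart.add(label, values) is an assumption about how you would use them and is not executed here.

```python
from collections import defaultdict

# Sample lines from the question, standing in for the text file.
lines = ["1000L 00V", "2000L -10V", "3500L -15V", "1250L -05V",
         "1000L -05V", "2000L -05V", "6000L -10V", "1010L 00V"]

groups = defaultdict(list)
for line in lines:
    l_value, gain = line.split()
    # Drop the trailing unit letters: "1000L" -> 1000, "-10V" -> "-10".
    groups[gain[:-1]].append(int(l_value[:-1]))

# Each value is a plain list; iterate over (label, values) pairs and hand
# them to whatever expects a label and a list of numbers.
series = list(groups.items())
for label, values in series:
    print(label, "=", values)
```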
