Sort List into different lists - python

I have a list of file names in it (about 800 file names).
Example: file_name = 23475048_43241u_43x_pos11_7.npz
I need to sort the file names into separate lists, grouped by their "pos" part; in my example that is pos11 (there are different pos values: pos0, pos12, ...).
First I tried to collect all the different pos values as keys of a dict:
import glob
import os
from pathlib import Path

path = [filename for filename in glob.glob(os.path.join(my_dir, '*.npz'))]
posList = []
for file in path:
    file_name = Path(file).parts[-1][:-4].split("_")
    posList.append(file_name[3])
mylist = list(dict.fromkeys(posList))
files_dict = {}
for pos in mylist:
    files_dict[pos] = []
Output:
{'pos0': [], 'pos10': [], 'pos11': [], 'pos12': [], 'pos1': [], 'pos2': [], 'pos3': [], 'pos4': [], 'pos5': [], 'pos6': [], 'pos7': [], 'pos8': [], 'pos9': []}
And now I want to fill the different lists, but I'm stuck: I want to iterate again over the list of file names and add each one to the right list.
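For reference, the second pass described here could be sketched as follows, using dict.setdefault so each file lands in the list for its pos. This is only a sketch: the tempfile setup exists purely to make the example self-contained and runnable; in practice you would reuse your real my_dir.

```python
import glob
import os
import tempfile
from pathlib import Path

# stand-in for the real directory (assumption: any folder of .npz files works)
my_dir = tempfile.mkdtemp()
for name in ["23475048_43241u_43x_pos11_7.npz", "23475048_43241u_43x_pos1_7.npz"]:
    open(os.path.join(my_dir, name), "w").close()

# second pass: append every file to the list belonging to its "pos" part
files_dict = {}
for file in glob.glob(os.path.join(my_dir, "*.npz")):
    pos = Path(file).stem.split("_")[3]   # .stem drops the ".npz" suffix
    files_dict.setdefault(pos, []).append(file)

print(sorted(files_dict))  # the distinct pos keys, e.g. ['pos1', 'pos11']
```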

I'm not sure what your code is doing, but you can use the program below, which takes a list of file names and outputs a dictionary of sorted lists indexed by the pos, which is what I think you are trying to do. (If not, maybe edit your question to elaborate some more.)
files = ['1_2_3_pos1_2.np', '2_3_1_pos2_2.npz']
files_dict = {}
for file in files:
    pos = file.split('_')[3]
    files_dict[pos] = files_dict.get(pos, []) + [file]
for k in files_dict.keys():
    files_dict[k].sort()
print(files_dict)
Edit:
As @Stef suggested, you can make it more efficient by using setdefault:
files = ['1_2_3_pos1_2.np', '2_3_1_pos2_2.npz']
files_dict = {}
for file in files:
    pos = file.split('_')[3]
    files_dict.setdefault(pos, []).append(file)
for k in files_dict.keys():
    files_dict[k].sort()
print(files_dict)

@ARandomDeveloper's answer clearly explains how to populate the dict by iterating through the list only once. I recommend studying their answer until you've understood it well.
This is a very common way to populate a dict. You will probably encounter this pattern again.
Because this operation of grouping into a dict is so common, the more_itertools module offers a function map_reduce for exactly this purpose.
from more_itertools import map_reduce
posList = '''23475048_43241u_43x_pos11_7.npz
23475048_43241u_43x_pos1_7.npz
23475048_43241u_43x_pos10_7.npz
23475048_43241u_43x_pos8_7.npz
23475048_43241u_43x_pos22_7.npz
23475048_43241u_43x_pos2_7.npz'''.split("\n") # example list from uingtea's answer
d = map_reduce(posList, keyfunc=lambda f: f.split('_')[3])
print(d)
# defaultdict(None, {
# 'pos11': ['23475048_43241u_43x_pos11_7.npz'],
# 'pos1': ['23475048_43241u_43x_pos1_7.npz'],
# 'pos10': ['23475048_43241u_43x_pos10_7.npz'],
# 'pos8': ['23475048_43241u_43x_pos8_7.npz'],
# 'pos22': ['23475048_43241u_43x_pos22_7.npz'],
# 'pos2': ['23475048_43241u_43x_pos2_7.npz']
# })
Internally, map_reduce uses almost exactly the same code as suggested in @ARandomDeveloper's answer, except with a defaultdict.
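To illustrate, the grouping that map_reduce performs is roughly equivalent to this defaultdict loop (a sketch of the idea, not more_itertools' actual source):

```python
from collections import defaultdict

posList = ['23475048_43241u_43x_pos11_7.npz',
           '23475048_43241u_43x_pos1_7.npz']

# group file names by their "pos" part, same keyfunc as above
d = defaultdict(list)
for f in posList:
    d[f.split('_')[3]].append(f)

print(dict(d))
```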

You need to extract the digits after "pos"; use the regex (\d+)_\d\.npz, then use the sorted() function with that number as the key:
import re
posList = '''23475048_43241u_43x_pos11_7.npz
23475048_43241u_43x_pos1_7.npz
23475048_43241u_43x_pos10_7.npz
23475048_43241u_43x_pos8_7.npz
23475048_43241u_43x_pos22_7.npz
23475048_43241u_43x_pos2_7.npz'''.split("\n")
posList = sorted(posList, key=lambda x: int(re.search(r"(\d+)_\d\.npz", x)[1]))
print(posList)
Result:
['23475048_43241u_43x_pos1_7.npz',
'23475048_43241u_43x_pos2_7.npz',
'23475048_43241u_43x_pos8_7.npz',
'23475048_43241u_43x_pos10_7.npz',
'23475048_43241u_43x_pos11_7.npz',
'23475048_43241u_43x_pos22_7.npz'
]

Related

python how to iterate a variable which consists of multiple lists

I have a variable that consists of a list of lists.
my code:
>>> text = File(txt) #creates text object from text name
>>> names = text.name_parser() #invokes parser method to extract names from text object
My name_parser() stores names into a list self.names=[]
example:
>>> variable = my_method(txt)
output:
>>> variable
>>> [['jacob', 'david'], ['jacob', 'hailey'], ['judy', 'david'], ...]
I want to make them into a single list while retaining the duplicate values.
desired output:
>>> ['jacob', 'david', 'jacob', 'hailey', 'judy', 'david', ...]
Here's a very simple approach to this.
variable = [['a','b','c'], ['d','e','f'], ['g','h','i']]
fileNames = ['one.txt','two.txt','three.txt']
d = {}  # avoid naming this "dict", which would shadow the built-in
count = 0
for lset in variable:
    for letters in lset:
        d[letters] = fileNames[count]
    count += 1
print(d)
I hope this helps
#!/usr/bin/python3
# function to iterate through the lists of file names taken from the dict
def fun(a):
    for i in a:
        for ls in i:
            f = open(ls)
            for x in f:
                print(x)

variable = {"a": "text.txt", "b": "text1.txt", "c": "text2.txt", "d": "text3.txt"}
myls = [variable["a"], variable["b"]], [variable["c"], variable["d"]]
fun(myls)
print("Execution Completed")
You can use the itertools module, which will allow you to transform your list of lists into a flat list:
import itertools
foo = list(itertools.chain.from_iterable(variable))
After that you can iterate over the new variable however you like.
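For example, with a nested list shaped like the one in the question (the names here are placeholders), it would look like this:

```python
import itertools

variable = [['jacob', 'david'], ['jacob', 'hailey'], ['judy', 'david']]

# chain.from_iterable yields the items of each inner list in order,
# so duplicates are kept
flat = list(itertools.chain.from_iterable(variable))
print(flat)  # ['jacob', 'david', 'jacob', 'hailey', 'judy', 'david']
```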
Well, if your variable is list of lists, then you can try something like this:
file_dict = {}
for idx, files in enumerate(variable):
    # you can create some dictionary to bind indices to words
    # or use any library for this, I believe there are a few
    file_name = f'{idx+1}.txt'
    for file in files:
        file_dict[file] = [file_name]

Python 3: Creating list of multiple dictionaries have same keys but different values coming from multiple lists

I'm parsing through a response of XML using xpath from lxml library.
I'm getting the results and creating lists out of them like below:
object_name = [o.text for o in response.xpath('//*[name()="objectName"]')]
object_size_KB = [o.text for o in response.xpath('//*[name()="objectSize"]')]
I want to use the lists to create a dictionary per element in list and then add them to a final list like this:
[{'object_name': 'file1234', 'object_size_KB': 9347627},
 {'object_name': 'file5671', 'object_size_KB': 9406875}]
I wanted a generic solution because I might need to extract more metadata from the response in the future, so I want my code to be future-proof and to reduce repetition:
meta_names = {
    'object_name': '//*[name()="objectName"]',
    'object_size_KB': '//*[name()="objectSize"]'
}
def parse_response(response, meta_names):
    """
    input: response: api xml response text from lxml xpath
    input: meta_names: key names used to generate dictionary per object
    return: list of objects dictionary
    """
    mylist = []
    # create a list of each xpath match and assign them to variables
    for key, value in meta_names.items():
        mylist.append({key: [o.text for o in response.xpath(value)]})
    return mylist
However the function gives me this:
[{'object_name': ['file1234', 'file5671']}, {'object_size_KB': ['9347627', '9406875']}]
I've been searching for a similar case in the forums but couldn't find something to match my needs.
Appreciate your help.
UPDATE: Renney's answer was what I wanted. I just adjusted the length used for the range, since I don't always have the same number of xpath matches per object key; and since my lists always have identical lengths, I picked the first index, [0].
now the function looks like this.
def create_entries(root, keys):
    tmp = []
    for key in keys:
        tmp.append([o.text for o in root.xpath('//*[name()="' + key + '"]')])
    ret = []
    # print(len(tmp[0]))
    for i in range(len(tmp[0])):
        add = {}
        for j in range(len(keys)):
            add[keys[j]] = tmp[j][i]
        ret.append(add)
    return ret
Use a two dimensional array:
def createEntries(root, keys):
    tmp = []
    for key in keys:
        tmp.append([o.text for o in root.xpath('//*[name()="' + key + '"]')])
    ret = []
    for i in range(len(tmp)):
        add = {}
        for j in range(len(keys)):
            add[keys[j]] = tmp[j][i]
        ret.append(add)
    return ret
I think this is what you are looking for.
You can use zip to combine your two lists into a list of value pairs.
Then, you can use a list comprehension or a generator expression to pair your value pairs with your desired keys.
import pprint
object_name = ['file1234', 'file5671']
object_size = [9347627, 9406875]
# List Comprehension
obj_list = [{'object_name': name, 'object_size': size} for name,size in zip(object_name,object_size)]
pprint.pprint(obj_list)
print('\n')
# Generator Expression
generator = ({'object_name': name, 'object_size': size} for name,size in zip(object_name,object_size))
for obj in generator:
    print(obj)
Live Code Example -> https://onlinegdb.com/SyNSwd7jU
I think the accepted answer is more efficient, but here's an example of how list comprehensions could be used.
import pprint

meta_names = {
    'object_name': ['file1234', 'file5671'],
    'object_size_KB': ['9347627', '9406875'],
    'object_text': ['Bob', 'Ross']
}
def parse_response(meta_names):
    """
    input: meta_names: key names mapped to their extracted value lists
    return: list of objects dictionary
    """
    # List comprehensions
    to_dict = lambda l: [{key: val for key, val in pairs} for pairs in l]
    objs = list(zip(*list([[key, val] for val in vals] for key, vals in meta_names.items())))
    pprint.pprint(to_dict(objs))
parse_response(meta_names)
Live Code -> https://onlinegdb.com/ryLq4PVjL

coupling str elements from a list to a tuple list

I have the following list:
lines = ['line_North_Mid', 'line_South_Mid',
         'line_North_South', 'line_Mid_South',
         'line_South_North', 'line_Mid_North']
I would like to couple them in a tuple list as follows, with respect to their names:
tuple_list = [('line_Mid_North', 'line_North_Mid'),
              ('line_North_South', 'line_South_North'),
              ('line_Mid_South', 'line_South_Mid')]
I thought maybe I could do a string search over the elements of lines, but that won't be efficient. Is there a better way to order the elements of lines so the result looks like tuple_list?
Pairing criteria:
Both elements must have the same area names ('North', 'Mid', 'South').
E.g.: 'line_North_Mid' should be coupled with 'line_Mid_North'.
Try this:
from itertools import combinations
tuple_list = [i for i in combinations(lines,2) if i[0].split('_')[1] == i[1].split('_')[2] and i[0].split('_')[2] == i[1].split('_')[1]]
or I think this is better:
[i for i in combinations(lines,2) if i[0].split('_')[1:] == i[1].split('_')[1:][::-1]]
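Run against the lines list from the question, the shorter version gives the expected pairs (a quick check of the one-liner above, assuming lines is defined as in the question):

```python
from itertools import combinations

lines = ['line_North_Mid', 'line_South_Mid',
         'line_North_South', 'line_Mid_South',
         'line_South_North', 'line_Mid_North']

# keep a pair when the two area parts are the same, just in reversed order
tuple_list = [i for i in combinations(lines, 2)
              if i[0].split('_')[1:] == i[1].split('_')[1:][::-1]]
print(tuple_list)
```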
An order-agnostic O(n) solution is possible using collections.defaultdict. The idea is to use as our dictionary keys the last 2 components of your strings delimited by '_', appending values from your input list. Then extract values and convert to a list of tuples.
from collections import defaultdict
L = ['line_North_Mid', 'line_South_Mid',
     'line_North_South', 'line_Mid_South',
     'line_South_North', 'line_Mid_North']
dd = defaultdict(list)
for item in L:
    dd[frozenset(item.rsplit('_', maxsplit=2)[1:])].append(item)
res = list(map(tuple, dd.values()))
# [('line_North_Mid', 'line_Mid_North'),
# ('line_South_Mid', 'line_Mid_South'),
# ('line_North_South', 'line_South_North')]
You can use the following list comprehension:
lines = ['line_Mid_North', 'line_North_Mid',
'line_North_South', 'line_South_North',
'line_Mid_South', 'line_South_Mid']
[(j,i) for i in lines for j in lines if j not in i
if set(j.split('_')[1:]) < set(i.split('_'))][::2]
[('line_Mid_North', 'line_North_Mid'),
('line_North_South', 'line_South_North'),
('line_Mid_South', 'line_South_Mid')]
I suggest you have a function that returns the same key for string that are supposed to be together (a grouping-key).
def key(s):
    # ignore the first part and sort the other 2 parts, so they will always be in the same order
    _, part_1, part_2 = s.split('_')
    return tuple(sorted([part_1, part_2]))
Then you have to use some grouping method; I used defaultdict, for example:
import collections
lines = [
    'line_North_Mid', 'line_South_Mid',
    'line_North_South', 'line_Mid_South',
    'line_South_North', 'line_Mid_North',
]
dd = collections.defaultdict(list)
for s in lines:
    dd[key(s)].append(s)  # those with the same key get grouped
print(list(tuple(v) for v in dd.values()))
# [
# ('line_North_Mid', 'line_Mid_North'),
# ('line_South_Mid', 'line_Mid_South'),
# ('line_North_South', 'line_South_North'),
# ]

Append list based on another element in list and remove lists that contained the items

Let's say I have two lists like this:
list_all = [[['some_item'],'Robert'], [['another_item'],'Robert'], [['itemx'],'Adam'], [['item2','item3'],'Maurice']]
I want to combine the items together by their holder (i.e. 'Robert'), but only when they are in separate lists. I.e. in the end list_all should contain:
list_all = [[['some_item','another_item'],'Robert'], [['itemx'],'Adam'], [['item2','item3'],'Maurice']]
What is a fast and effective way of doing it?
I've tried in different ways but I'm looking for something more elegant, more simplistic.
Thank you
Here is one solution. It is often better to store your data in a more structured form, e.g. a dictionary, rather than manipulate from one list format to another.
from collections import defaultdict
list_all = [[['some_item'], 'Robert'],
            [['another_item'], 'Robert'],
            [['itemx'], 'Adam'],
            [['item2', 'item3'], 'Maurice']]
d = defaultdict(list)
for i in list_all:
    d[i[1]].extend(i[0])
# defaultdict(list,
# {'Adam': ['itemx'],
# 'Maurice': ['item2', 'item3'],
# 'Robert': ['some_item', 'another_item']})
d2 = [[v, k] for k, v in d.items()]
# [[['some_item', 'another_item'], 'Robert'],
# [['itemx'], 'Adam'],
# [['item2', 'item3'], 'Maurice']]
You can try this; it's quite similar to the above answer, but you can do it without importing anything.
list_all = [[['some_item'], 'Robert'], [['another_item'], 'Robert'], [['itemx'], 'Adam'], [['item2', 'item3'], 'Maurice']]
x = {}  # initializing a dictionary to store the data
for i in list_all:
    try:
        x[i[1]].extend(i[0])
    except KeyError:
        x[i[1]] = i[0]
list2 = [[j, i] for i, j in x.items()]
list_all = [[['some_item'],'Robert'] ,[['another_item'],'Robert'],[['itemx'],'Adam'],[['item2','item3'],'Maurice']]
dict_value = {}
for val in list_all:
    list_, name = val
    if name in dict_value:
        dict_value[name][0].extend(list_)
    else:
        dict_value.setdefault(name, [list_, name])
print(list(dict_value.values()))
>>>[[['some_item', 'another_item'], 'Robert'],
[['itemx'], 'Adam'],
[['item2', 'item3'], 'Maurice']]

Converting a list to json in python

Here is the code, I have a list, which I want to convert to JSON with dynamic keys.
>>> print (list) #list
['a', 'b', 'c', 'd']
>>> outfile = open('c:\\users\\fawads\desktop\csd\\Test44.json','w')#writing data to file
>>> for entry in list:
...     data={'key'+str(i):entry}
...     i+=1
...     json.dump(data,outfile)
...
>>> outfile.close()
The result is as following:
{"key0": "a"}{"key1": "b"}{"key2": "c"}{"key3": "d"}
which is not valid JSON.
Enumerate your list (which you should not call list, by the way; you will shadow the built-in list):
>>> import json
>>> lst = ['a', 'b', 'c', 'd']
>>> jso = {'key{}'.format(k):v for k, v in enumerate(lst)}
>>> json.dumps(jso)
'{"key3": "d", "key2": "c", "key1": "b", "key0": "a"}'
data = []
for entry in lst:
    # note: lst.index returns the first match, so duplicate entries would all get the same key
    data.append({'key'+str(lst.index(entry)):entry})
json.dump(data, outfile)
As a minimal change which I originally posted in a comment:
outfile = open('c:\\users\\fawads\desktop\csd\\Test44.json','w') #writing data to file
all_data = [] #keep a list of all the entries
i = 0
for entry in list:
    data={'key'+str(i):entry}
    i+=1
    all_data.append(data) #add the data to the list
json.dump(all_data,outfile) #write the list to the file
outfile.close()
Calling json.dump on the same file multiple times is very rarely useful, as it creates multiple segments of JSON data that need to be separated in order to be parsed. It makes much more sense to only call it once, when you are done constructing the data.
I'd also like to suggest you use enumerate to handle the i variable, as well as a with statement to deal with the file IO:
all_data = [] #keep a list of all the entries
for i,entry in enumerate(list):
    data={'key'+str(i):entry}
    all_data.append(data)
with open('c:\\users\\fawads\desktop\csd\\Test44.json','w') as outfile:
    json.dump(all_data,outfile)
#file is automatically closed at the end of the with block (even if there is an exception)
The loop could be shortened even further with a list comprehension:
all_data = [{'key'+str(i):entry}
            for i,entry in enumerate(list)]
Which (if you really want) could be put directly into the json.dump:
with open('c:\\users\\fawads\desktop\csd\\Test44.json','w') as outfile:
    json.dump([{'key'+str(i):entry}
               for i,entry in enumerate(list)],
              outfile)
although then you start to lose readability so I don't recommend going that far.
Here is what you need to do:
mydict = {}
i = 0
for entry in list:
    dict_key = "key" + str(i)
    mydict[dict_key] = entry
    i = i + 1
json.dump(mydict, outfile)
Currently you are creating a new dict in every iteration of the loop and dumping each one separately, hence the result is not valid JSON.
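To see the difference, here is a small comparison using json.dumps on strings instead of a file (an illustration of the problem, not the original code):

```python
import json

lst = ['a', 'b']

# dumping inside the loop concatenates separate JSON documents:
per_iteration = ''.join(json.dumps({'key' + str(i): v}) for i, v in enumerate(lst))
print(per_iteration)  # {"key0": "a"}{"key1": "b"} -- not parseable as one document

# building one dict and dumping once yields a single valid document:
once = json.dumps({'key' + str(i): v for i, v in enumerate(lst)})
print(once)           # {"key0": "a", "key1": "b"}
json.loads(once)      # parses fine
```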
