How to split a list into smaller lists in Python

I have a nested list that looks something like:
lst = [['ID1', 'A'],['ID1','B'],['ID2','AAA'], ['ID2','DDD']...]
Is it possible to split lst into smaller lists by ID, so that each small list contains the elements with the same ID? The result should look something like:
lst1 = [['ID1', 'A'], ['ID1', 'B']...]
lst2 = [['ID2', 'AAA'], ['ID2', 'DDD']...]

You can use groupby:
from itertools import groupby
grp_lists = []
for _, grp in groupby(lst, key=lambda x: x[0]):
    grp_lists.append(list(grp))
print(grp_lists[0])
[['ID1', 'A'], ['ID1', 'B']]
print(grp_lists[1])
[['ID2', 'AAA'], ['ID2', 'DDD']]
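itertools.groupby only groups *consecutive* items that share a key, so the answer above relies on the IDs already being adjacent in lst. If they might not be, sort by the same key first. A minimal sketch; the unsorted sample data is hypothetical:

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical unsorted input: the two 'ID1' rows are not adjacent
lst = [['ID1', 'A'], ['ID2', 'AAA'], ['ID1', 'B'], ['ID2', 'DDD']]

# Without sorting, groupby starts a new group every time the key changes
unsorted_groups = [list(grp) for _, grp in groupby(lst, key=itemgetter(0))]
print(len(unsorted_groups))  # 4

# Sorting by the same key first makes equal IDs adjacent (sorted is stable,
# so rows keep their original relative order within each ID)
grp_lists = [list(grp) for _, grp in groupby(sorted(lst, key=itemgetter(0)), key=itemgetter(0))]
print(grp_lists)
# [[['ID1', 'A'], ['ID1', 'B']], [['ID2', 'AAA'], ['ID2', 'DDD']]]
```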

Using collections.defaultdict:
lst = [['ID1', 'A'],['ID1','B'],['ID2','AAA'], ['ID2','DDD']]
from collections import defaultdict
result = defaultdict(list)
for item in lst:
    result[item[0]].append(item)
print(list(result.values()))
output:
[[['ID1', 'A'], ['ID1', 'B']], [['ID2', 'AAA'], ['ID2', 'DDD']]]

Without any imports: build a set of the unique IDs, then loop over the original list, building a new list for each ID and filling it with the items that have that ID:
lst = [['ID1', 'A'],['ID1','B'],['ID2','AAA'], ['ID2','DDD']]
unique_set = set(elem[0] for elem in lst)
lst2 = [[elem for elem in lst if elem[0] == every_unique] for every_unique in unique_set]
print(lst2)
Result (the order of the groups may vary, because sets are unordered):
[[['ID2', 'AAA'], ['ID2', 'DDD']], [['ID1', 'A'], ['ID1', 'B']]]
(It is possible to move unique_set into the final line, making it a one-liner. But that would make it less clear what happens.)
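If you want the groups in the order in which the IDs first appear, still without imports, dict.fromkeys can stand in for the set, since dicts preserve insertion order from Python 3.7 on. A sketch on the question's sample data:

```python
lst = [['ID1', 'A'], ['ID1', 'B'], ['ID2', 'AAA'], ['ID2', 'DDD']]

# dict keys are unique and keep insertion order, so this de-duplicates
# the IDs in the order they first appear
unique_ids = list(dict.fromkeys(elem[0] for elem in lst))

# Compare with == : `in` on two strings is a substring test ('ID1' in 'ID10' is True)
lst2 = [[elem for elem in lst if elem[0] == uid] for uid in unique_ids]
print(lst2)
# [[['ID1', 'A'], ['ID1', 'B']], [['ID2', 'AAA'], ['ID2', 'DDD']]]
```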

If you want separate variables, as in your example result:
lst1 = [sub_lst for sub_lst in lst if sub_lst[0] == 'ID1']
and
lst2 = [sub_lst for sub_lst in lst if sub_lst[0] == 'ID2']
From that, you can make a function:
def create_sub_list(id_str, original_lst):
    return [x for x in original_lst if x[0] == id_str]
And call it like this:
lst1 = create_sub_list('ID1', lst)
If you want a dictionary of the sub-lists for easier access, you can use:
from functools import reduce
def reduce_dict(ret_dict, sub_lst):
    # Start a new entry the first time an ID is seen, otherwise extend it
    if sub_lst[0] not in ret_dict:
        ret_dict[sub_lst[0]] = sub_lst[1:]
    else:
        ret_dict[sub_lst[0]] += sub_lst[1:]
    return ret_dict
grouped_dict = reduce(reduce_dict, lst, dict())
(This collects everything after the ID slot. If each sub-list holds exactly one string after the ID, sub_lst[1:] is already a one-element list such as ['A'], so no change is needed; writing sub_lst[1] instead would make += concatenate strings, giving 'AB' rather than ['A', 'B'].)
Then, to access the elements of the dictionary, use the ID strings:
print(grouped_dict['ID1'])
This will print:
['A', 'B']
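For comparison, the same dictionary can be built without reduce, using a plain loop and dict.setdefault. A minimal sketch on the question's sample data:

```python
lst = [['ID1', 'A'], ['ID1', 'B'], ['ID2', 'AAA'], ['ID2', 'DDD']]

grouped_dict = {}
for sub_lst in lst:
    # setdefault inserts an empty list the first time an ID is seen,
    # then returns the stored list so it can be extended in place
    grouped_dict.setdefault(sub_lst[0], []).extend(sub_lst[1:])

print(grouped_dict['ID1'])  # ['A', 'B']
print(grouped_dict)  # {'ID1': ['A', 'B'], 'ID2': ['AAA', 'DDD']}
```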

Related

How to remove multiple items from nested list in python 3?

How can I remove multiple items from a nested list in Python 3 without using a list comprehension? Also, I sometimes get an IndexError; how should I handle that?
split_list =[["a","b","c"],["SUB","d","e",],["f","Billing"]]
rem_word = ['SUB', 'Billing', 'Independent', 'DR']
for sub_list in split_list:
    for sub_itm in sub_list:
        if sub_itm not in rem_word:
            print(sub_itm)
The output comes out like this:
a
b
c
d
e
f
Expected Output:
split_list =[["a","b","c"],["d","e",],["f"]]
You can always use a list comprehension. Put all the words to be removed in a separate list and try this:
>>> split_list =[["a","b","c"],["SUB","d","e",],["f","Billing"]]
>>> rem_word = ['SUB', 'Billing', 'Independent', 'DR']
>>> output = [[sub_itm for sub_itm in sub_list if sub_itm not in rem_word] for sub_list in split_list]
>>> output
[['a', 'b', 'c'], ['d', 'e'], ['f']]
If you want to do it without a list comprehension, you need an empty list to collect the new sub-lists, plus a fresh empty sub-list for each original sub-list to collect the items you keep. Check this:
output2 = []
for sub_list in split_list:
    new_sub_list = []
    for sub_itm in sub_list:
        if sub_itm not in rem_word:
            new_sub_list.append(sub_itm)
    output2.append(new_sub_list)
It produces the same result:
[['a', 'b', 'c'], ['d', 'e'], ['f']]
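The question also mentions an IndexError. The code that raised it isn't shown, so this is an assumption: a common cause is deleting items while stepping through indices from the front, because the list shrinks and later indices run past its end. If you want to modify split_list in place without a comprehension, walking the indices in reverse avoids that:

```python
split_list = [["a", "b", "c"], ["SUB", "d", "e"], ["f", "Billing"]]
rem_word = ['SUB', 'Billing', 'Independent', 'DR']

for sub_list in split_list:
    # Go from the last index down to 0: deleting an item only shifts the
    # items after it, and those have already been visited
    for i in range(len(sub_list) - 1, -1, -1):
        if sub_list[i] in rem_word:
            del sub_list[i]

print(split_list)  # [['a', 'b', 'c'], ['d', 'e'], ['f']]
```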
You could simply use map and filter:
split_list = [["a", "b", "c"], ["SUB", "d", "e", ], ["f", "Billing"]]
remove_list = ["SUB", "Billing", "INDEPENDENT", "DR"]
split_list = list(map(lambda x: list(filter(lambda i: i not in remove_list, x)), split_list))
print(split_list)
[[x for x in z if x != 'SUB'] for z in split_list]
Keep in mind that this is a nested list: z is each sub-list and x is each element inside it. Also note that the code above deletes every 'SUB'; to delete only the first occurrence, use list.remove() instead.
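To delete only the first occurrence, list.remove() can be applied to each sub-list. It raises ValueError when the item is absent, so this sketch checks membership first. The second 'SUB' in the sample data is hypothetical, added to show that only the first one is removed:

```python
split_list = [["a", "b", "c"], ["SUB", "d", "SUB"], ["f", "Billing"]]

for z in split_list:
    # list.remove deletes the first matching item only;
    # guard with `in` because it raises ValueError if nothing matches
    if 'SUB' in z:
        z.remove('SUB')

print(split_list)  # [['a', 'b', 'c'], ['d', 'SUB'], ['f', 'Billing']]
```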

Flatten lists of variable depths in Python

I have a list of n lists. Each internal list contains a combination of (a) strings, (b) the empty list, or (c) a list containing one string. I would like to transform the inside lists so they only contain the strings.
I have a list like this for example:
[[[],["a"],"a"],[["ab"],[],"abc"]]
and I would like it to be like this:
[["","a","a"],["ab","","abc"]]
I know I could probably go through with a loop but I am looking for a more elegant solution, preferably with a list comprehension.
List comprehension:
>>> original = [[[],["a"],"a"],[["ab"],[],"abc"]]
>>> result = [['' if not item else ''.join(item) for item in sublist] for sublist in original]
>>> result
[['', 'a', 'a'], ['ab', '', 'abc']]
Since every element you'd like to flatten is iterable, instead of checking whether it is an instance of some class (list, str), you can make use of duck typing:
>>> my_list = [[[],["a"],"a"],[["ab"],[],"abc"]]
>>> [list(map(''.join, elem)) for elem in my_list]
Or, a more readable version:
result = []
for elem in my_list:
    flatten = map(''.join, elem)
    result.append(list(flatten))
Result:
[['', 'a', 'a'], ['ab', '', 'abc']]
It is quite Pythonic not to check what something is, but rather to rely on what each object can do: ''.join works on a string, on a list of strings, and on an empty list alike.
Via list comprehension:
lst = [[[],["a"],"a"],[["ab"],[],"abc"]]
result = [ ['' if not v else (v[0] if isinstance(v, list) else v) for v in sub_l]
for sub_l in lst ]
print(result)
The output:
[['', 'a', 'a'], ['ab', '', 'abc']]
original_list = [[[],["a"],"a"],[["ab"],[],"abc"]]
def flatten(x):
    return "" if x == [] else x[0] if isinstance(x, list) else x
flattened_list = [[flatten(i) for i in j] for j in original_list]
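The answers above handle exactly one level of nesting, which is all the example needs. If deeper nesting such as [[["x"]]] can occur, a recursive helper is one way to handle it. A minimal sketch, assuming every leaf is a str and that several strings inside one element should be concatenated; flatten_to_str and the third sublist of the sample data are hypothetical:

```python
def flatten_to_str(item):
    """Collapse a string or an arbitrarily nested list of strings into one string.

    Assumes every leaf is a str; an empty list becomes ''.
    """
    if isinstance(item, str):
        return item
    # Recurse into each child and concatenate the resulting strings
    return ''.join(flatten_to_str(sub) for sub in item)

original = [[[], ["a"], "a"], [["ab"], [], "abc"], [[["x"]], [[], ["y"]]]]
result = [[flatten_to_str(v) for v in sublist] for sublist in original]
print(result)  # [['', 'a', 'a'], ['ab', '', 'abc'], ['x', 'y']]
```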

Combine elements of list

I have a problem with Python and hope someone can help me. I have a list, for example this one:
list = [['a','b','c'],['a','c1','d1'],['b','c1','c2']]
I want to combine the sub-lists so that all sub-lists with the same first element (index 0) are merged together, like this:
a, b, c, c1, d1
b, c1, c2
I tried something like this, but I could not get it working:
list = [['a','b','c'],['a','c1','d1'],['b','c1','c2']]
empty_list = []
for i in list:
    if i not in empty_list:
        empty_list.append(i)
print(empty_list)
Can someone help me?
You can try this :)
old_list = [['a','b','c'],['a','c1','d1'],['b','c1','c2']]
prev = None
empty_list = []
# Note: this only merges sub lists whose first elements are adjacent
for l in old_list:  # iterate through each sub list, sub list as l
    if l[0] == prev:
        # append your elements to the existing sub list
        for i in l:  # iterate through each element in sub list
            if i not in empty_list[-1]:
                empty_list[-1].append(i)
    else:
        empty_list.append(list(l))  # start a new sub list (a copy, so old_list is not modified)
    prev = l[0]  # update prev
print(empty_list)
# [['a', 'b', 'c', 'c1', 'd1'], ['b', 'c1', 'c2']]
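Using itertools.groupby (it only groups adjacent items, so sort by the first element first if equal keys may be far apart):
from itertools import groupby
from operator import itemgetter
listt = [['a','b','c'],['a','c1','d1'],['b','c1','c2']]
# Keep each key once, then append everything after index 0 from every sub-list in the group
result = [[key] + [item for sslist in group for item in sslist[1:]]
          for key, group in groupby(listt, itemgetter(0))]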
An OrderedDict can do most of the work:
from collections import OrderedDict
l = [['a','b','c'], ['a','c1','d1'], ['b','c1','c2']]
d = OrderedDict()
for el in l:
    d.setdefault(el[0], el[0:1]).extend(el[1:])
print(list(d.values()))
You can also use defaultdict(list):
l = [['a','b','c'], ['a','c1','d1'], ['b','c1','c2']]
from collections import defaultdict
d_dict = defaultdict(list)
for i in l:
    d_dict[i[0]].extend(i[1:])
[[k] + v for k, v in d_dict.items()]
Output:
[['a', 'b', 'c', 'c1', 'd1'], ['b', 'c1', 'c2']]
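The asker's attempt used `if i not in ...` to avoid repeats, but the groupby, OrderedDict and defaultdict answers keep a value twice if it appears under the same key in two sub-lists. If duplicates should be dropped while keeping first-seen order, dict.fromkeys can act as an ordered set. A sketch under that assumption; the fourth row ['a', 'b', 'e'] is hypothetical data added to show a duplicate being skipped:

```python
rows = [['a', 'b', 'c'], ['a', 'c1', 'd1'], ['b', 'c1', 'c2'], ['a', 'b', 'e']]

merged = {}
for row in rows:
    # Group by the first element; the inner dict's keys act as an ordered set,
    # so values already present keep their original position and are not repeated
    merged.setdefault(row[0], {}).update(dict.fromkeys(row))

result = [list(values) for values in merged.values()]
print(result)  # [['a', 'b', 'c', 'c1', 'd1', 'e'], ['b', 'c1', 'c2']]
```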

Classifying two lists and arrange accordingly in python

I'm working with two lists that look like this:
list_a = [x,y,z,.....]
list_b = [xa,xb,xc,xd,xe,ya,yb,yc,yd,za,zb,zc,zd,ze,zf]
What I'm trying to achieve is to create more lists, arranging the data as follows:
list_x = [x,xa,xb,xc,xd,xe]
list_y = [y,ya,yb,yc,yd]
list_z = [z,za,zb,zc,zd,ze,zf]
Now if I use loops like:
final_list = []
for item in list_a:
    for value in list_b:
        if value[0] == item:
            print(item, value)
It filters the data but does not produce the desired format.
Could you please give me some advice on this?
Thank you.
list_a = ['x','y','z']
list_b = ['xa','xb','xc','xd','xe','ya','yb','yc','yd','za','zb','zc','zd','ze','zf']
print([[x for x in list_b if x.startswith(y)] for y in list_a])
Output:
[['xa', 'xb', 'xc', 'xd', 'xe'], ['ya', 'yb', 'yc', 'yd'], ['za', 'zb', 'zc', 'zd', 'ze', 'zf']]
Or, to keep each prefix next to its group:
print([(y, [x for x in list_b if x.startswith(y)]) for y in list_a])
Output:
[('x', ['xa', 'xb', 'xc', 'xd', 'xe']), ('y', ['ya', 'yb', 'yc', 'yd']), ('z', ['za', 'zb', 'zc', 'zd', 'ze', 'zf'])]
You can use this code:
list_a = ['x','y','z']
list_b = ['xa','xb','xc','xd','xe','ya','yb','yc','yd','za','zb','zc','zd','ze','zf']
print([(key, [s for s in list_b if s[0] == key]) for key in list_a])
It gives you a list of tuples, with the first entry being the single letter and the second being the list.
Or you can do it without tuples like this:
print([[key] + [s for s in list_b if s[0] == key] for key in list_a])
Not 100% sure about this formatting, but you could use a list of lists:
list_a = ["x","y","z"]
list_b = ["xa","xb","xc","xd","xe","ya","yb","yc","yd","za","zb","zc","zd","ze","zf"]
final_list = []
for item in list_a:
    item_list = [item]
    for value in list_b:
        if value[0] == item:
            item_list.append(value)
    final_list.append(item_list)
print(final_list)
It prints:
[['x', 'xa', 'xb', 'xc', 'xd', 'xe'], ['y', 'ya', 'yb', 'yc', 'yd'], ['z', 'za', 'zb', 'zc', 'zd', 'ze', 'zf']]
You could follow a functional approach by using itertools.groupby() and operator.itemgetter():
In [46]: from itertools import groupby
In [47]: from operator import itemgetter
In [48]: list_b = ['xa','xb','xc','xd','xe','ya','yb','yc','yd','za','zb','zc','zd','ze','zf']
In [49]: [[prefix] + list(group) for prefix, group in groupby(list_b, key=itemgetter(0))]
Out[49]:
[['x', 'xa', 'xb', 'xc', 'xd', 'xe'],
['y', 'ya', 'yb', 'yc', 'yd'],
['z', 'za', 'zb', 'zc', 'zd', 'ze', 'zf']]
It is important to note that if list_b is not ordered, the argument passed to groupby() should be sorted(list_b) instead.
As a side note, you could get rid of the module operator by simply changing the optional argument key=itemgetter(0) to key=lambda s: s[0].
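The comprehension answers rescan all of list_b once per key. If list_b is large, a dict keyed by prefix can build every group in one pass and does not require list_b to be sorted. A minimal sketch, assuming (as in the question's own value[0] == item test) that each key in list_a is a single character matched against the first character of each value; values whose first character is not in list_a are skipped:

```python
list_a = ['x', 'y', 'z']
list_b = ['xa', 'xb', 'xc', 'xd', 'xe', 'ya', 'yb', 'yc', 'yd',
          'za', 'zb', 'zc', 'zd', 'ze', 'zf']

# Start each group with its prefix; dict lookups make this a single pass over list_b
groups = {key: [key] for key in list_a}
for value in list_b:
    bucket = groups.get(value[:1])  # value[:1] is safe even for an empty string
    if bucket is not None:  # skip values whose first letter is not in list_a
        bucket.append(value)

final_list = list(groups.values())
print(final_list)
```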

How to aggregate values in a set of sublists only if a common key is shared?

How would you aggregate the value at index 3 in the following list, summing it across all sublists that share the same key at index 1?
lst = [['aaa','key1','abc',4],['aaa','key2','abc',4],['ddd','key3','abc',4],['eas','key1','abc',4],['aaa','key1','abc',2],['aaa','key2','abc',10]]
I would like to aggregate the value at index 3 only across the sublists that have the same key. For example, the list above has key1 at index 1 in three sublists, so I'd like to add 4, 4, and 2 together.
Desired_List = [['aaa','key1','abc',10],['aaa','key2','abc',14],['ddd','key3','abc',4]]
The other items in the list are irrelevant.
This didn't come out very readable, but here's a compact way with itertools.groupby and functools.reduce:
from functools import reduce
from itertools import groupby
from operator import itemgetter as ig
[reduce(lambda x, y: x[:-1] + [x[-1] + y[-1]], g) for k, g in groupby(sorted(lst, key=ig(1)), ig(1))]
Out[26]:
[['aaa', 'key1', 'abc', 10],
['aaa', 'key2', 'abc', 14],
['ddd', 'key3', 'abc', 4]]
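Things get better if you pull the lambda out into a helper function:
def helper(agg, x):
    # Adds x's value to agg in place; agg starts as the first row of the group
    agg[-1] += x[-1]
    return agg
# Copy the rows first, because helper mutates the first row of each group
[reduce(helper, g) for k, g in groupby(sorted([row[:] for row in lst], key=ig(1)), ig(1))]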
Out[30]:
[['aaa', 'key1', 'abc', 10],
['aaa', 'key2', 'abc', 14],
['ddd', 'key3', 'abc', 4]]
Note that in Python 3 you need from functools import reduce, since reduce is no longer a builtin.
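If you would rather avoid reduce (and the in-place mutation the helper version performs), the group total can be computed with sum(). A sketch using the same sort-then-groupby approach, keeping the first row's other fields as in the desired output:

```python
from itertools import groupby
from operator import itemgetter

lst = [['aaa', 'key1', 'abc', 4], ['aaa', 'key2', 'abc', 4], ['ddd', 'key3', 'abc', 4],
       ['eas', 'key1', 'abc', 4], ['aaa', 'key1', 'abc', 2], ['aaa', 'key2', 'abc', 10]]

result = []
for _, grp in groupby(sorted(lst, key=itemgetter(1)), key=itemgetter(1)):
    rows = list(grp)
    # Build a new list from the first row's other fields plus the group total,
    # so the rows in lst are left unchanged
    result.append(rows[0][:-1] + [sum(row[-1] for row in rows)])

print(result)
# [['aaa', 'key1', 'abc', 10], ['aaa', 'key2', 'abc', 14], ['ddd', 'key3', 'abc', 4]]
```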
Pretty ugly, but this works:
lst = [['aaa','key1','abc',4],['aaa','key2','abc',4],['ddd','key3','abc',4],['eas','key1','abc',4],['aaa','key1','abc',2],['aaa','key2','abc',10]]
newlst = []
searched = []
for i, sublist1 in enumerate(lst):
    if sublist1[1] not in searched:
        searched.append(sublist1[1])
        total = 0
        for sublist2 in lst[i+1:]:
            if sublist1[1] == sublist2[1]:
                total += int(sublist2[3])
        newlst.append([sublist1[0], sublist1[1], sublist1[2], total + sublist1[3]])
print(newlst)
gives:
[['aaa', 'key1', 'abc', 10], ['aaa', 'key2', 'abc', 14], ['ddd', 'key3', 'abc', 4]]
In case the key index or the value index changes, I put them in variables so the code is easy to adapt:
lst = [['aaa','key1','abc',4],['aaa','key2','abc',4],['ddd','key3','abc',4],['eas','key1','abc',4],['aaa','key1','abc',2],['aaa','key2','abc',10]]
# Index of the key value to group on
key_index = 1
# Index of the value to aggregate
value_index = 3
list_dict = {}
# Iterate over each list and identify it uniquely by its key value.
for sublist in lst:
    key = sublist[key_index]
    if key not in list_dict:
        # First time this key is seen: store a copy so lst itself is not modified
        list_dict[key] = list(sublist)
    else:
        # Add the value of this list to the unique entry in the dict
        list_dict[key][value_index] += sublist[value_index]
# Now turn it into a list. This is not needed, but it matches the desired output
desired_list = list(list_dict.values())
Output:
[['aaa', 'key1', 'abc', 10], ['aaa', 'key2', 'abc', 14], ['ddd', 'key3', 'abc', 4]]
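lst = [['aaa','key1','abc',4],['aaa','key2','abc',4],['ddd','key3','abc',4],['eas','key1','abc',4],['aaa','key1','abc',2],['aaa','key2','abc',10]]
# Note: this builds one small dict per sublist, keyed by its first three fields;
# it does not sum values across sublists that share a key
result = []
for item in lst:
    t = tuple(item[:3])
    d = {t: item[-1]}
    result.append(d)
print(result)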
