Union list of lists without duplicates - python

I have a list of lists. I need to get all combinations of those lists, from 2 of N up to N of N.
I generate them with itertools.combinations. That gives me tuples of lists, and I need to merge the lists in each tuple without duplicates.
For example, I have this array:
a = np.array([[1,4,7],[8,2,5],[8,1,4,6],[8,1,3,5],
[2,3,4,7],[2,5,6,7],[2,3,4,6,8],[1,3,5,6,7]])
I generate all 3-element combinations:
a2 = list(itertools.combinations(a, 3))
a2[:5]
[([1, 4, 7], [8, 2, 5], [8, 1, 4, 6]),
([1, 4, 7], [8, 2, 5], [8, 1, 3, 5]),
([1, 4, 7], [8, 2, 5], [2, 3, 4, 7]),
([1, 4, 7], [8, 2, 5], [2, 5, 6, 7]),
([1, 4, 7], [8, 2, 5], [2, 3, 4, 6, 8])]
The length of this list is 56.
I need to merge the lists in every tuple without duplicates.
For example, for a2[0] the input is:
([1, 4, 7], [8, 2, 5], [8, 1, 4, 6])
and the output should be:
[1, 2, 4, 5, 6, 7, 8]
And likewise for all 56 elements.
I tried to do it with a set:
arr = list(itertools.combinations(a,3))
for i in arr:
    arrnew[i].append(list(set().union(arr[i][:3])))
But I got this error:
TypeError Traceback (most recent call last)
<ipython-input-75-4049ddb4c0be> in <module>()
3 arrnew = []
4 for i in arr:
----> 5 for j in arr[i]:
6 arrnew[i].append(list(set().union(arr[:n])))
TypeError: list indices must be integers or slices, not tuple
I need a function for N-element combinations that returns the new combined array.
But I don't know how to write it because of this error.
Is there a way to fix this error, or another way to solve this task?

A small function which solves it:
def unique_comb(a):
    return list(set(itertools.chain(*a)))
For example:
unique_comb(([1, 4, 7], [8, 2, 5], [8, 1, 4, 6]))
If you want to pass a single flat list to the function, rather than a tuple of lists, just remove the * (which unpacks the argument into separate iterables for chain).
If you want to apply it to the entire array in one statement without defining a function:
a3 = [list(set(itertools.chain(*row))) for row in a2]

Flattening a tuple of lists:
from itertools import chain
new_tuple = [ list(set(chain.from_iterable(each_tuple))) for each_tuple in main_tuple_coll ]
I think this might solve your problem.

Flatten list combinations
comb = []
for line in a2[:3]:
    l = list(set([x for y in line for x in y]))
    comb.append(l)
comb
[out]
[[1, 2, 4, 5, 6, 7, 8], [1, 2, 3, 4, 5, 7, 8], [1, 2, 3, 4, 5, 7, 8]]

The issue with:
arr = list(itertools.combinations(a,3))
for i in arr:
    arrnew[i].append(list(set().union(arr[i][:3])))
is that i is not the index of the item but the item itself (a tuple of lists), so arr[i] and arrnew[i] raise the TypeError.
What you need is:
import itertools
import numpy as np
# dtype=object is needed on recent NumPy versions, which reject ragged nested lists
a = np.array([[1,4,7],[8,2,5],[8,1,4,6],[8,1,3,5],
              [2,3,4,7],[2,5,6,7],[2,3,4,6,8],[1,3,5,6,7]], dtype=object)
arrnew = []
for item in itertools.combinations(a,3):
    arrnew.append(list(set().union(*item)))
The result arrnew contains 56 items. Some are equal but none contain duplicates.
I would suggest using sorted rather than list to ensure that the items in each combined list are in ascending order.
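Since the question asks for every size from 2 of N up to N of N, the same idea can be wrapped in a small function. The following is only a sketch: the names union_combinations and all_union_combinations are made up for illustration, and it assumes a plain list of lists rather than a NumPy array.

```python
from itertools import combinations


def union_combinations(lists, r):
    """Return the sorted union of every r-sized combination of lists."""
    return [sorted(set().union(*combo)) for combo in combinations(lists, r)]


def all_union_combinations(lists):
    """Map each size r (2..N) to the unions of all r-sized combinations."""
    return {r: union_combinations(lists, r) for r in range(2, len(lists) + 1)}


a = [[1, 4, 7], [8, 2, 5], [8, 1, 4, 6], [8, 1, 3, 5],
     [2, 3, 4, 7], [2, 5, 6, 7], [2, 3, 4, 6, 8], [1, 3, 5, 6, 7]]

by_size = all_union_combinations(a)
print(len(by_size[3]))  # 56
print(by_size[3][0])    # [1, 2, 4, 5, 6, 7, 8]
```

Using sorted instead of list keeps each merged list in ascending order, as suggested above.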

Related

Get fixed size combinations of a list of lists in python?

I am looking for a modified version of itertools.product(*a). That call returns combinations by selecting one element from each list, but I need to restrict the size.
Suppose,
mylist = [[6, 7, 8], [3, 5, 9], [2, 1, 4]]
output: (6, 3), (6, 2),....(3, 2)... when size is 2
The number of lists and the size are not fixed, so I need something dynamic enough.
You can try:
from itertools import product, combinations, chain
mylist=[[6, 7, 8], [3, 5, 9], [2, 1]]
size = 2
results = chain.from_iterable(product(*t) for t in combinations(mylist, size))
print(list(results))
Perhaps you can try this:
from itertools import chain, combinations
l=[[6, 7, 8], [3, 5, 9], [2, 1, 4]]
x=list(combinations(chain.from_iterable(l),2))
print(x)
Solution:
import itertools
size = 2
mylist = [[6, 7, 8], [3, 5, 9], [2, 1, 4]]
res = []
for x in itertools.product(*mylist):
    res += itertools.combinations(x, size)
print(set(res))
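The three suggestions above do not all produce the same result, so a quick comparison may help. This is a sketch using the question's input; the variable names picks, via_product and flat_pairs are only illustrative. It suggests that the first and third approaches agree as sets, while flattening first also pairs elements that come from the same inner list.

```python
from itertools import chain, combinations, product

mylist = [[6, 7, 8], [3, 5, 9], [2, 1, 4]]
size = 2

# Pick `size` lists, then one element from each picked list.
picks = set(chain.from_iterable(product(*t) for t in combinations(mylist, size)))

# Build every full product, then take all `size`-element sub-tuples.
via_product = set()
for x in product(*mylist):
    via_product.update(combinations(x, size))

# Flatten first, then combine: this also pairs elements of the same inner list.
flat_pairs = set(combinations(chain.from_iterable(mylist), size))

print(len(picks), len(via_product), len(flat_pairs))  # 27 27 36
print(picks == via_product)                            # True
print((6, 7) in picks, (6, 7) in flat_pairs)           # False True
```

The first approach avoids building the full product of all lists, which may matter when there are many lists.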

How to make new list of the numbers not appended into its own list?

If I have a multidimensional list called t and I append some numbers from the list into a new list called TC, how do I take all of the numbers that were not appended into the new list and put them in their own list, called nonTC? For example:
t = [[1, 3, 4, 5, 6, 7],[9, 7, 4, 5, 2], [3, 4, 5]]
And I write some conditions to append only some values from each list to create the new list, TC:
TC = [[3, 4, 6], [9, 7, 2], [5]]
How do I append the values not included in TC into its own list? So I would get:
nonTC = [[1, 5, 7],[4, 5],[3,4]]
You can use list comprehensions and a list of sets to filter your original list:
t = [[1, 3, 4, 5, 6, 7],[9, 7, 4, 5, 2], [3, 4, 5]]
# filter sets - each index corresponds to one inner list of t - the numbers in the
# set should be put into TC - those that are not go into nonTC
getem = [{3,4,6},{9,7,2},{5}]
TC = [ [p for p in part if p in getem[i]] for i,part in enumerate(t)]
print(TC)
nonTC = [ [p for p in part if p not in getem[i]] for i,part in enumerate(t)]
print(nonTC)
Output:
[[3, 4, 6], [9, 7, 2], [5]] # TC
[[1, 5, 7], [4, 5], [3, 4]] # nonTC
Further reading:
list comprehensions
sets
enumerate(iterable)
And: Explanation of how nested list comprehension works?
Suggestion for other way to do it, creds to AChampion:
TC_1 = [[p for p in part if p in g] for g, part in zip(getem, t)]
nonTC_1 = [[p for p in part if p not in g] for g, part in zip(getem, t)]
See zip(): it essentially bundles the two lists into an iterable of tuples
((getem[0], t[0]), (getem[1], t[1]), (getem[2], t[2]))
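To see what zip yields for these inputs, here is a minimal sketch that reuses t and getem from above; the name pairs is only for illustration. Each tuple holds the filter set first and the matching inner list second, because that is the order in which the arguments are passed to zip.

```python
t = [[1, 3, 4, 5, 6, 7], [9, 7, 4, 5, 2], [3, 4, 5]]
getem = [{3, 4, 6}, {9, 7, 2}, {5}]

# Materialise the zip object so the pairs can be inspected.
pairs = list(zip(getem, t))
print(len(pairs))  # 3
print(pairs[2])    # ({5}, [3, 4, 5])

# zip stops at the shorter input, so a missing filter set silently drops a row.
print(len(list(zip(getem[:2], t))))  # 2
```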
Add-on for multiple occurrences, giving up list comprehensions and sets:
t = [[1, 3, 4, 5, 6, 7, 3, 3, 3],[9, 7, 4, 5, 2], [3, 4, 5]]
# filter lists - each index corresponds to one inner list of t - the numbers in the list
# should be put into TC - those that are not go into nonTC - exactly with the amounts given
getem = [[3,3,4,6],[9,7,2],[5]]
from collections import Counter
TC = []
nonTC = []
for f, part in zip(getem,t):
    TC.append([])
    nonTC.append([])
    c = Counter(f)
    for num in part:
        if c.get(num,0) > 0:
            TC[-1].append(num)
            c[num]-=1
        else:
            nonTC[-1].append(num)
print(TC) # [[3, 4, 6, 3], [9, 7, 2], [5]]
print(nonTC) # [[1, 5, 7, 3, 3], [4, 5], [3, 4]]
It needs only one pass over your items instead of two (separate list comprehensions), which probably makes it more efficient in the long run.
Just out of curiosity, using NumPy:
import numpy as np
t = [[1, 3, 4, 5, 6, 7],[9, 7, 4, 5, 2], [3, 4, 5]]
TC = [[3, 4, 6], [9, 7, 2], [5]]
print([np.setdiff1d(a, b) for a, b in zip(t, TC)])
#=> [array([1, 5, 7]), array([4, 5]), array([3, 4])]
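One thing to keep in mind with np.setdiff1d is that it returns the sorted unique values, so the original order and repeated values are lost. The sketch below illustrates the difference in pure Python, without NumPy; both helper names are hypothetical, and sorted_unique_difference only mimics that behaviour for plain lists.

```python
def ordered_difference(values, exclude):
    """Keep order and repeats; drop every value that appears in exclude."""
    exclude_set = set(exclude)
    return [v for v in values if v not in exclude_set]


def sorted_unique_difference(values, exclude):
    """Mimic the sorted, de-duplicated result of a set difference."""
    return sorted(set(values) - set(exclude))


row = [7, 1, 5, 1]
tc_row = [5]
print(ordered_difference(row, tc_row))        # [7, 1, 1]
print(sorted_unique_difference(row, tc_row))  # [1, 7]
```

For the example data in the question both give the same values, because no inner list has repeats and the leftovers already happen to be in ascending order.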

Subtract previous list from current list in a list of lists loop

I have a list of dataframes in which data is duplicated in every subsequent dataframe, and I need to subtract them from one another:
the_list[0] = [1, 2, 3]
the_list[1] = [1, 2, 3, 4, 5, 6, 7]
The dataframes also have headers. They differ only in the number of rows.
Wanted solution:
the_list[0] = [1, 2, 3]
the_list[1] = [4, 5, 6, 7]
Due to the fact that my list of lists, the_list, contains several dataframes, I have to work backward and go from the last df to first with first remaining intact.
My current code (estwin is the_list):
estwin = [df1, df2, df3, df4]
output=([])
estwin.reverse()
for i in range(len(estwin) -1):
    difference = Diff(estwin[i], estwin[i+1])
    output.append(difference)
return(output)
def Diff(li_bigger, li_smaller):
    c = [x for x in li_bigger if x not in li_smaller]
    return (c)
Currently, the result is an empty list. I need an updated the_list that contains only the differences (no duplicate values between lists).
You should not need to go backward for this problem, it is easier to keep track of what you have already seen going forward.
Keep a set that gets updated with new items as you traverse through each list, and use it to filter out the items that should be present in the output.
list1 = [1,2,3]
list2 = [1,2,3,4,5,6,7]
estwin = [list1, list2]
lookup = set() #to check which items/numbers have already been seen.
output = []
for lst in estwin:
    updated_lst = [i for i in lst if i not in lookup] #only new items present
    lookup.update(updated_lst)
    output.append(updated_lst)
print(output) #[[1, 2, 3], [4, 5, 6, 7]]
Your code is not runnable, but if I guess what you meant to write, it works, except that you have one bug in your algorithm:
the_list = [
    [1, 2, 3],
    [1, 2, 3, 4, 5, 6, 7],
    [1, 2, 3, 4, 5, 6, 7, 8, 9]
]
def process(lists):
    output = []
    lists.reverse()
    for i in range(len(lists)-1):
        difference = diff(lists[i], lists[i+1])
        output.append(difference)
    # BUGFIX: Always add first list (now last because of reverse)
    output.append(lists[-1])
    output.reverse()
    return output
def diff(li_bigger, li_smaller):
    return [x for x in li_bigger if x not in li_smaller]
print(the_list)
print(process(the_list))
Output:
[[1, 2, 3], [1, 2, 3, 4, 5, 6, 7], [1, 2, 3, 4, 5, 6, 7, 8, 9]]
[[1, 2, 3], [4, 5, 6, 7], [8, 9]]
One-liner:
from itertools import chain
l = [[1, 2], [1, 2, 3], [1, 2, 3, 4], [1, 2, 3, 4, 5]]
new_l = [sorted(set(v).difference(chain.from_iterable(l[:num])))
         for num, v in enumerate(l)]
print(new_l)
# [[1, 2], [3], [4], [5]]

How to iterate through a list that contains lists in python?

I would like to know how I can iterate through a list containing lists in Python. However, I would like to use a for loop with indexes rather than iterating the normal way in Python. Is it possible to do that?
Here is the Python code:
n = [[1, 2, 3], [4, 5, 6, 7, 8, 9]]
def flatten(my_lists):
    results = []
    for outer in range(len(my_lists)):
        for inner in range(len(outer)):
            results.append(lists[outer][inner])
    return results
print flatten(n)
This is the error I get in the console:
Traceback (most recent call last):
File "python", line 10, in <module>
File "python", line 6, in flatten
TypeError: object of type 'int' has no len()
What is the error in my code?
Thanks in advance.
outer and inner are both ints. Thus, len(outer) is bound to fail:
results = []
for outer in range(len(my_lists)):
    # you need the length of the list in position 'outer', not of 'outer' itself
    for inner in range(len(my_lists[outer])):
        # the original indexed the undefined name 'lists'; it must be 'my_lists'
        results.append(my_lists[outer][inner])
return results
It is easier not to use indexes at all:
results = []
for lst in my_lists:
    for x in lst:
        results.append(x)
    # Or without inner loop
    # results.extend(lst)
return results
Moreover, for flattening a list of lists, there are many well-documented approaches, a straightforward one being a nested comprehension like:
n = [[1, 2, 3], [4, 5, 6, 7, 8, 9]]
flat = [x for lst in n for x in lst]
# [1, 2, 3, 4, 5, 6, 7, 8, 9]
For more, you can refer to Making a flat list out of list of lists in Python and Flatten (an irregular) list of lists.
Here is the loop with indexes, using enumerate:
n = [[8, 2, 3], [4, 5, 6, 7, 8, 9]]
def flatten(my_lists):
    results = []
    for idx, outer in enumerate(my_lists):
        for idx2, inner in enumerate(outer):
            results.append(my_lists[idx][idx2])
    return results
print(flatten(n))
idx and idx2 are the current indexes of the two for loops.
In addition to the other answers, there is the standard library solution using itertools.
>>> import itertools
>>> n = [[1, 2, 3], [4, 5, 6, 7, 8, 9]]
>>> print(list(itertools.chain(*n)))
[1, 2, 3, 4, 5, 6, 7, 8, 9]
Or (possibly) more explicitly
>>> from itertools import chain
>>> n = [[1, 2, 3], [4, 5, 6, 7, 8, 9]]
>>> print(list(chain.from_iterable(n)))
[1, 2, 3, 4, 5, 6, 7, 8, 9]
If you just need the flattened 'list' to be iterable, you can omit the list call and use the returned itertools.chain object.
>>> chain.from_iterable(n)
<itertools.chain object at 0x7f2ecc05f668>

Match array values with array values in multi-dimensional array

I have 2 arrays:
[1, 2, 3, 4, 5]
[[1, 2, 3], [4, 5, 6], [7, 8, 9]]
Is there a way to find the number of sub-arrays in the second array that contain values from the first array, using Python?
For the above, it would be 2.
You can try this:
s = [1, 2, 3, 4, 5]
a = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
final_val = len([i for i in a if any(b in s for b in i)])
Output:
2
Yes, for each element in the first array, you must check if it is in the sub-arrays of the second.
However, this is quite inefficient, so you can proceed like this:
For each sub-array in second, check if it contains any element of first; if so, add one to the count and move on to the next sub-array.
first = [1, 2, 3, 4, 5]
second = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
matches = 0
for sub in second:
    for elt in sub:
        if elt in first:
            matches += 1
            break
print(matches)
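If the first array is large, each `elt in first` check scans the whole list. A possible variation, sketched here under the assumption that all values are hashable, converts first to a set once and uses set.isdisjoint; the name first_set is only illustrative.

```python
first = [1, 2, 3, 4, 5]
second = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

# Build the set once; membership checks are then O(1) on average.
first_set = set(first)

# Count the sub-arrays that share at least one value with first.
matches = sum(1 for sub in second if not first_set.isdisjoint(sub))
print(matches)  # 2
```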
