I have a list of lists, where the lists are always ordered in the same way, and within each list several of the elements are duplicates. I would therefore like to remove duplicates from each list, but it's important that I retain the structure of each list.
That is, if the elements at indices 0, 1 and 2 are all duplicates in a given list, two of them would be removed from that list, but the elements at the same positions would then also have to be removed from all the other lists to retain the ordered structure.
Crucially however, it may not be the case that elements with indices 0, 1 and 2 are duplicates in the other lists, and therefore I would only want to do this if I was sure that across the lists, elements indexed by 0, 1 and 2 were always duplicated.
As an example, say I had this list of lists
L = [ [1,1,1,3,3,2,4,6,6],
      [5,5,5,4,5,6,5,7,7],
      [9,9,9,2,2,7,8,10,10] ]
After applying my method I would like to be left with
L_new = [ [1,3,3,2,4,6],
          [5,4,5,6,5,7],
          [9,2,2,7,8,10] ]
where you can see that the elements at indices 1, 2 and 8 have been removed because they are consistently duplicated across all lists, whereas the elements at indices 3 and 4 have not, because they are not always duplicated.
My thinking so far (though I believe this is probably not the best approach, which is why I am asking for help):
def check_duplicates_in_same_position(arr_list):
    check_list = []
    for arr in arr_list:
        duplicate_positions_list = []
        positions = {}
        for i in range(len(arr)):
            item = arr[i]
            if item in positions:
                positions[item].append(i)
            else:
                positions[item] = [i]
        duplicate_positions = {k: v for k, v in positions.items() if len(v) > 1}
        for _, item in duplicate_positions.items():
            duplicate_positions_list.append(item)
        check_list.append(duplicate_positions_list)
    return check_list
This returns a list with one entry per input list. Each entry is a list of index groups, where each group holds the positions of one duplicated value in that list, like so:
[[[0, 1, 2], [3, 4], [7, 8]],
[[0, 1, 2, 4, 6], [7, 8]],
[[0, 1, 2], [3, 4], [7, 8]]]
I then thought to somehow compare these lists and for example remove elements index 1 and 2 and index 8, because these are common matches for each.
Assuming all sub-lists will have the same length, this should work:
l = [ [1,1,1,3,3,2,4,6,6], [5,5,5,4,5,6,5,7,7], [9,9,9,2,2,7,8,10,10] ]
[list(x) for x in zip(*dict.fromkeys(zip(*l)))]
# Output: [[1, 3, 3, 2, 4, 6], [5, 4, 5, 6, 5, 7], [9, 2, 2, 7, 8, 10]]
Explanation:
zip(*l) - This transposes the list of lists. It yields one tuple per index, where the nth tuple holds the nth element of every original sublist (shown here as a list):
[(1, 5, 9),
(1, 5, 9),
(1, 5, 9),
(3, 4, 2),
(3, 5, 2),
(2, 6, 7),
(4, 5, 8),
(6, 7, 10),
(6, 7, 10)]
From the previous list, we only want to keep the first occurrence of each tuple. There are various ways of achieving this. If you search for how to remove duplicates while maintaining order, this answer will pop up. It uses dict.fromkeys(<list>). Since Python dict keys must be unique, and dicts preserve insertion order from Python 3.7 on, this removes duplicates and generates the following output:
{(1, 5, 9): None,
(3, 4, 2): None,
(3, 5, 2): None,
(2, 6, 7): None,
(4, 5, 8): None,
(6, 7, 10): None}
We now want to unzip those keys to the original 2-dimensional array. For that, we can use zip again:
zip(*dict.fromkeys(zip(*l)))
Since zip yields tuples, we finally have to convert the tuples to lists using a list comprehension:
[list(x) for x in zip(*dict.fromkeys(zip(*l)))]
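As a minimal sketch, the one-liner above can be wrapped in a small helper. The name drop_duplicate_columns is hypothetical, and the sketch assumes that all sub-lists have the same length, that the elements are hashable, and that Python 3.7+ is used (so dicts keep insertion order). Note that it drops every repeated column, not only repeats that sit next to each other.

```python
def drop_duplicate_columns(rows):
    """Remove repeated columns, keeping the first occurrence of each.

    Hypothetical helper: assumes equal-length rows and hashable elements.
    """
    columns = zip(*rows)                      # transpose: one tuple per index
    unique_columns = dict.fromkeys(columns)   # dict keys keep first-seen order
    return [list(row) for row in zip(*unique_columns)]  # transpose back


l = [[1, 1, 1, 3, 3, 2, 4, 6, 6],
     [5, 5, 5, 4, 5, 6, 5, 7, 7],
     [9, 9, 9, 2, 2, 7, 8, 10, 10]]
result = drop_duplicate_columns(l)
print(result)
```

An empty input simply gives an empty list back, because there are no columns to transpose.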
I would go with something like this. It is not very fast, but depending on the size of your lists, it could be sufficient.
L = [ [1,1,1,3,3,2,4,6,6], [5,5,5,4,5,6,5,7,7], [9,9,9,2,2,7,8,10,10] ]
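azip = zip(*L)  # one tuple per index across all sublists
temp_L = []
for zz in azip:
    if zz not in temp_L:  # keep only the first occurrence of each column
        temp_L.append(zz)
new_L = list(zip(*temp_L))  # transpose back to the original layout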
First, we zip the three (or more) lists within L. Then we iterate over each resulting tuple and check whether it already exists in our temporary list temp_L. If not, we add it. In the end we restructure temp_L into the original format. It returns
new_L
>> [(1, 3, 3, 2, 4, 6), (5, 4, 5, 6, 5, 7), (9, 2, 2, 7, 8, 10)]
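The membership test against a list is what makes the loop above slow for long inputs, since each check scans the whole list. A possible variant, sketched here under the assumption that all elements are hashable, tracks the columns already seen in a set. The function name is hypothetical.

```python
def drop_duplicate_columns_loop(rows):
    """Loop-based variant that uses a set for fast membership checks."""
    seen = set()   # columns encountered so far
    kept = []      # unique columns, in first-seen order
    for column in zip(*rows):
        if column not in seen:
            seen.add(column)
            kept.append(column)
    # Transpose back and convert each row tuple to a list
    return [list(row) for row in zip(*kept)]


L = [[1, 1, 1, 3, 3, 2, 4, 6, 6],
     [5, 5, 5, 4, 5, 6, 5, 7, 7],
     [9, 9, 9, 2, 2, 7, 8, 10, 10]]
new_L = drop_duplicate_columns_loop(L)
print(new_L)
```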
I have a 2D list with 60 rows, where the number of columns in each row is not fixed. Here, the row number means position. How can I get every possible combination without changing the positions?
Let me explain with an example:
Let's make it shorter to understand. Let's assume, row number is 3 instead of 60.
List:
list_val = [[1, 2, 3], [2, 3], [5]]
I want to get the combination:
{1, 2, 5}, {1, 3, 5}, {2, 2, 5}, {2, 3, 5}, {3, 2, 5}, {3, 3, 5}
For three rows, it would be easy to write three nested loops. But for 60, writing 60 nested loops is not a good idea. Is there a more efficient way to write this in Python?
What you're looking for is itertools.product(), which implements this reasonably efficiently, so you don't need to reinvent it, and it works fine for large iterables.
Why is that? In CPython it's implemented in C, so it is faster than a standard pure-Python implementation of the loops, unless you use tricks that achieve comparable speed.
Don't forget to unpack the iterable of iterables with the star/asterisk (*) operator when calling the function, or pass the iterables as separate arguments (product(one, two, three)); otherwise it will behave differently.
>>> from itertools import product
>>> list(product(*[[1, 2, 3], [2,3], [5]]))
[(1, 2, 5), (1, 3, 5), (2, 2, 5), (2, 3, 5), (3, 2, 5), (3, 3, 5)]
>>>
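With 60 rows the number of combinations can become far too large to hold in a list, so it may be preferable to consume the iterator lazily. The following is only a sketch: it assumes Python 3.8+ for math.prod, and the variable names are illustrative. It also shows what happens when the unpacking star is forgotten.

```python
from itertools import islice, product
from math import prod

list_val = [[1, 2, 3], [2, 3], [5]]

# Number of combinations is the product of the row lengths: 3 * 2 * 1
total = prod(len(row) for row in list_val)

# product() is lazy, so we can take only the first few combinations
first_four = list(islice(product(*list_val), 4))

# Without the star, each row is treated as a single item of one iterable
not_unpacked = list(product(list_val))

print(total)
print(first_four)
print(not_unpacked)
```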
I have a dictionary having following structure
{key1: [1,2,3,4,5], key2: [2,0,4,5,6]}
I need to find maximum and minimum value of each index of the value list, so at index 0, we compare 1 and 2 and choose 2 as the maximum, and 1 as the minimum, etc.
Expected output for my example:
min = [1,0,3,4,5]
max = [2,2,4,5,6]
I cannot use operator as I am not allowed to import it. I tried to use the following approach, but it failed with an error. Also, I would rather not iterate through the value lists by hand, as that is not an elegant way (IMO).
maxVal = max(myDictionary.items(), key=(lambda k: myDictionary[k]))
gives me
TypeError: unhashable type: 'list'
Can you correct it or suggest an alternative approach?
You may use zip with min and max:
dct = {'key1': [1,2,3,4,5], 'key2': [2,0,4,5,6]}
[min(i) for i in zip(*dct.values())]
[max(i) for i in zip(*dct.values())]
Output:
[1, 0, 3, 4, 5]
[2, 2, 4, 5, 6]
If you want to get really fancy, you can also use the transpose trick of zip twice to turn this into a one-liner:
min_list, max_list = map(list, zip(*[(min(i), max(i)) for i in zip(*dct.values())]))
min_list
[1, 0, 3, 4, 5]
max_list
[2, 2, 4, 5, 6]
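This fancy method behaves badly with empty lists. For example,
dct = {1: [], 2: []}
will break this method: the comprehension produces no (min, max) pairs, so there is nothing to unpack into min_list and max_list, and Python raises a ValueError. In fact, pretty much all the ways to break this method involve using an empty list somewhere.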
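One way around the empty-list problem, sketched here with a hypothetical helper name, is to build the two result lists separately instead of unpacking a transposed list of pairs. With no columns, both comprehensions simply produce empty lists.

```python
def columnwise_min_max(dct):
    """Return (min_list, max_list) computed index by index over the values."""
    # Materialize the columns once so they can be iterated twice
    columns = list(zip(*dct.values()))
    min_list = [min(col) for col in columns]
    max_list = [max(col) for col in columns]
    return min_list, max_list


mins, maxs = columnwise_min_max({'key1': [1, 2, 3, 4, 5], 'key2': [2, 0, 4, 5, 6]})
empty_mins, empty_maxs = columnwise_min_max({1: [], 2: []})
print(mins, maxs)
print(empty_mins, empty_maxs)
```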
I've mentioned the zip transpose trick twice, so here is why it is necessary here:
If you simply use list(zip(dct.values())), you will get the following output (on Python 3.7+, where dicts preserve insertion order):
[([1, 2, 3, 4, 5],), ([2, 0, 4, 5, 6],)]
This is not the desired result. We want a pairwise comparison of every element at each index of our sublists. However, we can leverage the fact that zip is its own transpose when you use the * operator.
So using list(zip(*dct.values())) provides us with our desired pairwise grouping for comparison:
[(1, 2), (2, 0), (3, 4), (4, 5), (5, 6)]
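To illustrate the "own transpose" remark, here is a small sketch showing that applying zip with * twice gives back the original rows (as tuples). The variable names are illustrative only.

```python
rows = [(1, 2, 3, 4, 5), (2, 0, 4, 5, 6)]

columns = list(zip(*rows))       # transpose rows into per-index tuples
rows_again = list(zip(*columns)) # transpose again to recover the rows

print(columns)
print(rows_again)
```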
What are some good ways to define a tuple consisting of integers where the number of occurrences of each item is known?
For example,
I want to define a tuple in which 2 occurs three times, 4 occurs twice, and 1, 3 and 5 each occur once.
For this, I can always go the manual way:
foo = (1, 2, 2, 2, 3, 4, 4, 5)
However, this becomes a bit messy when the number of items in the tuple is large.
So, I want to know some ways to automate the task of generating the desired number of duplicates of each item.
You can do it like this:
>>> (1,) * 1 + (2,) * 3 + (3,) * 1 + (4,) * 2 + (5,) * 1
(1, 2, 2, 2, 3, 4, 4, 5)
One way is to use sequence multiplication. Here's a simple version that makes no attempt to avoid creating unnecessary intermediate objects:
accumulator = ()
for (val, count) in some_data_structure:
    accumulator += (val,) * count
That can be improved; the main point is to demonstrate that (1,) * 5 gives you (1, 1, 1, 1, 1). Note that this copies the object reference. That's fine for integers, but can cause confusion if you're trying to multiply a sequence of mutable objects.
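The caveat about mutable objects can be seen in a short sketch. The variable names are illustrative; the point is that sequence multiplication repeats the same object, while a generator expression creates a fresh one each time.

```python
# Multiplying repeats the reference: all three slots hold the same list
shared = ([],) * 3
shared[0].append("x")

# A generator expression builds a new list for every slot
independent = tuple([] for _ in range(3))
independent[0].append("x")

print(shared)       # the change is visible in every slot
print(independent)  # only the first slot changed
```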
If you have a tuple of tuples denoting the value and frequency, you can do the following:
tuples = ((1,1), (2,3), (3,1), (4,2), (5,1))
tuple(i for i, n in tuples for _ in range(n)) # Use xrange in Python 2.X
# (1, 2, 2, 2, 3, 4, 4, 5)
Or, if you know that the values are always going to be 1, 2, 3, ..., n, you can use enumerate with a tuple of the frequencies.
freqs = (1, 3, 1, 2, 1)
tuple(i for i, n in enumerate(freqs, 1) for _ in range(n))
# (1, 2, 2, 2, 3, 4, 4, 5)
If you're curious about the use of the double comprehension in the generator expression, you may want to check out this question.
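An alternative to the double comprehension, offered only as a sketch, is to combine itertools.repeat and itertools.chain.from_iterable. Both are standard library functions; the variable names follow the example above.

```python
from itertools import chain, repeat

tuples = ((1, 1), (2, 3), (3, 1), (4, 2), (5, 1))

# repeat(value, count) yields value count times; chain joins the pieces
result = tuple(chain.from_iterable(repeat(value, count) for value, count in tuples))
print(result)
```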
If your tuple does not have many numbers, you can do it in the simplest way.
(1,)+(2,)*3+(3,)+(4,)*2+(5,)
Otherwise, just turn it into a function.
def myTuple(*val):
    return sum(((i,) * n for i, n in val), ())
myTuple((1,1),(2,3),(3,1),(4,2),(5,1))
>>>(1, 2, 2, 2, 3, 4, 4, 5)
you can also call it with:
val = ((1,1),(2,3),(3,1),(4,2),(5,1))
myTuple(*val)
>>>(1, 2, 2, 2, 3, 4, 4, 5)
Something like this could work:
>>> result = tuple()
>>> for item, repeat in ((1, 1), (2, 3), (3, 1), (4, 2), (5, 1)):
...     result = result + (item,) * repeat
...
>>> result
(1, 2, 2, 2, 3, 4, 4, 5)
So you want the inverse function of collections.Counter. Here is how you could do it:
# make a dict of counts (list of tuples is better)
counts = {1: 1, 2: 3, 4: 2, 3:1, 5: 1}
t = tuple(k for k,v in sorted(counts.items()) for _ in range(v))
(1, 2, 2, 2, 3, 4, 4, 5)
# for k,v in list_of_tuples, for a list of tuples
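Since the idea is described as the inverse of collections.Counter, it may be worth noting that Counter itself has an elements() method that expands counts back into items. The following sketch assumes the same counts dict as above and sorts the result, because elements() yields items in the order the keys were first inserted.

```python
from collections import Counter

counts = {1: 1, 2: 3, 4: 2, 3: 1, 5: 1}

# elements() repeats each key as many times as its count
t = tuple(sorted(Counter(counts).elements()))
print(t)

# Round trip: counting the tuple gives the original counts back
round_trip = Counter(t)
```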
You can define the following function
def a_tuple(*data):
    l = []
    for i, cnt in data:
        l.extend([i] * cnt)
    return tuple(l)
and use it like this
print(a_tuple((1,1), (2,3), (3,1), (4,2), (5,1)))
to produce the following output
(1, 2, 2, 2, 3, 4, 4, 5)
Have a look at the .extend() method of list if you don't understand how the function works.
I am looking for some help splitting a single list element that contains multiple values into a new list of separate values.
L = [(1, 2, 2, 3, 4, 4, 5, 5)]
>>>L[0]
(1, 2, 2, 3, 4, 4, 5, 5)
I need to split that single element into multiple elements in a new list such that:
L_Revised = [1, 2, 2, 3, 4, 4, 5, 5]
>>>L_Revised[0]
1
So that I can manipulate the individual elements. What code could I use for this?
Thanks for any help!
For a generalized case, where you can have more than one tuple in the list, you can flatten your list like this:
>>> l = [(1, 2, 3, 4, 5)]
>>> [item for tup in l for item in tup]
[1, 2, 3, 4, 5]
But if the list contains only a single tuple, then the other answer is probably easier.
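As a sketch of both cases, itertools.chain.from_iterable flattens any number of tuples, while a list holding exactly one tuple can be converted directly with list(). The name L_revised mirrors the question; single_case is illustrative.

```python
from itertools import chain

L = [(1, 2, 2, 3, 4, 4, 5, 5)]

# General case: flatten every tuple in the list into one flat list
L_revised = list(chain.from_iterable(L))

# Single-tuple case: convert the only element directly
single_case = list(L[0])

print(L_revised)
print(L_revised[0])
```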