How to find common elements inside a list - python

I have a list l1 that looks like [1,2,1,0,1,1,0,3..]. For each element, I want to find the indexes of the elements that have the same value.
For example, for the first value in the list, 1, it should list all indexes where 1 is present in the list, and it should do the same for every element in the list. I could write a function that does this by iterating through the list, but I wanted to check if there is a predefined function for it.
I am getting the list from a Pandas dataframe column, so it would be good to know if the Series/DataFrame library offers any such functions.

You can use numpy.unique, which can return the inverse too. This can be used to reconstruct the indices using numpy.where:
In [49]: a = [1,2,1,0,1,1,0,3,8,10,6,7]
In [50]: uniq, inv = numpy.unique(a, return_inverse=True)
In [51]: r = [(uniq[i], numpy.where(inv == i)[0]) for i in range(uniq.size)]
In [52]: print(r)
[(0, array([3, 6])), (1, array([0, 2, 4, 5])), (2, array([1])), (3, array([7])), (6, array([10])), (7, array([11])), (8, array([8])), (10, array([9]))]
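If you then want to look up the indices for a particular value, r converts directly to a dict (a small follow-up to the above):
In [53]: index_map = dict(r)
In [54]: index_map[1]
Out[54]: array([0, 2, 4, 5])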

I tried brute force; maybe you can optimize it. Here is Python 3 code:
L = [1,2,1,0,1,1,0,3]
D = dict()
for i in range(len(L)):
    n = []
    if L[i] not in D.keys():
        for j in range(len(L)):
            if L[i] == L[j]:
                n.append(j)
        D[L[i]] = n
for j in D.keys():
    print(j, "->", D.get(j))

You could achieve this using a defaultdict.
from collections import defaultdict
input = [1,2,1,0,1,1,0,3]
# Dictionary to store our indices for each value
index_dict = defaultdict(list)
# Store index for each item
for i, item in enumerate(input):
    index_dict[item].append(i)
If you want a list which contains the indices of elements which are the same as the corresponding element in your input list, you can just create a reference to the dictionary:
same_element_indices = [index_dict[x] for x in input]
This has the advantage of only referencing the one object for each identical element.
Output would be:
[[0, 2, 4, 5],
[1],
[0, 2, 4, 5],
[3, 6],
[0, 2, 4, 5],
[0, 2, 4, 5],
[3, 6],
[7]]
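If all you need is the value-to-indices mapping itself, the defaultdict already contains it (output shown with Python 3.7+ insertion ordering):
print(dict(index_dict))
# {1: [0, 2, 4, 5], 2: [1], 0: [3, 6], 3: [7]}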

You can also try something like this:
import pandas as pd
df = pd.DataFrame({'A': [1,2,1,0,1,1,0,3]})
uni = df['A'].unique()
for i in uni:
    lists = df[df['A'] == i].index.tolist()
    print(i, '-->', lists)
Output:
1 --> [0, 2, 4, 5]
2 --> [1]
0 --> [3, 6]
3 --> [7]
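Pandas groupby can also give you this mapping in one call; its indices attribute maps each value to the positional indices where it occurs (a minimal sketch along the same lines):
import pandas as pd
df = pd.DataFrame({'A': [1,2,1,0,1,1,0,3]})
# Maps each distinct value in column A to the positions where it occurs
print(df.groupby('A').indices)
# e.g. {0: array([3, 6]), 1: array([0, 2, 4, 5]), 2: array([1]), 3: array([7])}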

Related

Find positions of elements in sorted array

Suppose I have a numpy array (all elements unique) that I want to sort in descending order. I need to find out which positions the elements of the initial array will take in the sorted array.
Example.
In1: [1, 2, 3] # Input
Out1: [2, 1, 0] # Expected output
In2: [1, -2, 2] # Input
Out2: [1, 2, 0] # Expected output
I tried this one:
def find_positions(A):
    A = np.array(A)
    A_sorted = np.sort(A)[::-1]
    return np.argwhere(A[:, None] == A_sorted[None, :])[:, 1]
But it doesn't work when the input array is very large (len > 100000). What did I do wrong, and how can I resolve it?
Approach #1
We could use double argsort -
np.argsort(a)[::-1].argsort() # a is input array/list
Approach #2
We could use one argsort and then array-assignment -
# https://stackoverflow.com/a/41242285/ #Andras Deak
def argsort_unique(idx):
    n = idx.size
    sidx = np.empty(n, dtype=int)
    sidx[idx] = np.arange(n)
    return sidx
out = argsort_unique(np.argsort(a)[::-1])
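A quick check of both approaches against the two examples from the question:
import numpy as np
# uses argsort_unique from Approach #2 above
for a in ([1, 2, 3], [1, -2, 2]):
    print(np.argsort(a)[::-1].argsort())        # Approach #1
    print(argsort_unique(np.argsort(a)[::-1]))  # Approach #2
# prints [2 1 0] twice, then [1 2 0] twice, matching Out1 and Out2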
Take a look at the numpy.argsort(...) function:
Returns the indices that would sort an array.
Perform an indirect sort along the given axis using the algorithm specified by the kind keyword. It returns an array of indices of the same shape as a that index data along the given axis in sorted order.
The above is quoted from the documentation; the following is a simple example:
import numpy
arr = numpy.random.rand(100000)
indexes = numpy.argsort(arr)
The indexes array will contain all the indices of arr, in the order in which they would sort the array.
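To get the descending-order positions that the question asks for (assuming unique elements, as stated there), one option is to apply argsort twice to the negated array:
import numpy
a = numpy.array([1, -2, 2])
positions = numpy.argsort(numpy.argsort(-a))
print(positions)  # [1 2 0]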
I faced the same problem for plain lists and wanted to avoid using numpy, so I propose a possible solution that should also work for an np.array, and which avoids reversing the result:
def argsort(A, key=None, reverse=False):
    "Indirect sort of list or array A: return indices of elements in order."
    keyfunc = (lambda i: A[i]) if key is None else (lambda i: key(A[i]))
    return sorted(range(len(A)), key=keyfunc, reverse=reverse)
Example of use:
>>> L = [3,1,4,1,5,9,2,6]
>>> argsort( L )
[1, 3, 6, 0, 2, 4, 7, 5]
>>> [L[i] for i in _]
[1, 1, 2, 3, 4, 5, 6, 9]
>>> argsort( L, key=lambda x:(x%2,x) ) # even elements first
[6, 2, 7, 1, 3, 0, 4, 5]
>>> [L[i] for i in _]
[2, 4, 6, 1, 1, 3, 5, 9]
>>> argsort( L, key=lambda x:(x%2,x), reverse = True)
[5, 4, 0, 1, 3, 7, 2, 6]
>>> [L[i] for i in _]
[9, 5, 3, 1, 1, 6, 4, 2]
Feedback would be welcome! (Efficiency compared to previously proposed solutions? Suggestions for improvements?)

Slicing sublists with different lengths

I have a list of lists. Each sublist has a length that varies between 1 and 100. Each sublist contains a particle ID at different times in a set of data. I would like to form lists of all particle IDs at a given time. To do this I could use something like:
list = [[1,2,3,4,5],[2,6,7,8],[1,3,6,7,8]]
list2 = [item[0] for item in list]
list2 would contain the first elements of each sublist in list. I would like to do this operation not just for the first element, but for every element between 1 and 100. My problem is that element number 100 (or 66 or 77 or whatever) does not exist for every sublist.
Is there some way of creating a list of lists, where each sublist is the list of all particle IDs at a given time?
I have thought about using numpy arrays to solve this problem, since if the lists were all the same length this would be trivial. I have tried adding -1's to the end of each list to make them all the same length and then masking the negative numbers, but this hasn't worked for me so far. I will use the list of IDs at a given time to slice another separate array:
pos = pos[satIDs]
lst = [[1,2,3,4,5],[2,6,7,8],[1,3,6,7,8]]
func = lambda x: [line[x] for line in lst if len(line) > x]
func(3)
[4, 8, 7]
func(4)
[5, 8]
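To collect every position at once (the question wants all positions up to the longest sublist), you can map func over the range of the maximum length:
all_positions = [func(x) for x in range(max(map(len, lst)))]
# [[1, 2, 1], [2, 6, 3], [3, 7, 6], [4, 8, 7], [5, 8]]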
--update--
func = lambda x: [ (line[x],i) for i,line in enumerate(lst) if len(line) > x]
func(4)
[(5, 0), (8, 2)]
You could use itertools.zip_longest. This will zip the lists together and insert None when one of the lists is exhausted.
>>> lst = [[1,2,3,4,5],['A','B','C'],['a','b','c','d','e','f','g']]
>>> list(itertools.zip_longest(*lst))
[(1, 'A', 'a'),
(2, 'B', 'b'),
(3, 'C', 'c'),
(4, None, 'd'),
(5, None, 'e'),
(None, None, 'f'),
(None, None, 'g')]
If you don't want the None elements, you can filter them out:
>>> [[x for x in sublist if x is not None] for sublist in itertools.zip_longest(*lst)]
[[1, 'A', 'a'], [2, 'B', 'b'], [3, 'C', 'c'], [4, 'd'], [5, 'e'], ['f'], ['g']]
Approach #1
One almost* vectorized approach creates IDs based on the new order and then splits on them, like so -
def position_based_slice(L):
    # Get lengths of each element in input list
    lens = np.array([len(item) for item in L])
    # Form ID array that has *ramping* IDs within an element starting from 0
    # and restarts with a new element at 0
    id_arr = np.ones(lens.sum(), int)
    id_arr[lens[:-1].cumsum()] = -lens[:-1] + 1
    # Get order-maintaining sorted indices for sorting flattened version of list
    ids = np.argsort(id_arr.cumsum(), kind='mergesort')
    # Get sorted version and split at boundaries decided by lengths of ids
    vals = np.take(np.concatenate(L), ids)
    cut_idx = np.where(np.diff(ids) < 0)[0] + 1
    return np.split(vals, cut_idx)
*There is a list comprehension involved at the start, but since it only collects the lengths of the input elements of the list, its effect on the total runtime should be minimal.
Sample run -
In [76]: input_list = [[1,2,3,4,5],[2,6,7,8],[1,3,6,7,8],[3,2]]
In [77]: position_based_slice(input_list)
Out[77]:
[array([1, 2, 1, 3]), # input_list[ID=0]
array([2, 6, 3, 2]), # input_list[ID=1]
array([3, 7, 6]), # input_list[ID=2]
array([4, 8, 7]), # input_list[ID=3]
array([5, 8])] # input_list[ID=4]
Approach #2
Here's another approach that creates a 2D array, which is easier to index and trace back to the original input elements. This uses NumPy broadcasting along with boolean indexing. The implementation would look something like this -
def position_based_slice_2Dgrid(L):
    # Get lengths of each element in input list
    lens = np.array([len(item) for item in L])
    # Create a mask of valid places in a 2D grid mapped version of list
    mask = lens[:, None] > np.arange(lens.max())
    out = np.full(mask.shape, -1, dtype=int)
    out[mask] = np.concatenate(L)
    return out
Sample run -
In [126]: input_list = [[1,2,3,4,5],[2,6,7,8],[1,3,6,7,8],[3,2]]
In [127]: position_based_slice_2Dgrid(input_list)
Out[127]:
array([[ 1, 2, 3, 4, 5],
[ 2, 6, 7, 8, -1],
[ 1, 3, 6, 7, 8],
[ 3, 2, -1, -1, -1]])
So now each column of the output corresponds to the per-position output you want.
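If you then want plain per-position arrays back from the grid, you can drop the -1 fill values column by column (this assumes the real IDs are non-negative):
out = position_based_slice_2Dgrid(input_list)
cols = [out[:, j][out[:, j] >= 0] for j in range(out.shape[1])]
# [array([1, 2, 1, 3]), array([2, 6, 3, 2]), array([3, 7, 6]), array([4, 8, 7]), array([5, 8])]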
If you want to do it with a one-line for loop and get the result as a list, you can do this:
list2 = [[item[i] for item in list if len(item) > i] for i in range(0, 100)]
And if you want to know which id is from which list you can do this:
list2 = [{list.index(item): item[i] for item in list if len(item) > i} for i in range(0, 100)]
list2 would be like this:
[{0: 1, 1: 2, 2: 1}, {0: 2, 1: 6, 2: 3}, {0: 3, 1: 7, 2: 6}, {0: 4, 1: 8, 2: 7},
{0: 5, 2: 8}, {}, {}, ... ]
You could pad your shorter lists with numpy.nan and afterwards create a numpy array:
import numpy
import itertools
lst = [[1,2,3,4,5],[2,6,7,8],[1,3,6,7,8,9]]
arr = numpy.array(list(itertools.izip_longest(*lst, fillvalue=numpy.nan)))
Afterwards you can use numpy slicing as usual.
print arr
print arr[1, :] # [2, 6, 3]
print arr[4, :] # [5, nan, 8]
print arr[5, :] # [nan, nan, 9]

I want to check whether my rows and columns are the same

I want to check whether row 1 is equal to column 1, row 2 is equal to column 2, and so on; in other words, whether a matrix is equal to its transpose.
I tried to solve the problem using the following code, but the function is returning None. Can someone help me with this?
x = [[1, 2, 3],
[2, 3, 4],
[3, 4, 1]]
def rows(matrix):
    list = [val for val in matrix]
    list1 = [i for i in zip(*matrix)]
    if list == list1:
        return True
    else:
        return False

rows(x)
zip returns tuples, not lists:
>>> [val for val in x]
[[1, 2, 3], [2, 3, 4], [3, 4, 1]]
>>> [i for i in zip(*x)]
[(1, 2, 3), (2, 3, 4), (3, 4, 1)]
And they don't compare equal to each other:
>>> [1,2,3] == (1,2,3)
False
Instead, you can simply return the results of the comparison after converting to lists:
>>> x == [list(i) for i in zip(*x)]
True
Use map to map the sublists to tuples and compare; mapping to tuple should also be more efficient than changing tuples to lists:
def rows(matrix):
    # Note: this relies on Python 2 behaviour, where zip and map return lists;
    # in Python 3 wrap both sides in list() before comparing.
    return zip(*matrix) == map(tuple, matrix)
The zip function returns a list of tuples (in Python 2; in Python 3 it returns an iterator, so wrap it in list()):
>>> x = [[1, 2, 3],
[2, 3, 4],
[3, 4, 1]]
>>> zip(*x)
[(1, 2, 3), (2, 3, 4), (3, 4, 1)]
>>> x == zip(*x)
False
A list isn't equal to a tuple, even if it has the same elements. A list of lists isn't equal to a list of tuples, even if the inner lists/tuples contain the same elements. You can do what you want easily, and you were close!
>>> x == [list(i) for i in zip(*x)]
True
You can understand what is wrong with your program by running this:
x = [[1, 2, 3],[2, 3, 4],[3, 4, 1]]
def rows(matrix):
    list = [val for val in matrix]
    list1 = [i for i in zip(*matrix)]
    print list
    print list1
    if list == list1:
        return True
    else:
        return False

print rows(x)
Here print list prints a list of lists, while print list1 prints a list of tuples.
Since lists and tuples are different, the comparison is False and the function returns False.

Remove a column from a nested list in Python

I need help figuring out how to remove a 'column' from a nested list by modifying the list itself.
Say I have
L = [[1,2,3,4],
[5,6,7,8],
[9,1,2,3]]
and I want to remove the second column (so values 2,6,1) to get:
L = [[1,3,4],
[5,7,8],
[9,2,3]]
I'm stuck on how to modify the list by just taking out a column. I've done something sort of like this before, except we were printing it instead, and of course it wouldn't work in this case because I believe the break conflicts with the rest of the values I want in the list.
def L_break(L):
    i = 0
    while i < len(L):
        k = 0
        while k < len(L[i]):
            print(L[i][k], end=" ")
            if k == 1:
                break
            k = k + 1
        print()
        i = i + 1
So, how would you go about modifying this nested list?
Is my mind in the right place comparing it to the code I have posted or does this require something different?
You can simply delete the appropriate element from each row using del:
L = [[1,2,3,4],
[5,6,7,8],
[9,1,2,3]]
for row in L:
    del row[1]  # 0 for column 1, 1 for column 2, etc.

print L
# outputs [[1, 3, 4], [5, 7, 8], [9, 2, 3]]
If you want to extract that column for later use, while removing it from the original list, use a list comprehension with pop:
>>> L = [[1,2,3,4],
... [5,6,7,8],
... [9,1,2,3]]
>>>
>>> [r.pop(1) for r in L]
[2, 6, 1]
>>> L
[[1, 3, 4], [5, 7, 8], [9, 2, 3]]
Otherwise, just loop over the list and delete the fields you no longer want, as in arshajii's answer
You can use operator.itemgetter, which is created for this very purpose.
from operator import itemgetter
getter = itemgetter(0, 2, 3) # Only indexes which are needed
print(list(map(list, map(getter, L))))
# [[1, 3, 4], [5, 7, 8], [9, 2, 3]]
You can use it in List comprehension like this
print([list(getter(item)) for item in L])
# [[1, 3, 4], [5, 7, 8], [9, 2, 3]]
You can also use a nested list comprehension, in which we skip the element if its index is 1, like this:
print([[item for index, item in enumerate(items) if index != 1] for items in L])
# [[1, 3, 4], [5, 7, 8], [9, 2, 3]]
Note: None of the approaches suggested in this answer will affect the original list. They generate new lists without the unwanted elements.
Use map-lambda:
print map(lambda x: x[:1]+x[2:], L)
Here is one way, updated to take kojiro's advice into account.
>>> L[:] = [i[:1]+i[2:] for i in L]
>>> L
[[1, 3, 4], [5, 7, 8], [9, 2, 3]]
You can generalize this to remove any column:
def remove_column(matrix, column):
    return [row[:column] + row[column+1:] for row in matrix]
# Remove 2nd column
copyofL = remove_column(L, 1) # Column is zero-base, so, 1=second column
When you del an element, the later indices shift down, so you have to reduce each subsequent index accordingly. Here I use count to adjust the indices from the list of indices we want to remove. Hope this helps. Thanks
nested_list = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
remove_cols_index = [1, 2]
count = 0
for i in remove_cols_index:
    i = i - count
    count = count + 1
    del nested_list[i]

print(nested_list)
[j.pop(1) for j in nested_list]
from https://www.geeksforgeeks.org/python-column-deletion-from-list-of-lists/

Merge two lists based on condition

I am trying to merge two lists based on the position of the index, so a sort of proximity intersection.
A set doesn't work in this case. What I am trying to do is match the index in each list, and then, only if an element is one less than the corresponding element in the other list, collect it.
An example will explain my scenario better.
Sample Input:
print merge_list([[0, 1, 3], [1, 2], [4, 1, 3, 5]],
[[0, 2, 6], [1, 4], [2, 2], [4, 1, 6]])
Sample Output:
[[0,2],[4,6]]
So at position 0, list1 has 1, 3 and list2 has 2, 6. Since 1 is one less than 2, we collect 2 and move on; 3 is less than 6, but it is not one less (i.e. not 5), so we ignore it. Next we have [1, 2] and [1, 4]: both are index/position 1, but 2 is not one less than 4, so we ignore that. Next, [2, 2] in list2 has index 2, which doesn't match any index in the first list, so there is no comparison. Finally we compare [4, 1, 3, 5] and [4, 1, 6]. The indexes match, and only 5 in list one is one less than an element of list two, so we collect 6; hence we collect [4, 6], meaning index 4 and the match.
I have tried, but I can't seem to make it work.
This is my code so far.
def merge_list(my_list1, my_list2):
    merged_list = []
    bigger_list = []
    smaller_list = []
    temp_outer_index = 0
    temp_inner_index = 0
    if(len(my_list1) > len(my_list2)):
        bigger_list = my_list1
        smaller_list = my_list2
    elif(len(my_list2) > len(my_list1)):
        bigger_list = my_list2
        smaller_list = my_list1
    else:
        bigger_list = my_list1
        smaller_list = my_list2
    for i, sublist in enumerate(bigger_list):
        for index1, val in enumerate(sublist):
            for k, sublist2 in enumerate(smaller_list):
                for index2, val2 in enumerate(sublist2):
                    temp_outer_index = index1 + 1
                    temp_inner_index = index2 + 1
                    if(temp_inner_index < len(sublist2) and temp_outer_index < len(sublist)):
                        # print "temp_outer:%s , temp_inner:%s, sublist[temp_outer]:%s, sublist2[temp_inner_index]:%s" % (temp_outer_index, temp_inner_index, sublist[temp_outer_index], sublist2[temp_inner_index])
                        if(sublist2[temp_inner_index] < sublist[temp_outer_index]):
                            merged_list.append(sublist[temp_outer_index])
                            break
    return merged_list
No clue what you are doing, but this should work.
First, convert the list of lists to a mapping of indices to set of digits contained in that list:
def convert_list(l):
    return dict((sublist[0], set(sublist[1:])) for sublist in l)
This will make the lists a lot easier to work with:
>>> convert_list([[0, 1, 3], [1, 2], [4, 1, 3, 5]])
{0: set([1, 3]), 1: set([2]), 4: set([1, 3, 5])}
>>> convert_list([[0, 2, 6], [1, 4], [2, 2], [4, 1, 6]])
{0: set([2, 6]), 1: set([4]), 2: set([2]), 4: set([1, 6])}
Now the merge_lists function can be written as such:
def merge_lists(l1, l2):
    result = []
    d1 = convert_list(l1)
    d2 = convert_list(l2)
    for index, l2_nums in d2.items():
        if index not in d1:
            # no matching index
            continue
        l1_nums = d1[index]
        sub_nums = [l2_num for l2_num in l2_nums if l2_num - 1 in l1_nums]
        if sub_nums:
            result.append([index] + sorted(list(sub_nums)))
    return result
Works for your test case:
>>> print merge_lists([[0, 1, 3], [1, 2], [4, 1, 3, 5]],
[[0, 2, 6], [1, 4], [2, 2], [4, 1, 6]])
[[0, 2], [4, 6]]
I believe this does what you want it to do:
import itertools
def to_dict(lst):
    dct = {sub[0]: sub[1:] for sub in lst}
    return dct

def merge_dicts(a, b):
    result = []
    overlapping_keys = set.intersection(set(a.keys()), set(b.keys()))
    for key in overlapping_keys:
        temp = [key]  # initialize sublist with index
        for i, j in itertools.product(a[key], b[key]):
            if i == j - 1:
                temp.append(j)
        if len(temp) > 1:  # if the sublist has anything besides the index
            result.append(temp)
    return result
dict1 = to_dict([[0, 1, 3], [1, 2], [4, 1, 3, 5]])
dict2 = to_dict([[0, 2, 6], [1, 4], [2, 2], [4, 1, 6]])
result = merge_dicts(dict1, dict2)
print(result)
Result:
[[0, 2], [4, 6]]
First, we convert your lists to dicts because they're easier to work with (this separates the key out from the other values). Then, we look for the keys that exist in both dicts (in the example, this is 0, 1, 4) and look at all pairs of values between the two dicts for each key (in the example, 1,2; 1,6; 3,2; 3,6; 2,4; 1,1; 1,6; 3,1; 3,6; 5,1; 5,6). Whenever the first element of a pair is one less than the second element, we add the second element to our temp list. If the temp list ends up containing anything besides the key (i.e. is longer than 1), we add it to the result list, which we eventually return.
(It just occurred to me that this has pretty bad performance characteristics - quadratic in the length of the sublists - so you might want to use Claudiu's answer instead if your sublists are going to be long. If they're going to be short, though, I think the cost of initializing a set is large enough that my solution might be faster.)
def merge_list(a, b):
    d = dict((val[0], set(val[1:])) for val in a)
    result = []
    for val in b:
        k = val[0]
        if k in d:
            match = [x for x in val[1:] if x - 1 in d[k]]
            if match:
                result.append([k] + match)
    return result
Similar to the other answers, this will first convert one of the lists to a dictionary with the first element of each inner list as the key and the remainder of the list as the value. Then we walk through the other list and if the first element exists as a key in the dictionary, we find all values that meet your criteria using the list comprehension and if there were any, add an entry to the result list which is returned at the end.
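Running it on the sample input from the question gives the expected result:
print(merge_list([[0, 1, 3], [1, 2], [4, 1, 3, 5]],
                 [[0, 2, 6], [1, 4], [2, 2], [4, 1, 6]]))
# [[0, 2], [4, 6]]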
