Picking the most common element from a bunch of lists - python

I have a list l of lists [l1, ..., ln] of equal length.
I want to compare l1[k], l2[k], ..., ln[k] for every k in range(len(l1)) and build another list l0 by picking the element that appears most frequently at each position.
So, if l1 = [1, 2, 3], l2 = [1, 4, 4] and l3 = [0, 2, 4], then l0 = [1, 2, 4]. If there is a tie, I look at the lists that make up the tie and choose the value from the list with the highest priority. Priorities are given a priori; each list has one.
Ex. if you have value 1 in lists l1 and l3, value 2 in lists l2 and l4, and 3 in l5, and the lists are ordered by priority, say l5>l2>l3>l1>l4, then I will pick 2, because 2 is one of the most frequent values and l2, which contains it, has higher priority than l1 and l3.
How do I do this in python without creating a for loop with lots of if/else conditions?

You can use the Counter class from the collections module. Using the map function will reduce your list looping. You will need an if/else statement for the case where there is no single most frequent value, but only for that:
import collections
list0 = []
list_length = len(your_lists[0])
for k in range(list_length):
    k_vals = list(map(lambda x: x[k], your_lists))  # collect all values at position k
    counts = collections.Counter(k_vals).most_common()  # (value, count) tuples sorted by count
    if len(counts) == 1 or counts[0][1] > counts[1][1]:  # is there a single most common value?
        list0.append(counts[0][0])  # take the value with the highest count
    else:
        list0.append(k_vals[0])  # tie: take the element from the first list
list0 is the answer you are looking for. I just hate using l because it's easy to confuse with the number 1
Edit (based on comments):
Incorporating your comments, instead of the if/else statement, use a while loop:
i = len(k_vals)  # number of lists, ordered from highest to lowest priority
while len(counts) > 1 and counts[0][1] == counts[1][1]:  # still a tie
    i -= 1  # ignore the lowest-priority element that is left
    counts = collections.Counter(k_vals[:i]).most_common()
list0.append(counts[0][0])  # takes the value with highest count once there's no tie
So the whole thing is now:
import collections
list0 = []
list_length = len(your_lists[0])
for k in range(list_length):
    k_vals = list(map(lambda x: x[k], your_lists))  # collect all values at position k
    counts = collections.Counter(k_vals).most_common()  # (value, count) tuples sorted by count
    i = len(k_vals)  # number of lists, ordered from highest to lowest priority
    while len(counts) > 1 and counts[0][1] == counts[1][1]:  # in case of a tie
        i -= 1  # ignore the lowest-priority element that is left
        counts = collections.Counter(k_vals[:i]).most_common()
    list0.append(counts[0][0])  # takes the value with highest count
You throw in one more tiny loop, but on the bright side there are no if/else statements at all!
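The tie-breaking rule from the question can also be written out explicitly. The following is a minimal sketch, not the code from this answer: the function name pick_by_priority and the priority argument (indices into the list of lists, highest priority first) are assumptions made for illustration.

```python
from collections import Counter

def pick_by_priority(lists, priority):
    """lists: equal-length lists; priority: indices into lists, highest priority first."""
    result = []
    for column in zip(*lists):  # column holds the k-th element of every list
        counts = Counter(column)
        top = max(counts.values())
        tied = {value for value, n in counts.items() if n == top}
        # Walk the lists from highest to lowest priority; the first value
        # that belongs to the tied set wins.
        result.append(next(column[i] for i in priority if column[i] in tied))
    return result

# The question's tie example: l1..l5 hold 1, 2, 1, 2, 3 at some position,
# with priority l5 > l2 > l3 > l1 > l4 (zero-based indices 4, 1, 2, 0, 3).
print(pick_by_priority([[1], [2], [1], [2], [3]], [4, 1, 2, 0, 3]))  # [2]
```

When there is no tie, the tied set holds a single value, so the priority order has no effect on the result.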

Just transpose the sublists and get the Counter.most_common element key from each group:
from collections import Counter
lists = [[1, 2, 3],[1, 4, 4],[0, 2, 4]]
print([Counter(sub).most_common(1)[0][0] for sub in zip(*lists)])
If they are individual lists just zip those:
l1, l2, l3 = [1, 2, 3], [1, 4, 4], [0, 2, 4]
print([Counter(sub).most_common(1)[0][0] for sub in zip(l1,l2,l3)])
I'm not sure taking the first element of the group makes sense in a tie, since it may not be one of the tied values, but it is trivial to implement: get the two most common entries and check whether their counts are equal:
def most_cm(lists):
    for sub in zip(*lists):
        # get the two most frequent values
        comm = Counter(sub).most_common(2)
        # if their counts are equal, just yield the element from l1
        yield comm[0][0] if len(comm) == 1 or comm[0][1] != comm[1][1] else sub[0]
We also need if len(comm) == 1 in case all the elements are the same or we will get an IndexError.
If you are talking about taking the element that comes from the earlier list in the event of a tie i.e l2 comes before l5 then that is just the same as taking any of the elements that tie.
For a decent number of sublists:
In [61]: lis = [[randint(1,10000) for _ in range(10)] for _ in range(100000)]
In [62]: list(most_cm(lis))
Out[62]: [5856, 9104, 1245, 4304, 829, 8214, 9496, 9182, 8233, 7482]
In [63]: timeit list(most_cm(lis))
1 loops, best of 3: 249 ms per loop
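On the tie-breaking point: if the lists are passed in priority order (highest first), the rule from the question may fall out of Counter itself, because since Python 3.7 most_common() returns elements with equal counts in the order they were first encountered. A sketch under that assumption (the helper name most_common_by_priority is hypothetical):

```python
from collections import Counter

def most_common_by_priority(lists_by_priority):
    # zip(*...) transposes, so each column holds the k-th element of every
    # list, ordered from the highest-priority list to the lowest.
    return [Counter(column).most_common(1)[0][0]
            for column in zip(*lists_by_priority)]

# Question's tie example with the lists ordered l5 > l2 > l3 > l1 > l4,
# holding 3, 2, 1, 1, 2 at the position in question.
print(most_common_by_priority([[3], [2], [1], [1], [2]]))          # [2]
print(most_common_by_priority([[1, 2, 3], [1, 4, 4], [0, 2, 4]]))  # [1, 2, 4]
```

This relies on the documented ordering of equal counts, so it should not be used on Python versions older than 3.7.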

Solution is:
a = [1, 2, 3]
b = [1, 4, 4]
c = [0, 2, 4]
print([max(set(element), key=element.count) for element in zip(a, b, c)])

That's what you're looking for:
from collections import Counter
from operator import itemgetter
l0 = [max(Counter(li).items(), key=itemgetter(1))[0] for li in zip(*l)]

If you are OK taking any one of a set of elements that are tied as most common, and you can guarantee that you won't hit an empty list within your list of lists, then here is a way using Counter (so, from collections import Counter):
l = [ [1, 0, 2, 3, 4, 7, 8],
      [2, 0, 2, 1, 0, 7, 1],
      [2, 0, 1, 4, 0, 1, 8]]
res = []
for k in range(len(l[0])):
    res.append(Counter(lst[k] for lst in l).most_common()[0][0])
Doing this in IPython and printing the result:
In [86]: res
Out[86]: [2, 0, 2, 1, 0, 7, 8]

Try this:
l1 = [1,2,3]
l2 = [1,4,4]
l3 = [0,2,4]
lists = [l1, l2, l3]
print([max(set(x), key=x.count) for x in zip(*lists)])

Related

How to compare lists in python in subgroups

I'm new to Python, so any help or recommendation is appreciated.
What I'm trying to do starts from two lists (not necessarily inverted).
For instance:
l1 = [1,2,3,4,5]
l2 = [5,4,3,2,1]
I want to compare them and return the common values, but not in the usual way, which in this case would return every element, because the lists are the same, just inverted.
What I'm trying to do is compare them in stages, over growing leading portions of the lists, and check whether there is any match so far; if there is, return that element, otherwise keep looking with the next, larger portion.
For instance:
The first iteration would check (with the lists defined above):
l1 = [1]
l2 = [5]
#is there any coincidence until there? -> false (keep looking)
2nd iteration:
l1 = [1, 2]
l2 = [5, 4]
#is there any coincidence until there? -> false (keep looking)
3rd iteration:
l1 = [1, 2, 3]
l2 = [5, 4, 3]
#is there any coincidence until there? -> true (returns 3,
#which is the element where the coincidence was found, not necessarily
#the same index in both lists)
Keep in mind that it compares the last element of the first list's portion with all elements of the second list's portion (in the first iteration that is just the first element of the second list). If there is no match, it tries the element immediately preceding it in the first list against all of the second list's portion, and so on, returning the first item that matches.
Another example to clarify:
l1 = [1,2,3,4,5]
l2 = [3,4,5,6,7]
And the output will be 3
A tricky one:
l1 = [1,2,3,4]
l2 = [2,1,4,5]
1st iteration
l1 = [1]
l2 = [2]
# No output
2nd iteration
l1 = [1,2]
l2 = [2,1]
# Output will be 2
This is because the item I check first is the last one of the first list's portion [1,2], and it is also in the second list's portion up to that point, [2,1].
I need all of this to implement a bidirectional search, but I'm currently stuck at this step, as I'm not yet used to for loops and list handling.
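Read literally, the procedure described above can be sketched as two nested loops. This is only one interpretation of the description; the function name first_match_in_heads is made up for illustration, and returning None when nothing matches is an assumption.

```python
def first_match_in_heads(l1, l2):
    # Grow the leading portions ("heads") of both lists one element at a time.
    for n in range(1, max(len(l1), len(l2)) + 1):
        head2 = l2[:n]
        # Check the newest element of l1's head first, then the ones before it.
        for x in reversed(l1[:n]):
            if x in head2:
                return x
    return None  # assumption: no common element at all

print(first_match_in_heads([1, 2, 3, 4, 5], [5, 4, 3, 2, 1]))  # 3
print(first_match_in_heads([1, 2, 3, 4, 5], [3, 4, 5, 6, 7]))  # 3
print(first_match_in_heads([1, 2, 3, 4], [2, 1, 4, 5]))        # 2
```

Each pass rescans both heads, which is fine for short lists; for long lists, building sets incrementally would avoid the repeated work.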
You can compare the elements of the two lists in the same loop:
l1 = [1,2,3,4,5]
l2 = [5,4,3,2,1]
for i, j in zip(l1, l2):
    if i == j:
        print('true')
    else:
        print('false')
It looks like you're really asking: What is (the index of) the first element that l1 and l2 have in common at the same index?
The solution:
next((i, a) for i, (a, b) in enumerate(zip(l1, l2)) if a == b)
How this works:
zip(l1, l2) pairs up elements from l1 and l2, generating tuples
enumerate() gets those tuples, and keeps track of the index, i.e. (0, (1, 5)), (1, (2, 4)), etc.
for i, (a, b) in .. generates those pairs of indices and value tuples
The if a == b ensures that only those indices and values where the values match are yielded
next() gets the next element from an iterable, you're interested in the first element that matches the condition, so that's what next() gets you here.
The working example:
l1 = [1, 2, 3, 4, 5]
l2 = [5, 4, 3, 2, 1]
i, v = next((i, a) for i, (a, b) in enumerate(zip(l1, l2)) if a == b)
print(f'index: {i}, value: {v}') # prints "index: 2, value: 3"
If you're not interested in the index, but just in the first value they have in common:
l1 = [1, 2, 3, 4, 5]
l2 = [5, 4, 3, 2, 1]
v = next(a for a, b in zip(l1, l2) if a == b)
print(v) # prints "3"
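One caveat on next(): if no pair matches, the generator is exhausted and next() raises StopIteration. A small sketch of the two-argument form next(iterator, default), which returns the default instead; the helper name first_same_index, the list l3, and the None default are chosen here purely for illustration.

```python
l1 = [1, 2, 3, 4, 5]
l2 = [5, 4, 3, 2, 1]
l3 = [9, 9, 9, 9, 9]  # hypothetical list that never matches l1 at the same index

def first_same_index(xs, ys):
    # The second argument to next() is returned when the generator yields nothing.
    return next((a for a, b in zip(xs, ys) if a == b), None)

print(first_same_index(l1, l2))  # 3
print(first_same_index(l1, l3))  # None
```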
Edit: you commented and updated the question, and it's clear you don't want the first match at the same index between the lists, but rather the first common element in the heads of the lists.
(or, possibly the first element from the second list that is in the first list, which user #AndrejKesely provided an answer for - which you accepted, although it doesn't appear to answer the problem as described)
Here's a solution that gets the first match from the first part of each list, which seems to match what you describe as the problem:
l1 = [1, 2, 3, 4, 5]
l2 = [5, 2, 6, 7, 8]
v = next(next(iter(x)) for n in range(max(len(l1), len(l2))) if (x := set(l1[:n+1]) & set(l2[:n+1])))
print(v) # prints "2"
Note: the solution fails if there is no match at all, with a StopIteration. Using short-circuiting with any() that can be avoided:
x = None if not any((x := set(l1[:n+1]) & set(l2[:n+1])) for n in range(max(len(l1), len(l2)))) else next(iter(x))
print(x)
This solution has x == None if there is no match, and otherwise x will be the first match in the shortest heads of both lists, so:
l1 = [1, 2, 3, 4, 5]
l2 = [5, 2, 6, 7, 8] # result 2
l1 = [1, 2, 3, 4, 5]
l2 = [5, 6, 7, 8] # result 5
l1 = [1, 2, 3, 4, 5]
l2 = [6, 7, 8] # result None
Note that also:
l1 = [1, 2, 3]
l2 = [4, 3, 2] # result 2, not 3
Both 2 and 3 seem to be valid answers here, it's not clear from your description why 3 should be favoured over 2?
If you do need that element of the two possible answers that comes first in l2, the solution would be a bit more complicated still, since the sets are unordered by definition, so changing the order of l1 and l2 in the answer won't matter.
If you care about that order, this works:
x = None if not any(x := ((set(l1[:n//2+1+n%2]) & set(l2[:n//2+1]))) for n in range(max(len(l1), len(l2)) * 2)) else next(iter(x))
This also works for lists with different lengths, unlike the more readable answer by user #BenGrossmann. Note that they have some efficiency in reusing the constructed sets and adding one element at a time, which also allows them to remember the last element added to the set corresponding with the first list, which is why they also correctly favor 3 over 2 in [[1, 2, 3], [4, 3, 2]].
If the last answer is what you need, you should consider amending their answer (for example using zip_longest) to deal correctly with lists of different lengths, since it will be more efficient for longer lists, and is certainly more readable.
Taking the solution from #BenGrossman, but generalising it for any number of lists, with any number of elements, and favouring the ordering you specified:
from itertools import zip_longest
lists = [[1, 2, 3, 4, 5],
         [6, 7, 8, 5, 4]]
sets = [set() for _ in range(len(lists))]
for xs in zip_longest(*lists):
    for x, s in zip(xs, sets):
        s.add(x)
    if i := set.intersection(*sets):
        v = sorted([(lists[0].index(x), x) for x in i])[-1][1]
        break
else:
    v = None
print(v)
This works as described for all the examples, as well as for lists of unequal length, and will favour the elements that are farthest back in the first list (and thus earlier in the others).
The following can be made more efficient, but does work.
lists = [[1,2,3,4,5], # input to the script
         [5,4,3,2,1]]
sets = [set(), set()]
for a,b in zip(*lists):
    sets[0].add(a)
    sets[1].add(b)
    if sets[0]&sets[1]:
        print("first element in first overlap:")
        print(a)
        break
else:
    print("no overlap")
This results in the output
first element in first overlap:
3
Using lists = [[5,7,6],[7,5,4]] instead results in
first element in first overlap:
7

How to use Python to filter list based on the next requirements?

The list contains only integers. I need to select elements according to the following requirements (every requirement is a separate problem):
1. Elements between 1 and 1: [2,1,3,1,3] -> [3], [2,1,3,4,1,3] -> [3,4]
2. Elements between same numbers: a) only one pair of same numbers is allowed: [1,2,3,2] -> [3] (2 and 2), [1,4,3,5,4] -> [3,5] (4 and 4)
b) multiple pairs are allowed: [1,2,1,2] -> [[2],[1]] (pairs: (1,1), (2,2)); [1,4,3,5,4,3,2] -> [[3,5],[5,4]] (pairs: (4,4), (3,3))
3. Elements that sit next to an identical neighbour: [1,1,3] -> [1,1] (2 consecutive 1s), [1,1,3,2,2,2,1] -> [1,1,2,2,2] (2 consecutive 1s and 3 consecutive 2s)
What is the general approach to this problem? I have worked with filter, but only with a one-parameter predicate: filter(lambda x: (x%2 == 0), numbers)
Is there an approach other than nested for loops, maybe in a more functional style? Is it possible to use convolutions as a solution?
For Q1, you can create a list with the indices of 1s and iterate over the list again to find the items between ones.
ones = [i for i,x in enumerate(lst) if x==1]
for i,j in zip(ones, ones[1:]):
    print(lst[i+1:j])
Output:
[2,1,3,1,3] -> [3]
[2,1,3,4,1,3] -> [3, 4]
For Q2, similar to Q1, iterate over the list to find the indices of items and keep them in a dictionary. Then iterate over it to find the numbers that occur more than once and print out the items between these numbers:
d = {}
for i,x in enumerate(lst):
    d.setdefault(x, []).append(i)
for k,indices in d.items():
    if len(indices)>1:
        for i,j in zip(indices, indices[1:]):
            print(lst[i+1:j])
Output:
[1,2,3,2] -> [3]
[1,4,3,5,4] -> [3, 5]
Here is a solution for Q1/Q3 using itertools and of Q2 using a classical loop:
Q1
This one groups by equality to 1, then drops the first and last groups (whether they are 1s or not, we don't want to keep them), then drops the groups equal to 1, leaving only the inner groups, if any.
l1 = [2,1,3,4,1,3,6,1,0]
from itertools import groupby, chain
list(chain.from_iterable(g for k,g in
                         [(k,list(g)) for k,g in
                          groupby(l1, lambda x: x==1)][1:-1]
                         if not k))
# [3, 4, 3, 6]
Q3
Here we group by identical consecutive values and filter using the group length if greater than 1.
NB. There will be side effects if a value is present more than 2 times, in which case the expected behavior should be made explicit.
l3 = [1,1,3,2,2,2,1]
from itertools import groupby, chain
list(chain.from_iterable(l for k,g in groupby(l3) if len(l:=list(g))>1))
# [1, 1, 2, 2, 2]
Q2
For this one, we first read the list to identify duplicated values. Then we read the list again and add the value to a dictionary of lists with duplicates as key after the key was encountered once.
l2 = [1,4,3,5,4,3,2]
from collections import Counter
dups = {k: [] for k,v in Counter(l2).items() if v>1}
active = set()
for i in l2:
    if i in dups: # if many keys are expected initialize and use a set of the keys to improve efficiency
        active.remove(i) if i in active else active.add(i)
    for k in active:
        if i != k:
            dups[k].append(i)
list(dups.values())
# [[3, 5], [5, 4]]
Part 3 can be solved without nested for loops using itertools.groupby
from itertools import groupby
def repeating_sections(lst):
    out_lst = []
    for key, group in groupby(lst):
        _lst = list(group)
        if len(_lst) > 1:
            out_lst += _lst
    return out_lst
print(repeating_sections([1, 1, 3, 2, 2, 2, 1]))
# [1, 1, 2, 2, 2]
Maybe you can create a function?
It meets all three criteria
def filterNumbers(numList, targetNum) -> list:
    firstPosition = numList.index(targetNum)
    lastPosition = numList.index(targetNum, firstPosition + 2)
    while True:
        try:
            if lastPosition != (len(numList)-1):
                if numList[lastPosition] == numList[lastPosition+1]:
                    lastPosition = lastPosition + 1
                    continue
            break
        except Exception as e:
            print(e)
    resultList = numList[firstPosition+1:lastPosition]
    return resultList
Condition 1:
test1 = [2, 1, 3, 1, 3]
filterNumbers(test1, 1)
#Output:
[3]
Condition 2:
test2 = [1, 4, 3, 5, 4]
filterNumbers(test2, 4)
#Output:
[3, 5]
Condition 3:
test3 = [1,1,3,2,2,2,1]
filterNumbers(test3, 1)
#Output
[1, 3, 2, 2, 2]
I am assuming there is a typo while you were writing the third condition because you skipped the 3 and included the first 1.

How to divide a list by value and join back in original order?

For example, if I have a list of [1,2,3,1,1,2], I want to
1) divide it by values, e.g., to [1,1,1], [2,2], [3],
2) perform an action (e.g., plus index, so [1,1,1] becomes [1,2,3], and [2,2] becomes [2,3]),
and 3) rejoin with the original index, yielding [1,2,3,2,3,3]. Is there an efficient way I could do so?
You can use itertools.count with collections.OrderedDict:
import itertools, collections as cl
def group(l):
    d, d1 = cl.OrderedDict(), cl.defaultdict(itertools.count)
    for i in l:
        d.setdefault(i, []).append(i+next(d1[i]))
    return list(itertools.chain(*d.values()))
print(group([1, 2, 3, 1, 1, 2]))
print(group([1, 3, 2, 1, 1, 2]))
Output:
[1, 2, 3, 2, 3, 3]
[1, 2, 3, 3, 2, 3]
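The function above concatenates the groups in order of first appearance. If "rejoin with the original index" instead means that every element should end up at the position it started from, a possible sketch is a single pass with one running counter per value. The name add_occurrence_index is hypothetical, and this reading of the question is an assumption.

```python
from collections import defaultdict
from itertools import count

def add_occurrence_index(values):
    counters = defaultdict(count)  # one running counter (0, 1, 2, ...) per distinct value
    # Each element gets its own occurrence index added, in place.
    return [v + next(counters[v]) for v in values]

print(add_occurrence_index([1, 2, 3, 1, 1, 2]))  # [1, 2, 3, 2, 3, 3]
print(add_occurrence_index([1, 3, 2, 1, 1, 2]))  # [1, 3, 2, 2, 3, 3]
```

For the question's input both readings give the same list; they differ on the second input, where the groups are not first seen in sorted order.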
A solution that does not import any libraries.
l = [1,2,3,1,1,2]
n = []
c = -1
#group
#count how many instances of a number there are in a list
#list.remove(num) removes a single instance of a number, so using the previous, we loop it and delete all instances
#before deleting the instance we add it to an array as many times as there are instances of it
#c variable is a counter, whenever we are done "transporting" instances, we change the array in which we place the new instances
#all of the said arrays are contained within one array, that way we can use .append([]) to add as many new arrays for as many instances as we want
while True:
    try:
        num = l[0]
        c += 1
    except IndexError:
        break
    n.append([])
    for _ in range(l.count(num)):
        n[c].append(num)
        l.remove(num)
#plus index
for i in range(len(n)):
    for j in range(len(n[i])):
        n[i][j] += j
#join
for i in range(len(n)):
    l.extend(n[i])
#print
print(l)

Group repeated elements of a list

I am trying to create a function that receives a list and return another list with the repeated elements.
For example for the input A = [2,2,1,1,3,2] (the list is not sorted) and the function would return result = [[1,1], [2,2,2]]. The result doesn't need to be sorted.
I already did it in Wolfram Mathematica but now I have to translate it to python3, Mathematica has some functions like Select, Map and Split that makes it very simple without using long loops with a lot of instructions.
result = [[x] * A.count(x) for x in set(A) if A.count(x) > 1]
Simple approach:
def grpBySameConsecutiveItem(l):
    rv = []
    last = None
    for elem in l:
        if last is None:
            last = [elem]
            continue
        if elem == last[0]:
            last.append(elem)
            continue
        if len(last) > 1:
            rv.append(last)
        last = [elem]
    if last is not None and len(last) > 1:  # don't lose a run at the very end
        rv.append(last)
    return rv
print(grpBySameConsecutiveItem([1,2,1,1,1,2,2,3,4,4,4,4,5,4]))
Output:
[[1, 1, 1], [2, 2], [4, 4, 4, 4]]
You can sort your output afterwards if you want it sorted. You could also sort your input list first, but then you would no longer be grouping the original runs of consecutive identical numbers.
See this https://stackoverflow.com/a/4174955/7505395 for how to sort lists of lists depending on an index (just use 0), as all the elements within each inner list are identical.
You could also use itertools; it has things like takewhile, which looks much smarter if used.
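As one possible itertools variant, here is a sketch with itertools.groupby (a different function from the takewhile mentioned above), which yields runs of consecutive equal items directly. The function name consecutive_runs is made up for this example.

```python
from itertools import groupby

def consecutive_runs(items):
    # groupby() without a key groups consecutive equal items together.
    runs = (list(group) for _, group in groupby(items))
    # Keep only the runs that actually repeat.
    return [run for run in runs if len(run) > 1]

print(consecutive_runs([1, 2, 1, 1, 1, 2, 2, 3, 4, 4, 4, 4, 5, 4]))
# [[1, 1, 1], [2, 2], [4, 4, 4, 4]]
```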
This will ignore consecutive ones, and just collect them all:
def grpByValue(lis):
    d = {}
    for key in lis:
        if key in d:
            d[key] += 1
        else:
            d[key] = 1
    print(d)  # debug output: the count of each value
    rv = []
    for k in d:
        if (d[k]<2):
            continue
        rv.append([])
        for n in range(0,d[k]):
            rv[-1].append(k)
    return rv
data = [1,2,1,1,1,2,2,3,4,4,4,4,5,4]
print(grpByValue(data))
Output:
[[1, 1, 1, 1], [2, 2, 2], [4, 4, 4, 4, 4]]
You could do this with a list comprehension:
A = [1,1,1,2,2,3,3,3]
B = []
[B.append([n]*A.count(n)) for n in A if B.count([n]*A.count(n)) == 0]
outputs [[1,1,1],[2,2],[3,3,3]]
Or more pythonically:
A = [1,2,2,3,4,1,1,2,2,2,3,3,4,4,4]
B = []
for n in A:
    if B.count([n]*A.count(n)) == 0:
        B.append([n]*A.count(n))
outputs [[1,1,1],[2,2,2,2,2],[3,3,3],[4,4,4,4]]
Works with sorted or unsorted list, if you need to sort the list before hand you can do for n in sorted(A)
This is a job for Counter(). Iterating over each element, x, and checking A.count(x) has a O(N^2) complexity. Counter() will count how many times each element exists in your iterable in one pass and then you can generate your result by iterating over that dictionary.
>>> from collections import Counter
>>> A = [2,2,1,1,3,2]
>>> counts = Counter(A)
>>> result = [[key] * value for key, value in counts.items() if value > 1]
>>> result
[[2, 2, 2], [1, 1]]

How to find elements existing in two lists but with different indexes

I have two lists of the same length which contains a variety of different elements. I'm trying to compare them to find the number of elements which exist in both lists, but have different indexes.
Here are some example inputs/outputs to demonstrate what I mean:
>>> compare([1, 2, 3, 4], [4, 3, 2, 1])
4
>>> compare([1, 2, 3], [1, 2, 3])
0
# Each item in the first list has the same index in the other
>>> compare([1, 2, 4, 4], [1, 4, 4, 2])
2
# The 3rd '4' in both lists don't count, since they have the same indexes
>>> compare([1, 2, 3, 3], [5, 3, 5, 5])
1
# Duplicates don't count
The lists are always the same size.
This is the algorithm I have so far:
def compare(list1, list2):
    # Eliminate any direct matches (filter both lists from the same pairs)
    pairs = [(a, b) for (a, b) in zip(list1, list2) if a != b]
    list1 = [a for (a, b) in pairs]
    list2 = [b for (a, b) in pairs]
    out = 0
    for possible in list1:
        if possible in list2:
            index = list2.index(possible)
            del list2[index]
            out += 1
    return out
Is there a more concise and eloquent way to do the same thing?
This python function does hold for the examples you provided:
def compare(list1, list2):
    D = {e:i for i, e in enumerate(list1)}
    return len(set(e for i, e in enumerate(list2) if D.get(e) not in (None, i)))
Since duplicates don't count, you can use sets to find only the distinct elements in each list. A set only holds unique elements. Then keep only the shared elements whose first positions differ, using list.index:
def compare(l1, l2):
    s1, s2 = set(l1), set(l2)
    shared = s1 & s2 # intersection, only the elements in both
    return len([e for e in shared if l1.index(e) != l2.index(e)])
You can actually bring this down to a one-liner if you want
def compare(l1, l2):
    return len([e for e in set(l1) & set(l2) if l1.index(e) != l2.index(e)])
Alternative:
Functionally you can use the reduce builtin (in python3, you have to do from functools import reduce first). This avoids construction of the list which saves excess memory usage. It uses a lambda function to do the work.
def compare(l1, l2):
    return reduce(lambda acc, e: acc + int(l1.index(e) != l2.index(e)),
                  set(l1) & set(l2), 0)
A brief explanation:
reduce is a functional programming construct that traditionally reduces an iterable to a single item. Here we use reduce to reduce the set intersection to a single value.
lambda functions are anonymous functions. Saying lambda x, y: x + y is like saying def func(x, y): return x + y except that the function has no name. reduce takes a function as its first argument. The first argument the lambda receives when used with reduce (acc here) is the result of the previous call, the accumulator.
set(l1) & set(l2) is a set consisting of unique elements that are in both l1 and l2. It is iterated over, and each element is taken out one at a time and used as the second argument to the lambda function.
0 is the initial value for the accumulator. We use this since we assume there are 0 shared elements with different indices to start.
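To make the accumulator idea concrete, the same count can be sketched with the built-in sum() over a generator expression. This is an equivalent formulation offered for comparison, not part of the original answer.

```python
def compare(l1, l2):
    shared = set(l1) & set(l2)  # unique elements present in both lists
    # Each comparison is True (counts as 1) or False (counts as 0),
    # so sum() plays the role of reduce's accumulator starting at 0.
    return sum(l1.index(e) != l2.index(e) for e in shared)

print(compare([1, 2, 3, 4], [4, 3, 2, 1]))  # 4
print(compare([1, 2, 4, 4], [1, 4, 4, 2]))  # 2
```

Like the reduce version, this avoids building an intermediate list, since the generator is consumed one item at a time.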
I don't claim it is the simplest answer, but it is a one-liner.
import numpy as np
import itertools
l1 = [1, 2, 3, 4]
l2 = [1, 3, 2, 4]
print(len(np.unique(list(itertools.chain.from_iterable([[a,b] for a,b in zip(l1,l2) if a!= b])))))
I explain:
[[a,b] for a,b in zip(l1,l2) if a!= b]
is the list of pairs from zip(l1,l2) whose items differ. The number of elements in this list is the number of positions where the two lists differ.
Then, list(itertools.chain.from_iterable(...)) merges the component lists of a list. For instance:
>>> list(itertools.chain.from_iterable([[3,2,5],[5,6],[7,5,3,1]]))
[3, 2, 5, 5, 6, 7, 5, 3, 1]
Then, discard duplicates with np.unique(), and take len().
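For comparison, the algorithm from the question itself (drop the positions that match directly, then pair off whatever remains) can be sketched with Counter intersection from the standard library. This is a hedged alternative, not any answerer's method; it assumes the elements are hashable.

```python
from collections import Counter

def compare(list1, list2):
    # Keep only the positions where the two lists differ.
    pairs = [(a, b) for a, b in zip(list1, list2) if a != b]
    left = Counter(a for a, _ in pairs)
    right = Counter(b for _, b in pairs)
    # & keeps each element with the smaller of its two counts,
    # which mirrors deleting one match from list2 per hit.
    return sum((left & right).values())

print(compare([1, 2, 3, 4], [4, 3, 2, 1]))  # 4
print(compare([1, 2, 3], [1, 2, 3]))        # 0
print(compare([1, 2, 4, 4], [1, 4, 4, 2]))  # 2
print(compare([1, 2, 3, 3], [5, 3, 5, 5]))  # 1
```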
