How to compare lists in python in subgroups - python

I'm new to Python, so any help or recommendation is appreciated.
I have two lists (not necessarily reversed copies of each other).
For instance:
l1 = [1,2,3,4,5]
l2 = [5,4,3,2,1]
I want to compare them and return a common value, but not in the usual way, which in this case would return every element of the list, since both lists hold the same values, just reversed.
Instead, I want to compare them in stages, on growing leading portions of the lists: check whether there is a match within the portion seen so far; if there is, return that element, and if not, keep looking in the next, larger portion.
For instance:
In the first iteration, it would check (using the lists defined above):
l1 = [1]
l2 = [5]
#is there any coincidence until there? -> false (keep looking)
2nd iteration:
l1 = [1, 2]
l2 = [5, 4]
#is there any coincidence until there? -> false (keep looking)
3rd iteration:
l1 = [1, 2, 3]
l2 = [5, 4, 3]
#is there any coincidence until there? -> true (returns 3,
#which is the element where the coincidence was found, not necessarily
#the same index in both lists)
Keep in mind that each stage compares the last element of the first list's portion against all elements of the second list's portion up to that point (in the first stage, that is just the first element of the second list). If there is no match, it tries the element immediately preceding the last one in the first list against the same portion of the second list, and so on, returning the first item that matches.
Another example to clarify:
l1 = [1,2,3,4,5]
l2 = [3,4,5,6,7]
And the output will be 3
A tricky one:
l1 = [1,2,3,4]
l2 = [2,1,4,5]
1st iteration
l1 = [1]
l2 = [2]
# No output
2nd iteration
l1 = [1,2]
l2 = [2,1]
# Output will be 2
That element was found in the second list too: the item I check first is the last one of the first list's portion [1,2], and I look for it in the second list's portion up to that point, [2,1].
All of this is needed to implement a bidirectional search, but I'm currently stuck on this step, as I'm not yet used to for loops and list handling.

You can compare the elements of the two lists in the same loop:
l1 = [1,2,3,4,5]
l2 = [5,4,3,2,1]
for i, j in zip(l1, l2):
    if i == j:
        print('true')
    else:
        print('false')

It looks like you're really asking: What is (the index of) the first element that l1 and l2 have in common at the same index?
The solution:
next((i, a) for i, (a, b) in enumerate(zip(l1, l2)) if a == b)
How this works:
zip(l1, l2) pairs up elements from l1 and l2, generating tuples
enumerate() takes those tuples and keeps track of the index, i.e. (0, (1, 5)), (1, (2, 4)), etc.
for i, (a, b) in .. generates those pairs of indices and value tuples
The if a == b ensures that only those indices and values where the values match are yielded
next() gets the next element from an iterable, you're interested in the first element that matches the condition, so that's what next() gets you here.
The working example:
l1 = [1, 2, 3, 4, 5]
l2 = [5, 4, 3, 2, 1]
i, v = next((i, a) for i, (a, b) in enumerate(zip(l1, l2)) if a == b)
print(f'index: {i}, value: {v}') # prints "index: 2, value: 3"
If you're not interested in the index, but just in the first value they have in common:
l1 = [1, 2, 3, 4, 5]
l2 = [5, 4, 3, 2, 1]
v = next(a for a, b in zip(l1, l2) if a == b)
print(v) # prints "3"
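Both next() calls above raise StopIteration when no position matches. A minimal sketch of one way around that, assuming None is an acceptable "no match" result: next() accepts a default as its second argument. The function name first_positional_match is made up for this illustration.

```python
def first_positional_match(l1, l2, default=None):
    """Return the first value that sits at the same index in both lists.

    The generator needs its own parentheses here because next() is
    called with a second argument (the default).
    """
    return next((a for a, b in zip(l1, l2) if a == b), default)


print(first_positional_match([1, 2, 3, 4, 5], [5, 4, 3, 2, 1]))  # 3
print(first_positional_match([1, 2, 3], [4, 5, 6]))              # None
```

If None can be a legitimate list element, a dedicated sentinel object would be a safer default.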
Edit: you commented and updated the question, and it's clear you don't want the first match at the same index between the lists, but rather the first common element in the heads of the lists.
(or, possibly the first element from the second list that is in the first list, which user @AndrejKesely provided an answer for - which you accepted, although it doesn't appear to answer the problem as described)
Here's a solution that gets the first match from the first part of each list, which seems to match what you describe as the problem:
l1 = [1, 2, 3, 4, 5]
l2 = [5, 2, 6, 7, 8]
v = next(next(iter(x)) for n in range(max(len(l1), len(l2))) if (x := set(l1[:n+1]) & set(l2[:n+1])))
print(v) # prints "2"
Note: the solution fails if there is no match at all, with a StopIteration. Using short-circuiting with any() that can be avoided:
x = None if not any((x := set(l1[:n+1]) & set(l2[:n+1])) for n in range(max(len(l1), len(l2)))) else next(iter(x))
print(x)
This solution has x == None if there is no match, and otherwise x will be the first match in the shortest heads of both lists, so:
l1 = [1, 2, 3, 4, 5]
l2 = [5, 2, 6, 7, 8] # result 2
l1 = [1, 2, 3, 4, 5]
l2 = [5, 6, 7, 8] # result 5
l1 = [1, 2, 3, 4, 5]
l2 = [6, 7, 8] # result None
Note that also:
l1 = [1, 2, 3]
l2 = [4, 3, 2] # result 2, not 3
Both 2 and 3 seem to be valid answers here, it's not clear from your description why 3 should be favoured over 2?
If you do need that element of the two possible answers that comes first in l2, the solution would be a bit more complicated still, since the sets are unordered by definition, so changing the order of l1 and l2 in the answer won't matter.
If you care about that order, this works:
x = None if not any(x := ((set(l1[:n//2+1+n%2]) & set(l2[:n//2+1]))) for n in range(max(len(l1), len(l2)) * 2)) else next(iter(x))
This also works for lists with different lengths, unlike the more readable answer by user @BenGrossmann. Note that they have some efficiency in reusing the constructed sets and adding one element at a time, which also allows them to remember the last element added to the set corresponding with the first list, which is why they also correctly favor 3 over 2 in [[1, 2, 3], [4, 3, 2]].
If the last answer is what you need, you should consider amending their answer (for example using zip_longest) to deal correctly with lists of different lengths, since it will be more efficient for longer lists, and is certainly more readable.
Taking the solution from @BenGrossman, but generalising it for any number of lists, with any number of elements, and favouring the ordering you specified:
from itertools import zip_longest
lists = [[1, 2, 3, 4, 5],
         [6, 7, 8, 5, 4]]
sets = [set() for _ in range(len(lists))]
for xs in zip_longest(*lists):
    for x, s in zip(xs, sets):
        s.add(x)
    if i := set.intersection(*sets):
        v = sorted([(lists[0].index(x), x) for x in i])[-1][1]
        break
else:
    v = None
print(v)
This works as described for all the examples, as well as for lists of unequal length, and will favour the elements that are farthest back in the first list (and thus earlier in the others).
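One caveat with zip_longest is that it pads the shorter lists with None, which then lands in the sets. A hedged sketch of the same idea wrapped in a function, using a private sentinel as the fill value so padding is never added; the names first_common_in_heads and _MISSING are invented for this example, and it assumes at least one list is passed and that all elements are hashable.

```python
from itertools import zip_longest

_MISSING = object()  # hypothetical sentinel marking "this list has run out"


def first_common_in_heads(*lists):
    """Grow the head of every list one element at a time and return the
    first common element, preferring the one farthest back in lists[0]."""
    sets = [set() for _ in lists]
    for column in zip_longest(*lists, fillvalue=_MISSING):
        for item, seen in zip(column, sets):
            if item is not _MISSING:  # skip padding from shorter lists
                seen.add(item)
        common = set.intersection(*sets)
        if common:
            # Among the candidates, pick the one with the largest index in lists[0].
            return max(common, key=lists[0].index)
    return None


print(first_common_in_heads([1, 2, 3, 4, 5], [6, 7, 8, 5, 4]))  # 5
print(first_common_in_heads([1, 2, 3], [4, 3, 2]))              # 3
print(first_common_in_heads([1, 2, 3, 4, 5], [6, 7, 8]))        # None
```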

The following can be made more efficient, but does work.
lists = [[1,2,3,4,5], # input to the script
         [5,4,3,2,1]]
sets = [set(), set()]
for a,b in zip(*lists):
    sets[0].add(a)
    sets[1].add(b)
    if sets[0]&sets[1]:
        print("first element in first overlap:")
        print(a)
        break
else:
    print("no overlap")
This results in the output
first element in first overlap:
3
Using lists = [[5,7,6],[7,5,4]] instead results in
first element in first overlap:
7
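The staged search can also be written out literally as the question describes it: at each stage, walk the head of the first list from its last element backwards and return the first item that also occurs in the head of the second list. This is only a sketch under that reading of the question; the function name first_staged_match is made up, and elements are assumed to be hashable.

```python
def first_staged_match(l1, l2):
    """Compare growing heads of l1 and l2; return the first match found,
    checking the newest elements of l1's head first."""
    for n in range(1, max(len(l1), len(l2)) + 1):
        head2 = set(l2[:n])            # everything seen so far in l2
        for item in reversed(l1[:n]):  # last element of l1's head first
            if item in head2:
                return item
    return None                        # no common element at any stage


print(first_staged_match([1, 2, 3, 4, 5], [5, 4, 3, 2, 1]))  # 3
print(first_staged_match([1, 2, 3, 4, 5], [3, 4, 5, 6, 7]))  # 3
print(first_staged_match([1, 2, 3, 4], [2, 1, 4, 5]))        # 2
```

Rebuilding the set at every stage keeps the sketch short but costs quadratic time; adding one element per stage, as in the answer above, avoids that.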

Related

Add two lists which return a list the addition of the adjacent element

I want to write a function add_list, which adds two lists adjacent elements.
E.g. l1 = [1, 2, 3], l2= [1,2,3] should give [2,4,6]. I am lost and not sure how to approach it using loops. Can someone help please?
You can iterate both the lists using zip and then use list comprehension on them
[x+y for x,y in zip(l1, l2)]
Sample run:
>>> l1 = [1, 2, 3]
>>> l2 = [1, 2, 3]
>>> [x+y for x,y in zip(l1, l2)]
[2, 4, 6]
Another possible solution is to iterate through the index (can be used in list comprehension as well)
result = []
for i in range(len(l1)):
    result.append(l1[i] + l2[i])
Output:
>>> result
[2, 4, 6]
The following code adds the numbers in two given lists, provided that both have the same number of elements:
def add_list(a, b):
    result = [] # empty list
    # loop through all the elements of the list
    for i in range(len(a)):
        # insert addition into results
        result.append(a[i] + b[i])
    return result
l1 = [1, 2, 3]
l2 = [1, 2, 3]
print(add_list(l1, l2))
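Note that zip silently stops at the shorter list, while the index-based loop raises IndexError only when the second list is the shorter one. A small sketch that makes the equal-length requirement explicit; the name add_list_checked and the choice of ValueError are assumptions made for illustration.

```python
def add_list_checked(a, b):
    """Add two lists element by element, refusing lists of unequal length."""
    if len(a) != len(b):
        raise ValueError("both lists must have the same number of elements")
    return [x + y for x, y in zip(a, b)]


print(add_list_checked([1, 2, 3], [1, 2, 3]))  # [2, 4, 6]
```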

Sort a list from an index to another index [duplicate]

Suppose I have a list [2, 4, 1, 3, 5].
I want to sort the list just from index 1 to the end, which gives me [2, 1, 3, 4, 5]
How can I do it in Python?
(No extra spaces would be appreciated)
TL;DR:
Use sorted with a slicing assignment to keep the original list object without creating a new one:
l = [2, 4, 1, 3, 5]
l[1:] = sorted(l[1:])
print(l)
Output:
[2, 1, 3, 4, 5]
Longer Answer:
After the list is created, we will make a slicing assignment:
l[1:] =
You might be wondering what [1:] does: it slices the list starting at index 1 (the second element), so the first element is left out. Python's indexing starts from zero, and a : with nothing after it means "everything from the start index to the end". With [1:3] you only get the elements at indexes 1 and 2, because the end index 3 is excluded. Let's say your list is:
l = [1, 2, 3, 4, 5]
If you use:
print(l[1:])
It will result in:
[2, 3, 4, 5]
And if you use:
print(l[1:3])
It will result in:
[2, 3]
About slicing, read more here if you want to.
After the slice comes an equal sign =, which replaces whatever is on the left of the = with what is on the right. In this case the left side is l[1:], which refers to [2, 3, 4, 5], so that part of the list is replaced by whatever follows the = sign.
If you use:
l[1:] = [100, 200, 300, 400]
print(l)
It will result in:
[1, 100, 200, 300, 400]
To learn more about it check out this.
Next we have sorted, which is a built-in function; it simply returns a new list sorted from smallest to largest. Let's say we have the list below:
l = [3, 2, 1, 4]
If you use:
print(sorted(l))
It will result in:
[1, 2, 3, 4]
To learn more about it check this.
That brings us back to slicing with l[1:]: it isn't only used as the target of an assignment, you can also pass a slice to a function, as we do here with sorted.
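Putting the pieces together, the slice assignment generalises to any index range. A minimal sketch, with the function name sort_slice_in_place chosen just for this example; it keeps the same list object but still builds a temporary sorted copy of the slice.

```python
def sort_slice_in_place(items, start=0, end=None):
    """Sort items[start:end] while keeping the same list object.

    end=None means "up to the end of the list", exactly like items[start:].
    """
    items[start:end] = sorted(items[start:end])


l = [2, 4, 1, 3, 5]
same_object = l
sort_slice_in_place(l, 1)
print(l)                 # [2, 1, 3, 4, 5]
print(l is same_object)  # True
```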
Maybe temporarily put something there that's smaller than the rest? Should be faster than the other solutions. And gets as close to your "No extra spaces" wish as you can get when using sort or sorted.
>>> tmp = l[0]
>>> l[0] = float('-inf')
>>> l.sort()
>>> l[0] = tmp
>>> l
[2, 1, 3, 4, 5]
Benchmarks
For the example list, 1,000,000 iterations (and mine of course preparing that special value only once):
sort_u10 0.8149 seconds
sort_chris 0.8569 seconds
sort_heap 0.7550 seconds
sort_heap2 0.5982 seconds # using -1 instead of -inf
For 50,000 lists like [int(x) for x in os.urandom(100)]:
sort_u10 0.4778 seconds
sort_chris 0.4786 seconds
sort_heap 0.8106 seconds
sort_heap2 0.4437 seconds # using -1 instead of -inf
Benchmark code:
import timeit, os
def sort_u10(l):
    l[1:] = sorted(l[1:])
def sort_chris(l):
    l = l[:1] + sorted(l[1:])
def sort_heap(l, smallest=float('-inf')):
    tmp = l[0]
    l[0] = smallest
    l.sort()
    l[0] = tmp
def sort_heap2(l):
    tmp = l[0]
    l[0] = -1
    l.sort()
    l[0] = tmp
for _ in range(3):
    # only the four functions defined above are timed
    for sort in sort_u10, sort_chris, sort_heap, sort_heap2:
        number, repeat = 1_000_000, 5
        data = iter([[2, 4, 1, 3, 5] for _ in range(number * repeat)])
        # number, repeat = 50_000, 5
        # data = iter([[int(x) for x in os.urandom(100)] for _ in range(number * repeat)])
        t = timeit.repeat(lambda: sort(next(data)), number=number, repeat=repeat)
        print('%10s %.4f seconds' % (sort.__name__, min(t)))
    print()
Use sorted with slicing:
l[:1] + sorted(l[1:])
Output:
[2, 1, 3, 4, 5]
For the special case that you actually have, according to our comments:
Q: I'm curious: Why do you want this? – Heap Overflow
A: I'm trying to make a next_permutation() in python – nwice13
Q: Do you really need to sort for that, though? Not just reverse? – Heap Overflow
A: Yup, reverse is ok, but I just curious to ask about sorting this way. – nwice13
I'd do that like this:
l[1:] = l[:0:-1]
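For context, here is how that suffix reversal fits into a next_permutation(). This is a sketch of the common textbook algorithm, not the asker's actual code: it rearranges the list in place into the next lexicographic permutation and returns False once it wraps around to the smallest one.

```python
def next_permutation(seq):
    """Rearrange seq in place into its next lexicographic permutation.

    Returns True on success, or False (after resetting seq to ascending
    order) when seq was already the last permutation.
    """
    # 1. Find the rightmost position i where seq[i] < seq[i + 1].
    i = len(seq) - 2
    while i >= 0 and seq[i] >= seq[i + 1]:
        i -= 1
    if i < 0:
        seq.reverse()
        return False
    # 2. Find the rightmost element larger than seq[i] and swap the two.
    j = len(seq) - 1
    while seq[j] <= seq[i]:
        j -= 1
    seq[i], seq[j] = seq[j], seq[i]
    # 3. The suffix is in descending order, so reversing it sorts it.
    seq[i + 1:] = seq[:i:-1]
    return True


p = [1, 2, 3]
next_permutation(p)
print(p)  # [1, 3, 2]
```

Step 3 is the same trick as l[1:] = l[:0:-1] above, with i in place of 0.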
You can define your own function in python using slicing and sorted and this function (your custom function) should take start and end index of the list.
Since list is mutable in python, I have written the function in such a way it doesn't modify the list passed. Feel free to modify the function. You can modify the list passed to this function to save memory if required.
def sortedList(li, start=0, end=None):
    if end is None:
        end = len(li)
    fi = []
    fi[:start] = li[:start]
    fi[start:end] = sorted(li[start:end])
    return fi
li = [2, 1, 4, 3, 0]
print(li)
print(sortedList(li, 1))
Output:
[2, 1, 4, 3, 0]
[2, 0, 1, 3, 4]

Picking the most common element from a bunch of lists

I have a list l of lists [l1, ..., ln] of equal length
I want to compare the l1[k], l2[k], ..., ln[k] for all k in len(l1) and make another list l0 by picking the element that appears most frequently.
So, if l1 = [1, 2, 3], l2 = [1, 4, 4] and l3 = [0, 2, 4], then l = [1, 2, 4]. If there is a tie, I will look at the lists that make up the tie and choose the one in the list with higher priority. Priority is given a priori, each list is given a priority.
Ex. if you have value 1 in lists l1 and l3, and value 2 in lists l2 and l4, and 3 in l5, and lists are ordered according to priority, say l5>l2>l3>l1>l4, then I will pick 2, because 2 is in l2 that contains an element with highest occurrence and its priority is higher than l1 and l3.
How do I do this in python without creating a for loop with lots of if/else conditions?
You can use the Counter class from the collections module. Using the map function will reduce your list looping. You will need an if/else statement for the case that there is no most frequent value but only for that:
import collections
list0 = []
list_length = len(your_lists[0])
for k in range(list_length):
    k_vals = list(map(lambda x: x[k], your_lists)) #collect all values at k pos
    counts = collections.Counter(k_vals).most_common() #tuples (val,ct) sorted by count
    if len(counts) == 1 or counts[0][1] > counts[1][1]: #is there a most common value
        list0.append(counts[0][0]) #takes the value with highest count
    else:
        list0.append(k_vals[0]) #takes element from first list
list0 is the answer you are looking for. I just hate using l because it's easy to confuse with the number 1
Edit (based on comments):
Incorporating your comments, instead of the if/else statement, use a while loop:
i = len(k_vals) #number of lists, ordered from highest to lowest priority
while len(counts) > 1 and counts[0][1] == counts[1][1]: #in case of a tie
    i -= 1 #go back farther if there's still a tie
    counts = collections.Counter(k_vals[:i]).most_common() #ignore the lowest priority element
list0.append(counts[0][0]) #takes the value with highest count once there's no tie
So the whole thing is now:
import collections
list0 = []
list_length = len(your_lists[0])
for k in range(list_length):
    k_vals = list(map(lambda x: x[k], your_lists)) #collect all values at k pos
    counts = collections.Counter(k_vals).most_common() #tuples (val,ct) sorted by count
    i = len(k_vals) #number of lists, ordered from highest to lowest priority
    while len(counts) > 1 and counts[0][1] == counts[1][1]: #in case of a tie
        i -= 1 #go back farther if there's still a tie
        counts = collections.Counter(k_vals[:i]).most_common() #ignore the lowest priority element
    list0.append(counts[0][0]) #takes the value with highest count
You throw in one more tiny loop but on the bright side there's no if/else statements at all!
Just transpose the sublists and get the Counter.most_common element key from each group:
from collections import Counter
lists = [[1, 2, 3],[1, 4, 4],[0, 2, 4]]
print([Counter(sub).most_common(1)[0][0] for sub in zip(*lists)])
If they are individual lists just zip those:
l1, l2, l3 = [1, 2, 3], [1, 4, 4], [0, 2, 4]
print([Counter(sub).most_common(1)[0][0] for sub in zip(l1,l2,l3)])
Not sure how taking the first element from the grouping if there is a tie makes sense as it may not be the one that tied but that is trivial to implement, just get the two most_common and check if their counts are equal:
def most_cm(lists):
    for sub in zip(*lists):
        # get two most frequent
        comm = Counter(sub).most_common(2)
        # if their values are equal just return the ele from l1
        yield comm[0][0] if len(comm) == 1 or comm[0][1] != comm[1][1] else sub[0]
We also need if len(comm) == 1 in case all the elements are the same or we will get an IndexError.
If you are talking about taking the element that comes from the earlier list in the event of a tie i.e l2 comes before l5 then that is just the same as taking any of the elements that tie.
For a decent number of sublists:
In [61]: lis = [[randint(1,10000) for _ in range(10)] for _ in range(100000)]
In [62]: list(most_cm(lis))
Out[62]: [5856, 9104, 1245, 4304, 829, 8214, 9496, 9182, 8233, 7482]
In [63]: timeit list(most_cm(lis))
1 loops, best of 3: 249 ms per loop
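If a tie should be broken by list priority as the question describes, one possible sketch is to pass the lists ordered from highest to lowest priority and, among the values sharing the top count, take the one that shows up first in that order. The function name most_common_by_priority is hypothetical, and this reading of "priority" is an assumption.

```python
from collections import Counter


def most_common_by_priority(lists):
    """lists must be ordered from highest to lowest priority."""
    result = []
    for column in zip(*lists):         # the k-th element of every list
        counts = Counter(column)
        best = max(counts.values())    # the highest occurrence count
        # The first value in priority order that reaches the highest count.
        result.append(next(v for v in column if counts[v] == best))
    return result


print(most_common_by_priority([[1, 2, 3], [1, 4, 4], [0, 2, 4]]))  # [1, 2, 4]
# Priority order l5 > l2 > l3 > l1 > l4 with values 3, 2, 1, 1, 2:
print(most_common_by_priority([[3], [2], [1], [1], [2]]))          # [2]
```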
Solution is:
a = [1, 2, 3]
b = [1, 4, 4]
c = [0, 2, 4]
print([max(set(element), key=element.count) for element in zip(a, b, c)])
That's what you're looking for:
from collections import Counter
from operator import itemgetter
l0 = [max(Counter(li).items(), key=itemgetter(1))[0] for li in zip(*l)]
If you are OK taking any one of a set of elements that are tied as most common, and you can guarantee that you won't hit an empty list within your list of lists, then here is a way using Counter (so, from collections import Counter):
l = [[1, 0, 2, 3, 4, 7, 8],
     [2, 0, 2, 1, 0, 7, 1],
     [2, 0, 1, 4, 0, 1, 8]]
res = []
for k in range(len(l[0])):
    res.append(Counter(lst[k] for lst in l).most_common()[0][0])
Doing this in IPython and printing the result:
In [86]: res
Out[86]: [2, 0, 2, 1, 0, 7, 8]
Try this:
l1 = [1,2,3]
l2 = [1,4,4]
l3 = [0,2,4]
lists = [l1, l2, l3]
print([max(set(x), key=x.count) for x in zip(*lists)])

how to make a function which tells if two lists are equivalent in python

This question will be really annoying due to the fact it is for a class and we have a lot of restrictions on our code.
The objective is to make a function to see if two lists (random ordered) have the same elements in them. So, if a=[2,5,4] and b=[4,2,5] a==b would be true. Now the restrictions are that we cannot use any built-in functions except len(). So I can't use anything like set() or the like. I also am not allowed to edit the lists, so I could not check to see if items are in both lists then delete them as I go if they are in both lists until it is empty.
With all these restrictions, I'm running out of ideas. Please help.
Is recursion allowed? That way, you don't have to modify the existing lists in place. Obviously it's not very efficient, but given your requirements this shouldn't really be an issue here...
def are_items_equal(a, b):
    # First the guard clause: if the first list is empty,
    # return True if the second list is empty too, False otherwise
    if not a:
        return not b
    # There is now at least 1 item in list a
    # Perform a linear search in the b list to find a[0]
    # (could have used a "for" loop, but I assumed here this was
    # forbidden too)
    ib = 0
    while ib < len(b):
        if a[0] == b[ib]:
            # At this point, we know that at index `ib` in list `b`
            # there is the same item as in `a[0]`
            # Both list match if the "rest" of those two lists match too
            # Check that by performing a recursive call to are_items_equal
            # (discarding the pair matched at this step)
            return are_items_equal(a[1:], b[:ib]+b[ib+1:])
        ib += 1
    # You only reach this point when `a[0]` was not found
    # in the `b` list.
    return False
Testing:
test_case = (
    ([2,5,4], [4,2,5]),
    ([2, 2, 5, 4], [4, 5, 2, 5]),
    ([2,2,5,4], [4,2,5]),
    ([2,2,5,4],[4,2,5,2]),
)
for a,b in test_case:
    print(are_items_equal(a, b), a, b)
Producing:
True [2, 5, 4] [4, 2, 5]
False [2, 2, 5, 4] [4, 5, 2, 5]
False [2, 2, 5, 4] [4, 2, 5]
True [2, 2, 5, 4] [4, 2, 5, 2]
Obviously the best solution is to use set(), but you are restricted. Here is one way to do it without any "built-in functions":
def equal_lists(l1, l2):
    for i in l1:
        if not i in l2:
            return False
    for i in l2:
        if not i in l1:
            return False
    return True
EDIT
If you want to "account for two lists with all the same elements but different numbers of each":
def occur(it):
    d = {}
    for i in it:
        try:
            d[i] += 1
        except KeyError:
            d[i] = 1
    return d
occur(a) == occur(b)
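If even a dictionary counts as off limits, the same counting idea can be done with nothing but loops and len(). The sketch below is quadratic and neither modifies the inputs nor requires hashable items; the helper names count_of and same_items are made up for this example.

```python
def count_of(item, items):
    """Count how many times item occurs in items, using only a loop."""
    total = 0
    for other in items:
        if other == item:
            total += 1
    return total


def same_items(a, b):
    """True if a and b hold the same elements the same number of times."""
    if len(a) != len(b):
        return False
    for item in a:
        # With equal lengths, matching counts for every item of a
        # leave no room for extra elements in b.
        if count_of(item, a) != count_of(item, b):
            return False
    return True


print(same_items([2, 5, 4], [4, 2, 5]))        # True
print(same_items([2, 2, 5, 4], [4, 5, 2, 5]))  # False
```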

How to find elements existing in two lists but with different indexes

I have two lists of the same length which contains a variety of different elements. I'm trying to compare them to find the number of elements which exist in both lists, but have different indexes.
Here are some example inputs/outputs to demonstrate what I mean:
>>> compare([1, 2, 3, 4], [4, 3, 2, 1])
4
>>> compare([1, 2, 3], [1, 2, 3])
0
# Each item in the first list has the same index in the other
>>> compare([1, 2, 4, 4], [1, 4, 4, 2])
2
# The 3rd '4' in both lists don't count, since they have the same indexes
>>> compare([1, 2, 3, 3], [5, 3, 5, 5])
1
# Duplicates don't count
The lists are always the same size.
This is the algorithm I have so far:
def compare(list1, list2):
    # Eliminate any direct matches (filter both lists from the same pairs,
    # so list2 is not zipped against the already filtered list1)
    pairs = [(a, b) for (a, b) in zip(list1, list2) if a != b]
    list1 = [a for (a, b) in pairs]
    list2 = [b for (a, b) in pairs]
    out = 0
    for possible in list1:
        if possible in list2:
            index = list2.index(possible)
            del list2[index]
            out += 1
    return out
Is there a more concise and eloquent way to do the same thing?
This python function does hold for the examples you provided:
def compare(list1, list2):
    D = {e:i for i, e in enumerate(list1)}
    return len(set(e for i, e in enumerate(list2) if D.get(e) not in (None, i)))
Since duplicates don't count, you can use sets to keep only the distinct elements of each list; a set only holds unique elements. Then take the elements shared between both sets and keep only those whose first position, found with list.index, differs between the two lists.
def compare(l1, l2):
    s1, s2 = set(l1), set(l2)
    shared = s1 & s2 # intersection, only the elements in both
    return len([e for e in shared if l1.index(e) != l2.index(e)])
You can actually bring this down to a one-liner if you want
def compare(l1, l2):
    return len([e for e in set(l1) & set(l2) if l1.index(e) != l2.index(e)])
Alternative:
Functionally you can use the reduce builtin (in python3, you have to do from functools import reduce first). This avoids construction of the list which saves excess memory usage. It uses a lambda function to do the work.
def compare(l1, l2):
    return reduce(lambda acc, e: acc + int(l1.index(e) != l2.index(e)),
                  set(l1) & set(l2), 0)
A brief explanation:
reduce is a functional programming construct that traditionally reduces an iterable to a single item. Here we use reduce to reduce the set intersection to a single value.
lambda functions are anonymous functions. Saying lambda x, y: x + y is like saying def func(x, y): return x + y except that the function has no name. reduce takes a function as its first argument. The first argument the lambda receives when used with reduce (acc here) is the result of the previous call, the accumulator.
set(l1) & set(l2) is a set consisting of unique elements that are in both l1 and l2. It is iterated over, and each element is taken out one at a time and used as the second argument to the lambda function.
0 is the initial value for the accumulator. We use this since we assume there are 0 shared elements with different indices to start.
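The same accumulation can be sketched without reduce: booleans count as 0 and 1, so sum() over a generator gives the total directly and still avoids building a list. Like the versions above, this only looks at the first index of each value, which is an assumption about how duplicates should be treated.

```python
def compare(l1, l2):
    """Count distinct shared values whose first index differs between the lists."""
    return sum(l1.index(e) != l2.index(e) for e in set(l1) & set(l2))


print(compare([1, 2, 3, 4], [4, 3, 2, 1]))  # 4
print(compare([1, 2, 3], [1, 2, 3]))        # 0
print(compare([1, 2, 3, 3], [5, 3, 5, 5]))  # 1
```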
I don't claim it is the simplest answer, but it is a one-liner.
import numpy as np
import itertools
l1 = [1, 2, 3, 4]
l2 = [1, 3, 2, 4]
print(len(np.unique(list(itertools.chain.from_iterable([[a,b] for a,b in zip(l1,l2) if a!= b])))))
I explain:
[[a,b] for a,b in zip(l1,l2) if a!= b]
is the list of pairs from zip(l1,l2) whose two items differ. The number of elements in this list is the number of positions at which the two lists hold different items.
Then, list(itertools.chain.from_iterable(...)) merges the component lists of a list. For instance:
>>> list(itertools.chain.from_iterable([[3,2,5],[5,6],[7,5,3,1]]))
[3, 2, 5, 5, 6, 7, 5, 3, 1]
Then, discard duplicates with np.unique(), and take len().
