Compare Sequences Python - python

Is there a way in Python to compare two sequences in lists even if they are not normalized (I think this is the right word)? For example:
a = [1,1,2,3,3,1,5]
b = [2,3,3,1,5,1,1]
c = [1,1,1,2,3,3,5]
a == b should return True, as they contain the same sequence, just from a different starting point.
c == a should return False: although they contain the same elements, they do not contain the same sequence.
The only thing I can think of is rather inelegant. I would compare the two lists and, if they are not equal, shift the last element of the list to the front and compare again, repeating this until I have shifted through the entire list once. However, I will be working with some very large lists, so this will be very inefficient.

This might be more efficient than shifting elements:
>>> a = [1, 1, 2, 3, 3, 1, 5]
>>> b = [2, 3, 3, 1, 5, 1, 1]
>>> c = [1, 1, 1, 2, 3, 3, 5]
>>> astr, bstr, cstr = ["".join(map(str, x)) for x in (a, b, c)]
>>> astr in bstr*2
True
>>> cstr in astr*2
False
What it does is basically join the lists into strings and check whether the first string is contained in the other one 'doubled'.
Using strings is probably the fastest and should work for simple cases like in the OP, where every element is a single character. With multi-digit elements it can give false positives, since [1, 12] and [11, 2] both join to "112". As a more general approach, you can apply the same idea to list slices, e.g.:
>>> any((b*2)[idx:idx+len(a)] == a for idx in range(len(a)))
True
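The slice idea above can be wrapped in a small function. This is only a sketch: the name is_rotation is made up for illustration, and the length check is an assumption about what "same sequence" should mean (lists of different lengths never match).

```python
def is_rotation(a, b):
    """Return True if list b is a cyclic rotation of list a."""
    if len(a) != len(b):
        return False
    if not a:
        return True  # two empty lists are trivially rotations of each other
    doubled = b + b  # every rotation of b appears as a window of length n in here
    n = len(a)
    return any(doubled[i:i + n] == a for i in range(n))


a = [1, 1, 2, 3, 3, 1, 5]
b = [2, 3, 3, 1, 5, 1, 1]
c = [1, 1, 1, 2, 3, 3, 5]
print(is_rotation(a, b))  # True
print(is_rotation(c, a))  # False
print(is_rotation([1, 12], [11, 2]))  # False, unlike the joined-string check
```

This compares elements directly, so it does not depend on how they print. It is still quadratic in the worst case; for very large lists a linear-time pattern-matching algorithm (for example Knuth-Morris-Pratt) run over the doubled list would be the next thing to try.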

Related

How to compare lists in python in subgroups

I'm new to Python, so any help or recommendation is appreciated.
What I'm trying to do involves two lists (not necessarily inverted).
For instance:
l1 = [1,2,3,4,5]
l2 = [5,4,3,2,1]
I want to compare them and return the common values, but not in the usual way, which in this case would return all the elements of the list, because the lists are the same, just inverted.
What I'm trying to do is the same comparison, but in stages, over growing portions of the lists: check whether there is any match up to that point, and if there is, return that element; if not, keep looking in the next group.
For instance:
The first iteration would check (using the lists defined above):
l1 = [1]
l2 = [5]
#is there any coincidence until there? -> false (keep looking)
2nd iteration:
l1 = [1, 2]
l2 = [5, 4]
#is there any coincidence until there? -> false (keep looking)
3rd iteration:
l1 = [1, 2, 3]
l2 = [5, 4, 3]
#is there any coincidence until there? -> true (returns 3,
#which is the element where the coincidence was found, not necessarily
#the same index in both lists)
Keep in mind that it compares the last element of the first list with all elements of the second list up to that point (which in the first iteration is just the first element of the second list). If there is no match, it tries the element immediately preceding the last one in the first list against all elements of the second, and so on, returning the first item that matches.
Another example to clarify:
l1 = [1,2,3,4,5]
l2 = [3,4,5,6,7]
And the output will be 3
A tricky one:
l1 = [1,2,3,4]
l2 = [2,1,4,5]
1st iteration
l1 = [1]
l2 = [2]
# No output
2nd iteration
l1 = [1,2]
l2 = [2,1]
# Output will be 2
That is because the element was found in the second list too: the item I check first is the last of the first list [1,2], and I look for it in the second list up to that point [2,1].
I need all of this to implement a bidirectional search, but I'm currently stuck at this step, as I'm not so used to for loops and list handling yet.
You can compare the elements of the two lists in the same loop:
l1 = [1,2,3,4,5]
l2 = [5,4,3,2,1]
for i, j in zip(l1, l2):
    if i == j:
        print('true')
    else:
        print('false')
It looks like you're really asking: What is (the index of) the first element that l1 and l2 have in common at the same index?
The solution:
next((i, a) for i, (a, b) in enumerate(zip(l1, l2)) if a == b)
How this works:
zip(l1, l2) pairs up elements from l1 and l2, generating tuples
enumerate() gets those tuples, and keeps track of the index, i.e. (0, (1, 5)), (1, (2, 4)), etc.
for i, (a, b) in .. generates those pairs of indices and value tuples
The if a == b ensures that only those indices and values where the values match are yielded
next() gets the next element from an iterable, you're interested in the first element that matches the condition, so that's what next() gets you here.
The working example:
l1 = [1, 2, 3, 4, 5]
l2 = [5, 4, 3, 2, 1]
i, v = next((i, a) for i, (a, b) in enumerate(zip(l1, l2)) if a == b)
print(f'index: {i}, value: {v}') # prints "index: 2, value: 3"
If you're not interested in the index, but just in the first value they have in common:
l1 = [1, 2, 3, 4, 5]
l2 = [5, 4, 3, 2, 1]
v = next(a for a, b in zip(l1, l2) if a == b)
print(v) # prints "3"
Edit: you commented and updated the question, and it's clear you don't want the first match at the same index between the lists, but rather the first common element in the heads of the lists.
(or, possibly the first element from the second list that is in the first list, which user #AndrejKesely provided an answer for - which you accepted, although it doesn't appear to answer the problem as described)
Here's a solution that gets the first match from the first part of each list, which seems to match what you describe as the problem:
l1 = [1, 2, 3, 4, 5]
l2 = [5, 2, 6, 7, 8]
v = next(next(iter(x)) for n in range(max(len(l1), len(l2))) if (x := set(l1[:n+1]) & set(l2[:n+1])))
print(v) # prints "2"
Note: the solution fails if there is no match at all, with a StopIteration. Using short-circuiting with any() that can be avoided:
x = None if not any((x := set(l1[:n+1]) & set(l2[:n+1])) for n in range(max(len(l1), len(l2)))) else next(iter(x))
print(x)
This solution has x == None if there is no match, and otherwise x will be the first match in the shortest heads of both lists, so:
l1 = [1, 2, 3, 4, 5]
l2 = [5, 2, 6, 7, 8] # result 2
l1 = [1, 2, 3, 4, 5]
l2 = [5, 6, 7, 8] # result 5
l1 = [1, 2, 3, 4, 5]
l2 = [6, 7, 8] # result None
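As an alternative to the any() one-liner, next() accepts a default as its second argument, which also avoids the StopIteration. The sketch below wraps that in a function; the name first_common_in_heads is made up for illustration, and like the one-liner it returns an arbitrary element when a head intersection contains more than one value.

```python
def first_common_in_heads(l1, l2):
    """Return an element shared by the shortest matching heads of l1 and l2, or None."""
    longest = max(len(l1), len(l2))
    # Lazily build the intersection of the first n+1 elements of each list.
    heads = (set(l1[:n + 1]) & set(l2[:n + 1]) for n in range(longest))
    # next() with a default returns None instead of raising StopIteration.
    common = next((h for h in heads if h), None)
    return None if common is None else next(iter(common))


print(first_common_in_heads([1, 2, 3, 4, 5], [5, 2, 6, 7, 8]))  # 2
print(first_common_in_heads([1, 2, 3, 4, 5], [5, 6, 7, 8]))     # 5
print(first_common_in_heads([1, 2, 3, 4, 5], [6, 7, 8]))        # None
```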
Note that also:
l1 = [1, 2, 3]
l2 = [4, 3, 2] # result 2, not 3
Both 2 and 3 seem to be valid answers here; it's not clear from your description why 3 should be favoured over 2.
If you do need the one of the two possible answers that comes first in l2, the solution would be a bit more complicated still: sets are unordered by definition, so changing the order of l1 and l2 in the answer won't matter.
If you care about that order, this works:
x = None if not any(x := ((set(l1[:n//2+1+n%2]) & set(l2[:n//2+1]))) for n in range(max(len(l1), len(l2)) * 2)) else next(iter(x))
This also works for lists with different lengths, unlike the more readable answer by user #BenGrossmann. Note that they have some efficiency in reusing the constructed sets and adding one element at a time, which also allows them to remember the last element added to the set corresponding with the first list, which is why they also correctly favor 3 over 2 in [[1, 2, 3], [4, 3, 2]].
If the last answer is what you need, you should consider amending their answer (for example using zip_longest) to deal correctly with lists of different lengths, since it will be more efficient for longer lists, and is certainly more readable.
Taking the solution from #BenGrossman, but generalising it for any number of lists, with any number of elements, and favouring the ordering you specified:
from itertools import zip_longest
lists = [[1, 2, 3, 4, 5],
[6, 7, 8, 5, 4]]
sets = [set() for _ in range(len(lists))]
for xs in zip_longest(*lists):
    for x, s in zip(xs, sets):
        s.add(x)
    if i := set.intersection(*sets):
        v = sorted([(lists[0].index(x), x) for x in i])[-1][1]
        break
else:
    v = None
print(v)
This works as described for all the examples, as well as for lists of unequal length, and will favour the elements that are farthest back in the first list (and thus earlier in the others).
The following can be made more efficient, but does work.
lists = [[1,2,3,4,5], # input to the script
[5,4,3,2,1]]
sets = [set(), set()]
for a,b in zip(*lists):
    sets[0].add(a)
    sets[1].add(b)
    if sets[0]&sets[1]:
        print("first element in first overlap:")
        print(a)
        break
else:
    print("no overlap")
This results in the output
first element in first overlap:
3
Using lists = [[5,7,6],[7,5,4]] instead results in
first element in first overlap:
7

How to determine if sequence is within the list of ints?

I am having trouble when running a python program. The purpose is to determine whether the sequence 1, 2, 3 is within a given list of ints. When it runs two of the tests, the code works. However, the third test fails and I cannot figure out why.
My code is:
def has123(nums):
    s = ''.join(str(i) for i in sorted(nums))
    if '123' in s:
        return True
    else:
        return False
When passed through the argument [1, 1, 2, 3, 1] and [1, 1, 2, 4, 1], it returns the correct output, but not for [1, 1, 2, 1, 2, 3].
Remove sorted() from s = ''.join(str(i) for i in sorted(nums)).
sorted() sorts the list and puts it in ascending order.
So in your case [1, 1, 2, 1, 2, 3] is converted to '111223' when sorted() is used, and the pattern can no longer be found.
In case you do need the sequence to be contiguous and in order (which is how I understand the wording), you can:
iterate all the 1 in your list,
for each 1, check if it is the beginning of a 1,2,3
def has123(nums):
    all_1_idx = (i for i,n in enumerate(nums) if n==1)
    for i in all_1_idx:
        if nums[i:i+3] == [1,2,3]:
            return True
    return False
The order of the numbers matters in this case, so don't sort the array.
What you are doing is sorting the array and then making a string from the sorted array. The positions of the elements change, so your pattern no longer matches.
def has123(nums):
    s = ''.join(str(i) for i in nums)
    if '123' in s:
        return True
    else:
        return False
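One caveat with the string version: joining without a separator can match across element boundaries, e.g. [12, 3] also joins to '123'. A sketch that compares list slices instead avoids this; the name contains_run and its pattern parameter are made up for illustration and are not part of the original exercise.

```python
def contains_run(nums, pattern):
    """Return True if pattern occurs in nums as a contiguous, ordered run."""
    n = len(pattern)
    # Slide a window of length n over nums and compare it element by element.
    return any(nums[i:i + n] == pattern for i in range(len(nums) - n + 1))


print(contains_run([1, 1, 2, 3, 1], [1, 2, 3]))     # True
print(contains_run([1, 1, 2, 4, 1], [1, 2, 3]))     # False
print(contains_run([1, 1, 2, 1, 2, 3], [1, 2, 3]))  # True
print(contains_run([12, 3], [1, 2, 3]))             # False
print('123' in ''.join(str(i) for i in [12, 3]))    # True (the false positive)
```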

Array Splitting in Python With Specific Input

If you are given two arrays as an input in the same line such as
[4,2,1,5,7],[4,1,2,3,5,7,1,2,7]
Is it possible to create separate arrays out of the above input?
arr1 = [4,2,1,5,7]
arr2 = [4,1,2,3,5,7,1,2,7]
I tried to use split(',') but since they are used in the actual arrays this does not work.
The length of the arrays can vary and the example above is just a sample.
Any help would be appreciated!
I would suggest "disguising" the input as a well-formed list by adding the outer brackets and then using literal_eval:
import ast
s = "[4,2,1,5,7],[4,1,2,3,5,7,1,2,7]"
parts = ast.literal_eval("[" + s + "]")
#[[4, 2, 1, 5, 7], [4, 1, 2, 3, 5, 7, 1, 2, 7]]
Or do not add anything and treat the input as a tuple of lists:
parts = ast.literal_eval(s)
#([4, 2, 1, 5, 7], [4, 1, 2, 3, 5, 7, 1, 2, 7])
This isn't the easy way, but if the goal is to learn to manipulate strings and lists, you can actually parse this the hard way as a stream of characters.
a = "[4,2,1,5,7],[45,1,2,3,5,7,100,2,7]"
l = []
current_n = ''
current_l = None
for c in a:
    if c == '[':
        current_l = []
    elif c == ",":
        if current_l is not None:
            current_l.append(int(current_n))
            current_n = ''
    elif c.isdigit():
        current_n += c
    elif c == "]":
        current_l.append(int(current_n))
        l.append(current_l)
        current_n = ''
        current_l = None
l1, l2 = l
print(l1, l2)
# [4, 2, 1, 5, 7] [45, 1, 2, 3, 5, 7, 100, 2, 7]
Not something you would typically do, but a good exercise, and its simplicity should make it quite fast.
What you have there, once converted from a string using eval, is a 2-element tuple containing two lists. (The outer round parentheses are not mandatory in this situation.)
You could unpack it into two variables as follows:
s = '[4,2,1,5,7],[4,1,2,3,5,7,1,2,7]'  # named s to avoid shadowing the built-in str
arr1, arr2 = eval(s)
Note: if the input string could derive from third-party input (for example in a server application) then eval should not be used for security reasons because it can allow for execution of arbitrary code, and ast.literal_eval should be used instead. (See separate answer by DYZ.) This will also return a 2-tuple of lists in the case of the input shown above, so the unpacking using var1, var2 = ... is unaffected.
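Putting the note above into practice, here is a minimal sketch that uses ast.literal_eval and checks the shape of the result before unpacking. The function name parse_two_lists and the specific validation are assumptions for illustration, not something the question requires.

```python
import ast


def parse_two_lists(text):
    """Parse a string like '[1,2],[3,4]' into two lists without using eval."""
    parsed = ast.literal_eval(text)  # only evaluates Python literals
    if (not isinstance(parsed, tuple) or len(parsed) != 2
            or not all(isinstance(p, list) for p in parsed)):
        raise ValueError("expected exactly two lists separated by a comma")
    first, second = parsed
    return first, second


arr1, arr2 = parse_two_lists("[4,2,1,5,7],[4,1,2,3,5,7,1,2,7]")
print(arr1)  # [4, 2, 1, 5, 7]
print(arr2)  # [4, 1, 2, 3, 5, 7, 1, 2, 7]
```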

python - Comparing two lists to see if one occurs in another consecutively

I've been trying to make a function that can take two lists of any size (say, list A and list B) and sees if list B occurs in list A, but consecutively and in the same order. If the above is true, it returns True, else it'll return False
e.g.
A: [9,0,1,2,3,4,5,6,7,8] and B: [1,2,3,4,5,6] is successful
A: [1,2,0,3,4,0,5,6,0] and B: [1,2,3,4,5,6] is unsuccessful.
A: [1,2,3,4,5,6] and B: [6,5,3,2,1,4] fails because, despite having the same numbers, they aren't in the same order
I've tried doing this using nested loops so far and am a bit confused as to where to go
Just try this:
L1 = [9,0,1,2,3,4,5,6,7,8]
L2 = [1,2,3,4,5,6]
c = 0
w = 0
for a in range(len(L2)):
    for b in range(w+1, len(L1)):
        if L2[a] == L1[b]:
            c = c+1
            w = b
            break
        else:
            c = 0
    if c == len(L2):
        print('yes')
        break
Here you check whether each element of L2 is in L1; if so, you break out of the inner loop, remember where you left off (w), and then check whether the next element of L2 matches the next element of L1, and so on. Note that the inner loop starts at w+1, so L1[0] is never examined.
The last part checks whether this happened as many times as the length of L2. If so, you know that L2 occurs in L1.
If your arrays are not huge and you can map each element of your array to a string, you can use:
list1 = [9,0,1,2,3,4,5,6,7,8]
list2 = [1,2,3,4,5,6]
if ''.join(str(e) for e in list2) in ''.join(str(e) for e in list1):
    print('true')
It just makes two strings from the lists and then uses 'in' to find any occurrence.
Use the any() function:
any(A[i:i+len(B)] == B for i in range(len(A) - len(B) + 1))
I converted each list into a string and then looked for one string inside the other.
When the list is converted to a string it becomes
str(a)='[9, 0, 1, 2, 3, 4, 5, 6, 7, 8]'
which, once the brackets are stripped, becomes
str(a).strip('[]')='9, 0, 1, 2, 3, 4, 5, 6, 7, 8'
Now the problem is reduced to checking whether one string is a substring of the other, so we can use the in operator.
The solution
a=[9,0,1,2,3,4,5,6,7,8]
b=[1,2,3,4,5,6]
print(str(b).strip('[]') in str(a).strip(']['))
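One thing to watch with the stripped-string check: it can match in the middle of a number, so [1, 2] appears to occur in [11, 2]. A possible workaround, sketched below under the assumption that both lists are non-empty and hold plain integers, is to surround every element with separators before searching. The helper names _delimited and occurs_in are made up for illustration.

```python
def _delimited(values):
    # Produces e.g. ", 9, 0, 1," so every element has a separator on both sides.
    return ", " + ", ".join(map(str, values)) + ","


def occurs_in(a, b):
    """Return True if list b occurs consecutively in list a (string-based check)."""
    return _delimited(b) in _delimited(a)


a = [9, 0, 1, 2, 3, 4, 5, 6, 7, 8]
b = [1, 2, 3, 4, 5, 6]
print(occurs_in(a, b))                                     # True
print(str([1, 2]).strip('[]') in str([11, 2]).strip('[]'))  # True (false positive)
print(occurs_in([11, 2], [1, 2]))                           # False
```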
Try this:
L1 = [9,2,1,2,0,4,5,6,7,8]
L2 = [1,2,3,4,5,6]
def sameorder(L1,L2):
    for i in range(len(L1)-len(L2)+1):
        if L1[i:len(L2)+i]==L2:
            return True
    return False
You can create sublists of a that can be analyzed:
def is_consecutive(a, b):
    # +1 so that a match ending at the last element of a is also checked
    return any(all(c == d for c, d in zip(b, i)) for i in [a[e:e+len(b)] for e in range(len(a)-len(b)+1)])
cases = [[[9, 0, 1, 2, 3, 4, 5, 6, 7, 8], [1, 2, 3, 4, 5, 6]], [[1, 2, 0, 3, 4, 0, 5, 6, 0], [1, 2, 3, 4, 5, 6]], [[1, 2, 3, 4, 5, 6], [6, 5, 3, 2, 1, 4]]]
final_cases = {"case_{}".format(i):is_consecutive(*a) for i, a in enumerate(cases, start=1)}
Output:
{'case_3': False, 'case_2': False, 'case_1': True}

list match in python: get indices of a sub-list in a larger list

For two lists,
a = [1, 2, 9, 3, 8, ...] (no duplicate values in a, but a is very big)
b = [1, 9, 1,...] (set(b) is a subset of set(a), 1<<len(b)<<len(a))
indices = get_indices_of_a(a, b)
How can I make get_indices_of_a return indices = [0, 2, 0,...] such that array(a)[indices] = b? Is there a faster method than using a.index, which is taking too long?
Making b a set is a fast method of matching lists and returning indices (see compare two lists in python and return indices of matched values ), but it will lose the index of the second 1 as well as the sequence of the indices in this case.
A fast method (when a is a large list) would be using a dict to map values in a to indices:
>>> index_dict = dict((value, idx) for idx,value in enumerate(a))
>>> [index_dict[x] for x in b]
[0, 2, 0]
This will take linear time in the average case, compared to using a.index which would take quadratic time.
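For completeness, a sketch of how the dict lookup might be packaged as the get_indices_of_a function from the question. Using setdefault to keep the first occurrence is an assumption added here so that the result matches a.index even if a did contain duplicates; the question says it does not.

```python
def get_indices_of_a(a, b):
    """Return, for each item of b, its index in a (first occurrence)."""
    index_of = {}
    for idx, value in enumerate(a):
        # setdefault keeps the earliest index, matching list.index semantics
        index_of.setdefault(value, idx)
    # Raises KeyError if some item of b is not present in a.
    return [index_of[x] for x in b]


a = [1, 2, 9, 3, 8]
b = [1, 9, 1]
print(get_indices_of_a(a, b))  # [0, 2, 0]
```

Building the dict is one pass over a, and each lookup is constant time on average, so the total work grows with len(a) + len(b) rather than their product.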
Presuming we are working with smaller lists, this is as easy as:
>>> a = [1, 2, 9, 3, 8]
>>> b = [1, 9, 1]
>>> [a.index(item) for item in b]
[0, 2, 0]
On larger lists, this will become quite expensive.
(If there are duplicates in a, the first occurrence will always be the one referenced in the resulting list. If set(b) <= set(a) does not hold, you will get a ValueError.)
