Python intersection of lists while not having the same index

I have a curious case, and after some time I have not come up with an adequate solution.
Say you have two lists and you need to find items that have the same index.
x = [1,4,5,7,8]
y = [1,3,8,7,9]
I am able to get a correct intersection of those which appear in both lists with the same index by using the following:
matches = [i for i, (a,b) in enumerate(zip(x,y)) if a==b]
This would return:
[0,3]
I am able to get a simple intersection of both lists with the following (there are many other ways; this is just an example):
intersected = set(x) & set(y)
This would return this set:
{1, 7, 8}
Here's the question: I'm looking for ideas on how to get a list of items (as in the second example) that excludes the matches above, i.e. items that are in both lists but not at the same position.
In other words, I'm looking for items in x that do not share the same index in y.
The desired result would be the index position of "8" in y, or [2]
Thanks in advance

You're so close: iterate through y; look for a value that is in x, but not at the same position:
offset = [i for i, a in enumerate(y) if a in x and a != x[i] ]
Result:
[2]
Including the upgrade pault suggested in response to Martijn's comment: pre-processing the common values into a set reduces the complexity for large lists, because each membership test becomes a constant-time set lookup instead of a scan of x:
>>> both = set(x) & set(y)
>>> offset = [i for i, a in enumerate(y) if a in both and a != x[i] ]
As PaulT pointed out, this is still quite readable at OP's posted level.
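A minimal sketch wrapping the set-based version above in a function. The name misplaced_indices is hypothetical, and the length guard is an assumption for inputs of different lengths: as written above, x[i] raises IndexError whenever y is longer than x, so here any position past the end of x is treated as "not the same position".

```python
def misplaced_indices(x, y):
    """Indices i in y whose value also occurs in x, but not at x[i]."""
    # Precompute the common values once; set membership is O(1) on average.
    both = set(x) & set(y)
    # Positions past the end of x cannot match x[i], so they count as misplaced
    # (an assumption for lists of different lengths).
    return [i for i, a in enumerate(y)
            if a in both and (i >= len(x) or a != x[i])]


print(misplaced_indices([1, 4, 5, 7, 8], [1, 3, 8, 7, 9]))  # [2]
```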

I'd create a dictionary of indices for the first list, then use it to test whether the value from the second list is a) in that dictionary, and b) not present at the current index:
def non_matching_indices(x, y):
    # Map each value in x to the set of positions where it occurs.
    x_indices = {}
    for i, v in enumerate(x):
        x_indices.setdefault(v, set()).add(i)
    # The default {i} makes values that are missing from x fail the test too.
    return [i for i, v in enumerate(y) if i not in x_indices.get(v, {i})]
The above takes O(len(x) + len(y)) time; a single full scan through the one list, then another full scan through the other, where each test to include i is done in constant time.
You really don't want to use a value in x containment test here, because that requires a scan (a loop) over x to see if that value is really in the list or not. That takes O(len(x)) time, and you do that for each value in y, which means that the function takes O(len(x) * len(y)) time.
You can see the speed differences when you run a time trial with a larger list filled with random data:
>>> import random, timeit
>>> def using_in_x(x, y):
...     return [i for i, a in enumerate(y) if a in x and a != x[i]]
...
>>> x = random.sample(range(10**6), 1000)
>>> y = random.sample(range(10**6), 1000)
>>> for f in (using_in_x, non_matching_indices):
...     timer = timeit.Timer("f(x, y)", "from __main__ import f, x, y")
...     count, total = timer.autorange()
...     print(f"{f.__name__:>20}: {total / count * 1000:6.3f}ms")
...
          using_in_x: 10.468ms
non_matching_indices:  0.630ms
So with two lists of 1000 numbers each, if you use value in x testing, you easily take 15 times as much time to complete the task.

x = [1,4,5,7,8]
y = [1,3,8,7,9]
result = []
for e in x:
    if e in y and x.index(e) != y.index(e):
        result.append((x.index(e), y.index(e), e))
print(result)  # list of (x_position, y_position, value) tuples
This version goes item by item through the first list and checks whether the item is also in the second list. If it is, it compares the indices for the found item in both lists and if they are different then it stores both indices and the item value as a tuple with three values in the result list.
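The repeated x.index(e) and y.index(e) calls each rescan a list, so this version is quadratic in the worst case. A sketch that produces the same output with two first-occurrence dictionaries built up front (mismatched_positions is a hypothetical name; keeping only the first position of each value mimics what list.index reports):

```python
def mismatched_positions(x, y):
    """Same result as the loop above, without repeated list.index scans."""
    # Record the first position of each value, as list.index would report it.
    x_first, y_first = {}, {}
    for i, v in enumerate(x):
        x_first.setdefault(v, i)
    for j, v in enumerate(y):
        y_first.setdefault(v, j)
    result = []
    for e in x:
        # Same test as above: value present in y, at a different first position.
        if e in y_first and x_first[e] != y_first[e]:
            result.append((x_first[e], y_first[e], e))
    return result


print(mismatched_positions([1, 4, 5, 7, 8], [1, 3, 8, 7, 9]))  # [(4, 2, 8)]
```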

Related

If the input number is in the list add its index to a new one

I want to check if the input number is in the list, and if so - add its index in the original list to the new one. If it's not in the list - I want to add a -1.
I tried using the for loop and adding it like that, but it is kind of bad on the speed of the program.
n = int(input())
k = [int(x) for x in input().split()]
z = []
m = int(input())
for i in range(m):
    a = int(input())
    if a in k: z.append(k.index(a))
    else: z.append(-1)
The input should look like this :
3
2 1 3
1
8
3
And the output should be :
1
-1
2
How can I do what I'm trying to do more efficiently/quickly?
There are many approaches to this problem. This is typical when you're first starting in programming: the simpler the problem, the more options you have. Which option to choose depends on what you have and what you want.
In this case we're expecting user input of this form:
3
2 1 3
1
8
3
One approach is to build a dict for lookups instead of using list operations. Both a in k and k.index(a) scan the list, whereas a dict lookup takes constant time on average, so this gives better performance overall. You can use enumerate to get both the index i and the value x from the list in the user input, then use int(x) as the key and associate it with the index.
The key should always be the data you have, and the value should always be the data you want. (We have a value, we want the index)
n = int(input())
k = {}
for i, x in enumerate(input().split()):
    k[int(x)] = i
z = []
for i in range(n):
    a = int(input())
    if a in k:
        z.append(k[a])
    else:
        z.append(-1)
print(z)
k looks like:
{2: 0, 1: 1, 3: 2}
This way you can look up k[3] and get 2 in O(1), i.e. constant, average time.
(See: Python: List vs Dict for look up table)
There is a structure known as defaultdict which allows you to specify behaviour when a key is not present in the dictionary. This is particularly helpful in this case, as we can just request from the defaultdict and it will return the desired value either way.
from collections import defaultdict
n = int(input())
k = defaultdict(lambda: -1)
for i, x in enumerate(input().split()):
    k[int(x)] = i
z = []
for i in range(n):
    a = int(input())
    z.append(k[a])
print(z)
While this does not speed up your program, it does make your second for loop easier to read. It also makes it easier to move into the comprehension in the next section.
(See: How does collections.defaultdict work?)
With these things in place, we can use, yes, a list comprehension to very slightly speed up the construction of z. (See: Are list-comprehensions and functional functions faster than “for loops”?)
from collections import defaultdict
n = int(input())
k = defaultdict(lambda: -1)
for i, x in enumerate(input().split()):
    k[int(x)] = i
z = [k[int(input())] for i in range(n)]
print(z)
All code snippets print z as a list:
[1, -1, 2]
See Printing list elements on separated lines in Python if you'd like different print outs.
Note: The index method finds the index of the first occurrence of a value in a list. Because of the way the dict is built, k stores the index of the last occurrence. If you need to mimic index exactly, you should ensure that a later index does not overwrite an earlier one:
for i, x in enumerate(input().split()):
    x = int(x)
    if x not in k:  # keep only the first occurrence, like list.index
        k[x] = i
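A variant not used in the answer above, offered as a sketch: dict.get with a default of -1 gives the same fallback as defaultdict without its side effect, since looking up a missing key in a defaultdict inserts that key. The function name lookup_indices is hypothetical, and it takes the list and the queries as arguments instead of reading input() so it can be tried directly:

```python
from collections import defaultdict


def lookup_indices(values, queries):
    """Return the first index of each query in values, or -1 if absent."""
    first = {}
    for i, v in enumerate(values):
        first.setdefault(v, i)  # keep the first occurrence, like list.index
    return [first.get(q, -1) for q in queries]


print(lookup_indices([2, 1, 3], [1, 8, 3]))  # [1, -1, 2]

# By contrast, a defaultdict grows on every missing lookup:
d = defaultdict(lambda: -1)
d[8]
print(8 in d)  # True
```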
Adapt this solution for your problem.
def test(list1, value):
    try:
        return list1.index(value)
    except ValueError as e:
        print(e)
        return -1
list1 = [2, 1, 3]
in1 = [1, 8, 3]
res = [test(list1, i) for i in in1]
print(res)
Output:
8 is not in list
[1, -1, 2]

Using list comprehension to calculate the sum of all list elements

Don't know if I'm just being stupid right now, but I'm trying to convert a list of ints into one int (their sum). The problem is that I'm trying to do it with just a list comprehension, and I'm failing every time.
class MathStuff():
    def add_stuff(self, *stuff):
        items = 0
        numbers = (i for i in stuff)
        items += [i for i in e]
        #trying to do "for i in (i for i in stuff)" but assign it to a variable
I've tried multiple ways to do this without a "for loop" but I'm hitting a brick wall with my google searching.
If you have a list of numbers, l, and you don't want to use sum, I suppose you could do the usual:
l = range(1, 100)
s = 0
for i in l:
    s += i
Or a more functional approach:
from operator import add
from functools import reduce
l = range(1, 100)
reduce(add, l)
I don't see how comprehensions could help you solve this however.
If you really want to use list comprehensions, you can build, for each input number, a list with that many entries, flatten those lists, and finally use the flattened length as the sum. This only works for integers, and you have to handle positive and negative values separately:
long_pos = [[i for i in range(l)] for l in stuff if l > 0]
long_neg = [[i for i in range(abs(l))] for l in stuff if l < 0]
flat_pos = [i for sub in long_pos for i in sub]
flat_neg = [i for sub in long_neg for i in sub]
items = len(flat_pos) - len(flat_neg)
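A different technique from the answers above, if the goal is literally to assign to a variable from inside a comprehension: on Python 3.8 or later, an assignment expression (:=) inside a list comprehension binds the name in the enclosing function scope, so the comprehension can carry a running total. A sketch of add_stuff written that way (sum(stuff) remains the simplest option):

```python
class MathStuff:
    def add_stuff(self, *stuff):
        total = 0
        # On Python 3.8+, ":=" inside a comprehension rebinds `total` in this
        # method's scope on every iteration, so the list holds running sums.
        running = [total := total + i for i in stuff]
        # `running` now holds the partial sums; `total` is the final sum.
        return total


print(MathStuff().add_stuff(1, 2, 3))  # 6
```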

Most efficient way to remove entries in a list

I have a massive 4D data set, spread throughout 4 variables, x_list, y_list, z_list, and i_list. Each is a list of N scalars, with X, Y, and Z representing the point's position in space, and I representing intensity.
I already have a function that picks through and marks negligible points (those whose intensity is too low) for deletion, by setting their intensity to 0. However, when I run this on my 2-million point set, the deletion process takes hours.
Currently, I am using the .pop(index) command to remove the data points, because it does so very cleanly. Here is the code:
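counter = 0
i = 0
# Loop on the index: a "for entry in i_list" loop would stop early here,
# because each pop shortens i_list while i stays put.
while i < len(i_list):
    if (i_list[i] == 0):
        x_list.pop(i)
        y_list.pop(i)
        z_list.pop(i)
        i_list.pop(i)
        counter += 1
        print (counter, "points removed")
    else:
        i += 1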
How can I do this more efficiently?
I think it'll be faster to create new empty lists for each existing list, and append items to them if i_list[i] != 0. Look up the time complexity of the operations you're doing, and you'll see that deleting an item from the middle of a list is O(n) (everything after it has to shift), whereas appending is amortized O(1). Currently you're doing a lot of O(n) deletes with a pretty large n, which will be very slow.
So something like:
new_x = []
new_y = []
new_z = []
new_i = []
for index in range(len(i_list)):
    if i_list[index] != 0:
        new_x.append(x_list[index])
        new_y.append(y_list[index])
        new_z.append(z_list[index])
        new_i.append(i_list[index])
Going further, you should look into numpy arrays, where subsetting to find the set of items where i_list != 0 would be very fast.
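A sketch of the NumPy route, assuming NumPy is installed and the four lists have equal length: a boolean mask built from the intensities selects the surviving points from every array in one vectorized step. The function name drop_zero_intensity is hypothetical:

```python
import numpy as np


def drop_zero_intensity(x_list, y_list, z_list, i_list):
    """Keep only the points whose intensity is non-zero."""
    x, y, z, i = (np.asarray(a) for a in (x_list, y_list, z_list, i_list))
    keep = i != 0  # boolean mask, one entry per point
    return x[keep], y[keep], z[keep], i[keep]


x, y, z, i = drop_zero_intensity([0.0, 1.0, 2.0], [3.0, 4.0, 5.0],
                                 [6.0, 7.0, 8.0], [0, 9, 0])
print(x, y, z, i)
```

For millions of points this avoids both the O(n) pops and the per-element Python loop; if you keep the data as arrays from the start, you also skip the list-to-array conversion.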
You should use del:
array = [1, 2, 3]
del array[0]
gives: [2, 3]
And most important: calling print() inside a loop over a large data set is a performance killer. Most of the time is consumed by printing. Here's an example:
>>> from time import time
>>> def test1(n):
...     for i in range(n):
...         print(i)
...
>>> def test2(n):
...     for i in range(n):
...         i += 1
...
>>> def wrapper():
...     t1 = time()
...     test1(1000)
...     t2 = time()
...     test2(1000)
...     t3 = time()
...     print("Test1: %s\ntest2: %s: " % (t2-t1, t3-t2))
...
>>> wrapper()
And the output is:
(lots of numbers)
Test1: 0.46030712127685547
test2: 0.0:
This is a job for the happy list comprehension!
x_prime_list = [x for (index, x) in enumerate(x_list)
if i_list[index] != 0]
Which pairs up members of x_list with their ordinal address using enumerate(). It puts each member x in the new list if and only if i_list[index] is not zero (otherwise it adds nothing to the list).
The advantage of a list comprehension over an explicit loop with append is that the appending is done by a dedicated interpreter instruction instead of a repeated lookup and call of list.append, so it is usually faster; and unlike the code you posted, it builds the result in one pass instead of deleting items one by one.
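The comprehension above filters one list; the same condition has to be applied to y_list, z_list, and i_list as well. A sketch that keeps all four lists aligned in a single pass by zipping them together (filter_points is a hypothetical name):

```python
def filter_points(x_list, y_list, z_list, i_list):
    """Drop every point whose intensity is 0, keeping the four lists aligned."""
    kept = [p for p in zip(x_list, y_list, z_list, i_list) if p[3] != 0]
    if not kept:
        return [], [], [], []
    # zip(*kept) transposes the surviving (x, y, z, i) tuples back into columns.
    xs, ys, zs, ints = (list(col) for col in zip(*kept))
    return xs, ys, zs, ints


print(filter_points([1, 2, 3], [4, 5, 6], [7, 8, 9], [0, 10, 20]))
```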

Python: Fastest Way to compare arrays elementwise

I am looking for the fastest way to output the index of the first difference of two arrays in Python. For example, let's take the following two arrays:
test1 = [1, 3, 5, 8]
test2 = [1]
test3 = [1, 3]
Comparing test1 and test2, I would like to output 1, while the comparison of test1 and test3 should output 2.
In other words I look for an equivalent to the statement:
import numpy as np
np.where(np.where(test1 == test2, test1, 0) == '0')[0][0]
with varying array lengths.
Any help is appreciated.
For lists this works:
from itertools import zip_longest

def find_first_diff(list1, list2):
    for index, (x, y) in enumerate(zip_longest(list1, list2,
                                               fillvalue=object())):
        if x != y:
            return index
zip_longest pads the shorter list with None or with a provided fill value. The standard zip does not work if the difference is caused by different list lengths rather than actual different values in the lists.
On Python 2 use izip_longest.
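A quick illustration of the difference between zip and zip_longest described above (the lists are made up for the example):

```python
from itertools import zip_longest

a = [1, 3, 5]
b = [1]
# zip stops at the shorter input, so the length difference disappears.
print(list(zip(a, b)))  # [(1, 1)]
# zip_longest keeps going and fills the gaps, so position 1 can be compared.
print(list(zip_longest(a, b)))  # [(1, 1), (3, None), (5, None)]
```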
Update: created a unique fill value to avoid potential problems with None as a list value. Every object() instance is unique:
>>> o1 = object()
>>> o2 = object()
>>> o1 == o2
False
This pure Python approach might be faster than a NumPy solution. This depends on the actual data and other circumstances.
Converting a list into a NumPy array also takes time. This might actually take longer than finding the index with the function above. If you are not going to use the NumPy array for other calculations, the conversion might cause considerable overhead.
NumPy always searches the full array. If the difference comes early, you do a lot more work than you need to.
NumPy creates a bunch of intermediate arrays. This costs memory and time.
NumPy needs to construct intermediate arrays with the maximum length. Comparing many small with very large arrays is unfavorable here.
In general, in many cases NumPy is faster than a pure Python solution. But each case is a bit different and there are situations where pure Python is faster.
With NumPy arrays (which will be faster for big arrays), you could compare the overlapping parts (slicing the longer array to the length of the shorter) and then check the lengths, something like the following:
import numpy as np
# test1 and test2 must be NumPy arrays; with plain lists, != compares the whole lists at once
test1 = np.asarray(test1)
test2 = np.asarray(test2)
n = min(len(test1), len(test2))
x = np.where(test1[:n] != test2[:n])[0]
if len(x) > 0:
    ans = x[0]
elif len(test1) != len(test2):
    ans = n
else:
    ans = None
EDIT - despite this being voted down I will leave my answer up here in case someone else needs to do something similar.
If the starting arrays are large and already NumPy arrays, this is the fastest method. Also, I had to modify Andy's code to get it to work. In order: 1. my suggestion, 2. Paidric's (now removed, but the most elegant), 3. Andy's accepted answer, 4. zip (non-NumPy), 5. vanilla Python without zip as per leekaiinthesky
0.1ms, 9.6ms, 0.6ms, 2.8ms, 2.3ms
If the conversion to ndarray is included in timeit, then the non-NumPy no-zip method is fastest
7.1ms, 17.1ms, 7.7ms, 2.8ms, 2.3ms
and even more so if the difference between the two lists is at around index 1,000 rather than 10,000
7.1ms, 17.1ms, 7.7ms, 0.3ms, 0.2ms
import timeit
setup = """
import numpy as np
from itertools import zip_longest
list1 = [1 for i in range(10000)] + [4, 5, 7]
list2 = [1 for i in range(10000)] + [4, 4]
test1 = np.array(list1)
test2 = np.array(list2)
def find_first_diff(l1, l2):
    for index, (x, y) in enumerate(zip_longest(l1, l2, fillvalue=object())):
        if x != y:
            return index
def findFirstDifference(list1, list2):
    minLength = min(len(list1), len(list2))
    for index in range(minLength):
        if list1[index] != list2[index]:
            return index
    return minLength
"""
fn = ["""
n = min(len(test1), len(test2))
x = np.where(test1[:n] != test2[:n])[0]
if len(x) > 0:
    ans = x[0]
elif len(test1) != len(test2):
    ans = n
else:
    ans = None""",
"""
# np.isin replaces np.in1d, which is deprecated since NumPy 2.0
x = np.where(np.isin(list1, list2) == False)[0]
if len(x) > 0:
    ans = x[0]
else:
    ans = None""",
"""
x = test1
y = np.resize(test2, x.shape)
x = np.where(np.where(x == y, x, 0) == 0)[0]
if len(x) > 0:
    ans = x[0]
else:
    ans = None""",
"""
ans = find_first_diff(list1, list2)""",
"""
ans = findFirstDifference(list1, list2)"""]
for f in fn:
    print(timeit.timeit(f, setup, number = 1000))
Here's one way to do it:
def compare_lists(lista, listb):
    """
    Compare two lists and return the first index where they differ. If
    they are equal, return the list length.
    """
    for position, (a, b) in enumerate(zip(lista, listb)):
        if a != b:
            return position
    return min([len(lista), len(listb)])
The algorithm is simple: zip the two lists (on Python 2, itertools.izip is the more efficient, lazy version), then compare them element by element.
The enumerate function gives the index position, which we can return if a discrepancy is found.
If we exit the for loop without returning, one of two things has happened:
The two lists are identical. In this case, we want to return the length of either list.
The lists are of different lengths and are equal up to the length of the shorter list. In this case, we want to return the length of the shorter list.
In either case, the min(...) expression is what we want.
This function has a bug: if you compare two empty lists, it returns 0, which seems wrong. I'll leave it to you to fix it as an exercise.
The fastest algorithm would compare every element up to the first difference and no more. So iterating through the two lists pairwise like that would give you this:
def findFirstDifference(list1, list2):
    minLength = min(len(list1), len(list2))
    for index in range(minLength):
        if list1[index] != list2[index]:
            return index
    return minLength  # the two lists agree where they both have values, so return the next index
Which gives the output you want:
print(findFirstDifference(test1, test3))
> 2
Thanks for all of your suggestions. I just found a much simpler way for my problem:
import numpy as np
x = np.array(test1)
y = np.resize(np.array(test2), x.shape)
np.where(np.where(x == y, x, 0) == 0)[0][0]
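One caveat worth checking with this approach: np.resize fills the new shape by repeating the shorter array from the start, not by padding it, so a shorter list that happens to repeat the start of the longer one is not reported as different. (Separately, where(x == y, x, 0) == 0 also flags positions where x itself is 0.) A small sketch illustrating the repetition, with a hypothetical helper first_diff_resize that uses a direct x != y comparison in place of the nested where:

```python
import numpy as np

# np.resize repeats the input cyclically to fill the requested shape.
print(np.resize(np.array([1, 3]), (4,)))  # [1 3 1 3]


def first_diff_resize(test1, test2):
    """First index where test1 differs from np.resize(test2), or None."""
    x = np.array(test1)
    y = np.resize(np.array(test2), x.shape)
    diffs = np.nonzero(x != y)[0]
    return int(diffs[0]) if diffs.size else None


print(first_diff_resize([1, 3, 5, 8], [1, 3]))  # 2
print(first_diff_resize([1, 3, 1, 3], [1, 3]))  # None, although the lengths differ
```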
Here's an admittedly not very pythonic, numpy-free stab:
b = list(zip(test1, test2))
c = 0
while b:
    if b[0][0] != b[0][1]:
        break
    c = c + 1
    b = b[1:]
print(c)
For Python 3.x:
def first_diff_index(ls1, ls2):
    l = min(len(ls1), len(ls2))
    return next((i for i in range(l) if ls1[i] != ls2[i]), l)
(On Python 2, substitute xrange for range.)

How to pick the largest number in a matrix of lists in python?

I have a list-of-list-of-lists, where the first two act as a "matrix", where I can access the third list as
list3 = m[x][y]
and the third list contains a mix of strings and numbers, but each list has the same size & structure. Let's call a specific entry in this list The Number of Interest. This number always has the same index in this list!
What's the fastest way to get the 'coordinates' (x,y) for the list that has the largest Number of Interest in Python?
Thank you!
(So really, I'm trying to pick the largest number in m[x][y][k] where k is fixed, for all x & y, and 'know' what its address is)
max((cell[k], x, y)
    for (x, row) in enumerate(m)
    for (y, cell) in enumerate(row))[1:]
Also, you can assign the result directly to a couple of variables:
(_, x, y) = max((cell[k], x, y)
                for (x, row) in enumerate(m)
                for (y, cell) in enumerate(row))
This is O(n²), btw.
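The same expression wrapped in a function and run on a small made-up 2×2 matrix whose cells mix strings and numbers (argmax_coords and the sample data are hypothetical). It follows the question's m[x][y] convention, so x is the outer index:

```python
def argmax_coords(m, k):
    """Return (x, y) such that m[x][y][k] is the largest value."""
    _, x, y = max((cell[k], x, y)
                  for (x, row) in enumerate(m)
                  for (y, cell) in enumerate(row))
    return x, y


m = [[["a", 3], ["b", 7]],
     [["c", 9], ["d", 1]]]
print(argmax_coords(m, 1))  # (1, 0)
```

Only cell[k] and the two integer indices go into each tuple, so the strings in the cells are never compared.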
import itertools
indexes = itertools.product(range(len(m)), range(len(m[0])))
print(max(indexes, key=lambda xy: m[xy[0]][xy[1]][k]))
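or using numpy
import numpy
# Pull out just the k-th entries first: numpy.array(m) on cells that mix
# strings and numbers would build a string array and compare text, not numbers.
data = numpy.array([[cell[k] for cell in row] for row in m])
print(numpy.unravel_index(numpy.argmax(data), data.shape))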
If you are interested in speeding up operations in Python, you really need to look at numpy.
Assuming "The Number of Interest" is at a known index in the list (indexOfNum below, the k from the question) and the maximum is greater than -1 (the starting value):
maxCoords = [-1, -1]
maxNumOfInterest = -1
rowIndex = 0
for row in m:
    colIndex = 0
    for entry in row:
        if entry[indexOfNum] > maxNumOfInterest:
            maxNumOfInterest = entry[indexOfNum]
            maxCoords = [rowIndex, colIndex]
        colIndex += 1
    rowIndex += 1
This is a naive method that is O(n²) in the side length of the matrix. Since you have to check every element, no solution can do asymptotically better.
Marcelo's method is more succinct, but perhaps less readable.
