Removing elements from lists to speed up list comparison [duplicate] - python

This question already has answers here:
Python list intersection efficiency: generator or filter()?
(4 answers)
Closed 12 months ago.
Let's assume we have two lists containing unique values and want to find the values that are in both lists using list comprehension.
a = [1,3,5]
b = [3,5,7]
c = [x for x in a if x in b]
print(c)
[3,5]
Simple enough. Now, what if each list had 1 million elements that we wanted to compare? List comprehension would continue to compare every element in list a to every element in list b, even after it has found '5' (from the example above) in both lists. Is that correct?
Would removing an element from the lists when it is found in both lists be more efficient to shorten the comparison time as it loops? Or is there something else I've probably missed?
for x in a:
    if x in b:
        c.append(x)
        a.remove(x)
        b.remove(x)
print(a)
[1]
print(b)
[7]
print(c)
[3,5]

Removing x from your list a in
for x in a:
    if x in b:
        c.append(x)
        a.remove(x)
        b.remove(x)
would add extra work for every item you remove. Each call to remove is O(n), with n being the number of items in the list, because the list has to be searched and the remaining items shifted. You could write a second for loop, which is a bit faster because you can "break" as soon as you find the element. However, I think the biggest performance gain comes from using a set, because of its O(1) average lookup time. You can read about sets here:
https://www.w3schools.com/python/python_sets.asp
https://stackoverflow.com/questions/7351459/time-complexity-of-python-set-operations#:~:text=According%20to%20Python%20wiki%3A%20Time,collisions%20and%20O(n).
I wrote a little code snippet that you can run to see the performance difference:
I import the performance counter from the time library to measure the time.
from time import perf_counter
I generate two lists with unique elements:
a = [x for x in range(10000)]
b = [x * 2 for x in range(10000)]
I measure the time from your operation mentioned above:
start_list = perf_counter()
c = [x for x in a if x in b]
stop_list = perf_counter()
print(f"Calculating with list operations took {stop_list - start_list}s")
I measure the time via set operations:
start_set = perf_counter()
d = list(set(a) & set(b))
stop_set = perf_counter()
print(f"Calculating with set operations took {stop_set - start_set}s")
Just to make sure the two methods give the same values (a set has no guaranteed order, so compare the sorted results):
assert sorted(c) == sorted(d)
Output:
Calculating with list operations took 0.796774061s
Calculating with set operations took 0.0013706330000000655s
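If the order of the first list matters, a set can still be used just for the membership test. The following is a minimal sketch, assuming all elements are hashable; the function name intersect_preserving_order is made up for this illustration and does not come from the answer above.

```python
def intersect_preserving_order(a, b):
    """Return the items of a that also occur in b, keeping a's order."""
    b_lookup = set(b)  # built once, in O(len(b))
    # Each membership test against the set is O(1) on average,
    # so the whole comprehension is roughly O(len(a) + len(b)).
    return [x for x in a if x in b_lookup]


a = [1, 3, 5]
b = [3, 5, 7]
print(intersect_preserving_order(a, b))  # [3, 5]
```

Unlike list(set(a) & set(b)), this keeps the result in the order of a, so it can be compared directly with the list comprehension from the question.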

Related

Using Python, Is there a more elegant way to find the second largest number in a list? [duplicate]

This question already has answers here:
Get the second largest number in a list in linear time
(31 answers)
Closed 2 years ago.
I had this question come up in a job interview, and I was wondering if there were different ways to solve this. Preferably using Python 3.
Given a list of [20,40,20,60,80] find the second highest number.
The idea is to remove the duplicates. In one solution I've iterated over the list, and added any unique values to a list of uniques. Another way I did it was to convert the list to a set and back to a list, and then grab the second number.
So here's the question. Is there a better way to do this using Python 3?
Here's my code for solving in two different ways.
def second_item_method_1():
    my_list = [20,40,20,60,80]
    my_set = set(my_list)
    my_list = list(my_set)
    my_list.sort()
    print(my_list[-2])  # second highest after an ascending sort
def second_item_method_2():
    my_list = [20,40,20,60,80]
    unique_list = []
    for x in my_list:
        if x not in unique_list:
            unique_list.append(x)
    unique_list.sort()
    print(unique_list[-2])  # second highest of the unique values
second_item_method_1()
second_item_method_2()
Any other possible solutions?
You can iterate over the list twice: in the first iteration you find the maximum element, and in the second iteration you find the largest element that is smaller than that maximum. The function below, from the linked source, instead tracks both values in a single pass.
def third_item_method():
    list1 = [20, 40, 20, 60, 80]
    mx = max(list1[0], list1[1])
    secondmax = min(list1[0], list1[1])
    n = len(list1)
    for i in range(2, n):
        if list1[i] > mx:
            secondmax = mx
            mx = list1[i]
        elif list1[i] > secondmax and mx != list1[i]:
            secondmax = list1[i]
        else:
            if secondmax == mx:
                secondmax = list1[i]
    print("Second highest number is : ", str(secondmax))
third_item_method()
source: https://www.geeksforgeeks.org/python-program-to-find-second-largest-number-in-a-list/
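The two-pass idea described in this answer can be sketched as follows. This is only an illustration, not code from the linked source; the name second_largest_two_pass is an assumption, and the function raises an error when the list has no second distinct value.

```python
def second_largest_two_pass(values):
    """Return the largest value that is strictly smaller than the maximum."""
    largest = max(values)  # first pass: find the maximum
    # Second pass: take the maximum again, ignoring every occurrence of largest.
    runner_up = max((v for v in values if v < largest), default=None)
    if runner_up is None:
        raise ValueError("need at least two distinct values")
    return runner_up


print(second_largest_two_pass([20, 40, 20, 60, 80]))  # 60
```

Both passes are linear, so the total work stays O(n) without sorting or building a set.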
Simplest thing I could come up with constraining myself to a single pass through the numbers.
Finds the two highest values. Only returns duplicates if no other alternative value in the list.
>>> def two_highest(li):
...     a, b = li[:2]
...     for i in li[1:]:
...         if i > a:
...             a, b = i, a
...         elif i > b and i != a or a == b:
...             b = i
...     return (a, b)
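As another possible solution, the standard library's heapq.nlargest can pick the two largest values after duplicates are removed with a set. This is a different technique from the hand-written loop above, shown only as a sketch; the name two_highest_distinct is made up here, and unlike two_highest it raises an error instead of returning a duplicate when there is only one distinct value.

```python
import heapq


def two_highest_distinct(values):
    """Return the two largest distinct values, largest first."""
    top = heapq.nlargest(2, set(values))  # set() drops the duplicates
    if len(top) < 2:
        raise ValueError("need at least two distinct values")
    return tuple(top)


print(two_highest_distinct([20, 40, 20, 60, 80]))  # (80, 60)
```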

python intersection of lists while not having the same index

I have a curious case, and after some time I have not come up with an adequate solution.
Say you have two lists and you need to find items that have the same index.
x = [1,4,5,7,8]
y = [1,3,8,7,9]
I am able to get a correct intersection of those which appear in both lists with the same index by using the following:
matches = [i for i, (a,b) in enumerate(zip(x,y)) if a==b]
This would return:
[0,3]
I am able to get a simple intersection of both lists with the following (and in many other ways, this is just an example)
intersected = set(x) & set(y)
This would return this set:
{1, 7, 8}
Here's the question. I'm looking for ideas on how to get the items that appear in both lists (as in the second result) but are not among the matches above, i.e. values that are present in both lists but not at the same position.
In other words, I'm looking for items in x that also occur in y, but at a different index.
The desired result would be the index position of "8" in y, or [2]
Thanks in advance
You're so close: iterate through y; look for a value that is in x, but not at the same position:
offset = [i for i, a in enumerate(y) if a in x and a != x[i] ]
Result:
[2]
Including the suggested upgrade from pault, with respect to Martijn's comment ... the pre-processing reduces the complexity, in case of large lists:
>>> both = set(x) & set(y)
>>> offset = [i for i, a in enumerate(y) if a in both and a != x[i] ]
As PaulT pointed out, this is still quite readable at OP's posted level.
I'd create a dictionary of indices for the first list, then use that to test if the second value is a) in that dictionary, and b) the current index is not present:
def non_matching_indices(x, y):
    x_indices = {}
    for i, v in enumerate(x):
        x_indices.setdefault(v, set()).add(i)
    return [i for i, v in enumerate(y) if i not in x_indices.get(v, {i})]
The above takes O(len(x) + len(y)) time; a single full scan through the one list, then another full scan through the other, where each test to include i is done in constant time.
You really don't want to use a `value in x` containment test here, because that requires a scan (a loop) over x to see if that value is really in the list or not. That takes O(len(x)) time, and you do that for each value in y, which means that the function takes O(len(x) * len(y)) time.
You can see the speed differences when you run a time trial with a larger list filled with random data:
>>> import random, timeit
>>> def using_in_x(x, y):
...     return [i for i, a in enumerate(y) if a in x and a != x[i]]
...
>>> x = random.sample(range(10**6), 1000)
>>> y = random.sample(range(10**6), 1000)
>>> for f in (using_in_x, non_matching_indices):
...     timer = timeit.Timer("f(x, y)", f"from __main__ import f, x, y")
...     count, total = timer.autorange()
...     print(f"{f.__name__:>20}: {total / count * 1000:6.3f}ms")
...
          using_in_x: 10.468ms
non_matching_indices:  0.630ms
So with two lists of 1000 numbers each, if you use value in x testing, you easily take 15 times as much time to complete the task.
x = [1,4,5,7,8]
y = [1,3,8,7,9]
result = []
for e in x:
    if e in y and x.index(e) != y.index(e):
        result.append((x.index(e), y.index(e), e))
print(result)  # gives tuples of (x_position, y_position, value)
This version goes item by item through the first list and checks whether the item is also in the second list. If it is, it compares the indices for the found item in both lists and if they are different then it stores both indices and the item value as a tuple with three values in the result list.
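The same (x_position, y_position, value) tuples can also be produced without repeated index() scans. The sketch below is only an illustration under the assumption that every value occurs at most once in x; the name offset_matches and the helper dictionary are hypothetical and not part of the answer above.

```python
def offset_matches(x, y):
    """Return (x_position, y_position, value) for values at different indices."""
    # Assumption: values in x are unique, so each value maps to one index.
    x_pos = {value: i for i, value in enumerate(x)}
    return [
        (x_pos[value], j, value)
        for j, value in enumerate(y)
        if value in x_pos and x_pos[value] != j
    ]


x = [1, 4, 5, 7, 8]
y = [1, 3, 8, 7, 9]
print(offset_matches(x, y))  # [(4, 2, 8)]
```

Building the dictionary takes one pass over x and each lookup is constant time on average, so the whole function is linear in the combined length of the two lists.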

Python: Use a Loop to Manipulate Arrays [duplicate]

This question already has answers here:
Extract first item of each sublist
(8 answers)
Closed 4 years ago.
I have to perform the same operation on a number of arrays. Is there a way to use a loop in Python to carry out this repetitive task?
For example, I have 5 arrays: A, B, C, D, and E.
I want to carry out the following:
A = A[0]
B = B[0]
C = C[0]
D = D[0]
E = E[0]
Is there any way to do this using a loop or some other technique to avoid typing almost the same thing multiple times?
My question has been marked as a duplicate of this question. There, the person is asking how to extract the first element from each list (or in my case array). I am not simply trying to extract the first element. I actually want to replace each array with its first element -- literally A = A[0].
Some are saying this is not a good idea, but this is actually what I want to do. To give some context, I have code that leaves me with a number of 2D arrays, with shapes n x m. When n = 1, the first dimension is irrelevant, and I would like to dispense with that dimension, which is what A = A[0] does. My code needs to handle both cases where n = 1 and cases when n > 1.
In other words, when n = 1, my code results in an array A that is of the following form: A = array([[a]]), and I want to change it so that A = array([a]). And to reiterate, I need the flexibility of allowing n > 1, in which case A = array([[a1],[a2],...]). In that case, my code will not execute A = A[0].
The solution by Mark White below works, if you change the last line to:
A,B,C,D,E = [x[0] for x in [A,B,C,D,E]]
What's interesting is that this solution makes the code more compact, but actually involves as many characters as (and technically more than) the brute force approach.
I think this is a pretty easy problem. In my example I use three arrays, but the same approach works for five.
A = [1,3,4]
B = [2,4,5]
C = [54,5,6]
A,B,C = [x[0] for x in [A,B,C]]
Simply create a list of lists (or, specifically, references to A, B, C, etc.)
l = [A, B, C, D, E]
Then, you can iterate over l however you choose.
for i in range(len(l)):
    l[i] = l[i][0]
Note that this will update l[i] rather than the lists (A, B, C, ...) themselves.
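One way to handle both the n = 1 and the n > 1 case from the question is to keep the arrays in a dictionary and only drop the first dimension when it has length 1. This is a rough sketch under that assumption: the dictionary and the name drop_singleton_rows are hypothetical, and nested lists stand in for the 2D arrays (indexing with [0] and len() behave the same way for the first dimension of a NumPy array).

```python
def drop_singleton_rows(arrays):
    """Replace each array by its first row when it has exactly one row."""
    return {
        name: (arr[0] if len(arr) == 1 else arr)
        for name, arr in arrays.items()
    }


arrays = {
    "A": [[1, 2, 3]],  # n = 1: the outer dimension is dropped
    "B": [[4], [5]],   # n = 2: left unchanged
}
arrays = drop_singleton_rows(arrays)
print(arrays["A"])  # [1, 2, 3]
print(arrays["B"])  # [[4], [5]]
```

Keeping the arrays in one container avoids rebinding five separate names, at the cost of accessing them as arrays["A"] instead of A.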

Adding a list to itself after addition to each value

I have a list of integers that follows a particular pattern. It's complex, but as an example, say:
x = [0,2,4,6,8]
I'd like to extend the list with 9 more copies of itself, but add a constant value that linearly scales each time. E.g. if
constant = 10
loop = 9
Then the 2nd extension would result in:
x_new = [0,2,4,6,8,10,12,14,16,18]
So I think I want a loop that iterates through x and extends the array by x[i]+constant, loop number of times?
for i in range(loop):
    for j in range(len(x)):
        x_new = x.extend((x[j]+constant)*i)
Or perhaps this can be easily done through list comprehension? My actual list is ~3000 long and I'll be doing it a few times with different values of loop and constant.
Yes, list comprehension should work:
x_new = [ e + constant * i for i in range(loop+1) for e in x ]
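To check what the comprehension produces, it can be wrapped in a small function and run on the example from the question. This is just a sketch; the function name extend_with_offsets is made up for the illustration.

```python
def extend_with_offsets(x, constant, loop):
    """Return x followed by `loop` shifted copies, each offset by constant * i."""
    # i = 0 reproduces the original list; i = 1..loop are the shifted copies.
    return [e + constant * i for i in range(loop + 1) for e in x]


x = [0, 2, 4, 6, 8]
print(extend_with_offsets(x, 10, 1))  # [0, 2, 4, 6, 8, 10, 12, 14, 16, 18]
print(len(extend_with_offsets(x, 10, 9)))  # 50
```

The original list is left untouched, so the function can be called repeatedly with different values of loop and constant.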
Here is another approach that extends the list in place with a while loop (it hard-codes the step of 2 and the constant of 10 from the example):
x = [0,2,4,6,8]
y = x[4]
i = 0
while i < 9:
    z = range(y+2, y+12, 2)
    x.extend(z)
    print(x)
    y = y + 10
    i = i + 1

Subtracting one list from another in Python [duplicate]

This question already has answers here:
Remove all the elements that occur in one list from another
(13 answers)
Closed 5 years ago.
What I want to happen: When given two lists (list a and list b), remove the numbers in list a that are in list b.
What currently happens: My first function works only if list a has only one number to be removed.
What I've tried: Turning the lists into sets, then subtracting a - b
def array_diff(a, b):
    c = list(set(a) - set(b))
    return c
Also tried: Turning the list into sets, looking for n in a and m in b, then if n = m to remove n.
def array_diff(a, b):
    list(set(a))
    list(set(b))
    for n in (a):
        for m in (b):
            if n == m:
                n.remove()
    return a
Possibly thought about: Using the "not in" function to determine if something is in b or not.
Sample Input/Output:
INPUT: array_diff([1,2], [1]) OUTPUT: [2]
INPUT: array_diff([1,2,2], [1]) OUTPUT: [2] (This should come out to be [2,2])
Just use a list comprehension with "not in":
c = [x for x in a if x not in b]
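Wrapped into the array_diff function from the question, this could look like the sketch below. Converting b to a set first is an addition of mine, assuming the elements are hashable, so that each "not in" test does not have to scan b.

```python
def array_diff(a, b):
    """Return the items of a that do not occur in b, keeping order and duplicates."""
    exclude = set(b)  # assumption: elements are hashable
    return [x for x in a if x not in exclude]


print(array_diff([1, 2], [1]))     # [2]
print(array_diff([1, 2, 2], [1]))  # [2, 2]
```

Because only b is turned into a set, duplicates in a survive, which is what the second sample input requires.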
