Python - efficient way to find first occurrences of multiple values

I have the following problem: for each of several values, I need to find the first occurrence in an array of a value greater than or equal to it.
Example:
array_1 = [-3,2,8,-1,0,5]
array_2 = [5,1]
The script has to find where in array_1 the first value greater than or equal to each value from array_2 occurs, so the expected result in this case would be [3,2] (1-based indices).
A simple loop won't do in my case, as both arrays have close to a million values and it has to execute quickly, preferably in under a minute.
Simple loop solution, which has a run time of about half an hour:
for j in range(0, len(array_2)):
    for i in range(0, len(array_1)):
        if array_1[i] >= array_2[j]:
            solution[j] = i
            break
Edit: clarified the indices, as @Sergio Tulentsev correctly pointed out.

First perform some preprocessing on the data: create a new list that only has the values that are greater than all predecessors in the original data, and combine them in a tuple with the 1-based position where they were found.
So for instance, for the example data [-3,2,8,-1,0,5], this would be:
[(-3, 1), (2, 2), (8, 3)]
Note how the answer to any query can only be 1, 2 or 3, as the values at the other positions are all smaller than 8.
Then for each query use a binary search to find the tuple whose left value is at least the queried value, and return the right value of the found tuple (the position). For the binary search you can rely on the bisect library:
import bisect
def solve(data, queries):
    # preprocessing
    maxima = []
    greatest = float("-inf")
    for i, val in enumerate(data):
        if val > greatest:
            greatest = val
            maxima.append((val, i+1))
    # main
    return [maxima[bisect.bisect_left(maxima, (query,))][1]
            for query in queries]
Example use:
data = [-3,2,8,-1,0,5]
queries = [5,1]
print(solve(data, queries)) # [3, 2]
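For arrays with around a million values, the same idea can also be sketched with NumPy. This is a minimal sketch, not part of the answer above: it assumes NumPy is available, the function name first_at_least is made up for illustration, and returning -1 when no element is large enough is an arbitrary choice. The running maximum is non-decreasing, so a binary search on it finds the first position whose value is at least the query.

```python
import numpy as np

def first_at_least(data, queries):
    """Return 1-based positions of the first element >= each query (-1 if none)."""
    data = np.asarray(data)
    queries = np.asarray(queries)
    # The running maximum is non-decreasing, so it can be binary searched.
    running_max = np.maximum.accumulate(data)
    # side="left" gives the first index where running_max >= query.
    idx = np.searchsorted(running_max, queries, side="left")
    # idx == len(data) means no element is large enough for that query.
    return np.where(idx < len(data), idx + 1, -1)

print(first_at_least([-3, 2, 8, -1, 0, 5], [5, 1]).tolist())  # [3, 2]
```

The first position where the running maximum reaches a query is also the first position where the original value reaches it, which is why the search can run on the running maximum directly.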

I suggest using a loop over the first array and using max(array_2) for the second one.

Related

Get the largest absolute difference in an array Python

I want to write a for loop that computes the absolute difference between every pair of elements in the array and then prints the largest absolute difference.
arr = []
n = int(input("number of elements in array: "))
for i in range(0, n):
    arr.append(input('insert element: '))
I've done this already, but I would like to know how slow this method is compared to taking the absolute difference between the first and last elements after sorting the array.
EXAMPLE
Input: {2, 7, 3, 4, 1, 9}
Output: 8 (|1 – 9|)
This is what I have tried:
arr = []
n = int(input("number of elements in the array: "))
for i in range(0, n):
    arr.append(int(input('enter the elements: ')))
arr.sort()
print(arr[-1] - arr[0])
If you are fine with numpy, there's a way to do so.
First, generate all the unique pairs from the given input using itertools.combinations:
from itertools import combinations
import numpy as np
alist = [2, 7, 3, 4, 1, 9]
all_comb = list(combinations(alist, 2))
[(2, 7), (2, 3), (2, 4), (2, 1), (2, 9), (7, 3), (7, 4), (7, 1), (7, 9), (3, 4), (3, 1), (3, 9), (4, 1), (4, 9), (1, 9)]
With this, you can use np.diff to find the differences for every tuple.
abs_diff = abs(np.diff(all_comb)).flatten()
array([5, 1, 2, 1, 7, 4, 3, 6, 2, 1, 2, 6, 3, 5, 8])
Finally, you can get the index of the maximum difference using np.argmax.
all_comb[abs_diff.argmax()]
Out[147]: (1, 9)
arr = []
n = int(input("number of elements in the array: "))
my_min, my_max = None, None
for i in range(0, n):
    arr.append(int(input('enter the elements: ')))
    # Track the smallest and largest values themselves, not their absolute values.
    if my_min is None or arr[i] < my_min:
        my_min = arr[i]
    if my_max is None or arr[i] > my_max:
        my_max = arr[i]
print(f"{abs(my_max - my_min)} (|{my_max} - {my_min}|)")
You can achieve this just by "remembering" the smallest and the largest number seen so far.
As I understand your question:
Sorting the elements and then comparing the first and last ones is much faster than finding the largest difference by checking every pair in the list. Sorting needs on the order of n log n comparisons, because the algorithm reuses the ordering it has already established instead of starting over for each element.
Comparing all possible pairs, in contrast, takes on the order of n^2 comparisons, because every value has to be compared with every other value; we don't know in advance which pair gives the largest difference.
So sorting is much faster for finding the largest difference than iterating over every possible pair with a for loop.
I hope I got your query right :)
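To make the comparison concrete, here is a small sketch of three ways to compute the largest absolute difference. The function names are made up for illustration; all three should return the same value, but the all-pairs version does on the order of n^2 work, the sorted version n log n, and the min/max version only two linear scans.

```python
from itertools import combinations

def max_diff_pairs(values):
    # Compare every pair: on the order of n^2 comparisons.
    return max(abs(a - b) for a, b in combinations(values, 2))

def max_diff_sorted(values):
    # Sort once, then subtract the smallest from the largest.
    ordered = sorted(values)
    return ordered[-1] - ordered[0]

def max_diff_min_max(values):
    # Two linear scans, no sorting needed.
    return max(values) - min(values)

sample = [2, 7, 3, 4, 1, 9]
print(max_diff_pairs(sample), max_diff_sorted(sample), max_diff_min_max(sample))  # 8 8 8
```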
UPDATED
So the main question is about finding the largest difference in a list with a for loop, and which approach is faster. Here it is.
In my opinion the code below will be even faster than sorting, because it only needs to iterate over the list once to find the largest difference. There is no need to iterate over every possible pair of values.
I think this may help :)
list_a = []
n = int(input("number of elements in array: "))
for i in range(0, n):
    # Store the input after converting it to an integer.
    list_a.append(int(input('insert element: ')))
# Largest difference found so far, updated in every iteration.
largest_diff_so_far = 0
# The two numbers that produce the largest difference.
actual_diff_number = None
# Start from the first number in the list. We don't need to go through every
# possible pair, so we just pick the first number without a for loop.
first = list_a[0]
# Iterate through all the numbers only once, up to the last number in the list.
for second in list_a:
    # First find the difference of the current two values.
    current_diff = second - first
    # When current_diff is larger than the previous largest difference,
    # update the stored values.
    if largest_diff_so_far == 0 or current_diff > largest_diff_so_far:
        # If the first value in the list is the largest of all, current_diff is
        # negative. In that case we handle it here and continue, so that the
        # remaining code does not overwrite anything.
        if current_diff < 0:
            # The difference is negative, so store its absolute value.
            largest_diff_so_far = abs(current_diff)
            # first is larger than second, so store the pair in reverse order.
            # The largest value then sits in second position and later
            # iterations will not overwrite it.
            actual_diff_number = [second, first]
            # second is smaller than the previous first, so use it as the new
            # first from the next iteration on.
            first = second
            continue
        # Otherwise the largest difference is current_diff.
        largest_diff_so_far = current_diff
        # Store the two numbers whose difference is the largest so far.
        actual_diff_number = [first, second]
    # This is the main time saver. A negative difference means second is even
    # smaller than first, so we no longer need the old first: replace it with
    # second and update the stored largest difference, which grows because the
    # new first is smaller.
    elif current_diff < 0:
        first = second
        # Update the largest difference using the new first value.
        largest_diff_so_far = actual_diff_number[1] - first
        # Update the first value of the stored pair.
        actual_diff_number[0] = first
# After the loop, largest_diff_so_far and actual_diff_number contain the answer.
print(actual_diff_number, largest_diff_so_far)

Not able to understand Python3 enumerate()

Question: Given an array of integers, return indices of the two numbers such that they add up to a specific target.
You may assume that each input would have exactly one solution, and you may not use the same element twice.
Example:
Given nums = [2, 7, 11, 15], target = 9,
Because nums[0] + nums[1] = 2 + 7 = 9,
return [0, 1].
class Solution:
    def twoSum(self, nums, target):
        lookup={}
        for cnt, num in enumerate (nums):
            if target-num in lookup:
                return lookup[target-num], cnt
            lookup[num]=cnt
I am not able to understand the steps after the for loop is used. I am new to Python; someone please help me.
Let me help you understand by explaining what the code does and how it solves the problem.
We need to find two numbers that sum to 9. To achieve this, we can iterate over every number in the array and check whether we have already encountered a number that equals the target minus the number we are currently on. If we haven't encountered such a number yet, we store the current number and its index.
Because we need to return the indices, we want to be able to look up a number and immediately get its index. The solution uses a dictionary that stores each number as the key and its index as the value.
We iterate over every number. If we have already encountered target-number, we return the index of that stored number and the current index; if we haven't, we simply store the current number and its index.
The enumerate part simply provides an index along with each value of the array being iterated, in the form (index, item).
class Solution:
    def twoSum(self, nums, target):
        # Here a dictionary is created, which will store value, index as key, value pairs.
        lookup={}
        # For every number in the array, get the index (cnt) and number (num)
        for cnt, num in enumerate (nums):
            # If we find target-num, we know that num + target-num = target
            if target-num in lookup:
                # Hence we return the index of the target-num we stored in the dict, and the index of the current value (cnt)
                return lookup[target-num], cnt
            # Otherwise we store the current number as key with its index as value
            lookup[num]=cnt
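To try the logic outside the class, the same steps can be written as a plain function and run on the example from the question. This is only a sketch: the name two_sum and the explicit return None for the no-solution case are additions for illustration.

```python
def two_sum(nums, target):
    # Maps each number seen so far to the index where it was found.
    lookup = {}
    for cnt, num in enumerate(nums):
        # If the complement was seen earlier, the pair is complete.
        if target - num in lookup:
            return lookup[target - num], cnt
        # Otherwise remember this number and its index for later.
        lookup[num] = cnt
    # No pair adds up to the target.
    return None

print(two_sum([2, 7, 11, 15], 9))  # (0, 1)
```

On the first iteration lookup is empty, so 2 is stored with index 0. On the second iteration the complement 9 - 7 = 2 is already in lookup, so the function returns the stored index 0 together with the current index 1.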
enumerate() is a built-in function that adds a counter to an iterable and returns it as an enumerate object. This object can be used directly in for loops or converted into a list of tuples using list().
For example:
>>> list(enumerate("abc"))
gives
[(0, 'a'), (1, 'b'), (2, 'c')]
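As a further small illustration (the variable names here are made up), enumerate also accepts an optional start argument, and the (index, item) tuples can be unpacked directly in a for loop:

```python
letters = ["a", "b", "c"]

# Default counting starts at 0.
pairs = list(enumerate(letters))

# The optional start argument changes the first counter value.
pairs_from_one = list(enumerate(letters, start=1))

# Each (index, item) tuple can be unpacked directly in the loop header.
for index, letter in enumerate(letters):
    print(index, letter)
```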
For easy understanding, I'm commenting your program. Go through it, you'll surely understand.
class Solution:
    def twoSum(self, nums, target):
        # lookup is a dictionary that stores the number and its index
        # e.g. '{7:1}'
        # number 7 at index 1
        lookup={}
        # As explained above cnt and num will receive values one by one along with index.
        for cnt, num in enumerate (nums):
            # We look if the number required to be added into the 'num' is present in dictionary
            if target-num in lookup:
                # if value found in lookup then we return the current index along with the index of number found in lookup.
                return lookup[target-num], cnt
            # After every loop insert the current value and its index into the lookup dictionary.
            lookup[num]=cnt
Hope, I answered your query in the way you wanted. Please comment below, if anything is left unanswered, I'll surely try to answer that as well.

Using a selection sort to sort an array in python. How can I optimize?

Working on this challenge on HackerRank and got this code to pass 10 out of 15 test cases. It is failing due to timeout error which is HackerRank's way of telling you that the algorithm is not optimized. How can I optimize this code to run on larger input data?
The goal is to figure out the minimum number of swaps necessary to sort an unsorted array.
Update: Each element in the array is distinct.
def minimum_swaps(arr):
    """Returns the minimum number of swaps to re-order array in ascending order."""
    swaps = 0
    for val in range(len(arr) - 1, 0, -1):
        # Index of max value
        max_pos = 0
        for index in range(1, val + 1):
            if arr[index] > arr[max_pos]:
                max_pos = index
        # Skip if value is already in sorted position
        if max_pos == val:
            continue
        arr[val], arr[max_pos] = arr[max_pos], arr[val]
        swaps += 1
    return swaps
Look at the code. It has 2 nested loops:
The outer loop iterates over the positions val.
The inner loop finds the index of the value that should be at the index val, i.e., max_pos.
It takes a lot of time just to find the index. Instead, I will compute the index of each value and store it in a dict.
index_of = {value: index for index, value in enumerate(arr)}
(note that because all values in arr are distinct, there should be no duplicated keys)
And also prepare a sorted version of the array: that way it's easier to find the maximum value instead of having to loop over the array.
sorted_arr = sorted(arr)
Then do the rest similar to the original code: for each index visited, use sorted_arr to get the max, use index_of to get its current index, if it's out-of-place then swap. Remember to update the index_of dict while swapping too.
The algorithm takes O(n) operations (including dict indexing/modifying), plus sorting cost of n elements (which is about O(n log n)).
Note: If the array arr only contains integers in a small range, it may be faster to make index_of an array instead of a dict.
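Putting the steps above together, a minimal sketch could look like the following. It assumes, as stated in the question, that all elements are distinct. The local names wanted, cur and displaced are chosen here for illustration, and the function works on a copy so the caller's list is left untouched.

```python
def minimum_swaps(arr):
    """Count the swaps the selection-sort strategy makes, without the inner loop."""
    arr = list(arr)  # work on a copy
    # Current index of every value (all values are assumed distinct).
    index_of = {value: index for index, value in enumerate(arr)}
    sorted_arr = sorted(arr)
    swaps = 0
    for pos in range(len(arr) - 1, 0, -1):
        wanted = sorted_arr[pos]  # value that belongs at pos
        if arr[pos] == wanted:
            continue  # already in its sorted position
        cur = index_of[wanted]    # where that value currently sits
        displaced = arr[pos]
        arr[pos], arr[cur] = arr[cur], arr[pos]
        # Keep the index dictionary in sync with the swap.
        index_of[wanted] = pos
        index_of[displaced] = cur
        swaps += 1
    return swaps

print(minimum_swaps([4, 3, 1, 2]))  # 3
```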
The short answer is: implement merge sort. The selection-sort-style algorithm you are using has an O(n^2) running time, while merge sort has an O(n log n) running time.

Removing points from list if distance between 2 points is below a certain threshold

I have a list of points and I want to keep the points of the list only if the distance between them is greater than a certain threshold. So, starting from the first point, if the distance between the first point and the second is less than the threshold, I would remove the second point and then compute the distance between the first one and the third one. If this distance is less than the threshold, compare the first and fourth points. Otherwise, move on to the distance between the third and fourth, and so on.
So for example, if the threshold is 2 and I have
list = [1, 2, 5, 6, 10]
then I would expect
new_list = [1, 5, 10]
Thank you!
Not a fancy one-liner, but you can just iterate over the values in the list and append them to a new list if the current value differs from the last value in the new list (accessed with [-1]) by at least the threshold:
lst = range(10)
diff = 3
new = []
for n in lst:
    if not new or abs(n - new[-1]) >= diff:
        new.append(n)
Afterwards, new is [0, 3, 6, 9].
Concerning your comment "What if i had instead a list of coordinates (x,y)?": In this case you do exactly the same thing, except that instead of just comparing the numbers, you have to find the Euclidean distance between two points. So, assuming lst is a list of (x,y) pairs:
if not new or ((n[0]-new[-1][0])**2 + (n[1]-new[-1][1])**2)**.5 >= diff:
Alternatively, you can convert your (x,y) pairs into complex numbers. For those, basic operations such as addition, subtraction and absolute value are already defined, so you can just use the above code again.
lst = [complex(x,y) for x,y in lst]
new = []
for n in lst:
    if not new or abs(n - new[-1]) >= diff: # same as in the first version
        new.append(n)
print(new)
Now, new is a list of complex numbers representing the points: [0j, (3+3j), (6+6j), (9+9j)]
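For (x, y) tuples, the same loop can also be sketched with math.dist from the standard library (available from Python 3.8), which avoids both the hand-written formula and the conversion to complex numbers. The function name thin_points and the sample data are assumptions made for illustration.

```python
import math

def thin_points(points, threshold):
    """Keep a point only if it is at least `threshold` away from the last kept point."""
    kept = []
    for point in points:
        # math.dist computes the Euclidean distance between two points.
        if not kept or math.dist(point, kept[-1]) >= threshold:
            kept.append(point)
    return kept

points = [(i, i) for i in range(10)]
print(thin_points(points, 3))  # [(0, 0), (3, 3), (6, 6), (9, 9)]
```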
While the solution by tobias_k works, it is not the most efficient (in my opinion, but I may be overlooking something). It is based on list order and does not consider that the element which is close (within the threshold) to the largest number of other elements should be eliminated last. The element with the fewest such connections (proximities) should be considered and checked first. The approach I suggest will likely retain the maximum number of points that are farther than the threshold from the other elements in the given list. This works very well for lists of vectors, and therefore for x,y or x,y,z coordinates. If, however, you intend to use this solution with a list of scalars, you can simply include this line in the code: orig_list=np.array(orig_list)[:,np.newaxis].tolist()
Please see the solution below:
import numpy as np
thresh = 2.0
orig_list=[[1,2], [5,6], ...]
nsamp = len(orig_list)
arr_matrix = np.array(orig_list)
# np.float was removed from recent NumPy releases; the built-in float is equivalent here.
distance_matrix = np.zeros([nsamp, nsamp], dtype=float)
for ii in range(nsamp):
    distance_matrix[:, ii] = np.apply_along_axis(lambda x: np.linalg.norm(np.array(x)-np.array(arr_matrix[ii, :])),
                                                 1,
                                                 arr_matrix)
n_proxim = np.apply_along_axis(lambda x: np.count_nonzero(x < thresh),
                               0,
                               distance_matrix)
idx = np.argsort(n_proxim).tolist()
idx_out = list()
for ii in idx:
    for jj in range(ii+1):
        if ii not in idx_out:
            if distance_matrix[ii, jj] < thresh:
                if ii != jj:
                    idx_out.append(jj)
pop_idx = sorted(np.unique(idx_out).tolist(),
                 reverse=True)
for pop_id in pop_idx:
    orig_list.pop(pop_id)
nsamp = len(orig_list)
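The distance matrix and the proximity counts can also be computed without apply_along_axis by using NumPy broadcasting. The following is a sketch of just that part, not a replacement for the whole answer; the function names and the sample points are made up for illustration. As in the code above, each point counts itself, because its distance to itself is 0.

```python
import numpy as np

def pairwise_distances(points):
    """Return the full matrix of Euclidean distances between all points."""
    pts = np.asarray(points, dtype=float)
    # Shape (n, 1, d) minus shape (1, n, d) broadcasts to (n, n, d).
    diff = pts[:, np.newaxis, :] - pts[np.newaxis, :, :]
    return np.linalg.norm(diff, axis=-1)

def proximity_counts(points, thresh):
    """For each point, count the points (itself included) closer than thresh."""
    dist = pairwise_distances(points)
    return np.count_nonzero(dist < thresh, axis=0)

sample = [[0, 0], [3, 4], [0, 1]]
print(pairwise_distances(sample)[0, 1])        # 5.0
print(proximity_counts(sample, 2.0).tolist())  # [2, 1, 2]
```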

Get random unique regions of a list using python

I have a list of numbers, say list=[100,102,108,307,365,421,433,487,511,537,584].
I want to get unique regions from this list for example region 1 from 102-307, region 2 from 421-487 and region 3 from 511-584. These regions should be non overlapping and unique.
I'll credit @TimPietzcker for pointing me in the direction of this answer, although I didn't use the function he offered (random.sample).
In this code, I choose six indices from those in list_ (renamed from list to avoid overwriting the built-in) without replacement, using np.random.choice. I then sort these indices and iterate over each pair of adjacent indices, taking as a region the values from the first index (i) to the second (j) in the pair, inclusive (hence the j + 1).
(If I had used j instead of j + 1, the indices would never be able to include all the values in list, due to the lack of replacement during the selection phase. For example, if one pair were (1, 3), the minimum value for the first index of the next pair would be 4, because 3 could not be chosen twice. Thus, the first pair would take the values at indices 1 and 2, and the value at 3 would be skipped.)
Since it's possible for j to be equal to len(list_) - 1, the slice end j + 1 can be equal to len(list_). That is fine: Python slices never raise an IndexError, so list_[i:j + 1] simply includes all values through the end of list_, which is the same as taking the values from i to j, inclusive, as in all other cases.
import numpy as np
list_ = [100,102,108,307,365,421,433,487,511,537,584]
n_regions = 3
indices = sorted(np.random.choice(range(len(list_)), size=n_regions * 2,
                                  replace=False))
list_of_regions = []
for i, j in zip(indices[::2], indices[1::2]):
    # Slicing is safe even when j + 1 == len(list_).
    list_of_regions.append(list_[i:j + 1])
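The same selection can be sketched with random.sample from the standard library, the function mentioned at the start of this answer, so NumPy is not required. The function name random_regions and the seed parameter are assumptions added for illustration; the seed only makes runs repeatable.

```python
import random

def random_regions(values, n_regions, seed=None):
    """Pick n_regions non-overlapping slices of values at random."""
    rng = random.Random(seed)
    # Sampling without replacement gives 2 * n_regions distinct indices.
    indices = sorted(rng.sample(range(len(values)), n_regions * 2))
    # Pair adjacent indices: (start, end), both inclusive.
    return [values[i:j + 1] for i, j in zip(indices[::2], indices[1::2])]

list_ = [100, 102, 108, 307, 365, 421, 433, 487, 511, 537, 584]
for region in random_regions(list_, 3):
    print(region[0], "-", region[-1])
```

Because the indices are distinct and sorted, each region has at least two values and no value appears in more than one region.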
