Python: Calculate total number of comparisons to find a match - python

I have two arrays: one with 10 elements and one with 2 elements.
I want to check whether the large array contains the exact values of the small array as a consecutive run.
There is a total of 9 comparisons that need to be made.
How do I calculate this value for arrays of different sizes?
I need this value to manipulate control flow.
largeArr = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
smallArr = [9, 10]
On the 9th Comparison it will be true.

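The brute-force check takes up to len(largeArr) - len(smallArr) + 1 window comparisons, each of which compares up to len(smallArr) elements. It takes all of them if smallArr is not found. If it is found, it might take about half as many on average, but that depends on the statistics of the entries. So the number of window comparisons is O(n), where n = len(largeArr); counting individual element comparisons, the worst case is O(n * m), where m = len(smallArr).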
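As a rough illustration of that count, the sketch below slides a window over the large list and reports how many window comparisons were made before the first match. The name count_window_comparisons is made up for this example, and it assumes that one "comparison" means checking one whole window against the small list.

```python
def count_window_comparisons(large, small):
    """Return (found, comparisons) for a brute-force sliding-window search.

    Assumption: one "comparison" is one whole-window check against `small`.
    """
    max_comparisons = len(large) - len(small) + 1
    comparisons = 0
    for start in range(max(max_comparisons, 0)):
        comparisons += 1
        if large[start:start + len(small)] == small:
            return True, comparisons
    return False, comparisons


largeArr = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
smallArr = [9, 10]
print(count_window_comparisons(largeArr, smallArr))  # (True, 9)
print(count_window_comparisons(largeArr, [9, 11]))   # (False, 9)
```

The upper bound len(large) - len(small) + 1 is reached both when the match sits in the last window and when there is no match at all.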
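However, if largeArr is sorted, as your example shows, it would be much more efficient to do a binary search for smallArr[0] and then compare the slice that starts there. That makes the search O(log(n)), plus up to len(smallArr) element comparisons for the slice.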
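A minimal sketch of the binary-search idea, using the standard bisect module. It assumes that the large list is strictly increasing (sorted, with no duplicates); with duplicates, the match could start at a later copy of smallArr[0] than the one bisect_left finds. The function name contains_run_sorted is hypothetical.

```python
from bisect import bisect_left


def contains_run_sorted(large, small):
    """Check whether `small` occurs as a consecutive run in `large`.

    Assumption: `large` is strictly increasing (sorted, no duplicates).
    """
    if not small:
        return True
    start = bisect_left(large, small[0])  # O(log n) search for the first element
    return large[start:start + len(small)] == small


largeArr = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
print(contains_run_sorted(largeArr, [9, 10]))  # True
print(contains_run_sorted(largeArr, [9, 11]))  # False
```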
Another approach which would be much faster if you want to check many different smallArr against a given largeArr: generate a hash of each consecutive slice of length n = len(smallArr) taken from largeArr, and put those hashes in a set or dict. Then you can very quickly check if a particular smallArr is present by computing its hash and checking for membership in the pre-computed set or dict.
Here's an example of this latter approach:
largeArr = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
smallArr = [9, 10]
n = len(smallArr)
match = set()
for i in range(0, len(largeArr) - n + 1):
    match.add(tuple(largeArr[i:i+n]))
print(tuple(smallArr) in match)
This uses tuples since they are immutable and therefore hashable, unlike lists (and slices of lists). Checking is now close to O(1), or at least as quick as a set can test membership (which will actually grow slowly with n, depending on the implementation).

Here is another solution. The solution above is perfectly fine; mine just happens to run in constant space and linear time.
That is:
Time: O(N)
Space: O(1)
from typing import List  # for type annotations
# You can solve this in a linear fashion like this...
def mapable(universe: List[int], elements: List[int], from_indx: int) -> bool:
    # Tries to address the worst case: check the last element of the candidate window first.
    last_mapping_indx: int = from_indx + (len(elements) - 1)
    if last_mapping_indx >= len(universe) or elements[-1] != universe[last_mapping_indx]:
        return False
    # Why use a loop? A loop is more dynamic, in case elements can change in size.
    # Tries to match elements against the run of universe that starts at from_indx.
    for num in elements:
        if from_indx >= len(universe) or num != universe[from_indx]:
            return False
        from_indx += 1
    return True
# T = TypeVar('T')
# You can find a creative way to use zip(itr[T], itr[T]) here to achieve the same.
def a_in_b(larger: List[int], smaller: List[int]) -> bool:
    for indx, num in enumerate(larger):
        if num == smaller[0]:
            if mapable(larger, smaller, indx):
                return True
                # return indx + (len(smaller)) # this is the solution if you only care about how many comparisons were made
    return False
# This code checks whether a list is found in a larger one-dimensional list in a linear fashion. If you look at the helper (mapable), the worst-case scenario would be the following:
#larger: [8, 8, 8, 8, 8, 8, 8, 8, 8, 9]
#smaller: [8, 9]
# where it tries to iterate through smaller n-1 times. This would raise our complexity from O(N) to O(n * m), where m = len(smaller). That is why we have an if statement at the beginning of mapable.
largeArr = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
smallArr = [8, 9, 10]
print(a_in_b(largeArr, smallArr)) # True

As you have the numpy tag, use a numpy approach:
from numpy.lib.stride_tricks import sliding_window_view as swv
largeArr = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
smallArr = [9, 10]
out = (swv(largeArr, len(smallArr)) == smallArr).all(axis=1).any()  # a window matches only if all of its elements match
# True
Intermediate:
swv(largeArr, len(smallArr))
array([[ 1, 2],
[ 2, 3],
[ 3, 4],
[ 4, 5],
[ 5, 6],
[ 6, 7],
[ 7, 8],
[ 8, 9],
[ 9, 10]])
repeated comparisons
If many comparisons need to be done:
from numpy.lib.stride_tricks import sliding_window_view as swv
existing = set(map(tuple, swv(largeArr, len(smallArr))))
tuple(smallArr) in existing
# True
tuple([12, 4]) in existing
# False

Related

Find 4 values in a list that are close together

I am trying to find the 4 closest values in a given list, within a defined value for the difference. The list can be of any length and is sorted in increasing order. Below is what I have tried:
holdlist=[]
m=[]
nlist = []
t = 1
q = [2,3,5,6,7,8]
for i in range(len(q)-1):
    for j in range(i+1,len(q)):
        if abs(q[i]-q[j])<=1:
            holdlist.append(i)
            holdlist.append(j)
            t=t+1
            break
        else:
            if t != 4:
                holdlist=[]
                t=1
            elif t == 4:
                nlist = holdlist
                holdlist=[]
                t=1
nlist = list(dict.fromkeys(nlist))
for num in nlist:
    m.append(q[num])
The defined difference value here is 1, "q" is the list, and I am trying to get the result in "m" to be [5,6,7,8], but it turns out to be an empty list.
This works only if the list "q" is [5,6,7,8,10,11]. My guess is that after comparing the last value, the for loop ends and the result does not go into "holdlist".
Is there a more elegant way of writing the code?
Thank you.
One solution would be to sort the input list and find the smallest window of four elements. Given the example input, this is
min([sorted(q)[i:i+4] for i in range(len(q) - 3)],
key=lambda w: w[3] - w[0])
But given a different input this will still return a value if the smallest window has a bigger spacing than 1. But I'd still use this solution, with a bit of error handling:
assert len(q) >= 4
answer = min([sorted(q)[i:i+4] for i in range(len(q) - 3)], key=lambda w: w[3] - w[0])
assert answer[3] - answer[0] < 4
Written out and annotated (wrapped in a function so that the return statements are valid; the function name is only illustrative):
def closest_four(q):
    if len(q) < 4:
        raise RuntimeError("Need at least four members in the list!")
    sorted_q = sorted(q)
    windows = [sorted_q[i:i+4] for i in range(len(q) - 3)]  # All the chunks of four elements
    def size(window):
        """The size of the window."""
        return window[3] - window[0]
    answer = min(windows, key=size)  # The smallest window, by size
    if size(answer) > 3:
        return "No group of four elements has a maximum distance of 1"
    return answer
This would be one easy approach to find the four closest numbers in a list:
# Let's have a list of numbers. It has to be at least 4 numbers long.
numbers = [10, 4, 9, 1, 7, 12, 25, 26, 28, 29, 30, 77, 92]
numbers.sort()
# Now we have a sorted list.
max_spread = 3  # assumed limit: four numbers with gaps of at most 1 span at most 3
delta = numbers[3] - numbers[0]  # How close the first four numbers in the sorted list are to each other.
idx = 0  # Let's save our starting index.
for i in range(1, len(numbers) - 3):
    d = numbers[i+3] - numbers[i]
    if d < delta:
        # If some sequence is closer together, we save that value and the index where it was found.
        delta = d
        idx = i
if delta <= max_spread:
    print("closest numbers are {}".format(numbers[idx:idx+4]))
else:
    print("No sequence within the defined difference was found")
Here is my jab at the issue for OP's reference, as @kojiro and @ex4 have already supplied answers that deserve credit.
def find_neighbor(nums, dist, k=4):
    res = []
    nums.sort()
    for i in range(len(nums) - k + 1):  # + 1 so that the last window is checked too
        if nums[i + k - 1] - nums[i] <= dist * (k - 1):  # k elements span k - 1 gaps
            res.append(nums[i: i + k])
    return res
Here is the function in action:
>>> nums = [10, 11, 5, 6, 7, 8, 9] # slightly modified input for better demo
>>> find_neighbor(nums, 1)
[[5, 6, 7, 8], [6, 7, 8, 9], [7, 8, 9, 10], [8, 9, 10, 11]]
Assuming sorting is allowed for this problem, we first sort the input array. (I decided to sort in place for a marginal performance gain, but we could use sorted(nums) as well.) Then we slide a window of size k over the sorted list and check whether the difference between the first and last elements of the window is less than or equal to dist * (k - 1), since k elements span k - 1 gaps. In the provided example, we would expect that difference to be at most 1 * 3 = 3. If such a window exists, we append that subarray to res, which we return in the end.
If the goal is to find a window instead of all windows, we could simply return the subarray without appending it to res.
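A minimal sketch of that early-return variant. The name find_first_neighbor is made up here, and the sketch assumes that "close together" means the spread of the window (last minus first) is at most dist * (k - 1); adjust the condition if your criterion differs.

```python
def find_first_neighbor(nums, dist, k=4):
    """Return the first window of k sorted values whose spread is small enough, else None.

    Assumption: "close together" means last - first <= dist * (k - 1).
    """
    ordered = sorted(nums)  # sorted copy, so the caller's list is left untouched
    for i in range(len(ordered) - k + 1):
        window = ordered[i:i + k]
        if window[-1] - window[0] <= dist * (k - 1):
            return window  # stop at the first qualifying window
    return None


print(find_first_neighbor([2, 3, 5, 6, 7, 8], 1))  # [5, 6, 7, 8]
print(find_first_neighbor([1, 10, 20, 30], 1))     # None
```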
You can do this in a generic fashion (i.e. for any size of delta or resulting largest group) using the zip function:
def deltaGroups(aList,maxDiff):
    sList = sorted(aList)
    diffs = [ (b-a)<=maxDiff for a,b in zip(sList,sList[1:]) ]
    breaks = [ i for i,(d0,d1) in enumerate(zip(diffs,diffs[1:]),1) if d0!=d1 ]
    groups = [ sList[s:e+1] for s,e in zip([0]+breaks,breaks+[len(sList)]) if diffs[s] ]
    return groups
Here's how it works:
Sort the list in order to have each number next to the closest other numbers
Identify positions where the next number is within the allowed distance (diffs)
Get the index positions where compliance with the allowed distance changes (breaks) from eligible to non-eligible and from non-eligible to eligible
This corresponds to start and end of segments of the sorted list that have consecutive eligible pairs.
Extract subsets of the sorted list based on the start/end positions of consecutive eligible differences (groups)
The deltaGroups function returns a list of groups with at least 2 values that are within the distance constraints. You can use it to find the largest group using the max() function.
output:
q = [10,11,5,6,7,8]
m = deltaGroups(q,1)
print(q)
print(m)
print(max(m,key=len))
# [10, 11, 5, 6, 7, 8]
# [[5, 6, 7, 8], [10, 11]]
# [5, 6, 7, 8]
q = [15,1,9,3,6,16,8]
m = deltaGroups(q,2)
print(q)
print(m)
print(max(m,key=len))
# [15, 1, 9, 3, 6, 16, 8]
# [[1, 3], [6, 8, 9], [15, 16]]
# [6, 8, 9]
m = deltaGroups(q,3)
print(m)
print(max(m,key=len))
# [[1, 3, 6, 8, 9], [15, 16]]
# [1, 3, 6, 8, 9]
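To make the intermediate steps of this approach easier to follow, here is a sketch that returns the intermediate lists alongside the groups. The function name delta_groups_trace is hypothetical, and the sketch assumes the input has at least two values.

```python
def delta_groups_trace(values, max_diff):
    """Return (sorted list, diffs, breaks, groups) for inspection.

    Assumption: `values` contains at least two numbers.
    """
    s = sorted(values)
    # True where the next number is within the allowed distance
    diffs = [(b - a) <= max_diff for a, b in zip(s, s[1:])]
    # positions where eligibility flips (eligible <-> non-eligible)
    breaks = [i for i, (d0, d1) in enumerate(zip(diffs, diffs[1:]), 1) if d0 != d1]
    # keep only the segments that start with an eligible pair
    groups = [s[start:end + 1]
              for start, end in zip([0] + breaks, breaks + [len(s)])
              if diffs[start]]
    return s, diffs, breaks, groups


s, diffs, breaks, groups = delta_groups_trace([10, 11, 5, 6, 7, 8], 1)
print(s)       # [5, 6, 7, 8, 10, 11]
print(diffs)   # [True, True, True, False, True]
print(breaks)  # [3, 4]
print(groups)  # [[5, 6, 7, 8], [10, 11]]
```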

Sum of certain items in a list

I'm working on a probability-related problem. I need to sum only specific items on a certain list.
I've tried using "for" functions and it hasn't worked. I'm looking for a way to select items based on their positions on the list, and summing them.
You can use operator.itemgetter to select only certain indexes in a list or keys in a dict.
from operator import itemgetter
data = [1, 2, 3, 4, 5, 6, 7, 8]
get_indexes = itemgetter(2, 5, 7)
#this will return indexes 2, 5, 7 from a sequence
sum(get_indexes(data)) #3+6+8
#returns 17
That example is for lists, but you can use itemgetter for dict keys too; just use itemgetter('key2', 'key5', 'key7')(some_dict)
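For instance, a small sketch of the dict case; the dictionary contents are made up for illustration.

```python
from operator import itemgetter

# made-up example data
some_dict = {'key1': 10, 'key2': 20, 'key5': 50, 'key7': 70}

get_keys = itemgetter('key2', 'key5', 'key7')  # returns a tuple of the three values
total = sum(get_keys(some_dict))               # 20 + 50 + 70
print(total)  # 140
```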
To get only even or odd indexes, use slicing rather than enumerate and a loop; it's much more efficient and easier to read:
even = sum(data[::2])
odd = sum(data[1::2])
You can also use filter, but I wouldn't suggest this for selecting by index:
sum(filter(lambda n: data.index(n) % 2 == 0, data))
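One reason to avoid the filter version: data.index(n) returns the position of the first occurrence of n, so it gives wrong results as soon as the list contains duplicates (and it rescans the list for every element). A small sketch with made-up data:

```python
data = [5, 5, 5, 5]  # made-up data with duplicates

# data.index(5) is always 0, so every element looks like it sits at an even index
by_index_lookup = sum(filter(lambda n: data.index(n) % 2 == 0, data))
by_slicing = sum(data[::2])  # really takes positions 0 and 2

print(by_index_lookup)  # 20
print(by_slicing)       # 10
```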
You really should have put more into your question, but:
stuff = [1, 2, 3, 4, 5, 6, 7, 8]
# sum the numbers that have even indices:
funny_total = sum([x for i, x in enumerate(stuff) if i % 2 == 0 ])
funny_total
# 16
That should get you started. An approach with a for loop would have worked, as well. You just likely have a bug in your code.
stuff = [1, 2, 3, 4, 5, 6, 7, 8]
indices_to_include = [1, 3, 4, 5, 6]
funny_total = 0
for i, x in enumerate(stuff):
    if i in indices_to_include:
        funny_total += x
You could also:
def keep_every_third(i):
    return i % 3 == 0
# variable definitions as above...
for i, x in enumerate(stuff):
    if keep_every_third(i):
        pass  # do stuff

Find maximum with limited length in a list

I'm looking for maximum absolute value out of chunked list.
For example, the list is:
[1, 2, 4, 5, 4, 5, 6, 7, 2, 6, -9, 6, 4, 2, 7, 8]
I want to find the maximum with lookahead = 4. For this case, it will return me:
[5, 7, 9, 8]
How can I do this simply in Python?
for d in data[::4]:
    if count < LIMIT:
        count = count + 1
        if abs(d) > maximum_item:
            maximum_item = abs(d)
    else:
        max_array.append(maximum_item)
        if maximum_item > highest_line:
            highest_line = maximum_item
        maximum_item = 0
        count = 1
I know I can use for loop to check this. But I'm sure there is an easier way in python.
Using standard Python:
[max(abs(x) for x in arr[i:i+4]) for i in range(0, len(arr), 4)]
This works also if the array cannot be evenly divided.
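A quick sketch of the uneven case, with the comprehension wrapped in a function. The name chunk_abs_max and the two extra trailing values are made up for this example.

```python
def chunk_abs_max(values, size):
    """Largest absolute value of each consecutive chunk of `size` elements."""
    return [max(abs(x) for x in values[i:i + size])
            for i in range(0, len(values), size)]


# 18 elements: four full chunks plus a final chunk of two
arr = [1, 2, 4, 5, 4, 5, 6, 7, 2, 6, -9, 6, 4, 2, 7, 8, -3, 1]
print(chunk_abs_max(arr, 4))  # [5, 7, 9, 8, 3]
```

The last slice is simply shorter than the others, because slicing past the end of a list does not raise an error.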
Map the list to abs(), then chunk the list and send it to max():
array = [1,2,4,5,4,5,6,7,2,6,-9,6,4,2,7,8]
array = [abs(item) for item in array]
# chunk the list into groups of four (see the linked question's answer)
array = [array[i:i+4] for i in range(0, len(array), 4)]
# array = [[1,2,4,5], [4,5,6,7], [2,6,9,6], [4,2,7,8]] # chunked abs()'ed list
values = [max(item) for item in array]
Result:
>>> values
[5, 7, 9, 8]
Another way is to use the islice function from the itertools module:
>>> from itertools import islice
>>> [max(islice(map(abs,array),i,i+4)) for i in range(0,len(array),4)]
[5, 7, 9, 8]
To break it down:
1 - map(abs, array) yields the absolute values of the array elements (an iterator in Python 3, a list in Python 2)
2 - islice(map(abs,array),i,i+4) takes the chunk of four elements that starts at index i
3 - i in range(0,len(array),4) steps through the chunk start positions so that the chunks do not overlap
This can be wrapped in a function as follows:
def max_of_chunks(lst, chunk_size):
    lst = [abs(x) for x in lst]  # a list, so that len() works and every islice starts from the beginning
    result = [max(islice(lst,i,i+chunk_size)) for i in range(0,len(lst),chunk_size)]
    return result
Update: I've just seen the newest comments on the question and the answers. I misunderstood the task, my bad :) I'll leave my old answer here for history. You can find the maximum of each chunk like this (Python 2):
largest = [max(abs(x) for x in l[i:i+n]) for i in xrange(0, len(l), n)]
or, if you use Python 3:
largest = [max(abs(x) for x in l[i:i+n]) for i in range(0, len(l), n)]
Original answer, just for history: If you have to pick some numbers (once) from a list that isn't big, you shouldn't install big libraries like numpy for such a simple task. There are a lot of techniques to do it with built-in Python tools. Here are some of them.
So we have some list and the number of largest distinct elements to pick:
In [1]: l = [1, 2, 4, 5, 4, 5, 6, 7, 2, 6, -9, 6, 4, 2, 7, 8]
In [2]: n = 4
A. First we get only the unique numbers from the source list by converting it to a set. Then we create a list of these unique numbers, sort it, and finally take the last N (greatest) elements:
In [3]: sorted(list(set(l)))[-n:]
Out[3]: [5, 6, 7, 8]
B. You can use built-in heapq module:
In [7]: import heapq
In [8]: heapq.nlargest(n, set(l))
Out[8]: [8, 7, 6, 5]
Of course, you can wrap technique A or B in a human-friendly function like def get_largest(seq, n): return sorted(set(seq))[-n:]. Yes, I've omitted some details, like error handling. Remember them when you write your own code.
C. If your lists are very long and you have to do many of these operations as fast as Python can, you should use specialized third-party libraries like numpy or bottleneck.

Mysterious interaction between Python's slice bounds and "stride"

I understand that given an iterable such as
>>> it = [1, 2, 3, 4, 5, 6, 7, 8, 9]
I can turn it into a list and slice off the ends at arbitrary points with, for example
>>> it[1:-2]
[2, 3, 4, 5, 6, 7]
or reverse it with
>>> it[::-1]
[9, 8, 7, 6, 5, 4, 3, 2, 1]
or combine the two with
>>> it[1:-2][::-1]
[7, 6, 5, 4, 3, 2]
However, trying to accomplish this in a single operation produces some results that puzzle me:
>>> it[1:-2:-1]
[]
>>> it[-1:2:-1]
[9, 8, 7, 6, 5, 4]
>>> it[-2:1:-1]
[8, 7, 6, 5, 4, 3]
Only after much trial and error, do I get what I'm looking for:
>>> it[-3:0:-1]
[7, 6, 5, 4, 3, 2]
This makes my head hurt (and can't help readers of my code):
>>> it[-3:0:-1] == it[1:-2][::-1]
True
How can I make sense of this? Should I even be pondering such things?
FWIW, my code does a lot of truncating, reversing, and listifying of iterables, and I was looking for something that was faster and clearer (yes, don't laugh) than list(reversed(it[1:-2])).
This is because in a slice like
list[start:stop:step]
start is inclusive: the resulting list starts at index start.
stop is exclusive: the resulting list ends one step before stop and never contains the element at stop (with a positive step the last candidate index is stop - 1, with a negative step it is stop + 1).
So for your case, it[1:-2], the 1 is inclusive, which means the slice starts at index 1, whereas the -2 is exclusive, so the last element of the slice is the one at index -3.
Hence, if you want the reverse of that, you have to write it[-3:0:-1]; only then is index -3 included in the result, which then runs down to index 1.
The important things to understand in your slices are
Start will be included in the slice
Stop will NOT be included in the slice
If you want to slice backwards, the step value should be a negative value.
Basically the range which you specify is a half-open (half-closed) range.
When you say it[-3:0:-1], you are starting from the third element from the back and stepping backwards one element at a time until you reach index 0 (not including it).
>>> it[-3:0:-1]
[7, 6, 5, 4, 3, 2]
Alternatively, you can write the start value explicitly, like this
>>> it[len(it)-3 : 0 : -1]
[7, 6, 5, 4, 3, 2]
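To see how Python resolves negative bounds, you can ask a slice object for its normalized indexes with the built-in slice.indices method. The helper name explain below is made up for this sketch.

```python
it = [1, 2, 3, 4, 5, 6, 7, 8, 9]


def explain(s, seq):
    """Return the (start, stop, step) that slice `s` resolves to for `seq`."""
    return s.indices(len(seq))


# counting down from index 1 can never reach index 7, hence the empty result
print(explain(slice(1, -2, -1), it))  # (1, 7, -1)
# start at index 6 and walk down to (but not including) index 0
print(explain(slice(-3, 0, -1), it))  # (6, 0, -1)
print(list(range(*explain(slice(-3, 0, -1), it))))  # [6, 5, 4, 3, 2, 1]
```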
I think the other two answers disambiguate the usage of slicing and give a clearer image of how its parameters work.
But, since your question also involves readability -- which, let's not forget, is a big factor especially in Python -- I'd like to point out how you can improve it slightly by assigning slice() objects to variables thus removing all those hardcoded : separated numbers.
Your truncate-and-reverse slice object could alternatively be given a name that implies its usage:
rev_slice = slice(-3, 0, -1)
perhaps in some config-like file. You could then use it, in its named glory, in slicing operations to make this a bit easier on the eyes:
it[rev_slice] # [7, 6, 5, 4, 3, 2]
This might be a trivial thing to mention, but I think it's probably worth it.
Why not create a function for readability:
def listify(it, start=0, stop=None, rev=False):
    if stop is None:
        the_list = it[start:]
    else:
        the_list = it[start:stop]
    if rev:
        return the_list[::-1]
    else:
        return the_list
listify(it, start=1, stop=-2) # [2, 3, 4, 5, 6, 7]
listify(it, start=1, stop=-2, rev=True) # [7, 6, 5, 4, 3, 2]
A good way to intuitively understand the Python slicing syntax is to see how it maps to the corresponding C for loop.
A slice like
x[a:b:c]
gives you the same elements as
for (int i = a; i < b; i += c) {
...
}
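The special cases are just default values:
a defaults to 0
b defaults to len(x)
c defaults to 1
Plus a few more special cases:
negative values of a or b count from the end, so len(x) is added to them, and out-of-range values are clamped to the ends of the list
if c is negative, the < is inverted to a >, a defaults to len(x) - 1, and b defaults to just before index 0 (so the loop runs down through index 0)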
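The loop analogy can also be written out in Python. This is only a sketch for lists; the name emulate_slice is hypothetical, and the clamping rules are spelled out the way I understand CPython to apply them. The final loop compares the result against real slicing for a range of bounds and steps.

```python
def emulate_slice(x, a=None, b=None, c=1):
    """Return the same elements as x[a:b:c], using an explicit loop."""
    n = len(x)
    if c == 0:
        raise ValueError("slice step cannot be zero")
    # the valid range for the bounds depends on the direction of travel
    lower, upper = (0, n) if c > 0 else (-1, n - 1)

    def clamp(index, default):
        if index is None:
            return default
        if index < 0:
            index += n  # negative bounds count from the end
        return max(lower, min(index, upper))

    start = clamp(a, lower if c > 0 else upper)
    stop = clamp(b, upper if c > 0 else lower)

    out = []
    i = start
    while (i < stop) if c > 0 else (i > stop):
        out.append(x[i])
        i += c
    return out


it = [1, 2, 3, 4, 5, 6, 7, 8, 9]
print(emulate_slice(it, -3, 0, -1))  # [7, 6, 5, 4, 3, 2]
print(emulate_slice(it, 1, -2, -1))  # []

# sanity check against the built-in behaviour
bounds = [None] + list(range(-12, 13))
for a in bounds:
    for b in bounds:
        for c in (-3, -2, -1, 1, 2, 3):
            assert emulate_slice(it, a, b, c) == it[a:b:c]
```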

Replacing values greater than a limit in a numpy array

I have an array n x m, and maximum values for each column. What's the best way to replace values greater than the maximum, besides checking each element?
For example:
import numpy as np
def check_limits(bad_array, maxs):
    good_array = np.copy(bad_array)
    for i_line in range(bad_array.shape[0]):
        for i_column in range(bad_array.shape[1]):
            if good_array[i_line][i_column] >= maxs[i_column]:
                good_array[i_line][i_column] = maxs[i_column] - 1
    return good_array
Is there any way to do this faster and more concisely?
Use putmask:
import numpy as np
a = np.array([[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11]])
m = np.array([7,6,5,4])
# This is what you need:
np.putmask(a, a >= m, m - 1)
# a is now:
np.array([[0, 1, 2, 3],
[4, 5, 4, 3],
[6, 5, 4, 3]])
Another way is to use the clip function. Using eumiro's example:
bad_array = np.array([[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11]])
maxs = np.array([7,6,5,4])
good_array = bad_array.clip(max=maxs-1)
or, to write the result into an existing array good_array of the same shape:
bad_array.clip(max=maxs-1, out=good_array)
You can also specify the lower limit by adding the argument min=.
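A short sketch of clipping with both limits at once. The per-column lower limits in mins are made-up values for illustration; both limit arrays are broadcast across the rows.

```python
import numpy as np

bad_array = np.array([[0, 1, 2, 3],
                      [4, 5, 6, 7],
                      [8, 9, 10, 11]])
maxs = np.array([7, 6, 5, 4])
mins = np.array([1, 1, 1, 1])  # hypothetical per-column lower limits

# values below mins are raised to mins, values above maxs - 1 are lowered to maxs - 1
good_array = bad_array.clip(min=mins, max=maxs - 1)
print(good_array)
# [[1 1 2 3]
#  [4 5 4 3]
#  [6 5 4 3]]
```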
If we aren't assuming anything about the structure of bad_array, your code is optimal by the adversary argument. If we know that each column is sorted in ascending order, then as soon as we reach a value higher than the max then we know every following element in that column is also higher than the limit, but if we have no such assumption we simply have to check every single one.
If you decide to sort each column first, sorting m columns of n entries takes O(m * n log n) time, which is already more than the O(n * m) time it takes to check each element.
You could also create good_array by checking and copying one element at a time, instead of copying all of the elements from bad_array and checking them afterwards. This should roughly halve the time.
If the number of columns isn't large, one optimization would be:
def check_limits(bad_array, maxs):
    good_array = np.copy(bad_array)
    for i_column in range(bad_array.shape[1]):
        to_replace = (good_array[:,i_column] >= maxs[i_column])
        good_array[to_replace, i_column] = maxs[i_column] - 1
    return good_array
