I have a list of sorted floats y, as well as a list of unsorted floats x.
Now, I need to find out for every element in x between which values of y it lies, preferably by index of y. So for example, if
y=[1,2,3,4,5]
x[0]=3.5
I would need the output for index 0 of x to be (2,3), because 3.5 is between y[2] and y[3].
Basically, it is the same as treating y as bin edges and sorting x into those bins, I guess.
What would be the easiest way to accomplish that?
I would use zip (itertools.izip in Python 2.x) to accomplish this:
from itertools import islice  # in Python 2.x, also: from itertools import izip as zip

def nearest_neighbours(x, lst):
    # walk over consecutive pairs (lst[i], lst[i+1])
    for l1, l2 in zip(lst, islice(lst, 1, None)):
        if l1 <= x <= l2:
            return l1, l2
    else:
        # ?
Example usage:
>>> nearest_neighbours(3.5, range(1, 6))
(3, 4)
You will have to decide what you want to happen if x isn't between any pair in lst (i.e. replace # ?!) If you want indices (although your example isn't using them), have a play with enumerate.
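If you do want indices, one possible sketch along those lines using enumerate (the name nearest_indices and the None fallback are mine, just for illustration):

```python
from itertools import islice

def nearest_indices(x, lst):
    # pair each edge with its index, then walk consecutive pairs
    pairs = list(enumerate(lst))
    for (i, l1), (j, l2) in zip(pairs, islice(pairs, 1, None)):
        if l1 <= x <= l2:
            return i, j
    return None  # x falls outside the range covered by lst

print(nearest_indices(3.5, [1, 2, 3, 4, 5]))  # (2, 3)
```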
Thanks - I'm aware of how to code that step-by-step. However, I was looking for a pretty/easy/elegant solution, and now I am using numpy.digitize(), which looks pretty to me and works nicely.
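For reference, the numpy.digitize approach can look like this (a sketch; for each value, digitize returns the index of the first bin edge greater than it, so the surrounding edges are at indices i - 1 and i):

```python
import numpy as np

y = np.array([1, 2, 3, 4, 5])   # sorted bin edges
x = np.array([3.5, 1.2])        # unsorted query values
idx = np.digitize(x, y)         # index of the first edge above each value
pairs = [(int(i) - 1, int(i)) for i in idx]
print(pairs)  # [(2, 3), (0, 1)]
```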
Q: What would be the easiest way to accomplish that?
Instead of giving you the code, I think you should look at this pseudocode and try to write your own code! Don't just copy-paste code from the internet if you want to educate yourself!
Pseudocode:
// Assume that when you have a tie,
// you put the number in the smallest range.
// Here, b is between 2.1 and 3.5, instead of
// 3.5 and 4.1
float a[5] = {0.1, 1.1, 2.1, 3.5, 4.1}; // your y
float b = 3.5; // your x

// counter for the loop and indexes. Init i to second element
integer i = 1, prev = -1, next;

// while we are not at the end of the array
while(i < 5) {
    // if b is in the range ( a(i-1), a(i) ]
    if(b <= a[i] && b > a[i - 1]) {
        // mark the indexes
        prev = i - 1;
        next = i;
    }
    // go to next element
    i++;
}

if(prev == -1)
    print "Number is not between some numbers"
else
    print "prev, next"
I think that this can make you understand the point and then be able to select the easiest way for you.
My intent is to generate an empty list and then append a numeric sequence to it, such that it holds zeros except at every third place, which holds 3 and then its multiples, i.e. [0, 0, 3, 0, 0, 6, 0, 0, 9, ...], and it needs to have 100 values in total.
I first set up the list, then use a for loop over range(0, 100). I am sure I need to use % in some way such that whenever my sequence from 1 to 100 is exactly divisible by 3 it gives back 3 (and not 0), and then keeps going 6, 9, 12, ... How do I do that?
for i in range(0, 100):
    if i % 3 == 0:
        return 0
    else:
        return 3
Of course this is completely wrong but i am new to programming in general. Thanks in advance.
You could try this:
data = [0] * 100
for i in range(2, 100, 3):
    data[i] = i + 1
You just change the "step" of the range function, so every third index (2, 5, 8, ...) receives the next multiple of 3; it should work properly.
@Mark's comment is very relevant and makes good use of the modulo operator and list comprehensions in a very simple way. Moreover, his code easily adapts to any value.
However the modulo operator is quite slow so here are other ways to achieve the same result.
Method 1
We can make a range from 3 to 101 with a step of 3 and emit [0, 0, i] at each step. Since zeros will be missing at the end of the list, we must add as many as the remainder of 100 divided by 3.
data = [num for i in range(3, 101, 3) for num in [0, 0, i]] + [0] * 1
Method 2
With the same idea, we can use .extend() to add two 0s before each term.
data = []
for i in range(3, 101, 3):
    data.extend([0, 0, i])
data.append(0)
Method 3
The simplest idea, we create a list of 100 zeros, and then we modify the value every 3 terms.
data = [0] * 100
for i in range(2, 101, 3):
    data[i] = i + 1
Comparison
Using timeit, here is a comparison of the speed of each algorithm, the comparison is based on 10000 repetitions.
import timeit
print(timeit.timeit("data = [0 if n % 3 else n for n in range(1, 101)]"))
print(timeit.timeit("data = [num for i in range(3, 101, 3) for num in [0, 0, i]] + [0] * 1"))
print(timeit.timeit("""
data = []
for i in range(3, 101, 3):
    data.extend([0, 0, i])
data.append(0)
"""))
print(timeit.timeit("""
data = [0] * 100
for i in range(2, 101, 3):
    data[i] = i + 1
"""))
Output:
4.137781305000317
3.8176420609997876
2.4403464719998738
1.4861199529996156
The last algorithm is thus the fastest, almost 3 times faster than using modulus.
A function stops running when it encounters a return. So in your code, the loop body would only ever execute once.
But, what if we could change that? Do you know what a generator is? Have a look at this:
def multiples_of_three_or_zero():
    for i in range(1, 101):
        if i % 3 == 0:
            yield i
        else:
            yield 0
That's a generator. yield doesn't end the execution, but rather suspends it. You use it like this:
for i in multiples_of_three_or_zero():
    print(i)
Or, if you really want all the elements in a list, just make a list from it:
list(multiples_of_three_or_zero())
OK, I practiced a little bit and found this solution, which best suits my needs (it was for a take-home exercise):
A = []
for a in range(1, 101):
    if a % 3 == 0:
        A.append(a)
    else:
        A.append(0)
print(A)
Thank you all!
Hey I am trying to convert my python code to R and can't seem to figure out the last part of the recursion. If anyone who has experience in both languages could help that would be great!
def robber(nums):
    if len(nums) == 0: return 0
    elif len(nums) <= 2: return max(nums)
    else:
        A = [nums[0], max(nums[0:2])]
        for i in range(2, len(nums)):
            A.append(max(A[i-1], A[i-2] + nums[i]))
        return A[-1]
Above is the Python version and below is my attempt so far on converting to R
robbing <- function(nums) {
  if (length(nums) == 0){
    result <- 0
  }
  else if(length(nums) <= 2){
    result <- max(nums)
  }
  else{
    a <- list(nums[0], max(nums(0:2)))
    for (i in range(2, length(nums))){
      result <- max(a[i-1], a[i-2] + nums[i])
    }
  }
  #result <- a[-1]
}
You have a couple of problems.
You are zero-indexing your vectors. R is 1-indexed (the first element of y is y[1], not y[0]).
Ranges (slices in Python) in R are right-inclusive. E.g. 0:2 = c(0, 1, 2) in R, while Python is right-exclusive: 0:2 = [0, 1].
R uses negative indices to "remove" elements from vectors, while Python uses them to index from the end. E.g. y[-1] = y[2:length(y)] in R.
R's range function is not the same as Python's range function. The equivalent in R would be seq or a:b (for example 3:n). Note again that it is right-inclusive while Python's is right-exclusive!
You are not storing your intermediate results in a as you do in Python. You need to do this inside the loop.
And last: R functions return their last evaluation by default, so there is no need to explicitly use return. This is not a problem per se, but something that can make code look cleaner (or less clean in some cases). So one option to fix your problem would be:
robber <- function(nums){
  n <- length(nums) # <= Only compute length **once** =>
  if(n == 0)
    0 # <= Returned because no more code is run after this =>
  else if(n <= 2)
    max(nums) # <= Returned because no more code is run after this =>
  else{
    a <- numeric(n) # <= pre-allocate our vector =>
    a[1:2] <- cummax(nums[1:2]) # <= cummax instead of c(nums[1], max(nums[1:2])) =>
    for(i in 3:n){ # <= Note that we start at 3, because of R's 1-indexing =>
      a[i] <- max(a[i - 1], a[i - 2] + nums[i])
    }
    a[n]
  }
}
Note 3 things:
I use that R vectors are 1-indexed, and my range goes from 3 as a consequence of this.
I pre-allocate my a vector (here using numeric(n)). R vector expansion is slow while python lists are constant in time-complexity. So preallocation is the recommended way to go in all cases.
I extract the length once and store it in a variable: n <- length(nums). It is unnecessary to evaluate this multiple times, and it is recommended to store such intermediate results in a variable. This goes for any language, such as R, Python and even compiled languages such as C++ (although for the latter, in many cases the compiler is smart enough not to recompute the result).
Last I use cummax where I can. I feel there is an optimized way to get your result almost immediately using vectorization, but I can't quite see it.
I would avoid using a list, because appending to lists is slow (especially in R! A vector is much better, but we don't need any sequence or indexing at all if we use plain variables, as I show here).
You don't need to build a list. All you need to keep in memory are the previous and the pre-previous values of res.
def robber(nums, res=0, prev=0, preprev=0): # local vars predefined here
    for x in nums:
        prev, preprev = res, prev
        res = max(prev, preprev + x)
    return res
This Python function does the same as the one you gave. (Try it out!)
In R this would be:
robber <- function(nums, res=0, prev=0, preprev=0) {
  for (x in nums) {
    preprev <- prev
    prev <- res # correct order important!
    res <- max(prev, preprev + x)
  }
  res
}
Moving the local variable definitions into the argument list saves 3 lines of code in R, which is why I did it.
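To see that the rolling-variables version really matches the list-based one from the question, here is a quick Python sanity check (the test inputs are my own):

```python
# the original list-based version, for comparison
def robber_list(nums):
    if len(nums) == 0: return 0
    if len(nums) <= 2: return max(nums)
    A = [nums[0], max(nums[0:2])]
    for i in range(2, len(nums)):
        A.append(max(A[i-1], A[i-2] + nums[i]))
    return A[-1]

# the rolling-variables version
def robber(nums, res=0, prev=0, preprev=0):
    for x in nums:
        prev, preprev = res, prev
        res = max(prev, preprev + x)
    return res

for nums in ([5], [2, 7, 9, 3, 1], [1, 2, 3, 1]):
    assert robber(nums) == robber_list(nums)
```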
I suggest you change result to return(), define the object a outside the function, and use a[length(a)] instead of a[-1] at the end of the function.
a <- list(nums[0], max(nums(0:2)))

robbing <- function(nums) {
  if (length(nums) == 0){
    return(0)
  }
  else if(length(nums) <= 2){
    return(max(nums))
  }
  else{
    for (i in range(2, length(nums))){
      return(max(a[i-1], a[i-2] + nums[i]))
    }
  }
  return(a[length(a)])
}
Consider some vector:
import numpy as np
v = np.arange(10)
Assume we need to find the last 2 indices satisfying some condition.
For example, in Matlab it would be written e.g.
find(v < 5, 2, 'last')
answer = [3, 4] (the desired 0-based indices; Matlab itself, indexing from 1, would return [4, 5])
Question: What would be the clearest way to do that in Python?
A "nice" solution should STOP the search when it finds the 2 desired results; it should NOT search over all elements of the vector.
So np.where does not seem "nice" in that sense.
We can easily write that using a for loop, but is there an alternative?
I am wary of using for, since it might be slow (at least it is very much so in Matlab).
This attempt doesn't use numpy, and it is probably not very idiomatic.
Nevertheless, if I understand it correctly, zip, filter and reversed are all lazy iterators that take only the elements that they really need. Therefore, you could try this:
x = list(range(10))
from itertools import islice
res = reversed(list(map(
    lambda xi: xi[1],
    islice(
        filter(
            lambda xi: xi[0] < 5,
            zip(reversed(x), reversed(range(len(x))))
        ),
        2
    )
)))
print(list(res))
Output:
[3, 4]
What it does (from inside to outside):
create index range
reverse both array and indices
zip the reversed array with indices
filter the two (value, index)-pairs that you need, extract them by islice
Throw away the values, retain only indices with map
reverse again
Even though it looks somewhat monstrous, it should all be lazy, and stop after it finds the first two elements that you are looking for. I haven't compared it with a simple loop, maybe just using a loop would be both simpler and faster.
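The same lazy right-to-left scan can also be written more compactly with a generator expression plus islice (a sketch along the same lines; it stops after the first two hits):

```python
from itertools import islice

v = list(range(10))
# scan indices from the right, lazily keeping only the first two hits
last_two = sorted(islice((i for i in reversed(range(len(v))) if v[i] < 5), 2))
print(last_two)  # [3, 4]
```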
Any solution you'd find will iterate over the list even if the loop is 'hidden' inside a function.
The solution to your problem depends on the assumptions you can make, e.g. whether the list is sorted.
For the general case, I'd iterate over the list starting at the end:
def find(condition, k, v):
    indices = []
    for i, var in enumerate(reversed(v)):
        if condition(var):
            indices.append(len(v) - i - 1)
        if len(indices) >= k:
            break
    return indices
The condition should then be passed as a function, so you can use a lambda:
v = range(10)
find(lambda x: x < 5, 3, v)
will output
[4, 3, 2]
I'm not aware of a "good" numpy solution to short-circuiting.
The most principled way would be to use something like Cython, which (to brutally oversimplify) adds fast loops to Python. Once you have that set up, this would be easy.
If you do not want to do that you'd have to employ some gymnastics like:
import numpy as np

def find_last_k(vector, condition, k, minchunk=32):
    if k > minchunk:
        minchunk = k
    l, r = vector.size - minchunk, vector.size
    found = []
    n_found = 0
    while r > 0:
        if l <= 0:
            l = 0
        found.append(l + np.where(condition(vector[l:r]))[0])
        n_found += len(found[-1])
        if n_found >= k:
            break
        l, r = 3 * l - 2 * r, l
    return np.concatenate(found[::-1])[-k:]
This tries balancing loop overhead and numpy "inflexibility" by searching in chunks, which we grow exponentially until enough hits are found.
Not exactly pretty, though.
This is what I've found that seems to do the job for the example described, using argwhere (which returns all indices that meet the criterion) and then taking the last two of these as a numpy array:
ind = np.argwhere(v < 5)
ind[-2:]
This searches through the entire array so is not optimal but is easy to code.
I'm inexperienced in Python and started with Python 3.4.
I read over the Python 3.x documentation on loop idioms, and haven't found a way of constructing a familiar C-family for-loop, i.e.
for (i = 0; i < n; i++) {
    A[i] = value;
}
Writing a for-loop like this in Python seems all but impossible by design. Does anyone know the reason why Python iteration over a sequence follows a pattern like
for x in iterable: # e.g. range, itertools.count, generator functions
    pass
Is this more efficient or convenient, or does it reduce index-out-of-bounds exceptions?
for lower <= var < upper:
That was the proposed syntax for a C-style loop. I say "was the proposed syntax" because PEP 284 was rejected:
Specifically, Guido did not buy the premise that the range() format needed fixing, "The whole point (15 years ago) of range() was to *avoid* needing syntax to specify a loop over numbers. I think it's worked out well and there's nothing that needs to be fixed (except range() needs to become an iterator, which it will in Python 3.0)."
So no for lower <= var < upper: for us.
Now, how to get a C-style loop? Well, you can use range([start,]end[,step]).
for i in range(0, len(blah), 3):  # step defaults to 1 if left off
    blah[i] += merp  # alters every third element of blah
You can enumerate if you need both index and value:
for i, j in enumerate(blah):
    merp[j].append(i)
If you wanted to look at two (or more!) iterators together you can zip them (Also: itertools.izip and itertools.izip_longest)
for i, j in zip(foo, bar):
    if i == j: print("Scooby-Doo!")
And finally, there's always the while loop
i = 0
while i < upper:
    A[i] = b
    i += 1
Addendum: There's also PEP 276, which suggested making ints iterable; it was also rejected. Still would have been half-open, though.
range(n) produces a suitable iterable :)
for i in range(n):
    A[i] = value
For the more general case (not just counting), you should transform it into a while loop, e.g.
i = 0
while i < n:
    A[i] = value
    i += 1
The foreach-style loop is pretty common in most languages now, as it's rare that you need the index of the collection; more commonly you only need the object itself. Furthermore, collections that require an iterator because there is no random access (e.g. a set) can be iterated with exactly the same syntax as a randomly accessible collection.
In python, the correct way of accessing the index while iterating should be:
for i, x in enumerate(iterable):
At this point, i is your index and x is the item at iterable[i].
You'll want to look at using the range() function.
for i in range(n):
    A[i] = value
The function can be used as range(n), which yields the integers 0 through n-1, or as range(start, end), which yields the integers from start up to, but not including, end. For example:
range(1, 5)
will give you the numbers 1, 2, 3, and 4.
Python is a higher-level language than C and iterating over a high-level abstraction such as 'sequence' is more naturally and safely expressed with another one - 'iterator'. C doesn't really have such abstraction so it's hardly surprising it expresses most traversal with a low-level, 'hand-operated' index or pointer increment. That's an artifact of the low-level nature of C, though - it would be silly for a higher-level abstraction to use it as a primary building block for all looping constructs and most, not just Python, don't.
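Concretely, for x in iterable is sugar over Python's iterator protocol; roughly, the interpreter does something like this sketch:

```python
seq = [10, 20, 30]
seen = []

# roughly what "for x in seq: seen.append(x)" expands to
it = iter(seq)          # ask the iterable for an iterator
while True:
    try:
        x = next(it)    # pull the next item
    except StopIteration:
        break           # iterator exhausted: leave the loop
    seen.append(x)

print(seen)  # [10, 20, 30]
```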
The ideal way to get a C-style for loop in Python is to use range. Not many are aware that range(stop) is also overloaded to accept start, stop and step arguments, where step is optional. With this, you can do almost anything you could with C-style for loops:
range(start, stop[, step])
for (i = 0; i < 10; i++)
>>> range(10)
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
for (i = 1; i < 11; i++)
>>> range(1, 11)
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
for (i = 0; i < 30; i=i+5)
>>> range(0, 30, 5)
[0, 5, 10, 15, 20, 25]
for (i = 0; i < 10; i=i+3)
>>> range(0, 10, 3)
[0, 3, 6, 9]
for (i = 0; i > -10; i--)
>>> range(0, -10, -1)
[0, -1, -2, -3, -4, -5, -6, -7, -8, -9]
Check https://docs.python.org/2/library/functions.html#range
First, Python has several ways to perform C-style for loops, with the two most common being the following (the first, which you allude to in your post, uses the object returned by range):
for i in range(some_end_value):
    print(i)

# or the many times preferred
for i, elem in enumerate(some_list):
    print("i is at {0} and the value is {1}".format(i, elem))
As to why Python is set up this way, I think it has simply become a more convenient and preferred way of setting up foreach-style loops, particularly as languages have moved away from needing to define arrays/lists with their maximum index. For instance, in Java one can also do:
for (int i : someArray) {
    System.out.println(i); // prints the current item of the integer array
}
C# (foreach (int i in someArray)) and C++ (for (auto &i : someArray)) also have their own foreach loops, while in C most people tend to write macros to get the functionality of a foreach loop.
It's just a convenient way to access dynamic arrays, lists, dicts, and other constructs. while loops can still be used for activities that must modify the iterator itself, or you can create a second variable and update it mathematically from the iterator.
The equivalent of the C-loop:
for (i = 0; i < n; i++) A[i] = value;
i.e., to set all items in an array to the same value if A is a numpy array:
A[:] = value
Or if len(A) > n then
A[:n] = value
If you want to create a Python list with n values:
A = [value] * n #NOTE: all items refer to the *same* object
You could also replace values in the existing list:
A[:n] = [value]*n #NOTE: it may grow if necessary
Or without creating a temporary list:
for i in range(n): A[i] = value
The pythonic way to enumerate all values with corresponding indices while using the values:
for index, item in enumerate(A):
    A[index] = item * item
The code could also be written using a list comprehension:
A = [item * item for item in A] #NOTE: the original list object may survive
Don't try to write C in Python.
I have two lists, and I want to compare the values in each list to see if their difference is within a certain range, and return the number of such matching values. Here is the first version of my code:
m = [1,3,5,7]
n = [1,4,7,9,5,6,34,52]
k=0
for i in xrange(0, len(m)):
    for j in xrange(0, len(n)):
        if abs(m[i] - n[j]) <= 0.5:
            k += 1
        else:
            continue
The output is 3. I also tried a second version:
for i, j in zip(m, n):
    if abs(i - j) <= 0.5:
        t += 1
    else:
        continue
The output is 1, which is wrong. So I am wondering if there is simpler and more efficient code than the first version; I have a large amount of data to deal with. Thank you!
The first thing you could do is remove the else: continue, since it doesn't add anything. Also, you can use for a in m directly to avoid iterating over a range and indexing.
If you wanted to write it more succinctly, you could use itertools.
import itertools
m = [1,3,5,7]
n = [1,4,7,9,5,6,34,52]
k = sum(abs(a - b) <= 0.5 for a, b in itertools.product(m, n))
The runtime of this (and your solution) is O(m * n), where m and n are the lengths of the lists.
If you need a more efficient algorithm, you can use a sorted data structure like a binary tree or a sorted list to achieve better lookup.
import bisect
m = [1,3,5,7]
n = [1,4,7,9,5,6,34,52]
n.sort()
k = 0
for a in m:
    i = bisect.bisect_left(n, a - 0.5)
    j = bisect.bisect_right(n, a + 0.5)
    k += j - i
The runtime is O((m + n) * log n). That's n * log n for sorting and m * log n for lookups. So you'd want to make n the shorter list.
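If numpy is in play anyway, the same pair of binary searches can be vectorised over m with np.searchsorted (a sketch of the equivalent computation, not part of the original answer):

```python
import numpy as np

m = np.array([1, 3, 5, 7])
n = np.sort(np.array([1, 4, 7, 9, 5, 6, 34, 52]))

# for each a in m, count the elements of n inside [a - 0.5, a + 0.5]
lo = np.searchsorted(n, m - 0.5, side='left')
hi = np.searchsorted(n, m + 0.5, side='right')
k = int((hi - lo).sum())
print(k)  # 3
```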
More pythonic version of your first version:
ms = [1, 3, 5, 7]
ns = [1, 4, 7, 9, 5, 6, 34, 52]
k = 0
for m in ms:
    for n in ns:
        if abs(m - n) <= 0.5:
            k += 1
I don't think it will run faster, but it's simpler (more readable).
It's simpler, and probably slightly faster, to iterate over the lists directly rather than over range objects to get index values. You already do this in your second version, but that zip() call doesn't construct all possible pairs. Here's a modification of your first version:
m = [1,3,5,7]
n = [1,4,7,9,5,6,34,52]
k=0
for x in m:
    for y in n:
        if abs(x - y) <= 0.5:
            k += 1
You don't need the else: continue part, which does nothing at the end of a loop, so I left it out.
If you want to explore generator expressions to do this, you can use:
k = sum(sum( abs(x-y) <= 0.5 for y in n) for x in m)
That should run reasonably fast using just the core language and no imports.
Your two code snippets are doing two different things. The first one is comparing each element of n with each element of m, but the second one is only doing a pairwise comparison of corresponding elements of m and n, stopping when the shorter list runs out of elements. We can see exactly which elements are being compared in the second case by printing the zip:
>>> m = [1,3,5,7]
>>> n = [1,4,7,9,5,6,34,52]
>>> zip(m,n)
[(1, 1), (3, 4), (5, 7), (7, 9)]
pawelswiecki has posted a more Pythonic version of your first snippet. Generally, it's better to directly iterate over containers rather than using an indexed loop unless you actually need the index. And even then, it's more Pythonic to use enumerate() to generate the index than to use xrange(len(m)). Eg
>>> for i, v in enumerate(m):
... print i, v
...
0 1
1 3
2 5
3 7
A rule of thumb is that if you find yourself writing for i in xrange(len(m)), there's probably a better way to do it. :)
William Gaul has made a good suggestion: if your lists are sorted you can break out of the inner loop once the absolute difference gets bigger than your threshold of 0.5. However, Paul Draper's answer using bisect is my favourite. :)
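That sorted early-exit suggestion can be sketched like this (sorting n up front; the break fires as soon as y is too large to ever match x):

```python
m = [1, 3, 5, 7]
n = sorted([1, 4, 7, 9, 5, 6, 34, 52])

k = 0
for x in m:
    for y in n:
        if y > x + 0.5:   # n is sorted: no later y can be within 0.5 of x
            break
        if abs(x - y) <= 0.5:
            k += 1
print(k)  # 3
```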