Create range without certain numbers - python

I want to create a range x from 0 ... n, without any of the numbers in the list y. How can I do this?
For example:
n = 10
y = [3, 7, 8]
x = # Do Something
Should give the output:
x = [0, 1, 2, 4, 5, 6, 9]
One naive way would be to concatenate several ranges, each spanning a set of numbers which have been intersected by the numbers in y. However, I'm not sure of what the simplest syntax to do this is in Python.

You can use a list comprehension to filter the range from 0 to n: range(n) generates a list (or, in Python 3, a lazy range object) containing the integers from 0 to n - 1 (both ends included):
x = [i for i in range(n) if i not in y]
This filters out all numbers in y from the range.
You can also turn it into a generator (which you can only iterate over once, but which avoids building the whole result in memory for (very) large n) by replacing [ with ( and ] with ). Further, in Python 2, you can use xrange instead of range to avoid loading the entire range into memory at once. Also, especially if y is a large list, you can turn it into a set first to get O(1) average-case membership checks instead of the O(len(y)) scans needed for list or tuple objects. Such a version might look like
s = set(y)
x = (i for i in range(n) if i not in s)
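Putting the pieces above together, here is a minimal runnable sketch for Python 3. The helper name range_without is made up for illustration and does not come from the answer:

```python
def range_without(n, excluded):
    """Return the integers 0..n-1 that are not in `excluded`, in ascending order."""
    excluded_set = set(excluded)  # O(1) average-case membership tests
    return [i for i in range(n) if i not in excluded_set]


n = 10
y = [3, 7, 8]
x = range_without(n, y)
print(x)  # [0, 1, 2, 4, 5, 6, 9]
```

Returning a list rather than a generator keeps the result reusable; swap the square brackets for parentheses if you only need to iterate once.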

hlt's answer is ideal, but I'll quickly suggest another way using set operations.
n = 10
y = [3, 7, 8]
x = set(range(n)) - set(y)
x will be a set object. If you definitely need x to be a list, you can just write x = list(x).
Note that the ordering of a set in Python is not guaranteed to be anything in particular. If order is needed, remember to sort.
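If you need both a list and ascending order, sorted() does both in one step, since it accepts any iterable and always returns a new list. A small sketch using the question's values:

```python
n = 10
y = [3, 7, 8]

# Set difference removes the unwanted numbers; sorted() restores ascending
# order and converts the result to a list.
x = sorted(set(range(n)) - set(y))
print(x)  # [0, 1, 2, 4, 5, 6, 9]
```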

Adding to the answers above, here is a version using filter with a lambda function:
x = filter(lambda i: i not in y, range(n))
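One thing to watch for: in Python 3, filter returns a lazy iterator rather than a list, so wrap it in list() if you need a list. A short sketch (the variable name lazy is just for illustration):

```python
n = 10
y = [3, 7, 8]

lazy = filter(lambda i: i not in y, range(n))  # a lazy filter object in Python 3
x = list(lazy)
print(x)           # [0, 1, 2, 4, 5, 6, 9]
print(list(lazy))  # [] because the iterator has already been consumed
```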

Related

Split sorted list into two lists

I'm trying to split a sorted integer list into two lists. The first list would have all ints under n and the second all ints over n. Note that n does not have to be in the original list.
I can easily do this with:
under = []
over = []
for x in sorted_list:
    if x < n:
        under.append(x)
    else:
        over.append(x)
But it just seems like it should be possible to do this in a more elegant way knowing that the list is sorted. takewhile and dropwhile from itertools sound like the solution but then I would be iterating over the list twice.
Functionally, the best I can do is this:
i = 0
while sorted_list[i] < n:
    i += 1
under = sorted_list[:i]
over = sorted_list[i:]
But I'm not even sure if it is actually better than just iterating over the list twice and it is definitely not more elegant.
I guess I'm looking for a way to get the list returned by takewhile and the remaining list, perhaps, in a pair.
The correct solution here is the bisect module. Use bisect.bisect to find the index to the right of n (or the index where it would be inserted if it's missing), then slice around that point:
import bisect # At top of file
split_idx = bisect.bisect(sorted_list, n)
under = sorted_list[:split_idx]
over = sorted_list[split_idx:]
While any solution is going to be O(n) (you do have to copy the elements after all), the comparisons are typically more expensive than simple pointer copies (and associated reference count updates), and bisect reduces the comparison work on a sorted list to O(log n), so this will typically (on larger inputs) beat simply iterating and copying element by element until you find the split point.
Use bisect.bisect_left (which finds the leftmost index of n) instead of bisect.bisect (equivalent to bisect.bisect_right) if you want n to end up in over instead of under.
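To make the difference between the two functions concrete, here is a small sketch. The sample list (with a duplicated 6) is my own, chosen so that the two split points differ:

```python
import bisect

sorted_list = [1, 2, 4, 5, 6, 6, 7, 8]
n = 6

right = bisect.bisect_right(sorted_list, n)  # same as bisect.bisect
left = bisect.bisect_left(sorted_list, n)

print(sorted_list[:right], sorted_list[right:])  # [1, 2, 4, 5, 6, 6] [7, 8]
print(sorted_list[:left], sorted_list[left:])    # [1, 2, 4, 5] [6, 6, 7, 8]

# n does not have to be present: 3 would be inserted at index 2.
print(bisect.bisect(sorted_list, 3))  # 2
```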
I would use the following approach, where I find the index and use slicing to create under and over:
sorted_list = [1,2,4,5,6,7,8]
n=6
idx = sorted_list.index(n)
under = sorted_list[:idx]
over = sorted_list[idx:]
print(under)
print(over)
Output (same as with your code):
[1, 2, 4, 5]
[6, 7, 8]
Edit: As I misunderstood the question, here is an adapted solution that finds the nearest index:
import numpy as np
sorted_list = [1,2,4,5,6,7,8]
n=3
idx = np.searchsorted(sorted_list, n)
under = sorted_list[:idx]
over = sorted_list[idx:]
print(under)
print(over)
Output:
[1, 2]
[4, 5, 6, 7, 8]

How to filter two numpy arrays?

Edit: I fixed y so that x,y have the same length
I don't understand much about programming but I have a giant mass of data to analyze and it has to be done in Python.
Say I have two arrays:
import numpy as np
x=np.array([1,2,3,4,5,6,7,8,9,10])
y=np.array([25,18,16,19,30,5,9,20,80,45])
and say I want to choose the values in y which are greater than 17, and keep only the values in x which have the same index as the remaining values in y. For example, I want to erase the first value of y (25) and, accordingly, the matching value in x (1).
I tried this:
filter=np.where(y>17, 0, y)
but I don't know how to filter the x values accordingly (the actual data are much longer arrays, so doing it "by hand" is basically impossible)
Solution: using @mozway's tip, now that x and y have the same length, the needed code is:
import numpy as np
x=np.array([1,2,3,4,5,6,7,8,9,10])
y=np.array([25,18,16,19,30,5,9,20,80,45])
x_filtered=x[y>17]
As your question is not fully clear and you did not provide the expected output, here are two possibilities:
filtering
NumPy arrays can be indexed by an array (iterable) of booleans.
If the two arrays were the same length you could do:
x[y>17]
Here, x is longer than y so we first need to make it the same length:
import numpy as np
x=np.array([1,2,3,4,5,6,7,8,9,10])
y=np.array([25,18,16,19,30,5,9,20])
x[:len(y)][y>17]
Output: array([1, 2, 4, 5, 8])
replacement
To select between x and y based on a condition, use where:
np.where(y>17, x[:len(y)], y)
Output:
array([ 1, 2, 16, 4, 5, 5, 9, 8])
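If you want to filter both arrays consistently, one option is to compute the boolean mask once and apply it to each array. A minimal sketch using the same-length arrays from the edited question (the name mask is my own):

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
y = np.array([25, 18, 16, 19, 30, 5, 9, 20, 80, 45])

mask = y > 17          # boolean array, True where the condition holds
x_filtered = x[mask]   # keeps x values at the positions where y > 17
y_filtered = y[mask]

print(x_filtered.tolist())  # [1, 2, 4, 5, 8, 9, 10]
print(y_filtered.tolist())  # [25, 18, 19, 30, 20, 80, 45]
```

Reusing one mask guarantees that x_filtered and y_filtered stay aligned index by index.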
As someone with little experience in NumPy specifically, I wrote this answer before seeing @mozway's excellent answer for filtering. My answer works on more generic containers than NumPy's arrays, though it uses more concepts as a result. I'll attempt to explain each concept in enough detail for the answer to make sense.
TL;DR:
Please, definitely read the rest of the answer, it'll help you understand what's going on.
import numpy as np
x = np.array([1,2,3,4,5,6,7,8,9,10])
y = np.array([25,18,16,19,30,5,9,20])
filtered_x_list = []
filtered_y_list = []
for i in range(min(len(x), len(y))):
    if y[i] > 17:
        filtered_y_list.append(y[i])
        filtered_x_list.append(x[i])
filtered_x = np.array(filtered_x_list)
filtered_y = np.array(filtered_y_list)
# These lines are just for us to see what happened
print(filtered_x) # prints [1 2 4 5 8]
print(filtered_y) # prints [25 18 19 30 20]
Pre-requisite Knowledge
Python containers (lists, arrays, and a bunch of other stuff I won't get into)
Let's take a look at the line:
x = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
What's Python doing?
The first thing it's doing is creating a list:
[1, 2, 3] # and so on
Lists in Python have a few features that are useful for us in this solution:
Accessing elements:
x_list = [ 1, 2, 3 ]
print(x_list[0]) # prints 1
print(x_list[1]) # prints 2, and so on
Adding elements to the end:
x_list = [ 1, 2, 3 ]
x_list.append(4)
print(x_list) # prints [1, 2, 3, 4]
Iteration:
x_list = [ 1, 2, 3 ]
for x in x_list:
    print(x)
# prints:
# 1
# 2
# 3
NumPy arrays are slightly different: we can still access and iterate over their elements, and we can change an existing value in place (for example, x[0] = 5), but their size is fixed once they're created. They have no .append method, and functions such as np.append and np.delete return a new array instead of modifying the original, which makes growing an array one element at a time inefficient.
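For reference, here is a short sketch of what NumPy arrays do and do not allow. The array names are illustrative only:

```python
import numpy as np

a = np.array([1, 2, 3])
a[0] = 10             # in-place element assignment works
print(a.tolist())     # [10, 2, 3]

b = np.append(a, 4)   # returns a new, longer array
print(a.tolist())     # [10, 2, 3] (a itself is unchanged)
print(b.tolist())     # [10, 2, 3, 4]

c = np.delete(b, 1)   # also returns a new array, without index 1
print(c.tolist())     # [10, 3, 4]
```

Because every np.append call copies the data, collecting values in a plain list and converting once at the end is usually the simpler choice.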
So the filtered_x_list and the filtered_y_list are empty lists we're creating, but we're going to modify them by adding the values we care about to the end.
The second thing Python is doing is creating a numpy array, using the list to define its contents. The array constructor can take a list expressed as [...], or a list defined by x_list = [...], which we're going to take advantage of later.
A little more on iteration
In your question, for every x element, there is a corresponding y element. We want to test something for each y element, then act on the corresponding x element, too.
Since we can access the same element in both arrays using an index - x[0], for instance - instead of iterating over one list or the other, we can iterate over all indices needed to access the lists.
First, we need to figure out how many indices we're going to need, which is just the length of the lists. len(x) lets us do that - in this case, it returns 10.
What if x and y are different lengths? In this case, I chose the smallest of the two - first, do len(x) and len(y), then pass those to the min() function, which is what min(len(x), len(y)) in the code above means.
Finally, we want to actually iterate through the indices, starting at 0 and ending at len(x) - 1 or len(y) - 1, whichever is smallest. The range sequence lets us do exactly that:
for i in range(10):
    print(i)
# prints:
# 0
# 1
# 2
# 3
# 4
# 5
# 6
# 7
# 8
# 9
So range(min(len(x), len(y))), finally, gets us the indices to iterate over, and finally, this line makes sense:
for i in range(min(len(x), len(y))):
Inside this for loop, i now gives us an index we can use for both x and y.
Now, we can do the comparison in our for loop:
for i in range(min(len(x), len(y))):
    if y[i] > 17:
        filtered_y_list.append(y[i])
Then, including xs for the corresponding ys is a simple case of just appending the same x value to the x list:
for i in range(min(len(x), len(y))):
    if y[i] > 17:
        filtered_y_list.append(y[i])
        filtered_x_list.append(x[i])
The filtered lists now contain the numbers you're after. The last two lines, outside the for loop, just create numpy arrays from the results:
filtered_x = np.array(filtered_x_list)
filtered_y = np.array(filtered_y_list)
Which you might want to do, if certain numpy functions expect arrays.
While there are, in my opinion, better ways to do this (I would probably write custom iterators that produce the intended results without creating new lists), they require a somewhat more advanced understanding of programming, so I opted for something simpler.
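As a middle ground (not the custom iterators mentioned above, just a sketch of a more compact variant), the built-in zip pairs up elements and stops at the shorter input, so the index bookkeeping disappears. Plain lists are used here as an assumption, to show that the idea is not NumPy-specific:

```python
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = [25, 18, 16, 19, 30, 5, 9, 20]

# zip stops at the shorter input, so no explicit min(len(x), len(y)) is needed.
pairs = [(xi, yi) for xi, yi in zip(x, y) if yi > 17]

filtered_x = [xi for xi, _ in pairs]
filtered_y = [yi for _, yi in pairs]

print(filtered_x)  # [1, 2, 4, 5, 8]
print(filtered_y)  # [25, 18, 19, 30, 20]
```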

List comprehension where the condition depends on the list being generated

I am new to Python programming. I want to rewrite the following code as a list comprehension:
lx = [1, 2, 3, 4, 5, 1, 2]
ly = [2, 5, 4]
lz = []
for x in lx:
    if x in ly and x not in lz:
        lz.append(x)
This will create a new list with common elements of lx and ly; but the condition x not in lz depends on the list that is being built. How can this code be rewritten as a list comprehension?
You cannot do it that way in a list comprehension as you cannot compare against the list lz that does not yet exist - assuming you are trying to avoid duplicates in the resulting list as in your example.
Instead, you can use the python set which will enforce only a single instance of each value:
lz = set(x for x in lx if x in ly)
And if what you are really after is a set intersection (elements in common):
lz = set(lx) & set(ly)
UPDATE:
As pointed out by @Błotosmętek in the comments - using the set will not retain the order of the elements as the set is, by definition, unordered. If the order of the elements is significant a different strategy will be necessary.
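One possible order-preserving strategy, offered as a sketch rather than as part of the original answer, relies on the fact that dict keys are unique and keep insertion order (guaranteed since Python 3.7):

```python
lx = [1, 2, 3, 4, 5, 1, 2]
ly = [2, 5, 4]

ly_set = set(ly)  # fast membership tests
# dict.fromkeys drops duplicates while keeping the first-seen order from lx.
lz = list(dict.fromkeys(x for x in lx if x in ly_set))
print(lz)  # [2, 4, 5]
```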
The correct answer here is to use sets, because (1) sets naturally have distinct elements, and (2) sets are more efficient than lists for membership tests. So the simple solution is list(set(lx) & set(ly)).
However, sets do not preserve the order that elements are inserted in, so in case the order is important, here's a solution which preserves the order from lx. (If you want the order from ly, simply swap the roles of the two lists.)
def ordered_intersection(lx, ly):
    ly_set = set(ly)
    return [ly_set.remove(x) or x for x in lx if x in ly_set]
Example:
>>> ordered_intersection(lx, ly)
[2, 4, 5]
>>> ordered_intersection(ly, lx)
[2, 5, 4]
It works because ly_set.remove(x) always returns None, which is falsy, so ly_set.remove(x) or x always has the value of x.
The reason you cannot do this with a simpler list comprehension like lz = [... if x not in lz] is that the whole list comprehension is evaluated before the resulting list is assigned to the variable lz; so the x not in lz test will give a NameError because there is no such variable yet.
That said, it is possible to rewrite your code to directly use a generator expression (which is somewhat like a list comprehension) instead of a for loop; but it is bad code and you shouldn't do this:
def ordered_intersection_bad_dont_do_this(lx, ly):
    lz = []
    lz.extend(x for x in lx if x in ly and x not in lz)
    return lz
This is not just bad because of repeatedly testing membership of lists; it is worse, because it depends on an unspecified behaviour of the extend method. In particular, it adds each element one by one rather than exhausting the iterator first and then adding them all at once. The docs don't say that this is guaranteed to happen, so this bad solution won't necessarily work in other versions of Python.
If you don't want to use set, this can be another approach using list comprehension.
lx = [1, 2, 3, 4, 5, 1, 2]
ly = [2, 5, 4]
lz=[]
[lz.append(x) for x in lx if (x in ly and x not in lz)]
print(lz)

Is there a way to find a max value between a range?

For example,
l = [1, -9, 2, 5, 9, 16, 11, 0, 21]
and if the range is 10 (10 meaning any numbers higher than 10 won't be considered as the max), I want the code to return 9.
You can first delete all elements too large and then find the max:
filtered = filter(lambda x: x <= limit, list)
val = max(filtered, default = None) # the `default` part means that that's returned if there are no elements
filtered is a filter object which contains all elements less than or equal to the limit. val is the maximum value in that.
Alternatively,
filtered = [x for x in list if x <= limit]
val = max(filtered, default = None)
filtered contains all elements in the list if and only if they are less than or equal to the limit. val is the maximum of filtered.
Alternatively,
val = max((x for x in list if x <= limit), default = None)
This combines the two steps from the above method by passing a generator expression directly to max.
Alternatively,
val = max(filter(limit.__ge__, list), default = None)
limit.__ge__ is a function that means x => limit >= x (ge means Greater-Equal). This is the shortest and least readable way of writing it.
Also please rename list
list is a built-in name (the list type in Python). Please don't shadow built-in names ;_;
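For a quick check, here are the four variants side by side with the list renamed. The names values and limit are my own, and the default argument of max requires Python 3.4 or later:

```python
values = [1, -9, 2, 5, 9, 16, 11, 0, 21]  # named `values` rather than `list`
limit = 10

a = max(filter(lambda v: v <= limit, values), default=None)
b = max([v for v in values if v <= limit], default=None)
c = max((v for v in values if v <= limit), default=None)
d = max(filter(limit.__ge__, values), default=None)
print(a, b, c, d)  # 9 9 9 9

# With no qualifying element, `default` is returned instead of raising ValueError.
empty_case = max((v for v in values if v <= -100), default=None)
print(empty_case)  # None
```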
The following is not radically different, conceptually, than @HyperNeutrino's excellent answer, but I think it's somewhat clearer (per the Zen):
from __future__ import print_function
l = [1, -9, 2, 5, 9, 16, 11, 0, 21]
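def lim(x, n):
    # Return x if it is within the limit; otherwise fall through and return None.
    if x <= n:
        return x
# Skip the None results: Python 3 cannot compare None with integers.
print(max(v for v in (lim(a, 10) for a in l) if v is not None))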
The cleanest and most space efficient method is to utilize a conditioned generator expression:
maxl = max(num for num in l if num <= 10)
This loops over the list l once, ignoring any numbers not satisfying num <= 10, and finds the maximum. No additional list is built.

Inserting and removing into/from sorted list in Python

I have a sorted list of integers, L, and I have a value X that I wish to insert into the list such that L's order is maintained. Similarly, I wish to quickly find and remove the first instance of X.
Questions:
How do I use the bisect module to do the first part, if possible?
Is L.remove(X) going to be the most efficient way to do the second part? Does Python detect that the list has been sorted and automatically use a logarithmic removal process?
Example code attempts:
i = bisect_left(L, y)
L.pop(i) #works
del L[bisect_left(L, i)] #doesn't work if I use this instead of pop
You use the bisect.insort() function:
bisect.insort(L, X)
L.remove(X) will scan the whole list until it finds X. Use del L[bisect.bisect_left(L, X)] instead (provided that X is indeed in L).
Note that removing from the middle of a list is still going to incur a cost as the elements from that position onwards all have to be shifted left one step. A binary tree might be a better solution if that is going to be a performance bottleneck.
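Here is a minimal sketch of both operations using only the standard library. The helper name remove_sorted is hypothetical; it adds the presence check that the bare del statement above leaves to the caller:

```python
import bisect


def remove_sorted(sorted_list, value):
    """Remove the first occurrence of `value` from a sorted list.

    Raises ValueError if `value` is absent, mirroring list.remove.
    """
    i = bisect.bisect_left(sorted_list, value)
    if i == len(sorted_list) or sorted_list[i] != value:
        raise ValueError(f"{value!r} not in list")
    del sorted_list[i]


L = [1, 3, 4, 4, 7]
bisect.insort(L, 5)   # insert while keeping the list sorted
print(L)              # [1, 3, 4, 4, 5, 7]
remove_sorted(L, 4)   # removes the first 4
print(L)              # [1, 3, 4, 5, 7]
```

Without the check, del L[bisect.bisect_left(L, X)] would silently delete a different element (or raise IndexError) when X is missing.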
You could use Raymond Hettinger's IndexableSkiplist. It performs 3 operations in O(ln n) time:
insert value
remove value
lookup value by rank
import skiplist
import random
random.seed(2013)
N = 10
skip = skiplist.IndexableSkiplist(N)
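data = list(range(N))  # list() is needed in Python 3, where a range object cannot be shuffled in place
random.shuffle(data)
for num in data:
    skip.insert(num)
print(list(skip))
# [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
for num in data[:N//2]:
    skip.remove(num)
print(list(skip))
# e.g. [0, 3, 4, 6, 9]; which five values remain depends on the shuffle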
