Edit: I fixed y so that x and y have the same length.
I don't understand much about programming, but I have a giant mass of data to analyze and it has to be done in Python.
Say I have two arrays:
import numpy as np
x=np.array([1,2,3,4,5,6,7,8,9,10])
y=np.array([25,18,16,19,30,5,9,20,80,45])
and say I want to choose the values in y which are greater than 17, and keep only the values in x that have the same indices as the values left in y. For example, I want to erase the first value of y (25) and, accordingly, the matching value in x (1).
I tried this:
filter=np.where(y>17, 0, y)
but I don't know how to filter the x values accordingly (the actual data are much longer arrays, so doing it "by hand" is basically impossible)
Solution: using @mozway's tip, now that x and y have the same length the needed code is:
import numpy as np
x=np.array([1,2,3,4,5,6,7,8,9,10])
y=np.array([25,18,16,19,30,5,9,20,80,45])
x_filtered=x[y>17]
As your question is not fully clear and you did not provide the expected output, here are two possibilities:
filtering
NumPy arrays can be sliced by an array (iterable) of booleans.
If the two arrays were the same length you could do:
x[y>17]
Here, x is longer than y, so we first need to make it the same length:
import numpy as np
x=np.array([1,2,3,4,5,6,7,8,9,10])
y=np.array([25,18,16,19,30,5,9,20])
x[:len(y)][y>17]
Output: array([1, 2, 4, 5, 8])
replacement
To select between x and y based on a condition, use where:
np.where(y>17, x[:len(y)], y)
Output:
array([ 1, 2, 16, 4, 5, 5, 9, 8])
As someone with little experience in NumPy specifically, I wrote this answer before seeing @mozway's excellent answer for filtering. My answer works on more generic containers than NumPy's arrays, though it uses more concepts as a result. I'll attempt to explain each concept in enough detail for the answer to make sense.
TL;DR:
Please, definitely read the rest of the answer, it'll help you understand what's going on.
import numpy as np
x = np.array([1,2,3,4,5,6,7,8,9,10])
y = np.array([25,18,16,19,30,5,9,20])
filtered_x_list = []
filtered_y_list = []
for i in range(min(len(x), len(y))):
    if y[i] > 17:
        filtered_y_list.append(y[i])
        filtered_x_list.append(x[i])
filtered_x = np.array(filtered_x_list)
filtered_y = np.array(filtered_y_list)
# These lines are just for us to see what happened
print(filtered_x) # prints [1 2 4 5 8]
print(filtered_y) # prints [25 18 19 30 20]
Pre-requisite Knowledge
Python containers (lists, arrays, and a bunch of other stuff I won't get into)
Let's take a look at the line:
x = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
What's Python doing?
The first thing it's doing is creating a list:
[1, 2, 3] # and so on
Lists in Python have a few features that are useful for us in this solution:
Accessing elements:
x_list = [ 1, 2, 3 ]
print(x_list[0]) # prints 1
print(x_list[1]) # prints 2, and so on
Adding elements to the end:
x_list = [ 1, 2, 3 ]
x_list.append(4)
print(x_list) # prints [1, 2, 3, 4]
Iteration:
x_list = [ 1, 2, 3 ]
for x in x_list:
    print(x)
# prints:
# 1
# 2
# 3
NumPy arrays are slightly different: we can still access, assign, and iterate over their elements, but once they're created, we can't resize them - they have no .append, and other changes lists allow (like deleting an element) can't be done without creating a new array.
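For example (a small illustration with made-up values):
import numpy as np
x_list = [1, 2, 3]
x_array = np.array([1, 2, 3])
x_list.append(4)      # lists can grow
x_array[0] = 10       # individual array values can be read and assigned
# x_array.append(4)   # but arrays have no .append - this would raise AttributeError
print(x_list)         # [1, 2, 3, 4]
print(x_array)        # [10  2  3]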
So the filtered_x_list and the filtered_y_list are empty lists we're creating, but we're going to modify them by adding the values we care about to the end.
The second thing Python is doing is creating a numpy array, using the list to define its contents. The array constructor can take a list expressed as [...], or a list defined by x_list = [...], which we're going to take advantage of later.
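For instance, both of these create the same array:
import numpy as np
a1 = np.array([1, 2, 3])   # a list literal passed directly
x_list = [1, 2, 3]
a2 = np.array(x_list)      # a named list works just as well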
A little more on iteration
In your question, for every x element, there is a corresponding y element. We want to test something for each y element, then act on the corresponding x element, too.
Since we can access the same element in both arrays using an index - x[0], for instance - instead of iterating over one list or the other, we can iterate over all indices needed to access the lists.
First, we need to figure out how many indices we're going to need, which is just the length of the lists. len(x) lets us do that - in this case, it returns 10.
What if x and y are different lengths? In this case, I chose the smallest of the two - first, do len(x) and len(y), then pass those to the min() function, which is what min(len(x), len(y)) in the code above means.
Finally, we want to actually iterate through the indices, starting at 0 and ending at len(x) - 1 or len(y) - 1, whichever is smallest. The range sequence lets us do exactly that:
for i in range(10):
    print(i)
# prints:
# 0
# 1
# 2
# 3
# 4
# 5
# 6
# 7
# 8
# 9
So range(min(len(x), len(y))) gets us the indices to iterate over, and finally, this line makes sense:
for i in range(min(len(x), len(y))):
Inside this for loop, i now gives us an index we can use for both x and y.
Now, we can do the comparison in our for loop:
for i in range(min(len(x), len(y))):
    if y[i] > 17:
        filtered_y_list.append(y[i])
Then, including xs for the corresponding ys is a simple case of just appending the same x value to the x list:
for i in range(min(len(x), len(y))):
    if y[i] > 17:
        filtered_y_list.append(y[i])
        filtered_x_list.append(x[i])
The filtered lists now contain the numbers you're after. The last two lines, outside the for loop, just create numpy arrays from the results:
filtered_x = np.array(filtered_x_list)
filtered_y = np.array(filtered_y_list)
You might want to do this if you plan to use NumPy operations that expect arrays.
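For example, arithmetic behaves differently on lists and arrays, which is one reason to convert at the end (using the values from above):
import numpy as np
filtered_x_list = [1, 2, 4, 5, 8]        # the plain Python list from above
filtered_x = np.array(filtered_x_list)   # the NumPy array version
print(filtered_x_list * 2)  # [1, 2, 4, 5, 8, 1, 2, 4, 5, 8] - the list is repeated
print(filtered_x * 2)       # [ 2  4  8 10 16] - element-wise multiplication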
While there are, in my opinion, better ways to do this (I would probably write custom iterators that produce the intended results without creating new lists), they require a somewhat more advanced understanding of programming, so I opted for something simpler.
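For reference, here is a minimal sketch of that kind of iterator-based variant (a generator, with names of my choosing, not something from the question):
import numpy as np

x = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
y = np.array([25, 18, 16, 19, 30, 5, 9, 20])

def filtered_pairs(xs, ys, threshold=17):
    """Yield (x, y) pairs where y exceeds the threshold, without building intermediate lists."""
    for xi, yi in zip(xs, ys):   # zip stops at the shorter of the two
        if yi > threshold:
            yield xi, yi

filtered_x, filtered_y = map(np.array, zip(*filtered_pairs(x, y)))
print(filtered_x)  # [1 2 4 5 8]
print(filtered_y)  # [25 18 19 30 20]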
I am trying to insert values, one at a time, from several Python lists of lists (i.e. 2D lists) into another 2D list. (I know numpy is better at this, but I am trying to compare the performance of lists to numpy, so please don't just suggest numpy.) I want to insert the values at specific locations, hence the indexing on the left hand side.
resampled_pix_spot_list is a 240 by 240 list of lists, and pix_spot_list is a 225 by 225 list of lists.
The error I am getting, from the final four lines in the example, is "TypeError: 'float' object is not subscriptable". I get that pix_prod_bl[0][0], for example, is a float, but I don't understand why I can't insert it into a particular set of indices in resampled_pix_spot_list.
Edit 1- added minimal working example.
Edit 2- in adding the working example, I found that I accidentally had the line commented where I convert the lists back to numpy, and somehow I misinterpreted the Spyder console about where the error was originating. Anyway it works now, thank you very much for the quick feedback. I guess I'll leave this here in case it's helpful to anyone else.
Edit 3- pix_spot_values is an array of data, so just a random array of floats between 0 and 1 will suffice.
xc=57
yc=189
rebin=15
# fraction pixel offset requiring interpolation
dx=xc*rebin-int(np.floor(xc*rebin)) # positive value between 0 and 1
dy=yc*rebin-int(np.floor(yc*rebin)) # positive value between 0 and 1
# weights for interpolation
w00=(1-dy)*(1-dx)
w10=dy*(1-dx)
w01=(1-dy)*dx
w11=dy*dx
# now the rest of the offset is an integer shift
dx=int(np.floor(xc*rebin))-int(np.floor(xc))*rebin # positive integer between 0 and 14
dy=int(np.floor(yc*rebin))-int(np.floor(yc))*rebin # positive integer between 0 and 14
def new_pix_spot(w00, w10, w01, w11, pix_spot_list, ny_spot, nx_spot, rebin, dy, dx):
    # first change numpy array to list
    pix_spot_list = pix_spot_values.tolist()
    # preallocate array of zeros
    resampled_pix_spot_list = [[0 for x in range(ny_spot + rebin)] for y in range(nx_spot + rebin)]
    # create 2D lists
    pix_prod_bl = [[x*w00 for x in y] for y in pix_spot_list]  # bottom left
    pix_prod_br = [[x*w10 for x in y] for y in pix_spot_list]  # bottom right
    pix_prod_tl = [[x*w01 for x in y] for y in pix_spot_list]  # top left
    pix_prod_tr = [[x*w11 for x in y] for y in pix_spot_list]  # top right
    for i in range(len(pix_spot_list)):
        for j in range(len(pix_spot_list)):
            k = dy + i
            m = dx + j
            n = dy + 1 + i
            p = dx + 1 + j
            resampled_pix_spot_list[k][m] += pix_prod_bl[i][j]  # bottom left
            resampled_pix_spot_list[n][m] += pix_prod_br[i][j]  # bottom right
            resampled_pix_spot_list[k][p] += pix_prod_tl[i][j]  # top left
            resampled_pix_spot_list[n][p] += pix_prod_tr[i][j]  # top right
    resampled_pix_spot_values = np.array(resampled_pix_spot_list)
    return resampled_pix_spot_values
Inserting and Replacing
To insert values into a list in Python, you must work with the list object (for example, resampled_pix_spot_list[0]) rather than the elements within it (resampled_pix_spot_list[0][0], as you tried).
In both Python 2 and 3, you can insert into a list with your_list.insert(<index>, <element>) (list insertion docs here).
So to insert a number to the left of your chosen coordinate, the code would be:
resampled_pix_spot_list[k].insert(m, pix_prod_bl[i][j])
If you wanted to replace the pixel at that position, you would write:
resampled_pix_spot_list[k][m] = pix_prod_bl[i][j]
(Notice the [k] vs [k][m].) In short: To insert, talk to the list; to replace, talk to the element.
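For instance, on a small nested list (toy values, not your data):
row_of_pixels = [[0, 0, 0], [0, 0, 0]]   # a tiny 2-by-3 "image"
row_of_pixels[0].insert(1, 99)   # insert: talk to the inner list
print(row_of_pixels)             # [[0, 99, 0, 0], [0, 0, 0]]
row_of_pixels[1][2] = 42         # replace: talk to the element
print(row_of_pixels)             # [[0, 99, 0, 0], [0, 0, 42]]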
Pitfalls of Repeated Inserts
Just a tip: if you're planning on repeatedly inserting values into specific places in a list, try to iterate from the end of the list, backwards. If you don't, you'll have to adjust your indices, since each .insert() call will shift part of your list to the right.
To see what I mean, let's imagine I have the list [1, 2, 3] and want to end up with [1, 88, 2, 99, 3] via insertions. The order we insert matters. Compare the wrong order (iterating forwards):
>>> data = [1, 2, 3]
>>> data.insert(1, 88)
>>> print(data)
[1, 88, 2, 3] # so far so good
>>> data.insert(2, 99)
>>> print(data)
[1, 88, 99, 2, 3] # oops! the first insert changed my indices, so index "2" was wrong!
with the right order (iterating backwards):
>>> data = [1, 2, 3]
>>> data.insert(2, 99)
>>> print(data)
[1, 2, 99, 3] # so far so good
>>> data.insert(1, 88)
>>> print(data)
[1, 88, 2, 99, 3] # same insertions + different order = different results!
Slices
Some food for thought: Python 2.7 and 3 both allow you to replace whole "slices" of lists with a very clean syntax, which would also help you avoid "off-by-one" errors (slice notation docs here). For example:
>>> data = [1, 2, 3]
>>> data[1:2] = [88, data[1], 99] # safer, shorter, faster, and clearer
>>> print(data)
[1, 88, 2, 99, 3]
Working with slices might be a bit more declarative and clear. Hope this helps!
I have a range of possible values, for example:
possible_values = range(100)
I have a list with unsystematic (but unique) numbers within that range, for example:
somelist = [0, 5, 10, 15, 20, 33, 77, 99]
I want to create a new list of length < len(somelist) including a subset of these values but as equally distributed as possible over the range of possible values. For example:
length_newlist = 2
newlist = some_function(somelist, length_newlist, possible_values)
print(newlist)
Which would then ideally output something like
[33, 77]
So I want neither a random sample nor a sample chosen from equally spaced integers. I'd like a sample based on a distribution (here a uniform distribution) with regard to the interval of possible values.
Is there a function or an easy way to achieve this?
What about taking the values of your list that are closest to certain pivots of the range? I.e.:
def some_function(somelist, length_list, possible_values):
    a = min(possible_values)
    b = max(possible_values)
    chunk_size = (b-a)/(length_list+1)
    new_list = []
    for i in range(1, length_list+1):
        index = a + i*chunk_size
        new_list.append(min(somelist, key=lambda x: abs(x-index)))
    return new_list
possible_values = range(100)
somelist = [0, 5, 10, 15, 20, 33, 77, 99]
length_newlist = 2
newlist = some_function(somelist, length_newlist, possible_values)
print(newlist)
In any case, I'd also recommend taking a look at NumPy's random sampling functions, which could help you as well.
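For example, np.random.choice can draw a random subset without replacement (just a pointer; note it gives a random pick rather than an evenly spread one):
import numpy as np
somelist = [0, 5, 10, 15, 20, 33, 77, 99]
print(np.random.choice(somelist, size=2, replace=False))  # e.g. [77  5], random each run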
Suppose your range is 0..N-1, and you want a list of K<=N-1 values. Then define an "ideal" list of K values, which would be your desired distribution over this full range (which, frankly, I am not sure what that would be, but hopefully you do). Finally, take the closest matches to those values from your randomly chosen greater-than-K-length sublist to get your properly distributed K-length random sublist.
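A rough sketch of that idea, assuming the "ideal" list is K evenly spaced values over the range (the function name is mine, not from the question):
import numpy as np

def evenly_spread_subset(somelist, k, possible_values):
    """Pick the k values from somelist closest to k evenly spaced targets over possible_values."""
    lo, hi = min(possible_values), max(possible_values)
    targets = np.linspace(lo, hi, k + 2)[1:-1]   # k interior, evenly spaced "ideal" values
    arr = np.asarray(somelist)
    return [int(arr[np.argmin(np.abs(arr - t))]) for t in targets]

possible_values = range(100)
somelist = [0, 5, 10, 15, 20, 33, 77, 99]
print(evenly_spread_subset(somelist, 2, possible_values))  # [33, 77]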
I think you should check the random.sample(population, k) function. It samples the population into a k-length list.
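For example (note this gives a purely random subset, which the question says it does not want, but it is good to know about):
import random
somelist = [0, 5, 10, 15, 20, 33, 77, 99]
print(random.sample(somelist, 2))  # e.g. [15, 99], random each run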
I am pretty new to the Python language and want to know how to do the following
(1) y = [some vector]
(2) z = [some other vector]
(3) [ynew,indx] = sort(y)
(4) znew = z(indx)
I can do lines 1, 2, and 4, but line 3 is giving me fits. Any suggestions? What I am looking for is not a user-written function but something intrinsic to the language itself.
Thanks
Using NumPy for line 3 (assuming y is a 1-D array; note that y.sort() sorts in place and returns None, so compute the ordering with argsort first and then index):
indx = y.argsort()
ynew = y[indx]
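If you also need line 4 of the question (reordering z the same way), the same index array can be reused; a minimal sketch assuming 1-D arrays:
import numpy as np
y = np.array([3, 1, 2])
z = np.array([30, 10, 20])
indx = y.argsort()   # the order that sorts y
ynew = y[indx]       # [1 2 3]
znew = z[indx]       # [10 20 30]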
I had the same problem with the following form, and the solution provided didn't work for me. I found a solution that work for me and I thought I could share it here in case anyone has the same problem:
My goal was to sort x in ascending order and reorder y in the same manner:
x = np.array([00, 44, 22, 33, 11]) # create an array to sort
y = np.array([00, 11, 22, 33, 44]) # another array where you want to move index in the same way you did x
x_sorted = x[x.argsort()] # x sorted in ascending number
y_sorted = y[x.argsort()] # y sorted
The difference is that you don't store the original index positions, but in my case that wasn't a problem, since I only needed to reorder several arrays to follow one of them. x.argsort() already gives the order the indices have to be moved to and, I think, achieves the same result.
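For example, to reorder several arrays to follow one of them, you can compute the order once and reuse it (toy data):
import numpy as np
x = np.array([0, 44, 22, 33, 11])
y = np.array([0, 11, 22, 33, 44])
z = np.array([5, 6, 7, 8, 9])
order = x.argsort()      # compute the sort order of x once
x_sorted = x[order]      # [ 0 11 22 33 44]
y_sorted = y[order]      # [ 0 44 22 33 11]
z_sorted = z[order]      # [5 9 7 8 6]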
You could try to do something like the following:
import numpy as np
y = [1,3,2]
z = [3,2,1]
indx = [i[0] for i in sorted(enumerate(y), key=lambda x:x[1])]
print(indx)
#convert z to numpy array in order to use np.ix_ function
z = np.asarray(z)
znew = z[np.ix_(indx)]
print(znew)
Results:
#the indx is
[0, 2, 1]
#the znew is
array([3, 1, 2])
I have two 1D numpy arrays. The lengths are unequal. I want to make pairs (array1_element, array2_element) of the elements which are close to each other. Let's consider the following example:
a = [1,2,3,8,20,23]
b = [1,2,3,5,7,21,35]
The expected result is
[(1,1),
(2,2),
(3,3),
(8,7),
(20,21),
(23,25)]
It is important to note that 5 is left alone. It could easily be done with loops, but I have very large arrays. I considered using nearest neighbors, but that felt like killing a sparrow with a cannon.
Can anybody please suggest an elegant solution?
Thanks a lot.
How about using the Needleman-Wunsch algorithm? :)
The scoring matrix would be trivial, as the "distance" between two numbers is just their difference.
But that will probably feel like killing a sparrow with a tank ...
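For what it's worth, a rough Needleman-Wunsch sketch along these lines (the gap penalty is an illustrative value I chose, not something from the question; it controls how far apart two numbers may be and still get paired):
import numpy as np

def align_pairs(a, b, gap_penalty=-5):
    """Global alignment (Needleman-Wunsch) of two numeric sequences.
    The score for pairing two numbers is minus their absolute difference."""
    n, m = len(a), len(b)
    score = np.zeros((n + 1, m + 1))
    score[:, 0] = gap_penalty * np.arange(n + 1)
    score[0, :] = gap_penalty * np.arange(m + 1)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i, j] = max(score[i - 1, j - 1] - abs(a[i - 1] - b[j - 1]),  # pair a[i-1] with b[j-1]
                              score[i - 1, j] + gap_penalty,                   # leave a[i-1] unpaired
                              score[i, j - 1] + gap_penalty)                   # leave b[j-1] unpaired
    # traceback, keeping only the positions where both sequences are paired
    pairs = []
    i, j = n, m
    while i > 0 and j > 0:
        if score[i, j] == score[i - 1, j - 1] - abs(a[i - 1] - b[j - 1]):
            pairs.append((a[i - 1], b[j - 1]))
            i, j = i - 1, j - 1
        elif score[i, j] == score[i - 1, j] + gap_penalty:
            i -= 1
        else:
            j -= 1
    return pairs[::-1]

a = [1, 2, 3, 8, 20, 23]
b = [1, 2, 3, 5, 7, 21, 35]
print(align_pairs(a, b))  # e.g. [(1, 1), (2, 2), (3, 3), (8, 7), (20, 21)] with this penalty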
You could use the built-in map function to vectorize a function that does this. For example:
import numpy as np

ar1 = np.array([1,2,3,8,20,23])
ar2 = np.array([1,2,3,5,7,21,35])

def closest(ar1, ar2, iter):
    x = np.abs(ar1[iter] - ar2)
    index = np.where(x == x.min())
    value = ar2[index]
    return value

def find(x):
    return closest(ar1, ar2, x)

c = np.array(map(find, range(ar1.shape[0])))  # Python 2 map; in Python 3, wrap it in list(...)
In the example above, it looked like you wanted to exclude values once they had been paired. In that case, you could include a removal process in the first function like this, but be very careful about how array 1 is sorted:
def closest(ar1, ar2, iter):
    x = np.abs(ar1[iter] - ar2)
    index = np.where(x == x.min())
    value = ar2[index]
    ar2[ar2 == value] = -10000000  # mark the matched value so it cannot be chosen again
    return value
The best method I can think of is to use a loop. If loops in Python are too slow, you can use Cython to speed up your code.
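For example, a plain-Python version of such a loop could look like this (just a sketch; like the map-based answer above, it does not exclude values that have already been paired):
a = [1, 2, 3, 8, 20, 23]
b = [1, 2, 3, 5, 7, 21, 35]
# for every element of a, pick the element of b with the smallest absolute difference
pairs = [(ai, min(b, key=lambda bi: abs(bi - ai))) for ai in a]
print(pairs)  # [(1, 1), (2, 2), (3, 3), (8, 7), (20, 21), (23, 21)]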
I think one can do it like this:
create two new structured arrays, such that there is a second index which is 0 or 1 indicating to which array the value belongs, i.e. the key
concatenate both arrays
sort the united array along the first field (the values)
use 2 stacks: go through the array, putting elements with key 1 on the left stack; when you cross an element with key 0, start putting them on the right stack. When you reach the second element with key 0, check the top and bottom of the left and right stacks for the first key-0 element and take the closest value (maybe with a maximum distance), then switch stacks and continue.
The sort should be the slowest step, and the maximum total space for the stacks is n or m.
You can do the following:
a = np.array([1,2,3,8,20,23])
b = np.array([1,2,3,5,7,21,25])
def find_closest(a, sorted_b):
    j = np.searchsorted(.5*(sorted_b[1:] + sorted_b[:-1]), a, side='right')
    return b[j]
b.sort() # or, b = np.sort(b), if you don't want to modify b in-place
print np.c_[a, find_closest(a, b)]
# ->
# array([[ 1, 1],
# [ 2, 2],
# [ 3, 3],
# [ 8, 7],
# [20, 21],
# [23, 25]])
This should be pretty fast. How it works: for each number in a, searchsorted finds its position among the midpoints between consecutive values of the sorted b, and that position is exactly the index of the value in b closest to it.
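To see the midpoint trick concretely (a small check using the arrays above):
import numpy as np
a = np.array([1, 2, 3, 8, 20, 23])
b = np.array([1, 2, 3, 5, 7, 21, 25])    # already sorted
midpoints = .5 * (b[1:] + b[:-1])        # [ 1.5  2.5  4.   6.  14.  23. ]
j = np.searchsorted(midpoints, a, side='right')
print(j)      # [0 1 2 4 5 6]
print(b[j])   # [ 1  2  3  7 21 25] - the closest value in b for each element of a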