Let's say I have a 2D numpy array 'x' and I want to create a new array 'y' with only certain columns from x.
Is there an easy solution for this?
I was trying to write a function that iterates through each column of an array, and then appends every 3rd column to a new array.
def grab_features(x, starting=0, every=3, rowlength=16):
    import numpy as np
    import pandas as pd
    y = np.empty([rowlength, 1])
    for i in range(starting, np.size(x, 1), every):
        y = np.append(y, np.reshape(x[:, i], (rowlength, 1)), axis=0)
    return y
I didn't get any errors, but the function returned a long one-dimensional array of floats. I was hoping for an array like x, just with a third of the columns.
You can use the slice syntax i:j:k, where i is the starting index, j is the stopping index, and k is the step size:
import numpy as np
array = np.array([[1, 2],
                  [3, 4],
                  [5, 6],
                  [7, 8],
                  [9, 10],
                  [11, 12]])
print(array[::3])
[[1 2]
 [7 8]]
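Applied to the original question, the same i:j:k step works on the column axis as well. A minimal sketch (the array shape and the starting/every values are just example assumptions):

import numpy as np

x = np.arange(48).reshape(16, 3)   # example 16x3 array standing in for your data
starting, every = 0, 3

# keep every 3rd column, starting at column `starting`
y = x[:, starting::every]
print(y.shape)                     # (16, 1): same rows, one third of the columns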
Related
I am trying to loop through a set of coordinates and 'stack' these coordinate arrays onto another array (so, in essence, I want an array of arrays) using numpy.
This is my attempt:
import numpy as np
all_coordinates = np.array([[]])
for y in range(2):
    for x in range(2):
        coordinate = np.array([[x, y]])
        # append
        all_coordinates = np.append(all_coordinates, [coordinate])
print(all_coordinates)
But it's not working. It's just concatenating the individual numbers and not appending the array.
Instead of giving me (the output that I want to achieve):
[[0 0] [1 0] [0 1] [1 1]]
The output I get instead is:
[0 0 1 0 0 1 1 1]
Why? What am I doing wrong here?
The reason the stacking functions don't work is that they require the row being added to have the same size as the rows already present. With np.array([[]]), the first row has length zero, which means you can only add rows that also have length zero.
To solve this, we need to tell NumPy that the array starts with zero rows of length two, i.e. shape (0, 2) rather than (1, 0). This can be done with any of the array-creating functions that accept a shape argument, such as empty, zeros or ones. Which one does not matter, since there are no values to fill in.
Then you can use one of the functions mentioned in the comments, like vstack or stack. The code thus becomes:
import numpy as np
all_coordinates = np.zeros((0, 2))
for y in range(2):
    for x in range(2):
        coordinate = np.array([[x, y]])
        # append
        all_coordinates = np.vstack((all_coordinates, coordinate))
print(all_coordinates)
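One small note (an assumption about the desired dtype, since the question shows integer coordinates): np.zeros defaults to float64, so the stacked result will contain floats. If integer output is wanted, the accumulator can be created with an explicit dtype:

# integer accumulator so the stacked coordinates stay ints
all_coordinates = np.zeros((0, 2), dtype=int)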
In such a case, I would use a list and only convert it into an array once you have appended all the elements you want.
Here is a suggested improvement:
import numpy as np
all_coordinates = []
for y in range(2):
    for x in range(2):
        coordinate = np.array([x, y])
        # append
        all_coordinates.append(coordinate)
all_coordinates = np.array(all_coordinates)
print(all_coordinates)
The resulting all_coordinates array is indeed:
array([[0, 0],
       [1, 0],
       [0, 1],
       [1, 1]])
I have written this piece of code:
import numpy as np

data = np.array([[3, 6], [5, 9], [4, 8]])
orig_x, orig_y = np.split(data, 2, axis=1)

x = np.array([3, 4])
y = np.zeros(len(x))

for i in range(len(x)):
    y[i] = orig_y[np.where(orig_x == x[i])[0]]
So basically, I have a 2D NumPy array. I split it into two 1D arrays orig_x and orig_y, one storing values of the x-axis and the other values of the y-axis.
I also have another 1D NumPy array, which has some of the values that exist in the orig_x array. I want to find the y-axis values for each value in the x array. I created this method, using a simple loop, but it is extremely slow since I'm using it with thousands of values.
Do you have a better idea? Maybe by using a NumPy function?
Note: a better title could probably be chosen for this question. Sorry :(
You could create a mask that marks which values you want from the x column, and then use this mask to select values from the y column.
import numpy as np

data = np.array([[3, 6], [5, 9], [4, 8]])

# the values you want to look up on the x-axis
x = np.array([3, 4])

# True for every row whose x value appears in x
mask = np.isin(data[:, 0], x)
data[mask, 1]
Output:
array([6, 8])
The key function here is np.isin. It essentially broadcasts x against the x column of data and performs an element-wise comparison, equivalent to:
mask = data[:,0,None] == x
y_mask = np.logical_or.reduce(mask, axis=1)
data[y_mask, 1]
Output:
array([6, 8])
I'm not 100% sure I understood the problem correctly, but I think the following should work:
>>> rows, cols = np.where(orig_x == x)
>>> y = orig_y[rows[np.argsort(cols)]].ravel()
>>> y
array([6, 8])
It assumes that all the values in orig_x are unique, but since your code example has the same restriction, I considered it a given.
What about a lookup table?
import numpy as np
data = np.array([[3,6], [5,9], [4, 8]])
orig_x, orig_y = np.split(data, 2, axis=1)
x = np.array([3, 4])
y = np.zeros((len(x)))
You can pack a dict for lookup:
lookup = {i: j for i, j in zip(orig_x.ravel(), orig_y.ravel())}
And just map this into a new array:
np.fromiter(map(lambda i: lookup.get(i, np.nan), x), dtype=int, count=len(x))
array([6, 8])
If orig_x and orig_y are your smaller data structures, this will probably be the most efficient approach.
EDIT: it occurred to me that if your values are integers, the np.nan default won't work (it cannot be stored with dtype=int), so you should pick a fallback value that makes sense for your application when looking up a value that isn't in your orig_x array.
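As a sketch of that point, here is the same lookup with -1 as a hypothetical sentinel for values missing from orig_x (any value that cannot occur in orig_y would do):

import numpy as np

orig_x = np.array([3, 5, 4])
orig_y = np.array([6, 9, 8])
x = np.array([3, 4, 7])   # 7 does not appear in orig_x

lookup = {i: j for i, j in zip(orig_x, orig_y)}

# -1 is an assumed "not found" marker; choose one that fits your data
y = np.fromiter((lookup.get(i, -1) for i in x), dtype=int, count=len(x))
print(y)                  # [ 6  8 -1]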
Slice a 3d numpy array using a 1d lookup between indices
import numpy as np
a = np.arange(12).reshape(2, 3, 2)
b = np.array([2, 0])
b maps i to j, where i and j are the first two indices of a, as in a[i, j, k].
Desired result after applying b to a is:
[[4 5]
[6 7]]
Naive solution:
c = np.empty(shape=(2, 2), dtype=int)
for i in range(2):
    j = b[i]
    c[i, :] = a[i, j, :]
Question: Is there a way to do this using a numpy or scipy routine or routines or fancy indexing?
Application: Reinforcement Learning finite MDPs, where b is a deterministic policy vector pi(a|s), a is the state transition probabilities p(s'|s,a), and c is the state transition matrix for that policy, p(s'|s). The arrays will be large and this operation will be repeated many times, so it needs to be scalable and fast.
What I have tried:
Compiling with numba, but the line profiler suggests my code is slower than a similarly sized numpy routine; numpy is also more widely understood and used.
Maintaining pi(a|s) as a sparse matrix b_as_a_matrix (all zeros except a single 1 per row) and then using einsum, but this means storing and updating the matrix and creates more work (an extra loop over j and a sum operation):
c = np.einsum('ij,ijk->ik', b_as_a_matrix, a)
Numpy arrays can be indexed using other arrays as indices. See also: NumPy selecting specific column index per row by using a list of indexes.
With that in mind, we can vectorize your loop to simply use b for indexing:
>>> import numpy as np
>>> a = np.arange(12).reshape(2, 3, 2)
>>> b = np.array([2, 0])
>>> i = np.arange(len(b))
>>> i
array([0, 1])
>>> a[i, b, :]
array([[4, 5],
[6, 7]])
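If you prefer not to build the helper index array yourself, np.take_along_axis can express the same gather; a minimal sketch with the same a and b:

>>> np.take_along_axis(a, b[:, None, None], axis=1).squeeze(axis=1)
array([[4, 5],
       [6, 7]])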
I need to carry out some operation on a subset of an NxN array. I have the center of the sub-array, x and y, and its size.
So I can easily do:
subset = data[y-size:y+size,x-size:x+size]
And this is fine.
What I am asking is whether it is possible to do the same without writing an explicit loop, when x and y are both 1D arrays of positions.
Thanks!
Using a simple example of a 5x5 array and setting size=1 we can get:
import numpy as np
data = np.arange(25).reshape((5,5))
size = 1
x = np.array([1,4])
y = np.array([1,4])
subsets = [data[j-size:j+size,i-size:i+size] for i in x for j in y]
print(subsets)
Which returns a list of numpy arrays:
[array([[0, 1],[5, 6]]),
array([[15, 16],[20, 21]]),
array([[3, 4],[8, 9]]),
array([[18, 19],[23, 24]])]
Which I hope is what you are looking for.
To get the list of subsets, assuming you have the lists of positions xList and yList, this will do the trick:
subsetList = [ data[y-size:y+size,x-size:x+size] for x,y in zip(xList,yList) ]
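If the goal is to avoid the Python-level loop entirely and all windows share the same size, fancy indexing with broadcast offsets can gather every subset in one call. A sketch under the same 5x5 example (it keeps the question's y-size:y+size convention, so each window is 2*size on a side):

import numpy as np

data = np.arange(25).reshape((5, 5))
size = 1
x = np.array([1, 4])
y = np.array([1, 4])

offsets = np.arange(-size, size)                  # same bounds as y-size:y+size
rows = y[:, None, None] + offsets[None, :, None]  # row indices per window
cols = x[:, None, None] + offsets[None, None, :]  # column indices per window

subsets = data[rows, cols]                        # shape (len(x), 2*size, 2*size)
print(subsets)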
I have a 2 dimensional NumPy array. I know how to get the maximum values over axes:
>>> a = array([[1,2,3],[4,3,1]])
>>> amax(a,axis=0)
array([4, 3, 3])
How can I get the indices of the maximum elements? I would like as output array([1,1,0]) instead.
>>> a.argmax(axis=0)
array([1, 1, 0])
>>> import numpy as np
>>> a = np.array([[1,2,3],[4,3,1]])
>>> i,j = np.unravel_index(a.argmax(), a.shape)
>>> a[i,j]
4
argmax() will only return the first occurrence of the maximum value.
http://docs.scipy.org/doc/numpy/reference/generated/numpy.argmax.html
If you ever need to do this for an array of any shape, this works better than unravel:
import numpy as np
a = np.array([[1,2,3], [4,3,1]]) # Can be of any shape
indices = np.where(a == a.max())
You can also change your conditions:
indices = np.where(a >= 1.5)
The above gives you results in the form that you asked for. Alternatively, you can convert them to a list of (x, y) coordinates with:
x_y_coords = list(zip(indices[0], indices[1]))
numpy provides argmin() and argmax(), which return the index of the minimum and maximum of an array respectively.
For a 1-D array you would do something like this:
import numpy as np
a = np.array([50,1,0,2])
print(a.argmax()) # returns 0
print(a.argmin()) # returns 2
And similarly for a multi-dimensional array:
import numpy as np
a = np.array([[0,2,3],[4,30,1]])
print(a.argmax()) # returns 4
print(a.argmin()) # returns 0
Note that these will only return the index of the first occurrence.
v = alli.max()
index = alli.argmax()
x, y = index // 8, index % 8   # assumes alli is 2-D with 8 columns; use floor division
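A shape-agnostic sketch of the same idea, reusing np.unravel_index from earlier answers (assuming alli is a 2-D array):

index = alli.argmax()
x, y = np.unravel_index(index, alli.shape)   # works for any shape, not just 8 columns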