I have a NumPy array of shape (20, 3). (So 20 3-by-1 arrays. Correct me if I'm wrong; I am still pretty new to Python.)
I need to separate it into 3 arrays of shape 20,1 where the first array is 20 elements that are the 0th element of each 3 by 1 array. Second array is also 20 elements that are the 1st element of each 3 by 1 array, etc.
I am not sure if I need to write a function for this. Here is what I have tried:
Essentially I'm trying to create an array of 3 20 by 1 arrays that I can later index to get the separate 20 by 1 arrays.
a = np.load() #loads file
num=20 #the num is if I need to change array size
num_2=3
for j in range(0,num):
    for l in range(0,num_2):
        array_elements = np.zeros(3)
        array_elements[l] = a[j:][l]
This gives the following error:
'''
ValueError: setting an array element with a sequence
'''
I have also tried making it a dictionary and making the dictionary values lists that are appended, but it only gives the first or last value of the 20 that I need.
Your array has shape (20, 3), which means it's a 2-dimensional array with 20 rows and 3 columns.
You can access data in this array by indexing with integers, or with ':' to select a whole axis. You want one array per column, so pick the column by number and use ':' to mean 'all of the rows': a[:, 0], a[:, 1] and a[:, 2]. Note that each of these has shape (20,), not (20, 1); if you need the trailing axis, index with a list or a length-1 slice instead, e.g. a[:, [0]] or a[:, 0:1].
You can then assign these to separate variables if you wish, e.g. arr = a[:, 0], but this is just a view onto the original data in a. Any change made through arr will also show up in the corresponding entries of a.
If you want an independent array, use the .copy() method. With arr = a[:, 0].copy(), arr is completely separate from a, and changes made to one will not affect the other.
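As a minimal sketch of the view-versus-copy behaviour described above (the array contents and the names col0, col0_2d and col0_copy are made up for illustration; np.arange stands in for the loaded file):

```python
import numpy as np

# Stand-in for the loaded (20, 3) array; the values are illustrative only.
a = np.arange(60).reshape(20, 3)

col0 = a[:, 0]              # basic indexing: a view, shape (20,)
col0_2d = a[:, [0]]         # list index keeps the axis: shape (20, 1), a copy
col0_copy = a[:, 0].copy()  # explicit, independent copy

col0[0] = -1                # writes through to a[0, 0]
col0_copy[1] = -2           # leaves a[1, 0] untouched

print(col0.shape, col0_2d.shape)
print(a[0, 0], a[1, 0])
```

np.shares_memory(a, col0) is a quick way to check whether two arrays use the same buffer.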
Essentially you want to group the values by their column index. There are plenty of ways of doing this. NumPy does not have a group-by method, but one way is to split the array horizontally, stack the pieces into a new array and reshape it.
old_length = 3
new_length = 20
a = np.array(np.hsplit(a, old_length)).reshape(old_length, new_length)
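Edit: rotating the array by -90 degrees gives the same (3, 20) shape. You can do this with rot90, setting k=-1 or k=3 (k is the number of 90-degree rotations). Note, though, that the rotation also reverses the order of the 20 elements in each row; if you need them in the original order, a plain transpose, a.T, gives the same result as the hsplit version above.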
a = np.rot90(a, k=-1)
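To compare the approaches side by side, here is a small check. It assumes a stand-in array built with np.arange, and the variable names are illustrative. It suggests that the hsplit-and-reshape route is equivalent to a transpose, while rot90 with k=-1 additionally reverses each row:

```python
import numpy as np

# Stand-in (20, 3) array with distinct values so ordering differences show up.
a = np.arange(60).reshape(20, 3)

by_hsplit = np.array(np.hsplit(a, 3)).reshape(3, 20)
by_transpose = a.T
by_rot90 = np.rot90(a, k=-1)

# hsplit + reshape matches the plain transpose.
same_as_transpose = np.array_equal(by_hsplit, by_transpose)
# rot90(k=-1) matches the transpose only after reversing each row.
rot_is_reversed = np.array_equal(by_rot90, by_transpose[:, ::-1])

print(same_as_transpose, rot_is_reversed)
```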
I have a 4-dimensional array. I am going to turn it into a 1-dim array. I use numpy ravel and it works fine with the default parameters.
However I would also like the positions/indices in the 4-dim array.
I want something like this as row in my output.
x,y,z,w,value
With x being the first dimension of my initial array and so on.
The obvious approach is iteration, however I was told to avoid it when I can.
for i in range(test.shape[0]):
    for j in range(test.shape[1]):
        for k in range(test.shape[2]):
            for l in range(test.shape[3]):
                print(i,j,k,l,test[(i,j,k,l)])
It will be too slow when I use a larger dataset.
Is there a way to configure ravel to do this, or any other approach faster than iteration?
Use np.indices with sparse=False, combined with np.concatenate to build the array. np.indices provides the first n columns, and np.concatenate appends the last one:
test = np.random.randint(10, size=(3, 5, 4, 2))
index = np.indices(test.shape, sparse=False) # shape: (4, 3, 5, 4, 2)
data = np.concatenate((index, test[None, ...]), axis=0).reshape(test.ndim + 1, -1).T
A more detailed breakdown:
index is a (4, *test.shape) array: one sub-array along the first axis per dimension of test, each holding the index along that dimension for every position.
To make test concatenatable with index, you need to prepend a unit dimension, which is what test[None, ...] does. None is synonymous with np.newaxis, and Ellipsis, or ..., means "all the remaining dimensions".
When you concatenate along axis=0, you are appending test to the array of indices. For any fixed position in the trailing axes, the 5 values along the first axis are now the 4 indices followed by the value. The remaining axes still reflect the shape of test, but besides that, you have what you want.
The goal is to flatten out the trailing dimensions, so you get a (5, N) array, where N = np.prod(test.shape). That's what the final reshape does. test.ndim + 1 is the size of the index, plus 1 for the value. -1 can appear at most once in a reshape; it means "whatever size is left over", here the product of all the remaining dimensions. The final .T transposes the result to (N, 5), so each row reads x, y, z, w, value.
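One way to convince yourself that the vectorized result matches the nested loops is to build a reference with explicit iteration and compare. This is only a sanity-check sketch; the name expected and the use of np.ndindex are choices made here, not part of the answer above:

```python
import numpy as np

test = np.random.randint(10, size=(3, 5, 4, 2))

# Vectorized construction, as described above.
index = np.indices(test.shape, sparse=False)          # shape (4, 3, 5, 4, 2)
data = np.concatenate((index, test[None, ...]), axis=0)
data = data.reshape(test.ndim + 1, -1).T              # shape (N, 5)

# Slow reference: one row (i, j, k, l, value) per element, in C order.
expected = np.array([idx + (test[idx],) for idx in np.ndindex(test.shape)])

print(data.shape, np.array_equal(data, expected))
```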
I want to slice a NumPy nxn array. I want to extract an arbitrary selection of m rows and columns of that array (i.e. without any pattern in the numbers of rows/columns), making it a new, mxm array. For this example let us say the array is 4x4 and I want to extract a 2x2 array from it.
Here is our array:
import numpy as np
x = np.arange(16).reshape(4, 4)
print(x)
[[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]
 [12 13 14 15]]
The rows and columns to remove are the same. The easiest case is when I want to extract a 2x2 submatrix that is at the beginning or at the end, i.e.:
In [33]: x[0:2,0:2]
Out[33]:
array([[0, 1],
       [4, 5]])
In [34]: x[2:,2:]
Out[34]:
array([[10, 11],
       [14, 15]])
But what if I need to remove another mixture of rows/columns? What if I need to remove the first and third lines/rows, thus extracting the submatrix [[5,7],[13,15]]? There can be any composition of rows/lines. I read somewhere that I just need to index my array using arrays/lists of indices for both rows and columns, but that doesn't seem to work:
In [35]: x[[1,3],[1,3]]
Out[35]: array([ 5, 15])
I found one way, which is:
In [61]: x[[1,3]][:,[1,3]]
Out[61]:
array([[ 5,  7],
       [13, 15]])
The first issue with this is that it is hardly readable, although I can live with that. If someone has a better solution, I'd certainly like to hear it.
The other thing is that I read on a forum that indexing arrays with arrays forces NumPy to make a copy of the desired array, so this could become a problem when dealing with large arrays. Why is that so, and how does this mechanism work?
To answer this question, we have to look at how indexing a multidimensional array works in NumPy. Let's first say you have the array x from your question. The buffer assigned to x will contain 16 ascending integers from 0 to 15. If you access one element, say x[i,j], NumPy has to figure out the memory location of this element relative to the beginning of the buffer. This is done by calculating in effect i*x.shape[1]+j (and multiplying by the size of an int to get an actual memory offset).
If you extract a subarray by basic slicing like y = x[0:2,0:2], the resulting object will share the underlying buffer with x. But what happens if you access y[i,j]? NumPy can't use i*y.shape[1]+j to calculate the offset into the array, because the data belonging to y is not consecutive in memory.
NumPy solves this problem by introducing strides. When calculating the memory offset for accessing x[i,j], what is actually calculated is i*x.strides[0]+j*x.strides[1] (and this already includes the factor for the size of an int):
x.strides
(16, 4)
When y is extracted like above, NumPy does not create a new buffer, but it does create a new array object referencing the same buffer (otherwise y would just be equal to x.) The new array object will have a different shape than x and maybe a different starting offset into the buffer, but will share the strides with x (in this case at least):
y.shape
(2,2)
y.strides
(16, 4)
This way, computing the memory offset for y[i,j] will yield the correct result.
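The offset arithmetic can be reproduced by hand. The sketch below assumes a 4-byte integer dtype (set explicitly, since the default integer size is platform dependent) and uses a hypothetical helper, byte_offset; it reads buffer addresses through the __array_interface__ attribute:

```python
import numpy as np

# int32 is chosen explicitly so that each item is 4 bytes, giving strides (16, 4).
x = np.arange(16, dtype=np.int32).reshape(4, 4)
y = x[2:, 2:]   # basic slicing: a view with its own starting offset

def byte_offset(arr, i, j):
    """Offset of arr[i, j] from the start of arr's data, in bytes."""
    return i * arr.strides[0] + j * arr.strides[1]

# Where y's data starts, relative to the start of x's buffer.
start = y.__array_interface__['data'][0] - x.__array_interface__['data'][0]

# Look up y[1, 1] directly in x's flat buffer using the computed byte offset.
flat_pos = (start + byte_offset(y, 1, 1)) // x.itemsize
print(x.strides, y.strides, start, x.ravel()[flat_pos], y[1, 1])
```

Both arrays report the same strides; only the starting offset (and the shape) differ.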
But what should NumPy do for something like z=x[[1,3]]? The strides mechanism won't allow correct indexing if the original buffer is used for z, because in general the selected rows need not be evenly spaced. NumPy could in theory add some more sophisticated mechanism than strides, but this would make element access relatively expensive, somewhat defeating the whole idea of an array. In addition, a view would no longer be a really lightweight object. So NumPy copies the selected data into a new buffer instead.
This is covered in depth in the NumPy documentation on indexing.
Oh, and nearly forgot about your actual question: Here is how to make the indexing with multiple lists work as expected:
x[[[1],[3]],[1,3]]
This is because the index arrays are broadcast to a common shape.
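To make the broadcasting step visible, the following sketch expands the two index arrays explicitly with np.broadcast_arrays (the names rows, cols, r and c are illustrative):

```python
import numpy as np

x = np.arange(16).reshape(4, 4)

rows = np.array([[1], [3]])   # shape (2, 1)
cols = np.array([1, 3])       # shape (2,)

# Broadcasting pairs every row index with every column index.
r, c = np.broadcast_arrays(rows, cols)   # both shape (2, 2)

sub = x[rows, cols]           # 2x2 submatrix
paired = x[[1, 3], [1, 3]]    # same-shape lists are paired element by element

print(r.tolist(), c.tolist())
print(sub.tolist(), paired.tolist())
```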
Of course, for this particular example, you can also make do with basic slicing:
x[1::2, 1::2]
As Sven mentioned, x[[[0],[2]],[1,3]] will give back the 0 and 2 rows that match with the 1 and 3 columns while x[[0,2],[1,3]] will return the values x[0,1] and x[2,3] in an array.
There is a helpful function for doing the first example I gave, numpy.ix_. You can do the same thing as my first example with x[numpy.ix_([0,2],[1,3])]. This can save you from having to enter in all of those extra brackets.
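For reference, this is roughly what numpy.ix_ hands back: a tuple of index arrays already shaped for broadcasting. A short sketch (variable names are illustrative):

```python
import numpy as np

x = np.arange(16).reshape(4, 4)

# ix_ returns one index array per axis, shaped (2, 1) and (1, 2) here.
row_idx, col_idx = np.ix_([0, 2], [1, 3])

sub = x[np.ix_([0, 2], [1, 3])]   # rows 0 and 2 crossed with columns 1 and 3
paired = x[[0, 2], [1, 3]]        # just x[0, 1] and x[2, 3]

print(row_idx.shape, col_idx.shape)
print(sub.tolist(), paired.tolist())
```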
I don't think that x[[1,3]][:,[1,3]] is hard to read. If you want to make your intent clearer, you can write:
x[[1,3],:][:,[1,3]]
I am not an expert in slicing, but typically, if you slice an array and the selected indices are regularly spaced, you get back a view with a different offset (and possibly different strides).
e.g. in your inputs 33 and 34, although you get a 2x2 array, the row stride is still 4 elements. Thus, when you index the next row, the pointer moves to the correct position in memory.
Clearly, this mechanism doesn't carry over to the case of an array of indices. Hence, NumPy has to make a copy. After all, many other matrix math functions rely on size, strides and a regular memory layout.
If you want to skip every other row and every other column, then you can do it with basic slicing:
In [49]: x=np.arange(16).reshape((4,4))
In [50]: x[1:4:2,1:4:2]
Out[50]:
array([[ 5,  7],
       [13, 15]])
This returns a view, not a copy of your array.
In [51]: y=x[1:4:2,1:4:2]
In [52]: y[0,0]=100
In [53]: x # <---- Notice x[1,1] has changed
Out[53]:
array([[  0,   1,   2,   3],
       [  4, 100,   6,   7],
       [  8,   9,  10,  11],
       [ 12,  13,  14,  15]])
while z=x[(1,3),:][:,(1,3)] uses advanced indexing and thus returns a copy:
In [58]: x=np.arange(16).reshape((4,4))
In [59]: z=x[(1,3),:][:,(1,3)]
In [60]: z
Out[60]:
array([[ 5,  7],
       [13, 15]])
In [61]: z[0,0]=0
Note that x is unchanged:
In [62]: x
Out[62]:
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11],
       [12, 13, 14, 15]])
If you wish to select arbitrary rows and columns, then you can't use basic slicing. You'll have to use advanced indexing, using something like x[rows,:][:,columns], where rows and columns are sequences. This of course is going to give you a copy, not a view, of your original array. This is as one should expect, since a numpy array uses contiguous memory (with constant strides), and there would be no way to generate a view with arbitrary rows and columns (since that would require non-constant strides).
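If you are unsure whether a given indexing expression produced a view or a copy, np.shares_memory can tell you. A small sketch, with arbitrarily chosen rows and columns:

```python
import numpy as np

x = np.arange(16).reshape(4, 4)

rows, columns = [0, 3], [1, 2]   # arbitrary selection, for illustration only

y = x[1:4:2, 1:4:2]              # basic slicing with a step: a view
z = x[rows, :][:, columns]       # advanced indexing: a copy

print(np.shares_memory(x, y))    # y uses x's buffer
print(np.shares_memory(x, z))    # z has its own buffer
print(z.tolist())
```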
With numpy, you can pass a slice for each component of the index - so, your x[0:2,0:2] example above works.
If you just want to evenly skip columns or rows, you can pass slices with three components (i.e. start, stop, step).
Again, for your example above:
>>> x[1:4:2, 1:4:2]
array([[ 5,  7],
       [13, 15]])
Which is basically: slice in the first dimension, with start at index 1, stop when index is equal or greater than 4, and add 2 to the index in each pass. The same for the second dimension. Again: this only works for constant steps.
The syntax you found does something quite different internally: x[[1,3]][:,[1,3]] first creates a new array containing only rows 1 and 3 of the original array (the x[[1,3]] part), and then indexes that again, creating a third array that contains only columns 1 and 3 of the previous one.
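One practical consequence of the intermediate array shows up on assignment. In the sketch below (an illustration, not something from the question), writing through the chained form only modifies the temporary copy, whereas a single indexing operation with np.ix_ writes into x itself:

```python
import numpy as np

x = np.arange(16).reshape(4, 4)

# Chained form: x[[1, 3]] is a temporary copy, so this assignment is lost.
x[[1, 3]][:, [1, 3]] = 0
unchanged = np.array_equal(x, np.arange(16).reshape(4, 4))

# Single indexing operation: the assignment goes into x.
x[np.ix_([1, 3], [1, 3])] = 0

print(unchanged)
print(x.tolist())
```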
I have a similar question here: Writting in sub-ndarray of a ndarray in the most pythonian way. Python 2.
Following the solution from that post, for your case it looks like:
columns_to_keep = [1,3]
rows_to_keep = [1,3]
And using ix_:
x[np.ix_(rows_to_keep, columns_to_keep)]
Which is:
array([[ 5,  7],
       [13, 15]])
I'm not sure how efficient this is, but you can use range() to index along both axes. Note that this example selects rows and columns 1 and 2:
x=np.arange(16).reshape((4,4))
x[range(1,3), :][:,range(1,3)]
Right, perhaps I should be using the normal Python lists for this, but here goes:
I want a 9 by 4 multidimensional array/matrix (whatever really) that I want to store arrays in. These arrays will be 1-dimensional and of length 4096.
So, I want to be able to go something like
column = 0 #column to insert into
row = 7 #row to insert into
storageMatrix[column,row][0] = NEW_VALUE
storageMatrix[column,row][4092] = NEW_VALUE_2
etc..
I appreciate I could be doing something a bit silly/unnecessary here, but it will make it a lot easier for me to have it structured like this in my code (as there are a lot of these, and a lot of analysis to be done later).
Thanks!
Note that to leverage the full power of numpy, you'd be much better off with a 3-dimensional numpy array. Breaking apart the 3-d array into a 2-d array with 1-d values may complicate your code and force you to use loops instead of built-in numpy functions.
It may be worth investing the time to refactor your code to use a 3-d numpy array.
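A minimal sketch of the 3-d alternative, assuming the values are floats and that a (4, 9, 4096) layout suits the analysis (the name storage and the example values are made up):

```python
import numpy as np

# One block of memory: 4 x 9 cells, each holding 4096 values.
storage = np.zeros((4, 9, 4096))

column, row = 0, 7
storage[column, row, 0] = 1.5       # same effect as storage[column, row][0] = 1.5
storage[column, row, 4092] = 2.5

# Built-in reductions work across all cells at once, without Python loops.
means = storage.mean(axis=-1)       # shape (4, 9)

print(storage[column, row].shape, means.shape)
```

storage[column, row] is still a 1-d array of length 4096, so the indexing style from the question carries over.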
However, if that's not an option, then:
import numpy as np
storageMatrix=np.empty((4,9),dtype='object')
By setting the dtype to 'object', we are telling numpy to allow each element of storageMatrix to be an arbitrary Python object.
Now you must initialize each element of storageMatrix to be a 1-d numpy array before using it, for example:
storageMatrix[column,row]=np.arange(4096)
And then you can access the array elements like this:
storageMatrix[column,row][0] = 1
storageMatrix[column,row][4092] = 2
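Since every cell needs its own array, one possible way to initialize them all is a loop over np.ndindex. This is a sketch; filling with zeros is an assumption, so use whatever initial values make sense for your data:

```python
import numpy as np

storageMatrix = np.empty((4, 9), dtype=object)

# Give each cell its own, independent 1-d array.
for idx in np.ndindex(storageMatrix.shape):
    storageMatrix[idx] = np.zeros(4096)

column, row = 0, 7
storageMatrix[column, row][0] = 1
storageMatrix[column, row][4092] = 2

print(storageMatrix[column, row][4092], storageMatrix[0, 0][4092])
```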
The Tentative NumPy Tutorial says you can declare a 2D array by passing a shape tuple:
>>> from numpy import array, ones
>>> x = ones( (3,4), dtype=int )
and index into a 2D array like this:
>>> x[1,2] = 20
>>> x[1,:] # x's second row
array([ 1,  1, 20,  1])
>>> a = array([10, 20, -7, -3]) # assumed here; a is not defined in the excerpt
>>> x[0] = a # change first row of x
>>> x
array([[10, 20, -7, -3],
       [ 1,  1, 20,  1],
       [ 1,  1,  1,  1]])