I'm having a problem with np.append.
I'm trying to duplicate the last column of a 20x361 matrix, n_list_converted, by using the code below:
n_last = []
n_last = n_list_converted[:, -1]
n_lists = np.append(n_list_converted, n_last, axis=1)
But I get the error:
ValueError: all the input arrays must have same number of dimensions
However, I've checked the matrix dimensions by doing
print(n_last.shape, type(n_last), n_list_converted.shape, type(n_list_converted))
and I get
(20L,) (20L, 361L)
so the dimensions match? Where is the mistake?
If I start with a 3x4 array, and concatenate a 3x1 array, with axis 1, I get a 3x5 array:
In [911]: x = np.arange(12).reshape(3,4)
In [912]: np.concatenate([x,x[:,-1:]], axis=1)
Out[912]:
array([[ 0, 1, 2, 3, 3],
[ 4, 5, 6, 7, 7],
[ 8, 9, 10, 11, 11]])
In [913]: x.shape,x[:,-1:].shape
Out[913]: ((3, 4), (3, 1))
Note that both inputs to concatenate have 2 dimensions.
Omit the :, and x[:,-1] is (3,) shape - it is 1d, and hence the error:
In [914]: np.concatenate([x,x[:,-1]], axis=1)
...
ValueError: all the input arrays must have same number of dimensions
The code for np.append is (in this case where axis is specified)
return concatenate((arr, values), axis=axis)
So with a slight change of syntax append works. Instead of a list it takes 2 arguments. It imitates the list append syntax, but should not be confused with that list method.
In [916]: np.append(x, x[:,-1:], axis=1)
Out[916]:
array([[ 0, 1, 2, 3, 3],
[ 4, 5, 6, 7, 7],
[ 8, 9, 10, 11, 11]])
np.hstack first makes sure all inputs are atleast_1d, and then does concatenate:
return np.concatenate([np.atleast_1d(a) for a in arrs], 1)
So it requires the same x[:,-1:] input. Essentially the same action.
np.column_stack also does a concatenate on axis 1. But first it passes 1d inputs through
array(arr, copy=False, subok=True, ndmin=2).T
This is a general way of turning that (3,) array into a (3,1) array.
In [922]: np.array(x[:,-1], copy=False, subok=True, ndmin=2).T
Out[922]:
array([[ 3],
[ 7],
[11]])
In [923]: np.column_stack([x,x[:,-1]])
Out[923]:
array([[ 0, 1, 2, 3, 3],
[ 4, 5, 6, 7, 7],
[ 8, 9, 10, 11, 11]])
All these 'stacks' can be convenient, but in the long run, it's important to understand dimensions and the base np.concatenate. Also know how to look up the code for functions like this. I use the ipython ?? magic a lot.
And in time tests, np.concatenate is noticeably faster: with a small array like this, the extra layers of function calls make a big time difference.
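A rough way to check the timing claim yourself is sketched below. The candidates dictionary and the call count are illustrative choices, not part of the answer above, and the absolute numbers will depend on your machine and NumPy version, so no timings are quoted here.

```python
import timeit
import numpy as np

x = np.arange(12).reshape(3, 4)
col = x[:, -1:]  # shape (3, 1): the slice keeps the array 2-D

# Each callable builds the same (3, 5) result in a different way.
candidates = {
    "concatenate": lambda: np.concatenate([x, col], axis=1),
    "append": lambda: np.append(x, col, axis=1),
    "hstack": lambda: np.hstack([x, col]),
    "column_stack": lambda: np.column_stack([x, x[:, -1]]),
}

# Keep one result per method so they can be compared for equality.
results = {name: fn() for name, fn in candidates.items()}

for name, fn in candidates.items():
    seconds = timeit.timeit(fn, number=10_000)
    print(f"{name:>12}: {seconds:.4f} s per 10,000 calls")
```

On arrays this small the time is dominated by Python-level overhead, which is why the wrappers tend to look slower; on large arrays the difference should matter much less.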
(n,) and (n,1) are not the same shape. Try turning the vector into a column array by using the [:, None] notation:
n_lists = np.append(n_list_converted, n_last[:, None], axis=1)
Alternatively, when extracting n_last you can use
n_last = n_list_converted[:, -1:]
to get a (20, 1) array.
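A minimal sketch of both suggestions on a small stand-in array (m plays the role of n_list_converted; the names m, last, col_a, col_b and col_c are only for illustration):

```python
import numpy as np

m = np.arange(12).reshape(3, 4)  # stand-in for n_list_converted
last = m[:, -1]                  # integer index drops the axis: shape (3,)

col_a = last[:, None]            # add a new axis after the fact
col_b = last.reshape(-1, 1)      # same result via reshape
col_c = m[:, -1:]                # slice instead of index: stays 2-D

out = np.append(m, col_a, axis=1)
print(last.shape, col_a.shape, out.shape)  # (3,) (3, 1) (3, 5)
```

All three column forms hold the same values, so any of them can be passed to np.append or np.concatenate with axis=1.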
The reason you get the error is that an "n by 1" matrix (2-D) is different from an array of length n (1-D).
I recommend using hstack() and vstack() instead.
Like this:
import numpy as np
a = np.arange(32).reshape(4,8) # 4 rows 8 columns matrix.
b = a[:,-1:] # last column of that matrix.
result = np.hstack((a,b)) # stack them horizontally like this:
#array([[ 0, 1, 2, 3, 4, 5, 6, 7, 7],
# [ 8, 9, 10, 11, 12, 13, 14, 15, 15],
# [16, 17, 18, 19, 20, 21, 22, 23, 23],
# [24, 25, 26, 27, 28, 29, 30, 31, 31]])
Notice the repeated "7, 15, 23, 31" column.
Also, notice that I used a[:,-1:] instead of a[:,-1]. My version generates a column:
array([[7],
[15],
[23],
[31]])
Instead of a row array([7,15,23,31])
Edit: append() is much slower. Read this answer.
You can also turn an (n,) array into a (1,n) row by enclosing it in brackets [ ]. This helps when appending a row along axis 0.
e.g. Instead of np.append(b,a,axis=0) use np.append(b,[a],axis=0)
a=[1,2]
b=[[5,6],[7,8]]
np.append(b,[a],axis=0)
returns
array([[5, 6],
[7, 8],
[1, 2]])
I normally use np.row_stack((ndarray_1, ndarray_2, ..., ndarray_nth)), an alias of np.vstack, to join arrays row-wise.
Here you are adding a column, so the column-wise counterpart np.column_stack is the one to use; it accepts the 1-D n_last as-is:
n_last = n_list_converted[:, -1]
n_lists = np.column_stack((n_list_converted, n_last))
I have a 2D tensor in Pytorch that I would like to slice:
x = torch.rand((3, 5))
In this example, the tensor has 3 rows and I want to slice x, creating a new tensor y that also has 3 rows and num_cols columns.
What's challenging for me is that I want to slice different columns per row. All I have is x, num_cols, and idx, which is a tensor holding the start index from where to slice.
Example:
What I have is num_cols=2, idx=[1,2,3] and
x=torch.arange(15).reshape((3,-1)) =
tensor([[ 0, 1, 2, 3, 4],
[ 5, 6, 7, 8, 9],
[10, 11, 12, 13, 14]])
What I want is
y=
tensor([[ 1, 2],
[ 7, 8],
[13, 14]])
What's the "torch"-way of doing this? I know, I can slice if I get a boolean mask somehow, but I don't know how to construct that with idx and num_cols without normal Python loops.
You could use fancy indexing together with broadcasting. Another solution might be to use torch.gather which is similar to numpy's take_along_axis. Your idx array would need to be extended with the extra column:
x = torch.arange(15).reshape(3,-1)
idx = torch.tensor([1,2,3])
idx = torch.column_stack([idx, idx+1])
torch.gather(x, 1, idx)
output:
tensor([[ 1, 2],
[ 7, 8],
[13, 14]])
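The "fancy indexing together with broadcasting" route is not spelled out above. Here is a sketch of it in NumPy, next to take_along_axis, which the answer names as the analogue of torch.gather. The names cols and rows are illustrative. The same index arithmetic should carry over to PyTorch with torch.arange, but that is an assumption and is not run here.

```python
import numpy as np

x = np.arange(15).reshape(3, -1)
idx = np.array([1, 2, 3])  # start column for each row
num_cols = 2

# Broadcasting (3, 1) + (2,) gives a (3, 2) table of column indices:
# [[1, 2], [2, 3], [3, 4]]
cols = idx[:, None] + np.arange(num_cols)

# Fancy indexing: pair every column index with its row index.
rows = np.arange(x.shape[0])[:, None]
y_fancy = x[rows, cols]

# Equivalent lookup along axis 1.
y_take = np.take_along_axis(x, cols, axis=1)

print(y_fancy)
```

Building cols from num_cols avoids hard-coding idx+1, so the same code works for any window width as long as idx + num_cols stays within the number of columns.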
These are two outputs in a chunk of code, obtained by calling .shape on a variable b before and after applying np.expand_dims(b, axis=1).
I see that the _dims part may seem like a dead giveaway, but the outputs don't seem to be different, except for, perhaps, turning a row vector into a column vector (?):
b is [208. 193. 208. ... 46. 93. 200.], a row vector, but np.expand_dims(b, axis=1) gives:
[[208.]
[193.]
[208.]
...
[ 46.]
[ 93.]
[200.]]
Which could be interpreted as a column vector (?), as opposed to any increased number of dimensions.
What is the difference between (13027,) and (13027,1)
They are arrays of different dimensions and some operations apply to them differently. For example
>>> a = np.arange(5)
>>> b = np.arange(5, 10)
>>> a + b
array([ 5, 7, 9, 11, 13])
>>> np.expand_dims(a, axis=1) + b
array([[ 5, 6, 7, 8, 9],
[ 6, 7, 8, 9, 10],
[ 7, 8, 9, 10, 11],
[ 8, 9, 10, 11, 12],
[ 9, 10, 11, 12, 13]])
The last result is what we call broadcasting, which you can read about in the numpy docs, or even in this SO question.
Basically np.expand_dims adds new axes at the specified dimensions and all the following achieve the same result
>>> a.shape
(5,)
>>> np.expand_dims(a, axis=(0, 2)).shape
(1, 5, 1)
>>> a[None,:,None].shape
(1, 5, 1)
>>> a[np.newaxis,:,np.newaxis].shape
(1, 5, 1)
Note that in numpy the transpose of a 1D array is still a 1D array. It isn't like in MATLAB where a row vector turns to a column vector.
>>> a
array([0, 1, 2, 3, 4])
>>> a.T
array([0, 1, 2, 3, 4])
>>> a.T.shape
(5,)
So in order to turn it into a "column vector" you have to change the array from shape (N,) to (N, 1) by adding an axis as above (or by reshaping). But you're better off treating it as a 2D array of N rows with 1 element per row.
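A short sketch of the options for getting an (N, 1) array, and of how transposing behaves once the array really is 2D (the variable names are illustrative only):

```python
import numpy as np

a = np.arange(5)

col_index = a[:, None]                  # index with None (np.newaxis)
col_reshape = a.reshape(-1, 1)          # -1 lets NumPy infer the length
col_expand = np.expand_dims(a, axis=1)  # explicit function call

print(a.T.shape)          # (5,)   transposing a 1D array changes nothing
print(col_index.shape)    # (5, 1)
print(col_index.T.shape)  # (1, 5) transposing a 2D array swaps the axes
```

All three constructions give the same array, so which one to use is mostly a matter of readability.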
(13027,) has a single axis, axis 0, of length 13027, while (13027,1) has two axes: axis 0 of length 13027 and axis 1 of length 1.
https://numpy.org/doc/stable/reference/generated/numpy.expand_dims.html
It's like "i" where i = 0 by default so if you don't explicitly define it, it will start at 0.
Suppose I have a 2D NumPy array values. I want to add a new column to it. The new column should be values[:, 19] lagged by one sample (with the first element equal to zero). It could be built as np.append([0], values[0:-2:1, 19]). Following the question "Numpy concatenate 2D arrays with 1D array", I tried:
temp = np.append([0], [values[1:-2:1, 19]])
values = np.append(dataset.values, temp[:, None], axis=1)
but I get:
ValueError: all the input array dimensions except for the concatenation axis
must match exactly
I tried using c_ too as:
temp = np.append([0], [values[1:-2:1, 19]])
values = np.c_[values, temp]
but the effect is the same. How can this concatenation be done? I think the problem is the orientation of temp: it is treated as a row instead of a column, so there is an issue with dimensions. In Octave, ' (the transpose operator) would do the trick. Maybe there is a similar solution in NumPy?
Anyway, thank you for your time.
Best regards,
Max
In [76]: values = np.arange(16).reshape(4,4)
In [77]: temp = np.concatenate(([0], values[1:,-1]))
In [78]: values
Out[78]:
array([[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11],
[12, 13, 14, 15]])
In [79]: temp
Out[79]: array([ 0, 7, 11, 15])
This use of concatenate to make temp is similar to your use of append (which actually uses concatenate).
Sounds like you want to join values and temp in this way:
In [80]: np.concatenate((values, temp[:,None]),axis=1)
Out[80]:
array([[ 0, 1, 2, 3, 0],
[ 4, 5, 6, 7, 7],
[ 8, 9, 10, 11, 11],
[12, 13, 14, 15, 15]])
Again I prefer using concatenate directly.
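If the new column should be the source column shifted down by one row, as the question describes, the slice has to drop the last element rather than the first. A minimal sketch on the same toy values (src and lagged are illustrative names, and the last column stands in for column 19):

```python
import numpy as np

values = np.arange(16).reshape(4, 4)

src = values[:, -1]                       # column to lag: [3, 7, 11, 15]
lagged = np.concatenate(([0], src[:-1]))  # shift down by one: [0, 3, 7, 11]

# lagged has one entry per row, so it lines up once it is made a column.
out = np.concatenate((values, lagged[:, None]), axis=1)
print(out)
```

The ValueError in the question most likely comes from the length mismatch: a slice such as values[1:-2:1, 19] drops three rows, so after prepending the zero temp is still two entries shorter than values has rows.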
You need to convert the 1D array to 2D first. You can then use vstack, or hstack with reshaping, to get the final array you want:
a = np.array([[1, 2, 3],[4, 5, 6]])
b = np.array([[7, 8, 9]])
c = np.vstack([ele for ele in [a, b]])
print(c)
c = np.hstack([arr.reshape(1,-1) for arr in [a,b]]).reshape(-1,3)
print(c)
Either way, the output is:
[[1 2 3]
 [4 5 6]
 [7 8 9]]
Hope I understood the question correctly
I want to create 5*3 array like below without typing it explicitly.
[[1, 6, 11],
[2, 7, 12],
[3, 8, 13],
[4, 9, 14],
[5, 10, 15]]
I wrote the following code:
np.arange(1, 16).T.reshape((5,3))
but it shows
array([[ 1, 2, 3],
[ 4, 5, 6],
[ 7, 8, 9],
[10, 11, 12],
[13, 14, 15]])
How can I order numbers in ascending order so that it becomes the first array?
That's what you are looking for:
np.arange(1, 16).reshape((3,5)).T
In fact, in order:
np.arange(1,16) will return evenly spaced values in the half-open interval [1, 16), i.e. 1 to 15 (default step size is 1) [http://docs.scipy.org/doc/numpy/reference/generated/numpy.arange.html ];
.reshape((3,5)) is giving new shape to the previously formed array [http://docs.scipy.org/doc/numpy/reference/generated/numpy.reshape.html ]. The new array will have 3 rows and 5 columns;
.T will transpose the previously reshaped array [http://docs.scipy.org/doc/numpy/reference/generated/numpy.ndarray.T.html ]
For completeness, it is worth noticing that there is no need to transpose the array as suggested in the currently accepted answer. You just need to invoke numpy.reshape with the following arguments:
(5, 3), which corresponds to the positional parameter newshape, i.e. the shape of the array you wish to create.
order='F'. The default value is 'C'.
Here is an excerpt from the docs on the order optional parameter:
'C' means to read / write the elements using C-like index order, with the last axis index changing fastest, back to the first axis index changing slowest. 'F' means to read / write the elements using Fortran-like index order, with the first index changing fastest, and the last index changing slowest.
By doing so, the numbers are arranged column-wise:
In [45]: np.arange(1, 16).reshape((5, 3), order='F')
Out[45]:
array([[ 1, 6, 11],
[ 2, 7, 12],
[ 3, 8, 13],
[ 4, 9, 14],
[ 5, 10, 15]])
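As a quick check that the two answers agree, the sketch below builds the array both ways and compares them (via_transpose and via_order are illustrative names):

```python
import numpy as np

# Fill a 3x5 array row by row, then swap the axes.
via_transpose = np.arange(1, 16).reshape((3, 5)).T

# Fill a 5x3 array column by column directly.
via_order = np.arange(1, 16).reshape((5, 3), order='F')

print(np.array_equal(via_transpose, via_order))  # True
print(via_order[:, 0])                           # [1 2 3 4 5]
```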
I am using Numpy to manipulate some very strange tabular data. The data entries always come in columns of 1200 entries each.
However, the number of rows always varies. Sometimes the tables I import have 12 rows (i.e. a numpy ndarray.shape = (12, 1200), with 1200 times 12 total entries, i.e. 1200*12 = 14400.) Sometimes the tables have 6 rows (shape = (6, 1200)), and so forth. There's no pattern here.
The number of columns is consistently 1200, but the number of rows always varies. I have no prior knowledge about how many rows, so I cannot write some sort of mathematical formula.
I would like to use numpy.concatenate to take each array I am given into a one-dimensional ndarray. (For our example above, that would be shape = (1, 14400). )
So far, for each individual array, I have to individually break it up into N arrays (N = unknown number of rows) and then individually concatenate them.
Or, in order to write a for statement, I have to find the number of rows, and manually set the for statement for each array.
Any ideas for a better method? This takes forever.
EDIT: Sorry, mixing together "rows" and "columns". I have re-typed the post above to reflect this. Yes, the arrays are consistently of the shape (n, 1200). So, the format is (rows, columns) and the columns are consistently 1200.
FURTHER QUESTION: My question about numpy.reshape is whether the order of the data is changed. So, for an array with 6 rows, shape (6, 1200), will numpy.reshape() return an array of shape (1, 7200) such that the original order is preserved? That is,
newarray = array([row 1, row 2, row 3, row 4, row 5, row 6])
?
A couple of ways to address the type of questions you are asking about are:
import numpy as np
x = np.ones((6, 12000))
a = np.reshape(x, (1, -1))  # one row; -1 lets NumPy infer the length
b = np.concatenate([x[i,:] for i in range(x.shape[0])])  # join the rows end to end
print(x.shape) # (6, 12000)
print(a.shape) # (1, 72000)
print(b.shape) # (72000,)
The advantage of reshape is that it doesn't copy the data when the array is contiguous (as it is here), so it's fast; but since the result is just a new view on the old data, changes to a will also change x. Of course, you could also copy the reshaped array to get separate data.
concatenate here will make a copy, but note that the items copied are again just views onto the original x, so there's only one copy per element. Making the concatenated array have shape (1, 72000) seems a bit contrived to me so I didn't do it, but it's certainly possible if that's what you really want.
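The view-versus-copy difference can be checked directly. This is a small sketch on a 2x3 array instead of the full-size one, using np.shares_memory to test whether two arrays overlap in memory:

```python
import numpy as np

x = np.zeros((2, 3))
a = np.reshape(x, (1, -1))                                 # view on x
b = np.concatenate([x[i, :] for i in range(x.shape[0])])   # new array

print(np.shares_memory(a, x))  # True
print(np.shares_memory(b, x))  # False

a[0, 0] = 99.0   # writing through the view ...
print(x[0, 0])   # 99.0  ... changes the original
print(b[0])      # 0.0   the concatenated copy is unaffected
```

If independent data is needed after reshaping, calling .copy() on the result is one way to get it.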
Below is an example for understanding how the ordering works in reshape:
x2 = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
c = np.reshape(x2, (1, -1))
print(x2)
# [[1 2 3]
#  [4 5 6]
#  [7 8 9]]
print(c)
# [[1 2 3 4 5 6 7 8 9]]
So you have several arrays with shape (n,1200)
Make some simpler samples. It will be easier to see what is going on.
a = np.arange(12).reshape(2,6)
# array([[ 0, 1, 2, 3, 4, 5],
#        [ 6, 7, 8, 9, 10, 11]])
Notice how the numbers increase
b = np.arange(18).reshape(3,6)
c = np.concatenate([a,b], axis=0)
producing
array([[ 0, 1, 2, 3, 4, 5],
[ 6, 7, 8, 9, 10, 11],
[ 0, 1, 2, 3, 4, 5],
[ 6, 7, 8, 9, 10, 11],
[12, 13, 14, 15, 16, 17]])
Since it is just the 1st dimensions that varies, it has no problem concatenating along this dimension. np.vstack does the same thing.
How about joining the arrays after flattening:
np.concatenate([a.flatten(),b.flatten()])
# array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17])
You'd get the same thing with c.flatten(). (flatten, ravel, reshape all do essentially the same thing.)
np.concatenate(c,axis=0)
np.concatenate([c[0,:],c[1,:],c[2,:]...],axis=0)
concatenate can also be used to flatten an array, but this isn't the usual method. It is, in effect, the same as splitting it by rows and joining those. Note that np.vstack(c) is not the same thing.
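To make the last point concrete, here is a sketch comparing the three calls on a small array (the 3x4 shape is an arbitrary choice for illustration):

```python
import numpy as np

c = np.arange(12).reshape(3, 4)

flat_cat = np.concatenate(c, axis=0)  # treats c as a sequence of 1-D rows
flat_ravel = c.ravel()                # the usual way to flatten
stacked = np.vstack(c)                # each row becomes (1, 4), then stacked

print(flat_cat.shape, flat_ravel.shape, stacked.shape)  # (12,) (12,) (3, 4)
```

np.vstack promotes each row to 2-D before joining, so it rebuilds the original 2-D array instead of flattening it.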