Concatenate NumPy 2D array with column (1D array) - python

Suppose I have a 2D NumPy array values. I want to add new column to it. New column should be values[:, 19] but lagged by one sample (first element equals to zero). It could be returned as np.append([0], values[0:-2:1, 19]). I tried: Numpy concatenate 2D arrays with 1D array
temp = np.append([0], [values[1:-2:1, 19]])
values = np.append(dataset.values, temp[:, None], axis=1)
but I get:
ValueError: all the input array dimensions except for the concatenation axis
must match exactly
I tried using c_ too as:
temp = np.append([0], [values[1:-2:1, 19]])
values = np.c_[values, temp]
but effect is the same. How this concatenation could be made. I think problem is in temp orientation - it is treated as a row instead of column, so there is an issue with dimensions. In Octave ' (transpose operator) would do the trick. Maybe there is similiar solution in NumPy?
Anyway, thank you for you time.
Best regards,
Max

In [76]: values = np.arange(16).reshape(4,4)
In [77]: temp = np.concatenate(([0], values[1:,-1]))
In [78]: values
Out[78]:
array([[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11],
[12, 13, 14, 15]])
In [79]: temp
Out[79]: array([ 0, 7, 11, 15])
This use of concatenate to make temp is similar to your use of append (which actually uses concatenate).
Sounds like you want to join values and temp in this way:
In [80]: np.concatenate((values, temp[:,None]),axis=1)
Out[80]:
array([[ 0, 1, 2, 3, 0],
[ 4, 5, 6, 7, 7],
[ 8, 9, 10, 11, 11],
[12, 13, 14, 15, 15]])
Again I prefer using concatenate directly.

You need to convert the 1D array to 2D as shown. You can then use vstack or hstack with reshaping to get the final array you want as shown:
a = np.array([[1, 2, 3],[4, 5, 6]])
b = np.array([[7, 8, 9]])
c = np.vstack([ele for ele in [a, b]])
print(c)
c = np.hstack([a.reshape(1,-1) for a in [a,b]]).reshape(-1,3)
print(c)
Either way, the output is:
[[1 2 3] [4 5 6] [7 8 9]]
Hope I understood the question correctly

Related

How to slice 2D Torch tensor individually per row?

I have a 2D tensor in Pytorch that I would like to slice:
x = torch.rand((3, 5))
In this example, the tensor has 3 rows and I want to slice x, creating a new tensor y that also has 3 rows and num_col cols.
What's challenging for me is that I want to slice different columns per row. All I have is x, num_cols, and idx, which is a tensor holding the start index from where to slice.
Example:
What I have is num_cols=2, idx=[1,2,3] and
x=torch.arange(15).reshape((3,-1)) =
tensor([[ 0, 1, 2, 3, 4],
[ 5, 6, 7, 8, 9],
[10, 11, 12, 13, 14]])
What I want is
y=
tensor([[ 1, 2],
[ 7, 8],
[13, 14]])
What's the "torch"-way of doing this? I know, I can slice if I get a boolean mask somehow, but I don't know how to construct that with idx and num_cols without normal Python loops.
You could use fancy indexing together with broadcasting. Another solution might be to use torch.gather which is similar to numpy's take_along_axis. Your idx array would need to be extended with the extra column:
x = torch.arange(15).reshape(3,-1)
idx = torch.tensor([1,2,3])
idx = torch.column_stack([idx, idx+1])
torch.gather(x, 1, idx)
output:
tensor([[ 1, 2],
[ 7, 8],
[13, 14]])

indexing rows and columns in numpy

a = np.array(list(range(16).reshape((4,4))
array([[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11],
[12, 13, 14, 15]])
Say I want the middle square. It'd seem reasonable to do this:
a[[1,2],[1,2]]
but I get this:
array([5, 10])
This works, but seems inelegant:
a[[1,2],:][:,[1,2]]
array([[5, 6],
[9, 10]])
So my questions are:
Why is it this way? What premises are required to make the implemented way sensible?
Is there a canonical way to select along more than one index at once?
I think you can read more details on advanced indexing. Basically, when you slice the array by lists/arrays, the arrays will be broadcast and iterate together.
In your case, you can do:
idx = np.array([1,3])
a[idx,idx[:,None]]
Or as in the doc above:
a[np.ix_(idx, idx)]
Output:
array([[ 5, 13],
[ 7, 15]])
You can do both slicing operations at once instead of creating a view and indexing that again:
import numpy as np
a = np.arange(16).reshape((4, 4))
# preferred if possible
print(a[1:3, 1:3])
# [[ 5 6]
# [ 9 10]]
# otherwise add a second dimension to the first index to make it broadcastable
index1 = np.asarray([1, 2])
index2 = np.asarray([1, 2])
print(a[index1[:, None], index2])
# [[ 5 6]
# [ 9 10]]
You could use multiple np.take to select indices from multiple axes
a = np.arange(16).reshape((4, 4))
idx = np.array([1,2])
np.take(np.take(a, idx, axis=1), idx, axis=0)
Or (slightly more readable)
a.take(idx, axis=1).take(idx, axis=0)
Output:
array([[ 5, 6],
[ 9, 10]])
np.take also allows you to conveniently wrap around out-of-bound indices and such.

How to split numpy array vertically from any column index

I want to split a numpy array into two subarrays where the splitting point is based on a column id, i.e., vertical split. For instance, if I generate a numpy array of shape [10,16] and I want to create two subarrays by splitting it from the column's index 11, then I should get one subarray of size [10,10] and the other one is from [10,15]. Therefore, I am following numpy.hsplit here but it seems it only does an even split (the subarrays need to be equal). I want to be able to:
Split any numpy array vertically, no matter what is the size of subarrays.
Extract both subarrays.
To simulate my request, the following is my code:
import numpy as np
C = [[1,2,3,4],[5,6,7,8],[9,10,11,12], [13,14,15,16]]
C = np.asarray(C)
C = np.hsplit(C, 3)
print(C)
As you can see, np.hsplit(C, 3) doesn't work unless the splitting generates similar subarrays. Even if I did np.hsplit(C, 2), I don't know how to extract both subarrays into separate numpy arrays.
To achieve my goals, how can I modify this code?
Use the array indexing.
C[:,:3] # All rows , columns 0 to 2
Out[29]:
array([[ 1, 2, 3],
[ 5, 6, 7],
[ 9, 10, 11],
[13, 14, 15]])
C[:,3:] # All rows column 3 (to end in this case also 3).
Out[30]:
array([[ 4],
[ 8],
[12],
[16]])
You need to specify the indices as list:
import numpy as np
C = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
C = np.asarray(C)
C = np.hsplit(C, [3])
print(C)
Output
[array([[ 1, 2, 3],
[ 5, 6, 7],
[ 9, 10, 11],
[13, 14, 15]]), array([[ 4],
[ 8],
[12],
[16]])]

Remove/delete each minimum value in each row of a NxM matrix?

Is it possible to remove/delete each minimum value in each row of a NxM matrix
creating a new matrix?
I´ve tried this so far without any luck:
for n in range(0,len(matrix_name)):
Ma = grades.remove(np.min(matrix_name[n,:]))
and this too:
for n in range(0,len(matrix_name)):
Ma = np.delete(matrix_name,np.min(matrix_name[n,:]))
If duplicates are not an issue or if deleting only one of them per row is acceptable:
m, n = a.shape
np.where(np.arange(n-1) < a.argmin(axis=1)[:, None], a[:, :-1], a[:, 1:])
If reconstructing the desired result from the original array instead of modifying the original array by deleting the min values, is allowed then this approach should do the job:
# some test array
In [19]: arr
Out[19]:
array([[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11],
[12, 13, 14, 15],
[16, 17, 18, 19]])
In [20]: r, c = arr.shape
# find the minimum along axis 1 (i.e. along rows)
In [21]: min_vals = np.min(arr, axis=1, keepdims=True)
# reshape the result to 2D array
In [22]: (arr[np.where(arr != min_vals)]).reshape(r, c-1)
Out[22]:
array([[ 1, 2, 3],
[ 5, 6, 7],
[ 9, 10, 11],
[13, 14, 15],
[17, 18, 19]])
Note: This approach assumes that there's only one minimum value in each row.

Numpy array slicing using colons

I am trying to learn numpy array slicing.
But this is a syntax i cannot seem to understand.
What does
a[:1] do.
I ran it in python.
a = np.array([1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16])
a = a.reshape(2,2,2,2)
a[:1]
Output:
array([[[ 5, 6],
[ 7, 8]],
[[13, 14],
[15, 16]]])
Can someone explain to me the slicing and how it works. The documentation doesn't seem to answer this question.
Another question would be would there be a way to generate the a array using something like
np.array(1:16) or something like in python where
x = [x for x in range(16)]
The commas in slicing are to separate the various dimensions you may have. In your first example you are reshaping the data to have 4 dimensions each of length 2. This may be a little difficult to visualize so if you start with a 2D structure it might make more sense:
>>> a = np.arange(16).reshape((4, 4))
>>> a
array([[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11],
[12, 13, 14, 15]])
>>> a[0] # access the first "row" of data
array([0, 1, 2, 3])
>>> a[0, 2] # access the 3rd column (index 2) in the first row of the data
2
If you want to access multiple values using slicing you can use the colon to express a range:
>>> a[:, 1] # get the entire 2nd (index 1) column
array([[1, 5, 9, 13]])
>>> a[1:3, -1] # get the second and third elements from the last column
array([ 7, 11])
>>> a[1:3, 1:3] # get the data in the second and third rows and columns
array([[ 5, 6],
[ 9, 10]])
You can do steps too:
>>> a[::2, ::2] # get every other element (column-wise and row-wise)
array([[ 0, 2],
[ 8, 10]])
Hope that helps. Once that makes more sense you can look in to stuff like adding dimensions by using None or np.newaxis or using the ... ellipsis:
>>> a[:, None].shape
(4, 1, 4)
You can find more here: http://docs.scipy.org/doc/numpy/reference/arrays.indexing.html
It might pay to explore the shape and individual entries as we go along.
Let's start with
>>> a = np.array([1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16])
>>> a.shape
(16, )
This is a one-dimensional array of length 16.
Now let's try
>>> a = a.reshape(2,2,2,2)
>>> a.shape
(2, 2, 2, 2)
It's a multi-dimensional array with 4 dimensions.
Let's see the 0, 1 element:
>>> a[0, 1]
array([[5, 6],
[7, 8]])
Since there are two dimensions left, it's a matrix of two dimensions.
Now a[:, 1] says: take a[i, 1 for all possible values of i:
>>> a[:, 1]
array([[[ 5, 6],
[ 7, 8]],
[[13, 14],
[15, 16]]])
It gives you an array where the first item is a[0, 1], and the second item is a[1, 1].
To answer the second part of your question (generating arrays of sequential values) you can use np.arange(start, stop, step) or np.linspace(start, stop, num_elements). Both of these return a numpy array with the corresponding range of values.

Categories