What is the Numpy slicing notation in this code? - python

# split into inputs and outputs
X, y = data[:, :-1], data[:, -1]
print(X.shape, y.shape)
Can someone explain the second line of code with reference to specific documentation? I know it's slicing, but I couldn't find any reference for the notation ":-1" anywhere. Please point to the specific portion of the documentation.
Thank you
The code slices an array, most probably a NumPy array; in my case data has shape (610, 14).

Per the docs:
Indexing on ndarrays
ndarrays can be indexed using the standard Python x[obj] syntax, where x is the array and obj the selection. There are different kinds of indexing available depending on obj: basic indexing, advanced indexing and field access.
1D array
Slicing a 1-dimensional array is much like slicing a list
import numpy as np
np.random.seed(0)
array_1d = np.random.random((5,))
print(len(array_1d.shape))
1
NOTE: The len of the array shape tells you the number of dimensions.
We can use standard python list slicing on the 1D array.
# get the last element
print(array_1d[-1])
0.4236547993389047
# get everything up to but excluding the last element
print(array_1d[:-1])
[0.5488135 0.71518937 0.60276338 0.54488318]
2D array
array_2d = np.random.random((5, 1))
print(len(array_2d.shape))
2
Think of a 2-dimensional array like a data frame. It has rows (the 0th axis) and columns (the 1st axis). numpy grants us the ability to slice these axes independently by separating them with a comma (,).
# the 0th row and all columns
print(array_2d[0, :])
[0.79172504]
# the 1st row and everything after + all columns
print(array_2d[1:, :])
[[0.52889492]
[0.56804456]
[0.92559664]
[0.07103606]]
# the 1st through second to last row + the last column
print(array_2d[1:-1, -1])
[0.52889492 0.56804456 0.92559664]
Your Example
# split into inputs and outputs
X, y = data[:, :-1], data[:, -1]
print(X.shape, y.shape)
Note that len(data.shape) is >= 2, i.e. data has at least two dimensions (otherwise you'd get an IndexError).
This means data[:, :-1] is keeping all "rows" and slicing up to, but not including, the last "column". Likewise, data[:, -1] is keeping all "rows" and selecting only the last "column".
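As a minimal sketch of that split, assuming a small made-up array in place of data (the values and the (4, 3) shape are only for illustration):

```python
import numpy as np

# Hypothetical stand-in for `data`: 4 rows, 3 columns, last column is the target.
data = np.arange(12).reshape(4, 3)

# All rows, every column except the last / all rows, only the last column.
X, y = data[:, :-1], data[:, -1]

print(X.shape, y.shape)  # (4, 2) (4,)
print(y)                 # [ 2  5  8 11]
```

With the (610, 14) array from the question, the same two expressions would give shapes (610, 13) and (610,).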
It's important to know that when you slice an ndarray using a colon (:), you will get an array with the same number of dimensions.
print(len(array_2d[1:, :-1].shape)) # 2
But if you "select" a specific index (i.e. don't use a colon), you may reduce the dimensions.
print(len(array_2d[1, :-1].shape)) # 1, because I selected a single index value on the 0th axis
print(len(array_2d[1, -1].shape)) # 0, because I selected a single index value on both the 0th and 1st axes
You can, however, select a list of indices on either axis (assuming they exist).
print(len(array_2d[[1], [-1]].shape)) # 1
print(len(array_2d[[1, 3], :].shape)) # 2

This slicing notation is explained here https://docs.python.org/3/tutorial/introduction.html#strings
-1 means last element, -2 - second from last, etc. For example, if there are 8 elements in a list, -1 is equivalent to 7 (not 8 because indexing starts from 0)
Keep in mind that plain Python nested lists are indexed one level at a time with chained brackets (e.g. lst[1][5:7]), while numpy arrays also accept a comma-separated index ([8:10, 12:14]) that slices several axes of a multidimensional array at once. However, -1 always means the same thing. Here is the numpy documentation for slicing https://numpy.org/doc/stable/user/basics.indexing.html
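A small sketch of both points, using made-up data (the names items, nested and arr are only for illustration):

```python
import numpy as np

items = list(range(8))            # 8 elements, valid indices 0..7
assert items[-1] == items[7]      # -1 is the last element
assert items[:-1] == items[0:7]   # everything except the last element

nested = [[0, 1, 2], [3, 4, 5], [6, 7, 8]]
arr = np.array(nested)

# Nested list: one pair of brackets per level.
row_part = nested[1][:-1]
# NumPy array: one pair of brackets, axes separated by a comma.
block = arr[1:, :-1]

print(row_part)        # [3, 4]
print(block.tolist())  # [[3, 4], [6, 7]]
```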

Related

indexing in python ml [duplicate]

How can I index the last axis of a Numpy array if I don't know its rank in advance?
Here is what I want to do: Let a be a Numpy array of unknown rank. I want the slice of the last k elements of the last axis.
If a is 1D, I want
b = a[-k:]
If a is 2D, I want
b = a[:, -k:]
If a is 3D, I want
b = a[:, :, -k:]
and so on.
I want this to work regardless of the rank of a (as long as the rank is at least 1).
The fact that I want the last k elements in the example is irrelevant, of course. The point is that I want to specify indices for whatever the last axis is when I don't know the rank of an array in advance.
b = a[..., -k:]
This is mentioned in the docs.
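A quick check of the Ellipsis form on arrays of different rank. The helper name last_k and the shapes are assumptions made up for this sketch:

```python
import numpy as np

def last_k(a, k):
    """Return the last k entries along the final axis (hypothetical helper)."""
    return a[..., -k:]

k = 2
# Same expression, applied to 1-D, 2-D and 3-D arrays.
shapes = [last_k(np.zeros(s), k).shape for s in [(5,), (4, 5), (3, 4, 5)]]
print(shapes)  # [(2,), (4, 2), (3, 4, 2)]
```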

Is there a way to write a python function that will create 'N' arrays? (see body)

I have a numpy array of shape (20, 3). (So 20 3-by-1 arrays. Correct me if I'm wrong, I am still pretty new to python.)
I need to separate it into 3 arrays of shape 20,1 where the first array is 20 elements that are the 0th element of each 3 by 1 array. Second array is also 20 elements that are the 1st element of each 3 by 1 array, etc.
I am not sure if I need to write a function for this. Here is what I have tried:
Essentially I'm trying to create an array of 3 20 by 1 arrays that I can later index to get the separate 20 by 1 arrays.
a = np.load() #loads file
num=20 #the num is if I need to change array size
num_2=3
for j in range(0,num):
    for l in range(0,num_2):
        array_elements = np.zeros(3)
        array_elements[l] = a[j:][l]
This gives the following error:
'''
ValueError: setting an array element with a sequence
'''
I have also tried making it a dictionary and making the dictionary values lists that are appended, but it only gives the first or last value of the 20 that I need.
Your array has shape (20, 3), this means it's a 2-dimensional array with 20 rows and 3 columns in each row.
You can access data in this array by indexing with numbers or ':' to indicate ranges. You want to split this into 3 arrays, one per column. To do this you can pick the column with a number and use ':' to mean 'all of the rows'. So, to access the three different columns: a[:, 0], a[:, 1] and a[:, 2]. Each of these has shape (20,); if you need shape (20, 1), slice the column instead, e.g. a[:, 0:1].
You can then assign these to separate variables if you wish, e.g. arr = a[:, 0], but this is just a view of the original data in array a. This means any changes made to arr will also be made to the corresponding data in a.
If you want to create a new array so this doesn't happen, you can easily use the .copy() function. Now if you set arr = a[:, 0].copy(), arr is completely separate to a and changes made to one will not affect the other.
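A short sketch of the view-versus-copy behaviour, assuming an all-zeros array as a stand-in for the loaded file:

```python
import numpy as np

a = np.zeros((20, 3))        # hypothetical stand-in for the loaded array

col_view = a[:, 0]           # basic slicing returns a view of a
col_copy = a[:, 1].copy()    # independent copy

col_view[0] = 99.0           # writes through to a
col_copy[0] = -1.0           # does not touch a

print(a[0, 0], a[0, 1])                # 99.0 0.0
print(np.shares_memory(a, col_view))   # True
print(np.shares_memory(a, col_copy))   # False
print(a[:, 0].shape, a[:, 0:1].shape)  # (20,) (20, 1)
```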
Essentially you want to group your arrays by their index. There are plenty of ways of doing this. Since numpy does not have a group-by method, one option is to horizontally split the array, stack the pieces into a new array and reshape it.
old_length = 3
new_length = 20
a = np.array(np.hsplit(a, old_length)).reshape(old_length, new_length)
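Edit: Rotating the array by -90 degrees with rot90 (k=-1 or k=3, which tells numpy to rotate by 90 degrees k times) gives the same shape, but each row comes out in reverse order, so the last axis has to be flipped back. The result is then the same as the transpose, a.T.
a = np.rot90(a, k=-1)[:, ::-1]  # equivalent to a.T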
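To compare these approaches, here is a sketch on a small made-up array (3 rows, 2 columns instead of 20 and 3; the names n_cols and split are only for illustration). It checks that the split-and-reshape result matches the plain transpose, and how np.rot90 with k=-1 relates to it:

```python
import numpy as np

a = np.arange(6).reshape(3, 2)   # small hypothetical stand-in for the (20, 3) array
n_cols = a.shape[1]

# Split into columns, stack them, and drop the trailing unit axis.
split = np.array(np.hsplit(a, n_cols)).reshape(n_cols, a.shape[0])

print(np.array_equal(split, a.T))                       # True
# A clockwise quarter-turn is the transpose with each row reversed.
print(np.array_equal(np.rot90(a, k=-1), a.T[:, ::-1]))  # True
```

So a.T is probably the shortest way to get one row per original column.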

Call numpy ravel and include an integer multi-index

I have a 4-dimensional array. I am going to turn it into a 1-dim array. I use numpy ravel and it works fine with the default parameters.
However I would also like the positions/indices in the 4-dim array.
I want something like this as row in my output.
x,y,z,w,value
With x being the first dimension of my initial array and so on.
The obvious approach is iteration, however I was told to avoid it when I can.
for i in range(test.shape[0]):
    for j in range(test.shape[1]):
        for k in range(test.shape[2]):
            for l in range(test.shape[3]):
                print(i,j,k,l,test[(i,j,k,l)])
It will be too slow when I use a larger dataset.
Is there a way to configure ravel to do this, or any other approach faster than iteration?
Use np.indices with sparse=False, combined with np.concatenate to build the array. np.indices provides the first n columns, and np.concatenate appends the last one:
test = np.random.randint(10, size=(3, 5, 4, 2))
index = np.indices(test.shape, sparse=False) # shape: (4, 3, 5, 4, 2)
data = np.concatenate((index, test[None, ...]), axis=0).reshape(test.ndim + 1, -1).T
A more detailed breakdown:
index is a (4, *test.shape) array, with one element per dimension.
To make test concatenatable with index, you need to prepend a unit dimension, which is what test[None, ...] does. None is synonymous with np.newaxis, and Ellipsis, or ..., means "all the remaining dimensions".
When you concatenate along axis=0, you are appending test to the array of indices. Each element of index along the first axis is now a 5-element array containing the index followed by the value. The remaining axes reflect the shape of test, but besides that, you have what you want.
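The goal is to flatten out the trailing dimensions, so you get a (5, N) array, where N = np.prod(test.shape). That's what the reshape does. test.ndim + 1 is the size of the index, plus 1 for the value. -1 can appear at most once in a reshape; it means "whatever size makes the total number of elements match", which here is the product of all the remaining dimensions. The final .T then transposes the result to (N, 5), so each row holds x, y, z, w, value.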
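As a sanity check, the following sketch rebuilds the table on a small made-up array and verifies every row against direct indexing. The explicit loop is only there to check the result on a tiny input, not as part of the approach:

```python
import numpy as np

test = np.arange(24).reshape(2, 3, 2, 2)   # small hypothetical 4-D array

index = np.indices(test.shape, sparse=False)   # shape: (4, 2, 3, 2, 2)
data = np.concatenate((index, test[None, ...]), axis=0).reshape(test.ndim + 1, -1).T

print(data.shape)  # (24, 5)

# Every row should read (i, j, k, l, value).
for i, j, k, l, value in data:
    assert test[i, j, k, l] == value
```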

How to extract elements in specific column of the dataset?

I have been trying to build a neural network. To do so I have to divide the data into x and y (my dataset was converted to a numpy array).
The data for "x" is the 1st column, which I have extracted successfully, but when I try to extract the 2nd column I get both the x and y values for "y".
Here is the code I used to divide the data:
data=np.genfromtxt("/home/crpsm/Pycharm/DataSet/headbrain.csv",delimiter=',')
x=data[:,:1]
y=data[:, :2]
Here's the output of x and y:
x:-
[[3738.]
[4261.]
[3777.]
[4177.]
[3585.]
[3785.]
[3559.]
[3613.]
[3982.]
[3443.]
y:-
[[3738. 1297.]
[4261. 1335.]
[3777. 1282.]
[4177. 1590.]
[3585. 1300.]
[3785. 1400.]
[3559. 1255.]
[3613. 1355.]
[3982. 1375.]
[3443. 1340.]
Please tell me how to fix this error. Thanks in advance!
You may want to review the numpy indexing documentation.
To get the second column in the same shape as x, use y=data[:, 1:2].
Note: you are creating 2d arrays with this indexing (shape of (len(data), 1)). If you want 1d arrays, just use integers, not slices, for the second term:
x = data[:, 0]
y = data[:, 1]
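To see the difference in shapes, here is a small sketch that uses the first three rows of the output shown in the question as a stand-in for the CSV data:

```python
import numpy as np

# First rows of the output shown in the question, used as sample data.
data = np.array([[3738., 1297.],
                 [4261., 1335.],
                 [3777., 1282.]])

print(data[:, :2].shape)   # (3, 2)  columns 0 and 1 (what the question got for y)
print(data[:, 1:2].shape)  # (3, 1)  only column 1, kept 2-D
print(data[:, 1].shape)    # (3,)    only column 1, as a 1-D array
```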
What @w-m said in their answer is correct. You are currently assigning to x all rows (the first :) and the columns from zero up to, but excluding, column one (the :1). Likewise, you are assigning to y all rows (again the first :) and the columns from zero up to, but excluding, column two (the :2).
x = data[:, 0]
y = data[:, 1]
This is one way to do it properly, but a more succinct way, when the data has exactly two columns, is tuple unpacking:
x, y = data.T
This transposes (.T) the data, i.e. the two dimensions are exchanged, after which the first dimension has length two. If your actual data has more columns than that, you can use:
x, y, *rest = data.T
In this case rest will be a list of the remaining columns. This syntax was introduced in Python 3.0.
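A small sketch of the starred unpacking, assuming a made-up array with four columns:

```python
import numpy as np

data = np.arange(12).reshape(3, 4)   # hypothetical: 3 rows, 4 columns

# Iterating over data.T yields one 1-D array per original column.
x, y, *rest = data.T

print(x, y)                            # [0 4 8] [1 5 9]
print(type(rest).__name__, len(rest))  # list 2
```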

extracting vectors from an array using logical indexing

I have the following numpy arrays:
a truth table of (nx1), and a matrix of (nxk) where n is 5 and k is 2 in this example.
btable = np.array([[True],[False],[False],[True],[True]])
bb=np.array([[1.842,4.607],[5.659,4.799],[6.352,3.290],[2.904,4.612],[3.231,4.939]])
I would like to extract the vectors in bb according the indexing values in btable.
I tried choicebb=bb[btable==True] which gets me the result
[ 1.84207953 2.90401653 3.23197916]
choicebb=bb[btable] gets me the same results as well.
What I want instead is
[[1.842,4.607]
[2.904,4.612]
[3.231,4.939]]
I also tried
choicebb=bb[btable==True,:]
but then I would get
---> 13 choicebb=bb[btable==True,:]
14 print(choicebb)
IndexError: too many indices for array
This can be easily done in matlab with choicebb=bb(btable,:);
Get the 1D version of the mask with np.ravel() or slice out the first column with [:,0] and use it for logical indexing into the data array, like so -
bb[btable.ravel()]
bb[btable[:,0]]
Note that bb[btable.ravel()] is essentially bb[btable.ravel(),:]. In NumPy, we can skip the trailing axes if all their elements are to be selected, which is why it simplifies to bb[btable.ravel()].
Explanation: To index into a single axis so that it selects all elements along the rest of the axes, we need to feed in a 1D array (boolean or integer array) along that axis and use : along the leftover axes. In our case, we are indexing into the first axis to select rows, so we need to feed in a boolean array along that axis and : along the rest of the axes.
When we are feeding the 2D version of the mask, it indexes along those corresponding multiple axes. So, when we feed in (N,1) shaped boolean array, we are selecting correct rows, but also only selecting the first column elements, which is not the intended output.
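Putting this together with the arrays from the question, a minimal sketch (the name mask is introduced here only for illustration):

```python
import numpy as np

btable = np.array([[True], [False], [False], [True], [True]])
bb = np.array([[1.842, 4.607], [5.659, 4.799], [6.352, 3.290],
               [2.904, 4.612], [3.231, 4.939]])

mask = btable.ravel()   # 1-D boolean mask of shape (5,)
choicebb = bb[mask]     # selects whole rows; same as bb[mask, :]

print(choicebb.shape)   # (3, 2)
print(choicebb)
```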
