indexing in python ml [duplicate] - python

How can I index the last axis of a Numpy array if I don't know its rank in advance?
Here is what I want to do: Let a be a Numpy array of unknown rank. I want the slice of the last k elements of the last axis.
If a is 1D, I want
b = a[-k:]
If a is 2D, I want
b = a[:, -k:]
If a is 3D, I want
b = a[:, :, -k:]
and so on.
I want this to work regardless of the rank of a (as long as the rank is at least 1).
The fact that I want the last k elements in the example is irrelevant of course, the point is that I want to specify indices for whatever the last axis is when I don't know the rank of an array in advance.

b = a[..., -k:]
This is mentioned in the docs.

Related

What is the Numpy slicing notation in this code?

# split into inputs and outputs
X, y = data[:, :-1], data[:, -1]
print(X.shape, y.shape)
Can someone explain the second line of code with reference to specific documentation? I know its slicing but the I couldn't find any reference for the notation ":-1" anywhere. Please give the specific documentation portion.
Thank you
It results in slicing, most probably using numpy and it is being done on a data of shape (610, 14)
Per the docs:
Indexing on ndarrays
ndarrays can be indexed using the standard Python x[obj] syntax, where x is the array and obj the selection. There are different kinds of indexing available depending on obj: basic indexing, advanced indexing and field access.
1D array
Slicing a 1-dimensional array is much like slicing a list
import numpy as np
np.random.seed(0)
array_1d = np.random.random((5,))
print(len(array_1d.shape))
1
NOTE: The len of the array shape tells you the number of dimensions.
We can use standard python list slicing on the 1D array.
# get the last element
print(array_1d[-1])
0.4236547993389047
# get everything up to but excluding the last element
print(array_1d[:-1])
[0.5488135 0.71518937 0.60276338 0.54488318]
2D array
array_2d = np.random.random((5, 1))
print(len(array_2d.shape))
2
Think of a 2-dimensional array like a data frame. It has rows (the 0th axis) and columns (the 1st axis). numpy grants us the ability to slice these axes independently by separating them with a comma (,).
# the 0th row and all columns
# the 0th row and all columns
print(array_2d[0, :])
[0.79172504]
# the 1st row and everything after + all columns
print(array_2d[1:, :])
[[0.52889492]
[0.56804456]
[0.92559664]
[0.07103606]]
# the 1st through second to last row + the last column
print(array_2d[1:-1, -1])
[0.52889492 0.56804456 0.92559664]
Your Example
# split into inputs and outputs
X, y = data[:, :-1], data[:, -1]
print(X.shape, y.shape)
Note that data.shape is >= 2 (otherwise you'd get an IndexError).
This means data[:, :-1] is keeping all "rows" and slicing up to, but not including, the last "column". Likewise, data[:, -1] is keeping all "rows" and selecting only the last "column".
It's important to know that when you slice an ndarray using a colon (:), you will get an array with the same dimensions.
print(len(array_2d[1:, :-1].shape)) # 2
But if you "select" a specific index (i.e. don't use a colon), you may reduce the dimensions.
print(len(array_2d[1, :-1].shape)) # 1, because I selected a single index value on the 0th axis
print(len(array_2d[1, -1].shape)) # 0, because I selected a single index value on both the 0th and 1st axes
You can, however, select a list of indices on either axis (assuming they exist).
print(len(array_2d[[1], [-1]].shape)) # 1
print(len(array_2d[[1, 3], :].shape)) # 2
This slicing notation is explained here https://docs.python.org/3/tutorial/introduction.html#strings
-1 means last element, -2 - second from last, etc. For example, if there are 8 elements in a list, -1 is equivalent to 7 (not 8 because indexing starts from 0)
Keep in mind that "normal" python slicing for nested lists looks like [1:3][5:7], while numpy arrays also have a slightly different syntax ([8:10, 12:14]) that lets you slice multidimensional arrays. However, -1 always means the same thing. Here is the numpy documentation for slicing https://numpy.org/doc/stable/user/basics.indexing.html

Is there a way to write a python function that will create 'N' arrays? (see body)

I have an numpy array that is shape 20, 3. (So 20 3 by 1 arrays. Correct me if I'm wrong, I am still pretty new to python)
I need to separate it into 3 arrays of shape 20,1 where the first array is 20 elements that are the 0th element of each 3 by 1 array. Second array is also 20 elements that are the 1st element of each 3 by 1 array, etc.
I am not sure if I need to write a function for this. Here is what I have tried:
Essentially I'm trying to create an array of 3 20 by 1 arrays that I can later index to get the separate 20 by 1 arrays.
a = np.load() #loads file
num=20 #the num is if I need to change array size
num_2=3
for j in range(0,num):
for l in range(0,num_2):
array_elements = np.zeros(3)
array_elements[l] = a[j:][l]
This gives the following error:
'''
ValueError: setting an array element with a sequence
'''
I have also tried making it a dictionary and making the dictionary values lists that are appended, but it only gives the first or last value of the 20 that I need.
Your array has shape (20, 3), this means it's a 2-dimensional array with 20 rows and 3 columns in each row.
You can access data in this array by indexing using numbers or ':' to indicate ranges. You want to split this in to 3 arrays of shape (20, 1), so one array per column. To do this you can pick the column with numbers and use ':' to mean 'all of the rows'. So, to access the three different columns: a[:, 0], a[:, 1] and a[:, 2].
You can then assign these to separate variables if you wish e.g. arr = a[:, 0] but this is just a reference to the original data in array a. This means any changes in arr will also be made to the corresponding data in a.
If you want to create a new array so this doesn't happen, you can easily use the .copy() function. Now if you set arr = a[:, 0].copy(), arr is completely separate to a and changes made to one will not affect the other.
Essentially you want to group your arrays by their index. There are plenty of ways of doing this. Since numpy does not have a group by method, you have to horizontally split the arrays into a new array and reshape it.
old_length = 3
new_length = 20
a = np.array(np.hsplit(a, old_length)).reshape(old_length, new_length)
Edit: It appears you can achieve the same effect by rotating the array -90 degrees. You can do this by using rot90 and setting k=-1 or k=3 telling numpy to rotate by 90 k times.
a = np.rot90(a, k=-1)

Dealing with three 2-D numpy arrays without using loops

I have three 2-D numpy arrays with shape as (3,7).
I want to take the (0,0) element from each of the array, pass these values in a function and store the returned value at the (0,0) index in a new 2-D array.
Then I want to take (0,1) element from each of the array, pass these values to the same function and store the returned value at the (0,1) index of the same new array.
I want to run this for all the columns and then move on to the next row and continue till the end of the array.
The catch here is that I don't want to use loops, just the numpy methods. Been struggling a lot on this lately. Any ideas would be of great help.
Thanks!
I am running a loop like this for now. It gives me back the result for each element in the 1st row only. Here a, b and c are the three 2-D arrays that I mentioned earlier.
count = 0
def(a, b, c):
for i in range(0,7):
count += -(c[:1,:][i][0]) - (((a[:1,:][0][i]-b[:1,:][i][0])/c[:1,:][i][0]))**2
return count
Since all three arrays are the same shape, and you're operating on each element in the same way, you can easily translate to vetorised NumPy functions like so:
# res is a 2-D array of the same shape as a, b and c
res = -c - ((a - b) / c)**2
It looks like in your example code you're trying to sum each row, so you can do this after performing the operations:
count = np.sum(res, axis=1)

Index multidimensional torch tensor by another multidimensional tensor

I have a tensor x in pytorch let's say of shape (5,3,2,6) and another tensor idx of shape (5,3,2,1) which contain indices for every element in first tensor. I want a slicing of the first tensor with the indices of the second tensor. I tried x= x[idx] but I get a weird dimensionality when I really want it to be of shape (5,3,2) or (5,3,2,1).
I'll try to give an easier example:
Let's say
x=torch.Tensor([[10,20,30],
[8,4,43]])
idx = torch.Tensor([[0],
[2]])
I want something like
y = x[idx]
such that 'y' outputs [[10],[43]] or something like.
The indices represent the position of the wanted elements the last dimension. for the example above where x.shape = (2,3) the last dimension are the columns, then the indices in 'idx' is the column. I want this but for more than 2 dimensions
From what I understand from the comments, you need idx to be index in the last dimension and each index in idx corresponds to similar index in x (except for the last dimension). In that case (this is the numpy version, you can convert it to torch):
ind = np.indices(idx.shape)
ind[-1] = idx
x[tuple(ind)]
output:
[[10]
[43]]
You can use range; and squeeze to get proper idx dimension like
x[range(x.size(0)), idx.squeeze()]
tensor([10., 43.])
# or
x[range(x.size(0)), idx.squeeze()].unsqueeze(1)
tensor([[10.],
[43.]])
Here's the one that works in PyTorch using gather. The idx needs to be in torch.int64 format which the following line will ensure (note the lowercase of 't' in tensor).
idx = torch.tensor([[0],
[2]])
torch.gather(x, 1, idx) # 1 is the axis to index here
tensor([[10.],
[43.]])

Element-wise median and percentiles of arrays with Numeric Python

I am using Numeric Python. Unfortunately, NumPy is not an option. If I have multiple arrays, such as:
a=Numeric.array(([1,2,3],[4,5,6],[7,8,9]))
b=Numeric.array(([9,8,7],[6,5,4],[3,2,1]))
c=Numeric.array(([5,9,1],[5,4,7],[5,2,3]))
How do I return an array that represents the element-wise median of arrays a,b and c?...such as,
array(([5,8,3],[5,5,6],[5,2,3]))
And then looking at a more general situation: Given n number of arrays, how do I find the percentiles of each element? For example, return an array that represents the 30th percentile of 10 arrays. Thank you very much for your help!
Combine your stack of 2-D arrays into one 3-D array, d = Numeric.array([a, b, c]) and then sort on the third dimension. Afterwards, the successive 2-D planes will be rank order so you can extract planes for the low, high, quartiles, percentiles, or median.
Well, I'm not versed in Numeric, but I'll just start with a naive solution and see if we can make it any better.
To get the 30th percentile of list foo let x=0.3, sort the list, and pick the the element at foo[int(len(foo)*x)]
For your data, you want to put it in a matrix, transpose it, sort each row, and get the median of each row.
A matrix in Numeric (just like numpy) is an array with two dimensions.
I think that bar = Numeric.array(a,b,c) would make Array you want, and then you could get the nth column with 'bar[:,n]' if Numeric has the same slicing techniques as Numpy.
foo = sorted(bar[:,n])
foo[int(len(foo)*x)]
I hope that helps you.
Putting Raymond Hettinger's description into python:
a=Numeric.array(([1,2,3],[4,5,6],[7,8,9]))
b=Numeric.array(([9,8,7],[6,5,4],[3,2,1]))
c=Numeric.array(([5,9,1],[5,4,7],[5,2,3]))
d = Numeric.array([a, b, c])
d.sort(axis=0)
Since there are n=3 input matrii so the median would be that of the middle one, the one indexed by one,
print d[n//2]
[[5 8 3]
[5 5 6]
[5 2 3]]
And if you had 4 input matrii, you would have to get the mean-elements of d[1] and d[2].

Categories