How to extract elements in a specific column of the dataset? - python

I have been trying to build a neural network. To do so, I have to divide the data into x and y (my dataset was converted to a NumPy array).
The data for x is the 1st column, which I have extracted successfully, but when I try to extract the 2nd column for y, I get both the x and y values in "y".
Here is the code I used to divide the data:
import numpy as np

data = np.genfromtxt("/home/crpsm/Pycharm/DataSet/headbrain.csv", delimiter=',')
x = data[:, :1]
y = data[:, :2]
Here's the output of x and y:
x:-
[[3738.]
[4261.]
[3777.]
[4177.]
[3585.]
[3785.]
[3559.]
[3613.]
[3982.]
[3443.]
y:-
[[3738. 1297.]
[4261. 1335.]
[3777. 1282.]
[4177. 1590.]
[3585. 1300.]
[3785. 1400.]
[3559. 1255.]
[3613. 1355.]
[3982. 1375.]
[3443. 1340.]
Please tell me how to fix this error. Thanks in advance!

You may want to review the numpy indexing documentation.
To get the second column in the same shape as x, use y=data[:, 1:2].
Note: you are creating 2d arrays with this indexing (shape of (len(data), 1)). If you want 1d arrays, just use integers, not slices, for the second term:
x = data[:, 0]
y = data[:, 1]
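For example (a minimal sketch with a made-up two-column array standing in for headbrain.csv), the shape difference between slice and integer indexing looks like this:

import numpy as np

# made-up stand-in for the two-column CSV
data = np.array([[3738., 1297.],
                 [4261., 1335.],
                 [3777., 1282.]])

print(data[:, :1].shape)   # (3, 1) -- a slice keeps the column axis (2d)
print(data[:, 0].shape)    # (3,)   -- an integer index drops it (1d)
print(data[:, 1:2].shape)  # (3, 1) -- second column, still 2d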

What #w-m said in their answer is correct. You are currently assigning to x all rows (the first :) and all columns from zero up to, but excluding, column one (with :1), and assigning to y all rows (again the first :) and all columns from zero up to, but excluding, column two (with :2).
x = data[:, 0]
y = data[:, 1]
That is one way to do it properly, but a nicer and more succinct way is to use tuple unpacking:
x, y = data.T
This transposes the data (.T), i.e. the two dimensions are exchanged, after which the first dimension has length two. If your actual data has more columns than that, you can use:
x, y, *rest = data.T
In this case rest will be a list of the remaining columns. This syntax was introduced in Python 3.0.
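A minimal sketch of both forms of unpacking, using a small made-up array:

import numpy as np

data = np.array([[3738., 1297.],
                 [4261., 1335.],
                 [3777., 1282.]])

# data.T has shape (2, 3), so it unpacks into two 1d arrays of length 3
x, y = data.T
print(x)  # [3738. 4261. 3777.]
print(y)  # [1297. 1335. 1282.]

# with more columns, extended unpacking collects the rest (Python 3 only)
wide = np.random.random((5, 4))
a, b, *rest = wide.T
print(len(rest))  # 2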

Related

What is the Numpy slicing notation in this code?

# split into inputs and outputs
X, y = data[:, :-1], data[:, -1]
print(X.shape, y.shape)
Can someone explain the second line of code with reference to specific documentation? I know it's slicing, but I couldn't find any reference for the ":-1" notation anywhere. Please point me to the specific documentation section.
Thank you
This is slicing, most probably using numpy, and it is being done on data of shape (610, 14).
Per the docs:
Indexing on ndarrays
ndarrays can be indexed using the standard Python x[obj] syntax, where x is the array and obj the selection. There are different kinds of indexing available depending on obj: basic indexing, advanced indexing and field access.
1D array
Slicing a 1-dimensional array is much like slicing a list
import numpy as np
np.random.seed(0)
array_1d = np.random.random((5,))
print(len(array_1d.shape))
1
NOTE: The len of the array shape tells you the number of dimensions.
We can use standard python list slicing on the 1D array.
# get the last element
print(array_1d[-1])
0.4236547993389047
# get everything up to but excluding the last element
print(array_1d[:-1])
[0.5488135 0.71518937 0.60276338 0.54488318]
2D array
array_2d = np.random.random((5, 1))
print(len(array_2d.shape))
2
Think of a 2-dimensional array like a data frame. It has rows (the 0th axis) and columns (the 1st axis). numpy grants us the ability to slice these axes independently by separating them with a comma (,).
# the 0th row and all columns
print(array_2d[0, :])
[0.79172504]
# the 1st row and everything after + all columns
print(array_2d[1:, :])
[[0.52889492]
[0.56804456]
[0.92559664]
[0.07103606]]
# the 1st through second to last row + the last column
print(array_2d[1:-1, -1])
[0.52889492 0.56804456 0.92559664]
Your Example
# split into inputs and outputs
X, y = data[:, :-1], data[:, -1]
print(X.shape, y.shape)
Note that len(data.shape) must be >= 2 (otherwise you'd get an IndexError).
This means data[:, :-1] is keeping all "rows" and slicing up to, but not including, the last "column". Likewise, data[:, -1] is keeping all "rows" and selecting only the last "column".
It's important to know that when you slice an ndarray using a colon (:), you will get an array with the same number of dimensions.
print(len(array_2d[1:, :-1].shape)) # 2
But if you "select" a specific index (i.e. don't use a colon), you may reduce the dimensions.
print(len(array_2d[1, :-1].shape)) # 1, because I selected a single index value on the 0th axis
print(len(array_2d[1, -1].shape)) # 0, because I selected a single index value on both the 0th and 1st axes
You can, however, select a list of indices on either axis (assuming they exist).
print(len(array_2d[[1], [-1]].shape)) # 1
print(len(array_2d[[1, 3], :].shape)) # 2
This slicing notation is explained here https://docs.python.org/3/tutorial/introduction.html#strings
-1 means the last element, -2 the second from last, and so on. For example, if there are 8 elements in a list, -1 is equivalent to index 7 (not 8, because indexing starts from 0).
Keep in mind that "normal" Python slicing of nested lists chains like [1:3][5:7], while numpy arrays have a slightly different syntax ([8:10, 12:14]) that lets you slice multiple dimensions in one step. However, -1 always means the same thing. Here is the numpy documentation for slicing: https://numpy.org/doc/stable/user/basics.indexing.html
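A short sketch contrasting list slicing with the comma-separated numpy syntax (the data here is made up):

import numpy as np

lst = [10, 20, 30, 40, 50]
print(lst[-1])      # 50 -- the last element
print(lst[:-1])     # [10, 20, 30, 40] -- everything except the last element

arr = np.arange(12).reshape(3, 4)
print(arr[:, :-1])  # all rows, every column except the last
print(arr[:, -1])   # all rows, only the last column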

Shifting Indexes in Python (Comparison to R)

Using two series X and Y, I check whether one is bigger than the other. Using loc, I can get the indexes of my series where X>Y is True. For example:
X.loc[X>Y]
Using this indexing, I want to shift the indexes by n periods. For instance, if X.loc[X>Y] gives us {1,5,8,9}, I am interested in shifting these to {1+2,5+2,8+2,9+2}. I would appreciate any advice on this matter!
You could use numpy.nonzero to get the indices and then shift them:
import numpy

# two random arrays as an example
X = numpy.random.random(100)
Y = numpy.random.random(100)

ids = numpy.nonzero(X > Y)[0]
print(ids)
print(ids + 2)
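Since X.loc suggests X and Y are pandas Series, a sketch of the same idea staying in pandas (assuming an integer index; the Series construction and the +2 shift here are illustrative) could be:

import numpy as np
import pandas as pd

X = pd.Series(np.random.random(100))
Y = pd.Series(np.random.random(100))

# index labels where X > Y, shifted by 2 periods
shifted = X.index[X > Y] + 2
print(shifted)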

Python - Create an array from columns in file

I have a text file with two columns and n rows. Usually I work with two separate vectors using x,y=np.loadtxt('data',usecols=(0,1),unpack=True), but I would like to have them as an array of the form array=[[a,1],[b,2],[c,3]...], where the letters correspond to the x vector and the numbers to the y vector, so I can ask for something like array[0,2]=b. I tried defining
array[0,:]=x but I didn't succeed. Is there a simple way to do this?
In addition, I want to get the respective x value for a certain y value. I tried
x_value=np.argwhere(array[:,1]==3)
and I'm expecting x_value to be c because it corresponds to 3 in column 1, but it doesn't work either.
I think you simply need to not unpack the array you get back from loadtxt. Do:
arr = np.loadtxt('data', usecols=(0,1))
If your file contained:
0 1
2 3
4 5
arr will be like:
[[0, 1],
[2, 3],
[4, 5]]
Note that to index into this array, you need to specify the row first (and indexes start at 0):
arr[1,0] == 2 # True!
You can find the x values that correspond to a given y value with:
x_vals = arr[:,0][arr[:,1]==y_val]
The indexing will return an array, though x_vals will have only a single value if the y_val was unique. If you know in advance there will be only one match for the y_val, you could tack on [0] to the end of the indexing, so you get the first result.
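Putting the pieces together, a minimal sketch (the file name 'data' and y_val are placeholders) might look like:

import numpy as np

arr = np.loadtxt('data', usecols=(0, 1))   # shape (n, 2)

y_val = 3
# boolean mask built from the y column, applied to the x column
matches = arr[:, 0][arr[:, 1] == y_val]

# if y_val is known to be unique, take the single match
x_value = matches[0] if matches.size else None
print(x_value)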

Numpy signed maximum magnitude of cumsum along an axis

I have a numpy array a, a.shape=(17,90,144). I want to find the maximum magnitude of each column of cumsum(a, axis=0), but retaining the original sign. In other words, if for a given column a[:,j,i] the largest magnitude of cumsum corresponds to a negative value, I want to retain the minus sign.
The code np.amax(np.abs(a.cumsum(axis=0))) gets me the magnitude, but doesn't retain the sign. Using np.argmax instead will get me the indices I need, which I can then plug into the original cumsum array. But I can't find a good way to do so.
The following code works, but is dirty and really slow:
max_mag_signed = np.zeros((90,144))
indices = np.argmax(np.abs(a.cumsum(axis=0)), axis=0)
for j in range(90):
    for i in range(144):
        max_mag_signed[j,i] = a.cumsum(axis=0)[indices[j,i],j,i]
There must be a cleaner, faster way to do this. Any ideas?
I can't find any alternative to argmax, but at least you can speed this up with a more vectorized approach:
# store the cumsum, since it's used multiple times
cum_a = a.cumsum(axis=0)
# find the indices as before
indices = np.argmax(abs(cum_a), axis=0)
# construct the indices for the second and third dimensions
y, z = np.indices(indices.shape)
# get the values with np indexing
max_mag_signed = cum_a[indices, y, z]
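On NumPy 1.15 or newer (which postdates this answer), np.take_along_axis can express the same gather without building the index grids; a hedged sketch with random stand-in data:

import numpy as np

a = np.random.random((17, 90, 144))          # stand-in for the real data

cum_a = a.cumsum(axis=0)
indices = np.argmax(np.abs(cum_a), axis=0)   # shape (90, 144)

# add a length-1 leading axis so the indices line up with axis 0
max_mag_signed = np.take_along_axis(cum_a, indices[np.newaxis, ...], axis=0)[0]
print(max_mag_signed.shape)                  # (90, 144)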

Using numpy.argmax() on multidimensional arrays

I have a 4 dimensional array, i.e., data.shape = (20,30,33,288). I am finding the index of the closest array to n using
index = abs(data - n).argmin(axis = 1), so
index.shape = (20,33,288) with the indices varying.
I would like to use data[index] = "values" with values.shape = (20,33,288), but data[index] either returns the error "index (8) out of range (0<=index<1) in dimension 0", or the operation takes a relatively long time to compute and returns a matrix whose shape doesn't seem to make sense.
How do I return an array of the correct values? I.e.,
data[index] = "values" with values.shape = (20,33,288)
This seems like a simple problem, is there a simple answer?
I would eventually like to find index2 = abs(data - n2).argmin(axis = 1), so I can perform an operation, say sum data at index to data at index2 without looping through the variables. Is this possible?
I am using python2.7 and numpy version 1.5.1.
You should be able to access the maximum values indexed by index using numpy.indices():
x, z, t = numpy.indices(index.shape)
data[x, index, z, t]
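If the goal is the assignment from the question, the same fancy index works on the left-hand side; a sketch with random stand-in data (n and values are placeholders):

import numpy as np

data = np.random.random((20, 30, 33, 288))
n = 0.5
index = np.abs(data - n).argmin(axis=1)      # shape (20, 33, 288)

x, z, t = np.indices(index.shape)
values = np.zeros((20, 33, 288))

# data[x, index, z, t] has shape (20, 33, 288), so it can be read or assigned
data[x, index, z, t] = values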
If I understood you correctly, this should work:
numpy.put(data, index, values)
I learned something new today, thanks.
