Python - Create an array from columns in file

I have a text file with two columns and n rows. Usually I work with two separate vectors using x,y=np.loadtxt('data',usecols=(0,1),unpack=True), but I would like to have them as an array of the form array=[[a,1],[b,2],[c,3]...], where all the letters correspond to the x-vector and the numbers to the y-vector, so I can ask something like array[0,2]=b. I tried defining
array[0,:]=x, but I didn't succeed. Is there a simple way to do this?
In addition, I want to get the corresponding x-value for a certain y-value. I tried:
x_value=np.argwhere(array[:,1]==3)
I expected x_value to be c, because it corresponds to 3 in column 1, but that doesn't work either.

I think you simply need to not unpack the array you get back from loadtxt. Do:
arr = np.loadtxt('data', usecols=(0,1))
If your file contained:
0 1
2 3
4 5
arr will look like this (loadtxt returns floats by default, so the values actually print as 0., 1., and so on):
[[0, 1],
[2, 3],
[4, 5]]
Note that to index into this array, you need to specify the row first (and indexes start at 0):
arr[1,0] == 2 # True!
You can find the x values that correspond to a given y value with:
x_vals = arr[:,0][arr[:,1]==y_val]
The indexing will return an array, though x_vals will have only a single value if the y_val was unique. If you know in advance there will be only one match for the y_val, you could tack on [0] to the end of the indexing, so you get the first result.
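A self-contained sketch of the approach above. It uses io.StringIO in place of the 'data' file (an assumption, so the example runs without a file on disk) filled with the three example rows, then looks up the x value for a given y with a boolean mask.

```python
import io
import numpy as np

# Hypothetical stand-in for the 'data' file, using the example rows above
text = io.StringIO("0 1\n2 3\n4 5\n")

# Without unpack=True, loadtxt returns a single 2D array (rows x columns)
arr = np.loadtxt(text, usecols=(0, 1))

y_val = 3
# The mask on column 1 picks the matching rows; column 0 of those rows is the answer
x_vals = arr[:, 0][arr[:, 1] == y_val]

print(arr.shape)  # (3, 2)
print(x_vals)     # [2.]
```

If y_val is known to be unique, x_vals[0] gives the single value as a scalar.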

Related

What is the Numpy slicing notation in this code?

# split into inputs and outputs
X, y = data[:, :-1], data[:, -1]
print(X.shape, y.shape)
Can someone explain the second line of code with reference to specific documentation? I know it's slicing, but I couldn't find any reference for the notation ":-1" anywhere. Please point me to the specific part of the documentation.
Thank you
The code performs slicing, most probably with numpy, on data of shape (610, 14).
Per the docs:
Indexing on ndarrays
ndarrays can be indexed using the standard Python x[obj] syntax, where x is the array and obj the selection. There are different kinds of indexing available depending on obj: basic indexing, advanced indexing and field access.
1D array
Slicing a 1-dimensional array is much like slicing a list:
import numpy as np
np.random.seed(0)
array_1d = np.random.random((5,))
print(len(array_1d.shape))
1
NOTE: The len of the array shape tells you the number of dimensions.
We can use standard python list slicing on the 1D array.
# get the last element
print(array_1d[-1])
0.4236547993389047
# get everything up to but excluding the last element
print(array_1d[:-1])
[0.5488135 0.71518937 0.60276338 0.54488318]
2D array
array_2d = np.random.random((5, 1))
print(len(array_2d.shape))
2
Think of a 2-dimensional array like a data frame. It has rows (the 0th axis) and columns (the 1st axis). numpy grants us the ability to slice these axes independently by separating them with a comma (,).
# the 0th row and all columns
print(array_2d[0, :])
[0.79172504]
# the 1st row and everything after + all columns
print(array_2d[1:, :])
[[0.52889492]
[0.56804456]
[0.92559664]
[0.07103606]]
# from the 1st row up to (but excluding) the last row + the last column
print(array_2d[1:-1, -1])
[0.52889492 0.56804456 0.92559664]
Your Example
# split into inputs and outputs
X, y = data[:, :-1], data[:, -1]
print(X.shape, y.shape)
Note that data must have at least 2 dimensions, i.e. len(data.shape) >= 2 (otherwise you'd get an IndexError).
This means data[:, :-1] is keeping all "rows" and slicing up to, but not including, the last "column". Likewise, data[:, -1] is keeping all "rows" and selecting only the last "column".
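A minimal sketch of the split on a small hypothetical array (np.arange(12).reshape(3, 4) stands in for data). With the (610, 14) data from the question, the same slices would give shapes (610, 13) and (610,).

```python
import numpy as np

# Hypothetical stand-in for `data`: 3 rows, 4 columns
data = np.arange(12).reshape(3, 4)

# All rows; every column except the last -> inputs
# All rows; only the last column       -> outputs
X, y = data[:, :-1], data[:, -1]

print(X.shape, y.shape)  # (3, 3) (3,)
```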
It's important to know that when you slice an axis of an ndarray using a colon (:), that axis is kept, so you get an array with the same number of dimensions.
print(len(array_2d[1:, :-1].shape)) # 2
But if you "select" a specific index (i.e. don't use a colon), you may reduce the dimensions.
print(len(array_2d[1, :-1].shape)) # 1, because I selected a single index value on the 0th axis
print(len(array_2d[1, -1].shape)) # 0, because I selected a single index value on both the 0th and 1st axes
You can, however, select a list of indices on either axis (assuming they exist).
print(len(array_2d[[1], [-1]].shape)) # 1
print(len(array_2d[[1, 3], :].shape)) # 2
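The same checks can be written with the ndim attribute, which equals len(arr.shape). This sketch uses np.zeros((5, 1)) as a hypothetical array with the same shape as array_2d, since only the shape matters here.

```python
import numpy as np

array_2d = np.zeros((5, 1))  # same shape as array_2d above; values don't matter

print(array_2d[1:, :].ndim)      # 2: slices keep both axes
print(array_2d[1, :].ndim)       # 1: an integer index on axis 0 drops that axis
print(np.ndim(array_2d[1, -1]))  # 0: integers on both axes give a scalar
print(array_2d[[1, 3], :].ndim)  # 2: a list of indices keeps the axis
```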
This slicing notation is explained here https://docs.python.org/3/tutorial/introduction.html#strings
-1 means the last element, -2 the second from last, and so on. For example, if there are 8 elements in a list, -1 is equivalent to 7 (not 8, because indexing starts from 0).
Keep in mind that plain Python nested lists are indexed one level at a time (e.g. lst[1][5]; chaining slices like lst[1:3][5:7] just slices the outer list twice), while numpy arrays also have a different syntax ([8:10, 12:14]) that lets you slice multidimensional arrays one axis per subscript. However, -1 always means the same thing. Here is the numpy documentation for slicing https://numpy.org/doc/stable/user/basics.indexing.html
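A short comparison of nested-list indexing and numpy indexing, using a hypothetical 3 x 4 table whose values encode their own position (row * 10 + column).

```python
import numpy as np

# Hypothetical 3 x 4 nested list: value = row * 10 + column
nested = [[r * 10 + c for c in range(4)] for r in range(3)]
arr = np.array(nested)

# Nested lists: one level at a time
print(nested[1][2])              # 12
# Chained slices both act on the OUTER list: rows 1..2, then the first of those
print(nested[1:3][0:1])          # [[10, 11, 12, 13]]
# numpy: one subscript per axis, so this is rows 1..2, column 0 only
print(arr[1:3, 0:1].tolist())    # [[10], [20]]
# Negative indices mean the same thing in both
print(nested[-1][-1], arr[-1, -1])  # 23 23
```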

Is there a way to write a python function that will create 'N' arrays? (see body)

I have a numpy array of shape (20, 3). (So 20 3-by-1 arrays. Correct me if I'm wrong, I am still pretty new to Python.)
I need to separate it into 3 arrays of shape (20, 1), where the first array holds the 0th element of each of the 20 small arrays, the second array holds the 1st element of each one, and so on.
I am not sure if I need to write a function for this. Here is what I have tried:
Essentially I'm trying to create an array of 3 20 by 1 arrays that I can later index to get the separate 20 by 1 arrays.
a = np.load() #loads file
num=20 #the num is if I need to change array size
num_2=3
for j in range(0,num):
    for l in range(0,num_2):
        array_elements = np.zeros(3)
        array_elements[l] = a[j:][l]
This gives the following error:
ValueError: setting an array element with a sequence
I have also tried making it a dictionary and making the dictionary values lists that are appended, but it only gives the first or last value of the 20 that I need.
Your array has shape (20, 3), this means it's a 2-dimensional array with 20 rows and 3 columns in each row.
You can access data in this array by indexing with numbers, or with ':' to indicate ranges. You want to split this into 3 arrays, one per column. To do this you can pick the column with a number and use ':' to mean 'all of the rows'. So, to access the three different columns: a[:, 0], a[:, 1] and a[:, 2]. Note that an integer column index gives 1D arrays of shape (20,); if you really need shape (20, 1), use a slice for the column instead, e.g. a[:, 0:1].
You can then assign these to separate variables if you wish, e.g. arr = a[:, 0], but this is just a view of the original data in array a. This means any changes in arr will also be made to the corresponding data in a.
If you want a new, independent array so this doesn't happen, use the .copy() method. If you set arr = a[:, 0].copy(), arr is completely separate from a, and changes made to one will not affect the other.
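A minimal sketch of the view-versus-copy behaviour, with np.arange(60).reshape(20, 3) as a hypothetical stand-in for the loaded (20, 3) array. np.shares_memory reports whether two arrays use the same underlying data.

```python
import numpy as np

a = np.arange(60).reshape(20, 3)  # hypothetical stand-in for the (20, 3) array

view = a[:, 0]                 # basic slicing returns a view, shape (20,)
independent = a[:, 0].copy()   # a separate array with its own data

view[0] = -1          # writes through to a
independent[1] = -2   # leaves a untouched

print(a[0, 0], a[1, 0])  # -1 3
print(np.shares_memory(view, a), np.shares_memory(independent, a))  # True False
```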
Essentially you want to group your arrays by their index. There are plenty of ways of doing this. Since numpy does not have a group-by method, one option is to split the array horizontally into a new array and reshape it:
old_length = 3
new_length = 20
a = np.array(np.hsplit(a, old_length)).reshape(old_length, new_length)
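Edit: np.rot90(a, k=-1) (or, equivalently, k=3, telling numpy to rotate by 90 degrees k times) also gives an array of shape (3, 20), but the clockwise rotation reverses the order of the elements in each row: the first row is a[::-1, 0], not a[:, 0]. To get the same result as the hsplit/reshape approach, transpose the array instead:
a = a.T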
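A quick check of the options above on a hypothetical (20, 3) array. It compares the hsplit/reshape result, the transpose, and np.rot90 with k=-1, and finishes by unpacking the transpose into one 1D array per column.

```python
import numpy as np

a = np.arange(60).reshape(20, 3)  # hypothetical (20, 3) array

grouped = np.array(np.hsplit(a, 3)).reshape(3, 20)  # hsplit + reshape
print(np.array_equal(grouped, a.T))                 # True

rotated = np.rot90(a, k=-1)
print(rotated.shape)                                # (3, 20)
print(np.array_equal(rotated, a.T))                 # False
print(np.array_equal(rotated, a.T[:, ::-1]))        # True: each row is reversed

# One 1D array of length 20 per column
col0, col1, col2 = a.T
```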

How to extract elements in specific column of the dataset?

I have been trying to build a neural network. To do so I have to divide the data into x and y (my dataset was converted to numpy).
The data in "x" is the 1st column, which I have extracted successfully, but when I try to extract the 2nd column I get both the x and y values in "y".
Here is the code I used to divide the data:
data=np.genfromtxt("/home/crpsm/Pycharm/DataSet/headbrain.csv",delimiter=',')
x=data[:,:1]
y=data[:, :2]
Here's the output of x and y:
x:-
[[3738.]
[4261.]
[3777.]
[4177.]
[3585.]
[3785.]
[3559.]
[3613.]
[3982.]
[3443.]
y:-
[[3738. 1297.]
[4261. 1335.]
[3777. 1282.]
[4177. 1590.]
[3585. 1300.]
[3785. 1400.]
[3559. 1255.]
[3613. 1355.]
[3982. 1375.]
[3443. 1340.]
Please tell me how to fix this error. Thanks in advance!
You may want to review the numpy indexing documentation.
To get the second column in the same shape as x, use y=data[:, 1:2].
Note: you are creating 2d arrays with this indexing (shape of (len(data), 1)). If you want 1d arrays, just use integers, not slices, for the second term:
x = data[:, 0]
y = data[:, 1]
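A small sketch comparing the shapes produced by each indexing form. The two rows are taken from the output shown in the question and stand in for the full headbrain.csv data.

```python
import numpy as np

# Two rows copied from the question's output, standing in for the full data
data = np.array([[3738., 1297.],
                 [4261., 1335.]])

print(data[:, :1].shape)   # (2, 1)  columns 0..0, still 2D
print(data[:, :2].shape)   # (2, 2)  columns 0..1, which is why y had both
print(data[:, 1:2].shape)  # (2, 1)  column 1 only, 2D
print(data[:, 1].shape)    # (2,)    column 1 only, 1D
```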
What @w-m said in their answer is correct. With :1 you are currently assigning to x all rows (the first :) and the columns from zero up to one, excluding the upper bound; with :2 you are assigning to y all rows (again the first :) and the columns from zero up to two, excluding the upper bound, i.e. both columns.
x = data[:, 0]
y = data[:, 1]
Is one way to do this properly, but a nicer and more succinct way would be to use tuple unpacking:
x, y = data.T
This transposes (.T) the data, i.e. the two dimensions are exchanged, after which the first dimension has length two. If your actual data has more columns than that, you can use:
x, y, *rest = data.T
In this case rest will be a list of the remaining columns. This syntax was introduced in Python 3.0.
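A minimal sketch of both unpacking forms on hypothetical arrays (np.arange reshaped to 5 x 2 and 5 x 4). Plain x, y = data.T only works when data has exactly two columns; with more it raises ValueError, which is where the starred form helps.

```python
import numpy as np

data = np.arange(10).reshape(5, 2)   # hypothetical 5 x 2 data
x, y = data.T                        # the rows of data.T are the columns of data
print(x, y)                          # [0 2 4 6 8] [1 3 5 7 9]

wide = np.arange(20).reshape(5, 4)   # hypothetical 5 x 4 data
x, y, *rest = wide.T                 # rest collects the remaining columns
print(type(rest).__name__, len(rest))  # list 2
```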

finding the max of a column in an array

def maxvalues():
    for n in range(1,15):
        dummy=[]
        for k in range(len(MotionsAndMoorings)):
            dummy.append(MotionsAndMoorings[k][n])
        max(dummy)
        L = [x + [max(dummy)]] ## to be corrected (adding columns with value max(dummy))
        ## suggest code to add new row to L and for next function call, it should save values here.
I have an array of size (k x n) and I need to pick the max values of the first column in that array. Is there a simpler way than what I tried? My main aim is to append the results to L as columns rather than rows. If I just append, the values are added at the end. I would like this to be done in columns for row 0 in L, because I'll call this function again, add a new row to L and do the same. Please suggest.
General suggestions for your code
First of all it's not very handy to access globals in a function. It works but it's not considered good style. So instead of using:
def maxvalues():
    do_something_with(MotionsAndMoorings)
you should do it with an argument:
def maxvalues(array):
    do_something_with(array)
MotionsAndMoorings = something
maxvalues(MotionsAndMoorings)  # pass it to the function.
The next strange thing is that you seem to exclude the first column of your array:
for n in range(1,15):
I think that's unintended. The first element of a list has the index 0 and not 1. So I guess you wanted to write:
for n in range(0,15):
or, even better, for arbitrary lengths:
for n in range(len(array[0])):  # length of the first row, i.e. the number of columns
Alternatives to your iterations
But this is not necessary, because the built-in max function already accepts a very handy keyword argument (key), so you don't need to write the loop over the array yourself:
import operator
column = 2
max(array, key=operator.itemgetter(column))[column]
This returns the row whose element at index column is maximal (you just set column to the column you want). But max returns the whole row, so you need to extract just that element.
So to get a list of all your maximums for each column you could do:
[max(array, key=operator.itemgetter(column))[column] for column in range(len(array[0]))]
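A runnable version of the key-based approach, using the small list from this answer as a hypothetical stand-in for MotionsAndMoorings. When several rows tie for the maximum, max returns the first one it encounters.

```python
import operator

# Hypothetical list-of-lists stand-in for MotionsAndMoorings
array = [[1, 2, 3], [4, 5, 6], [0, 2, 10], [0, 2, 10]]

column = 2
# max() compares rows by their element at `column` and returns the whole row...
best_row = max(array, key=operator.itemgetter(column))
print(best_row)          # [0, 2, 10]
# ...so index into it to get the value itself
print(best_row[column])  # 10

column_maxima = [max(array, key=operator.itemgetter(c))[c] for c in range(len(array[0]))]
print(column_maxima)     # [4, 5, 10]
```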
For your L I'm not sure what this is but for that you should probably also pass it as argument to the function:
def maxvalues(array, L): # another argument here
but since I don't know what x and L are supposed to be, I won't go further into that. It does look like you want to turn the columns of MotionsAndMoorings into rows and the rows into columns. If so, you can just do it with:
dummy = [[MotionsAndMoorings[j][i] for j in range(len(MotionsAndMoorings))] for i in range(len(MotionsAndMoorings[0]))]
that's a list comprehension that converts a list like:
[[1, 2, 3], [4, 5, 6], [0, 2, 10], [0, 2, 10]]
to an "inverted" column/row list:
[[1, 4, 0, 0], [2, 5, 2, 2], [3, 6, 10, 10]]
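The same transposition can also be written with the standard-library idiom zip(*rows), shown here as an alternative to the nested comprehension. It assumes all rows have the same length, since zip stops at the shortest row.

```python
rows = [[1, 2, 3], [4, 5, 6], [0, 2, 10], [0, 2, 10]]

# zip(*rows) groups the i-th element of every row together
transposed = [list(col) for col in zip(*rows)]
print(transposed)  # [[1, 4, 0, 0], [2, 5, 2, 2], [3, 6, 10, 10]]

# The comprehension from the answer, for comparison
dummy = [[rows[j][i] for j in range(len(rows))] for i in range(len(rows[0]))]
print(dummy == transposed)  # True
```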
Alternative packages
But as roadrunner66 already said, sometimes it's easiest to use a library like numpy or pandas that already has very advanced and fast functions that do exactly what you want and are very easy to use.
For example, you can convert a Python list to a numpy array simply by:
import numpy as np
Motions_numpy = np.array(MotionsAndMoorings)
and get the maximum of each column with:
maximums_columns = np.max(Motions_numpy, axis=0)
You don't even need to convert it to an np.array to use np.max or to transpose it (turn rows into columns and columns into rows):
transposed = np.transpose(MotionsAndMoorings)
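A minimal sketch showing that np.max and np.transpose accept a plain list of lists directly. The small list is a hypothetical stand-in for MotionsAndMoorings.

```python
import numpy as np

# Hypothetical list-of-lists stand-in for MotionsAndMoorings
MotionsAndMoorings = [[1, 2, 3], [4, 5, 6], [0, 2, 10], [0, 2, 10]]

maximums_columns = np.max(MotionsAndMoorings, axis=0)  # one value per column
maximums_rows = np.max(MotionsAndMoorings, axis=1)     # one value per row
transposed = np.transpose(MotionsAndMoorings)          # rows <-> columns

print(maximums_columns.tolist())  # [4, 5, 10]
print(maximums_rows.tolist())     # [3, 6, 10, 10]
print(transposed.shape)           # (3, 4)
```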
I hope this answer is not too unstructured. Some parts are suggestions for your function and some are alternatives. You should pick the parts that you need, and if you have any trouble with them, just leave a comment or ask another question. :-)
An example with a random input array, showing that you can take the max in either axis easily with one command.
import numpy as np
aa = np.random.random([4, 3])
print(aa)
print()
print(np.max(aa, axis=0))
print()
print(np.max(aa, axis=1))
Output:
[[ 0.51972266 0.35930957 0.60381998]
[ 0.34577217 0.27908173 0.52146593]
[ 0.12101346 0.52268843 0.41704152]
[ 0.24181773 0.40747905 0.14980534]]
[ 0.51972266 0.52268843 0.60381998]
[ 0.60381998 0.52146593 0.52268843 0.40747905]

How to declare and fill an array in NumPy?

I need to create an empty array in Python and fill it in a loop method.
data1 = np.array([ra,dec,[]])
Here is what I have. The ra and dec portions are from another array I've imported. What I am having trouble with is filling the other columns.
For example, let's say that to fill the 3rd column I do this:
for i in range(0,56):
    data1[i,3] = 32
The error I am getting is:
IndexError: invalid index
for the second line in the aforementioned code sample.
Additionally, when I check the shape of the array I created, it comes out as (3,). The data that I have already entered is intended to be two columns with 56 rows of data.
So where am I messing up here? Should I transpose the array?
You could do:
data1 = np.zeros((56,4))
to get a 56 by 4 array. If you don't want the array to start out filled with 0, you could use np.ones, np.empty (which leaves the values uninitialized) or np.ones((56, 4)) * np.nan.
Then, in most cases it is best to avoid Python loops where a single array operation will do, for performance reasons.
So as an example, this would do your loop:
data1[:, 3] = 32
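A small sketch confirming that the single column assignment fills the same cells as the explicit loop from the question, once the array is allocated with the right shape (56 rows, 4 columns, as in this answer).

```python
import numpy as np

data1 = np.zeros((56, 4))   # 56 rows, 4 columns

# Vectorized: one assignment fills column 3 (the 4th column) in every row
data1[:, 3] = 32

# Equivalent explicit loop, for comparison only
looped = np.zeros((56, 4))
for i in range(56):
    looped[i, 3] = 32

print(np.array_equal(data1, looped))  # True
```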
data1 = np.array([ra,dec,[32]*len(ra)])
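This gives a single-line solution to your problem, but for efficiency, allocating an empty array first and then copying in the relevant parts would be preferable, since it avoids building the temporary list. Note that this stacks ra, dec and the constants as rows, giving shape (3, len(ra)); append .T if you want them as columns.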
One thing that nobody has mentioned is that in Python, indexing starts at 0, not 1.
This means that if you want to look at the third column of the array, you actually should address [:,2], not [:,3].
Good luck!
Assuming ra and dec are vectors (1-d):
data1 = np.concatenate([ra[:, None], dec[:, None], np.zeros((len(ra), 1))+32], axis=1)
Or
data1 = np.empty((len(ra), 3))
data1[:, 0] = ra
data1[:, 1] = dec
data1[:, 2] = 32
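A runnable sketch of both approaches from this answer. The ra and dec values are hypothetical (np.linspace with 56 points), standing in for the imported columns; only their length and 1D shape matter.

```python
import numpy as np

# Hypothetical 1D inputs standing in for the imported ra and dec columns
ra = np.linspace(0.0, 10.0, 56)
dec = np.linspace(-5.0, 5.0, 56)

# Approach 1: turn each vector into a column with [:, None], then join side by side
data1 = np.concatenate([ra[:, None], dec[:, None], np.zeros((len(ra), 1)) + 32], axis=1)

# Approach 2: allocate once, then fill column by column
data2 = np.empty((len(ra), 3))
data2[:, 0] = ra
data2[:, 1] = dec
data2[:, 2] = 32

print(data1.shape)                   # (56, 3)
print(np.array_equal(data1, data2))  # True
```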
If you want to fill an array with a single repeated number, you can just do:
x_2 = np.ones((1000))+1
For example, this gives 1000 elements, all equal to 2.
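As an alternative not mentioned in the original answers, numpy's np.full(shape, fill_value) creates the filled array directly, without the intermediate array of ones. A minimal comparison:

```python
import numpy as np

x_2 = np.ones(1000) + 1       # the approach above
x_full = np.full(1000, 2.0)   # fill value given directly

print(x_full.shape, x_full[0])      # (1000,) 2.0
print(np.array_equal(x_2, x_full))  # True
```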
