How to declare and fill an array in NumPy?

I need to create an empty array in Python and fill it in a loop.
data1 = np.array([ra,dec,[]])
Here is what I have. The ra and dec portions are from another array I've imported. What I am having trouble with is filling the other columns.
For example, let's say I fill the 3rd column like this:
for i in range(0, 56):
    data1[i,3] = 32
The error I am getting is:
IndexError: invalid index
for the second line of the code sample above.
Additionally, when I check the shape of the array I created, it will come out at (3,). The data that I have already entered into this is intended to be two columns with 56 rows of data.
So where am I messing up here? Should I transpose the array?

You could do:
data1 = np.zeros((56,4))
to get a 56 by 4 array. If you don't want to start with an array of zeros, you could use np.ones or np.empty or np.ones((56, 4)) * np.nan
In most cases it is also best to avoid a Python loop when you don't need one, for performance reasons.
As an example, this does the same as your loop:
data1[:, 3] = 32
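Here is a minimal sketch of the preallocate-then-fill approach described above. The ra and dec values are made-up placeholders standing in for the asker's imported data; only the shapes matter.

```python
import numpy as np

# Hypothetical stand-ins for the asker's imported columns (56 values each).
ra = np.linspace(0.0, 10.0, 56)
dec = np.linspace(-5.0, 5.0, 56)

# Allocate the full 56 x 4 table up front, then fill it column by column.
data1 = np.zeros((56, 4))
data1[:, 0] = ra
data1[:, 1] = dec
data1[:, 3] = 32  # the scalar is broadcast down the whole column, no loop needed

print(data1.shape)  # (56, 4)
```

With four columns allocated, index 3 is valid and refers to the fourth column.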

data1 = np.array([ra,dec,[32]*len(ra)])
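This gives a single-line solution to your problem. Assuming ra and dec are 1-d, the result has shape (3, len(ra)), so transpose it with .T if you want ra and dec as columns. For efficiency, allocating an empty array first and then copying in the relevant parts would be preferable, since that avoids constructing the dummy list.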

One thing that nobody has mentioned is that in Python, indexing starts at 0, not 1.
This means that if you want to look at the third column of the array, you actually should address [:,2], not [:,3].
Good luck!

Assuming ra and dec are vectors (1-d):
data1 = np.concatenate([ra[:, None], dec[:, None], np.zeros((len(ra), 1))+32], axis=1)
Or
data1 = np.empty((len(ra), 3))
data1[:, 0] = ra
data1[:, 1] = dec
data1[:, 2] = 32
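A third option, not mentioned in the answer itself, is np.column_stack, which treats each 1-d input as a column. A small sketch with placeholder values for ra and dec:

```python
import numpy as np

# Hypothetical 1-d inputs; the real ra and dec come from the asker's data.
ra = np.array([10.0, 11.0, 12.0])
dec = np.array([-1.0, -2.0, -3.0])

# column_stack turns each 1-d array into one column of a 2-d array.
data1 = np.column_stack([ra, dec, np.full(len(ra), 32.0)])

print(data1.shape)  # (3, 3)
print(data1[:, 2])  # [32. 32. 32.]
```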

If you want to fill an array with the same number everywhere, just do:
x_2 = np.ones((1000))+1
for example, to get 1000 copies of the number 2.
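NumPy also has np.full for exactly this case, which avoids the extra addition. A quick comparison, as a sketch:

```python
import numpy as np

x_2 = np.full(1000, 2.0)   # 1000 copies of 2.0, dtype float64
same = np.ones(1000) + 1   # the ones-plus-one approach, for comparison

print(np.array_equal(x_2, same))  # True
```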

Related

What is the Numpy slicing notation in this code?

# split into inputs and outputs
X, y = data[:, :-1], data[:, -1]
print(X.shape, y.shape)
Can someone explain the second line of code with reference to specific documentation? I know it's slicing, but I couldn't find any reference for the notation ":-1" anywhere. Please point me to the specific part of the documentation.
Thank you
The slicing is most probably NumPy slicing, and it is applied to data of shape (610, 14).

Per the docs:
Indexing on ndarrays
ndarrays can be indexed using the standard Python x[obj] syntax, where x is the array and obj the selection. There are different kinds of indexing available depending on obj: basic indexing, advanced indexing and field access.
1D array
Slicing a 1-dimensional array is much like slicing a list
import numpy as np
np.random.seed(0)
array_1d = np.random.random((5,))
print(len(array_1d.shape))
1
NOTE: The len of the array shape tells you the number of dimensions.
We can use standard python list slicing on the 1D array.
# get the last element
print(array_1d[-1])
0.4236547993389047
# get everything up to but excluding the last element
print(array_1d[:-1])
[0.5488135 0.71518937 0.60276338 0.54488318]
2D array
array_2d = np.random.random((5, 1))
print(len(array_2d.shape))
2
Think of a 2-dimensional array like a data frame. It has rows (the 0th axis) and columns (the 1st axis). numpy grants us the ability to slice these axes independently by separating them with a comma (,).
# the 0th row and all columns
print(array_2d[0, :])
[0.79172504]
# the 1st row and everything after + all columns
print(array_2d[1:, :])
[[0.52889492]
[0.56804456]
[0.92559664]
[0.07103606]]
# the 1st through second to last row + the last column
print(array_2d[1:-1, -1])
[0.52889492 0.56804456 0.92559664]
Your Example
# split into inputs and outputs
X, y = data[:, :-1], data[:, -1]
print(X.shape, y.shape)
Note that len(data.shape) is >= 2 (otherwise you'd get an IndexError).
This means data[:, :-1] is keeping all "rows" and slicing up to, but not including, the last "column". Likewise, data[:, -1] is keeping all "rows" and selecting only the last "column".
It's important to know that when you slice an ndarray using a colon (:), you will get an array with the same number of dimensions.
print(len(array_2d[1:, :-1].shape)) # 2
But if you "select" a specific index (i.e. don't use a colon), you may reduce the dimensions.
print(len(array_2d[1, :-1].shape)) # 1, because I selected a single index value on the 0th axis
print(len(array_2d[1, -1].shape)) # 0, because I selected a single index value on both the 0th and 1st axes
You can, however, select a list of indices on either axis (assuming they exist).
print(len(array_2d[[1], [-1]].shape)) # 1
print(len(array_2d[[1, 3], :].shape)) # 2
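To tie this back to the shape mentioned in the question, here is a short sketch using placeholder data of shape (610, 14); the values themselves are arbitrary.

```python
import numpy as np

# Placeholder data with the shape given in the question.
data = np.arange(610 * 14).reshape(610, 14)

# All rows, every column but the last; then all rows, last column only.
X, y = data[:, :-1], data[:, -1]
print(X.shape, y.shape)  # (610, 13) (610,)

# Using a slice in place of the bare index keeps the result two-dimensional.
y_col = data[:, -1:]
print(y_col.shape)  # (610, 1)
```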

This slicing notation is explained here: https://docs.python.org/3/tutorial/introduction.html#strings
-1 means the last element, -2 the second from last, etc. For example, if there are 8 elements in a list, -1 is equivalent to 7 (not 8, because indexing starts from 0).
Keep in mind that plain Python has no multidimensional slice for nested lists: chained slices such as [1:3][5:7] apply the second slice to the result of the first. NumPy arrays add a comma-separated syntax ([8:10, 12:14]) that slices each axis of a multidimensional array in one step. However, -1 always means the same thing. Here is the NumPy documentation for slicing: https://numpy.org/doc/stable/user/basics.indexing.html
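The difference between list slicing and NumPy slicing can be seen in a small sketch. The 3 by 3 values are made up for illustration.

```python
import numpy as np

nested = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
arr = np.array(nested)

# Chained slices on a list: the second slice acts on the result of the first,
# so this selects rows and then selects rows again.
print(nested[0:2][0:1])  # [[1, 2, 3]]

# Slicing the inner lists needs a comprehension.
print([row[0:1] for row in nested[0:2]])  # [[1], [4]]

# NumPy slices rows and columns in a single step.
print(arr[0:2, 0:1].tolist())  # [[1], [4]]

# -1 means "last" in both cases.
print(nested[-1][-1], arr[-1, -1])  # 9 9
```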

Is there a way to write a python function that will create 'N' arrays? (see body)

I have a NumPy array of shape (20, 3). (So 20 3 by 1 arrays. Correct me if I'm wrong, I am still pretty new to Python.)
I need to separate it into 3 arrays of shape 20,1 where the first array is 20 elements that are the 0th element of each 3 by 1 array. Second array is also 20 elements that are the 1st element of each 3 by 1 array, etc.
I am not sure if I need to write a function for this. Here is what I have tried:
Essentially I'm trying to create an array of 3 20 by 1 arrays that I can later index to get the separate 20 by 1 arrays.
a = np.load() #loads file
num=20 #the num is if I need to change array size
num_2=3
for j in range(0,num):
    for l in range(0,num_2):
        array_elements = np.zeros(3)
        array_elements[l] = a[j:][l]
This gives the following error:
'''
ValueError: setting an array element with a sequence
'''
I have also tried making it a dictionary and making the dictionary values lists that are appended, but it only gives the first or last value of the 20 that I need.

Your array has shape (20, 3), this means it's a 2-dimensional array with 20 rows and 3 columns in each row.
You can access data in this array by indexing with numbers, or with ':' to indicate ranges. You want to split this into 3 arrays, one per column. To do this, pick the column with a number and use ':' to mean 'all of the rows'. So, to access the three different columns: a[:, 0], a[:, 1] and a[:, 2]. Note that each of these has shape (20,); use a slice such as a[:, 0:1] if you need shape (20, 1).
You can then assign these to separate variables if you wish, e.g. arr = a[:, 0], but this is just a view of the original data in array a. This means any changes in arr will also be made to the corresponding data in a.
If you want to create a new array so this doesn't happen, use the .copy() method. If you set arr = a[:, 0].copy(), arr is completely separate from a, and changes made to one will not affect the other.
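The view-versus-copy behaviour can be checked directly. This is a sketch with a placeholder array in place of the loaded file:

```python
import numpy as np

a = np.arange(60).reshape(20, 3)  # placeholder for the loaded (20, 3) array

col_view = a[:, 0]         # shape (20,), shares memory with a
col_2d = a[:, 0:1]         # shape (20, 1), still a view
col_copy = a[:, 0].copy()  # independent data

col_view[0] = -1  # also changes a[0, 0]
col_copy[1] = -2  # leaves a untouched

print(col_view.shape, col_2d.shape)  # (20,) (20, 1)
print(a[0, 0], a[1, 0])              # -1 3
```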

Essentially you want to group your arrays by their index. There are plenty of ways of doing this. Since numpy does not have a group by method, you have to horizontally split the arrays into a new array and reshape it.
old_length = 3
new_length = 20
a = np.array(np.hsplit(a, old_length)).reshape(old_length, new_length)
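Edit: You can get a similar result by rotating the array -90 degrees, using rot90 with k=-1 or k=3 (k is the number of 90-degree rotations). Note that this is not identical to the hsplit version: the rotation also reverses the order of the elements within each row. A plain transpose, a.T, gives the same result as the hsplit version.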
a = np.rot90(a, k=-1)
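The three approaches can be compared on a placeholder (20, 3) array. This sketch suggests that the hsplit version matches a plain transpose, while the rotation comes out with each row reversed:

```python
import numpy as np

a = np.arange(60).reshape(20, 3)  # placeholder (20, 3) array

via_hsplit = np.array(np.hsplit(a, 3)).reshape(3, 20)
via_T = a.T
via_rot = np.rot90(a, k=-1)

print(np.array_equal(via_hsplit, via_T))        # True
print(np.array_equal(via_rot, via_T))           # False
print(np.array_equal(via_rot, via_T[:, ::-1]))  # True
```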

How to get specific index of np.array of np.arrays fast

At the most basic I have the following dataframe:
a = {'possibility' : np.array([1,2,3])}
b = {'possibility' : np.array([4,5,6])}
df = pd.DataFrame([a,b])
This gives me a dataframe of size 2x1:
like so:
row 1: np.array([1,2,3])
row 2: np.array([4,5,6])
I have another vector of length 2. Like so:
[1,2]
These represent the index I want from each row.
So if I have [1,2] I want: from row 1: 2, and from row 2: 6.
Ideally, my output is [2,6] in a vector form, of length 2.
Is this possible? I can easily run through a for loop, but I am looking for FAST approaches, ideally vectorized ones, since the data is already in pandas/numpy.
For actual use case approximations, I am looking to make this work in the 300k-400k row ranges. And need to run it in optimization problems (hence the fast part)

You could transform to a multi-dimensional numpy array and take_along_axis:
v = np.array([1,2])
a = np.vstack(df['possibility'])
np.take_along_axis(a.T, v[None], axis=0)[0]
output: array([2, 6])
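Another option, not from the answer above, is plain integer-array indexing on the stacked array. In this sketch a list of arrays stands in for df['possibility'], and all rows are assumed to have the same length:

```python
import numpy as np

# Stand-in for the values of df['possibility']: one equal-length array per row.
rows = [np.array([1, 2, 3]), np.array([4, 5, 6])]
v = np.array([1, 2])

a = np.vstack(rows)                # shape (2, 3)
picked = a[np.arange(len(v)), v]   # pairs row i with column v[i]

print(picked)  # [2 6]
```

Both this and take_along_axis avoid a Python-level loop over the rows.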

Taking mean along columns with masks in Python

I have a 2D array containing data from some measurements. I have to take mean along each column considering good data only.
Hence I have another 2D array of the same shape which contains 1s and 0s showing whether data at that (i,j) is good or bad. Some of the "bad" data can be nan as well.
def mean_exc_mask(x, mas): #x is the real data array
    #mas tells if the data at the location is good/bad
    sum_array = np.zeros(len(x[0]))
    avg_array = np.zeros(len(x[0]))
    items_array = np.zeros(len(x[0]))
    for i in range(0, len(x[0])): #We take a specific column first
        for j in range(0, len(x)): #And then parse across rows
            if mas[j][i]==0: #If the data is good
                sum_array[i]= sum_array[i] + x[j][i]
                items_array[i]=items_array[i] + 1
        if items_array[i]==0: # If none of the data is good for a particular column
            avg_array[i] = np.nan
        else:
            avg_array[i] = float(sum_array[i])/items_array[i]
    return avg_array
I am getting all values as nan!
Any idea what's going wrong here, or is there some other way to do this?

The code seems to work for me, but you can do it a whole lot simpler by using the built-in aggregation in NumPy:
(x*(m==0)).sum(axis=0)/(m==0).sum(axis=0)
I tried it with:
x=np.array([[-0.32220561, -0.93043128, 0.37695923],[ 0.08824206, -0.86961453, -0.54558324],[-0.40942331, -0.60216952, 0.17834533]])
and
m=np.array([[1, 1, 0],[1, 0, 0],[1, 1, 1]])
If you post example data, it is often easier to give a qualified answer.
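One caveat with the one-liner: if x holds nan at a masked position, nan * 0 is still nan, so the column sum becomes nan. Since the question mentions that some bad data can be nan, here is a sketch that selects values with np.where instead of multiplying. The sample values are made up, and 0 is taken to mean "good", matching the check in the question's code.

```python
import numpy as np

x = np.array([[1.0, np.nan, 3.0],
              [2.0, 5.0, np.nan],
              [4.0, 7.0, 9.0]])
# 0 marks good data, as in the question's loop.
m = np.array([[0, 1, 0],
              [0, 0, 1],
              [1, 0, 1]])

# Treat a value as usable only if it is flagged good and is not nan.
good = (m == 0) & ~np.isnan(x)
totals = np.where(good, x, 0.0).sum(axis=0)
counts = good.sum(axis=0)

# Columns with no usable values stay nan instead of dividing by zero.
means = np.full(x.shape[1], np.nan)
np.divide(totals, counts, out=means, where=counts > 0)

print(means)  # column means: 1.5, 6.0, 3.0
```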

Python - Create an array from columns in file

I have a text file with two columns and n rows. Usually I work with two separate vectors using x,y=np.loadtxt('data',usecols=(0,1),unpack=True), but I would like to have them as an array of the form array=[[a,1],[b,2],[c,3]...] where all the letters correspond to the x-vector and the numbers to the y-vector, so I can ask something like array[0,2]=b. I tried defining
array[0,:]=x but I didn't succeed. Is there a simple way to do this?
In addition, I want to get the respective x-value for certain y-value. I tried with
x_value=np.argwhere(array[:,1]==3)
And I'm expecting the x_value to be c because it corresponds to 3 in column 1 but it doesn't work either.

I think you simply need to not unpack the array you get back from loadtxt. Do:
arr = np.loadtxt('data', usecols=(0,1))
If your file contained:
0 1
2 3
4 5
arr will be like:
[[0, 1],
[2, 3],
[4, 5]]
Note that to index into this array, you need to specify the row first (and indexes start at 0):
arr[1,0] == 2 # True!
You can find the x values that correspond to a given y value with:
x_vals = arr[:,0][arr[:,1]==y_val]
The indexing will return an array, though x_vals will have only a single value if the y_val was unique. If you know in advance there will be only one match for the y_val, you could tack on [0] to the end of the indexing, so you get the first result.
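The whole answer can be tried without a file on disk. In this sketch an in-memory io.StringIO object stands in for the 'data' file, using the same three rows as above:

```python
import io
import numpy as np

# In-memory stand-in for the two-column text file.
text = io.StringIO("0 1\n2 3\n4 5\n")
arr = np.loadtxt(text, usecols=(0, 1))

print(arr.shape)  # (3, 2)
print(arr[1, 0])  # 2.0

# Look up the x value(s) whose y value equals y_val.
y_val = 3
x_vals = arr[:, 0][arr[:, 1] == y_val]
print(x_vals)  # [2.]
```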
