I am trying to replace a specific row of NaN's in a 3-D array (filled with NaN's) with rows of known integer values from a specific column in a text file (ex: 24 rows of column 8). Is there a method to perform this replacement that I have missed in my search for help?
My most recent trial code (of many) is as follows:
import numpy as np
tfile = "C:\...\Lee_Gilmer_MEM_GA_01_02_2015.txt"
data = np.genfromtxt(tfile, dtype=None)
#creation of empty 24 hour global matrix
s_array = np.empty((24,361,720))
s_array[:] = np.NAN
#Get values from column 8
c_data = data[:,7]
#Replace all 24 NaN's slices of row 1 column 1 with corresponding 24 row values from column 8
s_array[:,0:1,0:1] = c_data
print s_array
This produces a result of:
ValueError: could not broadcast input array from shape (24) into shape (24,1,1)
When I print out the shape of c_data, I get:
(24L,)
Is this at all possible to do without having to use a loop and replacing each one individually?
The error message tells you pretty much everything you need to know: the array slice on the left-hand side of the assignment has a shape of (24,1,1), whereas the right-hand side has shape (24,). Since these shapes don't match, numpy raises a ValueError.
There are two ways to solve this:
1. Make the shape of the LHS (24,) rather than (24, 1, 1). A nice way to do this would be to index with an integer rather than a slice for the last two dimensions:
s_array[:, 0, 0] = c_data
2. Reshape c_data to match the shape of the LHS:
s_array[:, 0:1, 0:1] = c_data.reshape(24, 1, 1)
I think option 1 is a lot more readable.
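A minimal sketch of both options follows. The values in c_data are made-up stand-ins (the real ones come from the text file, which isn't reproduced here), and the array is smaller than the one in the question purely for brevity:

```python
import numpy as np

# Stand-in for column 8 of the text file: 24 hypothetical integer values.
c_data = np.arange(24)

# Smaller than the (24, 361, 720) array in the question, same idea.
s_array = np.full((24, 4, 5), np.nan)

# Option 1: integer indices drop the last two axes, so the LHS has shape (24,).
s_array[:, 0, 0] = c_data

# Option 2: keep the slices and reshape the RHS to (24, 1, 1) instead.
t_array = np.full((24, 4, 5), np.nan)
t_array[:, 0:1, 0:1] = c_data.reshape(24, 1, 1)

print(s_array[:, 0, 0])                             # the 24 values, stored as floats
print(np.allclose(s_array, t_array, equal_nan=True))  # both options give the same array
```

Every other entry stays NaN, so only the [:, 0, 0] positions are touched.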
# split into inputs and outputs
X, y = data[:, :-1], data[:, -1]
print(X.shape, y.shape)
Can someone explain the second line of code with reference to specific documentation? I know it's slicing, but I couldn't find any reference for the notation ":-1" anywhere. Please point to the specific portion of the documentation.
Thank you
It is slicing, most probably on a numpy array, and it is being applied to data of shape (610, 14).
Per the docs:
Indexing on ndarrays
ndarrays can be indexed using the standard Python x[obj] syntax, where x is the array and obj the selection. There are different kinds of indexing available depending on obj: basic indexing, advanced indexing and field access.
1D array
Slicing a 1-dimensional array is much like slicing a list
import numpy as np
np.random.seed(0)
array_1d = np.random.random((5,))
print(len(array_1d.shape))
1
NOTE: The len of the array shape tells you the number of dimensions.
We can use standard python list slicing on the 1D array.
# get the last element
print(array_1d[-1])
0.4236547993389047
# get everything up to but excluding the last element
print(array_1d[:-1])
[0.5488135 0.71518937 0.60276338 0.54488318]
2D array
array_2d = np.random.random((5, 1))
print(len(array_2d.shape))
2
Think of a 2-dimensional array like a data frame. It has rows (the 0th axis) and columns (the 1st axis). numpy grants us the ability to slice these axes independently by separating them with a comma (,).
# the 0th row and all columns
print(array_2d[0, :])
[0.79172504]
# the 1st row and everything after + all columns
print(array_2d[1:, :])
[[0.52889492]
[0.56804456]
[0.92559664]
[0.07103606]]
# the 1st through second to last row + the last column
print(array_2d[1:-1, -1])
[0.52889492 0.56804456 0.92559664]
Your Example
# split into inputs and outputs
X, y = data[:, :-1], data[:, -1]
print(X.shape, y.shape)
Note that data must have at least 2 dimensions, i.e. len(data.shape) >= 2 (otherwise you'd get an IndexError).
This means data[:, :-1] is keeping all "rows" and slicing up to, but not including, the last "column". Likewise, data[:, -1] is keeping all "rows" and selecting only the last "column".
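As a quick check, here is a small sketch with a made-up 6x4 array standing in for data (the real data isn't shown in the question):

```python
import numpy as np

# Hypothetical stand-in for `data`: 6 rows, 4 columns.
data = np.arange(24).reshape(6, 4)

# All rows; every column except the last, and the last column on its own.
X, y = data[:, :-1], data[:, -1]

print(X.shape, y.shape)  # (6, 3) (6,)
print(y)                 # [ 3  7 11 15 19 23]
```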
It's important to know that when you slice an ndarray using a colon (:) on every axis, you will get an array with the same number of dimensions.
print(len(array_2d[1:, :-1].shape)) # 2
But if you "select" a specific index (i.e. don't use a colon), you may reduce the dimensions.
print(len(array_2d[1, :-1].shape)) # 1, because I selected a single index value on the 0th axis
print(len(array_2d[1, -1].shape)) # 0, because I selected a single index value on both the 0th and 1st axes
You can, however, select a list of indices on either axis (assuming they exist).
print(len(array_2d[[1], [-1]].shape)) # 1
print(len(array_2d[[1, 3], :].shape)) # 2
This slicing notation is explained here https://docs.python.org/3/tutorial/introduction.html#strings
-1 means last element, -2 - second from last, etc. For example, if there are 8 elements in a list, -1 is equivalent to 7 (not 8 because indexing starts from 0)
Keep in mind that plain Python has no multidimensional slice: with nested lists you chain brackets (x[1][5]), and a chained slice such as x[1:3][5:7] just slices the result of the first slice again. numpy arrays additionally accept a comma-separated syntax ([8:10, 12:14]) that slices each axis of a multidimensional array. However, -1 always means the same thing. Here is the numpy documentation for slicing: https://numpy.org/doc/stable/user/basics.indexing.html
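To illustrate the difference between chained brackets on a nested list and numpy's comma syntax, here is a small sketch with made-up values:

```python
import numpy as np

nested = [[0, 1, 2], [10, 11, 12], [20, 21, 22]]
arr = np.array(nested)

# Chained slices on a list: the second slice acts on the result of the first,
# so it slices the outer list again rather than the inner lists.
print(nested[0:2][0:1])   # [[0, 1, 2]]

# Comma-separated slices on an ndarray: one slice per axis.
print(arr[0:2, 0:1])      # rows 0 and 1, column 0 only

# -1 means "last" in both cases.
print(nested[-1][-1], arr[-1, -1])  # 22 22
```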
I have a numpy array of shape (20, 3). (So 20 3 by 1 arrays. Correct me if I'm wrong, I am still pretty new to python)
I need to separate it into 3 arrays of shape 20,1 where the first array is 20 elements that are the 0th element of each 3 by 1 array. Second array is also 20 elements that are the 1st element of each 3 by 1 array, etc.
I am not sure if I need to write a function for this. Here is what I have tried:
Essentially I'm trying to create an array of 3 20 by 1 arrays that I can later index to get the separate 20 by 1 arrays.
a = np.load() #loads file
num=20 #the num is if I need to change array size
num_2=3
for j in range(0,num):
    for l in range(0,num_2):
        array_elements = np.zeros(3)
        array_elements[l] = a[j:][l]
This gives the following error:
'''
ValueError: setting an array element with a sequence
'''
I have also tried making it a dictionary and making the dictionary values lists that are appended, but it only gives the first or last value of the 20 that I need.
Your array has shape (20, 3), which means it's a 2-dimensional array with 20 rows and 3 columns in each row.
You can access data in this array by indexing using numbers or ':' to indicate ranges. You want to split this into 3 arrays, one per column. To do this you can pick the column with a number and use ':' to mean 'all of the rows'. So, to access the three different columns: a[:, 0], a[:, 1] and a[:, 2]. Each of these has shape (20,); if you need shape (20, 1), index with a list or a slice instead, e.g. a[:, [0]] or a[:, 0:1].
You can then assign these to separate variables if you wish, e.g. arr = a[:, 0], but this is just a view of the original data in array a. This means any changes in arr will also be made to the corresponding data in a.
If you want to create a new array so this doesn't happen, you can use the .copy() method. Now if you set arr = a[:, 0].copy(), arr is completely separate from a and changes made to one will not affect the other.
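The view-versus-copy behaviour can be seen in a short sketch; the array contents here are made up for illustration:

```python
import numpy as np

a = np.arange(60, dtype=float).reshape(20, 3)  # hypothetical (20, 3) data

view = a[:, 0]          # shape (20,), shares memory with a
view[0] = -1.0
print(a[0, 0])          # -1.0, the change shows up in a

col = a[:, 1].copy()    # independent copy of the second column
col[0] = -99.0
print(a[0, 1])          # 1.0, a is unchanged

col_2d = a[:, [2]]      # list index keeps a trailing axis
print(view.shape, col_2d.shape)  # (20,) (20, 1)
```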
Essentially you want to group the elements of your arrays by their index. There are plenty of ways of doing this. numpy does not have a group-by method, but one option is to horizontally split the array into a new array and reshape it.
old_length = 3
new_length = 20
a = np.array(np.hsplit(a, old_length)).reshape(old_length, new_length)
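Edit: You can get a similar result by rotating the array -90 degrees. You can do this by using rot90 and setting k=-1 or k=3, which tells numpy to rotate by 90 degrees k times. Note that the rotation also reverses the order of the elements within each row, so it matches the hsplit result only up to that reversal; a plain transpose (a.T) gives exactly the same result as the hsplit approach.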
a = np.rot90(a, k=-1)
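For comparison, a small check on made-up data suggests that the hsplit-and-reshape result equals the plain transpose a.T, while rot90 with k=-1 gives the transpose with each row reversed:

```python
import numpy as np

a = np.arange(60).reshape(20, 3)   # hypothetical (20, 3) input

split = np.array(np.hsplit(a, 3)).reshape(3, 20)
rotated = np.rot90(a, k=-1)

print(split.shape, rotated.shape)               # (3, 20) (3, 20)
print(np.array_equal(split, a.T))               # True
print(np.array_equal(rotated, a.T[:, ::-1]))    # True: each row is reversed
print(np.array_equal(split, rotated))           # False
```

If the order of the 20 elements matters, a.T (or indexing a[:, i] directly) is probably the simplest choice.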
I have a (18, 10525) numpy array.
18 columns with 10525 rows, but the number of rows is not always the same and I must slice the array into 18 columns and groups or windows of 200 rows to feed it to AI.
For example I would like to do
data = np.ones((18, 10525))
data.reshape(-1,18,200)
But 10525 isn't divisible by 200 so I get a ValueError. I would like to get a zero padded array of shape (-1,18,200). I.e. add zeros to data until I can do .reshape(-1,18,200). Thanks in advance.
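Assuming you want to pad with zeros, here is one solution:
data = np.ones((18, 10525))
block = 18 * 200
old_size = data.size
# round up to the next multiple of block (unchanged if already a multiple)
rounded_up_size = -(-old_size // block) * block
reshaped_arr = np.zeros(rounded_up_size)
reshaped_arr[:old_size] = data.reshape(-1)
reshaped_arr = reshaped_arr.reshape(-1, 18, 200)
Note that the data is copied once into the padded buffer; the final reshape only returns a view of that buffer. Also note that the zeros are appended to the flattened data, so each (18, 200) block is filled from consecutive flattened values rather than from 200 columns of every row.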
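If instead each window should contain 200 consecutive columns from all 18 rows (an assumption; the question does not say which layout is wanted), one possible sketch pads along the last axis with np.pad and then reshapes. The names window, pad, padded and windows are only for illustration:

```python
import numpy as np

data = np.ones((18, 10525))
window = 200

# Number of zero columns needed to reach the next multiple of `window`
# (0 if the length is already a multiple).
pad = -data.shape[1] % window

# Pad only the end of the last axis with zeros.
padded = np.pad(data, ((0, 0), (0, pad)), mode="constant", constant_values=0)

# (18, n*200) -> (18, n, 200) -> (n, 18, 200)
windows = padded.reshape(data.shape[0], -1, window).transpose(1, 0, 2)
print(windows.shape)  # (53, 18, 200)
```

With this layout only the last window contains padding: its first 125 columns hold data and the remaining 75 are zeros.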
I have this numpy array
data = np.array([10.66252794, 10.65999505, 10.65745968, 10.65492432, 10.65239142, 10.64985606,
10.64732069, 10.64478533, 10.64225243, 10.63971707, 10.6371817, 10.6346488,
10.63211344, 10.62957807, 10.62704518, 10.62450981, 10.62197445, 10.61944155,
10.61690619, 10.61437082])
I want the values in data to be in the p-th column of the array result.
Just to clarify, I want to achieve the same as Matlab's result(:,p)
I tried
result[..., p] = data
but this gives me
ValueError: could not broadcast input array from shape (20) into shape ()
Isn't numpy's result[..., p] the same as Matlab's result(:,p)?
I also tried what was suggested in "Assigning to columns in NumPy?"
But result[...,p] = data[..., 0] puts in result only the first value of data which is 10.66252794
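You're trying to assign a column to an array that apparently doesn't have the right shape. The shape () in the error suggests that result[..., p] selects a single element, which is what happens when result is 1-dimensional. You can only assign data of shape (20,) to a column of result if result is a 2-dimensional array with m rows and n columns, where the number of rows m = 20. Like: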
result = np.zeros((20,5))
result[:,0] = data #Assigning to column 0
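A slightly fuller sketch, using evenly spaced stand-in values instead of the exact numbers from the question and a hypothetical column index p:

```python
import numpy as np

data = np.linspace(10.66, 10.61, 20)   # hypothetical stand-in for the 20 values
p = 2                                  # hypothetical column index (0-based)

# result must already exist with 20 rows before a column can be assigned.
result = np.zeros((data.size, 5))
result[:, p] = data

# For a 2-D array, result[..., p] selects the same column as result[:, p].
print(result[..., p].shape)            # (20,)
```

Once result is 2-dimensional with 20 rows, both result[:, p] and result[..., p] accept the assignment.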
I want to set a column in a numpy array to zero at different times. In other words, I have a numpy array M of size 5000x500. When I run the shape command the result is (5000,500), so I think there are 5000 rows and 500 columns
shape(M)
(5000,500)
But the problem when I want to access one column like first column
Mcol=M[:][0]
Then I check by shape again with new matrix Mcol
shape(Mcol)
(500,)
I expected the results will be (5000,) as the first has 5000 rows. Even when changed the operation the result was the same
shape(M)
(5000,500)
Mcol=M[0][:]
shape(Mcol)
(500,)
Any help please in explaining what happens in my code and if the following operation is right to set one column to zero
M[:][0]=0
You're doing this:
M[:][0] = 0
But you should be doing this:
M[:,0] = 0
The first one is wrong because M[:] just gives you the entire array, like M. Then [0] gives you the first row.
Similarly, M[0][:] gives you the first row as well, because again [:] has no effect.
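A small sketch of the difference, using a 5x3 array as a stand-in for the 5000x500 one in the question:

```python
import numpy as np

M = np.ones((5, 3))       # small stand-in for the 5000x500 array

print(M[:][0].shape)      # (3,)  first row
print(M[0][:].shape)      # (3,)  first row again
print(M[:, 0].shape)      # (5,)  first column

M[:, 0] = 0               # zero the first column in place
print(M[0])               # [0. 1. 1.]
```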