Looking to print the minimum values of numpy array columns.
I am using a loop in order to do this.
The array is shaped (20, 3) and I want to find the min values of columns, starting with the first (i.e. col_value=0)
I have coded:
col_value = 0
for col_value in X:
    print(X[:, col_value].min)
    col_value += 1
However, it is coming up with an error
"arrays used as indices must be of integer (or boolean) type"
How do I fix this?
Let me suggest an alternative approach that you might find useful. numpy's min() has an axis argument that you can use to find the minimum values along any dimension.
Example:
import numpy as np

X = np.random.randn(20, 3)
print(X.min(axis=0))
This prints a numpy array with the minimum value of each column of X.
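Similarly, passing axis=1 gives the minimum of each row instead:

print(X.min(axis=1))  # row minima; shape (20,)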
You don't need col_value=0 nor do you need col_value+=1.
x = np.array([1, 23, 4, 6, 0])
print(x.min())
EDIT:
Sorry, I didn't see that you wanted to iterate through the columns.
import numpy as np

X = np.array([[1, 2], [3, 4]])
for col in X.T:
    print(col.min())
Transposing the matrix is one of the best solutions:
import numpy as np

X = np.array([[11, 2, 14],
              [5, 15, 7],
              [8, 9, 20]])
X = X.T  # transpose the array so that iterating yields columns
for i in X:
    print(min(i))
I have a numpy array with n rows and p columns.
I want to check if a given row is in my array and find the index.
For example, I have a numpy array like this:
[[1,0,8,7,2,2],[1,3,7,0,3,0],[1,7,1,0,1,0],[1,9,1,0,6,0],[1,8,1,7,9,0],....]
I want to check if the array [6,0,5,8,2,1] is in my numpy array and, if so, where.
Is there a numpy function for that ?
I'm sorry for asking a naive question, but I'm quite confused right now.
You can use == and .all(axis=1) to match entire rows, then use numpy.where() to get the index:
import numpy as np
a = np.array([[1,0,8,7,2,2],[1,3,7,0,3,0],[1,7,1,0,1,0],[1,9,1,0,6,0],[1,8,1,7,9,0], [6,0,5,8,2,1]])
b = np.array([6,0,5,8,2,1])
print(np.where((a==b).all(axis=1)))
Output:
(array([5], dtype=int32),)
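If the row is not present, np.where() returns an empty index array, so you can also test for membership directly:

print((a == b).all(axis=1).any())  # True if some row of a equals b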
I have a numpy array of 4000x6 (6 columns), and a 1x6 numpy array of minimum values (taken from another numpy array of 3000x6).
I want to find everything in the large array that is below those values, each value compared to its corresponding column.
I've tried the simple way, based on a one-column solution I already had:
largearray=[float('nan') if x<min_values else x for x in largearray]
but sadly it didn't work :(.
I can do a for loop over each column and each value, but I was wondering if there is a faster, more elegant solution.
Thanks
EDIT: I'll try to rephrase: I have 6 values and 6 columns.
I want to find the values in each column that are lower than the corresponding one of the 6 values.
By array I mean a 2D array; sorry if that wasn't clear.
Sorry, I'm still thinking in Matlab a bit.
This is my loop solution. It's on a df, not numpy. Still, is there a faster way?
a = 0
for y in dfnames:
    df[y] = [float('nan') if x < minvalues[a] else x for x in df[y]]
    a = a + 1
df is the large array or dataframe.
dfnames are the column names I'm interested in.
minvalues are the minimum values for each column. I'm assuming the order is the same; a bad assumption, but it works for now.
I will appreciate any help making it better.
I think you just need
result = largearray.copy()
result[result < min_values] = np.nan  # broadcasting compares each column to its own minimum
That is, result is a copy of largearray, but any element less than the corresponding column of min_values is set to nan.
If you want to blank entire rows only when all entries in the row are less than the corresponding column of min_values, then you want:
result = largearray.copy()
result[np.all(result < min_values, axis=1)] = np.nan
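For example, here is a minimal sketch with made-up values (note the array must have a float dtype to hold nan):

import numpy as np

largearray = np.array([[1., 5., 9.],
                       [4., 2., 8.]])
min_values = np.array([2., 3., 10.])  # one minimum per column

result = largearray.copy()
result[result < min_values] = np.nan
print(result)
# [[nan  5. nan]
#  [ 4. nan nan]]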
I don't use numpy much, so this may not be the most common solution, but something like this works:
import numpy

largearray = numpy.array([[1, 2, 3], [3, 4, 5]])
minvalues = numpy.array([3, 4, 5])
largearray1 = [(float('nan') if not numpy.all(numpy.less(x, minvalues)) else x) for x in largearray]

The result should be: [array([1, 2, 3]), nan]
I have numpy array called data of dimensions 150x4
I want to create a new numpy array called mean of dimensions 3x4 by choosing random elements from data.
My current implementation is:
cols = data.shape[1]
K = 3
mean = np.zeros((K, cols))
for row in range(K):
    index = np.random.randint(data.shape[0])
    for col in range(cols):
        mean[row][col] = data[index][col]
Is there a faster way to do the same?
You can specify the number of random integers with the third argument of numpy.random.randint. Also, you should be familiar with numpy's indexing notation: you can access all the elements in one row with the : specifier.
mean = data[np.random.randint(0,len(data),3),:]
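Note that randint can pick the same row more than once; if you want three distinct rows, np.random.choice with replace=False is an alternative:

import numpy as np

data = np.random.randn(150, 4)  # stand-in for your 150x4 array
mean = data[np.random.choice(len(data), 3, replace=False), :]
print(mean.shape)  # (3, 4)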
Does anyone know how to combine integer indices in numpy? Specifically, I've got the results of a few np.wheres and I would like to extract the elements that are common between them.
For context, I am trying to populate a large 3d array with the number of elements that are between boundary values of each cell, i.e. I have records of individual events including their time, latitude and longitude. I want to grid this into a 3D frequency matrix, where the dimensions are time, lat and lon.
I could loop over the array elements doing an np.where(timeCondition & latCondition & lonCondition) and populate each cell with the length of the where result, but I figured this would be very inefficient, as you would have to repeat a lot of the wheres.
Would it be better to just have a list of wheres for each of the cells in each dimension, and then loop through, logically combining them?
As @ali_m said, using bitwise & should be much faster, but to answer your question:
Call ravel_multi_index() to convert the multi-dimensional indices into 1-D indices.
Call intersect1d() to get the indices that satisfy both conditions.
Call unravel_index() to convert the 1-D indices back into multi-dimensional indices.
Here is the code:
import numpy as np
a = np.random.rand(10, 20, 30)
idx1 = np.where(a>0.2)
idx2 = np.where(a<0.4)
ridx1 = np.ravel_multi_index(idx1, a.shape)
ridx2 = np.ravel_multi_index(idx2, a.shape)
ridx = np.intersect1d(ridx1, ridx2)
idx = np.unravel_index(ridx, a.shape)
np.allclose(a[idx], a[(a > 0.2) & (a < 0.4)])  # True: both select the same elements
or you can use ridx directly:
a.ravel()[ridx]
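For comparison, the bitwise & that @ali_m suggested combines the conditions up front, so a single where call is enough:

idx_and = np.where((a > 0.2) & (a < 0.4))  # same indices as idx above, computed directly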
I have a numpy array a, a.shape=(17,90,144). I want to find the maximum magnitude of each column of cumsum(a, axis=0), but retaining the original sign. In other words, if for a given column a[:,j,i] the largest magnitude of cumsum corresponds to a negative value, I want to retain the minus sign.
The code np.amax(np.abs(a.cumsum(axis=0))) gets me the magnitude, but doesn't retain the sign. Using np.argmax instead will get me the indices I need, which I can then plug into the original cumsum array. But I can't find a good way to do so.
The following code works, but is dirty and really slow:
max_mag_signed = np.zeros((90, 144))
indices = np.argmax(np.abs(a.cumsum(axis=0)), axis=0)
for j in range(90):
    for i in range(144):
        max_mag_signed[j, i] = a.cumsum(axis=0)[indices[j, i], j, i]
There must be a cleaner, faster way to do this. Any ideas?
I can't find any alternative to argmax, but at least you can speed it up with a more vectorized approach:
# store the cumsum, since it's used multiple times
cum_a = a.cumsum(axis=0)
# find the indices as before
indices = np.argmax(abs(cum_a), axis=0)
# construct the indices for the second and third dimensions
y, z = np.indices(indices.shape)
# get the values with np indexing
max_mag_signed = cum_a[indices, y, z]
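On numpy 1.15 or newer, np.take_along_axis can do the same gather in a single call (an equivalent alternative to the fancy indexing above):

# gather along axis 0 using the argmax indices (numpy >= 1.15)
max_mag_signed = np.take_along_axis(cum_a, indices[np.newaxis], axis=0)[0]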