This is probably a very easy fix, but I cannot think of any way to solve this issue.
I have a three-column array with 2000 rows, where the columns represent x, y, and z coordinates:
Final_array = np.zeros([2000, 3])
Through some for loops I am trying to populate this array's columns. I was able to populate the first 1000 rows of the second column (the y values) with the contents of another array, y_coords1, but I don't know how to populate the remaining 1000 rows from another array, y_coords2. Can someone please help?
# putting y values
for y in range(len(y_coords1)):
    Final_array[y, 1] = y_coords1[y]

for w in Final_array[999:2000]:
    for y in range(len(y_coords2)):
        Final_array[y, 1] = y_coords2[y]

print(Final_array)
I have tried some variations on the for loops but I get errors.
I was looking for a different approach to answer this question, but the solution provided by Ulises Bussi solved the issue:
Final_array[:, 1] = np.concatenate([y_coords1, y_coords2], axis=0)
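For completeness, a minimal runnable sketch of that fix, with made-up stand-ins for the real y_coords1 and y_coords2:

import numpy as np

# hypothetical stand-ins for the real coordinate data
y_coords1 = np.arange(1000, dtype=float)
y_coords2 = np.arange(1000, 2000, dtype=float)

Final_array = np.zeros([2000, 3])

# second column: first 1000 rows from y_coords1, last 1000 from y_coords2
Final_array[:, 1] = np.concatenate([y_coords1, y_coords2], axis=0)

print(Final_array[:3])   # top rows come from y_coords1
print(Final_array[-3:])  # bottom rows come from y_coords2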
Looking to print the minimum values of numpy array columns.
I am using a loop to do this. The array has shape (20, 3), and I want to find the minimum value of each column, starting with the first (i.e. col_value = 0).
I have coded:
col_value = 0
for col_value in X:
    print(X[:, col_value].min)
    col_value += 1
However, it comes up with an error:
"arrays used as indices must be of integer (or boolean) type"
How do I fix this?
Let me suggest an alternative approach that you might find useful: numpy's min() has an axis argument that you can use to find the minimum values along various dimensions.
Example:
import numpy as np

X = np.random.randn(20, 3)
print(X.min(axis=0))
This prints a numpy array with the minimum value of each column of X.
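If you do want an explicit loop, iterating over integer column indices avoids the "arrays used as indices must be of integer (or boolean) type" error; a minimal sketch (note the parentheses on min(), which the original code was missing):

import numpy as np

X = np.random.randn(20, 3)
for col_value in range(X.shape[1]):  # integer column indices 0, 1, 2
    print(X[:, col_value].min())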
You don't need col_value=0 nor do you need col_value+=1.
import numpy

x = numpy.array([1, 23, 4, 6, 0])
print(x.min())
EDIT: Sorry, I didn't see that you wanted to iterate through the columns.
import numpy as np

X = np.array([[1, 2], [3, 4]])
for col in X.T:
    print(col.min())
Transposing the matrix is one of the best solutions.
import numpy as np

X = np.array([[11, 2, 14],
              [5, 15, 7],
              [8, 9, 20]])
X = X.T  # transposing the array
for i in X:
    print(min(i))
I have a Pandas dataframe with two columns, x and y, that correspond to a large signal. It is about 3 million rows in size.
[Figure: wavelength signal plotted from the dataframe]
I am trying to isolate the peaks from the signal. After using scipy, I got a 1D Python list of the indices of the peaks. However, they are not the actual x-values of the signal, just the index of the corresponding row:
from scipy.signal import find_peaks
peaks, _ = find_peaks(y, height=(None, peakline))
So, I decided I would just filter the original dataframe by setting all values in its y column to NaN unless they were at an index found in the peak list. I did this iteratively; however, since there are 3,000,000 rows, it is extremely slow:
peak_index = 0
for data_index in list(data.index):
    if data_index != peaks[peak_index]:
        data[data_index, 1] = float('NaN')
    else:
        peak_index += 1
Does anyone know what a faster method of filtering a Pandas dataframe might be?
Looping is extremely inefficient in most cases when it comes to pandas. Assuming you just need a filtered DataFrame that contains the values of both the x and y columns only where y is a peak, you may use the following piece of code:
df.iloc[peaks]
Alternatively, if you are hoping to retrieve the original DataFrame with the y column retaining its peak values and holding NaN otherwise, then please use:
df.y = df.y.where(df.y.iloc[peaks] == df.y.iloc[peaks])
(The comparison is True at the peak rows; when pandas aligns it back against the full index, every other row becomes NaN, which where treats as False.)
Finally, since you seem to care about just the x values of the peaks, you might just rework the first piece in the following way:
df.iloc[peaks].x
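As a further sketch (not from the original answer, and assuming peaks holds positional indices and df has a default RangeIndex), a boolean mask built with Index.isin avoids relying on that alignment trick:

mask = df.index.isin(peaks)     # True at peak rows
df['y'] = df['y'].where(mask)   # non-peak rows become NaN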
I have a numpy array of 4000*6 (6 columns), and a numpy row of minimum values with shape (1, 6) (made from another numpy array of 3000*6). I want to find everything in the large array that is below those values, comparing each value to its corresponding column.
I've tried the simple way, based on a one column solution I already had:
largearray = [float('nan') if x < min_values else x for x in largearray]
But sadly it didn't work :(.
I can do a for loop over each column and each value, but I was wondering if there is a faster, more elegant solution.
Thanks
EDIT: I'll try to rephrase: I have 6 values and 6 columns.
I want to find the values in each column that are lower than the corresponding one of the 6 values.
By "array" I mean a 2D array; sorry if that wasn't clear.
Sorry, I'm still thinking in Matlab a bit.
This is my loop solution. It's on a DataFrame, not numpy. Still, is there a faster way?
a = 0
for y in dfnames:
    df[y] = [float('nan') if x < minvalues[a] else x for x in df[y]]
    a = a + 1
df is the large array or dataframe.
dfnames are the column names I'm interested in.
minvalues are the minimum values for each column. I'm assuming the order is the same; a bad assumption, but it works for now.
I will appreciate any help making it better.
I think you just need
result = largearray.copy()
result[result < min_values] = np.nan
That is, result is a copy of largearray, but any element less than the corresponding column of min_values is set to nan.
If you want to blank entire rows only when all entries in the row are less than the corresponding column of min_values, then you want:
result = largearray.copy()
result[np.all(result < min_values, axis=1)] = np.nan
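A tiny self-contained demonstration of the broadcasting involved, with made-up numbers (note the array needs a float dtype for np.nan to be assignable):

import numpy as np

largearray = np.array([[1.0, 5.0, 9.0],
                       [4.0, 2.0, 9.0]])
min_values = np.array([[2.0, 3.0, 9.5]])  # shape (1, 3) broadcasts over rows

result = largearray.copy()
result[result < min_values] = np.nan
print(result)
# [[nan  5. nan]
#  [ 4. nan nan]]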
I don't use numpy much, so this may not be a commonly used solution, but the following works:
import numpy

largearray = numpy.array([[1, 2, 3], [3, 4, 5]])
minvalues = numpy.array([3, 4, 5])
largearray1 = [(float('nan') if not numpy.all(numpy.less(x, minvalues)) else x) for x in largearray]
The result should be: [[1, 2, 3], nan]
I am trying to create an array and then access its columns by name, so I came up with something like this:
import numpy as np

data = np.ndarray(shape=(3, 1000),
                  dtype=[('x', np.float64),
                         ('y', np.float64),
                         ('z', np.float64)])
I am confused as to why
data.shape
and
data['x'].shape
both come back as (3, 1000). This is causing me issues when I try to populate my data fields:
data['x'] = xvalues
where xvalues has a shape of (1000,). Is there a better way to do this?
The reason they come out the same is that data has a bit more structure than shape reveals.
Example:
data[0][0] returns:
(6.9182540632428e-310, 6.9182540633353e-310, 6.9182540633851e-310)
while data['x'][0][0] returns:
6.9182540632427993e-310
So data contains 3 rows and 1000 columns, and each element is a 3-tuple (one value per field). data['x'] is the first element of that tuple for every combination of the 3 rows and 1000 columns, so its shape is (3, 1000) as well.
Just set shape=(1000,). The triple dtype will create 3 columns.
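A minimal sketch of that fix, with hypothetical stand-in data for xvalues:

import numpy as np

xvalues = np.linspace(0.0, 1.0, 1000)  # made-up data of shape (1000,)

data = np.zeros(1000, dtype=[('x', np.float64),
                             ('y', np.float64),
                             ('z', np.float64)])
data['x'] = xvalues

print(data.shape)       # (1000,)
print(data['x'].shape)  # (1000,)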
I am looking for coding examples to learn Numpy.
The usage would be dtype='object'.
To construct the array, the code used would be
a = np.asarray(d, dtype='object')
not np.asarray(d) or np.asarray(d, dtype='float32').
Is sorting any different than with float32/64?
Coming from Excel "cell" equations, I'm wrapping my head around row/column math.
Ex:
A = np.array([['a', 2, 3, 4],
              ['b', 5, 6, 2],
              ['c', 5, 1, 5]], dtype='object')
Create a new array with:
1. How would I sort high to low by column [3]?
2. How would I calc for an entire column, (1,1) - (1,0)? Example without sorting A:
['b', 3],
['c', 0]
3. How would I calc for the entire array, (1,1) - (2,0)? Example without sorting A:
['b', 2],
['c', -1]
Despite the fact that I still cannot understand exactly what you are asking, here is my best guess. Let's say you want to sort A by the values in the 3rd column:
import numpy as np

A = np.array([['a', 2, 3, 4], ['b', 5, 6, 2], ['c', 5, 1, 5]], dtype='object')
ii = np.argsort(A[:, 2])
print(A[ii, :])
Here the rows have been sorted according to the 3rd column, but the order within each row is unchanged.
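Since the question asks for high to low and argsort is ascending, reversing the index array is one way to flip the order (a small sketch building on the code above):

print(A[ii[::-1], :])  # rows ordered from the highest 3rd-column value down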
Subtracting entire columns is a problem due to the string objects; however, if you exclude them, you can for example subtract the 3rd row from the 1st by:
A[0, 1:] - A[2, 1:]
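With the example A above, this would yield array([-3, 2, -1], dtype=object), computed element-wise from the sample values.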
If I didn't understand the basic point of your question, then please revise it. I highly recommend you take a look at the numpy tutorial and documentation if you have not done so already:
http://docs.scipy.org/doc/numpy/reference/
http://docs.scipy.org/doc/numpy/user/