I have a pretty big NumPy matrix (a 2-D array with more than 1000 × 1000 cells) and another 2-D array of indices in the following form: [[x1,y1],[x2,y2],...,[xn,yn]], which is also quite large (n > 1000). I want to extract all the cells in the matrix whose (x,y) coordinates appear in the index array, as efficiently as possible, i.e. without loops. If the array were an array of tuples I could just do
cells = matrix[array]
and get what I want, but the array is not in that format, and I couldn't find an efficient way to convert it to the desired form...
You can make your array into a tuple of arrays like this:
tuple(array.T)
This matches the format returned by np.where(), which can be used directly for indexing:
cells = matrix[tuple(array.T)]
You can also use standard NumPy integer-array indexing, which gives Divakar's answer from the comments:
cells = matrix[array[:, 0], array[:, 1]]
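Here is a minimal, self-contained sketch of both forms, with a small matrix standing in for the big one (the values are made up):
import numpy as np
matrix = np.arange(25).reshape(5, 5)        # stand-in for the 1000 x 1000 matrix
array = np.array([[0, 1], [2, 3], [4, 4]])  # [[x1,y1],...,[xn,yn]]
print(matrix[tuple(array.T)])               # [ 1 13 24]
print(matrix[array[:, 0], array[:, 1]])     # same result, no loops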
I am a bit new to Python and I have a large array of data whose shape is as follows:
print(mydata.shape)
>>> (487, 24, 12, 13)
A single innermost slice of this array looks like this:
[4.22617843e-11 5.30694273e-11 3.73693923e-11 2.09353628e-11
2.42581129e-11 3.87492538e-11 2.34626762e-11 1.87155829e-11
2.99512706e-11 3.32095254e-11 4.91165476e-11 9.57019117e-11
7.86496424e-11]]]]
I am trying to take all the elements from this multi-dimensional array and put them into a one-dimensional one so I can graph it.
I would appreciate any help. Thank you.
mydata.ravel()
will give you the array flattened (a view where possible, a copy only when needed) to shape:
(487*24*12*13,)
or...
(1823328,)
You can do this by using flatten()
mydata.flatten(order='C')
The order parameter controls how items are read:
order: the order in which items from the NumPy array will be read.
‘C’: read items from the array row-wise, i.e. using C-like index order.
‘F’: read items from the array column-wise, i.e. using Fortran-like index order.
‘A’: read items from the array based on the memory layout of its items.
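As a quick sketch of both flattening calls (the shape is from the question; the random data is just a placeholder):
import numpy as np
mydata = np.random.rand(487, 24, 12, 13)  # placeholder data, real shape
flat_view = mydata.ravel()                # a view where possible
flat_copy = mydata.flatten('C')           # always a copy
print(flat_view.shape)                    # (1823328,)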
I have a pandas DataFrame of lists, and each of the lists can be converted to a NumPy array with np.asarray(list). The shape of each resulting array should be (263, 300), so I do this:
a = dataframe.to_numpy()
# a.shape is (100000,)
output_array = np.array([])
for lst in a:
    # np.append copies the whole accumulated array on every iteration
    output_array = np.append(output_array, np.asarray(lst))
Since there are 100000 rows in my DataFrame, I expect to get
output_array.shape is (100000,263,300)
It works, but it takes a long time.
I want to know which part of my code costs the most and how to fix it.
Is there a more efficient method to achieve this? Thanks!
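As a hedged aside, assuming every list really converts to a (263, 300) array: np.stack allocates the result once, whereas np.append copies the whole accumulated array on every iteration, which is what makes the loop slow. A scaled-down sketch (100 rows instead of 100000):
import numpy as np
a = [np.random.rand(263, 300).tolist() for _ in range(100)]  # hypothetical stand-in rows
output_array = np.stack([np.asarray(lst) for lst in a])      # one allocation, no repeated copies
print(output_array.shape)                                    # (100, 263, 300)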
I am looking for a more efficient and cleaner way to achieve this; I can easily do it with a list comprehension.
I have a large 2-D NumPy array A of shape (20000, 500) and a 1-D NumPy vector B of shape (20000,).
I want to concatenate B with each column of A to create a list of length 500 where each element of the list is as follows:
np.hstack((A[:, [i]], B.reshape(-1, 1))) where i is in range(0, 500)
I am currently using list comprehension which makes it appear messy.
Edit:
Output is a list of length 500, with each element of the list an array of shape (20000, 2).
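A hedged sketch of one loop-free way to build that list, with the shapes scaled down: broadcast B across A's columns, stack along a new last axis, then split along the column axis.
import numpy as np
A = np.random.rand(2000, 50)  # scaled-down stand-in for 20000 x 500
B = np.random.rand(2000)
stacked = np.stack((A, np.broadcast_to(B[:, None], A.shape)), axis=-1)  # (2000, 50, 2)
result = list(stacked.transpose(1, 0, 2))  # 50 arrays of shape (2000, 2)
print(len(result), result[0].shape)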
I have written code that generates a large 3-D NumPy array of data observations (floats). The dimensions are (33,000 x 2016 x 53), which corresponds to (#obs.locations x 5min_intervals_perweek x weeks_in_leapyear). It is very sparse (about 1.5% of entries are filled).
Currently I do this by calling:
my3Darray = np.zeros((33000, 2016, 53))
or
my3Darray = np.empty((33000, 2016, 53))
My loop then indexes into the array one entry at a time and updates 1.5% with floats (this part is actually very fast). I then need to:
Save each 2D (33000 x 2016) slice as a CSV or other 'general format' data file
Take the mean over the 3rd dimension (so I should get a 33000 x 2016 matrix)
I have tried saving with:
for slice_2d_week_i in xrange(nweeks):
    weekfile = str(slice_2d_week_i)
    np.savetxt(weekfile, my3Darray[:, :, slice_2d_week_i], delimiter=",")
However, this is extremely slow and the empty entries in the output show up as
0.000000000000000000e+00
which makes the file sizes huge.
Is there a more efficient way to save this (possibly leaving blanks for entries that were never updated)? Is there a better way to allocate the array besides np.zeros or np.empty? And how can I take the mean over the 3rd dimension while ignoring the non-updated entries? (np.mean(my3Darray, axis=2) does not ignore the 0 entries.)
You can save in one of numpy's binary formats, here's one I use: np.savez.
You can average with np.sum(a, axis=2) / np.sum(a != 0, axis=2). Keep in mind that this will still give you NaNs wherever the denominator is zero, i.e. for cells that were never updated in any week.
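A minimal sketch of both suggestions on a scaled-down stand-in for the (33000, 2016, 53) array; np.savez_compressed and np.count_nonzero are my substitutions here (count_nonzero is equivalent to np.sum(a != 0)):
import numpy as np
a = np.zeros((330, 202, 53))             # scaled-down stand-in
mask = np.random.rand(*a.shape) < 0.015  # ~1.5% of entries filled
a[mask] = np.random.rand(mask.sum())
np.savez_compressed("weeks.npz", data=a)  # compact binary save
counts = np.count_nonzero(a, axis=2)      # updates per cell
with np.errstate(invalid="ignore", divide="ignore"):
    mean2d = a.sum(axis=2) / counts       # NaN where a cell was never updated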
Does anyone know how to combine integer indices in numpy? Specifically, I've got the results of a few np.wheres and I would like to extract the elements that are common between them.
For context, I am trying to populate a large 3d array with the number of elements that are between boundary values of each cell, i.e. I have records of individual events including their time, latitude and longitude. I want to grid this into a 3D frequency matrix, where the dimensions are time, lat and lon.
I could loop over the cells, doing an np.where(timeCondition & latCondition & lonCondition) for each and populating it with the length of the where result, but I figured this would be very inefficient, as you would have to repeat a lot of the wheres.
Would it be better to just have a list of wheres for the cells in each dimension, and then loop through, logically combining them?
As @ali_m said, using bitwise & should be much faster, but to answer your question:
Call np.ravel_multi_index() to convert each multi-dimensional index into a flat (1-D) index.
Call np.intersect1d() to get the flat indices that satisfy both conditions.
Call np.unravel_index() to convert the flat indices back into multi-dimensional indices.
Here is the code:
import numpy as np
a = np.random.rand(10, 20, 30)
idx1 = np.where(a > 0.2)                       # tuple of index arrays
idx2 = np.where(a < 0.4)
ridx1 = np.ravel_multi_index(idx1, a.shape)    # flat (1-D) indices
ridx2 = np.ravel_multi_index(idx2, a.shape)
ridx = np.intersect1d(ridx1, ridx2)            # flat indices in both
idx = np.unravel_index(ridx, a.shape)          # back to index arrays
np.allclose(a[idx], a[(a > 0.2) & (a < 0.4)])  # True
Or you can use ridx directly:
a.ravel()[ridx]
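For the original gridding goal (counting events per time/lat/lon cell), a hedged aside: np.histogramdd bins all events in a single call, avoiding per-cell wheres entirely. A minimal sketch with made-up event data and bin edges:
import numpy as np
events = np.column_stack((np.random.rand(5000) * 168,        # time (hours in a week)
                          np.random.rand(5000) * 180 - 90,   # latitude
                          np.random.rand(5000) * 360 - 180)) # longitude
edges = (np.linspace(0, 168, 25),
         np.linspace(-90, 90, 19),
         np.linspace(-180, 180, 37))
counts, _ = np.histogramdd(events, bins=edges)
print(counts.shape)  # (24, 18, 36) frequency matrix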