Counting points in grid cells in Python with np.histogramdd

I have a numpy array including the coordinates of the points in 3-dimensional space:
import numpy as np
testdata=np.array([[0.5,0.5,0.5],[0.6,0.6,0.6],[0.7,0.7,0.7],[1.5,0.5,0.5],[1.5,0.6,0.6],[0.5,1.5,0.5],[0.5,1.5,1.5]])
Each row holds the three coordinates (x, y, z) of one particle. There are 7 points in this example. Is there a Python package for gridding the 3D space and then counting the particles in each cell?
I tried np.histogramdd in this way:
xcoord=testdata[:,0]
ycoord=testdata[:,1]
zcoord=testdata[:,2]
xedg=[0,1,2]
yedg=[0,1,2]
zedg=[0,1,2]
histo=np.histogramdd([xcoord,ycoord,zcoord],bins=(xedg,yedg,zedg),range=[[0,2],[0,2],[0,2]])
It seems to work, but the indexing is strange: the array that np.histogramdd returns does not appear to be indexed in any meaningful way relative to the original coordinates. Is there another way to grid the 3D space and count the number of points in each cell?

Not sure if this is what you need, but you can use pandas.
import pandas as pd
coords = [[1,2,3],[4,5,6],[7,8,9]]
df_coords = pd.DataFrame(coords)
df_coords.count()
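On the indexing concern raised in the question: as far as the NumPy documentation describes it, the first array returned by np.histogramdd has one axis per coordinate, so counts[i, j, k] should be the number of points whose x falls in bin i, y in bin j and z in bin k. Below is a minimal sketch using the question's data; the names edges, counts, bin_edges and cell_index are illustrative and not from the original post.

```python
import numpy as np

testdata = np.array([[0.5, 0.5, 0.5], [0.6, 0.6, 0.6], [0.7, 0.7, 0.7],
                     [1.5, 0.5, 0.5], [1.5, 0.6, 0.6],
                     [0.5, 1.5, 0.5], [0.5, 1.5, 1.5]])
edges = [0, 1, 2]  # same bin edges on every axis, as in the question

# histogramdd accepts the (N, 3) array directly and returns (counts, edges)
counts, bin_edges = np.histogramdd(testdata, bins=(edges, edges, edges))

# counts[i, j, k] = number of points with
# edges[i] <= x < edges[i+1], edges[j] <= y < edges[j+1], edges[k] <= z < edges[k+1]
print(counts[0, 0, 0])  # cell [0,1) x [0,1) x [0,1)
print(counts[1, 0, 0])  # cell [1,2) x [0,1) x [0,1)

# Per-point cell index, for mapping each particle back to its cell.
# Assumption: no point lies exactly on the last edge, which would need special handling.
cell_index = np.stack(
    [np.digitize(testdata[:, d], edges) - 1 for d in range(3)], axis=1
)
print(cell_index)
```

Indexing counts with a row of cell_index (as a tuple) gives the occupancy of the cell that particle belongs to.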

Related

Histogram of 2D arrays and determine array which contains highest and lowest values

I have a 2D array of shape 5 and 10. So 5 different arrays with 10 values. I am hoping to get a histogram and see which array is on the lower end versus higher end of a histogram. Hope that makes sense. I am attaching an image of an example of what I mean (labeled example).
Looking for one histogram but the histogram is organized by the distribution of the highest and lowest of each array.
I'm having trouble doing this with Python. I tried a few ways of doing this:
# setting up 2d array
import numpy as np
import matplotlib.pyplot as plt
from scipy import signal
np.random.seed(1234)
array_2d = np.random.random((5,20))
I thought you could maybe just plot all the histograms of each array (5 of them) like this:
for i in range(5):
    plt.hist(signal.detrend(array_2d[i,:],type='constant'),bins=20)
plt.show()
And then looking to see which array's histogram is furthest to the right or left, but not sure if that makes too much sense...
Then also considered using .ravel to make the 2D array into a 1D array which makes a nice histogram. But all the values within each array are being shifted around so it's difficult to tell which array is on the lower or higher end of the histogram:
plt.hist(signal.detrend(array_2d.ravel(),type='constant'),bins=20)
plt.xticks(np.linspace(-1,1,10));
How might I get a histogram of the 5 arrays (shape 5, 10) and get the range of the arrays with the lowest values versus array with highest values?
Also please let me know if this is unclear or not possible at all too haha. Thanks!
Maybe you could use a kdeplot? This would replace each input value with a small Gaussian curve and sum them.
from matplotlib import pyplot as plt
import seaborn as sns
import pandas as pd
import numpy as np
np.random.seed(1234)
array_2d = np.random.random((5, 20))
sns.kdeplot(data=pd.DataFrame(array_2d.T, columns=range(1, 6)), palette='Set1', multiple='layer')
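If the goal is mainly to tell which array sits at the lower or higher end, a plot may not be needed. The sketch below is one possible reading of the question, not the answerer's method: it summarizes each row by its minimum, maximum and mean and sorts the rows by mean. The names row_min, row_max, row_mean and order are illustrative.

```python
import numpy as np

np.random.seed(1234)
array_2d = np.random.random((5, 20))

# Summary statistics per row (one row = one of the 5 arrays)
row_min = array_2d.min(axis=1)
row_max = array_2d.max(axis=1)
row_mean = array_2d.mean(axis=1)

# Row indices ordered from lowest to highest mean
order = np.argsort(row_mean)

for i in order:
    print(f"row {i}: min={row_min[i]:.3f} max={row_max[i]:.3f} mean={row_mean[i]:.3f}")
```

Sorting by the median or by row_min instead only requires changing the array passed to np.argsort.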

Python: Convert 2d point cloud to grayscale image

I have an array of variable length filled with 2D coordinate points (coming from a point cloud) that are distributed around (0,0), and I want to convert them into a 2D matrix (= grayscale image).
# have
array = [(1.0,1.1),(0.0,0.0),...]
# want
matrix = [[0,100,...],[255,255,...],...]
How would I achieve this using Python and numpy?
It looks like matplotlib.pyplot.hist2d is what you are looking for.
It bins your data into 2-dimensional bins (with a size of your choice).
See its documentation; a working example is given below.
import numpy as np
import matplotlib.pyplot as plt
data = [np.random.randn(1000), np.random.randn(1000)]
plt.scatter(data[0], data[1])
Then you can call hist2d on your data, for instance like this:
plt.hist2d(data[0], data[1], bins=20)
Note that the arguments of hist2d are two 1-dimensional arrays, so you will have to reshape your data a bit before feeding it to hist2d.
Quick solution using only numpy, without the need for matplotlib and therefore without plots:
import numpy as np
# given a 2D array "array" and a desired image shape "[x,y]"
# histogram2d returns (counts, xedges, yedges); the counts are the image
matrix, xedges, yedges = np.histogram2d(array[:,0], array[:,1], bins=[x,y])
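To get from bin counts to an actual 8-bit grayscale image, the counts still have to be scaled to the 0..255 range. A minimal sketch, assuming a random point cloud as stand-in data and an arbitrary image size; points, width, height and image are illustrative names, and linear scaling by the maximum count is just one possible choice.

```python
import numpy as np

rng = np.random.default_rng(0)
points = rng.normal(size=(1000, 2))  # hypothetical point cloud around (0, 0)
width, height = 48, 32               # assumed image size in pixels

# counts has shape (width, height): first axis is x, second axis is y
counts, xedges, yedges = np.histogram2d(points[:, 0], points[:, 1],
                                        bins=[width, height])

# Transpose so that rows correspond to y and columns to x,
# then scale linearly so the fullest bin maps to 255
image = counts.T
image = (255 * image / image.max()).astype(np.uint8)

print(image.shape, image.dtype)
```

Depending on the display convention, the rows may additionally need flipping (image[::-1]) so that larger y values appear at the top.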

scipy ND Interpolating over NaNs

I have been having trouble working out how to use the scipy.interpolate functions (either LinearNDInterpolator, griddata or, preferably, NearestNDInterpolator).
There are some tutorials online, but I am confused about what form my data needs to be in.
The online documentation for nearestND is terrible.
The function asks for:
x : (Npoints, Ndims) ndarray of floats
Data point coordinates.
y : (Npoints,) ndarray of float or complex
Data point values.
I have data in the form: lat,long,data,time held within an xarray dataset. There are some gaps in the data I would like to fill in.
I don't understand how to tell the function what my x points are.
I have tried (lat,long) as a tuple and np.meshgrid(lat,long), but can't seem to get it working.
Any help on how I can pass my lat,long coordinates into the function? Bonus points for including the time coordinates as well, to make the estimates more robust through the third dimension.
Thanks!
"i have tried (lat,long) as a tuple"
If lat and long are 1D arrays or lists, try this:
from scipy.interpolate import NearestNDInterpolator
points = np.array((lat, long)).T # make a 2D array of shape Npoints x 2
nd = NearestNDInterpolator(points, data)
Then you can compute interpolated values as nd(lat1, long1), etc.
SciPy provides multivariate interpolation methods both for unstructured data and for data points placed regularly on a grid. Unstructured data means the data can be provided as a list of unordered points. Your data appears to be structured: it is an array of size (480, 2040). However, NearestNDInterpolator works on unstructured data. The flatten method can be used to turn the array into a 1D list of values (of length 480*2040). The same has to be done for the coordinates: meshgrid gives the coordinates of every point of the grid, and flatten is again used to obtain a "list" of 2D coordinates (an array of shape 480*2040 x 2).
Here is an example that goes from structured to unstructured data:
import numpy as np
lat = np.linspace(2, 6, 10)
lon = np.linspace(5, 9, 14)
latM, lonM = np.meshgrid(lat, lon) # M is for Matrix
dataM = np.sin(latM)*np.cos(lonM) # example of data, Matrix form
from scipy.interpolate import NearestNDInterpolator
points = np.array((latM.flatten(), lonM.flatten())).T
print( points.shape )
# >>> (140, 2)
f_nearest = NearestNDInterpolator(points, dataM.flatten())
f_nearest(5, 5)
Working with NaNs should not be a big problem in this case, because a NaN is just a missing point in the list; the only catch is that the coordinates of the missing points have to be removed from the list too.
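The NaN handling described above can be sketched as follows. This is a minimal illustration that reuses the example grid and adds two artificial gaps; the names valid and filled and the positions of the gaps are assumptions made for the example.

```python
import numpy as np
from scipy.interpolate import NearestNDInterpolator

lat = np.linspace(2, 6, 10)
lon = np.linspace(5, 9, 14)
latM, lonM = np.meshgrid(lat, lon)
dataM = np.sin(latM) * np.cos(lonM)

# Punch two artificial holes into the data (hypothetical gaps)
dataM[3, 4] = np.nan
dataM[7, 2] = np.nan

# Build the interpolator from the valid points only:
# both the values and their coordinates are filtered with the same mask
valid = ~np.isnan(dataM)
f_nearest = NearestNDInterpolator(
    np.column_stack((latM[valid], lonM[valid])), dataM[valid]
)

# Fill each gap with the value of its nearest valid neighbour
filled = dataM.copy()
filled[~valid] = f_nearest(latM[~valid], lonM[~valid])

print(np.isnan(filled).sum())
```

Adding time as a third coordinate would mean stacking a third column into the points array, although the axes would probably need rescaling first so that distances in degrees and in time units are comparable.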

numpy create 2D mask from list of indices [+ then draw from masked array]

I have a 2-D array of values and need to mask certain elements of that array (with indices taken from a list of ~ 100k tuple-pairs) before drawing random samples from the remaining elements without replacement.
I need something that is both quite fast/efficient (hopefully avoiding for loops) and has a small memory footprint because in practice the master array is ~ 20000 x 20000.
For now I'd be content with something like (for illustration):
xys=[(1,2),(3,4),(6,9),(7,3)]
gxx,gyy=numpy.mgrid[0:100,0:100]
mask = numpy.where((gxx,gyy) not in set(xys)) # The bit I can't get right
# Now sample the masked array
draws=numpy.random.choice(master_array[mask].flatten(),size=40,replace=False)
Fortunately for now I don't need the x,y coordinates of the drawn fluxes - but bonus points if you know an efficient way to do this all in one step (i.e. it would be acceptable for me to identify those coordinates first and then use them to fetch the corresponding master_array values; the illustration above is a shortcut).
Thanks!
Linked questions:
Numpy mask based on if a value is in some other list
Mask numpy array based on index
Implementation of numpy in1d for 2D arrays?
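You can do it efficiently using a sparse COO matrix:
import numpy
from scipy import sparse
xys=[(1,2),(3,4),(6,9),(7,3)]
rows, cols = zip(*xys) # zip returns an iterator in Python 3, so unpack it into rows and columns
mask = sparse.coo_matrix((numpy.ones(len(rows), dtype=bool), (rows, cols)), shape=master_array.shape, dtype=bool)
# boolean indexing already returns a 1-D array, so no flatten() is needed
draws=numpy.random.choice(master_array[~mask.toarray()], size=10, replace=False)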
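Since the question stresses memory footprint and also asks for the coordinates of the drawn elements, here is an alternative sketch that avoids building a dense mask. It is a swapped-in technique, not the answerer's method: sample flat indices over the whole array without replacement, discard any that hit an excluded position, and keep the first n_draws. The small master_array and all variable names are stand-ins for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
master_array = rng.random((100, 100))  # hypothetical stand-in for the 20000 x 20000 array
xys = [(1, 2), (3, 4), (6, 9), (7, 3)]
n_draws = 40

# Flat (1-D) indices of the excluded positions
rows, cols = np.array(xys).T
flat_excluded = np.ravel_multi_index((rows, cols), master_array.shape)

# Draw enough distinct flat indices that at least n_draws survive
# even if every excluded position happens to be drawn
candidates = rng.choice(master_array.size,
                        size=n_draws + len(flat_excluded),
                        replace=False)
picked = candidates[~np.isin(candidates, flat_excluded)][:n_draws]

# Recover the 2-D coordinates and the corresponding values
draw_rows, draw_cols = np.unravel_index(picked, master_array.shape)
draws = master_array[draw_rows, draw_cols]

print(draws.shape)
```

The temporary arrays here scale with the number of draws plus the number of excluded positions rather than with the size of master_array.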

Numpy Index values for each element in 3D array

I have a 3D array created using the numpy mgrid command so that each element has a certain value and the indexes retain the spatial information. For example, if one summed over the z-axis (3rd dimension), then the resultant 2D array could be used in matplotlib with the function imshow() to obtain an image with different binned pixel values.
My question is: How can I obtain the index values for each element in this grid (a,b,c)?
I need to use the index values to calculate the relative angle of each point to the origin of the grid (e.g. theta = sin^-1(sqrt(x^2+y^2)/sqrt(x^2+y^2+z^2))).
Maybe this can be translated to another 3D grid where each element is the array [a,b,c]?
I'm not exactly clear on your meaning, but if you are looking for 3D arrays that contain the indices x, y, and z, then the following may suit your needs, assuming your data is held in a 3D array called "abc":
import numpy as nm
x,y,z = nm.mgrid[tuple(slice(dm) for dm in abc.shape)]
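With the index arrays available, the angle formula from the question can be evaluated for the whole grid at once. A minimal sketch, assuming a small placeholder array for abc and using np.indices, which yields the same index grids; the guard against the zero-length vector at the origin is an added assumption, since the formula is undefined there.

```python
import numpy as np

abc = np.zeros((3, 4, 5))  # hypothetical stand-in for the real data array

# One index array per axis, each with the same shape as abc
x, y, z = np.indices(abc.shape)

rho = np.sqrt(x**2 + y**2)       # distance from the z-axis
r = np.sqrt(x**2 + y**2 + z**2)  # distance from the origin

# theta = arcsin(rho / r); at the origin r == 0, so the ratio is set to 0 there
ratio = np.divide(rho, r, out=np.zeros_like(rho), where=r > 0)
theta = np.arcsin(np.clip(ratio, 0.0, 1.0))

print(theta.shape)
```

If an array whose elements are the triples [a, b, c] is preferred, np.stack((x, y, z), axis=-1) gives one of shape abc.shape + (3,).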
