Normalise max value of probability function for all frames - python

I have working code that plots a bivariate Gaussian distribution. The distribution is produced by adjusting the covariance (COV) matrix to account for specific variables. Specifically, every XY coordinate is assigned a radius. The COV matrix is then adjusted by a scaling factor to expand the radius in the x-direction and contract it in the y-direction; the direction of this scaling is given by theta. The output is expressed as a probability density function (PDF).
I have normalised the PDF values. However, I'm computing a separate PDF for each frame, so the maximum value changes and the probability is therefore transformed differently for each frame.
Question (following Prasanth's suggestion): is it possible to create normalized arrays for each frame before plotting, and then plot these arrays?
Below is the line I'm currently using to normalise the PDF for a single frame.
normPDF = (PDFs[0]-PDFs[1])/max(PDFs[0].max(),PDFs[1].max())

Is it possible to create normalized arrays for each frame before plotting, and then plot these arrays?
Indeed it is possible. In your case you probably need to rescale your arrays between two values, say -1 and 1, before plotting, so that the minimum becomes -1, the maximum 1, and the intermediate values are scaled accordingly.
You could also choose 0 and 1 (or any other pair) as minimum and maximum, but let's go with -1 and 1 so that the middle value is 0.
To do this, in your code replace:
normPDF = (PDFs[0]-PDFs[1])/max(PDFs[0].max(),PDFs[1].max())
with:
renormPDF = PDFs[0]-PDFs[1]
renormPDF -= renormPDF.min()
normPDF = (renormPDF * 2 / renormPDF.max()) -1
These three lines ensure that normPDF.min() == -1 and normPDF.max() == 1.
Now, when plotting the animation, the axis on the right of your image does not change from frame to frame.

Your problem is to find the maximum of PDFs[0].max() and PDFs[1].max() across all frames.
Why don't you run plotmvs on all your planned frames first, in order to find the absolute maxima for PDFs[0] and PDFs[1], and then run your animation using these absolute maxima to normalize your plots? This way, the colorbar will be the same for all frames.
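A minimal sketch of that two-pass idea, assuming a hypothetical helper compute_PDFs(frame) that returns the pair (PDFs[0], PDFs[1]) for a given frame, and that frames is the iterable you already animate over (your actual per-frame call will differ):

# First pass: find the absolute maximum over all frames.
# compute_PDFs(frame) is a hypothetical stand-in for however your code
# builds the two PDF arrays for one frame.
global_max = 0.0
for frame in frames:
    PDFs = compute_PDFs(frame)
    global_max = max(global_max, PDFs[0].max(), PDFs[1].max())

# Second pass: normalise every frame with the same global maximum, so the
# colour scale (and therefore the colorbar) is identical across frames.
norm_frames = []
for frame in frames:
    PDFs = compute_PDFs(frame)
    norm_frames.append((PDFs[0] - PDFs[1]) / global_max)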

Related

Clustering on evenly spaced grid points

I have a 50 by 50 grid of evenly spaced (x,y) points. Each of these points has a third scalar value. This can be visualized using a contour plot, which I have added. I am interested in the regions indicated by the red circles. These regions of low "Z-values" are what I want to extract from this data.
2D contour plot of 50 x 50 evenly spaced grid points:
I want to do this by using clustering (machine learning), which can be lightning quick when applied correctly. The problem, however, is that the points are evenly spaced, and therefore the point density of the dataset is the same everywhere.
I have tried using the DBSCAN algorithm with a custom distance metric that takes into account the Z value of each point. I have defined the distance between two points as follows:
import numpy as np

def custom_distance(point1, point2):
    # point1 and point2 are [x, y, z] rows
    average_Z = (point1[2] + point2[2]) / 2
    distance = np.sqrt(np.square(point1[0] - point2[0]) + np.square(point1[1] - point2[1]))
    distance = distance * average_Z
    return distance
This essentially computes the Euclidean distance between two points and multiplies it by the average of the two points' Z values. In the picture below I have tested this distance function in a DBSCAN algorithm. Each point in this 50 by 50 grid has a Z value of 1, except for four clusters that I have placed at random, whose points each have a Z value of 10. As can be seen below, the algorithm is able to find these clusters in the data based on their Z value.
DBSCAN clustering result using scalar value distance determination:
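For reference, a minimal sketch of how a callable metric like the custom_distance function above can be plugged into scikit-learn's DBSCAN (the eps and min_samples values are placeholders, not the settings used for the plots in this post):

import numpy as np
from sklearn.cluster import DBSCAN

# Build an (N, 3) array with rows [x, y, z] for the 50 x 50 grid.
xx, yy = np.meshgrid(np.arange(50), np.arange(50))
zz = np.ones((50, 50))  # placeholder Z values
points = np.column_stack([xx.ravel(), yy.ravel(), zz.ravel()])

# DBSCAN accepts a callable metric; each call receives two rows of `points`.
db = DBSCAN(eps=2.0, min_samples=5, metric=custom_distance)
labels = db.fit_predict(points)  # -1 marks noise, other integers are cluster ids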
Encouraged by these results, I tried to apply the approach to my actual data, only to be disappointed. Since the x and y values of my data are very large, I have simply rescaled them to the range 0 to 49; the Z values I have left untouched. The result of the clustering can be seen in the image below:
Clustering result on original data:
This does not come close to what I want or what I was expecting. For some reason the clusters that are found are rectangular, and the light regions of low Z values that I am interested in are not extracted with this approach.
Is there any way I can make the DBSCAN algorithm work for this? I suspect that the reason it is currently not working has something to do with the difference in scale between the x, y and Z values. I am also open to tips or recommendations on other approaches for defining and finding the lighter regions in the data.

How to check if a vector's histogram correlates with a uniform distribution?

I have a vector of floats V with values from 0 to 1. I want to create a histogram with some bin width, say A == 0.01, and check how close the resulting histogram is to a uniform distribution, obtaining a single value from zero to one, where 0 means it correlates perfectly and 1 means it does not correlate at all. For me, correlation here primarily means histogram shape.
How would one do such a thing in Python with NumPy?
You can create the histogram with np.histogram. Then you can generate the reference uniform histogram from the average of the previously computed histogram with np.mean. Finally, you can compare the two with a statistical test such as the Pearson coefficient, via scipy.stats.pearsonr.
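A minimal sketch along these lines, assuming V is a 1D NumPy array of values in [0, 1]. Because the uniform reference histogram is constant, the Pearson coefficient is degenerate against it, so this sketch scores closeness with a normalised total-variation distance instead: 0 means a perfectly flat histogram, and values approaching 1 mean far from uniform.

import numpy as np

def uniformity_score(V, bin_width=0.01):
    # Histogram of V over [0, 1] with the requested bin width.
    bins = np.linspace(0.0, 1.0, int(round(1.0 / bin_width)) + 1)
    counts, _ = np.histogram(V, bins=bins)

    # Observed bin proportions versus the flat (uniform) expectation.
    p = counts / counts.sum()
    uniform = np.full_like(p, 1.0 / len(p))

    # Total-variation distance: 0 for a perfectly flat histogram,
    # approaching 1 when all the mass sits in a single bin.
    return 0.5 * np.abs(p - uniform).sum()

V = np.random.rand(10_000)   # example data
print(uniformity_score(V))   # close to 0 for uniformly distributed data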

Calculate Divergence of Velocity Field (3D) in Python

I am trying to calculate the divergence of a 3D velocity field in a multi-phase flow setting (with solids immersed in a fluid). If we assume u, v, w to be the three velocity components (each an n x n x n 3D numpy array), here is the function I have for calculating the divergence:
import numpy as np

def calc_divergence_velocity(df, h=0.025):
    """
    :param df: A dataframe holding the entire vector field, with columns [x,y,z,u,v,w]:
               x,y,z are the 3D coordinates of each point in the field and u,v,w
               the velocities in the x,y,z directions respectively.
    :param h:  The length of a single side of the 3D (uniform) grid. Used as the
               spacing input to the numpy.gradient() function.
    """
    # Reshape dataframe columns to get 3D numpy arrays, so each of u, v, w is an
    # 80x80x80 ndarray.
    dim = 80
    u = df['u'].values.reshape((dim, dim, dim))
    v = df['v'].values.reshape((dim, dim, dim))
    w = df['w'].values.reshape((dim, dim, dim))

    # Note: only a scalar `h` is supplied to np.gradient because the grid is
    # uniform, with each grid cell having the same dimensions in the x, y, z
    # directions.
    u_grad = np.gradient(u, h, axis=0)  # central diff. du_dx
    v_grad = np.gradient(v, h, axis=1)  # central diff. dv_dy
    w_grad = np.gradient(w, h, axis=2)  # central diff. dw_dz

    # The `mask` column in the dataframe is a binary column indicating the
    # locations in the field where we are interested in measuring divergence.
    # The problem is multi-phase flow with solid particles and a fluid, hence
    # we are only interested in the fluid locations.
    sdf = df['mask'].values.reshape((dim, dim, dim))
    div = (u_grad * sdf) + (v_grad * sdf) + (w_grad * sdf)
    return div
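For completeness, a hypothetical usage sketch, assuming the dataframe has the columns named in the docstring plus the binary mask column (the file name and column layout here are assumptions for illustration only):

import pandas as pd

# df is expected to have 80**3 rows with columns x, y, z, u, v, w, mask.
df = pd.read_csv("velocity_field.csv")  # hypothetical input file
div = calc_divergence_velocity(df, h=0.025)
print(div.shape)  # (80, 80, 80)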
The problem I'm having is that the divergence values that I am seeing are far too high.
For example, the image below shows a distribution with values between [-350, 350], whereas most values should technically be close to zero, somewhere between [-20, 20] in my case. This tells me I'm calculating the divergence incorrectly, and I would like some pointers on how to correct the above function so that it calculates the divergence appropriately. As far as I can tell (please correct me if I'm wrong), I think I have done something similar to this upvoted SO response. Thanks in advance!

2D histogram coloured by "label fraction" of data in each bin

Following on from the post found here: 2D histogram coloured by standard deviation in each bin
I would like to colour each bin in a 2D grid by the fraction of points whose label values are below a certain threshold in Python.
Note that, in this dataset, each point has a continuous label value between 0-1.
For example here is a histogram I made whereby the colour denotes the standard deviation of label values of all points in each bin:
The way this was done was by using scipy.stats.binned_statistic_2d() (see: https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.binned_statistic_2d.html) and setting the statistic argument to 'std'.
But is there a way to change this kind of plot so that the colouring represents the fraction of points in each bin with a label value below, for example, 0.5?
It could be that the only way to do this is by explicitly defining a grid of some kind and calculating the fractions, but I'm not sure of the best way to do that, so any help on this matter would be greatly appreciated!
Maybe using scipy.stats.binned_statistic_2d or numpy.histogram2d, and being able to return the raw data values in each bin as a multi-dimensional array, would help in quickly computing the fractions explicitly.
The fraction of elements in an array below a threshold can be calculated as
fraction = lambda a, threshold: len(a[a<threshold])/len(a)
Hence you can call
scipy.stats.binned_statistic_2d(x, y, values, statistic=lambda a: fraction(a, 0.5))
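A more complete sketch of that idea, with made-up example data and a pcolormesh plot of the resulting fractions (the variable names, bin count and data here are illustrative, not taken from the original post):

import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import binned_statistic_2d

# Illustrative data: x, y positions and a continuous label in [0, 1] per point.
rng = np.random.default_rng(0)
x = rng.normal(size=10_000)
y = rng.normal(size=10_000)
labels = rng.random(10_000)

fraction = lambda a, threshold: len(a[a < threshold]) / len(a)

# Fraction of points with label < 0.5 in each bin (empty bins come back as NaN).
stat, x_edges, y_edges, _ = binned_statistic_2d(
    x, y, labels, statistic=lambda a: fraction(a, 0.5), bins=30)

# The statistic is indexed as [x_bin, y_bin], so transpose it for pcolormesh.
plt.pcolormesh(x_edges, y_edges, stat.T, vmin=0, vmax=1)
plt.colorbar(label='fraction of points with label < 0.5')
plt.show()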

How to obtain a correlogram using Moran's I values at different lag distances

I am new to calculating these values and am having a hard time figuring out how to calculate a (global?) Moran's I value for an increasing neighbour distance between points. Specifically, I'm not really sure how to set this lag/neighbour distance so that I can plot a correlogram.
The data I have describe the variation of a single parameter over a 2D list (matrix). This can be plotted simply as a colour plot, where the axes represent the points/pixels in each direction of the image and the colormap shows the value of this parameter for each box across the 2D surface. As the values seem to be clumping, I would like to see how large this 'parameter clump length' is using a correlogram.
So far I have managed to create another colour plot, which I don't know exactly how to interpret.
y = array_2D  # the 2D matrix of parameter values
w = pysal.lat2W(rows, cols, rook=False, id_type="float")
lm = pysal.Moran_Local(y, w)
moran_significance = np.reshape(lm.p_sim, np.shape(array_2D))
plt.imshow(moran_significance)
I have also managed to obtain the global Moran's I value by converting the 2D array into a 1D list.
y = list_1D  # the same values flattened to 1D
w = pysal.lat2W(rows, cols)
mi = pysal.Moran(y, w, two_tailed=False)
But what I am really looking for is how Moran's I changes with neighbour order n = 1, 2, 3, 4, ..., where n = 1 is the nearest neighbour, n = 2 the next nearest, and so on. Here is an example of what I'd like: https://creativesciences.files.wordpress.com/2015/05/morins-i-e1430616786173.png
