How to plot joint probability distribution of three elements? - python

I have three columns of data. I used numpy.histogramdd to estimate the joint probability distribution of these variables, with 100 bins per dimension, so the result is a 100x100x100 array. Now I want to plot it with matplotlib. What is the best way to do that?
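One possible approach, offered as a sketch rather than an answer from the thread: a 100x100x100 array is hard to read in a single figure, so a common option is to sum out one axis at a time and show the three pairwise marginals as 2-D heat maps. The function names joint_and_marginals and plot_marginals, and the synthetic data, are assumptions made for illustration.

```python
import numpy as np


def joint_and_marginals(data, bins=100):
    """Return the normalized 3-D histogram, its bin edges, and the pairwise marginals."""
    hist, edges = np.histogramdd(data, bins=bins)
    joint = hist / hist.sum()  # bin counts -> probabilities that sum to 1
    # Summing over one axis leaves the joint distribution of the other two.
    marginals = {
        (0, 1): joint.sum(axis=2),
        (0, 2): joint.sum(axis=1),
        (1, 2): joint.sum(axis=0),
    }
    return joint, edges, marginals


def plot_marginals(marginals, edges):
    """Draw one heat map per pair of columns (imports matplotlib only when called)."""
    import matplotlib.pyplot as plt

    fig, axes = plt.subplots(1, 3, figsize=(12, 4))
    for ax, ((i, j), m) in zip(axes, marginals.items()):
        # pcolormesh expects C[row, col] with rows along y, hence the transpose.
        ax.pcolormesh(edges[i], edges[j], m.T)
        ax.set_xlabel(f"column {i}")
        ax.set_ylabel(f"column {j}")
    fig.tight_layout()
    return fig


# Hypothetical stand-in for the three columns of data.
rng = np.random.default_rng(0)
data = rng.normal(size=(5000, 3))
joint, edges, marginals = joint_and_marginals(data, bins=20)
```

Calling plot_marginals(marginals, edges) and then matplotlib's plt.show() displays the three panels. A 3-D scatter of the non-empty bins, colored by probability, is another option when the full structure matters.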

Related

Histogram with same frequencies

I am plotting a histogram, with another set of data, but the frequencies are all 1, no matter how I change the number of bins. I did this with data generated from a normal distribution in the following fashion
x=npr.normal(0,2,(1,100))
plt.hist(x,bins=10)
and I get the following histogram:
This happens even if I increase the number of simulations to 1000 or 10000.
How do I plot a histogram that displays the bell shape of the normal distribution?
Thanks in advance.
You are plotting one histogram for each column of your input array. Since x has shape (1, 100), that means 100 histograms with a single value each.
import numpy.random as npr
import matplotlib.pyplot as plt
x = npr.normal(0, 2, (1, 100))
plt.hist(x[0], bins=10)
will do (note that I am selecting the first (and only) row of x).
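As a small sketch of the same idea, the 2-D sample can be flattened to 1-D before binning, which works for any shape. The variable names here are illustrative; np.histogram is used so the bin counts can be inspected without drawing anything.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(0, 2, (1, 100))   # same shape as in the question: 1 row, 100 columns

flat = x.ravel()                 # 1-D array with all 100 samples
counts, edges = np.histogram(flat, bins=10)

# plt.hist(flat, bins=10) would draw these same counts as a single histogram.
print(counts)
```

Drawing the sample as a 1-D array from the start, for example rng.normal(0, 2, 100), avoids the problem altogether.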

How to check if a vector histogram correlates with uniform distribution?

I have a vector of floats V with values from 0 to 1. I want to build a histogram with some bin width, say A = 0.01, and check how close the resulting histogram is to a uniform distribution. The result should be a single value between 0 and 1, where 0 means it matches perfectly and 1 means it does not match at all. By correlation I mean, first of all, similarity of histogram shape.
How would one do this in Python with numpy?
You can create the histogram with np.histogram. Then build the reference uniform histogram by giving every bin the mean bin count, obtained with np.mean. Finally, compare the two with a statistical measure such as the Pearson coefficient, available as scipy.stats.pearsonr.
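One caveat: a perfectly flat reference histogram has zero variance, so a Pearson coefficient against it is undefined. As an alternative (a swapped-in measure, not the one named above), the sketch below uses the total variation distance between the bin proportions and the uniform proportions, rescaled so that 0 means perfectly uniform and 1 means all values fall in one bin. The function name uniformity_distance is hypothetical, and the code assumes V is non-empty with values in [0, 1].

```python
import numpy as np


def uniformity_distance(v, bin_width=0.01):
    """Return 0 for a perfectly flat histogram, 1 when every value is in one bin."""
    n_bins = int(round(1.0 / bin_width))
    counts, _ = np.histogram(v, bins=n_bins, range=(0.0, 1.0))
    p = counts / counts.sum()                  # observed proportion per bin
    # Total variation distance to the uniform proportions 1 / n_bins.
    tv = 0.5 * np.abs(p - 1.0 / n_bins).sum()
    # The largest possible value is 1 - 1/n_bins, so rescale to [0, 1].
    return tv / (1.0 - 1.0 / n_bins)


# Hypothetical sample: values drawn uniformly should score close to 0.
rng = np.random.default_rng(0)
score = uniformity_distance(rng.uniform(0.0, 1.0, size=100_000))
```

With few samples per bin the score stays noticeably above 0 even for truly uniform data, so it is best compared across vectors of similar length.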

How to visualize correlation of discrete data using scatter_matrix in python?

for attribute in ['alcohol', 'chlorides', 'density']:
    compare = wine_data[["quality", attribute]]
    plot = pp.scatter_matrix(compare)
    plt.show()
I got the following graphs. Quality is an integer in the range 0-10, while ['alcohol','chlorides','density'] are continuous data. The correlations between ['alcohol','chlorides','density'] and quality are 0.432733, -0.305599 and -0.207202, respectively. How do I interpret the three graphs below? Is there a better way to visualize the correlation of discrete data?
I prefer Seaborn's regplot function, which draws the same scatterplot you see here with a regression line on top of it. The regression line helps you see whether the correlation is positive or negative (upward or downward sloping), and the shaded band around it shows the confidence interval of the fit.
https://seaborn.pydata.org/generated/seaborn.regplot.html
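To make the link between the regression line and the correlation concrete, here is a minimal numpy-only sketch: it fits the straight line by least squares and computes the Pearson coefficient, whose sign always matches the sign of the slope. The function name and the synthetic quality/alcohol data are assumptions standing in for the wine data, which is not shown in the question.

```python
import numpy as np


def line_and_correlation(x, y):
    """Least-squares line y = slope * x + intercept, plus the Pearson coefficient."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    slope, intercept = np.polyfit(x, y, 1)   # degree-1 polynomial fit
    r = np.corrcoef(x, y)[0, 1]
    return slope, intercept, r


# Hypothetical stand-in data: integer quality scores and a noisy continuous attribute.
rng = np.random.default_rng(0)
quality = rng.integers(3, 9, size=200)
alcohol = 9.0 + 0.4 * quality + rng.normal(0.0, 1.0, size=200)

slope, intercept, r = line_and_correlation(quality, alcohol)
print(f"slope={slope:.3f}, intercept={intercept:.3f}, r={r:.3f}")
```

A wide scatter around a clearly sloped line, as with correlations near 0.4 or -0.3, means the trend is real but explains only part of the variation.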

Find density of points from their scatter plot in python

Can I go through equal boxed areas in the scatter plot, so I can calculate how many there are on average on each box?
Or, is there a specific function in python to calculate this?
I don't want a colored density plot, but a number that represents the density of these points in the scatter plot.
Here is for example a plot of the eigenvalues of a random matrix:
How would I find their density?
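import numpy as np
from scipy import linalg as la

e = la.eigvals(my_matrix)  # my_matrix is the random matrix from the question
# Count the eigenvalues falling in each cell of a 40x40 grid over the complex plane
hist, xedges, yedges = np.histogram2d(e.real, e.imag, bins=40)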
So in this case, 'hist' would be a 40x40 array (since bins=40). Its elements are the number of eigenvalues for each bin.
Thanks to @jepio and @plonser for the comments.
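Building on the bin counts, a single density figure can be obtained by averaging the counts per box, or by dividing each count by its box area to get points per unit area. This is a sketch under assumptions: the function name point_density is hypothetical, and a random 200x200 Gaussian matrix stands in for the matrix in the question.

```python
import numpy as np


def point_density(x, y, bins=40):
    """Return the mean number of points per box and the per-box density (points per unit area)."""
    counts, xedges, yedges = np.histogram2d(x, y, bins=bins)
    # Area of each box: outer product of the bin widths along x and y.
    cell_area = np.outer(np.diff(xedges), np.diff(yedges))
    density = counts / cell_area
    return counts.mean(), density


# Hypothetical stand-in for the random matrix in the question.
rng = np.random.default_rng(0)
matrix = rng.normal(size=(200, 200))
eigenvalues = np.linalg.eigvals(matrix)

mean_count, density = point_density(eigenvalues.real, eigenvalues.imag)
print(mean_count)
```

Because the grid spans the bounding box of the points, empty corner boxes pull the average down; restricting the average to boxes inside the region of interest gives a more representative number.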

How to calculate difference between sparse histogram in Python

I am using 3D histogram to compare a superpixel region in two images taken from a video sequence. Based on histogram threshold, I want to classify regions as similar or dissimilar.
I was using chi square distance to compare the two histograms. But I saw that chi square distance should be used only for dense histograms.
My histogram is sparse with a lot of bins containing zero entries.
Can you suggest the best way to compare these histograms in Python?
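The thread gives no answer here, so the following is only a sketch of two measures that tolerate empty bins, not a recommendation from the source: histogram intersection, and a symmetric chi-square distance that skips bins empty in both histograms. The function names are hypothetical, and whether either measure suits superpixel matching would need checking on real data.

```python
import numpy as np


def normalize(h):
    """Flatten a histogram of any dimension and scale it to sum to 1."""
    h = np.asarray(h, dtype=float).ravel()
    total = h.sum()
    return h / total if total > 0 else h


def histogram_intersection(h1, h2):
    """Similarity in [0, 1]: 1 for identical shapes, 0 for no overlapping bins."""
    p, q = normalize(h1), normalize(h2)
    return np.minimum(p, q).sum()


def chi_square_nonempty(h1, h2):
    """Symmetric chi-square distance in [0, 1], ignoring bins empty in both histograms."""
    p, q = normalize(h1), normalize(h2)
    mask = (p + q) > 0               # avoids 0/0 on jointly empty bins
    return 0.5 * np.sum((p[mask] - q[mask]) ** 2 / (p[mask] + q[mask]))


# Hypothetical sparse 3-D histograms (mostly zeros).
a = np.zeros((8, 8, 8))
b = np.zeros((8, 8, 8))
a[1, 2, 3], a[4, 4, 4] = 5, 5
b[1, 2, 3], b[7, 0, 0] = 5, 5

print(histogram_intersection(a, b), chi_square_nonempty(a, b))
```

A threshold on either value could then separate similar from dissimilar regions; the threshold itself would have to be tuned on labeled examples.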
