I'm trying to plot via:
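import matplotlib.pyplot as plt
import seaborn as sns
# etas and vs are my 1-D data arrays (not shown here)
g = sns.jointplot(x=etas, y=vs, marginal_kws=dict(bins=100), space=0)
g.ax_joint.set_xscale('log')
g.ax_joint.set_yscale('log')
g.ax_joint.set_xlim(0.01)
g.ax_joint.set_ylim(0.01)
g.ax_joint.set_xlabel(r'$\eta$')
g.ax_joint.set_ylabel("V")
# savefig has no figsize argument; the figure size comes from jointplot's height= parameter
plt.savefig("simple_scatter_plot_Seanborn.png", dpi=150)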
This leaves me with the following image:
This is not what I want. Why are the histograms filled in at the low end? There are no data points there, so I don't get it...
You're setting a log scale on the matplotlib axes, but by the time you do that, seaborn has already computed the histograms, using bins of equal width in linear space. On a log axis those bins appear to have very different widths: the lowest bin covers a narrow range of actual values, but it takes up a lot of space on the log-scaled horizontal axis.
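To see why the lowest bar looks so wide, here is a small NumPy sketch. It uses hypothetical lognormal data as a stand-in for etas/vs, builds 100 equal-width linear bins (as the histogram is built before the axis is switched to log), and measures how many decades each bin spans once the axis is logarithmic.

```python
import numpy as np

# Hypothetical positive data spread over several decades (stand-in for etas / vs)
rng = np.random.default_rng(0)
data = rng.lognormal(mean=0.0, sigma=2.0, size=10_000)

# 100 equal-width bins in linear space, like the histogram computed before set_xscale('log')
lin_edges = np.histogram_bin_edges(data, bins=100)
# On-screen width of each bin once the axis is logarithmic, measured in decades
lin_decades = np.diff(np.log10(lin_edges))

# Bins that are equal-width in log space instead
log_edges = np.logspace(np.log10(data.min()), np.log10(data.max()), 101)
log_decades = np.diff(np.log10(log_edges))

print(f"first linear bin spans {lin_decades[0]:.2f} decades")
print(f"last linear bin spans  {lin_decades[-1]:.4f} decades")
print(f"every log bin spans    {log_decades[0]:.2f} decades")
```

The first linear bin stretches over several decades while the last covers only a sliver of one, which is the wide filled bar in the question's plot. Binning in log space instead gives every bar the same on-screen width.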
Tested in python 3.10, matplotlib 3.5.1, seaborn 0.11.2
Solution: pass log_scale=True to the histograms:
import seaborn as sns
# test dataset
planets = sns.load_dataset('planets')
g = sns.jointplot(data=planets, x="orbital_period", y="distance", marginal_kws=dict(log_scale=True))
Compare this with setting the scale after the plot has been created, without marginal_kws=dict(log_scale=True):
g = sns.jointplot(data=planets, x="orbital_period", y="distance")
g.ax_joint.set_xscale('log')
g.ax_joint.set_yscale('log')
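If you build the plot with plain Matplotlib instead, or want explicit control over the bins, a comparable fix is to give hist log-spaced bin edges and then switch the axis to log scale. This is a sketch with hypothetical data, not part of the seaborn solution:

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
data = rng.lognormal(mean=0.0, sigma=2.0, size=10_000)  # hypothetical positive data

# 50 bins spaced evenly in log10 space between the smallest and largest value
edges = np.logspace(np.log10(data.min()), np.log10(data.max()), 51)
edges[0], edges[-1] = data.min(), data.max()  # guard against round-off at the ends

fig, ax = plt.subplots()
counts, _, _ = ax.hist(data, bins=edges, ec="k")
ax.set_xscale("log")  # bars now have equal on-screen widths
```

Explicitly pinning the first and last edges to the data range ensures no point is dropped by floating-point round-off in the log/exp round trip.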
I am plotting from a pandas dataframe with commands like
fig1 = plt.hist(dataset_1[dataset_1>-1.0],bins=bins,alpha=0.75,label=label1,density=True)  # density=True replaces the old normed=True
and the plots comprise multiple histograms on one canvas. Each histogram is normalised to its own integral, so all the histograms have the same area; the purpose of the histograms is to illustrate the shapes of the datasets rather than their relative sizes. As a result, the numbers on the y axis are not meaningful. For now, I am suppressing y-axis labelling using
axes.set_ylabel("(Normalised to unity)")
axes.get_yaxis().set_ticks([])
Is there a way of adjusting the scaling of the y axis such that "1" corresponds to the highest value on any histogram? This would display a vertical scale to guide the eye and with which to judge the relative values of different bins. In essence, I mean re-normalising the maximum displayed y value without affecting the scaling of the histograms (i.e. decoupling the axis scale from what it represents).
You have two options:
Option 1: draw the histogram, then adjust the y-axis ticks.
Set a y tick at the location of the maximum and label it 1.
import numpy as np; np.random.seed(1)
import matplotlib.pyplot as plt
a = np.random.rayleigh(scale=3, size=2000)
hist, edges,_ = plt.hist(a, ec="k")
plt.yticks([0,hist.max()], [0,1])
plt.show()
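The question plots several density-normalised histograms on one canvas, so the tick trick needs the maximum over all of them. A minimal sketch of that, assuming two hypothetical datasets and shared bins; it keeps density=True (so the areas stay equal) and only relabels the ticks as fractions of the tallest bar:

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)
# Hypothetical stand-ins for dataset_1, dataset_2, ...
datasets = [rng.rayleigh(scale=3, size=2000), rng.normal(loc=6, scale=1.5, size=500)]
bins = np.linspace(0, 15, 31)

fig, ax = plt.subplots()
peaks = []
for i, d in enumerate(datasets):
    # density=True keeps every histogram at unit area
    h, _, _ = ax.hist(d, bins=bins, alpha=0.75, density=True, label=f"dataset {i + 1}")
    peaks.append(h.max())

global_max = max(peaks)           # tallest bar over *all* histograms
fractions = np.linspace(0, 1, 5)  # 0, 0.25, 0.5, 0.75, 1
ax.set_yticks(fractions * global_max)              # positions in density units
ax.set_yticklabels([f"{f:g}" for f in fractions])  # labels as fractions of the max
ax.set_ylabel("Relative height (tallest bar = 1)")
ax.legend()
```

Only the labels change here; the bars are still drawn in density units, so the histograms keep their equal areas.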
Option 2: normalize the histogram, then draw it to scale.
Normalize the histogram the way you want by first computing it with np.histogram, dividing it by its maximum, and then plotting it as a bar plot.
import numpy as np; np.random.seed(1)
import matplotlib.pyplot as plt
a = np.random.rayleigh(scale=3, size=2000)
hist, edges = np.histogram(a)
hist = hist/float(hist.max())
# with align="edge", x must be the LEFT bin edges, i.e. edges[:-1]
plt.bar(edges[:-1], hist, width=np.diff(edges), align="edge", ec="k")
plt.yticks([0,1])
plt.show()
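Dividing each histogram by its own maximum would make every curve peak at 1 and lose the equal-area property the question asks for. With several datasets, a sketch that keeps the areas equal divides all density-normalised histograms by one shared maximum; the datasets are hypothetical:

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)
# Hypothetical stand-ins for several datasets drawn on one canvas
datasets = [rng.rayleigh(scale=3, size=2000), rng.normal(loc=6, scale=1.5, size=500)]
edges = np.linspace(0, 15, 31)
widths = np.diff(edges)

# Unit-area histograms first, so the shapes stay comparable...
dens = [np.histogram(d, bins=edges, density=True)[0] for d in datasets]
# ...then divide all of them by ONE shared maximum, so the tallest bar is exactly 1
global_max = max(h.max() for h in dens)
scaled = [h / global_max for h in dens]

fig, ax = plt.subplots()
for i, h in enumerate(scaled):
    ax.bar(edges[:-1], h, width=widths, align="edge", alpha=0.75, ec="k", label=f"dataset {i + 1}")
ax.set_yticks([0, 0.5, 1])
ax.legend()
```

Because every histogram is divided by the same constant, their areas remain equal to each other (each is 1 / global_max).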
The output in both cases would be the same:
As explained in Joe Kington's answer to the question "How can I make a scatter plot colored by density in matplotlib?", I made a scatter plot colored by density. However, because of the complex distribution of my data, I would like to change the parameters used to calculate the density.
Here are the results with some fake data similar to mine:
I would like to calibrate the density calculation of gaussian_kde so that the left part of the plot looks like this:
I don't like the first plot because each group of points influences the density of the adjacent groups, which prevents me from analyzing the distribution within a group. In other words, even if each of the 8 groups had exactly the same distribution, that would not be visible on the graph.
I tried to modify the covariance_factor (as I once did for a 2D plot of density over x), but when gaussian_kde is used with multi-dimensional arrays it returns a numpy.ndarray, not a "scipy.stats.kde.gaussian_kde" object. Plus, I don't even know whether changing the covariance_factor would do it.
Here's my dummy code:
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import gaussian_kde
# Generate fake data
a = np.random.normal(size=1000)
b = np.random.normal(size=1000)
# Data for the first image
x = np.concatenate((a+10,a+10,a+20,a+20,a+30,a+30,a+40,a+40,a+80))
y = np.concatenate((b+10,b-10,b+10,b-10,b+10,b-10,b+10,b-10,b*4))
# Data for the second image
#x = np.concatenate((a+10,a+10,a+20,a+20,a+30,a+30,a+40,a+40))
#y = np.concatenate((b+10,b-10,b+10,b-10,b+10,b-10,b+10,b-10))
# Calculate the point density
xy = np.vstack([x,y])
z = gaussian_kde(xy)(xy)
# My unsuccessful attempt to modify the covariance, which would work in 1D with "z = gaussian_kde(x)"
#z.covariance_factor = lambda : 0.01
#z._compute_covariance()
# Sort the points by density, so that the densest points are plotted last
idx = z.argsort()
x, y, z = x[idx], y[idx], z[idx]
fig, ax = plt.subplots()
ax.scatter(x, y, c=z, s=50, edgecolor='none')  # 'none' disables the marker edges
plt.show()
The solution could use another density estimator; I don't mind.
The goal is to make a density plot like the ones shown above, where I can tune the density parameters.
I'm using python 3.4.3
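The covariance_factor attempt fails because z is already the array returned by calling the estimator; the gaussian_kde object itself is what carries the bandwidth settings. A sketch using SciPy's bw_method argument with the question's fake data (a scalar bw_method is used directly as the kernel factor, and kde.set_bandwidth does the same on an existing object); the value 0.05 is an arbitrary illustration:

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)
a = rng.normal(size=1000)
b = rng.normal(size=1000)
x = np.concatenate((a+10, a+10, a+20, a+20, a+30, a+30, a+40, a+40, a+80))
y = np.concatenate((b+10, b-10, b+10, b-10, b+10, b-10, b+10, b-10, b*4))
xy = np.vstack([x, y])

kde_default = gaussian_kde(xy)                 # bandwidth from Scott's rule
kde_narrow = gaussian_kde(xy, bw_method=0.05)  # scalar bw_method is used as kde.factor
# Equivalent on an existing object: kde_default.set_bandwidth(bw_method=0.05)

z = kde_narrow(xy)  # calling the object evaluates the density -> plain ndarray
idx = z.argsort()   # sort so the densest points are drawn last
fig, ax = plt.subplots()
sc = ax.scatter(x[idx], y[idx], c=z[idx], s=50, edgecolor="none")
fig.colorbar(sc, ax=ax, label="estimated density")
```

The kernel covariance is the whole dataset's covariance scaled by factor², so with this data the kernel is still wider in x than in y. If neighbouring groups still bleed into each other, a purely local estimate may suit better.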
Have you had a look at Seaborn? It's not exactly what you're asking for, but it already has functions for generating density plots:
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
# Generate fake data
a = np.random.normal(size=1000)
b = np.random.normal(size=1000)
# Data for the first image
x = np.concatenate((a+10, a+10, a+20, a+20, a+30, a+30, a+40, a+40, a+80))
y = np.concatenate((b+10, b-10, b+10, b-10, b+10, b-10, b+10, b-10, b*4))
# Recent seaborn versions expect keyword x=/y= arguments and no longer accept stat_func
sns.jointplot(x=x, y=y, kind="hex")
sns.jointplot(x=x, y=y, kind="kde")
plt.show()
It gives:
and
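Since the question says another density estimator is fine, one more option that keeps groups from influencing each other is to colour each point by the number of points in its 2-D histogram cell; the bin count then plays the role of the bandwidth. A sketch using np.histogram2d; hist2d_density is a hypothetical helper name:

```python
import numpy as np

def hist2d_density(x, y, bins=50):
    """Hypothetical helper: per-point count of points sharing its 2-D histogram cell."""
    counts, xedges, yedges = np.histogram2d(x, y, bins=bins)
    # np.digitize gives 1-based bin numbers; the clip puts points equal to the
    # maximum edge into the last bin, matching np.histogram2d's closed last bin
    ix = np.clip(np.digitize(x, xedges) - 1, 0, counts.shape[0] - 1)
    iy = np.clip(np.digitize(y, yedges) - 1, 0, counts.shape[1] - 1)
    return counts[ix, iy]

# Usage with data shaped like the question's: finer bins give a more local density
rng = np.random.default_rng(0)
a, b = rng.normal(size=1000), rng.normal(size=1000)
x = np.concatenate((a + 10, a + 20, a + 30))
y = np.concatenate((b + 10, b - 10, b + 10))
z = hist2d_density(x, y, bins=(80, 40))
```

The result can be passed as c=z to ax.scatter exactly like the KDE values; points only "see" neighbours in their own cell, at the cost of a blockier estimate.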
I'm new to Python and having some trouble with matplotlib. I currently have data contained in two numpy arrays, call them x and y, which I am plotting on a scatter plot with coordinates (x, y) for each point (i.e. I have points x[0], y[0] and x[1], y[1] and so on on my plot). I have been using the following code segment to color the points in my scatter plot based on the spatial density of nearby points (I found this in another Stack Overflow post):
http://prntscr.com/abqowk
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import gaussian_kde
x = np.random.normal(size=1000)
y = x*3 + np.random.normal(size=1000)
xy = np.vstack([x,y])
z = gaussian_kde(xy)(xy)
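idx = z.argsort()
# Sort the points by density so the densest points are drawn last (on top)
x, y, z = x[idx], y[idx], z[idx]
fig,ax = plt.subplots()
ax.scatter(x,y,c=z,s=50,edgecolor='none')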
plt.show()
Output:
I've been using it without being sure exactly how it works (namely the point density calculation); if someone could explain exactly how that works, that would also be much appreciated.
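As for how the density is computed: gaussian_kde places a 2-D Gaussian "bump" on every data point and averages them, so z at a point is high when many other points lie nearby. The sketch below rebuilds z by hand from kde.covariance (the data covariance scaled by kde.factor², with Scott's rule as the default factor) and checks it against SciPy; it uses a smaller n because it builds an n×n array:

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)
x = rng.normal(size=300)
y = x * 3 + rng.normal(size=300)
xy = np.vstack([x, y])                    # shape (2, n): one column per point

kde = gaussian_kde(xy)
z_scipy = kde(xy)

# Manual estimate: mean over all data points j of a Gaussian centred on point j
cov = kde.covariance                      # kernel covariance = data covariance * factor**2
inv_cov = np.linalg.inv(cov)
norm = 1.0 / (2 * np.pi * np.sqrt(np.linalg.det(cov)))     # 2-D Gaussian normalising constant
diff = xy[:, :, None] - xy[:, None, :]    # diff[:, i, j] = point i minus point j
mahal2 = np.einsum("aij,ab,bij->ij", diff, inv_cov, diff)  # squared Mahalanobis distances
z_manual = (norm * np.exp(-0.5 * mahal2)).mean(axis=1)
```

A larger factor gives wider bumps and a smoother density; a smaller one makes each point's value depend only on its close neighbours.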
However, now I'd like to color code by the ratio of the spatial density of points in x,y to that of the spatial density of points in another set of numpy arrays, call them x2, y2. That is, I would like to make a plot such that I can identify how the density of points in x,y compares to the points in x2,y2 on the same scatter plot. Could someone please explain how I could go about doing this?
Thanks in advance for your help!
I've been trying to do the same thing based on that same earlier post, and I think I just figured it out! The trick is to use matplotlib.colors.Normalize() to define a scale and then weight it according to some data set (xnorm,ynorm):
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.colors as mplc
import matplotlib.cm as cm
from scipy.stats import gaussian_kde
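def kdeplot(x,y,xnorm,ynorm):
    xy = np.vstack([x,y])
    z = gaussian_kde(xy)(xy)
    # weight = number of points in (x, y) relative to the number in (xnorm, ynorm)
    wt = 1.0*len(x)/(len(xnorm)*1.0)
    # colour scale; the 8 was found by trial and error for my data and colormap
    norm = mplc.Normalize(vmin=0, vmax=8/wt)
    cmap = cm.gnuplot
    idx = z.argsort()
    x, y, z = x[idx], y[idx], z[idx]
    args = (x,y)
    kwargs = {'c':z,'s':10,'edgecolor':'none','cmap':cmap,'norm':norm}
    return args, kwargs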
# (x1,y1) is some data set whose density map coloring you
# want to scale to (xnorm,ynorm)
args,kwargs = kdeplot(x1,y1,xnorm,ynorm)
plt.scatter(*args,**kwargs)
I used trial and error to optimize my normalization for my particular data and choice of colormap. Here's what my data looks like scaled to itself; here's my data scaled to some comparison data (which is on the bottom of that image).
I'm not sure this method is entirely general, but it works in my case: I know that my data and the comparison data are in similar regions of parameter space, and they both have Gaussian scatter, so I can use a naive linear scaling determined by the number of data points, and the result gives the right idea visually.
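For the ratio the question actually asks about, another option is to fit one KDE per dataset and divide the two densities evaluated at the (x, y) points. A sketch with hypothetical data; density_ratio is an illustrative helper, and the log colour scale centred on 0 (equal density) is a design choice, not a requirement:

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
from scipy.stats import gaussian_kde

def density_ratio(x, y, x2, y2):
    """Hypothetical helper: KDE of (x, y) divided by KDE of (x2, y2), at the (x, y) points."""
    pts = np.vstack([x, y])
    kde1 = gaussian_kde(pts)
    kde2 = gaussian_kde(np.vstack([x2, y2]))
    return kde1(pts) / kde2(pts)

rng = np.random.default_rng(0)
x = rng.normal(size=1000)
y = 3 * x + rng.normal(size=1000)
x2 = rng.normal(size=800)
y2 = 3 * x2 + rng.normal(scale=2, size=800)

r = np.log10(density_ratio(x, y, x2, y2))  # 0 means equal density
fig, ax = plt.subplots()
sc = ax.scatter(x, y, c=r, cmap="coolwarm", vmin=-1, vmax=1, s=10, edgecolor="none")
fig.colorbar(sc, ax=ax, label="log10(density ratio)")
```

Both estimates integrate to one, so this compares the shapes of the two distributions; multiply the ratio by len(x) / len(x2) if you care about raw point counts instead.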
I have a problem changing my axis labels in Matplotlib: I want to change the radial axis options in my polar plot.
Basically, I'm computing the distortion of a cylinder, which is nothing but how much the radius deviates from the original (perfectly circular) cylinder. Some of the distortion values are negative, while some are positive due to tensile and compressive forces. I'm looking for a way to represent this graphically in cylindrical coordinates, so I thought that a polar plot was my best bet. Excel gives me a 'radar chart' option which is flexible enough to let me specify minimum and maximum radial axis values. I want to replicate this in Python using Matplotlib.
My Python script for plotting on polar coordinates is as follows.
#!/usr/bin/env python
import matplotlib.pyplot as plt
import numpy as np
x = np.arange(-180.0,190.0,10)
theta = (np.pi/180.0 )*x # in radians
offset = 2.0
R1 = [-0.358,-0.483,-0.479,-0.346,-0.121,0.137,0.358,0.483,0.479,0.346,0.121,\
-0.137,-0.358,-0.483,-0.479,-0.346,-0.121,0.137,0.358,0.483,0.479,0.346,0.121,\
-0.137,-0.358,-0.483,-0.479,-0.346,-0.121,0.137,0.358,0.483,0.479,0.346,0.121,\
-0.137,-0.358]
fig1 = plt.figure()
ax1 = fig1.add_axes([0.1,0.1,0.8,0.8],polar=True)
ax1.set_rmax(1)
ax1.plot(theta,R1,lw=2.5)
plt.show()
My plot looks as follows:
But this is not how I want to present it. I want to vary my radial axis, so that I can show the data as a deviation from some reference value, say -2. How do I ask Matplotlib in polar coordinates to change the minimum axis label? I can do this VERY easily in Excel. I choose a minimum radial value of -2, to get the following Excel radar chart:
In Python, I can easily offset my input data by 2. My new dataset is called R2:
#!/usr/bin/env python
import matplotlib.pyplot as plt
import numpy as np
x = np.arange(-180.0,190.0,10)
theta = (np.pi/180.0 )*x # in radians
offset = 2.0
R2 = [1.642,1.517,1.521,1.654,1.879,2.137,2.358,2.483,2.479,2.346,2.121,1.863,\
1.642,1.517,1.521,1.654,1.879,2.137,2.358,2.483,2.479,2.346,2.121,1.863,1.642,\
1.517,1.521,1.654,1.879,2.137,2.358,2.483,2.479,2.346,2.121,1.863,1.642]
fig2 = plt.figure()
ax2 = fig2.add_axes([0.1,0.1,0.8,0.8],polar=True)
ax2.plot(theta,R2,lw=2.5)
ax2.set_rmax(1.5*offset)
plt.show()
The plot is shown below:
Once I get this, I can MANUALLY add axis labels and hard-code it into my script. But this is a really ugly way. Is there any way I can directly get a Matplotlib equivalent of the Excel radar chart and change my axis labels without having to manipulate my input data?
You can just set the axis limits the normal way, with set_ylim; the centre of the polar plot then corresponds to the lower limit (-2 here):
#!/usr/bin/env python
import matplotlib.pyplot as plt
import numpy as np
x = np.arange(-180.0,190.0,10)
theta = (np.pi/180.0 )*x # in radians
offset = 2.0
R1 = [-0.358,-0.483,-0.479,-0.346,-0.121,0.137,0.358,0.483,0.479,0.346,0.121,\
-0.137,-0.358,-0.483,-0.479,-0.346,-0.121,0.137,0.358,0.483,0.479,0.346,0.121,\
-0.137,-0.358,-0.483,-0.479,-0.346,-0.121,0.137,0.358,0.483,0.479,0.346,0.121,\
-0.137,-0.358]
fig1 = plt.figure()
ax1 = fig1.add_axes([0.1,0.1,0.8,0.8],polar=True)
ax1.set_ylim(-2,2)
ax1.set_yticks(np.arange(-2,2.5,0.5))  # stop at 2.5 so that 2 is included
ax1.plot(theta,R1,lw=2.5)
plt.show()
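To check that set_ylim gives the same picture as manually offsetting the data, this sketch plots a hypothetical stand-in for R1 both ways and compares where the points land in axes coordinates (0 to 1 across each subplot):

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

theta = np.deg2rad(np.arange(-180.0, 190.0, 10))
R1 = 0.483 * np.sin(3 * theta)  # hypothetical stand-in with roughly the same range as R1

fig = plt.figure()
ax_lim = fig.add_subplot(1, 2, 1, polar=True)
ax_lim.plot(theta, R1)
ax_lim.set_ylim(-2, 2)          # plot centre now represents r = -2

ax_off = fig.add_subplot(1, 2, 2, polar=True)
ax_off.plot(theta, R1 + 2)      # the manually offset data (R2 in the question)
ax_off.set_ylim(0, 4)

# Map data coordinates to axes coordinates for each subplot
to_axes_lim = ax_lim.transData - ax_lim.transAxes
to_axes_off = ax_off.transData - ax_off.transAxes
pts_lim = to_axes_lim.transform(np.column_stack([theta, R1]))
pts_off = to_axes_off.transform(np.column_stack([theta, R1 + 2]))
print(np.allclose(pts_lim, pts_off))
```

The curves should land in the same place; only the radial tick labels differ, which is what the Excel radar chart's minimum-value setting does.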