Plotting CDF for Discrete Variable - Step Plot with Alternating Lines


I would like to draw a graph that looks like:
The data is given in a .csv file, which I already imported to data and used as x in the graph.
y is calculated as follows:
y = np.arange(1, len(data)+1)/len(data)
And then plotted using:
plt.step((data), y)
plt.show(block=False)
My problem is that the result looks like an ordinary step graph, like this one (not my actual data).
How do I format this to look like the one mentioned, i.e. coming from the left on the y = 0 line and extending on the right on the y = 1 line, dotted vs. solid lines and points on the graph?
I have googled around and found many solutions for the graph that I already have, but I would like to format it as mentioned.
I'm new to the general subject, so if the setup is wrong, any help there is appreciated as well!
Thanks!

I'm not sure if step() can do this; it's really just a few lines of code wrapped around plt.plot().
Alternatively, you could use vlines() and hlines(). The slicing logic varies with how you want the steps "configured" (that is, how you would specify the where argument to step()), but here is a close reproduction of the example from your question:
import numpy as np
import matplotlib.pyplot as plt
data = np.arange(0, 7)
y = np.array([.07, .21, .42, .68, 1.])
yn = np.insert(y, 0, 0)
fig, ax = plt.subplots()
ax.set_facecolor('white')
# https://matplotlib.org/api/_as_gen/matplotlib.axes.Axes.hlines.html
ax.hlines(y=yn, xmin=data[:-1], xmax=data[1:],
          color='red', zorder=1)
# https://matplotlib.org/api/_as_gen/matplotlib.axes.Axes.vlines.html
ax.vlines(x=data[1:-1], ymin=yn[:-1], ymax=yn[1:], color='red',
          linestyle='dashed', zorder=1)
ax.scatter(data[1:-1], y, color='red', s=18, zorder=2)
ax.scatter(data[1:-1], yn[:-1], color='white', s=18, zorder=2,
           edgecolor='red')
ax.grid(False)
ax.set_xlim(data[0], data[-1])
ax.set_ylim([-0.01, 1.01])
zorder makes sure the scatter points are overlaid on top of lines.
As it stands, the way you create y doesn't really follow suit with what the image you showed looks like, but this example tries to mimic the image itself.
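If the goal is to derive the step heights from raw observations rather than hard-code them, one option is to count how often each distinct value occurs and take a cumulative sum. The sketch below assumes data is a 1-D collection of discrete observations. The helper names discrete_cdf and plot_discrete_cdf and the pad parameter are made up for illustration and are not part of matplotlib.

```python
import numpy as np
import matplotlib.pyplot as plt


def discrete_cdf(data):
    """Return the sorted distinct values and the CDF evaluated at each of them."""
    values, counts = np.unique(data, return_counts=True)
    cdf = np.cumsum(counts) / counts.sum()
    return values, cdf


def plot_discrete_cdf(data, ax=None, pad=1.0, color='red'):
    """Draw a right-continuous step CDF with hollow/filled markers at each jump."""
    if ax is None:
        _, ax = plt.subplots()
    values, cdf = discrete_cdf(data)
    # level of the CDF just left of each jump, followed by the final level 1
    levels = np.insert(cdf, 0, 0.0)
    # x-coordinates bounding each horizontal segment, padded on both sides
    edges = np.concatenate(([values[0] - pad], values, [values[-1] + pad]))
    ax.hlines(levels, edges[:-1], edges[1:], colors=color, zorder=1)
    ax.vlines(values, levels[:-1], levels[1:], colors=color,
              linestyles='dashed', zorder=1)
    # filled marker: value attained at the jump; hollow marker: left limit
    ax.scatter(values, cdf, color=color, s=18, zorder=2)
    ax.scatter(values, levels[:-1], facecolors='white', edgecolors=color,
               s=18, zorder=2)
    ax.set_xlim(edges[0], edges[-1])
    ax.set_ylim(-0.01, 1.01)
    return ax


values, cdf = discrete_cdf([1, 2, 2, 3])
ax = plot_discrete_cdf([1, 2, 2, 3])
```

The padding makes the curve enter from the left on the y = 0 line and leave on the right on the y = 1 line. Call plt.show() to display the figure.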

Related

How to show only the outline of a bar plot matplotlib

I'm plotting data as a bar plot in matplotlib and am trying to only show the outline of the bars, so that it appears as a 'stepped graph' of the data.
I've added my code below along with an image of the desired output.
plt.bar(x, y, align='center', width=0.1, edgecolor='black', color='none')
The plot I have:
The plot I would like:
Are there any other libraries that may be able to produce this? The bar keyword arguments don't seem to have anything that can.

Your image looks like a function that is horizontal around each x,y value. The following code simulates this:
for every x,y: create two new points one at x-0.5 and one at x+0.5, both with the same y
to close the shape at the ends, add (x[0]-0.5, 0) at the start and (x[-1]+0.5, 0) at the end.
import numpy as np
from matplotlib import pyplot as plt
x = np.arange(0, 30, 1)
y = np.random.uniform(2, 10, 30)
xs = [x[0] - 0.5]
ys = [0]
for i in range(len(x)):
    xs.append(x[i] - 0.5)
    xs.append(x[i] + 0.5)
    ys.append(y[i])
    ys.append(y[i])
xs.append(x[-1] + 0.5)
ys.append(0)
plt.plot(xs, ys, color='dodgerblue')
# optionally color the area below the curve
plt.fill_between(xs, 0, ys, color='gold')
PS: @AsishM. mentioned in the comments that matplotlib also has its own step function. If that function meets your needs, please use it. If you need extra control or variation, such as coloring the area below the curve or handling the shape at the ends, this answer could give a start.
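The same outline can also be built without an explicit loop. The following is a minimal sketch, assuming the x values are evenly spaced and the bar width equals that spacing. The function name bar_outline and its width parameter are hypothetical, not a matplotlib API.

```python
import numpy as np
import matplotlib.pyplot as plt


def bar_outline(x, y, width=1.0):
    """Return the (xs, ys) vertices tracing the outline of bars centred on x."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # left edge of every bar plus the right edge of the last one
    edges = np.append(x - width / 2, x[-1] + width / 2)
    xs = np.repeat(edges, 2)
    # each height appears twice (left and right corner); zeros close the shape
    ys = np.concatenate(([0.0], np.repeat(y, 2), [0.0]))
    return xs, ys


xs, ys = bar_outline([0, 1], [2, 3])
fig, ax = plt.subplots()
ax.plot(xs, ys, color='dodgerblue')
```

For two bars of height 2 and 3 this produces the same vertex sequence as the loop above: up from zero, across the first bar, a step to the second bar, and back down to zero.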

Resizing Twinned axis for plotting of different type of data versus a common parameter

I am trying to plot some data. It is recorded as different parameters against a common parameter. For example, say time on X Axis and Temperature, Wind Speed etc. on Y Axis.
I have been using Origin to do my plotting. In Origin, such manipulations are carried out using Layers. Each axis on a different layer can share the X or Y axis from the host, and it can be resized so that the plot is drawn on some percentage of the actual area. Please see the attached sketch.
I am trying to achieve something similar in python to shift to open-source. After some reading, I found out twinx() and twiny() are two axes functions that can clone/share an axis. Spines can be moved as required through spine['pos'].set_position().
My problem is that, although the X axis is shared, I cannot resize the Y axis of each parameter individually. When I use set_position, the whole figure changes. I would appreciate a solution from the community. I am including a snippet of representative code.
import matplotlib.pyplot as plt
fig = plt.figure(figsize = (8,11))
host = fig.add_axes([0.1,0.1,0.7,0.8])
host.plot([1,2,3],[1,4,9])
parm1 = host.twinx()
#parm1.set_position([0.1,0.1,0.8,0.5]) #this doesn't work
parm1.spines['right'].set_position(('axes',1.1))
..
..

I do not think there is an existing function in matplotlib that can do what you want in one call. However, with a little bit of work, you can achieve something like that:
import matplotlib.pyplot as plt
import numpy as np
fig, (ax, bx) = plt.subplots(nrows=2, ncols=1, sharex=True)
bx.yaxis.set_label_position('right')
bx.yaxis.tick_right()
ax.spines['bottom'].set_visible(False)
bx.spines['top'].set_visible(False)
fig.subplots_adjust(hspace=0)
bx.set_xlabel('X Axis')
ax.set_ylabel('Cosine')
bx.set_ylabel('Sine')
x = np.linspace(-np.pi, np.pi, 100)
ax.plot(x, np.cos(x), color='xkcd:avocado', linestyle='dashdot', linewidth=3)
bx.plot(x, np.sin(x), color='xkcd:purple', linestyle='dotted', linewidth=3)
plt.show()
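
If the stacked panels should take up different fractions of the figure height, one possibility is to pass height ratios through the gridspec_kw argument of plt.subplots. This is only a sketch of that idea. The 1:2 ratio is an arbitrary choice for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt

# the top panel gets one third of the height, the bottom panel two thirds
fig, (ax, bx) = plt.subplots(
    nrows=2, ncols=1, sharex=True,
    gridspec_kw={'height_ratios': [1, 2], 'hspace': 0},
)
x = np.linspace(-np.pi, np.pi, 100)
ax.plot(x, np.cos(x))
bx.plot(x, np.sin(x))

# heights of the two panels in figure coordinates
top_height = ax.get_position().height
bottom_height = bx.get_position().height
```

Setting hspace inside gridspec_kw has the same effect here as calling fig.subplots_adjust(hspace=0) afterwards.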

Scatter plot over 2D-histogram in matplotlib with log-scale

I have two sets of points with values (x, y). One is enormous (300k) and one is small (2k). I want to show a scatter plot of the latter over a 2D-histogram of the former in log-log scale. plt.xscale('log')-like commands keep messing up the histogram and when I just take logs of x's and y's and then do all the plotting, my ticks are say -3 not 10^-3 and the pretty logarithmic minor ticks are missing altogether. What's the most elegant solution in matplotlib? Do I have to dig into the artist layer?

If you forgive a bit of self-advertisement, you may use my library physt (see https://github.com/janpipek/physt). Then, you can write code like this:
import numpy as np
import matplotlib.pyplot as plt
from physt import h2
# Data
r1 = np.random.normal(0, 1, 20000)
r2 = np.random.normal(0, .3, 20000) + r1
x = np.exp(r1)
y = np.exp(r2)
# Plot scatter
fig, ax = plt.subplots()
ax.scatter(x[:1000], y[:1000], s=2)
H = h2(x, y, "exponential")
H.plot(ax=ax, zorder=-1) # Necessary to put behind
Which, I hope, is the solution to your problem.
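
For comparison, something similar can be sketched with NumPy and matplotlib alone. This is a different technique from the physt call above: the bin edges are spaced geometrically with np.geomspace so that the bins look uniform once the axes are switched to a log scale. The number of bins (40 per axis) and the random test data are assumptions made for the example.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
r1 = rng.normal(0, 1, 20000)
r2 = rng.normal(0, 0.3, 20000) + r1
x, y = np.exp(r1), np.exp(r2)

# bin edges evenly spaced in log space, covering the full data range
x_edges = np.geomspace(x.min(), x.max(), 41)
y_edges = np.geomspace(y.min(), y.max(), 41)
counts, _, _ = np.histogram2d(x, y, bins=(x_edges, y_edges))

fig, ax = plt.subplots()
# histogram2d indexes counts as [x, y]; pcolormesh expects rows along y
mesh = ax.pcolormesh(x_edges, y_edges, counts.T, zorder=-1)
ax.scatter(x[:1000], y[:1000], s=2, color='black')
ax.set_xscale('log')
ax.set_yscale('log')
fig.colorbar(mesh, ax=ax, label='count')
```

Because the data stay in their original units, the log axes keep their usual 10^n tick labels and minor ticks.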

Plotting KDE with logarithmic x-data in Matplotlib

I want to plot a KDE for data that covers a large range of x-values, so I want to use a logarithmic scale for the x-axis. For plotting I was using seaborn and the solution from Plotting 2D Kernel Density Estimation with Python, both of which fail once I set the xscale to logarithmic. When I take the logarithm of my x-data beforehand, everything looks fine, except that the ticks and tick labels are still linear, with the logarithm of the actual values as the labels. I could manually change the ticks using something like:
labels = np.array(ax.get_xticks().tolist(), dtype=np.float64)
new_labels = [r'$10^{%.1f}$' % (labels[i]) for i in range(len(labels))]
ax.set_xticklabels(new_labels)
but in my eyes that just looks wrong and is nothing like the axis labels (including the minor ticks) I would get if I just used
ax.set_xscale('log')
Is there an easier way to plot a KDE with logarithmic x-data? Or is it possible to change just the tick or label scale without changing the scaling of the data, so that I could plot the logarithmic values of x and change the scaling of the labels afterwards?
Edit:
The plot I want to create looks like this:
The two right columns are what it is supposed to look like. There I used the x data with the logarithm already applied. I don't like the labels on the x-axis, though.
The left column shows the plots when the original data is used for the KDE and all the other plots, and the scale is changed afterwards using
ax.set_xscale('log')
For some reason the KDE does not look like it is supposed to. This is not a result of erroneous data either, since it looks just fine if the logarithmic data is used.
Edit 2:
A working example of code is
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
data = np.random.multivariate_normal((0, 0), [[0.8, 0.05], [0.05, 0.7]], 100)
x = np.power(10, data[:, 0])
y = data[:, 1]
fig, ax = plt.subplots(2, 1)
sns.kdeplot(data=np.log10(x), data2=y, ax=ax[0])
sns.kdeplot(data=x, data2=y, ax=ax[1])
ax[1].set_xscale('log')
plt.show()
The ax[1] plot is not displayed correctly for me (the x-axis is inverted), but the general behavior is the same as for the case described above. I believe the problem lies with the bandwidth of the kde, which should probably account for the logarithmic x-data.

I found an answer that works for me and wanted to post it in case someone else has a similar problem.
Based on the accepted answer from this post, I defined a function that first applies the logarithm to the x-data and, after the KDE has been performed, transforms the x-values back to the original values. Afterwards I can simply plot the contours and use ax.set_xscale('log').
import numpy as np
import scipy.stats as st
def logx_kde(x, y, xmin, xmax, ymin, ymax):
    x = np.log10(x)
    # Perform the kernel density estimate
    xx, yy = np.mgrid[xmin:xmax:100j, ymin:ymax:100j]
    positions = np.vstack([xx.ravel(), yy.ravel()])
    values = np.vstack([x, y])
    kernel = st.gaussian_kde(values)
    f = np.reshape(kernel(positions).T, xx.shape)
    return np.power(10, xx), yy, f
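
A possible way to call this function and draw the result is sketched below. The function is repeated so that the example runs on its own. One thing to watch, as far as I can tell from the code: the grid is built after the log10 transform, so xmin and xmax have to be given in log10 units. The limits of -3 to 3 and the random test data are assumptions for illustration.

```python
import numpy as np
import scipy.stats as st
import matplotlib.pyplot as plt


def logx_kde(x, y, xmin, xmax, ymin, ymax):
    """Same function as above, repeated so this example is self-contained."""
    x = np.log10(x)
    xx, yy = np.mgrid[xmin:xmax:100j, ymin:ymax:100j]
    positions = np.vstack([xx.ravel(), yy.ravel()])
    kernel = st.gaussian_kde(np.vstack([x, y]))
    f = np.reshape(kernel(positions).T, xx.shape)
    return np.power(10, xx), yy, f


rng = np.random.default_rng(0)
data = rng.multivariate_normal((0, 0), [[0.8, 0.05], [0.05, 0.7]], 100)
x = np.power(10, data[:, 0])
y = data[:, 1]

# x limits are in log10 units, so the grid spans 10**-3 to 10**3
xx, yy, f = logx_kde(x, y, -3, 3, -3, 3)

fig, ax = plt.subplots()
ax.contour(xx, yy, f)
ax.set_xscale('log')
```

The returned xx is already back in the original units, which is why the contours line up with a log-scaled x-axis.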

Time series plotted with imshow

I tried to make the title as clear as possible although I am not sure it is completely limpid.
I have three series of data (number of events over time). I would like to make subplots where the three time series are represented. Attached is the best I could come up with. The last time series is significantly shorter, which is why it is not visible here.
I'm also adding the corresponding code so you can better understand what I'm trying to do and advise me on the proper/smart way to do it.
import numpy as np
import matplotlib.pyplot as plt
x=np.genfromtxt('nbr_lig_bound1.dat')
x1=np.genfromtxt('nbr_lig_bound2.dat')
x2=np.genfromtxt('nbr_lig_bound3.dat')
# doing so because imshow requires a 2D array
# best way I found and probably not the proper way to get it done
x=np.expand_dims(x, axis=0)
x=np.vstack((x,x))
x1=np.expand_dims(x1, axis=0)
x1=np.vstack((x1,x1))
x2=np.expand_dims(x2, axis=0)
x2=np.vstack((x2,x2))
fig, ax = plt.subplots(nrows=3, ncols=1, figsize=(6,6), sharex=True)
# hoping that this would compensate for sharex shrinking my X range to
# the shortest array
ax[0].set_xlim(1,24)
ax[1].set_xlim(1,24)
ax[2].set_xlim(1,24)
fig.subplots_adjust(hspace=0.001) # this seems to have no effect
p1=ax[0].imshow(x1[:,::10000], cmap='autumn_r')
p2=ax[1].imshow(x2[:,::10000], cmap='autumn_r')
p3=ax[2].imshow(x[:,::10000], cmap='autumn')
Here is what I could reach so far:
and here is a sketch of what I wish to have, since I could not find an example on the web. In short, I would like to remove the blank space around the plotted data in the two upper graphs. As a more general question, I would like to know whether imshow is the best way of obtaining such a plot (cf. intended results below).

Using fig.subplots_adjust(hspace=0) sets the vertical (height) space between subplots to zero but doesn't adjust the vertical space within each subplot. By default, plt.imshow uses an aspect ratio (rc image.aspect) that is usually set so that pixels are square, which lets you reproduce images accurately. To change this, use aspect='auto' and adjust the ylim of your axes accordingly.
For example:
import numpy as np
import matplotlib.pyplot as plt
# you don't need all the `expand_dims` and `vstack`ing. Use `reshape`
x0 = np.linspace(5, 0, 25).reshape(1, -1)
x1 = x0**6
x2 = x0**2
fig, axes = plt.subplots(3, 1, sharex=True)
fig.subplots_adjust(hspace=0)
for ax, x in zip(axes, (x0, x1, x2)):
    ax.imshow(x, cmap='autumn_r', aspect='auto')
    ax.set_ylim(-0.5, 0.5) # alternatively pass extent=[0, 1, 0, 24] to imshow
    ax.set_xticks([]) # remove all xticks
    ax.set_yticks([]) # remove all yticks
plt.show()
yields
To add a colorbar, I recommend looking at this answer which uses fig.add_axes() or looking at the documentation for AxesDivider (which I personally like better).
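
As a rough illustration of the AxesDivider route, the sketch below uses make_axes_locatable from mpl_toolkits.axes_grid1 to carve a narrow axes off the right edge of each subplot and put a colorbar in it. The size and pad values are arbitrary choices, and the data are the same made-up arrays as in the example above.

```python
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.axes_grid1 import make_axes_locatable

x0 = np.linspace(5, 0, 25).reshape(1, -1)
fig, axes = plt.subplots(3, 1, sharex=True)
fig.subplots_adjust(hspace=0)

for ax, x in zip(axes, (x0, x0**6, x0**2)):
    im = ax.imshow(x, cmap='autumn_r', aspect='auto')
    ax.set_xticks([])
    ax.set_yticks([])
    # take a thin strip from the right of this subplot for its colorbar
    cax = make_axes_locatable(ax).append_axes('right', size='3%', pad=0.05)
    fig.colorbar(im, cax=cax)
```

Each row gets its own colorbar here because each imshow call normalizes its data independently.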
