Creating a modulo/folded plot in Python

I am trying to "fold" an exponential plot (and a fit to it - see the first image below) around a discrete interval on the x-axis (a.k.a a "modulo plot"). The aim is that after 10 x-units the exponential is continued on the same plot from 0 for the 10 to 20 interval, as shown on a second "photoshopped" image below.
The MWE code is below:
import numpy as np
from scipy import optimize
import matplotlib.pyplot as plt
# Generate points
x=np.arange(20)
y=np.exp(-x/10)
# Fit to data
def fit_func(x, t):
    return np.exp(-x/t)
par, pcov = optimize.curve_fit(f=fit_func, xdata=x, ydata=y)
# Plot data and fit function
fig, ax = plt.subplots()
ax.plot(x,y, c='g', label="Data");
ax.plot(x,fit_func(x, par), c='r', linestyle=":", label="Fit");
ax.set_xlabel("x (modulo 10)")
ax.legend()
plt.savefig("fig/mod.png", dpi=300)
What I have: Original exponential from 0 to 20
What I want: Modulo/folded exponential in intervals of 10

You could try to simply write (with f = fit_func(x, par) holding the fitted values):
ax.plot(x % 10,y, c='g', label="Data")
ax.plot(x % 10, f, c='r', linestyle=":", label="Fit")
but then you get confusing lines connecting the last point of one section to the first point of the next.
Another idea is to create a loop to plot every part separately. To avoid multiple legend entries, only the first section sets a legend label.
import numpy as np
from scipy import optimize
import matplotlib.pyplot as plt
x=np.arange(40)
y=np.exp(-x/10)
def fit_func(x, t):
    return np.exp(-x/t)
par, pcov = optimize.curve_fit(f=fit_func, xdata=x, ydata=y)
f = fit_func(x, par)
fig, ax = plt.subplots()
left = x.min()
section = 1
while left < x.max():
    right = left+10
    filter = (x >= left) & (x <= right)
    ax.plot(x[filter]-left,y[filter], c='g', label="Data" if section == 1 else '')
    ax.plot(x[filter]-left, f[filter], c='r', linestyle=":", label="Fit" if section == 1 else '')
    left = right
    section += 1
ax.set_xlabel("x (modulo 10)")
ax.legend()
#plt.savefig("fig/mod.png", dpi=300)
plt.show()
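A variation on the loop above, as a minimal sketch: instead of issuing one plot call per section, fold x with the modulo operator and insert a NaN wherever the folded value wraps around. Matplotlib does not draw line segments through NaN values, so a single ax.plot call per curve is enough and the legend needs no special handling. The helper name fold_with_gaps is hypothetical, and the sketch assumes x is sorted in increasing order.

```python
import math

def fold_with_gaps(x, y, period):
    """Fold x modulo `period`, inserting NaN where the folded x wraps around.

    Hypothetical helper; assumes x is sorted in increasing order.
    """
    xf, yf = [], []
    previous = None
    for xi, yi in zip(x, y):
        folded = xi % period
        # A folded value smaller than its predecessor marks a wrap-around.
        if previous is not None and folded < previous:
            xf.append(math.nan)
            yf.append(math.nan)
        xf.append(folded)
        yf.append(yi)
        previous = folded
    return xf, yf

x = list(range(20))
y = [math.exp(-xi / 10) for xi in x]
xf, yf = fold_with_gaps(x, y, 10)
# xf now runs 0..9, NaN, 0..9, so ax.plot(xf, yf) draws two separate branches
```

With the variables from the answer, ax.plot(xf, yf, c='g', label="Data") would then take the place of the while loop. Unlike the loop's inclusive filter, each section here ends at its last sample rather than being extended to the first point of the next section.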

Assuming that x is a sorted array, we have:
>>> y_ = fit_func(x, par)
>>> temp_x = []
>>> temp_y = []
>>> temp_y_ = []
>>> fig, ax = plt.subplots()
>>> for i in range(len(x)):
...     if x[i]%10==0 or i == len(x)-1:
...         ax.plot(temp_x,temp_y, c='g', label="Data");
...         ax.plot(temp_x,temp_y_, c='r', linestyle=":", label="Fit")
...         temp_x,temp_y,temp_y_ = [],[],[]
...     else:
...         temp_x.append(x[i]%10)
...         temp_y.append(y[i])
...         temp_y_.append(y_[i])
...
>>> plt.show()
and this would be the resulting plot:

Related

Matplotlib, vertical space between legend symbols

I have an issue with customizing the legend of my plot. I did lots of customizing but couldn't get my head around this one. I want the symbols (not the labels) to be equally spaced in the legend. As you can see in the example, the space between the circles in the legend gets smaller as the circles get bigger.
Any ideas?
Also, how can I add a color bar (in addition to the size), with smaller circles being light red (for example) and bigger circles being blue (for example)?
Here is my code so far:
import pandas as pd
import matplotlib.pyplot as plt
from vega_datasets import data as vega_data
gap = pd.read_json(vega_data.gapminder.url)
df = gap.loc[gap['year'] == 2000]
fig, ax = plt.subplots(1, 1,figsize=[14,12])
ax=ax.scatter(df['life_expect'], df['fertility'],
s = df['pop']/100000,alpha=0.7, edgecolor="black",cmap="viridis")
plt.xlabel("X")
plt.ylabel("Y");
kw = dict(prop="sizes", num=6, color="lightgrey", markeredgecolor='black',markeredgewidth=2)
plt.legend(*ax.legend_elements(**kw),bbox_to_anchor=(1, 0),frameon=False,
loc="lower left",markerscale=1,ncol=1,borderpad=2,labelspacing=4,handletextpad=2)
plt.grid()
plt.show()
It's a bit tricky, but you could measure the legend elements and reposition them so that the distance between them is constant. Because the positions are computed in pixels, the plot can't be resized afterwards.
I tested the code inside PyCharm with the 'Qt5Agg' backend, and in a Jupyter notebook, both with %matplotlib inline and with %matplotlib notebook. I'm not sure whether it would work well in all environments.
Note that ax.scatter doesn't return an ax (contrary to e.g. sns.scatterplot) but a PathCollection holding the created scatter dots.
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib.transforms import IdentityTransform
from vega_datasets import data as vega_data
gap = pd.read_json(vega_data.gapminder.url)
df = gap.loc[gap['year'] == 2000]
fig, ax = plt.subplots(1, 1, figsize=[14, 12])
fig.subplots_adjust(right=0.8)
scat = ax.scatter(df['life_expect'], df['fertility'],
s=df['pop'] / 100000, alpha=0.7, edgecolor="black", cmap="viridis")
plt.xlabel("X")
plt.ylabel("Y")
x = 1.1
y = 0.1
is_first = True
kw = dict(prop="sizes", num=6, color="lightgrey", markeredgecolor='black', markeredgewidth=2)
handles, labels = scat.legend_elements(**kw)
inverted_transData = ax.transData.inverted()
for handle, label in zip(handles[::-1], labels[::-1]):
    plt.setp(handle, clip_on=False)
    for _ in range(1 if is_first else 2):
        plt.setp(handle, transform=ax.transAxes)
        if is_first:
            xd, yd = x, y
        else:
            xd, yd = inverted_transData.transform((x, y))
        handle.set_xdata([xd])
        handle.set_ydata([yd])
        ax.add_artist(handle)
        bbox = handle.get_window_extent(fig.canvas.get_renderer())
        y += y - bbox.y0 + 15  # 15 pixels in between
    x = (bbox.x0 + bbox.x1) / 2
    if is_first:
        xd_text, _ = inverted_transData.transform((bbox.x1+10, y))
    ax.text(xd_text, yd, label, transform=ax.transAxes, ha='left', va='center')
    y = bbox.y1
    is_first = False
plt.show()

How to avoid overlapping error bars in matplotlib?

I want to create a plot for two different datasets similar to the one presented in this answer:
In the above image, the author managed to fix the overlapping problem of the error bars by adding some small random scatter in x to the new dataset.
In my problem, I must plot a similar graphic, but having some categorical data in the x axis:
Any ideas on how to slightly shift the error bars of the second dataset when the x axis holds categorical variables? I want to avoid the overlap between the bars to make the visualization easier to read.
You can translate each errorbar by adding the default data transform to a prior translation in data space. This is possible when knowing that categories are in general one data unit away from each other.
import numpy as np; np.random.seed(42)
import matplotlib.pyplot as plt
from matplotlib.transforms import Affine2D
x = list("ABCDEF")
y1, y2 = np.random.randn(2, len(x))
yerr1, yerr2 = np.random.rand(2, len(x))*4+0.3
fig, ax = plt.subplots()
trans1 = Affine2D().translate(-0.1, 0.0) + ax.transData
trans2 = Affine2D().translate(+0.1, 0.0) + ax.transData
er1 = ax.errorbar(x, y1, yerr=yerr1, marker="o", linestyle="none", transform=trans1)
er2 = ax.errorbar(x, y2, yerr=yerr2, marker="o", linestyle="none", transform=trans2)
plt.show()
Alternatively, you could translate the errorbars after applying the data transform and hence move them in units of points.
import numpy as np; np.random.seed(42)
import matplotlib.pyplot as plt
from matplotlib.transforms import ScaledTranslation
x = list("ABCDEF")
y1, y2 = np.random.randn(2, len(x))
yerr1, yerr2 = np.random.rand(2, len(x))*4+0.3
fig, ax = plt.subplots()
trans1 = ax.transData + ScaledTranslation(-5/72, 0, fig.dpi_scale_trans)
trans2 = ax.transData + ScaledTranslation(+5/72, 0, fig.dpi_scale_trans)
er1 = ax.errorbar(x, y1, yerr=yerr1, marker="o", linestyle="none", transform=trans1)
er2 = ax.errorbar(x, y2, yerr=yerr2, marker="o", linestyle="none", transform=trans2)
plt.show()
While results look similar in both cases, they are fundamentally different. You will observe this difference when interactively zooming the axes or changing the figure size.
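The data-space variant extends naturally to more than two datasets. A minimal sketch, assuming (as stated above) that neighbouring categories are one data unit apart: spread the offsets symmetrically around zero and build one translation per dataset. The helper name dodge_offsets and the default total_width of 0.4 are illustrative choices, not part of matplotlib.

```python
def dodge_offsets(n_series, total_width=0.4):
    """Return n_series x-offsets, centred on zero and spanning total_width.

    Hypothetical helper; offsets are in data units, where neighbouring
    categories are assumed to be 1.0 apart.
    """
    if n_series == 1:
        return [0.0]
    step = total_width / (n_series - 1)
    return [-total_width / 2 + i * step for i in range(n_series)]

offsets = dodge_offsets(3)
# Each offset can then feed a transform such as
#   Affine2D().translate(offset, 0.0) + ax.transData
```

Keeping total_width well below 1.0 leaves a visible gap between the groups that belong to neighbouring categories.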
Consider the following approach to highlight plots - combination of errorbar and fill_between with non-zero transparency:
import random
import matplotlib.pyplot as plt
# create sample data
N = 8
data_1 = {
'x': list(range(N)),
'y': [10. + random.random() for dummy in range(N)],
'yerr': [.25 + random.random() for dummy in range(N)]}
data_2 = {
'x': list(range(N)),
'y': [10.25 + .5 * random.random() for dummy in range(N)],
'yerr': [.5 * random.random() for dummy in range(N)]}
# plot
plt.figure()
# only errorbar
plt.subplot(211)
for data in [data_1, data_2]:
    plt.errorbar(**data, fmt='o')
# errorbar + fill_between
plt.subplot(212)
for data in [data_1, data_2]:
    plt.errorbar(**data, alpha=.75, fmt=':', capsize=3, capthick=1)
    data = {
        'x': data['x'],
        'y1': [y - e for y, e in zip(data['y'], data['yerr'])],
        'y2': [y + e for y, e in zip(data['y'], data['yerr'])]}
    plt.fill_between(**data, alpha=.25)
Result:
There is an example on the library site: https://matplotlib.org/stable/gallery/lines_bars_and_markers/errorbar_subsample.html
You need the parameter errorevery=(m, n), where n sets how often error bars are drawn (on every n-th point) and m is the shift of the first error bar, in the range from 0 to n.
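To see which points receive error bars, the selection can be mimicked with ordinary slicing. This is a small sketch based on the matplotlib documentation, which describes errorevery=(start, N) as drawing error bars on the points x[start::N]; the sample values below are made up.

```python
x = list(range(10))

m, n = 1, 3  # shift m, then every n-th point
with_error_bars = x[m::n]  # [1, 4, 7]

# The corresponding call would be, for example:
#   plt.errorbar(x, y, yerr=yerr, errorevery=(m, n))
```

Giving two datasets different shifts, for example (0, 3) and (1, 3), keeps their error bars from landing on the same x positions.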

Preventing plot joining when values "wrap" in matplotlib plots

I'm plotting right ascension ephemerides for planets, which have the property that they are cyclical: they hit a maximum value, 24, and then start again at 0. When I plot these using matplotlib, the "jump" from 24 to zero is joined so that I get horizontal lines running across my figure:
How can I eliminate these lines? Is there an approach in matplotlib, or perhaps a way to split the lists at the points where the jump occurs?
Code to generate above figure:
from __future__ import division
import ephem
import matplotlib
import matplotlib.pyplot
import math
fig, ax = matplotlib.pyplot.subplots()
ax.set(xlim=[0, 24])
ax.set(ylim=[min(date_range), max(date_range)])
ax.plot([12*ep.ra/math.pi for ep in [ephem.Jupiter(base_date + d) for d in date_range]], date_range,
ls='-', color='g', lw=2)
ax.plot([12*ep.ra/math.pi for ep in [ephem.Venus(base_date + d) for d in date_range]], date_range,
ls='-', color='r', lw=1)
ax.plot([12*ep.ra/math.pi for ep in [ephem.Sun(base_date + d) for d in date_range]], date_range,
ls='-', color='y', lw=3)
Here is a generator function that finds the contiguous regions of 'wrapped' data:
import numpy as np
def unlink_wrap(dat, lims=[-np.pi, np.pi], thresh = 0.95):
    """
    Iterate over contiguous regions of `dat` (i.e. where it does not
    jump from near one limit to the other).
    This function returns an iterator object that yields slice
    objects, which index the contiguous portions of `dat`.
    This function implicitly assumes that all points in `dat` fall
    within `lims`.
    """
    jump = np.nonzero(np.abs(np.diff(dat)) > ((lims[1] - lims[0]) * thresh))[0]
    lasti = 0
    for ind in jump:
        yield slice(lasti, ind + 1)
        lasti = ind + 1
    yield slice(lasti, len(dat))
An example usage would be,
x = np.arange(0, 100, .1)
y = x.copy()
lims = [0, 24]
x = (x % lims[1])
fig, ax = matplotlib.pyplot.subplots()
for slc in unlink_wrap(x, lims):
    ax.plot(x[slc], y[slc], 'b-', linewidth=2)
ax.plot(x, y, 'r-', zorder=-10)
ax.set_xlim(lims)
Which gives the figure below. Note that the blue lines (which utilize unlink_wrap) are broken and the standard-plotted red lines are shown for reference.
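The thresh argument decides how large a step must be before it counts as a wrap, which matters for coarsely sampled data. The following sketch repeats only the jump test from unlink_wrap in plain Python (the helper name wrap_jumps and the sample values are made up) and shows a step that a thresh of 0.95 does not treat as a wrap.

```python
def wrap_jumps(dat, lims, thresh):
    """Return indices i where |dat[i+1] - dat[i]| exceeds thresh * (lims span)."""
    span = lims[1] - lims[0]
    return [i for i in range(len(dat) - 1)
            if abs(dat[i + 1] - dat[i]) > span * thresh]

dat = [22.0, 23.0, 0.5, 1.5]  # wraps between 23.0 and 0.5, a step of 22.5

strict = wrap_jumps(dat, (0, 24), 0.95)  # cutoff 22.8: the step of 22.5 is not flagged
loose = wrap_jumps(dat, (0, 24), 0.5)    # cutoff 12.0: the step of 22.5 is flagged
```

If the samples are far apart, passing a smaller thresh to unlink_wrap is therefore the safer choice, as long as genuine steps in the data stay below the cutoff.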

how to interpolate points in a specific interval on a plot formed by loading a txt file in to scipy program?

I have a text file with two columns, x and y. I have plotted them using the program below.
import matplotlib.pyplot as plt
with open("data.txt") as f:
    data = f.read()
data = data.split('\n')
x = [row.split(' ')[0] for row in data]
y = [row.split(' ')[1] for row in data]
fig = plt.figure()
ax1 = fig.add_subplot(111)
ax1.set_title("Plot B vs H")
ax1.set_xlabel('B')
ax1.set_ylabel('H')
ax1.plot(x,y, c='r', label='the data')
leg = ax1.legend()
plt.show()
Now I would like to know how to interpolate several points between x=1 and x=5 with increment of around 0.1 on the same graph?
You can create a function using scipy.interpolate.interp1d:
import numpy as np
import matplotlib.pyplot as plt
from scipy import interpolate
data = np.genfromtxt('data.txt')
x = data[:,0] #first column
y = data[:,1] #second column
f = interpolate.interp1d(x, y)
xnew = np.arange(1, 5.1, 0.1) # this could be over the entire range, depending on what your data is
ynew = f(xnew) # use interpolation function returned by `interp1d`
fig = plt.figure()
ax1 = fig.add_subplot(111)
ax1.set_title("Plot B vs H")
ax1.set_xlabel('B')
ax1.set_ylabel('H')
ax1.plot(x,y, c='r', label='the data')
ax1.plot(xnew, ynew, 'o', label='the interpolation')
leg = ax1.legend()
plt.show()
If you want to smooth your data, you can use UnivariateSpline; just replace the f = interpolate... line with:
f = interpolate.UnivariateSpline(x, y)
To change how much it smooths, you can fiddle with the s and k options:
f = interpolate.UnivariateSpline(x, y, k=3, s=1)
Both options are described in the documentation.
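For purely linear interpolation, NumPy alone is sufficient. The sketch below swaps interp1d for np.interp and builds the sample points with np.linspace, which states the number of points explicitly instead of relying on a floating-point step in np.arange. The data values are made up for illustration and stand in for the two columns of data.txt.

```python
import numpy as np

# Made-up sample data standing in for the two columns of data.txt
x = np.array([0.0, 2.0, 4.0, 6.0])
y = np.array([0.0, 4.0, 8.0, 12.0])

# 41 points from 1 to 5 inclusive, i.e. a spacing of 0.1
xnew = np.linspace(1, 5, 41)

# Piecewise-linear interpolation; x must be increasing
ynew = np.interp(xnew, x, y)
```

One difference to keep in mind: np.interp returns the end values for points outside the range of x, whereas interp1d by default raises an error there.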

How to space overlapping annotations

I want to annotate the bars in a graph with some text, but if the bars are close together and have comparable height, the annotations end up on top of each other and are thus hard to read (the coordinates for the annotations were taken from the bar position and height).
Is there a way to shift one of them if there is a collision?
Edit: The bars are very thin and very close sometimes so just aligning vertically doesn't solve the problem...
A picture might clarify things:
I've written a quick solution, which checks each annotation position against default bounding boxes for all the other annotations. If there is a collision it changes its position to the next available collision free place. It also puts in nice arrows.
For a fairly extreme example, it will produce this (none of the numbers overlap):
Instead of this:
Here is the code:
import numpy as np
import matplotlib.pyplot as plt
from numpy.random import *
def get_text_positions(x_data, y_data, txt_width, txt_height):
    # list() is needed on Python 3, where zip returns an iterator
    # that supports neither indexing nor repeated iteration
    a = list(zip(y_data, x_data))
    text_positions = y_data.copy()
    for index, (y, x) in enumerate(a):
        local_text_positions = [i for i in a if i[0] > (y - txt_height)
                                and (abs(i[1] - x) < txt_width * 2) and i != (y,x)]
        if local_text_positions:
            sorted_ltp = sorted(local_text_positions)
            if abs(sorted_ltp[0][0] - y) < txt_height: #True == collision
                differ = np.diff(sorted_ltp, axis=0)
                a[index] = (sorted_ltp[-1][0] + txt_height, a[index][1])
                text_positions[index] = sorted_ltp[-1][0] + txt_height
                for k, (j, m) in enumerate(differ):
                    #j is the vertical distance between words
                    if j > txt_height * 2: #if True then room to fit a word in
                        a[index] = (sorted_ltp[k][0] + txt_height, a[index][1])
                        text_positions[index] = sorted_ltp[k][0] + txt_height
                        break
    return text_positions
def text_plotter(x_data, y_data, text_positions, axis,txt_width,txt_height):
    for x,y,t in zip(x_data, y_data, text_positions):
        axis.text(x - txt_width, 1.01*t, '%d'%int(y),rotation=0, color='blue')
        if y != t:
            axis.arrow(x, t,0,y-t, color='red',alpha=0.3, width=txt_width*0.1,
                       head_width=txt_width, head_length=txt_height*0.5,
                       zorder=0,length_includes_head=True)
Here is the code producing these plots, showing the usage:
#random test data:
x_data = random_sample(100)
y_data = random_integers(10,50,(100))
#GOOD PLOT:
fig2 = plt.figure()
ax2 = fig2.add_subplot(111)
ax2.bar(x_data, y_data,width=0.00001)
#set the bbox for the text. Increase txt_width for wider text.
txt_height = 0.04*(plt.ylim()[1] - plt.ylim()[0])
txt_width = 0.02*(plt.xlim()[1] - plt.xlim()[0])
#Get the corrected text positions, then write the text.
text_positions = get_text_positions(x_data, y_data, txt_width, txt_height)
text_plotter(x_data, y_data, text_positions, ax2, txt_width, txt_height)
plt.ylim(0,max(text_positions)+2*txt_height)
plt.xlim(-0.1,1.1)
#BAD PLOT:
fig = plt.figure()
ax = fig.add_subplot(111)
ax.bar(x_data, y_data, width=0.0001)
#write the text:
for x,y in zip(x_data, y_data):
    ax.text(x - txt_width, 1.01*y, '%d'%int(y),rotation=0)
plt.ylim(0,max(text_positions)+2*txt_height)
plt.xlim(-0.1,1.1)
plt.show()
Another option is to use my library adjustText, written specially for this purpose (https://github.com/Phlya/adjustText). I think it's probably significantly slower than the accepted answer (it slows down considerably with a lot of bars), but much more general and configurable.
from adjustText import adjust_text
np.random.seed(2017)
x_data = np.random.random_sample(100)
y_data = np.random.random_integers(10,50,(100))
f, ax = plt.subplots(dpi=300)
bars = ax.bar(x_data, y_data, width=0.001, facecolor='k')
texts = []
for x, y in zip(x_data, y_data):
    texts.append(plt.text(x, y, y, horizontalalignment='center', color='b'))
adjust_text(texts, add_objects=bars, autoalign='y', expand_objects=(0.1, 1),
only_move={'points':'', 'text':'y', 'objects':'y'}, force_text=0.75, force_objects=0.1,
arrowprops=dict(arrowstyle="simple, head_width=0.25, tail_width=0.05", color='r', lw=0.5, alpha=0.5))
plt.show()
If we allow autoalignment along x axis, it gets even better (I just need to resolve a small issue that it doesn't like putting labels above the points and not a bit to the side...).
np.random.seed(2017)
x_data = np.random.random_sample(100)
y_data = np.random.random_integers(10,50,(100))
f, ax = plt.subplots(dpi=300)
bars = ax.bar(x_data, y_data, width=0.001, facecolor='k')
texts = []
for x, y in zip(x_data, y_data):
    texts.append(plt.text(x, y, y, horizontalalignment='center', size=7, color='b'))
adjust_text(texts, add_objects=bars, autoalign='xy', expand_objects=(0.1, 1),
only_move={'points':'', 'text':'y', 'objects':'y'}, force_text=0.75, force_objects=0.1,
arrowprops=dict(arrowstyle="simple, head_width=0.25, tail_width=0.05", color='r', lw=0.5, alpha=0.5))
plt.show()
(I had to adjust some parameters here, of course)
One option is to rotate the text/annotation, which is set by the rotation keyword/property. In the following example, I rotate the text 90 degrees to guarantee that it won't collide with the neighboring text. I also set the va (short for verticalalignment) keyword, so that the text is presented above the bar (above the point that I use to define the text):
import matplotlib.pyplot as plt
data = [10, 8, 8, 5]
fig = plt.figure()
ax = fig.add_subplot(111)
ax.bar(range(4),data)
ax.set_ylim(0,12)
# extra .4 is because it's half the default width (.8):
ax.text(1.4,8,"2nd bar",rotation=90,va='bottom')
ax.text(2.4,8,"3rd bar",rotation=90,va='bottom')
plt.show()
The result is the following figure:
Determining programmatically if there are collisions between various annotations is a trickier process. This might be worth a separate question: Matplotlib text dimensions.
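As a rough illustration of what such a check involves, two axis-aligned boxes overlap exactly when they overlap along both axes. The sketch below uses plain tuples of the form (x0, y0, x1, y1); the function name and the sample boxes are made up. In matplotlib, the boxes would typically be obtained from each text object's get_window_extent(), which needs a renderer and so is usually called once the figure has been drawn.

```python
def boxes_overlap(a, b):
    """Return True if boxes a and b, given as (x0, y0, x1, y1), overlap."""
    ax0, ay0, ax1, ay1 = a
    bx0, by0, bx1, by1 = b
    # Overlap requires the intervals to intersect on both axes.
    return ax0 < bx1 and bx0 < ax1 and ay0 < by1 and by0 < ay1

label_a = (0.0, 0.0, 2.0, 1.0)
label_b = (1.5, 0.5, 3.5, 1.5)  # intersects label_a
label_c = (2.5, 0.0, 4.5, 1.0)  # lies to the right of label_a
```

Checking every pair of labels this way costs time quadratic in the number of annotations, which is acceptable for a few hundred labels but worth keeping in mind for larger plots.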
As an alternative, I created textalloc, a package that makes sure that text boxes avoid overlapping both each other and lines where possible, and that is fast.
For this example you could use something like this:
import textalloc as ta
import numpy as np
import matplotlib.pyplot as plt
np.random.seed(2017)
x_data = np.random.random_sample(100)
y_data = np.random.random_integers(10,50,(100))
f, ax = plt.subplots(dpi=200)
bars = ax.bar(x_data, y_data, width=0.002, facecolor='k')
ta.allocate_text(f,ax,x_data,y_data,
[str(yy) for yy in list(y_data)],
x_lines=[np.array([xx,xx]) for xx in list(x_data)],
y_lines=[np.array([0,yy]) for yy in list(y_data)],
textsize=8,
margin=0.004,
min_distance=0.005,
linewidth=0.7,
textcolor="b")
plt.show()
This results in this
