Annotation Difficulty in PCA with numpy and matplotlib

I'm trying to do a PCA for our Repertory Grid Tool. I have a matrix that contains all the information I need, but I want to put the names of the alternatives (the column names) on the dots in the plot. My code is something like this:
matrixAlternatives= transpose(matrixAlternatives)
var_grid = np.array(matrixAlternatives)
#improve output readability
np.set_printoptions(precision=2)
np.set_printoptions(suppress=True)
print "var_grid:"
print var_grid
#Create the PCA node and train it
pcan = mdp.nodes.PCANode(output_dim=2, svd=True)
pcar = pcan.execute(var_grid)
print "\npcar"
print pcar
print "\neigenvalues:"
print pcan.d
print "\nexplained variance:"
print pcan.explained_variance
print "\neigenvectors:"
print pcan.v
#Graph results
#pcar[3,0],pcar[3,1] has the projections of alternative3 on the
#first two principal components (0 and 1)
fig = plt.figure()
ax = fig.add_subplot(111)
ax.plot(pcar[:, 0], pcar[:, 1], 'r^')
ax.plot(pcan.v[:,0], pcan.v[:,1], 'ro')
#draw axes
ax.axhline(0, color='black')
ax.axvline(0, color='black')
#annotations each concern
id=0
for xpoint, ypoint in pcan.v:
    ax.annotate('C{:.0f}'.format(id), (xpoint, ypoint), ha='center',
                va='center', bbox=dict(fc='white',ec='none'))
    id+=1
#calculate accounted for variance
var_accounted_PC1 = pcan.d[0] * pcan.explained_variance * 100 /(pcan.d[0] + pcan.d[1])
var_accounted_PC2 = pcan.d[1] * pcan.explained_variance * 100 /(pcan.d[0] + pcan.d[1])
#Show variance accounted for
ax.set_xlabel('Accounted variance on PC1 (%.1f%%)' % (var_accounted_PC1))
ax.set_ylabel('Accounted variance on PC2 (%.1f%%)' % (var_accounted_PC2))
canvas = FigureCanvas(fig)
response = HttpResponse(content_type='image/png')
canvas.print_png(response)
fig.clf()
plt.close()
plt.clf()
del var_grid
gc.collect()
return response

If I understand you correctly you just need to annotate your plots using the column heading. Here is a minimal example:
import matplotlib.pyplot as plt
import numpy as np
x = np.linspace(0, 10, 100)
y = np.sin(x)
plt.plot(x, y, "ro")
# pass the label positionally: the keyword was renamed from s to text in newer Matplotlib releases
plt.annotate(" some string", xy=(x[25], y[25]))
You will need to add some formatting I suspect to get the strings in the correct place.
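To connect this back to the PCA case, here is a minimal sketch of labelling every projected point with a name. It assumes the rows of the data are the alternatives; the `names` list and the small `grid` array are made up for illustration, and a plain SVD stands in for mdp's PCANode so the example is self-contained.

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical data: rows are alternatives, columns are concerns.
names = ["alt A", "alt B", "alt C", "alt D"]
grid = np.array([[1.0, 4.0, 2.0],
                 [3.0, 1.0, 5.0],
                 [4.0, 2.0, 1.0],
                 [2.0, 5.0, 3.0]])

# Plain PCA via SVD of the centred data (a stand-in for mdp's PCANode).
centred = grid - grid.mean(axis=0)
u, s, vt = np.linalg.svd(centred, full_matrices=False)
scores = centred @ vt[:2].T  # projections on the first two components

fig, ax = plt.subplots()
ax.plot(scores[:, 0], scores[:, 1], "r^")
ax.axhline(0, color="black")
ax.axvline(0, color="black")

# One annotation per point, labelled with the matching name and
# nudged 5 points up and to the right so it does not cover the marker.
labels = []
for name, (px, py) in zip(names, scores):
    labels.append(ax.annotate(name, xy=(px, py), xytext=(5, 5),
                              textcoords="offset points"))
```

With the original code, the same loop would presumably run over `zip(column_names, pcar)`, where `column_names` is whatever list holds the alternative names. Call `plt.show()` or `fig.savefig(...)` to see the result.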

Related

scatterplot and combined polar histogram in matplotlib

I am attempting to produce a plot like this which combines a cartesian scatter plot and a polar histogram. (Radial lines optional)
A similar solution (by Nicolas Legrand) exists for looking at differences in x and y (code here), but we need to look at ratios (i.e. x/y).
More specifically, this is useful when we want to look at the relative risk measure which is the ratio of two probabilities.
The scatter plot on its own is obviously not a problem, but the polar histogram is more advanced.
The most promising lead I have found is this central example from the matplotlib gallery here
I have attempted to do this, but have run up against the limits of my matplotlib skills. Any efforts moving towards this goal would be great.

I'm sure that others will have better suggestions, but one method that gets something like you want (without the need for extra axes artists) is to use a polar projection with a scatter and bar chart together. Something like
import matplotlib.pyplot as plt
import numpy as np
x = np.random.uniform(size=100)
y = np.random.uniform(size=100)
r = np.sqrt(x**2 + y**2)
phi = np.arctan2(y, x)
h, b = np.histogram(phi, bins=np.linspace(0, np.pi/2, 21), density=True)
colors = plt.cm.Spectral(h / h.max())
ax = plt.subplot(111, projection='polar')
ax.scatter(phi, r, marker='.')
ax.bar(b[:-1], h, width=b[1:] - b[:-1],
align='edge', bottom=np.max(r) + 0.2, color=colors)
# Cut off at 90 degrees
ax.set_thetamax(90)
# Set the r grid to cover the scatter plot
ax.set_rgrids([0, 0.5, 1])
# Let's put a line at 1 assuming we want a ratio of some sort
ax.set_thetagrids([45], [1])
which will give
It is missing axes labels and some beautification, but it might be a place to start. I hope it is helpful.

You can use two axes on top of each other:
import matplotlib.pyplot as plt
fig = plt.figure(figsize=(6,6))
ax1 = fig.add_axes([0.1,0.1,.8,.8], label="cartesian")
ax2 = fig.add_axes([0.1,0.1,.8,.8], projection="polar", label="polar")
ax2.set_rorigin(-1)
ax2.set_thetamax(90)
plt.show()

Ok. Thanks to the answer from Nicolas, and the answer from tomjn I have a working solution :)
import numpy as np
import matplotlib.pyplot as plt
# Scatter data
n = 50
x = 0.3 + np.random.randn(n)*0.1
y = 0.4 + np.random.randn(n)*0.02
def radial_corner_plot(x, y, n_hist_bins=51):
    """Scatter plot with radial histogram of x/y ratios"""
    # Axis setup
    fig = plt.figure(figsize=(6,6))
    ax1 = fig.add_axes([0.1,0.1,.6,.6], label="cartesian")
    ax2 = fig.add_axes([0.1,0.1,.8,.8], projection="polar", label="polar")
    ax2.set_rorigin(-20)
    ax2.set_thetamax(90)
    # define useful constant
    offset_in_radians = np.pi/4
    def rotate_hist_axis(ax):
        """rotate so that 0 degrees is pointing up and right"""
        ax.set_theta_offset(offset_in_radians)
        ax.set_thetamin(-45)
        ax.set_thetamax(45)
        return ax
    # Convert scatter data to histogram data
    r = np.sqrt(x**2 + y**2)
    phi = np.arctan2(y, x)
    h, b = np.histogram(phi,
                        bins=np.linspace(0, np.pi/2, n_hist_bins),
                        density=True)
    # SCATTER PLOT -------------------------------------------------------
    ax1.scatter(x,y)
    ax1.set(xlim=[0, 1], ylim=[0, 1], xlabel="x", ylabel="y")
    ax1.spines['right'].set_visible(False)
    ax1.spines['top'].set_visible(False)
    # HISTOGRAM ----------------------------------------------------------
    ax2 = rotate_hist_axis(ax2)
    # rotation of axis requires rotation in bin positions
    b = b - offset_in_radians
    # plot the histogram
    bars = ax2.bar(b[:-1], h, width=b[1:] - b[:-1], align='edge')
    def update_hist_ticks(ax, desired_ratios):
        """Update tick positions and corresponding tick labels"""
        x = np.ones(len(desired_ratios))
        y = 1/desired_ratios
        phi = np.arctan2(y,x) - offset_in_radians
        # define ticklabels
        xticklabels = [str(round(float(label), 2)) for label in desired_ratios]
        # apply updates
        ax2.set(xticks=phi, xticklabels=xticklabels)
        return ax
    ax2 = update_hist_ticks(ax2, np.array([1/8, 1/4, 1/2, 1, 2, 4, 8]))
    # just have radial grid lines
    ax2.grid(which="major", axis="y")
    # remove bin count labels
    ax2.set_yticks([])
    return (fig, [ax1, ax2])
fig, ax = radial_corner_plot(x, y)
Thanks for the pointers!
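The tick placement in update_hist_ticks boils down to one mapping from a ratio to an angle. Here is a small standalone sketch of that mapping; the function name `ratio_to_angle` is made up for illustration and is not part of the solution above.

```python
import numpy as np

def ratio_to_angle(ratio):
    """Angle (radians) of the ray from the origin along which x/y == ratio."""
    ratio = np.asarray(ratio, dtype=float)
    # A point (1, 1/ratio) lies on that ray, so take its polar angle.
    return np.arctan2(1.0 / ratio, 1.0)

ratios = np.array([0.5, 1.0, 2.0])
angles = ratio_to_angle(ratios)
# Subtracting the same pi/4 offset used for the rotated histogram axis
# puts ratio 1 at angle 0, with reciprocal ratios placed symmetrically.
tick_positions = angles - np.pi / 4
```

Reciprocal ratios such as 1/2 and 2 end up at equal and opposite angles around the diagonal, which is why the tick labels look symmetric on the rotated axis.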

Creating a modulo/folded plot in Python

I am trying to "fold" an exponential plot (and a fit to it - see the first image below) around a discrete interval on the x-axis (a.k.a a "modulo plot"). The aim is that after 10 x-units the exponential is continued on the same plot from 0 for the 10 to 20 interval, as shown on a second "photoshopped" image below.
The MWE code is below:
import numpy as np
from scipy import optimize
import matplotlib.pyplot as plt
# Generate points
x=np.arange(20)
y=np.exp(-x/10)
# Fit to data
def fit_func(x, t):
    return np.exp(-x/t)
par, pcov = optimize.curve_fit(f=fit_func, xdata=x, ydata=y)
# Plot data and fit function
fig, ax = plt.subplots()
ax.plot(x,y, c='g', label="Data");
ax.plot(x,fit_func(x, par), c='r', linestyle=":", label="Fit");
ax.set_xlabel("x (modulo 10)")
ax.legend()
plt.savefig("fig/mod.png", dpi=300)
What I have: Original exponential from 0 to 20
What I want: Modulo/folded exponential in intervals of 10

You could try to simply write:
ax.plot(x % 10,y, c='g', label="Data")
ax.plot(x % 10, f, c='r', linestyle=":", label="Fit")
but then you get confusing lines connecting the last point of one section to the first point of the next.
Another idea is to create a loop to plot every part separately. To avoid multiple legend entries, only the first section sets a legend label.
import numpy as np
from scipy import optimize
import matplotlib.pyplot as plt
x=np.arange(40)
y=np.exp(-x/10)
def fit_func(x, t):
    return np.exp(-x/t)
par, pcov = optimize.curve_fit(f=fit_func, xdata=x, ydata=y)
f = fit_func(x, par)
fig, ax = plt.subplots()
left = x.min()
section = 1
while left < x.max():
    right = left+10
    filter = (x >= left) & (x <= right)
    ax.plot(x[filter]-left,y[filter], c='g', label="Data" if section == 1 else '')
    ax.plot(x[filter]-left, f[filter], c='r', linestyle=":", label="Fit" if section == 1 else '')
    left = right
    section += 1
ax.set_xlabel("x (modulo 10)")
ax.legend()
#plt.savefig("fig/mod.png", dpi=300)
plt.show()
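A third option, which is not part of the approaches above and is only a sketch: keep the single `x % 10` plot call but insert a NaN at every wrap. Matplotlib does not draw a line segment through NaN values, so the connecting lines disappear. The helper name `fold_with_gaps` is made up for illustration, and it assumes x is sorted.

```python
import numpy as np
import matplotlib.pyplot as plt

def fold_with_gaps(x, y, period):
    """Return (x % period, y) with a NaN inserted wherever x wraps around."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    folded = x % period
    # A wrap happens where the folded x decreases from one sample to the next.
    wraps = np.flatnonzero(np.diff(folded) < 0) + 1
    return np.insert(folded, wraps, np.nan), np.insert(y, wraps, np.nan)

x = np.arange(40)
y = np.exp(-x / 10)
fx, fy = fold_with_gaps(x, y, period=10)

fig, ax = plt.subplots()
ax.plot(fx, fy, c="g", label="Data")  # one call, one legend entry
ax.set_xlabel("x (modulo 10)")
ax.legend()
```

Unlike the loop version, each section here stops at the last sample before the wrap instead of extending to the right edge of the interval.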

Assuming that x is a sorted array, we'll have :
>>> y_ = fit_func(x, par)
>>> temp_x = []
>>> temp_y = []
>>> temp_y_ = []
>>> fig, ax = plt.subplots()
>>> for i in range(len(x)):
...     if x[i]%10==0 or i == len(x)-1:
...         ax.plot(temp_x,temp_y, c='g', label="Data");
...         ax.plot(temp_x,temp_y_, c='r', linestyle=":", label="Fit")
...         temp_x,temp_y,temp_y_ = [],[],[]
...     else:
...         temp_x.append(x[i]%10)
...         temp_y.append(y[i])
...         temp_y_.append(y_[i])
...
>>> plt.show()
and this would be the resulting plot :

How to avoid overlapping error bars in matplotlib?

I want to create a plot for two different datasets similar to the one presented in this answer:
In the above image, the author managed to fix the overlapping problem of the error bars by adding some small random scatter in x to the new dataset.
In my problem, I must plot a similar graphic, but having some categorical data in the x axis:
Any ideas on how to slightly move the error bars of the second dataset when the x axis holds categorical variables? I want to avoid overlap between the bars to make the visualization easier to read.

You can translate each errorbar by adding the default data transform to a prior translation in data space. This is possible when knowing that categories are in general one data unit away from each other.
import numpy as np; np.random.seed(42)
import matplotlib.pyplot as plt
from matplotlib.transforms import Affine2D
x = list("ABCDEF")
y1, y2 = np.random.randn(2, len(x))
yerr1, yerr2 = np.random.rand(2, len(x))*4+0.3
fig, ax = plt.subplots()
trans1 = Affine2D().translate(-0.1, 0.0) + ax.transData
trans2 = Affine2D().translate(+0.1, 0.0) + ax.transData
er1 = ax.errorbar(x, y1, yerr=yerr1, marker="o", linestyle="none", transform=trans1)
er2 = ax.errorbar(x, y2, yerr=yerr2, marker="o", linestyle="none", transform=trans2)
plt.show()
Alternatively, you could translate the errorbars after applying the data transform and hence move them in units of points.
import numpy as np; np.random.seed(42)
import matplotlib.pyplot as plt
from matplotlib.transforms import ScaledTranslation
x = list("ABCDEF")
y1, y2 = np.random.randn(2, len(x))
yerr1, yerr2 = np.random.rand(2, len(x))*4+0.3
fig, ax = plt.subplots()
trans1 = ax.transData + ScaledTranslation(-5/72, 0, fig.dpi_scale_trans)
trans2 = ax.transData + ScaledTranslation(+5/72, 0, fig.dpi_scale_trans)
er1 = ax.errorbar(x, y1, yerr=yerr1, marker="o", linestyle="none", transform=trans1)
er2 = ax.errorbar(x, y2, yerr=yerr2, marker="o", linestyle="none", transform=trans2)
plt.show()
While results look similar in both cases, they are fundamentally different. You will observe this difference when interactively zooming the axes or changing the figure size.
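If transforms feel like too much machinery, a simpler alternative (a different technique from the two above, shown only as a sketch) is to plot against integer positions, shift each series by a fixed offset, and then put the category names back on the ticks. The offset of 0.1 is an arbitrary choice.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(42)
categories = list("ABCDEF")
y1, y2 = rng.normal(size=(2, len(categories)))
yerr1, yerr2 = rng.uniform(0.3, 1.0, size=(2, len(categories)))

# Plot against integer positions and shift each series sideways.
pos = np.arange(len(categories))
offset = 0.1  # in data units; an arbitrary choice for this sketch

fig, ax = plt.subplots()
ax.errorbar(pos - offset, y1, yerr=yerr1, marker="o", linestyle="none")
ax.errorbar(pos + offset, y2, yerr=yerr2, marker="o", linestyle="none")

# Put the category names back on the ticks.
ax.set_xticks(pos)
tick_labels = ax.set_xticklabels(categories)
```

Like the first variant above, this shift is in data units, so the gap between the two series scales when you zoom.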

Consider the following approach to highlight plots - combination of errorbar and fill_between with non-zero transparency:
import random
import matplotlib.pyplot as plt
# create sample data
N = 8
data_1 = {
'x': list(range(N)),
'y': [10. + random.random() for dummy in range(N)],
'yerr': [.25 + random.random() for dummy in range(N)]}
data_2 = {
'x': list(range(N)),
'y': [10.25 + .5 * random.random() for dummy in range(N)],
'yerr': [.5 * random.random() for dummy in range(N)]}
# plot
plt.figure()
# only errorbar
plt.subplot(211)
for data in [data_1, data_2]:
    plt.errorbar(**data, fmt='o')
# errorbar + fill_between
plt.subplot(212)
for data in [data_1, data_2]:
    plt.errorbar(**data, alpha=.75, fmt=':', capsize=3, capthick=1)
    data = {
        'x': data['x'],
        'y1': [y - e for y, e in zip(data['y'], data['yerr'])],
        'y2': [y + e for y, e in zip(data['y'], data['yerr'])]}
    plt.fill_between(**data, alpha=.25)
Result:

There is an example on the matplotlib site: https://matplotlib.org/stable/gallery/lines_bars_and_markers/errorbar_subsample.html
You need the parameter errorevery=(m, n),
where n is how often error bars are drawn (every n-th point) and m is the shift of the first bar, in the range from 0 to n.
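A minimal sketch of how this could look for two series, loosely following the linked gallery example (the data here is made up). Both series draw a bar on every 6th point, but the second one starts 3 points later, so the bars interleave instead of overlapping.

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0.1, 3.9, 39)
y1 = np.exp(-1.0 * x)
y2 = np.exp(-0.5 * x)
y1err = 0.1 + 0.1 * np.sqrt(x)
y2err = 0.1 + 0.1 * np.sqrt(x / 2)

fig, ax = plt.subplots()
# errorevery=(start, N): draw error bars on the points x[start::N] only.
c1 = ax.errorbar(x, y1, yerr=y1err, errorevery=(0, 6), label="series 1")
# Same spacing, shifted by 3 points, so the two sets of bars interleave.
c2 = ax.errorbar(x, y2, yerr=y2err, errorevery=(3, 6), label="series 2")
ax.legend()
```

Note that this thins out the error bars rather than moving them, so it suits dense line data better than a handful of categories.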

How to change marker size/scale in legend when marker is set to pixel

I am scatter plotting data points with a very small marker (see screengrab below). When I use the very small marker ',' the legend is very hard to read (example code taken from here).
(Python 3, Jupyter lab)
How can I increase the size of the marker in the legend? The two versions shown on the above-mentioned site do not work:
legend = ax.legend(frameon=True)
for legend_handle in legend.legendHandles:
    legend_handle._legmarker.set_markersize(9)
and
ax.legend(markerscale=6)
The two solutions do however work when the marker is set to '.'.
How can I show bigger markers in the legend?
Sample Code from intoli.com:
import numpy as np
import matplotlib.pyplot as plt
np.random.seed(12)
fig = plt.figure()
ax = fig.add_subplot(1, 1, 1)
for i in range(5):
    mean = [np.random.random()*10, np.random.random()*10]
    covariance = [ [1 + np.random.random(), np.random.random() - 1], [0, 1 + np.random.random()], ]
    covariance[1][0] = covariance[0][1] # must be symmetric
    x, y = np.random.multivariate_normal(mean, covariance, 3000).T
    plt.plot(x, y, ',', label=f'Cluster {i + 1}')
ax.legend(markerscale=12)
fig.tight_layout()
plt.show()

You can get 1 pixel sized markers for a plot by setting the markersize to 1 pixel. This would look like
plt.plot(x, y, marker='s', markersize=72./fig.dpi, mec="None", ls="None")
What the above does is set the marker to a square, set the markersize to 72 points per inch divided by the figure's dpi (dots per inch), which is the size of one dot (pixel) in points, and remove lines and edges.
Then the solution you tried using markerscale in the legend works nicely.
Complete example:
import numpy as np
import matplotlib.pyplot as plt
np.random.seed(12)
fig = plt.figure()
ax = fig.add_subplot(1, 1, 1)
for i in range(5):
    mean = [np.random.random()*10, np.random.random()*10]
    covariance = [ [1 + np.random.random(), np.random.random() - 1], [0, 1 + np.random.random()], ]
    covariance[1][0] = covariance[0][1] # must be symmetric
    x, y = np.random.multivariate_normal(mean, covariance, 3000).T
    plt.plot(x, y, marker='s', markersize=72./fig.dpi, mec="None", ls="None",
             label=f'Cluster {i + 1}')
ax.legend(markerscale=12)
fig.tight_layout()
plt.show()
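The arithmetic behind markersize=72./fig.dpi can be checked with a tiny sketch. Marker sizes are given in points (1/72 inch), so the conversion to pixels only depends on the dpi. The two helper names below are made up for illustration.

```python
def points_to_pixels(points, dpi):
    """Convert a length in points (1/72 inch) to pixels at the given dpi."""
    return points * dpi / 72.0

def one_pixel_markersize(dpi):
    """Marker size in points that spans exactly one pixel at the given dpi."""
    return 72.0 / dpi

# At 100 dpi a one-pixel marker is 0.72 points wide.
size_pt = one_pixel_markersize(100)
size_px = points_to_pixels(size_pt, 100)
# markerscale=12 multiplies the marker size in the legend,
# so the legend marker would be about 12 pixels wide.
legend_px = points_to_pixels(12 * size_pt, 100)
```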

According to this discussion, the markersize has no effect when using pixels (,) as marker. How about generating a custom legend instead? For example, by adapting the first example in this tutorial, one can get a pretty decent legend:
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.patches as mpatches
np.random.seed(12)
fig = plt.figure()
ax = fig.add_subplot(1, 1, 1)
for i in range(5):
    mean = [np.random.random()*10, np.random.random()*10]
    covariance = [ [1 + np.random.random(), np.random.random() - 1], [0, 1 + np.random.random()], ]
    covariance[1][0] = covariance[0][1] # must be symmetric
    x, y = np.random.multivariate_normal(mean, covariance, 3000).T
    plt.plot(x, y, ',', label=f'Cluster {i + 1}')
# generating custom legend
handles, labels = ax.get_legend_handles_labels()
patches = []
for handle, label in zip(handles, labels):
    patches.append(mpatches.Patch(color=handle.get_color(), label=label))
legend = ax.legend(handles=patches)
fig.tight_layout()
plt.show()
The output would look like this:
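If you would rather keep marker-style entries than colored patches, a variation on the same idea (my own sketch, not taken from the linked tutorial) is to build the legend from empty Line2D proxy artists with an explicit marker and size. The cluster data below is simplified and made up.

```python
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.lines import Line2D

rng = np.random.default_rng(12)
fig, ax = plt.subplots()
for i in range(3):
    x, y = rng.normal(loc=3 * i, scale=1.0, size=(2, 3000))
    ax.plot(x, y, ",", label=f"Cluster {i + 1}")

# Build one proxy handle per plotted line: same colour and label,
# but a round marker with an explicit size instead of a single pixel.
handles, labels = ax.get_legend_handles_labels()
proxies = [Line2D([], [], linestyle="none", marker="o", markersize=8,
                  color=h.get_color(), label=lab)
           for h, lab in zip(handles, labels)]
legend = ax.legend(handles=proxies)
```

The proxies hold no data, so they never show up in the axes; they only exist to give the legend something bigger to draw.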

How to space overlapping annotations

I want to annotate the bars in a graph with some text, but if the bars are close together and have comparable height, the annotations are on top of each other and thus hard to read (the coordinates for the annotations were taken from the bar position and height).
Is there a way to shift one of them if there is a collision?
Edit: The bars are very thin and very close sometimes so just aligning vertically doesn't solve the problem...
A picture might clarify things:

I've written a quick solution, which checks each annotation position against default bounding boxes for all the other annotations. If there is a collision it changes its position to the next available collision free place. It also puts in nice arrows.
For a fairly extreme example, it will produce this (none of the numbers overlap):
Instead of this:
Here is the code:
import numpy as np
import matplotlib.pyplot as plt
from numpy.random import *
def get_text_positions(x_data, y_data, txt_width, txt_height):
    # a list is needed because entries are re-read and reassigned below
    a = list(zip(y_data, x_data))
    text_positions = y_data.copy()
    for index, (y, x) in enumerate(a):
        local_text_positions = [i for i in a if i[0] > (y - txt_height)
                                and (abs(i[1] - x) < txt_width * 2) and i != (y,x)]
        if local_text_positions:
            sorted_ltp = sorted(local_text_positions)
            if abs(sorted_ltp[0][0] - y) < txt_height: #True == collision
                differ = np.diff(sorted_ltp, axis=0)
                a[index] = (sorted_ltp[-1][0] + txt_height, a[index][1])
                text_positions[index] = sorted_ltp[-1][0] + txt_height
                for k, (j, m) in enumerate(differ):
                    #j is the vertical distance between words
                    if j > txt_height * 2: #if True then room to fit a word in
                        a[index] = (sorted_ltp[k][0] + txt_height, a[index][1])
                        text_positions[index] = sorted_ltp[k][0] + txt_height
                        break
    return text_positions
def text_plotter(x_data, y_data, text_positions, axis,txt_width,txt_height):
    for x,y,t in zip(x_data, y_data, text_positions):
        axis.text(x - txt_width, 1.01*t, '%d'%int(y),rotation=0, color='blue')
        if y != t:
            axis.arrow(x, t,0,y-t, color='red',alpha=0.3, width=txt_width*0.1,
                       head_width=txt_width, head_length=txt_height*0.5,
                       zorder=0,length_includes_head=True)
Here is the code producing these plots, showing the usage:
#random test data:
x_data = random_sample(100)
y_data = randint(10, 51, 100)  # random_integers is deprecated; randint excludes the upper bound
#GOOD PLOT:
fig2 = plt.figure()
ax2 = fig2.add_subplot(111)
ax2.bar(x_data, y_data,width=0.00001)
#set the bbox for the text. Increase txt_width for wider text.
txt_height = 0.04*(plt.ylim()[1] - plt.ylim()[0])
txt_width = 0.02*(plt.xlim()[1] - plt.xlim()[0])
#Get the corrected text positions, then write the text.
text_positions = get_text_positions(x_data, y_data, txt_width, txt_height)
text_plotter(x_data, y_data, text_positions, ax2, txt_width, txt_height)
plt.ylim(0,max(text_positions)+2*txt_height)
plt.xlim(-0.1,1.1)
#BAD PLOT:
fig = plt.figure()
ax = fig.add_subplot(111)
ax.bar(x_data, y_data, width=0.0001)
#write the text:
for x,y in zip(x_data, y_data):
    ax.text(x - txt_width, 1.01*y, '%d'%int(y),rotation=0)
plt.ylim(0,max(text_positions)+2*txt_height)
plt.xlim(-0.1,1.1)
plt.show()

Another option is to use my library adjustText, written specially for this purpose (https://github.com/Phlya/adjustText). I think it's probably significantly slower than the accepted answer (it slows down considerably with a lot of bars), but much more general and configurable.
import numpy as np
import matplotlib.pyplot as plt
from adjustText import adjust_text
np.random.seed(2017)
x_data = np.random.random_sample(100)
y_data = np.random.randint(10, 51, 100)  # random_integers is deprecated; randint excludes the upper bound
f, ax = plt.subplots(dpi=300)
bars = ax.bar(x_data, y_data, width=0.001, facecolor='k')
texts = []
for x, y in zip(x_data, y_data):
    texts.append(plt.text(x, y, y, horizontalalignment='center', color='b'))
adjust_text(texts, add_objects=bars, autoalign='y', expand_objects=(0.1, 1),
only_move={'points':'', 'text':'y', 'objects':'y'}, force_text=0.75, force_objects=0.1,
arrowprops=dict(arrowstyle="simple, head_width=0.25, tail_width=0.05", color='r', lw=0.5, alpha=0.5))
plt.show()
If we allow autoalignment along x axis, it gets even better (I just need to resolve a small issue that it doesn't like putting labels above the points and not a bit to the side...).
np.random.seed(2017)
x_data = np.random.random_sample(100)
y_data = np.random.randint(10, 51, 100)  # random_integers is deprecated; randint excludes the upper bound
f, ax = plt.subplots(dpi=300)
bars = ax.bar(x_data, y_data, width=0.001, facecolor='k')
texts = []
for x, y in zip(x_data, y_data):
    texts.append(plt.text(x, y, y, horizontalalignment='center', size=7, color='b'))
adjust_text(texts, add_objects=bars, autoalign='xy', expand_objects=(0.1, 1),
only_move={'points':'', 'text':'y', 'objects':'y'}, force_text=0.75, force_objects=0.1,
arrowprops=dict(arrowstyle="simple, head_width=0.25, tail_width=0.05", color='r', lw=0.5, alpha=0.5))
plt.show()
(I had to adjust some parameters here, of course)

One option is to rotate the text/annotation, which is set by the rotation keyword/property. In the following example, I rotate the text 90 degrees to guarantee that it won't collide with the neighboring text. I also set the va (short for verticalalignment) keyword, so that the text is presented above the bar (above the point that I use to define the text):
import matplotlib.pyplot as plt
data = [10, 8, 8, 5]
fig = plt.figure()
ax = fig.add_subplot(111)
ax.bar(range(4),data)
ax.set_ylim(0,12)
# extra .4 is because it's half the default width (.8):
ax.text(1.4,8,"2nd bar",rotation=90,va='bottom')
ax.text(2.4,8,"3rd bar",rotation=90,va='bottom')
plt.show()
The result is the following figure:
Determining programmatically if there are collisions between various annotations is a trickier process. This might be worth a separate question: Matplotlib text dimensions.
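As a rough starting point for that, here is a sketch that only detects collisions (it does not move anything). It relies on the fact that a Text artist can report its bounding box in pixels once the figure has been drawn. The helper name `find_overlaps` and the example labels are made up for illustration.

```python
import matplotlib.pyplot as plt

def find_overlaps(fig, texts):
    """Return index pairs (i, j) of Text artists whose bounding boxes overlap."""
    fig.canvas.draw()  # text extents are only reliable after a draw
    boxes = [t.get_window_extent() for t in texts]
    return [(i, j)
            for i in range(len(boxes))
            for j in range(i + 1, len(boxes))
            if boxes[i].overlaps(boxes[j])]

fig, ax = plt.subplots()
ax.set_xlim(0, 10)
ax.set_ylim(0, 10)
texts = [ax.text(2.0, 5.0, "first label"),
         ax.text(2.1, 5.0, "second label"),  # nearly on top of the first
         ax.text(8.0, 1.0, "far away")]
collisions = find_overlaps(fig, texts)
```

The boxes are in display coordinates, so the result depends on figure size, dpi and font size; rerun the check after changing any of those.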

Just thought I would provide an alternative solution: I just created textalloc, which makes sure that text boxes avoid overlap with both each other and lines when possible, and is fast.
For this example you could use something like this:
import textalloc as ta
import numpy as np
import matplotlib.pyplot as plt
np.random.seed(2017)
x_data = np.random.random_sample(100)
y_data = np.random.randint(10, 51, 100)  # random_integers is deprecated; randint excludes the upper bound
f, ax = plt.subplots(dpi=200)
bars = ax.bar(x_data, y_data, width=0.002, facecolor='k')
ta.allocate_text(f,ax,x_data,y_data,
[str(yy) for yy in list(y_data)],
x_lines=[np.array([xx,xx]) for xx in list(x_data)],
y_lines=[np.array([0,yy]) for yy in list(y_data)],
textsize=8,
margin=0.004,
min_distance=0.005,
linewidth=0.7,
textcolor="b")
plt.show()
This results in this
