How to plot 1-d data at given y-value with pylab - python

I want to plot the data points that are in a 1-D array just along the horizontal axis [edit: at a given y-value], like in this plot:
How can I do this with pylab?

Staven already edited his post to include how to plot the values along y-value 1, but he was using Python lists.
A variant that should be faster (although I did not measure it) only uses numpy arrays:
import numpy as np
import matplotlib.pyplot as pp
val = 0. # this is the value where you want the data to appear on the y-axis.
ar = np.arange(10) # just as an example array
pp.plot(ar, np.zeros_like(ar) + val, 'x')
pp.show()
As a nice-to-use function that offers all usual matplotlib refinements via kwargs this would be:
def plot_at_y(arr, val, **kwargs):
    pp.plot(arr, np.zeros_like(arr) + val, 'x', **kwargs)
    pp.show()
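A usage sketch of the same idea, for illustration only. The name plot_points_at_y, the optional ax parameter and the Agg backend are assumptions that are not part of the answer above; the function returns the line objects instead of calling show(), so the caller decides when to display the figure.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend, so this runs without a display
import matplotlib.pyplot as plt
import numpy as np


def plot_points_at_y(arr, val, ax=None, **kwargs):
    """Plot the 1-D data in `arr` as 'x' markers at the constant height `val`."""
    arr = np.asarray(arr)
    if ax is None:
        ax = plt.gca()
    # A float array with the same shape as arr, filled with val
    heights = np.full(arr.shape, val, dtype=float)
    return ax.plot(arr, heights, "x", **kwargs)


fig, ax = plt.subplots()
# Extra keyword arguments are passed straight through to plot()
lines = plot_points_at_y([1, 2, 3, 8, 4, 5], 1.0, ax=ax, color="red", markersize=10)
ax.set_yticks([1.0])  # label only the chosen y-value
```

With an interactive backend, calling plt.show() afterwards displays the figure.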

This will plot the array "ar":
import matplotlib.pyplot as pp
ar = [1, 2, 3, 8, 4, 5]
pp.plot(ar)
pp.show()
If you are using IPython, you can start it with the "-pylab" option ("--pylab" in newer versions) and it will import numpy and matplotlib automatically on startup, so you just need to write:
ar = [1, 2, 3, 8, 4, 5]
plot(ar)
To do a scatter plot with the y coordinate set to 1:
plot(ar, len(ar) * [1], "x")

import numpy as np
import matplotlib.pyplot as plt
X = np.arange(10)
plt.scatter(X, [0] * X.shape[0])

Related

y axis has decreasing values instead of increasing ones for plt

I am trying to build a histogram and here is my code:
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
x = ['0','1','2','3','4','5','6','7','8','9','10','11','12','13','14','15','16','17','18','19','20','21','22','23','24','25','26','27','28','29','30','31','32','33','34','35','36','38','40','41','42','43','44','45','48','50','51','53','54','57','60','64','70','77','93','104','108','147'] #sample names
y = ['164','189','288','444','311','216','122','111','92','54','45','31','31','30','18','15','15','10','4','15','2','8','6','4','7','5','3','3','1','10','3','3','3','2','4','2','1','1','1','2','2','1','1','1','1','1','2','1','2','2','2','1','1','2','1','1','1','1']
plt.bar(x, y)
plt.xlabel('Number of Methods')
plt.ylabel('Variables')
plt.show()
Here is the histogram I obtain:
I would like the values in the y axis to be in an increasing order. This means that 1 should be first followed by 3, 5, 7, etc. How can I fix this?
They're not decreasing, they're in the order in which they are in the list, because the list items are strings. Try
x = [int(i) for i in x]
y = [int(i) for i in y]
to convert them to numbers before plotting.
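Put together, a minimal sketch of the fix could look like this. The lists are a shortened sample of the ones in the question, and the Agg backend is an assumption so that the snippet runs without a display. Matplotlib treats strings as categories and places them in order of first appearance, which is why the conversion to numbers matters.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend
import matplotlib.pyplot as plt

# Shortened sample of the lists from the question
x = ['0', '1', '2', '3', '10']
y = ['164', '189', '288', '444', '45']

# Convert the strings to numbers so both axes are numeric
x_num = [int(i) for i in x]
y_num = [int(i) for i in y]

fig, ax = plt.subplots()
bars = ax.bar(x_num, y_num)
ax.set_xlabel('Number of Methods')
ax.set_ylabel('Variables')
```

If the x values should stay as evenly spaced labels, it should be enough to convert only y and keep x as strings.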

How can I plot 10 normal distribution in one graph ( example picture included)

Assume I have 10 normal distributions. How can I plot them like in the picture below?
My normal distribution samples:
import numpy as np
q = np.random.normal(0, 1, 10)
random_distr = []
for i in q:
    random_distr.append(np.random.normal(i, 1, 1000))
the plot I want to create:
You can use the pyplot.violinplot() function.
import numpy as np
import matplotlib.pyplot as plt
q = np.random.normal(0, 1, 10)
random_distr = []
for i in q:
    random_distr.append(np.random.normal(i, 1, 1000))
plt.violinplot(random_distr)
plt.show()
You may also want to check seaborn.

How to adjust branch lengths of dendrogram in matplotlib (like in astrodendro)? [Python]

Here is my resulting plot below but I would like it to look like the truncated dendrograms in astrodendro such as this:
There is also a really cool looking dendrogram from this paper that I would like to recreate in matplotlib.
Below is the code for generating an iris data set with noise variables and plotting the dendrogram in matplotlib.
Does anyone know how to either: (1) truncate the branches like in the example figures; and/or (2) to use astrodendro with a custom linkage matrix and labels?
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
import astrodendro
from scipy.cluster.hierarchy import dendrogram, linkage
from scipy.spatial import distance
def iris_data(noise=None, palette="hls", desat=1):
    # Iris dataset
    X = pd.DataFrame(load_iris().data,
                     index=[*map(lambda x: f"iris_{x}", range(150))],
                     columns=[*map(lambda x: x.split(" (cm)")[0].replace(" ", "_"), load_iris().feature_names)])
    y = pd.Series(load_iris().target,
                  index=X.index,
                  name="Species")
    # map_colors is a helper that is not shown in this post
    c = map_colors(y, mode=1, palette=palette, desat=desat)  # y.map(lambda x:{0:"red",1:"green",2:"blue"}[x])
    if noise is not None:
        X_noise = pd.DataFrame(
            np.random.RandomState(0).normal(size=(X.shape[0], noise)),
            index=X.index,
            columns=[*map(lambda x: f"noise_{x}", range(noise))]
        )
        X = pd.concat([X, X_noise], axis=1)
    return (X, y, c)
def dism2linkage(DF_dism, method="ward"):
    """
    Input: A (m x m) dissimilarity Pandas DataFrame object where the diagonal is 0
    Output: Hierarchical clustering encoded as a linkage matrix
    Further reading:
    http://docs.scipy.org/doc/scipy-0.14.0/reference/generated/scipy.cluster.hierarchy.linkage.html
    https://pypi.python.org/pypi/fastcluster
    """
    # Linkage matrix
    Ar_dist = distance.squareform(DF_dism.values)  # .values replaces the removed DataFrame.as_matrix()
    return linkage(Ar_dist, method=method)
# Get data
X_iris_with_noise, y_iris, c_iris = iris_data(50)
# Get distance matrix
df_dism = 1 - X_iris_with_noise.corr().abs()
# Get linkage matrix
Z = dism2linkage(df_dism)
# Create dendrogram
with plt.style.context("seaborn-white"):
    fig, ax = plt.subplots(figsize=(13,3))
    D_dendro = dendrogram(
        Z,
        labels=df_dism.index,
        color_threshold=3.5,
        count_sort="ascending",
        #link_color_func=lambda k: colors[k]
        ax=ax
    )
    ax.set_ylabel("Distance")
I'm not sure this really constitutes a practical answer, but it does allow you to generate dendrograms with truncated hanging lines. The trick is to generate the plot as normal, then manipulate the resulting matplotlib plot to recreate the lines.
I couldn't get your example to work locally, so I've just created a dummy dataset.
from matplotlib import pyplot as plt
from scipy.cluster.hierarchy import dendrogram, linkage
import numpy as np
a = np.random.multivariate_normal([0, 10], [[3, 1], [1, 4]], size=[5,])
b = np.random.multivariate_normal([0, 10], [[3, 1], [1, 4]], size=[5,])
X = np.concatenate((a, b),)
Z = linkage(X, 'ward')
fig = plt.figure()
ax = fig.add_subplot(1,1,1)
dendrogram(Z, ax=ax)
The resulting plot is the usual long-arm dendrogram.
Now for the more interesting bit. A dendrogram is made up of a number of LineCollection objects (one for each colour). To update the lines we iterate through these, extracting the details about their constituent paths, modifying these to remove any lines reaching to a y of zero, and then recreating a LineCollection for these modified paths.
The updated path is then added to the axes, and the original is removed.
The one tricky part is determining what height to draw to instead of zero. Since we are iterating over each dendrogram's path, we don't know which point came before; we basically have no idea where we are. However, we can exploit the fact that hanging lines hang vertically. Assuming there are no two hanging lines at the same x, we can look up the other known y values for a given x and use the lowest of them as the basis for our new y. The downside is that in order to make sure we have this number, we have to pre-scan the data.
Note: If your dendrogram can have hanging lines at the same x, you would also need to take y into account and search for the nearest y above the end of the line at this x.
import numpy as np
from matplotlib.path import Path
from matplotlib.collections import LineCollection
fig = plt.figure()
ax = fig.add_subplot(1,1,1)
dendrogram(Z, ax=ax)
for c in list(ax.collections):  # iterate over a copy, since we're adding to the same axes
    paths = []
    for path in c.get_paths():
        segments = []
        y_at_x = {}
        # Pre-pass over all elements, to find the lowest y value at each x value.
        # We can use this to calculate where to cut our lines.
        for n, seg in enumerate(path.iter_segments()):
            x, y = seg[0]
            # Don't store if the y is zero, or if it's higher than the current low.
            if y > 0 and y < y_at_x.get(x, np.inf):
                y_at_x[x] = y
        for n, seg in enumerate(path.iter_segments()):
            x, y = seg[0]
            if y == 0:
                # If we know the last y at this x, use it - 0.5, limit > 0
                y = max(0, y_at_x.get(x, 0) - 0.5)
            segments.append([x, y])
        paths.append(segments)
    lc = LineCollection(paths, colors=c.get_colors())  # Recreate a LineCollection with the same params
    ax.add_collection(lc)
    c.remove()  # Remove the original LineCollection (ax.collections cannot be modified directly in newer matplotlib)
The resulting dendrogram looks like this:
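As an alternative sketch (not the method used above), the same truncation can be computed from the coordinates that dendrogram() returns, which avoids the pre-scan. The helper name truncate_links and its drop parameter are hypothetical; icoord and dcoord are the keys of the dictionary returned by scipy's dendrogram, and each link is described by four points whose second y value is the height of the link's horizontal bar.

```python
import numpy as np
from scipy.cluster.hierarchy import dendrogram, linkage


def truncate_links(icoord, dcoord, drop=0.5):
    """Return one list of (x, y) points per link, with leaf legs shortened.

    A leg that ends at y == 0 hangs down to a leaf. Its end is raised to
    `drop` below the top of the link (never below 0), mirroring the 0.5
    used in the answer above.
    """
    segments = []
    for xs, ys in zip(icoord, dcoord):
        top = ys[1]  # height of the horizontal bar of this link
        new_ys = [max(0.0, top - drop) if y == 0 else y for y in ys]
        segments.append(list(zip(xs, new_ys)))
    return segments


# Dummy data, as in the answer above
rng = np.random.default_rng(0)
X = rng.normal(size=(10, 2))
Z = linkage(X, "ward")

# no_plot=True only computes the coordinates; nothing is drawn
D = dendrogram(Z, no_plot=True)
segments = truncate_links(D["icoord"], D["dcoord"])
```

The resulting segments can be passed to a LineCollection and added to an axes with ax.add_collection(), followed by ax.autoscale() to set the limits. The per-link colours are available in D["color_list"] if they should be preserved.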

Smooth curved line between 3 points in plot

I have 3 data points on the x axis and 3 on the y axis:
x = [1,3,5]
y=[0,5,0]
I would like a curved line that starts at (1,0), goes to the highest point at (3,5) and then finishes at (5,0)
I think I need to use interpolation, but unsure how. If I use spline from scipy like this:
import bokeh.plotting as bk
from scipy.interpolate import spline
p = bk.figure()
xvals=np.linspace(1, 5, 10)
y_smooth = spline(x,y,xvals)
p.line(xvals, y_smooth)
bk.show(p)
I get the highest point before (3,5) and it looks unbalanced:
The issue is that spline with no extra argument is of order 3. That means that you do not have enough points/equations to get a spline curve (which manifests itself as a warning of an ill-conditioned matrix). You need to apply a spline of lower order, such as a quadratic spline, which is of order 2:
import bokeh.plotting as bk
from scipy.interpolate import spline
p = bk.figure()
xvals=np.linspace(1, 5, 10)
y_smooth = spline(x,y,xvals, order=2) # This fixes your immediate problem
p.line(xvals, y_smooth)
bk.show(p)
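A quick way to see why order 2 is enough here: three points determine exactly one polynomial of degree 2. The sketch below uses np.polyfit, swapped in purely for illustration instead of the spline function, to compute that parabola for the points in the question.

```python
import numpy as np

x = [1, 3, 5]
y = [0, 5, 0]

# Three points determine exactly one polynomial of degree 2
coeffs = np.polyfit(x, y, 2)   # coefficients, highest power first
parabola = np.poly1d(coeffs)

# Evaluate on the same grid as in the question
xvals = np.linspace(1, 5, 10)
y_smooth = parabola(xvals)
```

The parabola passes through all three points and is symmetric about x = 3, so the peak sits at (3, 5) as requested.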
In addition, spline is deprecated in SciPy (and has been removed from newer releases), so you should preferably not use it, even where it is still available. A better solution is to use the CubicSpline class:
import numpy as np
import bokeh.plotting as bk
from scipy.interpolate import CubicSpline
p = bk.figure()
xvals = np.linspace(1, 5, 10)
spl = CubicSpline(x, y)  # First generate spline function
y_smooth = spl(xvals)  # then evaluate for your interpolated points
p.line(xvals, y_smooth)
bk.show(p)
Just to show the difference (using pyplot):
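As can be seen, for these three points the CubicSpline is identical to the spline of order=2: with only three points and the default not-a-knot boundary conditions, CubicSpline fits a single parabola through them.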
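The agreement can also be checked numerically. A minimal sketch, assuming SciPy's CubicSpline with its default bc_type="not-a-knot", and using np.polyfit (not part of the answer above) to build the parabola through the three points:

```python
import numpy as np
from scipy.interpolate import CubicSpline

x = [1, 3, 5]
y = [0, 5, 0]
xvals = np.linspace(1, 5, 50)

spl = CubicSpline(x, y)  # default boundary condition: bc_type="not-a-knot"
parabola = np.poly1d(np.polyfit(x, y, 2))  # the unique degree-2 polynomial through the points

# Largest absolute difference between the two curves on the grid
max_diff = np.max(np.abs(spl(xvals) - parabola(xvals)))
print(max_diff)
```

The printed difference should be at the level of floating-point rounding. With more than three data points the two curves would generally differ.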
Use pchip_interpolate():
import numpy as np
import matplotlib.pyplot as pl
from scipy import interpolate
x = [1,3,5]
y=[0,5,0]
x2 = np.linspace(x[0], x[-1], 100)
y2 = interpolate.pchip_interpolate(x, y, x2)
pl.plot(x2, y2)
pl.plot(x, y, "o")
the result:
You can use quadratic interpolation. This is possible by making use of scipy.interpolate.interp1d.
import matplotlib.pyplot as plt
from scipy.interpolate import interp1d
import numpy as np
x = [1, 3, 5]
y = [0, 5, 0]
f = interp1d(x, y, kind='quadratic')
x_interpol = np.linspace(1, 5, 1000)
y_interpol = f(x_interpol)
plt.plot(x_interpol, y_interpol)
plt.show()
Check the documentation for more details.

Looping within matplotlib

I am trying to plot multiple graphs on a single set of axes.
I have a 2D array of data and want to break it down into 111 1D arrays and plot them. Here is an example of my code so far:
from numpy import *
import matplotlib.pyplot as plt
x = linspace(1, 130, 130) # create a 1D array of 130 integers to set as the x axis
y = Te25117.data # set 2D array of data as y
plt.plot(x, y[1], x, y[2], x, y[3])
This code works fine, but I cannot see a way of writing a loop which will loop within the plot itself. I can only make the code work if I explicitly write a number 1 to 111 each time, which is not ideal! (The range of numbers I need to loop over is 1 to 111.)
Let me guess...long time matlab user?
Matplotlib automatically adds a line plot to the current plot if you don't create a new one. So your code can simply be:
from numpy import *
import matplotlib.pyplot as plt
x = linspace(1, 130, 130) # create a 1D array of 130 integers to set as the x axis
y = Te25117.data # set 2D array of data as y
L = len(y) # I assume you can infer the size of the data in this way...
#L = 111 # this is if you don't know any better
for i in range(L):
    plt.plot(x, y[i], color='k', linewidth=1) # replace 'k' with the color you want
import numpy as np
import matplotlib.pyplot as plt
x = np.array([1,2])
y = np.array([[1,2],[3,4]])
In [5]: x
Out[5]: array([1, 2])
In [6]: y
Out[6]:
array([[1, 2],
       [3, 4]])
In [7]: for y_i in y:
   ....:     plt.plot(x, y_i)
This will plot both rows in one figure.
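The loop can also be avoided altogether, because plot() accepts a 2-D y and draws one line per column. A minimal sketch follows; the random array is a hypothetical stand-in for Te25117.data from the question (111 rows of 130 samples), and the Agg backend is an assumption so that the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical stand-in for Te25117.data: 111 rows of 130 samples each
y = np.random.default_rng(0).normal(size=(111, 130)).cumsum(axis=1)
x = np.arange(1, 131)

fig, ax = plt.subplots()
# plot() draws one line per *column* of a 2-D y, so transpose rows into columns
lines = ax.plot(x, y.T, linewidth=1)
```

Each row of y becomes its own line, and the returned list can be used to style individual lines afterwards.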
