Plot multiple histograms as a grid - python

I am trying to plot multiple histograms on the same window using a list of tuples. I have managed to get it to sketch only 1 tuple at a time and I just can't seem to get it to work with all of them.
import numpy as np
import matplotlib.pyplot as plt
a = [(1, 2, 0, 0, 0, 3, 3, 1, 2, 2), (0, 2, 3, 3, 0, 1, 1, 1, 2, 2), (1, 2, 0, 3, 0, 1, 2, 1, 2, 2),(2, 0, 0, 3, 3, 1, 2, 1, 2, 2),(3,1,2,3,0,0,1,2,3,1)] #my list of tuples
q1,q2,q3,q4,q5,q6,q7,q8,q9,q10 = zip(*a) #split into [(1,0,1,2,3) ,(2,2,2,0,1),..etc] where q1=(1,0,1,2,3)
labels, counts = np.unique(q1,return_counts=True) #labels = 0,1,2,3 and counts the occurence of 0,1,2,3
ticks = range(len(counts))
plt.bar(ticks,counts, align='center')
plt.xticks(ticks, labels)
plt.show()
As you can see from the above code, I can plot one tuple at a time (q1, q2, etc.), but how do I generalise it so that it plots all of them?
I've tried to mimic the approach from the question "python plot multiple histograms", which is exactly what I want, but I had no luck.
Thank you for your time :)

You need to define a grid of axes with plt.subplots, taking into account the number of tuples in the list and how many plots you want per row. Then iterate over the returned axes and plot each histogram in the corresponding axis. You could use Axes.hist, but I've always preferred ax.bar fed with the result of np.unique, which can also return the counts of the unique values:
from matplotlib import pyplot as plt
import numpy as np
l = list(zip(*a))
n_cols = 2
fig, axes = plt.subplots(nrows=int(np.ceil(len(l)/n_cols)),
                         ncols=n_cols,
                         figsize=(15,15))
for i, (t, ax) in enumerate(zip(l, axes.flatten())):
    labels, counts = np.unique(t, return_counts=True)
    ax.bar(labels, counts, align='center', color='blue', alpha=.3)
    ax.title.set_text(f'Tuple {i}')
plt.tight_layout()
plt.show()
You can customise the above to whatever number of rows/cols you prefer, for 3 columns for instance:
l = list(zip(*a))
n_cols = 3
fig, axes = plt.subplots(nrows=int(np.ceil(len(l)/n_cols)),
                         ncols=n_cols,
                         figsize=(15,15))
for i, (t, ax) in enumerate(zip(l, axes.flatten())):
    labels, counts = np.unique(t, return_counts=True)
    ax.bar(labels, counts, align='center', color='blue', alpha=.3)
    ax.title.set_text(f'Tuple {i}')
plt.tight_layout()
plt.show()
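When the number of tuples does not fill the grid (10 histograms in 3 columns give 4 rows and 12 axes), the leftover axes are drawn empty. Below is a minimal sketch of one way to hide them, assuming the list a from the question. The Agg backend, the smaller figsize, the explicit x ticks and the names columns, flat_axes and n_visible are choices made for this illustration and are not part of the answer above.

```python
import math

import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

# list of tuples from the question
a = [(1, 2, 0, 0, 0, 3, 3, 1, 2, 2), (0, 2, 3, 3, 0, 1, 1, 1, 2, 2),
     (1, 2, 0, 3, 0, 1, 2, 1, 2, 2), (2, 0, 0, 3, 3, 1, 2, 1, 2, 2),
     (3, 1, 2, 3, 0, 0, 1, 2, 3, 1)]

columns = list(zip(*a))  # 10 tuples, one per histogram
n_cols = 3
n_rows = math.ceil(len(columns) / n_cols)

# squeeze=False keeps `axes` two-dimensional even for a single row or column
fig, axes = plt.subplots(nrows=n_rows, ncols=n_cols, figsize=(9, 9), squeeze=False)
flat_axes = axes.flatten()

for i, (t, ax) in enumerate(zip(columns, flat_axes)):
    labels, counts = np.unique(t, return_counts=True)
    ax.bar(labels, counts, align='center', color='blue', alpha=.3)
    ax.set_xticks(labels)  # one tick per distinct value
    ax.set_title(f'Tuple {i}')

# hide the axes that received no data
for ax in flat_axes[len(columns):]:
    ax.set_visible(False)

fig.tight_layout()
n_visible = sum(ax.get_visible() for ax in flat_axes)
```

Calling plt.show() (with an interactive backend) or fig.savefig(...) afterwards displays or stores the grid.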

Related

Subplots to include radar plot

I'm running into issues with some subplots. I've provided some sample code to generate the types of plots I would like to create. I'd like these to be the same size, side by side.
I'm having a really hard time figuring out how to create the subplots though. I keep running into issues with the thetagrids here. This is what I've tried. I can get these to work separately, but can't figure out how to combine them. Eventually I might want a third plot as well.
import numpy as np
import matplotlib.pyplot as plt
## Plot 1
x1 = np.array([0, 1, 2, 3])
y1 = np.array([7, 2, 4, 2])
plt.subplot(1, 2, 1)
plt.figure(figsize=(5, 5))
plt.scatter(x1, y1)
# plt.show()
### Plot 2
# make up data for plot
polar_list = ['a', 'b', 'c', 'd', 'a']
polar_points = [4, 3, 6, 7, 4]
# modify lists for plots
label_loc = np.linspace(start=0, stop=2 * np.pi, num=len(polar_list))
plt.figure(figsize=(5, 5))
plt.subplot(1, 2, 2, polar=True)
plt.plot(label_loc, polar_points, label='DataLable')
plt.title('DataLable comparison', size=20, y=1.05)
lines, labels = plt.thetagrids(np.degrees(label_loc), labels=polar_list)
plt.legend()
plt.show()
You are creating a new figure every time you call plt.figure(). Call it once at the very beginning; each plt.subplot() call will then add a subplot to that single figure.
import numpy as np
import matplotlib.pyplot as plt
## Plot 1
x1 = np.array([0, 1, 2, 3])
y1 = np.array([7, 2, 4, 2])
plt.figure(figsize= (5, 5))
plt.subplot(1, 2, 1)
plt.scatter(x1, y1)
# plt.show()
### Plot 2
# make up data for plot
polar_list = ['a', 'b', 'c', 'd', 'a']
polar_points = [4, 3, 6, 7, 4]
# modify lists for plots
label_loc = np.linspace(start=0, stop=2 * np.pi, num=len(polar_list))
plt.subplot(1, 2, 2, polar=True)
plt.plot(label_loc, polar_points, label='DataLable')
plt.title('DataLable comparison', size=20, y=1.05)
lines, labels = plt.thetagrids(np.degrees(label_loc), labels=polar_list)
plt.legend()
plt.show()
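The same layout can also be written with explicit figure and axes objects, which makes it harder to create a second figure by accident. This is only a sketch under a few assumptions that are not in the answer above: the figure is made twice as wide as it is tall so both panels are roughly square, the Agg backend is used so it runs without a display, and the duplicated closing label 'a' is left out of the theta grid so that 0° and 360° do not both get a label.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

x1 = np.array([0, 1, 2, 3])
y1 = np.array([7, 2, 4, 2])

polar_list = ['a', 'b', 'c', 'd', 'a']
polar_points = [4, 3, 6, 7, 4]
label_loc = np.linspace(start=0, stop=2 * np.pi, num=len(polar_list))

# one figure, created once; width is an assumption chosen for two square panels
fig = plt.figure(figsize=(10, 5))

# left panel: ordinary Cartesian axes
ax1 = fig.add_subplot(1, 2, 1)
ax1.scatter(x1, y1)

# right panel: polar axes for the radar plot
ax2 = fig.add_subplot(1, 2, 2, projection="polar")
ax2.plot(label_loc, polar_points, label='DataLable')
ax2.set_title('DataLable comparison', size=20, y=1.05)
# the last angle (360 degrees) repeats the first, so it is skipped here
ax2.set_thetagrids(np.degrees(label_loc[:-1]), labels=polar_list[:-1])
ax2.legend()
```

A third plot could be added the same way by switching to fig.add_subplot(1, 3, k) for each panel.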

python distplot with color by values

I want to create a dist plot (preferably using seaborn) with different colors for different ranges of values.
I have the vector:
[3,1,2,3,5,6,8,0,0,5,7,0,1, 0.2]
And I want to create a distplot such that all the parts in the range 0 to 1 are red and all the others are blue.
What is the best way to do so?
I don't know if there is an easy way in seaborn to do this, but doing the plot yourself is probably much easier. First compute equally sized bins with np.histogram (if you want that) so that both groups share the same bin edges and the plot looks homogeneous. After that it's just a single numpy filter on your observations, followed by the plot.
import seaborn as sns
import numpy as np
import matplotlib.pyplot as plt
x = np.array([3,1,2,3,5,6,8,0,0,5,7,0,1, 0.2])
# make equal binning through the range, you can adapt the bin size here
counts, bins = np.histogram(x, bins=10)
# here we do the filtering and split the observations based on your color code
x1 = x[(x <= 1) & (x >= 0)]
x2 = x[~((x <= 1) & (x >= 0))]
# finally, do the plot
f, ax = plt.subplots()
ax.hist(x1, bins=bins, color="tab:red")
ax.hist(x2, bins=bins, color="tab:blue")
ax.set(xlabel="Measurement", ylabel="Counts", title="histogram with 2 colors")
sns.despine()
This gives you a histogram with red bars for the values between 0 and 1 and blue bars for everything else.
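Because the two ax.hist calls draw on top of each other rather than stacking, it can be worth checking whether any bin receives values from both groups. The following NumPy-only sketch does that for the vector from the question; the names in_range, red_counts, blue_counts and mixed_bins are introduced here for illustration.

```python
import numpy as np

x = np.array([3, 1, 2, 3, 5, 6, 8, 0, 0, 5, 7, 0, 1, 0.2])

# shared bin edges, as in the answer above
counts, bins = np.histogram(x, bins=10)

# split the observations by the colour rule (0 <= x <= 1 is red)
in_range = (x >= 0) & (x <= 1)
red_counts, _ = np.histogram(x[in_range], bins=bins)
blue_counts, _ = np.histogram(x[~in_range], bins=bins)

# indices of bins that contain both red and blue observations
mixed_bins = np.flatnonzero((red_counts > 0) & (blue_counts > 0))
```

If mixed_bins is not empty for your data, the red and blue bars in those bins overlap. One option in that case is a single call such as ax.hist([x1, x2], bins=bins, stacked=True, color=["tab:red", "tab:blue"]), which stacks the two groups instead.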
I think you need a scatter plot. In that case, you can try the following solution. Here you first create a column of colors based on your condition and then assign those colors to the scatter plot.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
data = np.array([3, 1, 2, 3, 5, 6, 8, 0, 0, 5, 7, 0,1, 0.2])
df = pd.DataFrame({'data':data}).reset_index()
df['colors'] = np.where(data<1, 'red', 'blue')
plt.scatter(df['index'], df['data'], c=df['colors'])
An alternative is to plot directly from the DataFrame:
data = np.array([3, 1, 2, 3, 5, 6, 8, 0, 0, 5, 7, 0,1, 0.2])
df = pd.DataFrame({'data':data}).reset_index()
colors = np.where(data<1, 'red', 'blue')
df.plot(kind='scatter', x='index', y='data',c=colors)

How to add counts of points as a label in a sparse scatter plot

I have a sparse scatter plot to visualize the comparison of predicted vs actual values. The values range from 1 to 4 and are all integers.
I have tried plotly so far with the following code (but I can also use a matplotlib solution):
my_scatter = go.Scatter(
x = y_actual, y = y_pred, mode = 'markers',
marker = dict(color = 'rgb(240, 189, 89)', opacity=0.5)
)
This draws the graph nicely (see below). I use opacity to show the density at each point, i.e. if two points lie on top of each other, the point is shown in a darker color. However, this is not explanatory enough. Is it possible to add the count at each point as a label? There are overlaps at certain intersections, and I want to display how many points coincide there. Can this be done automatically using matplotlib or plotly?
This answer uses matplotlib.
To answer the initial question first: you need to find out how often the data produces a point at a given coordinate in order to annotate the points. If all values are integers, this can easily be done using a 2D histogram. From the histogram one then selects only those bins where the count is nonzero and annotates the respective values in a loop:
x = [3, 0, 1, 2, 2, 0, 1, 3, 3, 3, 4, 1, 4, 3, 0]
y = [1, 0, 4, 3, 2, 1, 4, 0, 3, 0, 4, 2, 3, 3, 1]
import matplotlib.pyplot as plt
import numpy as np
x = np.array(x)
y = np.array(y)
hist, xbins,ybins = np.histogram2d(y,x, bins=range(6))
X,Y = np.meshgrid(xbins[:-1], ybins[:-1])
X = X[hist != 0]; Y = Y[hist != 0]
Z = hist[hist != 0]
fig, ax = plt.subplots()
ax.scatter(x,y, s=49, alpha=0.4)
for i in range(len(Z)):
    ax.annotate(str(int(Z[i])), xy=(X[i],Y[i]), xytext=(4,0),
                textcoords="offset points" )
plt.show()
You may then decide not to plot all points but the result from the histogramming which offers the chance to change the color and size of the scatter points,
ax.scatter(X,Y, s=(Z*20)**1.4, c = Z/Z.max(), cmap="winter_r", alpha=0.4)
Since all values are integers, you may also opt for an image plot,
fig, ax = plt.subplots()
ax.imshow(hist, cmap="PuRd")
for i in range(len(Z)):
    ax.annotate(str(int(Z[i])), xy=(X[i],Y[i]), xytext=(0,0), color="w",
                ha="center", va="center", textcoords="offset points" )
Another option, which does not require calculating the number of occurrences, is a hexbin plot. It places the dots slightly inaccurately due to the hexagonal binning, but I still wanted to mention this option.
import matplotlib.pyplot as plt
import matplotlib.colors
import numpy as np
x = np.array(x)
y = np.array(y)
fig, ax = plt.subplots()
cmap = plt.cm.PuRd
cmaplist = [cmap(i) for i in range(cmap.N)]
cmaplist[0] = (1.0,1.0,1.0,1.0)
cmap = matplotlib.colors.LinearSegmentedColormap.from_list('mcm',cmaplist, cmap.N)
ax.hexbin(x,y, gridsize=20, cmap=cmap, linewidth=0 )
plt.show()
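Going back to the counting step at the start of this answer: for integer coordinates, the number of points at each position can also be obtained with the standard library alone. This is a small sketch of that alternative, not the histogram approach used above; point_counts is a name introduced here.

```python
from collections import Counter

x = [3, 0, 1, 2, 2, 0, 1, 3, 3, 3, 4, 1, 4, 3, 0]
y = [1, 0, 4, 3, 2, 1, 4, 0, 3, 0, 4, 2, 3, 3, 1]

# map each (x, y) position to the number of points located there
point_counts = Counter(zip(x, y))

# positions that hold more than one point
overlaps = {pos: n for pos, n in point_counts.items() if n > 1}
```

Each key of point_counts is a position and each value is the label to draw there, for example with ax.annotate(str(n), xy=pos, xytext=(4, 0), textcoords="offset points") inside a loop over point_counts.items().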

Suggestions to plot overlapping lines in matplotlib?

Does anybody have a suggestion on what's the best way to present overlapping lines on a plot? I have a lot of them, and I had the idea of having full lines of different colors where they don't overlap, and having dashed lines where they do overlap so that all colors are visible and overlapping colors are seen.
But still, how do I do that?
I have the same issue on a plot with a high degree of discretization.
Here the starting situation:
import matplotlib.pyplot as plt
grid=[x for x in range(10)]
graphs=[
[1,1,1,4,4,4,3,5,6,0],
[1,1,1,5,5,5,3,5,6,0],
[1,1,1,0,0,3,3,2,4,0],
[1,2,4,4,3,2,3,2,4,0],
[1,2,3,3,4,4,3,2,6,0],
[1,1,3,3,0,3,3,5,4,3],
]
for gg,graph in enumerate(graphs):
    plt.plot(grid,graph,label='g'+str(gg))
plt.legend(loc=3,bbox_to_anchor=(1,0))
plt.show()
No one can say where the green and blue lines run exactly
and my "solution"
import matplotlib.pyplot as plt
grid=[x for x in range(10)]
graphs=[
[1,1,1,4,4,4,3,5,6,0],
[1,1,1,5,5,5,3,5,6,0],
[1,1,1,0,0,3,3,2,4,0],
[1,2,4,4,3,2,3,2,4,0],
[1,2,3,3,4,4,3,2,6,0],
[1,1,3,3,0,3,3,5,4,3],
]
for gg,graph in enumerate(graphs):
    lw=10-8*gg/len(graphs)
    ls=['-','--','-.',':'][gg%4]
    plt.plot(grid,graph,label='g'+str(gg), linestyle=ls, linewidth=lw)
plt.legend(loc=3,bbox_to_anchor=(1,0))
plt.show()
I am grateful for suggestions on improvement!
Just decrease the opacity of the lines so that they are see-through. You can achieve that using the alpha parameter. Example:
plt.plot(x, y, alpha=0.7)
Here alpha ranges from 0 to 1, with 0 being fully transparent (invisible) and 1 fully opaque.
Imagine your pandas DataFrame is called respone_times; you can then use alpha to set the opacity of the plotted lines. Compare the plot before and after using alpha.
plt.figure(figsize=(15, 7))
plt.plot(respone_times,alpha=0.5)
plt.title('a sample title')
plt.grid(True)
plt.show()
Depending on your data and use case, it might be OK to add a bit of random jitter to artificially separate the lines.
from numpy.random import default_rng
import pandas as pd
rng = default_rng()
def jitter_df(df: pd.DataFrame, std_ratio: float) -> pd.DataFrame:
    """
    Add jitter to a DataFrame.

    Adds normal distributed jitter with mean 0 to each of the
    DataFrame's columns. The jitter's std is the column's std times
    `std_ratio`.

    Returns the jittered DataFrame.
    """
    std = df.std().values * std_ratio
    jitter = pd.DataFrame(
        std * rng.standard_normal(df.shape),
        index=df.index,
        columns=df.columns,
    )
    return df + jitter
Here's a plot of the original data from Markus Dutschke's example:
And here's the jittered version, with std_ratio set to 0.1:
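For readers who keep their lines in a plain list rather than a DataFrame, the jitter idea above can be sketched with NumPy alone. This is an illustration under assumptions: jitter_lines is a hypothetical helper, each row of graphs is treated as one line, the sample standard deviation (ddof=1) is used to mirror the pandas default, and the seed is fixed only to make the result repeatable.

```python
import numpy as np

def jitter_lines(lines, std_ratio, rng):
    """Return a float array where each row of `lines` has normal jitter added.

    The jitter has mean 0 and a standard deviation equal to the row's
    own sample standard deviation times `std_ratio`.
    """
    values = np.asarray(lines, dtype=float)
    std = values.std(axis=1, ddof=1, keepdims=True) * std_ratio
    return values + std * rng.standard_normal(values.shape)

# data from the example earlier in this thread
graphs = [
    [1, 1, 1, 4, 4, 4, 3, 5, 6, 0],
    [1, 1, 1, 5, 5, 5, 3, 5, 6, 0],
    [1, 1, 1, 0, 0, 3, 3, 2, 4, 0],
    [1, 2, 4, 4, 3, 2, 3, 2, 4, 0],
    [1, 2, 3, 3, 4, 4, 3, 2, 6, 0],
    [1, 1, 3, 3, 0, 3, 3, 5, 4, 3],
]

rng = np.random.default_rng(0)  # fixed seed, chosen only for repeatability
jittered = jitter_lines(graphs, std_ratio=0.1, rng=rng)
```

Each row of jittered can then be passed to plt.plot in place of the corresponding entry of graphs. A line that is completely constant has a standard deviation of 0 and therefore receives no jitter.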
Replacing solid lines by dots or dashes works too
g = sns.FacetGrid(data, col='config', row='outputs', sharex=False)
g.map_dataframe(sns.lineplot, x='lag',y='correlation',hue='card', linestyle='dotted')
Instead of random jitter, the lines can be offset just a little bit, creating a layered appearance:
import matplotlib.pyplot as plt
from matplotlib.transforms import offset_copy
grid = list(range(10))
graphs = [[1, 1, 1, 4, 4, 4, 3, 5, 6, 0],
[1, 1, 1, 5, 5, 5, 3, 5, 6, 0],
[1, 1, 1, 0, 0, 3, 3, 2, 4, 0],
[1, 2, 4, 4, 3, 2, 3, 2, 4, 0],
[1, 2, 3, 3, 4, 4, 3, 2, 6, 0],
[1, 1, 3, 3, 0, 3, 3, 5, 4, 3]]
fig, ax = plt.subplots()
lw = 1
for gg, graph in enumerate(graphs):
    trans_offset = offset_copy(ax.transData, fig=fig, x=lw * gg, y=lw * gg, units='dots')
    ax.plot(grid, graph, lw=lw, transform=trans_offset, label='g' + str(gg))
ax.legend(loc='upper left', bbox_to_anchor=(1.01, 1.01))
# manually set the axes limits, because the transform doesn't set them automatically
ax.set_xlim(grid[0] - .5, grid[-1] + .5)
ax.set_ylim(min([min(g) for g in graphs]) - .5, max([max(g) for g in graphs]) + .5)
plt.tight_layout()
plt.show()

pyplot legend for scatter plot colored by values

When I use a variable for coloring a scatter plot, how can I make a legend stating what colors represent? How can I make the legend show a label of 0 represents empty and 1 represents full?
import matplotlib.pyplot as plt
X = [1,2,3,1,2,3,4]
Y = [1,1,1,2,2,2,2]
label = [0,1,1,0,0,1,1]
plt.scatter(X, Y, c= label, s=50)
plt.show()
Give this code a try:
import matplotlib.pyplot as plt
import matplotlib.patches as mpatches
X = [1, 2 ,3, 1, 2, 3, 4]
Y = [1, 1, 1, 2, 2, 2, 2]
labels = [0, 1, 1, 0, 0, 1, 1]
key = {0: ('red', 'empty'), 1: ('green', 'full')}
plt.scatter(X, Y, c=[key[index][0] for index in labels], s=50)
patches = [mpatches.Patch(color=color, label=label) for color, label in key.values()]
plt.legend(handles=patches, labels=[label for _, label in key.values()], bbox_to_anchor=(1, .3))
plt.show()
And this is what you'll get:
To use colors or labels different than those shown in the figure you simply need to change the values of the dictionary key appropriately.
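Another way to get the same legend, sketched here as an alternative rather than as part of the answer above, is to call scatter once per category and pass label=, so that the legend entries are created from the plotted points themselves. The Agg backend and the names xs, ys and legend_texts are assumptions made for this illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

X = [1, 2, 3, 1, 2, 3, 4]
Y = [1, 1, 1, 2, 2, 2, 2]
labels = [0, 1, 1, 0, 0, 1, 1]
key = {0: ('red', 'empty'), 1: ('green', 'full')}

fig, ax = plt.subplots()
for value, (color, name) in key.items():
    # select the points that belong to this category
    xs = [x for x, lab in zip(X, labels) if lab == value]
    ys = [y for y, lab in zip(Y, labels) if lab == value]
    ax.scatter(xs, ys, color=color, s=50, label=name)

# the legend picks up one entry per scatter call
legend = ax.legend(bbox_to_anchor=(1, .3))
legend_texts = [text.get_text() for text in legend.get_texts()]
```

With this variant the legend markers are scatter dots instead of rectangular patches, and adding a category only requires a new entry in key.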
