Dynamic threshold line in a bar chart - python

I have a stacked bar chart to which I want to add a dynamic threshold line. The threshold is calculated with a simple formula (90% of each specific value).
Graphic attached. The green line is what I am looking for. I would appreciate any ideas on how to approach this problem.

Here is what I came up with:
The idea was to have a continuous segment of Xs projected with a constant y value with a 0.5 excess before and after:
import numpy as np
import matplotlib.pyplot as plt
groups = 9
X = list(range(1, groups))
y = [1, 1, 2, 2, 1, 2, 1, 1]
threshold_interval_x = np.arange(min(X) - 0.5, max(X) + 0.5, 0.01).tolist()
threshold_y = []
for y_elt in y:
    for i in range(0, int(len(threshold_interval_x) / (groups - 1))):
        threshold_y.append(y_elt * 0.9)
plt.bar(X, y, width=0.4, align='center', color='yellow')
plt.plot(threshold_interval_x, threshold_y, color='green')
labels_X = ['PD', 'PZV', 'PP', 'FW', 'BA', 'IA', 'EA', 'NA']
plt.xticks(X, labels_X, rotation='horizontal')
plt.show()
And here's the output:

You could use matplotlib's step function for this:
import pandas as pd
import matplotlib.pyplot as plt
Suppose your data is structured like this:
df = pd.DataFrame({'In': [1, 1, 1, 2 , 0, 2, 0, 0], 'Out': [0, 0, 1, 0, 1, 0, 1, 1]}, index=['PD', 'PZV', 'PP', 'FW', 'BA', 'IA', 'EA', 'NA'])
     In  Out
PD    1    0
PZV   1    0
PP    1    1
FW    2    0
BA    0    1
IA    2    0
EA    0    1
NA    0    1
Then plotting the bars would be
df.plot(kind='bar', stacked=True, rot=0, color=['gold', 'beige'])
and plotting the threshold line at 90% of the sum would be
plt.step(df.index, df.sum(1) * .9, 'firebrick', where='mid', label = 'Ziel: 90%')
Adding a legend:
plt.legend()
leads to:

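As a variation on the step approach, the threshold could also be drawn as one short horizontal segment per bar with `Axes.hlines`, which leaves out the vertical connectors of a step line. This is only a sketch and not part of the answer above: the helper names `threshold_segments` and `plot_with_threshold` are made up for illustration, and the code assumes the bars are centered at the integer positions 0, 1, 2, ...

```python
def threshold_segments(totals, fraction=0.9, half_width=0.5):
    """Return (ys, xmins, xmaxs): one horizontal segment per bar.

    Assumes bar i is centered at x = i (an assumption of this sketch).
    """
    ys = [fraction * total for total in totals]
    xmins = [i - half_width for i in range(len(totals))]
    xmaxs = [i + half_width for i in range(len(totals))]
    return ys, xmins, xmaxs


def plot_with_threshold(in_values, out_values, labels, fraction=0.9):
    """Draw stacked bars plus one threshold segment per bar; return (fig, ax)."""
    # Imported here so threshold_segments stays usable without matplotlib.
    import matplotlib.pyplot as plt

    x = list(range(len(labels)))
    totals = [a + b for a, b in zip(in_values, out_values)]
    fig, ax = plt.subplots()
    ax.bar(x, in_values, color='gold', label='In')
    ax.bar(x, out_values, bottom=in_values, color='beige', label='Out')
    ys, xmins, xmaxs = threshold_segments(totals, fraction)
    ax.hlines(ys, xmins, xmaxs, colors='firebrick',
              label='Target: {:.0%}'.format(fraction))
    ax.set_xticks(x)
    ax.set_xticklabels(labels)
    ax.legend()
    return fig, ax
```

Calling `plot_with_threshold` with the 'In' and 'Out' columns and the index labels from the DataFrame above, followed by `plt.show()`, should give the same bars with a separate threshold segment over each one.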
Related

Matplotlib align uneven number of subplots

I want to plot 11 figures using subplots. My idea is to have 2 rows: 6 plots on the first, 5 on the second. I use the following code.
import matplotlib.pyplot as plt
import pandas as pd
fig, axes = plt.subplots(2, 6, figsize=(30, 8))
fig.tight_layout(h_pad=6, w_pad=6)
x = 0
y = 0
for i in range(0, 11):
    data = [[1, i*1], [2, i*2*2], [3, i*3*3]]
    df = pd.DataFrame(data, columns = ['x', 'y'])
    df.plot('x', ['y'], ax=axes[x,y])
    y += 1
    if y > 5:
        y = 0
        x += 1
fig.delaxes(ax=axes[1,5])
This works, but the bottom row is not aligned to the center, which makes the result a bit ugly. I want the figures to all be of the same size, so I cannot extend the last one to make everything even.
My question: how do I align the second row to be centered such that the full picture is symmetrical?
You could use gridspec, dividing each row into 12 partitions and recombining them pairwise:
import matplotlib.pyplot as plt
import matplotlib.gridspec as gridspec
import pandas as pd
fig = plt.figure(figsize=(12, 5))
gs = gridspec.GridSpec(2, 12)
for i in range(0, 11):
    if i < 6:
        ax = plt.subplot(gs[0, 2 * i:2 * i + 2])
    else:
        # shift the second row by one partition so that its 5 plots are centered
        ax = plt.subplot(gs[1, 2 * i - 11:2 * i + 2 - 11])
    data = [[1, i * 1], [2, i * 2 * 2], [3, i * 3 * 3]]
    df = pd.DataFrame(data, columns=['x', 'y'])
    df.plot('x', 'y', ax=ax)
plt.tight_layout()
plt.show()
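The same idea can be generalized to other plot counts. The following is a minimal sketch, not part of the answer: `centered_spans` and `make_centered_axes` are hypothetical helper names, and the sketch assumes every subplot is two partitions wide, so an incomplete last row is shifted right by one partition per missing plot.

```python
def centered_spans(n_plots, n_cols):
    """Return (row, start, stop) column spans on a grid with 2 * n_cols partitions.

    Each subplot is two partitions wide; an incomplete row is shifted right by
    one partition per missing subplot so that it ends up centered.
    """
    spans = []
    for first in range(0, n_plots, n_cols):
        row = first // n_cols
        in_row = min(n_cols, n_plots - first)
        offset = n_cols - in_row  # unused partitions on the left of this row
        for k in range(in_row):
            start = offset + 2 * k
            spans.append((row, start, start + 2))
    return spans


def make_centered_axes(n_plots, n_cols, figsize=(12, 5)):
    """Create the figure and one Axes per span; return (fig, axes)."""
    # Imported here so centered_spans can be used without matplotlib.
    import matplotlib.pyplot as plt
    from matplotlib.gridspec import GridSpec

    n_rows = -(-n_plots // n_cols)  # ceiling division
    fig = plt.figure(figsize=figsize)
    gs = GridSpec(n_rows, 2 * n_cols, figure=fig)
    axes = [fig.add_subplot(gs[row, start:stop])
            for row, start, stop in centered_spans(n_plots, n_cols)]
    return fig, axes
```

For 11 plots in rows of 6, `centered_spans(11, 6)` reproduces the slices used in the loop above: the first row starts at 0, 2, ..., 10 and the second row at 1, 3, ..., 9.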

Generate a heat map from CSV file using numpy and matplotlib - how to display negative numbers on axis

I would like to produce a heat map from CSV data that contains negative values on the x axis. I copied code from a previous post as a starting point. However, when I try it, the negative x values are not displayed. In fact, with some data sets (like the example) it doesn't appear to set the correct axis values at all. I am unsure why this is the case, as the axes seem to be defined from the CSV data in the code. I thought it might have to do with dtype=np.int, but it seems it does not.
import numpy as np
import matplotlib.pyplot as plt
csv_file_path = '<FILE PATH>'
def get_xyz_from_csv_file_np(csv_file_path):
    '''
    get a grid of values from a csv file
    csv file format: x0,y0,z0
    '''
    x, y, z = np.loadtxt(csv_file_path, delimiter=',', dtype=np.int).T
    plt_z = np.zeros((y.max()+1, x.max()+1))
    plt_z[y, x] = z
    return plt_z
def draw_heatmap(plt_z):
    # Generate y and x values from the dimension lengths
    plt_y = np.arange(plt_z.shape[0])
    plt_x = np.arange(plt_z.shape[1])
    z_min = plt_z.min()
    z_max = plt_z.max()
    plot_name = "plot"
    z_name = "Signal"
    color_map = plt.cm.rainbow
    fig, ax = plt.subplots()
    cax = ax.pcolor(plt_x, plt_y, plt_z, cmap=color_map, vmin=z_min, vmax=z_max)
    ax.set_xlim(plt_x.min(), plt_x.max())
    ax.set_ylim(plt_y.min(), plt_y.max())
    fig.colorbar(cax).set_label(z_name, rotation=270)
    ax.set_title(plot_name)
    ax.set_aspect('auto')
    figure = plt.gcf()
    plt.show()
    return figure
if __name__ == "__main__":
    fname = 'temp.csv'
    # create_test_csv(fname)
    res = get_xyz_from_csv_file_np(csv_file_path)
    draw_heatmap(res)
The output I get is this:
The example data file is a comma delimited csv with this data (x,y,z):
-2 -1 0
-2 0 10
-2 1 0
-1 -1 2
-1 0 5
-1 1 2
0 -1 0
0 0 0
0 1 10
1 -1 10
1 0 0
1 1 0
2 -1 10
2 0 0
2 1 10
Can anyone (1) fix this code so that negative values can be displayed and the axes are correct and (2) explain to me what I am doing wrong?
Thanks!
The code below first mimics the .csv file with an array and then extracts x, y and z. To know the dimensions, not just the maximum but the difference between maximum and minimum needs to be considered. The x and y arrays are only interesting for the rest of the code because of their minimum and maximum.
To draw the heatmap, only plt_z is needed, as it already has the correct shape. x and y can be used to set the extents (i.e. the value ranges of the x and y axes). plt.imshow() is similar to plt.pcolor() but allows the extents to be set with a parameter. It needs origin='lower' because by default imshow places the origin at the top, as many image formats do.
To have the ticks in the center of the cells, an extra margin of 0.5 needs to be added. To have the ticks shown at every integer position, a MultipleLocator() can be used.
import numpy as np
import matplotlib.pyplot as plt
from matplotlib import ticker
def get_xyz_from_csv_file_np():
    data = [[-2, -1, 0],
            [-2, 0, 10],
            [-2, 1, 0],
            [-1, -1, 2],
            [-1, 0, 5],
            [-1, 1, 2],
            [0, -1, 0],
            [0, 0, 0],
            [0, 1, 10],
            [1, -1, 10],
            [1, 0, 0],
            [1, 1, 0],
            [2, -1, 10],
            [2, 0, 0],
            [2, 1, 10]]
    # np.int was removed in NumPy 1.24; the builtin int is equivalent here
    data = np.array(data, dtype=int)
    x = data[:, 0]
    y = data[:, 1]
    z = data[:, 2]
    n = y.max() - y.min() + 1
    m = x.max() - x.min() + 1
    # the rows are sorted by x first and then by y, so reshape to (m, n)
    # and transpose to get an array with one row per y and one column per x
    return x.reshape(m, n).T, y.reshape(m, n).T, z.reshape(m, n).T
def draw_heatmap(plt_x, plt_y, plt_z):
    plot_name = "plot"
    z_name = "Signal"
    color_map = plt.cm.rainbow
    fig, ax = plt.subplots()
    cax = ax.imshow(plt_z, cmap=color_map,
                    extent=[plt_x.min() - 0.5, plt_x.max() + 0.5, plt_y.min() - 0.5, plt_y.max() + 0.5], origin='lower')
    fig.colorbar(cax).set_label(z_name, rotation=270)
    ax.set_title(plot_name)
    ax.set_aspect('auto')
    # optionally force to have ticks at every integer position
    ax.xaxis.set_major_locator(ticker.MultipleLocator(1))
    ax.yaxis.set_major_locator(ticker.MultipleLocator(1))
    plt.show()
x, y, z = get_xyz_from_csv_file_np()
draw_heatmap(x, y, z)
PS: In case the z-values have a natural order, it would be best not to use the rainbow colormap, but one of the 'Perceptually Uniform Sequential' colormaps ('viridis', 'plasma', 'inferno', 'magma', 'cividis').
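The point about using the difference between maximum and minimum can also be applied directly when filling the grid: subtracting the minimum from each coordinate turns negative x and y values into valid array indices, regardless of the row order in the file. The sketch below is an illustration under that assumption and not the answer's code; `grid_from_points` and `draw_grid` are hypothetical names, and cells without data are filled with 0.

```python
def grid_from_points(points, fill=0):
    """Place integer (x, y, z) triples on a 2D grid, shifted by the minima.

    Returns (grid, extent): grid[row][col] holds z for y = y_min + row and
    x = x_min + col; extent is (left, right, bottom, top) for imshow, padded
    by 0.5 so that the ticks land on the cell centers.
    """
    xs = [x for x, _, _ in points]
    ys = [y for _, y, _ in points]
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    grid = [[fill] * (x_max - x_min + 1) for _ in range(y_max - y_min + 1)]
    for x, y, z in points:
        grid[y - y_min][x - x_min] = z
    extent = (x_min - 0.5, x_max + 0.5, y_min - 0.5, y_max + 0.5)
    return grid, extent


def draw_grid(grid, extent):
    """Show the grid as a heat map with the original coordinates on the axes."""
    # Imported here so grid_from_points can be used without matplotlib.
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    image = ax.imshow(grid, extent=extent, origin='lower', aspect='auto')
    fig.colorbar(image)
    return fig, ax
```

The triples could come from `np.loadtxt(csv_file_path, delimiter=',', dtype=int).tolist()`, assuming the file really is comma delimited and contains only integers.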

Plotting multiple 3d lines in one figure using plotly

I have many 2D sequences of variable length, i.e. a list of lists where each sublist is a sequence. I want to project these sequences/lines/sublists in a 3D visualisation, adding the time step as another dimension. So far I have failed to plot all the 3D lines using plotly.express.
import plotly.express as px
t = [[ii+1 for ii in range(len(features[i]))] for i in range(len(labels))]
x0 = [[x[0] for x in features[i]] for i in range(len(labels))]
x1 = [[x[1] for x in features[i]] for i in range(len(labels))]
df = pd.DataFrame(dict(
X=[tii for ti in t for tii in ti],
Y=[xii for xi in x0 for xii in xi],
Z=[xii for xi in x1 for xii in xi],
color=[aa for a in labels for aa in a]
))
fig = px.line_3d(df, x="X", y="Y", z="Z", color="color")
fig.show()
This is what I get, which is not really what I want. It treats all the cases/sublists with a common label as one single sequence, so at the end of each line it goes back to where it started. I have looked into plotting this iteratively in a for-loop (just like in matplotlib), basically creating a new pandas dataframe at each iteration and plotting it, but with no success. Does anyone have any experience with this? Much appreciated!
An MCVE is below:
import plotly.express as px
import numpy as np
import pandas as pd
features = [np.random.rand(4,2).tolist(),
np.random.rand(5,2).tolist(),
np.random.rand(6,2).tolist(),
np.random.rand(5,2).tolist(),
np.random.rand(9,2).tolist()]
labels = [[1, 1, 1, 1], [1, 1, 1, 1, 1], [2, 2, 2, 2, 2, 2],
[2, 2, 2, 2, 2], [0, 0, 0, 0, 0, 0, 0, 0, 0]]
t = [[ii+1 for ii in range(len(features[i]))] for i in range(len(labels))]
x0 = [[x[0] for x in features[i]] for i in range(len(labels))]
x1 = [[x[1] for x in features[i]] for i in range(len(labels))]
df2 = pd.DataFrame(dict(
X=[tii for ti in t for tii in ti],
Y=[xii for xi in x0 for xii in xi],
Z=[xii for xi in x1 for xii in xi],
color=[aa for a in labels for aa in a]
))
fig1 = px.line_3d(df2, x="X", y="Y", z="Z", color="color")
fig1.show()
You see basically 3 lines instead of 5.
Your problem is that you are using the same label for different traces. Here is a workaround with a loop:
import numpy as np
import plotly.graph_objs as go
features = [np.random.rand(4,2).tolist(),
np.random.rand(5,2).tolist(),
np.random.rand(6,2).tolist(),
np.random.rand(5,2).tolist(),
np.random.rand(9,2).tolist()]
labels = [[1, 1, 1, 1],
[1, 1, 1, 1, 1],
[2, 2, 2, 2, 2, 2],
[2, 2, 2, 2, 2],
[0, 0, 0, 0, 0, 0, 0, 0, 0]]
fig = go.Figure()
for i, feat in enumerate(features):
    feat = np.array(feat)
    fig.add_trace(
        go.Scatter3d(
            x=np.arange(len(feat)),
            y=feat[:,0],
            z=feat[:,1],
            mode='lines',
            hovertext=labels[i]
        )
    )
fig.show()
You might need to play with trace names.
Update
Hopefully it's not too overcomplicated, but it is meant to be as generic as possible:
import numpy as np
import plotly.graph_objs as go
from itertools import cycle
def plotly_color_map(names):
    # From https://stackoverflow.com/a/44727682
    plotly_colors = cycle(['#1f77b4',  # muted blue
                           '#ff7f0e',  # safety orange
                           '#2ca02c',  # cooked asparagus green
                           '#d62728',  # brick red
                           '#9467bd',  # muted purple
                           '#8c564b',  # chestnut brown
                           '#e377c2',  # raspberry yogurt pink
                           '#7f7f7f',  # middle gray
                           '#bcbd22',  # curry yellow-green
                           '#17becf'   # blue-teal
                           ])
    return dict(zip(names, plotly_colors))
features = [np.random.rand(4,2).tolist(),
np.random.rand(5,2).tolist(),
np.random.rand(6,2).tolist(),
np.random.rand(5,2).tolist(),
np.random.rand(9,2).tolist()]
labels = [[1, 1, 1, 1],
[1, 1, 1, 1, 1],
[2, 2, 2, 2, 2, 2],
[2, 2, 2, 2, 2],
[0, 0, 0, 0, 0, 0, 0, 0, 0]]
legend_groups = [l[0] for l in labels]
traces = [False if (len(legend_groups[:i])>0 and l in legend_groups[:i])
else True for i, l in enumerate(legend_groups)]
cm = plotly_color_map(set(legend_groups))
fig = go.Figure()
for i, feat in enumerate(features):
    feat = np.array(feat)
    fig.add_trace(
        go.Scatter3d(
            x=np.arange(len(feat)),
            y=feat[:,0],
            z=feat[:,1],
            mode='lines',
            line={"color":cm[legend_groups[i]]},
            legendgroup=legend_groups[i],
            hovertext=labels[i],
            showlegend=traces[i],
            name="label_{}".format(legend_groups[i])
        )
    )
fig.show()
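If staying with plotly.express is preferred, another option might be to give every sublist its own id and pass that column as `line_group` to `px.line_3d`, so that rows sharing a label are still drawn as separate lines. This is a sketch under that assumption rather than the approach used above: `to_long_records` and `plot_sequences` are hypothetical helper names, and `seq` is a column introduced only for this example.

```python
def to_long_records(features, labels):
    """Flatten the sequences into one record per point, tagged with a sequence id."""
    records = []
    for seq_id, (feat, lab) in enumerate(zip(features, labels)):
        for step, (point, label) in enumerate(zip(feat, lab), start=1):
            records.append({
                "t": step,            # time step within the sequence, starting at 1
                "x0": point[0],
                "x1": point[1],
                "label": str(label),  # strings so the color is treated as a category
                "seq": seq_id,        # hypothetical column: one id per sublist
            })
    return records


def plot_sequences(features, labels):
    """Return a 3D line figure with one line per sequence, colored by label."""
    # Imported here so to_long_records can be used without pandas or plotly.
    import pandas as pd
    import plotly.express as px

    df = pd.DataFrame(to_long_records(features, labels))
    return px.line_3d(df, x="t", y="x0", z="x1", color="label", line_group="seq")
```

With the `features` and `labels` from the MCVE, `plot_sequences(features, labels).show()` should draw five lines in three colors.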

Creating a series of pie charts from a dataframe with color linked to the indexes' values

This is an example dataframe:
import pandas as pd
import numpy as np
values = np.array([
[0, 1, 2, 0, 0, 4],
[1, 0, 0, 1, 1, 0 ],
[0, 4, 0, 0, 2, 1],
[2, 0, 2, 0, 4, 0],
])
indexes= 'a','b','c','d'
columns='ab','bc','cd','de','ef','fg'
df = pd.DataFrame(index=indexes,columns=columns, data=values)
print(df)
From this dataframe I need to create a series of pie charts, one for every column, shown on the same figure, where the slice size is fixed (equal to 100/len(indexes)) and the color of each slice depends on the value at that index, in particular: white if 0, green if 1, yellow if 2, red if 4.
What suggestions can you give me?
I found that:
df.plot(kind='pie', subplots=True, figsize=(len(columns)*2, 2))
It creates a series of charts, but I can't control the input values...
I've created a pie chart for one column, but I wasn't able to link the color to the value at each index:
labels = indexes
sizes = np.linspace(100/len(labels),100/len(labels), num=len(labels))
fig1, ax1 = plt.subplots()
ax1.pie(sizes, labels=labels)
ax1.axis('equal')
plt.show()
ImportanceOfBeingErnest's answer helped me give the pie chart the desired look:
fig1, ax1 = plt.subplots()
labels = indexes
sizes = np.linspace(100/len(labels),100/len(labels), num=len(labels))
coldic = {0 : "w", 1 : "g", 2 : "y", 4 : "r" }
colors = [coldic[v] for v in values[:,0]]
ax1.pie(sizes, labels=labels, colors=colors,counterclock=False, startangle=90)
ax1.axis('equal')
plt.show()
Now the colors are linked to the values, and the sizes of the slices are fixed. I just need the same pie chart for all the columns, in the same image.
The importance of these charts is given by the colors, not the dimensions of the slices, which I want to be always equal.
Thanks for your time!
Without relying on pandas' internal plotting functions (which are of course limited), one can use matplotlib's pie function to plot the diagrams.
The colors can be set as a list, which is generated from the values according to some mapping dictionary.
import numpy as np
import matplotlib.pyplot as plt
coldic = {0 : "w", 1 : "g", 2 : "y", 4 : "r" }
values = np.array([
[0, 1, 2, 0, 0, 4],
[1, 0, 0, 1, 1, 0 ],
[0, 4, 0, 0, 2, 1],
[2, 0, 2, 0, 4, 0],
])
labels= ['a','b','c','d']
fig1, axes = plt.subplots(ncols=values.shape[1], )
for i in range(values.shape[1]):
    colors = [coldic[v] for v in values[:,i]]
    labs = [l if values[j,i] > 0 else "" for j, l in enumerate(labels)]
    axes[i].pie(values[:,i], labels=labs, colors=colors)
    axes[i].set_aspect("equal")
plt.show()
For fixed wedge sizes, just supply a constant array to pie.
import numpy as np
import matplotlib.pyplot as plt
coldic = {0 : "w", 1 : "g", 2 : "y", 4 : "r" }
values = np.array([
[0, 1, 2, 0, 0, 4],
[1, 0, 0, 1, 1, 0 ],
[0, 4, 0, 0, 2, 1],
[2, 0, 2, 0, 4, 0],
])
labels= ['a','b','c','d']
fig1, axes = plt.subplots(ncols=values.shape[1], )
for i in range(values.shape[1]):
    colors = [coldic[v] for v in values[:,i]]
    axes[i].pie(np.ones(values.shape[0]), labels=labels, colors=colors,
                wedgeprops=dict(linewidth=1, edgecolor="k"))
    axes[i].set_aspect("equal")
    axes[i].set_title("".join(list(map(str,values[:,i]))))
plt.show()
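To start from the DataFrame in the question rather than from a bare array, the same loop could run over the columns by name. The following is a minimal sketch, assuming the columns are passed as a plain mapping such as `df.to_dict('list')`; `wedge_colors` and `plot_pies` are hypothetical helper names, and the start angle and direction are taken from the question's own snippet.

```python
COLOR_BY_VALUE = {0: "w", 1: "g", 2: "y", 4: "r"}


def wedge_colors(column_values, color_by_value=COLOR_BY_VALUE):
    """Map each value of one column to its wedge color."""
    return [color_by_value[value] for value in column_values]


def plot_pies(columns, index_labels):
    """Draw one equal-wedge pie per column; columns maps name -> list of values."""
    # Imported here so wedge_colors can be used without matplotlib.
    import matplotlib.pyplot as plt

    fig, axes = plt.subplots(ncols=len(columns),
                             figsize=(2 * len(columns), 2), squeeze=False)
    for ax, (name, values) in zip(axes[0], columns.items()):
        # a constant size array gives wedges of equal size
        ax.pie([1] * len(values), labels=index_labels,
               colors=wedge_colors(values), counterclock=False, startangle=90,
               wedgeprops=dict(linewidth=1, edgecolor="k"))
        ax.set_title(name)
    return fig, axes
```

For example, `plot_pies(df.to_dict('list'), list(df.index))` followed by `plt.show()` should give six pies titled 'ab' to 'fg'.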

Divide seaborn matrix by lines according to clustering

This paper has a nice way of visualizing clusters of a dataset with binary features by plotting a 2D matrix and sorting the values according to a cluster.
In this case, there are three clusters, as indicated by the black dividing lines; the rows are sorted, and show which examples are in each cluster, and the columns are the features of each example.
Given a vector of cluster assignments and a pandas DataFrame, how can I replicate this using a Python library (e.g. seaborn)? Plotting a DataFrame using seaborn isn't difficult, nor is sorting the rows of the DataFrame to align with the cluster assignments. What I am most interested in is how to display those black dividing lines which delineate each cluster.
Dummy data:
"""
col1 col2
x1_c0 0 1
x2_c0 0 1
================= I want a line drawn here
x3_c1 1 0
================= and here
x4_c2 1 0
"""
import pandas as pd
import seaborn as sns
df = pd.DataFrame(
data={'col1': [0, 0, 1, 1], 'col2': [1, 1, 0, 0]},
index=['x1_c0', 'x2_c0', 'x3_c1', 'x4_c2']
)
clus = [0, 0, 1, 2] # This is the cluster assignment
sns.heatmap(df)
The link that mwaskom posted in a comment is a good starting place. The trick is figuring out what the coordinates are for the vertical and horizontal lines.
To illustrate what the code is actually doing, it's worthwhile to just plot all of the lines individually:
%matplotlib inline
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
df = pd.DataFrame(data={'col1': [0, 0, 1, 1], 'col2': [1, 1, 0, 0]},
                  index=['x1_c0', 'x2_c0', 'x3_c1', 'x4_c2'])
f, ax = plt.subplots(figsize=(8, 6))
sns.heatmap(df)
ax.axvline(1, 0, 2, linewidth=3, c='w')
ax.axhline(1, 0, 1, linewidth=3, c='w')
ax.axhline(2, 0, 1, linewidth=3, c='w')
ax.axhline(3, 0, 1, linewidth=3, c='w')
f.tight_layout()
The way the axvline method works is that the first argument is the x location of the line, followed by the lower and upper bounds of the line (in this case 1, 0, 2). The horizontal line takes the y location and then the x start and x stop of the line. Note that these bounds are fractions of the axes (0 to 1), not data coordinates. The defaults will create the line for the entire plot, so you can typically leave those out.
The code above creates a line for every value in the dataframe. If you want to create groups for the heatmap, you will want to create an index in your data frame, or some other list of values to loop through. For instance, here is a more complicated example (adapted from a linked example):
df = pd.DataFrame(data={'col1': [0, 0, 1, 1, 1.5], 'col2': [1, 1, 0, 0, 2]},
index=['x1_c0', 'x2_c0', 'x3_c1', 'x4_c2', 'x5_c2'])
df['id_'] = df.index
df['group'] = [1, 2, 2, 3, 3]
df.set_index(['group', 'id_'], inplace=True)
df
             col1  col2
group id_
1     x1_c0   0.0     1
2     x2_c0   0.0     1
      x3_c1   1.0     0
3     x4_c2   1.0     0
      x5_c2   1.5     2
Then plot the heatmap with the groups:
f, ax = plt.subplots(figsize=(8, 6))
sns.heatmap(df)
groups = df.index.get_level_values(0)
for i, group in enumerate(groups):
    if i and group != groups[i - 1]:
        ax.axhline(len(groups) - i, c="w", linewidth=3)
ax.axvline(1, c="w", linewidth=3)
f.tight_layout()
Because your heatmap is not symmetric, you may need to use a separate for loop for the columns.
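Starting from the cluster assignment vector in the question, the divider positions can be computed separately from the drawing. The sketch below is an illustration, not the answer's code: `cluster_boundaries` and `heatmap_with_dividers` are hypothetical names, the rows are assumed to be already sorted by cluster, and the y positions assume a seaborn version whose heatmap inverts the y-axis so that row i spans y from i to i + 1.

```python
def cluster_boundaries(clusters):
    """Return the row positions at which the cluster label changes.

    Assumes the rows are already sorted so each cluster is contiguous.
    """
    return [i for i in range(1, len(clusters)) if clusters[i] != clusters[i - 1]]


def heatmap_with_dividers(df, clusters, color="k", linewidth=3):
    """Plot df as a heatmap and draw a horizontal line between clusters."""
    # Imported here so cluster_boundaries can be used without seaborn.
    import seaborn as sns

    ax = sns.heatmap(df)
    for boundary in cluster_boundaries(clusters):
        # Assumption: row i spans y in [i, i + 1], so y = boundary is the
        # edge between the last row of one cluster and the first of the next.
        ax.axhline(boundary, color=color, linewidth=linewidth)
    return ax
```

If the lines come out mirrored vertically, the installed seaborn version probably does not invert the axis; in that case `len(clusters) - boundary` can be used as the position, as in the loop above.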
