I have numerous sets of seasonal data that I am looking to show in a heatmap format. I am not worried about the magnitude of the values in the dataset but more the overall direction and any patterns that i can look at in more detail later. To do this I want to create a heatmap that only shows 2 colours (red for below zero and green for zero and above).
I can create a normal heatmap with seaborn but the normal colour maps do not have only 2 colours and I am not able to create one myself. Even if I could I am unable to set the parameters to reflect the criteria of below zero = red and zero+ = green.
I managed to create this simply by styling the dataframe but I was unable to export it as a .png because the table_criteria='matplotlib' option removes the formatting.
Below is an example of what I would like to create made from random data, could someone help or point me in the direction of a helpful Stackoverflow answer?
I have also included the code I used to style and export the dataframe.
Desired output - this is created with random data in an Excel spreadsheet
#Code to create a regular heatmap - can this be easily amended?
df_hm = pd.read_csv(filename+h)
pivot = df_hm.pivot_table(index='Year', columns='Month', values='delta', aggfunc='sum')
fig, ax = plt.subplots(figsize=(10,5))
ax.set_title('M1 '+h[:-7])
sns.heatmap(pivot, annot=True, fmt='.2f', cmap='RdYlGn')
plt.savefig(chartpath+h[:-7]+" M1.png", bbox_inches='tight')
plt.close()
#code used to export dataframe that loses format in the .png
import matplotlib.pyplot as plt
import dataframe_image as dfi
#pivot is the dateframe name
pivot = pd.DataFrame(np.random.randint(-100,100,size= (5, 12)),columns=list ('ABCDEFGHIJKL'))
styles = [dict(selector="caption", props=[("font-size", "120%"),("font-weight", "bold")])]
pivot = pivot.style.format(precision=2).highlight_between(left=-100000, right=-0.01, props='color:white;background-color:red').highlight_between(left=0, right= 100000, props='color:white;background-color:green').set_caption(title).set_table_styles(styles)
dfi.export(pivot, root+'testhm.png', table_conversion='matplotlib',chrome_path=None)
You can manually set cmap property to list of colors and if you want to annotate you can do it and it will show same value as it's not converted to -1 or 1.
import numpy as np
import seaborn as sns
arr = np.random.randn(10,10)
sns.heatmap(arr,cmap=["grey",'green'],annot=True,center=0)
# center will make it dividing point
Output:
PS. If you don't want color-bar you can pass cbar=False in `sns.heatmap)
Welcome to SO!
To achieve what you need, you just need to pass delta through the sign function, here's an example code:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
arr = np.random.randn(25,25)
sns.heatmap(np.sign(arr))
Which results in a binary heatmap, albeit one with a quite ugly colormap, still, you can fiddle around with Seaborn's colormaps in order to make it look like excel.
Related
I have a dataframe corresponding to a multivariate time series which I'd like to plot. Each channel would appear on its own set of axes, with all plots arranged vertically. I'd also like to add the interactive options available with Bokeh, including the ability to remove one channel from view by clicking on its label.
Without Bokeh, I can use subplots to get the separate "static" plots stacked vertically as follows:
import numpy as np
import pandas as pd
from matplotlib import pyplot as plt
A=np.random.rand(800,10)
df=pd.DataFrame(data=A,columns=['a','b','c','d','e','f','g','h','i','j'])
df.plot(subplots=True)
plt.show()
I can plot the 10 channels on one set of axes using Bokeh using this:
import numpy as np
import pandas as pd
pd.set_option('plotting.backend', 'pandas_bokeh')
A=np.random.rand(800,10)
df=pd.DataFrame(data=A,columns=['a','b','c','d','e','f','g','h','i','j'])
df.plot_bokeh(kind="line")
The resulting graph allows for zooming, panning, channel de-selection, etc. However all plots signals are plotted on the same set of axes, which I would rather not do.
I use this code snippet to plot my figures in a grid.
import pandas as pd
import pandas_bokeh
from bokeh.palettes import Dark2_5 as palette
def plot_grid(df: pd.DataFrame):
figs = []
color = itertools.cycle(palette)
for c in df.columns:
figs.append(df[c].plot_bokeh(show_figure=False, color=next(color)))
pandas_bokeh.plot_grid(figs, ncols=1, plot_width=1500)
The ncols parameter allows you to specify how many columns you want per row.
Hope this helps!
Here is a Plotly Express scatterplot with marker color, size and symbol representing different fields in the data frame. There is a legend for symbol and a colorbar for color, but there is nothing to indicate what marker size represents.
Is it possible to display a "size" legend? In the legend I'm hoping to show some example marker sizes and their respective values.
A similar question was asked for R and I'm hoping for a similar results in Python. I've tried adding markers using fig.add_trace(), and this would work, except I don't know how to make the sizes equal.
import pandas as pd
import plotly.express as px
import random
# create data frame
df = pd.DataFrame({
'X':list(range(1,11,1)),
'Y':list(range(1,11,1)),
'Symbol':['Yes']*5+['No']*5,
'Color':list(range(1,11,1)),
'Size':random.sample(range(10,150), 10)
})
# create scatterplot
fig = px.scatter(df, y='Y', x='X',color='Color',symbol='Symbol',size='Size')
# move legend
fig.update_layout(legend=dict(y=1, x=0.1))
fig.show()
Scatterplot Image:
Thank you
You can not achieve this goal, if you use a metric scale/data like in your range. Plotly will try to always interpret it like metric, even if it seems/is discrete in the output. So your data has to be a factor like in R, as you are showing groups. One possible solution could be to use a list comp. and convert everything to a str. I did it in two steps so you can follow:
import pandas as pd
import plotly.express as px
import random
check = sorted(random.sample(range(10,150), 10))
check = [str(num) for num in check]
# create data frame
df = pd.DataFrame({
'X':list(range(1,11,1)),
'Y':list(range(1,11,1)),
'Symbol':['Yes']*5+['No']*5,
'Color':check,
'Size':list(range(1,11,1))
})
# create scatterplot
fig = px.scatter(df, y='Y', x='X',color='Color',symbol='Symbol',size='Size')
# move legend
fig.update_layout(legend=dict(y=1, x=0.1))
fig.show()
That gives:
Keep in mind, that you also get the symbol label, as you now have TWO groups!
Maybe you want to sort the values in the list before converting to string!
Like in this picture (added it to the code above)
UPDATE
Hey There,
yes, but as far as I know, only in matplotlib, and it is a little bit hacky, as you simulate scatter plots. I can only show you a modified example from matplotlib, but maybe it helps you so you can fiddle it out by yourself:
from numpy.random import randn
z = randn(10)
red_dot, = plt.plot(z, "ro", markersize=5)
red_dot_other, = plt.plot(z*2, "ro", markersize=20)
plt.legend([red_dot, red_dot_other], ["Yes", "No"], markerscale=0.5)
That gives:
As you can see you are working with two different plots, to be exact one plot for each size legend. In the legend these plots are merged together. Legendsize is further steered through markerscale and it is linked to markersize of each plot. And because we have two plots with TWO different markersizes, we can create a plot with different markersizes in the legend. markerscale is normally a value between 0 and 1 but you can also do 150% thus 1.5.
You can achieve this through fiddling around with the legend handler in matplotlib see here:
https://matplotlib.org/stable/tutorials/intermediate/legend_guide.html
Is there a way to add a mean and a mode to a violinplot ? I have categorical data in one of my columns and the corresponding values in the next column. I tried looking into matplotlib violin plot as it technically offers the functionality I am looking for but it does not allow me to specify a categorical variable on the x axis, and this is crucial as I am looking at the distribution of the data per category. I have added a small table illustrating the shape of the data.
plt.figure(figsize=10,15)
ax=sns.violinplot(x='category',y='value',data=df)
First we calculate the the mode and means:
import seaborn as sns
import pandas as pd
from matplotlib import pyplot as plt
df = pd.DataFrame({'Category':[1,2,5,1,2,4,3,4,2],
'Value':[1.5,1.2,2.2,2.6,2.3,2.7,5,3,0]})
Means = df.groupby('Category')['Value'].mean()
Modes = df.groupby('Category')['Value'].agg(lambda x: pd.Series.mode(x)[0])
You can use seaborn to make the basic plot, below I remove the inner boxplot using the inner= argument, so that we can see the mode and means:
fig, ax = plt.subplots()
sns.violinplot(x='Category',y='Value',data=df,inner=None)
plt.setp(ax.collections, alpha=.3)
plt.scatter(x=range(len(Means)),y=Means,c="k")
plt.scatter(x=range(len(Modes)),y=Modes)
I want to create a matplotlib bar plot that has the look of a stacked plot without being additive from a multi-index pandas dataframe.
The below code gives the basic behaviour
%matplotlib notebook
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import io
data = io.StringIO('''Fruit,Color,Price
Apple,Red,1.5
Apple,Green,1.0
Pear,Red,2.5
Pear,Green,2.3
Lime,Green,0.5
Lime, Red, 3.0
''')
df_unindexed = pd.read_csv(data)
df_unindexed
df = df_unindexed.set_index(['Fruit', 'Color'])
df.unstack().plot(kind='bar')
The plot command df.unstack().plot(kind='bar') shows all the apple prices grouped next to each other. If you choose the option df.unstack().plot(kind='bar',stacked=True) - it adds the prices for Red and Green together and stacks them.
I am wanting a plot that is halfway between the two - it shows each group as a single bar, but overlays the values so you can see them all. The below figure (done in powerpoint) shows what behaviour I am looking for -> I want the image on the right.
Short of calculating all the values and then using the stacked option, is this possible?
This seems (to me) like a bad idea, since this representation leads to several problem. Will a reader understand that those are not staked bars? What happens when the front bar is taller than the ones behind?
In any case, to accomplish what you want, I would simply repeatedly call plot() on each subset of the data and using the same axes so that the bars are drawn on top of each other.
In your example, the "Red" prices are always higher, so I had to adjust the order to plot them in the back, or they would hide the "Green" bars.
fig,ax = plt.subplots()
my_groups = ['Red','Green']
df_group = df_unindexed.groupby("Color")
for color in my_groups:
temp_df = df_group.get_group(color)
temp_df.plot(kind='bar', ax=ax, x='Fruit', y='Price', color=color, label=color)
There are two problems with this kind of plot. (1) What if the background bar is smaller than the foreground bar? It would simply be hidden and not visible. (2) A chart like this is not distinguishable from a stacked bar chart. Readers will have severe problems interpreting it.
That being said, you can plot both columns individually.
import matplotlib.pyplot as plt
import pandas as pd
import io
data = io.StringIO('''Fruit,Color,Price
Apple,Red,1.5
Apple,Green,1.0
Pear,Red,2.5
Pear,Green,2.3
Lime,Green,0.5
Lime,Red,3.0''')
df_unindexed = pd.read_csv(data)
df = df_unindexed.set_index(['Fruit', 'Color']).unstack()
df.columns = df.columns.droplevel()
plt.bar(df.index, df["Red"].values, label="Red")
plt.bar(df.index, df["Green"].values, label="Green")
plt.legend()
plt.show()
I've spent too much time looking into this, some tabs still open in my browser:
Link1 Link2 Link3 Link4
I'm supposed to be working!
Anyway, my problem is: I use someone else's scripts to produce lots of heat maps which I then have to review and sort/assign:
Here's an example of one:
HM sample
I need to be able to easily distinguish a 0.03 from a zero but as you can see they look virtually the same. Ideal solution would be: White(just zero's)-Yellow-Orange-Red or White(just zero's)-Orange-Red
The dev used 'YlOrRd' like so:
sns.heatmap(heat_map, annot=True, fmt=".2g", cmap="YlOrRd", linewidths=0.5,
linecolor='black', xticklabels=xticks, yticklabels=yticks
)
I've tried a bunch of the standard/default colour map options provided to no avail.
I don't have any real experience building colour maps and I don't want break something that's already working. Would anyone have any ideas?
Thanks
**I'm limited in what code/samples I can post due to it being work product.
An option is to take the colors from an existing colormap, replace the first one by white and create a new colormap from those manipulated values.
import numpy as np; np.random.seed(42)
import matplotlib.pyplot as plt
import matplotlib.colors
import seaborn as sns
# some data
a = np.array([0.,0.002,.005,.0099,0.01,.0101,.02,.04,.24,.42,.62,0.95,.999,1.])
data = np.random.choice(a, size=(12,12))
# create colormap. We take 101 values equally spaced between 0 and 1
# hence the first value 0, second value 0.01
c = np.linspace(0,1,101)
# For those values we store the colors from the "YlOrRd" map in an array
colors = plt.get_cmap("YlOrRd",101)(c)
# We replace the first row of that array, by white
colors[0,:] = np.array([1,1,1,1])
# We create a new colormap with the colors
cmap = matplotlib.colors.ListedColormap(colors)
# Plot the heatmap. The format is set to 4 decimal places
# to be able to disingush specifically the values ,.0099, .0100, .0101,
sns.heatmap(data, annot=True, fmt=".4f", cmap=cmap, vmin=0, vmax=1,
linewidths=0.5, linecolor='black')
plt.show()