I am plotting a pandas dataframe as a seaborn heatmap, and I would like to set specific y-axis ticks at specific locations.
My dataframe index has 100 rows and corresponds to a "depth" parameter, but the values in this index are not spaced at a regular interval:
I would like to set tick labels at multiples of 100. I can do this fine using:
yticks = np.linspace(10,100,10)
ylabels = np.linspace(100,1000,10)
for my dataframe, which has 100 rows with values from approximately 100 to 1000. However, the result is not what I want: the positions of the tick labels correspond only to positions in the index, not to the actual depth values (index values).
How can I produce a heatmap where the plot is warped so that the actual depth values (index values) are aligned with the ylabels I am setting?
A complicating factor for this is also that the index values are not sampled linearly...
My solution is a little bit ugly but it works for me. Suppose your depth data is in depth_list and num_ticks is the number of ticks you want:
num_ticks = 10
# the index of the position of yticks
yticks = np.linspace(0, len(depth_list) - 1, num_ticks, dtype=int)  # np.int was removed in NumPy 1.24; the builtin int is equivalent
# the content of labels of these yticks
yticklabels = [depth_list[idx] for idx in yticks]
then plot the heatmap in this way (where your data is in data):
ax = sns.heatmap(data, yticklabels=yticklabels)
ax.set_yticks(yticks)
plt.show()
When plotting with seaborn, you have to specify the xticklabels and yticklabels arguments of the heatmap function. In your case, these arguments have to be lists of custom tick labels.
I have developed a solution which does what I intended, modified after liwt31's solution:
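def round_to_multiple(n, k):
    # round number 'n' to a multiple of 'k'
    # (renamed from 'round' so it does not shadow the builtin)
    # Python's % takes the sign of k, so:
    # use positive k to round down
    # use negative k to round up
    return n - n % k
# note: the df.index is a series of elevation values
tick_step = 25
tick_min = int(round_to_multiple(df.index.min(), -tick_step))  # round up to the first multiple inside the data
tick_max = int(round_to_multiple(df.index.max(), tick_step)) + tick_step  # round down, then add one step so range() includes the last multiple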
# the depth values for the tick labels
# I want my y tick labels to refer to these elevations,
# but with min and max values being a multiple of 25.
yticklabels = range(tick_min, tick_max, tick_step)
# the index position of the tick labels
yticks = []
for label in yticklabels:
    idx_pos = df.index.get_loc(label)  # raises KeyError unless the label is present in the index
    yticks.append(idx_pos)
cmap = sns.color_palette("coolwarm", 128)
plt.figure(figsize=(30, 10))
ax1 = sns.heatmap(df, annot=False, cmap=cmap, yticklabels=yticklabels)
ax1.set_yticks(yticks)
plt.show()
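When the index does not contain the exact multiples, or is sampled non-linearly, one alternative is to interpolate the tick positions instead of looking them up. The following is only a minimal sketch and not part of the answers above: it assumes the depth index is strictly increasing, and the helper name depth_tick_positions and the sample depths are made up for illustration. Heatmap cell i is drawn between i and i + 1, which is why the row positions are offset by 0.5.

```python
import numpy as np

def depth_tick_positions(depths, labels):
    """Map depth labels to fractional heatmap row positions.

    Assumes `depths` is strictly increasing, which np.interp requires.
    """
    depths = np.asarray(depths, dtype=float)
    row_centers = np.arange(len(depths)) + 0.5  # center of each heatmap row
    return np.interp(labels, depths, row_centers)

# hypothetical, unevenly spaced depth index
depths = [100, 150, 300, 1000]
labels = [100, 225, 1000]
positions = depth_tick_positions(depths, labels)

# With a seaborn heatmap axis `ax`, these could then be applied with
# ax.set_yticks(positions) followed by ax.set_yticklabels(labels).
```

This only moves the tick marks to where each depth would fall between rows. The rows themselves keep equal height, so the image is not warped.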
Related
I have searched for many ways of centering histogram bars on tick marks, but I have not been able to find a solution that works with seaborn displot. The displot function lets me stack the histogram according to a column in the dataframe, so I would prefer a solution that uses displot, or something else that allows stacking based on a dataframe column with color-coding via a palette.
Even after setting the tick values, I am not able to get the bars to center around the tick marks.
Example code
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
# Center the histogram on the tick marks
tips = sns.load_dataset('tips')
sns.displot(x="total_bill",
            hue="day", multiple='stack', data=tips)
plt.xticks(np.arange(0, 50, 5))
I would also like to plot a histogram of a variable that takes a single value, and choose the bin width so that the resulting bar is centered on that value (0.5 in this example).
I can center the bar by choosing the number of bins equal to the number of tick marks, but the resulting bar is very thin. How can I increase the bin width in this case, where there is only one bar but I still want to display all the other possible points? With all the tick marks displayed, the bar is very narrow.
I want the same centering of the bar at the 0.5 tick mark, but I want the bar to be wider, as it is the only value for which counts are displayed.
Any solutions?
tips['single'] = 0.5
sns.displot(x='single',
hue="day", multiple = 'stack', data=tips, bins = 10)
plt.xticks(np.arange(0, 1, 0.1))
Edit:
Would it be possible to have more control over the tick marks in the second case? I do not want the labels rounded to 1 decimal place; I want to choose which tick marks to display. Is it possible to display just one tick value and have the bar centered on it?
Do min_val and max_val in this case refer to the value of the variable, which would be 0 here? Then the x-axis would extend to negative values even though there are none, and I don't want to display them.
For your first problem, you may want to work out a few properties of the data that you are plotting, for example its range. Additionally, you may want to choose beforehand the number of bins that you want displayed.
tips = sns.load_dataset('tips')
min_val = tips.total_bill.min()
max_val = tips.total_bill.max()
val_width = max_val - min_val
n_bins = 10
bin_width = val_width/n_bins
sns.histplot(x="total_bill",
hue="day", multiple = 'stack', data=tips,
bins=n_bins, binrange=(min_val, max_val),
palette='Paired')
plt.xlim(0, 55) # Define x-axis limits
Another thing to remember is that the width of a bar in a histogram identifies the bounds of its range. So a bar spanning [2,5] on the x-axis implies that the values represented by that bar belong to that range.
Considering this, it is easy to formulate a solution. If we want to keep the original bars and mark the bounds of each one, a solution may look like
plt.xticks(np.arange(min_val-bin_width, max_val+bin_width, bin_width))
Now, if we offset the ticks by half a bin-width, we will get to the centers of the bars.
plt.xticks(np.arange(min_val-bin_width/2, max_val+bin_width/2, bin_width))
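The relationship between bin edges and bin centers can be checked numerically before plotting. This is a small sketch using NumPy only; the sample numbers are made up, and bin_edges_and_centers is a hypothetical helper, not a seaborn function. The last lines apply the same idea to a single value: a bin range that extends half a bin width to each side puts the value at the center of the bar.

```python
import numpy as np

def bin_edges_and_centers(min_val, max_val, n_bins):
    """Return the edges of n_bins equal-width bins and their midpoints."""
    edges = np.linspace(min_val, max_val, n_bins + 1)
    centers = (edges[:-1] + edges[1:]) / 2
    return edges, centers

# made-up data range: 10 bins of width 5 between 3 and 53
edges, centers = bin_edges_and_centers(3.0, 53.0, 10)

# single-value case: one bin centered on the value
single_val, bin_width = 0.5, 0.2
centered_range = (single_val - bin_width / 2, single_val + bin_width / 2)
counts, single_edges = np.histogram([single_val] * 5, bins=1, range=centered_range)
```

Passing centers to plt.xticks labels the middle of each bar. For the single-value case, something like sns.histplot(..., bins=1, binrange=centered_range) should draw one bar centered on single_val, although this has not been checked against every seaborn version.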
For your single-value plot, the idea remains the same: control the bin width, the x-axis range, and the ticks. The bin width has to be set explicitly, since the automatically inferred bin width will probably be 1 unit wide, which can leave the bar with almost no visible thickness on the plot. Histogram bars always indicate a range, even when we have just one single value. This is illustrated in the following example and figure.
from matplotlib.ticker import FormatStrFormatter  # used for the tick label format below
single_val = 23.5
tips['single'] = single_val
bin_width = 4
fig, axs = plt.subplots(1, 2, sharey=True, figsize=(12,4)) # Get 2 subplots
# Case 1 - With the single value as x-tick label on subplot 0
sns.histplot(x='single',
hue="day", multiple = 'stack', data=tips,
binwidth=bin_width, binrange=(single_val-bin_width, single_val+bin_width),
palette='rocket',
ax=axs[0])
ticks = [single_val, single_val+bin_width] # 2 ticks - given value and given_value + width
axs[0].set(
title='Given value as tick-label starts the bin on x-axis',
xticks=ticks,
xlim=(0, int(single_val*2)+bin_width)) # x-range such that bar is at middle of x-axis
axs[0].xaxis.set_major_formatter(FormatStrFormatter('%.1f'))
# Case 2 - With centering on the bin starting at single-value on subplot 1
sns.histplot(x='single',
hue="day", multiple = 'stack', data=tips,
binwidth=bin_width, binrange=(single_val-bin_width, single_val+bin_width),
palette='rocket',
ax=axs[1])
ticks = [single_val+bin_width/2] # Just the bin center
axs[1].set(
title='Bin centre is offset from single_value by bin_width/2',
xticks=ticks,
xlim=(0, int(single_val*2)+bin_width) ) # x-range such that bar is at middle of x-axis
axs[1].xaxis.set_major_formatter(FormatStrFormatter('%.1f'))
Output:
From your description, I think that what you really mean by a bar graph is a categorical bar graph. The centering is then automatic, because the bar no longer represents a range but a discrete category. Given the numeric and continuous nature of the variable in the example data, I would not recommend such an approach. pandas supports categorical bar plots through DataFrame.plot.bar. For our example, one way to do this is as follows:
n_colors = len(tips['day'].unique()) # Get number of unique categories
agg_df = tips[['single', 'day']].groupby(['day']).agg(
val_count=('single', 'count'),
val=('single','max')
).reset_index() # Get aggregated information along the categories
agg_df.pivot(columns='day', values='val_count', index='val').plot.bar(
stacked=True,
color=sns.color_palette("Paired", n_colors), # Choose "number of days" colors from palette
width=0.05 # Set bar width
)
plt.show()
This yields:
Hello and thanks in advance. I am starting with a pandas dataframe and I would like to make a 2d plot with a trendline showing the weighted mean y value, with error bars for the uncertainty on the mean. The mean should be weighted by the total number of events in each bin. I start by grouping the df into a "photon" group and a "total" group, where "photon" is a subset of the total. In each bin, I am plotting the ratio of photon events to total events. On the x axis and y axis I have two unrelated variables, "cluster energy" and "perimeter energy".
My attempt:
#make the 2d binning and total hist
energybins=[11,12,13,14,15,16,17,18,19,20,21,22]
ybins = [0,.125,.25,.5,.625,.75,1.,1.5,2.5]
total_hist,x,y,i = plt.hist2d(train['total_energy'].values,train['max_perimeter'].values,[energybins,ybins])
total_hist = np.array(total_hist)
#make the photon 2d hist with same bins
groups = train.groupby(['isPhoton'])
prompt_hist,x,y,i = plt.hist2d(groups.get_group(1)['total_energy'].values,groups.get_group(1)['max_perimeter'].values,bins=[energybins,ybins])
prompt_hist = np.array(prompt_hist)
ratio = np.divide(prompt_hist,total_hist,out=np.zeros_like(prompt_hist),where = total_hist!=0)
#plot the ratio
fig, ax = plt.subplots()
ratio=np.transpose(ratio)
p = ax.pcolormesh(ratio,)
for i in range(len(ratio)):
    for j in range(len(ratio[i])):
        text = ax.text(j+1, i+1, round(ratio[i, j], 2),ha="right", va="top", color="w")
ax.set_xticklabels(energybins)
ax.set_yticklabels(ybins)
plt.xlabel("Cluster Energy")
plt.ylabel("5x5 Perimeter Energy")
plt.title("Prompt Photon Fraction")
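from scipy import stats
from uncertainties import ufloat
def myBinnedStat(x,v,bins):
    # per-bin mean of v, with the standard error of the mean as its uncertainty
    means,_,_ = stats.binned_statistic(x,v,'mean',bins)
    std,_ ,_= stats.binned_statistic(x,v,'std',bins)
    count,_,_ = stats.binned_statistic(x,v,'count',bins)
    return [ufloat(m,s/(c**(1./2))) for m,s,c in zip(means,std,count)]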
I can then plot an errorbar plot, but I have not been able to plot the errorbar on the same axis as the pcolormesh. I was able to do this with hist2d. I am not sure why that is. I feel like there is a cleaner way to do the whole thing.
This yields a plot
pcolormesh plots each element as a unit on the x axis. That is, if you plot 8 columns, this data will span 0-8 on the x axis. However, you also redefined the x axis ticklabel so that 0-10 is labeled as 11-21.
For your errorbars, you specified x values at 11-21, or so it looks, and that is where they are plotted. That region is not labeled as such, because you changed the tick labels to correspond to pcolormesh.
This discrepancy is why your two plots do not align. Instead, you could use "default" x values for errorbar or define x values for pcolormesh. For example, use:
ax.errorbar(range(11), means[0:11], yerr=uncertainties[0:11])
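The other option, giving pcolormesh the real bin edges so that both plots share data coordinates, might look like the sketch below. The ratio array and the error-bar values are random placeholders rather than the asker's data; only the bin edges are taken from the question.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

energybins = np.array([11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22])
ybins = np.array([0, .125, .25, .5, .625, .75, 1., 1.5, 2.5])

rng = np.random.default_rng(0)
# placeholder data: one row per y bin, one column per energy bin
ratio = rng.uniform(0, 1, size=(len(ybins) - 1, len(energybins) - 1))

# error bars sit at the center of each energy bin
x_centers = (energybins[:-1] + energybins[1:]) / 2
means = rng.uniform(0.5, 1.0, size=len(x_centers))  # placeholder trend line
errors = np.full(len(x_centers), 0.05)              # placeholder uncertainties

fig, ax = plt.subplots()
mesh = ax.pcolormesh(energybins, ybins, ratio)  # bin edges passed explicitly
ax.errorbar(x_centers, means, yerr=errors, color="w", fmt="o-")
fig.colorbar(mesh, ax=ax)
```

Because the mesh is drawn in data coordinates here, the axis shows the real energies and no set_xticklabels call is needed.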
I'm using Python and Matplotlib to add a variable number of scatter plots to a figure. Each scatter plot is a series, and I want to color each scatter trace's points by their sequence index numbers. However, since my sequences have different lengths, I want one consistent colorbar for all the sequences.
I can keep track of the min and max sequence length. But how do I use this information to configure the colorbar to represent this complete range?
From the question it is not completely clear what the data looks like.
Here is an example using 4 series with different lengths, all having an index 0,1,2,3,.... The scatter plots the index versus the value, and colors each series using the full color range.
The proposal is to add one colorbar per series, provided there aren't too many. Each colorbar has a title and shows ticks depending on the range used for that series. Making the colorbars (except the first) very narrow makes them look like an additional ax.
import numpy as np
import pandas as pd
from matplotlib import pyplot as plt
N = 4
series_to_plot = [pd.Series(np.random.uniform(i, i + 0.8, np.random.randint(50, 400))) for i in range(1, N + 1)]
names = [f'data{i}' for i in range(1, N + 1)]
fig, ax = plt.subplots(figsize=(12, 5))
for ind, (ser, name) in enumerate(zip(series_to_plot, names)):
    scat = ax.scatter(ser.index, ser, c=ser.index, s=1, cmap='plasma')
    cax = fig.add_axes(
        [1 - (N - ind) * 0.05, 0.1, 0.01 if ind == 0 else 0.002, 0.75])  # left, bottom, width, height, where 1,1 is top right of the fig
    cbar = fig.colorbar(scat, cax=cax)
    cax.set_title(name)
plt.subplots_adjust(right=0.97 - N * 0.05) # make room for the colorbars
plt.show()
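If a single colorbar for all series is preferred, as the question asks, one possible approach is to share one Normalize object that spans the longest sequence. This is a sketch with made-up series lengths, and it is a different approach from the one-colorbar-per-series proposal above.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import numpy as np
from matplotlib import pyplot as plt
from matplotlib.cm import ScalarMappable
from matplotlib.colors import Normalize

rng = np.random.default_rng(0)
lengths = [50, 120, 300]  # made-up sequence lengths
series = [rng.uniform(i, i + 0.8, n) for i, n in enumerate(lengths, start=1)]

# one normalization for every series: index 0 up to the longest series' last index
norm = Normalize(vmin=0, vmax=max(lengths) - 1)

fig, ax = plt.subplots(figsize=(12, 5))
for values in series:
    idx = np.arange(len(values))
    ax.scatter(idx, values, c=idx, cmap="plasma", norm=norm, s=4)

# a standalone mappable carries the shared norm and colormap into the colorbar
cbar = fig.colorbar(ScalarMappable(norm=norm, cmap="plasma"), ax=ax)
cbar.set_label("sequence index")
```

With a shared norm, shorter series only use the lower part of the color range, so the same color means the same index in every series.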
I am making a plot with matplotlib and I need irregular values for the xtick labels.
I know there is a size parameter. However, I would like the values on my x-axis to be two times greater than the values of the labels on the y-axis. With the size parameter I can only change the font size of the labels, while the actual numeric values stay in a 1:1 proportion.
Is it possible to make xtick label values in proportion 2:1 with the ytick label values?
import matplotlib.pyplot as plt
x = [0,1,2,3,4,5]
fig=plt.figure()
ax = plt.subplot(111)
ax.plot(x)
y_tick_values = ax.get_yticks()
new_x_tick_values = [2 * y for y in y_tick_values]
ax.set_xticklabels(new_x_tick_values)
plt.show()
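Note that set_xticklabels only replaces the label text; the tick positions and the plotted data stay where they are. A sketch that pins the tick positions first, so that each label is attached to a known location, could look like this. The tick positions 0 to 5 are an assumption that matches the example data.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

x = [0, 1, 2, 3, 4, 5]
fig, ax = plt.subplots()
ax.plot(x)

# fix the same tick positions on both axes (assumed range 0..5)
ticks = list(range(6))
ax.set_xticks(ticks)
ax.set_yticks(ticks)

# label each x tick with twice its position; the data itself is unchanged
label_texts = ax.set_xticklabels([str(2 * t) for t in ticks])
```

Setting the positions before the labels avoids labels drifting away from their ticks if matplotlib later picks different automatic tick locations.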
I'm using matplotlib.pyplot to plot a scatterplot. The following code produces a scatterplot whose x-axis does not show every month.
months = []
data = [...] #some data in list form
#skipping the 8th value since I don't want data to refer at this value
for i in [x for x in range(1, len(data) +2) if x != 8]:
    months.append(i)
fig, ax = plt.subplots()
plt.scatter(months,data)
plt.scatter([months[-1]],[data[-1]], color=['red'])
plt.title('Quantity scatterplot')
ax.set_xlabel('Months')
ax.set_ylabel('Quantities')
ax.legend(['Historical quantities','Forecasted quantity'], loc=1)
plt.show()
Instead, I would like to see all months (from 1 to 10) on the x-axis.
The easiest way to force all numbers between 1 and 10 to appear as ticklabels on the x axis is to use
ax.set_xticks(range(1,11))
For the more general case, where the axis limits are not known beforehand, you can get tick labels at integer positions using a matplotlib.ticker.MultipleLocator (this needs import matplotlib.ticker).
ax.xaxis.set_major_locator(matplotlib.ticker.MultipleLocator(1))
where 1 is the base: every tick is placed at a multiple of this number.
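A short sketch of the locator approach follows. The quantities are made up, and month 8 is skipped as in the question.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
from matplotlib.ticker import MultipleLocator

months = [m for m in range(1, 11) if m != 8]  # 1..10 without month 8
data = [3, 5, 4, 6, 7, 6, 8, 9, 10]           # made-up quantities

fig, ax = plt.subplots()
ax.scatter(months, data)

# place a major tick at every multiple of 1, whatever the axis limits are
ax.xaxis.set_major_locator(MultipleLocator(1))
xticks = ax.get_xticks()
```

The locator also places a tick at month 8 even though no data point exists there, which is what the question asks for.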