I am making a plot with matplotlib and I need irregular values for the xtick labels.
I know there is a size parameter. However, I would like the values on my x-axis to be two times greater than the values of the labels on the y-axis. With the size parameter I can only change the font size of the labels; the actual numeric values stay in a 1:1 proportion.
Is it possible to make xtick label values in proportion 2:1 with the ytick label values?
import matplotlib.pyplot as plt
x = [0,1,2,3,4,5]
fig=plt.figure()
ax = plt.subplot(111)
ax.plot(x)
y_tick_values = ax.get_yticks()
new_x_tick_values = [2 * y for y in y_tick_values]
ax.set_xticklabels(new_x_tick_values)
plt.show()
TLDR: What I'm looking for is a way to plot a list of timestamps as equidistant datapoints, with mpl deciding which labels to show.
If equidistant plotting of timestamped datapoints is only possible by turning the timestamps into strings (and so plotting them as a categorical axis), my question could also be phrased: how can one get mpl to automatically drop labels from an overcrowded categorical axis?
Details:
I have a timeseries with monthly data, that I'd like to plot as a bar graph. My issue is that matplotlib.pyplot automatically plots this data on a time axis:
import matplotlib.pyplot as plt
import pandas as pd
fig, ax = plt.subplots(1, 1, )
s = pd.Series(range(3,7), pd.date_range('2021', freq='MS', periods=4))
ax.bar(s.index, s.values, 27) # width 27 days
ax.set_ylabel('income [Eur]')
Usually, this is what I want, but with monthly data it looks weird because February is significantly shorter. What I want is for the data to be plotted equidistantly. Is there a way to force this?
Importantly, I want to retain the behaviour that e.g. only the second or third label is plotted once the x-axis becomes too crowded, without having to adjust it manually.
What I've tried or don't want to do:
I could make the gaps between the bars the same - by tweaking the width of the bars. However, I'm plotting revenue data [Eur], which means that an uneven bar width is misleading.
I could turn the timestamps into a string so that the data is plotted categorically:
s = pd.Series(range(3,7), pd.date_range('2021', freq='MS', periods=4))
x = [f'{ts.year}-{ts.month:02}' for ts in s.index]
ax.bar(x, s.values, 0.9) # width now as fraction of spacing between datapoints
However, this leads mpl to think each label must be plotted, which gets crowded:
s = pd.Series(range(3,17), pd.date_range('2021', freq='MS', periods=14))
x = [f'{ts.year}-{ts.month:02}' for ts in s.index]
ax.bar(x, s.values, 0.9) # width now as fraction of spacing between datapoints
You can space out your categorical ticks with the MaxNLocator.
Given your bigger Series sample with categorical labels:
s = pd.Series(range(3,17), pd.date_range('2021', freq='MS', periods=14))
x = [f'{ts.year}-{ts.month:02}' for ts in s.index]
fig, ax = plt.subplots()
ax.bar(x, s.values, 0.9)
ax.set_ylabel('income [Eur]')
Apply the MaxNLocator with a specified number of bins (or 'auto'):
from matplotlib.ticker import MaxNLocator
locator = MaxNLocator(nbins=5) # or nbins='auto'
ax.xaxis.set_major_locator(locator)
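For a version of the steps above that runs on its own, here is a minimal sketch. It builds the month labels with plain Python instead of pandas, selects the non-interactive Agg backend, and passes integer=True so that ticks land on whole category positions. Those three choices are assumptions of this sketch, not part of the original answer.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so the sketch runs anywhere
import matplotlib.pyplot as plt
from matplotlib.ticker import MaxNLocator

# 14 monthly labels starting at 2021-01 (stand-in for the pandas date_range)
x = [f"{2021 + m // 12}-{m % 12 + 1:02}" for m in range(14)]
values = list(range(3, 17))

fig, ax = plt.subplots()
ax.bar(x, values, 0.9)  # categorical axis: bars are equidistant
ax.set_ylabel("income [Eur]")

# Let matplotlib pick a small number of tick positions;
# integer=True (an addition here) keeps them on whole category indices
ax.xaxis.set_major_locator(MaxNLocator(nbins=5, integer=True))
fig.canvas.draw()

ticks = ax.get_xticks()  # positions chosen by the locator
```

With this in place only some of the 14 categories receive a label, and the choice adapts when the number of bars changes.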
Let's say I have made a plot, and in that plot there is a specific point from which I draw a vertical line down to the x-axis. This point has the x-value 33.55, for example. However, my ticks are spaced something like 10 or 20 apart, from 0 to 100.
So basically: is there a way to add this single custom value to the axis ticks, so that it shows together with all the other values that were there before?
Use np.append to add to the array of ticks:
import numpy as np
from matplotlib import pyplot as plt
x = np.random.rand(100) * 100
y = np.random.rand(100) * 100
fig, ax = plt.subplots(figsize=(8, 6))
ax.scatter(x, y)
ax.set_xticks(np.append(ax.get_xticks(), 33.55))
Note that if your plot is not big enough, the tick labels may overlap.
If you want the new tick to "clear its orbit", so to speak:
special_value = 33.55
black_hole_radius = 10
new_ticks = [value for value in ax.get_xticks() if abs(value - special_value) > black_hole_radius] + [special_value]
ax.set_xticks(new_ticks)
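The same idea can be wrapped in a small helper, sketched here with plain Python so it can be tried without a figure. The name insert_tick and the example tick list are hypothetical; the list only stands in for what ax.get_xticks() might return on a 0 to 100 axis.

```python
def insert_tick(ticks, special_value, radius):
    """Return sorted ticks with special_value added and close neighbours dropped."""
    # Keep only the ticks that are farther than `radius` from the special value
    kept = [float(t) for t in ticks if abs(t - special_value) > radius]
    return sorted(kept + [float(special_value)])


# Stand-in for what ax.get_xticks() might return on a 0..100 axis
default_ticks = [0, 20, 40, 60, 80, 100]
new_ticks = insert_tick(default_ticks, 33.55, 10)
print(new_ticks)  # [0.0, 20.0, 33.55, 60.0, 80.0, 100.0]
```

On a real axes this would be used as ax.set_xticks(insert_tick(ax.get_xticks(), 33.55, 10)). Sorting is not required by matplotlib, but it keeps the tick list easy to read.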
I am plotting a pandas dataframe as a seaborn heatmap, and I would like to set specific y-axis ticks at specific locations.
My dataframe index has 100 rows and corresponds to a "depth" parameter, but the values in this index are not spaced at a regular interval.
I would like to set tick labels at multiples of 100. I can do this fine using:
yticks = np.linspace(10,100,10)
ylabels = np.linspace(100,1000,10)
This works for my dataframe, which has 100 rows with values from approximately 100 to 1000, but the result is not what I want: the positions of the tick labels do not correspond to the actual depth values (index values), only to positions in the index.
How can I produce a heatmap where the plot is warped so that the actual depth values (index values) are aligned with the ylabels I am setting?
A complicating factor for this is also that the index values are not sampled linearly...
My solution is a little bit ugly but it works for me. Suppose your depth data is in depth_list and num_ticks is the number of ticks you want:
num_ticks = 10
# the index of the position of yticks
yticks = np.linspace(0, len(depth_list) - 1, num_ticks, dtype=int)  # np.int was removed in NumPy 1.24; the built-in int works
# the content of labels of these yticks
yticklabels = [depth_list[idx] for idx in yticks]
then plot the heatmap in this way (where your data is in data):
ax = sns.heatmap(data, yticklabels=yticklabels)
ax.set_yticks(yticks)
plt.show()
When plotting with seaborn, you have to specify the xticklabels and yticklabels arguments of the heatmap function. In your case, these arguments have to be lists of custom tick labels.
I have developed a solution which does what I intended, modified after liwt31's solution:
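def round(n, k):
    # Round number 'n' to a multiple of 'k' (note: this shadows the built-in round).
    # With positive k the result is rounded down, since n - n % k <= n.
    # With negative k the result is rounded up, since n % k is then <= 0.
    return n - n % k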
# note: the df.index is a series of elevation values
tick_step = 25
tick_min = int(round(df.index.min(), (-1 * tick_step))) # negative k: rounds up to a multiple of tick_step
tick_max = (int(round(df.index.max(), (1 * tick_step)))) + tick_step # rounds down, then adds one step (exclusive end for range)
# the depth values for the tick labels
# I want my y tick labels to refer to these elevations,
# but with min and max values being a multiple of 25.
yticklabels = range(tick_min, tick_max, tick_step)
# the index position of the tick labels
yticks = []
for label in yticklabels:
    idx_pos = df.index.get_loc(label)
    yticks.append(idx_pos)
cmap = sns.color_palette("coolwarm", 128)
plt.figure(figsize=(30, 10))
ax1 = sns.heatmap(df, annot=False, cmap=cmap, yticklabels=yticklabels)
ax1.set_yticks(yticks)
plt.show()
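Note that get_loc only finds labels that are exact index values. If the wanted labels do not all occur in the index, one possible alternative is to interpolate the tick positions. This is a sketch under two assumptions: the index is sorted in ascending order, and row i of the heatmap is centred at i + 0.5 (the position seaborn uses for its default ticks). The names depths, wanted and tick_positions are hypothetical.

```python
import numpy as np

# Hypothetical, non-linearly sampled depth index (must be sorted ascending)
depths = np.array([100.0, 150.0, 300.0, 600.0, 1000.0])

# Labels we want to show, restricted to the range covered by the data
wanted = np.array([200.0, 600.0, 1000.0])

# Row i of the heatmap is centred at i + 0.5
row_centres = np.arange(len(depths)) + 0.5

# Fractional row position for each wanted depth, linear between neighbours
tick_positions = np.interp(wanted, depths, row_centres)
```

The result could then be applied with ax.set_yticks(tick_positions) and ax.set_yticklabels(wanted). This only places the labels at interpolated positions; the heatmap cells themselves keep their equal height.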
Here is the histogram
To generate this plot, I did:
bins = np.array([0.03, 0.3, 2, 100])
plt.hist(m, bins = bins, weights=np.zeros_like(m) + 1. / m.size)
However, as you noticed, I want to plot the histogram of the relative frequency of each data point with only 3 bins that have different sizes:
bin1 = 0.03 -> 0.3
bin2 = 0.3 -> 2
bin3 = 2 -> 100
The histogram looks ugly since the size of the last bin is extremely large relative to the other bins. How can I fix the histogram? I want to change the width of the bins but I do not want to change the range of each bin.
As @cel pointed out, this is no longer a histogram, but you can do what you are asking using plt.bar and np.histogram. You then just need to set the xticklabels to strings describing the bin edges. For example:
import numpy as np
import matplotlib.pyplot as plt
bins = [0.03,0.3,2,100] # your bins
data = [0.04,0.07,0.1,0.2,0.2,0.8,1,1.5,4,5,7,8,43,45,54,56,99] # random data
hist, bin_edges = np.histogram(data,bins) # make the histogram
fig,ax = plt.subplots()
# Plot the histogram heights against integers on the x axis
ax.bar(range(len(hist)),hist,width=1,align='edge') # align='edge' so bar i spans i..i+1 (the default is 'center' since matplotlib 2.0)
# Set the ticks to the middle of the bars
ax.set_xticks([0.5+i for i,j in enumerate(hist)])
# Set the xticklabels to a string that tells us what the bin edges were
ax.set_xticklabels(['{} - {}'.format(bins[i],bins[i+1]) for i,j in enumerate(hist)])
plt.show()
EDIT
From matplotlib v1.5.0 onward, bar accepts a tick_label keyword argument, which makes this plot even easier:
hist, bin_edges = np.histogram(data,bins)
ax.bar(range(len(hist)),hist,width=1,align='center',tick_label=
['{} - {}'.format(bins[i],bins[i+1]) for i,j in enumerate(hist)])
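The question asks for relative frequencies rather than raw counts. A minimal sketch of that step, using the same sample data as above, is to divide the counts by their total before plotting. The names counts, rel_freq and labels are chosen here for illustration.

```python
import numpy as np

bins = [0.03, 0.3, 2, 100]
data = [0.04, 0.07, 0.1, 0.2, 0.2, 0.8, 1, 1.5, 4, 5, 7, 8, 43, 45, 54, 56, 99]

# Count the values per bin, then normalise so the heights sum to 1
counts, _ = np.histogram(data, bins)
rel_freq = counts / counts.sum()

# One label per bin, built from consecutive pairs of edges
labels = [f"{lo} - {hi}" for lo, hi in zip(bins[:-1], bins[1:])]
```

These can be passed on as ax.bar(range(len(rel_freq)), rel_freq, width=1, tick_label=labels). For the sample data the counts are 5, 3 and 9 out of 17 values.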
If the actual values of your bin edges are not important, but you want a histogram of values spanning completely different orders of magnitude, you can use logarithmic scaling along the x-axis. This gives you bars with equal widths:
import numpy as np
import matplotlib.pyplot as plt
data = [0.04,0.07,0.1,0.2,0.2,0.8,1,1.5,4,5,7,8,43,45,54,56,99]
plt.hist(data,bins=10**np.linspace(-2,2,5))
plt.xscale('log')
plt.show()
If you have to use your own bin values, you can do:
import numpy as np
import matplotlib.pyplot as plt
data = [0.04,0.07,0.1,0.2,0.2,0.8,1,1.5,4,5,7,8,43,45,54,56,99]
bins = [0.03,0.3,2,100]
plt.hist(data,bins=bins)
plt.xscale('log')
plt.show()
However, in this case the widths are not perfectly equal, although the plot is still readable. If the widths must be equal and you have to use your own bins, I recommend @tom's solution.
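To see why the widths differ, it can help to compute how wide each bin is on a logarithmic axis. The following sketch does that for the bins from the question; log_widths is a name introduced here.

```python
import numpy as np

bins = np.array([0.03, 0.3, 2, 100])

# On a log-scaled axis a bin's drawn width is the difference
# of the base-10 logarithms of its two edges
log_widths = np.diff(np.log10(bins))
print(log_widths)
```

The three bins span roughly 1.0, 0.82 and 1.70 decades, so the last bar is drawn about twice as wide as the second one instead of about fifty times as wide on a linear axis.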
I'm trying to combine a normal matplotlib.pyplot plt.plot(x,y) with variable y as a function of variable x with a boxplot. However, I only want a boxplot on certain (variable) locations of x but this does not seem to work in matplotlib?
Are you wanting something like this? The positions kwarg to boxplot allows you to place the boxplots at arbitrary positions.
import matplotlib.pyplot as plt
import numpy as np
# Generate some data...
data = np.random.random((100, 5))
y = data.mean(axis=0)
x = np.random.random(y.size) * 10
x -= x.min()
x.sort()
# Plot a line between the means of each dataset
plt.plot(x, y, 'b-')
# Save the default tick positions, so we can reset them...
locs, labels = plt.xticks()
plt.boxplot(data, positions=x, notch=True)
# Reset the xtick locations.
plt.xticks(locs)
plt.show()
This is what has worked for me:
1. Plot the box plot.
2. Get the box plot's x-axis tick locations.
3. Use those tick locations as the x-values for the line plot.
# Plot Box-plot
ax.boxplot(data, positions=x, notch=True)
# Get box-plot x-tick locations
locs=ax.get_xticks()
# Plot a line between the means of each dataset
# x-values = box-plot x-tick locations
# y-values = means
ax.plot(locs, y, 'b-')
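The fragment above assumes that ax, data, x and y already exist. A self-contained sketch of the same steps could look like this; the fixed positions, the seeded random data and the Agg backend are assumptions made for the example.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import numpy as np

# Five datasets of 100 samples each, and the mean of every dataset
rng = np.random.default_rng(0)
data = rng.random((100, 5))
y = data.mean(axis=0)

# Arbitrary, unevenly spaced x positions for the five boxes
x = np.array([0.0, 1.5, 4.0, 6.5, 9.0])

fig, ax = plt.subplots()

# Plot the box plot at the chosen positions
ax.boxplot(data, positions=x)

# Get the box plot's x-tick locations
locs = ax.get_xticks()

# Plot a line through the means, using the tick locations as x-values
ax.plot(locs, y, "b-")
```

On a fresh axes the tick locations returned after boxplot are the positions that were passed in, which is why the line ends up passing through the boxes.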