Number of columns does not match number of bins - python

I am trying to plot some 336 data points and am running into an issue with Python's plt.hist() function. I would like to use more than eight bins for my data, but when I do, a lot of whitespace is introduced. For example, here is a plot with bins = 8
and with bins = 24
Does anyone know why this is and how I can best represent my data with more bins? Many thanks, ~S.
Sample code:
import numpy as np
import matplotlib.pyplot as plt
# df is the asker's DataFrame (not shown)
tumbles = np.array(df['Tumbles'])
fig, axs = plt.subplots(1, 1, tight_layout=True)
N, bins, patches = axs.hist(tumbles, bins=24, edgecolor="black")
# grid(b=...) was renamed to grid(visible=...) in Matplotlib 3.5
axs.grid(visible=True, color='grey', linestyle='-.', linewidth=0.5, alpha=0.6)
plt.xlabel("Time (s)", size=14)
plt.ylabel("Frequency", size=14)
plt.title('Histogram of Times', size=18)
plt.show()

It looks like your data is distributed in such a way that the empty spaces between bars are simply bins with height 0 (no samples fall in them). In that case you just don't need more bins.
Please include your code
With this setup I get the same problem:
import matplotlib.pyplot as plt
plt.hist([1, 2, 2, 3, 4, 5, 5, 5, 5, 5, 6, 7, 9], bins=20)
plt.show()
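The zero-height bins can be checked without plotting. The sketch below counts the same sample with np.histogram, first with 20 equal-width bins and then with one bin per integer. The half-integer edges are an assumption about what a sensible binning looks like for integer-valued data, not something the asker specified.

```python
import numpy as np

data = [1, 2, 2, 3, 4, 5, 5, 5, 5, 5, 6, 7, 9]

# 20 equal-width bins over [1, 9]: only 8 distinct values exist,
# so most of the bins cannot contain any sample.
counts20, edges20 = np.histogram(data, bins=20)
empty20 = int((counts20 == 0).sum())

# One bin per integer: edges at 0.5, 1.5, ..., 9.5 centre each bar on a value.
int_edges = np.arange(0.5, 10.5, 1.0)
counts_int, _ = np.histogram(data, bins=int_edges)

print(empty20, counts_int.tolist())
```

Passing bins=int_edges to plt.hist() would draw the same counts; the only gap left is the bin at 8, which really is empty.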

It takes a bit more effort, but if you want more control over the number of bins and the range of each bin, you can pass the bins parameter of the histogram call as a list of bin edges. This was alluded to above, but here is a snippet of code illustrating it.
import matplotlib.pyplot as plt
data = [0.02, 0.02, 0.02, 0.27, 0.27, 0.03, 0.03, 0.04, 0.044, 0.044, 0.05, 0.05, 0.06, 0.07, 0.08, 0.08, 0.08, 0.10, 0.10, 0.11, 0.12, 0.13, 0.13, 0.14, 0.15, 0.17, 0.18, 0.19, 0.20, 0.20, 0.22, 0.23, 0.23, 0.23, 0.23, 0.24, 0.26, 0.26, 0.28, 0.29, 0.30, 0.32]
fig, ax = plt.subplots()
N, bins, values = ax.hist(data, [0.000,0.015,0.030,0.045,0.060,0.075,0.090,0.105,0.120,0.135,0.150,0.165,0.180,0.195,0.210,0.225,0.240,0.255,0.270,0.285,0.300,0.315,0.330,0.345], linewidth=1)
plt.bar_label(values)
plt.xlabel("Time (s)", size = 14)
plt.ylabel("Frequency", size = 14)
plt.title('Histogram of Times', size = 18)
plt.show()
The data is just a small subset of made-up points, enough to produce a histogram. The histogram created in this fashion follows.
You might give that a try, adjusting the range each bin should cover.
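Typing the edges by hand is error-prone, so they can also be generated. A minimal sketch, assuming the same 0.015-wide bins from 0 to 0.345 as in the list above (the data values here are a shortened, made-up sample):

```python
import numpy as np

# 24 edges from 0.000 to 0.345 give 23 bins, each 0.015 wide.
edges = np.linspace(0.0, 0.345, 24)

data = [0.02, 0.02, 0.03, 0.044, 0.05, 0.08, 0.10, 0.13, 0.18, 0.23, 0.27, 0.32]
counts, _ = np.histogram(data, bins=edges)

print(np.round(edges[:4], 3), counts.sum())
```

The same array can be passed to the plot as ax.hist(data, bins=edges).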

Related

How do I avoid overlapping error bars and swap the X and Y axes?

I am new to matplotlib and am asking for your help with a small problem. I am sharing the graph below. Here are my questions:
1- I want to swap the x-axis and the y-axis.
2- Most importantly, the error bars should be horizontal (in the graph below they are vertical).
Some error bars in the graph overlap, and I tried to avoid this using a transform. As I said, I would be happy if I could manage to swap the X and Y axes.
Below I am sharing the code I wrote:
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.transforms import Affine2D
y_values = ['a', 'b', 'c', 'd', 'e', 'f', 'g']
p1 = [1, 0.77, 0.67, 0.85, 0.78, 1.05, 0.63]
p2 = [3, 2, 1.5, 1.20, 1.10, 1.40, 1.10]
x_err = [0.1, 0.2, 0.4, 0.5, 0.3, 0.2, 0.3]
y_err = [0.6, 0.2, 0.4, 0.5, 0.3, 0.2, 0.3]
fig, ax = plt.subplots()
trans1 = Affine2D().translate(-0.1, 0.0) + ax.transData
trans2 = Affine2D().translate(+0.1, 0.0) + ax.transData
er1 = ax.errorbar(y_values, p1, x_err, marker="o", linestyle="none", transform=trans1)
er2 = ax.errorbar(y_values, p2, y_err, marker="o", linestyle="none", transform=trans2)
[Image: errorbar plot]
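One possible approach, sketched under the assumption that the categories should sit on the y-axis and the measured values on the x-axis: plot against numeric positions, pass the errors as xerr so the bars are horizontal, and offset the two series by 0.1 in opposite directions on the category axis instead of using a transform. The offsets, the legend labels and the Agg backend line are illustrative choices, not part of the original code.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch; drop it for interactive use
import matplotlib.pyplot as plt
import numpy as np

labels = ['a', 'b', 'c', 'd', 'e', 'f', 'g']
p1 = [1, 0.77, 0.67, 0.85, 0.78, 1.05, 0.63]
p2 = [3, 2, 1.5, 1.20, 1.10, 1.40, 1.10]
err1 = [0.1, 0.2, 0.4, 0.5, 0.3, 0.2, 0.3]
err2 = [0.6, 0.2, 0.4, 0.5, 0.3, 0.2, 0.3]

pos = np.arange(len(labels))  # numeric y positions, one per category

fig, ax = plt.subplots()
# Values go on x, categories on y; xerr draws horizontal error bars.
er1 = ax.errorbar(p1, pos - 0.1, xerr=err1, marker="o", linestyle="none", label="p1")
er2 = ax.errorbar(p2, pos + 0.1, xerr=err2, marker="o", linestyle="none", label="p2")
ax.set_yticks(pos)
ax.set_yticklabels(labels)
ax.legend()
```

Calling plt.show() afterwards (with an interactive backend) displays the figure. Because each series has its own y position, the bars no longer overlap.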

Use rows in a dataframe as inputs for function and visualization

Please help me iterate over each row of the following .CSV file, loaded as a Pandas dataframe called df_data in my subsequent code:
aud,wllt_cnt,bcr
Group A,64700,0.15116928389684975
Group B,7654,0.02786525362332031
Group C,11183,0.01278621197465396
Group D,8025,0.00881753794562903
Currently, I created the following function by providing references to specific locations (please refer to variables total_group_size and crc):
import pandas as pd
import itertools
import numpy as np
import statsmodels.stats.api as sms
import seaborn as sns
%matplotlib inline
from matplotlib import pyplot as plt
# Group A
# making a dataframe that captures the combinations of holdout / lift
def expand_grid(data_dict):
    rows = itertools.product(*data_dict.values())
    return pd.DataFrame.from_records(rows, columns=data_dict.keys())
# making a function for testing
def test(holdout,lift):
    cl = 0.9
    alpha = 1-cl
    total_group_size = df_data.iloc[0,1]
    wllt_cnt_reach = total_group_size*0.6
    crc = df_data.iloc[0,2]
    conversion_rate_control = crc
    conversion_rate_test = conversion_rate_control*(1+lift)
    es = sms.proportion_effectsize(conversion_rate_test, conversion_rate_control)
    n1 = wllt_cnt_reach*(1-holdout)
    n2 = wllt_cnt_reach*holdout
    return sms.NormalIndPower().solve_power(es, nobs1=n1, alpha=alpha, ratio=n2/n1, alternative='two-sided')
holdout=np.array([0.01, 0.02, 0.03, 0.04, 0.05, 0.1, 0.15, 0.2, 0.25, 0.3, 0.35, 0.4, 0.45, 0.5])
lift=np.array([0.01, 0.02, 0.025, 0.03, 0.04, 0.05, 0.06, 0.07, 0.08, 0.09, 0.1, 0.125, 0.15, 0.2, 0.25, 0.3, 0.35, 0.4])
df = expand_grid({'holdout':holdout,'lift':lift})
Then applying the function row-wise:
df['power'] = df.apply(lambda x:test(x[0],x[1]),axis=1)
And plotting a heatmap (here I also used a reference to the df_data dataframe for the plot title; please refer to plt_title):
plt_title = df_data.iloc[0,0]
x_axis_labels = [1, 2, 3, 4, 5, 10, 15, 20, 25, 30, 35, 40, 45, 50]
y_axis_labels = [1, 2, 2.5, 3, 4, 5, 6, 7, 8, 9, 10, 12.5, 15, 20, 25, 30, 35, 40]
fig, ax = plt.subplots(figsize = (13,6))
ax = sns.heatmap(data=df.pivot(index='lift',columns='holdout',values='power'),
                 annot=True, fmt='.0%', cmap=sns.color_palette("RdYlGn", 200),
                 xticklabels=x_axis_labels, yticklabels=y_axis_labels,
                 cbar_kws={'label': 'Power'})
fig.axes[0].invert_yaxis()
ax.set_title(plt_title)
plt.xlabel("Holdout")
plt.ylabel("% Lift")
b, t = plt.ylim()
b -= 0.5
t += 0.5
plt.ylim(b, t)
plt.show()
I need to optimize this code, so that I don't have to copy-paste the same code to apply it to each row of the df_data dataframe. My understanding is that I need to change the function test from above to take more inputs, then wrap the plotting into another function and show a heatmap for each of the groups (rows in my .CSV file), but I'm stuck at the first step in this action plan.
Please help me take this off, and thank you in advance!
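One way to start, sketched here with hypothetical helper names (power_for and power_grids are not from the original code): pass the group size and base conversion rate into the test function as arguments, then loop over the rows with itertuples() and build one pivoted power table per group. The CSV is inlined via io.StringIO, and the holdout and lift lists are shortened, only to keep the sketch self-contained.

```python
import io
import itertools

import pandas as pd
import statsmodels.stats.api as sms

CSV_TEXT = """aud,wllt_cnt,bcr
Group A,64700,0.15116928389684975
Group B,7654,0.02786525362332031
Group C,11183,0.01278621197465396
Group D,8025,0.00881753794562903
"""
df_data = pd.read_csv(io.StringIO(CSV_TEXT))

def expand_grid(data_dict):
    rows = list(itertools.product(*data_dict.values()))
    return pd.DataFrame(rows, columns=list(data_dict.keys()))

def power_for(holdout, lift, total_group_size, crc, cl=0.9, reach=0.6):
    # Same calculation as test() in the question, but the two values that
    # were read from df_data.iloc[0, ...] are now arguments.
    alpha = 1 - cl
    wllt_cnt_reach = total_group_size * reach
    es = sms.proportion_effectsize(crc * (1 + lift), crc)
    n1 = wllt_cnt_reach * (1 - holdout)
    n2 = wllt_cnt_reach * holdout
    return sms.NormalIndPower().solve_power(
        es, nobs1=n1, alpha=alpha, ratio=n2 / n1, alternative="two-sided")

def power_grids(df_data, holdout, lift):
    # One lift-by-holdout power table per row (group) of df_data.
    grids = {}
    for row in df_data.itertuples(index=False):
        grid = expand_grid({"holdout": holdout, "lift": lift})
        grid["power"] = [power_for(h, l, row.wllt_cnt, row.bcr)
                         for h, l in zip(grid["holdout"], grid["lift"])]
        grids[row.aud] = grid.pivot(index="lift", columns="holdout", values="power")
    return grids

grids = power_grids(df_data, holdout=[0.05, 0.25, 0.5], lift=[0.01, 0.05, 0.2])
```

Each value in grids is a table that could be passed to the sns.heatmap(...) call from the question, with the dictionary key (the aud column) as the plot title. Wrapping that call in a second function that takes a title and a table would complete the plan described above.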

Arguments for LogLocator in MatPlotLib

In MatPlotLib, I want to plot a graph with a linear x-axis and a logarithmic y-axis. For the x-axis, there should be labels at multiples of 4, and minor ticks at multiples of 1. I have been able to do this using the MultipleLocator class.
However, I am having difficulty doing a similar thing for the logarithmic y-axis. I want there to be labels at 0.1, 0.2, 0.3 etc., and minor ticks at 0.11, 0.12, 0.13 etc. I have tried doing this with the LogLocator class, but I'm not sure what the right parameters are.
Here is what I have tried so far:
import matplotlib.pyplot as plt
from matplotlib.ticker import MultipleLocator, LogLocator
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
y = [0.32, 0.30, 0.28, 0.26, 0.24, 0.22, 0.20, 0.18, 0.16, 0.14, 0.12, 0.10]
fig = plt.figure()
ax1 = fig.add_subplot(111)
x_major = MultipleLocator(4)
x_minor = MultipleLocator(1)
ax1.xaxis.set_major_locator(x_major)
ax1.xaxis.set_minor_locator(x_minor)
ax1.set_yscale("log")
y_major = LogLocator(base=10)
y_minor = LogLocator(base=10)
ax1.yaxis.set_major_locator(y_major)
ax1.yaxis.set_minor_locator(y_minor)
ax1.plot(x, y)
plt.show()
This shows the following plot:
The x-axis is as I want it, but not the y-axis. There is a label on the y-axis at 0.1, but no labels at 0.2 and 0.3. Also, there are no ticks at 0.11, 0.12, 0.13 etc.
I have tried some different values for the LogLocator constructor, such as subs, numdecs, and numticks, but I cannot get the right plot. The documentation at https://matplotlib.org/api/ticker_api.html#matplotlib.ticker.LogLocator doesn't really explain these parameters very well.
What parameter values should I be using?
I think you still want MultipleLocator rather than LogLocator because your desired tick location is still "on every integer that is multiple of base in the view interval" rather than "subs[j] * base**i". For example:
import matplotlib.pyplot as plt
from matplotlib.ticker import MultipleLocator
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
y = [0.32, 0.30, 0.28, 0.26, 0.24, 0.22, 0.20, 0.18, 0.16, 0.14, 0.12, 0.10]
fig = plt.figure(figsize=(8, 12))
ax1 = fig.add_subplot(111)
x_major = MultipleLocator(4)
x_minor = MultipleLocator(1)
ax1.xaxis.set_major_locator(x_major)
ax1.xaxis.set_minor_locator(x_minor)
ax1.set_yscale("log")
# You would need to erase default major ticklabels
ax1.set_yticklabels(['']*len(ax1.get_yticklabels()))
y_major = MultipleLocator(0.1)
y_minor = MultipleLocator(0.01)
ax1.yaxis.set_major_locator(y_major)
ax1.yaxis.set_minor_locator(y_minor)
ax1.plot(x, y)
plt.show()
With its default settings, LogLocator always puts major ticks at "every base**i", so on its own it does not give your desired major tick labels. You can use the subs parameter for your minor ticks like this:
import matplotlib.pyplot as plt
from matplotlib.ticker import MultipleLocator, LogLocator
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
y = [0.32, 0.30, 0.28, 0.26, 0.24, 0.22, 0.20, 0.18, 0.16, 0.14, 0.12, 0.10]
fig = plt.figure()
ax1 = fig.add_subplot(111)
x_major = MultipleLocator(4)
x_minor = MultipleLocator(1)
ax1.xaxis.set_major_locator(x_major)
ax1.xaxis.set_minor_locator(x_minor)
ax1.set_yscale("log")
y_major = LogLocator(base=10)
y_minor = LogLocator(base=10, subs=[1.1, 1.2, 1.3])
ax1.yaxis.set_major_locator(y_major)
ax1.yaxis.set_minor_locator(y_minor)
ax1.plot(x, y)
plt.show()
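To see which positions a given subs list can produce, the rule quoted above (subs[j] * base**i) can be evaluated by hand. The sketch below is plain NumPy arithmetic, not a call into Matplotlib, and the helper name tick_candidates is made up for illustration.

```python
import numpy as np

def tick_candidates(subs, exponents, base=10.0):
    """Return every subs[j] * base**i, one row per exponent i."""
    powers = base ** np.asarray(exponents, dtype=float)
    return np.outer(powers, np.asarray(subs, dtype=float))

# subs=[1.1, 1.2, 1.3] covers 0.11 to 0.13 in the decade starting at 0.1 ...
minor = tick_candidates([1.1, 1.2, 1.3], exponents=[-1])

# ... but 0.21, 0.22, ... need 2.1, 2.2, ... in subs as well.
all_subs = np.arange(11, 100) / 10.0  # 1.1, 1.2, ..., 9.9
full = tick_candidates(all_subs, exponents=[-1])

print(np.round(minor, 2))
```

Under that reading, LogLocator(base=10, subs=all_subs) would be the minor locator to try for ticks at every 0.01 step. Whether the resulting tick density is readable depends on the axis range.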

Adding additional sliders in matplotlib

I am trying to create a third slider to control my plot.
fig, ax = plt.subplots()
plt.subplots_adjust(left=0.25, bottom=0.25)
l, = plt.plot(u,v, lw=1, color='red')
plt.axis([-20, 20, -20,20])
amp_slider_ax = fig.add_axes([0.25, 0.15, 0.65, 0.03], axisbg=axis_color)
samp = Slider(amp_slider_ax, 'Ey', 1, 10.0, valinit=a0)
freq_slider_ax = fig.add_axes([0.25, 0.1, 0.65, 0.03], axisbg=axis_color)
sfreq = Slider(freq_slider_ax, 'gamma (Ex/Ey)', 0.01, 1.3, valinit=f0)
#new slider
fbz_slider_ax = fig.add_axes([3, 7, 0.65, 0.03], axisbg=axis_color)
sbz = Slider(fbz_slider_ax, 'Bz', 0.01, 1.3, valinit=b0)
I don't see why my third slider is not being initialized. Can someone provide an example with 3 sliders, please? When I call the slider object, I do not get any errors either.
In the line fig.add_axes([3, 7, 0.65, 0.03]) you are adding an axes whose lower-left corner is at (3, 7). That point does not lie inside the figure, because figure coordinates run from 0 to 1 in both directions.
The solution is of course to add the axes somewhere inside the figure, for example just below the other two sliders.
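A minimal sketch with three sliders, all placed inside the figure. The data, the initial values a0, f0, b0, the colour and the update rule are placeholders, since the question does not show its own. Note that facecolor is used here because the axisbg keyword was removed from newer Matplotlib releases.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch; drop it for interactive use
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.widgets import Slider

# Placeholder data and initial values (assumptions, not from the question).
t = np.linspace(0, 2 * np.pi, 200)
a0, f0, b0 = 5.0, 0.5, 0.5
axis_color = "lightgoldenrodyellow"

fig, ax = plt.subplots()
fig.subplots_adjust(left=0.25, bottom=0.30)
line, = ax.plot(a0 * np.cos(t), a0 * f0 * np.sin(t) + b0, lw=1, color="red")
ax.axis([-20, 20, -20, 20])

# Every rectangle is [left, bottom, width, height] in figure fractions (0 to 1).
amp_ax = fig.add_axes([0.25, 0.15, 0.65, 0.03], facecolor=axis_color)
freq_ax = fig.add_axes([0.25, 0.10, 0.65, 0.03], facecolor=axis_color)
bz_ax = fig.add_axes([0.25, 0.05, 0.65, 0.03], facecolor=axis_color)

samp = Slider(amp_ax, "Ey", 1, 10.0, valinit=a0)
sfreq = Slider(freq_ax, "gamma (Ex/Ey)", 0.01, 1.3, valinit=f0)
sbz = Slider(bz_ax, "Bz", 0.01, 1.3, valinit=b0)

def update(_):
    # Hypothetical redraw rule: an ellipse scaled and shifted by the sliders.
    line.set_data(samp.val * np.cos(t),
                  samp.val * sfreq.val * np.sin(t) + sbz.val)
    fig.canvas.draw_idle()

for slider in (samp, sfreq, sbz):
    slider.on_changed(update)
```

With an interactive backend, plt.show() then displays the plot with the three sliders stacked at the bottom.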

How to make plots in normalized coordinates in python?

I am trying to make four sets of plots in a 2x2 or 1x4 grid. Each set then has three more panels, say, a scatter plot with histograms of the x- and y-axes on the sides.
Instead of setting the axes for all 12 plots, I'd like to divide my canvas into 4 parts, and then divide each one individually. For example,
def plot_subset(i):
    # i is the index (1-4) of the grid cell this subset should occupy
    # these coords are normalized to this subset of plots
    pos_axScatter = [0.10, 0.10, 0.65, 0.65]
    pos_axHistx = [0.10, 0.75, 0.65, 0.20]
    pos_axHisty = [0.75, 0.10, 0.20, 0.20]
    axScatter = plt.axes(pos_axScatter)
    axHistx = plt.axes(pos_axHistx)
    axHisty = plt.axes(pos_axHisty)
def main():
    # need to divide the canvas to a 2x2 grid
    plot_subset(1)
    plot_subset(2)
    plot_subset(3)
    plot_subset(4)
    plt.show()
I have tried GridSpec and subplots but cannot find a way to make plot_subset() work in the normalized space. Any help would be much appreciated!
You can use BboxTransformTo() to do this:
import matplotlib.pyplot as plt
from matplotlib import transforms
fig = plt.figure(figsize=(16, 4))
fig.subplots_adjust(0.05, 0.05, 0.95, 0.95, 0.04, 0.04)
gs1 = plt.GridSpec(1, 4)  # outer 1x4 grid: one cell per set of plots
gs2 = plt.GridSpec(4, 4)  # inner 4x4 grid, reused inside every outer cell
for i in range(4):
    # map the inner grid from figure coordinates into outer cell i
    bbox = gs1[0, i].get_position(fig)
    t = transforms.BboxTransformTo(bbox)
    fig.add_axes(t.transform_bbox(gs2[:3, :3].get_position(fig)))
    fig.add_axes(t.transform_bbox(gs2[3, :3].get_position(fig)))
    fig.add_axes(t.transform_bbox(gs2[:3, 3].get_position(fig)))
the output:
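The same transform also works with the rectangles from the question. A sketch, assuming a 2x2 layout whose four cells are written out by hand; the helper name to_figure_rect and the cell list are made up for illustration:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch; drop it for interactive use
import matplotlib.pyplot as plt
from matplotlib.transforms import Bbox, BboxTransformTo

def to_figure_rect(cell, rect):
    """Map rect, given in coordinates normalized to cell, into figure coordinates.

    Both arguments are [left, bottom, width, height].
    """
    cell_transform = BboxTransformTo(Bbox.from_bounds(*cell))
    return cell_transform.transform_bbox(Bbox.from_bounds(*rect)).bounds

def plot_subset(fig, cell):
    # Rectangles copied from the question, normalized to one cell.
    rects = {
        "scatter": [0.10, 0.10, 0.65, 0.65],
        "histx": [0.10, 0.75, 0.65, 0.20],
        "histy": [0.75, 0.10, 0.20, 0.20],
    }
    return {name: fig.add_axes(to_figure_rect(cell, r)) for name, r in rects.items()}

fig = plt.figure(figsize=(8, 8))
# 2x2 grid of cells, each [left, bottom, width, height] in figure fractions.
cells = [[0.0, 0.5, 0.5, 0.5], [0.5, 0.5, 0.5, 0.5],
         [0.0, 0.0, 0.5, 0.5], [0.5, 0.0, 0.5, 0.5]]
panels = [plot_subset(fig, cell) for cell in cells]
```

Each entry of panels holds the three axes of one subset, so panels[0]["scatter"].scatter(x, y) would draw into the top-left group.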
