How do I set histogram axis to always be an integer? - python

I am using a bit of code to run and generate reports with Python. The code takes information from an online survey tool, runs basic statistics on the data, and then generates a Word document based on the results. I am creating a number of graphs along the way, and I have the following function to help me build some of the histograms.
def histogram_by(df, df_column, sort_by, height):
    """
    df = location of the data
    df_column = column in the data frame with the required data
    sort_by = the column used to categorize the data
    height = the calculated height of the subplots, changes depending on number of plots
    """
    f, ax = generate_subplots(df[sort_by].nunique(), height)
    df[df_column].hist(
        ax=ax,
        by=df[sort_by],
        xrot=360,
        bins=np.linspace(1, 5, 9))
    plt.tight_layout()
    plt.savefig('plt.png')
The first picture shows what the graphs look like when there is enough data to force integers. This happens in most cases.
In the second picture there is not enough data to force integers on the y-axis, so it shows floats. The graphs in this version also appear a bit wider than in the 'correct' output. Any ideas?
The amount of data changes based on how many people answered the survey. Is there any way to force the y-axis to use integers instead of defaulting to floats?
Thanks for taking the time to help me out.
Best,
Chris

First create a minimal example of the issue.
import numpy as np; np.random.seed(42)
import matplotlib.pyplot as plt
fig, ax = plt.subplots(figsize=(4,1.7))
data = np.random.randint(1,9, size=52)
ax.hist(data, bins=np.arange(0,9)+0.5, ec="k")
plt.show()
Now you can get rid of the decimals on the y-axis by telling the default AutoLocator to use only integers:
ax.locator_params(axis='y', integer=True)
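The question's function draws several subplots at once, so the same idea has to be applied to each Axes. A minimal sketch, assuming made-up survey groups (the `groups` data is hypothetical) and using `MaxNLocator(integer=True)` from `matplotlib.ticker` as an explicit alternative to `locator_params`:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; omit for interactive use
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.ticker import MaxNLocator

rng = np.random.default_rng(42)
# Hypothetical stand-in for the survey data: three small groups of 1-5 ratings
groups = [rng.integers(1, 6, size=n) for n in (3, 4, 6)]

fig, axes = plt.subplots(len(groups), 1, figsize=(4, 5))
for ax, values in zip(np.ravel(axes), groups):
    ax.hist(values, bins=np.linspace(1, 5, 9), ec="k")
    # Restrict the y-axis tick locator to integer positions
    ax.yaxis.set_major_locator(MaxNLocator(integer=True))

fig.tight_layout()
fig.canvas.draw()
# Collect the tick positions of every subplot for inspection
yticks = [ax.get_yticks() for ax in np.ravel(axes)]
```

Looping over `np.ravel(axes)` should work whether the helper returns a single Axes, a 1-D array, or a 2-D grid.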

Related

Python: Histogram return wrong values for counts (EDIT: more general with example)

EDIT: I've found a general example where it doesn't work either!
I am trying to extract the data for a histogram, but different counts seem wrong. As an example code:
import matplotlib.pyplot as plt
import numpy as np
data = np.random.rand(1000000)
bins = np.arange(0,1,0.0001)
a,b,c = plt.hist(data,bins)
This gives me this rather messy histogram, and I've saved the counts as a and the intervals as b. Now, plotting a against b, I should expect the same histogram, right? But that's not what I get:
plt.scatter(b[0:len(b)-1],a,s=2)
which gives me this, which doesn't match at all! Furthermore, when I try to find the maximum value of a, it gives me 144, which fits fine with the scatter plot, but not with the histogram.
If I count the numbers myself with the following code:
len(np.intersect1d(np.where(data>=b[np.argmax(a)]),np.where(data<b[np.argmax(a)+1])))
then it also gives me 144, in accordance with the values. So is the displayed histogram just wrong for some reason, and I should ignore it and just take the extracted data?
Old, unedited post:
For a physics course I am trying to bin my results in the following way:
import matplotlib.pyplot as plt
import numpy as np
import scipy.stats as ss
from scipy.optimize import curve_fit
plt.rc("font", family=["Helvetica", "Arial"])
plt.rc("axes", labelsize=18)
plt.rc("xtick", labelsize=16, top=True, direction="in")
plt.rc("ytick", labelsize=16, right=True, direction="in")
plt.rc("axes", titlesize=22)
plt.rc("legend", fontsize=16)
data_Ra = np.loadtxt('Ra226_cal2_ch001.txt',skiprows=5)
t_Ra = data_Ra[:,0]*10**-8 # time in seconds
channels_Ra = data_Ra[:,1]
channels_Ra = channels_Ra[np.where(channels_Ra>0)] # removing all the measurements at channel = 0
intervalspace = 2 #The intervals in which we count
bins=np.arange(0,4000,intervalspace)
counts, intervals , stuff = plt.hist(channels_Ra,bins)
plt.xlabel('Channels')
plt.ylabel('Counts')
plt.show()
Here, the histogram plot looks totally fine, with a max near 13000 counts. But when I then use np.max(counts), I am given about 24000, and when I try and just plot the values it gives me with:
plt.scatter(intervals[0:len(intervals)-1]+intervalspace/2,counts,s=1)
plt.xlabel('Channels')
plt.ylabel('Counts')
plt.title('Ra225')
plt.show()
it looks like this, which is totally different, and I can't figure out why. I am expecting the scatter plot to resemble the histogram, and while the peaks are located at the same x-values, the heights do not match.
This problem is in other large datasets as well.
I don't think I'm allowed to drop the txt file here, so I'm not sure how much more I can show, but any help will be appreciated!
I don't know why you interpret the results in that way.
If you look at the histogram plot, you will be able to see the maximum value of the y-axis is 25,000. That means that there are some values close to 25,000. This fact can be verified in the scatter plot.
Your scatter plot shows the actual values. It would be clearer if you described what your expected plot looks like.
If you want to discard some outlier points, you should apply some filtering before plotting the data.
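One way to check this for yourself is to compare the counts returned by `hist` with an independent count and with the bars that were actually drawn. The sketch below uses smaller, made-up data than the question (an assumption, to keep it fast) and only standard NumPy and matplotlib calls:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; omit for interactive use
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
data = rng.random(100_000)        # smaller than the question's sample, for speed
bins = np.arange(0, 1, 0.001)

fig, ax = plt.subplots()
counts, edges, patches = ax.hist(data, bins)

# Independent count of the same data with the same bin edges
ref_counts, _ = np.histogram(data, bins)

# Height of the tallest bar object that was drawn on the Axes
tallest_bar = max(p.get_height() for p in patches)
# Upper limit of the y-axis chosen by autoscaling
y_top = ax.get_ylim()[1]

print(counts.max(), tallest_bar, y_top)
```

If the tallest bar equals `counts.max()` and the y-axis limit sits above it, the returned data and the drawn histogram agree; with very many narrow bars, individual tall bars can simply be hard to see at screen resolution.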

How to print multiple plots together in python?

I am trying to print about 42 plots in 7 rows and 6 columns, but the output in the Jupyter notebook shows all the plots one under the other. I want them in a (7,6) layout for comparison. I am using the plt.subplot2grid() function.
Note: I do not get any error, and my code works, however the plots are one under the other, vs being in a grid/ matrix form.
Here is my code:
def draw_umap(n_neighbors=15, min_dist=0.1, n_components=2, metric='euclidean', title=''):
    fit = umap.UMAP(
        n_neighbors=n_neighbors,
        min_dist=min_dist,
        n_components=n_components,
        metric=metric
    )
    u = fit.fit_transform(df);
    plots = []
    plt.figure(0)
    fig = plt.figure()
    fig.set_figheight(10)
    fig.set_figwidth(10)
    for i in range(7):
        for j in range(6):
            plt.subplot2grid((7,6), (i,j), rowspan=7, colspan=6)
            plt.scatter(u[:,0], u[:,1], c= df.iloc[:,0])
            plt.title(title, fontsize=8)

n=range(7)
d=range(6)
for n in n_neighbors:
    for d in dist:
        draw_umap(n_neighbors=n, min_dist=d, title="n_neighbors={}".format(n) + " min_dist={}".format(d))
I did refer to this post to get the plots in a grid and followed the code.
I also referred to this post, and modified my code for size of the fig.
Is there a better way to do this using Seaborn?
What am I missing here? Please help!
Both questions that you have linked contain solutions that seem more complicated than necessary. Note that subplot2grid is useful only if you want to create subplots of varying sizes, which I understand is not your case. Also note that, according to the docs, using GridSpec (as demonstrated in the GridSpec demo) is generally preferred, and I would also recommend it only if you want to create subplots of varying sizes.
The simple way to create a grid of equal-sized subplots is to use plt.subplots which returns an array of Axes through which you can loop to plot your data as shown in this answer. That solution should work fine in your case seeing as you are plotting 42 plots in a grid of 7 by 6. But the problem is that in many cases you may find yourself not needing all the Axes of the grid, so you will end up with some empty frames in your figure.
Therefore, I suggest using a more general solution that works in any situation by first creating an empty figure and then adding each Axes with fig.add_subplot as shown in the following example:
import numpy as np # v 1.19.2
import matplotlib.pyplot as plt # v 3.3.4
# Create sample dataset
rng = np.random.default_rng(seed=1) # random number generator
nvars = 8
nobs = 50
xs = rng.uniform(size=(nvars, nobs))
ys = rng.normal(size=(nvars, nobs))
# Create figure with appropriate space between subplots
fig = plt.figure(figsize=(10, 8))
fig.subplots_adjust(hspace=0.4, wspace=0.3)
# Plot data by looping through arrays of variables and list of colors
colors = plt.get_cmap('tab10').colors
for idx, x, y, color in zip(range(len(xs)), xs, ys, colors):
    ax = fig.add_subplot(3, 3, idx+1)
    ax.scatter(x, y, color=color)
This could be done in seaborn as well, but I would need to see what your dataset looks like to provide a solution relevant to your case.
You can find a more elaborate example of this approach in the second solution in this answer.
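For the exact 7 by 6 case from the question, the plt.subplots route mentioned earlier can be sketched as follows. The random points are hypothetical placeholders for the UMAP embeddings, and the titles only illustrate where the parameter labels would go:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; omit for interactive use
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(seed=1)

# One call creates the figure and a 7 x 6 array of equal-sized Axes
fig, axes = plt.subplots(7, 6, figsize=(12, 14), sharex=True, sharey=True)

for idx, ax in enumerate(axes.flat):
    # Placeholder data; in the question this would be one UMAP embedding
    points = rng.normal(size=(30, 2))
    ax.scatter(points[:, 0], points[:, 1], s=5)
    ax.set_title(f"plot {idx}", fontsize=8)

fig.tight_layout()
```

Each embedding is drawn on its own Axes from the array instead of opening a new figure per call, which is what keeps the plots in a grid.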

Seaborn showing x-tick labels overlapping

I am trying to make a box plot that looks like this.
Now, there are a lot of tickmarks that I do not need and truly do not show any additional information.
The code I am using is the following:
plot=sns.boxplot(y=MSE, x=Sim,
                 width=0.5,
                 palette='colorblind')
plot=sns.stripplot(y=MSE, x=Sim,
                   jitter=True,
                   marker='o',
                   alpha=0.15,
                   color='black')
plt.xlabel('xlabel')
plt.ylabel('ylabel')
plt.gca().invert_xaxis()
Here MSE and Sim are two numpy arrays of 400 elements each.
I reviewed some solutions that use locator_params and set_xticklabels. However, I want to know:
why this happens, and
is there a simple transformation of the MSE and Sim arrays to solve this?
I hope my questions are clear enough.
Thanks in advance.
I'm not sure what you have as Sim. If it is an array of floats, the values are converted to categorical before plotting. Since the labels are not useful, you can instead use a range of values that is as long as the y-values.
Even then, the labels still overlap a lot, because you are trying to fit 400 x-ticks onto the x-axis and the default font size is chosen to be readable. For example:
import pandas as pd
import numpy as np
import seaborn as sns
from matplotlib import pyplot as plt
fig,ax = plt.subplots(figsize=(15,6))
MSE = [np.random.normal(0,1,10) for i in range(100)]
Sim = np.arange(len(MSE))
g = sns.boxplot(y=MSE, x=Sim, width=0.5,palette='colorblind',ax=ax)
You can set the font size to be smaller so that the labels don't overlap, but I guess they would be hardly readable.
Since, as you said, the labels are not useful in your case, you can do:
ax.set(xticks=Sim[0::10])
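The same tick-thinning idea can be tried without seaborn. The sketch below swaps in matplotlib's own `Axes.boxplot` (not the seaborn call used above) with made-up samples, so treat it as an illustration of the principle only:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; omit for interactive use
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical data: 100 groups of 10 values each
samples = [rng.normal(0, 1, 10) for _ in range(100)]
positions = np.arange(len(samples))

fig, ax = plt.subplots(figsize=(15, 6))
# manage_ticks=False stops boxplot from placing one labelled tick per box
ax.boxplot(samples, positions=positions, widths=0.5, manage_ticks=False)

# Keep only every tenth position as a tick
ax.set_xticks(positions[::10])
shown = ax.get_xticks()
```

With one tick per ten boxes the labels have room to render at the default font size.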

Multiple functions in one graph

So, I have a numpy.ndarray called CT with shape (10, 500).
Each row is a function and defined over the x-variables called Gm. Gm is a numpy.ndarray with shape (1,500).
I need to graph the 10 functions in the CT matrix (as a function of Gm) in one graph and try the following:
# consumption functions over time
plt.figure(figsize=(10,10))
TimeSteps = CT.shape[0]
for t in range(0,TimeSteps):
    plt.plot(Gm,CT[t].reshape(1,DiscG),'go',label='t')
plt.show()
This works, but all graphs are shown with the same color (green) and it is not possible to distinguish if the graph is t = 0, 1, 2, etc.
Any idea how to get plt to choose a different color for each graph, and how to label the graphs and show the labels in a text box?
It is common courtesy when asking a question to provide a minimal and verifiable example. The issues you describe are actually examples of the code working as intended, just not as you want it to. Here is an example of scatter dots with different colors and different labels, as posed in your question and answered by me and @DavidG.
import matplotlib.pyplot as plt
import numpy as np
# dummy data
x = np.random.rand(10, 100)
fig, ax = plt.subplots()
for idx, xi in enumerate(x):
    ax.plot(xi, marker='o', label=idx)
ax.legend()
fig.show()
The colors here come from matplotlib's default color cycle. If you want to use specific colors or change the default cycle, please look at the documentation provided by matplotlib.
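If specific colors are wanted, one option is to sample them from a colormap and pass them explicitly. A minimal sketch, assuming made-up `Gm` and `CT` arrays with the shapes given in the question and an arbitrary choice of the "viridis" colormap:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; omit for interactive use
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical data with the question's shapes: Gm is (1, 500), CT is (10, 500)
Gm = np.linspace(0.0, 1.0, 500).reshape(1, 500)
CT = np.cumsum(rng.random((10, 500)), axis=1)

fig, ax = plt.subplots(figsize=(10, 10))
# One color per time step, evenly spaced along the colormap
colors = plt.get_cmap("viridis")(np.linspace(0, 1, CT.shape[0]))

for t, (row, color) in enumerate(zip(CT, colors)):
    ax.plot(Gm.ravel(), row, "o", markersize=2, color=color, label=f"t = {t}")

legend = ax.legend(title="time step")
labels = [text.get_text() for text in legend.get_texts()]
```

Using an f-string for `label` is what makes each legend entry show its own time step instead of the literal string 't'.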
OK - found another simpler way ... simply to transpose the input:
plt.figure(figsize=(10,10))
plt.plot(Gm.transpose(),CT.transpose(),marker='o')
plt.show()
That way each function gets its own color, and the problem seems resolved. So my initial approach of running a for loop was too complicated.
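The transposed call leaves the labelling part open. A small sketch, again with made-up `Gm` and `CT` arrays of the stated shapes, that reuses the list of lines returned by `plot` to build a legend:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; omit for interactive use
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical data: ten straight lines with different slopes
Gm = np.linspace(0.0, 1.0, 500).reshape(1, 500)
CT = np.arange(1, 11).reshape(10, 1) * Gm

fig, ax = plt.subplots(figsize=(10, 10))
# Each column of CT.T becomes its own line with the next color in the cycle
lines = ax.plot(Gm.T, CT.T, marker="o", markersize=2)

# Attach one label per returned line
ax.legend(lines, [f"t = {t}" for t in range(CT.shape[0])])
line_colors = [line.get_color() for line in lines]
```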

How do you generate an animated square wave from binary number for the respective decimal number using for loop

I am using the following code to generate the square wave format (e.g. from 0 to 5) using a for loop. I am able to print the respective binary values, but I am not able to plot the square wave dynamically. In addition, I am not able to dynamically resize the x-axis in the plot window. I could not find any suitable code in the Matplotlib animation section. Could anyone help me with this?
import numpy as np
import matplotlib.pyplot as plt
limit=int(input('Enter the value of limit:'))
for x in range(0,limit):
    y=bin(x)[2:]
    data = [int(i) for i in y]
    print(data)
    xs = np.repeat(range(len(data)),2)
    ys = np.repeat(data, 2)
    xs = xs[1:]
    ys = ys[:-1]
    plt.plot(xs, ys)
    plt.xlim(0,len(data)+0.5)
    plt.ylim(-2, 2)
    plt.grid()
    plt.show()
    #plt.hold(True)
    #plt.pause(0.5)
    plt.clf()
Your question as stated is pretty vague, so I'm going to go out on a limb and assume that what you want is to plot a series of equal-length binary codes in the same figure with some delay in between.
So, two problems here:
Generating the appropriate binary codes
Plotting those codes successively
1. Generating the appropriate binary codes
From what I can reasonably guess, you want to plot binary codes of the same length, so you'll have to zero-pad your codes. One way to do this is with Python's built-in str.zfill method, applied after stripping the '0b' prefix that bin() adds.
e.g.
bin(1)[2:].zfill(4)
This also brings to light the fact that you will have to know the length of the largest binary string you want to plot if you want to keep the x-axis range constant. Since it's not clear whether you even want constant-length strings, I'll just leave it at this.
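As a small illustration of the padding step, the sketch below derives a common width from the largest value and pads every code to it. The helper name `padded_bits` and the fixed `limit` are hypothetical, and `int.bit_length` is used here as one possible way to get the width:

```python
def padded_bits(value, width):
    """Return the binary digits of value as a list of ints, zero-padded to width."""
    # bin() returns a string like '0b101'; drop the '0b' prefix before padding
    return [int(ch) for ch in bin(value)[2:].zfill(width)]

limit = 6  # hypothetical limit: codes for 0 .. 5
# Number of bits needed for the largest value plotted (at least 1)
width = max(1, (limit - 1).bit_length())

codes = [padded_bits(x, width) for x in range(limit)]
for x, code in zip(range(limit), codes):
    print(x, code)
```

`format(x, '03b')` produces the same padded string for a width of 3, if a format specification is preferred over `zfill`.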
2. Plotting those codes successively
There are a couple of different ways to create animations in matplotlib. I find manually updating data a little more flexible and less buggy than the animations API currently is, so I will be doing that here. I've also cut down some parts of the code that were not clear to me.
Here's a simple implementation:
import matplotlib.pyplot as plt
import numpy as np
# Enable interactive plotting
plt.ion()
# Create a figure and axis for plotting on
fig = plt.figure()
ax = fig.add_subplot(111)
# Add the line 'artist' to the plotting axis
# use 'steps' to plot binary codes
line = plt.Line2D((),(),drawstyle='steps-pre')
ax.add_line(line)
# Apply any static formatting to the axis
ax.grid()
ax.set_ylim(-2, 2)
# ...
try:
    limit = int(input('Enter the value of limit:'))
    codelength = int(np.ceil(np.log2(limit)))+1 # see note*
    ax.set_xlim(0,codelength)
    for x in range(0,limit):
        # create your fake data
        y = bin(x)[2:].zfill(codelength)
        data = [int(i) for i in y]
        print(data)
        xs = range(len(data))
        line.set_data(xs,data) # Update line data
        fig.canvas.draw() # Ask to redraw the figure
        # Do a *required* pause to allow figure to redraw
        plt.pause(2) # delay in seconds between frames
except KeyboardInterrupt: # allow user to escape with Ctrl+c
    pass
finally: # Always clean up!
    plt.close(fig)
    plt.ioff()
    del ax,fig
*Note: I padded the binary codes by an extra zero to get the plot to look right.
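The answer deliberately sidesteps the animation API. For comparison only, here is a hedged sketch of the same per-frame update driven by `matplotlib.animation.FuncAnimation`; the fixed `limit` replaces the user prompt and is an assumption, as is the use of `int.bit_length` for the code length:

```python
import matplotlib
matplotlib.use("Agg")  # only so the sketch runs without a display; remove to watch it
import matplotlib.pyplot as plt
from matplotlib.animation import FuncAnimation

limit = 6  # hypothetical fixed limit instead of reading user input
# Bits needed for the largest value, plus one extra leading zero as in the note above
codelength = max(1, (limit - 1).bit_length()) + 1

fig, ax = plt.subplots()
ax.grid()
ax.set_xlim(0, codelength)
ax.set_ylim(-2, 2)
# Empty step-style line that each frame will fill with new data
(line,) = ax.plot([], [], drawstyle="steps-pre")

def update(x):
    """Draw the zero-padded binary code of x as a step line."""
    bits = [int(ch) for ch in bin(x)[2:].zfill(codelength)]
    line.set_data(range(len(bits)), bits)
    return (line,)

# One frame per value, 2000 ms apart, without looping back to the start
anim = FuncAnimation(fig, update, frames=range(limit), interval=2000, repeat=False)
# With an interactive backend, plt.show() would start the animation here
```

Keeping a reference to `anim` matters, since the animation stops if the object is garbage collected.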
