How to plot a histogram in python? - python

I am plotting a histogram with this data.
dict_values([2.5039286220812003e-18, 8.701119009863531e-17, 9.181036322384948e-17, 8.972473923736572e-17, 9.160265320730097e-17, 8.826609291023463e-17, 8.888913336226638e-17, 8.993242948900264e-17, 9.556623462346049e-17, 8.847279448923369e-17, 8.86804710730486e-17, 8.806035948033239e-17])
This is my code:
print(len(new_dictonary.values()))
plt.figure(figsize=(15, 5))
plt.hist(new_dictonary.values())
plt.show()
I expected 12 bars, but I get only two. I have to use plt.hist.
How can I correct my code to get the right picture?

Edited answer: The problem is that your values are very small in magnitude and 11 out of 12 are very close to each other and the remaining one is far away. So to have each value plotted individually as a separate bar, you need a large number of bins. Now if you limit your x-axis to show the 11 similar values out of 12, you will see that having bins=1000 (a large number) shows 11 bars.
plt.hist(list(new_dictonary.values()), bins=1000, edgecolor='k')
plt.xlim(0.8e-16, 1e-16)
If you show them all, you will see how far the remaining value lies from the rest. I don't know how you plan to fit a distribution to such data.
plt.hist(list(new_dictonary.values()), bins=1000, edgecolor='k')
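The binning argument can be checked without plotting. The sketch below counts the non-empty bins with np.histogram, which as far as I know is what plt.hist relies on to compute the bars. The values list is copied from the question; the names bars_default and bars_fine are only illustrative.

```python
import numpy as np

# The 12 values from the question.
values = [2.5039286220812003e-18, 8.701119009863531e-17, 9.181036322384948e-17,
          8.972473923736572e-17, 9.160265320730097e-17, 8.826609291023463e-17,
          8.888913336226638e-17, 8.993242948900264e-17, 9.556623462346049e-17,
          8.847279448923369e-17, 8.86804710730486e-17, 8.806035948033239e-17]

# Default binning: 10 equal-width bins between the minimum and the maximum.
counts_default, _ = np.histogram(values)
# Fine binning, as suggested in the answer.
counts_fine, _ = np.histogram(values, bins=1000)

# Every non-empty bin becomes one visible bar.
bars_default = np.count_nonzero(counts_default)
bars_fine = np.count_nonzero(counts_fine)
print(bars_default, bars_fine)
```

With the default bins, the single small value sits alone in the first bin and the other 11 share the last one, which is why only two bars appear. With 1000 bins each value gets its own bin.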

Related

Random (false data) lines appearing in contourf plot at certain # of levels

I'm trying to use matplotlib and contourf to generate some filled (polar) contour plots of velocity data. I have some data (MeanVel_Z_Run16_np) I am plotting on theta (Th_Run16) and r (R_Run16), as shown here:
fig,ax = plt.subplots(subplot_kw={'projection':'polar'})
levels = np.linspace(-2.5,4,15)
cplot = ax.contourf(Th_Run16,R_Run16,MeanVel_Z_Run16_np,levels,cmap='plasma')
ax.set_rmax(80)
ax.set_rticks([15,30,45,60])
rlabels = ax.get_ymajorticklabels()
for label in rlabels:
    label.set_color('#E6E6FA')
cbar = plt.colorbar(cplot,pad=0.1,ticks=[0,3,6,9,12,15])
cbar.set_label(r'$V_{Z}$ [m/s]')
plt.show()
This generates the following plot:
Velocity plot with 15 levels:
Which looks great (and accurate), outside of that random straight orange line roughly between 90deg and 180deg. I know that this is not real data because I plotted this in MATLAB and it did not appear there. Furthermore, I have realized it appears to relate to the number of contour levels I use. For example, if I bump this code up to 30 levels instead of 15, the result changes significantly, with odd triangular regions of uniform value:
Velocity plot with 30 levels:
Does anyone know what might be going on here? How can I get contourf to just plot my data without these strange misrepresentations? I would like to use 15 contour levels at least. Thank you.

Why is the legend shown in a Seaborn JointGrid incorrect?

I am experimenting with JointGrid from Seaborn. I used .plot_joint() to plot my scatter plot, group-colored using the hue parameter. I have filtered my dataset to only include 2 of the 5 groups, to prevent too much overlap in the plots.
The plotted points appear correct, in that they match what I expect from the two groups I chose. Additionally, I double-checked my filtering by viewing the filtered dataframe. That too was correct as it contained only the two groups I chose.
However the legend that is automatically plotted along with the scatterplot is incorrect. It shows 4 groups (not sure why not 5), and the coloring is also incorrect. For 2 groups I would expect only the Red and Blue colors (the first 2 colors in the Set1 palette), but my 2nd group is colored with the 4th color in the Set1 palette.
plt.rcParams['figure.figsize'] = (12, 4)
df_tmp = df[df.Kmeans_Clusters.isin([0, 3])].copy()
# initialize Joint Grid
grid = sns.JointGrid(data=df_tmp, x='MP', y='PTS')
# plot scatter (main plot)
grid = grid.plot_joint(sns.scatterplot, data=df_tmp, hue='Kmeans_Clusters',
                       palette='Set1')
# plot marginal distplot for cluster 0, X & Y
sns.distplot(df_tmp[df_tmp.Kmeans_Clusters == 0].MP, ax=grid.ax_marg_x,
             vertical=False, color='firebrick', label='Cluster0')
sns.distplot(df_tmp[df_tmp.Kmeans_Clusters == 0].PTS, ax=grid.ax_marg_y,
             vertical=True, color='firebrick', label='Cluster0')
# plot marginal distplot for cluster 3, X & Y
sns.distplot(df_tmp[df_tmp.Kmeans_Clusters == 3].MP, ax=grid.ax_marg_x,
             vertical=False, color='steelblue', label='Cluster3')
sns.distplot(df_tmp[df_tmp.Kmeans_Clusters == 3].PTS, ax=grid.ax_marg_y,
             vertical=True, color='steelblue', label='Cluster3')
plt.suptitle('PTS vs MP, Cluster 0 & 3\n1982-2019', y=1.05, fontsize=20)
plt.show()
[Image: JointGrid plot with incorrect legend and coloring]
--- Update ---
I just tried this with a simple scatterplot (no JointGrid) and I can reproduce my previous observation. Is there something I am not understanding about the hue parameter and the scatterplot() function?
I do not see this issue with lmplot().
plt.rcParams['figure.figsize'] = (12, 4)
df_tmp = df[df.Kmeans_Clusters.isin([0, 3])].copy()
sns.scatterplot(data=df_tmp, y='PTS', x='MP', hue='Kmeans_Clusters', palette='Set1')
plt.title('PTS vs MP\n1982-2019')
plt.xlabel('Minutes Played Annually')
plt.ylabel('Points Scored Annually')
plt.show()
Once I fine-tuned my searches, I was able to find the solution. In fact, another Stack Overflow question asks the same thing and is answered in detail: The `hue` parameter in Seaborn.relplot() skips an integer when given numerical data?.
Pasting the solution I used, as described in the link above:
"""
An alternative is to make sure the values are treated as categorical. Unfortunately, even if you plug in the numbers as strings, they will be converted to numbers, falling back to the same mechanism described above. This may be seen as a bug.
However, one choice you have is to use real categories, like e.g. single letters.
'cluster':list("ABCDE")
works fine.
"""
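A minimal sketch of that workaround, assuming a small made-up dataframe in place of the real df_tmp. The Cluster_Label column and the plot_clusters helper are hypothetical names introduced here for illustration; only the idea of using letter labels for hue comes from the quoted answer.

```python
import pandas as pd

# Hypothetical stand-in for the filtered dataframe from the question.
df_tmp = pd.DataFrame({
    "MP": [1200, 2500, 800, 3000],
    "PTS": [400, 1500, 250, 2100],
    "Kmeans_Clusters": [0, 3, 0, 3],
})

# Map each integer cluster id to a single-letter label (0 -> "A", ..., 4 -> "E").
label_map = dict(enumerate("ABCDE"))
df_tmp["Cluster_Label"] = df_tmp["Kmeans_Clusters"].map(label_map)


def plot_clusters(frame):
    """Scatter plot coloured by the letter labels instead of the integers."""
    # Imported here so the data preparation above runs even without seaborn.
    import seaborn as sns
    return sns.scatterplot(data=frame, x="MP", y="PTS",
                           hue="Cluster_Label", palette="Set1")
```

Calling plot_clusters(df_tmp) should then produce a legend with one entry per letter that is actually present in the data.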

python bar chart not centered

I am trying to build a simple histogram. For some reason, my bars are behaving abnormally. As you can see in this picture, my bar over "3" is moved to the right side. I am not sure what caused it. I did align='mid' but it did not fix it.
This is the code that I used to create it:
def createBarChart(colName):
    df[colName].hist(align='mid')
    plt.title(str(colName))
    RUNS = [1,2,3,4,5]
    plt.xticks(RUNS)
    plt.show()
for column in colName:
    createBarChart(column)
And this is what I got:
bar is not centered over 3
To recreate my data:
df = pd.DataFrame(np.random.randint(1,6,size=(100, 4)), columns=list('ABCD'))
Thank you for your help!
P/s: idk if this info is relevant, but I am using seaborn-whitegrid style. I tried to recreate a plot with sample data and it's still showing up. Is it a bug?
hist created using random data
The hist function is behaving exactly as it is supposed to. By default it splits the data you pass into 10 bins, with the left edge of the first bin at the data's minimum value and the right edge of the last bin at its maximum. The chart below shows the randomly generated data binned this way, with red dashed lines to mark the edges of the bins.
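To see where the default bins fall, here is a small sketch using np.histogram on a made-up sample that contains only the integers 1 to 5 (the data array is an assumption, not the asker's data). It prints the default bin edges and the bins that actually receive values.

```python
import numpy as np

# Hypothetical sample containing only the integers 1 to 5.
data = np.array([1, 2, 2, 3, 3, 3, 4, 4, 5])

# Default binning: 10 equal-width bins from the minimum (1) to the maximum (5).
counts, edges = np.histogram(data)
print(np.round(edges, 1))  # edges are spaced 0.4 apart

# Indices of the bins that contain at least one value.
filled = np.flatnonzero(counts).tolist()
for i in filled:
    print(f"bin [{edges[i]:.1f}, {edges[i + 1]:.1f}] holds {counts[i]} value(s)")
```

The value 3 lands in the bin that starts at 3.0 and ends at 3.4, so its bar is drawn to the right of the tick at 3, while the bars for 2 and 4 happen to straddle their ticks.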
The way around this is to define the bin edges yourself, with a slight adjustment to the minimum and maximum values to centre the bars over the x axis ticks. This can be done quite easily with numpy's linspace function (using column A in the randomly generated data frame as an example):
bins = np.linspace(df["A"].min() - .5, df["A"].max() + .5, 6)
df["A"].hist(bins=bins)
We ask for 6 values because we are defining the bin edges; this results in 5 bins, as shown in this chart:
If you wanted to keep the gaps between the bars you can increase the number of bins to 9 and adjust the offset slightly, but this wouldn't work in all cases (it works here because every value is either 1, 2, 3, 4 or 5).
bins = np.linspace(df["A"].min() - .25, df["A"].max() + .25, 10)
df["A"].hist(bins=bins)
Finally, as this data contains discrete values and really you are plotting the counts, you could use the value_counts function to create a series that can then be plotted as a bar chart:
df["A"].value_counts().sort_index().plot(kind="bar")
# Provide a 'color' argument if you need all of the bars to look the same.
df["A"].value_counts().sort_index().plot(kind="bar", color="steelblue")
Try something like this to centre every histogram bar over its tick:
plt.hist(data, bins=range(1, 7), align='left', rwidth=1, density=True)
Replace data with your own data.

Matplotlib markers which plot and render fast

I'm using matplotlib to plot 5 sets of approx. 400,000 data points each. Although each set of points is plotted in a different color, I need different markers for people reading the graph on black and white print-outs. The issue I'm facing is that almost all of the possible markers available in the documentation at http://matplotlib.org/api/markers_api.html take too much time to plot and render while displaying. I could only find two markers which plot and render quickly, these are '-' and '--'. Here's my code:
plt.plot(series1,'--',label='Label 1',lw=5)
plt.plot(series2,'-',label='Label 2',lw=5)
plt.plot(series3,'^',label='Label 3',lw=5)
plt.plot(series4,'*',label='Label 4',lw=5)
plt.plot(series5,'_',label='Label 5',lw=5)
I tried multiple markers. Series 1 and series 2 plot quickly and render in no time. But series 3, 4, and 5 take forever to plot and AGES to display.
I'm not able to figure out the reason behind this. Does someone know of more markers that plot and render quickly?
The first two ('--' and '-') are linestyles, not markers. That's why they render faster.
It doesn't make sense to plot ~400,000 markers. You won't be able to see all of them. What you could do instead is plot only a subset of the points.
So add the line with all your data (you could probably subsample that too) and then add a second "line" with only the markers.
For that you need an "x" vector, which you can subsample too:
# define the number of markers you want
nrmarkers = 100
# define a x-vector
x = np.arange(len(series3))
# calculate the subsampling step size
subsample = max(1, len(series3) // nrmarkers)
# plot the line
plt.plot(x, series3, color='g', label='Label 3', lw=5)
# plot the markers (using every `subsample`-th data point)
plt.plot(x[::subsample], series3[::subsample], color='g',
         lw=5, linestyle='', marker='*')
# similar procedure for series4 and series5
Note: The code is written from scratch and not tested
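A different route to the same result, not the one used in the answer above, is matplotlib's markevery line property, which draws the full line but places a marker only on every n-th point. The sketch below assumes a randomly generated series3 as a stand-in for the real data.

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical stand-in for one of the 400,000-point series.
series3 = np.cumsum(np.random.default_rng(0).normal(size=400_000))

nrmarkers = 100                            # roughly how many markers to draw
step = max(1, len(series3) // nrmarkers)   # put a marker on every `step`-th point

fig, ax = plt.subplots()
# One call draws the whole line; markers appear only at every `step`-th point.
(line,) = ax.plot(series3, "-", marker="*", markevery=step, label="Label 3")
ax.legend()
# plt.show()  # uncomment to display the figure
```

This keeps the line and its markers in a single artist, so the legend entry shows both the line and the marker.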

How to set y-axis for histogram in Python?

According to the documentation, one can set the range of the x-axis using the hist function, but there doesn't seem to be a way to control the y-axis.
I have a figure with 4 subplots, arranged in a 2x2 grid, all of which are histograms. I have made their x-axes identical by setting the range, but I have been unable to figure out how to do the same for the y-axis. When I try to control the y-axis using set_ylim, I get an error. When I tried using pylab.axis, the plots didn't turn out correctly (the bars of the histogram all had a y-value of 0).
pylab.hist(myData[x], bins = 20, range=(0,400))
pylab.axis([0,400,0,300])
How do I control the y-axis of the histogram? Essentially what I'm looking for is something like range in the hist function, but for the y-axis.
Update:
plotNumber = 1
for i in xrange(4):
    pylab.subplot(2, 2, plotNumber)
    pylab.hist(myData[i], bins = 20, range=(0,400))
    pylab.title('Some Title')
    pylab.xlabel('X')
    pylab.ylabel('Y')
    plotNumber += 1
pylab.show()
But when I include
pylab.axis([0,400,0,300])
All the y-values correspond to 0 (the histogram is flat).
Answer is given here: setting y-axis limit in matplotlib
axes = plt.gca()
axes.set_xlim([xmin,xmax])
axes.set_ylim([ymin,ymax])
For me this works for histogram subplots.
If you're looking to set ticks on the y-axis every n values, you can use:
pylab.yticks(range(ymin, ymax, n))
I am using Python 2.7.
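For the 2x2 case from the question, the same idea can be applied to each subplot's own axes object. This is a sketch written for Python 3, with randomly generated myData standing in for the real data; the limits 0 to 400 and 0 to 300 are taken from the question's pylab.axis call.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
# Hypothetical data: four samples with values between 0 and 400.
myData = [rng.uniform(0, 400, size=500) for _ in range(4)]

fig, axes = plt.subplots(2, 2)
for ax, sample in zip(axes.flat, myData):
    ax.hist(sample, bins=20, range=(0, 400))
    ax.set_xlim(0, 400)
    ax.set_ylim(0, 300)  # same y-range on every subplot
    ax.set_title('Some Title')
    ax.set_xlabel('X')
    ax.set_ylabel('Y')
# plt.show()  # uncomment to display the figure
```

Setting the limits after the call to hist matters, since plotting rescales the axes. Passing sharey=True to plt.subplots is another option if the subplots only need a common scale rather than a specific range.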
