This question already has answers here:
pandas bar plot combined with line plot shows the time axis beginning at 1970
(2 answers)
How can I make a barplot and a lineplot in the same seaborn plot with different Y axes nicely?
(2 answers)
Line plot over bar plot using Seaborn - Line plot won't render
(1 answer)
Problem in combining bar plot and line plot (python)
(2 answers)
Closed 1 year ago.
I'm trying to combine a seaborn barplot with a seaborn lineplot. For some reason, I am able to do both separately, but when I combine the two, the x-axis is all over the place.
Figure 1 shows the bar plot, Figure 2 shows the line plot (both working fine) and Figure 3 is my attempt at combining both. I've read somewhere that seaborn uses categorical x-axis values, so my feeling is that this is part of the answer. Nevertheless, I can't seem to get it right.
Worth mentioning: the goal of this whole exercise is to get a moving-average line that follows the barplot, so any insights or workarounds to achieve that are also welcome.
This is my code:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from scipy.interpolate import make_interp_spline

dfGroup = pd.DataFrame({
    'Year': [1910, 1911, 1912, 1913, 1914, 1915, 1916, 1917, 1918, 1919, 1920],
    'Total Deaths': [0, 0, 2, 3, 2, 3, 4, 5, 6, 7, 8],
    'Total Affected': [0, 1, 0, 2, 3, 6, 9, 8, 12, 13, 15]
})
# Add 3-year rolling average
dfGroup['rolling_3years'] = dfGroup['Total Deaths'].rolling(3).mean()
dfGroup = dfGroup.fillna(0)
# Make a smooth line from the 3-year rolling average
X_Y_Spline = make_interp_spline(dfGroup['Year'], dfGroup['rolling_3years'])
# Evaluate the spline at 500 evenly spaced points between the first and last year
X_ = np.linspace(dfGroup['Year'].min(), dfGroup['Year'].max(), 500)
Y_ = X_Y_Spline(X_)
# Plot the data
a4_dims = (15, 10)
fig, ax1 = plt.subplots(figsize=a4_dims)
ax1 = sns.barplot(x="Year", y="Total Deaths", data=dfGroup, color='#42b7bd')
ax2 = ax1.twinx()
# Recent seaborn versions require x and y as keyword arguments
ax2 = sns.lineplot(x=X_, y=Y_, marker='o', ax=ax2)
This is what my dfGroup dataframe looks like: [image of the DataFrame]
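A minimal sketch of one way to get the moving-average line to follow the bars, assuming one row per consecutive year as in the sample dfGroup. sns.barplot treats Year as categorical, so it draws the bars at x = 0, 1, ..., 10 and only labels them 1910 to 1920. The line in the code above is drawn at x = 1910 to 1920 on a twin Axes that shares the same x-axis, which stretches the axis. Plotting the rolling average against the bar positions, on the same Axes (both series use the same y units), keeps the two aligned.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

dfGroup = pd.DataFrame({
    'Year': [1910, 1911, 1912, 1913, 1914, 1915, 1916, 1917, 1918, 1919, 1920],
    'Total Deaths': [0, 0, 2, 3, 2, 3, 4, 5, 6, 7, 8],
    'Total Affected': [0, 1, 0, 2, 3, 6, 9, 8, 12, 13, 15],
})
# 3-year rolling average; the first two years stay NaN and are simply not drawn
dfGroup['rolling_3years'] = dfGroup['Total Deaths'].rolling(3).mean()

fig, ax = plt.subplots(figsize=(15, 10))
# barplot puts one bar per category at x = 0, 1, ..., n-1 and labels it with the year
sns.barplot(x="Year", y="Total Deaths", data=dfGroup, color='#42b7bd', ax=ax)

# Draw the line on the same Axes, at the bar positions rather than at the raw years
positions = np.arange(len(dfGroup))
ax.plot(positions, dfGroup['rolling_3years'], marker='o', color='black')
```

To keep the smoothed spline, the same idea should apply: fit make_interp_spline against these positions instead of dfGroup['Year'] and evaluate it over np.linspace(0, len(dfGroup) - 1, 500).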
This question already has answers here:
Specify format of floats for tick labels
(5 answers)
How to print a number using commas as thousands separators
(30 answers)
Closed last month.
How can I use commas as thousands separators for numbers with five or more digits in the plot? For example, "10000" should be displayed as "10,000".
Here is the Python code:
import matplotlib.pyplot as plt
E = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
m = [383, 2428, 6172, 10895, 17148, 23316, 30829, 36641, 44228, 54342]
n = [216, 881, 2040, 3811, 6101, 8363, 12158, 15833, 19538, 23956]
plt.plot(E, m, 'g', label = "proposed", linestyle="--")
plt.plot(E, n, 'r', label = "baseline", linestyle=":")
plt.xlabel('X')
plt.ylabel('Y')
plt.legend()
plt.show()
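A sketch using matplotlib.ticker.FuncFormatter, assuming "five or more digits" means absolute values of 10,000 and above; thousands is a hypothetical helper name. Only the y values reach five digits here, so only the y-axis formatter is replaced. If commas on every value are acceptable, ax.yaxis.set_major_formatter(StrMethodFormatter('{x:,.0f}')) is a shorter alternative.

```python
import matplotlib.pyplot as plt
from matplotlib.ticker import FuncFormatter

def thousands(x, pos):
    """Hypothetical tick formatter: commas for 5+ digit values, plain integers otherwise."""
    if abs(x) >= 10_000:
        return f"{x:,.0f}"
    return f"{x:.0f}"

E = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
m = [383, 2428, 6172, 10895, 17148, 23316, 30829, 36641, 44228, 54342]
n = [216, 881, 2040, 3811, 6101, 8363, 12158, 15833, 19538, 23956]

fig, ax = plt.subplots()
ax.plot(E, m, 'g', label="proposed", linestyle="--")
ax.plot(E, n, 'r', label="baseline", linestyle=":")
ax.set_xlabel('X')
ax.set_ylabel('Y')
ax.legend()
# FuncFormatter calls thousands(value, position) for every y tick label
ax.yaxis.set_major_formatter(FuncFormatter(thousands))
```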
I would like to plot data in subplots using matplotlib.pyplot in Python. Each subplot contains data with a different range. I would like to plot them with pyplot.scatter and use one single colorbar for the entire figure, so the colorbar should span the full range of values across every subplot. However, when I plot the subplots in a loop and call colorbar outside of the loop, it only uses the range of values from the last subplot. Many of the available examples concern the size and position of the colorbar, so the answer to this question (how to make one universal colorbar for multiple subplots) is not obvious.
I have the following self-contained example code. Here, two subplots are rendered, one that should be colored with frigid temperatures typical of Russia and the other with tropical temperatures of Brazil. However, the end result shows a colorbar that only spans the tropical Brazilian temperatures, making the Russia subplot wrong:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
core_list = ['Russia', 'Brazil']
core_depth = [0, 2, 4, 6, 8, 10]
lo = [-33, 28]
hi = [10, 38]
df = pd.DataFrame([], columns = ['Location', 'Depth', '%TOC', 'Temperature'])
#Fill df
for ii, name in enumerate(core_list):
    for jj in core_depth:
        df.loc[len(df.index)] = [name, jj, (np.random.randint(1, 20))/10, np.random.randint(lo[ii], hi[ii])]
#Russia data have much colder temperatures than Brazil data due to hi and lo
#Plot data from each location using scatter plots
fig, axs = plt.subplots(nrows = 1, ncols = 2, sharey = True)
for nn, name in enumerate(core_list):
    core_mask = df['Location'] == name
    data = df.loc[core_mask]
    plt.sca(axs[nn])
    plt.scatter(data['Depth'], data['%TOC'], c = data['Temperature'], s = 50, edgecolors = 'k')
    axs[nn].set_xlabel('%TOC')
    plt.text(1.25*min(data['%TOC']), 1.75, name)
    if nn == 0:
        axs[nn].set_ylabel('Depth')
cbar = plt.colorbar()
cbar.ax.set_ylabel('Temperature, degrees C')
#How did Russia get so warm?!? Temperatures and ranges of colorbar are set to last called location.
#How do I make one colorbar encompass global temperature range of both data sets?
The output of this code shows that the temperatures in Brazil and Russia fall within the same range of colors:
We know intuitively, and from glancing at the data, that this is wrong. So, how do we tell pyplot to plot this correctly?
The answer is straightforward: use the vmin and vmax arguments of pyplot.scatter. They must be set from the range of all the data, not just the data plotted in the current iteration of the loop. Thus, to change the code above:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
core_list = ['Russia', 'Brazil']
core_depth = [0, 2, 4, 6, 8, 10]
lo = [-33, 28]
hi = [10, 38]
df = pd.DataFrame([], columns = ['Location', 'Depth', '%TOC', 'Temperature'])
#Fill df
for ii, name in enumerate(core_list):
    for jj in core_depth:
        df.loc[len(df.index)] = [
            name,
            jj,
            (np.random.randint(1, 20))/10,
            np.random.randint(lo[ii], hi[ii])
        ]
#Russia data have much colder temperatures than Brazil data due to hi and lo
#Plot data from each location using scatter plots
fig, axs = plt.subplots(nrows = 1, ncols = 2, sharey = True)
for nn, name in enumerate(core_list):
    core_mask = df['Location'] == name
    data = df.loc[core_mask]
    plt.sca(axs[nn])
    plt.scatter(
        data['Depth'],
        data['%TOC'],
        c=data['Temperature'],
        s=50,
        edgecolors='k',
        vmax=max(df['Temperature']),
        vmin=min(df['Temperature'])
    )
    axs[nn].set_xlabel('%TOC')
    plt.text(1.25*min(data['%TOC']), 1.75, name)
    if nn == 0:
        axs[nn].set_ylabel('Depth')
cbar = plt.colorbar()
cbar.ax.set_ylabel('Temperature, degrees C')
Now the output shows a temperature difference between Russia and Brazil, as one would expect after a cursory glance at the data. The change that fixes the problem sits inside the for loop, but it references all of the data (df rather than data) to find the max and min:
plt.scatter(data['Depth'], data['%TOC'], c = data['Temperature'], s = 50, edgecolors = 'k', vmax = max(df['Temperature']), vmin = min(df['Temperature']) )
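A related sketch, using random stand-in data in the same ranges as lo and hi rather than the DataFrame above: build one matplotlib.colors.Normalize from the global range and pass that same object as norm= to every scatter call. Passing ax=list(axs) to fig.colorbar lets the colorbar take space from both subplots instead of squeezing only the last one. norm= replaces vmin/vmax here; the two are not meant to be combined in one call.

```python
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.colors import Normalize

rng = np.random.default_rng(0)
depth = np.array([0, 2, 4, 6, 8, 10])
# Hypothetical stand-in data with the question's temperature ranges
temps = {'Russia': rng.integers(-33, 10, size=6), 'Brazil': rng.integers(28, 38, size=6)}
toc = {name: rng.integers(1, 20, size=6) / 10 for name in temps}

# One normalization built from the global temperature range...
all_temps = np.concatenate(list(temps.values()))
norm = Normalize(vmin=all_temps.min(), vmax=all_temps.max())

fig, axs = plt.subplots(nrows=1, ncols=2, sharey=True)
for ax, name in zip(axs, temps):
    # ...shared by every scatter call, so equal temperatures get equal colors
    sc = ax.scatter(depth, toc[name], c=temps[name], norm=norm, s=50, edgecolors='k')
    ax.set_title(name)

# The colorbar is attached to both subplots and shows the shared range
cbar = fig.colorbar(sc, ax=list(axs))
cbar.ax.set_ylabel('Temperature, degrees C')
```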
I want to make a violin plot of binned data and, at the same time, plot a model prediction to visualize how well the model describes the bulk of each individual distribution. My problem, I guess, is that after the violin plot the x-axis does not behave like a regular numeric axis, but more like string values that just happen to be numbers. Maybe not a good description, but in the example I would like to plot a function "normally", e.g. f(x) = 2*x**2, and have the violins in the background at x=1, x=5.2, x=18.3 and x=27.
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns
np.random.seed(10)
collectn_1 = np.random.normal(1, 2, 200)
collectn_2 = np.random.normal(802, 30, 200)
collectn_3 = np.random.normal(90, 20, 200)
collectn_4 = np.random.normal(70, 25, 200)
ys = [collectn_1, collectn_2, collectn_3, collectn_4]
xs = [1, 5.2, 18.3, 27]
sns.violinplot(x=xs, y=ys)
xx = np.arange(0, 30, 10)
plt.plot(xx, 2*xx**2)
plt.show()
Somehow this code does not plot violins but only bars; this only happens in this example, not in my original code. In my real code I want different "half-violins" on each side, so I use sns.violinplot(x="..", y="..", hue="..", data=.., split=True).
I think that would be hard to do with seaborn because it does not provide an easy way to manipulate the artists that it creates, particularly if there are other things plotted on the same Axes. Matplotlib's violinplot allows setting the position of the violins, but does not provide an option for plotting only half violins. Therefore, I would suggest using statsmodels.graphics.boxplots.violinplot, which does both.
import matplotlib.pyplot as plt
import matplotlib.ticker
import numpy as np
import seaborn as sns
from statsmodels.graphics.boxplots import violinplot

df = sns.load_dataset('tips')
x_col = 'day'
y_col = 'total_bill'
hue_col = 'smoker'
xs = [1, 5.2, 18.3, 27]
xx = np.arange(0, 30, 1)
yy = 0.1*xx**2
cs = ['C0', 'C1']
fig, ax = plt.subplots()
ax.plot(xx, yy)
# One pass per hue level: the first level is drawn as left halves, the second as right halves
for (_, gr0), side, c in zip(df.groupby(hue_col), ['left', 'right'], cs):
    # One array of y values per x category, placed at the numeric positions in xs
    data = [gr1 for (_, gr1) in gr0.groupby(x_col)[y_col]]
    violinplot(ax=ax, data=data, positions=xs, side=side, show_boxplot=False, plot_opts=dict(violin_fc=c))
# violinplot above messes up which ticks are shown; the line below restores a sensible tick locator
ax.xaxis.set_major_locator(matplotlib.ticker.MaxNLocator())
plt.show()
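For full (not split) violins, the Matplotlib route mentioned above can be sketched with the question's own data: ax.violinplot takes a positions argument, so each violin sits at a real x coordinate and a model curve can share the same numeric axis. widths=2 is an arbitrary choice to make the violins visible on a 0 to 30 axis.

```python
import numpy as np
import matplotlib.pyplot as plt

np.random.seed(10)
ys = [np.random.normal(1, 2, 200), np.random.normal(802, 30, 200),
      np.random.normal(90, 20, 200), np.random.normal(70, 25, 200)]
xs = [1, 5.2, 18.3, 27]

fig, ax = plt.subplots()
# Each violin is centred on its numeric position, so the x-axis stays numeric
parts = ax.violinplot(ys, positions=xs, widths=2, showextrema=False)

# The model prediction f(x) = 2*x**2 drawn on the same numeric x-axis
xx = np.linspace(0, 30, 200)
ax.plot(xx, 2 * xx**2, color='k')
```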
I would like to plot a chart with plotly that shows only the existing values in the x-axis.
When I execute the code below, a chart like the one in the following image appears:
The range on the x-axis as well as the range on the y-axis is evenly set from zero up to the maximal value.
import plotly.graph_objs as go
from plotly.offline import plot
xValues = [1, 2, 27, 50]
yValues = [7, 1, 2, 3]
trace = go.Scatter( x = xValues, y = yValues, mode='lines+markers', name='high limits' )
plottedData = [trace]
plot( plottedData )
Now, I would like to show only the existing values on the x-axis. In my example, I want just the values [1, 2, 27, 50] to appear, with equal spacing between them. Is this possible? If so, how?
You can force the xaxis.type to be category like this:
plot( dict(data=plottedData, layout=go.Layout(xaxis = {"type": "category"} )))
I am trying to generate a histogram using matplotlib. I am reading data from the following file:
https://github.com/meghnasubramani/Files/blob/master/class_id.txt
My intent is to generate a histogram with the following bins: 1, 2-5, 5-100, 100-200, 200-1000, >1000.
When I generate the graph it doesn't look nice.
I would like to normalize the y-axis to (frequency of occurrence in a bin / total items). I tried using the density parameter, but whenever I do, my graph ends up completely blank. How do I do this?
How do I get the widths of the bars to be the same, even though the bin ranges vary?
Is it also possible to specify the ticks on the histogram? I want to have the ticks correspond to the bin ranges.
import matplotlib.pyplot as plt
FILE_NAME = 'class_id.txt'
class_id = [int(line.rstrip('\n')) for line in open(FILE_NAME)]
num_bins = [1, 2, 5, 100, 200, 1000, max(class_id)]
x = plt.hist(class_id, bins=num_bins, histtype='bar', align='mid', rwidth=0.5, color='b')
print (x)
plt.legend()
plt.xlabel('Items')
plt.ylabel('Frequency')
As suggested by importanceofbeingernest, we can use a bar chart to plot categorical data; first we need to sort the values into bins, for example with pandas:
import matplotlib.pyplot as plt
import pandas
FILE_NAME = 'class_id.txt'
class_id_file = [int(line.rstrip('\n')) for line in open(FILE_NAME)]
num_bins = [0, 2, 5, 100, 200, 1000, max(class_id_file)]
categories = pandas.cut(class_id_file, num_bins)
df = pandas.DataFrame(class_id_file)
dfg = df.groupby(categories).count()
bins_labels = ["1-2", "2-5", "5-100", "100-200", "200-1000", ">1000"]
plt.bar(range(len(categories.categories)), dfg[0]/len(class_id_file), tick_label=bins_labels)
#plt.bar(range(len(categories.categories)), dfg[0]/len(class_id_file), tick_label=categories.categories)
plt.xlabel('Items')
plt.ylabel('Frequency')
Not what you asked for, but you could also stay with a histogram and use a logarithmic x scale to improve readability:
plt.xscale('log')
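A sketch of the normalization part, using random stand-in values because class_id.txt is not reproduced here. density=True divides each count by the total *and* by the bin width, so bins as wide as 200-1000 end up with tiny heights, which is likely why the normalized plot looked blank. Weighting every item by 1/N instead makes each bar height (items in bin) / (total items). Drawing the result with ax.bar at positions 0..5 keeps the bars the same width. np.histogram bins are half-open [a, b) except the last, so the tick labels below are written to match.

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical stand-in for the integers read from class_id.txt
rng = np.random.default_rng(0)
class_id = rng.integers(1, 3000, size=500)

bins = [1, 2, 5, 100, 200, 1000, class_id.max()]
# Each item contributes 1/N, so each bin holds a fraction of all items
weights = np.ones(len(class_id)) / len(class_id)
fractions, edges = np.histogram(class_id, bins=bins, weights=weights)

fig, ax = plt.subplots()
# Equal-width bars at 0..5, labelled with the (half-open) bin ranges
labels = ["1", "2-4", "5-99", "100-199", "200-999", ">=1000"]
ax.bar(range(len(fractions)), fractions, tick_label=labels)
ax.set_xlabel('Items')
ax.set_ylabel('Fraction of items')
```

The same weights argument also works with plt.hist if you prefer to keep the histogram (for example together with the logarithmic scale).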