1) Why are lines so dense?
In the dataset the time is recorded by hour. Would it make a difference if it were recorded by day? I would like to see a line chart for each host.
2) How can I re-label the legend from count to host?
fig, ax = plt.subplots(figsize=(15,7))
df.groupby('host').plot(x='time', y='count',ax=ax, legend=True)
You are plotting hourly data of more than 6 months. That's ~4k data points, of course it is dense. Daily data would be better, although it's still going to be dense.
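To make the point about daily data concrete, here is a minimal sketch of aggregating hourly rows into one row per host and day before plotting. The data frame is synthetic (the question's data is not shown), the column names time, host and count are taken from the question's code, and summing the counts is an assumption; a mean may suit your data better.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the question's data: 48 hourly rows for each of two hosts.
hours = pd.date_range("2021-01-01", periods=48, freq=pd.Timedelta(hours=1))
df = pd.DataFrame({
    "time": np.tile(hours.values, 2),
    "host": np.repeat(["a", "b"], len(hours)),
    "count": 1,
})

# Group by host and by calendar day, then sum the hourly counts.
# Summing is an assumption; swap in .mean() if an average is more meaningful.
daily = (
    df.groupby(["host", pd.Grouper(key="time", freq="D")])["count"]
      .sum()
      .reset_index()
)
print(daily)
```

The resulting daily frame has the same three columns, so it can be passed to either plotting approach in place of df.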
There are a couple of options:
You could either use seaborn
import seaborn as sns
fig, ax = plt.subplots(figsize=(15,7))
sns.lineplot(x='time', y='count', data=df, ax=ax, hue='host')
Or do a loop on groupby:
fig, ax = plt.subplots(figsize=(15,7))
for h, d in df.groupby('host'):
    d.plot(x='time', y='count', ax=ax, label=h)
As for 2), just pass label='name' as a parameter, e.g.:
fig, ax = plt.subplots(figsize=(15,7))
df.groupby('host').plot(x='time', y='count',ax=ax, legend=True, label='host')
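If "a line chart for each host" is meant literally, another option is one subplot per host rather than overlaid lines. The following is only a sketch on made-up data; the column names come from the question, everything else (the three hosts, the values) is hypothetical.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical data with the same columns as in the question.
df = pd.DataFrame({
    "time": pd.to_datetime(["2021-01-01", "2021-01-02"] * 3),
    "host": ["a", "a", "b", "b", "c", "c"],
    "count": [3, 5, 2, 4, 7, 1],
})

groups = list(df.groupby("host"))

# One row of axes per host; squeeze=False keeps `axes` 2D even for a single host.
fig, axes = plt.subplots(nrows=len(groups), sharex=True, squeeze=False, figsize=(15, 7))
for ax, (host, d) in zip(axes.ravel(), groups):
    # label=host makes the legend show the host name instead of "count".
    d.plot(x="time", y="count", ax=ax, label=host)
    ax.set_ylabel("count")
fig.tight_layout()
```

Each panel then has its own y scale, which can help when hosts differ a lot in volume.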
I'm trying to plot a simple box plot next to a simple histogram in the same figure using seaborn (0.11.2) and pandas (1.3.4) in a jupyter notebook (6.4.5).
I've tried multiple approaches with nothing working.
fig, ax = plt.subplots(1, 2)
sns.boxplot(x='rent', data=df, ax=ax[0])
sns.displot(x='rent', data=df, bins=50, ax=ax[1])
There is an extra plot or grid that gets put next to the boxplot, and this extra empty plot shows up any time I try to create multiple axes.
Changing:
fig, ax = plt.subplots(2)
Gets:
Again, that extra empty plot next to the boxplot, but this time below it.
Trying the following code:
fig, (axbox, axhist) = plt.subplots(1,2)
sns.boxplot(x='rent', data=df, ax=axbox)
sns.displot(x='rent', data=df, bins=50, ax=axhist)
Gets the same results.
Following the answer in this post, I try:
fig, axs = plt.subplots(ncols=2)
sns.boxplot(x='rent', data=df, ax=axs[0])
sns.displot(x='rent', data=df, bins=50, ax=axs[1])
results in the same thing:
If I just create the figure and then the plots underneath:
plt.figure()
sns.boxplot(x='rent', data=df)
sns.displot(x='rent', data=df, bins=50)
It just gives me the two plots on top of each other, which I assume is just making two different figures.
I'm not sure why that extra empty plot shows up next to the boxplot when I try to do multiple axes in seaborn.
If I use pyplot instead of seaborn, I can get it to work:
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 5))
ax1.hist(df['rent'], bins=50)
ax2.boxplot(df['rent'])
Results in:
The closest I've come is to use seaborn only on the boxplot, and pyplot for the histogram:
plt.figure(figsize=(8, 5))
plt.subplot(1, 2, 1)
sns.boxplot(x='rent', data=df)
plt.subplot(1, 2, 2)
plt.hist(df['rent'], bins=50)
Results:
What am I missing? Why can't I get this to work with two seaborn plots on the same figure, side by side (1 row, 2 columns)?
Try this function, which uses the axes-level histplot rather than the figure-level displot:
def creating_box_hist(column, df):
    # Create a figure with two stacked matplotlib Axes (ax_box and ax_hist)
    f, (ax_box, ax_hist) = plt.subplots(2, sharex=True, gridspec_kw={"height_ratios": (.15, .85)})
    # Draw one plot on each Axes
    sns.boxplot(x=df[column], ax=ax_box)
    sns.histplot(data=df, x=column, ax=ax_hist)
    # Remove the x-axis label from the boxplot
    ax_box.set(xlabel='')
    plt.show()
I have a CSV file with multiple columns, and I am trying to draw count plots for selected columns side by side. With the code below I can plot only two columns; when I try to add more, it does not work. How can I plot several selected columns side by side?
When I plot two graphs they overlap. How can I increase the gap between them?
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns
train_data = pd.read_csv(r"train_ctrUa4K.csv")
plt.figure(figsize=(10, 8))
fig, ax =plt.subplots(1,2)
sns.countplot(train_data['Gender'], ax=ax[0])
sns.countplot(train_data['Dependents'], ax=ax[1])
#sns.countplot(train_data['Self_Employed'], ax=ax[1])
#sns.countplot(train_data['Property_Area'], ax=ax[1,1])
fig.show()
Change the number of columns in the call to subplots():
fig, ax = plt.subplots(1,4)
sns.countplot(x=train_data['Gender'], ax=ax[0])
sns.countplot(x=train_data['Dependents'], ax=ax[1])
sns.countplot(x=train_data['Self_Employed'], ax=ax[2])
sns.countplot(x=train_data['Property_Area'], ax=ax[3])
If you have too many subplots to fit on a single row, you can increase the number of rows as well. Be careful: with more than one row and more than one column, the variable ax is a 2D array:
fig, ax = plt.subplots(2,2)
sns.countplot(x=train_data['Gender'], ax=ax[0,0])
sns.countplot(x=train_data['Dependents'], ax=ax[0,1])
sns.countplot(x=train_data['Self_Employed'], ax=ax[1,0])
sns.countplot(x=train_data['Property_Area'], ax=ax[1,1])
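The repeated calls can also be written as a loop over a flattened axes array, and the overlap mentioned in the question is usually addressed with tight_layout (or subplots_adjust for manual spacing). A sketch, with a small hand-made frame standing in for train_ctrUa4K.csv, whose contents are not shown in the question:

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical stand-in for the CSV; only the column names come from the question.
train_data = pd.DataFrame({
    "Gender": ["Male", "Female", "Male", "Male"],
    "Dependents": ["0", "1", "0", "2"],
    "Self_Employed": ["No", "No", "Yes", "No"],
    "Property_Area": ["Urban", "Rural", "Urban", "Semiurban"],
})
columns = ["Gender", "Dependents", "Self_Employed", "Property_Area"]

# Pass figsize to subplots() directly; a separate plt.figure() call
# would only create an extra, empty figure.
fig, axes = plt.subplots(2, 2, figsize=(10, 8))

# ravel() flattens the 2D axes array so it can be paired with the column names.
for ax, col in zip(axes.ravel(), columns):
    sns.countplot(x=col, data=train_data, ax=ax)

# Let matplotlib compute spacing so labels and titles do not overlap.
# For manual control, fig.subplots_adjust(wspace=0.4, hspace=0.4) is an alternative.
fig.tight_layout()
```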
I would like to create an annotation to a bar chart that compares the value of the bar to two reference values. An overlay such as shown in the picture, a kind of staff gauge, is possible, but I'm open to more elegant solutions.
The bar chart is generated with the pandas API to matplotlib (e.g. data.plot(kind="bar")), so a plus would be if the solution is playing nicely with that.
You may use smaller bars for the target and benchmark indicators. Pandas cannot annotate bars automatically, but you can simply loop over the values and use matplotlib's pyplot.annotate instead.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
a = np.random.randint(5,15, size=5)
t = (a+np.random.normal(size=len(a))*2).round(2)
b = (a+np.random.normal(size=len(a))*2).round(2)
df = pd.DataFrame({"a":a, "t":t, "b":b})
fig, ax = plt.subplots()
df["a"].plot(kind='bar', ax=ax, legend=True)
df["b"].plot(kind='bar', position=0., width=0.1, color="lightblue",legend=True, ax=ax)
df["t"].plot(kind='bar', position=1., width=0.1, color="purple", legend=True, ax=ax)
for i, rows in df.iterrows():
    plt.annotate(rows["a"], xy=(i, rows["a"]), rotation=0, color="C0")
    plt.annotate(rows["b"], xy=(i+0.1, rows["b"]), color="lightblue", rotation=+20, ha="left")
    plt.annotate(rows["t"], xy=(i-0.1, rows["t"]), color="purple", rotation=-20, ha="right")
ax.set_xlim(-1,len(df))
plt.show()
There's no direct way to annotate a bar plot (as far as I am aware). Some time ago I needed to annotate one, so I wrote this; perhaps you can adapt it to your needs.
import matplotlib.pyplot as plt
import numpy as np
ax = plt.subplot(111)
ax.set_xlim(-0.2, 3.2)
ax.grid(True, which='major', color='k', linestyle=':', lw=.5, zorder=1)
# x,y data
x = np.arange(4)
y = np.array([5, 12, 3, 7])
# Define upper y limit leaving space for the text above the bars.
up = max(y) * .03
ax.set_ylim(0, max(y) + 3 * up)
ax.bar(x, y, align='center', width=0.2, color='g', zorder=4)
# Add text to bars
for xi, yi, l in zip(x, y, map(str, y)):
    ax.text(xi - len(l) * .02, yi + up, l,
            bbox=dict(facecolor='w', edgecolor='w', alpha=.5))
ax.set_xticks(x)
ax.set_xticklabels(['text1', 'text2', 'text3', 'text4'])
ax.tick_params(axis='x', which='major', labelsize=12)
plt.show()
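Both answers above predate Axes.bar_label, which was added in matplotlib 3.4 and labels the bars of a container directly. Assuming a matplotlib version at least that recent, a shorter sketch for a pandas bar plot could look like this (the values and tick labels are reused from the example above):

```python
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({"a": [5, 12, 3, 7]},
                  index=["text1", "text2", "text3", "text4"])

# pandas draws the bars with matplotlib and returns the Axes.
ax = df["a"].plot(kind="bar")

# ax.containers holds one BarContainer per plotted series;
# bar_label writes each bar's value above it (requires matplotlib >= 3.4).
labels = ax.bar_label(ax.containers[0], padding=3)

# Leave some headroom so the topmost label is not clipped.
ax.margins(y=0.1)
```

This only covers plain value labels; the target and benchmark markers from the question still need the manual approach.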
I am trying to add custom xticks to a relatively complicated bar graph plot and I am stuck.
I am plotting from two data frames, merged_90 and merged_15:
merged_15
                 Volume     y_err_x      Area_2D     y_err_y
TripDate
2015-09-22  1663.016032  199.507503  1581.591701  163.473202
merged_90
                 Volume     y_err_x      Area_2D     y_err_y
TripDate
1990-06-10  1096.530711  197.377497  1531.651913  205.197493
I want to create a bar graph with two axes (i.e. Area_2D and Volume) where the Area_2D and Volume bars are grouped based on their respective data frame. An example script would look like:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import scipy
fig = plt.figure()
ax1 = fig.add_subplot(111)
merged_90.Volume.plot(ax=ax1, color='orange', kind='bar',position=2.5, yerr=merged_90['y_err_x'] ,use_index=False , width=0.1)
merged_15.Volume.plot(ax=ax1, color='red', kind='bar',position=0.9, yerr=merged_15['y_err_x'] ,use_index=False, width=0.1)
ax2 = ax1.twinx()
merged_90.Area_2D.plot(ax=ax2,color='green', kind='bar',position=3.5, yerr=merged_90['y_err_y'],use_index=False, width=0.1)
merged_15.Area_2D.plot(ax=ax2,color='blue', kind='bar',position=0, yerr=merged_15['y_err_y'],use_index=False, width=0.1)
ax1.set_xlim(-0.5,0.2)
x = np.arange(1)
ax2.set_xticks(x)
ax2.set_xticklabels(['2015'])
plt.tight_layout()
plt.show()
The resulting plot is:
One would think I could change:
x = np.arange(1)
ax2.set_xticks(x)
ax2.set_xticklabels(['2015'])
to
x = np.arange(2)
ax2.set_xticks(x)
ax2.set_xticklabels(['1990','2015'])
but that results in:
I would like to see the ticks ordered in chronological order (i.e. 1990,2015)
Thanks!
Have you considered dropping the second axis and plotting them as follows:
ind = np.array([0,0.3])
width = 0.1
fig, ax = plt.subplots()
Rects1 = ax.bar(ind, [merged_90.Volume.iloc[0], merged_15.Volume.iloc[0]], color=['orange', 'red'], width=width, align='edge')
Rects2 = ax.bar(ind + width, [merged_90.Area_2D.iloc[0], merged_15.Area_2D.iloc[0]], color=['green', 'blue'], width=width, align='edge')
ax.set_xticks([.1,.4])
ax.set_xticklabels(('1990','2015'))
This produces:
I omitted the error bars, but you can easily add them. That would produce a readable graph given your test data. As you mentioned in the comments, you would still rather have two axes, presumably for different data with proper scales. To do this you could do:
fig = plt.figure()
ax1 = fig.add_subplot(111)
merged_90.Volume.plot(ax=ax1, color='orange', kind='bar',position=2.5, use_index=False , width=0.1)
merged_15.Volume.plot(ax=ax1, color='red', kind='bar',position=1.0, use_index=False, width=0.1)
ax2 = ax1.twinx()
merged_90.Area_2D.plot(ax=ax2,color='green', kind='bar',position=3.5,use_index=False, width=0.1)
merged_15.Area_2D.plot(ax=ax2,color='blue', kind='bar',position=0,use_index=False, width=0.1)
ax1.set_xlim([-.45, .2])
ax2.set_xlim([-.45, .2])
ax1.set_xticks([-.35, 0])
ax1.set_xticklabels([1990, 2015])
This produces:
Your problem was resetting the limits of just one axis and not the other. They are created as twins but do not necessarily follow the changes made to one another.
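Returning to the single-axis variant, the omitted error bars can be added through the yerr argument of ax.bar. The sketch below hard-codes the values printed for merged_90 and merged_15 in the question instead of reading the data frames; the bar width and capsize are arbitrary choices.

```python
import matplotlib.pyplot as plt
import numpy as np

# Values copied from the merged_90 (1990) and merged_15 (2015) printouts.
years = ["1990", "2015"]
volume = np.array([1096.530711, 1663.016032])
volume_err = np.array([197.377497, 199.507503])   # y_err_x
area = np.array([1531.651913, 1581.591701])
area_err = np.array([205.197493, 163.473202])     # y_err_y

ind = np.arange(len(years))  # one group per year, in chronological order
width = 0.35                 # arbitrary bar width

fig, ax = plt.subplots()
# Shift the two bars of each group left and right of the group position.
ax.bar(ind - width / 2, volume, width, yerr=volume_err, capsize=4, label="Volume")
ax.bar(ind + width / 2, area, width, yerr=area_err, capsize=4, label="Area_2D")

ax.set_xticks(ind)
ax.set_xticklabels(years)
ax.legend()
```

Because the tick positions are the group centres, the labels stay in chronological order without any position tuning.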
I'm playing with seaborn for the first time, trying to plot different columns of a pandas dataframe on different plots using matplotlib subplots. The simple code below produces the expected figure but the last plot does not have a proper y range (it seems linked to the full range of values in the dataframe).
Does anyone have an idea why this happens and how to prevent it? Thanks.
import matplotlib.pyplot as plt
import numpy as np
import pandas as pds
import seaborn as sns
X = np.arange(0,10)
df = pds.DataFrame({'X': X, 'Y1': 4*X, 'Y2': X/2., 'Y3': X+3, 'Y4': X-7})
fig, axes = plt.subplots(ncols=2, nrows=2)
ax1, ax2, ax3, ax4 = axes.ravel()
sns.set(style="ticks")
sns.despine(fig=fig)
sns.regplot(x='X', y='Y1', data=df, fit_reg=False, ax=ax1)
sns.regplot(x='X', y='Y2', data=df, fit_reg=False, ax=ax2)
sns.regplot(x='X', y='Y3', data=df, fit_reg=False, ax=ax3)
sns.regplot(x='X', y='Y4', data=df, fit_reg=False, ax=ax4)
plt.show()
Update: I modified the above code with:
fig, axes = plt.subplots(ncols=2, nrows=3)
ax1, ax2, ax3, ax4, ax5, ax6 = axes.ravel()
If I plot data on any axis but the last one I obtain what I'm looking for:
Of course I don't want the empty frames. All plots present the data with a similar visual aspect.
When data is plotted on the last axis, it gets a y range that is too wide like in the first example. Only the last axis seems to have this problem. Any clue?
If you want the scales to be the same on all axes you could create subplots with this command:
fig, axes = plt.subplots(ncols=2, nrows=2, sharey=True, sharex=True)
This makes all plots share the relevant axes:
If you want to change the limits of that particular axes manually, you could add this line after the plotting commands:
ax4.set_ylim(top=5)
# or for both limits like this:
# ax4.set_ylim([-2, 5])
Which will give something like this:
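Rather than hard-coding numbers, the limits can also be derived from the plotted column. This is only a sketch of that idea: it uses a plain matplotlib scatter in place of sns.regplot, reuses the Y4 definition from the question, and the 5% padding is an arbitrary choice.

```python
import matplotlib.pyplot as plt
import numpy as np

X = np.arange(0, 10)
Y4 = X - 7  # same definition as the Y4 column in the question

fig, ax = plt.subplots()
ax.scatter(X, Y4)

# Pad the data range by 5% on each side (the 5% figure is arbitrary).
pad = 0.05 * (Y4.max() - Y4.min())
ax.set_ylim(Y4.min() - pad, Y4.max() + pad)
```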