With a dataframe and basic plot such as this:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
np.random.seed(123456)
rows = 75
df = pd.DataFrame(np.random.randint(-4,5,size=(rows, 3)), columns=['A', 'B', 'C'])
datelist = pd.date_range(pd.datetime(2017, 1, 1).strftime('%Y-%m-%d'), periods=rows).tolist()
df['dates'] = datelist
df = df.set_index(['dates'])
df.index = pd.to_datetime(df.index)
df = df.cumsum()
df.plot()
What is the best way of annotating the last points on the lines so that you get the result below?
In order to annotate a point use ax.annotate(). In this case it makes sense to specify the coordinates to annotate separately. I.e. the y coordinate is the data coordinate of the last point of the line (which you can get from line.get_ydata()[-1]) while the x coordinate is independent of the data and should be the right hand side of the axes (i.e. 1 in axes coordinates). You may then also want to offset the text a bit such that it does not overlap with the axes.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
rows = 75
df = pd.DataFrame(np.random.randint(-4,5,size=(rows, 3)), columns=['A', 'B', 'C'])
datelist = pd.date_range(pd.datetime(2017, 1, 1).strftime('%Y-%m-%d'), periods=rows).tolist()
df['dates'] = datelist
df = df.set_index(['dates'])
df.index = pd.to_datetime(df.index)
df = df.cumsum()
ax = df.plot()
for line, name in zip(ax.lines, df.columns):
y = line.get_ydata()[-1]
ax.annotate(name, xy=(1,y), xytext=(6,0), color=line.get_color(),
xycoords = ax.get_yaxis_transform(), textcoords="offset points",
size=14, va="center")
plt.show()
Method 1
Here is one way, or at least a method, which you can adapt to aesthetically fit in whatever way you want, using the plt.annotate method:
[EDIT]: If you're going to use a method like this first one, the method outlined in ImportanceOfBeingErnest's answer is better than what I've proposed.
df.plot()
for col in df.columns:
plt.annotate(col,xy=(plt.xticks()[0][-1]+0.7, df[col].iloc[-1]))
plt.show()
For the xy argument, which is the x and y coordinates of the text, I chose the last x coordinate in plt.xticks(), and added 0.7 so that it is outside of your x axis, but you can coose to make it closer or further as you see fit.
METHOD 2:
You could also just use the right y axis, and label it with your 3 lines. For example:
fig, ax = plt.subplots()
df.plot(ax=ax)
ax2 = ax.twinx()
ax2.set_ylim(ax.get_ylim())
ax2.set_yticks([df[col].iloc[-1] for col in df.columns])
ax2.set_yticklabels(df.columns)
plt.show()
This gives you the following plot:
I've got some tips from the other answers and believe this is the easiest solution.
Here is a generic function to improve the labels of a line chart. Its advantages are:
you don't need to mess with the original DataFrame since it works over a line chart,
it will use the already set legend label,
removes the frame,
just copy'n paste it to improve your chart :-)
You can just call it after creating any line char:
def improve_legend(ax=None):
if ax is None:
ax = plt.gca()
for spine in ax.spines:
ax.spines[spine].set_visible(False)
for line in ax.lines:
data_x, data_y = line.get_data()
right_most_x = data_x[-1]
right_most_y = data_y[-1]
ax.annotate(
line.get_label(),
xy=(right_most_x, right_most_y),
xytext=(5, 0),
textcoords="offset points",
va="center",
color=line.get_color(),
)
ax.legend().set_visible(False)
This is the original chart:
Now you just need to call the function to improve your plot:
ax = df.plot()
improve_legend(ax)
The new chart:
Beware, it will probably not work well if a line has null values at the end.
Related
I have created a barplot for given days of the year and the number of people born on this given day (figure a). I want to set the x-axes in my seaborn barplot to xlim = (0,365) to show the whole year.
But, once I use ax.set_xlim(0,365) the bar plot is simply moved to the left (figure b).
This is the code:
#data
df = pd.DataFrame()
df['day'] = np.arange(41,200)
df['born'] = np.random.randn(159)*100
#plot
f, axes = plt.subplots(4, 4, figsize = (12,12))
ax = sns.barplot(df.day, df.born, data = df, hue = df.time, ax = axes[0,0], color = 'skyblue')
ax.get_xaxis().set_label_text('')
ax.set_xticklabels('')
ax.set_yscale('log')
ax.set_ylim(0,10e3)
ax.set_xlim(0,366)
ax.set_title('SE Africa')
How can I set the x-axes limits to day 0 and 365 without the bars being shifted to the left?
IIUC, the expected output given the nature of data is difficult to obtain straightforwardly, because, as per the documentation of seaborn.barplot:
This function always treats one of the variables as categorical and draws data at ordinal positions (0, 1, … n) on the relevant axis, even when the data has a numeric or date type.
This means the function seaborn.barplot creates categories based on the data in x (here, df.day) and they are linked to integers, starting from 0.
Therefore, it means even if we have data from day 41 onwards, seaborn is going to refer the starting category with x = 0, making for us difficult to tweak the lower limit of x-axis post function call.
The following code and corresponding plot clarifies what I explained above:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
# data
rng = np.random.default_rng(101)
day = np.arange(41,200)
born = rng.integers(low=0, high=10e4, size=200-41)
df = pd.DataFrame({"day":day, "born":born})
# plot
f, ax = plt.subplots(figsize=(4, 4))
sns.barplot(data=df, x='day', y='born', ax=ax, color='b')
ax.set_xlim(0,365)
ax.set_xticks(ticks=np.arange(0, 365, 30), labels=np.arange(0, 365, 30))
ax.set_yscale('log')
ax.set_title('SE Africa')
plt.tight_layout()
plt.show()
I suggest using matplotlib.axes.Axes.bar to overcome this issue, although handling colors of the bars would be not straightforward compared to sns.barplot(..., hue=..., ...) :
# plot
f, ax = plt.subplots(figsize=(4, 4))
ax.bar(x=df.day, height=df.born) # instead of sns.barplot
ax.get_xaxis().set_label_text('')
ax.set_xlim(0,365)
ax.set_yscale('log')
ax.set_title('SE Africa')
plt.tight_layout()
plt.show()
Hi I'm trying to plot a pointplot and scatterplot on one graph with the same dataset so I can see the individual points that make up the pointplot.
Here is the code I am using:
xlPath = r'path to data here'
df = pd.concat(pd.read_excel(xlPath, sheet_name=None),ignore_index=True)
sns.pointplot(data=df, x='ID', y='HM (N/mm2)', palette='bright', capsize=0.15, alpha=0.5, ci=95, join=True, hue='Layer')
sns.scatterplot(data=df, x='ID', y='HM (N/mm2)')
plt.show()
When I plot, for some reason the points from the scatterplot are offsetting one ID spot right on the x-axis. When I plot the scatter or the point plot separately, they each are in the correct ID spot. Why would plotting them on the same plot cause the scatterplot to offset one right?
Edit: Tried to make the ID column categorical, but that didn't work either.
Seaborn's pointplot creates a categorical x-axis while here the scatterplot uses a numerical x-axis.
Explicitly making the x-values categorical: df['ID'] = pd.Categorical(df['ID']), isn't sufficient, as the scatterplot still sees numbers. Changing the values to strings does the trick. To get them in the correct order, sorting might be necessary.
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
import numpy as np
# first create some test data
df = pd.DataFrame({'ID': np.random.choice(np.arange(1, 49), 500),
'HM (N/mm2)': np.random.uniform(1, 10, 500)})
df['Layer'] = ((df['ID'] - 1) // 6) % 4 + 1
df['HM (N/mm2)'] += df['Layer'] * 8
df['Layer'] = df['Layer'].map(lambda s: f'Layer {s}')
# sort the values and convert the 'ID's to strings
df = df.sort_values('ID')
df['ID'] = df['ID'].astype(str)
fig, ax = plt.subplots(figsize=(12, 4))
sns.pointplot(data=df, x='ID', y='HM (N/mm2)', palette='bright',
capsize=0.15, alpha=0.5, ci=95, join=True, hue='Layer', ax=ax)
sns.scatterplot(data=df, x='ID', y='HM (N/mm2)', color='purple', ax=ax)
ax.margins(x=0.02)
plt.tight_layout()
plt.show()
With a dataframe and basic plot such as this:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
np.random.seed(123456)
rows = 75
df = pd.DataFrame(np.random.randint(-4,5,size=(rows, 3)), columns=['A', 'B', 'C'])
datelist = pd.date_range(pd.datetime(2017, 1, 1).strftime('%Y-%m-%d'), periods=rows).tolist()
df['dates'] = datelist
df = df.set_index(['dates'])
df.index = pd.to_datetime(df.index)
df = df.cumsum()
df.plot()
What is the best way of annotating the last points on the lines so that you get the result below?
In order to annotate a point use ax.annotate(). In this case it makes sense to specify the coordinates to annotate separately. I.e. the y coordinate is the data coordinate of the last point of the line (which you can get from line.get_ydata()[-1]) while the x coordinate is independent of the data and should be the right hand side of the axes (i.e. 1 in axes coordinates). You may then also want to offset the text a bit such that it does not overlap with the axes.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
rows = 75
df = pd.DataFrame(np.random.randint(-4,5,size=(rows, 3)), columns=['A', 'B', 'C'])
datelist = pd.date_range(pd.datetime(2017, 1, 1).strftime('%Y-%m-%d'), periods=rows).tolist()
df['dates'] = datelist
df = df.set_index(['dates'])
df.index = pd.to_datetime(df.index)
df = df.cumsum()
ax = df.plot()
for line, name in zip(ax.lines, df.columns):
y = line.get_ydata()[-1]
ax.annotate(name, xy=(1,y), xytext=(6,0), color=line.get_color(),
xycoords = ax.get_yaxis_transform(), textcoords="offset points",
size=14, va="center")
plt.show()
Method 1
Here is one way, or at least a method, which you can adapt to aesthetically fit in whatever way you want, using the plt.annotate method:
[EDIT]: If you're going to use a method like this first one, the method outlined in ImportanceOfBeingErnest's answer is better than what I've proposed.
df.plot()
for col in df.columns:
plt.annotate(col,xy=(plt.xticks()[0][-1]+0.7, df[col].iloc[-1]))
plt.show()
For the xy argument, which is the x and y coordinates of the text, I chose the last x coordinate in plt.xticks(), and added 0.7 so that it is outside of your x axis, but you can coose to make it closer or further as you see fit.
METHOD 2:
You could also just use the right y axis, and label it with your 3 lines. For example:
fig, ax = plt.subplots()
df.plot(ax=ax)
ax2 = ax.twinx()
ax2.set_ylim(ax.get_ylim())
ax2.set_yticks([df[col].iloc[-1] for col in df.columns])
ax2.set_yticklabels(df.columns)
plt.show()
This gives you the following plot:
I've got some tips from the other answers and believe this is the easiest solution.
Here is a generic function to improve the labels of a line chart. Its advantages are:
you don't need to mess with the original DataFrame since it works over a line chart,
it will use the already set legend label,
removes the frame,
just copy'n paste it to improve your chart :-)
You can just call it after creating any line char:
def improve_legend(ax=None):
if ax is None:
ax = plt.gca()
for spine in ax.spines:
ax.spines[spine].set_visible(False)
for line in ax.lines:
data_x, data_y = line.get_data()
right_most_x = data_x[-1]
right_most_y = data_y[-1]
ax.annotate(
line.get_label(),
xy=(right_most_x, right_most_y),
xytext=(5, 0),
textcoords="offset points",
va="center",
color=line.get_color(),
)
ax.legend().set_visible(False)
This is the original chart:
Now you just need to call the function to improve your plot:
ax = df.plot()
improve_legend(ax)
The new chart:
Beware, it will probably not work well if a line has null values at the end.
Updated question and code!
Probably, the tips dataset is not the best example to use, however my issue is reproduced in it, i.e. we see that both point and bar plots share the same Y
I need to combine line and bar plots on one chart. To do this I used seaborn and the following code:
tips = sns.load_dataset('tips')
g = sns.FacetGrid(tips, hue='sex', col='sex', size=4, aspect=2.1, sharey=False, sharex=False)
g = g.map(sns.pointplot, 'day', 'tip', ci=0)
g = g.map(sns.barplot, 'day', 'total_bill', ci=0)
g.set_xticklabels(rotation=45, fontsize=9)
g.set_xticklabels(rotation=45, fontsize=9)
plt.show()
Here is the result:
Everything is okay except the fact that one Y axis is used for both bars and lines on each facetgrid object. I am new to seaborn and currently cannot find a solution. Tried to add "sharey=False" to this line of code
> `g.map(sns.pointplot, 'date', 'worthusdcount')`
however it didn't help.
Any solutions on how to add second Y axis would be appreciated
Here's an example where you apply a custom mapping function to the dataframe of interest. Within the function, you can call plt.gca() to get the current axis at the facet being currently plotted in FacetGrid. Once you have the axis, twinx() can be called just like you would in plain old matplotlib plotting.
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
import seaborn as sns
def facetgrid_two_axes(*args, **kwargs):
data = kwargs.pop('data')
dual_axis = kwargs.pop('dual_axis')
alpha = kwargs.pop('alpha', 0.2)
kwargs.pop('color')
ax = plt.gca()
if dual_axis:
ax2 = ax.twinx()
ax2.set_ylabel('Second Axis!')
ax.plot(data['x'],data['y1'], **kwargs, color='red',alpha=alpha)
if dual_axis:
ax2.bar(df['x'],df['y2'], **kwargs, color='blue',alpha=alpha)
df = pd.DataFrame()
df['x'] = np.arange(1,5,1)
df['y1'] = 1 / df['x']
df['y2'] = df['x'] * 100
df['facet'] = 'foo'
df2 = df.copy()
df2['facet'] = 'bar'
df3 = pd.concat([df,df2])
win_plot = sns.FacetGrid(df3, col='facet', size=6)
(win_plot.map_dataframe(facetgrid_two_axes, dual_axis=True)
.set_axis_labels("X", "First Y-axis"))
plt.show()
This isn't the prettiest plot as you might want to adjust the presence of the second y-axis' label, the spacing between plots, etc. but the code suffices to show how to plot two series of differing magnitudes within FacetGrids.
I am trying to convert Line garph to Bar graph using python panda.
Here is my code which gives perfect line graph as per my requirement.
conn = sqlite3.connect('Demo.db')
collection = ['ABC','PQR']
df = pd.read_sql("SELECT * FROM Table where ...", conn)
df['DateTime'] = df['Timestamp'].apply(lambda x: dt.datetime.fromtimestamp(x))
df.groupby('Type').plot(x='DateTime', y='Value',linewidth=2)
plt.legend(collection)
plt.show()
Here is my DataFrame df
http://postimg.org/image/75uy0dntf/
Here is my Line graph output from above code.
http://postimg.org/image/vc5lbi9xv/
I want to draw bar graph instead of line graph.I want month name on x axis and value on y axis. I want colorful bar graph.
Attempt made
df.plot(x='DateTime', y='Value',linewidth=2, kind='bar')
plt.show()
It gives improper bar graph with date and time(instead of month and year) on x axis. Thank you for help.
Here is a code that might do what you want.
In this code, I first sort your database by time. This step is important, because I use the indices of the sorted database as abscissa of your plots, instead of the timestamp. Then, I group your data frame by type and I plot manually each group at the right position (using the sorted index). Finally, I re-define the ticks and the tick labels to display the date in a given format (in this case, I chose MM/YYYY but that can be changed).
import datetime
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
types = ['ABC','BCD','PQR']*3
vals = [126,1587,141,10546,1733,173,107,780,88]
ts = [1414814371, 1414814371, 1406865621, 1422766793, 1422766793, 1425574861, 1396324799, 1396324799, 1401595199]
aset = zip(types, vals, ts)
df = pd.DataFrame(data=aset, columns=['Type', 'Value', 'Timestamp'])
df = df.sort(['Timestamp', 'Type'])
df['Date'] = df['Timestamp'].apply(lambda x: datetime.datetime.fromtimestamp(x).strftime('%m/%Y'))
groups = df.groupby('Type')
ngroups = len(groups)
colors = ['r', 'g', 'b']
fig = plt.figure()
ax = fig.add_subplot(111, position=[0.15, 0.15, 0.8, 0.8])
offset = 0.1
width = 1-2*offset
#
for j, group in enumerate(groups):
x = group[1].index+offset
y = group[1].Value
ax.bar(x, y, width=width, color=colors[j], label=group[0])
xmin, xmax = min(df.index), max(df.index)+1
ax.set_xlim([xmin, xmax])
ax.tick_params(axis='x', which='both', top='off', bottom='off')
plt.xticks(np.arange(xmin, xmax)+0.5, list(df['Date']), rotation=90)
ax.legend()
plt.show()
I hope this works for you. This is the output that I get, given my subset of your database.