I came across this different behaviour in the third example plot below. Why am I able to correctly edit the x-axis' ticks with pandas line() and area() plots, but not with bar()? What's the best way to fix the (general) third example?
import numpy as np
import pandas as pd
import matplotlib.ticker as ticker
import matplotlib.pyplot as plt
x = np.arange(73,145,1)
y = np.cos(x)
df = pd.Series(y,x)
ax1 = df.plot.line()
ax1.xaxis.set_major_locator(ticker.MultipleLocator(10))
ax1.xaxis.set_minor_locator(ticker.MultipleLocator(2.5))
plt.show()
ax2 = df.plot.area(stacked=False)
ax2.xaxis.set_major_locator(ticker.MultipleLocator(10))
ax2.xaxis.set_minor_locator(ticker.MultipleLocator(2.5))
plt.show()
ax3 = df.plot.bar()
ax3.xaxis.set_major_locator(ticker.MultipleLocator(10))
ax3.xaxis.set_minor_locator(ticker.MultipleLocator(2.5))
plt.show()
Problem:
The bar plot is meant to be used with categorical data. Therefore the bars are not actually at the positions of x but at positions 0,1,2,...N-1. The bar labels are then adjusted to the values of x.
If you then put a tick only at every tenth position, the labels are still handed out in order, so the second label ends up at the tenth bar, and so on.
You can see that the bars are actually positioned at integer values starting at 0 by using a normal ScalarFormatter on the axes:
ax3 = df.plot.bar()
ax3.xaxis.set_major_locator(ticker.MultipleLocator(10))
ax3.xaxis.set_minor_locator(ticker.MultipleLocator(2.5))
ax3.xaxis.set_major_formatter(ticker.ScalarFormatter())
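The same thing can be checked numerically instead of visually. This is only a small sketch: it reads the bar rectangles back from `ax3.patches` and compares their centres with the tick positions. The `Agg` backend line is an assumption made so the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so the sketch runs anywhere
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

x = np.arange(73, 145, 1)
df = pd.Series(np.cos(x), x)

ax3 = df.plot.bar()

# Centre of each bar in data coordinates (left edge + half the width)
centers = np.array([p.get_x() + p.get_width() / 2 for p in ax3.patches])

print(centers[:5])           # bar positions, not the values of x
print(ax3.get_xticks()[:5])  # tick positions pandas set for the labels
```

Both arrays count up from 0, which is why a MultipleLocator(10) talks about "every tenth bar" and not about x = 80, 90, ...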
Now you can of course define your own fixed formatter like this
n = 10
ax3 = df.plot.bar()
ax3.xaxis.set_major_locator(ticker.MultipleLocator(n))
ax3.xaxis.set_minor_locator(ticker.MultipleLocator(n/4.))
seq = ax3.xaxis.get_major_formatter().seq
ax3.xaxis.set_major_formatter(ticker.FixedFormatter([""]+seq[::n]))
which has the drawback that it starts at some arbitrary value.
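If you want to stay with the pandas bar plot, one possible way around that drawback is to look up the bar positions whose index value is a round number and put the ticks exactly there. This is a sketch under the assumption that the index holds integers; `positions` and `labels` are names introduced here for illustration, and minor ticks are left out.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so the sketch runs anywhere
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

x = np.arange(73, 145, 1)
df = pd.Series(np.cos(x), x)

n = 10
ax3 = df.plot.bar()

# Bar positions (0..N-1) whose index value is a multiple of n
positions = [i for i, value in enumerate(df.index) if value % n == 0]
labels = [str(df.index[i]) for i in positions]

# Put major ticks at those bars and label them with the index values
ax3.set_xticks(positions)
ax3.set_xticklabels(labels)
```

The ticks now sit at 80, 90, ... regardless of where the data starts, but the axis is still categorical underneath.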
Solution:
I would guess the best general solution is not to use the pandas plotting function at all (it is only a wrapper anyway), but to call the matplotlib bar function directly:
fig, ax3 = plt.subplots()
ax3.bar(df.index, df.values, width=0.72)
ax3.xaxis.set_major_locator(ticker.MultipleLocator(10))
ax3.xaxis.set_minor_locator(ticker.MultipleLocator(2.5))
Consider the following snippet
import numpy as np
import pandas as pd
from matplotlib import pyplot as plt
data = np.random.rand(10,5)
cols = ["a","b","c","d","e"]
df = pd.DataFrame(data=data, columns = cols)
df.index.name="Time (s)"
fig,axes = plt.subplots(3,2,sharex=True, squeeze=False)
axes = axes.T.flat
axes[5].remove()
df.plot(subplots=True,grid=True,legend=True,ax = axes[0:5])
that produces the following plot
I wish to show the xticks in the subplots where they are missing as I wrote in red with reference to the above picture.
I wish to show only the xticks where I marked in red, not the labels. The labels are fine where they currently are and shall be kept there.
After some search, I tried with
for ax in axes:
    ax.tick_params(axis="x")
and
for ax in axes:
    ax.spines.set(visible=True)
but with no success.
Any hints?
EDIT: As someone kindly suggested, if I set sharex=False, then when I horizontally zoom on one axes I will not have the same zoom effect on the other axes and this is not what I want.
What I want is: a) to show the xticks in all axes, and b) when I zoom horizontally on one axes, to have all the other axes zoomed horizontally by the same amount.
You need to turn off sharing x properties by setting sharex=False (which is the default value by the way in matplotlib.pyplot.subplots):
Replace this:
fig,axes = plt.subplots(3,2,sharex=True, squeeze=False)
By this:
fig,axes = plt.subplots(3,2, squeeze=False)
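If the synchronized zoom from the EDIT matters, an alternative worth trying is to keep sharex=True and switch the hidden tick labels back on per axes with tick_params. This is a sketch, assuming that "xticks" in the question means the tick labels that shared axes hide on all but the bottom row; the `Agg` backend line is only there so it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so the sketch runs anywhere
import numpy as np
import pandas as pd
from matplotlib import pyplot as plt

df = pd.DataFrame(np.random.rand(10, 5), columns=["a", "b", "c", "d", "e"])
df.index.name = "Time (s)"

# Keep the x axes shared so zooming one axes zooms all of them
fig, axes = plt.subplots(3, 2, sharex=True, squeeze=False)
axes = axes.T.ravel()
axes[5].remove()
df.plot(subplots=True, grid=True, legend=True, ax=axes[:5])

# Re-enable tick marks and tick labels that sharex=True switched off
for ax in axes[:5]:
    ax.tick_params(axis="x", which="both", bottom=True, labelbottom=True)
```

To restore the labels only on one axes (for example the last one in the second column), call tick_params on that axes alone instead of looping.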
I am trying to include 2 seaborn countplots with different scales on the same plot but the bars display as different widths and overlap as shown below. Any idea how to get around this?
Setting dodge=False doesn't work, as the bars appear on top of each other.
The main problem with the approach in the question is that the first countplot doesn't take hue into account. The second countplot won't magically move the bars of the first. An additional categorical column could be added, only taking on the 'weekend' value. Note that the column should be explicitly made categorical with two values, even if only one value is really used.
Things can be simplified a lot by starting from the original dataframe, which presumably already has a column 'is_weekend'. Creating the twinx ax beforehand makes it possible to write a loop (so the call to sns.countplot() is written only once, with parameters).
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
import numpy as np
sns.set_style('dark')
# create some demo data
data = pd.DataFrame({'ride_hod': np.random.normal(13, 3, 1000).astype(int) % 24,
                     'is_weekend': np.random.choice(['weekday', 'weekend'], 1000, p=[5 / 7, 2 / 7])})
# now, make 'is_weekend' a categorical column (not just strings)
data['is_weekend'] = pd.Categorical(data['is_weekend'], ['weekday', 'weekend'])
fig, ax1 = plt.subplots(figsize=(16, 6))
ax2 = ax1.twinx()
for ax, category in zip((ax1, ax2), data['is_weekend'].cat.categories):
    sns.countplot(data=data[data['is_weekend'] == category], x='ride_hod', hue='is_weekend', palette='Blues', ax=ax)
    ax.set_ylabel(f'Count ({category})')
ax1.legend_.remove() # both axes got a legend, remove one
ax1.set_xlabel('Hour of Day')
plt.tight_layout()
plt.show()
use plt.xticks(['put the label by hand in your x label'])
I cannot work out how to change the scale of the y-axis. My code is:
grid = sns.catplot(x='Nationality', y='count',
                   row='Age', col='Gender',
                   hue='Type',
                   data=dfNorthumbria2, kind='bar', ci='No')
I want the y-axis to go up in whole numbers rather than in steps of 0.5.
Update
I just found this tutorial; probably the easiest solution is the following:
grid.set(yticks=list(range(5)))
From the help of grid.set
Help on method set in module seaborn.axisgrid:
set(**kwargs) method of seaborn.axisgrid.FacetGrid instance
Set attributes on each subplot Axes.
Since seaborn is built on top of matplotlib, you can use yticks from plt
import matplotlib.pyplot as plt
plt.yticks(range(5))
However this changed only the yticks of the upper row in my mockup example.
For this reason you probably want to change the y ticks per axes with ax.set_yticks(). To get the axes from your grid object, you can use a list comprehension as follows:
[ax[0].set_yticks(range(0,150,5) )for ax in grid.axes]
A full replicable example would look like this (adapted from here)
import seaborn as sns
import matplotlib.pyplot as plt
sns.set(style="ticks")
exercise = sns.load_dataset("exercise")
grid = sns.catplot(x="time", y="pulse", hue="kind",
                   row="diet", data=exercise)
# plt.yticks(range(0,150,5)) # Changed only one y-axis
# Changed y-ticks to steps of 20
[ax[0].set_yticks(range(0,150,20) )for ax in grid.axes]
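If the goal is simply "whole numbers only" without hard-coding a range, matplotlib's MaxNLocator with integer=True may be a more general option. The sketch below uses a plain 2x2 matplotlib grid with made-up counts instead of a seaborn grid, so that it runs without downloading a dataset; since grid.axes is likewise a 2D array of axes, the same loop over `.flat` should carry over.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import matplotlib.ticker as ticker

# Stand-in for the facet grid: a 2x2 array of axes with small counts
fig, axes = plt.subplots(2, 2)
counts = [1, 3, 2]  # made-up demo data

for ax in axes.flat:  # .flat visits every row AND column
    ax.bar(["a", "b", "c"], counts)
    # Only place major ticks on integer values
    ax.yaxis.set_major_locator(ticker.MaxNLocator(integer=True))
```

Note that iterating with `.flat` reaches every subplot, whereas `ax[0]` in the list comprehension above only touches the first column of each row.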
Is there a way to add a secondary legend to a scatterplot, where the size of the scatter is proportional to some data?
I have written the following code that generates a scatterplot. The color of the scatter represents the year (and is taken from a user-defined df) while the size of the scatter represents variable 3 (also taken from a df but is raw data):
import pandas as pd
colors = pd.DataFrame({'1985':'red','1990':'b','1995':'k','2000':'g','2005':'m','2010':'y'}, index=[0,1,2,3,4,5])
fig = plt.figure()
ax = fig.add_subplot(111)
for i in df.keys():
    df[i].plot(kind='scatter',x='variable1',y='variable2',ax=ax,label=i,s=df[i]['variable3']/100, c=colors[i])
ax.legend(loc='upper right')
ax.set_xlabel("Variable 1")
ax.set_ylabel("Variable 2")
This code (with my data) produces the following graph:
So while the colors/years are well and clearly defined, the size of the scatter is not.
How can I add a secondary or additional legend that defines what the size of the scatter means?
You will need to create the second legend yourself, i.e. you need to create some artists to populate the legend with. In the case of a scatter we can use a normal plot and set the marker accordingly.
This is shown in the below example. To actually add a second legend we need to add the first legend to the axes, such that the new legend does not overwrite the first one.
import matplotlib.pyplot as plt
import matplotlib.colors
import numpy as np; np.random.seed(1)
import pandas as pd
plt.rcParams["figure.subplot.right"] = 0.8
v = np.random.rand(30,4)
v[:,2] = np.random.choice(np.arange(1980,2015,5), size=30)
v[:,3] = np.random.randint(5,13,size=30)
df= pd.DataFrame(v, columns=["x","y","year","quality"])
df.year = df.year.values.astype(int)
fig, ax = plt.subplots()
for i, (name, dff) in enumerate(df.groupby("year")):
    c = matplotlib.colors.to_hex(plt.cm.jet(i/7.))
    dff.plot(kind='scatter',x='x',y='y', label=name, c=c,
             s=dff.quality**2, ax=ax)
leg = plt.legend(loc=(1.03,0), title="Year")
ax.add_artist(leg)
h = [plt.plot([],[], color="gray", marker="o", ms=i, ls="")[0] for i in range(5,13)]
plt.legend(handles=h, labels=range(5,13),loc=(1.03,0.5), title="Quality")
plt.show()
Have a look at http://matplotlib.org/users/legend_guide.html.
It shows how to have multiple legends (about halfway down) and there is another example that shows how to set the marker size.
If that doesn't work, then you can also create a custom legend (last example).
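With matplotlib 3.1 or newer, the scatter collection can also build the legend entries itself through legend_elements(). The following is a sketch with random demo data, assuming a single ax.scatter call with numeric colors and sizes set to quality squared (so func=np.sqrt maps the sizes back to the quality values); all variable names are made up for the illustration.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so the sketch runs anywhere
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)
x, y = rng.random(30), rng.random(30)
year = rng.choice(np.arange(1985, 2015, 5), size=30)   # demo data
quality = rng.integers(5, 13, size=30)                 # demo data

fig, ax = plt.subplots()
sc = ax.scatter(x, y, c=year, s=quality**2, cmap="viridis")

# First legend: one entry per color (year); keep it alive via add_artist
color_handles, color_labels = sc.legend_elements(prop="colors")
first = ax.legend(color_handles, color_labels, loc="upper left", title="Year")
ax.add_artist(first)

# Second legend: marker sizes, converted back to quality with func
size_handles, size_labels = sc.legend_elements(prop="sizes", num=4, func=np.sqrt)
ax.legend(size_handles, size_labels, loc="lower right", title="Quality")
```

As in the manual approach, the first legend has to be re-added with add_artist, because a second call to ax.legend() would otherwise replace it.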
Using a complicated script that nests, among other things, pandas.DataFrame.plot() and GridSpec in a subplot setting, I have the following problem:
When I create a 2-cols 1-row gridspec, the tick labels are all correct. When I create a 1-col 2-rows gridspec however, as soon as I plot onto the first (upper row) axes using pandas.DataFrame.plot(), the x-ticklabels for the top row disappear (the ticks remain).
It is not the case that the top ticks change once I draw something on the lower ax, so sharex does not appear to be the issue.
However, my x-labels are still stored:
axes[0].get_xaxis().get_ticklabels()
Out[59]:
<a list of 9 Text major ticklabel objects>
It's just that they're not displayed. I suspected a NullFormatter, but that's not the case either:
axes[0].get_xaxis().get_major_formatter()
Out[57]:
<matplotlib.ticker.ScalarFormatter at 0x7f7414330710>
I get both ticks and labels on the top of the first axes when I do
axes[0].get_xaxis().tick_top()
However, when I then go back to tick_bottom(), I only have ticks on bottom, not the labels.
What can cause my stored labels not to be displayed despite a "normal" formatter?
Here's a simple example:
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
from matplotlib import gridspec
df = pd.DataFrame(np.random.rand(100,2), columns=['A', 'B'])
figure = plt.figure()
GridSpec = gridspec.GridSpec(nrows=2, ncols=1)
[plt.subplot(gsSpec) for gsSpec in GridSpec]
axes = figure.axes
df.plot(secondary_y=['B'], ax=axes[0], sharex=False)
It's the secondary_y=['B'] that causes the xticks to disappear. I'm not sure why it does that.
Fortunately, you can use plt.setp(ax.get_xticklabels(), visible=True) (docs) to turn them back on manually:
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
from matplotlib import gridspec
df = pd.DataFrame(np.random.rand(100,2), columns=['A', 'B'])
figure = plt.figure()
GridSpec = gridspec.GridSpec(nrows=2, ncols=1)
axes = [plt.subplot(gsSpec) for gsSpec in GridSpec]
ax = axes[0]
df.plot(secondary_y=['B'], ax=ax, sharex=True)
plt.setp(ax.get_xticklabels(), visible=True)