I have a dataframe with two columns, VOL and INVOL, and for some years the values are the same. Hence, when plotting with seaborn, I cannot see the values of one column where the two lines converge.
For example:
My dataframe is
When I use seaborn, using the below code
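import seaborn as sns
# reshape to long format: one row per (FY, column) pair, then plot the melted frame
df5_long = df5_test.melt('FY', var_name='cols', value_name='vals')
g = sns.catplot(x="FY", y="vals", hue='cols', data=df5_long, kind='point')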
the chart does not show both points at 0.06; one hides the other.
I tried plotting with pandas and got the same result.
Please advise what I should do. Thanks in advance.
Your plot looks correct: the two lines overlap perfectly because the data from 2016 to 2018 is exactly the same. One workaround is to plot the two lines separately and add or subtract a small value to one of them, so that it is shifted just enough for both to stay visible. For example:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
df = pd.DataFrame({'FY': [2012, 2013, 2014, 2015, 2016, 2017, 2018],
'VOL_PCT': [0, 0.08, 0.07, 0.06, 0, 0, 0.06],
'INVOL_PC': [0, 0, 0, 0, 0, 0, 0.06]})
# plot
fig, ax = plt.subplots()
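sns.lineplot(x=df.FY, y=df.VOL_PCT, ax=ax, label='VOL_PCT')
# shift INVOL_PC slightly right and down so it no longer hides VOL_PCT
sns.lineplot(x=df.FY + .01, y=df.INVOL_PC - .001, ax=ax, label='INVOL_PC')
plt.show()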
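A variation on the same idea, sketched under the assumption that you only want a visual nudge and would rather not alter the plotted numbers: first list the years where the two series coincide, then draw the second line through `matplotlib.transforms.offset_copy`, which shifts it by a few points on screen while its data stays unchanged. The 3-point offset and the marker choices are arbitrary.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import matplotlib.transforms as mtransforms
import pandas as pd

df = pd.DataFrame({'FY': [2012, 2013, 2014, 2015, 2016, 2017, 2018],
                   'VOL_PCT': [0, 0.08, 0.07, 0.06, 0, 0, 0.06],
                   'INVOL_PC': [0, 0, 0, 0, 0, 0, 0.06]})

# Years where the two series are identical, i.e. where one line hides the other
overlap_years = df.loc[df['VOL_PCT'] == df['INVOL_PC'], 'FY'].tolist()

fig, ax = plt.subplots()
ax.plot(df['FY'], df['VOL_PCT'], marker='o', label='VOL_PCT')

# Offset the second line by 3 points downwards in display space; its y data is untouched
shifted = mtransforms.offset_copy(ax.transData, fig=fig, x=0, y=-3, units='points')
ax.plot(df['FY'], df['INVOL_PC'], marker='s', transform=shifted, label='INVOL_PC')
ax.legend()
```

Because the offset is in points rather than data units, it looks the same regardless of the y-range of the data.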
In addition, given the type of your data, you could also consider using stack plots. For example:
fig, ax = plt.subplots()
labels = ['VOL_PCT', 'INVOL_PC']
ax.stackplot(df.FY, df.VOL_PCT, df.INVOL_PC, labels=labels)
ax.legend(loc='upper left')
plt.show()
Ref. Stackplot
Related
I'm trying to create some boxplots of pandas dataframes.
I have dataframes that typically look like this (I wasn't sure there was a good way to show it, so I just took a screenshot).
I am creating a boxplot for each dataframe (after transposing) using the df.boxplot() method, and with the code below it comes out almost exactly how I want it:
import matplotlib.pyplot as plt
ax = crit_storm_df[tp_cols].T.boxplot()
ax.set_xlabel("Duration (m)")
ax.set_ylabel("Max Flow (cu.m/sec)")
ax.set_xlim(0, None)
ax.set_ylim(0, None)
ax.set_title(crit_storm_df.name)
plt.show()
Example pic of output graph. What's lacking, though, is a legend with one entry for each box, i.e. for each column of the dataframe being plotted. Since I transposed the df before plotting, I would like a legend entry for each row, i.e. "tp01", "tp02", etc.
Does anyone know what I should be doing instead? Is there a way to do this through the df.boxplot() method, or do I need to do something in matplotlib?
I tried ax.legend() but it doesn't do anything except give me a warning:
No artists with labels found to put in legend. Note that artists whose label start with an underscore are ignored when legend() is called with no argument.
Any help would be appreciated, thanks!
If you simply want your boxes to have different colors, you can use seaborn, where this is the default behavior (at least in versions before 0.13; newer versions may require setting hue):
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
df = pd.DataFrame(np.random.randn(10, 4),
columns=['Col1', 'Col2', 'Col3', 'Col4'])
ax = sns.boxplot(data=df)
plt.legend(ax.patches, df.columns)
plt.show()
Edit: adding legend
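If you would rather stay in plain matplotlib, `Axes.boxplot(..., patch_artist=True)` returns the box patches themselves, which can be colored individually and passed straight to `legend()` as handles. A minimal sketch with random data standing in for your dataframe; applying it to `crit_storm_df[tp_cols].T` would mean passing that frame's values and column names instead, which is an assumption about its layout.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.default_rng(0).normal(size=(10, 4)),
                  columns=['Col1', 'Col2', 'Col3', 'Col4'])

fig, ax = plt.subplots()
# patch_artist=True draws each box as a fillable Patch, usable as a legend handle
bp = ax.boxplot(df.values, patch_artist=True)

# Give each box its own colour from the default colour cycle
colors = plt.rcParams['axes.prop_cycle'].by_key()['color']
for box, color in zip(bp['boxes'], colors):
    box.set_facecolor(color)

ax.set_xticks(range(1, len(df.columns) + 1))
ax.set_xticklabels(df.columns)
# One legend entry per box, labelled with the matching column name
ax.legend(bp['boxes'], df.columns, loc='upper right')
```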
To get a graph similar to the one you want, this code may help:
import matplotlib.pyplot as plt
import pandas as pd
data = {
'Duration': [10, 15, 20, 25, 30, 45, 60, 90, 120, 180, 270, 360],
'tp01': [13.1738, 13.1662, 14.3903, 14.2772, 14.3223, 12.5686, 14.8710, 8.9785, 9.2224, 7.4957, 3.6493, 5.7982],
'tp02': [13.1029, 14.2570, 16.5373, 12.6589, 11.0455, 12.6777, 8.1715, 9.3830, 8.3498, 6.0930, 6.4310, 7.4538],
'tp03': [14.5263, 13.6724, 11.4800, 13.4982, 12.3987, 11.6688, 10.4089, 7.0736, 5.8004, 10.1354, 5.5874, 5.6749],
'tp04': [14.7589, 11.6993, 12.5825, 13.5627, 11.9481, 10.7803, 8.9388, 5.7076, 12.7690, 9.7546, 9.5004, 5.9912],
'tp05': [15.5543, 14.1007, 11.7304, 13.3218, 12.4318, 9.5237, 11.9014, 5.6778, 14.2627, 3.7422, 6.4555, 3.3458],
'tp06': [13.5196, 12.5939, 12.5679, 11.4414, 9.3590, 9.6083, 9.6704, 10.5239, 9.1028, 6.0336, 7.0258, 5.9800],
'tp07': [14.7476, 13.3925, 13.0324, 13.3649, 14.7832, 8.1078, 7.1307, 15.4406, 5.0187, 6.9497, 3.6492, 4.8642],
'tp08': [13.3995, 14.3639, 12.7579, 10.6844, 10.3281, 10.2541, 8.8257, 8.8773, 8.3498, 5.7315, 7.8469, 6.7316],
'tp09': [16.7954, 17.1788, 15.9850, 10.8780, 12.5249, 10.2174, 7.5735, 7.3753, 7.1157, 4.8536, 9.1581, 5.6369],
'tp10': [15.7671, 16.1570, 11.6122, 15.2340, 13.2356, 13.2270, 11.6810, 7.1157, 8.0048, 5.5782, 6.0876, 5.7982],
}
df = pd.DataFrame(data).set_index("Duration")
fig, ax = plt.subplots()
df.T.plot(kind='box', ax=ax)
labels = df.columns
lines = [plt.Line2D([0, 1], [0, 1], color=c, marker='o', markersize=10) for c in plt.rcParams['axes.prop_cycle'].by_key()['color'][:len(labels)]]
ax.legend(lines, labels, loc='best')
ax.set_xlabel("Duration (m)")
ax.set_ylabel("Max Flow (cu.m/sec)")
ax.set_xlim(0, None)
ax.set_ylim(0, None)
ax.set_xticklabels(df.index)
plt.show()
I have created a barplot of the number of people born on given days of the year (figure a). I want to set the x-axis of my seaborn barplot to xlim = (0, 365) to show the whole year.
But once I call ax.set_xlim(0, 365), the bar plot is simply moved to the left (figure b).
This is the code:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
#data
df = pd.DataFrame()
df['day'] = np.arange(41,200)
df['born'] = np.random.randn(159)*100
#plot
f, axes = plt.subplots(4, 4, figsize = (12,12))
# hue=df.time was dropped here: this example df has no 'time' column
ax = sns.barplot(data=df, x='day', y='born', ax=axes[0,0], color='skyblue')
ax.get_xaxis().set_label_text('')
ax.set_xticklabels([])
ax.set_yscale('log')
ax.set_ylim(1,10e3)  # a log-scaled axis cannot start at 0
ax.set_xlim(0,366)
ax.set_title('SE Africa')
How can I set the x-axes limits to day 0 and 365 without the bars being shifted to the left?
IIUC, given the nature of the data, the expected output is hard to obtain directly, because, as per the documentation of seaborn.barplot:
This function always treats one of the variables as categorical and draws data at ordinal positions (0, 1, … n) on the relevant axis, even when the data has a numeric or date type.
This means seaborn.barplot creates categories from the values in x (here, df.day) and draws them at integer positions starting from 0.
Therefore, even though the data starts at day 41, seaborn draws the first category at x = 0, which makes it difficult to adjust the lower limit of the x-axis after the call.
The following code and the corresponding plot illustrate this:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
# data
rng = np.random.default_rng(101)
day = np.arange(41,200)
born = rng.integers(low=0, high=10e4, size=200-41)
df = pd.DataFrame({"day":day, "born":born})
# plot
f, ax = plt.subplots(figsize=(4, 4))
sns.barplot(data=df, x='day', y='born', ax=ax, color='b')
ax.set_xlim(0,365)
ax.set_xticks(ticks=np.arange(0, 365, 30), labels=np.arange(0, 365, 30))
ax.set_yscale('log')
ax.set_title('SE Africa')
plt.tight_layout()
plt.show()
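The ordinal-position idea can be illustrated with pandas' own categorical codes. This is only an analogy, not seaborn's internal code: each distinct day becomes a category numbered from 0, so day 41 lands in slot 0. If the categories instead covered the whole year, slot number and day number would coincide. I believe passing order=np.arange(366) to sns.barplot relies on the same idea, but treat that as an untested assumption.

```python
import numpy as np
import pandas as pd

day = np.arange(41, 200)

# Each distinct day becomes a category numbered from 0, like the slots of a categorical axis
positions = pd.Categorical(day).codes
print(day[0], "->", positions[0])  # 41 -> 0

# If the categories covered the whole year, slot number and day number would coincide
full_year = pd.Categorical(day, categories=np.arange(0, 366)).codes
print(day[0], "->", full_year[0])  # 41 -> 41
```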
I suggest using matplotlib.axes.Axes.bar to overcome this issue, although coloring the bars by group is less straightforward than with sns.barplot(..., hue=..., ...):
# plot
f, ax = plt.subplots(figsize=(4, 4))
ax.bar(x=df.day, height=df.born) # instead of sns.barplot
ax.get_xaxis().set_label_text('')
ax.set_xlim(0,365)
ax.set_yscale('log')
ax.set_title('SE Africa')
plt.tight_layout()
plt.show()
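Coloring the bars by group with Axes.bar takes a little manual work: map each group to a color, pass the per-bar colors to ax.bar, and build the legend from Patch handles. A minimal sketch; the `region` column and its split at day 120 are hypothetical, standing in for whatever hue variable you have.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
from matplotlib.patches import Patch
import numpy as np
import pandas as pd

rng = np.random.default_rng(101)
day = np.arange(41, 200)
# 'region' is a hypothetical grouping column playing the role of sns.barplot's hue
df = pd.DataFrame({"day": day,
                   "born": rng.integers(low=1, high=100_000, size=day.size),
                   "region": np.where(day < 120, "north", "south")})

palette = {"north": "tab:blue", "south": "tab:orange"}  # one colour per group

fig, ax = plt.subplots(figsize=(4, 4))
# Bars sit at their real day numbers, so set_xlim(0, 365) behaves as expected
bars = ax.bar(x=df.day, height=df.born, color=df["region"].map(palette).tolist())
ax.set_xlim(0, 365)
ax.set_yscale('log')
# ax.bar has no hue grouping, so build the legend entries by hand
ax.legend(handles=[Patch(color=c, label=name) for name, c in palette.items()])
```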
I'm trying to programmatically create a combination of line plots and bar plots inside a series of subplots using a for loop in Python.
Here is a reproducible version of the code I have tried so far:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
df_q = pd.DataFrame.from_dict({'Year': [2020,2021,2022,2020,2021,2022,2020,2021,2022],
                               'Category': ['A','A','A','B','B','B','C','C','C'],
                               'Requested': [5,8,3,15,6,9,11,14,9],
                               'Allowed': [5,6,2,10,6,7,5,11,9]})
df_a = pd.DataFrame.from_dict({'Year': [2020,2021,2022,2020,2021,2022,2020,2021,2022],
                               'Category': ['A','A','A','B','B','B','C','C','C'],
                               'Amount': [4,10,3,12,6,11,7,5,12]})
df_qt = df_q.melt(id_vars=['Year', 'Category'],
                  value_vars=['Requested', 'Allowed'], ignore_index=True)
catgs = df_a['Category'].unique()
fig, ax = plt.subplots(nrows=len(catgs), figsize=(20, len(catgs)*6))
for i, cat in enumerate(catgs):
    filt_q = df_qt.loc[df_qt['Category'] == cat]
    filt_a = df_a.loc[df_a['Category'] == cat]
    sns.barplot(data=filt_q, x='Year', y='value', hue='variable', alpha=0.7, ax=ax[i])
    sns.lineplot(data=filt_a['Amount'], linewidth=1.5, markers=True,
                 marker="o", markersize=10, color='darkred', label='Amount', ax=ax[i])
    ax[i].set_title('Category: {}'.format(cat), size=25)
However, from the second subplot onward, the line plot is shifted to the right of the bar plot. See the screenshot below.
Any idea what I'm doing wrong?
Try this:
filt_a = df_a.loc[df_a['Category'] == cat].reset_index()
The drift you see in categories B and C is due to how the x-coordinates are assigned in the two plots:
# The x-coordinates are not 2020, 2021, 2022.
# They are 0, 1, 2. The *x-labels* are 2020, 2021, 2022
sns.barplot(data=filt_q, x='Year', y='value', ...)
# The x-coordinates are the index of the filt_a dataframe
# Calling reset_index resets it to 0, 1, 2, ...
sns.lineplot(data=filt_a['Amount'], ...)
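A variant of the reset_index fix that does not depend on the row order of filt_a: map each year to the slot it occupies on the bar plot's categorical axis, and plot the line at those positions. This is a sketch; `bar_positions` is a hypothetical helper, and it assumes the bars are drawn in sorted-year order (passing `order=order` to sns.barplot would make that explicit).

```python
import pandas as pd

df_a = pd.DataFrame.from_dict({'Year': [2020, 2021, 2022, 2020, 2021, 2022, 2020, 2021, 2022],
                               'Category': ['A', 'A', 'A', 'B', 'B', 'B', 'C', 'C', 'C'],
                               'Amount': [4, 10, 3, 12, 6, 11, 7, 5, 12]})

def bar_positions(years, order):
    """Hypothetical helper: map each year to its ordinal slot (0, 1, 2, ...) on a categorical axis."""
    slot = {year: i for i, year in enumerate(order)}
    return years.map(slot)

# Assumed bar order: sorted years (pass order=order to sns.barplot to pin it down)
order = sorted(df_a['Year'].unique())

filt_a = df_a.loc[df_a['Category'] == 'B']
x = bar_positions(filt_a['Year'], order)
print(filt_a.index.tolist())  # [3, 4, 5]: what sns.lineplot(data=filt_a['Amount']) uses as x
print(x.tolist())             # [0, 1, 2]: where the bars are drawn
```

The resulting x can then be used as, e.g., ax[i].plot(x, filt_a['Amount'], marker='o', color='darkred') so the line sits over the matching bars.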
I was trying to use matplotlib and pandas to create a bar chart that shows the daily change in COVID-19 cases for the USA.
import pandas as pd
import matplotlib.pyplot as plt
data = pd.read_csv('https://raw.githubusercontent.com/nytimes/covid-19-data/master/us-counties.csv')
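print(data.head(10))  # display() is only available inside Jupyter/IPython
# sum only the numeric 'cases' column instead of every column (county/state are text)
df = data.groupby('date')[['cases']].sum()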
df['index'] = range(len(df))
df['IsChanged'] = df['cases'].diff()
df.at['2020-01-21', 'IsChanged'] = 0.0
x = df['index']
z = df['IsChanged']
plt.figure(figsize=(20,10))
plt.grid(linestyle='--')
plt.bar(x,z)
plt.show()
The graph that I get, though, looks like this: [image: bar chart of the daily change in cases]
The widths of the bars are not even. I tried setting a specific width, but that didn't work. Is there a way to fix this?
This can be resolved by increasing the resolution, for example by setting dpi=300 when creating the figure. The graph in this answer shows the output of your code with that DPI specified:
plt.figure(figsize=(20,10),dpi=300)
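Why resolution helps, as I understand it: with many bars in the figure, each bar is only a pixel or two wide, and the renderer snaps bar edges to whole pixels, so neighboring bars come out 1 or 2 pixels wide. A back-of-the-envelope sketch of the width of one bar in pixels; the count of roughly 1,000 bars is an assumption (not taken from the data), and axis margins are ignored.

```python
def bar_width_px(n_bars, fig_width_in, dpi, bar_width=0.8, axes_fraction=0.775):
    """Approximate on-screen width of one bar, in pixels.

    bar_width=0.8 is plt.bar's default width for bars spaced 1 unit apart;
    axes_fraction=0.775 is 0.9 - 0.125, matplotlib's default right/left subplot margins.
    Axis padding is ignored, so this is only a rough estimate.
    """
    axes_px = fig_width_in * dpi * axes_fraction
    return bar_width * axes_px / n_bars

# Assumed: about 1,000 daily bars in a 20-inch-wide figure
for dpi in (100, 300):
    print(dpi, round(bar_width_px(1000, 20, dpi), 2))
```

At the default 100 dpi each bar is about 1.24 px wide, so rounding to whole pixels visibly changes widths; at 300 dpi it is about 3.72 px, and a one-pixel rounding difference is much less noticeable.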
While this does not directly answer your question, you might consider plt.fill_between() when a bar plot has many bars (if you cannot distinguish the bars from each other, the bar plot loses much of its purpose).
For example:
plt.fill_between(x, 0, z, step='mid', facecolor=(0.3, 0.3, 0.45, .4), edgecolor=(0, 0, 0, 1))
plt.grid(ls=':', color='#6e6e6e', lw=0.5)
or even:
plt.fill_between(x, 0, z, facecolor=(0.3, 0.3, 0.45, .4), edgecolor=(0, 0, 0, 1))
plt.grid(ls=':', color='#6e6e6e', lw=0.5)
This question is related to a previous question I posted here. My code for my seaborn scatterplot looks as follows:
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
df = pd.DataFrame()
df['First PCA dimension'] = [1,2,3,4]
df['Second PCA dimension'] = [0,5,5,7]
df['Third PCA dimension'] = [1,2,6,4]
df['Data points'] = [1,2,3,4]
plt.figure(figsize=(42,30))
plt.title('2-D PCA of my data points',fontsize=32)
colors = ["#FF9926", "#2ACD37","#FF9926", "#FF0800"]
b = sns.scatterplot(x="First PCA dimension", y="Second PCA dimension", hue="Data points", palette=sns.color_palette(colors), data=df, legend="full", alpha=0.3)
sns.set_context("paper", rc={"font.size":48,"axes.titlesize":48,"axes.labelsize":48})
b.set_ylabel('mylabely', size=54)
b.set_xlabel('mylabelx', size=54)
b.set_xticklabels([1,2,3,4,5,6,7,8], fontsize = 36)
lgnd = plt.legend(fontsize='22')
for handle in lgnd.legend_handles:  # named legendHandles before matplotlib 3.7
    handle.set_sizes([26.0])
plt.show()
The alpha value of 0.3 sets a transparency value for each point in my scatterplot. However, I would like to have a different transparency value for each data point (based on the category it belongs to) instead. Is this possible by providing a list of alpha values, similar to the way I provide a list of colours in the example above?
As noted in comments, this is something you can't currently do with seaborn.
However, you can hack it by using key colours for the markers, and find-replacing those colours using PathCollection.get_facecolor() and PathCollection.set_facecolor() with RGBA colours.
For example, I needed a swarmplot on top of a violinplot, with certain classes of points at different opacities. To turn mid-grey points into 50% black and white points fully transparent (what I needed to do), we can do:
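import numpy
import seaborn
from matplotlib.collections import PathCollection

seaborn.violinplot(...)
points = seaborn.swarmplot(...)  # points drawn in the key colours (mid-grey, white)
for c in points.collections:
    # scatter-style markers are PathCollections; skip other collections
    if not isinstance(c, PathCollection):
        continue
    fc = c.get_facecolor()
    if fc.shape[1] == 4:
        for i, r in enumerate(fc):
            # change mid-grey to 50% black
            if numpy.array_equiv(r, numpy.array([0.5, 0.5, 0.5, 1])):
                fc[i] = numpy.array([0, 0, 0, 0.5])
            # change white to transparent
            elif numpy.array_equiv(r, numpy.array([1, 1, 1, 1])):
                fc[i] = numpy.array([0, 0, 0, 0])
        c.set_facecolor(fc)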
It's an ugly hack, but it got me what I needed for a one-off graphic.
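A possible alternative to the colour-replacement hack: matplotlib's own Axes.scatter accepts one RGBA colour per point, so per-category opacity can be set directly by drawing the points with matplotlib instead of seaborn. A minimal sketch reusing the question's data and colours; the alpha value chosen for each category is a made-up assumption.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
from matplotlib.colors import to_rgba_array
import pandas as pd

df = pd.DataFrame({'First PCA dimension': [1, 2, 3, 4],
                   'Second PCA dimension': [0, 5, 5, 7],
                   'Data points': [1, 2, 3, 4]})
colors = {1: "#FF9926", 2: "#2ACD37", 3: "#FF9926", 4: "#FF0800"}
alphas = {1: 0.2, 2: 1.0, 3: 0.6, 4: 0.4}  # hypothetical per-category opacities

# One RGBA row per point; the 4th column carries that point's alpha
rgba = to_rgba_array(df['Data points'].map(colors).tolist())
rgba[:, 3] = df['Data points'].map(alphas).to_numpy()

fig, ax = plt.subplots()
sc = ax.scatter(df['First PCA dimension'], df['Second PCA dimension'], c=rgba)
ax.set_xlabel('First PCA dimension')
ax.set_ylabel('Second PCA dimension')
```

The legend then has to be built by hand (for example from one empty scatter per category), which is the price of leaving seaborn's hue machinery.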