Multiple single plots in seaborn with pandas groupby data

My issue is probably very specific, but I can't find a proper solution and the error output leaves me clueless.
I have a pandas dataframe loaded from an SQLite database.
data_frame = pd.read_sql_query(
    "SELECT (total_comb + total_comb_rc) as total_comb, p_val, w_length from {tn}".format(
        tn=table_name), conn)
With that loaded, I group the data by the 'w_length' value.
for i, group in data_frame.groupby('w_length'):
Now I want to draw a scatter plot for each group with seaborn's lmplot.
for i, group in data_frame.groupby('w_length'):
    sns.lmplot(x=group['total_comb'], y=group['p_val'],
               data=group,
               fit_reg=False)
    sns.despine()
    plt.savefig('test_scatter'+i+'.png', dpi=400)
But for some reason I'm getting this output:
'[ 6.95485628e-02 3.53641178e-01 3.46862200e+06 4.11684800e+06] not in index'
and no plot file.
I know I'm doing something wrong, but I can't figure out what.
PS: I know I can do something like this:
sns.lmplot(x='total_comb', y='p_val',
           data=data_frame,
           fit_reg=False,
           hue="w_length", x_jitter=.1, col="w_length", col_wrap=3, size=4)
but I also need a separate plot for each 'w_length'.
Thanks!!

Assuming the problem does not come from loading the data from the SQL database, it is probably caused by calling
sns.lmplot(x=group['total_comb'], y=group['p_val'], data=group)
instead of
sns.lmplot(x='total_comb', y='p_val', data=group)
Here is a working example, which produces two separate plots:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np; np.random.seed(42)
x = np.arange(24)
y = np.random.randint(1,10, len(x))
cat = np.random.choice(["A", "B"], size=len(x))
df = pd.DataFrame({"x": x, "y": y, "cat": cat})
for i, group in df.groupby('cat'):
    sns.lmplot(x="x", y="y", data=group, fit_reg=False)
    plt.savefig(__file__+str(i)+".png")
plt.show()
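As a rough illustration of where the "not in index" message may come from: the x and y arguments appear to be used as column labels to select from data, so passing the column values instead of the column names asks pandas for columns that do not exist. The sketch below reproduces that lookup with plain pandas only. The two-row frame is made up for illustration, and the exact wording of the KeyError differs between pandas versions.

```python
import pandas as pd

# Hypothetical two-row stand-in for one group of the question's dataframe
group = pd.DataFrame({
    "total_comb": [3468622.0, 4116848.0],
    "p_val": [0.0695485628, 0.353641178],
})

# Selecting by column NAME works
by_name = group[["total_comb", "p_val"]]

# Selecting by column VALUES fails: the numbers are not column labels
values_as_labels = group["p_val"].tolist() + group["total_comb"].tolist()
try:
    group[values_as_labels]
    lookup_failed = False
except KeyError as err:
    lookup_failed = True
    print("KeyError:", err)
```

Note also that the group key i is whatever type the grouping column holds, so wrapping it in str(i) when building the file name, as in the working example above, avoids a separate TypeError for numeric keys.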

Related

Python vs matplotlib - Chart generation issue

I have the Python code below, but it produces a chart like the one in the attachment, which is really messy. Can anybody tell me how to fix this and put the days in ascending order on the x-axis?
import pandas as pd
import matplotlib.pyplot as plt
df = pd.read_excel("C/desktop/data.xlsx")
df = df.loc[df['month'] == 8]
df = df.astype({'day': str})
plt.plot( 'day', 'cases', data=df)
At first I did not convert day to str, so the chart came out like this.
Because the axis then showed decimal numbers, I converted the column to str, and now this happens.

What you got is typical of an unsorted dataset with many points per group.
As you did not provide an example, here is one:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
np.random.seed(0)
df = pd.DataFrame({'day': np.random.randint(1,21,size=100),
                   'cases': np.random.randint(0,50000,size=100),
                   })
plt.plot('day', 'cases', data=df)
There is no reason to plot a line in this case, you can use a scatter plot instead:
plt.scatter('day', 'cases', data=df)
To make more sense of your data, you can also compute an aggregated value per day (e.g. the mean):
plt.plot('day', 'cases', data=df.groupby('day', as_index=False)['cases'].mean())
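For the ascending order on the x-axis, one option is to keep the day column numeric (integers rather than strings) and sort by it before drawing the line. This is only a sketch: it assumes the day values in the question are whole numbers stored as floats, and it uses random data as a stand-in for the Excel file.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame({
    # floats on purpose, to mimic the decimal day values from the question
    "day": rng.integers(1, 21, size=100).astype(float),
    "cases": rng.integers(0, 50000, size=100),
})

# Convert to integers instead of strings so the axis stays numeric
df["day"] = df["day"].astype(int)

# One mean value per day, explicitly sorted in ascending day order
daily = df.groupby("day", as_index=False)["cases"].mean().sort_values("day")

fig, ax = plt.subplots()
ax.plot("day", "cases", data=daily)
ax.set_xlabel("day")
ax.set_ylabel("mean cases")
```

With a numeric, sorted x column the line runs left to right once instead of jumping back and forth between days.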

How to make clustered heatmap of a large dataset look nicer?

I have a distance matrix which I normalized, trimmed the row and column headers with python regular expressions and tried to make a clustered heatmap from it with the following code:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
df = pd.read_csv('distance_matrix_Mult_Align(distance).csv', index_col=0)
row_sums = df.sum(axis=1)
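# convert to a NumPy array first: recent pandas versions reject row_sums[:, np.newaxis] on a Series
new_matrix = df / row_sums.to_numpy()[:, np.newaxis]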
def acc_id(s):
    import re
    # raw string so the backslashes reach the regex engine unchanged
    match = re.search(r'\|(.*)\|', s)
    if match:
        return match.group(1)
sns.clustermap(new_matrix.rename(columns=acc_id, index=acc_id),
               row_cluster=False,
               xticklabels=True,
               yticklabels=True,
               cmap='RdBu',
               center=0,
               vmin=0,
               vmax=1)
plt.figure()
plt.show()
My clustered map looks like this:
I have tried to read the documentation of clustermap and pyplot: https://seaborn.pydata.org/generated/seaborn.clustermap.html
https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.figure.html#matplotlib.pyplot.figure
But I cannot work out how to make the plot look useful. I would really appreciate any help. Thanks!

The problem is your vmax=1 argument. The maximum value in the whole dataset, which you can get with new_matrix.max().max(), is only about 0.17.
So either remove vmax or set it to a lower value.
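A small sketch of how the upper colour limit could be taken from the data instead of being hard-coded. The random 6x6 frame is only a hypothetical stand-in for the distance matrix, so the printed maximum will not be the 0.17 mentioned above.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
# Hypothetical stand-in for the distance matrix read from the CSV file
df = pd.DataFrame(rng.random((6, 6)))

# Row-normalise as in the question: every row sums to 1 afterwards
row_sums = df.sum(axis=1)
new_matrix = df.div(row_sums, axis=0)

# Largest value in the whole matrix, a data-driven candidate for vmax
data_max = new_matrix.max().max()
print(data_max)
```

The value can then be passed as vmax=data_max in the clustermap call, or vmax can simply be left out so the colour range is derived from the data.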

Changing the order of pandas/matplotlib line plotting without changing data order

Given the following example:
df = pd.DataFrame(np.random.randint(1,10, size=(8,3)), columns=list('XYZ'))
df.plot(linewidth=10)
The order of plotting puts the last column on top:
How can I make this keep the data & legend order but change the behaviour so that it plots X on top of Y on top of Z?
(I know I can change the data column order and edit the legend order but I am hoping for a simpler easier method leaving the data as is)
UPDATE: final solution used:
(Thanks to r-beginners) I used the get_lines to modify the z-order of each plot
df = pd.DataFrame(np.random.randint(1,10, size=(8,3)), columns=list('XYZ'))
fig = plt.figure()
ax = fig.add_subplot(111)
df.plot(ax=ax, linewidth=10)
lines = ax.get_lines()
for i, line in enumerate(lines, -len(lines)):
    line.set_zorder(abs(i))
fig
In a notebook produces:

Get the default zorder and sort it in the desired order.
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
np.random.seed(2021)
df = pd.DataFrame(np.random.randint(1,10, size=(8,3)), columns=list('XYZ'))
ax = df.plot(linewidth=10)
l = ax.get_children()
print(l)
l[0].set_zorder(3)
l[1].set_zorder(1)
l[2].set_zorder(2)
Before definition
After defining zorder

I will just put this answer here because it is a solution to the problem, but probably not the one you are looking for.
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
# generate data
df = pd.DataFrame(np.random.randint(1,10, size=(8,3)), columns=list('XYZ'))
# read columns in reverse order and plot them
# so normally, the legend will be inverted as well, but if we invert it again, you should get what you want
df[df.columns[::-1]].plot(linewidth=10, legend="reverse")
Note that in this example, you don't change the order of your data, you just read it differently, so I don't really know if that's what you want.
You can also make it easier on the eyes by wrapping this in a function.
def plot_dataframe(df: pd.DataFrame) -> None:
    df[df.columns[::-1]].plot(linewidth=10, legend="reverse")
# then you just have to call this
df = pd.DataFrame(np.random.randint(1,10, size=(8,3)), columns=list('XYZ'))
plot_dataframe(df)

Pandas: grouping and xticks in plots

I'm working with some data which has the following structure:
I want to group the data by column B and get the mean values, to plot and compare them.
sub_data = data_composite.groupby(['B']).aggregate(np.mean)
ax = sub_data.plot()
Obtaining:
However, I would like the figure to show the corresponding xticks, i.e. KP40, KP08, etc. Something like this:
Is there any way to do that?
Thank you very much. Kind regards,

This should work for you:
import numpy as np
import matplotlib.pyplot as plt
tmp_labels = data_composite.drop_duplicates(subset='B', keep = 'first')
xlabels = tmp_labels['B'].values
plt.xticks(np.arange(sub_data.shape[0]),list(xlabels), rotation=90)

I found out this can be done more easily by passing the index of the grouped dataframe to plt.xticks.
sub_data = data_composite.groupby(['B']).aggregate(np.mean)
ax = sub_data.plot()
# x and y are assumed to hold the axis-label strings (not defined in this snippet)
ax.set_xlabel(x)
ax.set_ylabel('Quantity of {}'.format(y))
plt.xticks(np.arange(sub_data.shape[0]), list(sub_data.index), rotation=90);
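A self-contained version of the same idea, in case it helps to try it out. The data_composite frame and its value column are made up for illustration (the real data is only shown as an image in the question), and the sketch relies on pandas drawing a line plot over a string index at positions 0, 1, 2, ...

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical stand-in for data_composite
data_composite = pd.DataFrame({
    "B": ["KP40", "KP08", "KP40", "KP12", "KP08", "KP12"],
    "value": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
})

# groupby sorts the group keys, so the index is KP08, KP12, KP40
sub_data = data_composite.groupby("B").mean()

ax = sub_data.plot()
# One tick per group, labelled with the group key taken from the index
ax.set_xticks(np.arange(len(sub_data)))
ax.set_xticklabels(list(sub_data.index), rotation=90)
```

Taking the labels from sub_data.index keeps them in the same order as the plotted means, whereas labels built from the unsorted original column can end up in a different order than the sorted groups.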

Plot stacked bar chart from pandas data frame

I have a dataframe:
payout_df.head(10)
What would be the easiest, smartest and fastest way to replicate the following excel plot?
I've tried different approaches, but couldn't get everything into place.
Thanks

If you just want a stacked bar chart, then one way is to use a loop to plot each column in the dataframe and just keep track of the cumulative sum, which you then pass as the bottom argument of pyplot.bar
import pandas as pd
import matplotlib.pyplot as plt
# If it's not already a datetime
payout_df['payout'] = pd.to_datetime(payout_df.payout)
cumval=0
fig = plt.figure(figsize=(12,8))
for col in payout_df.columns[~payout_df.columns.isin(['payout'])]:
    plt.bar(payout_df.payout, payout_df[col], bottom=cumval, label=col)
    cumval = cumval+payout_df[col]
_ = plt.xticks(rotation=30)
_ = plt.legend(fontsize=18)
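Since the question's dataframe is only shown as an image, here is the same cumulative-bottom loop on a tiny made-up frame. The columns type_a and type_b are hypothetical, and the dates are formatted as month strings purely to get evenly spaced, readable bars.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for payout_df
payout_df = pd.DataFrame({
    "payout": pd.to_datetime(["2020-01-31", "2020-02-29", "2020-03-31"]),
    "type_a": [10, 20, 30],
    "type_b": [5, 15, 25],
})

# Month labels used as categorical x positions
months = payout_df["payout"].dt.strftime("%Y-%m")

fig, ax = plt.subplots(figsize=(12, 8))
bottom = pd.Series(0, index=payout_df.index)  # running total per bar
for col in payout_df.columns.drop("payout"):
    # Each column starts where the previous columns ended
    ax.bar(months, payout_df[col], bottom=bottom, label=col)
    bottom = bottom + payout_df[col]
ax.legend()
```

If the default styling is enough, payout_df.set_index("payout").plot(kind="bar", stacked=True) should give a similar chart without the explicit loop.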

Even without sample data to test on, I think the following code will produce the desired graph:
import pandas as pd
import matplotlib.pyplot as plt
df.payout = pd.to_datetime(df.payout)
grouped = df.groupby(pd.Grouper(key='payout', freq='M')).sum()
grouped.plot(x=grouped.index.year, kind='bar', stacked=True)
plt.show()
I don't know how to reproduce this fancy x-axis style. Also, your payout column must be a datetime, otherwise pd.Grouper with a frequency won't work (see the available frequencies in the pandas documentation).
