Change line shape to aggregated plot - python

I'm attempting to make a plot of net sentiment score by genre over time. I currently have a line plot that works, but I'd like to make each genre a different color/shape. However, I have about 12 different genres and I'd prefer to not manually assign a color/shape to each genre. Is there a way for matplotlib to dynamically assign a unique combination to each genre? Here's my current code.
# plot net sentiment per year by genre
fig, ax = plt.subplots(figsize=(15,7))
genre_known.groupby(['year', 'genre'])['net_sentiment'].mean().unstack().plot(ax=ax)
plt.xlabel('Year')
plt.ylabel('Sentiment')
plt.title('Sentiment by Year')

You can use a styling dictionary like this:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
df_tips = sns.load_dataset('tips')
fig, ax = plt.subplots(figsize=(15,7))
style_dic = {k:v for k,v in zip(df_tips['day'].unique(),['ro-','b^-','gs-','m+-'])}
df_tips.groupby(['sex','day'])['total_bill'].mean().unstack().plot(ax=ax, style=style_dic);
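If you'd rather not type out a style per genre, one option is to generate the format strings by cycling through a list of colors and a list of markers. This is only a sketch: the genre_known frame below is made-up stand-in data, and make_styles is a hypothetical helper name, not part of matplotlib or pandas. With 6 colors and 5 markers, the color/marker pairs only start repeating after 30 labels.

```python
from itertools import cycle

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Made-up stand-in for the asker's genre_known frame (12 genres, 10 years).
rng = np.random.default_rng(0)
genres = [f"genre_{i}" for i in range(12)]
genre_known = pd.DataFrame({
    "year": np.repeat(np.arange(2000, 2010), len(genres)),
    "genre": np.tile(genres, 10),
    "net_sentiment": rng.normal(size=10 * len(genres)),
})


def make_styles(labels, colors="bgrcmk", markers="o^sDv"):
    """Map each label to a matplotlib format string such as 'bo-'.

    Hypothetical helper: colors and markers are cycled in parallel, so the
    pairs stay unique for up to lcm(len(colors), len(markers)) labels.
    """
    pairs = zip(cycle(colors), cycle(markers))
    return {label: f"{color}{marker}-" for label, (color, marker) in zip(labels, pairs)}


# One column per genre, one row per year.
pivot = genre_known.groupby(["year", "genre"])["net_sentiment"].mean().unstack()
styles = make_styles(pivot.columns)

fig, ax = plt.subplots(figsize=(15, 7))
pivot.plot(ax=ax, style=styles)
ax.set_xlabel("Year")
ax.set_ylabel("Sentiment")
ax.set_title("Sentiment by Year")
```

The lengths 6 and 5 are chosen on purpose: because they share no common factor, both the color and the marker change from one genre to the next.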

Related

How to plot labels in alphabetical order with a colour gradient in Python?

I want to plot the "nutrition_grade_fr" labels in alphabetical order with a colour gradient in Python.
Here is the code for my graph, but it shows the labels neither in alphabetical order nor with the corresponding hue:
import seaborn as sns
import matplotlib.pyplot as plt
df = nutri_app
for elem in ['energy_100g', 'sugars_100g','fat_100g','proteins_100g']:
    # Create barplot
    fig, ax = plt.subplots(figsize=(30,6))
    sns.barplot(x="pnns_group1", y=elem, hue="nutrition_grade_fr", data=df)
    # Add title
    ax.set_title(f"nutriscore by categories related to {elem} quantity variable")
    # Show plot
    plt.show()
You need to sort the data with pandas.DataFrame.sort_values and choose a color palette (e.g., Blues):
import seaborn as sns
import matplotlib.pyplot as plt
df = nutri_app.sort_values(by='nutrition_grade_fr') #ascending=True by default
for elem in ['energy_100g', 'sugars_100g','fat_100g','proteins_100g']:
    # Create barplot
    fig, ax = plt.subplots(figsize=(30,6))
    sns.barplot(x="pnns_group1", y=elem, hue="nutrition_grade_fr", data=df, palette='Blues')
    # Add title
    ax.set_title(f"nutriscore by categories related to {elem} quantity variable")
    # Show plot
    plt.show()
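If you would rather not reorder the DataFrame, another option is to pass the sorted grades explicitly through seaborn's hue_order parameter. A minimal sketch, assuming a small made-up frame in place of nutri_app (the values below are invented for illustration):

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Made-up stand-in for nutri_app, with the grades deliberately out of order.
df = pd.DataFrame({
    "pnns_group1": ["Snacks", "Snacks", "Drinks", "Drinks", "Dairy", "Dairy"],
    "nutrition_grade_fr": ["e", "b", "d", "a", "c", "a"],
    "sugars_100g": [40.0, 12.0, 25.0, 3.0, 15.0, 5.0],
})

# Alphabetical list of the grades that actually occur in the data.
hue_order = sorted(df["nutrition_grade_fr"].dropna().unique())

fig, ax = plt.subplots(figsize=(10, 4))
sns.barplot(x="pnns_group1", y="sugars_100g", hue="nutrition_grade_fr",
            hue_order=hue_order, palette="Blues", data=df, ax=ax)
ax.set_title("nutriscore by categories related to sugars_100g quantity variable")

# The legend entries follow hue_order, so they come out alphabetically.
legend_labels = [text.get_text() for text in ax.get_legend().get_texts()]
```

With a sequential palette such as Blues, the colors are assigned in hue_order, so grade a gets the lightest shade and grade e the darkest.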

Annotate Min/Max/Median in Matplotlib Violin Plot

Given this example code:
import pandas as pd
import matplotlib.pyplot as plt
data = 'https://raw.githubusercontent.com/marsja/jupyter/master/flanks.csv'
df = pd.read_csv(data, index_col=0)
# Subsetting using Pandas query():
congruent = df.query('TrialType == "congruent"')['RT']
incongruent = df.query('TrialType == "incongruent"')['RT']
# Combine data
plot_data = list([incongruent, congruent])
fig, ax = plt.subplots()
xticklabels = ['Incongruent', 'Congruent']
ax.set_xticks([1, 2])
ax.set_xticklabels(xticklabels)
ax.violinplot(plot_data, showmedians=True)
Which results in the following plot:
How can I annotate the min, max, and median lines with their respective values?
I haven't been able to find examples online that show how to annotate violin plots in this way. If we set plot = ax.violinplot(plot_data, showmedians=True) then we can access attributes like plot['cmaxes'], but I can't quite figure out how to use that for annotations.
Here is an example of what I am trying to achieve:
So this was as easy as getting the medians/mins/maxes and then enumerating, adding the annotation with plt.text, and adding some small values for positioning:
medians = results_df.groupby(['model_cat'])['test_f1'].median()
for i, v in enumerate(medians):
    plt.text((i+.85), (v+.001), str(round(v, 3)), fontsize = 12)
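Applied to the two-violin layout from the question, the same idea might look like the sketch below. It assumes randomly generated reaction times instead of the flanks.csv data, and the 0.15 horizontal offset is an arbitrary choice for positioning. The statistics are computed with NumPy rather than read back from plot['cmaxes'].

```python
import matplotlib.pyplot as plt
import numpy as np

# Made-up reaction times standing in for the incongruent/congruent RT columns.
rng = np.random.default_rng(1)
plot_data = [rng.normal(0.6, 0.1, 200), rng.normal(0.5, 0.1, 200)]

fig, ax = plt.subplots()
ax.violinplot(plot_data, showmedians=True)
ax.set_xticks([1, 2])
ax.set_xticklabels(["Incongruent", "Congruent"])

# violinplot draws the violins at x = 1, 2, ... by default,
# so enumerate from 1 to line the labels up with them.
annotations = []
for pos, values in enumerate(plot_data, start=1):
    for stat in (np.min(values), np.median(values), np.max(values)):
        ax.text(pos + 0.15, stat, f"{stat:.3f}", va="center", fontsize=10)
        annotations.append((pos, float(stat)))
```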

How can I put labels in two charts using matplotlib

I'm trying to plot two histograms from the result of a groupby, but the axis labels appear on only one of the charts.
How can I put the labels on both charts?
And how can I give the charts different titles (e.g. the first as Men's grade and the second as Women's grade)?
import pandas as pd
import matplotlib.pyplot as plt
microdataEnem = pd.read_csv('C:\\Users\\Lucas\\AppData\\Local\\Programs\\Python\\Python39\\Scripts\\Data Science\\Data Analysis\\Projects\\ENEM\\DADOS\\MICRODADOS_ENEM_2019.csv', sep = ';', encoding = 'ISO-8859-1', nrows=10000)
sex_essaygrade = ['TP_SEXO', 'NU_NOTA_REDACAO']
filter_sex_essaygrade = microdataEnem.filter(items = sex_essaygrade)
filter_sex_essaygrade.dropna(subset = ['NU_NOTA_REDACAO'], inplace = True)
filter_sex_essaygrade.groupby('TP_SEXO').hist()
plt.xlabel('Grade')
plt.ylabel('Number of students')
plt.show()
Instead of using filter_sex_essaygrade.groupby('TP_SEXO').hist() you can try the following format: axs = filter_sex_essaygrade['NU_NOTA_REDACAO'].hist(by=filter_sex_essaygrade['TP_SEXO']). This will automatically title each histogram with the group name.
You'll want to assign the returned array of axes to a variable (axs here) so that you can modify the x and y labels for both plots.
I created some data similar to yours, and I get the following result:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
np.random.seed(42)
sex_essaygrade = ['TP_SEXO', 'NU_NOTA_REDACAO']
## create two distinct sets of grades
sample_grades = np.concatenate((np.random.randint(low=70,high=100,size=100), np.random.randint(low=80,high=100,size=100)))
filter_sex_essaygrade = pd.DataFrame({
    'NU_NOTA_REDACAO': sample_grades,
    'TP_SEXO': ['Men']*100 + ['Women']*100
})
axs = filter_sex_essaygrade['NU_NOTA_REDACAO'].hist(by=filter_sex_essaygrade['TP_SEXO'])
for ax in axs.flatten():
    ax.set_xlabel("Grade")
    ax.set_ylabel("Number of students")
plt.show()
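For the second part of the question (a different title per chart), one possibility is to replace the automatic group-name titles with your own. A rough sketch on made-up grades; the titles dictionary is an assumption, and its keys must match whatever values your TP_SEXO column really contains:

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Made-up grades, in the same shape as the sample data above.
rng = np.random.default_rng(42)
grades = pd.DataFrame({
    "NU_NOTA_REDACAO": rng.integers(70, 100, size=200),
    "TP_SEXO": ["Men"] * 100 + ["Women"] * 100,
})

axs = grades["NU_NOTA_REDACAO"].hist(by=grades["TP_SEXO"])

# Assumed mapping from group value to the title you want to show.
titles = {"Men": "Men's grade", "Women": "Women's grade"}
for ax in np.ravel(axs):
    # hist(by=...) has already titled each subplot with its group name.
    ax.set_title(titles.get(ax.get_title(), ax.get_title()))
    ax.set_xlabel("Grade")
    ax.set_ylabel("Number of students")

new_titles = [ax.get_title() for ax in np.ravel(axs)]
```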

Multiple boxplot in a single Graphic in Python

I'm a beginner in Python.
In my internship project I am trying to plot boxplots from data contained in a CSV file.
I need to plot boxplots for each of the 4 (four) variables shown above (AAG, DENS, SRG and RCG). Since each variable has values in columns numbered from [001] to [100], there will be 100 boxplots for each variable, which need to be plotted in a single graph as shown in the image.
This is the graph I need to plot, but for each variable there will be 100 boxplots, as each one has 100 columns of values:
The x-axis is the "Year", which ranges from 2025 to 2030, so I need a graph like the one shown in figure 2 for each year, and the y-axis is the set of values for each variable.
Using Pandas-melt function and seaborn library I was able to plot only the boxplots of a column. But that's not what I need:
import pandas as pd
import seaborn as sns
df = pd.read_csv("2DBM_50x50_Central_Aug21_Sim.cliped.csv")
mdf= df.melt(id_vars=['Year'], value_vars='AAG[001]')
print(mdf)
ax=sns.boxplot(x='Year', y='value',width = 0.2, data=mdf)
Result of the code above:
What can I try to resolve this?
The following code gives you five subplots, where each subplot only contains the data of one variable. Then a boxplot is generated for each year. To change the range of columns used for each variable, change the upper limit in var_range = range(1, 101), and to see the outliers change showfliers to True.
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
df = pd.read_csv("2DBM_50x50_Central_Aug21_Sim.cliped.csv")
variables = ["AAG", "DENS", "SRG", "RCG", "Thick"]
period = range(2025, 2031)
var_range = range(1, 101)
fig, axes = plt.subplots(2, 3)
flattened_axes = fig.axes
flattened_axes[-1].set_visible(False)
for i, var in enumerate(variables):
    var_columns = [f"TB_acc_{var}[{j:05}]" for j in var_range]
    data = df.melt(id_vars=["Period"], value_vars=var_columns, value_name=var)
    ax = flattened_axes[i]
    sns.boxplot(x="Period", y=var, width=0.2, data=data, ax=ax, showfliers=False)
plt.tight_layout()
plt.show()
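To see what the melt step does on its own, here is a tiny sketch with made-up numbers. It assumes the column naming from the question ("Year" plus "AAG[001]", "AAG[002]", ...) and only three columns instead of 100:

```python
import pandas as pd

# Made-up wide table: one row per year, one column per numbered AAG series.
years = list(range(2025, 2031))
df = pd.DataFrame({"Year": years})
for j in range(1, 4):
    df[f"AAG[{j:03}]"] = [float(year - 2025 + j) for year in years]

# Collect every AAG column and stack them into a single value column.
aag_columns = [col for col in df.columns if col.startswith("AAG[")]
long_df = df.melt(id_vars=["Year"], value_vars=aag_columns,
                  var_name="column", value_name="AAG")

print(long_df.head())
```

After melting, every year has one row per original column, so a boxplot with x="Year" and y="AAG" pools all of the columns for that year into a single box.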

Plot all categories in boxplot or violinplot

I have a matplotlib figure with a few violinplots on it (although this question would apply to any similar plot or other dataframe situation, not just violinplots). I currently run my code and it spits out the figure, with one violinplot per category. The code looks something like the following:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
df = pd.DataFrame(data=np.random.randint(low=0, high=1001, size=(100,1)),
                  columns=['row0']
                  )
df['r0_range']='temp' #create a new column 'r0_range', give it a preliminary value
#make assignments depending on value of row0
df.loc[df['row0']<=250, 'r0_range']='[0,250]'
df.loc[df['row0']>250, 'r0_range']='(250,500]'
df.loc[df['row0']>500, 'r0_range']='(500,750]'
df.loc[df['row0']>750, 'r0_range']='(750,1000]'
fig1, ax1 = plt.subplots(1,1)
ax1 = sns.violinplot(data=df, x='r0_range', y='row0', inner=None, ax=ax1)
Which pops out the following:
I want to include on my figure a fifth violinplot that represents all of the data in all of the categories. Is there an elegant way to do that without having to copy the row0 data into new rows of the dataframe?
Perhaps something like this will do what you are looking for:
df = pd.DataFrame(data=np.random.randint(0, 1001, 100), columns=['row0'])
g = df.groupby(pd.cut(df['row0'], [0, 250, 500, 750, 1000]))
for name, data in g.groups.items():
    df[name] = df.loc[data]['row0']
sns.violinplot(data=df, inner=None, ax=ax1)
