How to change a pyplot legend to show 4 entries instead of 2 - python

Thanks for taking the time to read my question.
I have 2 DataFrames, each composed of several columns:
df=pd.DataFrame([['A',10, 22], ['A',12, 15], ['A',0, 2], ['A', 20, 25], ['A', 5, 5], ['A',12, 11], ['B', 0 ,0], ['B', 9 ,0], ['B', 8 ,50], ['B', 0 ,0], ['B', 18 ,5], ['B', 7 ,6],['C', 10 ,11], ['C', 9 ,10], ['C', 8 ,2], ['C', 6 ,2], ['C', 8 ,5], ['C', 6 ,8]],
columns=['Name', 'Value_01','Value_02'])
df_agreement=pd.DataFrame([['A', '<66%', '>80'],['B', '>80%', '>66% & <80%'], ['C', '<66%', '<66%']], columns=['Name', 'Agreement_01', 'Agreement_02'])
My goal is to create boxplots from this DataFrame, with ['Value_01', 'Value_02'] as the values and 'Name' on the x-axis. To do so, I draw a seaborn boxplot with the following code:
fig = plt.figure()
# Change seaborn plot size
fig.set_size_inches(60, 40)
plt.xticks(rotation=70)
plt.yticks(fontsize=40)
df_02=pd.melt(df, id_vars=['Name'],value_vars=['Value_01', 'Value_02'])
bp=sns.boxplot(x='Name',y='value',hue="variable",showfliers=True, data=df_02,showmeans=True,
               meanprops={"marker": "+",
                          "markeredgecolor": "black",
                          "markersize": "20"})
bp.set_xlabel("Name", fontsize=45)
bp.set_ylabel('Value', fontsize=45)
bp.legend(handles=bp.legend_.legendHandles, labels=['V_01', 'V_02'])
Okay, this part works: I get 6 boxplots, two for each name.
What gets tricky is that I want to use df_agreement to change the color of my boxplots, depending on whether the agreement is <66% or not. So I added this to my code:
list_color_1=[]
list_color_2=[]
for i in range(0, len(df_agreement)):
    name=df_agreement.loc[i,'Name']
    if df_agreement.loc[i,'Agreement_01']=="<66%":
        list_color_1.append(i*2)
    if df_agreement.loc[i,'Agreement_02']=="<66%":
        list_color_2.append(i*2+1)
for k in list_color_1:
    mybox = bp.artists[k]
    # Change the appearance of that box
    mybox.set_facecolor("#D1DBE6") #facecolor is the inside color of the boxplot
    mybox.set_edgecolor('black') #edgecolor is the line color of the box
    mybox.set_linewidth(2)
for k in list_color_2:
    mybox = bp.artists[k]
    # Change the appearance of that box
    mybox.set_facecolor("#EFDBD1") #facecolor is the inside color of the boxplot
    mybox.set_edgecolor('black') #edgecolor is the line color of the box
    mybox.set_linewidth(2)
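The index arithmetic above relies on an assumption worth spelling out: the boxes are stored name by name, with the Value_01 box before the Value_02 box inside each name, so that box number i*2 + j belongs to name i and hue level j. Here is a minimal sketch of that mapping using pandas only. The helper name box_indices is hypothetical, and the ordering itself is an assumption that may depend on the seaborn version:

```python
import pandas as pd

df_agreement = pd.DataFrame(
    [['A', '<66%', '>80'], ['B', '>80%', '>66% & <80%'], ['C', '<66%', '<66%']],
    columns=['Name', 'Agreement_01', 'Agreement_02'])


def box_indices(agreement, column, hue_position, n_hues=2, flag='<66%'):
    """Hypothetical helper: positions of the boxes whose agreement equals `flag`.

    Assumes the boxes are ordered name by name, then hue level by hue level.
    """
    return [row * n_hues + hue_position
            for row, value in enumerate(agreement[column])
            if value == flag]


# Value_01 is hue level 0, Value_02 is hue level 1.
list_color_1 = box_indices(df_agreement, 'Agreement_01', hue_position=0)
list_color_2 = box_indices(df_agreement, 'Agreement_02', hue_position=1)
print(list_color_1, list_color_2)
```

If a third value column were added, only n_hues and the hue positions would need to change.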
It works well: the boxplots change according to the values in df_agreement.
Unfortunately, I would also like to change the legend to ["V_01", "V_02", "V_01 with less 66% agreement", "V_02 with less 66% agreement"], with the corresponding colors in the legend.
Would you have an idea how to do that?
Thank you very much! :)

You could add custom legend elements, extending the list of handles. Here is an example.
handles, labels = bp.get_legend_handles_labels()
new_handles = handles + [plt.Rectangle((0, 0), 0, 0, facecolor="#D1DBE6", edgecolor='black', linewidth=2),
                         plt.Rectangle((0, 0), 0, 0, facecolor="#EFDBD1", edgecolor='black', linewidth=2)]
bp.legend(handles=new_handles,
          labels=['V_01', 'V_02', "V_01 with less\n than 66% agreement", "V_02 with less\n than 66% agreement"])
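The same idea can be tried without seaborn. The sketch below uses matplotlib only and swaps the zero-size Rectangle for matplotlib.patches.Patch, which serves the same purpose as a legend-only proxy. The two bars are just a stand-in for the boxplot, and the "C0"/"C1" colors are an assumption about the default hue colors:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
from matplotlib.patches import Patch

fig, ax = plt.subplots()
# Stand-in for the boxplot: two bars drawn in the first two default colors.
ax.bar([0, 1], [3, 5], color=["C0", "C1"])

# One proxy patch per legend entry; none of them is drawn on the axes.
proxies = [
    Patch(facecolor="C0", edgecolor="black", label="V_01"),
    Patch(facecolor="C1", edgecolor="black", label="V_02"),
    Patch(facecolor="#D1DBE6", edgecolor="black", linewidth=2,
          label="V_01 with less\nthan 66% agreement"),
    Patch(facecolor="#EFDBD1", edgecolor="black", linewidth=2,
          label="V_02 with less\nthan 66% agreement"),
]
legend = ax.legend(handles=proxies)
legend_labels = [text.get_text() for text in legend.get_texts()]
print(legend_labels)
```

Giving each proxy its own label keeps the color and the text together, so the handles and labels cannot drift out of order.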

Related

How to create a Crosstab Plot?

I would like to create a 'Crosstab' plot like the one below using matplotlib or seaborn:
Using the following dataframe:
import pandas as pd
data = [['A', 'C', 2], ['A', 'D', 8], ['B', 'C', 25], ['B', 'D', 30]]
df = pd.DataFrame(data = data, columns = ['col', 'row', 'val'])
  col row  val
0   A   C    2
1   A   D    8
2   B   C   25
3   B   D   30
One option in matplotlib is to add Rectangles anchored at the origin via plt.gca and add_patch. The problem is that I did everything manually, like this:
from matplotlib.patches import Rectangle
import matplotlib.pyplot as plt
fig = plt.figure()
ax = fig.add_axes([0, 0, 1, 1])
plt.xlim(-10, 40)
plt.ylim(-40, 40)
plt.rcParams['figure.figsize'] = (10,16)
someX, someY = 0, 0
currentAxis = plt.gca()
currentAxis.add_patch(Rectangle((someX, someY), 30, 30, facecolor="purple"))
ax.text(15, 15, '30')
currentAxis.add_patch(Rectangle((someX, someY), 25, -25, facecolor="blue"))
ax.text(12.5, -12.5, '25')
currentAxis.add_patch(Rectangle((someX, someY), -2, -2, facecolor="red"))
ax.text(-1, -1, '2')
currentAxis.add_patch(Rectangle((someX, someY), -8, 8, facecolor="green"))
ax.text(-4, 4, '8')
Output:
As you can see, the plot doesn't look that nice. So I was wondering whether it is possible to create 'Crosstab' plots automatically using matplotlib or seaborn?
I am not sure whether matplotlib or seaborn has a dedicated function for this type of plot, but using plt.bar and plt.bar_label instead of Rectangle and plt.Text might help automate things a little (label placement etc.).
See the code below:
import matplotlib.pyplot as plt
data = [['A', 'C', 2], ['A', 'D', 8], ['B', 'C', 25], ['B', 'D', 30]]
pos={'A':-1,'B':0,'C':-1,'D':1}
fig,ax=plt.subplots(figsize=(10,10))
p=[ax.bar(pos[d[0]]*d[2],pos[d[1]]*d[2],width=d[2],align='edge') for d in data]
[ax.bar_label(p[i],labels=[data[i][2]], label_type='center',fontsize=18) for i in range(len(data))]
ax.set_aspect('equal')
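The hard-coded pos dictionary could also be derived from the data. This is a minimal sketch, assuming exactly two categories in 'col' and two in 'row', with the alphabetically first one going left or down. The helper name bar_geometry is hypothetical, and it only computes the numbers that would be passed to ax.bar(x, height, width=..., align='edge'):

```python
import pandas as pd

data = [['A', 'C', 2], ['A', 'D', 8], ['B', 'C', 25], ['B', 'D', 30]]
df = pd.DataFrame(data, columns=['col', 'row', 'val'])


def bar_geometry(frame):
    """Hypothetical helper: (x, height, width) per row for edge-aligned bars.

    Assumes two categories per axis; the first sorted category is drawn on
    the negative side (left for 'col', down for 'row').
    """
    left_col = sorted(frame['col'].unique())[0]
    bottom_row = sorted(frame['row'].unique())[0]
    geometry = []
    for col, row, val in frame[['col', 'row', 'val']].itertuples(index=False):
        x = -val if col == left_col else 0            # left edge of the bar
        height = -val if row == bottom_row else val   # negative height points down
        geometry.append((x, height, val))
    return geometry


geometry = bar_geometry(df)
print(geometry)
```

Each tuple reproduces what pos[d[0]]*d[2], pos[d[1]]*d[2] and d[2] give in the snippet above.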

How to split a grouped plot in Seaborn Python?

I have a data frame like this:
df:
Type  Col-1  Col-2
A     3      8
A     4      7
A     5      9
A     6      6
A     7      7
B     4      8
B     2      7
B     6      6
B     4      9
B     5      7
I have 2 violin plots, one for Col-1 and one for Col-2. Now I want a single plot with 2 violins, one for Type A and one for Type B, where each violin is split so that its left half shows Col-1 and its right half shows Col-2. How can I do it?
This is my code for separate plots:
def violin(data):
    for col in data.columns:
        x = data[col].to_frame().reset_index()
        ax = sns.violinplot(data=x, x='type',y=col,inner='quart',split=True)
        plt.show()
violin(df)
This is what my current violin plots look like. I want to combine them into a single plot:
Can anyone help me with this?
Seaborn works easiest with data in "long form", combining the value columns.
Here is what the code could look like:
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
df = pd.DataFrame({'Type': ['A', 'A', 'A', 'A', 'A', 'B', 'B', 'B', 'B', 'B'],
                   'Col-1': [4, 3, 5, 6, 7, 4, 2, 6, 4, 5],
                   'Col-2': [7, 8, 9, 6, 7, 8, 7, 6, 9, 7]})
df_long = df.melt(id_vars=['Type'], value_vars=['Col-1', 'Col-2'], var_name='Col', value_name='Value')
plt.figure(figsize=(12, 5))
sns.set()
sns.violinplot(data=df_long, x='Type', y='Value', hue='Col', split=True, palette='spring')
plt.tight_layout()
plt.show()
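To see what "long form" means here, the melt step can be inspected on its own. This is a small sketch on a shortened, made-up subset of the data, using pandas only:

```python
import pandas as pd

# Shortened example data: two rows per Type instead of five.
df = pd.DataFrame({'Type': ['A', 'A', 'B', 'B'],
                   'Col-1': [3, 4, 4, 2],
                   'Col-2': [8, 7, 8, 7]})

# Each (row, value column) pair of the wide frame becomes one row:
# 'Col' records which column the value came from, 'Value' holds the number.
df_long = df.melt(id_vars=['Type'], value_vars=['Col-1', 'Col-2'],
                  var_name='Col', value_name='Value')
print(df_long)
```

The 'Col' column is what hue='Col' splits on, which is why both halves of a violin can come from a single 'Value' column.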

Add a custom value on seaborn boxplots graphs

I am having trouble with a specific requirement for my graphs.
So far, I have had to do the following:
1. Read two dataframes
2. Create boxplots for the first dataframe and color the boxplots depending on the values of the second dataframe (the code is below, and more information is in my previous question)
The code below works; my problem comes after it:
df=pd.DataFrame([['A',10, 22], ['A',12, 15], ['A',0, 2], ['A', 20, 25], ['A', 5, 5], ['A',12, 11], ['B', 0 ,0], ['B', 9 ,0], ['B', 8 ,50], ['B', 0 ,0], ['B', 18 ,5], ['B', 7 ,6],['C', 10 ,11], ['C', 9 ,10], ['C', 8 ,2], ['C', 6 ,2], ['C', 8 ,5], ['C', 6 ,8]],
columns=['Name', 'Value_01','Value_02'])
df_agreement=pd.DataFrame([['A', '<66%', '>80'],['B', '>80%', '>66% & <80%'], ['C', '<66%', '<66%']], columns=['Name', 'Agreement_01', 'Agreement_02'])
fig = plt.figure()
# Change seaborn plot size
fig.set_size_inches(60, 40)
plt.xticks(rotation=70)
plt.yticks(fontsize=40)
df_02=pd.melt(df, id_vars=['Name'],value_vars=['Value_01', 'Value_02'])
bp=sns.boxplot(x='Name',y='value',hue="variable",showfliers=True, data=df_02,showmeans=True,
               meanprops={"marker": "+",
                          "markeredgecolor": "black",
                          "markersize": "20"})
bp.set_xlabel("Name", fontsize=45)
bp.set_ylabel('Value', fontsize=45)
handles, labels = bp.get_legend_handles_labels()
new_handles = handles + [plt.Rectangle((0, 0), 0, 0, facecolor="#D1DBE6", edgecolor='black', linewidth=2),
                         plt.Rectangle((0, 0), 0, 0, facecolor="#EFDBD1", edgecolor='black', linewidth=2)]
bp.legend(handles=new_handles,
          labels=['V_01', 'V_02', "V_01 with less\n than 66% agreement", "V_02 with less\n than 66% agreement"])
list_color_1=[]
list_color_2=[]
for i in range(0, len(df_agreement)):
    name=df_agreement.loc[i,'Name']
    if df_agreement.loc[i,'Agreement_01']=="<66%":
        list_color_1.append(i*2)
    if df_agreement.loc[i,'Agreement_02']=="<66%":
        list_color_2.append(i*2+1)
for k in list_color_1:
    mybox = bp.artists[k]
    # Change the appearance of that box
    mybox.set_facecolor("#D1DBE6") #facecolor is the inside color of the boxplot
    mybox.set_edgecolor('black') #edgecolor is the line color of the box
    mybox.set_linewidth(2)
for k in list_color_2:
    mybox = bp.artists[k]
    # Change the appearance of that box
    mybox.set_facecolor("#EFDBD1") #facecolor is the inside color of the boxplot
    mybox.set_edgecolor('black') #edgecolor is the line color of the box
    mybox.set_linewidth(2)
Now I have a new DataFrame, equivalent to the first one (df) but with different values:
df_02=pd.DataFrame([['A',5, 20], ['A',15, 2], ['A',3, 5], ['A', 21, 24], ['A', 6, 6], ['A',10, 10], ['B', 0 ,0], ['B', 9 ,0], ['B', 9 ,5], ['B', -4 ,-2], ['B', 8 ,7], ['B', 8 ,9],['C', 10 ,15], ['C', 9 ,10], ['C', 8 ,2], ['C', 6 ,2], ['C', 8 ,5], ['C', 6 ,8]],
columns=['Name', 'Value_01','Value_02'])
What I would like to do is add a bar on each boxplot (and only there) corresponding to the value from my second DataFrame (df_02).
Does anyone have an idea how to do that?

Create a weighted graph based on Dataframe

Consider a data frame like this:
id  source  Target  Weight
1   A       B       1
2   A       C       2
3   A       D       3
4   A       E       4
I want to draw a graph with networkx that shows two things:
1- Nodes with more connections are larger.
2- Edges with more weight are drawn with thicker lines.
We can set edge_attr to the Weight column when we create the Graph with from_pandas_edgelist. When we draw the graph, we can then call get_edge_attributes and pass the result as the width of the drawing operation.
For node_size we can use nx.degree to get the degree of each node in the Graph:
nx.degree(G)
[('A', 4), ('B', 1), ('C', 1), ('D', 1), ('E', 1)]
We can then scale up the degree by some factor since these values are going to be quite small. I've chosen a factor of 200 here, but this can be adjusted:
[d[1] * 200 for d in nx.degree(G)]
[800, 200, 200, 200, 200]
All together it can look like:
G = nx.from_pandas_edgelist(
    df,
    source='source',
    target='Target',
    edge_attr='Weight'  # Set Edge Attribute to Weight Column
)
# Get Degree values and scale
scaled_degree = [d[1] * 200 for d in nx.degree(G)]
nx.draw(G,
        # Weights Based on Column
        width=list(nx.get_edge_attributes(G, 'Weight').values()),
        # Node size based on degree
        node_size=scaled_degree,
        # Colour Based on Degree
        node_color=scaled_degree,
        # Set color map to determine colours
        cmap='rainbow',
        with_labels=True)
plt.show()
Setup Used:
import networkx as nx
import pandas as pd
from matplotlib import pyplot as plt
df = pd.DataFrame({
    'id': [1, 2, 3, 4],
    'source': ['A', 'A', 'A', 'A'],
    'Target': ['B', 'C', 'D', 'E'],
    'Weight': [1, 2, 3, 4]
})
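As a cross-check of the degree values quoted above, the same numbers can be recovered from the edge list with pandas alone. This sketch deliberately does not use networkx, and it assumes an undirected graph with no duplicate edges and no self-loops, in which case a node's degree is simply the number of rows it appears in:

```python
import pandas as pd

df = pd.DataFrame({
    'id': [1, 2, 3, 4],
    'source': ['A', 'A', 'A', 'A'],
    'Target': ['B', 'C', 'D', 'E'],
    'Weight': [1, 2, 3, 4]
})

# Every edge contributes one to the degree of each of its two endpoints.
endpoints = pd.concat([df['source'], df['Target']])
degree = endpoints.value_counts().to_dict()

# Same scaling factor as in the answer; 200 is an arbitrary visual choice.
scaled_degree = {node: count * 200 for node, count in degree.items()}

# Edge widths are taken straight from the Weight column, one per edge.
edge_widths = df['Weight'].tolist()
print(degree, scaled_degree, edge_widths)
```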

Is there a way to create a stacked bar graph from pandas?

I have an sqlite database set up with some data. I have imported it into pandas with an SQL statement:
df1 = pd.read_sql_query("Select avg(Duration),keyword,filename from keywords group by keyword,filename order by filename", con)
The data looks as follows:
Based on this I want to construct a stacked bar graph that looks like this:
I've tried various solutions, including matplotlib and pandas.plot, but I'm unable to construct this graph.
Thanks in advance.
This snippet should work:
import pandas as pd
import matplotlib.pyplot as plt
data = [[2, 'A', 'output.xml'], [5, 'B', 'output.xml'],
        [3, 'A', 'output.xml'], [2, 'B', 'output.xml'],
        [5, 'C', 'output2.xml'], [1, 'B', 'output2.xml'],
        [6, 'C', 'output.xml'], [3, 'C', 'output2.xml'],
        [3, 'A', 'output2.xml'], [3, 'B', 'output.xml'],
        [2, 'C', 'output.xml'], [1, 'C', 'output2.xml']
        ]
df = pd.DataFrame(data, columns = ['duration', 'Keyword', 'Filename'])
df2 = df.groupby(['Filename', 'Keyword'])['duration'].sum().unstack('Keyword').fillna(0)
df2[['A','B', 'C']].plot(kind='bar', stacked=True)
It is similar to this question, with the difference that I sum the values of the relevant field instead of counting them.
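The reshaping step can be inspected without plotting. The sketch below reuses the snippet's sample data and prints the table that gets handed to .plot(kind='bar', stacked=True). Since the query in the question selects avg(Duration), it also shows a mean-based variant; swapping sum for mean is a suggestion here, not part of the original snippet:

```python
import pandas as pd

data = [[2, 'A', 'output.xml'], [5, 'B', 'output.xml'],
        [3, 'A', 'output.xml'], [2, 'B', 'output.xml'],
        [5, 'C', 'output2.xml'], [1, 'B', 'output2.xml'],
        [6, 'C', 'output.xml'], [3, 'C', 'output2.xml'],
        [3, 'A', 'output2.xml'], [3, 'B', 'output.xml'],
        [2, 'C', 'output.xml'], [1, 'C', 'output2.xml']]
df = pd.DataFrame(data, columns=['duration', 'Keyword', 'Filename'])

grouped = df.groupby(['Filename', 'Keyword'])['duration']

# One row per file, one column per keyword: each column becomes one
# segment of the stacked bar for that file.
summed = grouped.sum().unstack('Keyword').fillna(0)
averaged = grouped.mean().unstack('Keyword').fillna(0)
print(summed)
print(averaged)
```

Either table can be plotted the same way; only the height of the segments changes.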
1. You just have to use:
ax=df.pivot_table(index='filename',columns='keyword',values='avg(duration)').plot(kind='bar',stacked=True,figsize=(15,15),fontsize=25)
ax.legend(fontsize=25)
2. Example
df=pd.DataFrame()
df['avg(duration)']=[7,4,5,9,3,2]
df['keyword']=['a','b','c','a','b','c']
df['filename']=['out1','out1','out1','out2','out2','out2']
df
2.1 Output df example:
   avg(duration) keyword filename
0              7       a     out1
1              4       b     out1
2              5       c     out1
3              9       a     out2
4              3       b     out2
5              2       c     out2
2.2 Drawing
ax=df.pivot_table(index='filename',columns='keyword',values='avg(duration)').plot(kind='bar',stacked=True,figsize=(15,15),fontsize=25)
ax.legend(fontsize=25)
2.3 Output image example:
3. In addition, you can use:
# set the axis limits
plt.ylim(-1, 20)
plt.xlim(-1,4)
# turn the grid on
plt.grid()
# draw a horizontal line at y=0
ax.axhline(0, color='black', lw=1)
# change the size and position of the legend
ax.legend(fontsize=25,loc=(0.9,0.4))
# hide the top and right spines
ax.spines['top'].set_visible(False)
ax.spines['right'].set_visible(False)
# change the thickness of the bottom and left spines
ax.spines['bottom'].set_linewidth(3)
ax.spines['left'].set_linewidth(3)
# set the axis labels
ax.set_xlabel('filename',fontsize=20,color='r')
ax.set_ylabel('avg(duration)',fontsize=20,color='r')
# keep the x tick labels horizontal
plt.xticks(rotation=0)
