How to create a Crosstab Plot? - python

I would like to create a 'Crosstab' plot like the one below using matplotlib or seaborn:
Using the following dataframe:
import pandas as pd
data = [['A', 'C', 2], ['A', 'D', 8], ['B', 'C', 25], ['B', 'D', 30]]
df = pd.DataFrame(data = data, columns = ['col', 'row', 'val'])
  col row  val
0   A   C    2
1   A   D    8
2   B   C   25
3   B   D   30
One option in matplotlib is to add Rectangles at the origin via plt.gca() and add_patch. The problem is that I did everything manually, like this:
from matplotlib.patches import Rectangle
import matplotlib.pyplot as plt
# figsize has to be set before the figure is created to take effect
fig = plt.figure(figsize=(10, 16))
ax = fig.add_axes([0, 0, 1, 1])
plt.xlim(-10, 40)
plt.ylim(-40, 40)
someX, someY = 0, 0
currentAxis = plt.gca()
currentAxis.add_patch(Rectangle((someX, someY), 30, 30, facecolor="purple"))
ax.text(15, 15, '30')
currentAxis.add_patch(Rectangle((someX, someY), 25, -25, facecolor="blue"))
ax.text(12.5, -12.5, '25')
currentAxis.add_patch(Rectangle((someX, someY), -2, -2, facecolor="red"))
ax.text(-1, -1, '2')
currentAxis.add_patch(Rectangle((someX, someY), -8, 8, facecolor="green"))
ax.text(-4, 4, '8')
Output:
As you can see, the plot doesn't look that nice. So I was wondering whether it is possible to create 'Crosstab' plots automatically using matplotlib or seaborn?

I am not sure whether matplotlib or seaborn have dedicated functions for this type of plot, but using ax.bar and ax.bar_label instead of Rectangle and ax.text might help automate things a little (label placement, etc.).
See code below:
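import matplotlib.pyplot as plt
data = [['A', 'C', 2], ['A', 'D', 8], ['B', 'C', 25], ['B', 'D', 30]]
# multipliers: column A is drawn left of x=0, B starts at x=0; row C goes below y=0, D above
pos = {'A': -1, 'B': 0, 'C': -1, 'D': 1}
fig, ax = plt.subplots(figsize=(10, 10))
for col, row, val in data:
    # align='edge' puts the bar's left edge at x, so A bars span [-val, 0] and B bars [0, val]
    bars = ax.bar(pos[col] * val, pos[row] * val, width=val, align='edge')
    ax.bar_label(bars, labels=[val], label_type='center', fontsize=18)
ax.set_aspect('equal')
plt.show()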
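A minimal sketch that generalizes the bar approach so it reads directly from the question's `df` instead of a hand-written position table. It assumes exactly two categories in each of the `col` and `row` columns, puts the alphabetically first `col` value on the left and the first `row` value at the bottom, and keeps side length = value as in the manual version. The function name `crosstab_plot` is hypothetical, not a matplotlib or seaborn API. It relies on `ax.bar` accepting a negative `width` together with `align='edge'` to extend a bar to the left of its anchor.

```python
import pandas as pd
import matplotlib.pyplot as plt


def crosstab_plot(df, col='col', row='row', val='val', ax=None):
    """Draw one square per (col, row) pair, all meeting at the origin.

    Hypothetical helper. Assumes exactly two categories in `col` and in `row`:
    the first sorted `col` value goes left of x=0, the first sorted `row`
    value goes below y=0. The side length of each square equals its value.
    """
    col_levels = sorted(df[col].unique())
    row_levels = sorted(df[row].unique())
    if len(col_levels) != 2 or len(row_levels) != 2:
        raise ValueError("expected exactly two categories in both col and row")
    if ax is None:
        _, ax = plt.subplots(figsize=(8, 8))
    for c, r, v in df[[col, row, val]].itertuples(index=False):
        x_sign = -1 if c == col_levels[0] else 1
        y_sign = -1 if r == row_levels[0] else 1
        # align='edge' anchors the bar at x=0; a negative width extends it
        # to the left and a negative height extends it downwards
        bars = ax.bar(0, y_sign * v, width=x_sign * v, align='edge')
        ax.bar_label(bars, labels=[v], label_type='center', fontsize=14)
    # thin axis lines through the origin to separate the four quadrants
    ax.axhline(0, color='black', lw=1)
    ax.axvline(0, color='black', lw=1)
    ax.set_aspect('equal')
    return ax


data = [['A', 'C', 2], ['A', 'D', 8], ['B', 'C', 25], ['B', 'D', 30]]
df = pd.DataFrame(data=data, columns=['col', 'row', 'val'])
ax = crosstab_plot(df)
```

Call `plt.show()` afterwards to display the figure. Each `ax.bar` call takes the next color from the property cycle, so the four squares get distinct colors without listing them.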


How to change python pyplot legend with 4 legend instead of 2

Thanks for taking the time to look at my question.
I have 2 DataFrames composed of several columns:
df=pd.DataFrame([['A',10, 22], ['A',12, 15], ['A',0, 2], ['A', 20, 25], ['A', 5, 5], ['A',12, 11], ['B', 0 ,0], ['B', 9 ,0], ['B', 8 ,50], ['B', 0 ,0], ['B', 18 ,5], ['B', 7 ,6],['C', 10 ,11], ['C', 9 ,10], ['C', 8 ,2], ['C', 6 ,2], ['C', 8 ,5], ['C', 6 ,8]],
columns=['Name', 'Value_01','Value_02'])
df_agreement=pd.DataFrame([['A', '<66%', '>80'],['B', '>80%', '>66% & <80%'], ['C', '<66%', '<66%']], columns=['Name', 'Agreement_01', 'Agreement_02'])
My goal is to create a boxplot for this DataFrame, with ['Value_01', 'Value_02'] as values and 'Name' on the x-axis. To do so, I draw a seaborn boxplot with the following code:
fig = plt.figure()
# Change seaborn plot size
fig.set_size_inches(60, 40)
plt.xticks(rotation=70)
plt.yticks(fontsize=40)
df_02=pd.melt(df, id_vars=['Name'],value_vars=['Value_01', 'Value_02'])
bp=sns.boxplot(x='Name',y='value',hue="variable",showfliers=True, data=df_02,showmeans=True,meanprops={"marker": "+",
"markeredgecolor": "black",
"markersize": "20"})
bp.set_xlabel("Name", fontsize=45)
bp.set_ylabel('Value', fontsize=45)
bp.legend(handles=bp.legend_.legendHandles, labels=['V_01', 'V_02'])
Okay this part works, I do have 6 boxplots, two for each name.
What is becoming tricky is that I want to use df_agreement to change the color of my boxplots, depending on whether the agreement is <66% or not. So I added this to my code:
list_color_1=[]
list_color_2=[]
# assumes df_agreement lists the names in the same order as the x-axis;
# boxes are ordered name by name, V_01 then V_02, so box 2*i is V_01 and 2*i+1 is V_02
for i in range(0, len(df_agreement)):
    name=df_agreement.loc[i,'Name']
    if df_agreement.loc[i,'Agreement_01']=="<66%":
        list_color_1.append(i*2)
    if df_agreement.loc[i,'Agreement_02']=="<66%":
        list_color_2.append(i*2+1)
for k in list_color_1:
    # note: with matplotlib >= 3.5 the boxes may be listed in bp.patches instead of bp.artists
    mybox = bp.artists[k]
    # Change the appearance of that box
    mybox.set_facecolor("#D1DBE6") #facecolor is the inside color of the boxplot
    mybox.set_edgecolor('black') #edgecolor is the line color of the box
    mybox.set_linewidth(2)
for k in list_color_2:
    mybox = bp.artists[k]
    # Change the appearance of that box
    mybox.set_facecolor("#EFDBD1") #facecolor is the inside color of the boxplot
    mybox.set_edgecolor('black') #edgecolor is the line color of the box
    mybox.set_linewidth(2)
It works well: my boxplots change color depending on the values in df_agreement.
Unfortunately, I would also like to change the legend to ["V_01", "V_02", "V_01 with less 66% agreement", "V_02 with less 66% agreement"], obviously with the corresponding colors in the legend.
Do you have an idea how to do that?
Thank you very much! :)
You could add custom legend elements, extending the list of handles. Here is an example.
handles, labels = bp.get_legend_handles_labels()
new_handles = handles + [plt.Rectangle((0, 0), 0, 0, facecolor="#D1DBE6", edgecolor='black', linewidth=2),
plt.Rectangle((0, 0), 0, 0, facecolor="#EFDBD1", edgecolor='black', linewidth=2)]
bp.legend(handles=new_handles,
labels=['V_01', 'V_02', "V_01 with less\n than 66% agreement", "V_02 with less\n than 66% agreement"])
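For reference, here is a self-contained sketch of the same idea that does not depend on `bp.artists`. It swaps seaborn for matplotlib's own `ax.boxplot(..., patch_artist=True)` (a substitution, not what the question used), colors each box from `df_agreement`, and builds all four legend entries from `matplotlib.patches.Patch` handles. The data is the question's `df`; the base colors `tab:blue`/`tab:orange` and the ±0.2 box offsets are arbitrary choices.

```python
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib.patches import Patch

# the question's data, written column by column
df = pd.DataFrame({
    'Name': ['A'] * 6 + ['B'] * 6 + ['C'] * 6,
    'Value_01': [10, 12, 0, 20, 5, 12, 0, 9, 8, 0, 18, 7, 10, 9, 8, 6, 8, 6],
    'Value_02': [22, 15, 2, 25, 5, 11, 0, 0, 50, 0, 5, 6, 11, 10, 2, 2, 5, 8],
})
df_agreement = pd.DataFrame(
    [['A', '<66%', '>80'], ['B', '>80%', '>66% & <80%'], ['C', '<66%', '<66%']],
    columns=['Name', 'Agreement_01', 'Agreement_02'])

names = list(df_agreement['Name'])
agreement = df_agreement.set_index('Name')
base = {'Value_01': 'tab:blue', 'Value_02': 'tab:orange'}  # arbitrary default colors
low = {'Value_01': '#D1DBE6', 'Value_02': '#EFDBD1'}       # colors from the question

fig, ax = plt.subplots()
boxes = {}  # (name, value column) -> box patch, kept for later inspection
for offset, value_col, agree_col in [(-0.2, 'Value_01', 'Agreement_01'),
                                     (0.2, 'Value_02', 'Agreement_02')]:
    groups = [df.loc[df['Name'] == n, value_col].to_numpy() for n in names]
    positions = [i + offset for i in range(len(names))]
    bp = ax.boxplot(groups, positions=positions, widths=0.35, patch_artist=True)
    for name, box in zip(names, bp['boxes']):
        # pick the "low agreement" color when the matching Agreement column says <66%
        is_low = agreement.loc[name, agree_col] == '<66%'
        box.set_facecolor(low[value_col] if is_low else base[value_col])
        box.set_edgecolor('black')
        boxes[(name, value_col)] = box
ax.set_xticks(range(len(names)))
ax.set_xticklabels(names)

# one proxy patch per legend entry, in the same order as the labels
handles = [Patch(facecolor=c, edgecolor='black')
           for c in (base['Value_01'], base['Value_02'], low['Value_01'], low['Value_02'])]
labels = ['V_01', 'V_02', 'V_01 with less than 66% agreement',
          'V_02 with less than 66% agreement']
leg = ax.legend(handles=handles, labels=labels)
```

Because the legend handles are plain proxy patches, the legend no longer depends on how seaborn or matplotlib store the box artists internally.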

use a dataframe column for the color of a graph line with matplotlib

I draw a graph like this:
p1=ax1.plot(df['timestamp'], df['break_even'], color='blue', zorder = 0)
but I would like the line to change color based on another column:
p1=ax1.plot(df['timestamp'], df['break_even'], color=df['trade_color'], zorder = 0)
this will not work, I get:
ValueError: Invalid RGBA argument: 0 red
1 green
2 red
3 red
4 green
...
how can this be achieved?
this is an example to test:
data = [[1, 10, 'red'], [2, 15, 'green'], [3, 14, 'blue']]
df = pd.DataFrame(data, columns = ['x', 'y', 'color'])
fig, ax = plt.subplots()
ax.plot(df['x'], df['y'], color='darkorange', zorder = 0)
this will work, but:
ax.plot(df['x'], df['y'], color=df['color'], zorder = 0)
will not. How can I get each line segment to use the color I need? (I have just 2 colors if it makes a difference)
Just plot part of the dataframe each time with the color you want:
import pandas as pd
import matplotlib.pyplot as plt
data = [[1, 10, 'red'], [2, 15, 'green'], [3, 14, 'blue']]
df = pd.DataFrame(data, columns = ['x', 'y', 'color'])
fig, ax = plt.subplots()
# Take two consecutive rows each time: each row is a point (x, y), and two points
# make one line segment, drawn with the color of the first row.
# The last row has no following point, so the loop stops one row early.
for i in range(len(df) - 1):
    partial = df.iloc[i:i+2, :]
    ax.plot(partial['x'], partial['y'], color=partial['color'].iloc[0], zorder = 0)
plt.show()
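An alternative sketch, assuming a single artist for the whole line is preferable to one `plot` call per segment: `matplotlib.collections.LineCollection` takes an array of segments and a matching list of colors. As in the loop above, each segment uses the color of its first row, so the last row's color goes unused.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib.collections import LineCollection

data = [[1, 10, 'red'], [2, 15, 'green'], [3, 14, 'blue']]
df = pd.DataFrame(data, columns=['x', 'y', 'color'])

points = df[['x', 'y']].to_numpy()
# shape (n - 1, 2, 2): segment k joins point k to point k + 1
segments = np.stack([points[:-1], points[1:]], axis=1)
# color of each segment = color of its starting row
lc = LineCollection(segments, colors=df['color'].iloc[:-1].tolist(), zorder=0)

fig, ax = plt.subplots()
ax.add_collection(lc)
ax.autoscale()  # add_collection does not rescale the view on its own
```

With only two colors, the same `colors` list works unchanged; a whole series of segments is still drawn as one collection.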

Matplotlib twin y axis

I want to plot my data with 0 in the middle of the y-axis, just like this:
This is what I came up with:
Using this code:
import matplotlib.pyplot as plt
group_a_names = ['A', 'B', 'C', 'D', 'E']
group_a_values = [2, 4, 6, 8, 10]
group_b_names = ['F', 'G', 'H', 'I', 'J']
group_b_values = [1, 2, 3, 4, 5]
fig, ax1 = plt.subplots(figsize=(5, 4), dpi=100)
ax2 = ax1.twiny()
ax1.plot(group_a_names, group_a_values)
ax2.plot(group_b_names, group_b_values)
plt.show()
How can I visualize my data like the first image? And how can I mirror the y tick labels/marks on the right side?
Try this:
import matplotlib.pyplot as plt
group_a_names = ['A', 'B', 'C', 'D', 'E']
group_a_values = [2, 4, 6, 8, 10]
group_b_names = ['F', 'G', 'H', 'I', 'J']
group_b_values = [-2, -4, -6, -8, -10]
fig, ax1 = plt.subplots(figsize=(5, 4), dpi=100)
ax1.plot(group_a_names, group_a_values)
# add second x axis
ax3 = ax1.twiny()
ax3.plot(group_b_names, group_b_values)
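# add second y axis
ax2 = ax1.twinx()
# set the same symmetric y range on both y axes (plt.ylim would only affect the current axes, ax2)
ax1.set_ylim(-10, 10)
ax2.set_ylim(-10, 10)
plt.show()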
Result:
This worked for me:
import numpy as np
ticks = np.arange(2, 11, 2)
plt.yticks(ticks, [10, 5, 0, 5, 10])
ax1.yaxis.set_ticks_position('both')
ax1.tick_params(axis="y", labelright=True)
I just negated the values of the second group and flipped the negative tick labels back to positive.
import matplotlib.pyplot as plt
group_a_names = ['A', 'B', 'C', 'D', 'E']
group_a_values = [2, 4, 6, 8, 10]
group_b_names = ['F', 'G', 'H', 'I', 'J']
group_b_values = [1, 2, 3, 4, 5]
# negate group B so that it is drawn below y = 0
group_b_values_neg = [-n for n in group_b_values]
max_value = max(group_a_values + group_b_values)
fig = plt.figure()
ax1 = fig.add_subplot(1, 1, 1)
ax2 = ax1.twiny()  # second x axis on top; the y axis is shared with ax1
ax1.plot(group_a_names, group_a_values, c="blue")
ax2.plot(group_b_names, group_b_values_neg, c="red")
# symmetric y range, so 0 sits in the middle
ax1.set_ylim(-max_value, max_value)
ax2.set_ylim(-max_value, max_value)
# show the negative tick values as positive labels
ax2.set_yticklabels([str(abs(x)) for x in ax2.get_yticks()])
# mirror the y ticks and labels on the right side
ax1.yaxis.set_ticks_position('both')
ax1.tick_params(axis="y", labelright=True)
plt.show()
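A sketch of a variant of the answer above, assuming the goal is only to display the negated values with positive labels: instead of freezing the labels with `set_yticklabels`, a `matplotlib.ticker.FuncFormatter` formats each tick as its absolute value at draw time, so the labels stay correct if the limits change later (for example, when zooming). The `axhline` at 0 is an optional addition, not part of the original answer.

```python
import matplotlib.pyplot as plt
from matplotlib.ticker import FuncFormatter

group_a_names = ['A', 'B', 'C', 'D', 'E']
group_a_values = [2, 4, 6, 8, 10]
group_b_names = ['F', 'G', 'H', 'I', 'J']
group_b_values = [1, 2, 3, 4, 5]
max_value = max(group_a_values + group_b_values)

fig, ax1 = plt.subplots()
ax2 = ax1.twiny()  # top x axis for group B; the y axis is shared with ax1
ax1.plot(group_a_names, group_a_values, c="blue")
ax2.plot(group_b_names, [-v for v in group_b_values], c="red")  # drawn below 0

# shared y axis, so setting the limits on ax1 also applies to ax2
ax1.set_ylim(-max_value, max_value)
ax1.axhline(0, color="black", lw=0.8)  # optional separator at y = 0

# label every tick with its absolute value, computed whenever ticks are drawn
abs_formatter = FuncFormatter(lambda y, pos: f"{abs(y):g}")
ax1.yaxis.set_major_formatter(abs_formatter)

# mirror the y ticks and labels on the right side
ax1.yaxis.set_ticks_position('both')
ax1.tick_params(axis="y", labelright=True)
```

Call `plt.show()` to display it. The `:g` format drops a trailing `.0`, so a tick at -4.0 is labeled `4`.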

Is there a way to create a stacked bar graph from pandas?

I have an SQLite database set up with some data, which I imported with an SQL query via pandas:
df1 = pd.read_sql_query("Select avg(Duration),keyword,filename from keywords group by keyword,filename order by filename", con)
The data looks as follows:
Based on this I want to construct a stacked bar graph that looks like this:
I've tried various solutions, including matplotlib and pandas.plot, but I'm unable to construct this graph.
Thanks in advance.
This snippet should work:
import pandas as pd
import matplotlib.pyplot as plt
data = [[2, 'A', 'output.xml'], [5, 'B', 'output.xml'],
[3, 'A', 'output.xml'], [2, 'B', 'output.xml'],
[5, 'C', 'output2.xml'], [1, 'B', 'output2.xml'],
[6, 'C', 'output.xml'], [3, 'C', 'output2.xml'],
[3, 'A', 'output2.xml'], [3, 'B', 'output.xml'],
[2, 'C', 'output.xml'], [1, 'C', 'output2.xml']
]
df = pd.DataFrame(data, columns = ['duration', 'Keyword', 'Filename'])
df2 = df.groupby(['Filename', 'Keyword'])['duration'].sum().unstack('Keyword').fillna(0)
df2[['A','B', 'C']].plot(kind='bar', stacked=True)
It is similar to this question, with the difference that I sum the values of the relevant field instead of counting them.
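To check the whole pipeline end to end, here is a sketch that runs the question's query against an in-memory SQLite database. The table name `keywords` and the columns `Duration`, `keyword`, `filename` come from the query; the rows are made up for illustration. It assumes SQLite names the unaliased result column `avg(Duration)`, which is what it does in practice. Since the query already averages per (keyword, filename), a plain `pivot` is enough to reshape it before plotting.

```python
import sqlite3
import pandas as pd
import matplotlib.pyplot as plt

# In-memory stand-in for the real database; the rows are made up
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE keywords (Duration REAL, keyword TEXT, filename TEXT)")
con.executemany(
    "INSERT INTO keywords VALUES (?, ?, ?)",
    [(2, 'A', 'output.xml'), (4, 'A', 'output.xml'), (5, 'B', 'output.xml'),
     (1, 'B', 'output2.xml'), (3, 'C', 'output2.xml')],
)

# The query from the question
df1 = pd.read_sql_query(
    "Select avg(Duration),keyword,filename from keywords "
    "group by keyword,filename order by filename", con)

# One row per filename, one column per keyword; missing combinations become 0
wide = df1.pivot(index='filename', columns='keyword', values='avg(Duration)').fillna(0)
ax = wide.plot(kind='bar', stacked=True)
ax.set_ylabel('avg(Duration)')
```

Each keyword becomes one stacked segment per filename; `fillna(0)` keeps the stacks aligned when a keyword never occurs in a file.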
1. You just have to use:
ax=df1.pivot_table(index='filename',columns='keyword',values='avg(Duration)').plot(kind='bar',stacked=True,figsize=(15,15),fontsize=25)
ax.legend(fontsize=25)
2. Example
df=pd.DataFrame()
df['avg(duration)']=[7,4,5,9,3,2]
df['keywoard']=['a','b','c','a','b','c']
df['fillname']=['out1','out1','out1','out2','out2','out2']
df
2.1 Output df example:
   avg(duration) keywoard fillname
0              7        a     out1
1              4        b     out1
2              5        c     out1
3              9        a     out2
4              3        b     out2
5              2        c     out2
2.2 Drawing
ax=df.pivot_table(index='fillname',columns='keywoard',values='avg(duration)').plot(kind='bar',stacked=True,figsize=(15,15),fontsize=25)
ax.legend(fontsize=25)
2.3 Output image example:
3. In addition, you can customize the plot further:
# set the y and x limits
plt.ylim(-1, 20)
plt.xlim(-1,4)
# turn the grid on
plt.grid()
# draw a horizontal line at y=0
ax.axhline(0, color='black', lw=1)
# enlarge the legend font and move the legend
ax.legend(fontsize=25,loc=(0.9,0.4))
# hide the top and right spines
ax.spines['top'].set_visible(False)
ax.spines['right'].set_visible(False)
# thicken the bottom and left spines
ax.spines['bottom'].set_linewidth(3)
ax.spines['left'].set_linewidth(3)
# set the axis labels
ax.set_xlabel('fillname',fontsize=20,color='r')
ax.set_ylabel('avg(duration)',fontsize=20,color='r')
# keep the x tick labels horizontal
plt.xticks(rotation=0)

Plotting multiple columns of different sizes with Pandas

I'm fairly new to Pandas, but typically what I do with data (when all columns are of equal sizes), I build np.zeros(count) matrices, then use a for loop to populate the data from a text file (np.genfromtxt()) to do my graphing and analysis in matplotlib.
However, I am now trying to implement similar analysis with columns of different sizes on the same plot from a CSV file.
For instance:
data.csv:
A B C D E F
1 2 3 4 5 6
2 3 4 5 6 7
3 4 5 6
4 5
df = pandas.read_csv('data.csv')
ax = df.plot(x = 'A', y = 'B')
df.plot(x = 'C', y = 'D', ax = ax)
df.plot(x = 'E', y = 'F', ax = ax)
This code plots the first two on the same graph, but the rest of the information is lost (and there are many more columns of mismatched sizes, but the x/y columns I am plotting are all the same size).
Is there an easier way to do all of this? Thanks!
Here is how you could generalize your solution:
I edited my answer to add error handling: if you have a lone last column, it will still work.
import pandas as pd
import numpy as np
from matplotlib import pyplot as plt
data = {
    'A' : [1, 2, 3, 4],
    'B' : [2, 3, 4, 5],
    'C' : [3, 4, 5, np.nan],
    'D' : [4, 5, 6, np.nan],
    'E' : [5, 6, np.nan, np.nan],
    'F' : [6, 7, np.nan, np.nan]
}
df = pd.DataFrame(data)
def Chris(df):
    # the original approach: one explicit df.plot call per x/y pair
    ax = df.plot(x='A', y='B')
    df.plot(x='C', y='D', ax=ax)
    df.plot(x='E', y='F', ax=ax)
    plt.show()
def IMCoins(df):
    fig, ax = plt.subplots()
    try:
        # walk the columns two at a time: (A, B), (C, D), (E, F), ...
        for idx in range(0, df.shape[1], 2):
            df.plot(x = df.columns[idx],
                    y = df.columns[idx + 1],
                    ax= ax)
    except IndexError:
        # a lone last column has no partner; all earlier pairs are already plotted
        print('Index Error: Log the error.')
    plt.show()
Chris(df)
IMCoins(df)
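A sketch of another way to generalize, assuming the file is comma-separated (the question shows it space-aligned). `read_csv` pads short rows with NaN; `zip` over the even and odd columns pairs them up and silently drops a lone last column, so no `try`/`except` is needed; and `dropna` on each pair removes the trailing empty rows before plotting. `io.StringIO` stands in for `data.csv`, and the legend labels are an arbitrary choice.

```python
import io
import pandas as pd
import matplotlib.pyplot as plt

# Stand-in for data.csv (comma-separated here, as an assumption);
# rows with fewer fields than the header are padded with NaN
csv_text = """A,B,C,D,E,F
1,2,3,4,5,6
2,3,4,5,6,7
3,4,5,6
4,5
"""
df = pd.read_csv(io.StringIO(csv_text))

fig, ax = plt.subplots()
cols = list(df.columns)
# Pair columns (A, B), (C, D), (E, F); zip stops at the shorter slice,
# so an unpaired last column is simply skipped
for x_col, y_col in zip(cols[0::2], cols[1::2]):
    pair = df[[x_col, y_col]].dropna()  # keep rows where both values exist
    ax.plot(pair[x_col], pair[y_col], label=f"{y_col} vs {x_col}")
ax.legend()
```

For a real file, replacing `io.StringIO(csv_text)` with `'data.csv'` (plus `sep=...` if it is not comma-separated) should be the only change needed.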
