Labels in table of visualization of Pandas - python

Hi, I am plotting a Pandas dataframe. The dataframe looks like this:
;Cosine;Neutralized
author;0.842075;0.641600
genre;0.839696;0.903227
author+genre;0.833966;0.681121
And the code for plotting that I am using is:
fig = ari_total.plot(kind="bar", legend = False, colormap= "summer",
figsize= ([7,6]), title = "Homogeinity "+corpora+" (texts: "+str(amount_texts)+")", table=True,
use_index=False, ylim =[0,1]).get_figure()
The result is nice, but it has a problem:
As you can see, the index labels of the table ("author", "genre" and "author+genre") are rendered over the tick labels 0, 1 and 2.
My question: how can I remove these numbers while still using the same function? I am passing use_index=False, which I thought would remove the labels from the bars, but it only replaces them with these numbers.
I would be very thankful for any help. Regards!

Use fig.axes[0].get_xaxis().set_visible(False).
code:
import pandas as pd
import matplotlib.pyplot as plt
df = pd.DataFrame()
df['Cosine'] = [0.842075,0.839696,0.833966]
df['Neutralized'] = [0.641600,0.903227,0.681121]
df.index = ['author', 'genre', 'author+genre']
fig = df.plot(kind="bar", legend = False, colormap= "summer",
figsize= ([7,6]), title = "whatever", table=True,
use_index=False, ylim =[0,1]).get_figure()
fig.axes[0].get_xaxis().set_visible(False)
plt.show()
result:
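As a variation on the answer above, here is a minimal sketch that keeps the x-axis itself but switches off only its ticks and tick labels with Matplotlib's tick_params, so the table rows carry the labels alone. It assumes the same sample data as the answer; the variable names df, ax and fig are illustrative.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Same sample values as in the question.
df = pd.DataFrame(
    {
        "Cosine": [0.842075, 0.839696, 0.833966],
        "Neutralized": [0.641600, 0.903227, 0.681121],
    },
    index=["author", "genre", "author+genre"],
)

# DataFrame.plot returns the Axes, so we can adjust it before taking the figure.
ax = df.plot(kind="bar", legend=False, colormap="summer", figsize=(7, 6),
             title="whatever", table=True, use_index=False, ylim=(0, 1))

# Hide the bottom tick marks and the 0, 1, 2 tick labels, leaving the table intact.
ax.tick_params(axis="x", which="both", bottom=False, labelbottom=False)

fig = ax.get_figure()
```

Call plt.show() or fig.savefig("plot.png") afterwards to view the result. Hiding the whole axis, as in the answer, is shorter; tick_params may be preferable if you still want an x-axis label.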

Related

How would I add a legend entry for each column of pandas dataframe in graph generated by df.boxplot()

I am trying to create some boxplots of pandas dataframes.
I have dataframes that typically look like this (not sure if there was a good way to show it so just took a screenshot).
I am creating a boxplot for each dataframe (after transposing) using the df.boxplot() method, it comes out almost exactly how I want it using the code below:
ax = crit_storm_df[tp_cols].T.boxplot()
ax.set_xlabel("Duration (m)")
ax.set_ylabel("Max Flow (cu.m/sec)")
ax.set_xlim(0, None)
ax.set_ylim(0, None)
ax.set_title(crit_storm_df.name)
plt.show()
Example pic of output graph. What's lacking though is I want to add a legend with one entry for each box that represents a column in my dataframe in the pic above. Since I transposed the df before plotting, I would like to have a legend entry for each row, i.e. "tp01", "tp02" etc.
Anyone know what I should be doing instead? Is there a way to do this through the df.boxplot() method or do I need to do something in matplotlib?
I tried ax.legend() but it doesn't do anything except give me a warning:
No artists with labels found to put in legend. Note that artists whose label start with an underscore are ignored when legend() is called with no argument.
Any help would be appreciated, thanks!
If you simply want your boxes to have different colors, you can use seaborn, where that is the default behavior:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
df = pd.DataFrame(np.random.randn(10, 4),
columns=['Col1', 'Col2', 'Col3', 'Col4'])
ax = sns.boxplot(data=df)
plt.legend(ax.patches, df.columns)
plt.show()
Edit: adding legend
Output:
Code along these lines should produce a graph similar to the one you need:
import matplotlib.pyplot as plt
import pandas as pd
data = {
'Duration': [10, 15, 20, 25, 30, 45, 60, 90, 120, 180, 270, 360],
'tp01': [13.1738, 13.1662, 14.3903, 14.2772, 14.3223, 12.5686, 14.8710, 8.9785, 9.2224, 7.4957, 3.6493, 5.7982],
'tp02': [13.1029, 14.2570, 16.5373, 12.6589, 11.0455, 12.6777, 8.1715, 9.3830, 8.3498, 6.0930, 6.4310, 7.4538],
'tp03': [14.5263, 13.6724, 11.4800, 13.4982, 12.3987, 11.6688, 10.4089, 7.0736, 5.8004, 10.1354, 5.5874, 5.6749],
'tp04': [14.7589, 11.6993, 12.5825, 13.5627, 11.9481, 10.7803, 8.9388, 5.7076, 12.7690, 9.7546, 9.5004, 5.9912],
'tp05': [15.5543, 14.1007, 11.7304, 13.3218, 12.4318, 9.5237, 11.9014, 5.6778, 14.2627, 3.7422, 6.4555, 3.3458],
'tp06': [13.5196, 12.5939, 12.5679, 11.4414, 9.3590, 9.6083, 9.6704, 10.5239, 9.1028, 6.0336, 7.0258, 5.9800],
'tp07': [14.7476, 13.3925, 13.0324, 13.3649, 14.7832, 8.1078, 7.1307, 15.4406, 5.0187, 6.9497, 3.6492, 4.8642],
'tp08': [13.3995, 14.3639, 12.7579, 10.6844, 10.3281, 10.2541, 8.8257, 8.8773, 8.3498, 5.7315, 7.8469, 6.7316],
'tp09': [16.7954, 17.1788, 15.9850, 10.8780, 12.5249, 10.2174, 7.5735, 7.3753, 7.1157, 4.8536, 9.1581, 5.6369],
'tp10': [15.7671, 16.1570, 11.6122, 15.2340, 13.2356, 13.2270, 11.6810, 7.1157, 8.0048, 5.5782, 6.0876, 5.7982],
}
df = pd.DataFrame(data).set_index("Duration")
fig, ax = plt.subplots()
df.T.plot(kind='box', ax=ax)
labels = df.columns
lines = [plt.Line2D([0, 1], [0, 1], color=c, marker='o', markersize=10) for c in plt.rcParams['axes.prop_cycle'].by_key()['color'][:len(labels)]]
ax.legend(lines, labels, loc='best')
ax.set_xlabel("Duration (m)")
ax.set_ylabel("Max Flow (cu.m/sec)")
ax.set_xlim(0, None)
ax.set_ylim(0, None)
ax.set_xticklabels(df.index)
plt.show()
Result:
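If the legend entries should be tied to the boxes themselves, one possible approach with plain Matplotlib is to draw the boxes with patch_artist=True and pass the returned box patches to ax.legend. The sketch below is only an illustration under assumptions: the data are random stand-ins, there is one box per column, and the names rng, bp and legend are made up for the example.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Random stand-in data: one column per box (not the asker's real values).
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(10, 3, size=(12, 4)),
                  columns=["tp01", "tp02", "tp03", "tp04"])

fig, ax = plt.subplots()

# patch_artist=True makes each box a filled patch that can be coloured
# and reused as a legend handle.
bp = ax.boxplot([df[col].to_numpy() for col in df.columns], patch_artist=True)

colors = plt.rcParams["axes.prop_cycle"].by_key()["color"]
for box, color in zip(bp["boxes"], colors):
    box.set_facecolor(color)

ax.set_xticklabels(df.columns)
legend = ax.legend(bp["boxes"], list(df.columns), title="Column", loc="best")
```

Because the handles are the box patches, each legend colour matches the box it describes. Call plt.show() to display the figure.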

Replacing data position in table - Matplotlib

I'm trying to create a table in Matplotlib that looks like the result below, with data extracted from MySQL.
I use these charts inside an application built with PyQt5.
Unfortunately, this is what I get:
Code used:
def dashboard_fleet_statistics_fleets_visisted_current_year(self):
    try:
        mydb = con.connect(host= "localhost", user ="root", password='''''', db="fleet")
        cursor = mydb.cursor()
        cursor.execute('''SELECT (fleet_code) ,fleet_name,COUNT(fleet_code) AS "No Of Visits",(Year(date_of_visit)) AS "Year Of Visit"
        FROM vehicle_tyre_parameters
        WHERE fleet_code != "" AND Year(date_of_visit)= Year(curdate())
        GROUP BY fleet_code''')
        result4 = cursor.fetchall()
        print(list(result4))
        fleet_code=[]
        fleet_name=[]
        no_of_visits =[]
        fleet_year=[]
        for row in result4 :
            fleet_code.append(row[0])
            fleet_name.append(row[1])
            no_of_visits.append(row[2])
            fleet_year.append(row[3])
        print(list(fleet_code))
        print(list(fleet_name))
        print(list(no_of_visits))
        print(list(fleet_year))
        fig, ax = plt.subplots()
        values=[fleet_code,fleet_name,no_of_visits,fleet_year]
        table = ax.table(cellText=values,rowLabels=['Fleet Code','Fleet Name','No of Visits','Year of Visit'] ,colWidths=[.5,.5],colLoc='center',loc='center',bbox=[-0.3, 1, 1, 0.275])
        #modify table
        table.set_fontsize(14)
        table.scale(1,4)
        ax.axis('off')
        table[(1, 0)].set_facecolor("white")
        table[(2, 0)].set_facecolor("gray")
        plt.tight_layout()
        #display table
        plt.show()
I would appreciate your help in getting a table like the one in the first picture. Thanks!
I cannot really access your data, but implementing the example you attached is pretty straightforward. I first put all the information in a pandas dataframe, and then used this neat trick that allows you to convert a dataframe to a matplotlib table. The rest is really playing with the design.
Here is a code snippet:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
# Create figure
fig, ax = plt.subplots(figsize = (25,2))
fig.patch.set_visible(False)
ax.axis('off')
ax.axis('tight')
# Make some data and put in dataframe
data = [['FC-1', 'NAME', 5, 2021], ['FC-6', 'NAME', 5, 2021]]
df = pd.DataFrame(data, columns=['Fleet code', 'Fleet name', 'Number of visits', 'Year of visit'])
# Create the table from the dataframe:
tab = ax.table(cellText=df.values, colLabels=df.columns, loc='center', cellLoc='center',
colColours = ['lightsteelblue', 'lightsteelblue', 'lightsteelblue', 'lightsteelblue'],
cellColours = [['w','w','w','w'], ['lightgray','lightgray','lightgray','lightgray']])
# Design the table:
# Fonts:
tab.auto_set_font_size(False)
tab.set_fontsize(20)
tab.scale(2, 2)
fig.tight_layout()
plt.show()
Here is the end result:
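The question's code passes one list per field as cellText, so each field becomes a table row. A minimal sketch without pandas, assuming the query returns one tuple per fleet as cursor.fetchall() usually does, is to pass the fetched rows directly and use colLabels for the headers. The rows list below is a hypothetical stand-in for the real query result.

```python
import matplotlib.pyplot as plt

# Hypothetical stand-in for cursor.fetchall(): one tuple per fleet.
rows = [("FC-1", "NAME", 5, 2021), ("FC-6", "NAME", 5, 2021)]
col_labels = ["Fleet Code", "Fleet Name", "No of Visits", "Year of Visit"]

fig, ax = plt.subplots(figsize=(10, 2))
ax.axis("off")

# Each fetched tuple is one table row; the headers go in colLabels.
table = ax.table(
    cellText=[[str(value) for value in row] for row in rows],
    colLabels=col_labels,
    cellLoc="center",
    loc="center",
)
table.auto_set_font_size(False)
table.set_fontsize(14)
table.scale(1, 2)
```

With colLabels set, the header occupies table row 0 and the data rows start at index 1, which matters when colouring individual cells such as table[(1, 0)].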

Why do the bar chart ticks merge into one when plotting dataframe but work when plotting row?

I need to make a graph that would look like this:
Here's some sample data:
data = {"Small-Mid":367, "Large":0, "XXL":0, "FF":328, "AA":0, "Total":695}
df = pd.DataFrame([data], columns=data.keys())
It's a dataframe that has only one row, if I try to plot the whole dataframe I get this ugly thing:
fig, ax = plt.subplots(figsize=(11.96, 4.42))
df.plot(kind="bar")
plt.show()
The ugly thing, two graphs, one empty the other one just wrong:
If I plot by selecting the row then it looks fine:
fig, ax = plt.subplots(figsize=(11.96, 4.42))
row = df.iloc[0]
row.plot(kind='bar')
plt.show()
A much nicer graph:
The issue is that I need the Total bar to be a different colour than the other bars and I can't do that when plotting the row, because it only accepts a single value rather than a dictionary for colours.
What I don't understand is why plotting the whole dataframe returns two plots and why all the bars are placed under a single tick mark. How do I make it work?
You should re-shape your dataframe with pandas.melt:
df = pd.melt(frame = df,
var_name = 'variable',
value_name = 'value')
Then you can plot your bar chart with seaborn.barplot:
sns.barplot(ax = ax, data = df, x = 'variable', y = 'value')
Complete Code
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
data = {"Small-Mid":367, "Large":0, "XXL":0, "FF":328, "AA":0, "Total":695}
df = pd.DataFrame([data], columns=data.keys())
df = pd.melt(frame = df,
var_name = 'variable',
value_name = 'value')
fig, ax = plt.subplots(figsize=(11.96, 4.42))
sns.barplot(ax = ax, data = df, x = 'variable', y = 'value')
plt.show()
If you want only 'Total' column to be a different color from others, you can define a color-correspondence dictionary:
colors = {"Small-Mid":'blue', "Large":'blue', "XXL":'blue', "FF":'blue', "AA":'blue', "Total":'red'}
and pass it to seaborn as palette parameter:
sns.barplot(ax = ax, data = df, x = 'variable', y = 'value', palette = colors.values())
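On the "why" part of the question: DataFrame.plot creates its own figure unless an existing Axes is passed through ax=, so the empty figure is the one made by plt.subplots. A DataFrame bar plot also draws one tick per row and one bar per column, which is why a one-row frame puts every bar on a single tick. As a seaborn-free alternative, the sketch below draws the single row with Matplotlib's ax.bar and a list holding one colour per bar; the colors list is an illustrative name.

```python
import matplotlib.pyplot as plt
import pandas as pd

data = {"Small-Mid": 367, "Large": 0, "XXL": 0, "FF": 328, "AA": 0, "Total": 695}
df = pd.DataFrame([data])
row = df.iloc[0]  # a Series: one value per category

# One colour per bar; only "Total" is highlighted.
colors = ["red" if name == "Total" else "blue" for name in row.index]

fig, ax = plt.subplots(figsize=(11.96, 4.42))
# Drawing on the Axes created above avoids a second, empty figure.
bars = ax.bar(list(row.index), row.to_numpy(), color=colors)
```

Call plt.show() to display it. Passing ax=ax to df.plot would similarly keep everything on one figure.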

Facetgrid to plot stacked normalised counts - Seaborn

I'm aiming to use Seaborn facet grid to plot counts of values but normalised, rather than pure counts. Using below, each row should display each unique value in Item. The x-axis should display Num and the values come from Label.
However, each row isn't being partitioned. The same data is displayed for each Item.
import pandas as pd
import seaborn as sns
df = pd.DataFrame({
'Num' : [1,2,1,2,3,2,1,3,2],
'Label' : ['A','B','C','B','A','C','C','A','B'],
'Item' : ['Up','Left','Up','Left','Down','Right','Up','Down','Right'],
})
g = sns.FacetGrid(df,
row = 'Item',
row_order = ['Up','Right','Down','Left'],
aspect = 2,
height = 4,
sharex = True,
legend_out = True
)
g.map(sns.histplot, x = 'Num', hue = 'Label', data = df, multiple = 'fill', shrink=.8)
g.add_legend()
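You could try g.map_dataframe(sns.histplot, x='Num', hue = 'Label', multiple = 'fill', shrink=.8). In your call, data = df hands the full dataframe to histplot for every facet, which is why each row shows the same data, whereas map_dataframe passes only the subset belonging to each facet. I'm not a seaborn expert; I looked this up at https://seaborn.pydata.org/generated/seaborn.FacetGrid.html, and map_dataframe seems to fit this case better than map.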
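To check what the normalised counts per facet should be, it can help to compute them explicitly. The following is a rough sketch that swaps seaborn for pandas.crosstab and plain pandas/Matplotlib bar plots; it is not the answerer's method, and the names props, nums and sub are illustrative.

```python
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({
    'Num': [1, 2, 1, 2, 3, 2, 1, 3, 2],
    'Label': ['A', 'B', 'C', 'B', 'A', 'C', 'C', 'A', 'B'],
    'Item': ['Up', 'Left', 'Up', 'Left', 'Down', 'Right', 'Up', 'Down', 'Right'],
})

# Share of each Label within every (Item, Num) group; each row sums to 1.
props = pd.crosstab([df['Item'], df['Num']], df['Label'], normalize='index')

row_order = ['Up', 'Right', 'Down', 'Left']
nums = sorted(df['Num'].unique())

fig, axes = plt.subplots(nrows=len(row_order), sharex=True, figsize=(8, 10))
for ax, item in zip(axes, row_order):
    # Give every facet the same Num positions; missing ones become empty bars.
    sub = props.loc[item].reindex(nums, fill_value=0.0)
    sub.plot(kind='bar', stacked=True, width=0.8, ax=ax, legend=False)
    ax.set_title(item)
    ax.set_ylabel('Proportion')
axes[-1].set_xlabel('Num')
axes[0].legend(title='Label', loc='upper left', bbox_to_anchor=(1.0, 1.0))
fig.tight_layout()
```

For example, the "Up" rows all have Num 1 with labels A, C and C, so that bar is split one third A and two thirds C.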

How to combine two heatmaps in Seaborn in Python so both are shown in the same heatmap?

This is the link to the data I'm using:
https://github.com/fivethirtyeight/data/tree/master/drug-use-by-age
I'm using Jupyter Lab, and here's the code:
from matplotlib import pyplot as plt
import pandas as pd
import numpy as np
import seaborn as sb
url = 'https://raw.githubusercontent.com/fivethirtyeight/data/master/drug-use-by-age/drug-use-by-age.csv'
df = pd.read_csv(url, index_col = 0)
df.dtypes
df.replace('-', np.nan, inplace=True)
df = df.iloc[:,:].astype(float)
df = df.loc[:, df.columns != 'n']
#df.columns = df.columns.str.rstrip('-use')
df
fig, axes = plt.subplots(1,2, figsize=(20, 8))
fig.subplots_adjust(wspace=0.1)
fig.colorbar(ax.collections[0], ax=ax,location="right", use_gridspec=False, pad=0.2)
#plt.figure(figsize=(16, 16))
df_percentage = df.iloc[:,range(0,26,2)]
plot_precentage = sb.heatmap(df_percentage, cmap='Reds', ax=axes[0], cbar_kws={'format': '%.0f%%', 'label': '% used in past 12 months'})
df_frequency = df.iloc[:,range(1,27,2)]
plot_frequency = sb.heatmap(df_frequency, cmap='Blues', ax=axes[1], cbar_kws= dict(label = 'median frequency a user used'))
I can only show the two heatmaps as separate subplots.
I want it to look like this (made in Paint):
The data should also be shown side by side. Is there a simple way to achieve that?
A pretty simple solution uses the mask option (sb is the seaborn alias imported in the question):
mask = np.vstack([np.arange(df.shape[1])]* df.shape[0]) % 2
fig, axes = plt.subplots()
plot_precentage = sb.heatmap(df, mask=mask, cmap='Reds', ax=axes,
cbar_kws={'format': '%.0f%%',
'label': '% used in past 12 months'}
)
plot_frequency = sb.heatmap(df, mask=1-mask, cmap='Blues', ax=axes,
cbar_kws= dict(label = 'median frequency a user used')
)
Output:
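To see what the mask in the answer contains, here is a small NumPy-only sketch on a 2 x 4 stand-in shape (n_rows and n_cols are assumptions replacing df.shape). Cells where the mask is true are left blank by the heatmap call, so the first call draws the even-numbered columns and the second call, with the inverted mask, draws the odd-numbered ones.

```python
import numpy as np

n_rows, n_cols = 2, 4  # small stand-in for df.shape

# True marks the cells a heatmap call should leave blank.
# Odd column positions are hidden here, so even positions are drawn.
mask = np.tile(np.arange(n_cols) % 2 == 1, (n_rows, 1))

# The inverted mask hides the even positions instead.
inverse = ~mask

print(mask)
print(inverse)
```

This boolean array carries the same pattern as the vstack expression in the answer, which holds 0 and 1 instead of False and True.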
