python reader graph gives a blank canvas [duplicate] - python

This question already has answers here:
How to rank plot in seaborn boxplot
(2 answers)
How can I sort a boxplot in pandas by the median values?
(4 answers)
Closed 10 months ago.
Mtcars is a public dataset in R. I'm not sure it's a public dataset in python.
mtcars <- mtcars
I created this boxplot in R and part of what I'm doing is reordering the y-axis with the reorder() function.
ggplot(mtcars, aes(x = mpg, y = reorder(origin, mpg), color = origin)) +
geom_boxplot() +
theme(legend.position = "none") +
labs(title = "Mtcars", subtitle = "Box Plot") +
theme(plot.title = element_text(face = "bold")) +
ylab("country")
Now in python I have this boxplot that I created with seaborn:
plt.close()
seaborn.boxplot(x="mpg", y="origin", data=mtcars)
plt.suptitle("Mtcars", x=0.125, y=0.97, ha='left', fontweight = 'bold')
plt.title("boxplot", loc = 'left')
plt.show()
I'm now trying to reorder the y-axis, but the same kind of treatment as in R doesn't work:
plt.close()
seaborn.boxplot(x="mpg", y=reorder("origin", 'mpg'), data=mtcars)
plt.suptitle("Mtcars", x=0.125, y=0.97, ha='left', fontweight = 'bold')
plt.title("boxplot", loc = 'left')
plt.show()
It's not surprising it doesn't work because it's a different language; I do know that! But how would I do this reordering in python using Seaborn? I'm having trouble understanding if this is even part of the plotting process.

You can compute a custom order and feed it to seaborn's boxplot order parameter:
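import matplotlib.pyplot as plt
import seaborn as sns
# seaborn's 'mpg' dataset stands in for mtcars here; it has the 'origin' column
mtcars = sns.load_dataset('mpg')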
order = mtcars.groupby('origin')['mpg'].median().sort_values(ascending=False)
sns.boxplot(x="mpg", y="origin", data=mtcars, order=order.index)
plt.suptitle("Mtcars", x=0.125, y=0.97, ha='left', fontweight = 'bold')
plt.title("boxplot", loc = 'left')
plt.show()
NB. order also acts like a filter, so categories that are missing from it, or that do not exist in the data, will be omitted from the graph.
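The ordering step can be tried on its own, without plotting. Below is a minimal sketch using a small made-up dataframe (the values and the helper name category_order are illustrative assumptions, not part of the original answer). Note that R's reorder() uses the mean by default, so the stat argument is exposed here in case you want to match that instead of the median.

```python
import pandas as pd

# Hypothetical stand-in for the mpg data; the values are made up for illustration.
cars = pd.DataFrame({
    "origin": ["usa", "usa", "usa", "japan", "japan", "europe", "europe"],
    "mpg": [18.0, 15.0, 20.0, 31.0, 33.0, 26.0, 28.0],
})


def category_order(df, by, value, stat="median", ascending=False):
    """Return the categories of `by`, sorted by a summary statistic of `value`."""
    summary = df.groupby(by)[value].agg(stat)
    return summary.sort_values(ascending=ascending).index.tolist()


order_by_median = category_order(cars, "origin", "mpg")
order_by_mean = category_order(cars, "origin", "mpg", stat="mean")
print(order_by_median)
```

The resulting list can then be passed as order=... to the boxplot call in the same way as order.index above.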

Related

Python translate matplotlib to a plotnine chart

I am currently working through the book Hands On Machine Learning and am trying to replicate a visualization where we plot the latitude and longitude coordinates on a scatter plot of San Diego. I have taken the plot code from the book, which uses the code below (matplotlib method). I would like to replicate the same visualization using plotnine. Could someone help me with the translation?
matplotlib method
# DATA INGEST -------------------------------------------------------------
import io
import pandas as pd
import requests
# Import the file from github
url = "https://raw.githubusercontent.com/ageron/handson-ml2/master/datasets/housing/housing.csv" # Make sure the url is the raw version of the file on GitHub
download = requests.get(url).content
# Reading the downloaded content and turning it into a pandas dataframe
housing = pd.read_csv(io.StringIO(download.decode('utf-8')))
# Then plot
import matplotlib.pyplot as plt
# The size is now related to population divided by 100
# the colour is related to the median house value
housing.plot(kind="scatter", x="longitude", y="latitude", alpha=0.4,
s=housing["population"]/100, label="population", figsize=(10,7),
c="median_house_value", cmap=plt.get_cmap("jet"), colorbar=True)
plt.legend()
plt.show()
plotnine method
from plotnine import ggplot, geom_point, aes, stat_smooth, scale_color_cmap
# Let's try the same thing in ggplot
(ggplot(housing, aes('longitude', 'latitude', size = "population", color = "median_house_value"))
+ geom_point(alpha = 0.1)
+ scale_color_cmap(name="jet"))
If your question was the colour mapping, then you were close: just needed cmap_name='jet' instead of name='jet'.
If it is a broader styling thing, below is close to what you had with matplotlib.
plotnine method
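import numpy as np
from plotnine import (ggplot, aes, geom_point, annotate, labs, guides, theme,
                      theme_matplotlib, element_text, element_blank,
                      scale_y_continuous, scale_color_cmap, scale_size_continuous)
p = (ggplot(housing, aes(x='longitude', y='latitude', size='population', color='median_house_value'))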
+ theme_matplotlib()
+ geom_point(alpha=0.4)
+ annotate('text', x=-114.6, y=42, label='population', size=8)
+ annotate('point', x=-115.65, y=42, size=5, color='#6495ED', fill='#6495ED', alpha=0.8)
+ labs(x=None, color='Median house value')
+ scale_y_continuous(breaks=np.arange(34,44,2))
+ scale_color_cmap(cmap_name='jet')
+ scale_size_continuous(range=(0.05, 6))
+ guides(size=False)
+ theme(
text = element_text(family='DejaVu Sans', size=8),
axis_text_x = element_blank(),
axis_ticks_minor=element_blank(),
legend_key_height = 34,
legend_key_width = 9,
)
)
p
I am not sure to what extent the formatting of the colour bar can be modified in plotnine. If others have additional ideas, I would be most interested; I think the matplotlib colour bar looks nicer.

How to place values inside stacked horizontal bar chart using python matplotlib? [duplicate]

This question already has answers here:
Stacked bars are unexpectedly annotated with the sum of bar heights
(2 answers)
How to add value labels on a bar chart
(7 answers)
Closed 10 months ago.
I want to create a stacked horizontal bar plot with values of each stack displayed inside it and the total value of the stacks just after the bar. Using python matplotlib, I could create a simple barh. My dataframe looks like below:
import pandas as pd
df = pd.DataFrame({"single":[168,345,345,352],
"comp":[481,44,23,58],})
item = ["white_rice",
"pork_and_salted_vegetables",
"sausage_and_potato_in_tomato_sauce",
"curry_vegetable",]
df.index = item
I expect a bar plot like the one below, except that the example shown is not horizontal:
The code I tried is below, and I get AttributeError: 'DataFrame' object has no attribute 'rows'. Please help me with the horizontal bar plot. Thanks.
fig, ax = plt.subplots(figsize=(10,4))
colors = ['c', 'y']
ypos = np.zeros(len(df))
for i, row in enumerate(df.index):
    ax.barh(df.index, df[row], x=ypos, label=row, color=colors[i])
    bottom += np.array(df[row])
totals = df.sum(axis=0)
x_offset = 4
for i, total in enumerate(totals):
    ax.text(totals.index[i], total + x_offset, round(total), ha='center',) # weight='bold')
x_offset = -15
for bar in ax.patches:
    ax.text(
        # Put the text in the middle of each bar. get_x returns the start so we add half the width to get to the middle.
        bar.get_y() + bar.get_height() / 2,
        bar.get_width() + bar.get_x() + x_offset,
        # This is actual value we'll show.
        round(bar.get_width()),
        # Center the labels and style them a bit.
        ha='center',
        color='w',
        weight='bold',
        size=10)
labels = df.index
ax.set_title('Label Distribution Overview')
ax.set_yticklabels(labels, rotation=90)
ax.legend(fancybox=True)
Consider the following approach to get something similar with matplotlib only (I use matplotlib 3.5.0). The job is done with a combination of bar/barh and bar_label. You may change label_type and add padding to tweak the plot's appearance, and you may use fmt to format the values. The code below has been edited to add total values.
import matplotlib.pyplot as plt
import pandas as pd
import random

def main(data):
    data['total'] = data['male'] + data['female']
    fig, (ax1, ax2) = plt.subplots(1, 2)
    fig.suptitle('Plot title')
    ax1.bar(x=data['year'].astype(str), height=data['female'], label='female')
    ax1.bar_label(ax1.containers[0], label_type='center')
    ax1.bar(x=data['year'].astype(str), height=data['male'], bottom=data['female'], label='male')
    ax1.bar_label(ax1.containers[1], label_type='center')
    ax1.bar_label(ax1.containers[1], labels=data['total'], label_type='edge')
    ax1.legend()
    ax2.barh(y=data['year'].astype(str), width=data['female'], label='female')
    ax2.bar_label(ax2.containers[0], label_type='center')
    ax2.barh(y=data['year'].astype(str), width=data['male'], left=data['female'], label='male')
    ax2.bar_label(ax2.containers[1], label_type='center')
    ax2.bar_label(ax2.containers[1], labels=data['total'], label_type='edge')
    ax2.legend()
    plt.show()

if __name__ == '__main__':
    N = 4
    main(pd.DataFrame({
        'year': [2010 + val for val in range(N)],
        'female': [int(10 + 100 * random.random()) for dummy in range(N)],
        'male': [int(10 + 100 * random.random()) for dummy in range(N)]}))
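For the dataframe from the question, the same barh plus bar_label idea might look roughly like the sketch below. It assumes matplotlib 3.4 or newer (where bar_label is available); the function name stacked_barh and the padding value are illustrative choices, not part of the original answer.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Dataframe copied from the question.
df = pd.DataFrame(
    {"single": [168, 345, 345, 352], "comp": [481, 44, 23, 58]},
    index=["white_rice", "pork_and_salted_vegetables",
           "sausage_and_potato_in_tomato_sauce", "curry_vegetable"],
)


def stacked_barh(frame):
    """Draw one horizontal stacked bar per row, label each segment and the row total."""
    fig, ax = plt.subplots(figsize=(10, 4))
    left = np.zeros(len(frame))  # running start position of the next segment
    labels = list(frame.index)
    for column in frame.columns:
        widths = frame[column].to_numpy()
        bars = ax.barh(labels, widths, left=left, label=column)
        ax.bar_label(bars, label_type="center")  # value inside each segment
        left = left + widths
    # Row totals go just past the end of the last (outermost) segment.
    totals = frame.sum(axis=1).tolist()
    total_labels = ax.bar_label(bars, labels=totals, label_type="edge", padding=3)
    ax.legend()
    return ax, total_labels


ax, total_labels = stacked_barh(df)
```

Calling plt.show() afterwards displays the figure; the function returns the axes so the title or tick labels can still be adjusted.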

How to add percentages on countplot in seaborn [duplicate]

This question already has answers here:
How to add percentages on top of grouped bars
(6 answers)
How to annotate grouped bar plot with percent by hue/legend group
(1 answer)
Closed 1 year ago.
I have the same issue as in this post and have already tried this solution (and the one in the comment), but I get weird percentage results. Since I am not yet eligible to comment, I am posting this question.
As far as I can tell from tweaking it, this happens because of the odd ordering produced by this line, but I can't find the solution.
a = [p.get_height() for p in plot.patches]
My expected output is that the percentages within each Class add up to 100%.
Here is the first source code I use:
import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np
df = sns.load_dataset("titanic")
def with_hue(plot, feature, Number_of_categories, hue_categories):
    a = [p.get_height() for p in plot.patches]
    patch = [p for p in plot.patches]
    for i in range(Number_of_categories):
        total = feature.value_counts().values[i]
        # total = np.sum(a[::hue_categories])
        for j in range(hue_categories):
            percentage = '{:.1f}%'.format(100 * a[(j*Number_of_categories + i)]/total)
            x = patch[(j*Number_of_categories + i)].get_x() + patch[(j*Number_of_categories + i)].get_width() / 2 - 0.15
            y = patch[(j*Number_of_categories + i)].get_y() + patch[(j*Number_of_categories + i)].get_height()
            p3.annotate(percentage, (x, y), size = 11)
    plt.show()
plt.figure(figsize=(12,8))
p3 = sns.countplot(x="class", hue="who", data=df)
p3.set(xlabel='Class', ylabel='Count')
with_hue(p3, df['class'],3,3)
This gives the first output, while computing the total with total = np.sum(a[::hue_categories]) gives a second, different output; in neither case do the percentages within a class add up to 100%.
First, note that in matplotlib and seaborn, a subplot is called an "ax". Giving such a subplot a name such as "p3" or "plot" leads to unnecessary confusion when studying the documentation and online example code.
The bars in the seaborn bar plot are organized by hue: first come all the bars belonging to the first hue value, then those of the second, and so on. So, in the given example, first come all the blue, then all the orange and finally all the green bars. This makes looping through ax.patches somewhat complicated. Luckily, the same patches are also available via ax.containers, where each hue group forms a separate container of bars.
Here is some example code:
import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np
def percentage_above_bar_relative_to_xgroup(ax):
    all_heights = [[p.get_height() for p in bars] for bars in ax.containers]
    for bars in ax.containers:
        for i, p in enumerate(bars):
            total = sum(xgroup[i] for xgroup in all_heights)
            percentage = f'{(100 * p.get_height() / total) :.1f}%'
            ax.annotate(percentage, (p.get_x() + p.get_width() / 2, p.get_height()), size=11, ha='center', va='bottom')
df = sns.load_dataset("titanic")
plt.figure(figsize=(12, 8))
ax3 = sns.countplot(x="class", hue="who", data=df)
ax3.set(xlabel='Class', ylabel='Count')
percentage_above_bar_relative_to_xgroup(ax3)
plt.show()
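To sanity-check the annotated percentages, the same numbers can be computed directly from the data. The sketch below uses a small made-up sample in place of the titanic dataset (the rows are illustrative assumptions) and pandas' crosstab with normalize="index", which divides each count by the total of its x group.

```python
import pandas as pd

# Small made-up sample standing in for the 'class' and 'who' columns used above.
sample = pd.DataFrame({
    "class": ["First", "First", "First", "Second", "Second",
              "Third", "Third", "Third", "Third"],
    "who": ["man", "woman", "woman", "man", "child",
            "man", "man", "woman", "child"],
})

# Rows are the x groups (class) and columns are the hue levels (who).
# normalize="index" divides every count by its row total, so each row sums to 100.
percentages = pd.crosstab(sample["class"], sample["who"], normalize="index") * 100
print(percentages.round(1))
```

Each row of the table corresponds to one group of bars on the x axis, so the values in a row should match the labels drawn above that group.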

How to make the x axis of figure wider using pyplot in python since figsize is not working [duplicate]

This question already has answers here:
How do I change the size of figures drawn with Matplotlib?
(14 answers)
Closed 1 year ago.
I want to make the x axis of a figure wider in matplotlib and I use the following code.
But it seems that figsize does not have any effect. How I can change the size of the figure?
data_dates = np.loadtxt(file,usecols = 0, dtype=str)
data1 = np.loadtxt(file,usecols = 1)
data2 = np.loadtxt(file,usecols = 2)
data3 = np.loadtxt(file,usecols = 3)
plt.plot(figsize=(30,5))
plt.plot(data_dates,data1, label = "T")
plt.plot(data_dates,data2, label = "WS")
plt.plot(data_dates,data3, label = "WD")
plt.xlabel('Date', fontsize=8)
plt.xticks(rotation=90,fontsize=4)
plt.ylabel(' Percentage Difference (%)')
plt.legend()
plt.savefig(outfile,format='png',dpi=200,bbox_inches='tight')
A sample of the file is:
01/06/2019 0.1897540512577196 0.28956205456965856 0.10983099750547703
02/06/2019 0.1914523564094276 0.1815325705314345 0.0004533827128655877
03/06/2019 0.2365346386184113 0.12301344973593868 0.058843355966174876
04/06/2019 0.2085897993039386 0.005466902564359565 0.014087537281676313
05/06/2019 0.15563355684612554 0.16249844426472368 0.11036007669758358
06/06/2019 0.11981475728282368 0.11015459703126898 0.03501167308950372
fig, ax = plt.subplots(1, 1)
fig.set_size_inches(30, 5)
plt.plot(data_dates,data1, label = "T")
plt.plot(data_dates,data2, label = "WS")
plt.plot(data_dates,data3, label = "WD")
plt.xlabel('Date', fontsize=8)
plt.xticks(rotation=90,fontsize=4)
plt.ylabel(' Percentage Difference (%)')
plt.legend()
plt.savefig("test.png",format='png',dpi=200,bbox_inches='tight')
Instead of creating the figure explicitly using subplots you could also use the get-current-figure method: fig = plt.gcf().
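As a rough sketch of both options: figsize is a property of the figure rather than of an individual plot call, so it can be given when the figure is created, or the current figure can be resized afterwards. The data values below are made up for illustration.

```python
import matplotlib.pyplot as plt

# Option 1: pass figsize when the figure is created, before anything is plotted.
fig = plt.figure(figsize=(30, 5))
plt.plot([0, 1, 2], [0.19, 0.19, 0.24], label="T")  # made-up values
plt.legend()

# Option 2: fetch the current figure and resize it after the fact.
current = plt.gcf()
current.set_size_inches(12, 4)
width, height = current.get_size_inches()
```

Either way, the size has to be set on the figure object before savefig is called for it to show up in the saved file.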

How to label my x-axis with years extracted from my time-series data?

I have data in this format and shape in a dataframe, and I would like to represent it as a graph showing the total counts per month. I have resampled the data so that it has one row per month, and then I wrote the following code to chart it:
import seaborn as sns
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
#Read in data & create total column
stacked_bar_data = new_df
stacked_bar_data["total"] = stacked_bar_data.var1 + stacked_bar_data.var2
#Set general plot properties
sns.set_style("whitegrid")
sns.set_context({"figure.figsize": (24, 10)})
sns.set_context("poster")
#Plot 1 - background - "total" (top) series
sns.barplot(x = stacked_bar_data.index, y = stacked_bar_data.total, color = "red")
#Plot 2 - overlay - "bottom" series
bottom_plot = sns.barplot(x = stacked_bar_data.index, y = stacked_bar_data.attended, color = "#0000A3")
topbar = plt.Rectangle((0,0),1,1,fc="red", edgecolor = 'none')
bottombar = plt.Rectangle((0,0),1,1,fc='#0000A3', edgecolor = 'none')
l = plt.legend([bottombar, topbar], ['var1', 'var2'], loc=1, ncol = 2, prop={'size':18})
l.draw_frame(False)
#Optional code - Make plot look nicer
sns.despine(left=True)
bottom_plot.set_ylabel("Count")
# bottom_plot.set_xlabel("date")
#Set fonts to consistent 16pt size
for item in ([bottom_plot.xaxis.label, bottom_plot.yaxis.label] +
             bottom_plot.get_xticklabels() + bottom_plot.get_yticklabels()):
    item.set_fontsize(16)
# making sure our xticks is formatted correctly
plt.xticks(fontsize=20)
years = mdates.YearLocator() # every year
months = mdates.MonthLocator() # every month
years_fmt = mdates.DateFormatter('%Y')
bottom_plot.xaxis.set_major_locator(years)
bottom_plot.xaxis.set_major_formatter(years_fmt)
bottom_plot.xaxis.set_minor_locator(months)
plt.show()
# bottom_plot.axes.xaxis.set_visible(False)
Thing is, my chart doesn't show me the years at the bottom. I believe I have all the pieces necessary to solve this problem, but for some reason I can't figure out what I'm doing wrong.
I think I'm doing something wrong with how I set up the subplots of the sns.barplot. Maybe I should be assigning them to fig and ax or something like that? That's how I saw it done on the matplotlib site. I just can't manage to transfer that logic over to my example.
Any help would be most appreciated. Thanks!
There are a few things to consider. First of all, please try to convert your date column (new_df.date) to datetime.
new_df.date = pd.to_datetime(new_df.date)
Second, do not use this part:
bottom_plot.xaxis.set_major_locator(years)
bottom_plot.xaxis.set_major_formatter(years_fmt)
bottom_plot.xaxis.set_minor_locator(months)
Instead use:
x_dates = stacked_bar_data['date'].dt.strftime('%Y').sort_values().unique()
bottom_plot.set_xticklabels(labels=x_dates, rotation=0, ha='center')
This is because seaborn re-positions the bars at integer positions, even if we set them to be dates (note that you passed the index explicitly). Below is a fully working example. Note that it gives you major ticks only; you'll have to work out the minor ticks yourself. My comments, and the lines I've commented out, are marked with a double #.
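import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
stacked_bar_data = new_df ## new_df is the resampled dataframe from the question
stacked_bar_data.date = pd.to_datetime(stacked_bar_data.date)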
stacked_bar_data["total"] = stacked_bar_data.var1 + stacked_bar_data.var2
#Set general plot properties
sns.set_style("whitegrid")
sns.set_context({"figure.figsize": (14, 7)}) ## modified size :)
sns.set_context("poster")
years = mdates.YearLocator() # every year
months = mdates.MonthLocator() # every month
years_fmt = mdates.DateFormatter('%Y')
sns.barplot(x = stacked_bar_data.index, y = stacked_bar_data.total, color = "red")
bottom_plot = sns.barplot(x = stacked_bar_data.index, y = stacked_bar_data.attended, color = "#0000A3")
topbar = plt.Rectangle((0,0),1,1,fc="red", edgecolor = 'none')
bottombar = plt.Rectangle((0,0),1,1,fc='#0000A3', edgecolor = 'none')
l = plt.legend([bottombar, topbar], ['var1', 'var2'], loc=1, ncol = 2, prop={'size':18})
l.draw_frame(False)
#Optional code - Make plot look nicer
sns.despine(left=True)
bottom_plot.set_ylabel("Count")
# bottom_plot.set_xlabel("date")
# making sure our xticks is formatted correctly
## plt.xticks(fontsize=20) # not needed as you change font below in the loop
## Do not use at all
## bottom_plot.xaxis.set_major_locator(years)
## bottom_plot.xaxis.set_major_formatter(years_fmt)
## bottom_plot.xaxis.set_minor_locator(months)
#Set fonts to consistent 16pt size
for item in ([bottom_plot.xaxis.label, bottom_plot.yaxis.label] +
             bottom_plot.get_xticklabels() + bottom_plot.get_yticklabels()):
    item.set_fontsize(16)
## This part is required if you want to stick to seaborn
## This is because the moment you start using seaborn it will "re-position" the bars
## at integer position rather than dates. W/o seaborn there is no such need
x_dates = stacked_bar_data['date'].dt.strftime('%Y').sort_values().unique()
bottom_plot.set_xticklabels(labels=x_dates, rotation=0, ha='center')
plt.show()
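One possible variation, sketched here under the assumption that the bars sit at integer positions 0, 1, 2, ... as described above: place a tick only where a new year starts and label it with that year, so the number of ticks and the number of labels always match. The sketch uses plain matplotlib bars and made-up monthly data in place of sns.barplot and the real dataframe.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Made-up monthly data: 30 consecutive month starts beginning in January 2019.
dates = pd.date_range("2019-01-01", periods=30, freq="MS")
counts = list(range(1, 31))

fig, ax = plt.subplots(figsize=(14, 7))
positions = list(range(len(dates)))  # bars at 0, 1, 2, ... like a categorical axis
ax.bar(positions, counts, color="#0000A3")

# Keep only the positions where a new year starts and label those with the year.
year_positions = [i for i, d in enumerate(dates) if d.month == 1]
year_labels = [str(dates[i].year) for i in year_positions]
ax.set_xticks(year_positions)
ax.set_xticklabels(year_labels, rotation=0, ha="center")
```

With a seaborn bar plot the same two set_xticks and set_xticklabels calls could be applied to the returned axes, since the positions are plain integers there as well.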
