Normalized Group Values in Seaborn - python

I want to plot grouped counts with seaborn, normalized within each group. At first, I tried the following:
fig, ax = plt.subplots(figsize=(10, 6))
ax = sns.histplot(
    data=df,
    x='age_bins',
    hue='Showup',
    multiple='dodge',
    stat='count',
    shrink=0.4,
)
[Figure: original count]
Now I want to normalize each bar relative to the total count of its bin. The only way I succeeded in doing so was this:
fig, ax = plt.subplots(figsize=(10, 6))
ax = sns.histplot(
    data=df,
    x='age_bins',
    hue='Showup',
    multiple='fill',
    stat='count',
    shrink=0.4,
)
[Figure: result with multiple = 'fill']
This gives me the values I want, but is there any way to plot the same results with the bars dodged side by side instead of stacked on top of each other?

You can group by age bin and "Showup", count the rows, then pivot "Showup" into individual columns. Then divide each row by its row total and create a bar plot via pandas:
import matplotlib.pyplot as plt
from matplotlib.ticker import PercentFormatter
import seaborn as sns
import pandas as pd
import numpy as np
ages = ['<10', '<20', '<30', '<40', '<50', '<60', '<70', '++70']
df = pd.DataFrame({'age_bins': np.random.choice(ages, 10000),
'Showup': np.random.choice([True, False], 10000, p=[0.88, 0.12])})
df_counts = df.groupby(['age_bins', 'Showup']).size().unstack().reindex(ages)
df_percentages = df_counts.div(df_counts.sum(axis=1), axis=0) * 100
sns.set() # set default seaborn style
fig, ax = plt.subplots(figsize=(10, 6))
df_percentages.plot.bar(rot=0, ax=ax)
ax.set_xlabel('')
ax.set_ylabel('Percentage per age group')
ax.yaxis.set_major_formatter(PercentFormatter(100))
plt.tight_layout()
plt.show()
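As a variation on the answer above, pd.crosstab with normalize='index' computes the per-bin shares in one step. The sketch below uses a tiny hand-made frame (not the asker's data) so the numbers are easy to check by eye. Reshaping to long form is one possible way to hand the result to seaborn's barplot, which draws hue levels side by side.

```python
import pandas as pd

# Toy data (assumed for illustration): four rows in '<10', two rows in '<20'
df = pd.DataFrame({
    "age_bins": ["<10", "<10", "<10", "<10", "<20", "<20"],
    "Showup":   [True, True, True, False, True, False],
})

# Share of each Showup value within its age bin, as a percentage.
# normalize="index" divides every row by its own row total.
pct = pd.crosstab(df["age_bins"], df["Showup"], normalize="index") * 100
print(pct)  # '<10': 25% False / 75% True, '<20': 50% / 50%

# Long form: one row per (age bin, Showup) pair
long_df = pct.reset_index().melt(
    id_vars="age_bins", var_name="Showup", value_name="percent"
)
print(long_df)
```

With long_df in hand, a call such as sns.barplot(data=long_df, x='age_bins', y='percent', hue='Showup') should give dodged bars whose heights sum to 100 within each age bin.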

Related

How to plot colors for two variables in scatterplot in python?

I have a dataset with two different variables and I want to give each one a different color. Can anyone help, please? Link to my dataset: "https://github.com/mayuripandey/Data-Analysis/blob/main/word.csv"
import matplotlib.pyplot as plt
import pandas as pd
fig, ax = plt.subplots(figsize=(10, 6))
ax.scatter(x = df['Friends Network-metrics'], y = df['Number of Followers'],cmap = "magma")
plt.xlabel("Friends Network-metrics")
plt.ylabel("Number of Followers")
plt.show()
It's not very clear what you want to do here, but here is a solution that may help.
You could use seaborn to color the points by a variable. Otherwise, you'd need to iterate through the points to set the color, or create a new column that conditionally assigns a color to each value.
I don't know which variable you want to color by, but you just pass it as the hue parameter:
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
df = pd.read_csv('https://raw.githubusercontent.com/mayuripandey/Data-Analysis/main/word.csv')
# Use the 'hue' argument to provide a factor variable
sns.lmplot(x='Friends Network-metrics',
y='Number of Followers',
height=8,
aspect=.8,
data=df,
fit_reg=False,
hue='Sentiment',
legend=True)
plt.xlabel("Friends Network-metrics")
plt.ylabel("Number of Followers")
plt.show()
This can give you a view like this:
If you were looking for a color scale for one of the variables, though, you would do the below. However, the maximum value is so large that the color range does not make for an effective visual:
import matplotlib.pyplot as plt
import pandas as pd
df = pd.read_csv('https://raw.githubusercontent.com/mayuripandey/Data-Analysis/main/word.csv')
fig, ax = plt.subplots(figsize=(10, 6))
g = ax.scatter(x = df['Friends Network-metrics'],
y = df['Number of Followers'],
c = df['Friends Network-metrics'],
cmap = "magma")
fig.colorbar(g)
plt.xlabel("Friends Network-metrics")
plt.ylabel("Number of Followers")
plt.show()
So you could adjust the scale (I'd also add edgecolors = 'black', as it's hard to see the light points):
import matplotlib.pyplot as plt
import pandas as pd
df = pd.read_csv('https://raw.githubusercontent.com/mayuripandey/Data-Analysis/main/word.csv')
fig, ax = plt.subplots(figsize=(10, 6))
g = ax.scatter(x = df['Friends Network-metrics'],
y = df['Number of Followers'],
c = df['Friends Network-metrics'],
cmap = "magma",
vmin=0, vmax=10000,
edgecolors = 'black')
fig.colorbar(g)
plt.xlabel("Friends Network-metrics")
plt.ylabel("Number of Followers")
plt.show()
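The answer also mentions creating a new column that conditionally assigns a color to each value. A minimal sketch of that idea follows. The data frame is a toy stand-in for the CSV, and the sentiment labels and the palette are assumptions made for illustration, since the actual values in the 'Sentiment' column are not shown in the thread.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Toy stand-in for the real CSV; labels and numbers are made up
df = pd.DataFrame({
    "Friends Network-metrics": [120, 450, 800, 1500],
    "Number of Followers": [300, 150, 2200, 900],
    "Sentiment": ["positive", "negative", "neutral", "positive"],
})

# Hypothetical palette: one matplotlib color name per category
palette = {"positive": "tab:green", "negative": "tab:red", "neutral": "tab:gray"}

# New column holding the color of each point
df["color"] = df["Sentiment"].map(palette)

fig, ax = plt.subplots(figsize=(10, 6))
ax.scatter(df["Friends Network-metrics"], df["Number of Followers"],
           c=df["color"].tolist(), edgecolors="black")
ax.set_xlabel("Friends Network-metrics")
ax.set_ylabel("Number of Followers")
```

Call plt.show() afterwards to display the figure. Compared with the seaborn hue approach, this keeps full control over which color each category gets, at the cost of building the legend by hand.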

matplotlib multiple Y-axis pandas plot

Could someone give me a tip on how to do multiple Y axis plots?
Below is some made-up data. How could I put Temperature on its own Y axis, Pressure on its own Y axis, and then have both Value1 and Value2 on the same Y axis? I am trying to go for the same look and feel as this SO post answer. Thanks for any tips. I don't understand the ax3 = ax.twinx() process: do I need to define an ax.twinx() for each separate Y axis I need?
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
rows,cols = 8760,4
data = np.random.rand(rows,cols)
tidx = pd.date_range('2019-01-01', periods=rows, freq='H')
df = pd.DataFrame(data, columns=['Temperature','Value1','Pressure','Value2'], index=tidx)
# using subplots() function
fig, ax = plt.subplots(figsize=(25,8))
plt.title('Multi Y Plot')
ax2 = ax.twinx()
ax3 = ax.twinx()
ax4 = ax.twinx()
plot1, = ax.plot(df.index, df.Temperature)
plot2, = ax2.plot(df.index, df.Value1, color = 'r')
plot3, = ax3.plot(df.index, df.Pressure, color = 'g')
plot4, = ax4.plot(df.index, df.Value2, color = 'b')
ax.set_xlabel('Date')
ax.set_ylabel('Temperature')
ax2.set_ylabel('Value1')
ax3.set_ylabel('Pressure')
ax4.set_ylabel('Value2')
plt.legend([plot1,plot2,plot3,plot4],list(df.columns))
# defining display layout
plt.tight_layout()
# show plot
plt.show()
This outputs everything jumbled up on the same side, without separate Y axes for Pressure, Value1, and Value2.
You are adding 4 different plots in one, which is not helpful. I would recommend breaking it into 2 plots with a shared "Date" x-axis:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
rows,cols = 8760,4
data = np.random.rand(rows,cols)
tidx = pd.date_range('2019-01-01', periods=rows, freq='H')
df = pd.DataFrame(data, columns=['Temperature','Value1','Pressure','Value2'], index=tidx)
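# sharex=True makes both subplots use the same "Date" axis
fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(25,8), sharex=True)
# suptitle titles the whole figure; plt.title would only title the last subplot
fig.suptitle('Multi Y Plot')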
ax1b = ax1.twinx()
plot1a, = ax1.plot(df.index, df.Temperature)
plot1b, = ax1b.plot(df.index, df.Pressure, color='r')
ax1.set_ylabel('Temperature')
ax1b.set_ylabel('Pressure')
ax2b = ax2.twinx()
plot2a, = ax2.plot(df.index, df.Value1, color='k')
plot2b, = ax2b.plot(df.index, df.Value2, color='g')
ax2.set_xlabel('Date')
ax2.set_ylabel('Value1')
ax2b.set_ylabel('Value2')
plt.legend([plot1a, plot1b, plot2a, plot2b], df.columns)
# defining display layout
plt.tight_layout()
# show plot
plt.show()
Here the top plot shows Temperature and Pressure, and the bottom plot shows Value1 and Value2. Normally, we put in the same plot things that make sense to compare on the same x-axis. Pressure and Temperature are a valid combination, which is why I combined those two. But you can do as you wish.
The answer below uses mpatches to build the legend and puts Value1 and Value2 on the same axis in the second subplot, whereas the solution above puts Value1 and Value2 on different axes. Thanks for the help, @tzinie!
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.patches as mpatches
rows,cols = 8760,4
data = np.random.rand(rows,cols)
tidx = pd.date_range('2019-01-01', periods=rows, freq='H')
df = pd.DataFrame(data, columns=['Temperature','Value1','Pressure','Value2'], index=tidx)
fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(25,8))
fig.suptitle('Multi Y Plot')  # title for the whole figure
ax1b = ax1.twinx()
plot1a, = ax1.plot(df.index, df.Temperature, color='r') # red
plot1b, = ax1b.plot(df.index, df.Pressure, color='b') # blue
ax1.set_ylabel('Temperature')
ax1b.set_ylabel('Pressure')
ax2.plot(df.index, df.Value1, color='k') # black
ax2.plot(df.index, df.Value2, color='g') # green
ax2.set_xlabel('Date')
ax2.set_ylabel('Value1 & Value2')
red_patch = mpatches.Patch(color='red', label='Temperature')
blue_patch = mpatches.Patch(color='blue', label='Pressure')
green_patch = mpatches.Patch(color='green', label='Value2')
black_patch = mpatches.Patch(color='black', label='Value1')
plt.legend(handles=[red_patch,blue_patch,green_patch,black_patch])
# defining display layout
#plt.tight_layout()
# show plot
plt.show()
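On the original question of whether each extra Y axis needs its own ax.twinx(): yes, one twinx() call per additional axis, and series that should share an axis are simply plotted on the same twin. Every twin is drawn on the right edge by default, so they overlap unless the extra spine is moved outward. Here is a minimal sketch of a single plot with three Y axes, assuming a shorter made-up dataset and an arbitrary spine offset of 1.12 (both are illustration choices, not taken from the answers above):

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Made-up data, shorter than in the question so the lines stay readable
rng = np.random.default_rng(0)
tidx = pd.date_range("2019-01-01", periods=48, freq="D")
df = pd.DataFrame(rng.random((48, 4)),
                  columns=["Temperature", "Value1", "Pressure", "Value2"],
                  index=tidx)

fig, ax = plt.subplots(figsize=(12, 5))
fig.subplots_adjust(right=0.8)   # leave room on the right for the offset axis

ax_p = ax.twinx()                # second Y axis: Pressure
ax_v = ax.twinx()                # third Y axis: Value1 and Value2 together
# Move the third axis outward so it does not sit on top of the second one
ax_v.spines["right"].set_position(("axes", 1.12))

l1, = ax.plot(df.index, df["Temperature"], color="tab:red", label="Temperature")
l2, = ax_p.plot(df.index, df["Pressure"], color="tab:blue", label="Pressure")
l3, = ax_v.plot(df.index, df["Value1"], color="black", label="Value1")
l4, = ax_v.plot(df.index, df["Value2"], color="tab:green", label="Value2")

ax.set_xlabel("Date")
ax.set_ylabel("Temperature")
ax_p.set_ylabel("Pressure")
ax_v.set_ylabel("Value1 & Value2")

# One legend built from the line handles of all three axes
ax.legend(handles=[l1, l2, l3, l4], loc="upper left")
```

Call plt.show() or fig.savefig(...) to see the result. Passing the line handles to a single legend avoids the separate mpatches objects, because each handle already carries its own color and label.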

How to show the bar for small values in python chart? [duplicate]

I drew a comparison bar chart for very small values with the following code:
import pandas as pd
import matplotlib.pyplot as plt
data = [[ 0.00790019035339353, 0.00002112],
[0.0107705593109131, 0.0000328540802001953],
[0.0507792949676514, 0.0000541210174560547]]
df = pd.DataFrame(data, columns=['A', 'B'])
df.plot.bar()
plt.bar(df['A'], df['B'])
plt.show()
Because the values are so small, I can't see the bars for the smaller values in column 'B' (e.g. 0.00002112) in the graph.
How can I modify the code to make the smaller values (column B) visible in the graph? Thanks.
A common way to display data with different orders of magnitude is
to use a logarithmic scaling for the y-axis. Below the logarithm
to base 10 is used but other bases could be chosen.
import pandas as pd
import matplotlib.pyplot as plt
data = [[ 0.00790019035339353, 0.00002112],
[0.0107705593109131, 0.0000328540802001953],
[0.0507792949676514, 0.0000541210174560547]]
df = pd.DataFrame(data, columns=['A', 'B'])
df.plot.bar()
plt.yscale("log")
plt.show()
Update:
To change the formatting of the y-axis labels, an instance of ScalarFormatter can be used:
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib.ticker import ScalarFormatter
data = [[ 0.00790019035339353, 0.00002112],
[0.0107705593109131, 0.0000328540802001953],
[0.0507792949676514, 0.0000541210174560547]]
df = pd.DataFrame(data, columns=['A', 'B'])
df.plot.bar()
plt.yscale("log")
plt.gca().yaxis.set_major_formatter(ScalarFormatter())
plt.show()
Alternatively, you could create two y-axes like this:
# Assumes df and plt are defined as in the snippets above
fig, ax1 = plt.subplots()
ax2 = ax1.twinx()
width = 0.2
df['A'].plot(kind='bar', color='green', ax=ax1, width=width, position=1, label = 'A')
df['B'].plot(kind='bar', color='blue', ax=ax2, width=width, position=0, label = 'B')
ax1.set_ylabel('A')
ax2.set_ylabel('B')
# legend
h1, l1 = ax1.get_legend_handles_labels()
h2, l2 = ax2.get_legend_handles_labels()
ax1.legend(h1+h2, l1+l2, loc=2)
plt.show()
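To see why the 'B' bars vanish on a linear axis, it can help to measure how far apart the values are. The sketch below reuses the data from the question, computes the spread in powers of ten, and then draws the log-scaled chart through pandas' own logy option, which should have the same effect as calling plt.yscale("log") afterwards. The variable names span and share_of_tallest are made up for illustration.

```python
import numpy as np
import pandas as pd

data = [[0.00790019035339353, 0.00002112],
        [0.0107705593109131, 0.0000328540802001953],
        [0.0507792949676514, 0.0000541210174560547]]
df = pd.DataFrame(data, columns=["A", "B"])

values = df.to_numpy()

# Spread between the largest and the smallest value, in powers of ten
span = np.log10(values.max() / values.min())
print(f"values span about {span:.1f} orders of magnitude")

# Height of each B bar relative to the tallest bar on a linear axis
share_of_tallest = df["B"] / values.max()
print(share_of_tallest)

# Let pandas set the logarithmic y-axis directly
ax = df.plot.bar(logy=True, rot=0)
ax.set_ylabel("value (log scale)")
```

Every 'B' value is well under 1% of the tallest bar, so on a linear axis those bars are only a fraction of a pixel high. Call plt.show() after importing matplotlib.pyplot to display the chart.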

How to plot only max values using python

I want to make a graph about how the maximum value of a cluster of points at any given x coordinate changes over time.
What I have achieved so far:
What I want to achieve:
I was thinking of making a subset of the data with only the day and value, and then getting the maximum value of the array, either by iterating through it or by using a function. But I don't know if that's possible.
Here's my code
import pandas as pd
import matplotlib.pyplot as plt
df = pd.read_csv('strong.csv', names=[
'time', 'exercise', 'set_number', 'mass', 'reps'],parse_dates=['time'])
df.time = pd.to_datetime(df.time,format='%Y-%m-%d')
df_exercise = df[(df.exercise == 'Bench Press (Barbell)')]
fig, ax = plt.subplots()
ax.scatter(
df_exercise.time,df_exercise.mass, c='Orange', s=30
)
ax.set(xlabel='Day', ylabel='Weight [ kg ]',
title='Time/Weight')
plt.xticks(fontsize=8,rotation=45)
# Save before show(): once the window is closed, the saved figure may be empty
plt.savefig('grafic.png')
plt.show()
You could group the dataframe by date and aggregate the maxima:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
df = pd.DataFrame({'time': np.repeat(pd.date_range('2021-03-01', periods=6), 2),
                   'mass': np.random.randint(20, 56, 12),
                   'exercise': 'Bench Press (Barbell)'})
df.time = pd.to_datetime(df.time, format='%Y-%m-%d')
df_exercise = df # just creating a dataframe similar to the question's
fig, (ax1, ax2) = plt.subplots(ncols=2, figsize=(12, 5))
ax1.scatter(df_exercise.time, df_exercise.mass, c='limegreen', s=30)
df_plot = df_exercise.groupby('time')['mass'].agg('max')
ax2.scatter(df_exercise.time, df_exercise.mass, c='limegreen', s=30, alpha=0.3)
ax2.scatter(df_plot.index, df_plot.values, c='orange', s=30)
ax2.plot(df_plot.index, df_plot.values, c='black', lw=2, zorder=0)
for ax in (ax1, ax2):
    ax.set(xlabel='Day', ylabel='Weight [ kg ]', title='Time/Weight')
    ax.tick_params(axis='x', labelsize=8, labelrotation=45)
plt.tight_layout()
plt.show()
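The grouping step can be checked on a few hand-written rows. The sketch below uses made-up numbers with the column names from the question's CSV, and also shows idxmax as one possible way to keep the whole row of the heaviest set per day (for example its reps) rather than only the mass.

```python
import pandas as pd

# Made-up sets: two per day
df = pd.DataFrame({
    "time": pd.to_datetime(["2021-03-01", "2021-03-01",
                            "2021-03-02", "2021-03-02"]),
    "mass": [40, 45, 50, 42],
    "reps": [8, 5, 3, 10],
})

# Maximum mass per day: a Series indexed by date
daily_max = df.groupby("time")["mass"].max()
print(daily_max)

# Full rows of the heaviest set per day: idxmax returns the row label
# of the maximum within each group, and .loc selects those rows
top_sets = df.loc[df.groupby("time")["mass"].idxmax()]
print(top_sets)
```

daily_max.index and daily_max.values can then be passed to ax.scatter or ax.plot exactly as df_plot is used in the answer.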

Share X axis between line and bar plot in Python's Matplotlib

I have the following script for generating a figure with two subplots: one line plot, and one bar plot.
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
plt.close('all')
np.random.seed(42)
n = 1000
idx = pd.date_range(end='2020-02-27', periods=n)
df = pd.Series(np.random.randint(-5, 5, n),
index=idx)
curve = df.cumsum()
bars = df.resample('M').sum()
fig = plt.figure()
ax1 = fig.add_subplot(211)
ax2 = fig.add_subplot(212)
curve.plot(ax=ax1)
bars.plot(kind='bar', ax=ax2)
fig.set_tight_layout(True)
I would like to share the x axis between the two subplots, however the command ax2 = fig.add_subplot(212, sharex=ax1) will result in an empty graph for the line plot like the following figure.
Here is my version based on Matplotlib (without the pandas plotting API); maybe it will be helpful.
I explicitly set the width of bars.
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
# IPython/Jupyter magic; remove this line when running as a plain Python script
%matplotlib inline
plt.close('all')
np.random.seed(42)
n = 1000
idx = pd.date_range(end='2020-02-27', periods=n)
df = pd.Series(np.random.randint(-5, 5, n), index=idx)
curve = df.cumsum()
bars = df.resample('M').sum()
#fig = plt.figure()
#ax1 = fig.add_subplot(211)
#ax2 = fig.add_subplot(212)
#curve.plot(ax=ax1)
#bars.plot(kind='bar', ax=ax2)
fig, (ax1, ax2) = plt.subplots(2, 1, sharex=True, gridspec_kw={'hspace': 0})
ax1.plot(curve.index, curve.values)
# Bar width: half the spacing between the first two month-end dates
ax2.bar(bars.index, bars.values, width=(bars.index[1] - bars.index[0]) / 2)
fig.set_tight_layout(True)
_ = plt.xticks(bars.index, bars.index, rotation=90)
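The reason sharex=ax1 empties the line plot is most likely a coordinate mismatch: the pandas bar plot places bars at integer positions 0, 1, 2, ... and only labels them with dates, while a line drawn against a DatetimeIndex uses date-based x values. A small sketch with toy data (all names are assumed) makes the two coordinate systems visible:

```python
import matplotlib.pyplot as plt
import pandas as pd

idx = pd.date_range("2020-01-01", periods=3, freq="D")
s = pd.Series([1, 2, 3], index=idx)

fig, (ax_line, ax_bar) = plt.subplots(2, 1)   # deliberately NOT sharing x

ax_line.plot(s.index, s.values)   # x values are converted dates
s.plot(kind="bar", ax=ax_bar)     # x values are bar positions

# Centre of every bar in data coordinates
bar_centers = [round(p.get_x() + p.get_width() / 2, 6) for p in ax_bar.patches]
line_xlim = ax_line.get_xlim()

print("bar centres:", bar_centers)   # small integers
print("line x-limits:", line_xlim)   # large date numbers
```

If both axes shared one x range, either the bars or the line would fall far outside the visible window. Drawing the bars with ax.bar against the real dates, as the answer above does, keeps both subplots in the same coordinate system.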
