Subplot by group in Python pandas

I want to make subplots for the following data, which I have already averaged and grouped.
I want one subplot per country, with resource on the x-axis and average on the y-axis.
country resource average
india water 76
india soil 45
india tree 60
US water 45
US soil 70
US tree 85
Germany water 76
Germany soil 65
Germany water 56
Grouped = df.groupby(['country','resource'])['TTR in minutes'].agg(average='mean').reset_index()
I tried, but I couldn't get the data into subplots.

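import matplotlib.pyplot as plt

# df is the averaged table shown above (columns: country, resource, average)
g = df.groupby('country')
fig, axes = plt.subplots(g.ngroups, sharex=True, figsize=(8, 6))
for i, (country, d) in enumerate(g):
    # One bar chart per country, each drawn on its own Axes
    ax = d.plot.bar(x='resource', y='average', ax=axes[i], title=country)
    ax.legend().remove()
fig.tight_layout()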

Related

Graph - How to change the color of the Y line from a certain day?

I have a question.
How can I change the color of the line from a certain day onward?
For example, I want the price to be drawn in red from the 100th day on.
This is my code:
df = pd.DataFrame.from_dict(cryptocurrency.prices)
df["time"] = pd.to_datetime(df["time"], unit="s")
df = df.set_index("time")
df["close"].plot(figsize=(12, 8), title=cryptocurrency.name, label="Price")
plt.legend()
plt.grid()
plt.show()
Thank you for help!
Suppose I have a data frame called df that looks like this:
median_house_value
1 176500.0
2 270500.0
3 330000.0
4 81700.0
5 67000.0
.. ...
95 96000.0
96 90600.0
97 121900.0
98 209400.0
99 200000.0
The following code produces the initial plot:
df['median_house_value'].plot(figsize=(12, 8))
plt.legend()
plt.grid()
plt.show()
I then select the part of the data frame that should get a different color:
df1 = df[50:100]
which gives df1:
median_house_value
51 47500.0
52 156500.0
53 187500.0
54 191800.0
.. ...
98 209400.0
99 200000.0
Plotting df1 in red on top of the full series gives the new plot:
df["median_house_value"].plot(figsize=(12, 8))
plt.plot(df1["median_house_value"], 'r')
plt.legend()
plt.grid()
plt.show()
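Applied to the original question, the same idea could be sketched roughly as follows. This is only a sketch under assumptions: the prices are synthetic stand-ins for cryptocurrency.prices, and split is a hypothetical name for the position of the 100th day. The first segment is sliced one point past the split so that the two lines join without a gap.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic stand-in for the price data (assumption: one row per day)
idx = pd.date_range("2021-01-01", periods=200, freq="D")
prices = pd.Series(np.linspace(100.0, 150.0, 200), index=idx, name="close")

split = 100  # hypothetical position from which the line should turn red

# Overlap the two segments by one point so the line has no visible gap
head = prices.iloc[:split + 1]
tail = prices.iloc[split:]

fig, ax = plt.subplots(figsize=(12, 8))
ax.plot(head.index, head.to_numpy(), color="tab:blue", label="Price")
ax.plot(tail.index, tail.to_numpy(), color="red", label="Price from day 100")
ax.legend()
ax.grid(True)
```

Call plt.show() afterwards to display the figure. To split at a date instead of a position, slicing with prices.loc[:some_date] and prices.loc[some_date:] should work the same way on a datetime index.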

Plot two dataframes in one seaborn plot

I am trying to plot two dataframes with seaborn in one figure.
Given this test data:
import numpy as np
import pandas as pd
import seaborn as sb
import matplotlib.pyplot as plt
df = pd.DataFrame(np.random.randint(0,100,size=(100, 4)), columns=list('ABCD'))
df['Name'] = 'Adam'
df.iloc[::5, 4] = 'Berta'
df.head(10)
A B C D Name
0 40 75 45 6 Berta
1 52 98 55 44 Adam
2 57 61 70 17 Adam
3 52 5 20 28 Adam
4 63 53 74 49 Adam
5 53 28 97 26 Berta
6 64 38 73 56 Adam
7 25 65 34 64 Adam
8 95 91 92 60 Adam
9 6 54 5 58 Adam
and
df1 = pd.DataFrame(np.random.randint(0,100,size=(100, 4)), columns=list('ABCD'))
df1['Location'] = 'New York'
df1.iloc[::5, 4] = 'Tokyo'
df1.head(10)
A B C D Location
0 89 16 23 15 Tokyo
1 7 35 26 21 New York
2 64 94 51 61 New York
3 84 16 15 36 New York
4 55 62 0 2 New York
5 73 93 4 1 Tokyo
6 93 11 27 69 New York
7 14 52 50 45 New York
8 26 77 86 32 New York
9 21 10 68 11 New York
A) For the first plot, I would like a relplot or scatterplot where both dataframes share the same x and y axes but use a different "hue". If I try:
sb.relplot(data=df, x='Name', y='C', hue="Name", height=8.27, aspect=11.7/8.27)
sb.relplot(data=df1, x='Location', y='C', hue="Location", height=8.27, aspect=11.7/8.27)
plt.show()
the second plot either overwrites the first one or is drawn in a new figure. Any ideas?
B) Now both plots have the same y-axis (let's say "amount") but different x-axes (strings).
I found the question "How to overlay two seaborn relplots?", and it looks pretty good, but if I try:
fig, ax = plt.subplots()
sb.scatterplot(x="Name", y='A', data=df, hue="Name", ax=ax)
ax2 = ax.twinx()
sb.scatterplot(data=df1, x='Location', y='A', hue="Location", ax=ax2)
plt.show()
then the second scatterplot is drawn on top of the first one and overwrites its x-axis names. I would like the second scatterplot to be added to the right of the first instead. Is this possible?
In my opinion, it doesn't make sense to concatenate the two dataframes.
Thanks very much!
Putting your questions together, I assume you want either two subplots in one row, one per DataFrame, or both sets of data on a single plot.
For plot 'A':
fig, ax = plt.subplots(1, 2, figsize=(8, 4), sharey=True)
sb.scatterplot(data=df, x='Name', y='A', hue='Name',
               ax=ax[0])
sb.scatterplot(data=df1, x='Location', y='A', hue='Location',
               ax=ax[1])
plt.show()
Here I created both fig and ax with plt.subplots(), passing the number of rows (1), the number of columns (2), and a shared y-axis, so that each scatter plot can be placed on its own subplot. I did not adjust the legend location or other decorations.
For plot 'B', if you want everything on one plot, you can try:
fig, ax = plt.subplots(1, 1, figsize=(8, 4))
sb.scatterplot(data=df, x='Name', y='A', hue='Name', palette=['blue', 'orange'],
               ax=ax)
sb.scatterplot(data=df1, x='Location', y='A', hue='Location', palette=['red', 'green'],
               ax=ax)
ax.set_xlabel('Name/Location')
plt.show()
Here I made a single subplot and drew both scatter plots on it. This approach may require setting the colors explicitly and renaming the x-axis, as done above.

Python Matplotlib bars subplots by Category and Aggregation

I have a table like this:
data = {'Category':["Toys","Toys","Toys","Toys","Food","Food","Food","Food","Food","Food","Food","Food","Furniture","Furniture","Furniture"],
'Product':["AA","BB","CC","DD","SSS","DDD","FFF","RRR","EEE","WWW","LLLLL","PPPPPP","LPO","NHY","MKO"],
'QTY':[100,200,300,50,20,800,300,450,150,320,400,1000,150,900,1150]}
df = pd.DataFrame(data)
df
Out:
Category Product QTY
0 Toys AA 100
1 Toys BB 200
2 Toys CC 300
3 Toys DD 50
4 Food SSS 20
5 Food DDD 800
6 Food FFF 300
7 Food RRR 450
8 Food EEE 150
9 Food WWW 320
10 Food LLLLL 400
11 Food PPPPP 1000
12 Furniture LPO 150
13 Furniture NHY 900
14 Furniture MKO 1150
I need bar subplots that show the sum of the products in each category.
My problem is that I can't figure out how to combine categories, series, and aggregation.
I managed to split them into 3 subplots (one always stays blank), but I cannot combine them:
import matplotlib.pyplot as plt
fig, axarr = plt.subplots(2, 2, figsize=(12, 8))
df['Category'].value_counts().plot.bar(
ax=axarr[0][0], fontsize=12, color='b'
)
axarr[0][0].set_title("Category", fontsize=18)
df['Product'].value_counts().plot.bar(
ax=axarr[1][0], fontsize=12, color='b'
)
axarr[1][0].set_title("Product", fontsize=18)
df['QTY'].value_counts().plot.bar(
ax=axarr[1][1], fontsize=12, color='b'
)
axarr[1][1].set_title("QTY", fontsize=18)
plt.subplots_adjust(hspace=.3)
plt.show()
Out
What do I need to add to combine them?
This would be a lot easier with seaborn's FacetGrid:
import pandas as pd
import seaborn as sns
data = {'Category':["Toys","Toys","Toys","Toys","Food","Food","Food","Food","Food","Food","Food","Food","Furniture","Furniture","Furniture"],
'Product':["AA","BB","CC","DD","SSS","DDD","FFF","RRR","EEE","WWW","LLLLL","PPPPPP","LPO","NHY","MKO"],
'QTY':[100,200,300,50,20,800,300,450,150,320,400,1000,150,900,1150]}
df = pd.DataFrame(data)
g = sns.FacetGrid(df, col='Category', sharex=False, sharey=False, col_wrap=2, height=3, aspect=1.5)
g.map_dataframe(sns.barplot, x='Product', y='QTY')

Plot number of observations for categorical groups

I have a data frame that looks like this:
id age_bucket state gender duration category1 is_active
1 (40, 70] Jammu and Kashmir m 123 ABB 1
2 (17, 24] West Bengal m 72 ABB 0
3 (40, 70] Bihar f 109 CA 0
4 (17, 24] Bihar f 52 CA 1
5 (24, 30] MP m 23 ACC 1
6 (24, 30] AP m 103 ACC 1
7 (30, 40] West Bengal f 182 GF 0
I want to create a bar plot showing how many people are active for each age_bucket and for each state (top 10). For gender and category1, I want a pie chart with the proportion of active people. The top of each bar should display the total count of active and inactive members, and similarly the pie chart should display percentages based on is_active.
How can I do this in Python using seaborn or matplotlib?
This is what I have so far:
import seaborn as sns
%matplotlib inline
sns.barplot(x='age_bucket',y='is_active',data=df)
sns.barplot(x='category1',y='is_active',data=df)
It sounds like you want to count the observations rather than plot a value from a column along the y-axis. In seaborn, the function for this is countplot():
sns.countplot(x='age_bucket', hue='is_active', data=df)
Since the returned object is a matplotlib Axes, you can assign it to a variable (e.g. ax) and then use ax.annotate to place text in the figure manually:
ax = sns.countplot(x='age_bucket', hue='is_active', data=df)
ax.annotate('1 1', (0, 1), ha='center', va='bottom', fontsize=12)
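Rather than typing each label by hand, the counts can be read back from the bars themselves. The sketch below is an illustration, not the method from the answer above: it swaps in pd.crosstab and the pandas bar plot for countplot so that every patch on the Axes is known to be a bar, and it uses only the age_bucket and is_active columns from the sample rows in the question.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Sample rows copied from the question (only the two columns needed here)
df = pd.DataFrame({
    "age_bucket": ["(40, 70]", "(17, 24]", "(40, 70]", "(17, 24]",
                   "(24, 30]", "(24, 30]", "(30, 40]"],
    "is_active": [1, 0, 0, 1, 1, 1, 0],
})

# Rows: age buckets, columns: is_active (0/1), values: number of observations
counts = pd.crosstab(df["age_bucket"], df["is_active"])

ax = counts.plot.bar(rot=0)
for patch in ax.patches:
    height = patch.get_height()
    # Place the count just above the center of each bar
    ax.annotate(
        f"{int(height)}",
        (patch.get_x() + patch.get_width() / 2, height),
        ha="center", va="bottom", fontsize=12,
    )
ax.set_ylabel("count")
```

Empty combinations are labeled with 0 here; skip patches whose height is 0 if you would rather leave them unlabeled.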
Seaborn has no way of creating pie charts, so you would need to use matplotlib directly. However, it is often easier to tell counts and proportions from bar charts so I would generally recommend that you stick to those unless you have a specific constraint that forces you to use a pie chart.
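If a pie chart is required after all, a minimal matplotlib sketch could look like this. It assumes that "proportion of active people" means the share of each gender among the active rows, and it uses only the gender and is_active columns from the sample rows in the question; the variable name active_by_gender is made up for the illustration.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Sample rows copied from the question (only the two columns needed here)
df = pd.DataFrame({
    "gender": ["m", "m", "f", "f", "m", "m", "f"],
    "is_active": [1, 0, 0, 1, 1, 1, 0],
})

# Count the active rows per gender
active_by_gender = df.loc[df["is_active"] == 1, "gender"].value_counts()

fig, ax = plt.subplots()
ax.pie(
    active_by_gender.to_numpy(),
    labels=list(active_by_gender.index),
    autopct="%1.1f%%",  # print each wedge's percentage on the chart
)
ax.set_title("Active members by gender")
```

The same pattern should work for category1 by replacing the column name.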

How to draw plots on Specific pandas columns

The output of df.head() is shown below. I want to display the progression of salaries over time. As you can see, the teams repeat across the years, and the idea is to display how their salaries changed over time. So for teamID='ATL' I would have a graph that starts in 1985 and goes all the way to the present.
I think I need to select teams by their team ID and have the x-axis display time (year) and the y-axis display salary (payroll_total). I don't know how to do that in pandas for each team in my data frame.
teamID yearID lgID payroll_total franchID Rank W G win_percentage
0 ATL 1985 NL 14807000.0 ATL 5 66 162 40.740741
1 BAL 1985 AL 11560712.0 BAL 4 83 161 51.552795
2 BOS 1985 AL 10897560.0 BOS 5 81 163 49.693252
3 CAL 1985 AL 14427894.0 ANA 2 90 162 55.555556
4 CHA 1985 AL 9846178.0 CHW 3 85 163 52.147239
5 ATL 1986 NL 17800000.0 ATL 4 55 181 41.000000
You can use seaborn for this:
import seaborn as sns
sns.lineplot(data=df, x='yearID', y='payroll_total', hue='teamID')
To get a separate plot for each team:
for team, d in df.groupby('teamID'):
    d.plot(x='yearID', y='payroll_total', label=team)
import pandas as pd
import matplotlib.pyplot as plt
# Display the line plots on 3 separate rows and 1 column
fig, axes = plt.subplots(nrows=3, ncols=1)
# Generate a plot for each team
df[df['teamID'] == 'ATL'].plot(ax=axes[0], x='yearID', y='payroll_total')
df[df['teamID'] == 'BAL'].plot(ax=axes[1], x='yearID', y='payroll_total')
df[df['teamID'] == 'BOS'].plot(ax=axes[2], x='yearID', y='payroll_total')
# Display the plot
plt.show()
Depending on how many teams you want to show, adjust nrows in
fig, axes = plt.subplots(nrows=3, ncols=1)
Finally, you could loop over the teams and create the visualization for each one.
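Such a loop might look like the sketch below. It is only an illustration: the ATL values come from the question, while the 1986 rows for BAL and BOS are made-up placeholders so that every team has more than one point. The number of rows is taken from the number of groups instead of being hard-coded.

```python
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({
    "teamID": ["ATL", "BAL", "BOS", "ATL", "BAL", "BOS"],
    "yearID": [1985, 1985, 1985, 1986, 1986, 1986],
    # 1986 values for BAL and BOS are placeholders, not real data
    "payroll_total": [14807000.0, 11560712.0, 10897560.0,
                      17800000.0, 12000000.0, 11500000.0],
})

groups = df.groupby("teamID")

# One row per team; squeeze=False keeps axes 2-D even for a single team
fig, axes = plt.subplots(
    nrows=groups.ngroups, ncols=1, sharex=True,
    figsize=(6, 2.5 * groups.ngroups), squeeze=False,
)

for ax, (team, d) in zip(axes.ravel(), groups):
    d.plot(ax=ax, x="yearID", y="payroll_total", title=team, legend=False)
    ax.set_ylabel("payroll_total")

fig.tight_layout()
```

With many teams a single column gets very tall, so a grid (for example ncols=4) may be easier to read.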
