I have a dataset, df, that looks like this:
Date     Code   City         State  Quantity x  Quantity y  Population  Cases  Deaths
2019-01  10001  Los Angeles  CA     445  0  0
2019-01  10002  Sacramento   CA     4450  556  0  0
2020-03  12223  Houston      TX     440         4440        35000000    23     11
...
2021-07  10002  Sacramento   CA     3220        NA          5444000     211    22
My start and end dates are the same for all cities. I have over 4000 different cities and would like to plot a dual-y-axis graph for each city, using something similar to the following code:
import matplotlib.pyplot as plt
fig, ax1 = plt.subplots(figsize=(9,9))
color = 'tab:red'
ax1.set_xlabel('Date')
ax1.set_ylabel('Quantity X', color=color)
ax1.plot(df['Quantity x'], color=color)
ax1.tick_params(axis='y', labelcolor=color)
ax2 = ax1.twinx()
color2 = 'tab:blue'
ax2.set_ylabel('Deaths', color=color2)
ax2.plot(df['Deaths'], color=color2)
ax2.tick_params(axis='y', labelcolor=color2)
plt.show()
I would like to create a loop that runs the code above for every Code (each Code corresponds to a City), plotting Quantity x and Deaths, and saves each resulting graph into a folder. How can I write a loop that does this and produces a separate graph for each different Code?
Note: some values in df['Quantity x'] and df['Population'] are blank.
If I understand you correctly, you are looking for a way to filter the data by Code:
import os

import matplotlib.pyplot as plt
import pandas as pd


def plot_quantity_and_death(df, out_dir="plots"):
    # your code, plotted against the Date column instead of the row index
    fig, ax1 = plt.subplots(figsize=(9, 9))
    color = 'tab:red'
    ax1.set_xlabel('Date')
    ax1.set_ylabel('Quantity X', color=color)
    ax1.plot(df['Date'], df['Quantity x'], color=color)
    ax1.tick_params(axis='y', labelcolor=color)
    ax2 = ax1.twinx()
    color2 = 'tab:blue'
    ax2.set_ylabel('Deaths', color=color2)
    ax2.plot(df['Date'], df['Deaths'], color=color2)
    ax2.tick_params(axis='y', labelcolor=color2)
    # save & close addon: one PNG per Code inside out_dir
    os.makedirs(out_dir, exist_ok=True)
    fig.savefig(os.path.join(out_dir, f"Code_{df['Code'].iloc[0]}.png"))
    plt.close(fig)


df = pd.DataFrame()  # this needs to be replaced by your dataset
# get unique city codes, loop over them, filter data and plot it
unique_codes = pd.unique(df['Code'])
for code in unique_codes:
    filtered_df = df[df['Code'] == code]
    plot_quantity_and_death(filtered_df)
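A variant of the same idea, sketched under a few assumptions: instead of filtering once per code, `DataFrame.groupby('Code')` yields each code's rows directly, and each group is sorted by date so the lines run left to right. The helper name `save_dual_axis_plots`, the `plots` folder and the tiny sample frame are hypothetical stand-ins for your real data.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # non-interactive backend: write files without opening windows
import matplotlib.pyplot as plt
import pandas as pd


def save_dual_axis_plots(df, out_dir):
    """Hypothetical helper: save one Quantity x / Deaths chart per Code."""
    os.makedirs(out_dir, exist_ok=True)
    saved = []
    for code, group in df.groupby('Code'):
        group = group.sort_values('Date')
        fig, ax1 = plt.subplots(figsize=(9, 9))
        ax1.set_xlabel('Date')
        ax1.set_ylabel('Quantity X', color='tab:red')
        # Blank (NaN) values simply leave gaps in the line
        ax1.plot(group['Date'], group['Quantity x'], color='tab:red')
        ax1.tick_params(axis='y', labelcolor='tab:red')
        ax2 = ax1.twinx()
        ax2.set_ylabel('Deaths', color='tab:blue')
        ax2.plot(group['Date'], group['Deaths'], color='tab:blue')
        ax2.tick_params(axis='y', labelcolor='tab:blue')
        path = os.path.join(out_dir, f"Code_{code}.png")
        fig.savefig(path)
        plt.close(fig)  # free memory; matters with 4000+ figures
        saved.append(path)
    return saved


# Hypothetical sample shaped like the question's data
df = pd.DataFrame({
    'Date': pd.to_datetime(['2019-01', '2020-03', '2021-07', '2019-01']),
    'Code': [10002, 10002, 10002, 12223],
    'Quantity x': [4450, None, 3220, 440],
    'Deaths': [0, 5, 22, 11],
})
out_dir = os.path.join(tempfile.gettempdir(), 'plots')
saved = save_dual_axis_plots(df, out_dir)
print(saved)
```

Closing each figure inside the loop is the important design choice here: without `plt.close(fig)`, thousands of open figures accumulate in memory.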
I'm working with the well-known Gapminder data file (here: https://www.kaggle.com/datasets/tklimonova/gapminder-datacamp-2007?select=gapminder_full.csv)
df.head():
country year population continent life_exp gdp_cap
0 Afghanistan 2007 31889923 Asia 43.828 974.580338
1 Albania 2007 3600523 Europe 76.423 5937.029526
2 Algeria 2007 33333216 Africa 72.301 6223.367465
3 Angola 2007 12420476 Africa 42.731 4797.231267
4 Argentina 2007 40301927 Americas 75.320 12779.379640
I tried a scatter plot with a quadratic fit, but I'm confused by the many lines that appear on the plot:
plt.style.use('seaborn')
x = np.array(df['gdp_cap'])
y = np.array(df['life_exp'])
plt.scatter(x, y, marker = 'o', alpha = 1)
coeff = np.polyfit(x, y, 2)
plt.plot(x, coeff[0]*(x**2) + coeff[1]*x + coeff[2])
plt.show()
What am I doing wrong?
Your second plot draws over the first one. Add another plt.show() after the scatter call so that each plot appears in its own figure:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
plt.style.use('seaborn')
x = np.array(df['gdp_cap'])
y = np.array(df['life_exp'])
plt.scatter(x, y, marker = 'o', alpha = 1)
plt.show()
coeff = np.polyfit(x, y, 2)
plt.plot(x, coeff[0]*(x**2) + coeff[1]*x + coeff[2])
plt.show()
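A separate, likely cause of the criss-crossing lines: `plt.plot` joins points in the order they appear, and `gdp_cap` is not sorted, so the fitted curve zigzags back and forth. A minimal sketch that evaluates the quadratic fit on sorted x values; it uses `np.polyval` instead of writing the polynomial out by hand, and the five rows from `df.head()` above stand in for the full dataset.

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
import numpy as np

# The five rows from df.head() above (gdp_cap, life_exp), in file order
x = np.array([974.580338, 5937.029526, 6223.367465, 4797.231267, 12779.379640])
y = np.array([43.828, 76.423, 72.301, 42.731, 75.320])

fig, ax = plt.subplots()
ax.scatter(x, y, marker='o')

coeff = np.polyfit(x, y, 2)
# Sort x so the curve is drawn left to right in a single sweep
x_sorted = np.sort(x)
line, = ax.plot(x_sorted, np.polyval(coeff, x_sorted), color='tab:red')
ax.set_xlabel('gdp_cap')
ax.set_ylabel('life_exp')
plt.show()
```

With sorting, the scatter and the fit can stay in the same figure, so a second `plt.show()` is no longer needed to avoid the clutter.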
I want to make a GDP vs. life expectancy plot for Ireland over the course of a few years. I want to plot the first point on a scatter plot, then wait a few seconds and have the next point appear.
ax = plt.figure()
for i in year:
    plt.scatter(ie_gdp[i], ie_life[i])
    plt.draw()
    plt.show()
    plt.pause(1)
So far this is all I can come up with. However, in JupyterLab this produces an individual plot for each point. I've tried looking at animations online, but they all use live data. I already have the datasets cleaned and ready in ie_gdp and ie_life.
%matplotlib inline
fig = plt.figure(figsize = (15,15))
ax = fig.add_subplot(1,1,1)
def animate(i):
    xs = []
    ys = []
    for y in year:
        xs.append(ie_gdp[y])
        ys.append(ie_life[y])
    ax.cla()
    ax.scatter(xs,ys)
ani = animation.FuncAnimation(fig, animate, interval = 10000)
plt.show()
Above is my attempt at using animations, but it also doesn't work. I get this error: AttributeError: 'list' object has no attribute 'shape'
Any help would be appreciated.
I'm not sure I understand your intended animation, but I animated the x-axis as year, the y-axis as life expectancy, and the marker size and color as GDP per capita. The sample data comes from Plotly's built-in gapminder dataset, so please replace it with your own data.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib.animation import FuncAnimation
#from IPython.display import HTML
#from matplotlib.animation import PillowWriter
# for sample data
import plotly.express as px
df = px.data.gapminder()
ie_df = df[df['country'] == 'Ireland']
ie_df
country continent year lifeExp pop gdpPercap iso_alpha iso_num
744 Ireland Europe 1952 66.910 2952156 5210.280328 IRL 372
745 Ireland Europe 1957 68.900 2878220 5599.077872 IRL 372
746 Ireland Europe 1962 70.290 2830000 6631.597314 IRL 372
747 Ireland Europe 1967 71.080 2900100 7655.568963 IRL 372
748 Ireland Europe 1972 71.280 3024400 9530.772896 IRL 372
749 Ireland Europe 1977 72.030 3271900 11150.981130 IRL 372
750 Ireland Europe 1982 73.100 3480000 12618.321410 IRL 372
751 Ireland Europe 1987 74.360 3539900 13872.866520 IRL 372
752 Ireland Europe 1992 75.467 3557761 17558.815550 IRL 372
753 Ireland Europe 1997 76.122 3667233 24521.947130 IRL 372
754 Ireland Europe 2002 77.783 3879155 34077.049390 IRL 372
755 Ireland Europe 2007 78.885 4109086 40675.996350 IRL 372
fig = plt.figure(figsize=(10, 10))
ax = plt.axes()

def animate(i):
    # show the first i + 1 rows so the final frame includes 2007
    tmp = ie_df.iloc[:i + 1, :]
    ax.clear()
    scat = ax.scatter(tmp['year'], tmp['lifeExp'], s=tmp['gdpPercap'],
                      c=tmp['gdpPercap'], cmap='jet', ec='k')
    # ax.clear() resets the limits, so set them again on every frame
    ax.set_xlim(1950, 2012)
    ax.set_ylim(ie_df['lifeExp'].min() - 2, ie_df['lifeExp'].max() + 2)
    ax.set_xlabel('Year', fontsize=15)
    ax.set_ylabel('lifeExp', fontsize=15)
    ax.set_title('Ireland (1952-2007)')
    return scat,

anim = FuncAnimation(fig, animate, frames=len(ie_df), interval=1000, repeat=False)
#anim.save('gdp_life.gif', writer='pillow')
plt.show()
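Closer to the original question (one GDP/life-expectancy point appearing per frame), here is a sketch that creates the scatter once and only updates its positions with `set_offsets` in each frame, so nothing is cleared or re-created. It types in the Ireland rows from the sample table above so the block is self-contained. In JupyterLab with `%matplotlib inline` a `FuncAnimation` does not play by itself; displaying `anim.to_jshtml()` through `IPython.display.HTML` is one option, noted in a comment rather than executed here.

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.animation import FuncAnimation

# Ireland, 1952-2007 (gdpPercap, lifeExp) from the sample table above
ie_gdp = np.array([5210.280328, 5599.077872, 6631.597314, 7655.568963,
                   9530.772896, 11150.981130, 12618.321410, 13872.866520,
                   17558.815550, 24521.947130, 34077.049390, 40675.996350])
ie_life = np.array([66.910, 68.900, 70.290, 71.080, 71.280, 72.030,
                    73.100, 74.360, 75.467, 76.122, 77.783, 78.885])

fig, ax = plt.subplots(figsize=(8, 6))
# Fixed limits so the axes do not rescale as points appear
ax.set_xlim(0, ie_gdp.max() * 1.1)
ax.set_ylim(ie_life.min() - 2, ie_life.max() + 2)
ax.set_xlabel('GDP per capita')
ax.set_ylabel('Life expectancy')
scat = ax.scatter([], [])  # empty scatter, filled in frame by frame


def animate(i):
    # Show the first i + 1 points; column_stack builds the (N, 2) offsets array
    scat.set_offsets(np.column_stack([ie_gdp[:i + 1], ie_life[:i + 1]]))
    return scat,


anim = FuncAnimation(fig, animate, frames=len(ie_gdp), interval=1000,
                     repeat=False)
# In JupyterLab: from IPython.display import HTML; HTML(anim.to_jshtml())
plt.show()
```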
I have a table like this:
data = {'Category':["Toys","Toys","Toys","Toys","Food","Food","Food","Food","Food","Food","Food","Food","Furniture","Furniture","Furniture"],
'Product':["AA","BB","CC","DD","SSS","DDD","FFF","RRR","EEE","WWW","LLLLL","PPPPPP","LPO","NHY","MKO"],
'QTY':[100,200,300,50,20,800,300,450,150,320,400,1000,150,900,1150]}
df = pd.DataFrame(data)
df
Out:
Category Product QTY
0 Toys AA 100
1 Toys BB 200
2 Toys CC 300
3 Toys DD 50
4 Food SSS 20
5 Food DDD 800
6 Food FFF 300
7 Food RRR 450
8 Food EEE 150
9 Food WWW 320
10 Food LLLLL 400
11 Food PPPPPP 1000
12 Furniture LPO 150
13 Furniture NHY 900
14 Furniture MKO 1150
So, I need to make bar subplots, one per Category, showing the summed QTY of each Product in that Category.
My problem is that I can't figure out how to combine categories, series, and aggregation.
I managed to split them into 3 subplots (the fourth one always stays blank), but I cannot unite them...
import matplotlib.pyplot as plt
fig, axarr = plt.subplots(2, 2, figsize=(12, 8))
df['Category'].value_counts().plot.bar(
ax=axarr[0][0], fontsize=12, color='b'
)
axarr[0][0].set_title("Category", fontsize=18)
df['Product'].value_counts().plot.bar(
ax=axarr[1][0], fontsize=12, color='b'
)
axarr[1][0].set_title("Product", fontsize=18)
df['QTY'].value_counts().plot.bar(
ax=axarr[1][1], fontsize=12, color='b'
)
axarr[1][1].set_title("QTY", fontsize=18)
plt.subplots_adjust(hspace=.3)
plt.show()
What do I need to add to combine them?
This would be a lot easier with seaborn and FacetGrid:
import pandas as pd
import seaborn as sns
data = {'Category':["Toys","Toys","Toys","Toys","Food","Food","Food","Food","Food","Food","Food","Food","Furniture","Furniture","Furniture"],
'Product':["AA","BB","CC","DD","SSS","DDD","FFF","RRR","EEE","WWW","LLLLL","PPPPPP","LPO","NHY","MKO"],
'QTY':[100,200,300,50,20,800,300,450,150,320,400,1000,150,900,1150]}
df = pd.DataFrame(data)
g = sns.FacetGrid(df, col='Category', sharex=False, sharey=False, col_wrap=2, height=3, aspect=1.5)
g.map_dataframe(sns.barplot, x='Product', y='QTY')
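If you would rather stay in plain matplotlib, one way (a sketch, not the only one) is to group by Category, sum QTY per Product inside each group, and draw one bar subplot per category in a loop. The 2×2 grid mirrors the question's layout; the unused fourth axis is hidden.

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
import pandas as pd

data = {'Category': ["Toys"] * 4 + ["Food"] * 8 + ["Furniture"] * 3,
        'Product': ["AA", "BB", "CC", "DD", "SSS", "DDD", "FFF", "RRR", "EEE",
                    "WWW", "LLLLL", "PPPPPP", "LPO", "NHY", "MKO"],
        'QTY': [100, 200, 300, 50, 20, 800, 300, 450, 150, 320, 400, 1000,
                150, 900, 1150]}
df = pd.DataFrame(data)

# sort=False keeps categories in first-seen order: Toys, Food, Furniture
groups = df.groupby('Category', sort=False)

fig, axarr = plt.subplots(2, 2, figsize=(12, 8))
axes = axarr.ravel()
for ax, (category, group) in zip(axes, groups):
    # Sum QTY per product inside this category
    totals = group.groupby('Product', sort=False)['QTY'].sum()
    totals.plot.bar(ax=ax, fontsize=12, color='b')
    ax.set_title(category, fontsize=18)
    ax.set_xlabel('Product')
    ax.set_ylabel('QTY')
# Hide leftover subplots that have no category
for ax in axes[groups.ngroups:]:
    ax.set_visible(False)
plt.subplots_adjust(hspace=.4)
plt.show()
```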
I come from an R ggplot2 background and am a bit confused by matplotlib plotting.
Here is my dataframe:
languages = ['en','cs','es', 'pt', 'hi', 'en', 'es', 'es']
counties = ['us','ch','sp', 'br', 'in', 'fr', 'ar', 'pr']
count = [32, 432,43,55,6,23,455,23]
df = pd.DataFrame({'language': languages,'county': counties, 'count' : count})
language county count
0 en us 32
1 cs ch 432
2 es sp 43
3 pt br 55
4 hi in 6
5 en fr 23
6 es ar 455
7 es pr 23
Now I want to plot two charts:
A stacked bar chart where the x-axis shows language and the y-axis shows the count: the full bar height is the total count for that language, and the stacked segments show the countries for that language.
A side-by-side (grouped) bar chart with the same parameters, where the countries are shown next to each other instead of stacked.
Most examples plot directly from the DataFrame, but I want to build the plot step by step in a script so I have more control over it and can edit whatever I want, something like this script:
ind = np.arange(df.languages.nunique())
width = 0.35
fig = plt.figure()
ax = fig.add_axes([0,0,1,1])
ax.bar(ind, df.languages, width, color='r')
ax.bar(ind, df.count, width,bottom=df.languages, color='b')
ax.set_ylabel('Count')
ax.set_title('Score y language and country')
ax.set_xticks(ind, df.languages)
ax.set_yticks(np.arange(0, 81, 10))
ax.legend(labels=[df.countries])
plt.show()
By the way, here is my pandas pivot code for the same plot:
df.pivot(index = "language", columns = "county", values = "count").plot.bar(figsize=(15,10))
plt.xticks(rotation = 0,fontsize=18)
plt.xlabel('Language' )
plt.ylabel('Count ')
plt.legend(fontsize='large', ncol=2,handleheight=1.5)
plt.show()
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

languages = ['en','cs','es', 'pt', 'hi', 'en', 'es', 'es']
counties = ['us','ch','sp', 'br', 'in', 'fr', 'ar', 'pr']
count = [32, 432,43,55,6,23,455,23]
df = pd.DataFrame({'language': languages,'county': counties, 'count' : count})

# One row per language: number of countries and summed count
modified = {}
modified['language'] = np.unique(df.language)
country_count = []
total_count = []
for x in modified['language']:
    country_count.append(len(df[df['language'] == x]))
    total_count.append(df[df['language'] == x]['count'].sum())
modified['country_count'] = country_count
modified['total_count'] = total_count
mod_df = pd.DataFrame(modified)
print(mod_df)

ind = mod_df.language
width = 0.35
# Country count is stacked on top of the total count
p1 = plt.bar(ind, mod_df.total_count, width)
p2 = plt.bar(ind, mod_df.country_count, width,
             bottom=mod_df.total_count)
plt.ylabel("Total count")
plt.xlabel("Languages")
plt.legend((p1[0], p2[0]), ('Total Count', 'Country Count'))
plt.show()
First, the code reshapes the DataFrame into the following summary:
language country_count total_count
0 cs 1 432
1 en 2 55
2 es 3 521
3 hi 1 6
4 pt 1 55
In the resulting plot, the country counts are so small compared with the totals that the stacked country-count segments are barely visible.
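import seaborn as sns
import matplotlib.pyplot as plt
figure, axis = plt.subplots(1, 1, figsize=(10, 5))
# estimator=sum gives the total count per language (the default is the mean)
sns.barplot(x="language", y="count", data=df, estimator=sum, ci=None, ax=axis)  # , hue='county')
axis.set_title('Score by language and country')
axis.set_ylabel('Count')
axis.set_xlabel("Language")
# Overlay the number of rows (countries) per language on the same axes
sns.countplot(x="language", data=df, ax=axis)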
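For the two charts described in the question, a sketch in the requested step-by-step style: `pivot_table` turns the rows into a language × country matrix (absent pairs filled with 0), then `ax.bar` is called once per country, either accumulating a `bottom` for the stacked chart or shifting the x positions for the side-by-side chart. The bar widths and figure size are arbitrary choices.

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

languages = ['en', 'cs', 'es', 'pt', 'hi', 'en', 'es', 'es']
counties = ['us', 'ch', 'sp', 'br', 'in', 'fr', 'ar', 'pr']
count = [32, 432, 43, 55, 6, 23, 455, 23]
df = pd.DataFrame({'language': languages, 'county': counties, 'count': count})

# Rows: languages, columns: countries, cells: count (0 where a pair is absent)
table = df.pivot_table(index='language', columns='county', values='count',
                       aggfunc='sum', fill_value=0)
ind = np.arange(len(table.index))

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(14, 5))

# 1) Stacked: each country's segment sits on top of the previous ones
bottom = np.zeros(len(table.index))
for country in table.columns:
    ax1.bar(ind, table[country], 0.6, bottom=bottom, label=country)
    bottom = bottom + table[country].to_numpy()
ax1.set_title('Stacked: total count per language')

# 2) Side by side: shift each country's bars within the language slot
width = 0.8 / len(table.columns)
for k, country in enumerate(table.columns):
    ax2.bar(ind + k * width, table[country], width, label=country)
ax2.set_title('Side by side: count per country')

# Center the tick of each language under its bar(s)
for ax, offset in ((ax1, 0), (ax2, 0.4 - width / 2)):
    ax.set_xticks(ind + offset)
    ax.set_xticklabels(table.index)
    ax.set_xlabel('Language')
    ax.set_ylabel('Count')
    ax.legend(title='Country', ncol=2)
plt.tight_layout()
plt.show()
```

After the stacked loop, `bottom` holds each language's total, which is the "big total height" from the question.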
I have a DataFrame:
wilayah branch Income Januari 2018 Income Januari 2019 Income Febuari 2018 Income Febuari 2019 Income Jan-Feb 2018 Income Jan-Feb 2019
1 sunarto 1000 1500 2000 3000 3333 4431
1 pemabuk 500 700 3000 3000 4333 5431
1 pemalas 2000 2200 4000 3000 5333 6431
1 hasuntato 9000 1200 6000 3000 2222 2121
1 sibodoh 1000 1500 3434 3000 2233 2121
...
I want to create a bar graph where the x-axis has every name in branch (e.g. sunarto, pemabuk, pemalas, etc.) and the y-axis is income.
For example, I want to compare sunarto's Income Januari 2018 with Income Januari 2019, pemabuk's Income Januari 2018 with Income Januari 2019, and so on (one name on the x-axis, two bars side by side for comparison). Then I want to sort the bars from high to low by Income Jan-Feb 2019.
I tried:
import matplotlib.pyplot as plt
import pandas as pd
fig, ax = plt.subplots()
ax = df1[["Sunarto","Income Januari 2018", "Income Januari 2019"]].plot(x='branch', kind='bar', color=["g","b"],rot=45)
plt.show()
Consider a groupby aggregation, then call DataFrame.plot. The code below lines up all branches on the x-axis, with each income column as a color-coded key in the legend.
agg_df = df.drop(columns='wilayah').groupby('branch').sum()
fig, ax = plt.subplots(figsize=(15,5))
agg_df.plot(kind='bar', edgecolor='w', ax=ax, rot=22, width=0.5, fontsize = 15)
# ADD TITLES AND LABELS
plt.title('Income by Branches, Jan/Feb 2018-2019', weight='bold', size=24)
plt.xlabel('Branch', weight='bold', size=24)
plt.ylabel('Income', weight='bold', size=20)
plt.tight_layout()
plt.show()
plt.clf()
If you want a separate plot for each branch, iterate over the groupby object:
dfs = df.groupby('branch')
for i, g in dfs:
    ord_cols = (pd.melt(g.drop(columns="wilayah"), id_vars="branch")
                .sort_values("value")["variable"].values
                )
    fig, ax = plt.subplots(figsize=(8, 4))
    (g.reindex(columns=ord_cols)
      .plot(kind='bar', edgecolor='w', ax=ax, rot=0, width=0.5, fontsize=15)
    )
    # ADD TITLES AND LABELS
    plt.title('Income by {} Branch, Jan/Feb 2018-2019'.format(i),
              weight='bold', size=16)
    plt.xlabel('Branch', weight='bold', size=16)
    plt.ylabel('Income', weight='bold', size=14)
    plt.tight_layout()
    plt.show()
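To match the ordering asked for in the question (branches sorted high to low by `Income Jan-Feb 2019`, with January 2018 and January 2019 side by side for each branch), a sketch: sort first, then plot only the two comparison columns with `branch` on the x-axis. The sample uses the five rows shown in the question, keeping only the columns this chart needs.

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({
    'wilayah': [1, 1, 1, 1, 1],
    'branch': ['sunarto', 'pemabuk', 'pemalas', 'hasuntato', 'sibodoh'],
    'Income Januari 2018': [1000, 500, 2000, 9000, 1000],
    'Income Januari 2019': [1500, 700, 2200, 1200, 1500],
    'Income Jan-Feb 2019': [4431, 5431, 6431, 2121, 2121],
})

# Highest Jan-Feb 2019 income first; the sort key itself is not plotted
ordered = df.sort_values('Income Jan-Feb 2019', ascending=False)

fig, ax = plt.subplots(figsize=(10, 5))
ordered.plot(x='branch', y=['Income Januari 2018', 'Income Januari 2019'],
             kind='bar', color=['g', 'b'], rot=45, ax=ax)
ax.set_xlabel('Branch')
ax.set_ylabel('Income')
plt.tight_layout()
plt.show()
```

Branches with equal Jan-Feb 2019 income (here hasuntato and sibodoh) may appear in either order; pass `kind='stable'` to `sort_values` if the original row order should break ties.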