I want to make a GDP vs. life expectancy plot for Ireland over the course of a few years. I want to plot the first point on a scatter plot, then wait a few seconds and have the next point plotted.
ax = plt.figure()
for i in year:
    plt.scatter(ie_gdp[i], ie_life[i])
    plt.draw()
    plt.show()
    plt.pause(1)
So far this is all I can come up with. However, in JupyterLab this draws a separate plot for each point. I've tried looking at animation examples online, but they all use live data. I already have the datasets cleaned and ready in ie_gdp and ie_life.
%matplotlib inline
fig = plt.figure(figsize = (15,15))
ax = fig.add_subplot(1,1,1)
def animate(i):
    xs = []
    ys = []
    for y in year:
        xs.append(ie_gdp[y])
        ys.append(ie_life[y])
    ax.cla()
    ax.scatter(xs,ys)
ani = animation.FuncAnimation(fig, animate, interval = 10000)
plt.show()
Above is my attempt at using animations, but it also doesn't work. I get this error: AttributeError: 'list' object has no attribute 'shape'
Any help would be appreciated.
I'm not sure I understand your intended animation, but I animated the x-axis as the year, the y-axis as life expectancy, and the marker size as the GDP value. The sample data is the Gapminder dataset that ships with Plotly, so please replace it with your own data.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib.animation import FuncAnimation
#from IPython.display import HTML
#from matplotlib.animation import PillowWriter
# for sample data
import plotly.express as px
df = px.data.gapminder()
ie_df = df[df['country'] == 'Ireland']
ie_df
country continent year lifeExp pop gdpPercap iso_alpha iso_num
744 Ireland Europe 1952 66.910 2952156 5210.280328 IRL 372
745 Ireland Europe 1957 68.900 2878220 5599.077872 IRL 372
746 Ireland Europe 1962 70.290 2830000 6631.597314 IRL 372
747 Ireland Europe 1967 71.080 2900100 7655.568963 IRL 372
748 Ireland Europe 1972 71.280 3024400 9530.772896 IRL 372
749 Ireland Europe 1977 72.030 3271900 11150.981130 IRL 372
750 Ireland Europe 1982 73.100 3480000 12618.321410 IRL 372
751 Ireland Europe 1987 74.360 3539900 13872.866520 IRL 372
752 Ireland Europe 1992 75.467 3557761 17558.815550 IRL 372
753 Ireland Europe 1997 76.122 3667233 24521.947130 IRL 372
754 Ireland Europe 2002 77.783 3879155 34077.049390 IRL 372
755 Ireland Europe 2007 78.885 4109086 40675.996350 IRL 372
fig = plt.figure(figsize=(10,10))
ax = plt.axes(xlim=(1952,2012), ylim=(0, 45000))
scat = ax.scatter([], [], [], cmap='jet')
def animate(i):
    # show every row up to and including frame i
    tmp = ie_df.iloc[:i+1,:]
    ax.clear()
    scat = ax.scatter(tmp['year'], tmp['lifeExp'], s=tmp['gdpPercap'], c=tmp['gdpPercap'], ec='k')
    ax.set_xlabel('Year', fontsize=15)
    ax.set_ylabel('lifeExp', fontsize=15)
    ax.set_title('Ireland(1952-2007)')
    return scat,
anim = FuncAnimation(fig, animate, frames=12, interval=1000, repeat=False)
#anim.save('gdp_life.gif', writer='pillow')
plt.show()
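If the goal is GDP on the x-axis and life expectancy on the y-axis, with one new point per frame, a minimal sketch could update a single scatter artist instead of clearing the axes. The small `year`, `ie_gdp`, and `ie_life` objects below are hypothetical stand-ins for the asker's cleaned data (their real structure is not shown in the question); the values are rounded from the Ireland rows above.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; drop this line in a notebook
import matplotlib.pyplot as plt
from matplotlib.animation import FuncAnimation

# Hypothetical stand-ins for the asker's data, keyed by year.
year = [1952, 1957, 1962, 1967]
ie_gdp = {1952: 5210.28, 1957: 5599.08, 1962: 6631.60, 1967: 7655.57}
ie_life = {1952: 66.91, 1957: 68.90, 1962: 70.29, 1967: 71.08}

fig, ax = plt.subplots(figsize=(6, 6))
ax.set_xlabel("GDP per capita")
ax.set_ylabel("Life expectancy")
# Fix the limits once so the axes do not rescale between frames.
ax.set_xlim(0.9 * min(ie_gdp.values()), 1.1 * max(ie_gdp.values()))
ax.set_ylim(min(ie_life.values()) - 1, max(ie_life.values()) + 1)
scat = ax.scatter([], [])

def animate(i):
    # Frame i shows the first i + 1 years.
    shown = year[: i + 1]
    points = [[ie_gdp[y], ie_life[y]] for y in shown]
    scat.set_offsets(points)
    ax.set_title(f"Ireland, up to {shown[-1]}")
    return (scat,)

# One frame per year, one second apart.
anim = FuncAnimation(fig, animate, frames=len(year), interval=1000, repeat=False)
```

In JupyterLab the animation object usually has to be rendered explicitly, for example with `HTML(anim.to_jshtml())` after `from IPython.display import HTML`.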
Related
I have a dataset, df that looks like this:
Date     Code   City         State  Quantity x  Quantity y  Population  Cases  Deaths
2019-01  10001  Los Angeles  CA                 445                     0      0
2019-01  10002  Sacramento   CA     4450        556                     0      0
2020-03  12223  Houston      TX     440         4440        35000000    23     11
...      ...    ...          ...    ...         ...         ...         ...    ...
2021-07  10002  Sacramento   CA     3220        NA          5444000     211    22
My start and end date are the same for all cities. I have over 4000 different cities, and would like to plot a 2-yaxis graph for each city, using something similar to the following code:
import matplotlib.pyplot as plt
fig, ax1 = plt.subplots(figsize=(9,9))
color = 'tab:red'
ax1.set_xlabel('Date')
ax1.set_ylabel('Quantity X', color=color)
ax1.plot(df['Quantity x'], color=color)
ax1.tick_params(axis='y', labelcolor=color)
ax2 = ax1.twinx()
color2 = 'tab:blue'
ax2.set_ylabel('Deaths', color=color2)
ax2.plot(df['Deaths'], color=color2)
ax2.tick_params(axis='y', labelcolor=color2)
plt.show()
I would like to create a loop so that the code above runs for every Code that is related to a City, plotting Quantity x and Deaths, and saves each graph into a folder. How can I create a loop that does that and starts a new graph for each different Code?
Observations: Some values in df['Quantity x'] and df['Population'] are left blank.
If I understood you correctly, you are looking for a filtering functionality:
import matplotlib.pyplot as plt
import pandas as pd
def plot_quantity_and_death(df):
    # your code
    fig, ax1 = plt.subplots(figsize=(9, 9))
    color = 'tab:red'
    ax1.set_xlabel('Date')
    ax1.set_ylabel('Quantity X', color=color)
    ax1.plot(df['Quantity x'], color=color)
    ax1.tick_params(axis='y', labelcolor=color)
    ax2 = ax1.twinx()
    color2 = 'tab:blue'
    ax2.set_ylabel('Deaths', color=color2)
    ax2.plot(df['Deaths'], color=color2)
    ax2.tick_params(axis='y', labelcolor=color2)
    # save & close addon
    plt.savefig(f"Code_{str(df['Code'].iloc[0])}.png")
    plt.close()
df = pd.DataFrame() # this needs to be replaced by your dataset
# get unique city codes, loop over them, filter data and plot it
unique_codes = pd.unique(df['Code'])
for code in unique_codes:
    filtered_df = df[df['Code'] == code]
    plot_quantity_and_death(filtered_df)
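The question also asks for the figures to be written into a folder. A possible variant, sketched here under assumptions, uses `DataFrame.groupby` instead of manual filtering and builds the output path with `os.path.join`. The function name `save_city_plots`, the tiny `demo` frame, and the temporary output folder are all hypothetical; only the column names come from the question.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # headless backend; figures are only written to disk
import matplotlib.pyplot as plt
import pandas as pd

def save_city_plots(df, out_dir):
    """Save one twin-axis PNG per Code into out_dir and return the file paths."""
    os.makedirs(out_dir, exist_ok=True)
    paths = []
    for code, city_df in df.groupby("Code"):
        city_df = city_df.sort_values("Date")
        fig, ax1 = plt.subplots(figsize=(9, 9))
        ax1.set_xlabel("Date")
        ax1.set_ylabel("Quantity X", color="tab:red")
        # Blank (NaN) values simply leave a gap in the line.
        ax1.plot(city_df["Date"], city_df["Quantity x"], color="tab:red")
        ax1.tick_params(axis="y", labelcolor="tab:red")
        ax2 = ax1.twinx()
        ax2.set_ylabel("Deaths", color="tab:blue")
        ax2.plot(city_df["Date"], city_df["Deaths"], color="tab:blue")
        ax2.tick_params(axis="y", labelcolor="tab:blue")
        path = os.path.join(out_dir, f"Code_{code}.png")
        fig.savefig(path)
        plt.close(fig)  # free memory; matters with thousands of cities
        paths.append(path)
    return paths

# Hypothetical miniature dataset with the question's column names.
demo = pd.DataFrame({
    "Date": ["2019-01", "2019-02", "2019-01", "2019-02"],
    "Code": [10001, 10001, 10002, 10002],
    "Quantity x": [445.0, None, 4450.0, 3220.0],
    "Deaths": [0, 1, 0, 22],
})
paths = save_city_plots(demo, tempfile.mkdtemp())
```

Closing each figure inside the loop keeps memory use flat, which is likely to matter when looping over more than 4000 codes.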
I'm dealing with the well-known Gapminder data file (here:
https://www.kaggle.com/datasets/tklimonova/gapminder-datacamp-2007?select=gapminder_full.csv)
df.head():
country year population continent life_exp gdp_cap
0 Afghanistan 2007 31889923 Asia 43.828 974.580338
1 Albania 2007 3600523 Europe 76.423 5937.029526
2 Algeria 2007 33333216 Africa 72.301 6223.367465
3 Angola 2007 12420476 Africa 42.731 4797.231267
4 Argentina 2007 40301927 Americas 75.320 12779.379640
I tried a scatter plot but get confused by the many lines appearing on the plot:
plt.style.use('seaborn')
x = np.array(df['gdp_cap'])
y = np.array(df['life_exp'])
plt.scatter(x, y, marker = 'o', alpha = 1)
coeff = np.polyfit(x, y, 2)
plt.plot(x, coeff[0]*(x**2) + coeff[1]*x + coeff[2])
plt.show()
What am I doing wrong?
Your second plot is drawn over the first plot. Add another plt.show() after the scatter call to prevent the overdrawing:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
plt.style.use('seaborn')
x = np.array(df['gdp_cap'])
y = np.array(df['life_exp'])
plt.scatter(x, y, marker = 'o', alpha = 1)
plt.show()
coeff = np.polyfit(x, y, 2)
plt.plot(x, coeff[0]*(x**2) + coeff[1]*x + coeff[2])
plt.show()
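Another likely source of the "many lines" is that plt.plot connects the points in the order they appear in x, and the GDP values are not sorted, so the fitted curve is drawn as a zigzag. A minimal sketch of one way around this is to evaluate the polynomial on an evenly spaced, increasing grid. The five x and y values below are only the rows from df.head() above, standing in for the full columns.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line for interactive use
import matplotlib.pyplot as plt
import numpy as np

# Stand-ins for df['gdp_cap'] and df['life_exp'] (rows from df.head()).
x = np.array([974.580338, 5937.029526, 6223.367465, 4797.231267, 12779.379640])
y = np.array([43.828, 76.423, 72.301, 42.731, 75.320])

coeff = np.polyfit(x, y, 2)

# Evaluate the fit on an increasing grid so the line is drawn left to right.
x_line = np.linspace(x.min(), x.max(), 200)
y_line = np.polyval(coeff, x_line)

fig, ax = plt.subplots()
ax.scatter(x, y, marker="o")
ax.plot(x_line, y_line)
```

With this approach the scatter and the fitted curve can share one figure, because the curve is a single smooth line rather than a path that jumps back and forth.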
I thought this would turn out to be easy, but I have been struggling for a few hours now to animate my seaborn scatter plots by iterating over my datetime values.
The x and y variables are coordinates, and I would like to animate them according to the datetime variable, colored by their "id".
My data set looks like this:
df.head(10)
Out[64]:
date id x y
0 2019-10-09 15:20:01.418 3479 353 118
1 2019-10-09 15:20:01.418 3477 315 92
2 2019-10-09 15:20:01.418 3473 351 176
3 2019-10-09 15:20:01.418 3476 318 176
4 2019-10-09 15:20:01.418 3386 148 255
5 2019-10-09 15:20:01.418 3390 146 118
6 2019-10-09 15:20:01.418 3447 469 167
7 2019-10-09 15:20:03.898 3476 318 178
8 2019-10-09 15:20:03.898 3479 357 117
9 2019-10-09 15:20:03.898 3386 144 257
The plot that should be iterated looks like this:
Below is a quick example. You might want to fix the axes limits to make the transitions nicer.
import pandas as pd
import seaborn as sns
import matplotlib.animation
import matplotlib.pyplot as plt
def animate(date):
    # @date refers to the function argument, not the 'date' column
    df2 = df.query('date == @date')
    ax = plt.gca()
    ax.clear()
    return sns.scatterplot(data=df2, x='x', y='y', hue='id', ax=ax)
fig, ax = plt.subplots()
ani = matplotlib.animation.FuncAnimation(fig, animate, frames=df.date.unique(), interval=100, repeat=True)
plt.show()
NB. I assumed that date is sorted in the order of the frames
edit: If using a Jupyter notebook, you should wrap the animation to display it. See for example this post.
from matplotlib import animation
from IPython.display import HTML
import matplotlib.pyplot as plt
import seaborn as sns
xmin, xmax = df.x.agg(['min', 'max'])
ymin, ymax = df.y.agg(['min', 'max'])
def animate(date):
    # @date refers to the function argument, not the 'date' column
    df2 = df.query('date == @date')
    ax = plt.gca()
    ax.clear() # needed only to keep the points of the current frame
    ax.set_xlim(xmin, xmax)
    ax.set_ylim(ymin, ymax)
    return sns.scatterplot(data=df2, x='x', y='y', hue='id', ax=ax)
fig, ax = plt.subplots()
anim = animation.FuncAnimation(fig, animate, frames=df.date.unique(), interval=100, repeat=True)
HTML(anim.to_html5_video())
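The per-frame filter relies on `DataFrame.query`, where a local Python variable is referenced with an `@` prefix while a bare name refers to a column. The sketch below isolates just that step on three rows copied from the question's data; the helper name `frame_rows` is hypothetical.

```python
import pandas as pd

# Three rows taken from the question's df.head(10) output.
df = pd.DataFrame({
    "date": pd.to_datetime([
        "2019-10-09 15:20:01.418",
        "2019-10-09 15:20:01.418",
        "2019-10-09 15:20:03.898",
    ]),
    "id": [3479, 3477, 3476],
    "x": [353, 315, 318],
    "y": [118, 92, 178],
})

def frame_rows(date):
    # 'date' is the column, '@date' is this function's argument.
    return df.query("date == @date")

# One frame per distinct timestamp, as in the animation above.
frames = pd.to_datetime(df["date"].unique())
first = frame_rows(frames[0])
second = frame_rows(frames[1])
```

Each call returns only the rows recorded at that timestamp, which is what every animation frame draws.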
I have been working on a pie chart to display data year by year. After trying for quite a while, I have succeeded in plotting a slice of the rows:
df = pd.DataFrame(dict( Year = dates[:3],
                        robbery = robbery[:3],
                        fraud = fraud[:3],
                        sexual = sexual[:3]
                        ))
fig, axes = plt.subplots(1,3, figsize=(12,8))
for ax, idx in zip(axes, df.index):
    ax.pie(df.loc[idx],explode=explode,shadow=True, labels=df.columns, autopct='%.2f%%')
    ax.set(ylabel='', title=idx, aspect='equal')
axes[0].legend(bbox_to_anchor=(0, 0.5))
plt.show()
I have checked this link on displaying pie charts, but it works on NumPy arrays to build the charts.
In my scenario, I am stuck on displaying all the data in year-wise pie charts at once. Here is my code:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
df = pd.DataFrame(dict(
    robbery = robbery,
    fraud = fraud,
    assualt = sexual
    ), index=dates)
print(df)
plt.style.use('ggplot')
colors = plt.rcParams['axes.color_cycle']
fig, axes = plt.subplots(nrows=2, ncols=2)
for ax, col in zip(axes.flat, df.columns):
    ax.pie(df[col], labels=df.index, autopct='%.2f', colors=colors)
    ax.set(ylabel='', title=col, aspect='equal')
axes[0, 0].legend(bbox_to_anchor=(0, 0.5))
fig.savefig('your_file.png') # Or whichever format you'd like
plt.show()
DataFrame:
assualt fraud robbery
1997-1998 2988 11897 1212
1998-1999 6033 27660 2482
1999-2000 5924 28421 2418
2000-2001 5631 29539 2298
2001-2002 5875 30295 2481
2002-2003 7434 27141 1940
2003-2004 5673 27986 2053
2004-2005 5695 30070 1879
2005-2006 6099 26031 1903
2006-2007 7038 25845 1889
2007-2008 6671 21009 1736
2008-2009 6046 17768 1791
2009-2010 5496 18974 1934
2010-2011 5666 18458 1726
2011-2012 4933 14157 1748
2012-2013 4972 16849 1962
2013-2014 5328 18819 1762
2014-2015 5909 21915 1341
2015-2016 6067 21891 1354
2016-2017 6448 27390 1608
2017-2018 6355 25438 1822
pie chart looks like this:
I generated an example 9 x 3 dataframe and a 3 x 3 grid of subplots, then populated the pie charts one row at a time.
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.randint(low=0, high=10, size=(9, 3)),
                  columns=['a', 'b', 'c'])
fig, axes = plt.subplots(3,3, figsize=(12,8))
for i in range(int(len(df.index)/3)):
    for j in range(int(len(df.index)/3)):
        idx = i * 3 + j
        ax = axes[i][j]
        ax.pie(df.loc[idx],shadow=True, labels=df.columns, autopct='%.2f%%')
        ax.set(ylabel='', title=idx, aspect='equal')
axes[0][0].legend(bbox_to_anchor=(0, 0.5))
plt.show()
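The same idea might be adapted to a year-indexed dataframe like the one in the question by pairing each subplot with one row via `df.iterrows()`, so the grid does not depend on integer row labels. This is only a sketch: it uses the first four rows of the question's data, and the two-column grid is an arbitrary choice.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line for interactive use
import matplotlib.pyplot as plt
import pandas as pd

# First four rows of the dataframe printed in the question.
df = pd.DataFrame(
    {
        "assualt": [2988, 6033, 5924, 5631],
        "fraud": [11897, 27660, 28421, 29539],
        "robbery": [1212, 2482, 2418, 2298],
    },
    index=["1997-1998", "1998-1999", "1999-2000", "2000-2001"],
)

ncols = 2
nrows = -(-len(df) // ncols)  # ceiling division: enough rows for every year
fig, axes = plt.subplots(nrows, ncols, figsize=(8, 8), squeeze=False)
flat_axes = axes.ravel()

# One pie per year: each row of the dataframe becomes one chart.
for ax, (period, row) in zip(flat_axes, df.iterrows()):
    ax.pie(row, labels=row.index, autopct="%.2f%%")
    ax.set(title=period, aspect="equal")

# Hide any axes left over when the grid has more cells than rows.
for ax in flat_axes[len(df):]:
    ax.set_visible(False)
```

For the full 21-year table a larger grid (for example `ncols = 5`) and a bigger `figsize` would probably be needed to keep the labels readable.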
I currently have a dataframe that has as an index the years from 1990 to 2014 (25 rows). I want my plot to have the X axis with all the years showing. I'm using add_subplot as I plan to have 4 plots in this figure (all of them with the same X axis).
To create the dataframe:
import pandas as pd
import numpy as np
index = np.arange(1990,2015,1)
columns = ['Total Population','Urban Population']
pop_plot = pd.DataFrame(index=index, columns=columns)
pop_plot = pop_plot.fillna(0)
pop_plot['Total Population'] = np.arange(150,175,1)
pop_plot['Urban Population'] = np.arange(50,125,3)
Total Population Urban Population
1990 150 50
1991 151 53
1992 152 56
1993 153 59
1994 154 62
1995 155 65
1996 156 68
1997 157 71
1998 158 74
1999 159 77
2000 160 80
2001 161 83
2002 162 86
2003 163 89
2004 164 92
2005 165 95
2006 166 98
2007 167 101
2008 168 104
2009 169 107
2010 170 110
2011 171 113
2012 172 116
2013 173 119
2014 174 122
The code that I currently have:
fig = plt.figure(figsize=(10,5))
ax1 = fig.add_subplot(2,2,1, xticklabels=pop_plot.index)
plt.subplot(2, 2, 1)
plt.plot(pop_plot)
legend = plt.legend(pop_plot, bbox_to_anchor=(0.1, 1, 0.8, .45), loc=3, ncol=1, mode='expand')
legend.get_frame().set_alpha(0)
ax1.set_xticks(range(len(pop_plot.index)))
This is the plot that I get:
When I comment the set_xticks I get the following plot:
#ax1.set_xticks(range(len(pop_plot.index)))
I've tried a couple of answers that I found here, but I didn't have much success.
It's not clear what ax1.set_xticks(range(len(pop_plot.index))) should be used for. It will set the ticks to the numbers 0,1,2,3 etc. while your plot should range from 1990 to 2014.
Instead, you want to set the ticks to the numbers of your data:
ax1.set_xticks(pop_plot.index)
Complete corrected example:
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
index = np.arange(1990,2015,1)
columns = ['Total Population','Urban Population']
pop_plot = pd.DataFrame(index=index, columns=columns)
pop_plot['Total Population'] = np.arange(150,175,1)
pop_plot['Urban Population'] = np.arange(50,125,3)
fig = plt.figure(figsize=(10,5))
ax1 = fig.add_subplot(2,2,1)
ax1.plot(pop_plot)
legend = ax1.legend(pop_plot, bbox_to_anchor=(0.1, 1, 0.8, .45), loc=3, ncol=1, mode='expand')
legend.get_frame().set_alpha(0)
ax1.set_xticks(pop_plot.index)
plt.show()
The easiest option is to use the xticks parameter for pandas.DataFrame.plot
Pass the dataframe index to xticks: xticks=pop_plot.index
# given the dataframe in the OP
ax = pop_plot.plot(xticks=pop_plot.index, figsize=(15, 5))
# move the legend
ax.legend(bbox_to_anchor=(0.1, 1, 0.8, .45), loc=3, ncol=1, mode='expand', frameon=False)
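Since the question mentions four plots sharing the same x-axis, one possible layout is `plt.subplots(2, 2, sharex=True)` with the ticks set from the index on each axes. This is a sketch under assumptions: the question does not say what the other three plots contain, so the same two columns are repeated in every panel, and the labels are rotated only to keep 25 years readable.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line for interactive use
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Same dataframe as in the question.
pop_plot = pd.DataFrame(
    {
        "Total Population": np.arange(150, 175, 1),
        "Urban Population": np.arange(50, 125, 3),
    },
    index=np.arange(1990, 2015, 1),
)
years = pop_plot.index.to_numpy()

fig, axes = plt.subplots(2, 2, figsize=(14, 7), sharex=True)
for ax in axes.flat:
    # Placeholder content: every panel repeats the same two series.
    for column in pop_plot.columns:
        ax.plot(years, pop_plot[column], label=column)
    ax.set_xticks(years)                        # one tick per year
    ax.tick_params(axis="x", labelrotation=90)  # avoid overlapping labels
axes[0, 0].legend(frameon=False)
fig.tight_layout()
```

With `sharex=True` only the bottom row shows tick labels by default, which keeps the figure uncluttered.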