I have a dataframe as follows:
match_id team team_score
411 RCB 263
7937 KKR 250
620 RCB 248
206 CSK 246
11338 KKR 241
61 CSK 240
562 RCB 235
Now I want to draw a bar plot with an individual bar for each of these values, but what I get as output is something different:
Is there any way to draw separate bars for identical x-axis values?
When 'team' is used as x, all the values for each team are averaged and a small error bar shows a confidence interval. To have each entry of the table as a separate bar, the index of the dataframe can be used for x. After creating the bars, they can be labeled with the team names.
Optionally, hue='team' colors the bars per team. In that case dodge=False is needed to keep the bars positioned nicely. Seaborn then also creates a legend, which is not very useful here because the same information is already shown as the x-values. The legend can be suppressed via ax.legend_.remove().
from matplotlib import pyplot as plt
import pandas as pd
from io import StringIO
import seaborn as sns
data_str = StringIO("""match_id team team_score
411 RCB 263
7937 KKR 250
620 RCB 248
206 CSK 246
11338 KKR 241
61 CSK 240
562 RCB 235""")
df = pd.read_csv(data_str, sep=r'\s+')  # delim_whitespace=True is deprecated in recent pandas
color_dict = {'RCB': 'dodgerblue', 'KKR': 'darkviolet', 'CSK': 'gold'}
ax = sns.barplot(x=df.index, y='team_score', hue='team', palette=color_dict, dodge=False, data=df)
ax.set_xticklabels(df['team'])
ax.legend_.remove()
plt.tight_layout()
plt.show()
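To make the averaging described above concrete, here is a small pandas-only sketch (the variable names team_means and per_row are illustrative and not part of the original answer). Grouping by team collapses the seven rows into three means, which is what the bars show when x='team' with the default estimator; using the index keeps all seven rows.

```python
import pandas as pd

df = pd.DataFrame({
    "match_id": [411, 7937, 620, 206, 11338, 61, 562],
    "team": ["RCB", "KKR", "RCB", "CSK", "KKR", "CSK", "RCB"],
    "team_score": [263, 250, 248, 246, 241, 240, 235],
})

# One value per team: the mean that a bar plot with x='team' draws by default.
team_means = df.groupby("team")["team_score"].mean()

# One value per row: what is drawn when the dataframe index is used for x.
per_row = df["team_score"]

print(team_means)
print(len(team_means), "bars vs", len(per_row), "bars")
```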
I thought this would turn out easy, but I am struggling now for a few hours to animate my seaborn scatter plots iterating over my datetime values.
The x and y variables are coordinates, and I would like to animate them according to the datetime variable, colored by their "id".
My data set looks like this:
df.head(10)
Out[64]:
date id x y
0 2019-10-09 15:20:01.418 3479 353 118
1 2019-10-09 15:20:01.418 3477 315 92
2 2019-10-09 15:20:01.418 3473 351 176
3 2019-10-09 15:20:01.418 3476 318 176
4 2019-10-09 15:20:01.418 3386 148 255
5 2019-10-09 15:20:01.418 3390 146 118
6 2019-10-09 15:20:01.418 3447 469 167
7 2019-10-09 15:20:03.898 3476 318 178
8 2019-10-09 15:20:03.898 3479 357 117
9 2019-10-09 15:20:03.898 3386 144 257
The plot that should be iterated looks like this:
Below is a quick example. You might want to fix the axes limits to make the transitions nicer.
import pandas as pd
import seaborn as sns
import matplotlib.animation
import matplotlib.pyplot as plt
def animate(date):
    # '@date' refers to the local variable 'date' inside the query string
    df2 = df.query('date == @date')
    ax = plt.gca()
    ax.clear()
    return sns.scatterplot(data=df2, x='x', y='y', hue='id', ax=ax)
fig, ax = plt.subplots()
ani = matplotlib.animation.FuncAnimation(fig, animate, frames=df.date.unique(), interval=100, repeat=True)
plt.show()
NB. I assumed that the dates are sorted in the order of the frames.
Edit: If you are using a Jupyter notebook, you should wrap the animation to display it. See for example this post.
from matplotlib import animation
from IPython.display import HTML
import matplotlib.pyplot as plt
import seaborn as sns
xmin, xmax = df.x.agg(['min', 'max'])
ymin, ymax = df.y.agg(['min', 'max'])
def animate(date):
    # '@date' refers to the local variable 'date' inside the query string
    df2 = df.query('date == @date')
    ax = plt.gca()
    ax.clear() # needed only to keep the points of the current frame
    ax.set_xlim(xmin, xmax)
    ax.set_ylim(ymin, ymax)
    return sns.scatterplot(data=df2, x='x', y='y', hue='id', ax=ax)
fig, ax = plt.subplots()
anim = animation.FuncAnimation(fig, animate, frames=df.date.unique(), interval=100, repeat=True)
HTML(anim.to_html5_video())
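As an alternative to clearing and redrawing the axes on every frame, the points of a single scatter artist can be updated in place. The following is only a sketch under some assumptions: it uses plain matplotlib instead of seaborn, so the per-id coloring is dropped, and the names frames, scat and update are made up for illustration. The sample data is the df.head(10) output from the question.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd
from matplotlib.animation import FuncAnimation

df = pd.DataFrame({
    "date": pd.to_datetime(["2019-10-09 15:20:01.418"] * 7
                           + ["2019-10-09 15:20:03.898"] * 3),
    "id": [3479, 3477, 3473, 3476, 3386, 3390, 3447, 3476, 3479, 3386],
    "x": [353, 315, 351, 318, 148, 146, 469, 318, 357, 144],
    "y": [118, 92, 176, 176, 255, 118, 167, 178, 117, 257],
})

# One sub-dataframe per timestamp; groupby returns the groups sorted by date.
frames = [group for _, group in df.groupby("date")]

fig, ax = plt.subplots()
# Fix the limits once so the view does not jump between frames.
ax.set_xlim(df["x"].min() - 10, df["x"].max() + 10)
ax.set_ylim(df["y"].min() - 10, df["y"].max() + 10)
scat = ax.scatter([], [])

def update(frame_df):
    # Move the existing points instead of clearing and redrawing the axes.
    scat.set_offsets(frame_df[["x", "y"]].to_numpy())
    ax.set_title(str(frame_df["date"].iloc[0]))
    return (scat,)

anim = FuncAnimation(fig, update, frames=frames, interval=100, repeat=True)
```

In a notebook, anim can be displayed the same way as above, for example with HTML(anim.to_html5_video()).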
I would like to know how to plot a bar chart in which the upper and lower limits of the bins are given by the values in the age_classes column of the dataframe shown below, using pandas, seaborn or matplotlib. A sample of the dataframe looks like this:
age_classes total_cases male_cases female_cases
0 0-9 693 381 307
1 10-19 931 475 454
2 20-29 4530 1919 2531
3 30-39 7466 3505 3885
4 40-49 13701 6480 7130
5 50-59 20975 11149 9706
6 60-69 18089 11761 6254
7 70-79 19238 12281 6868
8 80-89 16252 8553 7644
9 >90 4356 1374 2973
10 Unknown 168 84 81
If you want a chart like this:
then you can make it with sns.barplot, setting age_classes as x and one column (in my case total_cases) as y, as in this code:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
df = pd.read_csv('data.csv')
fig, ax = plt.subplots()
sns.barplot(ax = ax,
            data = df,
            x = 'age_classes',
            y = 'total_cases')
plt.show()
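If the male and female counts should appear side by side for each age class, one possible variation (an assumption about what might be wanted, not part of the original answer) is to pass both columns to pandas' own bar plot. The sketch below uses only the first three rows of the sample data.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({
    "age_classes": ["0-9", "10-19", "20-29"],
    "male_cases": [381, 475, 1919],
    "female_cases": [307, 454, 2531],
})

fig, ax = plt.subplots()
# Each age class gets one bar per listed column, drawn next to each other.
df.plot.bar(x="age_classes", y=["male_cases", "female_cases"], ax=ax, rot=0)
ax.set_ylabel("cases")
fig.tight_layout()
```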
What could be the problem if Matplotlib draws a line plot twice or multiple times, like this one:
Here is my code:
import pandas as pd
import numpy as np
import scipy
import matplotlib.pyplot as plt
from scipy import integrate
def compute_integrated_spectral_response_ikonos(file, sheet):
    df = pd.read_excel(file, sheet_name=sheet, header=2)
    blue = integrate.cumtrapz(df['Blue'], df['Wavelength'])
    green = integrate.cumtrapz(df['Green'], df['Wavelength'])
    red = integrate.cumtrapz(df['Red'], df['Wavelength'])
    nir = integrate.cumtrapz(df['NIR'], df['Wavelength'])
    pan = integrate.cumtrapz(df['Pan'], df['Wavelength'])
    plt.figure(num=None, figsize=(6, 4), dpi=80, facecolor='w', edgecolor='k')
    plt.plot(df[1:], blue, label='Blue', color='darkblue');
    plt.plot(df[1:], green, label='Green', color='b');
    plt.plot(df[1:], red, label='Red', color='g');
    plt.plot(df[1:], nir, label='NIR', color='r');
    plt.plot(df[1:], pan, label='Pan', color='darkred')
    plt.legend(bbox_to_anchor=(1.05, 1), loc='upper left', borderaxespad=0.)
    plt.xlabel('Wavelength (nm)')
    plt.ylabel('Spectral Response (%)')
    plt.title(f'Integrated Spectral Response of {sheet} Bands')
    plt.show()
compute_integrated_spectral_response_ikonos('Sorted Wavelengths.xlsx', 'IKONOS')
Here is my dataset.
This happens because df[1:] passes the entire dataframe as the x values, so each of its columns is drawn as a separate line.
>>> df[1:]
Wavelength Blue Green Red NIR Pan
1 355 0.001463 0.000800 0.000504 0.000532 0.000619
2 360 0.000866 0.000729 0.000391 0.000674 0.000361
3 365 0.000731 0.000806 0.000597 0.000847 0.000244
4 370 0.000717 0.000577 0.000328 0.000729 0.000435
5 375 0.001251 0.000842 0.000847 0.000906 0.000914
.. ... ... ... ... ... ...
133 1015 0.002601 0.002100 0.001752 0.002007 0.149330
134 1020 0.001602 0.002040 0.002341 0.001793 0.136372
135 1025 0.001946 0.002218 0.001260 0.002754 0.118682
136 1030 0.002417 0.001376 0.000898 0.000000 0.103634
137 1035 0.001300 0.001602 0.000000 0.000000 0.089097
[137 rows x 6 columns]
The slice [1:] just gives the dataframe without the first row. Altering each instance of df[1:] to df['Wavelength'][1:] gives us what I presume is the expected output:
>>> df['Wavelength'][1:]
1 355
2 360
3 365
4 370
5 375
133 1015
134 1020
135 1025
136 1030
137 1035
Name: Wavelength, Length: 137, dtype: int64
Output:
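The reason for the [1:] on the x values is that a cumulative trapezoid integral over n samples yields n - 1 values. The sketch below reproduces that with plain NumPy on made-up numbers (wavelength and response are hypothetical, not taken from the dataset), so the length relationship is easy to check. As far as I know, SciPy's cumulative_trapezoid (the newer name for cumtrapz) also accepts initial=0 to get an output of the same length as the input, which would make the slice unnecessary.

```python
import numpy as np

# Hypothetical sample values, only to illustrate the array lengths.
wavelength = np.array([350.0, 355.0, 360.0, 365.0])
response = np.array([0.0, 2.0, 4.0, 2.0])

# Area of each trapezoid between neighbouring samples, then the running sum.
step_areas = np.diff(wavelength) * (response[1:] + response[:-1]) / 2
cumulative = np.cumsum(step_areas)

# The integral has one value fewer than the input, so drop the first x value.
x_for_plot = wavelength[1:]

print(cumulative)                        # [ 5. 20. 35.]
print(len(x_for_plot), len(cumulative))  # 3 3
```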
I got the idea to try and visualize data for election donations from the fec website. Basically, I would like to create a stacked bar chart, with the X-axis being the State, Y-axis being the donated amount, and the 'stacks' being the different candidates, showing how much each candidate received from each state.
Code:
import matplotlib.pyplot as plt
import pandas as pd
from pathlib import Path
pathName = r"R:\Downloads\indiv20\by_date"
dataDir = Path(pathName)
filename = "itcont_2020_20010425_20190425.txt"
fullName = dataDir / filename
data = pd.read_csv(fullName, low_memory=False, sep="|", usecols=[0, 9, 12, 14])
data.columns = ['Filer ID', 'State', 'Occupation', 'Donation Amount ($)']
data = data.dropna(subset=['Donation Amount ($)'])
donations_by_state = data.groupby('State').sum()
plt.bar(donations_by_state.index, donations_by_state['Donation Amount ($)'])
plt.ylabel('Donation Amount ($)')
plt.xlabel('State')
plt.title('Donations per State')
plt.show()
This plots the total contributions per state, and works great. However, when I try this groupby method to group all the data I want, I'm not sure how to plot a stacked bar chart from this data:
donations_per_candidate_per_state = data['Donation Amount ($)'].groupby([data['State'], data['Filer ID']]).sum()
State Filer ID
AA C00005561 350
C00010603 600
C00042366 115
C00309567 1675
C00331694 2500
C00365536 270
C00401224 4495
C00411330 100
C00492991 300
C00540500 300
C00641381 250
C00696948 2800
C00697441 250
C00699090 67
C00703108 1400
AB C00401224 1386
AE C00000935 295
C00003418 276
C00010603 1750
C00027466 320
C00193433 105
C00211037 251
C00216614 226
C00341396 20
C00369033 150
C00394957 50
C00401224 26538
C00438713 50
C00457325 310
C00492785 300
...
ZZ C00580100 1490
C00603084 95
C00607861 750
C00608380 125
C00618371 2199
C00630665 1000
C00632133 600
C00632398 400
C00639500 208
C00639591 1450
C00640623 6402
C00653816 1000
C00666149 1000
C00666453 2800
C00683102 1000
C00689430 3524
C00693234 13283
C00693713 1000
C00694018 2750
C00694455 12761
C00695510 1045
C00696245 250
C00696419 3000
C00696526 500
C00696948 31296
C00697441 34396
C00698050 350
C00698258 2800
C00699090 5757
C00700732 475
Name: Donation Amount ($), Length: 32662, dtype: int64
It seems to have the data tabulated in the way I need, just not sure how to plot it.
You can use the following as described here:
df = donations_per_candidate_per_state.unstack('Filer ID')
df.plot(kind='bar', stacked=True)
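To show what unstack does to the grouped Series, here is a minimal sketch on made-up data (the filer IDs C1 to C3 and the amounts are hypothetical). It also assumes you may want fill_value=0 for state/candidate pairs without donations, and that with thousands of candidates it can help to keep only the largest recipients before plotting; neither step is part of the original answer.

```python
import pandas as pd

# Hypothetical toy data with the same column names as in the question.
data = pd.DataFrame({
    "State": ["AA", "AA", "AB", "AE", "AE", "AE"],
    "Filer ID": ["C1", "C2", "C2", "C1", "C2", "C3"],
    "Donation Amount ($)": [350, 600, 1386, 295, 276, 1750],
})

per_candidate_per_state = (
    data.groupby(["State", "Filer ID"])["Donation Amount ($)"].sum()
)

# Rows become states, columns become candidates; missing pairs are set to 0.
wide = per_candidate_per_state.unstack("Filer ID", fill_value=0)

# Keep only the two candidates with the largest overall totals.
top = wide.sum().nlargest(2).index
wide_top = wide[top]
print(wide_top)

# wide_top.plot(kind='bar', stacked=True) would then draw one stack per state.
```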
I am new to Python and struggling to solve this one efficiently. I read a number of examples, but they were complex and I could not follow them. For the dataframe below, I would like one subplot per column, ignoring the first two (i.e. Site_ID and Cell_ID):
Availability
VoLTE CSSR
VoLTE Attempts
Each subplot (Availability, etc.) should show the grouped Site_ID values as its legend entries, and each subplot should be saved to a chosen location.
Sample Data:
Date Site_ID Cell_ID Availability VoLTE CSSR VoLTE Attempts
22/03/2019 23181 23181B11 100 99.546435 264
03/03/2019 91219 91219A11 100 99.973934 663
17/04/2019 61212 61212A80 100 99.898843 1289
29/04/2019 91219 91219B26 99.907407 100 147
24/03/2019 61212 61212A11 100 99.831425 812
25/04/2019 61212 61212B11 100 99.91107 2677
29/03/2019 91219 91219A26 100 99.980066 1087
05/04/2019 91705 91705C11 100 99.331263 1090
04/04/2019 91219 91219A26 100 99.984588 914
19/03/2019 61212 61212B11 94.21875 99.934376 2318
23/03/2019 23182 23182B11 100 99.47367 195
02/04/2019 91219 91219A26 100 99.980123 958
26/03/2019 23181 23181A11 100 99.48185 543
19/03/2019 61212 61212A11 94.21875 99.777605 1596
18/04/2019 23182 23182B11 100 99.978012 264
26/03/2019 23181 23181C11 100 99.829911 1347
01/03/2019 91219 91219A11 100 99.770661 1499
12/03/2019 91219 91219B11 100 99.832273 1397
19/04/2019 61212 61212B80 100 99.987946 430
12/03/2019 91705 91705C11 100 98.789819 1000
Here is my inefficient solution. Given that there are over 100 columns, I am quite worried.
# separate dataframes
Avail = new_df.loc[:,["Site_ID","Cell_ID","Availability"]]
V_CSSR = new_df.loc[:,["Site_ID","Cell_ID","VoLTE CSSR"]]
V_Atte = new_df.loc[:,["Site_ID","Cell_ID","VoLTE Attempts"]]
#plot each dataframe
Avail.groupby("Site_ID")["Availability"].plot(y="Availability", legend = True)
V_CSSR.groupby("Site_ID")["VoLTE CSSR"].plot(y="VoLTE CSSR", legend = True)
V_Atte.groupby("Site_ID")["VoLTE Attempts"].plot(y="VoLTE Attempts", legend = True)
This is the outcome I am after.
Not the best solution, but you can try:
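# columns to plot (not defined in the original snippet; taken from the question)
cols = ['Availability', 'VoLTE CSSR', 'VoLTE Attempts']
fig, axes = plt.subplots(1,3, figsize=(10,4))
for col, ax in zip(cols, axes):
    # one line per site on the current axes
    for site in df.Site_ID.unique():
        tmp_df = df[df.Site_ID.eq(site)]
        ax.plot(tmp_df.Date, tmp_df[col], label=site)
    ax.set_title(col)
    ax.legend()
plt.show()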
Output:
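Since the question mentions more than 100 columns and saving each plot, the same loop can be generalised so that the metric columns are discovered automatically and each one is written to its own file. This is only a sketch under assumptions: the dataframe is a small excerpt of the sample data, the output directory is a temporary one created for illustration, and the file naming scheme is made up.

```python
import tempfile
from pathlib import Path

import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({
    "Date": ["22/03/2019", "03/03/2019", "17/04/2019", "29/04/2019", "24/03/2019"],
    "Site_ID": [23181, 91219, 61212, 91219, 61212],
    "Cell_ID": ["23181B11", "91219A11", "61212A80", "91219B26", "61212A11"],
    "Availability": [100, 100, 100, 99.907407, 100],
    "VoLTE CSSR": [99.546435, 99.973934, 99.898843, 100, 99.831425],
    "VoLTE Attempts": [264, 663, 1289, 147, 812],
})
df["Date"] = pd.to_datetime(df["Date"], format="%d/%m/%Y")
df = df.sort_values("Date")

# Every column except the identifying ones is treated as a metric to plot.
metric_cols = [c for c in df.columns if c not in ("Date", "Site_ID", "Cell_ID")]

out_dir = Path(tempfile.mkdtemp())  # hypothetical output location
saved = []
for col in metric_cols:
    fig, ax = plt.subplots(figsize=(6, 4))
    # One line per site, labelled with the site ID for the legend.
    for site, group in df.groupby("Site_ID"):
        ax.plot(group["Date"], group[col], marker="o", label=str(site))
    ax.set_title(col)
    ax.legend(title="Site_ID")
    fig.autofmt_xdate()
    path = out_dir / f"{col.replace(' ', '_')}.png"
    fig.savefig(path)
    plt.close(fig)  # free the figure; matters when looping over 100+ columns
    saved.append(path)
```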