Plotting on a large number of facets - python

I want to make a plot similar to the one that appears on the seaborn page: https://seaborn.pydata.org/examples/many_facets.html.
index state_name totalprod
0 28 North Dakota 475085000.0
1 3 California 347535000.0
2 34 South Dakota 266141000.0
3 5 Florida 247048000.0
4 21 Montana 156562000.0
I want to first see the variation over the period in a graph for each state, and then get the percentages. I'm trying the following:
grid = sns.FacetGrid(df_bystate, col="state_name", palette="tab20c", col_wrap=4)
grid.map(plt.axhline, y=0, ls=":")
grid.map(plt.plot, [("year", "totalprod")], marker="o")
grid.set(xticks=np.arange(5), yticks=[-5, 5], xlim=(-10, 10), ylim=(-5.5, 5.5))
I get the following error:
ValueError: Buffer has wrong number of dimensions (expected 1, got 3)
The plots are not generated with the data as I expected. I'm new to this and I do not know what I'm doing wrong, so I ask for your patience.
Thanks!!

Related

Display all values on a matplotlib barplot

I have a data frame with 20 values, and I am trying to draw a bar plot of it using matplotlib. When I do, I see only 10 bars instead of 20. I have 5 NaN values in it and 4 of them.
Here is a sample of dataframe:
Name Bonus
Jack Carpenter 890
John Clegg 653
Mike Holiday 367
Rene Moukad 900
........... ...
My code is standard:
fig,ax = plt.subplots(figsize=(16,6))
plt.bar(df.Name, df.Bonus)
fig.autofmt_xdate(rotation=45)

bokeh error FactorRange must specify a unique list of categorical factors for an axis: duplicate factors found: 'M. Laxmikanth'

My data looks like this
data.iloc[:,1]
41 Barack Obama
95 Muthuvenkatachalam S
16 M. Laxmikanth
21 R S Aggarwal
32 R.S. Aggarwal
58 MTG Editorial Board
7 James Clear
54 Ramesh Singh
70 Nitin Singhania
88 M. Laxmikanth
81 Think Tank of Kiran Institute of…
99 Solimo
2 Wonder House Books
39 Navdeep Kaur
33 Jeff Kinney
Name: Author, dtype: object
We can see that rows 16 and 88 hold the same value, M. Laxmikanth.
p = figure(x_range=data.iloc[:,1], plot_width=800, plot_height=550, title="Authors Highest Priced Book", toolbar_location=None, tools="")
p.vbar(x=data.iloc[:,1], top=data.iloc[:,4], width=0.9)
p.xgrid.grid_line_color = None
p.y_range.start = 0
p.xaxis.major_label_orientation = math.pi/2
show(p)
I have imported all the necessary libraries. This gives me the error
ERROR:bokeh.core.validation.check:E-1019 (DUPLICATE_FACTORS): FactorRange must specify a unique list of categorical factors for an axis: duplicate factors found: 'M. Laxmikanth'
How could I fix it?
You need to reduce data.iloc[:,1] to a unique set of values and then put them in the order you want them to appear on the axis. A common way to deduplicate a sequence in Python is to pass it to set. Note that a set does not preserve order, so you have to sort or reorder the result afterwards.
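As a minimal sketch of the deduplication step (the short author list below is copied from the question; the variable names are only illustrative), here are two plain-Python ways to get unique factors. Either result could then be passed as x_range, assuming the bar heights are aggregated to one value per author as well.

```python
# A few authors from the question, including the duplicate "M. Laxmikanth".
authors = [
    "Barack Obama",
    "Muthuvenkatachalam S",
    "M. Laxmikanth",
    "R S Aggarwal",
    "M. Laxmikanth",
]

# Option 1: keep the order of first appearance.
# dict keys are unique and preserve insertion order (Python 3.7+).
unique_in_order = list(dict.fromkeys(authors))

# Option 2: use set to deduplicate, then sort to get a defined order.
unique_sorted = sorted(set(authors))

print(unique_in_order)
print(unique_sorted)
```

The first option is handy when the rows are already sorted the way the axis should read (for example by price); the second gives an alphabetical axis.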

Plot number of observations for categorical groups

I have a data frame that looks like -
id age_bucket state gender duration category1 is_active
1 (40, 70] Jammu and Kashmir m 123 ABB 1
2 (17, 24] West Bengal m 72 ABB 0
3 (40, 70] Bihar f 109 CA 0
4 (17, 24] Bihar f 52 CA 1
5 (24, 30] MP m 23 ACC 1
6 (24, 30] AP m 103 ACC 1
7 (30, 40] West Bengal f 182 GF 0
I want to create a bar plot showing how many people are active for each age_bucket and state (top 10). For gender and category1 I want to create a pie chart with the proportion of active people. The top of each bar should display the total count of active and inactive members, and similarly the percentage should be displayed on the pie chart based on is_active.
How can I do this in Python using seaborn or matplotlib?
What I have done so far:
import seaborn as sns
%matplotlib inline
sns.barplot(x='age_bucket',y='is_active',data=df)
sns.barplot(x='category1',y='is_active',data=df)
It sounds like you want to count the observations rather than plot a value from a column along the y-axis. In seaborn, the function for this is countplot():
sns.countplot(x='age_bucket', hue='is_active', data=df)
Since the returned object is a matplotlib axis, you could assign it to a variable (e.g. ax) and then use ax.annotate to place text in the figure manually:
ax = sns.countplot(x='age_bucket', hue='is_active', data=df)
ax.annotate('1 1', (0, 1), ha='center', va='bottom', fontsize=12)
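Placing every label by hand gets tedious, so here is a hedged sketch of labelling all bars in one loop. It swaps ax.annotate for Axes.bar_label, which assumes matplotlib 3.4 or newer; the small data frame is made up for illustration and only mimics the columns in the question.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import pandas as pd
import seaborn as sns

# Hypothetical sample data with the same column names as the question.
df = pd.DataFrame({
    "age_bucket": ["(17, 24]", "(17, 24]", "(24, 30]", "(24, 30]",
                   "(24, 30]", "(40, 70]", "(40, 70]"],
    "is_active": [0, 1, 1, 1, 0, 1, 0],
})

ax = sns.countplot(x="age_bucket", hue="is_active", data=df)

# Each hue level is drawn as one bar container; bar_label writes
# the bar height (here: the count) on top of every bar in it.
for container in ax.containers:
    ax.bar_label(container)

ax.figure.savefig("counts_by_age_bucket.png")
```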
Seaborn has no way of creating pie charts, so you would need to use matplotlib directly. However, it is often easier to tell counts and proportions from bar charts so I would generally recommend that you stick to those unless you have a specific constraint that forces you to use a pie chart.
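If a pie chart is required anyway, a minimal matplotlib sketch could look like the following. The gender and is_active values are taken from the seven sample rows in the question; the autopct format string and the title are my own choices.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# gender and is_active columns from the seven sample rows in the question.
df = pd.DataFrame({
    "gender": ["m", "m", "f", "f", "m", "m", "f"],
    "is_active": [1, 0, 0, 1, 1, 1, 0],
})

# Count only the active members per gender.
active_counts = df.loc[df["is_active"] == 1, "gender"].value_counts()

fig, ax = plt.subplots()
# autopct prints each wedge's share of the total as a percentage.
ax.pie(active_counts.values, labels=list(active_counts.index), autopct="%1.1f%%")
ax.set_title("Active members by gender")
fig.savefig("active_by_gender.png")
```

The same pattern should work for category1 by replacing the column name.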

How to draw plots on Specific pandas columns

The df.head() of my data is displayed below. I want to display the progression of salaries over time. As you can see, the teams are repeated across the years, and the idea is to display how their salaries changed over time. So for teamID='ATL' I will have a graph that starts at 1985 and goes all the way to the present.
I think I will need to select teams by their team ID and have the x axis display time (year) and the y axis display the payroll. I don't know how to do that in pandas for each team in my data frame.
teamID yearID lgID payroll_total franchID Rank W G win_percentage
0 ATL 1985 NL 14807000.0 ATL 5 66 162 40.740741
1 BAL 1985 AL 11560712.0 BAL 4 83 161 51.552795
2 BOS 1985 AL 10897560.0 BOS 5 81 163 49.693252
3 CAL 1985 AL 14427894.0 ANA 2 90 162 55.555556
4 CHA 1985 AL 9846178.0 CHW 3 85 163 52.147239
5 ATL 1986 NL 17800000.0 ATL 4 55 181 41.000000
You can use seaborn for this:
import seaborn as sns
sns.lineplot(data=df, x='yearID', y='payroll_total', hue='teamID')
To get a different plot for each team:
for team, d in df.groupby('teamID'):
    d.plot(x='yearID', y='payroll_total', label=team)
import pandas as pd
import matplotlib.pyplot as plt
# Display the line plots on 3 separate rows and 1 column
fig, axes = plt.subplots(nrows=3, ncols=1)
# Generate a plot for each team
df[df['teamID'] == 'ATL'].plot(ax=axes[0], x='yearID', y='payroll_total')
df[df['teamID'] == 'BAL'].plot(ax=axes[1], x='yearID', y='payroll_total')
df[df['teamID'] == 'BOS'].plot(ax=axes[2], x='yearID', y='payroll_total')
# Display the plot
plt.show()
Depending on how many teams you want to show, you should adjust nrows in this call:
fig, axes = plt.subplots(nrows=3, ncols=1)
Finally, you could write a loop that creates the visualization for every team.
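A rough sketch of such a loop, assuming one subplot row per team. The data frame holds a few rows from the question's sample; squeeze=False is my addition so that axes stays a 2-D array even with a single team.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# A few rows from the question's sample data.
df = pd.DataFrame({
    "teamID": ["ATL", "BAL", "BOS", "ATL"],
    "yearID": [1985, 1985, 1985, 1986],
    "payroll_total": [14807000.0, 11560712.0, 10897560.0, 17800000.0],
})

teams = sorted(df["teamID"].unique())

# One row per team; squeeze=False keeps axes two-dimensional.
fig, axes = plt.subplots(nrows=len(teams), ncols=1, squeeze=False)

for ax, team in zip(axes[:, 0], teams):
    team_rows = df[df["teamID"] == team]
    team_rows.plot(ax=ax, x="yearID", y="payroll_total", label=team, marker="o")
    ax.set_ylabel("payroll_total")

fig.tight_layout()
fig.savefig("payroll_by_team.png")
```

With many teams a single column becomes very tall, so a grid (for example ncols=4) may read better.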

Using matplotlib to obtain an overlaid histogram

I am new to python and I'm trying to plot an overlaid histogram for a manipulated data set from Kaggle. I tried doing it with matplotlib. This is a dataset that shows the history of gun violence in USA in recent years. I have selected only few columns for EDA.
import pandas as pd
data_set = pd.read_csv("C:/Users/Lenovo/Documents/R related Topics/Assignment/Assignment_day2/04 Assignment/GunViolence.csv")
# .copy() avoids SettingWithCopyWarning when new columns are added below
state_wise_crime = data_set[['date', 'state', 'n_killed', 'n_injured']].copy()
date_value = pd.to_datetime(state_wise_crime['date'])
state_wise_crime['Month'] = date_value.dt.month
state_wise_crime['Year'] = date_value.dt.year  # the groupby below needs a Year column
state_wise_crime = state_wise_crime.drop('date', axis=1)  # drop returns a new frame
no_of_killed = state_wise_crime.groupby(['state', 'Year'])[['n_killed', 'n_injured']].sum()
I want an overlaid histogram that shows the number of people killed and the number of people injured, with the different states on the x-axis.
Welcome to Stack Overflow! Next time, please post your data in a format like the one below (not as a link or an image) to make it easier for us to work on the problem. Also, if you ask about a graph output, showing what the desired graph should look like (even as a hand drawing) would be very helpful :)
df
state Year n_killed n_injured
0 Alabama 2013 9 3
1 Alabama 2014 591 325
2 Alabama 2015 562 385
3 Alabama 2016 761 488
4 Alabama 2017 856 544
5 Alabama 2018 219 135
6 Alaska 2014 49 29
7 Alaska 2015 84 70
8 Alaska 2016 103 88
9 Alaska 2017 70 69
As I commented on your original post, a bar plot would be more appropriate than a histogram in this case, since your purpose appears to be visualizing summary statistics (sums) for each year with a state-wise comparison. As far as I know, the easiest option is to use Seaborn. It depends on how you want to show the data, but below is one example. The code is simple:
import seaborn as sns
sns.barplot(x='Year', y='n_killed', hue='state', data=df)
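The question also asked for killed and injured side by side with the states on the x-axis. One possible sketch, using plain pandas plotting rather than Seaborn, is to sum both columns per state and draw grouped bars. The data frame below is the sample shown above; summing over all years is an assumption about what is wanted.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import pandas as pd

# The sample data frame shown above.
df = pd.DataFrame({
    "state": ["Alabama"] * 6 + ["Alaska"] * 4,
    "Year": [2013, 2014, 2015, 2016, 2017, 2018, 2014, 2015, 2016, 2017],
    "n_killed": [9, 591, 562, 761, 856, 219, 49, 84, 103, 70],
    "n_injured": [3, 325, 385, 488, 544, 135, 29, 70, 88, 69],
})

# Total killed and injured per state across all years.
totals = df.groupby("state")[["n_killed", "n_injured"]].sum()

# One group of bars per state, one bar per column.
ax = totals.plot(kind="bar", rot=0)
ax.set_ylabel("people")
ax.figure.savefig("killed_vs_injured_by_state.png")
```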
Hope this helps.
