How to set color range in Matplotlib? - python

I'm looking for help with setting the color range and the colors themselves. Currently I get the default color range, and I'm unsure how to change either the range or the colors used. I assume the code belongs in or near the plot call. I'd like to replace the default purple-to-yellow scheme with a custom set of colors and set the ranges manually.
Sales range from 0 to 15.
Code:
import matplotlib.pyplot as plt
import matplotlib.patches as mpatches

f, ax = plt.subplots(1, figsize=(12, 12))
ax = AZ3.plot(column='Sales', ax=ax, edgecolor='black')
f.suptitle('AZ')
lims = plt.axis('equal')
patchList = []
for key in legend_dict:
    data_key = mpatches.Patch(color=legend_dict[key], label=key)
    patchList.append(data_key)
plt.legend(handles=patchList, loc=3)
plt.savefig('legend.png', bbox_inches='tight')
for idx, row in AZ3.iterrows():
    # label text passed positionally so it works across Matplotlib versions
    plt.annotate(row['Name'], xy=row['coords'],
                 horizontalalignment='center', color='white', size=12)
Sample Data:
FIPS   Name      State  Sales
04007  Gila      AZ     1
04027  Yuma      AZ     10
04012  La Paz    AZ     6
04019  Pima      AZ     5
04009  Graham    AZ     2
04021  Pinal     AZ     7
04025  Yavapai   AZ     3
04001  Apache    AZ     8
04023  Santa     AZ     9
04005  Coco      AZ     0
04003  Cochise   AZ     0
04011  Green     AZ     0
04013  Maricopa  AZ     15
04015  Mohave    AZ     1
04017  Navajo    AZ     4
Thanks,
Justin
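One way to control both the colors and the value ranges is to pair a ListedColormap with a BoundaryNorm. The sketch below is plain Matplotlib on the Sales values from the sample data; the bin edges and hex colors are made-up placeholders, not values from the question. GeoPandas' plot() takes cmap and norm keyword arguments, so passing the same two objects to AZ3.plot(column='Sales', ...) should apply the same mapping to the map, though that part is untested here.

```python
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.cm import ScalarMappable
from matplotlib.colors import BoundaryNorm, ListedColormap

# Hypothetical bin edges and colors; adjust both to taste.
# Values at or above the last edge count as "over", so 16 is used to include 15.
bounds = [0, 1, 5, 10, 16]
colors = ["#f0f0f0", "#fdcc8a", "#fc8d59", "#d7301f"]  # one color per bin

cmap = ListedColormap(colors)
norm = BoundaryNorm(bounds, cmap.N)

# Sales column from the sample data in the question
sales = np.array([1, 10, 6, 5, 2, 7, 3, 8, 9, 0, 0, 0, 15, 1, 4])

# norm() turns each value into a bin index, cmap() turns the index into RGBA
rgba = cmap(norm(sales))

fig, ax = plt.subplots()
ax.bar(range(len(sales)), sales, color=rgba, edgecolor="black")
fig.colorbar(ScalarMappable(norm=norm, cmap=cmap), ax=ax, label="Sales")
```

If a continuous scale is enough and only the limits need pinning, passing a named colormap together with vmin=0 and vmax=15 is the simpler route.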

Related

How to add bar labels using Matplotlib [duplicate]

I have the following data frame.
_id message_date_time country
0 {'$oid': '61f7dfd24b11720cdbda5c86'} {'$date': '2021-12-24T12:30:09Z'} RUS
1 {'$oid': '61f7eb7b4b11720cdbda9322'} {'$date': '2021-12-20T21:58:20Z'} RUS
2 {'$oid': '61f7fdad4b11720cdbdb0beb'} {'$date': '2021-12-15T15:29:13Z'} RUS
3 {'$oid': '61f8234f4b11720cdbdbec52'} {'$date': '2021-12-10T00:03:43Z'} USA
4 {'$oid': '61f82c274b11720cdbdc21c7'} {'$date': '2021-12-09T15:10:35Z'} USA
With these values
df["country"].value_counts()
RUS 156
USA 139
FRA 19
GBR 11
AUT 9
AUS 8
DEU 7
CAN 4
BLR 3
ROU 3
GRC 3
NOR 3
NLD 3
SWE 2
ESP 2
CHE 2
POL 1
HUN 1
DNK 1
ITA 1
ISL 1
BIH 1
Name: country, dtype: int64
I'm trying to plot each country against its frequency using the following:
plt.figure(figsize=(15, 8))
plt.xlabel("Frequency")
plt.ylabel("Country")
plt.hist(df["country"])
plt.show()
What I need is to show the country frequency above every bar and keep a very small space between the bars.
Arguably the easiest way is to use plt.bar(). For example:
counts = df["country"].value_counts()
names, values = counts.index.tolist(), counts.values.tolist()
plt.bar(names, values)
height_above_bar = 0.05  # distance of count from bar
fontsize = 12  # the fontsize that you want the count to have
for i, val in enumerate(values):
    # center each count horizontally over its bar
    plt.text(i, val + height_above_bar, str(val), fontsize=fontsize, ha="center")
plt.show()
Seaborn's countplot is handy for checking the counts of each value in a series. The snippet below instead uses plt.bar directly and assumes df already has one row per country with a counts column.
plt.figure(figsize=(20, 5))
bars = plt.bar(df["country"], df["counts"])
for bar in bars.patches:
    # place the bar height just above the top center of each bar
    plt.annotate(bar.get_height(),
                 xy=(bar.get_x() + bar.get_width() / 2, bar.get_height()),
                 va="bottom", ha="center")
plt.show()
If you want something other than the height on the graph, change the first argument of annotate (the text; it was the keyword s before Matplotlib 3.3) to a value of your choice.
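On Matplotlib 3.4 or newer, Axes.bar_label can do the labelling without a manual loop. A minimal sketch, assuming a tiny made-up stand-in for the question's df["country"] column; width=0.9 is just one guess at "a very small space between the bars".

```python
import pandas as pd
import matplotlib.pyplot as plt

# Small made-up sample standing in for the question's df["country"] column
df = pd.DataFrame({"country": ["RUS", "RUS", "RUS", "USA", "USA", "FRA"]})
counts = df["country"].value_counts()

fig, ax = plt.subplots(figsize=(15, 8))
# width close to 1 leaves only a narrow gap between neighbouring bars
bars = ax.bar(counts.index.tolist(), counts.values, width=0.9)
# bar_label writes each bar's height above it; padding is in points
labels = ax.bar_label(bars, padding=3)
ax.set_xlabel("Country")
ax.set_ylabel("Frequency")
```

Calling plt.show() or fig.savefig(...) afterwards renders the figure.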

Pandas groupby plotting with unstack()

I have the following code
import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame({
    'type': ['john', 'bill', 'john', 'bill', 'bill', 'bill', 'bill', 'john', 'john'],
    'num': [1006, 1004, 1006, 1004, 1006, 1006, 1006, 1004, 1004],
    'date': [2017, 2016, 2015, 2017, 2017, 2013, 2012, 2013, 2012],
    'pos': [0, 0, 1, 4, 0, 3, 3, 8, 9],
    'force': [5, 2, 7, 10, 6, 12, 4, 7, 8]})
fig, ax = plt.subplots()
# group by the column name (not a one-element list) so each key is a plain string
grp = df.sort_values('date').groupby('type')
for name, group in grp:
    print(name)
    print(group)
    group.plot(x='date', y='force', label=name)
plt.show()
The result obtained is as follows:
bill
type num date pos force
6 bill 1006 2012 3 4
5 bill 1006 2013 3 12
1 bill 1004 2016 0 2
3 bill 1004 2017 4 10
4 bill 1006 2017 0 6
john
type num date pos force
8 john 1004 2012 9 8
7 john 1004 2013 8 7
2 john 1006 2015 1 7
0 john 1006 2017 0 5
[img1_force_Bill][1]
[img2_Force_john][2]
How can I get 4 figures, each with 2 lines?
Fig1 for bill: line1 (x=date, y=force) for num 1004 / line2 (x=date, y=force) for num 1006
Fig2 for bill: line1 (x=date, y=pos) for num 1004 / line2 (x=date, y=pos) for num 1006
Fig3 for john: line1 (x=date, y=force) for num 1004 / line2 (x=date, y=force) for num 1006
Fig4 for john: line1 (x=date, y=pos) for num 1004 / line2 (x=date, y=pos) for num 1006
Let's try this:
import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame({
    'type': ['john', 'bill', 'john', 'bill', 'bill', 'bill', 'bill', 'john', 'john'],
    'num': [1006, 1004, 1006, 1004, 1006, 1006, 1006, 1004, 1004],
    'date': [2017, 2016, 2015, 2017, 2017, 2013, 2012, 2013, 2012],
    'pos': [0, 0, 1, 4, 0, 3, 3, 8, 9],
    'force': [5, 2, 7, 10, 6, 12, 4, 7, 8]})
fig, ax = plt.subplots(2, 2)
axi = iter(ax.flatten())
# group by the column name (not a one-element list) so name is a string, not a tuple
grp = df.sort_values('date').groupby('type')
for name, group in grp:
    # one subplot per (type, measure); unstack turns each num into its own line
    group.set_index(['date', 'num'])['force'].unstack().plot(title=name + ' - force', ax=next(axi), legend=False)
    group.set_index(['date', 'num'])['pos'].unstack().plot(title=name + ' - pos', ax=next(axi), legend=False)
plt.tight_layout()
plt.legend(loc='upper center', bbox_to_anchor=(0, -.5), ncol=2)
plt.show()
Output:
Update per comment below:
dfj = df[df['type'] == 'john']
ax = dfj.set_index(['date','num'])['force'].unstack().plot(title=name+' - force', legend=False)
ax.axhline(y=dfj['force'].max(), color='red', alpha=.8)
Chart:
@Scott Boston
Thank you a lot for your help.
Unfortunately, after using the following code with big data to plot 2 lines
for name, group in grp_new:
    axn = group.set_index(['date', 'num'])['pos'].unstack().plot(title=name + ' _pos', legend=False)
the plot looks like plot2Lines. The lines are not continuous. I tried plotting single lines and they were OK.
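One possible cause, assuming the larger data has dates that exist for one num but not the other: unstack() fills those missing combinations with NaN, and Matplotlib breaks a line wherever it meets a NaN. A rough sketch of a workaround on the sample data from the question is to drop the NaNs per column and plot each series on its own (bill and wide are names chosen here for illustration).

```python
import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame({
    'type': ['john', 'bill', 'john', 'bill', 'bill', 'bill', 'bill', 'john', 'john'],
    'num': [1006, 1004, 1006, 1004, 1006, 1006, 1006, 1004, 1004],
    'date': [2017, 2016, 2015, 2017, 2017, 2013, 2012, 2013, 2012],
    'pos': [0, 0, 1, 4, 0, 3, 3, 8, 9],
    'force': [5, 2, 7, 10, 6, 12, 4, 7, 8]})

bill = df[df['type'] == 'bill']
# One column per num; a date missing for a num becomes NaN in that column
wide = bill.set_index(['date', 'num'])['force'].unstack()
print(wide)

fig, ax = plt.subplots()
for num in wide.columns:
    # Dropping the NaNs first lets the line run straight through the gaps
    series = wide[num].dropna()
    ax.plot(series.index, series.values, marker='o', label=str(num))
ax.legend(title='num')
ax.set_title('bill - force')
```

Whether connecting across missing dates is appropriate depends on the data; interpolating the wide frame before plotting is another option.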

Plot number of observations for categorical groups

I have a data frame that looks like -
id age_bucket state gender duration category1 is_active
1 (40, 70] Jammu and Kashmir m 123 ABB 1
2 (17, 24] West Bengal m 72 ABB 0
3 (40, 70] Bihar f 109 CA 0
4 (17, 24] Bihar f 52 CA 1
5 (24, 30] MP m 23 ACC 1
6 (24, 30] AP m 103 ACC 1
7 (30, 40] West Bengal f 182 GF 0
I want to create a bar plot showing how many people are active for each age_bucket and state (top 10). For gender and category1 I want to create a pie chart with the proportion of active people. The top of each bar should display the total count of active and inactive members, and similarly the pie chart should display percentages based on is_active.
How can I do this in Python using seaborn or matplotlib?
Here is what I have done so far:
import seaborn as sns
%matplotlib inline
sns.barplot(x='age_bucket',y='is_active',data=df)
sns.barplot(x='category1',y='is_active',data=df)
It sounds like you want to count the observations rather than plot a value from a column along the y-axis. In seaborn, the function for this is countplot():
sns.countplot(x='age_bucket', hue='is_active', data=df)
Since the returned object is a matplotlib axis, you could assign it to a variable (e.g. ax) and then use ax.annotate to place text in the figure manually:
ax = sns.countplot(x='age_bucket', hue='is_active', data=df)
ax.annotate('1 1', (0, 1), ha='center', va='bottom', fontsize=12)
Seaborn has no way of creating pie charts, so you would need to use matplotlib directly. However, it is often easier to tell counts and proportions from bar charts so I would generally recommend that you stick to those unless you have a specific constraint that forces you to use a pie chart.
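If a pie chart is required after all, a minimal Matplotlib sketch could look like the following. It assumes "proportion of active people" means the share of each gender among rows with is_active == 1, and it rebuilds only the relevant columns from the seven sample rows in the question.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Relevant columns of the seven sample rows from the question
df = pd.DataFrame({
    "gender": ["m", "m", "f", "f", "m", "m", "f"],
    "category1": ["ABB", "ABB", "CA", "CA", "ACC", "ACC", "GF"],
    "is_active": [1, 0, 0, 1, 1, 1, 0],
})

# Count active people per gender (assumed reading of the requirement)
active_by_gender = df.loc[df["is_active"] == 1, "gender"].value_counts()

fig, ax = plt.subplots()
# autopct prints each wedge's percentage of the total inside the wedge
ax.pie(active_by_gender.values,
       labels=active_by_gender.index.tolist(),
       autopct="%1.0f%%")
ax.set_title("Active members by gender")
```

Swapping "gender" for "category1" gives the second pie chart.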

How to draw plots on Specific pandas columns

So I have the df.head() displayed below. I want to display the progression of salaries over time. As you can see, the teams are repeated across the years, and the idea is to display how their salaries changed over time. So for teamID='ATL' I will have a graph that starts at 1985 and goes all the way to the present.
I think I will need to select teams by their team ID and have the x-axis display time (year) and the y-axis display payroll. I don't know how to do that in pandas for each team in my data frame.
teamID yearID lgID payroll_total franchID Rank W G win_percentage
0 ATL 1985 NL 14807000.0 ATL 5 66 162 40.740741
1 BAL 1985 AL 11560712.0 BAL 4 83 161 51.552795
2 BOS 1985 AL 10897560.0 BOS 5 81 163 49.693252
3 CAL 1985 AL 14427894.0 ANA 2 90 162 55.555556
4 CHA 1985 AL 9846178.0 CHW 3 85 163 52.147239
5 ATL 1986 NL 17800000.0 ATL 4 55 181 41.000000
You can use seaborn for this:
import seaborn as sns
sns.lineplot(data=df, x='yearID', y='payroll_total', hue='teamID')
To get a separate plot for each team:
for team, d in df.groupby('teamID'):
    # use the team variable as the label, not the literal string 'team'
    d.plot(x='yearID', y='payroll_total', label=team)
import pandas as pd
import matplotlib.pyplot as plt
# Display the plots on 3 separate rows and 1 column
fig, axes = plt.subplots(nrows=3, ncols=1)
# Generate a plot for each team
df[df['teamID'] == 'ATL'].plot(ax=axes[0], x='yearID', y='payroll_total')
df[df['teamID'] == 'BAL'].plot(ax=axes[1], x='yearID', y='payroll_total')
df[df['teamID'] == 'BOS'].plot(ax=axes[2], x='yearID', y='payroll_total')
# Display the plot
plt.show()
Depending on how many teams you want to show, adjust nrows in
fig, axes = plt.subplots(nrows=3, ncols=1)
Finally, you could write a loop that creates the visualization for every team.
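Such a loop might look like the sketch below, which sizes the grid from the number of teams instead of hard-coding nrows. It uses only the six rows shown in the question, so every team except ATL ends up with a single point; the marker is added so those points stay visible.

```python
import pandas as pd
import matplotlib.pyplot as plt

# The six rows shown in the question (only the columns needed for plotting)
df = pd.DataFrame({
    "teamID": ["ATL", "BAL", "BOS", "CAL", "CHA", "ATL"],
    "yearID": [1985, 1985, 1985, 1985, 1985, 1986],
    "payroll_total": [14807000.0, 11560712.0, 10897560.0,
                      14427894.0, 9846178.0, 17800000.0],
})

n_teams = df["teamID"].nunique()
# squeeze=False keeps axes two-dimensional even when there is a single team
fig, axes = plt.subplots(nrows=n_teams, ncols=1, sharex=True,
                         figsize=(8, 2 * n_teams), squeeze=False)

# groupby yields teams in sorted order; pair each group with one subplot
for ax, (team, d) in zip(axes.ravel(), df.groupby("teamID")):
    d.plot(ax=ax, x="yearID", y="payroll_total",
           marker="o", legend=False, title=team)

fig.tight_layout()
```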

subplot by group in python pandas

I want to make subplots for the following data, which I have already grouped and averaged.
I want one subplot per country, with resource on the x-axis and average on the y-axis.
country resource average
india water 76
india soil 45
india tree 60
US water 45
US soil 70
US tree 85
Germany water 76
Germany soil 65
Germany water 56
Grouped = df.groupby(['country', 'resource'])['TTR in minutes'].agg(average='mean').reset_index()
I tried but couldn't plot in subplots
g = df.groupby('country')
fig, axes = plt.subplots(g.ngroups, sharex=True, figsize=(8, 6))
for i, (country, d) in enumerate(g):
    ax = d.plot.bar(x='resource', y='average', ax=axes[i], title=country)
    ax.legend().remove()
fig.tight_layout()
