I have tried several ways to make a bar's colour depend on a particular value. It seems that the colour expression only checks the first item in the index (in this case Germany) and applies the result to all other items. I would really appreciate any help. My attempts:
colors = ['red' if 'Germany' else 'lightgrey' for x in first5_countries.index] #it colors all bars red
colors = ['r' if 'IT' else 'b' for index in first5_countries.index] #it colors everything red
colors = ['r' if pop_mln>85 else 'b' for pop_mln in first5_countries.pop_mln] #all bars blue
colors = ['r' if index=='Italy' else 'b' for index in first5_countries.index] #all bars blue
colors = ['b', 'b', 'b', 'r', 'b'] #yields blue
The whole code:
sorted_df = population_2019.sort_values(by='pop_mln', ascending=False)
first5_countries = sorted_df[:5]
colors = ['r' if index=='Italy' else 'b' for index in first5_countries.index]
first5_countries[['pop_mln']].plot.bar(figsize=(20,5), legend=False, color=colors)
plt.ylabel('Total population (in million)', size=12)
plt.xticks(rotation=30, ha='right')
plt.xlabel('')
plt.grid(axis='y')
plt.show()
Printout of first5_countries:
                geo  sex    age  year   total_pop    pop_mln
geo_full
Germany          DE    T  TOTAL  2019  83019213.0  83.019213
France           FR    T  TOTAL  2019  67012883.0  67.012883
United Kingdom   UK    T  TOTAL  2019  66647112.0  66.647112
Italy            IT    T  TOTAL  2019  60359546.0  60.359546
Spain            ES    T  TOTAL  2019  46937060.0  46.937060
first5_countries.index.values
array(['Germany', 'France', 'United Kingdom', 'Italy', 'Spain'],
dtype=object)
You can define your colors like this:
colors = ['red' if x=='Italy' else 'lightgray' for x in first5_countries.index]
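And then pass it to the plot function. Note the single brackets: they select a Series, so each bar gets its own color, whereas the one-column DataFrame first5_countries[['pop_mln']] used in the question is colored per column, so only the first color is applied:
first5_countries['pop_mln'].plot.bar(figsize=(20,5),color=colors, legend=False)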
Together, you would do:
colors = ['red' if x=='Italy' else 'lightgray' for x in first5_countries.index]
first5_countries['pop_mln'].plot.bar(figsize=(20,5),color=colors, legend=False)
Output would be something like this:
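For anyone who wants to try this without the original data, here is a minimal self-contained sketch. The frame is rebuilt by hand from the printout in the question and keeps only the pop_mln column, so its construction is an assumption rather than the asker's actual loading code.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Rebuilt from the printout in the question (assumption: only pop_mln is needed).
first5_countries = pd.DataFrame(
    {"pop_mln": [83.019213, 67.012883, 66.647112, 60.359546, 46.937060]},
    index=pd.Index(
        ["Germany", "France", "United Kingdom", "Italy", "Spain"], name="geo_full"
    ),
)

# One color per index label: red for Italy, light gray for the rest.
colors = ["red" if country == "Italy" else "lightgray"
          for country in first5_countries.index]

# Single brackets select a Series, so the list is applied bar by bar.
ax = first5_countries["pop_mln"].plot.bar(figsize=(20, 5), color=colors, legend=False)
ax.set_ylabel("Total population (in million)", size=12)
ax.set_xlabel("")
ax.grid(axis="y")
plt.xticks(rotation=30, ha="right")
# plt.show()  # uncomment to display the figure interactively
```

The same comprehension works for a numeric condition, for example 'r' if value > 80 else 'b' over first5_countries['pop_mln'], as long as the plot is drawn from the Series.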
Related
I want to show only the top 5 teams that won the most medals (gold, silver and bronze) in the plot.
medal_ranks = olympics[olympics['Medal'] != 'NaN'].groupby(['NOC', 'Year', 'Sport', 'Event', 'Season'])
medal_ranks = medal_ranks.first()
medal_ranks = medal_ranks.reset_index()
medal_ranks['NOC'].value_counts().head(5)
medal_colors = ['darkgoldenrod', 'gold', 'silver']
cross = pd.crosstab(medal_ranks.NOC, medal_ranks.Medal).plot(kind='bar',color = medal_colors,
stacked=True, figsize=(18,5))
(The Teams that I want to show:
USA 5261
GBR 4152
FRA 4135
ITA 3738
CAN 3649)
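One possible approach, sketched under assumptions: take the team names from value_counts().head(5) and use them to select rows of the crosstab before plotting. The medal_ranks frame below is a small hypothetical stand-in with made-up counts, not the Olympics data from the question.

```python
import pandas as pd

# Hypothetical stand-in for medal_ranks: one row per medal-winning event.
counts = {"USA": 6, "GBR": 5, "FRA": 4, "ITA": 3, "CAN": 2, "SWE": 1}
medals = ["Gold", "Silver", "Bronze"]
rows = [(noc, medals[i % 3]) for noc, n in counts.items() for i in range(n)]
medal_ranks = pd.DataFrame(rows, columns=["NOC", "Medal"])

# Names of the five teams with the most medals, in ranking order.
top_teams = medal_ranks["NOC"].value_counts().head(5).index

# Build the full crosstab, then keep only the rows for those teams.
cross = pd.crosstab(medal_ranks["NOC"], medal_ranks["Medal"]).loc[top_teams]
print(cross)
```

The filtered table can then be plotted as in the question, e.g. cross.plot(kind='bar', color=medal_colors, stacked=True, figsize=(18,5)). If missing medals are stored as real NaN values rather than the string 'NaN', olympics.dropna(subset=['Medal']) is probably the filter you want in the first step.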
I'm a beginner in the subject and haven't found anything to help me so far.
I'm struggling with grouping my data and then filtering by a value I need.
As in the example,
I need to know, for instance, how many red cars Juan bought
(red car sales for each client).
When I try, I lose either the group or the filter; I can't do both.
Can someone help me or suggest a post, please?
Edit1.
With the help of the community, I found this solution:
df = df.loc[:, df.columns.intersection(['Name', 'Car colour', 'Amount'])]
df = df.query('`Car colour` == "Red"')
df.groupby(['Name', 'Car colour'])['Amount'].sum().reset_index()
If you want the amount sold for each combination of Name and Car colour, try:
df.groupby(['Name', 'Car colour'])['Amount'].sum().reset_index()
     Name Car colour  Amount
0    Juan      green       1
1    Juan        red       3
2  Wilson       blue       1
3  carlos     yellow       1
Use GroupBy.sum:
df.groupby(['Name','Car Color']).sum()
output:
import pandas as pd
data = {"Name": ["Juan", "Wilson", "Carlos", "Juan", "Juan", "Wilson", "Juan", "Carlos"],
"Car Color": ["Red", "Blue", "Yellow", "Red", "Red", "Red", "Red", "Green"],
"Amount": [24, 28, 40, 22, 29, 33, 31, 50]}
df = pd.DataFrame(data)
print(df)
You can group by multiple columns by passing a list of column names to the groupby function, then taking the sum of each group.
import pandas as pd
df = pd.DataFrame({'Name': ['Juan', 'Wilson', 'Carlos', 'Juan', 'Juan', 'Wilson', 'Juan'],
'Car Color': ['Red', 'Blue', 'Yellow', 'Red', 'Red', 'Red', 'Green'],
'Amount': [1, 1, 1, 1, 1, 1, 1]})
print(df)
agg_df = df.groupby(['Name', 'Car Color']).sum()
print(agg_df)
Output:
                  Amount
Name   Car Color
Carlos Yellow          1
Juan   Green           1
       Red             3
Wilson Blue            1
       Red             1
Note that the resulting dataframe has a multi-index, so you can get the number of red cars that Juan bought by passing a tuple of values to loc.
cars = agg_df.loc[[('Juan', 'Red')]]
print(cars)
Output:
                Amount
Name Car Color
Juan Red             3
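If you only care about one colour, another option is to filter first and group afterwards, which keeps both the filter and the grouping. A small sketch using the same sample frame as above:

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Juan", "Wilson", "Carlos", "Juan", "Juan", "Wilson", "Juan"],
    "Car Color": ["Red", "Blue", "Yellow", "Red", "Red", "Red", "Green"],
    "Amount": [1, 1, 1, 1, 1, 1, 1],
})

# Filter with a boolean mask first, then group what is left.
red_cars = df[df["Car Color"] == "Red"]
red_per_client = red_cars.groupby("Name")["Amount"].sum()
print(red_per_client)

# Scalar lookup for a single client.
juan_red = red_per_client.loc["Juan"]
print(juan_red)
```

Clients with no red cars simply do not appear in red_per_client, so a lookup for them raises a KeyError; red_per_client.get("Carlos", 0) is one way to default to zero.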
I have a series as below:
Country
India                                                   691904
China                                                   659962
United Kingdom of Great Britain and Northern Ireland    551500
Philippines                                             511391
Pakistan                                                241600
United States of America                                241122
Iran (Islamic Republic of)                              175923
Sri Lanka                                               148358
Republic of Korea                                       142581
After I plotted a horizontal bar graph, the bars were arranged from low to high.
Q:
1) How can I arrange the y-axis from High to Low?
2) Is there any easier way to colour all bar grey except for highest value (in this case India) without typing all the country name?
Thanks!
ax = top15.plot(kind='barh',
figsize=(20,10),
color = {'red':'India'})
ax.set_title('Immigration from 1980 - 2013')
ax.set_xlabel('Total Number')
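First, sort the values with sort_values():
top15=top15.sort_values(ascending=True)
Finally, create a list of colors (grey for every bar, red for the last one, which is the largest after sorting) and pass it to plot:
color=['0.8','0.8','0.8','0.8','0.8','0.8','0.8','0.8','r']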
ax = top15.plot(kind='barh',
figsize=(20,10),
color = color)
ax.set_title('Immigration from 1980 - 2013')
ax.set_xlabel('Total Number')
Output:
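To avoid typing the list by hand, the colors can be derived from the data by comparing each value with the maximum. This is a sketch on a shortened copy of the series from the question (five of the rows, which is an assumption made only to keep the example small):

```python
import pandas as pd

# A few rows from the series in the question.
top15 = pd.Series(
    {
        "India": 691904,
        "China": 659962,
        "Philippines": 511391,
        "Pakistan": 241600,
        "Sri Lanka": 148358,
    },
    name="Total",
)

# barh draws the first row at the bottom, so ascending order puts the largest on top.
top15 = top15.sort_values(ascending=True)

# Red for the maximum, light grey ('0.8') for everything else.
highest = top15.max()
colors = ["r" if value == highest else "0.8" for value in top15]
print(colors)
```

The list is then passed exactly as before: top15.plot(kind='barh', figsize=(20,10), color=colors). Because it is built from the values, it stays correct if the number of countries changes.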
Let's say we have this df
d = pd.DataFrame({'year': [2010, 2020, 2010], 'colors': ['red', 'white', 'blue'], "shirt" : ["red shirt", "green and red shirt", "yellow shirt"] })
like this:
   year colors                shirt
0  2010    red            red shirt
1  2020  white  green and red shirt
2  2010   blue         yellow shirt
I want to keep only the rows in which the "shirt" column contains the value of the "colors" column, while also filtering on the "year" column
desired output:
   year colors      shirt
0  2010    red  red shirt
I tried this d[(d.year == 2010) & (d.shirt.str.contains(d.colors))] but I am getting this error:
'Series' objects are mutable, thus they cannot be hashed
It is a big df that I am working on. How can I solve this with a pandas function?
I believe you need df.apply
Ex:
df = pd.DataFrame({'year': [2010, 2020, 2010], 'colors': ['red', 'white', 'blue'], "shirt" : ["red shirt", "green and red shirt", "yellow shirt"] })
print(df[(df.year == 2010) & df.apply(lambda x: x.colors in x.shirt, axis=1)])
Output:
   year colors      shirt
0  2010    red  red shirt
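The error in the question comes from str.contains expecting a single pattern, not a Series of patterns, so the comparison has to be done row by row. As an alternative to apply, a plain list comprehension over the two columns gives the same mask; it may be faster on a big frame, though that is worth measuring on your own data. A sketch:

```python
import pandas as pd

d = pd.DataFrame({
    "year": [2010, 2020, 2010],
    "colors": ["red", "white", "blue"],
    "shirt": ["red shirt", "green and red shirt", "yellow shirt"],
})

# Row-wise substring test: is this row's color mentioned in this row's shirt?
contains_color = pd.Series(
    [color in shirt for color, shirt in zip(d["colors"], d["shirt"])],
    index=d.index,
)

result = d[(d["year"] == 2010) & contains_color]
print(result)
```

Building the mask as a Series with the same index as d keeps the & operation aligned even if the frame does not have a default 0..n-1 index.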
I have created a matplotlib pie chart:
df.plot(kind='pie', subplots=True, figsize=(6, 4))
My dataframe consists of two columns, Country and Value (% distribution), and lists about 25 countries. I would like to plot only the top 10 countries by value (highest %) and combine the remaining countries into a single slice labelled 'All Other Countries' whose value is their total. How do I do this with matplotlib via the .plot function?
Country Value
Albania 4%
Brazil 3%
Denmark 5%
France 10%
Mexico 3%
Nigeria 15%
Spain 4%
U.S. 5%
As already stated in the comments, the best way to do this is probably to do the manipulations before plotting. Here's one way to do it:
import pandas as pd
from matplotlib import pyplot as plt
import numpy as np
countries = [
    'Albania',
    'Brazil',
    'Denmark',
    'France',
    'Mexico',
    'Nigeria',
    'Spain',
    'Germany',
    'Finland',
]
#the full dataframe
df = pd.DataFrame(
    data = {'country': countries, 'value' :np.random.rand(len(countries))},
).sort_values('value', ascending = False)
#the top 5
df2 = df[:5].copy()
#others
new_row = pd.DataFrame(data = {
    'country' : ['others'],
    'value' : [df['value'].iloc[5:].sum()]  #positional slice: everything after the top 5
})
#combining top 5 with others
df2 = pd.concat([df2, new_row])
#plotting -- for comparison left all countries and right
#the others combined
fig, axes = plt.subplots(nrows = 1, ncols = 2, figsize = (9,4))
df.plot(kind = 'pie', y = 'value', labels = df['country'], ax = axes[0])
df2.plot(kind = 'pie', y = 'value', labels = df2['country'], ax = axes[1])
axes[0].set_title('all countries')
axes[1].set_title('top 5')
plt.show()
The result looks like this.
Hope this helps.
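Adapted to the columns in the question, the same idea can be wrapped in a small helper. This is only a sketch: top_n_with_others is a hypothetical name, the rows are the eight shown in the question rather than the full 25, and it assumes Value is stored as strings such as '4%' that need converting to numbers first.

```python
import pandas as pd

def top_n_with_others(df, n, label="All Other Countries"):
    """Keep the n largest rows by Value and sum the rest into one extra row."""
    ranked = df.sort_values("Value", ascending=False)
    top = ranked.iloc[:n]
    rest_total = ranked["Value"].iloc[n:].sum()
    others = pd.DataFrame({"Country": [label], "Value": [rest_total]})
    return pd.concat([top, others], ignore_index=True)

df = pd.DataFrame({
    "Country": ["Albania", "Brazil", "Denmark", "France",
                "Mexico", "Nigeria", "Spain", "U.S."],
    "Value": ["4%", "3%", "5%", "10%", "3%", "15%", "4%", "5%"],
})

# Assumption: percentages arrive as strings, so strip the sign and convert.
df["Value"] = df["Value"].str.rstrip("%").astype(float)

# n=2 keeps the toy example small; use n=10 for the real data.
summary = top_n_with_others(df, n=2)
print(summary)
```

The result can be plotted with summary.set_index('Country').plot(kind='pie', y='Value', legend=False, figsize=(6, 4)), which uses the country names as slice labels.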