How do I plot a pie chart using Pandas with column totals - python

I would like to draw a pie chart based on the column values, using the whole-year sum for each console.
That is, each console's share of total units for the whole year for Megadrive, Super Nintendo and NES, ignoring the other two columns, using Python.
Consoles Units Sold
Quarter  Megadrive  Super Nintendo  NES   Total Units  Total Sales
Q1       1230       7649            765   9644         316500
Q2       345        3481            874   4700         274950
Q3       654        2377            1234  4265         337050
Q4       1234       6555            2666  10455        334050
But I can only find ways to do this using all columns or by rows.
Thanks
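One possible approach, as a minimal sketch: select only the three console columns, sum each over the four quarters, and plot the resulting Series as a pie. The DataFrame below is rebuilt from the table in the question; the helper name plot_console_share is made up for illustration and is not part of pandas.

```python
import pandas as pd

# Data copied from the table in the question.
df = pd.DataFrame({
    "Quarter": ["Q1", "Q2", "Q3", "Q4"],
    "Megadrive": [1230, 345, 654, 1234],
    "Super Nintendo": [7649, 3481, 2377, 6555],
    "NES": [765, 874, 1234, 2666],
    "Total Units": [9644, 4700, 4265, 10455],
    "Total Sales": [316500, 274950, 337050, 334050],
})

# Keep only the console columns and sum down the rows (one total per console).
consoles = ["Megadrive", "Super Nintendo", "NES"]
totals = df[consoles].sum()

# Each console's share of the yearly units, in percent.
share = (totals / totals.sum() * 100).round(1)


def plot_console_share(totals):
    """Hypothetical helper: draw the yearly totals as a pie chart."""
    import matplotlib.pyplot as plt  # imported lazily; only needed for plotting

    ax = totals.plot.pie(autopct="%1.1f%%")
    ax.set_ylabel("")  # hide the default y-axis label
    ax.set_title("Units sold per console, whole year")
    plt.show()
```

Calling plot_console_share(totals) should draw the chart; the selection df[consoles] is what keeps Total Units and Total Sales out of the pie.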

Related

Creating a single chart from three categorical values using python

I am fairly new to Python and its terminology, so I may be clumsy at describing the problem. Sorry for that.
What I have is three cities that produced three fruits over two years, and I need to draw a single static chart that best summarizes the data.
The fact that the dataframe has three categorical variables (city, fruits and year) and one measure confuses me.
At first I tried a stacked bar chart; however, if I use fruits in the bars and cities on the x-axis, I cannot find where to use the year value.
I tried the pivot method to turn the year values into measures, but then I could not proceed with two measures.
I mainly use Matplotlib.
Any help appreciated,
import pandas as pd

data = {
    'city':['amsterdam','amsterdam','amsterdam','amsterdam','amsterdam','amsterdam','paris','paris','paris','paris','paris','paris','berlin','berlin','berlin','berlin','berlin','berlin'],
    'fruits':['apples','oranges','bananas','apples','oranges','bananas','apples','oranges','bananas','apples','oranges','bananas','apples','oranges','bananas','apples','oranges','bananas'],
    'year':[2000,2000,2000,2001,2001,2001,2000,2000,2000,2001,2001,2001,2000,2000,2000,2001,2001,2001],
    'amount':[384,289,347,242,390,274,175,334,245,116,252,366,255,400,300,240,600,180]
}
df = pd.DataFrame(data)
df.head()
        city   fruits  year  amount
0  amsterdam   apples  2000     384
1  amsterdam  oranges  2000     289
2  amsterdam  bananas  2000     347
3  amsterdam   apples  2001     242
4  amsterdam  oranges  2001     390
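One way to fit all three categorical variables into a single chart, sketched here under the assumption that a stacked bar is acceptable: put each (city, year) pair on the x-axis and stack the fruits inside each bar. The helper name plot_stacked is made up for illustration.

```python
import pandas as pd

# Same data as in the question, written more compactly.
data = {
    "city": ["amsterdam"] * 6 + ["paris"] * 6 + ["berlin"] * 6,
    "fruits": ["apples", "oranges", "bananas"] * 6,
    "year": ([2000] * 3 + [2001] * 3) * 3,
    "amount": [384, 289, 347, 242, 390, 274, 175, 334, 245,
               116, 252, 366, 255, 400, 300, 240, 600, 180],
}
df = pd.DataFrame(data)

# One row per (city, year) pair, one column per fruit.
wide = df.pivot_table(index=["city", "year"], columns="fruits",
                      values="amount", aggfunc="sum")


def plot_stacked(wide):
    """Hypothetical helper: one stacked bar per (city, year), split by fruit."""
    import matplotlib.pyplot as plt  # imported lazily; only needed for plotting

    ax = wide.plot(kind="bar", stacked=True)
    ax.set_xlabel("city, year")
    ax.set_ylabel("amount")
    plt.tight_layout()
    plt.show()
```

Calling plot_stacked(wide) should give six bars, two per city. Dropping stacked=True gives grouped bars instead, which may be easier to read if comparing individual fruits matters more than totals.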

Comparing Values in Multiple Columns and Returning all Declining Regions

I have a data frame similar to the following. Let's say I have sales amounts for different regions for two different years:
Company    2021 Region 1 Sales  2021 Region 2 Sales  2020 Region 1 Sales  2020 Region 2 Sales
Company 1  300000               150000               250000               149000
Company 2  10000                17000                100000               80000
Company 3  12000                20000                22000                90000
I would like to compare each region across the two years to determine which regions declined in 2021. One caveat is that regional sales have to be at least $25,000 to be counted. Therefore, I am looking to add a new column listing all of the region names that had less than $25,000 in sales in 2021 but more than $25,000 in 2020. The output would look like this, although there will be more than two regions (columns) to compare.
Company    2021 Region 1 Sales  2021 Region 2 Sales  2020 Region 1 Sales  2020 Region 2 Sales  2021 Lost Regions
Company 1  300000               150000               250000               149000               None
Company 2  10000                17000                100000               80000                Region 1; Region 2
Company 3  12000                20000                22000                90000                Region 2
Thank you in advance for any assistance, and no rush on this. Hopefully there is a concise way to do this without using if-then and writing out a lot of combinations.
number_of_regions = 2  # Change this to match your data

def find_declined_regions(row):
    # Collect regions below 25,000 in 2021 that were above 25,000 in 2020
    result = []
    for i in range(1, number_of_regions + 1):
        if row[f"2021 Region {i} Sales"] < 25000 and row[f"2020 Region {i} Sales"] > 25000:
            result.append(f"Region {i}")
    return "; ".join(result)

df["2021 Lost Regions"] = df.apply(find_declined_regions, axis=1)
Here df is your DataFrame; change number_of_regions to match your data.
EDIT:
If the region names do not follow the "Region N" pattern, there are two cases:
1- You have a list of all regions, so you can loop over it:
for region in all_regions:
    if row[f"2021 {region} Sales"] < 25000 and row[f"2020 {region} Sales"] > 25000:
        result.append(region)
2- You don't have a list of all regions, so you have to build one from the column names:
all_regions = [col[5:-6] for col in df.columns[1:int(len(df.columns)/2)+1]]
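Putting the pieces together, here is a self-contained sketch on the sample data from the question. It assumes every column pair is named "2021 <region> Sales" and "2020 <region> Sales"; the names THRESHOLD, regions and lost_regions are illustrative, and returning the string "None" for companies with no lost region is a choice made here to match the desired output.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "Company": ["Company 1", "Company 2", "Company 3"],
    "2021 Region 1 Sales": [300000, 10000, 12000],
    "2021 Region 2 Sales": [150000, 17000, 20000],
    "2020 Region 1 Sales": [250000, 100000, 22000],
    "2020 Region 2 Sales": [149000, 80000, 90000],
})

THRESHOLD = 25000  # minimum sales for a region to count

# Derive the region names from the 2021 columns ("2021 <region> Sales").
regions = [col[len("2021 "):-len(" Sales")] for col in df.columns
           if col.startswith("2021 ") and col.endswith(" Sales")]


def lost_regions(row):
    """Regions below the threshold in 2021 that were above it in 2020."""
    lost = [r for r in regions
            if row[f"2021 {r} Sales"] < THRESHOLD
            and row[f"2020 {r} Sales"] > THRESHOLD]
    return "; ".join(lost) if lost else "None"


df["2021 Lost Regions"] = df.apply(lost_regions, axis=1)
```

Filtering on the "2021 " prefix avoids relying on the column order, which the slice in case 2 depends on.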

How do you sum up rows in Pandas based on conditions for multiple columns and remove the duplicates?

First let me apologise for the long-winded question. I've struggled to find an answer on Stack Overflow that addresses my specific issue.
I am new to Pandas and Python programming so I would appreciate all the help I can get.
I have a dataframe:
    ID       Name            Colour  Power      Year  Money (millions)
0   1234567  Tony Stark      Red     Genius     2020  20000
1   9876543  Peter Parker    Red     Spider     2021  75
2   1415926  Miles Morales   Green   Spider     2021  55
3   7777777  Dante Brisco    Blue    hybrid     2020  3
4   4355681  Thor Odinson    Blue    Lightning  2020  655
5   1928374  Bruce Wayne     Yellow  Bat        2021  12000
6   5555555  Eddie Brock     Black   Symbiote   2021  755
7   8183822  Billie Butcher  Yellow  V          2021  34
8   6666654  Ian Wilm        Red     Lightning  2020  34
9   4241111  Harry Potter    Green   Wizard     2020  24
10  7765434  Blu Malk        Red     Wizard     2021  77
11  6464647  Yu Hant         Black   Wizard     2021  65
I want to create a new df that looks like this:
Colour  Total  Year 2020  Year 2021
Red     20186  20034      152
Green   79     24         55
Blue    658    658        -------
Yellow  12034  -------    12034
Black   820    -------    820
Here the "Colour" column becomes the new primary key/ID, the duplicates are removed, and the values per year are summed up along with an overall total. I have managed to sum up the Total, but I am struggling to write a function that will sum up rows by year and then assign the sum to the respective colour.
Here is what I have after creating the DataFrame from an Excel file:
# Calculate the total per colour from the old df.
df['Total'] = df.groupby(['Colour'])['Money (millions)'].transform('sum')
# Drop the duplicates so there is one row per colour, each carrying its total.
new_df = df.drop_duplicates(subset=['Colour'])
When I repeat the process for the yearly column using the same technique, it sums up the overall total for the whole year and assigns it to every colour.
I would eventually like to create new columns based on calculations from the yearly columns (percentages), e.g.
new_df['Success Rate'] = new_df['Total'].apply(lambda x: (x/100)*33)
I'd be grateful for any help provided :)
You can use:
df = pd.pivot_table(df, index='Colour', values='Money (millions)', columns='Year', aggfunc='sum', margins=True)
df
Out[1]:
Year       2020     2021    All
Colour
Black       NaN    820.0    820
Blue      658.0      NaN    658
Green      24.0     55.0     79
Red     20034.0    152.0  20186
Yellow      NaN  12034.0  12034
All     20716.0  13061.0  33777
I think this calls for pivot_table with margins:
df.pivot_table(index='Colour', columns='Year',
               values='Money (millions)',
               aggfunc='sum',
               margins_name='Total',
               margins=True)
Output:
Year       2020     2021  Total
Colour
Black       NaN    820.0    820
Blue      658.0      NaN    658
Green      24.0     55.0     79
Red     20034.0    152.0  20186
Yellow      NaN  12034.0  12034
Total   20716.0  13061.0  33777
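For the follow-up wish (percentage columns computed from the yearly sums), here is a minimal sketch. It uses groupby plus unstack rather than pivot_table, which should give the same per-year sums; missing colour/year combinations are filled with 0 so the arithmetic does not produce NaN. The column name "2021 share (%)" is made up for illustration.

```python
import pandas as pd

# Only the columns needed for the summary, copied from the question.
df = pd.DataFrame({
    "Colour": ["Red", "Red", "Green", "Blue", "Blue", "Yellow",
               "Black", "Yellow", "Red", "Green", "Red", "Black"],
    "Year": [2020, 2021, 2021, 2020, 2020, 2021,
             2021, 2021, 2020, 2020, 2021, 2021],
    "Money (millions)": [20000, 75, 55, 3, 655, 12000,
                         755, 34, 34, 24, 77, 65],
})

# Sum per (Colour, Year), then spread the years into columns.
summary = (df.groupby(["Colour", "Year"])["Money (millions)"]
             .sum()
             .unstack(fill_value=0))

# Overall total per colour, computed before any derived column is added.
summary["Total"] = summary[2020] + summary[2021]

# Example derived column: what part of each colour's total came from 2021.
summary["2021 share (%)"] = (summary[2021] / summary["Total"] * 100).round(1)
```

Because Colour is the index, each colour appears exactly once, so no drop_duplicates step is needed.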

Pandas Python - Grouping counts to others

I am conducting data analysis for a project using Python and pandas, where I have the following data.
The numbers are the counts per country:
USA: 5000
Canada: 7000
UK: 6000
France: 6500
Spain: 4000
Japan: 5
China: 7
Hong Kong: 10
Taiwan: 6
New Zealand: 8
South Africa: 11
My task is to make a pie chart that represents the counts.
df['Country'].value_counts().plot.pie()
This gives me a pie chart, but I would like to combine the countries with smaller counts into a single category such as "other".
How can I do that?
IIUC, you can use np.where to set the boundary, then groupby + sum. Note that here I am using pandas.Series.groupby:
import numpy as np
s=df['Country'].value_counts()
s.groupby(np.where(s>=4000,s.index,'other')).sum()#.plot.pie()
Out[64]:
Canada 7000
France 6500
Spain 4000
UK 6000
USA 5000
other 47
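To try this without the original DataFrame, here is a self-contained sketch that starts directly from the counts listed in the question. The cutoff of 4000 is the one used above; the names counts, labels and grouped are illustrative.

```python
import numpy as np
import pandas as pd

# Counts per country, copied from the question.
counts = pd.Series({
    "USA": 5000, "Canada": 7000, "UK": 6000, "France": 6500,
    "Spain": 4000, "Japan": 5, "China": 7, "Hong Kong": 10,
    "Taiwan": 6, "New Zealand": 8, "South Africa": 11,
})

THRESHOLD = 4000  # countries below this are merged into "other"

# Keep the country name where the count is large enough, else use "other".
labels = np.where(counts >= THRESHOLD, counts.index, "other")

# Group the counts by the new labels and add them up.
grouped = counts.groupby(labels).sum()

# grouped.plot.pie() would then draw six slices (requires matplotlib).
```

If the cutoff should be relative rather than absolute, comparing counts / counts.sum() against a fraction such as 0.01 works the same way.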

Plotting pandas groupby

I have a dataframe with some car data - the structure is pretty simple. I have an ID, the year of production, the kilometers, the price and the fuel type (petrol/diesel).
In [106]:
stack.head()
Out[106]:
   year       km  price    fuel
0  2003  165.286  2.350  petrol
1  2005  195.678  3.350  diesel
2  2002  125.262  2.450  petrol
3  2002  161.000  1.999  petrol
4  2002  164.851  2.599  diesel
I am trying to produce a chart with pylab/matplotlib where the x-axis is the year and, using groupby, there are two plots (one for each fuel type) showing the averages by year (mean function) for price and km.
Any help would be appreciated.
Maybe there's a more direct way to do it, but I would do the following. First group by year and fuel and take the mean price:
meanprice = df.groupby(['year','fuel'])['price'].mean().reset_index()
and likewise for km:
meankm = df.groupby(['year','fuel'])['km'].mean().reset_index()
Then I would merge the two resulting dataframes to get all the data in one:
d = pd.merge(meanprice,meankm,on=['year','fuel']).set_index('year')
Setting year as the index makes plotting with pandas easier. The resulting dataframe is:
        fuel   price       km
year
2002  diesel  2.5990  164.851
2002  petrol  2.2245  143.131
2003  petrol  2.3500  165.286
2005  diesel  3.3500  195.678
Finally, you can plot after filtering by fuel:
d[d['fuel']=='diesel'].plot(kind='bar')
d[d['fuel']=='petrol'].plot(kind='bar')
which gives one bar chart per fuel type.
I don't know if this is the kind of plot you expected, but you can easily change it with the kind keyword. Hope that helps.
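As a possible shortcut, both means can be taken in one groupby call, and unstack can turn the fuel types into columns so that each fuel becomes its own line. This is a sketch on the five rows shown in the question; the helper name plot_means is made up for illustration.

```python
import pandas as pd

# The five rows shown in the question.
stack = pd.DataFrame({
    "year": [2003, 2005, 2002, 2002, 2002],
    "km": [165.286, 195.678, 125.262, 161.000, 164.851],
    "price": [2.350, 3.350, 2.450, 1.999, 2.599],
    "fuel": ["petrol", "diesel", "petrol", "petrol", "diesel"],
})

# Mean price and km per (year, fuel) in a single step, no merge needed.
means = stack.groupby(["year", "fuel"])[["price", "km"]].mean()

# Years as rows, fuel types as columns; missing combinations become NaN.
price_by_fuel = means["price"].unstack("fuel")
km_by_fuel = means["km"].unstack("fuel")


def plot_means(price_by_fuel, km_by_fuel):
    """Hypothetical helper: one panel per measure, one line per fuel type."""
    import matplotlib.pyplot as plt  # imported lazily; only needed for plotting

    fig, (ax_price, ax_km) = plt.subplots(1, 2, figsize=(10, 4))
    price_by_fuel.plot(ax=ax_price, marker="o", title="Mean price by year")
    km_by_fuel.plot(ax=ax_km, marker="o", title="Mean km by year")
    plt.tight_layout()
    plt.show()
```

Calling plot_means(price_by_fuel, km_by_fuel) should draw the two panels. With only five rows most year/fuel combinations are missing, so the lines will be sparse until the full dataset is used.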
