I've created a new row that stores the mean of every column. Now I'm trying to assign a name to the very first cell of that new row.
I've tried the conventional method of assigning a value by pointing to the cell index. It doesn't raise any error, but it doesn't seem to store the value in the cell.
   Items Description   Duration  China  Japan  Korea
0                GDP  2012-2013  40000  35000  12000
1                GDP  2013-2014  45000  37000  12500
2                NAN        NAN  42500  36000  12250
data11.loc[2,'Items Description'] = 'Average GDP'
Instead of returning the dataframe below, the code still gives the previous output.
   Items Description   Duration  China  Japan  Korea
0                GDP  2012-2013  40000  35000  12000
1                GDP  2013-2014  45000  37000  12500
2        Average GDP        NAN  42500  36000  12250
For me it works fine, but here are 2 alternatives for setting the value by last row and column name.
The first is DataFrame.loc, selecting the last index value by indexing:
data11.loc[data11.index[-1], 'Items Description'] = 'Average GDP'
Or DataFrame.iloc with -1 to get the last row and Index.get_loc to get the position of the column Items Description:
data11.iloc[-1, data11.columns.get_loc('Items Description')] = 'Average GDP'
print (data11)
   Items Description   Duration  China  Japan  Korea
0                GDP  2012-2013  40000  35000  12000
1                GDP  2013-2014  45000  37000  12500
2        Average GDP        NAN  42500  36000  12250
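To make the two alternatives easy to try, here is a minimal self-contained sketch. The frame is rebuilt from the question's sample data, and appending the mean row with pd.concat is an assumption (the question does not show how the row was created). Both assignments address the last row by position rather than by a hard-coded label, which helps when the last index label is not actually 2.

```python
import pandas as pd

# Sample data reconstructed from the question.
data11 = pd.DataFrame({
    "Items Description": ["GDP", "GDP"],
    "Duration": ["2012-2013", "2013-2014"],
    "China": [40000, 45000],
    "Japan": [35000, 37000],
    "Korea": [12000, 12500],
})

# Assumption: the mean row is appended like this; text columns become NaN.
means = data11.mean(numeric_only=True)
data11 = pd.concat([data11, means.to_frame().T], ignore_index=True)

# Alternative 1: label-based, using whatever the last index label is.
data11.loc[data11.index[-1], "Items Description"] = "Average GDP"

# Alternative 2: position-based, looking up the column position by name.
col_pos = data11.columns.get_loc("Duration")
data11.iloc[-1, col_pos] = "all years"  # hypothetical label, for illustration only

print(data11)
```

If the assignment seems to have no effect in your own code, it may be worth checking that data11 is not a slice of another frame and that the statement runs before the frame is printed.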
Related
There are 2 dataframes, and they have similar data.
A dataframe
Index  Business  Address
1      Oils      Moskva, Russia
2      Foods     Tokyo, Japan
3      IT        California, USA
... etc.
B dataframe
Index  Country  Country Calling Codes
1      USA      +1
2      Egypt    +20
3      Russia   +7
4      Korea    +82
5      Japan    +81
... etc.
I want to add a column named 'Country Calling Codes' to the A dataframe, too.
After this, the 'Country' column in the B dataframe should be compared with the 'Address' column in A. If the string in 'A.Address' contains the string in 'B.Country', the matching 'B.Country Calling Codes' value should be inserted into 'A.Country Calling Codes' for that row.
Result is:
Index  Business  Address          Country Calling Codes
1      Oils      Moskva, Russia   +7
2      Foods     Tokyo, Japan     +81
3      IT        California, USA  +1
I don't know how to deal with the issue because I don't have much experience using pandas. I should be very grateful to you if you might help me.
Use Series.str.extract to pull the matching country name out of each address, using the values of the Country column as the pattern, then use Series.map with a Series to look up the code:
d = B.drop_duplicates('Country').set_index('Country')['Country Calling Codes']
s = A['Address'].str.extract(f'({"|".join(d.keys())})', expand=False)
A['Country Calling Codes'] = s.map(d)
print (A)
   Index Business          Address Country Calling Codes
0      1     Oils   Moskva, Russia                    +7
1      2    Foods     Tokyo, Japan                   +81
2      3       IT  California, USA                    +1
Detail:
print (A['Address'].str.extract(f'({"|".join(d.keys())})', expand=False))
0 Russia
1 Japan
2 USA
Name: Address, dtype: object
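A self-contained sketch of the same idea follows. The frames are rebuilt from the question, with one extra hypothetical row ('Lima, Peru') to show what happens when no country matches. The country names are passed through re.escape as a precaution, which is an addition not present in the answer above; it matters only if a name contains regex metacharacters.

```python
import re
import pandas as pd

A = pd.DataFrame({
    "Business": ["Oils", "Foods", "IT", "Retail"],
    # The last address is a hypothetical row with no match in B.
    "Address": ["Moskva, Russia", "Tokyo, Japan", "California, USA", "Lima, Peru"],
})
B = pd.DataFrame({
    "Country": ["USA", "Egypt", "Russia", "Korea", "Japan"],
    "Country Calling Codes": ["+1", "+20", "+7", "+82", "+81"],
})

# Lookup Series: country name -> calling code.
codes = B.drop_duplicates("Country").set_index("Country")["Country Calling Codes"]

# One capture group containing every country name as an alternative.
pattern = "(" + "|".join(re.escape(name) for name in codes.index) + ")"

# Extract the first matching country per address, then map it to its code.
found = A["Address"].str.extract(pattern, expand=False)
A["Country Calling Codes"] = found.map(codes)

print(A)
```

Addresses without a known country end up as NaN in the new column, so they are easy to find afterwards with isna().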
I have a df named population with a column named countries. I want to merge rows so they reflect regions (africa, west hem, asia, europe, mideast). I have another df named regionref, from kaggle, that has all the countries of the world and the region each is associated with.
How do I create a new column in the population df that has the corresponding region for each country in the country column, using the region column from the kaggle dataset?
So essentially this is the population dataframe
CountryName 1960 1950 ...
US
Zambia
India
And this is the regionref dataset
Country Region GDP...
US West Hem
Zambia Africa
India Asia
And I want the population df to look like
CountryName Region 1960 1950 ...
US West Hem
Zambia Africa
India Asia
EDIT: I tried the concatenation but for some reason the two columns are not recognizing the same values
population['Country Name'].isin(regionref['Country']).value_counts()
This returned False for all values, as in there are no values in common.
And this is the output, as you can see there are values in common
You just need join functionality, or, in pandas terms, concatenation.
Given two DataFrames pop, region:
pop = pd.DataFrame([['US', 1000, 2000], ['CN', 2000, 3000]], columns=['CountryName', 1950, 1960])
CountryName 1950 1960
0 US 1000 2000
1 CN 2000 3000
region = pd.DataFrame([['US', 'AMER', '5'], ['CN', 'ASIA', '4']], columns = ['Country', 'Region', 'GDP'])
Country Region GDP
0 US AMER 5
1 CN ASIA 4
You can do:
pd.concat([region.set_index('Country'), pop.set_index('CountryName')], axis = 1)\
.drop('GDP', axis =1)
Region 1950 1960
US AMER 1000 2000
CN ASIA 2000 3000
The axis = 1 argument concatenates horizontally. You have to set the country column as the index of both frames so that the rows line up correctly.
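The EDIT in the question reports that isin found no common values even though the names look identical. One possible cause (an assumption, since the real data is not shown) is stray whitespace around the names. The sketch below uses made-up population figures, strips both key columns, and then uses DataFrame.merge with how='left' as an alternative to concat, so every population row is kept even when a region is missing.

```python
import pandas as pd

# Hypothetical data; the padded names imitate the suspected whitespace problem.
population = pd.DataFrame({
    "CountryName": ["US ", "Zambia ", " India"],
    "1960": [180, 3, 450],
    "1950": [150, 2, 360],
})
regionref = pd.DataFrame({
    "Country": ["US", "Zambia", "India"],
    "Region": ["West Hem", "Africa", "Asia"],
    "GDP": [5, 1, 4],
})

# Before cleaning, nothing matches.
matches_before = population["CountryName"].isin(regionref["Country"]).sum()

# Normalise the keys on both sides.
population["CountryName"] = population["CountryName"].str.strip()
regionref["Country"] = regionref["Country"].str.strip()

# Left merge: keep all population rows, bring in only the Region column.
merged = population.merge(
    regionref[["Country", "Region"]],
    left_on="CountryName",
    right_on="Country",
    how="left",
).drop(columns="Country")

print(matches_before)
print(merged)
```

If stripping does not help, comparing repr() of a value from each column can reveal other differences such as casing or non-breaking spaces.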
I have a pandas dataframe which looks like this:
Country Sold
Japan 3432
Japan 4364
Korea 2231
India 1130
India 2342
USA 4333
USA 2356
USA 3423
I have used the code below to get the sum of the "Sold" column
df1= df.groupby(df['Country'])
df2 = df1.sum()
I want to ask how to calculate each country's percentage of the sum of the "Sold" column.
You can get the percentage by adding this code
df2["percentage"] = df2['Sold']*100 / df2['Sold'].sum()
In the output dataframe, a column with the percentage of each country is added.
We can divide the original Sold column by a new column that holds the grouped sums but keeps the same length as the original DataFrame, by using transform:
df.assign(
pct_per=df['Sold'] / df.groupby('Country')['Sold'].transform('sum')
)
Country Sold pct_per
0 Japan 3432 0.440226
1 Japan 4364 0.559774
2 Korea 2231 1.000000
3 India 1130 0.325461
4 India 2342 0.674539
5 USA 4333 0.428501
6 USA 2356 0.232991
7 USA 3423 0.338509
Simple Solution
You were almost there.
First you need to group by country
Then create the new percentage column (by dividing grouped sales with sum of all sales)
# reset_index() is only there because the groupby makes the grouped column the index
df_grouped_countries = df.groupby(df.Country).sum().reset_index()
df_grouped_countries['pct_sold'] = df_grouped_countries.Sold / df.Sold.sum()
Are you looking for the percentage after or before aggregation? This is the version after aggregation:
import pandas as pd
countries = [['Japan',3432],['Japan',4364],['Korea',2231],['India',1130], ['India',2342],['USA',4333],['USA',2356],['USA',3423]]
df = pd.DataFrame(countries,columns=['Country','Sold'])
df1 = df.groupby(df['Country'])
df2 = df1.sum()
df2['percentage'] = (df2['Sold']/df2['Sold'].sum()) * 100
df2
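The two readings of the question can be put side by side. This is a sketch using the question's data; the column names pct_of_total and pct_within_country are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "Country": ["Japan", "Japan", "Korea", "India", "India", "USA", "USA", "USA"],
    "Sold": [3432, 4364, 2231, 1130, 2342, 4333, 2356, 3423],
})

# After aggregation: one row per country, as a share of the grand total.
per_country = df.groupby("Country", as_index=False)["Sold"].sum()
per_country["pct_of_total"] = per_country["Sold"] / per_country["Sold"].sum() * 100

# Before aggregation: one row per sale, as a share of that country's total.
country_total = df.groupby("Country")["Sold"].transform("sum")
df["pct_within_country"] = df["Sold"] / country_total * 100

print(per_country)
print(df)
```

The first result has one row per country and its percentages add up to 100 overall; the second keeps the original rows and adds up to 100 within each country.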
I have this dataframe. I want to create a column that shows the % change in Amount from the Previous Period to the Current Period, while grouping by Company_Id, Country, and Period.
Company_Id  Country  Period           Amount
MOO17       USA      Previous Period     500
KQR20       UK       Previous Period    1000
KQR20       UK       Current Period    20000
ABY88       Ireland  Previous Period    1000
ABY88       Ireland  Current Period      250
SOQ99       Japan    Previous Period    8000
SOQ99       Japan    Current Period    25000
RTU89       China    Current Period    20000
RTU89       China    Previous Period    1000
WER67       Canada   Current Period     5000
WER67       Canada   Previous Period   20000
I have tried the following:
df['desired']= df['Amount'] / df.groupby(['Company_Id','Country','Period'])['Amount'].shift(1)
df.sort_values(by=['Company_Id','Country','Period'],ascending=[True, True, False],inplace=True)
df['desired'] = df.groupby(['Company_Id','Country','Period'])['Amount'].pct_change()
I keep getting nans or values that don't align with the groupings I need.
Desired Output:
Company_Id  Country  Period           Amount  Desired
MOO17       USA      Previous Period     500  na
KQR20       UK       Previous Period    1000  na
KQR20       UK       Current Period    20000  1900%
ABY88       Ireland  Previous Period    1000  na
ABY88       Ireland  Current Period      250  -75%
SOQ99       Japan    Previous Period    8000  na
SOQ99       Japan    Current Period    25000  212.5%
RTU89       China    Current Period    20000  na
RTU89       China    Previous Period    1000  -95%
WER67       Canada   Current Period     5000  na
WER67       Canada   Previous Period   20000  300%
df2['desired'] = df2.groupby(['Company_Id','Country'])['Amount'].pct_change()*100
If you want to add the percentage symbol, you can do it as below. Note that this changes the data type of the column from float64 to object:
df2['desired'] = (df2.groupby(['Company_Id','Country'])['Amount'].pct_change()*100).astype(str) + '%'
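A runnable sketch of this answer on the question's data follows. Two things in it are assumptions rather than part of the answer above. First, the rows are sorted so that 'Previous Period' always comes before 'Current Period' within a company, which means the China and Canada rows are computed Previous to Current rather than in the original row order shown in the desired output. Second, missing values are written as 'na' instead of being passed through astype(str), which would produce 'nan%'.

```python
import pandas as pd

df = pd.DataFrame({
    "Company_Id": ["MOO17", "KQR20", "KQR20", "ABY88", "ABY88", "SOQ99",
                   "SOQ99", "RTU89", "RTU89", "WER67", "WER67"],
    "Country": ["USA", "UK", "UK", "Ireland", "Ireland", "Japan",
                "Japan", "China", "China", "Canada", "Canada"],
    "Period": ["Previous Period", "Previous Period", "Current Period",
               "Previous Period", "Current Period", "Previous Period",
               "Current Period", "Current Period", "Previous Period",
               "Current Period", "Previous Period"],
    "Amount": [500, 1000, 20000, 1000, 250, 8000, 25000, 20000, 1000, 5000, 20000],
})

# Assumption: "Previous" must precede "Current"; descending order on Period does that.
df = df.sort_values(["Company_Id", "Period"], ascending=[True, False])

# Group by company and country only; grouping by Period too would leave
# one row per group and therefore only NaN.
df["desired"] = df.groupby(["Company_Id", "Country"])["Amount"].pct_change() * 100

# Optional display column; the numeric column is kept for further calculations.
df["desired_label"] = df["desired"].map(lambda v: "na" if pd.isna(v) else f"{v:g}%")

print(df)
```

Keeping the numeric column separate from the formatted one avoids losing the float64 dtype.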
I have a pandas dataframe that contains budget data but my sales data is located in another dataframe that is not the same size. How can I get my sales data updated in my budget data? How can I write conditions so that it makes these updates?
DF budget:
cust type loc rev sales spend
0 abc new north 500 0 250
1 def new south 700 0 150
2 hij old south 700 0 150
DF sales:
cust type loc sales
0 abc new north 15
1 hij old south 18
DF budget outcome:
cust type loc rev sales spend
0 abc new north 500 15 250
1 def new south 700 0 150
2 hij old south 700 18 150
Any thoughts?
Assuming that the 'cust' column is unique in your other df, you can set 'cust' as the index of the sales df and pass its 'sales' column to map. This maps each 'cust' in the budget df to its sales value. You will get NaN where a customer has no sales row, so call fillna(0) to fill those values:
In [76]:
df['sales'] = df['cust'].map(df1.set_index('cust')['sales']).fillna(0)
df
Out[76]:
cust type loc rev sales spend
0 abc new north 500 15 250
1 def new south 700 0 150
2 hij old south 700 18 150
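For completeness, here is a self-contained sketch of the map approach on the question's data, plus a merge-based variant for the case where 'cust' alone is not a unique key. The variant, and the assumption that 'cust', 'type' and 'loc' together identify a row, are not part of the answer above. The names budget and sales stand in for df and df1.

```python
import pandas as pd

budget = pd.DataFrame({
    "cust": ["abc", "def", "hij"],
    "type": ["new", "new", "old"],
    "loc": ["north", "south", "south"],
    "rev": [500, 700, 700],
    "sales": [0, 0, 0],
    "spend": [250, 150, 150],
})
sales = pd.DataFrame({
    "cust": ["abc", "hij"],
    "type": ["new", "old"],
    "loc": ["north", "south"],
    "sales": [15, 18],
})

# Variant 1: map on a single unique key, as in the answer above.
lookup = sales.set_index("cust")["sales"]
by_map = budget.copy()
by_map["sales"] = by_map["cust"].map(lookup).fillna(0).astype(int)

# Variant 2 (assumption): match on several key columns with a left merge.
keys = ["cust", "type", "loc"]
by_merge = budget.drop(columns="sales").merge(sales, on=keys, how="left")
by_merge["sales"] = by_merge["sales"].fillna(0).astype(int)

print(by_map)
print(by_merge)
```

The merge variant moves the sales column to the end; reindexing with budget.columns restores the original order if that matters.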