Hi friends,
I have a little problem while trying to pivot my dataframe.
I tried to use column 0 as the columns and column 1 as the values.
But that doesn't give me what I expected.
I don't know why there are NaNs everywhere.
Can anyone help me structure this?
Thank you
It looks like pivot() with only columns= and values= set keeps the existing row index (0, 1, 2, …), so every original row stays a separate row in the result and all cells except one per row are NaN.
How about adding a helper column that has the same value in every row and passing it to pivot() as index=, as follows?
columns1['tmp'] = 0
columns1.pivot(columns=0, values=1, index='tmp').reset_index(drop=True)
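A self-contained sketch of the helper-column idea, assuming a small hypothetical `columns1` whose integer-labelled columns `0` (future column names) and `1` (values) stand in for the asker's data. It shows both the NaN "staircase" and the one-row result:

```python
import pandas as pd

# Hypothetical long-format data: column 0 holds the future column names,
# column 1 holds the values that belong under them.
columns1 = pd.DataFrame({0: ["a", "b", "c"], 1: [1, 2, 3]})

# Without index=, pivot() keeps the original row index (0, 1, 2), so each row
# stays separate and only one cell per row is filled; the rest are NaN.
staircase = columns1.pivot(columns=0, values=1)

# A constant helper column gives every row the same index value,
# so all values end up in a single row.
columns1["tmp"] = 0
wide = columns1.pivot(index="tmp", columns=0, values=1).reset_index(drop=True)
print(wide)
```

The helper column is just a grouping key; any constant works, and reset_index(drop=True) only discards it afterwards.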
How can I get the data from this dataframe into only 2 rows, deleting the NaNs? (I concatenated 3 different DataFrames into a new one, showing averages from another DataFrame.)
This is what I want to achieve:
0 Bitcoin (BTC) 36568.673315 5711.3.059220. 1.229602e+06
1 Ethereum (ETH) 2550.870272 670225.756425 8.806719e+05
It can either be a new dataframe or the old one. Thank you so much for your help :)
Try this:
df.bfill(axis='rows', inplace=True)  # fill each NaN with the next valid value below it
df.dropna(inplace=True)  # drop the rows that still contain NaN
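A sketch under an assumed layout: three hypothetical frames (`prices`, `volumes`, `caps`, with made-up numbers) that each cover the same two coins under index 0 and 1 but contribute different columns, stacked row-wise with pd.concat. With this particular layout, bfill() takes the next non-null value further down regardless of the index label, so the second coin can inherit the first coin's values. Collapsing by index label with groupby(level=0).first(), or concatenating side by side, keeps each coin's values together:

```python
import pandas as pd

# Hypothetical stand-ins for the three frames of averages: each one covers the
# same two coins (index 0 and 1) but contributes a different column.
prices = pd.DataFrame({"Coin": ["Bitcoin (BTC)", "Ethereum (ETH)"], "Price": [100.0, 10.0]})
volumes = pd.DataFrame({"Volume": [5.0, 3.0]})
caps = pd.DataFrame({"MarketCap": [7.0, 2.0]})

# Stacking them row-wise gives 6 rows (index 0, 1, 0, 1, 0, 1) with NaN blocks.
stacked = pd.concat([prices, volumes, caps])

# Caveat: bfill() ignores index labels, so here the Ethereum row picks up
# Bitcoin's Volume and MarketCap (the first non-null values below it).
naive = stacked.bfill().dropna()

# Label-aware collapse: for each index label, keep the first non-null value
# in every column, so rows 0 and 1 are rebuilt from their own pieces.
collapsed = stacked.groupby(level=0).first()

# Simpler still, if the frames share an index: concatenate them side by side.
side_by_side = pd.concat([prices, volumes, caps], axis=1)
print(collapsed)
```

Whether bfill() is safe therefore depends on how the NaN blocks line up in the real data; the label-based variants do not depend on row order.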
I have the following dataframe:
And I was wondering how to get:
As you can see, the blue rows are subrows, and the idea is to group them together by name:
I tried:
import pandas as pd

DFTest = pd.read_excel("XXXXXXXXXXX/Test.xlsx")
DFTest.groupby(['Name'], as_index=False).sum().reset_index(drop=True)
But this deletes the blank rows (0, 1, 2, 5, 6, 7).
How can I group the subrows together and keep the blank rows as they are?
This does the job:
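import pandas as pd

grouped_df = df.groupby("Name", as_index=False)
df_sum = grouped_df.agg("sum")  # sum the other columns within each Name group
new_df = pd.concat([df[df["Numb2"].isna()], df_sum])  # blank rows + grouped sums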
First, I sum the values within each Name group, and then I concatenate that result with the rows that have a NaN value in the Numb2 column.
The rows of this dataframe won't be in the same order as in the image you shared, but I don't think that will be a problem.
If it is a problem, use the code below to sort the dataframe:
new_df = new_df.sort_values(by="Name")  # sort_values returns a new dataframe
I hope this helped you!
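A runnable variant of this answer on hypothetical data (the `Numb1` column and all values are made up). It keeps the answer's assumption that blank rows are exactly the rows with NaN in Numb2, but groups only the remaining rows: if the blank rows also carry a Name, grouping the whole frame would add a summed copy of each of them (with Numb2 = 0) next to the original.

```python
import numpy as np
import pandas as pd

# Hypothetical data: "blank" rows have NaN in Numb2; subrows share a Name.
df = pd.DataFrame({
    "Name": ["A", "B", "C", "D", "D", "E", "F", "G", "H", "H"],
    "Numb1": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    "Numb2": [np.nan, np.nan, np.nan, 10, 20, np.nan, np.nan, np.nan, 30, 40],
})

blank = df["Numb2"].isna()  # rows to keep untouched

# Sum only the subrows, giving one output row per Name.
summed = df[~blank].groupby("Name", as_index=False).sum()

# Put the untouched rows back and restore alphabetical order by Name.
new_df = (
    pd.concat([df[blank], summed], ignore_index=True)
    .sort_values("Name", ignore_index=True)
)
print(new_df)
```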
I am trying to create a pivot table in pandas that only shows features in a grouped column if they appear a minimum number of times. E.g., when I group features in column Level_1, it should only include features that have at least 3 appearances in column Level_2, so that I don't get any grouped features with only 1 or 2 appearances. I want to use this df for a sunburst chart, and such small numbers make the chart impossible to read.
I have written the following line of code, but it only seems to create a boolean mask, not the reduced dataframe I am looking for.
df_new = df.groupby('Level_1').agg({'Level_2': 'count'}) > 2
And this generates a useless df full of NaNs:
df_new = df[df.groupby('Level_1').agg({'Level_2': 'count'}) > 2]
What is needed to filter the df so that only Level_1 features with at least 3 entries each remain?
Thank you!
Try this instead. In the future, please attach an example df and the resulting df you are trying to get.
df_new = df.loc[df.groupby('Level_1')['Level_2'].transform('count').gt(2)]
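A small sketch with hypothetical data showing why this works. The mask from the question is indexed by the Level_1 values, so df[mask] behaves roughly like df.where(mask): nothing lines up with df's own row labels, and every cell becomes NaN. transform('count') instead broadcasts each group's count back onto the original rows, so the mask is row-aligned. groupby().filter() is shown as an equivalent alternative:

```python
import pandas as pd

# Hypothetical data: 'a' appears 3 times, 'b' twice, 'c' four times.
df = pd.DataFrame({
    "Level_1": ["a", "a", "a", "b", "b", "c", "c", "c", "c"],
    "Level_2": ["x1", "x2", "x3", "y1", "y2", "z1", "z2", "z3", "z4"],
})

# One count per original row (the size of that row's Level_1 group).
counts = df.groupby("Level_1")["Level_2"].transform("count")
df_new = df.loc[counts.gt(2)]  # keep rows whose group has at least 3 entries

# Equivalent alternative: keep whole groups whose Level_2 count is at least 3.
df_alt = df.groupby("Level_1").filter(lambda g: g["Level_2"].count() >= 3)
print(df_new)
```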
After using the groupby and sum operations as follows:
companyGrouped = dailyStocks.groupby(['SYMBOL'])
sumByCompany = companyGrouped.sum()
I end up with a new row for the group-by key (SYMBOL), which is undesirable because I later want to merge this with another dataframe on SYMBOL. An image of the table, obtained with sumByCompany.head(), is shown below.
I've tried a few things to get around this issue, but manually deleting this row and setting the index to 'SYMBOL' does not seem elegant! Thanks for any help!
Solved with sumByCompany.reset_index(level=0, inplace=True), which moves SYMBOL from the index back into a regular column.
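A minimal sketch of both options, assuming a hypothetical VOLUME column in dailyStocks and a hypothetical second table `info` to merge with. After groupby(), SYMBOL becomes the index (its name is what appears as the extra header row); reset_index() turns it back into a column, and as_index=False avoids moving it in the first place:

```python
import pandas as pd

# Hypothetical daily data and a second table to merge on SYMBOL.
dailyStocks = pd.DataFrame({
    "SYMBOL": ["AAA", "AAA", "BBB"],
    "VOLUME": [100, 200, 50],
})
info = pd.DataFrame({"SYMBOL": ["AAA", "BBB"], "SECTOR": ["Tech", "Energy"]})

# groupby() moves SYMBOL into the index; reset_index() makes it a column again.
sumByCompany = dailyStocks.groupby("SYMBOL")[["VOLUME"]].sum().reset_index()

# Alternatively, as_index=False keeps SYMBOL as a regular column from the start.
sumAlt = dailyStocks.groupby("SYMBOL", as_index=False)[["VOLUME"]].sum()

merged = sumByCompany.merge(info, on="SYMBOL")
print(merged)
```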
I have a dataframe df with two columns date and data. I want to take the first difference of the data column and add it as a new column.
It seems that df.set_index('date').shift() or df.set_index('date').diff() give me the desired result. However, when I try to add it as a new column, I get NaN for all the rows.
How can I fix this command:
df['firstdiff'] = df.set_index('date').shift()
to make it work?
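A hedged sketch of one fix, on hypothetical data with the question's two columns. df.set_index('date').shift() returns a DataFrame indexed by date; assigning it to a column reindexes it to df's integer index, which matches none of the dates, so every row gets NaN. Working on the column directly keeps the original index. Note also that shift() only moves the values down one row; the first difference itself is diff(), i.e. data minus data.shift():

```python
import pandas as pd

# Hypothetical data with the two columns from the question.
df = pd.DataFrame({
    "date": pd.to_datetime(["2021-01-01", "2021-01-02", "2021-01-03"]),
    "data": [10, 13, 11],
})

# Assumption: "previous row" should mean "previous date", so sort first.
df = df.sort_values("date")

# The Series keeps df's own index, so the assignment aligns row by row.
df["firstdiff"] = df["data"].diff()  # same as df["data"] - df["data"].shift()
print(df)
```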