I need help merging two rows of the same DataFrame.
The first table is the df that I have right now
The second one is the one that I would like to have
I need to combine Jim and Bill. I don't want to overwrite existing values in either row. I just want to fill the NaN values in Bill's row with the corresponding values from Jim's row (e.g. City).
There are about 20 columns that need updating, so I cannot just update the Bill/City cell by hand.
Thanks
You can try
df.loc['Bill'] = df.loc['Bill'].fillna(df.loc['Jim'])
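# Avoid the in-place variant df.loc['Bill'].fillna(df.loc['Jim'], inplace=True):
# df.loc['Bill'] may return a copy, so the fill is not guaranteed to reach df.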
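Below is a small runnable sketch of the assignment approach. The DataFrame (names as the index, columns city, age, job) is hypothetical. Bill's existing age is kept, and only his NaN cells are taken from Jim. Dropping Jim afterwards is optional and is an assumption about the desired output.

```python
import numpy as np
import pandas as pd

# Hypothetical data: one row per person, indexed by name.
df = pd.DataFrame(
    {"city": [np.nan, "Boston"], "age": [40, 35], "job": [np.nan, "clerk"]},
    index=["Bill", "Jim"],
)

# Fill only Bill's missing cells from Jim's row; Bill's non-NaN values stay as they are.
df.loc["Bill"] = df.loc["Bill"].fillna(df.loc["Jim"])

# Optional: drop Jim once his values have been merged into Bill.
merged = df.drop(index="Jim")
print(merged)
```

fillna with a Series aligns on the Series index. Here that index is the column names, so each of the ~20 columns is filled from the matching column of Jim's row.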
How can I get the data from this DataFrame into only two rows, dropping the NaN values? (I concatenated three different DataFrames into a new one, showing averages from another DataFrame.)
This is what I want to achieve:
0 Bitcoin (BTC) 36568.673315 5711.3.059220. 1.229602e+06
1 Ethereum (ETH) 2550.870272 670225.756425 8.806719e+05
It can be either a new DataFrame or the old one. Thank you so much for your help :)
Try this:
df.bfill(axis='rows', inplace=True)  # fill each NaN with the next valid value below it
df.dropna(inplace=True)  # drop the rows that still contain NaN
I have a DataFrame that contains product sales for each day from 2018 to 2021. It has four columns (Date, Place, ProductCategory and Sales). For the first two columns (Date, Place), I want to use the available data to fill in the gaps. Once that is done, I would like to delete the rows that have no data in ProductCategory. I would like to do this in Python with pandas.
A sample of my dataset looks like this:
I would like the dataframe to look like this:
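Forward-fill the Date and Place columns with ffill, which propagates the last valid observation forward into the gaps below it. Then drop the rows that still contain NaN. Calling .ffill() directly does the same as fillna(method='ffill'), whose method argument is deprecated in recent pandas versions. Assigning the result back also avoids a chained inplace call on df['Date'], which is not guaranteed to modify df.
df['Date'] = df['Date'].ffill()
df['Place'] = df['Place'].ffill()
df.dropna(inplace=True)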
Use forward filling to replace each null with the nearest valid value above it: df[['Date', 'Place']] = df[['Date', 'Place']].ffill(). Note the double brackets, because df['Date', 'Place'] would look for a single column named ('Date', 'Place'). Next, drop the rows with no product category: df.dropna(subset=['ProductCategory'], inplace=True). That gives you the desired df 😄
Documentation: Pandas fillna function, Pandas dropna function
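Here is a runnable sketch of the forward-fill-then-drop steps. The sample rows are hypothetical: Date and Place appear only on the first row of each block, and that row has no ProductCategory.

```python
import numpy as np
import pandas as pd

# Hypothetical sample: Date and Place are filled only on the first row of each block.
df = pd.DataFrame({
    "Date": ["2018-01-01", np.nan, np.nan, "2018-01-02", np.nan],
    "Place": ["Berlin", np.nan, np.nan, "Paris", np.nan],
    "ProductCategory": [np.nan, "Food", "Toys", np.nan, "Food"],
    "Sales": [np.nan, 10, 5, np.nan, 7],
})

# Forward-fill the two key columns so every row carries its Date and Place.
df[["Date", "Place"]] = df[["Date", "Place"]].ffill()

# Keep only the rows that describe a product category, and renumber them.
df = df.dropna(subset=["ProductCategory"]).reset_index(drop=True)
print(df)
```

Restricting dropna to ProductCategory keeps rows that have a category but a missing Sales value. A plain dropna() would remove those rows as well.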
Compute the frequency of the categories in the column and plot it.
The tallest bars in the plot show the most frequent values:
df['column'].value_counts().plot.bar()
Then get the most frequent value from the index of value_counts(), which is sorted by count in descending order. index[0] is the most frequent value,
index[1] the second most frequent, and so on, so you can pick whichever you need.
most_frequent_attribute = df['column'].value_counts().index[0]
Then fill the missing values with it:
df['column'] = df['column'].fillna(most_frequent_attribute)
To fill several columns the same way, wrap this in a function. The function below uses mode()[0], which also returns the most frequent value:
def impute_nan(df, column):
    # mode() returns the most frequent value(s); take the first one
    most_frequent_category = df[column].mode()[0]
    df[column] = df[column].fillna(most_frequent_category)

for feature in ['column1', 'column2']:
    impute_nan(df, feature)
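As an alternative to the loop, df.mode() returns one column per input column, and its first row holds each column's most frequent value. That row can be passed straight to fillna. This is a sketch on hypothetical data, not the answer's own method.

```python
import numpy as np
import pandas as pd

# Hypothetical data with missing categorical values.
df = pd.DataFrame({
    "color": ["red", "blue", "red", np.nan],
    "size": ["S", np.nan, "M", "M"],
})

# Row 0 of df.mode() holds each column's most frequent value
# (the first in sorted order when there is a tie).
most_frequent = df.mode().iloc[0]

# fillna with a Series fills each column with the value under its own name.
filled = df.fillna(most_frequent)
print(filled)
```

This fills every column at once. To limit it to some columns, select them first, e.g. df[cols].fillna(df[cols].mode().iloc[0]).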
I have two DataFrames that share two column names, emp_id and emp_name. When I join them on emp_id, pandas creates separate columns emp_name_x and emp_name_y. These contain some NaN values, and in some rows emp_name_x equals emp_name_y. I want to combine them into one column. Can anyone help?
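One possible approach, not taken from the original thread, is to coalesce the two suffixed columns with Series.combine_first. It keeps emp_name_x and falls back to emp_name_y where emp_name_x is NaN. The two input frames below, including the dept and salary columns, are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical inputs that share emp_id and emp_name.
left = pd.DataFrame({"emp_id": [1, 2, 3], "emp_name": ["Ann", np.nan, "Cy"], "dept": ["A", "B", "C"]})
right = pd.DataFrame({"emp_id": [1, 2, 3], "emp_name": ["Ann", "Bob", np.nan], "salary": [10, 20, 30]})

merged = left.merge(right, on="emp_id")  # produces emp_name_x and emp_name_y

# Coalesce: take emp_name_x, and where it is NaN use emp_name_y instead.
merged["emp_name"] = merged["emp_name_x"].combine_first(merged["emp_name_y"])
merged = merged.drop(columns=["emp_name_x", "emp_name_y"])
print(merged)
```

Where both columns hold different non-null names, this keeps the left-hand (_x) value.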
I have an Excel file that I've converted into a DataFrame. I would like to insert a column (or two, depending on the situation) between other columns.
For example:
I have columns:
Table-Chair-Bed
I want to insert column grass and column water in between Chair and Bed. I have tried:
df.insert(loc=2, column='grass', value='')
df.insert(loc=3, column='water', value='')
This works, but sometimes the columns in the data source change, for example to Couch-Kitchen-Table-Chair-Bed.
I still want to insert the new columns between Chair and Bed without rewriting the code every time (because... automation). Is there a way to have the code look up the column names and insert the new columns between them without a hard-coded position, so that neither the column order nor the number of columns matters?
You can find the position of the 'Chair' column and then add them after that.
df.insert(df.columns.get_loc('Chair') + 1, column='grass', value='')
df.insert(df.columns.get_loc('grass') + 1, column='water', value='')
cLoc1 = df.columns.get_loc("Chair")
df.insert(loc=cLoc1 + 1, column='grass', value='')  # right after Chair
cLoc2 = df.columns.get_loc("Bed")  # look Bed up again, since the first insert shifted it
df.insert(loc=cLoc2, column='water', value='')  # right before Bed
Basically, you get the location of the column you are looking for and then pass it on to your code.
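The lookup can be wrapped in a small helper so it works for both column layouts from the question. insert_after is a hypothetical name, not a pandas function. It finds the anchor column by name and inserts the new columns in order right after it.

```python
import pandas as pd


def insert_after(df, anchor, new_columns, value=""):
    """Insert `new_columns` (in order) directly after the column named `anchor`.

    Hypothetical helper: the position is looked up by name, so it does not
    depend on how many columns come before `anchor`.
    """
    pos = df.columns.get_loc(anchor) + 1
    for offset, name in enumerate(new_columns):
        df.insert(pos + offset, name, value)
    return df


short = pd.DataFrame([[1, 2, 3]], columns=["Table", "Chair", "Bed"])
long = pd.DataFrame([[1, 2, 3, 4, 5]], columns=["Couch", "Kitchen", "Table", "Chair", "Bed"])

insert_after(short, "Chair", ["grass", "water"])
insert_after(long, "Chair", ["grass", "water"])
print(list(short.columns))  # ['Table', 'Chair', 'grass', 'water', 'Bed']
```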
I have a dataframe df with two columns date and data. I want to take the first difference of the data column and add it as a new column.
It seems that df.set_index('date').shift() or df.set_index('date').diff() give me the desired result. However, when I try to add it as a new column, I get NaN for all the rows.
How can I fix this command:
df['firstdiff'] = df.set_index('date').shift()
to make it work?
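The all-NaN result comes from index alignment. df.set_index('date') produces a result indexed by date, while df keeps its default integer index. Assigning to df['firstdiff'] matches rows by index label, no labels match, and every row gets NaN. Note also that shift() only moves values down by one row; diff() is what computes the first difference. The sketch below uses hypothetical date/data values to show the problem and two fixes.

```python
import pandas as pd

# Hypothetical data in the question's shape: a 'date' column and a 'data' column.
df = pd.DataFrame({
    "date": pd.to_datetime(["2021-01-01", "2021-01-02", "2021-01-03"]),
    "data": [10, 13, 19],
})

# Reproduces the problem: the date-indexed result does not align with df's
# integer index, so every assigned value becomes NaN.
df["bad"] = df.set_index("date")["data"].diff()

# Fix 1: the column already shares df's index, so diff() lines up row by row.
df["firstdiff"] = df["data"].diff()

# Fix 2: if the date index is needed, strip it with .to_numpy() so the
# values are assigned by position instead of by label.
df["firstdiff2"] = df.set_index("date")["data"].diff().to_numpy()
print(df)
```

The .to_numpy() fix is only safe when the date-indexed result is in the same row order as df, which holds here because set_index does not reorder rows.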