delete 'nan' rows and not "NaN" in pandas - python

I need to delete the rows that contain 'nan%' in the 'Precison' and 'Recall' columns (the values are the literal string 'nan%', not a real NaN).
I just need to remove all rows that show 'nan%' in both 'Precison' and 'Recall'.
dropna() does not work here.

You can select the rows whose value is not equal to 'nan%' in both columns:
df[df[['Precison','Recall']].ne('nan%').all(axis=1)]
Or replace 'nan%' with real NaN so that DataFrame.dropna works (requires import numpy as np):
df = df.replace('nan%', np.nan).dropna(subset=['Precison','Recall'])
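A minimal sketch of both approaches on made-up data (the 'Precison' spelling follows the question's column name; the 'Model' column is an assumption for illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical data mirroring the question's columns
df = pd.DataFrame({
    "Model": ["a", "b", "c"],
    "Precison": ["90%", "nan%", "nan%"],
    "Recall": ["80%", "70%", "nan%"],
})

# Approach 1: keep rows where both columns differ from the string 'nan%'
kept = df[df[["Precison", "Recall"]].ne("nan%").all(axis=1)]

# Approach 2: turn 'nan%' into real NaN, then drop those rows
kept2 = df.replace("nan%", np.nan).dropna(subset=["Precison", "Recall"])
```

Both variants drop a row if either column holds 'nan%'; to drop only the rows where both columns are 'nan%' (the "both" wording in the question), pass how='all' to dropna.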

Related

Same index in Pandas Dataframe, delete NaN rows

How can I get the data from this dataframe into 2 rows only, deleting the NaN? (I concatenated 3 different DataFrames into a new one, showing averages from another DataFrame.)
This is what i want to achieve:
0 Bitcoin (BTC) 36568.673315 5711.3.059220. 1.229602e+06
1 Ethereum (ETH) 2550.870272 670225.756425 8.806719e+05
It can either be in a new dataframe or using the old one. Thank you so much for your help :)
Try this:
df.bfill(axis='rows', inplace=True)  # fill missing values from the rows below
df.dropna(inplace=True)  # drop rows that still contain nulls
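bfill fills each NaN from the next non-null value below it in the same column; dropna then removes the rows that still contain NaN because nothing below could fill them. A toy sketch with made-up numbers, where one record is spread over three rows:

```python
import numpy as np
import pandas as pd

# Hypothetical frame after concatenation: the complete record should end
# up in the first row; stray values sit below it in their own columns.
df = pd.DataFrame({
    "name":   ["Bitcoin (BTC)", np.nan, np.nan],
    "price":  [36568.67, np.nan, np.nan],
    "volume": [np.nan, 1.23e6, np.nan],
    "mcap":   [np.nan, np.nan, 8.81e5],
})

df.bfill(inplace=True)   # pull each later value up into the gaps above it
df.dropna(inplace=True)  # rows 1-2 still have NaN in 'name'/'price' and go away
```

Note that bfill borrows values across group boundaries, so with several coins stacked vertically it can pull one coin's value into another coin's row; filling within groups of a key column first is safer in that case.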

Collapse rows of a dataframe with common values and fill in blanks

I have a single data frame and every row is duplicated except for two values; in each pair, one copy is blank where the other has a value. I want to 'collapse' these rows and fill in the blanks.
You can use groupby + first; first skips over NaN values by default:
collapsed_df = df.groupby("feature_id").first().reset_index()
If the blanks are empty strings rather than NaN, you will probably want to replace them first:
df = df.replace('', np.nan)
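A small sketch of the collapse on made-up data (the "feature_id" key comes from the answer; the other column names are assumptions):

```python
import numpy as np
import pandas as pd

# Hypothetical duplicated rows: each pair shares a feature_id and each
# copy is missing a different value.
df = pd.DataFrame({
    "feature_id": [1, 1, 2, 2],
    "color": ["red", np.nan, np.nan, "blue"],
    "size":  [np.nan, "L", "S", np.nan],
})

# first() takes the first non-NaN value per column within each group,
# so the two partial rows merge into one complete row per feature_id
collapsed_df = df.groupby("feature_id").first().reset_index()
```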

Deleting Rows in a Dataframe

I want to delete some specific rows in a DataFrame in Python. The DataFrame consists of a series of tables, and I have to delete the rows where only the first cell has a value.
If the unwanted rows share a common substring, you can delete them explicitly:
df_new = df[~df.columnName.str.contains("FINANCIAL SERVICES")]
and if the remaining cells are null, use dropna:
df.dropna(subset=df.columns[1:], how='all', inplace=True)
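A sketch of both variants; the marker string "FINANCIAL SERVICES" is from the answer, while the data and the other column names are assumptions:

```python
import numpy as np
import pandas as pd

# Hypothetical table dump: header-like rows carry text only in the first cell
df = pd.DataFrame({
    "columnName": ["FINANCIAL SERVICES", "Acme Corp", "Beta LLC"],
    "revenue": [np.nan, 100.0, 200.0],
    "profit": [np.nan, 10.0, 20.0],
})

# Drop rows whose first column contains a known marker string
df_new = df[~df["columnName"].str.contains("FINANCIAL SERVICES")]

# Or drop rows where every column after the first is NaN
df_clean = df.dropna(subset=df.columns[1:], how="all")
```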

Delete rows from a pandas DataFrame based on a conditional expression in another dataframe

I have two pandas dataframes, df1 and df2, with both equal number of rows. df2 has 11 rows which contain NaN values. I know how to drop the empty rows in df2, by applying:
df2.dropna(subset=['HIGH'], inplace=True)
But now I want to delete these same rows from df1 (the rows with the same row numbers that have been deleted from df2). I tried the following but this does not seem to work.
df1.drop(df2[df2['HIGH'] == 'NaN'].index, inplace=False)
Any other suggestions?
You can get all rows with NaN values in it with:
is_NaN = df2.isnull()
row_has_NaN = is_NaN.any(axis=1)
rows_with_NaN = df2[row_has_NaN]
After that you can delete the NaN rows from df2, as you already do in the question.
Every index in rows_with_NaN can also be dropped from df1, since both frames share the same index:
df1 = df1.drop(rows_with_NaN.index)
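A minimal end-to-end sketch on made-up frames that share an index (the 'HIGH' column is from the question; 'A' and the values are assumptions):

```python
import numpy as np
import pandas as pd

# Hypothetical frames with the same row labels
df1 = pd.DataFrame({"A": [1, 2, 3, 4]})
df2 = pd.DataFrame({"HIGH": [10.0, np.nan, 30.0, np.nan]})

# Rows of df2 that contain any NaN
row_has_NaN = df2.isnull().any(axis=1)
rows_with_NaN = df2[row_has_NaN]

# Drop the same labels from df1, then clean df2 itself
df1 = df1.drop(rows_with_NaN.index)
df2 = df2.dropna(subset=["HIGH"])
```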

How can I drop duplicates in pandas without dropping NaN values

I have a dataframe which I query and I want to get only unique values out of a certain column.
I tried to do that executing this code:
database = pd.read_csv(db_file, sep='\t')
query = database.loc[database[db_specifications[0]].isin(elements)].drop_duplicates(subset=db_specifications[1])
db_specifications is just a list containing the two columns that I query.
Some of the values are NaN and I don't want to consider them duplicates of each other, how can I achieve that?
You can split off the NaN rows first and drop duplicates only on the rest of the dataframe (drop_duplicates otherwise treats NaN values as duplicates of each other):
mask = database[db_specifications[1]].isna()
database = pd.concat([database[mask], database[~mask].drop_duplicates(subset=db_specifications[1])])
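A sketch of the split-then-dedupe idea on made-up data (the column names are assumptions standing in for db_specifications):

```python
import numpy as np
import pandas as pd

# Hypothetical data: NaN rows should all survive, real values deduplicate
data = pd.DataFrame({
    "gene": ["a", "a", "b", "c"],
    "tag":  ["x", "x", np.nan, np.nan],
})

col = "tag"  # the column to deduplicate on
mask = data[col].isna()
deduped = pd.concat([data[mask], data[~mask].drop_duplicates(subset=col)])
```

The concat puts the NaN rows first; call .sort_index() afterwards if the original row order matters. A plain data.drop_duplicates(subset=col) would instead keep only one of the NaN rows.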
