Pandas replace NaN values with respect to the columns - python

I have the following data frame.
df = pd.DataFrame({'A': [2, np.nan], 'B': [1, np.nan]})
df.fillna(0) replaces all the NaN values with 0. But
I want to replace the NaN values in column 'A' with 1 and those in column 'B' with 0, simultaneously. How can I do that?

Use:
df["A"].fillna(1, inplace=True)  # for col A: NaN -> 1
df["B"].fillna(0, inplace=True)  # for col B: NaN -> 0

This does it in one line:
(df['A'].fillna(1, inplace=True), df['B'].fillna(0, inplace=True))
print(df)

The fillna method also exists for Series objects:
df["A"] = df["A"].fillna(1)
df["B"] = df["B"].fillna(0)
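For what it's worth, fillna also accepts a dict mapping column names to fill values, which handles both columns in a single call. A minimal sketch using the question's frame:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [2, np.nan], "B": [1, np.nan]})

# fillna with a dict fills each listed column with its own value,
# so both replacements happen in one call
df = df.fillna({"A": 1, "B": 0})
print(df)
```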

Related

How to skip NaN values when splitting up a column

I am trying to split a column into two columns based on a delimiter. The column currently has text separated by a '-'. Some of the values in the column are NaN, so when I run the code below, I get the following error message: ValueError: Columns must be same length as key.
I don't want to delete the NaN values, but am not sure how to skip them so that this splitting works.
The code I have right now is:
df[['A','B']] = df['A'].str.split('-',expand=True)
Your code works fine with NaN values, but you have to pass n=1 to str.split:
Suppose this dataframe:
df = pd.DataFrame({'A': ['hello-world', np.nan, 'raise-an-exception']})
print(df)
# Output:
                    A
0         hello-world
1                 NaN
2  raise-an-exception
Reproducible error:
df[['A', 'B']] = df['A'].str.split('-', expand=True)
print(df)
# Output:
...
ValueError: Columns must be same length as key
Use n=1:
df[['A', 'B']] = df['A'].str.split('-', n=1, expand=True)
print(df)
# Output:
       A             B
0  hello         world
1    NaN           NaN
2  raise  an-exception
An alternative is to generate more columns:
df1 = df['A'].str.split('-', expand=True)
df1.columns = df1.columns.map(lambda x: chr(x+65))
print(df1)
# Output:
       A      B          C
0  hello  world       None
1    NaN    NaN        NaN
2  raise     an  exception
Maybe filter them out with loc (again with n=1, and with .values so the split result's integer column labels don't misalign on assignment):
df.loc[df['A'].notna(), ['A', 'B']] = df.loc[df['A'].notna(), 'A'].str.split('-', n=1, expand=True).values

Filter for rows in pandas dataframe where values in a column are greater than x or NaN

I'm trying to figure out how to filter a pandas dataframe so that the values in a certain column are either greater than a certain value or are NaN. Let's say my dataframe looks like this:
df = pd.DataFrame({"col1":[1, 2, 3, 4], "col2": [4, 5, np.nan, 7]})
I've tried:
df = df[df["col2"] >= 5 | df["col2"] == np.nan]
and:
df = df[df["col2"] >= 5 | np.isnan(df["col2"])]
But the first causes an error, and the second excludes rows where the value is NaN. How can I get the result to be this:
pd.DataFrame({"col1":[2, 3, 4], "col2":[5, np.nan, 7]})
Please try:
df[df.col2.isna() | df.col2.gt(4)]
   col1  col2
1     2   5.0
2     3   NaN
3     4   7.0
Also, you can fill NaN with the threshold so those rows pass the comparison:
df[df["col2"].fillna(5) >= 5]
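As for why the first attempt in the question fails: `|` binds tighter than `>=` in Python, so each comparison needs its own parentheses, and `== np.nan` is always False because NaN never compares equal to anything (use isna instead). A corrected sketch of the original attempt:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"col1": [1, 2, 3, 4], "col2": [4, 5, np.nan, 7]})

# Parenthesize each condition, and test for NaN with isna()
# rather than == np.nan (which is always False)
out = df[(df["col2"] >= 5) | (df["col2"].isna())]
print(out)
```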

Why isn't this replace in DataFrame doing what I intended?

I'm trying to replace NaN in train_df with values of corresponding indexes in dff. I can't understand what I'm doing wrong.
train_df.replace(to_replace=train_df["Age"].values,
                 value=dff["Age"].values,
                 inplace=True,
                 regex=False,
                 limit=None)
dff.Age.mean()
Output : 30.128401985359698
train_df.Age.mean()
Output : 28.96758312013303
You replace everything in train_df, not just the NaN values.
The replace docs say:
Replace values given in to_replace with value.
If you just want to replace the NaN you should take a look at fillna or maybe you could use indexing with isna.
fillna Docs
isna Docs
Example with fillna
df1 = pd.DataFrame({"a": [1, 2, np.nan, 4]})
df2 = pd.DataFrame({"a": [5, 5, 3, 5]})
df1.fillna(df2, inplace=True)
Example with isna
df1[pd.isna(df1)] = df2
Results
>>> df1
     a
0  1.0
1  2.0
2  3.0
3  4.0
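Another option (a sketch reusing the same df1/df2 as above) is combine_first, which keeps df1's values and takes df2's only where df1 is NaN, aligned on index and columns:

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({"a": [1, 2, np.nan, 4]})
df2 = pd.DataFrame({"a": [5, 5, 3, 5]})

# combine_first fills the holes in df1 with the values
# from df2 at the matching index/column labels
df1 = df1.combine_first(df2)
print(df1)
```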

Pandas: Change a value in dataframe basing on a other value [duplicate]

If I have a dataframe with multiple columns ['x', 'y', 'z'], how do I forward fill only one column 'x'? Or a group of columns ['x','y']?
I only know how to do it by axis.
tl;dr:
cols = ['X', 'Y']
df.loc[:,cols] = df.loc[:,cols].ffill()
And I have also added a self containing example:
>>> import pandas as pd
>>> import numpy as np
>>>
>>> ## create dataframe
... ts1 = [0, 1, np.nan, np.nan, np.nan, np.nan]
>>> ts2 = [0, 2, np.nan, 3, np.nan, np.nan]
>>> d = {'X': ts1, 'Y': ts2, 'Z': ts2}
>>> df = pd.DataFrame(data=d)
>>> print(df.head())
     X    Y    Z
0    0    0    0
1    1    2    2
2  NaN  NaN  NaN
3  NaN    3    3
4  NaN  NaN  NaN
>>>
>>> ## apply forward fill
... cols = ['X', 'Y']
>>> df.loc[:,cols] = df.loc[:,cols].ffill()
>>> print(df.head())
   X  Y    Z
0  0  0    0
1  1  2    2
2  1  2  NaN
3  1  3    3
4  1  3  NaN
for col in ['X', 'Y']:
    df[col] = df[col].ffill()
Alternatively with the inplace parameter:
df['X'].ffill(inplace=True)
df['Y'].ffill(inplace=True)
And no, you cannot do df[['X','Y']].ffill(inplace=True), as the column selection first creates a slice, so an inplace forward fill would trigger a SettingWithCopyWarning. Of course, if you have a list of columns, you can do this in a loop:
for col in ['X', 'Y']:
    df[col].ffill(inplace=True)
The point of using inplace is that it avoids copying the column.
Two columns can be forward filled simultaneously, as given below:
df1 = df[['X','Y']].ffill()
I used the code below; here the methods for X and Y can also differ, instead of both being ffill():
df1 = df.fillna({
    'X': df['X'].ffill(),
    'Y': df['Y'].ffill(),
})
The simplest version, I think:
cols = ['X', 'Y']
df[cols] = df[cols].ffill()

Filter numeric values from a column of pandas dataframe

I have a dataframe that looks like the one below. I am trying to extract only the numeric values from each of the columns in a list, whether the number is to the right, left, or middle of any characters. If the column value does not contain a numeric value, impute 0 instead of NaN.
df = pd.DataFrame({
    'A': ['1', 3, "1", "cad -2", 3, 4.876, np.nan],
    'B': ['116', 'CAD -2.6399', 'CAD -3', '$-', '$5%', 'A', '-1.2 2']
})
df
I tried the code below, but it gives NaN in row 4 of column "B":
l = ["A", "B"]
for columns in l:
    if df[columns].dtype == 'object':
        df[columns] = df[columns].astype('str').str.extract("([-+]?\d*\.\d+|[-+]?\d*\\d+)").astype(float)
df
I want my output to look like below:
     A        B
     1      116
     3  -2.6399
     1       -3
    -2        0
     3        5
 4.876        0
   NaN     -1.2
What about something like this (the regex is the question's, as a raw string and with the redundant \d* dropped from the second alternative):
mask_nulls_data = df.isnull()
for column in df.columns:
    if df[column].dtype == 'object':
        df[column] = df[column].astype('str').str.extract(r"([-+]?\d*\.\d+|[-+]?\d+)").astype(float)
# Only put zeros where the extract method filled in NaN
mask_nulls_string = df.isnull() & ~mask_nulls_data
df[mask_nulls_string] = 0
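For reference, the same masking idea as a self-contained sketch, with the pattern tidied into a single raw-string regex (optional sign, digits, optional decimal part):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "A": ["1", 3, "1", "cad -2", 3, 4.876, np.nan],
    "B": ["116", "CAD -2.6399", "CAD -3", "$-", "$5%", "A", "-1.2 2"],
})

# Remember which cells were NaN before extraction,
# so the original NaN in column A survives
mask_nulls_data = df.isnull()

for column in df.columns:
    if df[column].dtype == "object":
        # Pull the first signed number (integer or decimal) out of each cell;
        # cells with no digits become NaN for now
        df[column] = (
            df[column].astype("str")
            .str.extract(r"([-+]?\d*\.?\d+)")[0]
            .astype(float)
        )

# Zero only the cells that extraction turned into NaN;
# cells that were NaN to begin with stay NaN
mask_nulls_string = df.isnull() & ~mask_nulls_data
df[mask_nulls_string] = 0
print(df)
```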