This question already has answers here:
Replacing few values in a pandas dataframe column with another value
(8 answers)
Closed 4 months ago.
I have a dataframe with multiple columns. I know how to change a value based on a condition for one specific column, but how can I change values based on a condition across all columns of the whole dataframe? I want to replace // with 1:
col1;col2;col3;col4;
23;54;12;//;
54;//;2;//;
8;2;//;1;
Let's try:
df = df.replace('//', 1)
# or
df = df.mask(df.eq('//'), 1)
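For reference, a minimal sketch reproducing the sample data above (the column names come from the question; rebuilding the frame inline is an assumption, since the original was presumably read from a file):

import pandas as pd

# Rebuild the question's sample data; values are read in as strings
df = pd.DataFrame({
    'col1': ['23', '54', '8'],
    'col2': ['54', '//', '2'],
    'col3': ['12', '2', '//'],
    'col4': ['//', '//', '1'],
})

df = df.replace('//', 1)   # exact match: every '//' cell becomes 1
df = df.astype(int)        # optional: the columns now mix strings and ints
print(df)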
This question already has answers here:
How do I select rows from a DataFrame based on column values?
(16 answers)
Closed 2 years ago.
I have a dataframe like the one below with three status values: 'complete', 'start', and 'fail'. I want to create another dataframe from this, keeping only the 'fail' entries with their corresponding level numbers.
Let's do this:
fail_df = df[df['status']=='fail']
or this with str.contains:
fail_df = df[df['status'].str.contains(r'fail', case=False)]
Both ways give a new dataframe containing only the rows whose status is 'fail'. The str.contains version also matches variants such as 'Fail' or 'failed', so it is more robust to inconsistent entries.
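A quick check on made-up data (the level/status columns follow the question's description; the values are hypothetical):

import pandas as pd

df = pd.DataFrame({
    'level': [1, 2, 3, 4],
    'status': ['complete', 'fail', 'start', 'Fail'],
})

exact = df[df['status'] == 'fail']                         # level 2 only
loose = df[df['status'].str.contains('fail', case=False)]  # levels 2 and 4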
This question already has answers here:
index of non "NaN" values in Pandas
(3 answers)
Closed 2 years ago.
I have a dataframe called CentroidXY and I want to find the indexes of the rows in the column called 'X' that correspond to numeric values (not NaN). I tried:
foo = CentroidXY.index[CentroidXY['X'] == int].tolist()
However, this returns no indexes, even though the column contains numeric values. Does anyone have an idea how to do this?
You could use:
CentroidXY.index[CentroidXY['X'].notna()]
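Append .tolist() if, as in your attempt, you want a plain list. A minimal sketch with made-up values:

import numpy as np
import pandas as pd

CentroidXY = pd.DataFrame({'X': [1.5, np.nan, 3.2], 'Y': [0.0, 2.1, np.nan]})

foo = CentroidXY.index[CentroidXY['X'].notna()].tolist()
print(foo)  # [0, 2]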
This question already has answers here:
How to drop rows of Pandas DataFrame whose value in a certain column is NaN
(15 answers)
Closed 4 years ago.
I have a large dataframe with a column populated with NaN and integers.
I identified the rows that are NOT empty (i.e. that return True for notnull()):
df.loc[df.score.notnull()]
How do I remove these rows and keep the rows with missing values?
This code doesn't work:
df.drop(df.score.notnull())
Assuming you want the result in the same dataframe, you could use:
df = df[df.score.isnull()]
You could use df.loc[df.score.isnull()] or df.loc[~df.score.notnull()].
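Both answers keep only the rows whose score is missing. A minimal sketch with a hypothetical score column:

import numpy as np
import pandas as pd

df = pd.DataFrame({'score': [1.0, np.nan, 3.0, np.nan]})

df = df[df.score.isnull()]
print(df.index.tolist())  # [1, 3] -- only the rows with missing scores remain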
This question already has answers here:
Pandas get topmost n records within each group
(6 answers)
Closed 5 years ago.
Suppose I have the MNIST dataset loaded this way:
df = pd.read_csv('data/train.csv')
data = df.loc[df['label'].isin([1,6])]
I am trying to select only the rows where column 'label' is 1 or 6, but I want just 500 rows for each label value.
How do I do it?
You can group them and select the number you want for each value:
data = df.loc[df['label'].isin([1,6])].groupby('label').head(500)
Use groupby first, then filter, i.e.
ndf= df.groupby('label').head(500)
data = ndf.loc[ndf['label'].isin([1,6])]
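Both approaches give the same result here. A toy sketch with made-up labels, using head(2) in place of head(500):

import pandas as pd

df = pd.DataFrame({'label': [1, 1, 1, 6, 6, 6, 3, 3],
                   'pixel0': range(8)})

data = df.loc[df['label'].isin([1, 6])].groupby('label').head(2)
print(data)  # first two rows for label 1 and for label 6; label 3 is dropped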
This question already has answers here:
How do I select rows from a DataFrame based on column values?
(16 answers)
Closed 5 years ago.
I have a DataFrame "df" with three columns named "Particle", "Frequency1", and "Frequency2", and a lot of rows.
I want to delete the rows where Frequency1 and Frequency2 are simultaneously equal to 0.
What is the syntax for doing this?
You can use: df = df[~((df.Frequency1 == 0) & (df.Frequency2 == 0))].
This keeps every row except those that have 0 in both Frequency1 and Frequency2.
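To illustrate on made-up data:

import pandas as pd

df = pd.DataFrame({'Particle': ['a', 'b', 'c'],
                   'Frequency1': [0, 5, 0],
                   'Frequency2': [0, 0, 3]})

df = df[~((df.Frequency1 == 0) & (df.Frequency2 == 0))]
print(df)  # row 'a' (0 in both columns) is gone; 'b' and 'c' remain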