find rows in dataframe that are not NaN python [duplicate]

This question already has answers here:
index of non "NaN" values in Pandas
(3 answers)
Closed 2 years ago.
I have a dataframe called CentroidXY and I want to find the indexes of the rows in the column called 'X' that correspond to numeric values (not NaN). I tried:
foo = CentroidXY.index[CentroidXY['X'] == int].tolist()
However this gives me back no indexes, although my column contains numeric values. Does anyone have any idea on how to do this?

You could use:
CentroidXY.index[CentroidXY['X'].notna()]
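A minimal sketch, assuming a hypothetical CentroidXY with some NaNs in 'X', that also keeps the .tolist() from the original attempt:
import numpy as np
import pandas as pd

# Hypothetical example data: 'X' mixes numbers and NaN
CentroidXY = pd.DataFrame({'X': [1.0, np.nan, 3.5, np.nan, 7.2]})

# Boolean mask of non-NaN values, then the matching index labels as a list
foo = CentroidXY.index[CentroidXY['X'].notna()].tolist()
print(foo)  # [0, 2, 4]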

pandas: .isna() shows that whole column is NaNs, but it is strings [duplicate]

This question already has answers here:
Python Pandas Counting the Occurrences of a Specific value
(8 answers)
Closed 2 months ago.
I have a pandas dataframe with a column that is populated by "yes" or "no" strings.
When I call .value_counts() on this column, I get the correct distribution.
But when I run .isna(), it looks as if the whole column is NaNs.
I suspect this will cause problems for me later.
Example:
import numpy as np
import pandas as pd

df = pd.DataFrame(np.array([[0,1,2,3,4],[40,30,20,10,0], ['yes','yes','no','no','yes']]).T, columns=['A','B','C'])
len(df['C'].isna())     # 5 --> why?!
df['C'].value_counts()  # yes: 3, no: 2 --> as expected.
len gives you the length of the Series (irrespective of its content), not the number of True values.
Use .sum() if you want the count of True values:
df['C'].isna().sum()
# 0
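A quick illustration of the difference, using a small hypothetical Series:
import pandas as pd

s = pd.Series(['yes', 'no', 'yes'])
mask = s.isna()

print(len(mask))   # 3 -> length of the Series, regardless of its values
print(mask.sum())  # 0 -> number of True values, i.e. actual NaNs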

Change value based on condition in whole dataframe with multiple columns [duplicate]

This question already has answers here:
Replacing few values in a pandas dataframe column with another value
(8 answers)
Closed 4 months ago.
I have a dataframe with multiple columns. I know how to change values based on a condition for one specific column, but how can I change values based on a condition across all columns of the whole dataframe? I want to replace // with 1
col1;col2;col3;col4;
23;54;12;//;
54;//;2;//;
8;2;//;1;
Let's try
df = df.replace('//', 1)
# or
df = df.mask(df.eq('//'), 1)
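A small sketch showing both variants, assuming the sample data above is read with ';' as the separator (hypothetical reconstruction, trailing ';' dropped):
import io
import pandas as pd

csv = """col1;col2;col3;col4
23;54;12;//
54;//;2;//
8;2;//;1"""
df = pd.read_csv(io.StringIO(csv), sep=';')

df = df.replace('//', 1)        # replace every '//' cell with 1
# or, equivalently:
# df = df.mask(df.eq('//'), 1)  # overwrite cells equal to '//' with 1
print(df)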

Drop rows with non-missing values [duplicate]

This question already has answers here:
How to drop rows of Pandas DataFrame whose value in a certain column is NaN
(15 answers)
Closed 4 years ago.
I have a large dataframe with a column populated by NaN and integers.
I identified the rows that are NOT empty (i.e. those for which notnull() returns True):
df.loc[df.score.notnull()]
How do I remove these rows and keep the rows with missing values?
This code doesn't work:
df.drop(df.score.notnull())
Assuming you want the result in the same dataframe, you could use:
df = df[df.score.isnull()]
You could use df.loc[df.score.isnull()] or df.loc[~df.score.notnull()].
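A minimal sketch, assuming a hypothetical 'score' column that mixes integers and NaN:
import numpy as np
import pandas as pd

df = pd.DataFrame({'score': [1, np.nan, 3, np.nan]})

# Keep only the rows where 'score' is missing
df = df[df.score.isnull()]
print(df)  # only the NaN rows (1 and 3) remain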

How to add a column to a python dataframe, that is a string manipulation of another column? [duplicate]

This question already has answers here:
Get first letter of a string from column
(2 answers)
Closed 4 years ago.
There is a column named 'country' and I want a new column 'abc' that contains the first two characters of 'country'.
In pseudocode:
df['abc'] = df['country'][0:2]
Of course this does not work.
You want:
df['abc'] = df['country'].str[:2]
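For example, with a hypothetical 'country' column:
import pandas as pd

df = pd.DataFrame({'country': ['Germany', 'France', 'Spain']})

# .str slices each string element-wise; [:2] takes the first two characters
df['abc'] = df['country'].str[:2]
print(df['abc'].tolist())  # ['Ge', 'Fr', 'Sp']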

How to select multiple rows in a pandas column to create a new dataframe [duplicate]

This question already has answers here:
Use a list of values to select rows from a Pandas dataframe
(8 answers)
Filter dataframe rows if value in column is in a set list of values [duplicate]
(7 answers)
Closed 4 years ago.
I have a Pandas DataFrame and there are some rows I want to keep, based on their location, and others I do not.
I know that when selecting a specific value in a column to get a row I can use:
x = df[df['location']=='Aberdeen']
However, how can I do it for many locations without having to do them individually and then concatenate?
This is what I've tried:
x = df[[df['location']=='Aberdeen' & 'Glasgow' & 'London']]
But I am receiving a:
TypeError: 'Series' objects are mutable, thus they cannot be hashed
I know there must be a super simple solution to this, but I haven't figured it out. Please help.
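As in the linked duplicates, the usual approach is Series.isin() with a list of the wanted locations; a minimal sketch with hypothetical data:
import pandas as pd

df = pd.DataFrame({'location': ['Aberdeen', 'Glasgow', 'London', 'Cardiff'],
                   'value': [1, 2, 3, 4]})

# isin() builds a boolean mask that is True where 'location' is in the list
x = df[df['location'].isin(['Aberdeen', 'Glasgow', 'London'])]
print(x)  # the Cardiff row is dropped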
