Q: Filter data by a list in a dataframe [duplicate] - python

This question already has answers here:
Filter dataframe rows if value in column is in a set list of values [duplicate]
(7 answers)
Closed 5 years ago.
I have a dataframe named df, which contains several columns. One column is named A. I also have a list named b, which contains a subset of the values in column A. Now I want to filter the dataframe df so that column A only contains elements from list b.
I've used the following code:
for i in b:
    df = df[df.A == i]
But the dataframe df becomes empty.
So how can I filter the dataframe?
thx

Try this:
df = df[df.A.isin(b)]
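For context, a minimal sketch with made-up data (this df and b are invented for illustration) showing why the loop empties the frame and why isin works:
import pandas as pd
df = pd.DataFrame({'A': ['x', 'y', 'z', 'x'], 'other': [1, 2, 3, 4]})
b = ['x', 'z']
# The original loop keeps only A == 'x' after the first pass, then demands
# A == 'z' on those same rows, so nothing survives. isin tests membership
# against the whole list in one step.
print(df[df.A.isin(b)])   # rows where A is 'x' or 'z'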

Related

Change value based on condition in whole dataframe with multiple columns [duplicate]

This question already has answers here:
Replacing few values in a pandas dataframe column with another value
(8 answers)
Closed 4 months ago.
I have a dataframe with multiple columns. I know how to change values based on a condition for one specific column, but how can I change values based on a condition across all columns of the whole dataframe? I want to replace // with 1:
col1;col2;col3;col4;
23;54;12;//;
54;//;2;//;
8;2;//;1;
Let's try
df = df.replace('//', 1)
# or
df = df.mask(df.eq('//'), 1)
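A quick sketch on data shaped like the sample above (typed in by hand here, so treat it as an illustration):
import pandas as pd
df = pd.DataFrame({'col1': [23, 54, 8],
                   'col2': [54, '//', 2],
                   'col3': [12, 2, '//'],
                   'col4': ['//', '//', 1]})
# replace swaps every cell equal to '//' for 1; mask builds a boolean frame
# with eq('//') and fills the True cells with 1.
print(df.replace('//', 1))
print(df.mask(df.eq('//'), 1))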

Python: Create separate dataframe from a dataframe with one category of the category variable column [duplicate]

This question already has answers here:
How do I select rows from a DataFrame based on column values?
(16 answers)
Closed 2 years ago.
I have a dataframe like the one below, with 3 types of status - 'complete', 'start' and 'fail'. I want to create another dataframe from this, keeping only the 'fail' status entries with their corresponding level number.
Let's do this:
fail_df = df[df['status']=='fail']
or this with str.contains:
fail_df = df[df['status'].str.contains(r'fail',case=False)]
Both ways give a new dataframe containing only the rows whose status is 'fail'. However, the str.contains version with case=False also matches variants such as 'Fail' or 'failed'.
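A small sketch with an invented status column, to show the difference between the two filters:
import pandas as pd
df = pd.DataFrame({'level': [1, 2, 3, 4],
                   'status': ['complete', 'fail', 'start', 'Fail']})
# Exact match keeps only the lowercase 'fail' row.
print(df[df['status'] == 'fail'])
# str.contains with case=False also picks up 'Fail' (or e.g. 'failed').
print(df[df['status'].str.contains(r'fail', case=False)])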

Select a range of columns in Spark Dataframe [duplicate]

This question already has answers here:
Spark DataFrame equivalent to Pandas Dataframe `.iloc()` method?
(4 answers)
Get a range of columns of Spark RDD
(3 answers)
Closed 3 years ago.
Assuming that I have a Spark Dataframe df, how can I select a range of columns e.g. from column 100 to column 200?
Since df.columns returns a list, you can slice it and pass it to select:
df.select(df.columns[99:200])
This gets the subset of the DataFrame containing the 100th to 200th columns, inclusive.
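As a rough sketch (a SparkSession and a tiny invented frame are assumed here, since the real df is not shown), the same slicing on a smaller range:
from pyspark.sql import SparkSession
spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([(1, 2, 3, 4, 5)], ['c1', 'c2', 'c3', 'c4', 'c5'])
# df.columns is a plain Python list, so slicing it picks the 2nd to 4th columns.
df.select(df.columns[1:4]).show()   # c2, c3, c4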

How to select multiple rows in a pandas column to create a new dataframe [duplicate]

This question already has answers here:
Use a list of values to select rows from a Pandas dataframe
(8 answers)
Filter dataframe rows if value in column is in a set list of values [duplicate]
(7 answers)
Closed 4 years ago.
I have a Pandas Dataframe and there are some rows (locations) that I want to keep and others I do not.
I know that when selecting a specific value in a column to get a row I can use:
x = df[df['location']=='Aberdeen']
However, how can I do it for many locations without having to do them individually and then concatenate?
This is what I've tried:
x = df[[df['location']=='Aberdeen' & 'Glasgow' & 'London']]
But I am receiving a:
TypeError: 'Series' objects are mutable, thus they cannot be hashed
I know there must be a super simple solution to this but I haven't figured it out, please help.
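The isin approach from the first answer above covers this case too; a sketch with invented values (only the location names are taken from the attempt):
import pandas as pd
df = pd.DataFrame({'location': ['Aberdeen', 'Glasgow', 'Hull', 'London'],
                   'value': [1, 2, 3, 4]})
wanted = ['Aberdeen', 'Glasgow', 'London']
# isin tests each row's location against the whole list, so there is no
# need to filter each city separately and concatenate.
x = df[df['location'].isin(wanted)]
print(x)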

Pandas: Delete row based on a condition of more than one column [duplicate]

This question already has answers here:
How do I select rows from a DataFrame based on column values?
(16 answers)
Closed 5 years ago.
I have a DataFrame "df" with three columns named: "Particle", "Frequency1", "Frequency2" and a lot of rows.
I want to delete the rows where Frequency1 and Frequency2 are simultaneously equal to 0.
What is the syntax for doing this?
You can also use:
df = df[~((df.Frequency1 == 0) & (df.Frequency2 == 0))]
This keeps only the rows that are not 0 in both columns, i.e. it deletes the rows where Frequency1 and Frequency2 are both 0.
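A short sketch with invented frequencies, to check that only rows where both columns are 0 get dropped:
import pandas as pd
df = pd.DataFrame({'Particle': ['a', 'b', 'c'],
                   'Frequency1': [0, 0, 5],
                   'Frequency2': [0, 3, 0]})
# Keep rows unless Frequency1 and Frequency2 are both 0 (only 'a' is dropped).
df = df[~((df.Frequency1 == 0) & (df.Frequency2 == 0))]
print(df)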
