How to drop specific pandas rows by value [duplicate] - python

This question already has answers here:
Use a list of values to select rows from a Pandas dataframe
(8 answers)
Closed 7 months ago.
Is there a way to drop rows based on the value in a particular column?
For example, this is my toy dataframe:
d = {'Non': [1, 2,4,5,2,7], 'Schzerando': [3, 4,8,4,7,7], 'cc': [1,2,0.75,0.25,0.3,1]}
df = pd.DataFrame(data=d)
df
I want to keep only the rows where df["cc"] equals 1 or 2.

You can filter the rows with a boolean mask. Each comparison needs its own parentheses, because | binds more tightly than ==. Casting cc to 'Int64' first is not needed, and it is not safe here because the column holds fractional values such as 0.75.
df = df[(df['cc'] == 1) | (df['cc'] == 2) | (df['cc'] == 3)]
Or you can declare a list with all the values you want to keep, then use pandas isin:
f_list = [1,2,3]
df[df['cc'].isin(f_list)]
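As a self-contained sketch of the isin approach, here it is applied to the toy frame from the question. The names keep_values, kept and dropped are only illustrative, and the list is narrowed to the two values the question asks for.

```python
import pandas as pd

# Toy frame from the question
d = {'Non': [1, 2, 4, 5, 2, 7],
     'Schzerando': [3, 4, 8, 4, 7, 7],
     'cc': [1, 2, 0.75, 0.25, 0.3, 1]}
df = pd.DataFrame(data=d)

keep_values = [1, 2]  # values of cc to keep (illustrative name)

# Rows whose cc is in the list
kept = df[df['cc'].isin(keep_values)]

# The complement: ~ negates the boolean mask, giving the rows that would be dropped
dropped = df[~df['cc'].isin(keep_values)]
```

Negating the mask with ~ is handy when it is easier to list the values to drop than the values to keep.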

df[(df['cc'] == 1) | (df['cc'] == 2)]
Output:
   Non  Schzerando   cc
0    1           3  1.0
1    2           4  2.0
5    7           7  1.0

Related

Apply function to two columns and map the output to a new column [duplicate]

This question already has answers here:
How to apply a function to two columns of Pandas dataframe
(15 answers)
Closed 3 years ago.
I am new to Pandas. Would like to know how to apply a function to two columns in a dataframe and map the output from the function to a new column in the dataframe. Is this at all possible with pandas syntax or should I resort to native Python to iterate over the rows in the dataframe columns to generate the new column?
a b
1 2
3 1
2 9
Question is how to get, for example, the multiplication of the two numbers in a new column c
a b c
1 2 2
3 1 3
2 9 18
You can do this with pandas.
For example:
def funcMul(row):
    return row['a']*row['b']
Then,
df['c'] = df.apply(funcMul, axis=1)
Output:
a b c
0 1 2 2
1 3 1 3
2 2 9 18
You can do the following with pandas
import pandas as pd
def func(r):
    # index the row by label; positional r[0] on a labeled Series is deprecated in recent pandas
    return r['a']*r['b']
df = pd.DataFrame({'a':[1,2,3], 'b':[4,5,6]})
df['c'] = df.apply(func, axis = 1)
Also, here is the official documentation https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.apply.html
The comment by harvpan shows the simplest way to achieve your specific example, but here is a generic way to do what you asked:
def functionUsedInApply(row):
    """ The function logic for the apply function comes here.
    row: A Pandas Series containing a row in df.
    """
    return row['a'] * row['b']
def functionUsedInMap(value):
    """ This function is used in the map after the apply.
    For this example, if the value is larger than 5,
    return the cube, otherwise, return the square.
    value: a value of whatever type is returned by functionUsedInApply.
    """
    if value > 5:
        return value**3
    else:
        return value**2
df['new_column_name'] = df.apply(functionUsedInApply,axis=1).map(functionUsedInMap)
The code above first multiplies columns a and b and then returns the square of that value for a*b <= 5 and the cube of that value for a*b > 5.
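For simple arithmetic like this, a vectorized version avoids the row-wise apply entirely. The following is only a sketch, assuming the a/b frame from the question; np.where is swapped in for the map step and is not the method used in the answer above.

```python
import numpy as np
import pandas as pd

# Frame from the question
df = pd.DataFrame({'a': [1, 3, 2], 'b': [2, 1, 9]})

# Column arithmetic works element-wise, so no apply is needed
product = df['a'] * df['b']
df['c'] = product

# Same rule as functionUsedInMap: cube if the value is larger than 5, else square
df['new_column_name'] = np.where(product > 5, product**3, product**2)
```

On large frames this usually runs much faster than apply with axis=1, because the work happens on whole columns rather than one row at a time.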

how to subset by fixed column and row by boolean in pandas? [duplicate]

This question already has answers here:
How to deal with SettingWithCopyWarning in Pandas
(20 answers)
Closed 3 years ago.
I am coming from an R background and need some elementary help with pandas.
If I have a dataframe like this
df = pd.DataFrame(np.array([[1, 2, 3], [4, 5, 6]]))
I want to subset the dataframe by selecting a fixed column and selecting rows with a boolean condition.
For example
df.iloc[df.2 > 4][2]
Then I want to set the selected cell to a new value,
something like
df.iloc[df.2 > 4][2] = 7
It seems valid to me; however, it seems pandas handles booleans more strictly than R.
Here you need .loc, with the boolean mask as the row selector and the column label as the column selector:
df.loc[df[2] > 4,2]
1 6
Name: 2, dtype: int64
df.loc[df[2] > 4,2]=7
df
0 1 2
0 1 2 3
1 4 5 7
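Put together as a runnable sketch (the names mask and selected are only illustrative), selecting and assigning through a single .loc call looks like this:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.array([[1, 2, 3], [4, 5, 6]]))

# Boolean mask over rows: True where column 2 is greater than 4
mask = df[2] > 4

# Read: rows chosen by the mask, column chosen by its label
selected = df.loc[mask, 2]

# Write: one .loc call with both selectors, so the assignment reaches df itself.
# Chained indexing such as df[mask][2] = 7 assigns to a temporary copy
# and may leave df unchanged.
df.loc[mask, 2] = 7
```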

Converting DataFrame in Python [duplicate]

This question already has answers here:
How can I pivot a dataframe?
(5 answers)
Closed 4 years ago.
Considering the following dataframe:
import pandas as pd
import numpy as np
import random
np.random.seed(10)
df1 = pd.DataFrame({'x':[1,2,3,4,5,1,2,3,4,5],
'y':[10,10,10,10,10,20,20,20,20,20],
'z':np.random.normal(size = 10)
})
I want to convert the x values into columns and y values into index (decreasing) with corresponding z values in the dataframe. It's something like this df2:
df2 = pd.DataFrame(np.random.randn(2,5), index = [20,10], columns=[1,2,3,4,5])
How can I convert df1 into df2's style?
You can use pandas.pivot_table:
res = df1.pivot_table(index='y', columns='x', values='z')
You may wish to remove or change your index names, but this is your result:
x 1 2 3 4 5
y
10 1.331587 0.715279 -1.545400 -0.008384 0.621336
20 -0.720086 0.265512 0.108549 0.004291 -0.174600
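The question also asks for the y index in decreasing order, which pivot_table does not give by default. A minimal sketch, assuming the same df1 as in the question, adds sort_index and clears the axis names:

```python
import numpy as np
import pandas as pd

np.random.seed(10)
df1 = pd.DataFrame({'x': [1, 2, 3, 4, 5, 1, 2, 3, 4, 5],
                    'y': [10, 10, 10, 10, 10, 20, 20, 20, 20, 20],
                    'z': np.random.normal(size=10)})

# Pivot as above, then sort the y index in decreasing order
res = df1.pivot_table(index='y', columns='x', values='z').sort_index(ascending=False)

# Optional: drop the axis names so the frame looks like df2
res.index.name = None
res.columns.name = None
```

Note that pivot_table aggregates duplicates with the mean by default; here each (x, y) pair occurs once, so the values pass through unchanged.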

Select pandas column for each row by index list [duplicate]

This question already has an answer here:
Dataframe selecting Max for a column but output values of another
(1 answer)
Closed 4 years ago.
In the following code, how can I select, for each row, the element of data2 in the column given by data1.idxmax(axis=1)?
data1 = pd.DataFrame([[1,2], [4,3], [5,6]])
data2 = pd.DataFrame([[10,20], [30,40], [50,60]])
data1.idxmax(axis=1)
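The result should be a pd.Series or pd.DataFrame of [20,30,60].
Use the lookup function (note that DataFrame.lookup was deprecated in pandas 1.2 and removed in 2.0, so this only works on older versions):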
i = data1.idxmax(axis=1)
data2.lookup(i.index, i.values)
This will give you an array with the values. To get the result as a Series, simply create it:
pd.Series(data2.lookup(i.index, i.values))
You can try max with axis = 1 and eq with axis = 0
data2[data1.eq(data1.max(1),0)].stack()
Out[193]:
0 1 20.0
1 0 30.0
2 1 60.0
dtype: float64
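On pandas versions where DataFrame.lookup is no longer available, the same row-wise pick can be sketched with NumPy integer indexing. This is an alternative technique, not the method from the answers above, and the names col_pos, row_pos and result are only illustrative.

```python
import numpy as np
import pandas as pd

data1 = pd.DataFrame([[1, 2], [4, 3], [5, 6]])
data2 = pd.DataFrame([[10, 20], [30, 40], [50, 60]])

# Column label of the row-wise maximum in data1
i = data1.idxmax(axis=1)

# Translate the column labels into integer positions within data2
col_pos = data2.columns.get_indexer(i)
row_pos = np.arange(len(data2))

# Pick one element per row from the underlying array
result = pd.Series(data2.to_numpy()[row_pos, col_pos], index=data2.index)
```

This assumes data1 and data2 have the same number of rows in the same order and that every label in i exists among data2's columns.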

Ordering columns in dataframe

I recently updated to pandas 0.17.0 and I'm trying to order the columns in my dataframe alphabetically.
Here are the column labels as they currently are:
['UX2', 'RHO1', 'RHO3', 'RHO2', 'RHO4', 'UX1', 'UX4', 'UX3']
And I want them like this:
['RHO1', 'RHO2', 'RHO3', 'RHO4', 'UX1', 'UX2', 'UX3', 'UX4']
The only way I've been able to do this is following this from 3 years ago: How to change the order of DataFrame columns?
Is there a built-in way to do this in 0.17.0?
To sort the columns alphabetically here, you can just use sort_index:
df.sort_index(axis=1)
The method returns a reindexed DataFrame with the columns in the correct order.
This assumes that all of the column labels are strings (it won't work for a mix of, say, strings and integers). If this isn't the case, you may need to pass an explicit ordering to the reindex method.
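For the mixed-label case, one possible sketch is to build the ordering yourself and hand it to reindex. The frame below and the choice of key=str (compare every label by its string form) are assumptions for illustration, not part of the question.

```python
import pandas as pd

# Hypothetical frame whose column labels mix strings and integers
df = pd.DataFrame([[0, 1, 2, 3]], columns=['UX2', 10, 'RHO1', 2])

# sorted() cannot compare str with int directly, so compare by string form
ordered = sorted(df.columns, key=str)

# reindex returns a new DataFrame with the columns in the given order
df_sorted = df.reindex(columns=ordered)
```

Comparing by string form puts 10 before 2, so pick a different key function if numeric labels should sort numerically.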
You can just sort them and put them back. Suppose you have this:
df = pd.DataFrame()
for i, n in enumerate(['UX2', 'RHO1', 'RHO3', 'RHO2', 'RHO4', 'UX1', 'UX4', 'UX3']):
    df[n] = [i]
It looks like this:
df
UX2 RHO1 RHO3 RHO2 RHO4 UX1 UX4 UX3
0 0 1 2 3 4 5 6 7
Do this:
df = df[sorted(df.columns)]
And you should see this:
df
RHO1 RHO2 RHO3 RHO4 UX1 UX2 UX3 UX4
0 1 3 2 4 5 0 7 6
Create a list of the column labels in the order you want.
cols = ['RHO1', 'RHO2', 'RHO3', 'RHO4', 'UX1', 'UX2', 'UX3', 'UX4']
Then assign this order to your DataFrame df:
df = df[cols]
