This question already has answers here:
How to conditionally update DataFrame column in Pandas based on list
(2 answers)
Check if a number is odd or even in Python [duplicate]
(6 answers)
Closed 3 years ago.
Let's say I have the dataframe below:
my_df = pd.DataFrame({'A': [1, 2, 3]})
my_df
A
0 1
1 2
2 3
I want to add a column B with values X if the corresponding number in A is odd, otherwise Y. I would like to do it in this way if possible:
my_df['B'] = np.where(my_df['A'] IS ODD, 'X', 'Y')
I don't know how to check if the value is odd.
You were so close!
my_df['b'] = np.where(my_df['A'] % 2 != 0, 'X', 'Y')
value % 2 != 0 checks whether a number is odd, whereas value % 2 == 0 checks for even numbers.
Output:
A b
0 1 X
1 2 Y
2 3 X
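For reference, here is a self-contained sketch of the same idea that can be run as-is. The intermediate name is_odd is introduced here only for illustration; it is not part of the original answer.

```python
import numpy as np
import pandas as pd

my_df = pd.DataFrame({'A': [1, 2, 3]})

# Boolean Series: True where the value in A leaves a remainder when divided by 2
is_odd = my_df['A'] % 2 != 0

# np.where picks 'X' where the condition is True and 'Y' elsewhere
my_df['b'] = np.where(is_odd, 'X', 'Y')
print(my_df)
```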
This question already has answers here:
Conditional Replace Pandas
(7 answers)
Pandas DataFrame: replace all values in a column, based on condition
(8 answers)
Replacing dataframe values with NaN based on condition while preserving shape of df
(2 answers)
Closed 4 months ago.
I have a dataframe that looks something like this:
wavelength normalized flux lof
0 5100.00 0.948305 1
1 5100.07 0.796783 1
2 5100.14 0.696425 1
3 5100.21 0.880586 1
4 5100.28 0.836257 1
... ... ... ...
4281 5399.67 1.076449 1
4282 5399.74 1.038198 1
4283 5399.81 1.004292 1
4284 5399.88 0.946977 1
4285 5399.95 0.894559 1
If lof = -1, I want to replace the normalized flux value with np.nan. Otherwise, just leave the normalized flux value as is. Is there a simple way to do this?
You can just assign with .loc:
df.loc[df['lof'] == -1, 'flux'] = np.nan
Alternatively, use mask:
df['flux'] = df['flux'].mask(df['lof'].eq(-1), np.nan)
df
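A small runnable sketch of both approaches, to show that they give the same result. The three-row frame and its values are made up for illustration, and the column name flux follows the answer rather than the printed header in the question.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; only the column names follow the answer above
df = pd.DataFrame({'flux': [0.95, 0.80, 0.70], 'lof': [1, -1, 1]})

# Approach 1: assign NaN in place through a boolean row mask with .loc
via_loc = df.copy()
via_loc.loc[via_loc['lof'] == -1, 'flux'] = np.nan

# Approach 2: mask() replaces values where the condition is True (NaN by default)
via_mask = df['flux'].mask(df['lof'].eq(-1))

print(via_loc)
print(via_mask)
```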
This question already has an answer here:
Set multiple columns to zero based on a value in another column [duplicate]
(1 answer)
Closed 1 year ago.
I am working with a CSV file with data. Basically, what I want is that if the value in the temp_coil column is above 12, the respective values in the sensible_heat, latent_heat, and total_capacity columns are set to zero.
Here is my code so far:
if df.temp_coil > 12 in df.columns:
    df.sensible_heat = 0
    df.latent_heat = 0
    df.total_capacity = 0
else:
    pass
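You can use boolean indexing with .loc for this. Chained indexing such as df['col'][mask] = 0 can trigger SettingWithCopyWarning and may not modify df at all:
df.loc[df.temp_coil > 12, 'sensible_heat'] = 0
df.loc[df.temp_coil > 12, 'latent_heat'] = 0
df.loc[df.temp_coil > 12, 'total_capacity'] = 0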
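The three assignments can also be collapsed into a single .loc call by passing a list of column names. This is a minimal sketch; the sample values and the name cols are assumptions made up for illustration, and only the column names come from the question.

```python
import pandas as pd

# Hypothetical sample data using the column names from the question
df = pd.DataFrame({
    'temp_coil': [10, 13, 12, 15],
    'sensible_heat': [1.5, 2.0, 2.5, 3.0],
    'latent_heat': [0.5, 0.6, 0.7, 0.8],
    'total_capacity': [2.0, 2.6, 3.2, 3.8],
})

cols = ['sensible_heat', 'latent_heat', 'total_capacity']

# Rows where temp_coil is above 12 get 0 in all three columns at once
df.loc[df['temp_coil'] > 12, cols] = 0
print(df)
```

Rows where temp_coil is exactly 12 are left untouched, since the condition is strictly greater than 12.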
This question already has an answer here:
How is pandas groupby method actually working?
(1 answer)
Closed 1 year ago.
df=pd.DataFrame({'key':['A','B','C','A','B','C'],
'data1':range(6), 'data2': rng.randint(0,10,6)}, columns=['key','data1','data2'])
l=[0,1,0,1,2,0]
df.groupby(l).sum()
So the output of your code is
https://ibb.co/kgXSwvL
We have three distinct values 0,1,2 in the list l.
If you look carefully, each row of the output is the sum of the values at the positions where that label occurs in l.
For example, in the list l, 0 is at positions 0, 2, and 5, where data1 has the values 0, 2, and 5, which sum to 7, and data2 has the values 3, 5, and 2, which sum to 10.
Similarly, 1 is at positions 1 and 3 in l, so the data1 values at index labels 1 and 3 in df are 1 and 3, which sum to 4.
The result for the value 2 in the list can be verified the same way.
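The grouping can be reproduced with fixed numbers, as a sketch. The question's rng is not defined in the post, so the data2 values below are assumptions: positions 0, 2, and 5 follow the values quoted in this answer, and the rest are made up. The key column is left out so that only numeric columns are summed.

```python
import pandas as pd

df = pd.DataFrame({
    'data1': range(6),
    # Assumed values: positions 0, 2, 5 match the answer text, the others are invented
    'data2': [3, 7, 5, 1, 4, 2],
})

# One label per row; rows sharing a label end up in the same group
l = [0, 1, 0, 1, 2, 0]

result = df.groupby(l).sum()
print(result)
```

Label 0 collects rows 0, 2, and 5, label 1 collects rows 1 and 3, and label 2 collects only row 4.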
This question already has answers here:
How to deal with SettingWithCopyWarning in Pandas
(20 answers)
Closed 3 years ago.
I am coming from an R background and need some elementary help with pandas.
If I have a dataframe like this:
df = pd.DataFrame(np.array([[1, 2, 3], [4, 5, 6]]))
I want to subset the dataframe by selecting a fixed column and selecting rows with a boolean condition.
For example
df.iloc[df.2 > 4][2]
Then I want to set the selected cell to a value.
Something like:
df.iloc[df.2 > 4][2] = 7
It seems valid to me; however, it seems pandas works with booleans in a stricter way than R.
In pandas, this is done with .loc, passing the boolean row condition and the column label together:
df.loc[df[2] > 4,2]
1 6
Name: 2, dtype: int64
df.loc[df[2] > 4,2]=7
df
0 1 2
0 1 2 3
1 4 5 7
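Put together as a runnable sketch (the name mask is introduced here for illustration): .loc takes the row selector and the column label in one indexing call, which is what allows the assignment to write back into df.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.array([[1, 2, 3], [4, 5, 6]]))

# Boolean row selector: True for rows where column 2 is greater than 4
mask = df[2] > 4

# Select rows by mask and the column by label, then assign in a single step
df.loc[mask, 2] = 7
print(df)
```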
This question already has an answer here:
Dataframe selecting Max for a column but output values of another
(1 answer)
Closed 4 years ago.
In the following code, how can I select, for each row, the element of data2 at the column index given by data1.idxmax(axis=1)?
data1 = pd.DataFrame([[1,2], [4,3], [5,6]])
data2 = pd.DataFrame([[10,20], [30,40], [50,60]])
data1.idxmax(axis=1)
The result should be pd.Series or pd.DataFrame of [20,30,60].
Use the lookup function:
i = data1.idxmax(axis=1)
data2.lookup(i.index, i.values)
This will give you an array with the values. To get the result as a Series, simply create it:
pd.Series(data2.lookup(i.index, i.values))
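Note that DataFrame.lookup was deprecated in pandas 1.2 and removed in pandas 2.0, so the calls above fail on current versions. One possible replacement, sketched here with NumPy integer indexing, is shown below; the names rows, cols, and picked are introduced for illustration only.

```python
import numpy as np
import pandas as pd

data1 = pd.DataFrame([[1, 2], [4, 3], [5, 6]])
data2 = pd.DataFrame([[10, 20], [30, 40], [50, 60]])

# Column label of the row-wise maximum in data1
i = data1.idxmax(axis=1)

# Translate labels into integer positions, then pick one element per row
rows = np.arange(len(data2))
cols = data2.columns.get_indexer(i)
picked = pd.Series(data2.to_numpy()[rows, cols], index=data2.index)
print(picked)
```

This assumes data1 and data2 share the same row order and column labels, as they do in the question.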
You can try max with axis = 1 and eq with axis = 0
data2[data1.eq(data1.max(1),0)].stack()
Out[193]:
0 1 20.0
1 0 30.0
2 1 60.0
dtype: float64