This question already has answers here:
Pandas conditional creation of a series/dataframe column
(13 answers)
Closed 1 year ago.
My data frame looks like this.
a d e
0 BTC 31913.1123 -6.5%
1 ETH 1884.1621 -18.8%
2 USDT 1.0 0.1%
3 BNB 294.0246 -8.4%
4 ADA 1.0342 -14.3%
5 XRP 1.1423 -10.5%
In column d, I want to round the floats to a whole number if they are greater than 10. If they are less than 10, I want to round them to 2 decimal places. This is the code I have right now: df1['d'] = df1['d'].round(2). How do I add a conditional statement to this code so that it rounds based on these conditions?
https://stackoverflow.com/a/31173785/7116645
Taking reference from the answer above, you can simply do the following:
df1['d'] = [round(x) if x > 10 else round(x, 2) for x in df1['d']]
You can use simple statements like this:
df1.loc[df1['d'] > 10, 'd'] = df1['d'].round()
df1.loc[df1['d'] <= 10, 'd'] = df1['d'].round(2)
Use numpy.where:
df1['d'] = np.where(df1['d'] < 10, df1['d'].round(2), df1['d'].round())
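For reference, here is a minimal, self-contained sketch of the np.where approach using the sample values from the question. Only columns a and d are reproduced, and the condition is written as > 10 to mirror the wording of the question; a value of exactly 10 comes out the same either way.

```python
import numpy as np
import pandas as pd

# Sample values copied from the question (column e omitted for brevity).
df1 = pd.DataFrame({
    "a": ["BTC", "ETH", "USDT", "BNB", "ADA", "XRP"],
    "d": [31913.1123, 1884.1621, 1.0, 294.0246, 1.0342, 1.1423],
})

# Values above 10 are rounded to whole numbers; the rest to 2 decimal places.
df1["d"] = np.where(df1["d"] > 10, df1["d"].round(), df1["d"].round(2))
print(df1)
```

Both branches are computed for the whole column and np.where picks one per row, which is usually fine for simple rounding.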
This question already has answers here:
How to filter a pandas dataframe based on the length of a entry
(2 answers)
Closed 1 year ago.
I have a pandas DataFrame like this:
id subjects
1 [math, history]
2 [English, Dutch, Physics]
3 [Music]
How can I filter this dataframe based on the length of the lists in the subjects column?
For example, what if I only want the rows where len(subjects) >= 2?
I tried using
df[len(df["subjects"]) >= 2]
But this gives
KeyError: True
Using loc does not help either; it gives me the same error.
Thanks in advance!
Use the string accessor to work with lists:
df[df['subjects'].str.len() >= 2]
Output:
id subjects
0 1 [math, history]
1 2 [English, Dutch, Physics]
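A small runnable sketch of this, rebuilding the frame from the question, may help show why the original attempt fails. The variable names lengths, filtered and filtered_alt are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "id": [1, 2, 3],
    "subjects": [["math", "history"], ["English", "Dutch", "Physics"], ["Music"]],
})

# len(df["subjects"]) is the number of rows (one int), so the comparison
# yields a single True and df[True] raises KeyError: True.
# .str.len() instead returns the length of each list, one value per row.
lengths = df["subjects"].str.len()
filtered = df[lengths >= 2]

# Equivalent alternative: apply the built-in len to every element.
filtered_alt = df[df["subjects"].map(len) >= 2]
print(filtered)
```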
This question already has answers here:
How do I select rows from a DataFrame based on column values?
(16 answers)
Closed 2 years ago.
I have two columns in the table, speed and power. I would like to exclude any 0 for power where speed is greater than 5.
So far I have
df[(df.sum(axis=1) != 0)]
which will exclude all 0 values, but how do I amend that so it only excludes the 0 values where speed is greater than 5, while keeping the rows where speed is below 5?
You can try this:
drop_idx = df[(df.power == 0) & (df.speed > 5)].index
df = df.drop(index=drop_idx)
Your df will then contain all the required rows.
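The same result can also be reached by inverting a boolean mask instead of dropping by index. This is only a sketch: the question does not show its table, so the speed and power values below are made up for illustration.

```python
import pandas as pd

# Hypothetical data; the question does not include the actual table.
df = pd.DataFrame({
    "speed": [3, 8, 6, 2, 10],
    "power": [0, 0, 40, 15, 0],
})

# Rows to exclude: power is 0 while speed is above 5.
mask = (df["power"] == 0) & (df["speed"] > 5)

# ~mask keeps everything else, including power == 0 at low speed.
kept = df[~mask]
print(kept)
```

Each comparison needs its own parentheses because & binds more tightly than == and >.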
This question already has an answer here:
Reference - What does this regex mean?
(1 answer)
Closed 2 years ago.
I need to extract a specific value from pandas df column. The data looks like this:
row my_column
1 artid=delish.recipe.45064;artid=delish_recipe_45064;avb=83.3;role=4;data=list;prf=i
2 ab=px_d_1200;ab=2;ab=t_d_o_1000;artid=delish.recipe.23;artid=delish;role=1;pdf=true
3 dat=_o_1000;artid=delish.recipe.23;ar;role=56;passing=true;points001
The data is not consistent, but it is separated by semicolons, and I need to extract role=x.
I split the data on the semicolons and can loop through the values to fetch the roles, but I was wondering if there is a more elegant way to solve it.
Desired output:
row my_column
1 role=4
2 role=1
3 role=56
Thank you.
You can use str.extract and pass the required pattern within parentheses.
df['my_column'] = df['my_column'].str.extract(r'(role=\d+)')
row my_column
0 1 role=4
1 2 role=1
2 3 role=56
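As a self-contained check, the sketch below rebuilds the three sample rows from the question and runs the same pattern. It passes expand=False, which returns a Series rather than a one-column DataFrame when there is a single capture group; the name roles is just illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "row": [1, 2, 3],
    "my_column": [
        "artid=delish.recipe.45064;artid=delish_recipe_45064;avb=83.3;role=4;data=list;prf=i",
        "ab=px_d_1200;ab=2;ab=t_d_o_1000;artid=delish.recipe.23;artid=delish;role=1;pdf=true",
        "dat=_o_1000;artid=delish.recipe.23;ar;role=56;passing=true;points001",
    ],
})

# The raw string keeps \d from being treated as a Python escape sequence.
# Rows without any match would come back as NaN.
roles = df["my_column"].str.extract(r"(role=\d+)", expand=False)
df["my_column"] = roles
print(df)
```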
This should work:
def get_role(x):
    # Split on semicolons and return the first field that starts with 'role'.
    l = x.split(sep=';')
    t = [i for i in l if i[:4] == 'role'][0]
    return t
df['my_column'] = [get_role(y) for y in df['my_column']]
I want to calculate the mean and standard deviation of a sample. The sample has two columns separated by a space: the first is a time and the second is a value. I don't know how to calculate the mean and standard deviation of the second column of values using Python, maybe with SciPy? I want to use the method for large sets of data.
I also want to check which numbers in the set are more than seven times the standard deviation.
Thanks for help.
time value
1 1.17e-5
2 1.27e-5
3 1.35e-5
4 1.53e-5
5 1.77e-5
The mean is 1.418e-5 and the standard deviation is 2.369e-6.
To answer your first question, assuming your sample's dataframe is df, the following should work:
import pandas as pd
df = pd.DataFrame({'time':[1,2,3,4,5], 'value':[1.17e-5,1.27e-5,1.35e-5,1.53e-5,1.77e-5]})
df will be something like this:
>>> df
time value
0 1 0.000012
1 2 0.000013
2 3 0.000013
3 4 0.000015
4 5 0.000018
Then to obtain the standard deviation and mean of the value column respectively, run the following and you will get the outputs:
>>> df['value'].std()
2.368966019173766e-06
>>> df['value'].mean()
1.418e-05
To answer your second question, try the following:
std = df['value'].std()
df = df[(df.value > 7*std)]
I am assuming you want to obtain the rows at which value is greater than 7 times the sample standard deviation. If you actually want greater than or equal to, just change > to >=. You should then be able to obtain the following:
>>> df
time value
4 5 0.000018
Also, following @Mad Physicist's suggestion of setting Delta Degrees of Freedom to ddof=0 (if you are unfamiliar with this, check out Delta Degrees of Freedom Wiki), running the following on the original, unfiltered df results in:
std = df['value'].std(ddof=0)
df = df[(df.value > 7*std)]
with output:
>>> df
time value
3 4 0.000015
4 5 0.000018
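Since the question mentions a space-separated file and large data sets, here is a rough sketch that reads only the value column with NumPy and computes both flavours of the standard deviation. The io.StringIO object merely stands in for a real file, and the variable names are illustrative.

```python
import io
import numpy as np

# Stand-in for a real file: a header line, then "time value" pairs.
text = io.StringIO("""time value
1 1.17e-5
2 1.27e-5
3 1.35e-5
4 1.53e-5
5 1.77e-5
""")

# skiprows=1 skips the header; usecols=1 keeps only the second column.
values = np.loadtxt(text, skiprows=1, usecols=1)

mean = values.mean()
sample_std = values.std(ddof=1)      # divides by n - 1, the pandas default
population_std = values.std(ddof=0)  # divides by n, the NumPy default

print(mean, sample_std, population_std)
```

Note that the defaults differ between the two libraries, so it is worth passing ddof explicitly when comparing results.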
P.S. If I am not wrong, it's a convention here to stick to one question per post, not two.
This question already has answers here:
Absolute value for column in Python
(2 answers)
Closed 5 years ago.
In my dataframe I have a column containing numbers, some positive, some negative. Example
Amount
0 -500
1 659
3 -10
4 344
I want to turn all numbers in df['Amount'] into positive numbers. I thought about multiplying all numbers by -1, but although this turns negative numbers positive, it also does the reverse.
Is there a better way to do this?
You can assign the result back to the original column:
df['Amount'] = df['Amount'].abs()
Or you can create a new column, instead:
df['AbsAmount'] = df['Amount'].abs()
You can take the absolute value:
df['Amount'].apply(abs)
abs() is the standard way to get absolute values.
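A short runnable sketch with the values from the question, comparing the vectorised .abs() method with .apply(abs). The AbsAmount and via_apply names are only for illustration.

```python
import pandas as pd

# Values and index labels copied from the question.
df = pd.DataFrame({"Amount": [-500, 659, -10, 344]}, index=[0, 1, 3, 4])

# Vectorised absolute value over the whole column.
df["AbsAmount"] = df["Amount"].abs()

# Same result element by element with the built-in abs.
via_apply = df["Amount"].apply(abs)
print(df)
```

The .abs() method is typically the faster of the two on large columns because it avoids a Python-level call per element.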