Python: round with a different, automatically chosen number of decimals - python

I want to round the numbers shown in my table.
It currently looks like:
I want it to look like:
How can I get that? Use pandas or numpy, and keep it as simple as possible. Thanks!

In pandas, we can use pandas.DataFrame.round. See the example below (from the pandas documentation).
Data frame df:
dogs cats
0 0.21 0.32
1 0.01 0.67
2 0.66 0.03
3 0.21 0.18
Round df to one decimal place like this:
df.round(1)
Output:
dogs cats
0 0.2 0.3
1 0.0 0.7
2 0.7 0.0
3 0.2 0.2
We can even specify which columns to round and to how many decimals; see the documentation for pandas.DataFrame.round for more details.
Or we can use Python's built-in round in a loop, as below:
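As a minimal sketch of per-column rounding, DataFrame.round also accepts a dict mapping column names to the number of decimals. The frame below just rebuilds the dogs/cats example from above, and the name `rounded` is only illustrative.

```python
import pandas as pd

# Rebuild the example frame from the pandas documentation.
df = pd.DataFrame(
    {"dogs": [0.21, 0.01, 0.66, 0.21], "cats": [0.32, 0.67, 0.03, 0.18]}
)

# Round "dogs" to one decimal and "cats" to zero decimals.
rounded = df.round({"dogs": 1, "cats": 0})
print(rounded)
```

Columns that are not listed in the dict are left unchanged.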
>>> round(5.76543, 2)
5.77

Related

Percent of total clusters per cluster per season using pandas

I have a pandas DataFrame that looks like this with 12 clusters in total. Certain clusters don't appear in a certain season.
I want to create a multi-line graph over the seasons of the percent of a specific cluster over each season. So if there are 30 teams in the 97-98 season and there are 10 teams in Cluster 1, then that value would be .33 since cluster 1 has one third of the total possible spots.
It'll look like this
And I want the dataset to look like this, where each cluster has its own share of the total number of teams in that season. I've tried using the pandas groupby method to get a bunch of lists and then calling value_counts() on them, but that doesn't work, since looping through df.groupby(['SEASON']) returns tuples, not a Series.
Thanks so much
Use .groupby combined with .value_counts and .unstack:
temp_df = df.groupby(['SEASON'])['Cluster'].value_counts(normalize=True).unstack().fillna(0.0)
temp_df.plot()
print(temp_df.round(2))
Cluster 0 1 2 4 5 6 7 10 11
SEASON
1996-97 0.1 0.21 0.17 0.21 0.07 0.1 0.03 0.07 0.03
1997-98 0.2 0.00 0.20 0.20 0.00 0.0 0.20 0.20 0.00
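To make the one-liner above easier to check by hand, here is a small self-contained sketch on made-up data (the six rows below are an assumption, not the asker's dataset). With normalize=True, value_counts returns each cluster's share within its season, and unstack turns the clusters into columns.

```python
import pandas as pd

# Toy data: 4 teams in 1996-97 and 2 teams in 1997-98 (invented for illustration).
df = pd.DataFrame(
    {
        "SEASON": ["1996-97"] * 4 + ["1997-98"] * 2,
        "Cluster": [0, 0, 1, 2, 0, 2],
    }
)

# Share of each cluster within its season; clusters absent from a season get 0.0.
shares = (
    df.groupby("SEASON")["Cluster"]
    .value_counts(normalize=True)
    .unstack()
    .fillna(0.0)
)
print(shares)
```

Each row of `shares` sums to 1, so the columns can be plotted directly as one line per cluster.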

Pandas - Find first occurrence of number closest to an input value

I have a dataframe like below.
time speed
0 1 0.20
1 2 0.40
2 3 2.00
3 4 3.00
4 5 0.40
5 6 0.43
6 7 6.00
I would like to find the first occurrence of a number (in the 'speed' column) that is closest to an input value I enter.
For example:
input value = 0.43
Expected Output :
Speed : 0.40 & corresponding Time : 2
The speed column should not be sorted for this problem.
I tried the below, but I am not getting the expected output.
Any help on this would be appreciated.
absolute closest
You can compute the absolute difference to your reference and get the idxmin:
speed_input = 0.43
df.loc[abs(df['speed']-speed_input).idxmin()]
output:
time 6.00
speed 0.43
Name: 5, dtype: float64
first closest with threshold:
i = 0.43
thresh = 0.03
df.loc[abs(df['speed']-i).le(thresh).idxmax()]
output:
time 2.0
speed 0.4
Name: 1, dtype: float64
One idea is to round both values before comparing, and select the row with .loc:
df.loc[[(df['speed'].round(1) - round(speed_input, 1)).abs().idxmin()]]
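One caveat with the threshold approach is that idxmax on an all-False mask still returns the first index label, so a guard may be worth adding. The sketch below rebuilds the frame from the question; the helper name `first_within` and the tolerance of 0.05 are assumptions made for illustration.

```python
import pandas as pd

# The frame from the question.
df = pd.DataFrame(
    {
        "time": [1, 2, 3, 4, 5, 6, 7],
        "speed": [0.20, 0.40, 2.00, 3.00, 0.40, 0.43, 6.00],
    }
)


def first_within(frame, target, tolerance):
    """Return the first row whose speed is within `tolerance` of `target`, or None."""
    close = (frame["speed"] - target).abs().le(tolerance)
    if not close.any():
        # Without this check, idxmax() would silently point at the first row.
        return None
    return frame.loc[close.idxmax()]


hit = first_within(df, 0.43, 0.05)    # first row close enough to 0.43
miss = first_within(df, 100.0, 0.05)  # nothing is close to 100
print(hit)
print(miss)
```

Here `hit` is the row with time 2 and speed 0.40, matching the expected output in the question, while `miss` is None.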

Output formatting in pandas describe

I have some problems formatting the describe table from pandas.
I would love to have 2-decimal precision in every column, but in the last column I need the 1.11e11 format. I have tried applying
data.styles.format({"last_column": "{:.2E}"})
, but it does not seem to work for me, still the same result as can be seen below.
Things like: pd.set_option('display.float_format', '{:.2E}'.format)
is applied pandas-wide, which is not what I want to do.
print(data.describe(percentiles=[],).fillna("-.--").round(2))
count 1 1 1 1 1 1
mean 1.43 0.4 34.58 0.07 0.71 1.12877e+08
std -.-- -.-- -.-- -.-- -.-- -.--
min 1.43 0.4 34.58 0.07 0.71 1.12877e+08
50% 1.43 0.4 34.58 0.07 0.71 1.12877e+08
max 1.43 0.4 34.58 0.07 0.71 1.12877e+08
I would like to avoid tabulate or any other tabular tool if possible, and solve this at the pandas level.
Does anyone please have a solution?
Thank you :)
Just use this:
pd.set_option('display.precision', 2)
and if you want to reset it to its default value, use this:
pd.reset_option('display.precision')
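Since the question asks for per-column formats without changing a pandas-wide option, one possible sketch is to turn the describe() result into strings column by column. The helper `format_summary`, the column name "a", and the sample values are hypothetical stand-ins; "last_column" is the name used in the question.

```python
import pandas as pd

# Hypothetical one-row frame standing in for the asker's data.
data = pd.DataFrame({"a": [1.43], "last_column": [112877000.0]})


def format_summary(summary, formats, default="{:.2f}", na_rep="-.--"):
    """Format each column of `summary` as strings with its own format spec."""
    out = {}
    for col in summary.columns:
        fmt = formats.get(col, default)
        # Bind fmt as a default argument so each column keeps its own format.
        out[col] = summary[col].map(
            lambda v, fmt=fmt: na_rep if pd.isna(v) else fmt.format(v)
        )
    return pd.DataFrame(out)


formatted = format_summary(data.describe(), {"last_column": "{:.2E}"})
print(formatted)
```

The result is a frame of strings, so it is suited to printing rather than further arithmetic.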

Pandas applying data subset to new data frame

I have a script where I do munging with dataframes and extract data like the following:
times = pd.Series(df.loc[df['sy_x'].str.contains('AA'), ('t_diff')].quantile([.1, .25, .5, .75, .9]))
I want to add the resulting data from quantile() to a data frame with separate columns for each of those quantiles. Let's say the columns are:
ID pt_1 pt_2 pt_5 pt_7 pt_9
AA
BB
CC
How might I add the quantiles to each row of ID?
new_df = None
for index, value in times.items():
for col in df[['pt_1', 'pt_2','pt_5','pt_7','pt_9',]]:
..but that feels wrong and not idiomatic. Should I be using loc or iloc? I have a couple more Series that I'll need to add to other columns not shown, but I think I can figure that out once I know
EDIT:
Some of the output of times looks like:
0.1 -0.5
0.25 -0.3
0.5 0.0
0.75 2.0
0.90 4.0
Thanks in advance for any insight
IIUC, you want a groupby():
import numpy as np
import pandas as pd

# toy data
np.random.seed(1)
df = pd.DataFrame({'sy_x': np.random.choice(['AA', 'BB', 'CC'], 100),
                   't_diff': np.random.randint(0, 100, 100)})
# one row per sy_x value, one column per quantile
df.groupby('sy_x').t_diff.quantile((0.1, .25, .5, .75, .9)).unstack(1)
Output:
0.10 0.25 0.50 0.75 0.90
sy_x
AA 16.5 22.25 57.0 77.00 94.5
BB 9.1 21.00 58.5 80.25 91.3
CC 9.7 23.25 40.5 65.75 84.1
Try something like:
pd.DataFrame(times.values.T, index=times.keys())
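To get from the groupby result to the ID / pt_1 ... pt_9 layout in the question, the quantile columns can be relabelled afterwards. This is only a sketch on made-up data: the six rows below are invented, and the column labels are taken from the question.

```python
import pandas as pd

# Toy data (invented): three t_diff values per sy_x group.
df = pd.DataFrame(
    {
        "sy_x": ["AA", "AA", "AA", "BB", "BB", "BB"],
        "t_diff": [0, 10, 20, 100, 200, 300],
    }
)

quantiles = [0.1, 0.25, 0.5, 0.75, 0.9]
labels = ["pt_1", "pt_2", "pt_5", "pt_7", "pt_9"]

# One row per group, one column per quantile.
wide = df.groupby("sy_x")["t_diff"].quantile(quantiles).unstack()

# Relabel by position to avoid looking up float column keys.
wide.columns = labels
wide = wide.rename_axis("ID").reset_index()
print(wide)
```

Relabelling by position relies on the columns coming out in the same order as `quantiles`, which holds here because the list is already sorted.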

python rolling cumulative return with groupby

I have the following dataframe and would like to get the rolling cumulative return over the last, let's say for this example, 2 periods, grouped by an identifier. For my actual case I need a longer period, but my problem is more with the groupby:
id return
2012 1 0.5
2012 2 0.2
2013 1 0.1
2013 2 0.3
The result should look like this:
id return cumreturn
2012 1 0.5 0.5
2012 2 0.2 0.2
2013 1 0.1 0.65
2013 2 0.3 0.56
It is important that the period is rolling. I have the following formula so far:
df["cumreturn"] = df.groupby("id")["return"].fillna(0).pd.rolling_apply(df,5,lambda x: np.prod(1+x)-1)
However, I get the following error: AttributeError: 'Series' object has no attribute 'pd'. I know how to get the rolling cumulative return; I just can't figure out how to combine it with groupby.
Let's try this:
df_out = (df.set_index('id', append=True)
            .assign(cumreturn=df.groupby('id')['return'].rolling(2, min_periods=1)
                                .apply(lambda x: np.prod(1+x)-1)
                                .swaplevel(0,1)).reset_index(1))
Output:
id return cumreturn
2012 1 0.5 0.50
2012 2 0.2 0.20
2013 1 0.1 0.65
2013 2 0.3 0.56
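An alternative sketch that avoids the index juggling is groupby(...).transform, which returns a result aligned with the original rows. It assumes the year is stored in an ordinary column rather than the index, and the helper name `rolling_cumreturn` is made up for illustration.

```python
import numpy as np
import pandas as pd

# Same numbers as in the question, with the year as a regular column (an assumption).
df = pd.DataFrame(
    {
        "year": [2012, 2012, 2013, 2013],
        "id": [1, 2, 1, 2],
        "return": [0.5, 0.2, 0.1, 0.3],
    }
)


def rolling_cumreturn(returns, window):
    """Compound the returns over a rolling window: prod(1 + r) - 1."""
    return returns.rolling(window, min_periods=1).apply(
        lambda x: np.prod(1 + x) - 1, raw=True
    )


# transform keeps the original row order, so the result can be assigned directly.
df["cumreturn"] = df.groupby("id")["return"].transform(
    lambda s: rolling_cumreturn(s, 2)
)
print(df)
```

For a longer look-back, only the window argument needs to change; the rows are assumed to be sorted by period within each id.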
