Transform dataframe value to range value in Python 3

I have a dataframe with the values:
3.05
35.97
49.11
48.80
48.02
10.61
25.69
6.02
55.36
0.42
47.87
2.26
54.43
8.85
8.75
14.29
41.29
35.69
44.27
1.08
I want to transform each value into a range (bin) number.
From the df we know the min value is 0.42 and the max value is 55.36.
I want to divide the range from min to max into 4 groups:
0.42 - 14.15 transform to 1
14.16 - 27.88 transform to 2
27.89 - 41.61 transform to 3
41.62 - 55.36 transform to 4
So the result I expect is:
1
3
4
4
4
1
2
1
4
1
4
1
4
1
1
2
3
3
4
1

This is normally called binning; pandas provides it through the cut function. Sample code is below:
import pandas as pd
# Create a list of numbers, with a header called "nums"
data_list = [('nums', [3.05, 35.97, 49.11, 48.80, 48.02, 10.61, 25.69, 6.02, 55.36, 0.42, 47.87, 2.26, 54.43, 8.85, 8.75, 14.29, 41.29, 35.69, 44.27, 1.08])]
# Create the labels for the bin
bin_labels = [1,2,3,4]
# Create the dataframe object using the data_list
# (DataFrame.from_items was removed in pandas 1.0; a dict of the pairs works instead)
df = pd.DataFrame(dict(data_list))
# Define the scope of the bins
bins = [0.41, 14.16, 27.89, 41.62, 55.37]
# Create the "bins" column using the cut function using the bins and labels
df['bins'] = pd.cut(df['nums'], bins=bins, labels=bin_labels)
This creates a dataframe which has the following structure:
print(df)
nums bins
0 3.05 1
1 35.97 3
2 49.11 4
3 48.80 4
4 48.02 4
5 10.61 1
6 25.69 2
7 6.02 1
8 55.36 4
9 0.42 1
10 47.87 4
11 2.26 1
12 54.43 4
13 8.85 1
14 8.75 1
15 14.29 2
16 41.29 3
17 35.69 3
18 44.27 4
19 1.08 1
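As a hedged alternative, pd.cut can also work out equal-width bins itself when bins is given as an integer, so the edges do not have to be typed by hand. A minimal sketch, assuming the four groups should split the min-to-max range evenly (the variable names are illustrative):

```python
import pandas as pd

nums = [3.05, 35.97, 49.11, 48.80, 48.02, 10.61, 25.69, 6.02, 55.36, 0.42,
        47.87, 2.26, 54.43, 8.85, 8.75, 14.29, 41.29, 35.69, 44.27, 1.08]
df = pd.DataFrame({'nums': nums})

# bins=4 asks pandas for four equal-width intervals between min and max
df['bins'] = pd.cut(df['nums'], bins=4, labels=[1, 2, 3, 4])

# The labels come back as a categorical column; convert to plain ints if needed
result = df['bins'].astype(int).tolist()
print(result)
```

With explicit edges you control exactly where each boundary falls; with an integer the edges follow the data, so they shift if the min or max changes.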

You could construct a function like the following to have full control over the process:
def transform(l):
    l2 = []
    for i in l:
        if 0.42 <= i <= 14.15:
            l2.append(1)
        elif i <= 27.88:
            l2.append(2)
        elif i <= 41.61:
            l2.append(3)
        elif i <= 55.36:
            l2.append(4)
    return l2
df['nums'] = transform(df['nums'])

Related

Is there a way to recalculate existing values in df based on conditions? - Python / Pandas

I have a DataFrame with Employees and their hours for different categories.
I need to recalculate only specific categories, and ONLY if the OT category is present under an Empl_Id: the OT, MILE and REST categories should NOT be updated, while all other categories should be.
import pandas as pd
data = {'Empl_Id': [1, 1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3, 3, 3],
        'Category': ["MILE", "REST", "OT", "TRVL", "REG", "ADMIN", "REST", "REG", "MILE", "OT", "TRVL", "REST", "MAT", "REG"],
        'Value': [43, 0.7, 6.33, 2.67, 52, 22, 1.17, 16.5, 73.6, 4.75, 1.33, 2.5, 5.5, 52.25]}
df = pd.DataFrame(data=data)
df
Empl_Id  Category  Value
1        MILE      43
1        REST      0.7
1        OT        6.33
1        TRVL      2.67
1        REG       52
2        ADMIN     22
2        REST      1.17
2        REG       16.5
3        MILE      73.6
3        OT        4.75
3        TRVL      1.33
3        REST      2.5
3        MAT       5.5
3        REG       52.25
The logic is to:
1) Find the % of OT hours relative to total hours (OT, REST and MILE don't count toward the total):
1st Empl_Id: 6.33 (OT) / (2.67 (TRVL) + 52 (REG)) = 6.33 / 54.67 = 11.58 %
2nd Empl_Id: OT hours not present, nothing should be updated
3rd Empl_Id: 4.75 (OT) / (1.33 (TRVL) + 5.5 (MAT) + 52.25 (REG)) = 4.75 / 59.08 = 8.04 %
2) Subtract the % of OT from each category (OT, REST and MILE are left unchanged):
Empl_Id  Category  Value
1        MILE      43
1        REST      0.7
1        OT        6.33
1        TRVL      2.67 - 11.58 % (0.31) = 2.36
1        REG       52 - 11.58 % (6.02) = 45.98
2        ADMIN     22
2        REST      1.17
2        REG       16.5
3        MILE      73.6
3        OT        4.75
3        TRVL      1.33 - 8.04 % (0.11) = 1.22
3        REST      2.5
3        MAT       5.5 - 8.04 % (0.44) = 5.06
3        REG       52.25 - 8.04 % (4.2) = 48.05

You can use:
keep = ['OT', 'MILE', 'REST']
# get factor: 1 - (OT hours / hours of the categories to update), per employee
factor = (df.groupby(df['Empl_Id'])
            .apply(lambda g: g.loc[g['Category'].eq('OT'), 'Value'].sum()
                             / g.loc[~g['Category'].isin(keep), 'Value'].sum()
                   )
            .rsub(1)
          )
# update only the rows whose category is not in keep
df.loc[~df['Category'].isin(keep), 'Value'] *= df['Empl_Id'].map(factor)
output:
Empl_Id Category Value
0 1 MILE 43.000000
1 1 REST 0.700000
2 1 OT 6.330000
3 1 TRVL 2.360852
4 1 REG 45.979148
5 2 ADMIN 22.000000
6 2 REST 1.170000
7 2 REG 16.500000
8 3 MILE 73.600000
9 3 OT 4.750000
10 3 TRVL 1.223069
11 3 REST 2.500000
12 3 MAT 5.057803
13 3 REG 48.049128
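The same per-employee factor can also be built without apply, by broadcasting group sums back onto the rows with groupby(...).transform('sum'). This is a minimal sketch of that variant, assuming the data from the question; the names ot, base and mask are illustrative and not part of the answer above:

```python
import pandas as pd

data = {'Empl_Id': [1, 1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3, 3, 3],
        'Category': ["MILE", "REST", "OT", "TRVL", "REG", "ADMIN", "REST", "REG",
                     "MILE", "OT", "TRVL", "REST", "MAT", "REG"],
        'Value': [43, 0.7, 6.33, 2.67, 52, 22, 1.17, 16.5, 73.6, 4.75, 1.33, 2.5, 5.5, 52.25]}
df = pd.DataFrame(data)

keep = ['OT', 'MILE', 'REST']
mask = ~df['Category'].isin(keep)          # rows that should be updated

# Per-row copy of each employee's OT hours (0 where the employee has no OT row)
ot = df['Value'].where(df['Category'].eq('OT'), 0).groupby(df['Empl_Id']).transform('sum')
# Per-row copy of each employee's total over the categories to update
base = df['Value'].where(mask, 0).groupby(df['Empl_Id']).transform('sum')

factor = 1 - ot / base
df.loc[mask, 'Value'] = df.loc[mask, 'Value'] * factor[mask]
print(df)
```

Employees without an OT row get a factor of 1, so their values are left as they were.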

How to add column for every month and generate number i.e. 1,2,3..etc

I have a huge CSV file loaded as a dataframe. However, it has no date column; it only has the sales for every month from Jan-2022 until Dec-2034. Below is an example of my dataframe:
import pandas as pd
data = [[6661, 'Mobile Phone', 43578, 5000, 78564, 52353, 67456, 86965, 43634, 32546, 56332, 58944, 98878, 68588, 43634, 3463, 74533, 73733, 64436, 45426, 57333, 89762, 4373, 75457, 74845, 86843, 59957, 74563, 745335, 46342, 463473, 52352, 23622],
[6672, 'Play Station', 4475, 2546, 5757, 2352, 57896, 98574, 53536, 56533, 88645, 44884, 76585, 43575, 74573, 75347, 57573, 5736, 53737, 35235, 5322, 54757, 74573, 75473, 77362, 21554, 73462, 74736, 1435, 4367, 63462, 32362, 56332],
[6631, 'Laptop', 35347, 36376, 164577, 94584, 78675, 76758, 75464, 56373, 56343, 54787, 7658, 76584, 47347, 5748, 8684, 75373, 57573, 26626, 25632, 73774, 847373, 736646, 847457, 57346, 43732, 347346, 75373, 6473, 85674, 35743, 45734],
[6600, 'Camera', 14365, 60785, 25436, 46747, 75456, 97644, 63573, 56433, 25646, 32548, 14325, 64748, 68458, 46537, 7537, 46266, 7457, 78235, 46223, 8747, 67453, 4636, 3425, 4636, 352236, 6622, 64625, 36346, 46346, 35225, 6436],
[6643, 'Lamp', 324355, 143255, 696954, 97823, 43657, 66686, 56346, 57563, 65734, 64484, 87685, 54748, 9868, 573, 73472, 5735, 73422, 86352, 5325, 84333, 7473, 35252, 7547, 73733, 7374, 32266, 654747, 85743, 57333, 46346, 46266]]
ds = pd.DataFrame(data, columns = ['ID', 'Product', 'SalesJan-22', 'SalesFeb-22', 'SalesMar-22', 'SalesApr-22', 'SalesMay-22', 'SalesJun-22', 'SalesJul-22', 'SalesAug-22', 'SalesSep-22', 'SalesOct-22', 'SalesNov-22', 'SalesDec-22', 'SalesJan-23', 'SalesFeb-23', 'SalesMar-23', 'SalesApr-23', 'SalesMay-23', 'SalesJun-23', 'SalesJul-23', 'SalesAug-23', 'SalesSep-23', 'SalesOct-23', 'SalesNov-23', 'SalesDec-23', 'SalesJan-24', 'SalesFeb-24', 'SalesMar-24', 'SalesApr-24', 'SalesMay-24', 'SalesJun-24', 'SalesJul-24'])
Since I have far more than 10 monthly sales columns, I want to loop over them and add a date column after each monthly sales column. The first 6 months should get the number 1, the next 12 months the number 2, the following 12 months the number 3, the 12 months after that the number 4, and so on.
Is there any way to perform the loop and add the date column beside each monthly sales column?

Here is the simplest approach I can think of:
# str.removeprefix requires Python 3.9+
for i, col in enumerate(ds.columns[2:]):
    ds.insert(2 * i + 2, col.removeprefix("Sales"), (i - 6) // 12 + 2)

Here is a vectorized approach (using insert repeatedly is inefficient):
import numpy as np
# convert (valid) columns to datetime
cols = pd.to_datetime(ds.columns, format='Sales%b-%y', errors='coerce')
# identify valid dates
m = cols.notna()
# get year
y = cols[m].year
# calculate number (1 for first 6 months, then +1 per 12 months)
num = ((cols[m].month+12*(y-y.min()))+5)//12+1
# slice dates columns, assign the number, rename
df2 = (ds.loc[:, m].assign(**dict(zip(ds.columns[m], num)))
         .rename(columns=lambda x: x[5:])
       )
# get new order of columns
idx = np.r_[np.zeros((~m).sum()), np.tile(np.arange(m.sum()), 2)+1]
# concat and reorder (a stable sort keeps each sales column directly before its number column)
out = pd.concat([ds, df2], axis=1).iloc[:, np.argsort(idx, kind='stable')]
print(out)
output:
ID Product SalesJan-22 Jan-22 SalesFeb-22 Feb-22 SalesMar-22 Mar-22 SalesApr-22 Apr-22 SalesMay-22 May-22 SalesJun-22 Jun-22 SalesJul-22 Jul-22 SalesAug-22 Aug-22 SalesSep-22 Sep-22 SalesOct-22 Oct-22 SalesNov-22 Nov-22 SalesDec-22 Dec-22 SalesJan-23 Jan-23 SalesFeb-23 Feb-23 SalesMar-23 Mar-23 SalesApr-23 Apr-23 SalesMay-23 May-23 SalesJun-23 Jun-23 SalesJul-23 Jul-23 SalesAug-23 Aug-23 SalesSep-23 Sep-23 SalesOct-23 Oct-23 SalesNov-23 Nov-23 SalesDec-23 Dec-23 SalesJan-24 Jan-24 SalesFeb-24 Feb-24 SalesMar-24 Mar-24 SalesApr-24 Apr-24 SalesMay-24 May-24 SalesJun-24 Jun-24 SalesJul-24 Jul-24
0 6661 Mobile Phone 43578 1 5000 1 78564 1 52353 1 67456 1 86965 1 43634 2 32546 2 56332 2 58944 2 98878 2 68588 2 43634 2 3463 2 74533 2 73733 2 64436 2 45426 2 57333 3 89762 3 4373 3 75457 3 74845 3 86843 3 59957 3 74563 3 745335 3 46342 3 463473 3 52352 3 23622 4
1 6672 Play Station 4475 1 2546 1 5757 1 2352 1 57896 1 98574 1 53536 2 56533 2 88645 2 44884 2 76585 2 43575 2 74573 2 75347 2 57573 2 5736 2 53737 2 35235 2 5322 3 54757 3 74573 3 75473 3 77362 3 21554 3 73462 3 74736 3 1435 3 4367 3 63462 3 32362 3 56332 4
2 6631 Laptop 35347 1 36376 1 164577 1 94584 1 78675 1 76758 1 75464 2 56373 2 56343 2 54787 2 7658 2 76584 2 47347 2 5748 2 8684 2 75373 2 57573 2 26626 2 25632 3 73774 3 847373 3 736646 3 847457 3 57346 3 43732 3 347346 3 75373 3 6473 3 85674 3 35743 3 45734 4
3 6600 Camera 14365 1 60785 1 25436 1 46747 1 75456 1 97644 1 63573 2 56433 2 25646 2 32548 2 14325 2 64748 2 68458 2 46537 2 7537 2 46266 2 7457 2 78235 2 46223 3 8747 3 67453 3 4636 3 3425 3 4636 3 352236 3 6622 3 64625 3 36346 3 46346 3 35225 3 6436 4
4 6643 Lamp 324355 1 143255 1 696954 1 97823 1 43657 1 66686 1 56346 2 57563 2 65734 2 64484 2 87685 2 54748 2 9868 2 573 2 73472 2 5735 2 73422 2 86352 2 5325 3 84333 3 7473 3 35252 3 7547 3 73733 3 7374 3 32266 3 654747 3 85743 3 57333 3 46346 3 46266 4

Here's a small solution (I used the year instead of your 1, 2, ... numbering since I thought it was more representative, but you can change it easily):
idx_counter = 0
for idx, col in enumerate(ds.columns):
    if col.startswith('Sales'):
        date = col.replace('Sales', '')
        year = col.split('-')[1]
        ds.insert(loc=idx + 1 + idx_counter, column=date, value=[year] * ds.shape[0])
        idx_counter += 1
output:
ID Product SalesJan-22 Jan-22 SalesFeb-22 Feb-22 SalesMar-22 Mar-22 SalesApr-22 Apr-22 ... SalesMar-24 Mar-24 SalesApr-24 Apr-24 SalesMay-24 May-24 SalesJun-24 Jun-24 SalesJul-24 Jul-24
0 6661 Mobile Phone 43578 22 5000 22 78564 22 52353 22 ... 745335 24 46342 24 463473 24 52352 24 23622 24
1 6672 Play Station 4475 22 2546 22 5757 22 2352 22 ... 1435 24 4367 24 63462 24 32362 24 56332 24
2 6631 Laptop 35347 22 36376 22 164577 22 94584 22 ... 75373 24 6473 24 85674 24 35743 24 45734 24
3 6600 Camera 14365 22 60785 22 25436 22 46747 22 ... 64625 24 36346 24 46346 24 35225 24 6436 24
4 6643 Lamp 324355 22 143255 22 696954 22 97823 22 ... 654747 24 85743 24 57333 24 46346 24 46266 24

This should do the trick.
import math
new_cols = []
old_cols = [x for x in ds.columns if x.startswith('Sales')]
for i, col in enumerate(old_cols):
    new_cols.append(col[5:])
    if i < 6:
        val = 1
    else:
        val = ((i+6)/12)+1
    ds[col[5:]] = math.floor(val)
ds[['ID', 'Product'] + [x for y in zip(old_cols, new_cols) for x in y]]
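The numbering rule itself (1 for the first 6 months, then +1 for every further 12 months) reduces to one integer division, (i + 6) // 12 + 1, where i is the zero-based position of the month. A minimal sketch, assuming a small made-up stand-in for ds with only 8 monthly columns; the names flags and order are illustrative:

```python
import pandas as pd

# Made-up stand-in for the question's `ds` (8 monthly columns, dummy numbers)
months = ['Jan-22', 'Feb-22', 'Mar-22', 'Apr-22', 'May-22', 'Jun-22', 'Jul-22', 'Aug-22']
ds = pd.DataFrame({'ID': [1, 2], 'Product': ['A', 'B'],
                   **{f'Sales{m}': [10 * k, 20 * k] for k, m in enumerate(months, start=1)}})

sales_cols = [c for c in ds.columns if c.startswith('Sales')]

# One constant column per month: 1 for positions 0-5, 2 for 6-17, 3 for 18-29, ...
flags = pd.DataFrame({c[len('Sales'):]: (i + 6) // 12 + 1 for i, c in enumerate(sales_cols)},
                     index=ds.index)

# Interleave: each sales column followed by its number column
order = ['ID', 'Product'] + [name for c in sales_cols for name in (c, c[len('Sales'):])]
out = pd.concat([ds, flags], axis=1)[order]
print(out)
```

Building all new columns first and concatenating once avoids repeated inserts into a wide frame.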

How to split dataframe into multiple dataframes based on column-name?

I have a dataframe with columns like this:
['id', 't_dur0', 't_dur1', 't_dur2', 't_dance0', 't_dance1', 't_dance2', 't_energy0',
't_energy1', 't_energy2']
And I have code that returns the average of three columns that share a name:
# Takes in a dataframe with three columns and returns a dataframe with one column of their means as floats
def average_column(dataframe):
    dataframe = dataframe.copy() # To avoid SettingWithCopyWarning
    # Create new column name without the "t_" prefix and the trailing digit
    temp = dataframe.columns.tolist()[0]
    col_name = temp[2:-1]
    dataframe[col_name] = dataframe.mean(axis=1) # Add column to the dataframe (axis=1 means the mean() is applied row-wise)
    mean_df = dataframe.iloc[: , -1:] # Isolated column of the mean by selecting all rows (:) for the last column (-1:)
    print("Original:\n{}\nAverage columns:\n{}".format(dataframe, mean_df))
    return mean_df.astype(float)
This function gives me this output:
Original:
t_dance0 t_dance1 t_dance2 dance
0 0.549 0.623 0.5190 0.563667
1 0.871 0.702 0.4160 0.663000
2 0.289 0.328 0.2340 0.283667
3 0.886 0.947 0.8260 0.886333
4 0.724 0.791 0.7840 0.766333
... ... ... ... ...
Average columns:
dance
0 0.563667
1 0.663000
2 0.283667
3 0.886333
4 0.766333
... ...
I asked this question about how I can split it into unique and duplicate columns, which led me to this code:
# Function that splits dataframe into two separate dataframes, one with all unique columns and one with all duplicates
def sub_dataframes(dataframe):
    # Extract common prefix -> remove trailing digits
    cols = dataframe.columns.str.replace(r'\d*$', '', regex=True).to_series().value_counts()
    # Split columns
    unq_cols = cols[cols == 1].index
    dup_cols = dataframe.columns[~dataframe.columns.isin(unq_cols)] # All columns from dataframe that are not in unq_cols
    return dataframe[unq_cols], dataframe[dup_cols]
unq_df = sub_dataframes(df)[0]
dup_df = sub_dataframes(df)[1]
print("Unique columns:\n\n{}\n\nDuplicate columns:\n\n{}".format(unq_df, dup_df))
Which gives me this output:
Unique columns:
id
0 22352
1 106534
2 23608
3 8655
4 49670
... ...
Duplicate columns:
t_dur0 t_dur1 t_dur2 t_dance0 t_dance1 t_dance2
0 292720 293760.0 292733.0 0.549 0.623 0.5190
1 213760 181000.0 245973.0 0.871 0.702 0.4160
2 157124 130446.0 152450.0 0.289 0.328 0.2340
3 127896 176351.0 166968.0 0.886 0.947 0.8260
4 210320 226253.0 211880.0 0.724 0.791 0.7840
... ... ... ... ... ... ...
2828 70740 262400.0 220680.0 0.224 0.609 0.7110
2829 252226 222400.0 214973.0 0.526 0.623 0.4820
2830 269146 251560.0 172760.0 0.551 0.756 0.7820
2831 344764 425613.0 249652.0 0.473 0.572 0.8230
2832 210955 339869.0 304124.0 0.112 0.523 0.0679
I have tried to combine these functions into another function that takes in a dataframe and returns it with all duplicate columns replaced by their mean, but I have trouble splitting dup_df into smaller dataframes. Is there a simpler way to do this?
An example of the desired output:
Original:
total_tracks t_dur0 t_dur1 t_dur2 t_dance0 t_dance1 t_dance2 \
0 4 292720 293760.0 292733.0 0.549 0.623 0.5190
1 12 213760 181000.0 245973.0 0.871 0.702 0.4160
2 59 157124 130446.0 152450.0 0.289 0.328 0.2340
3 8 127896 176351.0 166968.0 0.886 0.947 0.8260
4 17 210320 226253.0 211880.0 0.724 0.791 0.7840
... ... ... ... ... ... ... ...
After function:
total_tracks popularity duration dance
0 4 21 293071.000000 0.563667
1 12 14 213577.666667 0.663000
2 59 41 146673.333333 0.283667
3 8 1 157071.666667 0.886333
4 17 47 216151.000000 0.766333
... ... ... ...

Use wide_to_long to reshape the original DataFrame first, and then aggregate with mean:
cols = ['total_tracks']
# drop(columns=...) replaces the positional axis argument, which newer pandas versions no longer accept
df1 = (pd.wide_to_long(df,
                       stubnames=['t_dur','t_dance'],
                       i=cols,
                       j='tmp')
         .reset_index()
         .drop(columns='tmp')
         .groupby(cols, as_index=False)
         .mean())
print (df1)
total_tracks t_dur t_dance
0 4 293071.000000 0.563667
1 8 157071.666667 0.886333
2 12 213577.666667 0.663000
3 17 216151.000000 0.766333
4 59 146673.333333 0.283667
Details:
cols = ['total_tracks']
print(pd.wide_to_long(df,
stubnames=['t_dur','t_dance'],
i=cols,
j='tmp'))
t_dur t_dance
total_tracks tmp
4 0 292720.0 0.549
12 0 213760.0 0.871
59 0 157124.0 0.289
8 0 127896.0 0.886
17 0 210320.0 0.724
4 1 293760.0 0.623
12 1 181000.0 0.702
59 1 130446.0 0.328
8 1 176351.0 0.947
17 1 226253.0 0.791
4 2 292733.0 0.519
12 2 245973.0 0.416
59 2 152450.0 0.234
8 2 166968.0 0.826
17 2 211880.0 0.784
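Another way to average columns that share a prefix, sketched here as an alternative rather than as part of the answer above, is to group the transposed frame by the column names with their trailing digits removed. It assumes all columns are numeric; the tiny frame below is made up for illustration:

```python
import pandas as pd

# Made-up miniature of the question's frame
df = pd.DataFrame({'id': [1, 2],
                   't_dur0': [10.0, 20.0], 't_dur1': [20.0, 40.0], 't_dur2': [30.0, 60.0],
                   't_dance0': [0.1, 0.4], 't_dance1': [0.2, 0.5], 't_dance2': [0.3, 0.6]})

# One grouping key per column: the name without its trailing digits
keys = df.columns.str.replace(r'\d+$', '', regex=True).to_numpy()

# Transpose so columns become rows, average within each key, transpose back
means = df.T.groupby(keys, sort=False).mean().T
print(means)
```

Columns whose name occurs only once (here id) form a group of one and pass through unchanged, although they come back as floats because of the transpose.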

How to create Triangular moving average in python using for loop

I use Python pandas to calculate the following formula:
(https://i.stack.imgur.com/XIKBz.png)
I do it in Python like this:
EURUSD['SMA2'] = EURUSD['Close'].rolling(2).mean()
EURUSD['TMA2'] = (EURUSD['Close'] + EURUSD['SMA2']) / 2
The problem is that the code gets long when I calculate TMA 100, so I need to use a for loop to make it easy to change the TMA period.
Thanks in advance
Edited:
I found this code, but it gives an error:
values = []
for i in range(1,201): values.append(eurusd['Close']).rolling(window=i).mean() values.mean()

A TMA is an average of averages.
import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.rand(10, 5))
print(df)
# df['mean0']=df.mean(0)
df['mean1']=df.mean(1)
print(df)
df['TMA'] = df['mean1'].rolling(window=10,center=False).mean()
print(df)
Or you can easily print it.
print(df["mean1"].mean())
Here is how it looks:
0 1 2 3 4
0 0.643560 0.412046 0.072525 0.618968 0.080146
1 0.018226 0.222212 0.077592 0.125714 0.595707
2 0.652139 0.907341 0.581802 0.021503 0.849562
3 0.129509 0.315618 0.711265 0.812318 0.757575
4 0.881567 0.455848 0.470282 0.367477 0.326812
5 0.102455 0.156075 0.272582 0.719158 0.266293
6 0.412049 0.527936 0.054381 0.587994 0.442144
7 0.063904 0.635857 0.244050 0.002459 0.423960
8 0.446264 0.116646 0.990394 0.678823 0.027085
9 0.951547 0.947705 0.080846 0.848772 0.699036
0 1 2 3 4 mean1
0 0.643560 0.412046 0.072525 0.618968 0.080146 0.365449
1 0.018226 0.222212 0.077592 0.125714 0.595707 0.207890
2 0.652139 0.907341 0.581802 0.021503 0.849562 0.602470
3 0.129509 0.315618 0.711265 0.812318 0.757575 0.545257
4 0.881567 0.455848 0.470282 0.367477 0.326812 0.500397
5 0.102455 0.156075 0.272582 0.719158 0.266293 0.303313
6 0.412049 0.527936 0.054381 0.587994 0.442144 0.404901
7 0.063904 0.635857 0.244050 0.002459 0.423960 0.274046
8 0.446264 0.116646 0.990394 0.678823 0.027085 0.451842
9 0.951547 0.947705 0.080846 0.848772 0.699036 0.705581
0 1 2 3 4 mean1 TMA
0 0.643560 0.412046 0.072525 0.618968 0.080146 0.365449 NaN
1 0.018226 0.222212 0.077592 0.125714 0.595707 0.207890 NaN
2 0.652139 0.907341 0.581802 0.021503 0.849562 0.602470 NaN
3 0.129509 0.315618 0.711265 0.812318 0.757575 0.545257 NaN
4 0.881567 0.455848 0.470282 0.367477 0.326812 0.500397 NaN
5 0.102455 0.156075 0.272582 0.719158 0.266293 0.303313 NaN
6 0.412049 0.527936 0.054381 0.587994 0.442144 0.404901 NaN
7 0.063904 0.635857 0.244050 0.002459 0.423960 0.274046 NaN
8 0.446264 0.116646 0.990394 0.678823 0.027085 0.451842 NaN
9 0.951547 0.947705 0.080846 0.848772 0.699036 0.705581 0.436115
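To make the period easy to change, the "average of averages" idea can be wrapped in a small function and called in a loop. This is a minimal sketch, assuming the common definition of a triangular moving average as a simple moving average of a simple moving average with the same window; the formula in the question's image is not reproduced here, so it may differ. The function name triangular_ma and the sample prices are illustrative:

```python
import pandas as pd

def triangular_ma(close, period):
    """Simple moving average of the simple moving average (same window twice)."""
    sma = close.rolling(window=period).mean()
    return sma.rolling(window=period).mean()

# Made-up closing prices standing in for EURUSD['Close']
prices = pd.DataFrame({'Close': [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]})

# One TMA column per period, built in a loop
for period in (2, 3):
    prices[f'TMA{period}'] = triangular_ma(prices['Close'], period)

print(prices)
```

The first 2 * (period - 1) rows of each TMA column are NaN, because both rolling windows need to fill before a value can be produced.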

Problem to implement count, groupby, np.repeat and agg with pandas

I have a pandas DataFrame similar to this:
df = pd.DataFrame({'x': np.random.rand(61800), 'y':np.random.rand(61800), 'z':np.random.rand(61800)})
I need to process my dataset to get the following result:
extract = df.assign(count=np.repeat(range(10),10)).groupby('count',as_index=False).agg(['mean','min', 'max'])
But if I use np.repeat(range(150),150), I get this error:

This doesn't work because the .assign you're performing needs to have enough values to fit the original dataframe:
In [81]: df = pd.DataFrame({'x': np.random.rand(61800), 'y':np.random.rand(61800), 'z':np.random.rand(61800)})
In [82]: df.assign(count=np.repeat(range(10),10))
ValueError: Length of values does not match length of index
In this case, everything works fine if we do 10 groups repeated 6,180 times:
In [83]: df.assign(count=np.repeat(range(10),6180))
Out[83]:
x y z count
0 0.781364 0.996545 0.756592 0
1 0.609127 0.981688 0.626721 0
2 0.547029 0.167678 0.198857 0
3 0.184405 0.484623 0.219722 0
4 0.451698 0.535085 0.045942 0
... ... ... ... ...
61795 0.783192 0.969306 0.974836 9
61796 0.890720 0.286384 0.744779 9
61797 0.512688 0.945516 0.907192 9
61798 0.526564 0.165620 0.766733 9
61799 0.683092 0.976219 0.524048 9
[61800 rows x 4 columns]
In [84]: extract = df.assign(count=np.repeat(range(10),6180)).groupby('count',as_index=False).agg(['mean','min', 'max'])
In [85]: extract
Out[85]:
x y z
mean min max mean min max mean min max
count
0 0.502338 0.000230 0.999546 0.501603 0.000263 0.999842 0.503807 0.000113 0.999826
1 0.500392 0.000059 0.999979 0.499935 0.000012 0.999767 0.500114 0.000230 0.999811
2 0.498377 0.000023 0.999832 0.496921 0.000003 0.999475 0.502887 0.000028 0.999828
3 0.504970 0.000637 0.999680 0.500943 0.000256 0.999902 0.497370 0.000257 0.999969
4 0.501195 0.000290 0.999992 0.498617 0.000149 0.999779 0.497895 0.000022 0.999877
5 0.499476 0.000186 0.999956 0.503227 0.000308 0.999907 0.504688 0.000100 0.999756
6 0.495488 0.000378 0.999606 0.499893 0.000119 0.999740 0.495924 0.000031 0.999556
7 0.498443 0.000005 0.999417 0.495728 0.000262 0.999972 0.501255 0.000087 0.999978
8 0.494110 0.000014 0.999888 0.495197 0.000074 0.999970 0.493215 0.000166 0.999718
9 0.496333 0.000365 0.999307 0.502074 0.000110 0.999856 0.499164 0.000035 0.999927
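If the number of groups is what you want to choose (for example 150), the group labels can be derived from the row positions instead of hand-picking the repeat count. A minimal sketch, assuming the row count divides evenly by the number of groups (61800 / 150 = 412); n_groups, rows_per_group and labels are illustrative names:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame({'x': rng.random(61800), 'y': rng.random(61800), 'z': rng.random(61800)})

n_groups = 150
rows_per_group = len(df) // n_groups            # 412 rows in each group

# Row position divided by the group size gives labels 0, 0, ..., 1, 1, ..., 149
labels = np.arange(len(df)) // rows_per_group

extract = df.assign(count=labels).groupby('count').agg(['mean', 'min', 'max'])
print(extract.shape)
```

Because the labels array always has exactly len(df) entries, the length mismatch cannot occur; if the row count does not divide evenly, the leftover rows end up in one extra, smaller group.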
