Count values in column and assign to row - python

I have a dataframe like this:
   dT_sampleTime  steps
0          0.002  0.001
1          0.004  0.002
2          0.004  0.003
3          0.004  0.004
4          0.003  0.005
5          0.007  0.006
6          0.001  0.007
and I want to count how often each value of steps occurs in the dT_sampleTime column, storing the count in a new column absolute frequency:
   dT_sampleTime  steps  absolute frequency
0          0.002  0.001                   1
1          0.004  0.002                   1
2          0.004  0.003                   1
3          0.004  0.004                   3
4          0.003  0.005                   0
5          0.007  0.006                   0
6          0.001  0.007                   1
My idea was something like this, which doesn't work:
df['absolute frequency'] = df.groupby(df['steps'],df['dT_sampleTime']).count

Map the 'steps' column with the value_counts of the 'dT_sampleTime' column, then fill the missing values with 0:
df['absolute frequency'] = (df['steps'].map(df['dT_sampleTime'].value_counts())
                                       .fillna(0, downcast='infer'))
#    dT_sampleTime  steps  absolute frequency
# 0          0.002  0.001                   1
# 1          0.004  0.002                   1
# 2          0.004  0.003                   1
# 3          0.004  0.004                   3
# 4          0.003  0.005                   0
# 5          0.007  0.006                   0
# 6          0.001  0.007                   1
When mapping with a Series, map uses the Series' index to look up the appropriate value. The value_counts Series is
df['dT_sampleTime'].value_counts()
# 0.004    3
# 0.007    1
# 0.001    1
# 0.002    1
# 0.003    1
# Name: dT_sampleTime, dtype: int64
so 0.004 in the steps column maps to 3, for instance.
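A tiny self-contained illustration of that index lookup (the numbers here are made up for the example, not taken from the question):
import pandas as pd

lookup = pd.Series({0.004: 3, 0.002: 1})   # index -> value
s = pd.Series([0.002, 0.004, 0.005])
print(s.map(lookup))   # 1.0, 3.0, NaN -- 0.005 is not in the lookup index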

Loop across df and, for each row:

1. Use the row's steps value as a filter on the dT_sampleTime column.
2. The number of rows in the resulting DataFrame is the absolute frequency of that steps value within the dT_sampleTime column.
3. Write this value to the current row in the absolute frequency field.
for i, row in df.iterrows():
    df.loc[i, 'absolute frequency'] = len(df[df['dT_sampleTime'] == row['steps']])
Resulting df based on the example given in your original question:
   dT_sampleTime  steps  absolute frequency
0          0.002  0.001                 1.0
1          0.004  0.002                 1.0
2          0.004  0.003                 1.0
3          0.004  0.004                 3.0
4          0.003  0.005                 0.0
5          0.007  0.006                 0.0
6          0.001  0.007                 1.0
I'm not sure this is the most efficient way to achieve your ends, but it works well and should be suitable for your purpose. Happy to take feedback from anybody who knows a better approach.
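One small caveat: building the column row by row with .loc produces floats, as seen above. If you prefer integer counts, a cast afterwards should do it:
df['absolute frequency'] = df['absolute frequency'].astype(int)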

Related

Plotly: How to create a line animation with column names on the x axis and column data on the y axis?

I have a data frame as shown below.
Device_ID Die_Version Temp(deg) sup(V) freq sensitivity THD_94 THD_100 THD_105 THD_110 THD_112 THD_114 THD_115 THD_116 THD_118 THD_120
TTM_041 0x16 -40 1.8 0.8 -25.041 0.009 0.01 0.071 0.206 0.143 0.099 0.1 0.296 4.243 11.888
TTM_041 0x16 -40 1.8 2.4 -25.041 0.009 0.01 0.075 0.206 0.143 0.1 0.101 0.245 4.495 11.728
TTM_041 0x16 -40 1.98 0.8 -25.04 0.009 0.01 0.076 0.207 0.143 0.1 0.102 0.313 4.484 11.844
I need to plot the graph in such a way that the column names (THD_94, THD_100, THD_105, THD_110, THD_112, THD_114, THD_115, THD_116, THD_118, THD_120) appear on the X axis and their values on the Y axis.
I tried the code below, but it is not working as expected.
fig = px.line(df_MM_SPEC, x=px.Constant('col'), y=['THD_94', 'THD_100'], animation_frame='Device_ID')
# fig.update_layout(barmode='group')
fig.show()
reshaped_df = df[[col for col in df.columns if 'THD' in col]].T.stack().reset_index()
gives us some reshaped data that looks like this:
    level_0  level_1       0
0    THD_94        0   0.009
1    THD_94        1   0.009
2    THD_94        2   0.009
3   THD_100        0   0.010
4   THD_100        1   0.010
5   THD_100        2   0.010
6   THD_105        0   0.071
7   THD_105        1   0.075
8   THD_105        2   0.076
9   THD_110        0   0.206
10  THD_110        1   0.206
11  THD_110        2   0.207
12  THD_112        0   0.143
13  THD_112        1   0.143
14  THD_112        2   0.143
15  THD_114        0   0.099
16  THD_114        1   0.100
17  THD_114        2   0.100
18  THD_115        0   0.100
19  THD_115        1   0.101
20  THD_115        2   0.102
21  THD_116        0   0.296
22  THD_116        1   0.245
23  THD_116        2   0.313
24  THD_118        0   4.243
25  THD_118        1   4.495
26  THD_118        2   4.484
27  THD_120        0  11.888
28  THD_120        1  11.728
29  THD_120        2  11.844
It might be wise to rename your columns to something more logical, but I'll leave that to the reader. With the reshaped data, it's pretty trivial to animate:
px.line(reshaped_df, x='level_0', y=0, animation_frame='level_1')
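If you'd rather have friendlier column names than level_0 and level_1, here is a sketch of the same reshape using melt, assuming df is the dataframe from the question ('index' plays the role of level_1 above):
import plotly.express as px

thd_cols = [col for col in df.columns if 'THD' in col]
reshaped_df = df[thd_cols].reset_index().melt(id_vars='index',
                                              var_name='THD',
                                              value_name='value')
px.line(reshaped_df, x='THD', y='value', animation_frame='index').show()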

Groupby: how to compute a transformation and division of every value by group

I have a dataframe like this:
participant  time1  time2  ...  time27
          1  0.003  0.001       0.003
          1  0.003  0.002       0.001
          1  0.006  0.003       0.003
          1  0.003  0.001       0.003
          2  0.003  0.003       0.001
          2  0.003  0.003       0.001
          3  0.006  0.003       0.003
          3  0.007  0.044       0.006
          3  0.000  0.005       0.007
I need to perform a transformation using np.log1p() and divide every value by the maximum of its participant:
log(X + 1) / X_max
How can I do this?
You can use:
df.join(df.groupby('participant')
          .transform(lambda s: np.log1p(s) / s.max())
          .add_suffix('_trans'))
Output (as new columns):
   participant  time1  time2  time27  time1_trans  time2_trans  time27_trans
0            1  0.003  0.001   0.003     0.499251     0.333167      0.998503
1            1  0.003  0.002   0.001     0.499251     0.666001      0.333167
2            1  0.006  0.003   0.003     0.997012     0.998503      0.998503
3            1  0.003  0.001   0.003     0.499251     0.333167      0.998503
4            2  0.003  0.003   0.001     0.998503     0.998503      0.999500
5            2  0.003  0.003   0.001     0.998503     0.998503      0.999500
6            3  0.006  0.003   0.003     0.854582     0.068080      0.427930
7            3  0.007  0.044   0.006     0.996516     0.978625      0.854582
8            3  0.000  0.005   0.007     0.000000     0.113353      0.996516
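For reference, the same result can be computed fully vectorized, without a lambda (a sketch, with numpy imported as np as in the answer above):
cols = df.columns.drop('participant')
trans = np.log1p(df[cols]).div(df.groupby('participant')[cols].transform('max'))
out = df.join(trans.add_suffix('_trans'))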

Python: for loop iterations when adding dataframes

I have a dataframe with different returns looking something like:
0.2 -0.1 0.03 0.01
0.02 0.1 -0.1 -0.2
0.05 0.06 0.07 -0.07
0.03 -0.04 -0.04 -0.03
And I have a separate dataframe with the index returns in only one column:
0.01
0.015
-0.01
-0.02
What I want to do is basically add (+) each row value of the index-return dataframe to every value in the corresponding row of the stock-return dataframe.
The desired outcome looks like:
0.21 -0.09
0.035 0.115
0.04 0.05
0.01 -0.06 etc etc
In Matlab, for example, the for loop would be quite simple, but in Python the indexing is what gets me stuck.
I have tried a simple for loop:
for i, j in df_stock_returns.iterrows():
    df_new = df_stock_returns[i, j] + df_index_returns[j]
But that doesn't really work; any help is appreciated!
Assuming you have
In [27]: df
Out[27]:
0 1 2 3
0 0.20 -0.10 0.03 0.01
1 0.02 0.10 -0.10 -0.20
2 0.05 0.06 0.07 -0.07
3 0.03 -0.04 -0.04 -0.03
and
In [28]: dfi
Out[28]:
0
0 0.010
1 0.015
2 -0.010
3 -0.020
you can just write
In [26]: pd.concat([df[c] + dfi[0] for c in df], axis=1)
Out[26]:
0 0 1 2
0 0.210 -0.090 0.040 0.020
1 0.035 0.115 -0.085 -0.185
2 0.040 0.050 0.060 -0.080
3 0.010 -0.060 -0.060 -0.050
In pandas you almost never need to iterate over individual cells. Here I just iterated over the columns, and df[c] + dfi[0] adds the two columns element-wise. Then concat with axis=1 (0=rows, 1=columns) just concatenates everything into one dataframe.
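For completeness, the same row-wise broadcast can also be written without any loop or concat, since DataFrame.add accepts an axis argument controlling how a Series is aligned:
result = df.add(dfi[0], axis=0)
With axis=0, dfi[0] is matched against the row index and added down every column.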
I suppose the most straightforward way will work, as long as b is the index returns as a Series (e.g. b = dfi[0]) so that pandas aligns it on the row index:
b = dfi[0]
for c in a.columns:
    a[c] = a[c] + b
>>> a
       0      1      2      3
0  0.210 -0.090  0.040  0.020
1  0.035  0.115 -0.085 -0.185
2  0.040  0.050  0.060 -0.080
3  0.010 -0.060 -0.060 -0.050
You can simply add the two dataframes as below:
col1 = [0.2, 0.02]
col2 = [-0.1, 0.2]
col3 = [0.01, 0.015]
df1 = pd.DataFrame(data=list(zip(col1, col2)), columns=['list1', 'list2'])
df2 = pd.DataFrame({'list3': col3})
output = df1 + df2['list3'].values[:, None]
Reshaping the values into a column vector with [:, None] makes the addition broadcast down the rows, adding one index return per row rather than one per column.

Pandas: interpolate a dataframe and replace values

For each column of a dataframe, I did an interpolation using the pandas interpolate function, and I'm trying to replace the values of the dataframe with the values of the interpolated curve (like a trend curve in Excel).
I have the following dataframe, named data
        0      1
0   0.000  0.002
1   0.001  0.002
2   0.001  0.003
3   0.003  0.004
4   0.003  0.005
5   0.003  0.005
6   0.004  0.006
7   0.005  0.006
8   0.006  0.007
9   0.006  0.007
10  0.007  0.008
11  0.007  0.009
12  0.008  0.010
13  0.008  0.010
14  0.010  0.012
I then did the following code:
for i in range(len(data.columns)):
    data[i].interpolate(method="polynomial", order=2, inplace=True)
I thought that inplace would replace the values, but it doesn't seem to work. Does someone know how to do that?
Thanks and have a good day :)
Try this,
import pandas as pd
import numpy as np
I created a mini text file with some crazy values so you can see how interpolate works. Note that interpolate only fills NaN values; rows that already have values are left untouched, which is why your call appeared to do nothing on a dataframe without NaNs.
The file looks like this:
0,1
0.0,.002
0.001,.3
NaN,NaN
4.003,NaN
.004,19
.005,234
NaN,444
1,777
Here is how to import and process your data:
df = pd.read_csv('datafile.txt', header=0)
for column in df:
    df[column].interpolate(method="polynomial", order=2, inplace=True)
print(df.head())
The dataframe now looks like this:
          0           1
0  0.000000    0.002000
1  0.001000    0.300000
2  2.943616  -30.768123
3  4.003000  -70.313176
4  0.004000   19.000000
5  0.005000  234.000000
6  0.616931  444.000000
7  1.000000  777.000000
Also, if you mean you want to interpolate between the points in your dataframe, that is something different. Something like that would be:
df1 = df.reindex(df.index.union(np.linspace(.11,.25,8)))
df1.interpolate('index')
the results of that look like,
0 1
0.00 0.00000 0.00200
0.11 0.00011 0.03478
0.13 0.00013 0.04074
0.15 0.00015 0.04670
0.17 0.00017 0.05266
0.19 0.00019 0.05862
0.21 0.00021 0.06458
0.23 0.00023 0.07054
0.25 0.00025 0.07650
1.00 0.00100 0.30000
It does in fact work with scipy.interpolate.UnivariateSpline.
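For reference, a minimal sketch of that spline approach, assuming data is the dataframe from the question and that a quadratic fit (k=2, analogous to order=2) is wanted:
from scipy.interpolate import UnivariateSpline

x = data.index.to_numpy(dtype=float)
for col in data.columns:
    spline = UnivariateSpline(x, data[col].to_numpy(), k=2)  # quadratic smoothing spline
    data[col] = spline(x)  # replace the column with the fitted curve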

Split time series to find annual maximum from a data frame in Pandas

I have a time series dataset as follows, where the first column is the year, the second the month, the third the day, and the subsequent columns are time series data:
1995 1 1 0.0 1.929 23.015 1.429 0.806 0.177 0.027
1995 1 2 0.000 1.097 12.954 0.000 0.196 0.361 0.233
1995 1 3 0.000 11.391 0.228 0.004 2.134 11.190 0.028
1995 1 4 0.504 0.373 0.197 0.333 5.894 0.003 0.098
1995 1 5 0.027 20.957 0.115 0.208 0.000 0.000 0.104
1995 1 6 0.043 9.952 0.042 2.499 1.406 0.000 0.748
.... . . ..... ..... ..... ..... ..... ..... .....
2000 12 31 50.98 23.23 98.78 34.23 34.54 45.54 34.21
I have read it using:
pd.read_csv('D:/test.csv')
I want to read the data columns and find the annual maximum value of each one. Any suggestion on how to group or split the data to find the annual maximum values for each variable would be helpful.
IIUC, given your sample dataframe df, you can read it with:
df = pd.read_csv('D:/test.csv', header=None)
and then group by column 0 (the year) and take the maximum of each column:
g = df.groupby(0).max()
This returns:
      1  2      3       4       5      6      7      8      9
0
1995  1  6  0.504  20.957  23.015  2.499  5.894  11.19  0.748
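If the month and day columns (labeled 1 and 2 after reading with header=None) should not appear in the result, drop them before grouping:
g = df.drop(columns=[1, 2]).groupby(0).max()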
