So I have a series that I want to cumsum, but the sum should start over every time I hit a 0, something like this:
    orig  wanted result
0      0              0
1      1              1
2      1              2
3      1              3
4      1              4
5      1              5
6      1              6
7      0              0
8      1              1
9      1              2
10     1              3
11     0              0
12     1              1
13     1              2
14     1              3
15     1              4
16     1              5
17     1              6
Any ideas? (pandas, pure Python, other)
Use df['orig'].eq(0).cumsum() to generate groups starting on each 0, then cumcount to get the increasing values:
df['result'] = df.groupby(df['orig'].eq(0).cumsum()).cumcount()
output:
orig wanted result result
0 0 0 0
1 1 1 1
2 1 2 2
3 1 3 3
4 1 4 4
5 1 5 5
6 1 6 6
7 0 0 0
8 1 1 1
9 1 2 2
10 1 3 3
11 0 0 0
12 1 1 1
13 1 2 2
14 1 3 3
15 1 4 4
16 1 5 5
17 1 6 6
Intermediate:
df['orig'].eq(0).cumsum()
0 1
1 1
2 1
3 1
4 1
5 1
6 1
7 2
8 2
9 2
10 2
11 3
12 3
13 3
14 3
15 3
16 3
17 3
Name: orig, dtype: int64
import pandas as pd

condition = df['orig'].eq(0)        # True at every reset point
df['reset'] = condition.cumsum()    # group id that increases at each 0
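For completeness, since the question also allows pure Python, here is a minimal loop-based sketch of the same reset-cumsum idea (the helper name is just for illustration):

def reset_cumsum(values):
    """Cumulative sum that starts over at every 0."""
    total = 0
    out = []
    for v in values:
        total = 0 if v == 0 else total + v
        out.append(total)
    return out

orig = [0, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1]
print(reset_cumsum(orig))
# [0, 1, 2, 3, 4, 5, 6, 0, 1, 2, 3, 0, 1, 2, 3, 4, 5, 6]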
I want to read the volume value from an mp3 file (filename.mp3) rather than by recording the audio as in this example:
import sounddevice as sd
import numpy as np

def print_sound(indata, outdata, frames, time, status):
    volume_norm = np.linalg.norm(indata) * 10
    print(int(volume_norm))

with sd.Stream(callback=print_sound):
    sd.sleep(10000)
output:
1
1
1
0
1
1
1
1
0
0
0
0
0
17
24
8
5
15
18
16
6
2
3
5
10
8
5
1
0
0
2
4
3
1
0
0
0
1
3
4
2
0
0
2
2
4
4
3
0
0
2
2
5
3
0
0
0
0
3
3
1
0
0
0
0
0
1
1
1
1
1
0
0
0
0
0
0
0
1
2
2
2
2
2
2
3
4
3
3
7
13
4
4
3
5
6
3
2
3
3
4
6
6
6
4
3
3
2
3
6
6
8
12
15
1
0
0
1
12
19
2
4
3
6
1
0
0
0
0
0
0
0
0
0
0
0
0
0
0
1
0
0
0
0
0
0
0
5
0
0
2
3
0
0
0
0
0
5
5
17
4
6
3
4
5
16
10
7
31
5
1
0
0
0
0
3
3
1
0
0
0
0
0
0
0
0
0
0
1
7
0
2
5
20
5
6
5
29
12
4
7
2
0
1
5
13
51
5
9
44
7
3
3
4
4
4
1
1
1
1
110
71
0
0
48
23
28
4
0
0
0
0
0
74
53
37
29
26
15
17
14
7
5
5
6
6
6
6
7
7
7
7
7
7
7
7
8
8
8
7
7
6
6
6
6
6
6
4
53
47
18
13
9
8
8
7
5
4
4
4
4
5
6
6
6
5
4
3
3
3
2
3
2
3
3
3
3
3
3
4
4
4
5
5
5
6
7
7
8
7
18
8
2
2
4
Try this:
install: pip install ffmpegio
then the following code should do what you want:
import ffmpegio
import numpy as np

# mp3file (e.g. 'filename.mp3') and frames (the block size) are assumed to be defined
with ffmpegio.open(mp3file, 'ra', blocksize=frames, sample_fmt='dbl') as file:
    for i, indata in enumerate(file):
        volume_norm = np.linalg.norm(indata) * 10
        n0 = i * frames  # starting sample index of this block
        t = np.arange(n0, n0 + indata.shape[0]) / file.sample_rate  # timestamps of the block's samples
        print(int(volume_norm))
The sample_fmt='dbl' argument makes indata a float array. If you want to keep the original sample type, remove the argument.
I'm the dev of the ffmpegio library. Let me know if you encounter any issues, and I'll fix them ASAP.
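A rough alternative sketch, in case going through ffmpeg/ffmpegio is not convenient: librosa (assumed installed) can decode the mp3 into a single array, and the same block-wise volume figure can be printed from it. frames below is a hypothetical block size and filename.mp3 is the file from the question.

import numpy as np
import librosa  # decodes mp3 via soundfile/audioread under the hood

y, sr = librosa.load('filename.mp3', sr=None, mono=True)  # whole file as one float array

frames = 1024  # hypothetical block size
for start in range(0, len(y), frames):
    block = y[start:start + frames]
    volume_norm = np.linalg.norm(block) * 10  # same volume measure as the sounddevice example
    print(int(volume_norm))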
I have the following column, named FailureLabel, in my pandas DataFrame:
ID FailureLabel
0 1 1
1 2 1
2 3 1
3 4 0
4 5 0
5 6 0
6 7 0
7 8 1
8 9 1
9 10 0
10 11 0
11 12 1
12 13 1
I would like to assign a unique_id to this column such that each 1 gets its own unique id, whereas each run of zeros plus the next 1 share a common unique_id.
I tried using the following code,
df['unique_id'] = (df['FailureLabel'] | (df['FailureLabel']!=df['FailureLabel'].shift())).cumsum()
which gives me the following output,
ID FailureLabel unique_id
0 1 1 1
1 2 1 2
2 3 1 3
3 4 0 4
4 5 0 4
5 6 0 4
6 7 0 4
7 8 1 5
8 9 1 6
9 10 0 7
10 11 0 7
11 12 1 8
12 13 1 9
But what I desire is,
ID FailureLabel unique_id
0 1 1 1
1 2 1 2
2 3 1 3
3 4 0 4
4 5 0 4
5 6 0 4
6 7 0 4
7 8 1 4
8 9 1 5
9 10 0 6
10 11 0 6
11 12 1 6
12 13 1 7
Use Series.shift, back-fill the first value, compare to 1, and take the cumulative sum:
df['unique_id'] = df['FailureLabel'].shift().bfill().eq(1).cumsum()
print (df)
ID FailureLabel unique_id
0 1 1 1
1 2 1 2
2 3 1 3
3 4 0 4
4 5 0 4
5 6 0 4
6 7 0 4
7 8 1 4
8 9 1 5
9 10 0 6
10 11 0 6
11 12 1 6
12 13 1 7
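To make the mechanics visible, here is a small self-contained sketch of the intermediate steps (re-creating the FailureLabel column from the question):

import pandas as pd

df = pd.DataFrame({'ID': range(1, 14),
                   'FailureLabel': [1, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 1, 1]})

shifted = df['FailureLabel'].shift().bfill()  # previous row's label, first value back-filled
starts = shifted.eq(1)                        # True wherever the previous label was a 1
df['unique_id'] = starts.cumsum()             # every True opens a new id
print(df)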
I have the following dataframe:
df = pd.DataFrame({'group_nr':[0,0,1,1,1,2,2,3,3,0,0,1,1,2,2,2,3,3]})
print(df)
group_nr
0 0
1 0
2 1
3 1
4 1
5 2
6 2
7 3
8 3
9 0
10 0
11 1
12 1
13 2
14 2
15 2
16 3
17 3
and would like to change from repeating group numbers to incremental group numbers:
group_nr incremental_group_nr
0 0 0
1 0 0
2 1 1
3 1 1
4 1 1
5 2 2
6 2 2
7 3 3
8 3 3
9 0 4
10 0 4
11 1 5
12 1 5
13 2 6
14 2 6
15 2 6
16 3 7
17 3 7
I can't find a way of doing this without looping through the rows. Does someone have an idea how to implement this nicely?
You can check whether each value differs from the previous one, and take a cumsum of the boolean series to generate the groups (subtracting 1 so the numbering starts at 0):
df['incremental_group_nr'] = df.group_nr.ne(df.group_nr.shift()).cumsum().sub(1)
print(df)
group_nr incremental_group_nr
0 0 0
1 0 0
2 1 1
3 1 1
4 1 1
5 2 2
6 2 2
7 3 3
8 3 3
9 0 4
10 0 4
11 1 5
12 1 5
13 2 6
14 2 6
15 2 6
16 3 7
17 3 7
Compare with the shifted values using Series.shift and Series.ne, then take the cumulative sum and subtract 1:
df['incremental_group_nr'] = df['group_nr'].ne(df['group_nr'].shift()).cumsum() - 1
print(df)
group_nr incremental_group_nr
0 0 0
1 0 0
2 1 1
3 1 1
4 1 1
5 2 2
6 2 2
7 3 3
8 3 3
9 0 4
10 0 4
11 1 5
12 1 5
13 2 6
14 2 6
15 2 6
16 3 7
17 3 7
Another idea is to back-fill the first missing value after the shift with bfill:
df['incremental_group_nr'] = df['group_nr'].ne(df['group_nr'].shift().bfill()).cumsum()
print(df)
group_nr incremental_group_nr
0 0 0
1 0 0
2 1 1
3 1 1
4 1 1
5 2 2
6 2 2
7 3 3
8 3 3
9 0 4
10 0 4
11 1 5
12 1 5
13 2 6
14 2 6
15 2 6
16 3 7
17 3 7
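For comparison, a pure-Python sketch built on itertools.groupby, which collapses consecutive runs of equal values; numbering the runs gives the same column (re-creating the group_nr data from the question):

from itertools import groupby

import pandas as pd

df = pd.DataFrame({'group_nr': [0, 0, 1, 1, 1, 2, 2, 3, 3, 0, 0, 1, 1, 2, 2, 2, 3, 3]})

incremental = []
for run_id, (_, run) in enumerate(groupby(df['group_nr'])):
    incremental.extend(run_id for _ in run)  # every row in the run gets the run's index
df['incremental_group_nr'] = incremental
print(df)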
I've got a df
df1
a b
4 0 1
5 0 1
6 0 2
2 0 3
3 1 2
15 1 3
12 1 3
13 1 1
15 3 1
14 3 1
8 3 3
9 3 2
10 3 1
The df should be grouped by a and b, and I need a column c that counts up from 1 for each group of b within each subgroup of a:
df1
a b c
4 0 1 1
5 0 1 1
6 0 2 2
2 0 3 3
3 1 2 1
15 1 3 2
12 1 3 2
13 1 1 3
15 3 1 1
14 3 1 1
8 3 3 2
9 3 2 3
10 3 1 4
How can I do that?
We can do groupby + transform with factorize:
df['C']=df.groupby('a').b.transform(lambda x : x.factorize()[0]+1)
4 1
5 1
6 2
2 3
3 1
15 2
12 2
13 3
15 1
14 1
8 2
9 3
10 1
Name: b, dtype: int64
Just so we can see the loop version
from itertools import count
from collections import defaultdict

x = defaultdict(count)
y = {}
c = []
for a, b in zip(df.a, df.b):
    if (a, b) not in y:
        y[(a, b)] = next(x[a]) + 1
    c.append(y[(a, b)])

df.assign(C=c)
a b C
4 0 1 1
5 0 1 1
6 0 2 2
2 0 3 3
3 1 2 1
15 1 3 2
12 1 3 2
13 1 1 3
15 3 1 1
14 3 1 1
8 3 3 2
9 3 2 3
10 3 1 1
One option is to group by a, then iterate through each group and group by b. You can then use ngroup:
df['c'] = np.hstack([g.groupby('b').ngroup().to_numpy() for _,g in df.groupby('a')])
a b c
4 0 1 0
5 0 1 0
6 0 2 1
2 0 3 2
3 1 2 1
15 1 3 2
12 1 3 2
13 1 1 0
15 3 1 0
14 3 1 0
8 3 3 2
9 3 2 1
10 3 1 0
You can use groupby.rank if you don't care about the order of appearance in the data:
df['c'] = df.groupby('a')['b'].rank('dense').astype(int)
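To see what the order caveat means, here is a small self-contained sketch (re-creating the frame from the question) that puts the rank-based and factorize-based versions side by side; rank('dense') numbers the b values by magnitude, while factorize numbers them by order of first appearance within each a group:

import pandas as pd

df = pd.DataFrame(
    {'a': [0, 0, 0, 0, 1, 1, 1, 1, 3, 3, 3, 3, 3],
     'b': [1, 1, 2, 3, 2, 3, 3, 1, 1, 1, 3, 2, 1]},
    index=[4, 5, 6, 2, 3, 15, 12, 13, 15, 14, 8, 9, 10])

df['c_rank'] = df.groupby('a')['b'].rank('dense').astype(int)  # numbered by value
df['c_factorize'] = df.groupby('a')['b'].transform(lambda x: x.factorize()[0] + 1)  # numbered by appearance
print(df)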
Currently, I have a dataframe like this:
0 0 0 3 0 0
0 7 8 9 1 0
0 4 5 2 4 0
My code to stack it:
import pandas
import numpy

dt = dataset.iloc[:, 0:7].stack().sort_index(level=1).reset_index(level=0, drop=True).to_frame()
dt['variable'] = pandas.Categorical(dt.index).codes + 1
dt.rename(columns={0: index_column_name}, inplace=True)
dt.set_index(index_column_name, inplace=True)
dt['variable'] = numpy.sort(dt['variable'])
However, this drops the first row when stacking, and I want to keep the headers / first row. How would I achieve this?
In essence, I'm losing the data from the first row (i.e. the column headers) and I want to keep it.
Desired Output:
value,variable
0 1
0 1
0 1
0 2
7 2
4 2
0 3
8 3
5 3
3 4
9 4
2 4
0 5
1 5
4 5
0 6
0 6
0 6
Current output:
value,variable
0 1
0 1
7 2
4 2
8 3
5 3
9 4
2 4
1 5
4 5
0 6
0 6
Why not use df.melt, as @WeNYoBen mentioned?
print(df)
1 2 3 4 5 6
0 0 0 0 3 0 0
1 0 7 8 9 1 0
2 0 4 5 2 4 0
print(df.melt())
variable value
0 1 0
1 1 0
2 1 0
3 2 0
4 2 7
5 2 4
6 3 0
7 3 8
8 3 5
9 4 3
10 4 9
11 4 2
12 5 0
13 5 1
14 5 4
15 6 0
16 6 0
17 6 0
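If you also want the value column first, as in the desired output, a small follow-up sketch (re-creating the same frame) simply reorders the melted columns:

import pandas as pd

df = pd.DataFrame([[0, 0, 0, 3, 0, 0],
                   [0, 7, 8, 9, 1, 0],
                   [0, 4, 5, 2, 4, 0]],
                  columns=[1, 2, 3, 4, 5, 6])

out = df.melt()[['value', 'variable']]  # same data, columns in the desired "value, variable" order
print(out)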