Replicating rows of a Pandas dataframe based on a column condition - python

I have a dataframe looking like this:
Weekday Day_in_Month Starting_hour Ending_hour Power
3 1 1 3 35
3 1 3 7 15
4 2 22 2 5
.
.
.
I want to duplicate every row until the Starting_hour matches the Ending_hour.
-> All values of the duplicated rows should stay the same, but Starting_hour should increase by 1 in every new row.
The final dataframe should look like the following:
Weekday Day_in_Month Starting_hour Ending_hour Power
3 1 1 3 35
3 1 2 3 35
3 1 3 3 35
3 1 3 7 15
3 1 4 7 15
3 1 5 7 15
3 1 6 7 15
3 1 7 7 15
4 2 22 2 5
4 2 23 2 5
4 2 24 2 5
4 2 1 2 5
4 2 2 2 5
I appreciate any ideas on it, thanks!

Use Index.repeat with the difference of the two hour columns to repeat rows via DataFrame.loc, then add a per-row counter to Starting_hour with GroupBy.cumcount:
df1 = df.loc[df.index.repeat(df['Ending_hour'].sub(df['Starting_hour']).add(1))]
df1['Starting_hour'] += df1.groupby(level=0).cumcount()
df1 = df1.reset_index(drop=True)
print (df1)
EDIT: If Starting_hour can be greater than Ending_hour (the interval wraps past midnight), add 24 to Ending_hour for those rows; then, in the last step, subtract 1 so the hours start at 0, take modulo 24, and add 1 back:
m = df['Starting_hour'].gt(df['Ending_hour'])
e = df['Ending_hour'].mask(m, df['Ending_hour'].add(24))
df1 = df.loc[df.index.repeat(e.sub(df['Starting_hour']).add(1))]
df1['Starting_hour'] = (df1['Starting_hour'].add(df1.groupby(level=0).cumcount())
                        .sub(1).mod(24).add(1))
df1 = df1.reset_index(drop=True)
print (df1)
Weekday Day_in_Month Starting_hour Ending_hour Power
0 3 1 1 3 35
1 3 1 2 3 35
2 3 1 3 3 35
3 3 1 3 7 15
4 3 1 4 7 15
5 3 1 5 7 15
6 3 1 6 7 15
7 3 1 7 7 15
8 4 2 22 2 5
9 4 2 23 2 5
10 4 2 24 2 5
11 4 2 1 2 5
12 4 2 2 2 5
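A quick sanity check of the 1-24 wrap-around logic (an illustration, not part of the original answer): subtracting 1 maps the hours onto 0-23, the modulo wraps them past midnight, and adding 1 maps them back, so the third sample row (start 22, end 2, i.e. 2 + 24 = 26) expands to hours 22, 23, 24, 1, 2:
# k is the per-row counter produced by GroupBy.cumcount for the repeated rows
hours = [(22 + k - 1) % 24 + 1 for k in range(5)]
print(hours)  # [22, 23, 24, 1, 2]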

Related

Change repeating groups in a column to incremental groups

I have the following dataframe:
df = pd.DataFrame({'group_nr':[0,0,1,1,1,2,2,3,3,0,0,1,1,2,2,2,3,3]})
print(df)
group_nr
0 0
1 0
2 1
3 1
4 1
5 2
6 2
7 3
8 3
9 0
10 0
11 1
12 1
13 2
14 2
15 2
16 3
17 3
and would like to change from repeating group numbers to incremental group numbers:
group_nr incremental_group_nr
0 0 0
1 0 0
2 1 1
3 1 1
4 1 1
5 2 2
6 2 2
7 3 3
8 3 3
9 0 4
10 0 4
11 1 5
12 1 5
13 2 6
14 2 6
15 2 6
16 3 7
17 3 7
I can't find a way of doing this without looping through the rows. Does someone have an idea how to implement this nicely?
You can check whether each value differs from the previous one, and take a cumsum of the resulting boolean Series to generate the groups:
df['incremental_group_nr'] = df.group_nr.ne(df.group_nr.shift()).cumsum().sub(1)
print(df)
group_nr incremental_group_nr
0 0 0
1 0 0
2 1 1
3 1 1
4 1 1
5 2 2
6 2 2
7 3 3
8 3 3
9 0 4
10 0 4
11 1 5
12 1 5
13 2 6
14 2 6
15 2 6
16 3 7
17 3 7
Compare the values with their shifted counterparts using Series.shift and Series.ne, then take the cumulative sum and subtract 1:
df['incremental_group_nr'] = df['group_nr'].ne(df['group_nr'].shift()).cumsum() - 1
print(df)
group_nr incremental_group_nr
0 0 0
1 0 0
2 1 1
3 1 1
4 1 1
5 2 2
6 2 2
7 3 3
8 3 3
9 0 4
10 0 4
11 1 5
12 1 5
13 2 6
14 2 6
15 2 6
16 3 7
17 3 7
Another idea is to backfill the first missing value created by the shift with bfill, so no subtraction is needed:
df['incremental_group_nr'] = df['group_nr'].ne(df['group_nr'].shift().bfill()).cumsum()
print(df)
group_nr incremental_group_nr
0 0 0
1 0 0
2 1 1
3 1 1
4 1 1
5 2 2
6 2 2
7 3 3
8 3 3
9 0 4
10 0 4
11 1 5
12 1 5
13 2 6
14 2 6
15 2 6
16 3 7
17 3 7

pandas create groups based on previous value

I have a DataFrame which is sorted by an integer column v1:
v1
0 1
1 5
2 6
3 12
4 15
5 23
6 24
7 25
8 33
I want to group the values in v1 like this: if value - prev_value < 5, they belong to the same group.
For that, each group should get an increasing number.
So I want to create another column, v1_group, which will have the output:
v1 v1_group
0 1 1
1 5 1
2 6 1
3 12 2 # 12 - 6 > 5, new group
4 15 2
5 23 3
6 24 3
7 25 3
8 33 4
I need to do the same task with a datetime column: group values if value - prev_value < timedelta.
I know I can solve this using a standard for loop. Is there a better pandas way?
IIUC,
df['v1_group'] = df.v1.diff().ge(5).cumsum() + 1
Output:
v1 v1_group
0 1 1
1 5 1
2 6 1
3 12 2
4 15 2
5 23 3
6 24 3
7 25 3
8 33 4
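For the datetime part of the question, the same diff/cumsum pattern should carry over with a Timedelta threshold; a minimal sketch, assuming a datetime column named ts and a 5-minute gap (the column name and threshold are illustrative, not from the answer):
import pandas as pd

df = pd.DataFrame({'ts': pd.to_datetime([
    '2021-01-01 00:00', '2021-01-01 00:03', '2021-01-01 00:20',
    '2021-01-01 00:22', '2021-01-01 01:00'])})
# start a new group whenever the gap to the previous timestamp is >= 5 minutes
df['ts_group'] = df['ts'].diff().ge(pd.Timedelta(minutes=5)).cumsum() + 1
print(df)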

Match length of index to length of list in loop

I have a DataFrame that I want to expand with the expansion of a list.
df['IPS'] = pd.Series(ip)
df[date] = pd.Series(countfreq(re.findall( r'[0-9]+(?:\.[0-9]+){3}', str(
print (df)
The list ip keeps growing, but when I set the column the new elements don't show up; the DataFrame keeps the same length it had when the column was first assigned (19 rows).
0 127.15.20.191 2
1 64.255.65.13 2
2 175.248.78.182 2
3 37.83.22.114 1
4 33.68.135.110 2
5 101.124.127.56 2
6 236.32.91.121 2
7 207.229.230.76 2
8 90.16.136.130 2
9 64.199.43.58 2
10 127.15.20.191 2
11 64.255.65.13 2
12 175.248.78.182 2
13 33.68.135.110 2
14 101.124.127.56 2
15 236.32.91.121 2
16 207.229.230.76 2
17 90.16.136.130 2
18 64.199.43.58 2
For example, if the length of the list changed to 20, I would expect:
0 127.15.20.191 2
1 64.255.65.13 2
2 175.248.78.182 2
3 37.83.22.114 1
4 33.68.135.110 2
5 101.124.127.56 2
6 236.32.91.121 2
7 207.229.230.76 2
8 90.16.136.130 2
9 64.199.43.58 2
10 127.15.20.191 2
11 64.255.65.13 2
12 175.248.78.182 2
13 33.68.135.110 2
14 101.124.127.56 2
15 236.32.91.121 2
16 207.229.230.76 2
17 90.16.136.130 2
18 64.199.43.58 2
19 64.210.89.88 1
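This one has no answer in the thread, but the behaviour follows from index alignment: assigning a Series to a column of an existing DataFrame aligns on the frame's current index, so any elements past its length are silently dropped. A minimal sketch of two ways around it, assuming a growing list ip and hypothetical per-IP counts (the names are stand-ins, not from the question):
import pandas as pd

ip = ['127.15.20.191', '64.255.65.13', '64.210.89.88']
counts = [2, 2, 1]

# Option 1: rebuild the frame from the full lists so it grows with them
df = pd.DataFrame({'IPS': ip, 'count': counts})

# Option 2: extend the existing index first, then assign the longer Series
# df = df.reindex(range(len(ip)))
# df['IPS'] = pd.Series(ip)
print(df)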

Dropping different possible combination of values from a column in pandas dataframe iteratively

I have a data frame as shown below:
import pandas as pd
Data = pd.DataFrame({'L1': [1,2,3,4,5], 'L2': [6,7,3,5,6], 'ouptput':[10,11,12,13,14]})
Data
Yields,
L1 L2 ouptput
0 1 6 10
1 2 7 11
2 3 3 12
3 4 5 13
4 5 6 14
I want to loop through the data and remove n values at a time from the 'ouptput' column of Data above, where n = [1,2,3,4], and assign the result to a new data frame 'Test_Data'. For example, if I assign n = 2 the function should produce
Test_Data - iteration 1 as
L1 L2 ouptput
0 1 6
1 2 7
2 3 3 12
3 4 5 13
4 5 6 14
Test_Data - iteration 2 as
L1 L2 ouptput
0 1 6 10
1 2 7 11
2 3 3
3 4 5
4 5 6 14
Likewise, it should produce a data frame with 2 values removed from the 'output' column each time. Every iteration should produce a new combination; no combination should be repeated. I should also have control over the number of iterations: for example, 5C3 has 10 possible combinations, but I should be able to stop at 8 iterations.
This is not a great solution but will probably help you achieve what you are looking for:
import pandas as pd
Data = pd.DataFrame({'L1': [1,2,3,4,5], 'L2': [6,7,3,5,6], 'output':[10,11,12,13,14]})
num_iterations = 1
num_values = 3
for i in range(0, num_iterations):
    tmp_data = Data.copy()
    tmp_data.loc[i*num_values:num_values*(i+1)-1, 'output'] = ''
    print(tmp_data)
This gives you a concatenated dataframe with every combination, using pd.concat and itertools.combinations:
from itertools import combinations
import pandas as pd
def mask(df, col, idx):
    d = df.copy()
    d.loc[list(idx), col] = ''
    return d
n = 2
pd.concat({c: mask(Data, 'ouptput', c) for c in combinations(Data.index, n)})
L1 L2 ouptput
0 1 0 1 6
1 2 7
2 3 3 12
3 4 5 13
4 5 6 14
2 0 1 6
1 2 7 11
2 3 3
3 4 5 13
4 5 6 14
3 0 1 6
1 2 7 11
2 3 3 12
3 4 5
4 5 6 14
4 0 1 6
1 2 7 11
2 3 3 12
3 4 5 13
4 5 6
1 2 0 1 6 10
1 2 7
2 3 3
3 4 5 13
4 5 6 14
3 0 1 6 10
1 2 7
2 3 3 12
3 4 5
4 5 6 14
4 0 1 6 10
1 2 7
2 3 3 12
3 4 5 13
4 5 6
2 3 0 1 6 10
1 2 7 11
2 3 3
3 4 5
4 5 6 14
4 0 1 6 10
1 2 7 11
2 3 3
3 4 5 13
4 5 6
3 4 0 1 6 10
1 2 7 11
2 3 3 12
3 4 5
4 5 6
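To cap the number of combinations at a fixed count (the "stop at 8 of 10" requirement in the question), one option is to slice the combinations iterator before building the frames; a sketch using itertools.islice on top of the mask helper above:
from itertools import combinations, islice
import pandas as pd

Data = pd.DataFrame({'L1': [1,2,3,4,5], 'L2': [6,7,3,5,6], 'ouptput': [10,11,12,13,14]})

def mask(df, col, idx):
    # blank out the column values at the given row labels
    d = df.copy()
    d.loc[list(idx), col] = ''
    return d

n = 2          # values removed per combination
max_iter = 8   # stop after this many combinations
frames = {c: mask(Data, 'ouptput', c)
          for c in islice(combinations(Data.index, n), max_iter)}
print(pd.concat(frames))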

Separating elements of a Pandas DataFrame in Python

I have a pandas DataFrame that looks like the following:
Time Measurement
0 0 1
1 1 2
2 2 3
3 3 4
4 4 5
5 0 2
6 1 3
7 2 4
8 3 5
9 4 6
10 0 3
11 1 4
12 2 5
13 3 6
14 4 7
15 0 1
16 1 2
17 2 3
18 3 4
19 4 5
20 0 2
21 1 3
22 2 4
23 3 5
24 4 6
25 0 3
26 1 4
27 2 5
28 3 6
29 4 7
which can be generated with the following code:
import pandas
time=[0,1,2,3,4]
repeat_1_conc_1=[1,2,3,4,5]
repeat_1_conc_2=[2,3,4,5,6]
repeat_1_conc_3=[3,4,5,6,7]
d1=pandas.DataFrame([time,repeat_1_conc_1]).transpose()
d2=pandas.DataFrame([time,repeat_1_conc_2]).transpose()
d3=pandas.DataFrame([time,repeat_1_conc_3]).transpose()
repeat_2_conc_1=[1,2,3,4,5]
repeat_2_conc_2=[2,3,4,5,6]
repeat_2_conc_3=[3,4,5,6,7]
d4=pandas.DataFrame([time,repeat_2_conc_1]).transpose()
d5=pandas.DataFrame([time,repeat_2_conc_2]).transpose()
d6=pandas.DataFrame([time,repeat_2_conc_3]).transpose()
df= pandas.concat([d1,d2,d3,d4,d5,d6]).reset_index()
df.drop('index',axis=1,inplace=True)
df.columns=['Time','Measurement']
print(df)
If you look at the code, you'll see that I have two experimental repeats in the same DataFrame, which should be separated at df.iloc[:15]. Additionally, within each experiment I have 3 sub-experiments that can be thought of as the starting conditions of a dose response, i.e. the first sub-experiment starts with 1, the second with 2 and the third with 3. These should be separated at index intervals of len(time), i.e. 5 elements (times 0-4) per sub-experiment within each repeat. Could somebody please tell me the best way to separate this data into individual time course measurements for each experiment? I'm not exactly sure what the best data structure would be, but I just need to be able to access the data for each sub-experiment of each experimental repeat easily. Perhaps something like:
repeat1=
Time Measurement
0 0 1
1 1 2
2 2 3
3 3 4
4 4 5
5 0 2
6 1 3
7 2 4
8 3 5
9 4 6
10 0 3
11 1 4
12 2 5
13 3 6
14 4 7
Repeat 2=
Time Measurement
15 0 1
16 1 2
17 2 3
18 3 4
19 4 5
20 0 2
21 1 3
22 2 4
23 3 5
24 4 6
25 0 3
26 1 4
27 2 5
28 3 6
29 4 7
IIUC, you may set a MultiIndex so that you can index your DF by experiment and sub-experiment easily:
In [261]: dfi = df.set_index([df.index//15+1, df.index//5 - df.index//15*3 + 1])
In [262]: dfi
Out[262]:
Time Measurement
1 1 0 1
1 1 2
1 2 3
1 3 4
1 4 5
2 0 2
2 1 3
2 2 4
2 3 5
2 4 6
3 0 3
3 1 4
3 2 5
3 3 6
3 4 7
2 1 0 1
1 1 2
1 2 3
1 3 4
1 4 5
2 0 2
2 1 3
2 2 4
2 3 5
2 4 6
3 0 3
3 1 4
3 2 5
3 3 6
3 4 7
selecting subexperiments
In [263]: dfi.loc[1,1]
Out[263]:
Time Measurement
1 1 0 1
1 1 2
1 2 3
1 3 4
1 4 5
In [264]: dfi.loc[2,2]
Out[264]:
Time Measurement
2 2 0 2
2 1 3
2 2 4
2 3 5
2 4 6
select second experiment with all subexperiments:
In [266]: dfi.loc[2,:]
Out[266]:
Time Measurement
1 0 1
1 1 2
1 2 3
1 3 4
1 4 5
2 0 2
2 1 3
2 2 4
2 3 5
2 4 6
3 0 3
3 1 4
3 2 5
3 3 6
3 4 7
Alternatively, you can create your own slicing function:
def my_slice(rep=1, subexp=1):
    rep -= 1
    subexp -= 1
    # .loc on the default integer index is inclusive on both ends, so this returns 5 rows
    return df.loc[rep*15 + subexp*5 : rep*15 + subexp*5 + 4, :]
demo:
In [174]: my_slice(1,1)
Out[174]:
Time Measurement
0 0 1
1 1 2
2 2 3
3 3 4
4 4 5
In [175]: my_slice(2,1)
Out[175]:
Time Measurement
15 0 1
16 1 2
17 2 3
18 3 4
19 4 5
In [176]: my_slice(2,2)
Out[176]:
Time Measurement
20 0 2
21 1 3
22 2 4
23 3 5
24 4 6
PS: a slightly more convenient way to concatenate your DFs:
df = pandas.concat([d1,d2,d3,d4,d5,d6], ignore_index=True)
so you don't need the subsequent .reset_index() and .drop() calls.
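If separate objects per repeat and sub-experiment are preferred over MultiIndex slicing, a dict of sub-frames built with groupby on the same computed keys would also work; a sketch, assuming df is the 30-row frame from the question (the key derivation mirrors the MultiIndex above):
# repeat number (1-2) and sub-experiment number (1-3) derived from the row position
keys = [df.index // 15 + 1, df.index // 5 % 3 + 1]
pieces = {name: g.reset_index(drop=True) for name, g in df.groupby(keys)}

print(pieces[(2, 1)])   # second repeat, first sub-experiment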
