I have a Series that look like this
col
0 1
1 2
2 3
3 4
4 1
5 2
6 3
7 1
8 2
9 3
10 1
11 2
and I would like to generate a second counter that looks like this
col col2
0 1 1
1 2 1
2 3 1
3 4 1
4 1 2
5 2 2
6 3 2
7 1 3
8 2 3
9 3 3
10 1 4
11 2 4
How can I do that in python?
If 1 is always start of groups then create mask by compare by Series.eq and then add Series.cumsum for cumulative sum:
df['col2'] = df['col'].eq(1).cumsum()
print (df)
col col2
0 1 1
1 2 1
2 3 1
3 4 1
4 1 2
5 2 2
6 3 2
7 1 3
8 2 3
9 3 3
10 1 4
11 2 4
I have the following Dataframe:
a b c d
0 1 4 9 2
1 2 5 8 7
2 4 6 2 3
3 3 2 7 5
I want to assign a number to each element in a row according to it's order. The result should look like this:
a b c d
0 1 3 4 2
1 1 2 4 3
2 3 4 1 2
3 2 1 4 3
I tried to use the np.argsort function which doesn't work. Does someone know an easy way to to this? Thanks.
Use DataFrame.rank:
df = df.rank(axis=1).astype(int)
print (df)
a b c d
0 1 3 4 2
1 1 2 4 3
2 3 4 1 2
3 2 1 4 3
I have two data frames in pandas from which i need to get the rows with all the corresponding column values in second which are not in first .
ex
df A
A B C D
6 4 1 6
7 6 6 3
1 6 2 9
8 0 4 9
1 0 2 3
8 4 7 5
4 7 1 1
3 7 3 4
5 2 8 8
3 2 8 8
5 2 8 8
df B
A B C D
1 0 2 3
8 4 7 5
4 7 1 1
1 0 2 3
8 4 7 5
4 7 1 1
3 7 3 4
5 2 8 8
1 1 1 1
2 2 2 2
1 1 1 1
req
A B C D
1 1 1 1
2 2 2 2
1 1 1 1
i tried using pd.merge and inner/left on all columns but it is taking a lot more computational time and resource if the rows and columns are more. is there any other way to work it around like iterating through each row of dfA with dfB on all columns and then pick the ones which are there only in dfB?
You can use merge with ind parameter.
df_b.merge(df_a, on=['A','B','C','D'],
how='left', indicator='ind')\
.query('ind == "left_only"')\
.drop('ind', axis=1)
Output:
A B C D
9 1 1 1 1
10 2 2 2 2
11 1 1 1 1
Say I have a python DataFrame with the following structure:
pd.DataFrame([[1,2,3,4],[1,2,3,4],[1,3,5,6],[1,4,6,7],[1,4,6,7],[1,4,6,7]])
Out[262]:
0 1 2 3
0 1 2 3 4
1 1 2 3 4
2 1 3 5 6
3 1 4 6 7
4 1 4 6 7
5 1 4 6 7
How can I add a column called 'ct' that counts the instances of the DataFrame where column 1-3 match to each row that matches... so the DataFrame would look like this when all is completed.
0 1 2 3 ct
0 1 2 3 4 2
1 1 2 3 4 2
2 1 3 5 6 1
3 1 4 6 7 3
4 1 4 6 7 3
5 1 4 6 7 3
You can use groupby + transform + size:
df['ct'] = df.groupby([1,2,3])[1].transform('size')
#alternatively
#df['ct'] = df.groupby([1,2,3])[1].transform(len)
print (df)
0 1 2 3 ct
0 1 2 3 4 2
1 1 2 3 4 2
2 1 3 5 6 1
3 1 4 6 7 3
4 1 4 6 7 3
5 1 4 6 7 3
I have a pandas DataFrame that looks like the following:
Time Measurement
0 0 1
1 1 2
2 2 3
3 3 4
4 4 5
5 0 2
6 1 3
7 2 4
8 3 5
9 4 6
10 0 3
11 1 4
12 2 5
13 3 6
14 4 7
15 0 1
16 1 2
17 2 3
18 3 4
19 4 5
20 0 2
21 1 3
22 2 4
23 3 5
24 4 6
25 0 3
26 1 4
27 2 5
28 3 6
29 4 7
which can be generated with the following code:
import pandas
time=[0,1,2,3,4]
repeat_1_conc_1=[1,2,3,4,5]
repeat_1_conc_2=[2,3,4,5,6]
repeat_1_conc_3=[3,4,5,6,7]
d1=pandas.DataFrame([time,repeat_1_conc_1]).transpose()
d2=pandas.DataFrame([time,repeat_1_conc_2]).transpose()
d3=pandas.DataFrame([time,repeat_1_conc_3]).transpose()
repeat_2_conc_1=[1,2,3,4,5]
repeat_2_conc_2=[2,3,4,5,6]
repeat_2_conc_3=[3,4,5,6,7]
d4=pandas.DataFrame([time,repeat_2_conc_1]).transpose()
d5=pandas.DataFrame([time,repeat_2_conc_2]).transpose()
d6=pandas.DataFrame([time,repeat_2_conc_3]).transpose()
df= pandas.concat([d1,d2,d3,d4,d5,d6]).reset_index()
df.drop('index',axis=1,inplace=True)
df.columns=['Time','Measurement']
print df
If you look at the code, you'll see that I have two experimental repeats in the same DataFrame which should be separated at df.iloc[:15]. Additionally, within each experiment I have 3 sub-experiments that can be thought of like the starting conditions of a dose response, i.e. first sub-experiment starts with 1, second with 2 and third with 3. These should be separated at index intervals of `len(time)', which is 0-4, 5 elements for each experimental repeat. Could somebody please tell me the best way to separate this data into individual time course measurements for each experiment? I'm not exactly sure what the best data structure would be to use but I just need to be able to access each data for each sub experiment for each experimental repeat easily. Perhaps sometime like:
repeat1=
Time Measurement
0 0 1
1 1 2
2 2 3
3 3 4
4 4 5
5 0 2
6 1 3
7 2 4
8 3 5
9 4 6
10 0 3
11 1 4
12 2 5
13 3 6
14 4 7
Repeat 2=
Time Measurement
15 0 1
16 1 2
17 2 3
18 3 4
19 4 5
20 0 2
21 1 3
22 2 4
23 3 5
24 4 6
25 0 3
26 1 4
27 2 5
28 3 6
29 4 7
IIUC, you may set a multiindex so that you can index your DF accessing experiments and subexperiments easily:
In [261]: dfi = df.set_index([df.index//15+1, df.index//5 - df.index//15*3 + 1])
In [262]: dfi
Out[262]:
Time Measurement
1 1 0 1
1 1 2
1 2 3
1 3 4
1 4 5
2 0 2
2 1 3
2 2 4
2 3 5
2 4 6
3 0 3
3 1 4
3 2 5
3 3 6
3 4 7
2 1 0 1
1 1 2
1 2 3
1 3 4
1 4 5
2 0 2
2 1 3
2 2 4
2 3 5
2 4 6
3 0 3
3 1 4
3 2 5
3 3 6
3 4 7
selecting subexperiments
In [263]: dfi.loc[1,1]
Out[263]:
Time Measurement
1 1 0 1
1 1 2
1 2 3
1 3 4
1 4 5
In [264]: dfi.loc[2,2]
Out[264]:
Time Measurement
2 2 0 2
2 1 3
2 2 4
2 3 5
2 4 6
select second experiment with all subexperiments:
In [266]: dfi.loc[2,:]
Out[266]:
Time Measurement
1 0 1
1 1 2
1 2 3
1 3 4
1 4 5
2 0 2
2 1 3
2 2 4
2 3 5
2 4 6
3 0 3
3 1 4
3 2 5
3 3 6
3 4 7
alternatively you can create your own slicing function:
def my_slice(rep=1, subexp=1):
rep -= 1
subexp -= 1
return df.ix[rep*15 + subexp*5 : rep*15 + subexp*5 + 4, :]
demo:
In [174]: my_slice(1,1)
Out[174]:
Time Measurement
0 0 1
1 1 2
2 2 3
3 3 4
4 4 5
In [175]: my_slice(2,1)
Out[175]:
Time Measurement
15 0 1
16 1 2
17 2 3
18 3 4
19 4 5
In [176]: my_slice(2,2)
Out[176]:
Time Measurement
20 0 2
21 1 3
22 2 4
23 3 5
24 4 6
PS bit more convenient way to concatenate your DFs:
df = pandas.concat([d1,d2,d3,d4,d5,d6], ignore_index=True)
so you don't need the following .reset_index() and drop()