import pandas as pd

df = pd.DataFrame({'c1':[12,45,21,49],'c2':[67,86,28,55]})
I'd like to convert the index into a column level, reshaping the frame into a single row like this:
c1 c2
0 1 2 3 0 1 2 3
12 45 21 49 67 86 28 55
I tried combining stack and unstack but so far without success
Use unstack + to_frame + T:
df=pd.DataFrame({'c1':[12,45,21,49],'c2':[67,86,28,55]})
print (df.unstack().to_frame().T)
c1 c2
0 1 2 3 0 1 2 3
0 12 45 21 49 67 86 28 55
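For context, a quick look at the intermediate object helps see why this works (my breakdown of the same chain, using the example df above):
s = df.unstack()     # Series with MultiIndex (column label, original index):
                     # c1 0->12, 1->45, 2->21, 3->49, c2 0->67, 1->86, 2->28, 3->55
row = s.to_frame().T # to_frame makes it a one-column DataFrame, T flips it into one row
print (row)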
Or build the DataFrame from numpy.ravel (in column-major order) + numpy.reshape with MultiIndex.from_product:
mux = pd.MultiIndex.from_product([df.columns, df.index])
print (pd.DataFrame(df.values.ravel('F').reshape(1, -1), columns=mux))
c1 c2
0 1 2 3 0 1 2 3
0 12 45 21 49 67 86 28 55
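The column-major ('F') order matters here, because MultiIndex.from_product([df.columns, df.index]) lists all of c1's positions before c2's, while the default row-major ravel interleaves the two columns. A quick check (my addition):
print (df.values.ravel())     # [12 67 45 86 21 28 49 55]  row-major, wrong order for mux
print (df.values.ravel('F'))  # [12 45 21 49 67 86 28 55]  column-major, matches mux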
I have this data frame:
ID Date X1 X2 Y
A 16-07-19 58 50 0
A 17-07-19 61 83 1
A 18-07-19 97 38 0
A 19-07-19 29 77 0
A 20-07-19 66 71 1
A 21-07-19 28 74 0
B 19-07-19 54 65 1
B 20-07-19 55 32 1
B 21-07-19 50 30 0
B 22-07-19 51 38 0
B 23-07-19 81 61 0
C 24-07-19 55 29 0
C 25-07-19 97 69 1
C 26-07-19 92 44 1
C 27-07-19 55 97 0
C 28-07-19 13 48 1
D 29-07-19 77 27 1
D 30-07-19 68 50 1
D 31-07-19 71 32 1
D 01-08-19 89 57 1
D 02-08-19 46 70 0
D 03-08-19 14 68 1
D 04-08-19 12 87 1
D 05-08-19 56 13 0
E 06-08-19 47 35 1
I want to create a variable that equals 1 on the row where Y was last equal to 1 (for each ID), and 0 otherwise.
I also want to exclude all the rows that come after the last time Y was equal to 1.
Expected result:
ID Date X1 X2 Y Last
A 16-07-19 58 50 0 0
A 17-07-19 61 83 1 0
A 18-07-19 97 38 0 0
A 19-07-19 29 77 0 0
A 20-07-19 66 71 1 1
B 19-07-19 54 65 1 0
B 20-07-19 55 32 1 1
C 24-07-19 55 29 0 0
C 25-07-19 97 69 1 0
C 26-07-19 92 44 1 0
C 27-07-19 55 97 0 0
C 28-07-19 13 48 1 1
D 29-07-19 77 27 1 0
D 30-07-19 68 50 1 0
D 31-07-19 71 32 1 0
D 01-08-19 89 57 1 0
D 02-08-19 46 70 0 0
D 03-08-19 14 68 1 0
D 04-08-19 12 87 1 1
E 06-08-19 47 35 1 1
First remove all rows after the last 1 in Y: compare Y to 1, reverse the row order, and use GroupBy.cumsum per ID; keep the rows where the cumulative sum is not 0 and filter with boolean indexing. Last, use
numpy.where for the new column:
import numpy as np

df = df[df['Y'].eq(1).iloc[::-1].groupby(df['ID']).cumsum().ne(0).sort_index()]
df['Last'] = np.where(df['ID'].duplicated(keep='last'), 0, 1)
print (df)
ID Date X1 X2 Y Last
0 A 16-07-19 58 50 0 0
1 A 17-07-19 61 83 1 0
2 A 18-07-19 97 38 0 0
3 A 19-07-19 29 77 0 0
4 A 20-07-19 66 71 1 1
6 B 19-07-19 54 65 1 0
7 B 20-07-19 55 32 1 1
11 C 24-07-19 55 29 0 0
12 C 25-07-19 97 69 1 0
13 C 26-07-19 92 44 1 0
14 C 27-07-19 55 97 0 0
15 C 28-07-19 13 48 1 1
16 D 29-07-19 77 27 1 0
17 D 30-07-19 68 50 1 0
18 D 31-07-19 71 32 1 0
19 D 01-08-19 89 57 1 0
20 D 02-08-19 46 70 0 0
21 D 03-08-19 14 68 1 0
22 D 04-08-19 12 87 1 1
24 E 06-08-19 47 35 1 1
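For readers who find the one-liner dense, here is the same logic broken into steps (this is only an annotation of the answer above, assuming df is the original frame):
import numpy as np

is_one = df['Y'].eq(1)                                # True where Y == 1

# reverse the rows and accumulate hits per ID: rows after the last 1
# (in original order) keep a running total of 0
running = is_one.iloc[::-1].groupby(df['ID']).cumsum()

mask = running.ne(0).sort_index()                     # True at or before the last 1
df = df[mask]

# after filtering, the last remaining row of each ID is the last Y == 1 row
df['Last'] = np.where(df['ID'].duplicated(keep='last'), 0, 1)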
EDIT: if you need to keep all rows and only flag the last row where Y equals 1 for each ID:
m = df['Y'].eq(1).iloc[::-1].groupby(df['ID']).cumsum().ne(0).sort_index()
df['Last'] = np.where(m.ne(m.groupby(df['ID']).shift(-1)) & m,1,0)
print (df)
ID Date X1 X2 Y Last
0 A 16-07-19 58 50 0 0
1 A 17-07-19 61 83 1 0
2 A 18-07-19 97 38 0 0
3 A 19-07-19 29 77 0 0
4 A 20-07-19 66 71 1 1
5 A 21-07-19 28 74 0 0
6 B 19-07-19 54 65 1 0
7 B 20-07-19 55 32 1 1
8 B 21-07-19 50 30 0 0
9 B 22-07-19 51 38 0 0
10 B 23-07-19 81 61 0 0
11 C 24-07-19 55 29 0 0
12 C 25-07-19 97 69 1 0
13 C 26-07-19 92 44 1 0
14 C 27-07-19 55 97 0 0
15 C 28-07-19 13 48 1 1
16 D 29-07-19 77 27 1 0
17 D 30-07-19 68 50 1 0
18 D 31-07-19 71 32 1 0
19 D 01-08-19 89 57 1 0
20 D 02-08-19 46 70 0 0
21 D 03-08-19 14 68 1 0
22 D 04-08-19 12 87 1 1
23 D 05-08-19 56 13 0 0
24 E 06-08-19 47 35 1 1
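The EDIT reads the same way (again just an annotation of the code above):
# m is True up to and including the last Y == 1 row of each ID
m = df['Y'].eq(1).iloc[::-1].groupby(df['ID']).cumsum().ne(0).sort_index()

# a True row whose successor within the same ID is not True has no later
# Y == 1 row, so it is the last one and gets Last = 1
df['Last'] = np.where(m.ne(m.groupby(df['ID']).shift(-1)) & m, 1, 0)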
I want to add a rating based on conditions in a few columns:
if A > 30 add 1, if B > 50 add 1, and if C > 80 add 1; D doesn't matter.
For example, I have a matrix (dataframe):
A B C D
0 21 32 84 43 # 0 + 0 + 1
1 79 29 42 63 # 1 + 0 + 0
2 31 38 6 52 # 1 + 0 + 0
3 92 54 79 75 # 1 + 1 + 0
4 9 14 87 85 # 0 + 0 + 1
What I tried:
In [1]: import numpy as np
In [2]: import pandas as pd
In [36]: df = pd.DataFrame(
np.random.randint(0,100,size=(5, 4)),
columns=list('ABCD')
)
In [36]: df
Out[36]:
A B C D
0 21 32 84 43
1 79 29 42 63
2 31 38 6 52
3 92 54 79 75
4 9 14 87 85
I create a boolean Series for each condition, e.g. (df['A'] > 30), concat them into a frame, and sum across the rows:
In [37]: df['R'] = pd.concat(
[(df['A'] > 30), (df['B'] > 50), (df['C'] > 80)], axis=1
).sum(axis=1)
In [38]: df
Out[38]:
A B C D R
0 21 32 84 43 1
1 79 29 42 63 1
2 31 38 6 52 1
3 92 54 79 75 2
4 9 14 87 85 1
The result is as I expected, but maybe there is a simpler way?
You can just do this:
df['R'] = (df.iloc[:,:3]>[30, 50, 80]).sum(axis=1)
The same solution using column names:
df['R'] = (df[['A','B','C']]>[30, 50, 80]).sum(axis=1)
How about
df["R"] = (
(df["A"] > 30).astype(int) +
(df["B"] > 50).astype(int) +
(df["C"] > 80).astype(int)
)
You can also try this. Not sure if it is any better.
>>> df
A B C D
0 8 47 95 52
1 90 84 39 80
2 15 52 37 79
3 99 24 76 5
4 93 4 97 0
>>> df.apply(lambda x: int(x['A'] > 30) + int(x['B'] > 50) + int(x['C'] > 80), axis=1)
0 1
1 2
2 1
3 1
4 2
dtype: int64
>>> df.agg(lambda x: int(x['A'] > 30) + int(x['B'] > 50) + int(x['C'] > 80), axis=1)
0 1
1 2
2 1
3 1
4 2
dtype: int64
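If the thresholds change often, one small generalisation (my sketch, not from the answers above) is to keep them in a dict:
thresholds = {'A': 30, 'B': 50, 'C': 80}

# each comparison yields a boolean Series; summing them counts how many
# conditions hold per row
df['R'] = sum(df[col] > limit for col, limit in thresholds.items())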
Sorry if the headline isn't clear enough; I'll explain myself better with an example:
dataframe1 = pd.DataFrame(columns=['UniqueNum', 'B' ,'A'])
dataframe1['UniqueNum'] = ['1a','2b', '3c']
dataframe1['A'] = ['2','6', '7']
dataframe1['B'] = ['3','88', '23']
print(dataframe1)
dataframe2 = pd.DataFrame(columns=['TestId', 'C' ,'D'])
dataframe2['TestId'] = ['1a','2b', '3c', '1a', '3c', '2b']
dataframe2['C'] = ['22','46', '47','22','46', '47']
dataframe2['D'] = ['13','88', '233','22','46', '47']
print(dataframe2)
The printed output is:
>>>
UniqueNum B A
0 1a 3 2
1 2b 88 6
2 3c 23 7
TestId C D
0 1a 22 13
1 2b 46 88
2 3c 47 233
3 1a 22 22
4 3c 46 46
5 2b 47 47
>>>
I want to merge so the output dataframe will look like that:
TestId C D B A
0 1a 22 13 3 2
1 2b 46 88 88 6
2 3c 47 233 23 7
3 1a 22 22 3 2
4 3c 46 46 23 7
5 2b 47 47 88 6
That is, I want to add to dataframe2 the columns from dataframe1, with values matched by joining UniqueNum in dataframe1 to TestId in dataframe2.
Thanks
You can use DataFrame.merge with a left join after renaming the column:
d = {'UniqueNum':'TestId'}
df = dataframe2.merge(dataframe1.rename(columns=d), how='left', on='TestId')
Or set the index instead of renaming, and pass left_on and right_index:
df = dataframe2.merge(dataframe1.set_index('UniqueNum'),
how='left',
left_on='TestId',
right_index=True)
Or specify both columns and then drop the UniqueNum column:
df = dataframe2.merge(dataframe1,
how='left',
left_on='TestId',
right_on='UniqueNum').drop('UniqueNum', axis=1)
print (df)
TestId C D B A
0 1a 22 13 3 2
1 2b 46 88 88 6
2 3c 47 233 23 7
3 1a 22 22 3 2
4 3c 46 46 23 7
5 2b 47 47 88 6
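A map-based alternative also works (my sketch; it assumes the UniqueNum values are unique in dataframe1, as the name suggests):
lookup = dataframe1.set_index('UniqueNum')

# map each TestId to the matching row of dataframe1, one column at a time
for col in ['B', 'A']:
    dataframe2[col] = dataframe2['TestId'].map(lookup[col])
print (dataframe2)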
I have a pandas dataframe, df1.
I want to overwrite its values with values in df2, where the index and column name match.
I've found a few answers on this site, but nothing that quite does what I want.
df1
A B C
0 33 44 54
1 11 32 54
2 43 55 12
3 43 23 34
df2
A
0 5555
output
A B C
0 5555 44 54
1 11 32 54
2 43 55 12
3 43 23 34
You can use combine_first, converting back to integer if necessary:
df = df2.combine_first(df1).astype(int)
print (df)
A B C
0 5555 44 54
1 11 32 54
2 43 55 12
3 43 23 34
If you need to restrict the update to the intersection of index and columns between both DataFrames:
df2 = pd.DataFrame({'A':[5555, 2222],
                    'D':[3333, 4444]}, index=[0, 10])
idx = df2.index.intersection(df1.index)
cols = df2.columns.intersection(df1.columns)
df = df2.loc[idx, cols].combine_first(df1).astype(int)
print (df)
A B C
0 5555 44 54
1 11 32 54
2 43 55 12
3 43 23 34
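DataFrame.update is another option (a sketch; it modifies df1 in place, only overwrites where index and column labels overlap, and may upcast the dtype, hence the astype):
df1.update(df2)          # overwrite matching cells of df1 with values from df2
df1 = df1.astype(int)    # restore integer dtype if it was upcast to float
print (df1)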
I'm trying to figure out how to retrieve values from future rows using an offset stored in a separate column. For instance, I have the dataframe df below, and I'd like to find a way to produce column C:
Orig A Orig B Desired Column C
54 1 76
76 4 46
14 3 46
35 1 -3
-3 0 -3
46 0 46
64 0 64
93 0 93
72 0 72
Any help is much appreciated, thank you!
You can use NumPy for a vectorised solution:
import numpy as np
idx = np.arange(df.shape[0]) + df['OrigB'].values
df['C'] = df['OrigA'].iloc[idx].values
print(df)
OrigA OrigB C
0 54 1 76
1 76 4 46
2 14 3 46
3 35 1 -3
4 -3 0 -3
5 46 0 46
6 64 0 64
7 93 0 93
8 72 0 72
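One caveat with this indexing: an offset that points past the last row raises an IndexError. A guarded variant (my addition) clamps the target position to the end of the frame:
import numpy as np

pos = np.arange(df.shape[0]) + df['OrigB'].values
pos = np.minimum(pos, df.shape[0] - 1)   # clamp offsets that overshoot the frame
df['C'] = df['OrigA'].values[pos]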
import pandas as pd

data = {"Orig A": [54,76,14,35,-3,46,64,93,72],
        "Orig B": [1,4,3,1,0,0,0,0,0],
        "Desired Column C": [76,46,46,-3,-3,46,64,93,72]}
df = pd.DataFrame(data)
df["desired_test"] = [df["Orig A"].values[i+j] for i, j in enumerate(df["Orig B"].values)]
df
Orig A Orig B Desired Column C desired_test
0 54 1 76 76
1 76 4 46 46
2 14 3 46 46
3 35 1 -3 -3
4 -3 0 -3 -3
5 46 0 46 46
6 64 0 64 64
7 93 0 93 93
8 72 0 72 72