import pandas as pd

df = pd.DataFrame({'c1':[12,45,21,49],'c2':[67,86,28,55]})
I'd like to convert the index into a column level, reshaping the frame into a single row like this:
c1 c2
0 1 2 3 0 1 2 3
12 45 21 49 67 86 28 55
I tried combining stack and unstack but so far without success
Use unstack + to_frame + T:
df=pd.DataFrame({'c1':[12,45,21,49],'c2':[67,86,28,55]})
print (df.unstack().to_frame().T)
c1 c2
0 1 2 3 0 1 2 3
0 12 45 21 49 67 86 28 55
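For context, a quick look at the intermediate object helps see why this works (my breakdown of the same chain, using the example df above):
s = df.unstack()     # Series with MultiIndex (column label, original index):
                     # c1 0->12, 1->45, 2->21, 3->49, c2 0->67, 1->86, 2->28, 3->55
row = s.to_frame().T # to_frame makes it a one-column DataFrame, T flips it into one row
print (row)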
Or build the DataFrame from numpy.ravel (in column-major order) + numpy.reshape with MultiIndex.from_product:
mux = pd.MultiIndex.from_product([df.columns, df.index])
print (pd.DataFrame(df.values.ravel('F').reshape(1, -1), columns=mux))
c1 c2
0 1 2 3 0 1 2 3
0 12 45 21 49 67 86 28 55
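The column-major ('F') order matters here, because MultiIndex.from_product([df.columns, df.index]) lists all of c1's positions before c2's, while the default row-major ravel interleaves the two columns. A quick check (my addition):
print (df.values.ravel())     # [12 67 45 86 21 28 49 55]  row-major, wrong order for mux
print (df.values.ravel('F'))  # [12 45 21 49 67 86 28 55]  column-major, matches mux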
I have this data frame:
ID Date X1 X2 Y
A 16-07-19 58 50 0
A 17-07-19 61 83 1
A 18-07-19 97 38 0
A 19-07-19 29 77 0
A 20-07-19 66 71 1
A 21-07-19 28 74 0
B 19-07-19 54 65 1
B 20-07-19 55 32 1
B 21-07-19 50 30 0
B 22-07-19 51 38 0
B 23-07-19 81 61 0
C 24-07-19 55 29 0
C 25-07-19 97 69 1
C 26-07-19 92 44 1
C 27-07-19 55 97 0
C 28-07-19 13 48 1
D 29-07-19 77 27 1
D 30-07-19 68 50 1
D 31-07-19 71 32 1
D 01-08-19 89 57 1
D 02-08-19 46 70 0
D 03-08-19 14 68 1
D 04-08-19 12 87 1
D 05-08-19 56 13 0
E 06-08-19 47 35 1
I want to create a variable that equals 1 on the row where Y was last equal to 1 (for each ID), and 0 otherwise.
I also want to exclude all the rows that come after the last time Y was equal to 1.
Expected result:
ID Date X1 X2 Y Last
A 16-07-19 58 50 0 0
A 17-07-19 61 83 1 0
A 18-07-19 97 38 0 0
A 19-07-19 29 77 0 0
A 20-07-19 66 71 1 1
B 19-07-19 54 65 1 0
B 20-07-19 55 32 1 1
C 24-07-19 55 29 0 0
C 25-07-19 97 69 1 0
C 26-07-19 92 44 1 0
C 27-07-19 55 97 0 0
C 28-07-19 13 48 1 1
D 29-07-19 77 27 1 0
D 30-07-19 68 50 1 0
D 31-07-19 71 32 1 0
D 01-08-19 89 57 1 0
D 02-08-19 46 70 0 0
D 03-08-19 14 68 1 0
D 04-08-19 12 87 1 1
E 06-08-19 47 35 1 1
First remove all rows after the last 1 in Y: compare Y to 1, reverse the row order, and use GroupBy.cumsum per ID; keep the rows where the cumulative sum is not 0 and filter with boolean indexing. Last, use
numpy.where for the new column:
import numpy as np

df = df[df['Y'].eq(1).iloc[::-1].groupby(df['ID']).cumsum().ne(0).sort_index()]
df['Last'] = np.where(df['ID'].duplicated(keep='last'), 0, 1)
print (df)
ID Date X1 X2 Y Last
0 A 16-07-19 58 50 0 0
1 A 17-07-19 61 83 1 0
2 A 18-07-19 97 38 0 0
3 A 19-07-19 29 77 0 0
4 A 20-07-19 66 71 1 1
6 B 19-07-19 54 65 1 0
7 B 20-07-19 55 32 1 1
11 C 24-07-19 55 29 0 0
12 C 25-07-19 97 69 1 0
13 C 26-07-19 92 44 1 0
14 C 27-07-19 55 97 0 0
15 C 28-07-19 13 48 1 1
16 D 29-07-19 77 27 1 0
17 D 30-07-19 68 50 1 0
18 D 31-07-19 71 32 1 0
19 D 01-08-19 89 57 1 0
20 D 02-08-19 46 70 0 0
21 D 03-08-19 14 68 1 0
22 D 04-08-19 12 87 1 1
24 E 06-08-19 47 35 1 1
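For readers who find the one-liner dense, here is the same logic broken into steps (this is only an annotation of the answer above, assuming df is the original frame):
import numpy as np

is_one = df['Y'].eq(1)                                # True where Y == 1

# reverse the rows and accumulate hits per ID: rows after the last 1
# (in original order) keep a running total of 0
running = is_one.iloc[::-1].groupby(df['ID']).cumsum()

mask = running.ne(0).sort_index()                     # True at or before the last 1
df = df[mask]

# after filtering, the last remaining row of each ID is the last Y == 1 row
df['Last'] = np.where(df['ID'].duplicated(keep='last'), 0, 1)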
EDIT: if you need to keep all rows and only flag the last row where Y equals 1 for each ID:
m = df['Y'].eq(1).iloc[::-1].groupby(df['ID']).cumsum().ne(0).sort_index()
df['Last'] = np.where(m.ne(m.groupby(df['ID']).shift(-1)) & m,1,0)
print (df)
ID Date X1 X2 Y Last
0 A 16-07-19 58 50 0 0
1 A 17-07-19 61 83 1 0
2 A 18-07-19 97 38 0 0
3 A 19-07-19 29 77 0 0
4 A 20-07-19 66 71 1 1
5 A 21-07-19 28 74 0 0
6 B 19-07-19 54 65 1 0
7 B 20-07-19 55 32 1 1
8 B 21-07-19 50 30 0 0
9 B 22-07-19 51 38 0 0
10 B 23-07-19 81 61 0 0
11 C 24-07-19 55 29 0 0
12 C 25-07-19 97 69 1 0
13 C 26-07-19 92 44 1 0
14 C 27-07-19 55 97 0 0
15 C 28-07-19 13 48 1 1
16 D 29-07-19 77 27 1 0
17 D 30-07-19 68 50 1 0
18 D 31-07-19 71 32 1 0
19 D 01-08-19 89 57 1 0
20 D 02-08-19 46 70 0 0
21 D 03-08-19 14 68 1 0
22 D 04-08-19 12 87 1 1
23 D 05-08-19 56 13 0 0
24 E 06-08-19 47 35 1 1
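The EDIT reads the same way (again just an annotation of the code above):
# m is True up to and including the last Y == 1 row of each ID
m = df['Y'].eq(1).iloc[::-1].groupby(df['ID']).cumsum().ne(0).sort_index()

# a True row whose successor within the same ID is not True has no later
# Y == 1 row, so it is the last one and gets Last = 1
df['Last'] = np.where(m.ne(m.groupby(df['ID']).shift(-1)) & m, 1, 0)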
I want to add a rating based on conditions in a few columns:
if A > 30 add 1, if B > 50 add 1, and if C > 80 add 1; D doesn't matter.
For example, I have a matrix (dataframe):
A B C D
0 21 32 84 43 # 0 + 0 + 1
1 79 29 42 63 # 1 + 0 + 0
2 31 38 6 52 # 1 + 0 + 0
3 92 54 79 75 # 1 + 1 + 0
4 9 14 87 85 # 0 + 0 + 1
What I tried:
In [1]: import numpy as np
In [2]: import pandas as pd
In [36]: df = pd.DataFrame(
np.random.randint(0,100,size=(5, 4)),
columns=list('ABCD')
)
In [36]: df
Out[36]:
A B C D
0 21 32 84 43
1 79 29 42 63
2 31 38 6 52
3 92 54 79 75
4 9 14 87 85
I create a boolean Series for each condition, e.g. (df['A'] > 30), concat them into a frame, and sum across the rows:
In [37]: df['R'] = pd.concat(
[(df['A'] > 30), (df['B'] > 50), (df['C'] > 80)], axis=1
).sum(axis=1)
In [38]: df
Out[38]:
A B C D R
0 21 32 84 43 1
1 79 29 42 63 1
2 31 38 6 52 1
3 92 54 79 75 2
4 9 14 87 85 1
The result is as I expected, but maybe there is a simpler way?
You can just do this:
df['R'] = (df.iloc[:,:3]>[30, 50, 80]).sum(axis=1)
The same solution using column names:
df['R'] = (df[['A','B','C']]>[30, 50, 80]).sum(axis=1)
How about
df["R"] = (
(df["A"] > 30).astype(int) +
(df["B"] > 50).astype(int) +
(df["C"] > 80).astype(int)
)
You can also try this. Not sure if it is any better.
>>> df
A B C D
0 8 47 95 52
1 90 84 39 80
2 15 52 37 79
3 99 24 76 5
4 93 4 97 0
>>> df.apply(lambda x: int(x['A'] > 30) + int(x['B'] > 50) + int(x['C'] > 80), axis=1)
0 1
1 2
2 1
3 1
4 2
dtype: int64
>>> df.agg(lambda x: int(x['A'] > 30) + int(x['B'] > 50) + int(x['C'] > 80), axis=1)
0 1
1 2
2 1
3 1
4 2
dtype: int64
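If the thresholds change often, one small generalisation (my sketch, not from the answers above) is to keep them in a dict:
thresholds = {'A': 30, 'B': 50, 'C': 80}

# each comparison yields a boolean Series; summing them counts how many
# conditions hold per row
df['R'] = sum(df[col] > limit for col, limit in thresholds.items())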
Sorry if the headline isn't clear enough; I'll explain myself better with an example:
dataframe1 = pd.DataFrame(columns=['UniqueNum', 'B' ,'A'])
dataframe1['UniqueNum'] = ['1a','2b', '3c']
dataframe1['A'] = ['2','6', '7']
dataframe1['B'] = ['3','88', '23']
print(dataframe1)
dataframe2 = pd.DataFrame(columns=['TestId', 'C' ,'D'])
dataframe2['TestId'] = ['1a','2b', '3c', '1a', '3c', '2b']
dataframe2['C'] = ['22','46', '47','22','46', '47']
dataframe2['D'] = ['13','88', '233','22','46', '47']
print(dataframe2)
The printed output is:
>>>
UniqueNum B A
0 1a 3 2
1 2b 88 6
2 3c 23 7
TestId C D
0 1a 22 13
1 2b 46 88
2 3c 47 233
3 1a 22 22
4 3c 46 46
5 2b 47 47
>>>
I want to merge so the output dataframe will look like that:
TestId C D B A
0 1a 22 13 3 2
1 2b 46 88 88 6
2 3c 47 233 23 7
3 1a 22 22 3 2
4 3c 46 46 23 7
5 2b 47 47 88 6
That is, I want to add to dataframe2 the columns from dataframe1, with values matched by joining UniqueNum in dataframe1 to TestId in dataframe2.
Thanks
You can use DataFrame.merge with a left join after renaming the column:
d = {'UniqueNum':'TestId'}
df = dataframe2.merge(dataframe1.rename(columns=d), how='left', on='TestId')
Or set the index instead of renaming, and pass left_on and right_index:
df = dataframe2.merge(dataframe1.set_index('UniqueNum'),
how='left',
left_on='TestId',
right_index=True)
Or specify both columns and then drop the UniqueNum column:
df = dataframe2.merge(dataframe1,
how='left',
left_on='TestId',
right_on='UniqueNum').drop('UniqueNum', axis=1)
print (df)
TestId C D B A
0 1a 22 13 3 2
1 2b 46 88 88 6
2 3c 47 233 23 7
3 1a 22 22 3 2
4 3c 46 46 23 7
5 2b 47 47 88 6
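A map-based alternative also works (my sketch; it assumes the UniqueNum values are unique in dataframe1, as the name suggests):
lookup = dataframe1.set_index('UniqueNum')

# map each TestId to the matching row of dataframe1, one column at a time
for col in ['B', 'A']:
    dataframe2[col] = dataframe2['TestId'].map(lookup[col])
print (dataframe2)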
I have a pandas dataframe, df1.
I want to overwrite its values with values in df2, where the index and column name match.
I've found a few answers on this site, but nothing that quite does what I want.
df1
A B C
0 33 44 54
1 11 32 54
2 43 55 12
3 43 23 34
df2
A
0 5555
output
A B C
0 5555 44 54
1 11 32 54
2 43 55 12
3 43 23 34
You can use combine_first, converting back to integer if necessary:
df = df2.combine_first(df1).astype(int)
print (df)
A B C
0 5555 44 54
1 11 32 54
2 43 55 12
3 43 23 34
If you need to restrict the update to the intersection of index and columns between both DataFrames:
df2 = pd.DataFrame({'A':[5555, 2222],
                    'D':[3333, 4444]}, index=[0, 10])
idx = df2.index.intersection(df1.index)
cols = df2.columns.intersection(df1.columns)
df = df2.loc[idx, cols].combine_first(df1).astype(int)
print (df)
A B C
0 5555 44 54
1 11 32 54
2 43 55 12
3 43 23 34
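DataFrame.update is another option (a sketch; it modifies df1 in place, only overwrites where index and column labels overlap, and may upcast the dtype, hence the astype):
df1.update(df2)          # overwrite matching cells of df1 with values from df2
df1 = df1.astype(int)    # restore integer dtype if it was upcast to float
print (df1)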
I'm trying to figure out how to retrieve values from future rows using an offset stored in a separate column. For instance, I have the dataframe df below, and I'd like to find a way to produce column C:
Orig A Orig B Desired Column C
54 1 76
76 4 46
14 3 46
35 1 -3
-3 0 -3
46 0 46
64 0 64
93 0 93
72 0 72
Any help is much appreciated, thank you!
You can use NumPy for a vectorised solution:
import numpy as np
idx = np.arange(df.shape[0]) + df['OrigB'].values
df['C'] = df['OrigA'].iloc[idx].values
print(df)
OrigA OrigB C
0 54 1 76
1 76 4 46
2 14 3 46
3 35 1 -3
4 -3 0 -3
5 46 0 46
6 64 0 64
7 93 0 93
8 72 0 72
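One caveat with this indexing: an offset that points past the last row raises an IndexError. A guarded variant (my addition) clamps the target position to the end of the frame:
import numpy as np

pos = np.arange(df.shape[0]) + df['OrigB'].values
pos = np.minimum(pos, df.shape[0] - 1)   # clamp offsets that overshoot the frame
df['C'] = df['OrigA'].values[pos]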
import pandas as pd

data = {"Orig A": [54,76,14,35,-3,46,64,93,72],
        "Orig B": [1,4,3,1,0,0,0,0,0],
        "Desired Column C": [76,46,46,-3,-3,46,64,93,72]}
df = pd.DataFrame(data)
df["desired_test"] = [df["Orig A"].values[i+j] for i, j in enumerate(df["Orig B"].values)]
df
Orig A Orig B Desired Column C desired_test
0 54 1 76 76
1 76 4 46 46
2 14 3 46 46
3 35 1 -3 -3
4 -3 0 -3 -3
5 46 0 46 46
6 64 0 64 64
7 93 0 93 93
8 72 0 72 72