Pandas stacking columns to rows with respect to some other columns - python

Sorry, my question is not really clear from the title, but this is exactly what I'm trying to do with pandas.
From this:
col1 col2 col3 col 4
A D G X
B E H Y
C F I Z
to this
col col 4
A X
B Y
C Z
D X
E Y
F Z
G X
H Y
I Z

You can use df.melt() as follows:
df.melt(id_vars='col4', value_name='col').drop('variable', axis=1)
Output:
col4 col
0 X A
1 Y B
2 Z C
3 X D
4 Y E
5 Z F
6 X G
7 Y H
8 Z I

Try this:
df.melt(id_vars='col4', value_vars=['col1', 'col2', 'col3']).drop(columns='variable')
Output:
col4 value
0 X A
1 Y B
2 Z C
3 X D
4 Y E
5 Z F
6 X G
7 Y H
8 Z I

out_df = pd.DataFrame()
out_df['col'] = df['col1'].tolist() + df['col2'].tolist() + df['col3'].tolist()
out_df['col 4'] = df['col 4'].tolist() * 3
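For reference, the melt approach from the answers above can be sketched end to end as follows. This is only a minimal sketch: it assumes the id column is named 'col4' (the question writes it as "col 4"), and the final column selection is added just to match the column order asked for.

```python
import pandas as pd

# Sample frame matching the question (assumption: the last column is named 'col4').
df = pd.DataFrame({
    "col1": ["A", "B", "C"],
    "col2": ["D", "E", "F"],
    "col3": ["G", "H", "I"],
    "col4": ["X", "Y", "Z"],
})

# Stack col1..col3 into a single 'col' column; col4 is repeated next to each value.
out = (
    df.melt(id_vars="col4", value_name="col")
      .drop(columns="variable")[["col", "col4"]]
)
print(out)
```

melt picks up every column not listed in id_vars, so this also works unchanged if more value columns are added later.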

Related

Pandas replace Na using merge or join

I want to fill the Na values in column A based on the shared values in column B, so that rows with x in column B get 1 in column A and rows with y in column B get 2 in column A:
A B C D E
1 x d e q
Na x v s f
Na x v e j
2 y w e v
Na y b d g
Use groupby.transform('first'), optionally combined with convert_dtypes:
df['A'] = df.groupby('B')['A'].transform('first').convert_dtypes()
output:
A B C D E
0 1 x d e q
1 1 x v s f
2 1 x v e j
3 2 y w e v
4 2 y b d g
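A self-contained sketch of that idea, assuming the Na entries are real missing values (NaN) rather than the string 'Na'; only columns A and B are reproduced since the others play no role:

```python
import numpy as np
import pandas as pd

# Columns A and B from the question (assumption: Na means NaN).
df = pd.DataFrame({
    "A": [1, np.nan, np.nan, 2, np.nan],
    "B": ["x", "x", "x", "y", "y"],
})

# 'first' takes the first non-null A within each B group and
# transform broadcasts it back to every row of that group.
df["A"] = df.groupby("B")["A"].transform("first").convert_dtypes()
print(df)
```

If the Na values are actually strings, they would need to be converted to NaN first (for example with replace) before this works.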

Aggregate over difference of levels of factor in Pandas DataFrame?

Given df1:
A B C
0 a 7 x
1 b 3 x
2 a 5 y
3 b 4 y
4 a 5 z
5 b 3 z
How can I get df2 where, for each value in C of df1, a new column D holds the difference between the df1 values in col B where col A==a and where col A==b:
C D
0 x 4
1 y 1
2 z 2
I'd use a pivot table:
df = df1.pivot_table(columns = ['A'],values = 'B', index = 'C')
df2 = pd.DataFrame({'D': df['a'] - df['b']})
The risk in the answer given by @YOBEN_S is that it will fail if b appears before a for a given value of C.
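Put together with the sample data, the pivot-table approach might look like the sketch below. The reset_index and rename steps are additions to get the C/D layout shown in the question; note that pivot_table aggregates with the mean by default, so D comes out as floats.

```python
import pandas as pd

df1 = pd.DataFrame({
    "A": ["a", "b", "a", "b", "a", "b"],
    "B": [7, 3, 5, 4, 5, 3],
    "C": ["x", "x", "y", "y", "z", "z"],
})

# One row per C, one column per level of A ('a' and 'b').
wide = df1.pivot_table(index="C", columns="A", values="B")

# Subtract by column label, so the row order of a and b in df1 does not matter.
df2 = (wide["a"] - wide["b"]).rename("D").reset_index()
print(df2)
```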

Pandas explode index

I have a df like below
a = pd.DataFrame([{'col1': ['a', 'b', 'c'], 'col2': 'x'}, {'col1': ['d', 'b'], 'col2': 'y'}])
When I do an explode using a.explode('col1'), I get the results below:
col1 col2
a x
b x
c x
d y
b y
However, I wanted something like below,
col1 col2 col1_index
a x 1
b x 2
c x 3
d y 1
b y 2
Can someone help me?
You could do the following:
result = a.explode('col1').reset_index().rename(columns={'index' : 'col1_index'})
result['col1_index'] = result.groupby('col1_index').cumcount()
print(result)
Output
col1_index col1 col2
0 0 a x
1 1 b x
2 2 c x
3 0 d y
4 1 b y
After you explode you can simply do:
a['col1_index'] = a.groupby('col2').cumcount()+1
col1 col2 col1_index
0 a x 1
1 b x 2
2 c x 3
3 d y 1
4 b y 2
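A variant of the two answers above, as a rough sketch: it numbers the elements by the original row label instead of by col2, which should still work if two source rows happen to share the same col2 value. It assumes col1 holds real lists (not a single comma-separated string).

```python
import pandas as pd

# Assumption: each col1 entry is a list of separate items.
a = pd.DataFrame([
    {"col1": ["a", "b", "c"], "col2": "x"},
    {"col1": ["d", "b"], "col2": "y"},
])

exploded = a.explode("col1")

# After explode, the original row label is repeated once per list element,
# so a running count within each label gives the position in the list.
exploded["col1_index"] = exploded.groupby(level=0).cumcount().to_numpy() + 1

result = exploded.reset_index(drop=True)
print(result)
```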

Pandas: How do I repeat dataframe for each value in a series?

I have a dataframe (df) as such:
A B
1 a
2 b
3 c
And a series: S = pd.Series(['x','y','z']) I want to repeat the dataframe df for each value in the series. The expected result is to be like this:
result:
S A B
x 1 a
y 1 a
z 1 a
x 2 b
y 2 b
z 2 b
x 3 c
y 3 c
z 3 c
How do I achieve this kind of output? I'm thinking of merge or join but mergeing is giving me a memory error. I am dealing with a rather large dataframe and series. Thanks!
Using numpy, let's say you have a series and a df of different lengths:
s= pd.Series(['X', 'Y', 'Z', 'A']) #added a character to s to make it length 4
s_n = len(s)
df_n = len(df)
pd.DataFrame(np.repeat(df.values,s_n, axis = 0), columns = df.columns, index = np.tile(s,df_n)).rename_axis('S').reset_index()
S A B
0 X 1 a
1 Y 1 a
2 Z 1 a
3 A 1 a
4 X 2 b
5 Y 2 b
6 Z 2 b
7 A 2 b
8 X 3 c
9 Y 3 c
10 Z 3 c
11 A 3 c
UPDATE:
Here is a slightly modified version of @A-Za-z's solution, which might save a bit of memory but is slower:
x = pd.DataFrame(index=range(len(df) * len(S)))
for col in df.columns:
    # use .values so the repeated data is assigned by position, not aligned on the index
    x[col] = np.repeat(df[col].values, len(S))
x['S'] = np.tile(S.values, len(df))
Old incorrect answer:
In [94]: pd.concat([df.assign(S=S)] * len(s))
Out[94]:
A B S
0 1 a x
1 2 b y
2 3 c z
0 1 a x
1 2 b y
2 3 c z
0 1 a x
1 2 b y
2 3 c z
Setup
df = pd.DataFrame({'A': {0: 1, 1: 2, 2: 3}, 'B': {0: 'a', 1: 'b', 2: 'c'}})
S = pd.Series(['x','y','z'], name='S')
Solution
#Convert the Series to a Dataframe with desired shape of the output filled with S values.
#Join df_S to df to get As and Bs
df_S = pd.DataFrame(index=np.repeat(S.index,3), columns=['S'], data= np.tile(S.values,3))
df_S.join(df)
Out[54]:
S A B
0 x 1 a
0 y 1 a
0 z 1 a
1 x 2 b
1 y 2 b
1 z 2 b
2 x 3 c
2 y 3 c
2 z 3 c
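The repeat/tile idea from the answers above can also be written so that it does not depend on df and S having the same length and keeps the column dtypes of df. This is only a sketch under the same setup; whether it fits in memory for a very large df and S is not something it can guarantee, since the result still has len(df) * len(S) rows.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3], "B": ["a", "b", "c"]})
S = pd.Series(["x", "y", "z"], name="S")

# Repeat each df row len(S) times, selecting rows by position.
result = df.iloc[np.repeat(np.arange(len(df)), len(S))].reset_index(drop=True)

# Tile the series values once per original df row and put them in front.
result.insert(0, "S", np.tile(S.to_numpy(), len(df)))
print(result)
```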

Merging two dataframes, with different lengths, and repeating values

I have two dataframes with the same column 'Col A' that I want to merge on. However, in df2 the values in Col A are replicated a random number of times. This replication is important to my problem and I cannot drop it. I want the final dataframe to look like df3, where the Col B value from df1 is attached to every replication of Col A in df2.
df1
Col A Col B
1 v
2 w
3 x
4 y
df2
Col A Col B
1 a
2 b
2 c
3 d
3 e
4 f
df3
Col A Col B Col C
1 a v
2 b w
2 c w
3 d x
3 e x
4 f y
Use merge:
df2.merge(df1, on='Col A')
Out:
Col A Col B_x Col B_y
0 1 a v
1 2 b w
2 2 c w
3 3 d x
4 3 e x
5 4 f y
And if necessary, rename afterwards:
df = df2.merge(df1, on='Col A')
df.columns = ['Col A', 'Col B', 'Col C']
For more info, see the pandas documentation on merging and joining.
I believe you need map with a Series created by set_index:
print (df1.set_index('Col A')['Col B'])
Col A
1 v
2 w
3 x
4 y
Name: Col B, dtype: object
df2['Col C'] = df2['Col A'].map(df1.set_index('Col A')['Col B'])
print (df2)
Col A Col B Col C
0 1 a v
1 2 b w
2 2 c w
3 3 d x
4 3 e x
5 4 f y
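Both approaches side by side, as a minimal sketch using the data from the question. Renaming df1's Col B before the merge is an addition here; it avoids the _x/_y suffixes, so no column renaming is needed afterwards.

```python
import pandas as pd

df1 = pd.DataFrame({"Col A": [1, 2, 3, 4], "Col B": ["v", "w", "x", "y"]})
df2 = pd.DataFrame({"Col A": [1, 2, 2, 3, 3, 4],
                    "Col B": ["a", "b", "c", "d", "e", "f"]})

# Option 1: merge, renaming df1's value column first to avoid a name clash.
merged = df2.merge(df1.rename(columns={"Col B": "Col C"}), on="Col A", how="left")

# Option 2: look up each Col A value in a Series indexed by df1's Col A.
lookup = df1.set_index("Col A")["Col B"]
mapped = df2.assign(**{"Col C": df2["Col A"].map(lookup)})

print(merged)
print(mapped)
```

map needs the Col A values in df1 to be unique, while merge would duplicate rows if they are not.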
