Merge columns with \n\n as the separator - python

Example:
    C1   C2   C3   C4   C5   C6
0    A    B  nan    C    A  nan
1    B    C    D  nan    B  nan
2    D    E    F  nan    C  nan
3  nan  nan    A  nan  nan    B
I'm merging the columns, and I want to insert '\n\n' between the values as they are merged.
The output I want is:
C
0 A
B
C
A
1 B
C
D
B
2 D
E
F
C
3 A
B
I want the 'nan' values to be dropped.
I tried:
df['merge'] = df['C1'].map(str) + '\n\n' + df['C2'].map(str) + '\n\n' + df['C3'].map(str) + '\n\n' + df['C4'].map(str)
However, this includes all the nan values.
Thank you for reading.

Use DataFrame.stack to get a Series; missing values are removed, so you can then aggregate with join:
df['merge'] = df.stack().groupby(level=0).agg('\n\n'.join)
# to use only the C columns
df['merge'] = df.filter(like='C').stack().groupby(level=0).agg('\n\n'.join)
Or drop the missing values row by row with Series.dropna before joining:
df['merge'] = df.apply(lambda x: '\n\n'.join(x.dropna()), axis=1)
# to use only the C columns
df['merge'] = df.filter(like='C').apply(lambda x: '\n\n'.join(x.dropna()), axis=1)
print(df)
C1 C2 C3 C4 C5 C6 merge
0 A B NaN C A NaN A\n\nB\n\nC\n\nA
1 B C D NaN B NaN B\n\nC\n\nD\n\nB
2 D E F NaN C NaN D\n\nE\n\nF\n\nC
3 NaN NaN A NaN NaN B A\n\nB
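
To try this end to end, here is a minimal self-contained sketch of the row-wise dropna variant. The sample frame is rebuilt from the table in the question, and the helper name join_non_null is made up for illustration (it is not a pandas function).

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the question's table
df = pd.DataFrame({
    'C1': ['A', 'B', 'D', np.nan],
    'C2': ['B', 'C', 'E', np.nan],
    'C3': [np.nan, 'D', 'F', 'A'],
    'C4': ['C', np.nan, np.nan, np.nan],
    'C5': ['A', 'B', 'C', np.nan],
    'C6': [np.nan, np.nan, np.nan, 'B'],
})

def join_non_null(row, sep='\n\n'):
    # Hypothetical helper: drop the missing cells, then join what is left
    return sep.join(row.dropna().astype(str))

# Pick the source columns before the new column exists
source_cols = ['C1', 'C2', 'C3', 'C4', 'C5', 'C6']
df['merge'] = df[source_cols].apply(join_non_null, axis=1)

print(repr(df.loc[3, 'merge']))
```

Selecting the source columns by name keeps the new 'merge' column out of the join if the line is run more than once.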

Related

Pandas groupby two columns and expand the third

I have a Pandas dataframe with the following structure:
A B C
a b 1
a b 2
a b 3
c d 7
c d 8
c d 5
c d 6
c d 3
e b 4
e b 3
e b 2
e b 1
And I will like to transform it into this:
A B C1 C2 C3 C4 C5
a b 1 2 3 NAN NAN
c d 7 8 5 6 3
e b 4 3 2 1 NAN
In other words, something like groupby A and B and expand C into different columns.
Knowing that the length of each group is different.
C is already ordered
Shorter groups can have NAN or NULL values (empty), it does not matter.

Use GroupBy.cumcount and Series.add with 1 so that the new column names start from 1, then pass the result to DataFrame.pivot and use DataFrame.add_prefix to rename the columns (C1, C2, C3, etc.). Finally, use DataFrame.rename_axis to remove the name of the columns axis ('g') and DataFrame.reset_index to turn the MultiIndex levels A, B back into columns:
df['g'] = df.groupby(['A','B']).cumcount().add(1)
df = df.pivot(index=['A','B'], columns='g', values='C').add_prefix('C').rename_axis(columns=None).reset_index()
print (df)
A B C1 C2 C3 C4 C5
0 a b 1.0 2.0 3.0 NaN NaN
1 c d 7.0 8.0 5.0 6.0 3.0
2 e b 4.0 3.0 2.0 1.0 NaN
Because NaN is a float by default, the new columns become float. If you need integer columns, add DataFrame.astype with Int64:
df['g'] = df.groupby(['A','B']).cumcount().add(1)
df = (df.pivot(index=['A','B'], columns='g', values='C')
        .add_prefix('C')
        .astype('Int64')
        .rename_axis(columns=None)
        .reset_index())
print (df)
A B C1 C2 C3 C4 C5
0 a b 1 2 3 <NA> <NA>
1 c d 7 8 5 6 3
2 e b 4 3 2 1 <NA>
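
For reference, a runnable sketch of the same cumcount + pivot idea with the question's data typed in by hand. The variable name wide is only illustrative, and pivot is called with keyword arguments since newer pandas versions do not accept them positionally.

```python
import pandas as pd

# Data from the question
df = pd.DataFrame({
    'A': ['a'] * 3 + ['c'] * 5 + ['e'] * 4,
    'B': ['b'] * 3 + ['d'] * 5 + ['b'] * 4,
    'C': [1, 2, 3, 7, 8, 5, 6, 3, 4, 3, 2, 1],
})

# Position of each row inside its (A, B) group, starting at 1
df['g'] = df.groupby(['A', 'B']).cumcount() + 1

wide = (df.pivot(index=['A', 'B'], columns='g', values='C')
          .add_prefix('C')            # 1, 2, ... -> C1, C2, ...
          .astype('Int64')            # nullable integers instead of floats
          .rename_axis(columns=None)  # drop the 'g' label on the columns axis
          .reset_index())

print(wide)
```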
EDIT: If there is a maximum of N new columns, the A, B pairs will be repeated across several rows. It is therefore necessary to add helper groups g1, g2 using integer and modulo division, which adds a new level to the index:
N = 4
g = df.groupby(['A','B']).cumcount()
df['g1'], df['g2'] = g // N, (g % N) + 1
df = (df.pivot(index=['A','B','g1'], columns='g2', values='C')
        .add_prefix('C')
        .droplevel(-1)
        .rename_axis(columns=None)
        .reset_index())
print (df)
A B C1 C2 C3 C4
0 a b 1.0 2.0 3.0 NaN
1 c d 7.0 8.0 5.0 6.0
2 c d 3.0 NaN NaN NaN
3 e b 4.0 3.0 2.0 1.0

df1.astype({'C':str}).groupby([*'AB'])\
.agg(','.join).C.str.split(',',expand=True)\
.add_prefix('C').reset_index()
A B C0 C1 C2 C3 C4
0 a b 1 2 3 None None
1 c d 7 8 5 6 3
2 e b 4 3 2 1 None

The accepted solution but avoiding the deprecation warning:
N = 3
g = df_grouped.groupby(['A','B']).cumcount()
df_grouped['g1'], df_grouped['g2'] = g // N, (g % N) + 1
df_grouped = (df_grouped.pivot(index=['A','B','g1'], columns='g2', values='C')
                .add_prefix('C_')
                .astype('Int64')
                .droplevel(-1)
                .rename_axis(columns=None)
                .reset_index())

Merging columns using pandas

I am trying to merge multiple-choice question columns using pandas so I can then manipulate them. An example of what my questions look like is:
  C1 C2 C3
0  A     A
1     B  B
2     C  C
3  D     D
The data is currently split across C1 and C2, but I need it combined into one column, as shown in C3.

One option, assuming NaN in empty cells, is to bfill the first column and copy it:
df['C3'] = df[['C1', 'C2']].bfill(axis=1)['C1']
This way is extensible to any number of initial columns.
Output:
C1 C2 C3
0 A NaN A
1 NaN B B
2 NaN C C
3 D NaN D
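
As a rough illustration of the "any number of columns" point, here is a sketch with three source columns. The third column and the name source_cols are assumptions added for the example; the last row is left entirely empty to show that it stays NaN.

```python
import numpy as np
import pandas as pd

# Three answer columns; each row has at most one filled cell
df = pd.DataFrame({
    'C1': ['A', np.nan, np.nan, np.nan],
    'C2': [np.nan, 'B', np.nan, np.nan],
    'C3': [np.nan, np.nan, 'C', np.nan],
})

source_cols = ['C1', 'C2', 'C3']

# Back-fill across each row, then keep the first column of the result
df['combined'] = df[source_cols].bfill(axis=1).iloc[:, 0]

print(df)
```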

You may try with fillna
df['C3'] = df['C1'].fillna(df['C2'])
df
Out[483]:
C1 C2 C3
0 A NaN A
1 NaN B B
2 NaN C C
3 D NaN D

You can also use combine_first:
df['C3'] = df['C1'].combine_first(df['C2'])
print(df)
# Output
C1 C2 C3
0 A NaN A
1 NaN B B
2 NaN C C
3 D NaN D
If your cells contain empty strings rather than null values, temporarily replace them with NaN (this needs import numpy as np):
df['C3'] = df['C1'].replace('', np.nan).combine_first(df['C2'])
print(df)
# Output
  C1 C2 C3
0  A     A
1     B  B
2     C  C
3  D     D

How to add new rows to a dataframe?

I have this data-frame
df = pd.DataFrame({'Type':['A','A','B','B'], 'Variants':['A3','A6','Bxy','Byz']})
it shows like this
Type Variants
0 A A3
1 A A6
2 B Bxy
3 B Byz
I need to write a function that adds n new rows after the rows of each Type value.
For n=2 the result should look like this:
Type Variants
0 A A3
1 A A6
2 A Nan
3 A Nan
4 B Bxy
5 B Byz
6 B Nan
7 B Nan
Can anyone help me with this? I would appreciate it a lot. Thanks in advance.

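Create a dataframe of filler rows and concatenate it with your original one:
import numpy as np
def add_rows(df, n):
    # n filler rows per unique Type; Variants is left missing
    df1 = pd.DataFrame(np.repeat(df['Type'].unique(), n), columns=['Type'])
    # a stable sort keeps the original rows ahead of the filler rows within each Type
    return pd.concat([df, df1]).sort_values('Type', kind='stable').reset_index(drop=True)
out = add_rows(df, 2)
print(out)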
# Output
Type Variants
0 A A3
1 A A6
2 A NaN
3 A NaN
4 B Bxy
5 B Byz
6 B NaN
7 B NaN
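
If you would rather not sort at all (for example, to keep the Type groups in their original order of appearance), a per-group variant is one option. This is only a sketch; the function name add_rows_per_group and its key parameter are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({'Type': ['A', 'A', 'B', 'B'],
                   'Variants': ['A3', 'A6', 'Bxy', 'Byz']})

def add_rows_per_group(df, n, key='Type'):
    # Hypothetical helper: append n filler rows right after each group
    pieces = []
    for value, group in df.groupby(key, sort=False):
        filler = pd.DataFrame({key: [value] * n})  # other columns become NaN
        pieces.extend([group, filler])
    return pd.concat(pieces, ignore_index=True)

out = add_rows_per_group(df, 2)
print(out)
```

This assumes the rows of each Type should end up next to each other, as in the desired output above.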

How to compare values of certain columns of one dataframe with the values of same set of columns in another dataframe?

I have three dataframes df1, df2, and df3, which are defined as follows
df1 =
A B C
0 1 a a1
1 2 b b2
2 3 c c3
3 4 d d4
4 5 e e5
5 6 f f6
df2 =
A B C
0 1 a X
1 2 b Y
2 3 c Z
df3 =
A B C
3 4 d P
4 5 e Q
5 6 f R
I have defined a Primary Key list PK = ["A","B"].
Now, I take a fourth dataframe df4 as df4 = df1.sample(n=2), which gives something like
df4 =
A B C
4 5 e e5
1 2 b b2
Now, I want to select the rows from df2 and df3 that match the primary key values of df4.
For example, in this case I need to get the row with
index = 4 from df3,
index = 1 from df2.
If possible, I need to get a dataframe as follows:
df =
   A  B   C  A(df2)  B(df2)  C(df2)  A(df3)  B(df3)  C(df3)
4  5  e  e5                               5       e       Q
1  2  b  b2       2       b       Y
Any ideas on how to work this out will be very helpful.

Use two consecutive DataFrame.merge operations, with DataFrame.add_suffix applied to the right dataframe each time, to left merge df4 with df2 and then with df3. Finally, use DataFrame.fillna to replace the missing values with empty strings:
df = (
    df4.merge(df2.add_suffix('(df2)'), left_on=['A', 'B'], right_on=['A(df2)', 'B(df2)'], how='left')
       .merge(df3.add_suffix('(df3)'), left_on=['A', 'B'], right_on=['A(df3)', 'B(df3)'], how='left')
       .fillna('')
)
Result:
# print(df)
   A  B   C  A(df2)  B(df2)  C(df2)  A(df3)  B(df3)  C(df3)
0  5  e  e5                               5       e       Q
1  2  b  b2       2       b       Y
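
Note that merge returns a fresh 0..n-1 index. If you also want to keep the original row labels from df4 (4 and 1 in the question), one possible sketch is to carry the index through as a column. The frames are rebuilt from the question, df4 is fixed to rows 4 and 1 instead of a random sample, and the loop is just one way to write the two merges.

```python
import pandas as pd

df1 = pd.DataFrame({'A': [1, 2, 3, 4, 5, 6],
                    'B': list('abcdef'),
                    'C': ['a1', 'b2', 'c3', 'd4', 'e5', 'f6']})
df2 = pd.DataFrame({'A': [1, 2, 3], 'B': list('abc'), 'C': list('XYZ')})
df3 = pd.DataFrame({'A': [4, 5, 6], 'B': list('def'), 'C': list('PQR')},
                   index=[3, 4, 5])

PK = ['A', 'B']
df4 = df1.loc[[4, 1]]  # fixed rows here instead of df1.sample(n=2)

# Keep df4's row labels in a column so they survive the merges
out = df4.reset_index()
for tag, other in (('df2', df2), ('df3', df3)):
    right = other.add_suffix(f'({tag})')
    out = out.merge(right, how='left',
                    left_on=PK, right_on=[f'{k}({tag})' for k in PK])

# Restore the original labels as the index
out = out.set_index('index').rename_axis(None)
print(out)
```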

Here's how I would do it on the entire data set. If you want to sample first, just update the merge statements at the end by replacing df1 with df4, or just take a sample of t.
PK = ["A","B"]
df2 = pd.concat([df2,df2], axis=1)
df2.columns=['A','B','C','A(df2)', 'B(df2)', 'C(df2)']
df2.drop(columns=['C'], inplace=True)
df3 = pd.concat([df3,df3], axis=1)
df3.columns=['A','B','C','A(df3)', 'B(df3)', 'C(df3)']
df3.drop(columns=['C'], inplace=True)
t = df1.merge(df2, on=PK, how='left')
t = t.merge(df3, on=PK, how='left')
Output
A B C A(df2) B(df2) C(df2) A(df3) B(df3) C(df3)
0 1 a a1 1.0 a X NaN NaN NaN
1 2 b b2 2.0 b Y NaN NaN NaN
2 3 c c3 3.0 c Z NaN NaN NaN
3 4 d d4 NaN NaN NaN 4.0 d P
4 5 e e5 NaN NaN NaN 5.0 e Q
5 6 f f6 NaN NaN NaN 6.0 f R

How do I combine N non-numerical columns while removing null values?

Building on this question Combining columns and removing NaNs Pandas,
I have a dataframe that looks like this:
col x y z
a1 a NaN NaN
a2 NaN b NaN
a3 NaN c NaN
a4 NaN NaN d
a5 NaN e NaN
a6 f NaN NaN
a7 g NaN NaN
a8 NaN NaN NaN
The cell values are strings and the NaNs are arbitrary null values.
I would like to combine the columns to add a new combined column thus:
col w
a1 a
a2 b
a3 c
a4 d
a5 e
a6 f
a7 g
a8 NaN
The elegant solution proposed in the question above uses
df['w']=df[['x','y','z']].sum(axis=1)
but sum does not work for non-numerical values.
How, in this case for strings, do I combine the columns into a single column?
You can assume:
Each row only has one of x, y, z that is non-null.
The individual columns must be referenced by name (since they are a subset of all of the available columns in the dataframe).
In general there are N and not just 3 columns in the subset.
Hopefully no use for iloc/for loops :\
Update: (apologies to those who have already given answers :\ )
I have added a final row where every column contains NaN, and I would like the combined row to reflect that. Thanks + sorry!
Thanks as ever for all help

Here is yet another solution:
df['res'] = df.fillna('').sum(1).replace('', np.nan)
The result is
x y z res
col
a1 a NaN NaN a
a2 NaN b NaN b
a3 NaN c NaN c
a4 NaN NaN d d
a5 NaN e NaN e
a6 f NaN NaN f
a7 g NaN NaN g
a8 NaN NaN NaN NaN

I think you need:
s = df[['x','y','z']]
df['w'] = s.values[s.notnull()]
df[['col','w']]
Or, after the edit to the question:
df['w'] = pd.DataFrame(df[['x','y','z']].apply(lambda x: x.values[x.notnull()],axis=1).tolist())
df[['col','w']].fillna(np.nan)
Which gives
col w
0 a1 a
1 a2 b
2 a3 c
3 a4 d
4 a5 e
5 a6 f
6 a7 g
7 a8 NaN

Instead of the generic sum, you have to apply a custom function.
This one, for example, works on your data:
import numpy as np
# first non-null value in the row, or NaN if the whole row is null
f = lambda x: x[x.notnull()].iloc[0] if x.notnull().any() else np.nan
df['w'] = df[list('xyz')].apply(f, axis=1)
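
Another way to get the same result without a row-wise apply is to chain Series.combine_first over the named columns. This is only a sketch under the question's assumptions (at most one non-null value per row); the list name subset is illustrative.

```python
from functools import reduce

import numpy as np
import pandas as pd

# Data rebuilt from the question, including the all-NaN last row
df = pd.DataFrame({
    'col': [f'a{i}' for i in range(1, 9)],
    'x': ['a', np.nan, np.nan, np.nan, np.nan, 'f', 'g', np.nan],
    'y': [np.nan, 'b', 'c', np.nan, 'e', np.nan, np.nan, np.nan],
    'z': [np.nan, np.nan, np.nan, 'd', np.nan, np.nan, np.nan, np.nan],
})

subset = ['x', 'y', 'z']  # the N columns to combine, referenced by name

# Start from the first column and fill its gaps from each following column
df['w'] = reduce(lambda acc, name: acc.combine_first(df[name]),
                 subset[1:], df[subset[0]])

print(df[['col', 'w']])
```

If a row had more than one non-null value, this would keep the one from the earliest column in subset.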

