How to make the values of a pandas dataframe column as column

How to make the values of a pandas dataframe column as column - python

I would like to reshape my dataframe:
from Input_DF
col1 col2 col3
Course_66 0\nCourse_67 1\nCourse_68 0 a c
Course_66 1\nCourse_67 0\nCourse_68 0 a d
to Output_DF
Course_66 Course_67 Course_68 col2 col3
0 0 1 a c
0 1 0 a d
Please, note that col1 contains one long string.
Please, any help would be very appreciated.
Many Thanks in advance.
Best Regards,
Carlo

Use:
#first split by whitespaces to df
df1 = df['col1'].str.split(expand=True)
#for each column split by \n and select first value
df2 = df1.apply(lambda x: x.str.split(r'\\n').str[0])
#for columns select only first row and select second splitted value
df2.columns = df1.iloc[0].str.split(r'\\n').str[1]
print (df2)
0 Course_66 Course_67 Course_68
0 0 0 1
1 0 1 0
#join to original, remove unnecessary column
df = df2.join(df.drop('col1', axis=1))
print (df)
Course_66 Course_67 Course_68 col2 col3
0 0 0 1 a c
1 0 1 0 a d
Another solution with list comprehension:
L = [[y.split('\\n')[0] for y in x.split()] for x in df['col1']]
cols = [x.split('\\n')[1] for x in df.loc[0, 'col1'].split()]
df1 = pd.DataFrame(L, index=df.index, columns=cols)
print (df1)
Course_66 Course_67 Course_68
0 0 0 1
1 0 1 0
EDIT:
#split values by whitespaces - it split by \n too
df1 = df['course_vector'].str.split(expand=True)
#select each pair columns
df2 = df1.iloc[:, 1::2]
#for columns select each unpair value in first row
df2.columns = df1.iloc[0, 0::2]
#join to original
df = df2.join(df.drop('course_vector', axis=1))

Since your data are ordered in value, key pairs, you can split on newlines and multiple spaces with regex to get a list, and then take every other value starting at the first position for values and the second position for labels and return a Series object. By applying, you will get back a DataFrame from these multiple series, which you can then combine with the original DataFrame.
import pandas as pd
df = pd.DataFrame({'col1': ['0\nCourse_66 0\nCourse_67 1\nCourse_68',
'0\nCourse_66 1\nCourse_67 0\nCourse_68'],
'col2': ['a', 'a'], 'col3': ['c', 'd']})
def to_multiple_columns(str_list):
# take the numeric values for each series and column labels and return as a series
# by taking every other value
return pd.Series(str_list[::2], str_list[1::2])
# split on newlines and spaces
splits = df['col1'].str.split(r'\n|\s+').apply(to_multiple_columns)
output = pd.concat([splits, df.drop('col1', axis=1)], axis=1)
print(output)
Output:
Course_66 Course_67 Course_68 col2 col3
0 0 0 1 a c
1 0 1 0 a d

Related

How to compare two dataframes in Python pandas and output the difference?

I have two df with the same numbers of columns but different numbers of rows.
df1
col1 col2
0 a 1,2,3,4
1 b 1,2,3
2 c 1
df2
col1 col2
0 b 1,3
1 c 1,2
2 d 1,2,3
3 e 1,2
df1 is the existing list, df2 is the updated list. The expected result is whatever in df2 that was previously not in df1.
Expected result:
col1 col2
0 c 2
1 d 1,2,3
2 e 1,2
I've tried with
mask = df1['col2'] != df2['col2']
but it doesn't work with different rows of df.

Use DataFrame.explode by splitted values in columns col2, then use DataFrame.merge with right join and indicato parameter, filter by boolean indexing only rows with right_only and last aggregate join:
df11 = df1.assign(col2 = df1['col2'].str.split(',')).explode('col2')
df22 = df2.assign(col2 = df2['col2'].str.split(',')).explode('col2')
df = df11.merge(df22, indicator=True, how='right', on=['col1','col2'])
df = (df[df['_merge'].eq('right_only')]
.groupby('col1')['col2']
.agg(','.join)
.reset_index(name='col2'))
print (df)
col1 col2
0 c 2
1 d 1,2,3
2 e 1,2

Pandas DataFrame column string concatenation

I have a df with a column of strings like so:
col1
a
b
c
d
I also have a string variable x = 'x' and a list of strings list1 = ['ax', cx']
I want to create a new column that checks if the concatenated string of col1 + x is in list1. If yes then col2 = 1 else col2 = 0.
Here is my attempt:
df['col2'] = 1 if str(df['col1'] + x) in list1 else 0
Which doesn't work.
df['col2'] = 1 if df['col1'] + x in list1 else 0
Doesn't work either.
What would be the correct way to format this?
Thank you for any help.
col1 col2 <-- should be this
a 1
b 0
c 1
d 0

Use isin:
df['col2'] = df.col1.add('x').isin(list1).astype(int)
# col1 col2
#0 a 1
#1 b 0
#2 c 1
#3 d 0
Check Results

You can use map function as follows.
df['col2'] = df['col1'].map(lambda val: 1 if x + val in list1 else 0)

An other solution using apply :
import pandas as pd
df = pd.DataFrame({'col1': ['a','b','c', 'd']})
def func(row):
list1 = {'ax', 'cx'}
row['col2'] = 1 if row.col1 + 'x' in list1 else 0
return row
df2 = df.apply(func, axis='columns')
# OUTPUTS :
# col1 col2
#0 a 1
#1 b 0
#2 c 1
#3 d 0

How to apply a function on a series of columns, based on the values in a corresponding series of columns?

I have a df where I have several columns, that, based on the value (1-6) in these columns, I want to assign a value (0-1) to its corresponding column. I can do it on a column by column basis but would like to make it a single function. Below is some example code:
import pandas as pd
df = pd.DataFrame({'col1': [1,3,6,3,5,2], 'col2': [4,5,6,6,1,3], 'col3': [3,6,5,1,1,6],
'colA': [0,0,0,0,0,0], 'colB': [0,0,0,0,0,0], 'colC': [0,0,0,0,0,0]})
(col1 corresponds with colA, col2 with colB, col3 with colC)
This code works on a column by column basis:
df.loc[(df.col1 != 1) & (df.col1 < 6), 'colA'] = (df['colA']+ 1)
But I would like to be able to have a list of columns, so to speak, and have it correspond with another. Something like this, (but that actually works):
m = df['col1' : 'col3'] != 1 & df['col1' : 'col3'] < 6
df.loc[m, 'colA' : 'colC'] += 1
Thank You!

Idea is filter both DataFrames by DataFrame.loc, then filter columns by mask and rename columns by another df2 and last use DataFrame.add only for df.columns:
df1 = df.loc[:, 'col1' : 'col3']
df2 = df.loc[:, 'colA' : 'colC']
d = dict(zip(df1.columns,df2.columns))
df1 = ((df1 != 1) & (df1 < 6)).rename(columns=d)
df[df2.columns] = df[df2.columns].add(df1)
print (df)
col1 col2 col3 colA colB colC
0 1 4 3 0 1 1
1 3 5 6 1 1 0
2 6 6 5 0 0 1
3 3 6 1 1 0 0
4 5 1 1 1 0 0
5 2 3 6 1 1 0

Here's what I would do:
# split up dataframe
sub_df = df.iloc[:,:3]
abc = df.iloc[:,3:]
# make numpy array truth table
truth_table = (sub_df.to_numpy() > 1) & (sub_df.to_numpy() < 6)
# redefine abc based on numpy truth table
new_abc = pd.DataFrame(truth_table.astype(int), columns=['colA', 'colB', 'colC'])
# join the updated dataframe subgroups
new_df = pd.concat([sub_df, new_abc], axis=1)

Reformat dataframe using pandas by adding new rows based on dictionary value

Given below is my dataframe
df = pd.DataFrame({'Col1':['1','2'],'Col2':[{'a':['a1','a2']},{'b':['b1']}]})
Col1 Col2
0 1 {u'a': [u'a1', u'a2']}
1 2 {u'b': [u'b1']}
I need to reformat this data frame as below
Col1 NCol2 NCol3
0 1 a a1
1 1 a a2
2 2 b b1
Basically, for each key value pair in the dictionary, i am adding a row with key and value in Ncol2 and Ncol3.
Thanks for help in advance.

You can use the following solution:
df1 = df['Col2'].apply(pd.Series).apply(lambda x: x.explode())\
.stack().reset_index(level=1)
df1.columns = ['Col2', 'Col3']
df.drop('Col2', axis=1).merge(df1, left_index=True, right_index=True)\
.reset_index(drop=True)
Output:
Col1 Col2 Col3
0 1 a a1
1 1 a a2
2 2 b b1

Python - how to split list for creating new column? pandas

I have a data frame like this
col1 col2
[A, B] 1
[A, C] 2
I would like to separate col1 into two columns and the output, I would like it out in this form
col1_A col1_B col2
A B 1
A C 2
I have tried this df['col1'].str.rsplit(',',n=2, expand=True)
but it showed TypeError: list indices must be integers or slices, not str

join + pop
df = df.join(pd.DataFrame(df.pop('col1').values.tolist(),
columns=['col1_A', 'col1_B']))
print(df)
col2 col1_A col1_B
0 1 A B
1 2 A C
It's good practice to try and avoid pd.Series.apply, which often amounts a Python-level loop with an additional overhead.

You can use apply:
import pandas as pd
df = pd.DataFrame({
"col1": [['A', 'B'], ['A', 'C']],
"col2": [1, 2],
})
df['col1_A'] = df['col1'].apply(lambda x: x[0])
df['col1_B'] = df['col1'].apply(lambda x: x[1])
del df['col1']
df = df[df.columns[[1,2,0]]]
print(df)
col1_A col1_B col2
0 A B 1
1 A C 2

You can do this:
>> df_expanded = df['col1'].apply(pd.Series).rename(
columns = lambda x : 'col1_' + str(x))
>> df_expanded
col1_0 col1_1
0 A B
1 A C
Adding these columns to the original dataframe:
>> pd.concat([df_expanded, df], axis=1).drop('col1', axis=1)
col1_0 col1_1 col2
0 A B 1
1 A C 2
If columns need to be named as the first element in the rows:
df_expanded.columns = ['col1_' + value
for value in df_expanded.iloc[0,:].values.tolist()]
col1_A col1_B
0 A B
1 A C

Zip values and column name and use insert to get right position.
for ind,(k,v) in enumerate(zip(zip(*df.pop('col1').tolist()),['col1_A', 'col1_B'])):
df.insert(ind, v, k)
Full example
import pandas as pd
df = pd.DataFrame({
"col1": [['A', 'B'], ['A', 'C']],
"col2": [1, 2],
})
for ind,(k,v) in enumerate(zip(zip(*df.pop('col1').tolist()),['col1_A', 'col1_B'])):
df.insert(ind, v, k)
print(df)
Returns:
col1_A col1_B col2
0 A B 1
1 A C 2

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.

How to make the values of a pandas dataframe column as column - python

Related

How to compare two dataframes in Python pandas and output the difference?

Pandas DataFrame column string concatenation

How to apply a function on a series of columns, based on the values in a corresponding series of columns?

Reformat dataframe using pandas by adding new rows based on dictionary value

Python - how to split list for creating new column? pandas

Categories

Resources