Pandas converting columns to rows - python

I have a dataframe with columns like this:
Name Id 2019col1 2019col2 2019col3 2020col1 2020col2 2020col3 2021col1 2021col2 2021col3
That is, the columns are repeated for each year.
I want to take the year out and make it a column, so that the final dataframe looks like this:
Name Id Year col1 col2 col3
Is there a way in pandas to achieve something like this?

Use wide_to_long, but first move the year to the end of each column name (e.g. 2019col1 becomes col12019) with a list comprehension:
print (df)
Name Id 2019col1 2019col2 2019col3 2020col1 2020col2 2020col3 \
0 a 456 4 5 6 2 3 4
2021col1 2021col2 2021col3
0 5 2 1
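import pandas as pd

# move the 4-digit year prefix to the end of each name: 2019col1 -> col12019
df.columns = [x[4:] + x[:4] if x[:4].isnumeric() else x for x in df.columns]
# reshape to long format; the numeric suffixes become the Year index level
df = (pd.wide_to_long(df.reset_index(),
                      ['col1','col2', 'col3'],
                      i='index',
                      j='Year').reset_index(level=0, drop=True).reset_index())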
print (df)
Year Id Name col1 col2 col3
0 2019 456 a 4 5 6
1 2020 456 a 2 3 4
2 2021 456 a 5 2 1
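A self-contained version of the same approach, as a sketch: the sample frame is reconstructed from the printed one-row example, and the stub names are derived from the renamed columns instead of being typed out (that derivation, and the `stubs` and `long_df` names, are my own additions, assuming every year prefix is exactly four digits).

```python
import pandas as pd

# Sample frame reconstructed from the question (one row, three years)
df = pd.DataFrame({
    'Name': ['a'], 'Id': [456],
    '2019col1': [4], '2019col2': [5], '2019col3': [6],
    '2020col1': [2], '2020col2': [3], '2020col3': [4],
    '2021col1': [5], '2021col2': [2], '2021col3': [1],
})

# Move the 4-digit year prefix to the end: '2019col1' -> 'col12019'
df.columns = [c[4:] + c[:4] if c[:4].isnumeric() else c for c in df.columns]

# Assumption: the stub is whatever remains once a 4-digit year suffix is removed
stubs = sorted({c[:-4] for c in df.columns if c[-4:].isnumeric()})

# wide_to_long splits 'col12019' into stub 'col1' and suffix 2019 (stored as Year)
long_df = (pd.wide_to_long(df.reset_index(), stubs, i='index', j='Year')
             .reset_index(level=0, drop=True)   # drop the helper 'index' level
             .reset_index())                    # turn Year into a regular column
print(long_df)
```

Deriving the stubs this way means the same code keeps working if more columns (col4, col5, ...) are added per year.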


Summing up data with the same conditions in PANDAS

I have a dataframe:
df =
col1 Num
1 4
1 4
2 5
2 1
2 1
3 2
I want to add up the Num values for each value of col1 and show the total.
So I will get:
col1 Sum
1 8
2 7
3 2
Try this:
df.groupby('col1').sum()
If you want the new column to be named 'Sum' as in your example, you could do the following:
df1 = df.groupby('col1').sum()
df1.columns = ['Sum']
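A runnable sketch of the same idea on the question's data, with two ways to get the 'Sum' name in one expression: renaming after the sum, or pandas named aggregation (`agg(Sum=('Num', 'sum'))`, available in pandas 0.25 and later). The `totals` and `totals2` names are just for illustration.

```python
import pandas as pd

# Data from the question
df = pd.DataFrame({'col1': [1, 1, 2, 2, 2, 3], 'Num': [4, 4, 5, 1, 1, 2]})

# Option 1: group, sum, then rename the resulting column
totals = df.groupby('col1').sum().rename(columns={'Num': 'Sum'})

# Option 2: named aggregation names the output column directly
totals2 = df.groupby('col1').agg(Sum=('Num', 'sum'))
print(totals)
```

Named aggregation is handy when the frame has other numeric columns, since it only sums the column you list instead of every non-key column.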

select based on row combinations on different columns pandas

I have the following pandas data frame.
ID col1 col2 value
1 4 New 20
2 4 OLD 30
3 5 OLD 60
4 5 New 50
5 3 New 70
I would like to select only the rows that follow these rules: where col1 is 4 or 3, col2 should be New, and where col1 is 5, col2 should be OLD. All other rows should be dropped.
ID col1 col2 value
1 4 New 20
3 5 OLD 60
5 3 New 70
Can anyone help with this in Python pandas?
Use DataFrame.query: test membership with in, combine conditions with & (bitwise AND), and join the two groups of conditions with | (bitwise OR):
df1 = df.query("(col1 in [4,3] & col2 == 'New') | (col1 == 5 & col2 == 'OLD')")
print (df1)
ID col1 col2 value
0 1 4 New 20
2 3 5 OLD 60
4 5 3 New 70
Or use boolean indexing with Series.isin:
df1 = df[df['col1'].isin([3,4]) & df['col2'].eq('New') |
df['col1'].eq(5) & df['col2'].eq('OLD')]
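If the list of rules grows, one alternative (not from the answers above) is to keep the rules in a dictionary and compare each row against its expected label. This is a sketch assuming each col1 value maps to exactly one allowed col2 value; the `rules` dictionary is a hypothetical name.

```python
import pandas as pd

# Data from the question
df = pd.DataFrame({
    'ID': [1, 2, 3, 4, 5],
    'col1': [4, 4, 5, 5, 3],
    'col2': ['New', 'OLD', 'OLD', 'New', 'New'],
    'value': [20, 30, 60, 50, 70],
})

# Hypothetical rules table: the required col2 value for each col1 value
rules = {4: 'New', 3: 'New', 5: 'OLD'}

# map() looks up the required label per row (NaN where col1 has no rule);
# eq() is False wherever the labels differ or no rule exists
mask = df['col2'].eq(df['col1'].map(rules))
df1 = df[mask]
print(df1)
```

Rows whose col1 value is missing from `rules` are dropped automatically, which matches "drop other rows otherwise".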

subtract one column from multiple columns in the same dataframe using method chaining

I have a dataframe in pandas and I would like to subtract one column (let's say col1) from col2 and col3 (or from more columns, if there are any) without writing the assign statement below for each column.
df = pd.DataFrame({'col1':[1,2,3,4], 'col2':[2,5,6,8], 'col3':[5,5,5,9]})
df = (df
...
.assign(col2 = lambda x: x.col2 - x.col1)
)
How can I do this? Or would it work with apply? How would you be able to do this with method chaining?
Edit (using **kwargs with method chaining):
As per your comment, if you want to chain methods on the intermediate (partially computed) dataframe, you need to build a dictionary of functions, one per column, and unpack it into assign (you can't use a lambda to construct the dictionary directly inside assign).
In this example I add 5 to the dataframe before chaining assign, to show that it works on the intermediate result as you want:
d = {cl: lambda x, cl=cl: x[cl] - x['col1'] for cl in ['col2','col3']}
df_final = df.add(5).assign(**d)
In [63]: df
Out[63]:
col1 col2 col3
0 1 2 5
1 2 5 5
2 3 6 5
3 4 8 9
In [64]: df_final
Out[64]:
col1 col2 col3
0 6 1 4
1 7 3 3
2 8 3 2
3 9 4 5
Note: df_final.col1 is different from df.col1 because of the add operation before assign. Don't forget cl=cl in the dictionary's lambda: it avoids Python's late-binding closures, without which every lambda would use the last value of cl ('col3').
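To see why cl=cl matters, here is a small sketch comparing the dictionary with and without the default argument; `late` and `bound` are illustrative names only.

```python
import pandas as pd

df = pd.DataFrame({'col1': [1, 2, 3, 4], 'col2': [2, 5, 6, 8], 'col3': [5, 5, 5, 9]})
cols = ['col2', 'col3']

# Buggy: each lambda looks up `cl` only when called, after the loop ended,
# so both functions compute col3 - col1
late = {cl: lambda x: x[cl] - x['col1'] for cl in cols}

# Fixed: the default argument freezes the current value of `cl` per lambda
bound = {cl: lambda x, cl=cl: x[cl] - x['col1'] for cl in cols}

wrong = df.assign(**late)
right = df.assign(**bound)
print(wrong)
print(right)
```

In `wrong`, col2 silently ends up holding col3 - col1, which is easy to miss because no error is raised.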
Use df.sub
df_sub = df.assign(**df[['col2','col3']].sub(df.col1, axis=0).add_prefix('sub_'))
Out[22]:
col1 col2 col3 sub_col2 sub_col3
0 1 2 5 1 4
1 2 5 5 3 3
2 3 6 5 3 2
3 4 8 9 4 5
If you want to assign the values back to col2 and col3 instead, use update:
df.update(df[['col2','col3']].sub(df.col1, axis=0))
print(df)
Output:
col1 col2 col3
0 1 1 4
1 2 3 3
2 3 3 2
3 4 4 5
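Combining the two answers, the df.sub approach can also be used inside a method chain via DataFrame.pipe, which passes the intermediate frame to a function. This is a sketch of an alternative, not taken from either answer; `targets` is a hypothetical name for the list of columns to adjust.

```python
import pandas as pd

df = pd.DataFrame({'col1': [1, 2, 3, 4], 'col2': [2, 5, 6, 8], 'col3': [5, 5, 5, 9]})
targets = ['col2', 'col3']  # hypothetical: columns to subtract col1 from

# pipe() hands the intermediate frame to the lambda, so `d` is the result of
# add(5), not the original df; assign(**frame) overwrites col2 and col3
result = (df
          .add(5)
          .pipe(lambda d: d.assign(**d[targets].sub(d['col1'], axis=0))))
print(result)
```

Because the subtraction is vectorised over all target columns at once, no per-column lambdas (and no cl=cl trick) are needed.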

Create readable string in pandas dataframe

I have a single column dataframe:
col1
1
2
3
4
I need to create another column containing a string like this:
Result:
col1 col2
1 Value is 1
2 Value is 2
3 Value is 3
4 Value is 4
I know about formatted strings, but I'm not sure how to apply them to a dataframe.
Convert the column to strings and prepend the text:
df['col2'] = 'Value is ' + df['col1'].astype(str)
Or use f-strings with Series.map:
df['col2'] = df['col1'].map(lambda x: f'Value is {x}')
print (df)
col1 col2
0 1 Value is 1
1 2 Value is 2
2 3 Value is 3
3 4 Value is 4
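A variant of the map approach, as a sketch: passing a bound str.format method instead of a lambda gives the same result as the f-string version.

```python
import pandas as pd

df = pd.DataFrame({'col1': [1, 2, 3, 4]})

# 'Value is {}'.format is called once per element, like the f-string lambda
df['col2'] = df['col1'].map('Value is {}'.format)

# The vectorised concatenation from the answer produces identical strings
same = df['col2'].equals('Value is ' + df['col1'].astype(str))
print(df)
```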

How to prevent pandas from assigning a value from one df to a column of another for only one row?

I have a df that looks like this:
id col1 col2
1 2 3
4 5 6
7 8 9
When I add a new column and assign a value like this:
df['new_col'] = old_df['email']
The value is only assigned to the first row, like so:
id col1 col2 new_col
1 2 3 a#a.com
4 5 6 NaN
7 8 9 NaN
How do I assign the value to all rows, like so:
id col1 col2 new_col
1 2 3 a#a.com
4 5 6 a#a.com
7 8 9 a#a.com
edit:
old_df:
id col3 col4 email
1 2 3 a#a.com
Pandas aligns Series assignment on the index. Since old_df only contains index 0, only index 0 of df, i.e. the first row, is updated; the remaining rows get NaN.
For your particular problem, you can use iat and assign a scalar to a series:
df['new_col'] = old_df['email'].iat[0]
This works because Pandas broadcasts scalars to the whole series irrespective of index.
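A small sketch reproducing both behaviours with frames rebuilt from the question (the default 0-based indexes are an assumption, since the question does not show them):

```python
import pandas as pd

# Frames rebuilt from the question, assuming default 0-based indexes
df = pd.DataFrame({'id': [1, 4, 7], 'col1': [2, 5, 8], 'col2': [3, 6, 9]})
old_df = pd.DataFrame({'id': [1], 'col3': [2], 'col4': [3], 'email': ['a#a.com']})

# Series assignment aligns on the index: only label 0 exists in old_df
aligned = df.copy()
aligned['new_col'] = old_df['email']

# Extracting the scalar first lets pandas broadcast it to every row
broadcast = df.copy()
broadcast['new_col'] = old_df['email'].iat[0]
print(aligned)
print(broadcast)
```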
