Hello, I have this dataframe in Python, using pandas:
   b  c
d  1  4
e  2  5
f  3  6
I would like to have this :
a  b  c
d  1  4
e  2  5
f  3  6
How can I do this operation?
Thank you very much!
# Copy the index into a new column "a", then reorder the columns and drop the old index.
df['a'] = df.index
df = df[['a', 'b', 'c']].reset_index(drop=True)
To change the index column's name:
df.index.name = "a"
To change the index column to be a regular column:
df.reset_index(inplace=True)
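Putting the two steps together on the question's data, a minimal end-to-end sketch (the frame below is reconstructed from the printed output above):
import pandas as pd

df = pd.DataFrame({'b': [1, 2, 3], 'c': [4, 5, 6]}, index=list('def'))

df.index.name = 'a'     # give the index the name "a"...
df = df.reset_index()   # ...then turn it into a regular column
print(df)
#    a  b  c
# 0  d  1  4
# 1  e  2  5
# 2  f  3  6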
Simple DataFrame:
df = pd.DataFrame({'A': [1,1,2,2], 'B': [0,1,2,3], 'C': ['a','b','c','d']})
df
A B C
0 1 0 a
1 1 1 b
2 2 2 c
3 2 3 d
For every value of column A (a groupby), I wish to get the value of column C for the row where column B is maximal. For example, for group 1 of column A, the maximum of column B is 1, so I want the value "b" from column C:
A C
0 1 b
1 2 d
Column B cannot be assumed to be sorted. Performance is the top priority, then elegance.
Check with sort_values + drop_duplicates (sort by B, then keep the last, i.e. largest, row per A):
df.sort_values('B').drop_duplicates(['A'],keep='last')
Out[127]:
A B C
1 1 1 b
3 2 3 d
df.groupby('A').apply(lambda x: x.loc[x['B'].idxmax(), 'C'])
# A
# 1    b
# 2    d
Use idxmax to find the index where B is maximal, then select column C within that group (using a lambda function).
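A small follow-up sketch if you prefer a plain two-column frame over a Series indexed by A (recent pandas versions may warn that apply operates on the grouping column; the result is the same):
out = (df.groupby('A')
         .apply(lambda x: x.loc[x['B'].idxmax(), 'C'])
         .reset_index(name='C'))
#    A  C
# 0  1  b
# 1  2  d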
Here's a little fun with groupby and nlargest:
(df.set_index('C')
.groupby('A')['B']
.nlargest(1)
.index
.to_frame()
.reset_index(drop=True))
A C
0 1 b
1 2 d
Or, sort_values, groupby, and last:
df.sort_values('B').groupby('A')['C'].last().reset_index()
A C
0 1 b
1 2 d
A similar solution to @Jondiedoop's, but it avoids the apply:
u = df.groupby('A')['B'].idxmax()
df.loc[u, ['A', 'C']].reset_index(drop=True)
A C
0 1 b
1 2 d
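For completeness, a transform-based sketch that avoids both sorting and apply; note that it keeps every row tied for the group maximum rather than exactly one row per group:
out = df.loc[df['B'] == df.groupby('A')['B'].transform('max'), ['A', 'C']]
#    A  C
# 1  1  b
# 3  2  d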
I have two dataframes:
df1
key value
A 1
B 2
C 2
D 3
df2
key value
C 3
D 3
E 5
F 7
I would like to merge these dataframes on their key and get a dataframe that looks like the one below. I want to keep a single value column (no new columns with suffixes) and drop the value from df2 whenever the values for a shared key do not match.
df_merged
key value
A 1
B 2
C 2
D 3
E 5
F 7
How can I do this? Should I use join or concat? Thanks a lot!
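For a copy-pasteable setup, the two frames above can be rebuilt like this (a sketch based on the printed data; the integer dtype of value is an assumption):
import pandas as pd

df1 = pd.DataFrame({'key': list('ABCD'), 'value': [1, 2, 2, 3]})
df2 = pd.DataFrame({'key': list('CDEF'), 'value': [3, 3, 5, 7]})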
Use concat with DataFrame.drop_duplicates on column key (keep='first' is the default, so df1's value wins for shared keys):
df = pd.concat([df1, df2], ignore_index=True).drop_duplicates('key')
print (df)
key value
0 A 1
1 B 2
2 C 2
3 D 3
6 E 5
7 F 7
Just adding to @jezrael's answer, you could also use groupby with first:
>>> pd.concat([df1, df2], ignore_index=True).groupby('key', as_index=False).first()
key value
0 A 1
1 B 2
2 C 2
3 D 3
4 E 5
5 F 7
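For completeness, a combine_first-based sketch: index both frames by key so that df1's values win wherever the keys overlap, with the remaining keys filled from df2 (combine_first upcasts to float, hence the astype):
df_merged = (df1.set_index('key')
                .combine_first(df2.set_index('key'))
                .astype(int)
                .reset_index())
print(df_merged)
#   key  value
# 0   A      1
# 1   B      2
# 2   C      2
# 3   D      3
# 4   E      5
# 5   F      7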
I am trying to iterate over the values of a column in df2 and assign each value to df1, so that df1 is effectively repeated once for every value of that column in df2.
Let's say I have df1 as below:
df1
1
2
3
and df2 as below:
df2
A
B
C
I want a third dataframe, df3, that looks like this:
df3
1 A
2 A
3 A
1 B
2 B
3 B
1 C
2 C
3 C
For now I have tried the code below:
for i, value in ACS_shock['scenario'].iteritems():
df1['sec'] = df1[i] = value[:]
But when I generate the file from df1, my output looks like this:
1 A B C
2 A B C
3 A B C
Any idea how I can correct this code?
Much appreciated.
You can use pd.concat and np.repeat:
>>> import pandas as pd
>>> import numpy as np
>>> df1 = pd.Series([1,2,3])
>>> df1
0 1
1 2
2 3
dtype: int64
>>> df2 = pd.Series(list('ABC'))
>>> df2
0 A
1 B
2 C
dtype: object
>>> df3 = pd.DataFrame({'df1': pd.concat([df1]*3).reset_index(drop=True),
...                     'df2': np.repeat(df2, 3).reset_index(drop=True)})
>>> df3
df1 df2
0 1 A
1 2 A
2 3 A
3 1 B
4 2 B
5 3 B
6 1 C
7 2 C
8 3 C
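If you are on pandas 1.2 or newer, a cross merge is another sketch for the same Cartesian product; putting df2 on the left reproduces the "all of df1 for each df2 value" row order shown above:
df3 = (df2.to_frame('df2')
          .merge(df1.to_frame('df1'), how='cross')
          [['df1', 'df2']])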
Reproducible code for the data:
import pandas as pd
dict = {"a": "[1,2,3,4]", "b": "[1,2,3,4]"}
dict = pd.DataFrame(list(dict.items()))
dict
0 1
0 a [1,2,3,4]
1 b [1,2,3,4]
I want to split/delimit column 1 and create an individual row for each split value.
expected output:
0 1
0 a 1
1 a 2
2 a 3
3 a 4
4 b 1
5 b 2
6 b 3
7 b 4
Should I remove the brackets first and then split the values? I don't have any idea how to approach this. Is there any reference that would help me solve it?
Based on the logic from that answer:
s = (d[1]
     .apply(lambda x: pd.Series(eval(x)))
     .stack())
s.index = s.index.droplevel(-1)   # line up with d's original index
s.name = "split"
d.join(s).drop(1, axis=1)
Because you have strings containing a list (and not lists) in your cells, you can use eval:
dict_v = {"a": "[1,2,3,4]", "b": "[1,2,3,4]"}
df = pd.DataFrame(list(dict_v.items()))
df = (df.rename(columns={0: 'l'}).set_index('l')[1]
      .apply(lambda x: pd.Series(eval(x))).stack()
      .reset_index().drop('level_1', axis=1).rename(columns={'l': 0, 0: 1}))
Or another way could be to build the DataFrame directly (probably faster), such as:
df = (pd.DataFrame(df[1].apply(eval).tolist(),index=df[0])
.stack().reset_index(level=1, drop=True)
.reset_index(name='1'))
Your output is:
0 1
0 a 1
1 a 2
2 a 3
3 a 4
4 b 1
5 b 2
6 b 3
7 b 4
All the renames are just there to reproduce your exact input/output column names.
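A newer sketch of the same idea (pandas 0.25+), using ast.literal_eval instead of eval for safety and DataFrame.explode to put each list element on its own row:
import ast
import pandas as pd

d = pd.DataFrame(list({"a": "[1,2,3,4]", "b": "[1,2,3,4]"}.items()))

d[1] = d[1].apply(ast.literal_eval)        # parse each string into a real list
out = d.explode(1).reset_index(drop=True)  # one row per list element
print(out)
#    0  1
# 0  a  1
# 1  a  2
# ...
# 7  b  4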
I am trying to get an output where column d of d1 and d2 is summed wherever columns a, b, and c are the same (like a groupby).
For example:
d1 = pd.DataFrame([[1,2,3,4]],columns=['a','b','c','d'])
d2 = pd.DataFrame([[1,2,3,4],[2,3,4,5]],columns=['a','b','c','d'])
Then I'd like to get this output:
a b c d
0 1 2 3 8
1 2 3 4 5
That is: merge the two data frames and add up column d where a, b, and c are the same. d1.add(d2) or radd gives me an aggregate over all columns. The result should itself be a DataFrame that can be added to yet another one in the same way. Any help is appreciated.
You can use set_index first:
print (d2.set_index(['a','b','c'])
.add(d1.set_index(['a','b','c']), fill_value=0)
.astype(int)
.reset_index())
a b c d
0 1 2 3 8
1 2 3 4 5
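Since the question asks for a result that can itself be added to yet another frame, here is a small hypothetical helper built on the same set_index + add pattern (add_frames and its keys parameter are illustration only, not pandas API):
from functools import reduce

def add_frames(frames, keys=('a', 'b', 'c')):
    # Sum column d across any number of frames, aligning rows on a, b, c.
    keyed = (f.set_index(list(keys)) for f in frames)
    total = reduce(lambda x, y: x.add(y, fill_value=0), keyed)
    return total.astype(int).reset_index()

print(add_frames([d1, d2]))
#    a  b  c  d
# 0  1  2  3  8
# 1  2  3  4  5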
Or concatenate the two frames and then sum d per (a, b, c) group:
df = pd.concat([d1, d2])
df = df.groupby(['a', 'b', 'c'], as_index=False)['d'].sum()
df
   a  b  c  d
0  1  2  3  8
1  2  3  4  5