I have the following data frame:
df = pd.DataFrame()
df['Name'] = ['A','B','C']
df['Value'] = ['2+0.5','1+0.2','2-0.06']
I want to split each entry in the Value column and assign the two parts to two new columns.
My desired output is as follows:
Each element in the Value column is split in two, and the sign is kept with the second part.
I would be very grateful for your advice.
Thanks.
import pandas as pd
df = pd.DataFrame()
df['Name'] = ['A','B','C']
df['Value'] = ['2+0.5','1+0.2','2-0.06']
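df[['value1','value2']] = df.Value.str.split('[-+]+', expand=True)
df['value2'] = df['value2'].astype(float)  # convert once so the column has a single dtype
contain_symbol = df.Value.str.contains('-', regex=False)
df.loc[contain_symbol, 'value2'] = -df.loc[contain_symbol, 'value2']  # restore the minus sign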
Say you have a column source_name with values like 'Anand_VUNagar_DC (Gujarat)' and you want to create 3 new columns A, B, C with values 'Anand', 'VUNagar', 'DC (Gujarat)'.
You can do this with:
df1[['A','B','C']] = df1['source_name'].str.rsplit('_', n=2, expand=True)
result:
                  source_name      A        B             C
0  Anand_VUNagar_DC (Gujarat)  Anand  VUNagar  DC (Gujarat)
If I understand correctly:
newdf=df.Value.str.split('([-+])+',expand=True)
newdf[2]=newdf[1].map({'+':1,'-':-1})*newdf[2].astype(float)
df[['value1','value2']]=newdf[[0,2]]
df
Out[30]:
  Name   Value value1  value2
0    A   2+0.5      2    0.50
1    B   1+0.2      1    0.20
2    C  2-0.06      2   -0.06
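As an alternative to splitting and then re-applying the sign, here is a minimal sketch that captures the sign together with the second number in one step. It assumes every entry has the form number, sign, number, as in the sample data; the regular expression is my own choice and is not from the answers above.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['A', 'B', 'C'],
                   'Value': ['2+0.5', '1+0.2', '2-0.06']})

# Group 1: the first number. Group 2: the sign plus the second number.
parts = df['Value'].str.extract(r'^([\d.]+)([+-][\d.]+)$')

# float() understands a leading '+' or '-', so the sign survives the conversion
df[['value1', 'value2']] = parts.astype(float)
print(df)
```

Rows that do not match the assumed pattern would come back as NaN, so it may be worth checking for those before relying on the result.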
In [182]: colname
Out[182]: 'col1'
In [183]: x= 'df_' + colname
In [184]: x
Out[184]: 'df_col1'
How can I create a new pandas DataFrame named after the value of x, so that the new DataFrame's name is df_col1?
Another way is to initialize a dictionary and store each DataFrame under its own key, as follows:
import pandas as pd
x = "your_column_name"
df_dict = {}
df_dict[x] = pd.DataFrame()
x = "your_new_column_name"
df_dict[x] = pd.DataFrame()
You can then change x at any time and use the same approach to add more DataFrames to the dictionary. To get a DataFrame back, look it up by its key, as you would with any dictionary.
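Applied to the names from the question, the dictionary approach might look like the sketch below. The dictionary name frames and the sample column 'a' are placeholders of mine, not part of the original answer.

```python
import pandas as pd

colname = 'col1'

frames = {}  # maps a generated name to its DataFrame
frames[f'df_{colname}'] = pd.DataFrame({'a': [1, 2, 3]})

# Fetch the frame back using the same generated name
result = frames['df_col1']
print(result)
```

This keeps the dynamically named frames in one place and avoids creating variables whose names are only known at runtime.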
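You can use the locals() function as shown below. This works at the top level of a script or in an interactive session; inside a function, assignments made through locals() are not guaranteed to create a variable.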
>>> mydf
col_A col_B
0 1 4
1 2 5
2 3 6
>>> colname = 'col1'
>>> locals()[f'df_{colname}'] = mydf.col_A
>>> df_col1
0 1
1 2
2 3
Name: col_A, dtype: int64
The exec function executes Python code given as a string, so it can define a variable whose name is built at runtime. Use:
df_=pd.DataFrame({'a':[1,2,3]})
col_name = 'col1'
exec(f"df_{col_name}= df_")
I have the following DataFrame:
I need to switch values of col2 and col3 with the values of col4 and col5. Values of col1 will remain the same. The end result needs to look as the following:
Is there a way to do this without looping through the DataFrame?
Use rename in pandas
In [160]: df = pd.DataFrame({'A':[1,2,3],'B':[3,4,5]})
In [161]: df
Out[161]:
A B
0 1 3
1 2 4
2 3 5
In [167]: df.rename({'B':'A','A':'B'},axis=1)
Out[167]:
B A
0 1 3
1 2 4
2 3 5
This should do:
og_cols = df.columns
new_cols = [df.columns[0], *df.columns[3:], *df.columns[1:3]]
df = df[new_cols] # Sort columns in the desired order
df.columns = og_cols # Use original column names
If you want to swap the column values:
df.iloc[:, 1:3], df.iloc[:, 3:] = df.iloc[:,3:].to_numpy(copy=True), df.iloc[:,1:3].to_numpy(copy=True)
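For a concrete picture of the value swap, here is a small sketch on a made-up five-column frame (the numbers are invented for illustration). It selects the columns by name rather than by position, which is a variation on the iloc line above, not the same code.

```python
import pandas as pd

df = pd.DataFrame({'col1': [1, 2], 'col2': [10, 20], 'col3': [30, 40],
                   'col4': [50, 60], 'col5': [70, 80]})

left, right = ['col2', 'col3'], ['col4', 'col5']

# to_numpy() drops the column labels, so the values are assigned by position
# instead of being aligned back to their original column names
df[left + right] = df[right + left].to_numpy()
print(df)
```

Without the to_numpy() call pandas would align on column names and the frame would end up unchanged.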
Pandas reindex could help :
cols = df.columns
#reposition the columns
df = df.reindex(columns=['col1','col4','col5','col2','col3'])
#pass in new names
df.columns = cols
I have a list ['df1', 'df2'] in which I have stored some DataFrames that have been filtered on a few conditions. I then converted this list to a DataFrame using
df = pd.DataFrame(list1)
Now df has only one column:
0
df1
df2
Sometimes it may also have:
0
df1
df2
df3
I want to concatenate all of these. My static code is
df_new = pd.concat([df1,df2],axis=1) or
df_new = pd.concat([df1,df2,df3],axis=1)
How can I make this dynamic (without spelling out df1, df2 myself) so that it picks up the values and concatenates them?
Use a list to collect the data frames, then concatenate them:
import pandas as pd
lists = [[1,2,3],[4,5,6]]
arr = []
for l in lists:
    new_df = pd.DataFrame(l)
    arr.append(new_df)
df = pd.concat(arr,axis=1)
df
Result :
   0  0
0  1  4
1  2  5
2  3  6
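For the question as asked, the same idea can be sketched by keeping the filtered DataFrame objects themselves in a list, rather than their names as strings. The three small frames here are stand-ins I made up for the filtered ones.

```python
import pandas as pd

# Hypothetical stand-ins for the filtered frames
df1 = pd.DataFrame({'a': [1, 2]})
df2 = pd.DataFrame({'b': [3, 4]})
df3 = pd.DataFrame({'c': [5, 6]})

frames = [df1, df2, df3]  # store the objects, not the strings 'df1', 'df2', ...

# pd.concat takes any list of frames, so the length does not need to be known in advance
df_new = pd.concat(frames, axis=1)
print(df_new)
```

If the list sometimes holds two frames and sometimes three, the concat line stays the same.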
I have the list below:
ColumnName = 'Emp_id','Emp_Name','EmpAGe'
When I try to use these names as the columns of a DataFrame, I get extra double quotes:
df = pd.DataFrame(data,columns=[ColumnName])
With columns=[ColumnName], I am getting columns = ["'Emp_id','Emp_Name','EmpAGe'"].
How can I remove these extra double quotes when assigning the header to the data?
This code
ColumnName = 'Emp_id','Emp_Name','EmpAGe'
creates a tuple, not a list.
If you want three columns, each named after one of the values in the tuple above, you need
df = pd.DataFrame(data,columns=list(ColumnName))
The problem is how you define the columns for pandas DataFrame.
The example below will build a correct data frame :
import pandas as pd
ColumnName1 = 'Emp_id','Emp_Name','EmpAGe'
df1 = [['A1','A1','A2'],['1','2','1'],['a0','a1','a3']]
df = pd.DataFrame(data=df1,columns=ColumnName1 )
df
Result :
Emp_id Emp_Name EmpAGe
0 A1 A1 A2
1 1 2 1
2 a0 a1 a3
Just for the sake of understanding: you can use col.replace to strip the quotes and get the desired column names.
Let's take an example:
>>> df
col1" col2"
0 1 1
1 2 2
Result:
>>> df.columns = [col.replace('"', '') for col in df.columns]
# df.columns = df.columns.str.replace('"', '') <-- can use this as well
>>> df
col1 col2
0 1 1
1 2 2
OR
>>> df = pd.DataFrame({ '"col1"':[1, 2], '"col2"':[1,2]})
>>> df
"col1" "col2"
0 1 1
1 2 2
>>> df.columns = [col.replace('"', '') for col in df.columns]
>>> df
col1 col2
0 1 1
1 2 2
Your input is not quite right. ColumnName is already list-like and it should be passed on directly rather than wrapped in another list. In the latter case it would be interpreted as one single column.
df = pd.DataFrame(data, columns=ColumnName)
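A short sketch of the point about passing the names directly. The rows in data are hypothetical, since the question does not show them.

```python
import pandas as pd

ColumnName = 'Emp_id', 'Emp_Name', 'EmpAGe'   # the commas make this a tuple of three names
data = [[1, 'Ann', 30], [2, 'Bob', 41]]       # hypothetical rows, one value per column

# Pass the names themselves, not a list that wraps the whole tuple
df = pd.DataFrame(data, columns=list(ColumnName))
print(df.columns.tolist())
```

Each name becomes its own column header, with no extra quoting.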
Assuming that I have a dataframe with the following values:
df:
col1 col2 value
1 2 3
1 2 1
2 3 1
I want to first group my dataframe by the first two columns (col1 and col2) and then average the values of the third column (value). The desired output would look like this:
col1 col2 avg-value
1 2 2
2 3 1
I am using the following code:
columns = ['col1','col2','avg']
df = pd.DataFrame(columns=columns)
df.loc[0] = [1,2,3]
df.loc[1] = [1,3,3]
print(df[['col1','col2','avg']].groupby('col1','col2').mean())
which gets the following error:
ValueError: No axis named col2 for object type <class 'pandas.core.frame.DataFrame'>
Any help would be much appreciated.
You need to pass a list of the columns to groupby. What you passed was interpreted as the axis param, which is why it raised an error:
In [30]:
columns = ['col1','col2','avg']
df = pd.DataFrame(columns=columns)
df.loc[0] = [1,2,3]
df.loc[1] = [1,3,3]
print(df[['col1','col2','avg']].groupby(['col1','col2']).mean())
           avg
col1 col2
1    2       3
     3       3
If you want to group by multiple columns, you should put them in a list:
columns = ['col1','col2','value']
df = pd.DataFrame(columns=columns)
df.loc[0] = [1,2,3]
df.loc[1] = [1,3,3]
df.loc[2] = [2,3,1]
print(df.groupby(['col1','col2']).mean())
Or, slightly more verbosely, to get the name 'avg' in your aggregated dataframe, use named aggregation:
columns = ['col1','col2','value']
df = pd.DataFrame(columns=columns)
df.loc[0] = [1,2,3]
df.loc[1] = [1,3,3]
df.loc[2] = [2,3,1]
# avg=('value', 'mean') means: output column 'avg' is the mean of input column 'value'
print(df.groupby(['col1','col2']).agg(avg=('value','mean')))
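To get flat columns like the desired output in the question, one option is as_index=False, sketched here on the question's own three rows. The output column name avg_value is my choice, since a hyphen cannot be used in a keyword argument.

```python
import pandas as pd

df = pd.DataFrame({'col1': [1, 1, 2],
                   'col2': [2, 2, 3],
                   'value': [3, 1, 1]})

# as_index=False keeps col1 and col2 as ordinary columns instead of a MultiIndex
out = df.groupby(['col1', 'col2'], as_index=False).agg(avg_value=('value', 'mean'))
print(out)
```

The column can be renamed afterwards with out.rename(columns={'avg_value': 'avg-value'}) if the hyphenated header is required.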