I have a DataFrame with almost 100 columns.
I need to select col2 to col4 and col54. How can I do it?
I tried:
df = df.loc[:,'col2':'col4']
but I can't add col54.
You can do this in a couple of different ways:
Keeping the slice format you are already using, you can join col54 onto the result:
df = df.loc[:,'col2':'col4'].join(df.loc[:,'col54'])
Another method, given that col2 is close to col4, is to list the columns explicitly:
df = df.loc[:,['col2','col3','col4', 'col54']]
or simply
df = df[['col2','col3','col4','col54']]
You can simply do this:
df = df.loc[:,['col2','col3','col4','col54']]
loc also accepts a list of column names.
Or this:
df[['col2','col3','col4','col54']]
You can use a list or a pandas.IndexSlice object:
In [1]: import pandas as pd
In [2]: df = pd.DataFrame(1,index=[0,1,2],columns=["col1","col2","col3","col4","col5"])
In [3]: df
Out[3]:
col1 col2 col3 col4 col5
0 1 1 1 1 1
1 1 1 1 1 1
2 1 1 1 1 1
In [4]: df.loc[:,['col1','col2','col4','col5']]
Out[4]:
col1 col2 col4 col5
0 1 1 1 1
1 1 1 1 1
2 1 1 1 1
In [5]: slicer = pd.IndexSlice
In [6]: df.loc[:,slicer["col3":"col5"]]
Out[6]:
col3 col4 col5
0 1 1 1
1 1 1 1
2 1 1 1
Edit: I see I misread the OP. This is a bit tough. You can get 'col2', 'col3' and 'col4' using pandas.IndexSlice as I demonstrated above; I'm still trying to figure out how to include 'col54' in that.
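One way to combine a label slice with an extra column is to collect the labels covered by the slice into a list and append 'col54'. This is a minimal sketch on a hypothetical 60-column frame; the generated column names and values are assumptions standing in for the real data:

```python
import pandas as pd

# Hypothetical frame with columns col1 ... col60 and a single row of values,
# standing in for the ~100-column DataFrame from the question.
df = pd.DataFrame([list(range(60))], columns=[f"col{i}" for i in range(1, 61)])

# A label slice in loc includes both endpoints, so this yields col2, col3, col4.
cols = df.loc[:, "col2":"col4"].columns.tolist() + ["col54"]

# Select the combined list of labels in one step.
subset = df[cols]
print(subset.columns.tolist())
```

Because the slice is resolved against the real column order, this keeps working if more columns are later inserted between col2 and col4.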
I'm looping through a list of dictionaries and want them appended into a single DataFrame.
# getting the value of a specific key from the AWS boto3 response
events_list = response_event.get('Events')
for e in events_list:
    df = pd.DataFrame.from_dict(e)
    print(df)
Current result:
col1 col2
0 1 3
col1 col2
0 2 4
col1 col2
0 3 5
Expected result:
col1 col2
0 1 3
1 2 4
2 3 5
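Try pd.concat, passing ignore_index=True so the combined index runs 0, 1, 2 instead of repeating 0:
out = pd.concat((pd.DataFrame.from_dict(e) for e in events_list), ignore_index=True)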
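A minimal sketch with a hypothetical events_list (the real boto3 payload will look different). It assumes each dict holds scalar values, in which case the list of dicts can also be passed straight to the DataFrame constructor, one row per dict:

```python
import pandas as pd

# Hypothetical stand-in for response_event.get('Events').
events_list = [
    {"col1": 1, "col2": 3},
    {"col1": 2, "col2": 4},
    {"col1": 3, "col2": 5},
]

# Each dict becomes one row; the index is 0, 1, 2 automatically.
out = pd.DataFrame(events_list)

# Equivalent concat form: one single-row frame per dict, index renumbered.
out_concat = pd.concat(
    (pd.DataFrame([e]) for e in events_list), ignore_index=True
)
print(out)
```

Building the frame once from the whole list avoids creating a temporary DataFrame per event, which matters if the list is long.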
I have data like:
import pandas as pd
df = pd.DataFrame(data=[[1,-2,3,0,0], [0,0,0,4,0], [0,0,0,0,5]]).T
df.columns = ['col1', 'col2', 'col3']
> df
col1 col2 col3
1 0 0
-2 0 0
3 0 0
0 4 0
0 0 5
I want to create a fourth column ("col4") that takes the value from whichever column is non-zero.
So the result would be:
col1 col2 col3 col4
1 0 0 1
-2 0 0 -2
3 0 0 3
0 4 0 4
0 0 5 5
EDIT: If two columns are non-zero, always use col1. Also, the numbers may be negative. I have updated the df to reflect this.
Using the row-wise maximum is a possibility, but it only works when the non-zero value is positive: for the row -2, 0, 0 it returns 0 instead of -2.
df['col4'] = df.max(axis=1)
Here's an example:
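def func(row):
    # Scan the row left to right so col1 takes priority when several
    # columns are non-zero (a set would lose the column order).
    for value in row:
        if value != 0:
            return value
    return 0  # every column in this row is zero

df['col4'] = df.apply(func, axis=1)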
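A vectorized alternative that avoids a Python-level function per row: replace zeros with NaN, back-fill along each row, and take the first column. This is a sketch of that mask-and-bfill technique rather than either answer's method; the handling of all-zero rows (mapped back to 0) is an assumption:

```python
import pandas as pd

df = pd.DataFrame(data=[[1, -2, 3, 0, 0], [0, 0, 0, 4, 0], [0, 0, 0, 0, 5]]).T
df.columns = ['col1', 'col2', 'col3']

# Zeros become NaN; bfill(axis=1) pulls the next non-NaN value leftwards,
# so the first column ends up holding the leftmost non-zero value.
# Scanning left to right means col1 wins when several columns are non-zero.
first_nonzero = df.mask(df.eq(0)).bfill(axis=1).iloc[:, 0]

# A row that is entirely zero stays NaN; treat it as 0 and restore int dtype.
df['col4'] = first_nonzero.fillna(0).astype(int)
print(df)
```

Negative values are handled naturally, since the test is "non-zero" rather than "largest".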
df1:
col1 col2
0 a 5
1 b 2
2 c 1
df2:
col1
0 qa0
1 qa1
2 qa2
3 qa3
4 qa4
5 qa5
final output:
col1 col2 col3
0 a 5 qa5
1 b 2 qa2
2 c 1 qa1
Basically, col2 in df1 stores row indexes of df2. I have to fetch the matching values from df2 and append them to df1 as a new column.
I don't know how to fetch data by index number.
Use Series.map with another Series:
df1['col3'] = df1['col2'].map(df2['col1'])
Or use DataFrame.join with a renamed column:
df1 = df1.join(df2.rename(columns={'col1':'col3'})['col3'], on='col2')
print (df1)
col1 col2 col3
0 a 5 qa5
1 b 2 qa2
2 c 1 qa1
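You can use iloc to select rows by position and then to_numpy() to take the values. Note that iloc is position-based, so this relies on df2 having the default 0..n-1 index:
df1["col3"] = df2["col1"].iloc[df1["col2"]].to_numpy()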
df1
col1 col2 col3
0 a 5 qa5
1 b 2 qa2
2 c 1 qa1
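A self-contained sketch that rebuilds the example frames and checks that the label-based map and the position-based iloc lookup agree. They coincide here only because df2 uses the default RangeIndex; with a custom index, map matches labels while iloc uses positions:

```python
import pandas as pd

df1 = pd.DataFrame({"col1": ["a", "b", "c"], "col2": [5, 2, 1]})
df2 = pd.DataFrame({"col1": [f"qa{i}" for i in range(6)]})

# Label-based: each value in df1.col2 is looked up in df2's index.
by_label = df1["col2"].map(df2["col1"])

# Position-based: each value in df1.col2 is used as a row position in df2.
by_position = df2["col1"].iloc[df1["col2"]].to_numpy()

df1["col3"] = by_label
print(df1)
```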
I need to do some operations on my DataFrame.
My DataFrame is:
df = pd.DataFrame(data={'col1':[1,2],'col2':[3,4]})
col1 col2
0 1 3
1 2 4
My operation is column-dependent.
For example, I need to add the column's .max() to each value in that column.
So df.col1.max() is 2 and df.col2.max() is 4,
and my output should be:
col1 col2
0 3 7
1 4 8
I have tried this:
for i in df.columns:
    df.i += df.i.max()
but it raises:
AttributeError: 'DataFrame' object has no attribute 'i'
You can chain df.add and df.max, specifying the axis, which avoids any loops:
df1 = df.add(df.max(axis=0))
print(df1)
col1 col2
0 3 7
1 4 8
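To loop through the columns and add the maximum of each column, use bracket indexing with the loop variable. df.i looks for a column literally named "i", which is why the attempt above fails:
for col in df:
    df[col] += df[col].max()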
This gives
col1 col2
0 3 7
1 4 8
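The loop-free version works because df.max() returns a Series indexed by column name, and arithmetic between a DataFrame and a Series aligns that index with the columns. A short sketch showing this alongside a column-wise apply, which gives the same result:

```python
import pandas as pd

df = pd.DataFrame(data={'col1': [1, 2], 'col2': [3, 4]})

# Series of per-column maxima: col1 -> 2, col2 -> 4.
col_max = df.max()

# DataFrame + Series aligns on column labels, so each column gets its own max.
out = df + col_max

# Equivalent: apply passes one column (a Series) at a time.
out_apply = df.apply(lambda s: s + s.max())
print(out)
```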
My question is related to my previous question, but it's different, so I am asking a new one.
In that question, see jezrael's answer:
df = pd.DataFrame({'col1':[1,1,1],
'col2':[4,4,6],
'col3':[7,7,9],
'col4':[3,3,5]})
print (df)
col1 col2 col3 col4
0 1 4 7 3
1 1 4 7 3
2 1 6 9 5
df1 = df.groupby(['col1','col2']).agg({'col3':'size','col4':'nunique'})
df1['result_col'] = df1['col3'].div(df1['col4'])
print (df1)
col4 col3 result_col
col1 col2
1 4 1 2 2.0
6 1 1 1.0
Now I also want to count a specific value of col4. Say I want the count of col4 == 3 in the same query.
df.groupby(['col1','col2']).agg({'col3':'size','col4':'nunique'}) ... + count(col4=='3')
How can I do this in the same query? I tried the following, but it doesn't work:
df.groupby(['col1','col2']).agg({'col3':'size','col4':'nunique','col4':'x: lambda x[x == 7].count()'})
Do some preprocessing by adding col4 == 3 as a column ahead of time, then aggregate:
df.assign(result_col=df.col4.eq(3).astype(int)).groupby(
['col1', 'col2']
).agg(dict(col3='size', col4='nunique', result_col='sum'))
col3 result_col col4
col1 col2
1 4 2 2 1
6 1 0 1
old answers
g = df.groupby(['col1', 'col2'])
g.agg({'col3':'size','col4': 'nunique'}).assign(
result_col=g.col4.apply(lambda x: x.eq(3).sum()))
col3 col4 result_col
col1 col2
1 4 2 1 2
6 1 1 0
slightly rearranged
g = df.groupby(['col1', 'col2'])
final_df = g.agg({'col3':'size','col4': 'nunique'})
final_df.insert(1, 'result_col', g.col4.apply(lambda x: x.eq(3).sum()))
final_df
col3 result_col col4
col1 col2
1 4 2 2 1
6 1 0 1
I think you need to aggregate with a list of functions in the dict for column col4.
To count the 3 values, the simplest approach is to sum the True values of x == 3:
df1 = (df.groupby(['col1','col2'])
         .agg({'col3':'size','col4': ['nunique', lambda x: (x == 3).sum()]}))
df1 = df1.rename(columns={'<lambda>':'count_3'})
df1.columns = ['{}_{}'.format(x[0], x[1]) for x in df1.columns]
print (df1)
col4_nunique col4_count_3 col3_size
col1 col2
1 4 1 2 2
6 1 0 1
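On pandas 0.25 or later, named aggregation can produce flat, explicitly named columns in one call, which avoids the rename and column-flattening steps. A sketch of that alternative, with the result_col ratio from the earlier question added via assign; the output column names are illustrative choices:

```python
import pandas as pd

df = pd.DataFrame({'col1': [1, 1, 1],
                   'col2': [4, 4, 6],
                   'col3': [7, 7, 9],
                   'col4': [3, 3, 5]})

# Each keyword defines one output column as (input column, aggregation).
out = (
    df.groupby(['col1', 'col2'])
      .agg(col3_size=('col3', 'size'),
           col4_nunique=('col4', 'nunique'),
           count_3=('col4', lambda x: (x == 3).sum()))
      .assign(result_col=lambda d: d['col3_size'] / d['col4_nunique'])
)
print(out)
```

Unlike a dict, named aggregation lets the same input column (col4 here) feed several outputs without one key overwriting another, which is what went wrong in the attempt with two 'col4' keys.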