How do you drop a column by index? - python

When I run this code it drops the first row instead of the first column:
df.drop(axis=1, index=0)
How do you drop a column by index?

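You can use df.columns[i] to refer to the column at position i. In your call, index=0 names a row label, so the row is dropped regardless of axis=1. Example: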
df.drop(df.columns[0], axis=1)

Using this example:
df = pd.DataFrame([
[1023.423,12.59595],
[1000,11.63024902],
[975,9.529815674],
[100,-48.20524597]], columns = ['col1', 'col2'])
col1 col2
0 1023.423 12.595950
1 1000.000 11.630249
2 975.000 9.529816
3 100.000 -48.205246
df.drop(index=0) drops the row whose index label is 0:
col1 col2
1 1000.0 11.630249
2 975.0 9.529816
3 100.0 -48.205246
df.drop('col1', axis=1) drops the column named 'col1':
col2
0 12.595950
1 11.630249
2 9.529816
3 -48.205246
Note that drop returns a new DataFrame and leaves df unchanged. Assign the result back (df = df.drop(...)) or pass inplace=True where necessary.
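A minimal runnable sketch of the two approaches, using a shortened version of the example data above. The variable names by_label and by_position are only illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    [[1023.423, 12.59595], [1000, 11.63024902]],
    columns=["col1", "col2"],
)

# Look up the label at position 0, then drop that label.
by_label = df.drop(columns=df.columns[0])

# Alternative: keep every column except position 0 with positional slicing.
by_position = df.iloc[:, 1:]

print(by_label.columns.tolist())  # ['col2']
print(df.columns.tolist())        # ['col1', 'col2'] (df itself is unchanged)
```

If the same label appears more than once, dropping by label removes every column with that label, whereas the iloc slice removes only the one position.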

Related

check if row value is equal to column name and access the value of the column

Sample dataframe -
col1  col2  col3  col4  colfromvaluestobepicked  new col
   1     1     0     1  'col1'                         1
   0     0     1     1  'col2'                         0
I want to create a new column whose value in each row is taken from the column named in colfromvaluestobepicked: if it is 'col1', pick that row's col1 value and assign it to new col, and so on.
I am not sure how to achieve this.
Use DataFrame.melt as an alternative to a lookup:
df1 = df.melt('colfromvaluestobepicked', ignore_index=False)
df['new']=df1.loc[df1['colfromvaluestobepicked'].str.strip("'") == df1['variable'],'value']
Alternatively, try a row-wise apply:
df['new col'] = df.apply(lambda row: row[row['colfromvaluestobepicked']],axis =1)
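A self-contained sketch of the row-wise lookup on the sample data. It assumes the picker values are stored with literal quote characters, as the sample table suggests, so they are stripped before being used as column names. If the values hold plain names, the strip is a no-op.

```python
import pandas as pd

df = pd.DataFrame({
    "col1": [1, 0],
    "col2": [1, 0],
    "col3": [0, 1],
    "col4": [1, 1],
    # Assumption: the quotes are part of the stored strings.
    "colfromvaluestobepicked": ["'col1'", "'col2'"],
})

# For each row, read the column whose name is stored in the picker column.
df["new col"] = df.apply(
    lambda row: row[row["colfromvaluestobepicked"].strip("'")],
    axis=1,
)

print(df["new col"].tolist())  # [1, 0]
```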

How do I stop aggregate functions from adding unwanted rows to dataframe?

I wrote a line of code that groups the dataframe by column
df = df.groupby(['where','when']).agg({'col1': ['max'], 'col2': ['sum']})
After running the above code, the output has two extra header rows: 'max' and 'sum' appear on a row below 'col1' and 'col2', and the index names sit on a row of their own. It looks like this:
           col1 col2
            max  sum
where when
home  1       a    a
work  2       b    b
This is my expected outcome:
where  when  col1  col2
home   1     a     a
work   2     b     b
I want col1 and col2 on the same header row as where and when, with 'max' and 'sum' no longer showing. I couldn't think of a way to make this work, so help would be appreciated.
What you need is reset_index, plus named aggregation so that the output column names are set in advance.
Use the following:
df = df.groupby(['where','when']).agg(col1 = ('col1', 'max'), col2 = ('col2', 'sum')).reset_index()
Dataframe:
where when col1 col2
0 home 1 1 1
1 work 2 2 2
2 home 1 3 3
Output:
where when col1 col2
0 home 1 3 4
1 work 2 2 2
Update:
We can pass as_index = False to groupby, which stops pandas from using the group keys as the index, so we don't need to reset the index afterwards.
df = df.groupby(['where','when'], as_index = False).agg(col1 = ('col1', 'max'), col2 = ('col2', 'sum'))
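A runnable sketch contrasting the two call styles on the small frame from this answer. The names nested and flat are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "where": ["home", "work", "home"],
    "when": [1, 2, 1],
    "col1": [1, 2, 3],
    "col2": [1, 2, 3],
})

# Dict of lists: the result gets two-level column labels.
nested = df.groupby(["where", "when"]).agg({"col1": ["max"], "col2": ["sum"]})
print(nested.columns.tolist())  # [('col1', 'max'), ('col2', 'sum')]

# Named aggregation with as_index=False: flat columns, keys kept as columns.
flat = df.groupby(["where", "when"], as_index=False).agg(
    col1=("col1", "max"),
    col2=("col2", "sum"),
)
print(flat.columns.tolist())  # ['where', 'when', 'col1', 'col2']
```

For the (home, 1) group, col1 is max(1, 3) = 3 and col2 is 1 + 3 = 4.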

How to compare two dataframes in Python pandas and output the difference?

I have two DataFrames with the same number of columns but different numbers of rows.
df1
col1 col2
0 a 1,2,3,4
1 b 1,2,3
2 c 1
df2
col1 col2
0 b 1,3
1 c 1,2
2 d 1,2,3
3 e 1,2
df1 is the existing list, df2 is the updated list. The expected result is whatever is in df2 that was not previously in df1.
Expected result:
col1 col2
0 c 2
1 d 1,2,3
2 e 1,2
I've tried with
mask = df1['col2'] != df2['col2']
but it doesn't work when the two DataFrames have different numbers of rows.
Split the values in column col2 and use DataFrame.explode, then use DataFrame.merge with a right join and the indicator parameter, filter by boolean indexing to keep only the right_only rows, and finally aggregate with join:
df11 = df1.assign(col2 = df1['col2'].str.split(',')).explode('col2')
df22 = df2.assign(col2 = df2['col2'].str.split(',')).explode('col2')
df = df11.merge(df22, indicator=True, how='right', on=['col1','col2'])
df = (df[df['_merge'].eq('right_only')]
      .groupby('col1')['col2']
      .agg(','.join)
      .reset_index(name='col2'))
print(df)
col1 col2
0 c 2
1 d 1,2,3
2 e 1,2
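As a cross-check, here is an alternative sketch in plain Python sets rather than explode and merge. It is not the method from the answer above. It assumes col1 values are unique within df1, and the names old, rows and result are only illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({"col1": ["a", "b", "c"],
                    "col2": ["1,2,3,4", "1,2,3", "1"]})
df2 = pd.DataFrame({"col1": ["b", "c", "d", "e"],
                    "col2": ["1,3", "1,2", "1,2,3", "1,2"]})

# Map each key in df1 to the set of items it already lists.
old = {key: set(items.split(",")) for key, items in zip(df1["col1"], df1["col2"])}

rows = []
for key, items in zip(df2["col1"], df2["col2"]):
    # Keep df2's item order, skipping anything df1 already has for this key.
    new_items = [x for x in items.split(",") if x not in old.get(key, set())]
    if new_items:
        rows.append({"col1": key, "col2": ",".join(new_items)})

result = pd.DataFrame(rows, columns=["col1", "col2"])
print(result)
```

On this data it yields the same three rows as the merge-based approach: c with 2, d with 1,2,3, and e with 1,2.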

How to switch column values in the same Pandas DataFrame

I have the following DataFrame:
I need to switch values of col2 and col3 with the values of col4 and col5. Values of col1 will remain the same. The end result needs to look as the following:
Is there a way to do this without looping through the DataFrame?
Use rename in pandas
In [160]: df = pd.DataFrame({'A':[1,2,3],'B':[3,4,5]})
In [161]: df
Out[161]:
A B
0 1 3
1 2 4
2 3 5
In [167]: df.rename({'B':'A','A':'B'},axis=1)
Out[167]:
B A
0 1 3
1 2 4
2 3 5
This should do:
og_cols = df.columns
new_cols = [df.columns[0], *df.columns[3:], *df.columns[1:3]]
df = df[new_cols] # Sort columns in the desired order
df.columns = og_cols # Use original column names
If you want to swap the column values:
df.iloc[:, 1:3], df.iloc[:, 3:] = df.iloc[:,3:].to_numpy(copy=True), df.iloc[:,1:3].to_numpy(copy=True)
Pandas reindex could help :
cols = df.columns
#reposition the columns
df = df.reindex(columns=['col1','col4','col5','col2','col3'])
#pass in new names
df.columns = cols
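A compact sketch of swapping the values by column name. The original DataFrame is not shown in the question, so the data here is hypothetical. Only the column names col1 to col5 come from the question.

```python
import pandas as pd

# Hypothetical data: the question's actual frame is not available.
df = pd.DataFrame({
    "col1": ["a", "b"],
    "col2": [1, 2],
    "col3": [3, 4],
    "col4": [5, 6],
    "col5": [7, 8],
})

left, right = ["col2", "col3"], ["col4", "col5"]

# to_numpy() drops the column labels, so values are assigned by position
# instead of being realigned to their original names.
df[left + right] = df[right + left].to_numpy()

print(df)
```

Without the to_numpy() call, pandas would align the right-hand side by column label and the assignment would leave the values where they were.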

how to create a dataframe aggregating (grouping?) a dataframe containing only strings

I would like to create a dataframe "aggregating" a larger data set.
Starting:
df:
col1 col2
1 A B
2 A C
3 A B
and getting:
df_aggregated:
col1 col2
1 A B
2 A C
without using any calculation (count())
I would write:
df_aggreagated = df.groupby('col1')
but I do not get anything
print ( df_aggregated )
"error"
any help appreciated
You can accomplish this by simply dropping the duplicate entries with df.drop_duplicates, keeping the first occurrence of each pair:
df_aggregated = df.drop_duplicates(subset=['col1', 'col2'], keep='first')
print(df_aggregated)
col1 col2
1 A B
2 A C
You can use groupby with a function:
In [849]: df.groupby('col2', as_index=False).max()
Out[849]:
col2 col1
0 B A
1 C A
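A short sketch of how the keep argument of drop_duplicates changes the result on the question's data. The names first_kept and none_kept are only illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    {"col1": ["A", "A", "A"], "col2": ["B", "C", "B"]},
    index=[1, 2, 3],
)

# Default keep="first": one row per distinct (col1, col2) pair survives.
first_kept = df.drop_duplicates(subset=["col1", "col2"])
print(first_kept.index.tolist())  # [1, 2]

# keep=False: every row that has a duplicate is removed, including the first.
none_kept = df.drop_duplicates(subset=["col1", "col2"], keep=False)
print(none_kept.index.tolist())  # [2]
```

Only the default (or keep='first') reproduces the two-row result asked for in the question.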
