Pandas remove column by index - python

Suppose I have a DataFrame like this:
>>> df = pd.DataFrame([[1,2,3], [4,5,6], [7,8,9]], columns=['a','b','b'])
>>> df
a b b
0 1 2 3
1 4 5 6
2 7 8 9
I want to remove the second 'b' column. If I just use the del statement, it deletes both 'b' columns:
>>> del df['b']
>>> df
a
0 1
1 4
2 7
I can select columns by position with .iloc[] and reassign the DataFrame, but how can I delete only the second 'b' column, for example by its index?

df = df.drop(['b'], axis=1).join(df['b'].iloc[:, 0:1])
>>> df
a b
0 1 2
1 4 5
2 7 8
Or, just for this case:
df = df.iloc[:, 0:2]
But I think there are better ways.
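A sketch of two alternatives, assuming a reasonably recent pandas: keep every column except a given position using a boolean mask in .iloc, or keep only the first occurrence of each duplicated name with Index.duplicated(). Note that df.drop(df.columns[2], axis=1) would not help, because drop works by label and would still remove both 'b' columns. The variable pos_to_drop is illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[1, 2, 3], [4, 5, 6], [7, 8, 9]], columns=['a', 'b', 'b'])

# Option 1: drop by position. Build a boolean mask that is False only at the
# position to remove (2 = the second 'b'), then select columns with .iloc.
pos_to_drop = 2  # illustrative position
keep = np.arange(df.shape[1]) != pos_to_drop
by_position = df.iloc[:, keep]

# Option 2: keep only the first occurrence of each duplicated column name.
first_only = df.loc[:, ~df.columns.duplicated()]

print(by_position)
print(first_only)
```

Option 1 works for any position; option 2 is only equivalent here because the column to remove happens to be the later duplicate.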

Related

Pandas values transfer from one df to another OverflowError

Is there a pandas way to copy values into the column 'column_to_fill' from another DataFrame without iterating? The row and column labels I need are stored in columns of df1. I need to fill df1['column_to_fill'] with values from df2.
import numpy as np
import pandas as pd
df1 = pd.DataFrame(columns=['row_df2', 'column_df2'])
df1['row_df2'] = [1, 3, 5]
df1['column_df2'] = ['a', 'c', 'd']
index=np.arange(6)
columns=['a', 'b', 'c', 'd']
df2 = pd.DataFrame(data=np.random.randint(10, size=(len(index), len(columns))), index=index, columns=columns)
df1['column_to_fill'] = 0
for idx in df1.index:
    df1.loc[idx, 'column_to_fill'] = df2.loc[df1.loc[idx, 'row_df2'],
                                             df1.loc[idx, 'column_df2']].sum()
df1
row_df2 column_df2
0 1 a
1 3 c
2 5 d
df2
a b c d
0 2 3 5 2
1 8 3 9 3
2 4 6 0 1
3 3 8 0 8
4 3 4 5 0
5 2 5 4 0
df1
row_df2 column_df2 column_to_fill
0 1 a 8
1 3 c 0
2 5 d 0
I think you want to pick values from df_2 based on the row and column combinations stored in df_1 and assign them to a df_1 column. If that is the case, check the code below:
df_1 = pd.DataFrame({'values_type_rows_df2':[0,1,0,1], 1:[4,5,6,7]})
df_2 = pd.DataFrame({0:['a','b','c','d'], 1:['e','a','b','c']})
df_1['column_to_fill'] = [df_2.loc[i,i] for i in df_1['values_type_rows_df2']]
Based on your edit to the question, here is the modified code:
df1['column_to_fill'] = [df2.loc[j["row_df2"], j["column_df2"]] for i,j in df1.loc[:,["row_df2", "column_df2"]].iterrows()]
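For a fully loop-free version, one option (not taken from the answers above) is to convert the row and column labels to integer positions with Index.get_indexer and then gather every cell in a single NumPy fancy-indexing step. This is the usual replacement for DataFrame.lookup, which was deprecated in pandas 1.2 and removed in 2.0. The sketch hard-codes the df2 values printed in the question instead of using random data.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({'row_df2': [1, 3, 5], 'column_df2': ['a', 'c', 'd']})

# Same values as the df2 printed in the question (rows 0..5, columns a..d).
df2 = pd.DataFrame(
    [[2, 3, 5, 2], [8, 3, 9, 3], [4, 6, 0, 1],
     [3, 8, 0, 8], [3, 4, 5, 0], [2, 5, 4, 0]],
    index=np.arange(6), columns=['a', 'b', 'c', 'd'])

# Translate labels to integer positions; get_indexer returns -1 for labels
# it cannot find, so check before indexing.
row_pos = df2.index.get_indexer(df1['row_df2'])
col_pos = df2.columns.get_indexer(df1['column_df2'])
assert (row_pos >= 0).all() and (col_pos >= 0).all(), "unknown label"

# Gather every requested cell in one vectorized step.
df1['column_to_fill'] = df2.to_numpy()[row_pos, col_pos]
print(df1)
```

The check matters because a -1 position would otherwise silently select the last row or column.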

Assign each unique value of a column to a whole DataFrame, as if the DataFrame duplicated itself for each value of another column

I am trying to iterate over the values of a column in df2 and assign each value to df1, as if df1 multiplied itself once for each value in df2's column.
Let's say I have df1 as below:
df1
1
2
3
and df2 as below:
df2
A
B
C
I want a third DataFrame, df3, to look like this:
df3
1 A
2 A
3 A
1 B
2 B
3 B
1 C
2 C
3 C
So far I have tried the code below:
for i, value in ACS_shock['scenario'].iteritems():
    df1['sec'] = df1[i] = value[:]
But when I generate the file from df1, my output looks like this:
1 A B C
2 A B C
3 A B C
Any idea how I can correct this code?
Much appreciated.
You can use pd.concat and np.repeat:
>>> import pandas as pd
>>> import numpy as np
>>> df1 = pd.Series([1,2,3])
>>> df1
0 1
1 2
2 3
dtype: int64
>>> df2 = pd.Series(list('ABC'))
>>> df2
0 A
1 B
2 C
dtype: object
>>> df3 = pd.DataFrame({'df1': pd.concat([df1]*3).reset_index(drop=True),
...                     'df2': np.repeat(df2, 3).reset_index(drop=True)})
>>> df3
df1 df2
0 1 A
1 2 A
2 3 A
3 1 B
4 2 B
5 3 B
6 1 C
7 2 C
8 3 C
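Another way to build df3 is a cross join, which pairs every row of one frame with every row of the other. This sketch assumes pandas 1.2 or later (where merge accepts how='cross'); the column names num and letter are placeholders, not from the question.

```python
import pandas as pd

df1 = pd.DataFrame({'num': [1, 2, 3]})
df2 = pd.DataFrame({'letter': list('ABC')})

# how='cross' returns the Cartesian product, preserving the order of the left
# frame; putting df2 on the left makes the letters the outer loop, as in df3.
df3 = df2.merge(df1, how='cross')[['num', 'letter']]
print(df3)
```

The final column selection only reorders the columns so the numbers come first, as in the desired output.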

Sorting a dataframe by a column

Hi, I need to sort a DataFrame. It looks like this:
A B
2 5
3 9
2 7
I want to sort this by column A.
A B
2 5
2 7
3 9
When column A has duplicates,
sorted_data=data.sort_values(by=['A'], inplace=True)
doesn't work. Any suggestion how I can fix this?
It has worked correctly. The problem is that with inplace=True the sorting is done on your original DataFrame (data, in your case) and sort_values returns None, so sorted_data ends up as None.
If you want the sorted DataFrame stored in sorted_data, do the following:
sorted_data=data.sort_values(by=['A'])
For example:
>>> df = pd.DataFrame({'A': [2,3,2], 'B': [5,9,7]})
>>> df.sort_values(by=['A'],inplace=True)
>>> df
A B
0 2 5
2 2 7
1 3 9
The other way:
>>> df = pd.DataFrame({'A': [2,3,2], 'B': [5,9,7]})
>>> sorted_df = df.sort_values(by=['A'])
>>> sorted_df
A B
0 2 5
2 2 7
1 3 9
>>> df
A B
0 2 5
1 3 9
2 2 7
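A small sketch of both points: with inplace=True the method returns None, so nothing useful is stored in the assigned variable, while without it a new DataFrame is returned. Passing kind='stable' is an optional addition, not part of the original answer; it keeps rows that tie on A in their original order, which the default quicksort does not guarantee.

```python
import pandas as pd

data = pd.DataFrame({'A': [2, 3, 2], 'B': [5, 9, 7]})

# inplace=True sorts the frame it is called on and returns None.
work = data.copy()
returned = work.sort_values(by=['A'], inplace=True)
print(returned)  # None

# Without inplace, a new sorted DataFrame is returned and `data` is unchanged.
# kind='stable' keeps rows that tie on 'A' in their original relative order.
sorted_data = data.sort_values(by=['A'], kind='stable')
print(sorted_data)
```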

Convert dataframe column values to new columns

I have a dataframe containing some data, which I want to transform, so that the values of one column define the new columns.
>>> import pandas as pd
>>> df = pd.DataFrame([['a','a','b','b'],[6,7,8,9]], index=['A','B']).T
>>> df
A B
0 a 6
1 a 7
2 b 8
3 b 9
The values of the column A shall be the column names of the new dataframe. The result of the transformation should look like this:
a b
0 6 8
1 7 9
What I came up with so far didn't work completely:
>>> pd.DataFrame({ k : df.loc[df['A'] == k, 'B'] for k in df['A'].unique() })
a b
0 6 NaN
1 7 NaN
2 NaN 8
3 NaN 9
Besides this being incorrect, I guess there is probably a more efficient way anyway. I'm just having a really hard time understanding how to handle things in pandas.
You were almost there, but you need .values so that each new column is built from a plain array (the original row index is what caused the NaN misalignment), and then you provide the column names.
pd.DataFrame({k: df.loc[df['A'] == k, 'B'].values for k in df['A'].unique()}, columns=df['A'].unique())
Output:
a b
0 6 8
1 7 9
Using a dictionary comprehension with groupby:
res = pd.DataFrame({col: vals.iloc[:, 1].values for col, vals in df.groupby(df.columns[0])})
print(res)
a b
0 6 8
1 7 9
Use set_index, groupby, cumcount, and unstack:
(df.set_index(['A', df.groupby('A').cumcount()])['B']
.unstack(0)
.rename_axis([None], axis=1))
Output:
a b
0 6 8
1 7 9
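The same cumcount idea can also be written with pivot, which some find easier to read. This sketch assumes columns named A and B as in the question; the helper column name row is arbitrary.

```python
import pandas as pd

df = pd.DataFrame({'A': list('aabb'), 'B': [6, 7, 8, 9]})

# Number the rows within each group of A (0, 1, ...), then use that counter
# as the new index and spread the values of A into columns.
res = (df.assign(row=df.groupby('A').cumcount())
         .pivot(index='row', columns='A', values='B'))

# Drop the leftover axis names ('row' and 'A') for a cleaner display.
res.index.name = None
res.columns.name = None
print(res)
```

If the groups have different sizes, the shorter columns are padded with NaN rather than raising an error.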

rename index of a pandas dataframe

I have a pandas dataframe whose indices look like:
df.index
['a_1', 'b_2', 'c_3', ... ]
I want to rename these indices to:
['a', 'b', 'c', ... ]
How do I do this without specifying a dictionary with explicit keys for each index value?
I tried:
df.rename( index = lambda x: x.split( '_' )[0] )
but this throws an error:
AssertionError: New axis must be unique to rename
Perhaps you could get the best of both worlds by using a MultiIndex:
import numpy as np
import pandas as pd
df = pd.DataFrame(np.arange(8).reshape(4,2), index=['a_1', 'b_2', 'c_3', 'c_4'])
print(df)
# 0 1
# a_1 0 1
# b_2 2 3
# c_3 4 5
# c_4 6 7
index = pd.MultiIndex.from_tuples([item.split('_') for item in df.index])
df.index = index
print(df)
# 0 1
# a 1 0 1
# b 2 2 3
# c 3 4 5
# 4 6 7
This way, you can access things according to first level of the index:
In [30]: df.loc['c']
Out[30]:
0 1
3 4 5
4 6 7
or according to both levels of the index:
In [31]: df.loc[('c','3')]
Out[31]:
0 4
1 5
Name: (c, 3)
Moreover, DataFrame methods generally work with MultiIndexed DataFrames, so you lose little.
However, if you really want to drop the second level of the index, you could do this:
df.reset_index(level=1, drop=True, inplace=True)
print(df)
# 0 1
# a 0 1
# b 2 3
# c 4 5
# c 6 7
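On current pandas, .ix no longer exists (it was removed in 1.0), so these selections are written with .loc, and DataFrame.droplevel is a shorter way to drop the second level. A minimal sketch reusing the example frame:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(8).reshape(4, 2), index=['a_1', 'b_2', 'c_3', 'c_4'])
df.index = pd.MultiIndex.from_tuples([tuple(label.split('_')) for label in df.index])

# Select by the first level: returns the rows under 'c', indexed by level 2.
c_rows = df.loc['c']

# Select by both levels: returns a single row as a Series.
c3_row = df.loc[('c', '3')]

# Drop the second level; the resulting flat index contains 'c' twice.
flat = df.droplevel(1)
print(c_rows)
print(c3_row)
print(flat)
```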
That's the error you'd get if your function produced duplicate index values:
>>> df = pd.DataFrame(np.random.random((4,3)),index="a_1 b_2 c_3 c_4".split())
>>> df
0 1 2
a_1 0.854839 0.830317 0.046283
b_2 0.433805 0.629118 0.702179
c_3 0.390390 0.374232 0.040998
c_4 0.667013 0.368870 0.637276
>>> df.rename(index=lambda x: x.split("_")[0])
[...]
AssertionError: New axis must be unique to rename
If you really want that, I'd use a list comprehension:
>>> df.index = [x.split("_")[0] for x in df.index]
>>> df
0 1 2
a 0.854839 0.830317 0.046283
b 0.433805 0.629118 0.702179
c 0.390390 0.374232 0.040998
c 0.667013 0.368870 0.637276
but I'd think about whether that's really the right direction.
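If you do go that way, a vectorized alternative to the list comprehension is Index.str.replace with a regular expression. If the duplicated labels should really be merged, groupby accepts a function that is applied to each index label. A sketch, where summing is only an example aggregation:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(8).reshape(4, 2), index=['a_1', 'b_2', 'c_3', 'c_4'])

# Vectorized rename: strip the underscore and everything after it.
# Assigning to .index directly accepts the duplicate 'c' labels.
renamed = df.copy()
renamed.index = renamed.index.str.replace(r'_.*$', '', regex=True)

# To merge duplicates instead, group by a function of each index label.
combined = df.groupby(lambda label: label.split('_')[0]).sum()
print(renamed)
print(combined)
```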
