Resetting index after removing rows from Pandas data frame - python

I am trying to split an option chain into a separate data frame containing just the rows that have calls ('C') in the column Right.
options_df
Index  Right
    0      P
    1      P
    2      P
    3      C
    4      C
    5      C
I try to make a new data frame, df, to hold the calls ('C'):
df = options_df
df.drop(df[df["Right"] == 'P'].index)
This returns the data frame, df, but unfortunately it keeps the indexing from the original data frame, options_df:
df
Index  Right
    3      C
    4      C
    5      C
Ideally, the data frame for df would look like this:
Index  Right
    0      C
    1      C
    2      C
But, it does not.
I've tried to correct with resetting the index, as below:
df.reset_index(drop=True)
But it also does not work and gives me back the entire original data frame, options_df:
df
Index  Right
    0      P
    1      P
    2      P
    3      C
    4      C
    5      C
I'm sure there is a simple solution, but I just cannot figure this one out. Thank you for your help!

You don't need to use .drop(); just select the rows that match the condition you want and then reset the index with reset_index(drop=True), as follows:
df = df[df["Right"] == 'C'].reset_index(drop=True)
print(df)
  Right
0     C
1     C
2     C

When you reset your index, you need to add inplace=True:
df.reset_index(drop=True, inplace=True)
Or, assign the result back to df with the line as you've written it:
df = df.reset_index(drop=True)
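
For completeness, here is a minimal self-contained sketch of the select-and-reset approach, assuming the same Right column and sample values as in the question (the name calls_df is just illustrative):
import pandas as pd

# Rebuild the example option chain from the question
options_df = pd.DataFrame({"Right": ["P", "P", "P", "C", "C", "C"]})

# Keep only the calls and renumber the index from 0
calls_df = options_df[options_df["Right"] == "C"].reset_index(drop=True)
print(calls_df)
#   Right
# 0     C
# 1     C
# 2     C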

Related

Panda drop values in columns but keep columns

As the title says, I would like to find a way to drop (erase) the values in a data frame from a given column to the end of the data frame, but I can't find a way to do so.
I would like to start with
A B C
-----------
1 1 1
1 1 1
1 1 1
and get
A B C
-----------
1
1
1
I was trying with
df.drop(df.loc[:, 'B':].columns, axis = 1, inplace = True)
But this deletes the columns themselves too:
A
-
1
1
1
Am I missing something?
If you only know the column name that you want to keep:
import pandas as pd
new_df = pd.DataFrame(df["A"])
If you only know the column names that you want to drop:
new_df = df.drop(["B", "C"], axis=1)
For your case, to keep the columns, but remove the content, one possible way is:
new_df = pd.DataFrame(df["A"], columns=df.columns)
The resulting new_df keeps all the columns; "A" retains its values while the others are empty (NaN instead).
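
If the goal is simply to keep every column header but blank out the values from 'B' onward, another possible sketch (working on a copy so the original frame stays intact) is:
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [1, 1, 1], "B": [1, 1, 1], "C": [1, 1, 1]})

# Copy the frame, then overwrite every column from 'B' to the end with NaN
new_df = df.copy()
new_df.loc[:, "B":] = np.nan
print(new_df)
#    A   B   C
# 0  1 NaN NaN
# 1  1 NaN NaN
# 2  1 NaN NaN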

Add modified copy of a row into Data frame

Let's say we have the data frame below:
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randint(0, 5, size=(5, 4)), columns=list('ABCD'))
df
A B C D
0 3 3 0 0
1 0 3 3 2
2 1 0 0 0
3 2 4 4 0
4 3 2 2 4
I want to append a new row built from the existing data and modify several columns:
newrow = df.loc[0].copy()
newrow.A = 99
newrow.B = 90
df.append(newrow)
By doing this I get a warning when trying to modify the row:
<string>:23: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
What would be the clean way of achieving what I intend to do? I won't have an index to use with loc because the row is not inside the df yet.
If I later want to come back to this row, how could I retrieve its index at the moment of appending?
newrow = df.loc[0].copy()
df.append(newrow)
df.loc[which index to use, "A"] = 99
In other words, let's say I would want to add the row first then modify it later, how could I get the added row's index
As far as I can see, you modify every value of the current df row, so it might be unnecessary to copy the current row and trigger the warning.
Just create a dict with your values and append it to the df:
newrow = {'A':99,'B':90,'C':92, 'D':93}
df = df.append(newrow, ignore_index=True)
Use ignore_index=True and the newrow will just be at the last index in your df.
Use df.iloc[-1] to find the appended line if you didn't use the ignore_index=True tip.
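
Note that DataFrame.append was deprecated in pandas 1.4 and removed in pandas 2.0; a sketch of the same idea with pd.concat (using the same illustrative values) would be:
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randint(0, 5, size=(5, 4)), columns=list('ABCD'))

# Build the new row as a one-row DataFrame, then concatenate it
newrow = pd.DataFrame([{'A': 99, 'B': 90, 'C': 92, 'D': 93}])
df = pd.concat([df, newrow], ignore_index=True)

# With ignore_index=True the appended row is simply the last one
print(df.iloc[-1])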

Drop a column which is a subset of any other column in a dataframe

I have a pandas dataframe as below. How can I drop any column which is a subset of any of the remaining columns? I would like to do this without using fillna.
import numpy as np
import pandas as pd

df = pd.DataFrame([[1, 1, 3, 3], [np.nan, 2, np.nan, 4]], columns=['A', 'B', 'C', 'D'])
df
A B C D
0 1.0 1 3.0 3
1 NaN 2 NaN 4
I can identify here that column A is a subset of B and column C is a subset of D with something like this:
if all(df['A'][df['A'].notnull()].isin(df['B']))
I could run a loop over all columns and drop the subset columns. But is there a more efficient way to accomplish this, so that I have the following result:
df
B D
0 1 3
1 2 4
Thanks.
It still requires iteration, but you can use this list comprehension (with an if statement similar to the one you provided) to get columns to keep:
keep_cols = [x for x in df if not any(df.drop(x, axis=1).apply(lambda y: df[x].dropna().isin(y).all()))]
# ['B', 'D']
And then use the result with filter:
df.filter(items=keep_cols)
# B D
# 0 1 3
# 1 2 4
This should be fast enough, since it still uses apply at its core, and seems to be safer/more efficient than dropping columns within a loop.
If you're keen on a one-line solution, of course assigning the list to a variable is an optional step:
df.filter(items=[x for x in df if not any(df.drop(x, axis=1).apply(lambda y: df[x].dropna().isin(y).all()))])
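
As a quick check, a self-contained run of the comprehension on the example frame from the question (same data, just with the imports spelled out):
import numpy as np
import pandas as pd

df = pd.DataFrame([[1, 1, 3, 3], [np.nan, 2, np.nan, 4]], columns=['A', 'B', 'C', 'D'])

# Keep a column only if its non-null values are not fully contained in any other column
keep_cols = [x for x in df
             if not any(df.drop(x, axis=1).apply(lambda y: df[x].dropna().isin(y).all()))]
print(keep_cols)                  # ['B', 'D']
print(df.filter(items=keep_cols))
#    B  D
# 0  1  3
# 1  2  4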

Pandas automatically converts row to column

I have a very simple dataframe like so:
In [8]: df
Out[8]:
A B C
0 2 a a
1 3 s 3
2 4 c !
3 1 f 1
My goal is to extract the first row in such a way that looks like this:
A B C
0 2 a a
As you can see the dataframe shape (1x3) is preserved and the first row still has 3 columns.
However when I type the following command df.loc[0] the output result is this:
df.loc[0]
Out[9]:
A 2
B a
C a
Name: 0, dtype: object
As you can see, the row has turned into a column with 3 rows (3x1 instead of 1x3)! How is this possible? How can I simply extract the row and preserve its shape as described in my goal? Could you provide a smart and elegant way to do it?
I tried to use the transpose command .T but without success. I know I could create another dataframe whose columns are extracted from the original dataframe, but that is quite tedious and not elegant, I would say (pd.DataFrame({'A':[2], 'B':'a', 'C':'a'})).
Here is the dataframe if you need it:
import pandas as pd
df = pd.DataFrame({'A':[2,3,4,1], 'B':['a','s','c','f'], 'C':['a', 3, '!', 1]})
You need to add [] to get a DataFrame:
#select by index value
print (df.loc[[0]])
A B C
0 2 a a
Or:
print (df.iloc[[0]])
A B C
0 2 a a
If you need to transpose the Series, first convert it to a DataFrame with to_frame:
print (df.loc[0].to_frame())
0
A 2
B a
C a
print (df.loc[0].to_frame().T)
A B C
0 2 a a
Using a range selector will preserve the DataFrame format.
df.iloc[0:1]
Out[221]:
A B C
0 2 a a
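
The difference boils down to what the indexer returns: a scalar label gives a Series, while a list of labels or a slice gives a one-row DataFrame. A small sketch with the question's own frame:
import pandas as pd

df = pd.DataFrame({'A': [2, 3, 4, 1], 'B': ['a', 's', 'c', 'f'], 'C': ['a', 3, '!', 1]})

print(type(df.loc[0]), df.loc[0].shape)        # Series, shape (3,)
print(type(df.loc[[0]]), df.loc[[0]].shape)    # DataFrame, shape (1, 3)
print(type(df.iloc[0:1]), df.iloc[0:1].shape)  # DataFrame, shape (1, 3)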

set multiple Pandas DataFrame columns to values in a single column or multiple scalar values at the same time

I'm trying to set multiple new columns to one column and, separately, multiple new columns to multiple scalar values. Can't do either. Any way to do it other than setting each one individually?
df=pd.DataFrame(columns=['A','B'],data=np.arange(6).reshape(3,2))
df.loc[:,['C','D']]=df['A']
df.loc[:,['C','D']]=[0,1]
for c in ['C', 'D']:
    df[c] = df['A']
df['C'] = 0
df['D'] = 1
Maybe this is what you are looking for.
df=pd.DataFrame(columns=['A','B'],data=np.arange(6).reshape(3,2))
df['C'], df['D'] = df['A'], df['A']
df['E'], df['F'] = 0, 1
# Result
A B C D E F
0 0 1 0 0 0 1
1 2 3 2 2 0 1
2 4 5 4 4 0 1
The assign method will create multiple new columns in one step. You can pass a dict() with the columns and values to return a new DataFrame with the new columns appended to the end.
Using your examples:
df = df.assign(**{'C': df['A'], 'D': df['A']})
and
df = df.assign(**{'C': 0, 'D':1})
See this answer for additional detail: https://stackoverflow.com/a/46587717/4843561
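
Putting both cases into one assign call, a minimal sketch (column names C through F chosen to match the examples above):
import numpy as np
import pandas as pd

df = pd.DataFrame(columns=['A', 'B'], data=np.arange(6).reshape(3, 2))

# Create several new columns in one step; assign returns a new DataFrame
df = df.assign(**{'C': df['A'], 'D': df['A'], 'E': 0, 'F': 1})
print(df)
#    A  B  C  D  E  F
# 0  0  1  0  0  0  1
# 1  2  3  2  2  0  1
# 2  4  5  4  4  0  1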
