I came across the line of code below, which raises an error when '.index' is not present in it.
print(df.drop(df[df['Quantity'] == 0].index).rename(columns={'Weight': 'Weight (oz.)'}))
What is the purpose of '.index' while using drop in pandas?
As explained in the documentation, you can use drop with index:
A B C D
0 0 1 2 3
1 4 5 6 7
2 8 9 10 11
df.drop([0, 1]) # Here 0 and 1 are the index labels of the rows
Output:
A B C D
2 8 9 10 11
In this case it will drop the first 2 rows.
With .index in your example, you find the rows where Quantity=0 and retrieve their index labels (and then use them as in the documentation example).
Here are the details about the .drop() method:
https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.drop.html
The .drop() method needs a 'labels' parameter, which is a list of index labels (when axis=0, the default case) or column labels (when axis=1).
df[df['Quantity'] == 0] returns a DataFrame of the rows where Quantity=0, but what we need is the index labels of those rows, so .index is needed.
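Putting the pieces together, here is a minimal runnable sketch; the frame is hypothetical (the question's actual data isn't shown), but the drop/rename chain is the one from the question:

```python
import pandas as pd

# Hypothetical frame resembling the one in the question
df = pd.DataFrame({
    'Item': ['apple', 'banana', 'cherry'],
    'Quantity': [3, 0, 5],
    'Weight': [6.0, 4.2, 0.2],
})

# df[df['Quantity'] == 0] selects the matching rows;
# .index extracts their labels, which is what drop() expects
labels = df[df['Quantity'] == 0].index
result = df.drop(labels).rename(columns={'Weight': 'Weight (oz.)'})
print(result)
```

Without `.index`, drop() would receive a whole DataFrame instead of a list of labels, which is why the original line fails.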
I have the below df, built from a pivot of a larger df. In this table 'week' is the index (dtype = object) and I need to show week 53 as the first row instead of the last.
Can someone advise, please? I tried reindex and custom sorting but can't find a way.
Thanks!
here is the table
Since you can't directly insert a row and push the others back, a trick you can use is to create a new sort order:
# adds a new column, "new" with the original order
df['new'] = range(1, len(df) + 1)
# sets value that has index 53 with 0 on the new column
# note that this comparison requires you to match index type
# so if weeks are object, you should compare df.index == '53'
df.loc[df.index == 53, 'new'] = 0
# sorts values by the new column and drops it
df = df.sort_values("new").drop('new', axis=1)
Before:
numbers
weeks
1 181519.23
2 18507.58
3 11342.63
4 6064.06
53 4597.90
After:
numbers
weeks
53 4597.90
1 181519.23
2 18507.58
3 11342.63
4 6064.06
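The whole trick above can be reproduced end to end; the weekly totals here are the ones from the tables, but the integer index is an assumption (with an object index you would compare against the string '53', as noted in the comments):

```python
import pandas as pd

# Hypothetical weekly totals; the index is int here, so compare with 53.
# If your index dtype is object, compare with the string '53' instead.
df = pd.DataFrame({'numbers': [181519.23, 18507.58, 11342.63, 6064.06, 4597.90]},
                  index=[1, 2, 3, 4, 53])
df.index.name = 'weeks'

df['new'] = range(1, len(df) + 1)    # original order
df.loc[df.index == 53, 'new'] = 0    # force week 53 to sort first
df = df.sort_values('new').drop('new', axis=1)
print(df.index.tolist())  # [53, 1, 2, 3, 4]
```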
One way of doing this would be:
import pandas as pd
df = pd.DataFrame(range(10))
new_df = df.loc[[df.index[-1]]+list(df.index[:-1])].reset_index(drop=True)
output:
0
0 9
1 0
2 1
3 2
4 3
5 4
6 5
7 6
8 7
9 8
Alternate method:
new_df = pd.concat([df[df["Year week"]==52], df[~(df["Year week"]==52)]])
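The concat approach above can be sketched as follows; the "Year week" column name and values are assumptions borrowed from the snippet, since the original frame isn't shown:

```python
import pandas as pd

# Hypothetical frame with a "Year week" column, as assumed in the snippet above
df = pd.DataFrame({'Year week': [50, 51, 52], 'numbers': [1.0, 2.0, 3.0]})

# Put the week-52 rows first, then everything else, keeping relative order
new_df = pd.concat([df[df['Year week'] == 52], df[~(df['Year week'] == 52)]])
print(new_df['Year week'].tolist())  # [52, 50, 51]
```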
The code works, but I get the warning below when trying to set a default value of 1 for an entire new column in a pandas DataFrame. What does this warning mean, and how can I rework the code so I don't get it?
df['new']=1
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
this should solve the problem:
soldactive = df[(df.DispositionStatus == 'Sold') & (df.AssetStatus == 'Active')].copy()
your code:
removesold = df[(df.ExitDate.isin(errorval)) & (df.DispositionStatus == 'Sold') & (df.AssetStatus == 'Resolved')]
df = df.drop(removesold.index)
soldactive = df[(df.DispositionStatus == 'Sold') & (df.AssetStatus == 'Active')]
soldactive['FlagError'] = 1
You've created the soldactive DataFrame as a slice of df.
After that you're trying to create a new column on that slice. Pandas cannot tell whether the slice is a view of df's data or an independent copy, so it gives you the warning A value is trying to be set on a copy of a slice from a DataFrame (see the excerpt from the docs below).
Docs:
All pandas data structures are value-mutable (the values they contain
can be altered) but not always size-mutable. The length of a Series
cannot be changed, but, for example, columns can be inserted into a
DataFrame. However, the vast majority of methods produce new objects
and leave the input data untouched. In general, though, we like to
favor immutability where sensible.
Here is a test case:
In [375]: df
Out[375]:
a b c
0 9 6 4
1 5 2 8
2 8 1 6
3 3 4 1
4 8 0 2
In [376]: a = df[1:3]
In [377]: a['new'] = 1
C:\envs\py35\Scripts\ipython:1: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
In [378]: del a
In [379]: a = df[1:3].copy()
In [380]: a['new'] = 1
In [381]: a
Out[381]:
a b c new
1 5 2 8 1
2 8 1 6 1
In [382]: df
Out[382]:
a b c
0 9 6 4
1 5 2 8
2 8 1 6
3 3 4 1
4 8 0 2
Solution
df.loc[:, 'new'] = 1
Slicing with [] may return either a view or a copy; use .loc and .iloc to access the DataFrame directly, or take an explicit .copy() when you want an independent frame.
What's more, if the 'new' column hadn't already existed, the assignment might have worked without complaint; the warning seems to fire because the column already existed and you were editing it on a view or copy... I think.
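Both fixes discussed above can be shown in one minimal sketch; the frame and column names here are made up for illustration:

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]})

# Fix 1: take an explicit copy before adding the column,
# so pandas knows the new frame owns its data
sub = df[df['a'] > 1].copy()
sub['FlagError'] = 1  # no SettingWithCopyWarning

# Fix 2: assign through .loc on the original frame
df.loc[:, 'new'] = 1
print(df)
```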
Is there a way to get the first n rows of a dataframe without using the indices? For example, I know that if I have a dataframe called df I could get the first 5 rows via df.ix[:5]. But what if my indices are not ordered and I don't want to order them? That does not seem to work. Hence, I was wondering if there is another way to select the first couple of rows. I apologize if there is already an answer to this; I wasn't able to find one.
Use head(5) or iloc[:5]
In [7]:
df = pd.DataFrame(np.random.randn(10,3))
df
Out[7]:
0 1 2
0 -1.230919 1.482451 0.221723
1 -0.302693 -1.650244 0.957594
2 -0.656565 0.548343 1.383227
3 0.348090 -0.721904 -1.396192
4 0.849480 -0.431355 0.501644
5 0.030110 0.951908 -0.788161
6 2.104805 -0.302218 -0.660225
7 -0.657953 0.423303 1.408165
8 -1.940009 0.476254 -0.014590
9 -0.753064 -1.083119 -0.901708
In [8]:
df.head(5)
Out[8]:
0 1 2
0 -1.230919 1.482451 0.221723
1 -0.302693 -1.650244 0.957594
2 -0.656565 0.548343 1.383227
3 0.348090 -0.721904 -1.396192
4 0.849480 -0.431355 0.501644
In [11]:
df.iloc[:5]
Out[11]:
0 1 2
0 -1.230919 1.482451 0.221723
1 -0.302693 -1.650244 0.957594
2 -0.656565 0.548343 1.383227
3 0.348090 -0.721904 -1.396192
4 0.849480 -0.431355 0.501644
I am attempting to use filter on a pandas dataframe to filter out all rows that match a duplicated value (I need to remove ALL the rows when there are duplicates, not just the first or last).
This is what I have, and it works in the editor:
df = df.groupby("student_id").filter(lambda x: x.count() == 1)
But when I run my script with this code in it I get the error:
TypeError: filter function returned a Series, but expected a scalar bool
I am creating the dataframe by concatenating two other frames immediately before trying to apply the filter.
it should be:
In [32]: grouped = df.groupby("student_id")
In [33]: grouped.filter(lambda x: x["student_id"].count()==1)
Updates:
I'm not sure about the issue you mentioned regarding the interactive console. Technically speaking, in this particular case the console (such as IPython) should behave the same as any other environment (the plain Python interpreter, or one embedded in an IDE); there are other situations, such as the intricate import machinery, where different environments may behave differently.
An intuitive way to understand pandas groupby is to treat the object returned by DataFrame.groupby() as a list of DataFrames. So when you use filter to apply the lambda function to x, x is actually one of those DataFrames:
In[25]: df = pd.DataFrame(data,columns=year)
In[26]: df
Out[26]:
2013 2014
0 0 1
1 2 3
2 4 5
3 6 7
4 0 1
5 2 3
6 4 5
7 6 7
In[27]: grouped = df.groupby(2013)
In[28]: grouped.count()
Out[28]:
2014
2013
0 2
2 2
4 2
6 2
In this example, the first DataFrame in the grouped object would be:
In[33]: df1 = df.ix[[0,4]]
In[34]: df1
Out[33]:
2013 2014
0 0 1
4 0 1
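Applying the corrected filter from the answer above to a small made-up roster (the student_id values are assumptions, since the asker's data isn't shown):

```python
import pandas as pd

# Hypothetical roster: ids 1 and 3 are duplicated, 2 is unique
df = pd.DataFrame({'student_id': [1, 1, 2, 3, 3],
                   'score': [90, 85, 70, 60, 65]})

grouped = df.groupby('student_id')
# The lambda now returns a scalar bool per group, as filter() requires
unique_only = grouped.filter(lambda x: x['student_id'].count() == 1)
print(unique_only)  # only the student_id 2 row survives
```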
How about using the pd.DataFrame.drop_duplicates() method? With keep=False it removes every row that has a duplicate, not just the first or last occurrence.
Documentation.
Are you sure you really want to remove ALL the rows, and not n-1?
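A short sketch of the drop_duplicates alternative suggested above, using the same kind of made-up roster (column names are assumptions):

```python
import pandas as pd

df = pd.DataFrame({'student_id': [1, 1, 2, 3, 3],
                   'score': [90, 85, 70, 60, 65]})

# keep=False drops every row that has a duplicate in 'student_id',
# not just the extra occurrences
result = df.drop_duplicates(subset='student_id', keep=False)
print(result)  # only the student_id 2 row remains
```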
I set up a simple DataFrame in pandas:
a = pandas.DataFrame([[1,2,3], [4,5,6], [7,8,9]], columns=['a','b','c'])
>>> print a
a b c
0 1 2 3
1 4 5 6
2 7 8 9
I would like to be able to alter a single element in the last row of a. In pandas==0.13.1 I could use the following:
a.iloc[-1]['a'] = 77
>>> print a
a b c
0 1 2 3
1 4 5 6
2 77 8 9
but after updating to pandas==0.14.1, I get the following warning when doing this:
SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
The problem of course being that -1 is not an index of a, so I can't use loc. As the warning indicates, I have not changed column 'a' of the last row, I've only altered a discarded local copy.
How do I do this in the newer version of pandas? I realize I could use the index of the last row like:
a.loc[2,'a'] = 77
But I'll be working with tables where multiple rows have the same index, and I don't want to reindex my table every time. Is there a way to do this without knowing the index of the last row beforehand?
Taking elements from the solutions of #PallavBakshi and #Mike, the following works in Pandas >= 0.19:
a.loc[a.index[-1], 'a'] = 4.0
Just using iloc[-1, 'a'] won't work as 'a' is not a location.
Alright I've found a way to solve this problem without chaining, and without worrying about multiple indices.
a.iloc[-1, a.columns.get_loc('a')] = 77
>>> a
a b c
0 1 2 3
1 4 5 6
2 77 8 9
I wasn't able to use iloc before because I couldn't supply the column index as an int, but get_loc solves that problem. Thanks for the helpful comments everyone!
For pandas 0.22,
a.at[a.index[-1], 'a'] = 77
This is just one of several ways.
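For comparison, the three warning-free forms from the answers above can be run side by side on the question's frame (with a unique index; with duplicate labels the label-based forms would hit every matching row):

```python
import pandas as pd

a = pd.DataFrame([[1, 2, 3], [4, 5, 6], [7, 8, 9]], columns=['a', 'b', 'c'])

# Three equivalent ways to set the last row of column 'a' without chained indexing
a.iloc[-1, a.columns.get_loc('a')] = 77   # positional row, positional column
a.loc[a.index[-1], 'a'] = 77              # label of the last row
a.at[a.index[-1], 'a'] = 77               # fast scalar access by label
print(a)
```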