Change an element in the last row of a DataFrame - python

I set up a simple DataFrame in pandas:
a = pandas.DataFrame([[1,2,3], [4,5,6], [7,8,9]], columns=['a','b','c'])
>>> print a
   a  b  c
0  1  2  3
1  4  5  6
2  7  8  9
I would like to be able to alter a single element in the last row of a. In pandas==0.13.1 I could use the following:
a.iloc[-1]['a'] = 77
>>> print a
    a  b  c
0   1  2  3
1   4  5  6
2  77  8  9
but after updating to pandas==0.14.1, I get the following warning when doing this:
SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_index,col_indexer] = value instead
The problem, of course, is that -1 is not an index of a, so I can't use loc. And as the warning indicates, I haven't actually changed column 'a' of the last row; I've only altered a discarded local copy.
How do I do this in the newer version of pandas? I realize I could use the index of the last row like:
a.loc[2,'a'] = 77
But I'll be working with tables where multiple rows have the same index, and I don't want to reindex my table every time. Is there a way to do this without knowing the index of the last row beforehand?

Taking elements from the solutions of @PallavBakshi and @Mike, the following works in Pandas >= 0.19:
a.loc[a.index[-1], 'a'] = 4.0
Just using iloc[-1, 'a'] won't work as 'a' is not a location.

Alright I've found a way to solve this problem without chaining, and without worrying about multiple indices.
a.iloc[-1, a.columns.get_loc('a')] = 77
>>> a
    a  b  c
0   1  2  3
1   4  5  6
2  77  8  9
I wasn't able to use iloc before because I couldn't supply the column index as an int, but get_loc solves that problem. Thanks for the helpful comments everyone!
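Since iloc (and get_loc) are purely positional, this approach also behaves well when several rows share an index label, which was the constraint in the question. A small sketch on a hypothetical frame b with a duplicated label, just to illustrate the point:

import pandas as pd

# hypothetical frame where two rows share the index label 0
b = pd.DataFrame([[1, 2, 3], [4, 5, 6], [7, 8, 9]],
                 columns=['a', 'b', 'c'], index=[0, 0, 1])

# iloc addresses positions, not labels, so only the physically last row changes
b.iloc[-1, b.columns.get_loc('a')] = 77
print(b)
#     a  b  c
# 0   1  2  3
# 0   4  5  6
# 1  77  8  9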

For pandas 0.22,
a.at[a.index[-1], 'a'] = 77
This is just one of the ways to do it.
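A quick check of that approach on the frame from the question; .at is label-based scalar access, so it is paired with the last index label here:

import pandas as pd

a = pd.DataFrame([[1, 2, 3], [4, 5, 6], [7, 8, 9]], columns=['a', 'b', 'c'])

# .at takes a single row label and a single column label
a.at[a.index[-1], 'a'] = 77
print(a)
#     a  b  c
# 0   1  2  3
# 1   4  5  6
# 2  77  8  9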

Related

Conditionally dropping columns in a pandas dataframe

I have this dataframe and my goal is to remove any columns that have fewer than 1000 entries.
Prior to pivoting the df I know I have 880 unique well_ids, with entries ranging from 4 to 60k+. I should end up with 102 well_ids.
I tried to accomplish this in a very naïve way: collecting the wells I want to remove in an array and deleting them in a loop. But I keep getting a 'TypeError: Level type mismatch', even though del works when I use it without a for loop:
# this works
del df[164301.0]
del df['TB-0071']
# this doesn't work
for id in unwanted_id:
    del df[id]
Any help is appreciated, Thanks.
You can use the dropna method:
df.dropna(axis=1, thresh=1000)  # thresh sets how many non-NA values a column needs in order to be kept
The advantage of this method is that you don't need to create a list.
Also don't forget to add the usual inplace = True if you want the changes to be made in place.
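A tiny illustration of how thresh behaves, on a made-up frame (the columns and the threshold of 2 are invented for the example; the question would use something like axis=1, thresh=1000):

import numpy as np
import pandas as pd

# made-up frame: column 'x' has 3 non-NA values, column 'y' has only 1
df = pd.DataFrame({'x': [1, 2, 3], 'y': [np.nan, np.nan, 5]})

# keep only the columns that have at least 2 non-NA values -> 'y' is dropped
print(df.dropna(axis=1, thresh=2))
#    x
# 0  1
# 1  2
# 2  3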
You can use the pandas drop method:
df.drop(columns=['colName'], inplace=True)
You can actually pass a list of column names:
unwanted_ids = [164301.0, 'TB-0071']
df.drop(columns=unwanted_ids, inplace=True)
Sample:
df[:5]
  from to  freq
0    A  X    20
1    B  Z     9
2    A  Y     2
3    A  Z     5
4    A  X     8
df.drop(columns=['from', 'to'])
   freq
0    20
1     9
2     2
3     5
4     8
And to get those column names with more than 1000 unique values, you can use something like this:
counts = df.nunique()[df.nunique()>1000].to_frame('uCounts').reset_index().rename(columns={'index':'colName'})
counts
  colName  uCounts
0      to     1001
1    freq     1050
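Putting the pieces together for the stated goal (drop columns with fewer than 1000 entries), here is one possible sketch that uses count() for non-null entries instead of nunique(); the frame and threshold are invented for illustration:

import pandas as pd

# toy frame standing in for the pivoted well_id table
df = pd.DataFrame({'good': range(1500),
                   'sparse': [1] * 500 + [None] * 1000})

# count() gives the number of non-null entries per column;
# drop every column that falls below the threshold
thin_columns = df.columns[df.count() < 1000]
df = df.drop(columns=thin_columns)
print(df.columns.tolist())   # ['good']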

Excel Copy Paste Way in Python

I have a data frame as below
df = pd.DataFrame([[3,2,1],[4,5,6],[10,20,30]], columns = ['A','B','C'])
    A   B   C
0   3   2   1
1   4   5   6
2  10  20  30
Is there any way in Python to mimic the copy-and-paste function in Excel? For example, I want to copy row 0 columns A and B and paste them into row 0 columns B and C, so that it becomes
    A   B   C
0   3   3   2
1   4   5   6
2  10  20  30
In a small data frame, I can use:
df.loc[0,'C'] = df.loc[0,'B']
df.loc[0,'B'] = df.loc[0,'A']
But my original data frame is sizable and I prefer not to do this element by element.
I was also trying to do:
df.loc[0,['A','B']] = df.loc[0,['B','C']]
But my data in row 0 column A becomes NaN.
So is there a way to do something similar to copy-paste in Excel in Python (simply select a block of data, copy it, and paste it on top of other existing data)? Thanks
anky_91's answer
df.loc[0, ['B', 'C']] = df.loc[0, ['A', 'B']].to_numpy()
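The .to_numpy() call strips the column labels, so pandas assigns by position instead of aligning on names (which is what produced the NaN in the question). The same trick extends to a whole block of rows, closer to the Excel-style range copy being asked about; a small sketch on the same toy frame, with the block chosen arbitrarily:

import pandas as pd

df = pd.DataFrame([[3, 2, 1], [4, 5, 6], [10, 20, 30]], columns=['A', 'B', 'C'])

# copy the A/B block of rows 0-1 onto B/C of the same rows;
# .to_numpy() drops the labels, so the assignment is positional
df.loc[0:1, ['B', 'C']] = df.loc[0:1, ['A', 'B']].to_numpy()
print(df)
#     A   B   C
# 0   3   3   2
# 1   4   4   5
# 2  10  20  30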
shift
There are many ways you can use shift. This is just one.
df.update(df.shift(axis=1)[['B', 'C']])
For reasons that I'm not happy about, you can provide a fill_value to shift to preserve the integer dtype:
df.update(df.shift(axis=1, fill_value=0)[['B', 'C']])
This mostly feels like a bad idea, but if it's what you really want to do, you can use .iloc to address columns by number and just shift them:
In [56]: df.iloc[0, 1:] = df.iloc[0, :-1].values
In [57]: df
Out[57]:
    A   B   C
0   3   3   2
1   4   5   6
2  10  20  30

search for a set of values on rows (& not |) from a column

I'm new to Python and I'm trying to find the entries in the first column that have, in the second column, all of the entries I'm searching for. For example: I search for {155, 137} and I expect to get 5 and 6 from column id1 in return.
id1 id2
----------
1. 10
2. 10
3. 10
4. 9
5. 137
5. 150
5. 155
6. 10
6. 137
6. 155
....
I've searched a lot on Google but couldn't solve it. I read these entries from an Excel file; I tried multiple for loops, but it doesn't look nice because I'm searching for a lot of entries.
I tried this:
df = pd.read_excel('path/temp.xlsx') #now I have two Columns and many rows
d1 = df.values.T[0].tolist()
d2 = df.values.T[1].tolist()
d1[d2.index(115) & d2.index(187)& d2.index(276) & d2.index(239) & d2.index(200) & d2.index(24) & d2.index(83)]
and it returned 1
I started to work this week, so I'm very new
Assume you have two lists for the IDs (i.e. one list for id1 and one for id2), and the lists correspond to each other (that is, the value at index i of list1 corresponds to the value at index i of list2).
If that is your case, then you simply have to find the index of the element you want to search for, and the value at the corresponding index in the other list will be the answer to your query.
To get the index of the element, you can use Python's inbuilt feature to get an index, namely:
list.index(<element>)
It will return the zero-based index of the element you wanted in the list.
To get the corresponding ID from id1, you can simply use this index (because of one-one correspondence). In your case, it can be written as:
id1[id2.index(137)] #it will return 5
NOTE:
index() method will return the index of the first matching entry from the list.
best to use pandas
import pandas as pd
import numpy as np
Random data
n = 10
I = [i for i in range(1,7)]
df1 = pd.DataFrame({'Id1': [I[np.random.randint(len(I))] for i in range(n)],
                    'Id2': np.random.randint(0, 1000, n)})
df1.head(5)
   Id1  Id2
0    4  170
1    6  170
2    6  479
3    4  413
4    6   52
Query using
df1.loc[~df1.Id2.isin([170,479])].Id1
Out[344]:
3 4
4 6
5 6
6 3
7 1
8 5
9 6
Name: Id1, dtype: int64
for now, I've solved it by doing this
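For the '& not |' requirement in the title (id1 values whose id2 entries include every searched value), one possible groupby sketch; the data below is copied from the table in the question:

import pandas as pd

# data from the table in the question
df = pd.DataFrame({'id1': [1, 2, 3, 4, 5, 5, 5, 6, 6, 6],
                   'id2': [10, 10, 10, 9, 137, 150, 155, 10, 137, 155]})

wanted = {155, 137}

# for each id1, check that its id2 values contain *all* of the wanted entries
hits = df.groupby('id1')['id2'].apply(lambda s: wanted.issubset(s))
print(hits[hits].index.tolist())   # [5, 6]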

Dropping rows in pandas with .index

I came across the below line of code, which gives an error when '.index' is not present in it.
print(df.drop(df[df['Quantity'] == 0].index).rename(columns={'Weight': 'Weight (oz.)'}))
What is the purpose of '.index' while using drop in pandas?
As explained in the documentation, you can use drop with index:
   A  B   C   D
0  0  1   2   3
1  4  5   6   7
2  8  9  10  11
df.drop([0, 1]) # Here 0 and 1 are the index of the rows
Output:
   A  B   C   D
2  8  9  10  11
In this case it will drop the first 2 rows.
With .index in your example, you find the rows where Quantity == 0 and retrieve their index (and then use it like in the documentation).
Here is the detail about the .drop() method:
https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.drop.html
The .drop() method needs a parameter labels, which is a list of index labels (when axis=0, the default case) or column labels (when axis=1).
df[df['Quantity'] == 0] returns a DataFrame of the rows where Quantity == 0, but what drop needs is the index labels of those rows, so .index is needed.
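To see the whole chain from the question in action, a small sketch on an invented frame (the Item column and all the values are made up; only Quantity and Weight are implied by the original line):

import pandas as pd

# hypothetical frame with Quantity and Weight columns
df = pd.DataFrame({'Item': ['a', 'b', 'c'],
                   'Quantity': [5, 0, 2],
                   'Weight': [1.0, 2.0, 3.0]})

zero_rows = df[df['Quantity'] == 0].index   # labels of the rows to drop
print(df.drop(zero_rows).rename(columns={'Weight': 'Weight (oz.)'}))
#   Item  Quantity  Weight (oz.)
# 0    a         5           1.0
# 2    c         2           3.0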

Error when setting default value to entire new column in Pandas dataframe

The code works, but I get this warning when trying to set a default value of 1 for an entire new column in a Pandas dataframe. What does the warning mean, and how can I rework the code so I don't get it?
df['new']=1
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
this should solve the problem:
soldactive = df[(df.DispositionStatus == 'Sold') & (df.AssetStatus == 'Active')].copy()
Your code:
removesold = df[(df.ExitDate.isin(errorval)) & (df.DispositionStatus == 'Sold') & (df.AssetStatus == 'Resolved')]
df = df.drop(removesold.index)
soldactive = df[(df.DispositionStatus == 'Sold') & (df.AssetStatus == 'Active')]
soldactive['FlagError'] = 1
You've created the soldactive DF as a copy of the subset (sliced) df.
After that you're trying to create a new column on that copy. That gives you the warning A value is trying to be set on a copy of a slice from a DataFrame because dataframes are value-mutable (see the excerpt from the docs below).
Docs:
All pandas data structures are value-mutable (the values they contain
can be altered) but not always size-mutable. The length of a Series
cannot be changed, but, for example, columns can be inserted into a
DataFrame. However, the vast majority of methods produce new objects
and leave the input data untouched. In general, though, we like to
favor immutability where sensible.
Here is a test case:
In [375]: df
Out[375]:
   a  b  c
0  9  6  4
1  5  2  8
2  8  1  6
3  3  4  1
4  8  0  2
In [376]: a = df[1:3]
In [377]: a['new'] = 1
C:\envs\py35\Scripts\ipython:1: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
In [378]: del a
In [379]: a = df[1:3].copy()
In [380]: a['new'] = 1
In [381]: a
Out[381]:
   a  b  c  new
1  5  2  8    1
2  8  1  6    1
In [382]: df
Out[382]:
   a  b  c
0  9  6  4
1  5  2  8
2  8  1  6
3  3  4  1
4  8  0  2
Solution
df.loc[:, 'new'] = 1
Slicing with [] can give you a copy of the data. Use loc and iloc to access the DataFrame directly.
What's more, if the 'new' column didn't already exist, it would have worked. It only threw that warning because the column already existed and you were trying to edit it on a view or copy... I think
