I'd like to select a single column, with all rows except the last one.
When I do it as below, the result is empty.
a = data_vaf.loc[:-1, 'Area']
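loc selects by label; iloc selects by integer position.
With loc, -1 is looked up as a label rather than as "the last position". No row has that label on a default integer index, hence the empty result, and the two indexers cannot be mixed in a single call.
Therefore, exclude the last row with iloc, then select the column Area,
as shown in the comment from @ThePyGuy: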
data_vaf.iloc[:-1]['Area']
Here is the structure of iloc:
iloc[row, column]
iloc[row] does the same thing as iloc[row, :], and
df.iloc[:-1] does the same thing as df[:-1].
There are multiple ways to do this with iloc, as addressed in the comments. For example, select the column first and then slice it by position:
df['col'].iloc[:-1]
Alternatively, drop the last row by its index label and then select the column:
out = data_vaf.drop(data_vaf.index[-1])['Area']
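To compare the approaches above side by side, here is a minimal sketch. The frame contents and the Name column are made up for illustration; only data_vaf and Area come from the question, and a default integer index is assumed.

```python
import pandas as pd

# Hypothetical stand-in for data_vaf, with a default integer index (assumption).
data_vaf = pd.DataFrame({"Area": [10, 20, 30], "Name": ["a", "b", "c"]})

# Label-based: -1 is looked up as a label, so nothing is selected here.
empty = data_vaf.loc[:-1, "Area"]

# Position-based alternatives that all skip the last row.
by_iloc_then_col = data_vaf.iloc[:-1]["Area"]
by_col_then_iloc = data_vaf["Area"].iloc[:-1]
by_iloc_only = data_vaf.iloc[:-1, data_vaf.columns.get_loc("Area")]

# Label-based drop of the last index label, then column selection.
by_drop = data_vaf.drop(data_vaf.index[-1])["Area"]

print(len(empty))
print(by_iloc_then_col.tolist())
```

Note that drop removes every row carrying that label, so the last variant matches the others only when the index labels are unique.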
Related
I see a lot of questions related to dropping rows that have a certain value in a column, or dropping the entirety of columns, but pretend we have a Pandas Dataframe like the one below.
In this case, how could one write a line to go through the CSV, and drop all rows like 2 and 4? Thank you.
You could try
~((~df).all(axis=1))
to get the rows that you want to keep/drop. To get the dataframe with just those rows, you would use
df = df[~((~df).all(axis=1))]
A more detailed explanation is here:
Delete rows from a pandas DataFrame based on a conditional expression involving len(string) giving KeyError
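Since the example table from the question is not shown, the following sketch uses a made-up boolean frame in which the rows labelled 2 and 4 are False in every column. It assumes all columns are boolean; the column names a and b are hypothetical.

```python
import pandas as pd

# Hypothetical boolean frame: rows 2 and 4 are False in every column.
df = pd.DataFrame(
    {
        "a": [True, False, False, True, False],
        "b": [False, True, False, True, False],
    }
)

# (~df).all(axis=1) is True where every column is False;
# negating it marks the rows to keep.
keep = ~((~df).all(axis=1))

# Equivalent, shorter form: keep rows with at least one True.
same = df.any(axis=1)

filtered = df[keep]
print(filtered.index.tolist())
```

The any(axis=1) form expresses the same condition directly and may be easier to read.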
This should help
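column_names = df.columns
for i in df.index:
    count = 0
    for column_name in column_names:
        # Count the columns that are False in this row
        if df.loc[i, column_name] == False:
            count = count + 1
    # Drop the row when every column is False
    if count == len(column_names):
        df.drop(index=i, inplace=True)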
Could you please explain the difference between these two:
#1
for index, row in df.iterrows():
#2
for x in df['city']:
Should I always use for index, row in df.iterrows(): when accessing data in pandas?
Or is specifying the column name, as in the second example, enough in some cases?
Thank you
There are more ways to iterate than the ways you described. It all comes down to how simple your iteration is and the "efficiency" of it.
The second approach is enough if you only need to iterate over the values of a single column.
Also bear in mind that the iteration methods yield different objects: iterrows() yields (index, Series) pairs and does not preserve dtypes across a row, while iterating over a column yields its scalar values. You can read about them all in the pandas documentation.
This is an interesting article comparing the performance of the different methods: https://medium.com/@rtjeannier/pandas-101-cont-9d061cb73bfc
for index, row in df.iterrows():
    print(row['city'])
Explanation: This iterates over the DataFrame row by row. The row variable holds the values of every column for that row, and index holds that row's index label. To access a value in the row, use the column name as above.
for x in df['city']:
    print(x)
Explanation: This iterates over the Series df['city'] only, not over the other columns in df.
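As a rough illustration of the two loops above, the sketch below collects the city values with each of them. The frame contents and the country column are invented for the example, and itertuples() is added here as one of the "more ways to iterate"; it is not part of the original question.

```python
import pandas as pd

# Hypothetical data; only the 'city' column name comes from the question.
df = pd.DataFrame({"city": ["Oslo", "Rome"], "country": ["Norway", "Italy"]})

# 1. iterrows(): each row arrives as a Series, so any column is reachable.
from_iterrows = [row["city"] for index, row in df.iterrows()]

# 2. Iterating a single column: only the values of df['city'] are visited.
from_column = [x for x in df["city"]]

# 3. itertuples(): each row arrives as a namedtuple with attribute access.
from_itertuples = [t.city for t in df.itertuples(index=False)]

print(from_iterrows, from_column, from_itertuples)
```

When only one column is needed, the second form avoids building a Series for every row.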
Our objective right now is to drop the duplicate player rows, but keep the row with the highest count in the G column (Games played). What code can we use to achieve this? I've attached a link to the image of our Pandas output here.
You probably want to first sort the dataframe by column G.
df = df.sort_values(by='G', ascending=False)
You can then use drop_duplicates to drop all duplicates except for the first occurrence.
df.drop_duplicates(['Player'], keep='first')
There are 2 ways that I can think of
df.groupby('Player', as_index=False)['G'].max()
and
df.sort_values('G').drop_duplicates(['Player'] , keep = 'last')
The first method uses groupby to group the rows by Player and returns the maximum of G for each player; only the Player and G columns appear in the result. The second one sorts by G and uses the drop_duplicates method of pandas, which keeps the whole row with the highest G for each player.
Try this.
Assuming your DataFrame object is df1:
series= df1.groupby('Player')['G'].max() # this will return series.
pd.DataFrame(series)
Let me know whether this works for you.
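The answers above differ in what they return, which a small example makes visible. This is a sketch on invented data: the Tm column and all values are hypothetical, and the idxmax variant is an additional option not taken from the answers.

```python
import pandas as pd

# Hypothetical data: player A appears three times, player B once.
df = pd.DataFrame(
    {
        "Player": ["A", "A", "A", "B"],
        "Tm": ["TOT", "X", "Y", "Z"],
        "G": [30, 10, 20, 5],
    }
)

# groupby + max: one row per player, but only the Player and G columns.
max_only = df.groupby("Player", as_index=False)["G"].max()

# sort + drop_duplicates: keeps the complete row with the highest G.
sorted_dedup = df.sort_values("G", ascending=False).drop_duplicates(
    ["Player"], keep="first"
)

# idxmax: look up the index label of each player's maximum, then select it.
by_idxmax = df.loc[df.groupby("Player")["G"].idxmax()]

print(max_only)
print(sorted_dedup)
print(by_idxmax)
```

If the other columns of the winning row matter, the second or third variant is probably the one to reach for.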
My Pandas DataFrame has 17543 rows. I want to drop a row only if every column contains 'nan'. I tried the instructions from the linked question "drop rows in for loop",
but they did not help. The following is my code:
NullRows=0
for i in range(len(SetMerge.index)):
    if(SetMerge.iloc[i].isnull().all()):
        df=SetMerge.drop(SetMerge.index[i])
        NullRows +=1
print("total null rows : ", NullRows)
Only one row gets dropped: df has 17542 rows, whereas NullRows is 30.
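drop doesn't mutate SetMerge; it returns a new DataFrame. Each iteration therefore starts again from the unchanged SetMerge, and df ends up with only the last null row removed. You need to re-assign SetMerge after drop, or use another function.
This is covered in the answer at the link you posted. Specify the inplace=True option to mutate the DataFrame. Note that dropping rows inside a positional loop shifts the positions of the remaining rows, so a single vectorised call such as dropna(how='all') is safer.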
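One way to avoid the loop entirely is sketched below. The small SetMerge frame and its column names x and y are made up for illustration; the idea is to count the all-null rows with a boolean mask and remove them in one call.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for SetMerge: rows 1 and 3 are null in every column.
SetMerge = pd.DataFrame(
    {
        "x": [1.0, np.nan, 3.0, np.nan],
        "y": ["a", np.nan, np.nan, np.nan],
    }
)

# True for rows in which every column is null.
all_null = SetMerge.isnull().all(axis=1)
NullRows = int(all_null.sum())

# dropna(how="all") removes exactly those rows and returns a new frame.
cleaned = SetMerge.dropna(how="all")

print("total null rows : ", NullRows)
print(cleaned)
```

Row 2 is kept because only one of its columns is null.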
I am curious to know how to grab index number off of a dataframe that's meeting a specific condition. I've been playing with pandas.Index.get_loc, but no luck.
I've loaded a csv file, and it's structured in a way that has 1000+ rows with all column values filled in, but in the middle there is one completely empty row, and the data starts again. I wanted to get the index # of the row, so I can remove/delete all the subsequent rows that come after the empty row.
This is the way I identified the empty row, df[df["ColumnA"] ==None], but no luck in getting the row index number for that row. Please help!
What you most likely want is pd.DataFrame.dropna:
Return object with labels on given axis omitted where alternately any
or all of the data are missing
To drop rows that are completely empty, you can simply do this:
df = df.dropna(how='all')
If you want to find indices of null rows, you can use pd.DataFrame.isnull:
res = df[df.isnull().all(axis=1)].index
To remove rows with indices greater than the first empty row:
df = df[df.index < res[0]]
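Put together, the steps above might look like the following sketch. The data and the ColumnB name are invented, and the label comparison assumes a sorted numeric index; a position-based variant is included for other index types. Missing values are stored as NaN, so isnull() is used for the check rather than a comparison with None.

```python
import numpy as np
import pandas as pd

# Hypothetical data with one completely empty row at label 2.
df = pd.DataFrame(
    {
        "ColumnA": ["a", "b", np.nan, "d", "e"],
        "ColumnB": [1.0, 2.0, np.nan, 4.0, 5.0],
    }
)

# Index labels of the rows in which every column is null.
res = df[df.isnull().all(axis=1)].index

# Label-based truncation (assumes a sorted numeric index).
truncated = df[df.index < res[0]] if len(res) > 0 else df

# Position-based truncation, which does not depend on the index labels.
mask = df.isnull().all(axis=1).to_numpy()
pos = int(mask.argmax()) if mask.any() else len(df)
truncated_by_pos = df.iloc[:pos]

print(res.tolist())
print(truncated)
```

The search for empty rows has to run on the original frame; after dropna(how='all') there is no empty row left to find.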