Iterating through iterrows - python

iterrows can be used to iterate through a pandas dataframe:
for row in df.iterrows():
    print(row)
How can I use a second for loop to iterate through each element in the row?

iterrows yields an (index, row) tuple for each row. The element at position [1] is the row itself, as a Series. You can then iterate through that element.
for row in df.iterrows():
    print(row[1])
    for b in row[1]:
        print(b)
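As a minimal sketch of the same idea, the tuple can also be unpacked directly in the for statement, and Series.items() gives the column name alongside each value. The sample DataFrame and the cells list below are made up for illustration and are not part of the original question.

```python
import pandas as pd

# Hypothetical sample data, not from the original question.
df = pd.DataFrame({"city": ["Oslo", "Lima"], "population": [700000, 10000000]})

cells = []
for index, row in df.iterrows():       # unpack the (index, Series) tuple
    for column, value in row.items():  # every cell of this row, with its column name
        cells.append((index, column, value))
        print(index, column, value)
```

Unpacking avoids the row[1] indexing and makes the index label available under its own name.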

Related

Access previous row while iterating on rows of a dataframe using iterrows

I am iterating through the rows of a dataframe using iterrows:
for index, row in df.iterrows():
    pass
Given that the index here contains datetime objects, how can we easily access the row at the previous index (i-1) while at index (i)?
Thanks
You can try the following:
row_ = None
for index, row in df.iterrows():
    # processing logic here (use row_ as the previous row and row as the current row)
    row_ = row
row_ is None on the first iteration; on every later iteration it holds the previous row.
This logic works for any index type, because it never looks anything up by index.
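A small runnable sketch of this pattern, assuming a made-up price column and a datetime index with a gap (both hypothetical). It also shows Series.shift() as a loop-free way to line each row up with its predecessor.

```python
import pandas as pd

# Hypothetical data with a datetime index, for illustration only.
df = pd.DataFrame(
    {"price": [10.0, 12.0, 11.0]},
    index=pd.to_datetime(["2021-01-01", "2021-01-02", "2021-01-05"]),
)

changes = []
prev_row = None
for index, row in df.iterrows():
    if prev_row is not None:  # the first row has no predecessor
        changes.append(row["price"] - prev_row["price"])
    prev_row = row

# The same differences without an explicit loop: shift(1) moves each value down one row.
shifted = (df["price"] - df["price"].shift(1)).dropna().tolist()
```

Both approaches depend only on row order, so the gap between 2021-01-02 and 2021-01-05 does not matter.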

Python: shapefile.Reader, how to set up .iloc?

When I iterate over a dataframe or geodataframe and want to process only a slice of it, I use df.iloc[0:100]. How can I select a slice when I use shapefile.Reader? For example, rows 0-100.
with shapefile.Reader('C:/Users/ja/Inne/Desktop/Praca/Orto_PL1992_piksel3-50cm/PL1992_5000_025') as shp:
    total_rows = shp.numRecords
    for row_num, row in enumerate(shp.iterRecords()):
        print(row)
A generator is not subscriptable and iterRecords() returns a generator. Instead, use shapeRecords() (or records()). It gives you a list.
rows = shapefile.Reader(shapefile_path).shapeRecords()[0:100]
for row_num, row in enumerate(rows):
    print(row_num, row)
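If you would rather keep the generator and avoid loading every record into a list, itertools.islice from the standard library can take a slice of any iterator. The sketch below uses a hypothetical stand-in generator, iter_records(), in place of shp.iterRecords(), so that it runs without a shapefile.

```python
from itertools import islice


def iter_records():
    """Hypothetical stand-in for a generator such as shp.iterRecords()."""
    for i in range(1000):
        yield ("record", i)


# islice(iterator, 0, 100) lazily yields items 0..99 and then stops.
first_hundred = list(islice(iter_records(), 0, 100))

for row_num, row in enumerate(first_hundred):
    pass  # process each row here
```

Note that islice still has to step through any skipped items, so a slice far from the start is not free.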

How to access a row in pandas?

Could you please explain the difference between these two:
#1
for index, row in df.iterrows():
#2
for x in df['city']:
Should I always use for index, row in df.iterrows(): when accessing data in pandas, or is specifying the column name, as in the second example, enough in some cases?
Thank you
There are more ways to iterate than the two you describe. Which one to use comes down to how simple your iteration is and how efficient it needs to be.
The second form is enough if you only need the values of a single column.
Also bear in mind that the different iteration methods return different types. You can read about all of them in the pandas documentation.
This article compares the different methods in terms of performance: https://medium.com/#rtjeannier/pandas-101-cont-9d061cb73bfc
for index, row in df.iterrows():
    print(row['city'])
Explanation: This iterates over the data frame row by row. row holds the values of every column for the current row, and index holds that row's index label. To access a value in the row, use the column name as above.
for x in df['city']:
    print(x)
Explanation: This iterates over the Series df['city'] only, not over the other columns in df.
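To make the comparison concrete, here is a short sketch that reads the same column three ways: through iterrows, through the column itself, and through itertuples (one of the other iteration methods pandas provides). The sample data is hypothetical.

```python
import pandas as pd

# Hypothetical sample data, for illustration only.
df = pd.DataFrame({"city": ["Oslo", "Lima"], "country": ["Norway", "Peru"]})

# 1. Row by row: each row is a Series holding every column.
from_iterrows = [row["city"] for index, row in df.iterrows()]

# 2. A single column: iterates over the values of df["city"] only.
from_column = [x for x in df["city"]]

# 3. Row by row as namedtuples: columns become attributes.
from_itertuples = [t.city for t in df.itertuples(index=False)]
```

All three lists hold the same values; the difference is in what each loop variable gives you access to.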

python add elements to dataframe row

I'm iterating over a dataframe and I want to add new elements to each row, so that I can add the new row to a second dataframe.
for index, row in df1.iterrows():
    # I'm looking for something like this:
    new_row = row.append({'column_name_A':10})
    df2 = df2.append(new_row,ignore_index=True)
If I understand correctly, you want a copy of your original dataframe with a new column added. You can copy the original dataframe, add the new column to it, and then iterate over the rows of the copy to set the value of the new column for each row. Assign through df2.loc rather than through row, because iterrows yields a copy of each row and changes made to row are not guaranteed to reach the dataframe.
df2 = df1.copy()
df2['column_name_A'] = 0
for index, row in df2.iterrows():
    # some_value stands for whatever you compute for this row
    df2.loc[index, 'column_name_A'] = some_value
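When the new value is a constant or can be computed from existing columns, the loop can often be skipped entirely by assigning the whole column at once. A minimal sketch, assuming a made-up column a and a hypothetical derived column doubled:

```python
import pandas as pd

# Hypothetical input data, for illustration only.
df1 = pd.DataFrame({"a": [1, 2, 3]})

df2 = df1.copy()                 # df1 stays untouched
df2["column_name_A"] = 10        # the same value for every row
df2["doubled"] = df2["a"] * 2    # a value computed from an existing column
```

Column-wise assignment like this is usually both shorter and faster than updating one row at a time.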

How to iterate Pandas DataFrame (row-by-row) that has non-sequential index labels?

I am trying to iterate a Pandas DataFrame (row-by-row) that has non-sequential index labels. In other words, one Dataframe's index labels look like this: 2,3,4,5,6,7,8,9,11,12,.... There is no row/index label 10. I would like to iterate the DataFrame to update/edit certain values in each row based on a condition since I am reading Excel sheets (which has merged cells) into DataFrames.
I tried the following code (# Manuel's answer) to iterate through each row of df and edit each row if conditions apply.
for col in list(df):  # all columns
    for row in df[1:].iterrows():  # all rows, except the first
        if pd.isnull(df.loc[row[0], 'Album_Name']):  # if this cell is empty, the rest of the row is empty too
            continue
        elif pd.isnull(df.loc[row[0], col]) and pd.isnull(df.loc[row[0]+1, col]):  # if a cell and the next one are empty, take the previous value
            df.loc[row[0], col] = df.loc[row[0]-1, col]
However, since the DataFrame has non-sequential index labels, I get the following error message: KeyError: the label [10] is not in the [index]. How can I iterate and edit the DataFrame (row-by-row) with non-sequential index labels?
For reference, here is what my Excel sheet and DataFrame look like:
Yes: loop over row positions instead of index labels, and address the cells with .iloc, so that gaps in the labels do not matter. (A plain for row in df: would iterate over the column labels, not the rows.)
album_pos = df.columns.get_loc('Album_Name')
for c in range(df.shape[1]):  # all columns, by position
    for i in range(1, len(df) - 1):  # all rows except the first; the last is skipped because it has no next row
        if pd.isnull(df.iloc[i, album_pos]):  # if this cell is empty, the rest of the row is empty too
            continue
        elif pd.isnull(df.iloc[i, c]) and pd.isnull(df.iloc[i + 1, c]):  # if a cell and the next one are empty, take the previous value
            df.iloc[i, c] = df.iloc[i - 1, c]
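If you prefer to keep the label-based .loc lookups from the question, one option is to find the neighbouring labels through df.index by position, rather than by adding or subtracting 1 from the label. The sketch below runs that idea on made-up data whose index skips label 10; the Artist column and its values are hypothetical.

```python
import pandas as pd

# Hypothetical data; the index skips label 10, as in the question.
df = pd.DataFrame(
    {"Album_Name": ["A", "A", "B", "B"], "Artist": ["x", None, None, "y"]},
    index=[8, 9, 11, 12],
)

labels = df.index
for col in df.columns:
    # Start at position 1 (skip the first row) and stop before the last row,
    # because the check below needs both a previous and a next row.
    for pos in range(1, len(labels) - 1):
        label = labels[pos]
        prev_label = labels[pos - 1]  # neighbour by position, not label - 1
        next_label = labels[pos + 1]  # neighbour by position, not label + 1
        if pd.isnull(df.loc[label, "Album_Name"]):
            continue
        if pd.isnull(df.loc[label, col]) and pd.isnull(df.loc[next_label, col]):
            df.loc[label, col] = df.loc[prev_label, col]
```

Here the empty Artist cell at label 9 is filled from label 8, and the lookup of the next row lands on label 11 without raising a KeyError.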
