I am iterating through the rows of a dataframe using iterrows:
for index, row in df.iterrows():
    pass
Given that the index contains datetime objects, how can I easily access the row at the previous index (i-1) while I am at index (i)?
You can try the following:
row_ = None
for index, row in df.iterrows():
    # processing logic here (use row_ as the previous row and row as the current one)
    row_ = row
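row_ will be None on the first iteration; on every later iteration it holds the previous row.
This works for any index type, because it relies on iteration order rather than on arithmetic with index labels.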
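Here is a minimal sketch of the same idea on a small, made-up DataFrame with a DatetimeIndex. The column name `value` and the data are assumptions. It also shows the vectorized alternative, `Series.shift(1)`, which lines each row up with its predecessor without an explicit loop:

```python
import pandas as pd

# Hypothetical example data with a DatetimeIndex
df = pd.DataFrame(
    {"value": [1.0, 3.0, 6.0]},
    index=pd.date_range("2024-01-01", periods=3, freq="D"),
)

# Loop version: remember the previous row between iterations
diffs_loop = []
prev_row = None
for index, row in df.iterrows():
    if prev_row is not None:  # the first row has no predecessor
        diffs_loop.append(float(row["value"] - prev_row["value"]))
    prev_row = row

# Vectorized version: shift(1) moves every value down one row,
# so each row sees its predecessor's value (NaN for the first row)
df["diff"] = df["value"] - df["value"].shift(1)

print(diffs_loop)
print(df["diff"].tolist())
```

For simple row-to-row comparisons, `shift` is usually faster than `iterrows`. The loop is still handy when the per-row logic is too complex to vectorize.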
When I iterate over a DataFrame or GeoDataFrame and want to process only a section of it, I use df.iloc[0:100]. How can I select a section, for example rows 0-100, when I use shapefile.Reader?
with shapefile.Reader('C:/Users/ja/Inne/Desktop/Praca/Orto_PL1992_piksel3-50cm/PL1992_5000_025') as shp:
    total_rows = shp.numRecords
    for row_num, row in enumerate(shp.iterRecords()):
        print(row)
A generator is not subscriptable, and iterRecords() returns a generator. Use shapeRecords() (or records()) instead. Both return a list, which you can slice:
import shapefile

with shapefile.Reader(shapefile_path) as shp:
    rows = shp.shapeRecords()[0:100]

for row_num, row in enumerate(rows):
    print(row_num, row)
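If you would rather keep the lazy iterRecords() generator, for example on a large file where building the full list is wasteful, itertools.islice can take a slice of any iterator without materializing it first. In the sketch below, fake_iter_records is a hypothetical stand-in for shp.iterRecords(), so the example runs without a shapefile:

```python
from itertools import islice

def fake_iter_records(n):
    """Hypothetical stand-in for shapefile.Reader.iterRecords(): yields records lazily."""
    for i in range(n):
        yield {"id": i}

# islice(iterable, start, stop) behaves like [start:stop] on a generator,
# pulling only as many items as it needs
selected = []
for row_num, row in enumerate(islice(fake_iter_records(1000), 0, 100)):
    selected.append((row_num, row))  # process each record here
```

With a nonzero start, islice still has to read and discard the earlier records, because a generator cannot jump ahead.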
I'm writing a program that searches through the first row of a sheet for a specific value ("Filenames"). Once found, it iterates through that column and returns the values underneath it (rows 2 through x).
I've figured out how to iterate through the first row in the sheet, and get the cell which contains the specific value, but now I need to iterate over that column and print out those values. How do I do so?
import os
import sys
from openpyxl import load_workbook

def main():
    column_value = 'Filenames'
    wb = load_workbook('test.xlsx')
    script = wb["Script"]

    # Find "Filenames"
    for col in script.iter_rows(min_row=1, max_row=1):
        for name in col:
            if (name.value == column_value):
                print("Found it!")
                filenameColumn = name
                print(filenameColumn)

    # Now that we have that column, iterate over the rows in that specific column to get the filenames
    for row in filenameColumn: # THIS DOES NOT WORK
        print(row.value)

main()
You're actually iterating over rows and cells, not columns and names here:
for col in script.iter_rows(min_row=1, max_row=1):
    for name in col:
If you rename the variables to match, you can see that what you end up with is a cell:
for row in script.iter_rows(min_row=1, max_row=1):
    for cell in row:
        if (cell.value == column_value):
            print("Found it!")
            filenameCell = cell
            print(filenameCell)
So you have a cell, not a column. You can get its column with cell.column, which returns the column index as an integer.
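Here is a minimal sketch of that route: take the header cell's cell.column index and pass it to iter_rows as min_col/max_col, starting at row 2. An in-memory workbook with made-up values stands in for test.xlsx / wb["Script"] (an assumption so the example runs on its own). It assumes a recent openpyxl in which cell.column is an integer and iter_rows accepts values_only:

```python
from openpyxl import Workbook

# Hypothetical in-memory sheet standing in for wb["Script"]
wb = Workbook()
script = wb.active
script.append(["Other", "Filenames"])
script.append(["x", "a.txt"])
script.append(["y", "b.txt"])

column_value = "Filenames"
filenames = []
for row in script.iter_rows(min_row=1, max_row=1):
    for cell in row:
        if cell.value == column_value:
            col_idx = cell.column  # integer index, e.g. 2 for column B
            # Read only that column, from row 2 (below the header) downward
            for (value,) in script.iter_rows(min_row=2, min_col=col_idx,
                                             max_col=col_idx, values_only=True):
                filenames.append(value)

print(filenames)
```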
Rather than iterating over just the first row (which is what iter_rows does with min_row and max_row both set to 1), it is better to use iter_cols, which is built for this:
for col in script.iter_cols():
    # see if the value of the first cell matches
    if col[0].value == column_value:
        # this is the column we want; col is a tuple of cells
        for cell in col[1:]:  # skip the header cell
            print(cell.value)  # do something with each cell in this column here
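As a variation, iter_cols(values_only=True) yields plain values instead of Cell objects, which is convenient when you only need the contents. Here is a minimal self-contained sketch that again uses a hypothetical in-memory sheet in place of test.xlsx:

```python
from openpyxl import Workbook

# Hypothetical in-memory sheet standing in for wb["Script"]
wb = Workbook()
script = wb.active
script.append(["Other", "Filenames"])
script.append(["x", "a.txt"])
script.append(["y", "b.txt"])

column_value = "Filenames"
filenames = []
# Each col is a tuple of values, top to bottom
for col in script.iter_cols(values_only=True):
    if col[0] == column_value:      # header is in the first row
        filenames = list(col[1:])   # everything below the header
        break

print(filenames)
```

Note that iter_cols is not available on worksheets opened with load_workbook(..., read_only=True). In that case, fall back to the iter_rows approach.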
I'm iterating over a dataframe and I want to add new elements to each row, so that I can add the new row to a second dataframe.
for index, row in df1.iterrows():
    # I'm looking for something like this:
    new_row = row.append({'column_name_A':10})
    df2 = df2.append(new_row,ignore_index=True)
If I understand correctly, you want a copy of your original DataFrame with a new column added. You can copy the original DataFrame, add the new column to the copy, and then iterate over the copy's rows to set the new column's values, as you intended to do in the code in your question.
df2 = df1.copy()
df2['column_name_A'] = 0
for index, row in df2.iterrows():
    # write through df2; assigning to `row` would only change a copy
    df2.at[index, 'column_name_A'] = some_value
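Another option is to collect the extended rows in a list and build df2 once at the end. This is also relevant because DataFrame.append and Series.append were removed in pandas 2.0. Here is a minimal sketch. The data and the constant 10 are hypothetical placeholders for your real per-row value:

```python
import pandas as pd

df1 = pd.DataFrame({"x": [1, 2], "y": ["a", "b"]})  # hypothetical data

# Build each new row as a dict, then create df2 in one go
new_rows = []
for index, row in df1.iterrows():
    new_row = row.to_dict()          # copy the existing columns
    new_row["column_name_A"] = 10    # hypothetical per-row value
    new_rows.append(new_row)

df2 = pd.DataFrame(new_rows)

# If the new value does not depend on the row, no loop is needed at all
df2_simple = df1.assign(column_name_A=10)
```

Building the DataFrame once avoids the quadratic cost of appending inside the loop.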
I am trying to iterate over a Pandas DataFrame (row by row) that has non-sequential index labels. In other words, one DataFrame's index labels look like this: 2,3,4,5,6,7,8,9,11,12,.... There is no row with index label 10. I would like to iterate over the DataFrame to update certain values in each row based on a condition, because I am reading Excel sheets (which have merged cells) into DataFrames.
I tried the following code (from Manuel's answer) to iterate through each row of df and edit each row if the conditions apply.
for col in list(df): #All columns
    for row in df[1].iterrows(): ##All rows, except first
        if pd.isnull(df.loc[row[0],'Album_Name']): ##If this cell is empty all in the same row too.
            continue
        elif pd.isnull(df.loc[row[0], col]) and pd.isnull(df.loc[row[0]+1, col]): ##If a cell and next one are empty, take previous value.
            df.loc[row[0], col] = df.loc[row[0]-1, col]
However, since the DataFrame has non-sequential index labels, I get the following error message: KeyError: the label [10] is not in the [index]. How can I iterate and edit the DataFrame (row-by-row) with non-sequential index labels?
For reference, here is what my Excel sheet and DataFrame look like:
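Yes. Change the second loop to iterate over row positions instead of labels, and translate each position to its label with df.index. The previous and next rows are then df.index[i - 1] and df.index[i + 1], which always exist no matter what gaps the labels have (iterating with "for row in df" would not help, because that loops over column names, not rows):
for col in df.columns:  # all columns
    for i in range(1, len(df) - 1):  # all rows except the first and last (each lacks a neighbor)
        label = df.index[i]
        if pd.isnull(df.loc[label, 'Album_Name']):  # if this cell is empty, the rest of the row is too
            continue
        elif pd.isnull(df.loc[label, col]) and pd.isnull(df.loc[df.index[i + 1], col]):  # if a cell and the next one are empty, take the previous value
            df.loc[label, col] = df.loc[df.index[i - 1], col]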
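If the blanks simply come from merged Excel cells and each one should take the value above it, DataFrame.ffill avoids label arithmetic entirely, because it works by position. This is a swapped-in technique, not a translation of the loop: it fills every gap, without the loop's extra conditions (skipping rows whose Album_Name is empty, requiring the next cell to be empty too), so check that it fits your data. Here is a minimal sketch with hypothetical data whose labels skip 10:

```python
import pandas as pd

# Hypothetical data: label 10 is missing, None mimics merged Excel cells
df = pd.DataFrame(
    {"Album_Name": ["A", None, None, "B"],
     "Artist": ["X", None, None, "Y"]},
    index=[8, 9, 11, 12],
)

# ffill copies the last non-missing value downward, regardless of labels
df_filled = df.ffill()
print(df_filled)
```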
iterrows can be used to iterate through a pandas dataframe:
for row in df.iterrows():
    print(row)
How can I use a second for loop to iterate through each element in the row?
iterrows yields (index, row) tuples. The element at [1] is the row itself, as a Series, and you can iterate through it:
for row in df.iterrows():
    print(row[1])
    for b in row[1]:
        print(b)
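It is often clearer to unpack the tuple directly and use Series.items(), which yields (column label, value) pairs, so each element comes with its column name. Here is a minimal sketch on a small, made-up DataFrame:

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})  # hypothetical data

seen = []
# Unpack (index, Series) instead of indexing row[1]
for index, row in df.iterrows():
    # items() yields (column_label, value) pairs for this row
    for col_name, value in row.items():
        seen.append((index, col_name, int(value)))

print(seen)
```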