When I iterate over a DataFrame or GeoDataFrame and want to work on just a slice, I use df.iloc[0:100]. How can I get a slice like that when I use shapefile.Reader? For example, rows 0-100.
with shapefile.Reader('C:/Users/ja/Inne/Desktop/Praca/Orto_PL1992_piksel3-50cm/PL1992_5000_025') as shp:
    total_rows = shp.numRecords
    for row_num, row in enumerate(shp.iterRecords()):
        print(row)
iterRecords() returns a generator, and a generator is not subscriptable. Instead, use shapeRecords() (or records()), which gives you a list that you can slice:
rows = shapefile.Reader(shapefile_path).shapeRecords()[0:100]
for row_num, row in enumerate(rows):
    print(row_num, row)
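If the shapefile is large, loading every record with shapeRecords() just to keep the first 100 is wasteful. As a minimal sketch, assuming pyshp's iterRecords() generator and the shapefile_path from above, you can slice the generator lazily with itertools.islice:

import itertools
import shapefile

with shapefile.Reader(shapefile_path) as shp:
    # islice consumes only the first 100 records, without building a full list
    for row_num, row in enumerate(itertools.islice(shp.iterRecords(), 0, 100)):
        print(row_num, row)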
I am iterating through the rows of a dataframe using iterrows:
for index, row in df.iterrows():
    pass
Given that the index here contains datetime objects, how can we easily access the row at the previous index (i-1) while we are at index (i)?
Thanks
You can try the following:
row_ = None
for index, row in df.iterrows():
    # processing logic here (use row_ as the previous row and "row" as the current one)
    row_ = row
row_ will be None on the first iteration; after that it holds the previous row.
This logic should work for any index type.
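For example, here is a minimal sketch that uses this pattern to compare each row with the one before it (the value column and the sample dates are made up for illustration):

import pandas as pd

df = pd.DataFrame(
    {'value': [10, 12, 9]},
    index=pd.to_datetime(['2020-01-01', '2020-01-02', '2020-01-03']),
)

row_ = None
for index, row in df.iterrows():
    if row_ is not None:
        # compare the current row with the previous one
        print(index, row['value'] - row_['value'])
    row_ = row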
Could you please explain the difference between these two:
#1
for index, row in df.iterrows():
#2
for x in df['city']:
Should I always use for index, row in df.iterrows(): when trying to access data in pandas:
for index, row in df.iterrows():
for x in df['city']:
Or will specifying the column name, as in the second example, be enough in some cases?
Thank you
There are more ways to iterate than the two you described. It all comes down to how simple your iteration is and how efficient it needs to be.
The second way will be enough if you just want to iterate over the values of a single column.
Also bear in mind that, depending on the method of iteration, they return different dtypes. You can read about them all in the pandas docs.
This is an interesting article explaining the different methods with regard to performance: https://medium.com/@rtjeannier/pandas-101-cont-9d061cb73bfc
for index, row in df.iterrows():
    print(row['city'])
Explanation: it iterates over the data frame row-wise, with the row variable holding the values of every column for that row and index holding that row's index. To access any value in the row, use the column name as above.
for x in df['city']:
    print(x)
Explanation: it iterates over the Series df['city'] only, not over the other columns in df.
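As one of the "more ways to iterate" mentioned above, here is a minimal sketch of itertuples(), which is generally faster than iterrows() (the sample columns and values are made up):

import pandas as pd

df = pd.DataFrame({'city': ['Oslo', 'Lima'], 'population': [700000, 9700000]})

# itertuples() yields namedtuples; attribute access avoids building a Series per row
for row in df.itertuples():
    print(row.Index, row.city, row.population)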
What is the easiest way, using openpyxl, to iterate through a column not by number but by column header (the string value in the first row of ws)?
Something like this:
for cell in ws.columns['revenue']:
    print(cell.value)
Column headers don't exist so you'd have to create something to represent them, presumably based on the names in the first row:
headers = {}
# ws[1] is the first row as a tuple of cells
for idx, cell in enumerate(ws[1], start=1):
    headers[cell.value] = idx

col_idx = headers['revenue']
# iter_cols yields one tuple of cells per column; take the single matching column
revenue = next(ws.iter_cols(min_col=col_idx, max_col=col_idx))
ws.columns returns all the columns, which could be slow on a large worksheet; iter_cols with min_col and max_col avoids that.
You could also add a named range to represent the relevant cells and loop through that.
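Putting it together, a minimal sketch assuming a workbook report.xlsx whose first row holds headers including 'revenue' (both names are hypothetical):

from openpyxl import load_workbook

wb = load_workbook('report.xlsx')  # hypothetical file
ws = wb.active

# Map header names (first row) to 1-based column indices
headers = {cell.value: idx for idx, cell in enumerate(ws[1], start=1)}

col = headers['revenue']
# Skip the header row and walk down the single matching column
for (cell,) in ws.iter_rows(min_row=2, min_col=col, max_col=col):
    print(cell.value)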
My question is very similar to this one: Find unique values in a Pandas dataframe, irrespective of row or column location.
I am very new to coding, so I apologize in advance for anything cringe-worthy.
I have a .csv file which I open as a pandas dataframe, and I would like to be able to return the unique values across the entire dataframe, as well as all unique strings.
I have tried:
for row in df:
    pd.unique(df.values.ravel())
This fails to iterate through rows.
The following code prints what I want:
for index, row in df.iterrows():
    if isinstance(row, object):
        print('%s\n%s' % (index, row))
However, trying to place these values into a previously defined set (myset = set()) fails when I hit a blank column (NoneType error):
for index, row in df.iterrows():
    if isinstance(row, object):
        myset.update(print('%s\n%s' % (index, row)))
I get closest to what I want when I try the following:
for index, row in df.iterrows():
    if isinstance(row, object):
        myset.update('%s\n%s' % (index, row))
However, my set prints out a list of characters rather than the strings/floats/values that appear on my screen when I print above.
Someone please help point out where I fail miserably at this task. Thanks!
I think the following should work for almost any dataframe. It will extract every value that is unique in the entire dataframe.
Post a comment if you encounter a problem; I'll try to solve it.
# Replace all Nones / NaNs with empty strings - so they won't bother us later
df = df.fillna('')
# Prepare a list
list_sets = []
# Iterate over all columns (much faster than over rows)
for col in df.columns:
    # List containing all the unique values of this column
    this_set = list(set(df[col].values))
    # Create a combined list
    list_sets = list_sets + this_set
# Deduplicate the combined list
final_set = list(set(list_sets))
# For completion's sake, you can remove the empty string introduced by the fillna step
final_set.remove('')
Edit:
I think I know what happens. You must have some float columns, and fillna is failing on those, as the code I gave you was replacing missing values with an empty string. Try one of these:
df = df.fillna(np.nan)
df = df.fillna(0)
For the first option, you'll need to import numpy first (import numpy as np). It must already be installed, since you have pandas.
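For reference, here is a minimal sketch of the flatten-then-deduplicate approach the asker started with, dropping the missing values explicitly instead of filling them (the sample data is made up):

import pandas as pd

df = pd.DataFrame({'a': ['x', 'y', None], 'b': [1.5, 'x', 3]})  # hypothetical data

# Flatten every cell into one Series, drop NaN/None, then deduplicate
flat = pd.Series(df.values.ravel())
unique_values = set(flat.dropna())
print(unique_values)  # e.g. {'x', 'y', 1.5, 3} (set order varies)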
I have searched the docs and SO and could not find anything to resolve my issue. I am trying to run a SELECT against my sqlite database and add the results to a dictionary with the columns as keys. When I do this it returns a row for each column/key: the table has 14 columns, so even with only 4 rows each one repeats 14 times. This was the first attempt:
columns = [desc[0] for desc in cursor.description]
results = []
for row in r:
    Summary = {}
    items = zip(columns, row)
    for (items, values) in items:
        Summary[items] = row
        results.append(Summary)
Then I also tried the row_factory as given in the docs. That didn't work either. My end goal is to be able to print out to a text file vertically by using:
for x in results:
    print x['name']
    print x['email']
    # etc.
Any help is appreciated
You are creating your dictionary incorrectly. Use:
for row in r:
    summary = dict(zip(columns, row))
    results.append(summary)
instead.
Your code sets the whole row sequence as the value for each key in Summary instead of the individual column value, and then appends that same dictionary to the results list once per column key.
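A self-contained sketch of the fix (the people table, its columns, and the sample rows are made up for illustration):

import sqlite3

conn = sqlite3.connect(':memory:')
cursor = conn.cursor()
cursor.execute('CREATE TABLE people (name TEXT, email TEXT)')
cursor.executemany('INSERT INTO people VALUES (?, ?)',
                   [('Ann', 'ann@example.com'), ('Bob', 'bob@example.com')])

cursor.execute('SELECT * FROM people')
# Build one dict per row, keyed by column name
columns = [desc[0] for desc in cursor.description]
results = [dict(zip(columns, row)) for row in cursor.fetchall()]

for x in results:
    print(x['name'])
    print(x['email'])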