Openpyxl: How do I get the values of a specific column? - python

I'm writing a program that searches through the first row of a sheet for a specific value ("Filenames"). Once found, it iterates through that column and returns the values underneath it (rows 2 through x).
I've figured out how to iterate through the first row in the sheet, and get the cell which contains the specific value, but now I need to iterate over that column and print out those values. How do I do so?
import os
import sys
from openpyxl import load_workbook

def main():
    column_value = 'Filenames'
    wb = load_workbook('test.xlsx')
    script = wb["Script"]
    # Find "Filenames"
    for col in script.iter_rows(min_row=1, max_row=1):
        for name in col:
            if (name.value == column_value):
                print("Found it!")
                filenameColumn = name
                print(filenameColumn)
    # Now that we have that column, iterate over the rows in that specific column to get the filenames
    for row in filenameColumn: # THIS DOES NOT WORK
        print(row.value)

main()

You're actually iterating over rows and cells, not columns and names here:
for col in script.iter_rows(min_row=1, max_row=1):
    for name in col:
If you rename the variables to match what they hold, you can see that you get a cell, like this:
for row in script.iter_rows(min_row=1, max_row=1):
    for cell in row:
        if (cell.value == column_value):
            print("Found it!")
            filenameCell = cell
            print(filenameCell)
So you have a cell, not a column. To get its column, use cell.column, which returns the column index.
Better than iterating over just the first row (which is what iter_rows with min_row and max_row set to 1 does) is to use iter_cols, which is built for this. So:
for col in script.iter_cols():
    # see if the value of the first cell matches
    if col[0].value == column_value:
        # this is the column we want; col is an iterable of cells:
        for cell in col[1:]:  # skip the header cell
            # do something with the cell in this column here
            print(cell.value)
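
As a rough sketch of the iter_cols approach, the lookup can be wrapped in a small helper. The function name column_values and the in-memory sample workbook are assumptions made for illustration; they are not part of the original question.

```python
from openpyxl import Workbook


def column_values(ws, header):
    """Return the values below `header` in the first row, or None if no column matches."""
    # values_only=True yields plain values instead of Cell objects
    for col in ws.iter_cols(min_row=1, values_only=True):
        if col[0] == header:
            return list(col[1:])  # everything under the header cell
    return None


# Hypothetical sample data standing in for test.xlsx
wb = Workbook()
ws = wb.active
ws.title = "Script"
ws.append(["Id", "Filenames"])
ws.append([1, "a.txt"])
ws.append([2, "b.txt"])

filenames = column_values(ws, "Filenames")
print(filenames)
```

Returning None for a missing header is one possible choice; raising a KeyError would work as well if a missing column should be treated as an error.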

Related

How to return non-empty top row column values of unknown column length in Excel?

Python version: 3.6
Python library: openpyxl
Excel version: 365
This returns the values of the first 255 cells in the top row of an Excel file. I only put 255 in as a temporary place to stop:
for row in ws.iter_rows(min_row=1, max_col=255, max_row=1, values_only=True):
    print(row)
I don't know how many columns with data will be in each workbook. All the top row cells that contain data will be consecutively listed starting from column 1.
When a top row cell without data is encountered, all remaining columns/rows will be empty.
I need the values of those consecutive top rows that contain values.
Thanks for the time.
@CharlieClark pointed me in the right direction. Something like this worked out for me. I still had to keep max_col=255 though or it would error out.
def column_get():
    for row in ws.iter_rows(min_row=1, max_col=255, max_row=1, values_only=True):
        for value in row:
            # stop at the first empty cell in the top row
            if value is None:
                break
            print(value)
If I understand you correctly, you can just remove the max_col value. iter_rows then stops at the last used column of the sheet (ws.max_column) instead of a fixed 255.
Try this:
for row in ws.iter_rows(min_row=1, max_row=1, values_only=True):
    print(row)
If you still see many None values, check whether the sheet contains cells that look empty but are still counted as part of the used range. I would suggest debugging it this way: create a new empty sheet and enter some test data manually, then see if it works. If it does, copy the data manually from the actual sheet into the test one.
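
To keep only the consecutive non-empty values at the start of the top row, one option is itertools.takewhile. This is a minimal sketch; the helper name leading_values and the sample tuple are assumptions, and the tuple stands in for one row produced by ws.iter_rows(..., values_only=True).

```python
from itertools import takewhile


def leading_values(row):
    """Return the values of `row` up to, but not including, the first None."""
    return list(takewhile(lambda value: value is not None, row))


# Stand-in for a row tuple as returned with values_only=True
sample_row = ("Name", "Date", "Total", None, None)
headers = leading_values(sample_row)
print(headers)  # ['Name', 'Date', 'Total']
```

With a real worksheet, the same helper could be applied to the single row yielded by ws.iter_rows(min_row=1, max_row=1, values_only=True).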

How do I get the value of a specific column / cell going horizontal in an excel using python

I recently wrote code that finds a keyword I enter by iterating over the rows of an Excel sheet. Once I find the keyword in a row, how do I move horizontally and get the value from another column in that same row?
A simple way to do this is to grab the value from a cell in a different column as you iterate over each row. Below, I'm assuming you are working from an existing workbook, which you can load by declaring the filepath variable.
import openpyxl

wb = openpyxl.load_workbook(filepath)
ws = wb.active
# Iterate each row of the spreadsheet
for row in ws.iter_rows():
    # Check if the value in column A is equal to variable "target"
    if row[0].value == target:
        # If there is a match, output is value in same row from column B
        output = row[1].value
In this example, you iterate through each row to check if the value in column A is equal to the target variable. If so, you can then retrieve any other value on that row by changing the index for the output variable.
Column index values run from 0 on, so row[0].value would be the value in the row for column A, row[1].value is the value in the row for column B, and so forth.
You have not said which library you are using, which would be essential for giving you any syntax hints. Openpyxl? Pandas?
So I can only give you some pointers for your code:
You have a function that iterates over the rows.
You should write the function so that it keeps track of which row it's checking and, when it finds the keyword, returns the row number. You could do this with the enumerate function, or with a simple counter:
def find_keyword_row(column, keyword):
    counter = 1
    for cell in column:
        if keyword == cell.value:
            return counter
        else:
            counter += 1
With the row number, all you need to do is to create a reference to the cell in which the value is, then add 1 column to the reference.
For example, if the reference for the keyword is (1, 2) (column, row) then you do a transformation like
keyword_ref = (1, 2)
value_ref = (keyword_ref[0] + 1, keyword_ref[1])
Finally you return the value in the value_ref.
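
The pointers above can be sketched without committing to a particular library. The function name value_right_of, its parameters, and the sample rows are assumptions for illustration; the rows could come from ws.iter_rows(values_only=True) in openpyxl or from any other source that yields one tuple per row.

```python
def value_right_of(rows, keyword, key_col=0, offset=1):
    """Find `keyword` in column `key_col` and return (row_number, neighbouring value).

    `rows` is any iterable of row tuples; row numbers start at 1, as in a spreadsheet.
    Returns None if the keyword is not found.
    """
    for row_number, row in enumerate(rows, start=1):
        if row[key_col] == keyword:
            # move `offset` columns to the right within the same row
            return row_number, row[key_col + offset]
    return None


# Hypothetical sample data: keyword in column A, value of interest in column B
rows = [("apple", 3), ("pear", 5), ("plum", 7)]
match = value_right_of(rows, "pear")
print(match)  # (2, 5)
```

Using enumerate with start=1 replaces the manual counter and keeps the row number aligned with what the spreadsheet shows.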

Get first empty row of sheet in Excel file with Python

I need to find the first empty row in an Excel file. I am currently using Openpyxl with Python.
I couldn't find any method that does what I need, so I am trying to make my own. This is my code:
from openpyxl import load_workbook

book = load_workbook("myfile.xlsx")
ws = book.worksheets[0]
for row in ws['C{}:C{}'.format(ws.min_row,ws.max_row)]:
    for cell in row:
        if cell.value is None:
            print(cell.value)
            break
I am iterating through all cells in the "C" column and breaking if the cell is empty. The problem is that it won't break; it just keeps printing "None" values.
Thanks
There is a built-in worksheet property "max_row" in openpyxl:
https://openpyxl.readthedocs.io/en/stable/api/openpyxl.worksheet.worksheet.html#openpyxl.worksheet.worksheet.Worksheet.max_row
max_row: an integer defining the maximum row index containing data
The loop below stops at the first empty cell in column C and prints its row number.
If you want the row to be completely empty, you can use all.
from openpyxl import load_workbook

book = load_workbook("myfile.xlsx")
ws = book.worksheets[0]
for cell in ws["C"]:
    if cell.value is None:
        print(cell.row)
        break
else:
    # no empty cell found: the first empty row is the one after the last row
    print(cell.row + 1)
Update to the question in the comments:
ws["C"] will get a slice from C1:CX where X is the last filled row in any column. So if the C column happens to be the longest column and every entry is filled, every cell satisfies cell.value is not None and you never break out of the loop. If you don't break out of the loop, you enter the else block, and since you looped until the last filled row, the first empty row is the next one.
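
Both variants, the first empty cell in one column and the first completely empty row, can be sketched as small functions. The function names and the in-memory sample workbook are assumptions for illustration; with a real file the worksheet would come from load_workbook instead.

```python
from openpyxl import Workbook


def first_empty_row(ws, column_letter):
    """Row number of the first empty cell in one column, e.g. column "C"."""
    for cell in ws[column_letter]:
        if cell.value is None:
            return cell.row
    # every cell down to the last used row is filled
    return ws.max_row + 1


def first_blank_row(ws):
    """Row number of the first row in which every cell is empty."""
    for row in ws.iter_rows(min_row=1):
        if all(cell.value is None for cell in row):
            return row[0].row
    return ws.max_row + 1


# Hypothetical sample data: C1 and C2 filled, row 3 empty, A4 filled
wb = Workbook()
ws = wb.active
ws["C1"] = "a"
ws["C2"] = "b"
ws["A4"] = "x"

empty_in_c = first_empty_row(ws, "C")
blank_row = first_blank_row(ws)
print(empty_in_c, blank_row)
```

Returning the row number instead of printing it makes the result easy to reuse, for example as the target row for the next write.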

How to iterate Pandas DataFrame (row-by-row) that has non-sequential index labels?

I am trying to iterate a Pandas DataFrame (row-by-row) that has non-sequential index labels. In other words, one Dataframe's index labels look like this: 2,3,4,5,6,7,8,9,11,12,.... There is no row/index label 10. I would like to iterate the DataFrame to update/edit certain values in each row based on a condition since I am reading Excel sheets (which has merged cells) into DataFrames.
I tried the following code (@Manuel's answer) to iterate through each row of df and edit each row if conditions apply.
for col in list(df): #All columns
    for row in df[1:].iterrows(): ##All rows, except first
        if pd.isnull(df.loc[row[0],'Album_Name']): ##If this cell is empty all in the same row too.
            continue
        elif pd.isnull(df.loc[row[0], col]) and pd.isnull(df.loc[row[0]+1, col]): ##If a cell and next one are empty, take previous value.
            df.loc[row[0], col] = df.loc[row[0]-1, col]
However, since the DataFrame has non-sequential index labels, I get the following error message: KeyError: the label [10] is not in the [index]. How can I iterate and edit the DataFrame (row-by-row) with non-sequential index labels?
For reference, here is what my Excel sheet and DataFrame looks like:
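Yes. The KeyError comes from label arithmetic: row[0]+1 and row[0]-1 assume that the neighbouring labels exist, which is not true when a label such as 10 is missing. Note also that iterating over df itself yields column names, not rows. Loop over row positions instead and look up the neighbours with iloc, so that gaps in the index labels do not matter:
album = df.columns.get_loc('Album_Name')  # position of the Album_Name column
for c in range(df.shape[1]): #All columns, by position
    for r in range(1, len(df) - 1): ##All rows except first and last, by position
        if pd.isnull(df.iloc[r, album]): ##If this cell is empty all in the same row too.
            continue
        elif pd.isnull(df.iloc[r, c]) and pd.isnull(df.iloc[r + 1, c]): ##If a cell and next one are empty, take previous value.
            df.iloc[r, c] = df.iloc[r - 1, c]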

Python 3 - openpyxl - Iterating through column by name

What is the easiest way using openpyxl to iterate through a column not by number but by column header (string value in first row of ws):
Something like this:
for cell in ws.columns['revenue']:
    print(cell.value)
Column headers don't exist so you'd have to create something to represent them, presumably based on the names in the first row:
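headers = {}
# ws[1] is the tuple of cells in the first row
for idx, cell in enumerate(ws[1], start=1):
    headers[cell.value] = idx
# ws.columns is a generator, so materialise it before indexing (0-based)
revenue = list(ws.columns)[headers['revenue'] - 1]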
ws.columns will return all columns which could be slow on a large worksheet.
You could also add a named range to represent the relevant cells and loop through that.
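
To avoid loading every column, the header lookup can be combined with iter_rows restricted to a single column. This is only a sketch: the helper name iter_column_by_header and the sample data are assumptions, and it raises KeyError if the header is not present in the first row.

```python
from openpyxl import Workbook


def iter_column_by_header(ws, header):
    """Yield the cells below `header`, where headers are the values in row 1."""
    headers = {cell.value: idx for idx, cell in enumerate(ws[1], start=1)}
    idx = headers[header]  # KeyError if the header is missing
    # restricting min_col and max_col reads only the one column
    for (cell,) in ws.iter_rows(min_row=2, min_col=idx, max_col=idx):
        yield cell


# Hypothetical sample data
wb = Workbook()
ws = wb.active
ws.append(["region", "revenue"])
ws.append(["north", 100])
ws.append(["south", 250])

revenue_values = [cell.value for cell in iter_column_by_header(ws, "revenue")]
print(revenue_values)
```

Because the function is a generator, cells are produced one at a time, which may help on large worksheets.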
