Python 3 - openpyxl - Iterating through column by name


What is the easiest way, using openpyxl, to iterate through a column not by number but by its column header (the string value in the first row of ws)?
Something like this:
for cell in ws.columns['revenue']:
    print(cell.value)

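Worksheets have no built-in notion of column headers, so you have to create something to represent them, presumably based on the names in the first row:
headers = {}
# iter_rows yields whole rows, so take the first (and only) row and enumerate its cells
first_row = next(ws.iter_rows(min_row=1, max_row=1))
for idx, cell in enumerate(first_row, start=1):
    headers[cell.value] = idx
# Read only the requested column rather than every column in the sheet
revenue = next(ws.iter_cols(min_col=headers['revenue'], max_col=headers['revenue']))
In current openpyxl versions ws.columns is a generator, so it cannot be indexed directly. It also yields every column, which could be slow on a large worksheet.
You could also add a named range to represent the relevant cells and loop through that.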
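A minimal sketch of the same header-lookup idea, written against plain row tuples so it runs without a workbook. The function name column_by_header and the sample data are illustrative assumptions; on a real worksheet the rows could come from ws.iter_rows(values_only=True).

```python
def column_by_header(rows, header):
    """Yield the values under `header`, given an iterable of row tuples."""
    rows = iter(rows)
    headers = next(rows)           # the first row holds the column names
    idx = headers.index(header)    # raises ValueError if the name is missing
    for row in rows:
        yield row[idx]


# Stand-in for ws.iter_rows(values_only=True); the data is made up
sample_rows = [
    ("region", "revenue"),
    ("north", 100),
    ("south", 250),
]

revenue = list(column_by_header(sample_rows, "revenue"))
print(revenue)
```

Because the function is a generator, it walks the rows only once and never materialises the columns it does not need.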

Related

How to return non-empty top row column values of unknown column length in Excel?

Python version: 3.6
Python library: openpyxl
Excel version: 365
This returns the values from each cell in the first 255 columns of the top row of an Excel file. I only put 255 in as a temporary stopping point:
for row in ws.iter_rows(min_row=1, max_col=255, max_row=1, values_only=True):
    print(row)
I don't know how many columns with data will be in each workbook. All the top-row cells that contain data are listed consecutively, starting from column 1.
Once a top-row cell without data is encountered, all remaining columns/rows will be empty.
I need the values of those consecutive top-row cells that contain values.
Thanks for the time.

@CharlieClark pointed me in the right direction. Something like this worked out for me. I still had to keep max_col=255 though, or it would error out.
def column_get():
    for row in ws.iter_rows(min_row=1, max_col=255, max_row=1, values_only=True):
        for value in row:
            # Stop at the first empty cell in the top row
            if value is None:
                break
            print(value)

If I understand you correctly, you can just remove the max_col argument. iter_rows then returns the whole first row, up to the last column the sheet uses.
Try this:
for row in ws.iter_rows(min_row=1, max_row=1, values_only=True):
    print(row)
If you still see many None values, check whether the sheet's first row contains a value that is mistakenly interpreted as None. I would suggest debugging it this way: create a new empty sheet and manually insert some test data, then see if it works. If it does, manually copy and paste the data from the actual sheet into the test one.
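To keep only the leading non-empty header values, the row tuple can be cut at the first None. This is a small sketch using itertools.takewhile; the helper name leading_values and the sample tuple are assumptions, standing in for a row produced by iter_rows(..., values_only=True).

```python
from itertools import takewhile


def leading_values(row):
    """Return the values of `row` up to, but not including, the first None."""
    return list(takewhile(lambda value: value is not None, row))


# Stand-in for the single tuple yielded by
# ws.iter_rows(min_row=1, max_row=1, values_only=True)
top_row = ("id", "name", "price", None, None)

headers = leading_values(top_row)
print(headers)
```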

Openpyxl: How do I get the values of a specific column?

I'm writing a program that searches through the first row of a sheet for a specific value ("Filenames"). Once found, it iterates through that column and returns the values underneath it (rows 2 through x).
I've figured out how to iterate through the first row in the sheet, and get the cell which contains the specific value, but now I need to iterate over that column and print out those values. How do I do so?
import os
import sys
from openpyxl import load_workbook

def main():
    column_value = 'Filenames'
    wb = load_workbook('test.xlsx')
    script = wb["Script"]
    # Find "Filenames"
    for col in script.iter_rows(min_row=1, max_row=1):
        for name in col:
            if (name.value == column_value):
                print("Found it!")
                filenameColumn = name
                print(filenameColumn)
    # Now that we have that column, iterate over the rows in that specific column to get the filenames
    for row in filenameColumn:  # THIS DOES NOT WORK
        print(row.value)

main()

You're actually iterating over rows and cells here, not columns and names:
for col in script.iter_rows(min_row=1, max_row=1):
    for name in col:
If you rename the variables accordingly, you can see that you get a cell, like this:
for row in script.iter_rows(min_row=1, max_row=1):
    for cell in row:
        if (cell.value == column_value):
            print("Found it!")
            filenameCell = cell
            print(filenameCell)
So you have a cell, not a column. You can get its column with cell.column, which returns a column index.
Better than iterating over just the first row (which is what iter_rows with min_row and max_row set to 1 does) is to use iter_cols, which is built for this. So:
for col in script.iter_cols():
    # see if the value of the first cell matches
    if col[0].value == column_value:
        # this is the column we want; col is a tuple of cells
        for cell in col[1:]:  # skip the header cell
            # do something with the cell in this column here
            print(cell.value)
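The column-wise search can be sketched on plain tuples as well. Here each tuple stands in for one column as yielded by iter_cols(values_only=True); the function name values_under_header and the sample data are hypothetical.

```python
def values_under_header(columns, header):
    """Return the values below `header`, given an iterable of column tuples."""
    for col in columns:
        if col and col[0] == header:
            return list(col[1:])   # drop the header cell itself
    return None                    # no column carries that header


# Stand-in for script.iter_cols(values_only=True); the data is made up
sample_columns = [
    ("ID", 1, 2, 3),
    ("Filenames", "a.txt", "b.txt", "c.txt"),
]

filenames = values_under_header(sample_columns, "Filenames")
print(filenames)
```

Returning None for a missing header lets the caller decide whether that is an error, instead of failing inside the loop.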

How to compare column values of one excel file to the column values of another excel file in Python using openpyxl?

I am able to read the column data of two Excel files. Below is my code:
from openpyxl import load_workbook
book = load_workbook("Book1.xlsx")
book2 = load_workbook("Book2.xlsx")
sheets = book['Sheet1']
anotherSheet = book2["sheet1"]
for val1 in sheets:
    print(val1[0].value)
print("\n\n\n\n")
for val2 in anotherSheet:
    print(val2[0].value)
I need to compare each value of Book1's column to every value of Book2's column. I am totally confused about how to perform the comparison. If the value matches then I can add another column and put "Yes" and if it doesn't then I can put "No". In other words, I just need to check if the values of Book1's Column exist in Book2's. Some help would be highly appreciated.

I don't know the full answer, but I guess you can load the values into arrays and compare them one by one.

Finally, figured out the solution.
First, we need to create 3 Lists to store values from book1, book2 and tempList to store matched values.
from openpyxl import load_workbook
book = load_workbook("Book1.xlsx")
book2 = load_workbook("Book2.xlsx")
sheets = book['Sheet1']
anotherSheet = book2["sheet1"]
book1_list = []
book2_list = []
tempList = []
Next, we also want to skip the heading of the columns and store in new variable.
skip_Head_of_anotherSheet = anotherSheet[2: anotherSheet.max_row]
Then iterate through sheets and append the values of your required column to their respective lists (in my case it was '0' which means the first column).
for val1 in sheets:
    book1_list.append(val1[0].value)
for val2 in skip_Head_of_anotherSheet:
    book2_list.append(val2[0].value)
Check for repetitions in your lists and remove any duplicate values.
book1_list = list(dict.fromkeys(book1_list))
Store the length of your lists for debugging purposes
length_of_firstList = len(book1_list)
length_of_secondList = len(book2_list)
Next, iterate through both the lists and check if any of them matches, then store the matched values to the tempList.
for i in book1_list:
    for j in book2_list:
        if i == j:
            tempList.append(j)
            #print(j)
Now, it's time to edit our excel sheet. We will iterate through matched values that are stored inside tempList and find those values that are inside the actual excel sheet. When we detect the same value, we will mark YES to the 4th Column of the excel sheet i.e. 'D' column by identifying the index of that particular row. Additionally, if the cells are blank on our 'D' column then we will mark NO.
for temp in tempList:
    for pointValue in skip_Head_of_anotherSheet:
        if temp == pointValue[0].value:
            anotherSheet.cell(column=4, row=pointValue[0].row, value="YES")
            #print(pointValue[0].row)
        if pointValue[3].value is None:
            anotherSheet.cell(column=4, row=pointValue[0].row, value="NO")
Finally, we will add a header to our newly populated column & save our excel sheet and print required information for debugging purposes.
anotherSheet.cell(column=4, row=1, value="PII")
book2.save("Book2.xlsx")
print("SUCCESSFULLY UPDATED THE EXCEL SHEET")
print("Length of First List = ", length_of_firstList)
print("Length of Second List = ", length_of_secondList)
I hope this will help someone with the same issue.
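The nested loops above can also be replaced by a set lookup, which checks each value once instead of comparing every pair. This is only a sketch on plain lists: the function name mark_matches and the sample values are assumptions, and writing the flags back to column D would still be done with anotherSheet.cell(...) as in the answer.

```python
def mark_matches(values, reference):
    """Return "YES" or "NO" for each value, depending on whether it occurs in `reference`."""
    lookup = set(reference)        # duplicates collapse here automatically
    return ["YES" if value in lookup else "NO" for value in values]


# Made-up stand-ins for the first column of each workbook (headers removed)
book1_values = ["alice", "bob", "carol", "bob"]
book2_values = ["bob", "dave", "alice"]

flags = mark_matches(book2_values, book1_values)
print(flags)
```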

How do I get the value of a specific column / cell going horizontal in an excel using python

I recently developed code to find a keyword I input and it finds the keyword by iterating over the rows of an excel sheet, but when I find that keyword in the row how do I move horizontally and get the value from a column cell in the very row I found the keyword in?

A simple way to do this is to grab the value from a cell in a different column as you iterate over each row. Below, I'm assuming you are working from an existing workbook, which you can load by declaring the filepath variable.
import openpyxl
wb = openpyxl.load_workbook(filepath)
ws = wb.active
# Iterate each row of the spreadsheet
for row in ws.iter_rows():
    # Check if the value in column A is equal to variable "target"
    if row[0].value == target:
        # If there is a match, output is value in same row from column B
        output = row[1].value
In this example, you iterate through each row to check if the value in column A is equal to the target variable. If so, you can then retrieve any other value on that row by changing the index for the output variable.
Column index values run from 0 on, so row[0].value would be the value in the row for column A, row[1].value is the value in the row for column B, and so forth.
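As a self-contained illustration of the same row lookup, the sketch below works on plain tuples in place of worksheet rows. The function name lookup_in_row, its key_col and value_col parameters, and the sample data are all hypothetical.

```python
def lookup_in_row(rows, target, key_col=0, value_col=1):
    """Return the value in `value_col` of the first row whose `key_col` equals `target`."""
    for row in rows:
        if row[key_col] == target:
            return row[value_col]
    return None                    # target not found in any row


# Stand-in for ws.iter_rows(values_only=True); the data is made up
sample_rows = [
    ("apple", 3),
    ("pear", 7),
    ("plum", 2),
]

output = lookup_in_row(sample_rows, "pear")
print(output)
```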

You have not given much information here as to what library you are using, which would be essential to give you any syntax hints. Openpyxl? Pandas?
So I can just help you with some pointers for your code:
You have a function that iterates over the rows.
You should write the function so that it keeps track of which row it's checking and, when it finds the keyword, returns the row number. You could do this with the enumerate function, or with a simple counter:
def find_row(column, keyword):
    counter = 1
    for cell in column:
        if keyword == cell.value:
            return counter
        counter += 1
    # keyword not found
    return None
With the row number, all you need to do is to create a reference to the cell in which the value is, then add 1 column to the reference.
For example, if the reference for the keyword is (1, 2) (column, row) then you do a transformation like
keyword_ref = (1, 2)
value_ref = (keyword_ref[0] + 1, keyword_ref[1])
Finally you return the value in the value_ref.
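The enumerate variant mentioned above might look like the following sketch. It operates on a plain list of values rather than cell objects, and the name find_row_number and the sample data are assumptions.

```python
def find_row_number(values, keyword):
    """Return the 1-based position of `keyword` in `values`, or None if absent."""
    for row_number, value in enumerate(values, start=1):
        if value == keyword:
            return row_number
    return None


# Made-up values for a single column, top to bottom
column_values = ["name", "apple", "pear"]

row_number = find_row_number(column_values, "pear")

# Build the (column, row) reference of the keyword, then shift one column right
keyword_ref = (1, row_number)
value_ref = (keyword_ref[0] + 1, keyword_ref[1])
print(value_ref)
```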

Returning unique values in .csv and unique strings in python+pandas

my question is very similar to here: Find unique values in a Pandas dataframe, irrespective of row or column location
I am very new to coding, so I apologize for the cringing in advance.
I have a .csv file which I open as a pandas dataframe, and would like to be able to return unique values across the entire dataframe, as well as all unique strings.
I have tried:
for row in df:
    pd.unique(df.values.ravel())
This fails to iterate through rows.
The following code prints what I want:
for index, row in df.iterrows():
    if isinstance(row, object):
        print('%s\n%s' % (index, row))
However, trying to place these values into a previously defined set (myset = set()) fails when I hit a blank column (NoneType error):
for index, row in df.iterrows():
    if isinstance(row, object):
        myset.update(print('%s\n%s' % (index, row)))
I get closest to what I want when I try the following:
for index, row in df.iterrows():
    if isinstance(row, object):
        myset.update('%s\n%s' % (index, row))
However, my set prints out a list of characters rather than the strings/floats/values that appear on my screen when I print above.
Someone please help point out where I fail miserably at this task. Thanks!

I think the following should work for almost any dataframe. It extracts every value that is unique across the entire dataframe.
Post a comment if you encounter a problem, and I'll try to solve it.
# Replace all Nones / NaNs with empty strings so they won't bother us later
df = df.fillna('')
# Prepare a list
list_sets = []
# Iterate over all columns (much faster than rows)
for col in df.columns:
    # List containing all the unique values of this column
    this_set = list(set(df[col].values))
    # Build a combined list
    list_sets = list_sets + this_set
# Take the set of the combined list
final_set = list(set(list_sets))
# For completeness, remove the empty string introduced by the fillna step
final_set.remove('')
Edit:
I think I know what happens. You must have some float columns, and fillna is failing on those, because the code I gave you replaces missing values with an empty string. Try one of these:
df = df.fillna(np.nan) or
df = df.fillna(0)
For the first option, you'll need to import numpy first (import numpy as np). It must already be installed, since you have pandas.
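A shorter route to the same result is to flatten the dataframe and drop missing values before deduplicating, which avoids the fillna step altogether. This is a sketch, not the answerer's method; the function name unique_values and the sample dataframe are assumptions.

```python
import pandas as pd


def unique_values(df):
    """Return the distinct non-missing values found anywhere in `df`."""
    flat = pd.Series(df.to_numpy().ravel())   # every cell in one long Series
    return flat.dropna().unique().tolist()


# Made-up data with mixed strings, floats and a missing value
df = pd.DataFrame({"a": ["x", "y", None], "b": [1.5, "x", 2.0]})

values = unique_values(df)
unique_strings = [v for v in values if isinstance(v, str)]
print(values)
print(unique_strings)
```

Filtering with isinstance afterwards keeps the string-only list separate from the full list of unique values, which matches the two results the question asks for.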
