Python openpyxl writing into an Excel file in random locations - python

I'm working on an API that I was given and am having trouble printing to the correct row in an Excel file. The code is supposed to find the first row in the table that has any empty cells and then write to it. Sometimes the first available row is reported as a seemingly random row that is already full, so existing data gets overwritten, or rows that have data in only 11 of 30 cells are skipped entirely. I have two programs doing this and both have the same issue.
Here is the row selection and printing portion of the code.
wb = openpyxl.load_workbook(path)
ws = wb.active
# This loop goes over the Excel rows to find the first empty row. It increments the
# variable "firstEmptyRow" until it finds the first empty row.
firstEmptyRow = 0
for row in ws:
    if not any([cell.value == None for cell in row]):
        firstEmptyRow += 1
print(firstEmptyRow)
# The following lines will post the dataframes in the excel path.
with pd.ExcelWriter(path, mode="a", engine="openpyxl", if_sheet_exists="overlay") as writer:
    allDf.to_excel(writer, sheet_name="Main Table", index=False, header=False, startrow=firstEmptyRow, startcol=0)
with pd.ExcelWriter(path, mode="a", engine="openpyxl", if_sheet_exists="overlay") as writer:
    combinedArray.to_excel(writer, sheet_name="Main Table", index=False, header=False, startrow=firstEmptyRow, startcol=18)
Thank you for any help you have. :) Let me know if you have any questions.
To fix this, I've checked whether there is data in each row, and there is. Beyond this, I have deleted any data that may have been in cells outside of the table. Neither helped.
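One possible reading of the loop above is that it counts how many rows are completely full instead of stopping at the first row that has an empty cell, so a partially filled row in the middle of the table would shift the result. Below is a minimal sketch of the "stop at the first row with an empty cell" logic. It works on plain tuples of values, such as those yielded by ws.iter_rows(values_only=True); the helper name first_row_with_empty_cell and the sample table are made up for illustration. The result is 0-based, which is how pandas interprets startrow.

```python
def first_row_with_empty_cell(rows):
    """Return the 0-based index of the first row that contains a None value.

    If every row is full, return the number of rows (the index just past the end).
    """
    count = 0
    for index, row in enumerate(rows):
        count = index + 1
        if any(value is None for value in row):
            return index  # stop at the first row with an empty cell
    return count


# Stand-in data (hypothetical); with openpyxl this could be
# ws.iter_rows(values_only=True) on the sheet that is written to.
table = [
    ("id", "name", "score"),
    (1, "a", 10),
    (2, None, 20),
    (3, "c", 30),
]
first_empty_row = first_row_with_empty_cell(table)
print(first_empty_row)
```

It may also be worth checking that wb.active really is the "Main Table" sheet that to_excel writes to, since the row is computed on one and written to the other.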

Related

Python and Excel Formula

Complete beginner here but have a specific need to try and make my life easier with automating Excel.
I have a weekly report that contains a lot of useless columns and using Python I can delete these and rename them, with the code below.
from openpyxl import Workbook, load_workbook
wb = load_workbook('TestExcel.xlsx')
ws = wb.active
ws.delete_cols(1,3)
ws.delete_cols(3,8)
ws.delete_cols(4,3)
ws.insert_cols(3,1)
ws['A1'].value = "Full Name"
ws['C1'].value = "Email Address"
ws['C2'].value = '=B2&"#testdomain.com"'
wb.save('TestExcelUpdated.xlsx')
This does the job but I would like the formula to continue from B2 downwards (since the top row are headings).
ws['C2'].value = '=B2&"#testdomain.com"'
Obviously, in Excel it is just a case of dragging the formula down to the end of the column but I'm at a loss to get this working in Python. I've seen similar questions asked but the answers are over my head.
Would really appreciate a dummies guide.
Example of Excel report after Python code
One way to do this is by iterating over the rows in your worksheet.
for row in ws.iter_rows(min_row=2):  # min_row=2 skips the header row
    row[2].value = '=B' + str(row[0].row) + '&"#testdomain.com"'
row[2].value selects the third column because of zero-based indexing, and row[0].row gives the number of the current row.
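As a self-contained illustration of the same idea, here is a small sketch that builds a workbook in memory and writes one formula per data row by row number. The sample names are invented; only the formula text comes from the question.

```python
from openpyxl import Workbook

# Build a small in-memory workbook (sample data is hypothetical).
wb = Workbook()
ws = wb.active
ws.append(["Full Name", "User"])
ws.append(["Ann Lee", "alee"])
ws.append(["Bob Ray", "bray"])
ws["C1"] = "Email Address"

# Row 1 holds the headings, so the formulas start at row 2.
for row_number in range(2, ws.max_row + 1):
    ws.cell(row=row_number, column=3).value = f'=B{row_number}&"#testdomain.com"'

print(ws["C3"].value)
```

openpyxl stores the formula as text; Excel evaluates it when the saved file is opened.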

An empty data frame error when using read_excel(),skiprow

I have an Excel file with an explanation written in the first row as follows:
I want to use the second row as the header and convert the data below it into a dataframe.
So I wrote the following code, but the result is an empty dataframe:
df = pd.read_excel(filename, skiprows=1)
print(df) outputs:
Empty DataFrame
Columns: []
Index: []
If I open the file in Excel, delete the first row, and do not use skiprows, a correct dataframe appears. What should I fix?
Since there is only one sheet, I did not set sheet_name.
When I open the file as originally saved, the cursor is positioned at cell A1 (row 1, column A). If I move the cursor and save the file again, the dataframe comes out fine. How do I move the cursor and save it?
skiprows=1 works fine for me.
pd.read_excel('book1.xlsx', skiprows=1)
Here is an alternative using openpyxl, maybe you can give this a try and see -
import pandas as pd
from openpyxl import load_workbook
wb = load_workbook('book1.xlsx')
ws = wb['Sheet1']  # get_sheet_by_name() is deprecated
data = []
for row in ws.values:
    data.append(list(row))
# data[0] is the explanation row, data[1] the header, data[2:] the records
df = pd.DataFrame(data[2:], columns=data[1])
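Have you tried skiprows=[0]?
This is outlined here:
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_excel.html
The default header behaviour should then treat the new row 0 as the header.
When skiprows is given a list, the row numbers in it start from 0, so to skip the first row you pass skiprows=[0]. A plain integer is a count of rows to skip instead, so skiprows=1 does the same thing and skiprows=0 skips nothing.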
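To make the two spellings concrete, here is a small sketch. It uses pd.read_csv on an in-memory string as a stand-in, on the assumption that skiprows is interpreted the same way there as in read_excel; the sample data is made up.

```python
import io

import pandas as pd

# First line is an explanation, second line is the real header (sample data is hypothetical).
raw = "This line is an explanation\nname,score\nann,1\nbob,2\n"

by_count = pd.read_csv(io.StringIO(raw), skiprows=1)    # skip the first 1 row
by_index = pd.read_csv(io.StringIO(raw), skiprows=[0])  # skip the row with index 0

print(list(by_count.columns))
print(by_count.equals(by_index))
```

If both forms return an empty dataframe for a real file, the cause is probably in the file itself rather than in how skiprows is spelled.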

Copy column of cell values from one workbook to another with openpyxl

I am extracting data from one workbook's column and need to copy the data to another existing workbook.
This is how I extract the data (works fine):
wb2 = load_workbook('C:\\folder\\AllSitesOpen2.xlsx')
ws2 = wb2['report1570826222449']
#Extract column A from Open Sites
DateColumnA = []
for row in ws2.iter_rows(min_row=16, max_row=None, min_col=1, max_col=1):
    for cell in row:
        DateColumnA.append(cell.value)
DateColumnA
The above code successfully outputs the cell values in each row of the first column to DateColumnA
I'd like to paste the values stored in DateColumnA to this existing destination workbook:
#file to be pasted into
wb3 = load_workbook('C:\\folder\\output.xlsx')
ws3 = wb3['Sheet1']
But I am missing a piece conceptually here. I can't connect the dots. Can someone advise how I can get this data from my source workbook to the new destination workbook?
Let's say you want to copy the column starting in cell 'A1' of 'Sheet1' in wb3:
wb3 = load_workbook('C:\\folder\\output.xlsx')
ws3 = wb3['Sheet1']
for counter in range(len(DateColumnA)):
    cell_id = 'A' + str(counter + 1)
    ws3[cell_id] = DateColumnA[counter]
wb3.save('C:\\folder\\output.xlsx')
I ended up getting this to write the list to another pre-existing spreadsheet:
for x, rows in enumerate(DateColumnA):
    ws3.cell(row=x+1, column=1).value = rows
    #print(rows)
wb3.save('C:\\folder\\output.xlsx')
Works great but now I need to determine how to write the data to output.xlsx starting at row 16 instead of row 1 so I don't overwrite the first 16 existing header rows in output.xlsx. Any ideas appreciated.
I figured out a more concise way to write the source data to a different starting row on the destination sheet in a different workbook. I do not need to dump the values into a list as I did above. iter_rows does all the work, and openpyxl nicely passes the values to a different workbook and worksheet:
row_offset=5
for rows in ws2.iter_rows(min_row=2, max_row=None, min_col=1, max_col=1):
    for cell in rows:
        ws3.cell(row=cell.row + row_offset, column=1, value=cell.value)
wb3.save('C:\\folder\\DestFile.xlsx')
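The offset in the snippet above is simply the destination start row minus the source start row. Here is a minimal in-memory sketch of that arithmetic. The start rows (2 in the source, 16 in the destination) and the sample dates are assumptions chosen for illustration, not values taken from the real files.

```python
from openpyxl import Workbook

# Source sheet: one header row followed by three values (hypothetical data).
src_wb = Workbook()
src_ws = src_wb.active
for value in ["header", "2019-10-01", "2019-10-02", "2019-10-03"]:
    src_ws.append([value])

dst_wb = Workbook()
dst_ws = dst_wb.active

src_start_row = 2    # first source row to copy (assumed)
dst_start_row = 16   # first destination row to write (assumed)
row_offset = dst_start_row - src_start_row

# With min_col == max_col, each yielded row is a one-element tuple.
for (cell,) in src_ws.iter_rows(min_row=src_start_row, min_col=1, max_col=1):
    dst_ws.cell(row=cell.row + row_offset, column=1, value=cell.value)

print(dst_ws["A16"].value)
```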

Iterate through columns in Read-only workbook in openpyxl

I have a somewhat large .xlsx file - 19 columns, 5185 rows. I want to open the file, read all the values in one column, do some stuff to those values, and then create a new column in the same workbook and write out the modified values. Thus, I need to be able to both read and write in the same file.
My original code did this:
def readExcel(doc):
    wb = load_workbook(generalpath + exppath + doc)
    ws = wb["Sheet1"]
    # iterate through the columns to find the correct one
    for col in ws.iter_cols(min_row=1, max_row=1):
        for mycell in col:
            if mycell.value == "PerceivedSound.RESP":
                origCol = mycell.column
    # get the column letter for the first empty column to output the new values
    newCol = utils.get_column_letter(ws.max_column+1)
    # iterate through the rows to get the value from the original column,
    # do something to that value, and output it in the new column
    for myrow in range(2, ws.max_row+1):
        myrow = str(myrow)
        # do some stuff to make the new value
        cleanedResp = doStuff(ws[origCol + myrow].value)
        ws[newCol + myrow] = cleanedResp
    wb.save(doc)
However, python threw a memory error after row 3853 because the workbook was too big. The openpyxl docs said to use Read-only mode (https://openpyxl.readthedocs.io/en/latest/optimized.html) to handle big workbooks. I'm now trying to use that; however, there seems to be no way to iterate through the columns when I add the read_only = True param:
def readExcel(doc):
    wb = load_workbook(generalpath + exppath + doc, read_only=True)
    ws = wb["Sheet1"]
    for col in ws.iter_cols(min_row=1, max_row=1):
        #etc.
python throws this error:
AttributeError: 'ReadOnlyWorksheet' object has no attribute 'iter_cols'
If I change the final line in the above snippet to:
for col in ws.columns:
python throws the same error:
AttributeError: 'ReadOnlyWorksheet' object has no attribute 'columns'
Iterating over rows is fine (and is included in the documentation I linked above):
for col in ws.rows:
(no error)
This question asks about the AttributeError, but the solution there is to remove Read-only mode, which doesn't work for me because openpyxl can't read my entire workbook outside Read-only mode.
So: how do I iterate through columns in a large workbook?
And I haven't yet encountered this, but I will once I can iterate through the columns: how do I both read and write the same workbook, if said workbook is large?
Thanks!
If the worksheet has only around 100,000 cells then you shouldn't have any memory problems. You should probably investigate this further.
iter_cols() is not available in read-only mode because it requires constant and very inefficient reparsing of the underlying XML file. It is, however, relatively easy to convert rows into columns from iter_rows() using zip.
def _iter_cols(self, min_col=None, max_col=None, min_row=None,
               max_row=None, values_only=False):
    yield from zip(*self.iter_rows(
        min_row=min_row, max_row=max_row,
        min_col=min_col, max_col=max_col, values_only=values_only))
import types
for sheet in workbook:
    sheet.iter_cols = types.MethodType(_iter_cols, sheet)
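The zip trick can be seen in isolation on plain tuples. This is a minimal sketch with invented sample values; the rows stand in for what iter_rows(values_only=True) would yield.

```python
# Rows as tuples of values (sample data is hypothetical).
rows = [
    ("id", "PerceivedSound.RESP", "rt"),
    (1, "loud", 350),
    (2, "quiet", 410),
]

# zip(*rows) regroups the first items of every row, then the second items, and so on.
columns = list(zip(*rows))

# Pick the column whose header cell matches the one we are looking for.
target = next(col for col in columns if col[0] == "PerceivedSound.RESP")
print(target[1:])
```

Note that zip(*rows) has to consume every row before it can yield the first column, so all cell values are held in memory at once.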
According to the documentation, ReadOnly mode only supports row-based reads (column reads are not implemented). But that's not hard to solve:
from openpyxl import Workbook
wb2 = Workbook(write_only=True)
ws2 = wb2.create_sheet()
# find what column I need
colcounter = 0
for row in ws.rows:
    for cell in row:
        if cell.value == "PerceivedSound.RESP":
            break
        colcounter += 1
    # cells are apparently linked to the parent workbook meta
    # this will retain only values; you'll need custom
    # row constructor if you want to retain more
    row2 = [cell.value for cell in row]
    ws2.append(row2) # preserve the first row in the new file
    break # stop after first row
for row in ws.iter_rows(min_row=2): # skip the header row, already copied above
    row2 = [cell.value for cell in row]
    row2.append(doStuff(row2[colcounter]))
    ws2.append(row2) # write a new row to the new wb
wb2.save('newfile.xlsx')
wb.close()
wb2.close()
# copy `newfile.xlsx` to `generalpath + exppath + doc`
# Either using os.system,subprocess.popen, or shutil.copy2()
You will not be able to write to the same workbook, but as shown above you can open a new workbook (in write-only mode), write to it, and overwrite the old file using an OS-level copy.
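The final copy step could look roughly like this with shutil.copy2. This sketch uses a temporary directory and placeholder bytes instead of real workbooks, and the file names are stand-ins.

```python
import shutil
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    original = Path(tmp) / "original.xlsx"  # stands in for generalpath + exppath + doc
    new_file = Path(tmp) / "newfile.xlsx"   # stands in for the file saved by wb2.save()
    original.write_bytes(b"old contents")   # placeholder bytes, not a real workbook
    new_file.write_bytes(b"new contents")

    # Overwrite the original with the newly written file.
    shutil.copy2(new_file, original)
    result = original.read_bytes()

print(result)
```

Both workbooks should be closed before the copy, as in the snippet above, so that the source file is no longer open when it is overwritten.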

Openpyxl and Hidden/Unhidden Excel Worksheets

I have the following code that reads data from a tab-delimited text file and then writes it to a specified worksheet within an existing Excel workbook. The variables "workbook", "write_sheet", and "text_file" are input by the user.
tab_reader = csv.reader(text_file, delimiter='\t')
xls_book = openpyxl.load_workbook(filename=workbook)
sheet_names = xls_book.get_sheet_names()
xls_sheet = xls_book.get_sheet_by_name(write_sheet)
for row_index, row in enumerate(tab_reader):
    number = 0
    col_number = first_col
    while number < num_cols:
        cell_tmp = xls_sheet.cell(row = row_index, column = col_number)
        cell_tmp.value = row[number]
        number += 1
        col_number += 1
xls_book.save(workbook)
However, when I run this code on a preexisting "workbook" in which "worksheet" is a hidden tab, the output unhides the tab. I think this is because openpyxl is not modifying the file but creating a new file entirely. Is there an easy way to tell Python to check whether the worksheet is hidden and then output a hidden or unhidden sheet accordingly?
Thanks!
We currently don't support hiding worksheets in openpyxl so this is just ignored when reading the file and, therefore, lost when saving it. I don't think it should be too hard to add it. Please submit a feature request on Bitbucket.
[UPDATE]
The feature is now available:
ws.sheet_state = 'hidden'
Or actually xls_sheet.sheet_state = 'hidden' in your particular case.
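For the "check whether it is hidden" part of the question, sheet_state can be read as well as set. Here is a minimal in-memory sketch; the sheet name "Data" and the cell value are made up.

```python
from openpyxl import Workbook

wb = Workbook()
visible_ws = wb.active               # default sheet, stays visible
hidden_ws = wb.create_sheet("Data")  # hypothetical sheet name
hidden_ws.sheet_state = "hidden"

# Remember the state before writing, then put it back afterwards.
was_hidden = hidden_ws.sheet_state == "hidden"
hidden_ws["A1"] = "new value"
if was_hidden:
    hidden_ws.sheet_state = "hidden"

print(hidden_ws.sheet_state, visible_ws.sheet_state)
```

At least one sheet in the workbook has to stay visible for the file to be saved, which is why the default sheet is left untouched here.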
