Copying/pasting a column of formulas using python - python

I have a very large Excel file that I'm working with in Python. One column has a different formula in every cell. I want to copy those formulas and paste them one column over, from column GD to GE.
I want the formulas to update the way they do in Excel; the problem is that Excel itself takes a very long time to copy and paste because the file is so large.
Any ideas on how to use openpyxl's Translator to do this, or any other approach?
from openpyxl import load_workbook
import pandas as pd
#loads the excel file and is now saved under workbook#
workbook = load_workbook('file.xlsx')
#uses the individual sheets index(first sheet = 0) to work on one sheet at a time#
sheet= workbook.worksheets[8]
#inserts a column at specified index number#
sheet.insert_cols(187)
#naming the new columns#
sheet['GE2']= '20220531'
Here is my updated code:
from openpyxl import load_workbook
from openpyxl.formula.translate import Translator
#loads the excel file and is now saved under workbook#
workbook = load_workbook('file.xlsx')
#uses the individual sheets index(first sheet = 0) to work on one sheet at a time#
sheet= workbook.worksheets[8]
formula = sheet['GD3'].value
new_formula = Translator(formula, origin= 'GE3').translate_formula("GD3")
sheet['GD2'] = new_formula
for row in sheet.iter_rows(min_col=187, max_col=188):
    old, new = row
    if new.data_type != "f":
        continue
    new_formula = Translator(new.value, origin=old.coordinate).translate_formula(new.coordinate)
workbook.save('file.xlsx')

When you add or remove columns and rows, Openpyxl does not manage formulae for you. The reason for this is simple: where should it stop? Managing a "dependency graph" is exactly the kind of functionality that an application like MS Excel provides.
But it is quite easy to do this in your own code using the formula Translator:
from openpyxl.formula.translate import Translator
# insert the column
formula = ws['GE1'].value
new_formula = Translator(formula, origin="GD1").translate_formula("GE1")
ws['GE1'] = new_formula
It should be fairly straightforward to create a loop for this (check the data type and use cell.coordinate to avoid potential typos or incorrect adjustments).
ws.insert_cols(187)
for row in ws.iter_rows(min_col=187, max_col=188):
    old, new = row
    if new.data_type != "f":
        continue
    new_formula = Translator(new.value, origin=old.coordinate).translate_formula(new.coordinate)
    new.value = new_formula  # write the adjusted formula back to the cell
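For the goal stated in the question (copying the formulas in GD into GE, with relative references shifted as Excel would do), a minimal sketch could look like the one below. The helper name copy_formula_column and the small in-memory demo workbook are assumptions made for illustration; columns A and B stand in for GD and GE.

```python
from openpyxl import Workbook
from openpyxl.formula.translate import Translator
from openpyxl.utils import column_index_from_string


def copy_formula_column(ws, src_letter, dst_letter):
    """Copy each formula in src_letter to dst_letter, shifting relative references.

    Hypothetical helper, not part of openpyxl. Returns the number of formulas copied.
    """
    src_idx = column_index_from_string(src_letter)
    dst_idx = column_index_from_string(dst_letter)
    copied = 0
    for (src,) in ws.iter_rows(min_col=src_idx, max_col=src_idx):
        if src.data_type != "f":
            continue  # skip headers, constants and empty cells
        dst = ws.cell(row=src.row, column=dst_idx)
        dst.value = Translator(src.value, origin=src.coordinate).translate_formula(dst.coordinate)
        copied += 1
    return copied


# Tiny in-memory demo: A/B play the role of GD/GE
wb = Workbook()
ws = wb.active
ws["A1"] = "header"
ws["A2"] = "=C2+D2"
ws["A3"] = "=SUM($C$1:C3)"
count = copy_formula_column(ws, "A", "B")
```

With a real file you would call copy_formula_column(sheet, "GD", "GE") after load_workbook and then save. Absolute references (those with $) are left alone by the Translator, as they would be in Excel.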

Related

Format and manipulate data across multiple Excel sheets in Python using openpyxl before converting to Dataframe

I need some help with editing the sheets within my Excel workbook in python, before I stack the data using pd.concat(). Each sheet (~100) within my Excel workbook is structured identically, with the unique identifier for each sheet being a 6-digit code that is found in line 1 of the worksheet.
I've already done the following steps to import the file, unmerge rows 1-4, and insert a new column 'C':
import openpyxl
import pandas as pd
wb = openpyxl.load_workbook('data_sheets.xlsx')
for sheet in wb.worksheets:
    for merge in list(sheet.merged_cells):
        sheet.unmerge_cells(range_string=str(merge))
    sheet.insert_cols(3, 1)
    print(sheet)
wb.save('workbook_test.xlsx')
#concat once worksheets have been edited
df = pd.concat(pd.read_excel('workbook_test.xlsx', sheet_name=None), ignore_index=True)
Before stacking the data, however, I would like to make the following additional (sequential) changes to every sheet:
Extract the rightmost 8 characters from row 1 (the Excel equivalent would be =RIGHT(A1, 8)). This pulls the unique code off each sheet, which will look like '(000000)'.
Populate column C from rows 6-282 with the unique code.
Delete rows 1-5
The end result would make each sheet within the workbook look like this:
Is this possible to do with openpyxl, and if so, how? Any direction or assistance with this would be much appreciated - thank you!
Here is a 100% openpyxl approach to achieve what you're looking for:
from openpyxl import load_workbook
wb = load_workbook("workbook_test.xlsx")
for ws in wb:
    ws.unmerge_cells("A1:O1") #unmerge first row till O
    ws_uid = ws.cell(row=1, column=1).value[-8:] #get the sheet's UID
    for num_row in range(6, 283): #rows 6-282 inclusive
        ws.cell(row=num_row, column=3).value = '="{}"'.format(ws_uid) #write UID in Column C
    ws.delete_rows(1, 5) #delete first 5 rows
wb.save("workbook_test.xlsx")
NB: This assumes there is already an empty column (C).
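One thing to watch if the sheets are later read with pd.read_excel: openpyxl does not evaluate formulas, so a cell written as ="(000000)" has no cached value until Excel opens and saves the file, and pandas would likely read it as empty. A minimal sketch that writes the code as a plain string instead is shown here; the helper name tag_sheet_with_uid, its default arguments and the demo data are assumptions for illustration.

```python
from openpyxl import Workbook


def tag_sheet_with_uid(ws, first_row=6, last_row=282, uid_col=3, header_rows=5):
    """Write the last 8 characters of A1 into uid_col, then drop the header rows.

    Hypothetical helper; the row and column defaults mirror the question.
    """
    uid = str(ws.cell(row=1, column=1).value)[-8:]
    for row in range(first_row, last_row + 1):  # inclusive of last_row
        ws.cell(row=row, column=uid_col, value=uid)  # plain string, not a formula
    ws.delete_rows(1, header_rows)
    return uid


# In-memory demo with a shortened data range (rows 6-8)
wb = Workbook()
ws = wb.active
ws["A1"] = "Regional summary (123456)"
ws["A6"] = "first data row"
uid = tag_sheet_with_uid(ws, last_row=8)
```

For the real workbook you would loop over wb.worksheets, call the helper on each sheet with the defaults, and save before running pd.concat.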

Python and Excel Formula

Complete beginner here but have a specific need to try and make my life easier with automating Excel.
I have a weekly report that contains a lot of useless columns and using Python I can delete these and rename them, with the code below.
from openpyxl import Workbook, load_workbook
wb = load_workbook('TestExcel.xlsx')
ws = wb.active
ws.delete_cols(1,3)
ws.delete_cols(3,8)
ws.delete_cols(4,3)
ws.insert_cols(3,1)
ws['A1'].value = "Full Name"
ws['C1'].value = "Email Address"
ws['C2'].value = '=B2&"#testdomain.com"'
wb.save('TestExcelUpdated.xlsx')
This does the job, but I would like the formula to continue from C2 downwards, referencing B2, B3 and so on (the top row is headings).
ws['C2'].value = '=B2&"#testdomain.com"'
Obviously, in Excel it is just a case of dragging the formula down to the end of the column but I'm at a loss to get this working in Python. I've seen similar questions asked but the answers are over my head.
Would really appreciate a dummies guide.
Example of Excel report after Python code
One way to do this is by iterating over the rows in your worksheet:
for row in ws.iter_rows(min_row=2): #min_row ensures you skip your header row
    row[2].value = '=B' + str(row[0].row) + '&"#testdomain.com"'
row[2].value selects the third column because of zero-based indexing. row[0].row gets the number of the current row.
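The same "drag down" idea can also be sketched with an explicit row counter and an f-string, which some readers find easier to follow. The workbook below is built in memory with made-up names purely for illustration; in the real script ws would come from load_workbook.

```python
from openpyxl import Workbook

# Made-up sample data standing in for the weekly report
wb = Workbook()
ws = wb.active
ws.append(["Full Name", "Username", "Email Address"])
ws.append(["Ada Lovelace", "alovelace"])
ws.append(["Alan Turing", "aturing"])

# Rows are 1-based; start at 2 to skip the header and stop at the last used row
for row_num in range(2, ws.max_row + 1):
    ws.cell(row=row_num, column=3, value=f'=B{row_num}&"#testdomain.com"')
```

ws.max_row is the last row that contains any data, so the loop grows with the report without hard-coding a row count.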

Printing Python Output to Excel Sheet(s)

For my master's thesis I've created a script.
Now I want that output to be printed to an excel sheet - I read that xlwt can do that, but examples I've found only give instructions to manually print one string to the file. Now I started by adding that code:
import xlwt
new_workbook = xlwt.Workbook(encoding='utf-8')
new_sheet=new_workbook.add_sheet("1")
Now I have no clue where to go from there, can you please give me a hint? I'm guessing I need to somehow start a loop where each time it writes to a new line for each iteration it takes, but am not sure where to start. I'd really appreciate a hint, thank you!
Since you are using pandas, you can use to_excel to do that.
The usage is quite simple:
Just create a DataFrame with the values you want in your Excel sheet and save it as an Excel file:
import pandas as pd
df = pd.DataFrame(data={
    'col1':["output1","output2","output3"],
    'col2':["output1.1","output2.2","output3.3"]
})
df.to_excel("excel_name.xlsx",sheet_name="sheet_name",index=False)
What you need is openpyxl: https://openpyxl.readthedocs.io/en/stable/
from openpyxl import load_workbook
wb = load_workbook('your_template.xlsx')
sheet = wb.active
sheet.cell(row=4, column=2).value = 'what you wish to write'
wb.save('save_file_name.xlsx')
wb.close()
Let's say you save every result to a list total_distances, like
total_distances = []
for c1, c2 in coords:
    # here your code
    total_distances.append(total_distance)
and then save it into a worksheet as:
from xlsxwriter import Workbook  # add_worksheet and write_row are XlsxWriter methods
with Workbook('total_distances.xlsx') as workbook:
    worksheet = workbook.add_worksheet()
    data = ["Total_distance"]
    row = 0
    worksheet.write_row(row,0,data)
    for i in total_distances:
        row += 1
        data = [round(i,2)]
        worksheet.write_row(row,0,data)
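If you would rather stay with openpyxl, the same "one row per iteration" pattern can be sketched with ws.append, which always writes to the next free row. The numbers in total_distances and the temporary output path are made up for the example.

```python
import os
import tempfile

from openpyxl import Workbook

# Hypothetical results; in the real script these come out of your own loop
total_distances = [12.3456, 7.891, 103.0]

wb = Workbook()
ws = wb.active
ws.title = "results"
ws.append(["Total_distance"])          # header row
for distance in total_distances:
    ws.append([round(distance, 2)])    # each call adds one new row

# Saved to a temporary folder here; use any path you like
out_path = os.path.join(tempfile.mkdtemp(), "total_distances.xlsx")
wb.save(out_path)
```

Because append keeps track of the current row, there is no need to maintain a row counter by hand.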

How could calculate the excel data by using openpyxl

I have an assignment to do for my boring online class and I couldn't come up with a way to do it. I'm told to calculate the ratio of four columns with this formula: ratio = weight / (height * length * width). But I'm bad at using Microsoft Excel and, ironically, we haven't learnt anything related to it. Then I remembered that there is a Python library that works with Excel sheets. So how can I easily calculate ratio = Weight / (Height * Width * Length) for every single row in this Excel sheet using openpyxl?
Though I've never used openpyxl library I tried to find a solution to your problem. If the spreadsheet you're working on looks like the one below then you should be able to work with this script.
Sample spreadsheet image
from openpyxl import load_workbook
# Modify filename and sheet name where the data is
workbook_filename = 'workbook.xlsx'
sheet_name = 'Sheet1'
wb = load_workbook(workbook_filename)
ws = wb[sheet_name]
# If the data is stored differently in your file, you have to modify
# this loop to suit your needs
for row in ws.iter_rows(min_row = 2, max_row = 3, max_col = 5):
    row[4].value = row[0].value / (row[1].value * row[2].value * row[3].value)
wb.save('result.xlsx')
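The loop above stops at row 3. A variant that covers every data row and skips incomplete ones could be sketched as follows, assuming the columns are Weight, Height, Width, Length in A to D with the ratio going into E. The sample numbers are invented for the demo.

```python
from openpyxl import Workbook

# Invented sample data; a real script would use load_workbook instead
wb = Workbook()
ws = wb.active
ws.append(["Weight", "Height", "Width", "Length", "Ratio"])
ws.append([10, 2, 1, 5])
ws.append([9, 3, 1, 3])
ws.append([4, None, 2, 2])  # incomplete row, should be skipped

for row in ws.iter_rows(min_row=2, max_row=ws.max_row, max_col=5):
    weight, height, width, length, ratio = row
    dims = [height.value, width.value, length.value]
    # Skip rows with missing values or a zero dimension (avoids division by zero)
    if weight.value is None or any(d in (None, 0) for d in dims):
        continue
    ratio.value = weight.value / (dims[0] * dims[1] * dims[2])
```

Using ws.max_row means the loop follows the size of the sheet instead of a hard-coded last row.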

Is there any way to append a new row to an Excel spreadsheet using Python?

I have searched around, tried some win32com and some xlrd/xlwt/xlutils but all I can do is insert data into the existing Excel rows - I want to be able to insert one new row (specifically the first one, in my case). Does anyone know how to do this using Python?
As per suggestion, I will include what I did to add a row to my Excel file:
from xlrd import open_workbook # http://pypi.python.org/pypi/xlrd
from xlutils.copy import copy # http://pypi.python.org/pypi/xlutils
from xlwt import easyxf # http://pypi.python.org/pypi/xlwt
import xlwt
...next part is indented because it's in some for loops, not good at stack overflow formatting
rb = open_workbook( os.path.join(cohort_path, f),on_demand=True,encoding_override="cp1252",formatting_info=True)
#The following is because Messed up file has a missing row
if f=='MessedUp.xls':
    r_sheet = rb.sheet_by_name('SHEET NAME') # read only copy to introspect the file
    wb = copy(rb)
    w_sheet = wb.get_sheet(rb.sheet_names().index('SHEET NAME')) #Workaround
    #fix first rows
    for col_index in range(0, r_sheet.ncols):
        for row_index in range(2, r_sheet.nrows):
            xfx = r_sheet.cell_xf_index(row_index-1, col_index)
            xf = rb.xf_list[xfx]
            bgx = xf.background.pattern_colour_index
            xlwt.add_palette_colour("custom_colour", 0x17)
            #rb.set_colour_RGB(0x21, 251, 228, 228) #or wb??
            style_string = 'pattern: pattern solid, fore_colour custom_colour' if bgx in (55,23) else None
            style = xlwt.easyxf(style_string)
            w_sheet.write(row_index, col_index, r_sheet.cell(row_index-1,col_index).value,style=style)
    wb.save(os.path.join(cohort_path, 'fixed_copy.xls'))
xlwt helps you write to Excel.
To write anything to Excel you have to specify a row and a column,
so it's like worksheet.write(x, y, x*y).
This command writes the value of x*y to the cell at coordinates (x, y).
So, in your case, to write to a new row, just give the row number where you want the new row
and write as many columns as you want. Easy.
It's not a list that you need to append to. You can jump to any cell you want and write.
Check out a useful example here - http://codingtutorials.co.uk/python-excel-xlrd-xlwt/
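If the workbook can be saved as .xlsx, another option is openpyxl, whose insert_rows shifts the existing rows down so a genuinely new first row can be written. This is only a sketch under that assumption (openpyxl does not read legacy .xls files), and the sample rows are invented.

```python
from openpyxl import Workbook

# Invented sample sheet; a real script would use load_workbook on an .xlsx file
wb = Workbook()
ws = wb.active
ws.append(["old first row", 1])
ws.append(["old second row", 2])

ws.insert_rows(1)  # push every existing row down by one
for col, value in enumerate(["new first row", 0], start=1):
    ws.cell(row=1, column=col, value=value)
```

As noted in the first answer on this page, openpyxl does not rewrite formulas when rows move, so any formulas in the shifted rows would need separate handling.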
