Error with large data set using openpyxl python


I have an Excel .xlsx file (shape: 1180 rows by 6 columns) that I'm trying to rearrange. Essentially, I insert an empty row after every existing row and fill it with values rearranged from the neighbouring row. The code runs fine when I try it with just 10 rows of data, but fails when I run it on all 1180 rows: it runs for a long time and then returns the same unprocessed data. Is openpyxl not built for this? Is there a more efficient way of doing it? Here's my code. Below the code is the data after processing a few rows, which is what I need, but it fails for the entire data set.
%%time
import pandas as pd
import numpy as np
from openpyxl import load_workbook
import os
xls = pd.ExcelFile('input.xlsx')
df = xls.parse(0)
wb = load_workbook('input.xlsx')
#print(wb.sheetnames)
sh1=wb['Sheet1']
df.head()
#print(sh1.max_column)
for y in range(2,(sh1.max_row+1)*2,2):
    sh1.insert_rows(y)
wb.save('output.xlsx')
m=3
for k in range(2,sh1.max_row+1,2):
    sh1.cell(row=k,column=1).value = sh1.cell(row=m,column=1).value # copy from one cell and paste
    sh1.cell(row=k,column=2).value = sh1.cell(row=m,column=3).value
    sh1.cell(row=k,column=3).value = sh1.cell(row=m,column=2).value
    sh1.cell(row=k,column=4).value = 'A'
    sh1.cell(row=m,column=4).value = 'H'
    sh1.cell(row=k,column=5).value = sh1.cell(row=m,column=6).value
    sh1.cell(row=k,column=6).value = sh1.cell(row=m,column=5).value
    m+=2
wb.save('output.xlsx')
xls = pd.ExcelFile('output.xlsx')
df1 = xls.parse(0)
wb1 = load_workbook('output.xlsx')
df1
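Each call to `insert_rows` makes openpyxl shift the rows below the insertion point, so doing it once per data row may account for much of the run time. One possible alternative, sketched here under assumptions, is to build the interleaved rows in memory and write them out in a single pass. The function name `interleave_rows` and the sample data are hypothetical; the column rearrangement mirrors the loop in the question.

```python
from typing import Iterable, List, Sequence, Tuple


def interleave_rows(rows: Iterable[Sequence]) -> List[Tuple]:
    """For each six-column data row, emit a rearranged 'A' row, then the original marked 'H'."""
    out = []
    for c1, c2, c3, _c4, c5, c6 in rows:
        # New row: columns 2/3 and 5/6 swapped, column 4 set to 'A'.
        out.append((c1, c3, c2, "A", c6, c5))
        # Original row: unchanged except that column 4 becomes 'H'.
        out.append((c1, c2, c3, "H", c5, c6))
    return out


# Hypothetical sample data standing in for the worksheet contents.
data = [(1, "x", "y", None, 10, 20), (2, "p", "q", None, 30, 40)]
result = interleave_rows(data)
for row in result:
    print(row)
```

With openpyxl, the input rows could come from `sh1.iter_rows(min_row=2, values_only=True)` and each output row could be written to a fresh worksheet with `ws.append(row)`, which avoids shifting existing cells.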

Related

Copying/pasting a column of formulas using python

I have a very large Excel file that I'm dealing with in Python. I have a column where every cell is a different formula. I want to copy the formulas and paste them one column over, from column GD to GE.
I want the formulas to update the way they do in Excel, but Excel takes a very long time to copy/paste because the file I'm working with is very large.
Any ideas on how to use openpyxl's Translator to do this, or anything else?
from openpyxl import load_workbook
import pandas as pd
#loads the excel file and is now saved under workbook#
workbook = load_workbook('file.xlsx')
#uses the individual sheets index(first sheet = 0) to work on one sheet at a time#
sheet= workbook.worksheets[8]
#inserts a column at specified index number#
sheet.insert_cols(187)
#naming the new columns#
sheet['GE2']= '20220531'
Here is my updated code:
from openpyxl import load_workbook
from openpyxl.formula.translate import Translator
#loads the excel file and is now saved under workbook#
workbook = load_workbook('file.xlsx')
#uses the individual sheets index(first sheet = 0) to work on one sheet at a time#
sheet= workbook.worksheets[8]
formula = sheet['GD3'].value
new_formula = Translator(formula, origin= 'GE3').translate_formula("GD3")
sheet['GD2'] = new_formula
for row in sheet.iter_rows(min_col=187, max_col=188):
    old, new = row
    if new.data_type != "f":
        continue
    new_formula = Translator(new.value, origin=old.coordinate).translate_formula(new.coordinate)
workbook.save('file.xlsx')

When you add or remove columns and rows, Openpyxl does not manage formulae for you. The reason for this is simple: where should it stop? Managing a "dependency graph" is exactly the kind of functionality that an application like MS Excel provides.
But it is quite easy to do this in your own code using the formula Translator:
# insert the column, then translate the formula from GD1 into GE1
formula = ws['GD1'].value
new_formula = Translator(formula, origin="GD1").translate_formula("GE1")
ws['GE1'] = new_formula
It should be fairly straightforward to create a loop for this (check the data type and use cell.coordinate to avoid potential typos or incorrect adjustments).
ws.insert_cols(187)  # new empty column GE (187); GD (186) keeps the source formulas
for old, new in ws.iter_rows(min_col=186, max_col=187):
    if old.data_type != "f":
        continue
    # write the translated formula into the new column
    new.value = Translator(old.value, origin=old.coordinate).translate_formula(new.coordinate)
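The same idea can be tried on a small in-memory workbook. This is a minimal sketch, assuming a toy sheet with values in column A and formulas in column B that are copied into column C; the cell contents are made up for illustration and are not taken from the question's file.

```python
from openpyxl import Workbook
from openpyxl.formula.translate import Translator

wb = Workbook()
ws = wb.active
ws.append([1, "=A1*2"])
ws.append([2, "=A2*2"])
ws.append([3, "note"])  # plain text, not a formula

# Copy every formula in column B into column C, shifting its references.
for src, dst in ws.iter_rows(min_col=2, max_col=3):
    if src.data_type != "f":
        continue  # skip cells that do not hold a formula
    dst.value = Translator(src.value, origin=src.coordinate).translate_formula(dst.coordinate)

print(ws["C1"].value, ws["C2"].value, ws["C3"].value)
```

Relative references move with the formula, so a reference to column A in column B becomes a reference to column B in column C.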

How to iterate in excel with python

This is probably super simple, but I am new to Python.
I wrote some code to insert a number into a certain cell in Excel, which gives me a value in another cell. I would like to iterate, inserting -1000, then -950, then -900, and so on up to +1000, and print the resulting value for every increment.
How is this possible?
This is my code so far:
import xlwings as xw
import pandas as pd
import matplotlib.pyplot as plt
#load the excel file
wb = xw.Book("Datasets/Sektion_20111.xlsm")
#Sheet
sht = wb.sheets["Beregning"]
#dataframe
#Cell with normal force
sht.range("N25").value = (500)
#Print cell with nedre grænse, brudmoment
print(sht["AV24"].value)
This way it works by creating a new spreadsheet where cell N25 has the value 1000, and I can read the result from that manually. I would like Python to print all the input values and all the results for me.
How can I do this?

As far as I understand, you're trying to run an Excel macro several times from a Python script, inserting values from -1000 to +1000 in steps of 50, and then reading the result of each iteration from a different cell of the sheet.
If that is the case, openpyxl is not able to do that, as stated in this post:
openpyxl how to read formula result after editing input data on the sheet? data_only=True gives me a "None" result

To anyone interested, I solved it with this code:
import xlwings as xw
# load the Excel file
wb = xw.Book("Datasets/Sektion_20111.xlsm")
# sheet
sht = wb.sheets["Beregning"]
# sweep from -1000 to +1000 in steps of 50 (the stop value of range is exclusive)
for i in range(-1000, 1001, 50):
    # cell with normal force
    sht.range("N25").value = i
    # print the input together with the recalculated result
    print(i, sht["AV24"].value)
N25 is the cell that receives the input.
AV24 is the cell whose value is printed.
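For the sweep described in the question (-1000 to +1000 in steps of 50), note that the stop value of `range` is exclusive, so it has to lie past the last value wanted. The sketch below collects each input with its result; `sweep` and `evaluate` are hypothetical names, and `evaluate` stands in for writing to N25 and reading AV24 through xlwings.

```python
from typing import Callable, Dict


def sweep(evaluate: Callable[[int], float],
          start: int = -1000, stop: int = 1000, step: int = 50) -> Dict[int, float]:
    """Call evaluate for every value from start to stop inclusive and collect the results."""
    results = {}
    # stop + 1 so that the final value (+1000 by default) is included
    for value in range(start, stop + 1, step):
        results[value] = evaluate(value)
    return results


# Stand-in for the spreadsheet calculation, used only for illustration.
results = sweep(lambda n: n * 2.0)
for value, outcome in results.items():
    print(value, outcome)
```

Keeping the results in a dictionary makes it easy to plot or tabulate them afterwards instead of only printing them.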

Reading the excel file into pandas data frame without hidden rows or ignore the hidden rows in python

I was stuck trying to read an Excel file into a pandas DataFrame while ignoring the hidden rows. After a lot of searching I arrived at the answer below. I hope it helps someone.
import pandas as pd
import openpyxl
file_path = 'text.xlsx'
wb = openpyxl.load_workbook(file_path)
ws = wb['Table1']
# collect the 1-based worksheet row numbers that are hidden
hidden_rows = []
for row_number, row_dimension in ws.row_dimensions.items():
    if row_dimension.hidden:
        hidden_rows.append(row_number)
print(len(hidden_rows))
df = pd.read_excel(file_path, sheet_name='Table1')
# With the header in worksheet row 1, DataFrame index 0 is worksheet row 2,
# so shift the index by 2 to line it up with openpyxl's row numbers.
df.index += 2
unhidden = sorted(set(df.index) - set(hidden_rows))
print(unhidden)
newdf = df.loc[unhidden]
newdf.shape
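The index adjustment is the step that is easiest to get wrong, so here it is in isolation. This is a minimal sketch, assuming a single header row at the top of the sheet; `visible_positions` is a hypothetical helper, not part of pandas or openpyxl.

```python
from typing import Iterable, List


def visible_positions(n_data_rows: int,
                      hidden_excel_rows: Iterable[int],
                      header_rows: int = 1) -> List[int]:
    """Return the 0-based DataFrame positions whose worksheet rows are not hidden."""
    hidden = set(hidden_excel_rows)
    # Worksheet rows are 1-based, so the first data row follows the header rows.
    first_data_row = header_rows + 1
    return [pos for pos in range(n_data_rows) if pos + first_data_row not in hidden]


# Five data rows occupy worksheet rows 2 to 6; rows 3 and 5 are hidden.
positions = visible_positions(5, hidden_excel_rows=[3, 5])
print(positions)
```

The returned positions could then be passed to `df.iloc[positions]`, which leaves the DataFrame index untouched.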

Writing a Python Created Pivot Table using Pandas to a Excel Document

I have been working on automating a series of reports in Python. I have been trying to create a series of pivot tables from an imported CSV (binlift.csv). I have found the pandas library very useful for this; however, I can't seem to find anything that helps me write the pivot tables created with pandas to my Excel document (Template.xlsx), and I was wondering if anyone can help. So far I have written the following code:
import openpyxl
import csv
from datetime import datetime
import datetime
import pandas as pd
import numpy as np
file1 = "Template.xlsx" # template file
file2 = "binlift.csv" # raw data csv
wb1 = openpyxl.load_workbook(file1) # opens template
ws1 = wb1.create_sheet("Raw Data") # create a new sheet in template called Raw Data
summary = wb1.worksheets[0] # variables given to sheets for manipulation
rawdata = wb1.worksheets[1]
headings = ["READER","BEATID","LIFTYEAR","LIFTMONTH","LIFTWEEK","LIFTDAY","TAGGED","UNTAGGEDLIFT","LIFT"]
df = pd.read_csv(file2, names=headings)
pivot_1 = pd.pivot_table(df, index=["LIFTYEAR", "LIFTMONTH","LIFTWEEK"], values=["TAGGED","UNTAGGEDLIFT","LIFT"],aggfunc=np.sum)
pivot_2 = pd.pivot_table(df, index=["LIFTYEAR", "LIFTMONTH"], values=["TAGGED","UNTAGGEDLIFT"],aggfunc=np.sum)
pivot_3 = pd.pivot_table(df, index=["READER"], values=["TAGGED","UNTAGGEDLIFT","LIFT"],aggfunc=np.sum)
print(pivot_1)
print(pivot_2)
print(pivot_3)
wb1.save('test.xlsx')

pandas has an option to write 'xlsx' files.
Here we get all the indices (at level 0) of the pivot table, then go over them one by one to subset the table and write each part to its own sheet.
# the context manager saves and closes the file; writer.save() is no longer available in pandas 2.x
with pd.ExcelWriter('output.xlsx') as writer:
    for manager in pivot_1.index.get_level_values(0).unique():
        temp_df = pivot_1.xs(manager, level=0)
        # sheet names must be strings
        temp_df.to_excel(writer, sheet_name=str(manager))
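To add the pivot tables to the existing template rather than to a new file, one option is to open the workbook in append mode. This is a minimal sketch, assuming pandas 1.3 or later (for `if_sheet_exists`) with openpyxl installed; `append_pivot` and the sheet name are hypothetical. openpyxl may not preserve everything in a template (charts and images, for example), so it is worth testing on a copy first.

```python
import pandas as pd


def append_pivot(path: str, pivot: pd.DataFrame, sheet_name: str) -> None:
    """Write a pivot table to a sheet of an existing workbook, keeping its other sheets."""
    # mode="a" opens the existing file instead of overwriting it;
    # if_sheet_exists="replace" swaps out a sheet of the same name on re-runs.
    with pd.ExcelWriter(path, engine="openpyxl", mode="a",
                        if_sheet_exists="replace") as writer:
        pivot.to_excel(writer, sheet_name=sheet_name)
```

For example, `append_pivot("Template.xlsx", pivot_3, "Pivot by reader")` would add one sheet holding `pivot_3` to the template.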

Trying to write values to an excel sheet using python. Calculated values in that sheet do not update when trying to import the result into a dataframe

I am trying to write to an excel file and then load the result into a DF. However, calculated values are returning N/A in the DF even though when I open the excel sheet they are correctly displayed.
If I open and then save the excel sheet manually after updating using python, loading the dataframe works.
Here is the code:
from openpyxl import load_workbook
import datetime
import pandas as pd
if __name__ == '__main__':
    portfolio_values = getBalances('USD')  # getBalances is not shown in the question
    wb = load_workbook(filename = 'Client_Portfolio_Tracker.xlsx')
    clients = wb['Clients']
    clients['F2'] = portfolio_values
    clients['G2'] = datetime.datetime.now().strftime("%m/%d/%y %H:%M")
    wb.save('Client_Portfolio_Tracker.xlsx')
    # the keyword is sheet_name; the older spelling sheetname is not accepted by current pandas
    client_df = pd.read_excel('Client_Portfolio_Tracker.xlsx', sheet_name = 'Clients')
    print(client_df)
Output of dataframe
Thanks in advance!
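A likely explanation, offered here as an assumption since no answer is recorded for this question: openpyxl never evaluates formulas, and a file it saves carries no cached results until Excel opens, recalculates and saves it. pandas reads those cached results, so it sees missing values in the meantime. The sketch below reproduces the effect in memory with a made-up two-cell sheet.

```python
import io

from openpyxl import Workbook, load_workbook

wb = Workbook()
ws = wb.active
ws["A1"] = 2
ws["B1"] = "=A1*10"  # stored as a formula; openpyxl does not compute it

# Save to an in-memory buffer instead of a file on disk.
buffer = io.BytesIO()
wb.save(buffer)

# data_only=True asks for the cached result, which was never written.
buffer.seek(0)
cached = load_workbook(buffer, data_only=True).active["B1"].value

# The default mode returns the formula text itself.
buffer.seek(0)
formula = load_workbook(buffer).active["B1"].value

print(cached, formula)
```

If that is what is happening, the calculated values would have to be produced by something that can recalculate the workbook, for example Excel itself driven through xlwings, or computed in pandas after reading the raw inputs.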

