Python transfer excel formatting between two Excel documents - python

I'd like to copy the formatting between two Excel sheets in python.
Here is the situation:
I have a script that effectively "alters" (i.e. overwrites) an Excel file: it reads the file, updates values in the rows, and finally overwrites the file using pd.ExcelWriter.
The Excel file is printed/shared/read by humans between updates done by the code. Humans will do things like change number formatting, turn on/off word wrap, and alter column widths.
My goal is the code updates should only alter the content of the file, not the formatting of the columns.
Is there a way I can read/store/write the sheet format within python so the output file has the same column formatting as the input file?
Here's the basic idea of what I am doing right now:
import pandas as pd
df_in = pd.read_excel("myfile.xlsx")
# Here is where I'd like to read in format of the first sheet of this file
xlwriter = pd.ExcelWriter('myfile.xlsx', engine='xlsxwriter')
df_out = do_update(df_in)
df_out.to_excel(xlwriter, sheet_name='sheet1')
# Here is where I'd like to apply the format I read earlier to the sheet
xlwriter.close()  # ExcelWriter.save() was removed in pandas 2.0
Note: I have played with xlsxwriter.set_column and add_format. As far as I can tell, these don't help me read the format from the current file

Pandas has historically used the xlrd package for parsing Excel documents into DataFrames (recent versions read .xlsx files with openpyxl, since xlrd 2.0 dropped .xlsx support).
Interoperability between xlrd and other xlsx packages can be problematic, because each package uses its own data structures to represent formatting information.
I suggest using openpyxl as your engine when instantiating pandas.ExcelWriter. It comes with reader and writer classes that are interoperable.
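import pandas as pd
from openpyxl.styles.stylesheet import apply_stylesheet
from openpyxl.reader.excel import ExcelReader
# open the existing file so its stylesheet can be read back later
xlreader = ExcelReader('myfile.xlsx', read_only=True)
xlwriter = pd.ExcelWriter('myfile.xlsx', engine='openpyxl')
df_in = pd.read_excel("myfile.xlsx")
df_out = do_update(df_in)
df_out.to_excel(xlwriter, sheet_name='sheet1')
# copy the workbook-level style tables from the old file into the new workbook
apply_stylesheet(xlreader.archive, xlwriter.book)
xlwriter.close()  # ExcelWriter.save() was removed in pandas 2.0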
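The stylesheet holds workbook-wide style definitions, while column widths and the formats assigned to individual cells are stored on the sheet itself. A minimal sketch of carrying those over with openpyxl's public attributes follows. The helper name copy_column_formatting is hypothetical, and it assumes the formatting that matters is limited to column widths, number formats and alignment (which includes word wrap), and that rows line up between the two sheets.

```python
from copy import copy


def copy_column_formatting(src_ws, dst_ws):
    """Copy column widths, number formats and alignment between worksheets.

    Hypothetical helper: src_ws and dst_ws are openpyxl worksheets.
    """
    # Column widths are stored per column letter on the sheet.
    for letter, dim in src_ws.column_dimensions.items():
        dst_ws.column_dimensions[letter].width = dim.width

    # Number format and alignment are stored per cell.
    for row in src_ws.iter_rows():
        for cell in row:
            if cell.has_style:
                target = dst_ws.cell(row=cell.row, column=cell.column)
                target.number_format = cell.number_format
                target.alignment = copy(cell.alignment)
```

The source sheet has to be loaded (for example with openpyxl.load_workbook) before the file is overwritten. With the openpyxl engine, the sheet pandas has just written should be reachable as xlwriter.sheets['sheet1'].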

Related

Treat everything as raw string (even formulas) when reading into pandas from excel

I am handling text responses from surveys, and it is common to have responses that start with -, for example: -I am sad today.
Excel interprets such a cell as a formula and displays #NAME?
So when I import the Excel file into pandas using read_excel, the value shows up as NaN.
Is there any method to force these cells to be kept as raw strings instead of being interpreted as formulas?
I created a VBA macro that sets the entire column to text and clicks through all the cells in the column, which is slow when there are ten thousand or more rows.
I was hoping to do it at the Python level instead. Any ideas?
I hope this works for you: use openpyxl to extract the Excel data and then convert it into a pandas DataFrame.
from openpyxl import load_workbook
import pandas as pd
# data_only defaults to False, so formula cells come back as their text (e.g. "=-I am sad today")
ws = load_workbook(filename='./formula_contains_raw.xlsx').active
rows = list(ws.values)  # ws.values is a generator of row tuples
# the first row holds the column names, the rest is data
df = pd.DataFrame(rows[1:], columns=rows[0])
df.head()
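Read this way, the affected responses keep a leading "=" in front of the original text. A minimal sketch of stripping it follows, assuming every formula cell in the sheet is really a survey response that was misread as a formula. The helper name raw_text_rows is hypothetical, and the workbook is built in memory purely for illustration.

```python
from openpyxl import Workbook


def raw_text_rows(ws):
    """Return the sheet as a list of rows, with formula cells turned back into text."""
    rows = []
    for row in ws.iter_rows():
        values = []
        for cell in row:
            value = cell.value
            # openpyxl marks formula cells with data_type "f" and keeps the leading "="
            if cell.data_type == "f" and isinstance(value, str):
                value = value[1:]
            values.append(value)
        rows.append(values)
    return rows


# Illustrative data only: a header, a response mistaken for a formula, a normal response.
demo_wb = Workbook()
demo_ws = demo_wb.active
demo_ws.append(["Col1"])
demo_ws.append(["=-I am sad today"])
demo_ws.append(["hello"])
print(raw_text_rows(demo_ws))
```

The resulting rows can be passed to pd.DataFrame in the same way as in the answer above, with the first row used as column names.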
It works for me using a CSV file instead of an Excel file.
With the CSV file opened in Excel, I need to select the option Formulas/Show Formulas, then save the file.
pd.read_csv('draft.csv')
Output:
Col1
0 hello
1 =-hello

Convert XLSX to CSV without losing values from formulas

I've tried a few methods, including pandas:
df = pd.read_excel('file.xlsx')
df.to_csv('file.csv')
But every time I convert my xlsx file over to csv format, I lose all data within columns that include a formula. I have a formula that concatenates values from two other cells + '#domain' to create user emails, but this entire column returns blank in the csv product.
The formula is basically this:
=CONCATENATE(B2,".",E2,"#domain")
The conversion is part of a larger code workflow, but it won't work if this column is left blank. The only thing I've tried that worked was this API, but I'd rather not pay a subscription if this can be done locally on the machine.
Any ideas? I'll try whatever you throw at me - bear in mind I'm new to this, but I will do my best!
You can try opening the Excel file with the openpyxl library in data-only mode. Instead of the raw formulas, you get the values Excel last calculated and stored in the file, just the way you see them in Excel itself. openpyxl does not evaluate formulas on its own, so a file that has never been opened and saved by Excel will return empty cells.
import openpyxl
wb = openpyxl.load_workbook(filename, data_only=True)
Watch out: if you are working with your original file and save it with openpyxl in data-only mode, all your formulas will be lost. I had this happen once and it was horrible, so I recommend working with a copy of your file.
Now that the workbook holds values only, you can use the built-in csv module to generate a proper CSV file (idea from this post: How to save an Excel worksheet as CSV):
import csv
sheet = wb.active # was .get_active_sheet()
with open('test.csv', 'w', newline="") as f:
    c = csv.writer(f)
    for r in sheet.iter_rows(): # generator; was sh.rows
        c.writerow([cell.value for cell in r])

Importing and writing multiple excel sheets with Pandas

I am trying to import Excel files which have multiple sheets. Currently, my code (below) is only importing the first sheet. The remainder of the code is performing calculations on only one sheet (currently the first, since I moved it there to make it work, but it would be a bonus if I could avoid this step).
Ideally, I would like to import all the sheets, perform calculations on the one sheet, and export all sheets again to an Excel file. A majority of the sheets would be imported and exported with no changes, while the one sheet with a specific, consistent name would have calculations performed on it and also be exported. Not sure what functions to look into. Thanks!
df = pd.read_excel("excelfilename.xlsx")
df.head()
#other code present here performing calculations
df.to_excel(r'newfilename.xlsx', index = False)
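Load the Excel file with pandas, get the sheet names, and then save the modified data back. Sheet names can come from pd.ExcelFile, or from xlrd for .xls files (xlrd 2.0 and later no longer reads .xlsx).
import pandas as pd
file_name = "using_excel.xlsx"
with pd.ExcelFile(file_name) as xls:
    sheet_names_ = xls.sheet_names
for sheet_name in sheet_names_:
    df_sheet = pd.read_excel(file_name, sheet_name=sheet_name)
    # do something
    if you_want_to_write_back_to_same_sheet_in_same_file:
        # append mode keeps the other sheets; a plain ExcelWriter would overwrite the whole file
        with pd.ExcelWriter(file_name, engine="openpyxl", mode="a", if_sheet_exists="replace") as writer:
            df_sheet.to_excel(writer, sheet_name=sheet_name)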
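For the workflow in the question (read every sheet, change one, write them all to a new file), pandas can also return all sheets at once when sheet_name=None is passed to read_excel. A minimal sketch follows. The function name update_one_sheet and its parameters are hypothetical, and cell formatting in the untouched sheets is not carried over by this approach.

```python
import pandas as pd


def update_one_sheet(in_path, out_path, target_sheet, transform):
    """Read all sheets, apply `transform` to one of them, write all to out_path."""
    # sheet_name=None makes read_excel return a dict of {sheet name: DataFrame}
    sheets = pd.read_excel(in_path, sheet_name=None)
    sheets[target_sheet] = transform(sheets[target_sheet])

    # The context manager saves and closes the file on exit.
    with pd.ExcelWriter(out_path) as writer:
        for name, frame in sheets.items():
            frame.to_excel(writer, sheet_name=name, index=False)
```

A call might look like update_one_sheet("excelfilename.xlsx", "newfilename.xlsx", "Calculations", my_calculation), where the sheet name and my_calculation stand in for your own.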

Using Python to load template excel file, insert a DataFrame to specific lines and save as a new file

I'm having trouble writing something that I believe should be relatively easy.
I have a template Excel file that has some visualizations on it, spread over a few sheets. I want to write a script that loads the template, inserts rows from an existing DataFrame into specific cells on each sheet, and saves the result as a new Excel file.
The template already has all the cells and the visualizations designed, so I want to insert only the data without changing the design.
I tried several packages and none of them seemed to work for me.
Thanks for your help! :-)
I have written a package for inserting Pandas DataFrames to Excel sheets (specific rows/cells/columns), it's called pyxcelframe:
https://pypi.org/project/pyxcelframe/
It has very simple and short documentation, and the method you need is insert_frame
So, let's say we have a Pandas DataFrame called df which we have to insert in the Excel file ("MyWorkbook") sheet named "MySheet" from the cell B5, we can just use insert_frame function as follows:
from pyxcelframe import insert_frame
from openpyxl import load_workbook
workbook = load_workbook("MyWorkbook.xlsx")
worksheet = workbook["MySheet"]
insert_frame(worksheet=worksheet,
             dataframe=df,
             row_range=(5, 0),
             col_range=(2, 0))
0 as the value of the second element of row_range or col_range means that there is no ending row or column specified, if you need specific ending row/column you can replace 0 with it.
Sounds like a job for xlwings. You didn't post any test data, but modifying the code below to suit your needs should be quite straightforward.
import xlwings as xw
wb = xw.Book('your_excel_template.xlsx')
wb.sheets['Sheet1'].range('A1').value = df[your_selected_rows]
wb.save('new_file.xlsx')
wb.close()
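The same idea can be sketched with plain openpyxl, writing the DataFrame values cell by cell so that the existing cell styles in the template stay in place. The names write_frame and fill_template are hypothetical, and it is worth checking on a copy of the template whether its charts survive an openpyxl load and save with your version.

```python
from openpyxl import load_workbook


def write_frame(worksheet, dataframe, start_row, start_col):
    """Write the DataFrame's values (no header, no index) starting at the given cell."""
    for r, record in enumerate(dataframe.itertuples(index=False), start=start_row):
        for c, value in enumerate(record, start=start_col):
            # Only the value is set, so the cell's existing style is left alone.
            worksheet.cell(row=r, column=c, value=value)


def fill_template(template_path, output_path, sheet_name, dataframe, start_row, start_col):
    """Load a template, insert the data into one sheet, and save under a new name."""
    workbook = load_workbook(template_path)
    write_frame(workbook[sheet_name], dataframe, start_row, start_col)
    workbook.save(output_path)
```

For the example above, fill_template("MyWorkbook.xlsx", "new_file.xlsx", "MySheet", df, start_row=5, start_col=2) would start writing at cell B5.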

xlsxwriter - how to create from csv file

I'm going to poll controllers in our data centers and output all of them to a csv file. The python tool, xlsxwriter, looks to be the best for it. However, I don't see any mention of how to simply take a csv file and convert it to xlsx.
Xlsxwriter seems to be great for making an xlsx file based on the python script it's in, but I don't know how to gather that data from a csv file.
If you don't mind an answer with another package dependency, I highly recommend pandas for I/O operations like this. It's hard to beat in terms of both code economy and performance. Also, if you need to do any manipulations (filtering, sorting, etc.) on the data before writing to xlsx, it's already in a handy DataFrame.
You could do something like:
import pandas as pd
import xlsxwriter
path = 'some/path/'
#read the csv into a pandas dataframe
data = pd.read_csv(path + 'input.csv')
#setup the writer
writer = pd.ExcelWriter(path + 'output.xlsx', engine='xlsxwriter')
#write the dataframe to an xlsx file
data.to_excel(writer, sheet_name='mysheet', index=False)
writer.close()  # ExcelWriter.save() was removed in pandas 2.0
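If you would rather stay with xlsxwriter alone, as the question asks, a minimal sketch that pairs it with the standard csv module could look like this. The function name csv_to_xlsx is hypothetical. Every value is written as text here, and xlsxwriter's strings_to_numbers workbook option may be worth a look if numeric cells are needed.

```python
import csv

import xlsxwriter


def csv_to_xlsx(csv_path, xlsx_path):
    """Copy every row of a CSV file into the first worksheet of a new xlsx file."""
    workbook = xlsxwriter.Workbook(xlsx_path)
    worksheet = workbook.add_worksheet()
    with open(csv_path, newline="") as f:
        for row_index, row in enumerate(csv.reader(f)):
            # write_row fills consecutive cells of one row, starting at column 0
            worksheet.write_row(row_index, 0, row)
    # close() is what actually writes the file to disk
    workbook.close()
```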
