Importing and writing multiple Excel sheets with Pandas - Python

I am trying to import Excel files which have multiple sheets. Currently, my code (below) is only importing the first sheet. The remainder of the code is performing calculations on only one sheet (currently the first, since I moved it there to make it work - bonus if I can avoid this step).
Ideally, I would like to import all the sheets, perform calculations on the one sheet, and export all sheets again in an Excel file. Most of the sheets would be imported and exported with no changes, while the one sheet with a specific/consistent name would have calculations performed on it and also be exported. Not sure what functions to look into. Thanks!
df = pd.read_excel("excelfilename.xlsx")
df.head()
# other code here performing calculations
df.to_excel(r'newfilename.xlsx', index = False)

Load the Excel file using pandas, get the sheet names using xlrd, and then save the modified data back.
import xlrd
import pandas as pd

file_name = "using_excel.xlsx"
sheet_names_ = xlrd.open_workbook(file_name, on_demand=True).sheet_names()
for sheet_name in sheet_names_:
    df_sheet = pd.read_excel(file_name, sheet_name=sheet_name)
    # do something
    if you_want_to_write_back_to_same_sheet_in_same_file:
        writer = pd.ExcelWriter(file_name)
        df_sheet.to_excel(writer, sheet_name=sheet_name)
        writer.save()
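Alternatively, pandas can pull every sheet in one call: passing sheet_name=None to read_excel returns a dict of DataFrames keyed by sheet name, so you can modify just the one sheet and write everything back with a single ExcelWriter. A sketch, where "Calcs" and do_calculations stand in for your actual sheet name and calculation code:

import pandas as pd

# sheet_name=None reads every sheet into a dict of {sheet name: DataFrame}
sheets = pd.read_excel("excelfilename.xlsx", sheet_name=None)

# "Calcs" and do_calculations() are placeholders for your sheet name and calculation code
sheets["Calcs"] = do_calculations(sheets["Calcs"])

# Write every sheet back out in one pass so none of the others are lost
with pd.ExcelWriter("newfilename.xlsx") as writer:
    for name, frame in sheets.items():
        frame.to_excel(writer, sheet_name=name, index=False)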

Related

Removing the Indexed Column when Merging 2 Excel Spreadsheets into a new Sheet in an existing Excel Spreadsheet using Pandas

I wanted to automate comparing two Excel spreadsheets, updating old data (call this spreadsheet Old_Data.xlsx) with new data (from a different Excel document, called New_Data.xlsx), and placing the updated data into a different sheet of Old_Data.xlsx.
I am able to successfully create the new sheet in Old_Data.xlsx and see the changes between the two data sets; however, in the new sheet an index appears labeling the rows of data from 0-n. I've tried hiding this index so the information on each sheet in Old_Data.xlsx appears the same, but I cannot seem to get rid of it. See the code below:
from openpyxl import load_workbook
# import xlwings as xl
import pandas as pd
import jinja2
# Load the workbook that is going to updated with new information.
wb = load_workbook('OldData.xlsx')
# Define the file path for all of the old and new data.
old_path = 'OldData.xlsx'
new_path = 'NewData.xlsx'
# Load the data frames for each Spreadsheet.
df_old = pd.read_excel(old_path)
print(df_old)
df_new = pd.read_excel(new_path)
print(df_new)
# Keep all original information while showing the differences, and write
# the result to a new sheet in the workbook.
difference = pd.merge(df_old, df_new, how='right')
difference = difference.style.format.hide()
print(difference)
# Append the difference to an existing Excel File
with pd.ExcelWriter('OldData.xlsx', mode='a', engine='openpyxl', if_sheet_exists='replace') as writer:
    difference.to_excel(writer, sheet_name="1-25-2023")
This is an image of the table on the second sheet that I am creating. (https://i.stack.imgur.com/7Amdf.jpg)
I've tried adding the code:
difference = difference.style.format.hide
To get rid of the row, but I have not succeeded.
Pass index=False as an argument in the last line of your code. It should be something like this:
with pd.ExcelWriter('OldData.xlsx', mode='a', engine='openpyxl', if_sheet_exists='replace') as writer:
    difference.to_excel(writer, sheet_name="1-25-2023", index=False)
I think this should solve your problem.

Python - what's the most efficient way to read large multi-sheet spreadsheets into a pandas dataframe

I have a directory full of large spreadsheets.
My plan is to read each of the sheets into a dataframe, drop what I don't need and remove duplicates, then append to a master dataframe that I will then save as an Excel file.
My current method looks like the following...
for workbook in filelist:
    for sheet in workbook:
        df = pd.read_excel(workbook, sheet)
        ## Do table manipulation and append to master df
My problem is that it takes a long time; I'm concerned that every time I loop it is opening and closing the workbook.
Is there a way I can open the workbook and then cycle through each sheet saving it to a dataframe?
Note, the column headers are the same on each sheet.
Apologies for the shorthand code up there, I'm afk.
You can open the workbook once and read sheets from it. I don't know if this is really any faster, but it's worth a try:
import pandas as pd

for filename in filelist:
    workbook = pd.ExcelFile(filename)
    for sheet in workbook.sheet_names:
        df = workbook.parse(sheet)
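If it's still slow, the same idea can also fold in the drop/dedupe/append steps from the question, collecting the pieces and concatenating once at the end instead of appending inside the loop. A sketch, where "col_a" and "col_b" are placeholders for the columns you actually keep:

import pandas as pd

frames = []
for filename in filelist:
    # Open each workbook once; parse every sheet from the same handle
    with pd.ExcelFile(filename) as workbook:
        for sheet in workbook.sheet_names:
            df = workbook.parse(sheet)
            # "col_a"/"col_b" are placeholder column names
            df = df[["col_a", "col_b"]].drop_duplicates()
            frames.append(df)

# Build the master frame once at the end; concat is cheaper than repeated appends
master = pd.concat(frames, ignore_index=True).drop_duplicates()
master.to_excel("master.xlsx", index=False)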

Python transfer excel formatting between two Excel documents

I'd like to copy the formatting between two Excel sheets in python.
Here is the situation:
I have a script that effectively "alters" (i.e. overwrites) an Excel file by opening it with pd.ExcelWriter, then updating values in the rows. Finally, the file is overwritten using ExcelWriter.
The Excel file is printed/shared/read by humans between updates done by the code. Humans will do things like change number formatting, turn on/off word wrap, and alter column widths.
My goal is that the code's updates should only alter the content of the file, not the formatting of the columns.
Is there a way I can read/store/write the sheet format within python so the output file has the same column formatting as the input file?
Here's the basic idea of what I am doing right now:
df_in= pd.read_excel("myfile.xlsx")
# Here is where I'd like to read in format of the first sheet of this file
xlwriter = pd.ExcelWriter('myfile.xlsx', engine='xlsxwriter')
df_out = do_update(df_in)
df_out.to_excel(xlwriter,'sheet1')
# Here is where I'd like to apply the format I read earlier to the sheet
xlwriter.save()
Note: I have played with xlsxwriter.set_column and add_format. As far as I can tell, these don't help me read the format from the current file
Older versions of pandas use the xlrd package by default for parsing Excel documents into DataFrames.
Interoperability between other xlsx packages and xlrd can be problematic when it comes to the data structures used to represent formatting information.
I suggest using openpyxl as your engine when instantiating pandas.ExcelWriter. It comes with reader and writer classes that are interoperable.
import pandas as pd
from openpyxl.styles.stylesheet import apply_stylesheet
from openpyxl.reader.excel import ExcelReader
xlreader = ExcelReader('myfile.xlsx', read_only=True)
xlwriter = pd.ExcelWriter('myfile.xlsx', engine='openpyxl')
df_in = pd.read_excel("myfile.xlsx")
df_out = do_update(df_in)
df_out.to_excel(xlwriter,'sheet1')
apply_stylesheet(xlreader.archive, xlwriter.book)
xlwriter.save()
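One caveat: apply_stylesheet copies the workbook's cell styles (number formats, fonts, fills), but column widths are stored on the worksheet rather than in the stylesheet. If keeping the human-set widths matters, here is a minimal sketch that captures and reapplies them with openpyxl, assuming the data lives on the first sheet and reusing do_update from the question:

import pandas as pd
from openpyxl import load_workbook

# Capture column widths from the original first sheet before overwriting the file
src_ws = load_workbook('myfile.xlsx').worksheets[0]
widths = {col: dim.width for col, dim in src_ws.column_dimensions.items() if dim.width}

df_out = do_update(pd.read_excel('myfile.xlsx'))

with pd.ExcelWriter('myfile.xlsx', engine='openpyxl') as xlwriter:
    df_out.to_excel(xlwriter, sheet_name='sheet1', index=False)
    # Reapply the saved widths to the freshly written sheet
    for col, width in widths.items():
        xlwriter.sheets['sheet1'].column_dimensions[col].width = width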

Python: Write a dataframe to an already existing Excel file which contains a sheet with images

I have been working on this for too long now. I have an Excel file with one sheet (sheetname = 'abc') with images in it, and I want a Python script that writes a dataframe to a second, separate sheet (sheetname = 'def') in the same Excel file. Can anybody provide me with some example code, because every time I try to write the dataframe, the first sheet with the images gets emptied.
This is what I tried:
book = load_workbook('filename_of_file_with_pictures_in_it.xlsx')
writer = pd.ExcelWriter('filename_of_file_with_pictures_in_it.xlsx', engine = 'openpyxl')
writer.book = book
x1 = np.random.randn(100, 2)
df = pd.DataFrame(x1)
df.to_excel(writer, sheet_name = 'def')
writer.save()
book.close()
It saves the random numbers in the sheet with the name 'def', but the first sheet 'abc' now becomes empty.
What goes wrong here? Hopefully somebody can help me with this.
Interesting question! With openpyxl you can easily add values and keep the formulas, but you cannot retain the graphs. Also, with the latest version (2.5.4), graphs do not stay. So I decided to address the issue with xlwings:
import numpy as np
import xlwings as xw

wb = xw.Book(r"filename_of_file_with_pictures_in_it.xlsx")
sht = wb.sheets.add('SheetMod')
sht.range('A1').value = np.random.randn(100, 2)
wb.save(r"path_new_file.xlsx")
With this snippet I managed to insert the random set of values and save a new copy of the modified xlsx. As you run the commands, the Excel file will automatically open, showing you the new sheet without changing the existing ones (graphs and formulas included). Make sure you install all the dependencies to get xlwings running on your system. Hope this helps!
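If you want the DataFrame itself (with headers) rather than a raw NumPy array, xlwings converts DataFrames directly; a small variation on the snippet above, with the column names chosen just for illustration:

import numpy as np
import pandas as pd
import xlwings as xw

df = pd.DataFrame(np.random.randn(100, 2), columns=['x', 'y'])

wb = xw.Book(r"filename_of_file_with_pictures_in_it.xlsx")
sht = wb.sheets.add('def')
# options(index=False) drops the row index, like to_excel(..., index=False)
sht.range('A1').options(index=False).value = df
wb.save()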
You'll need to use an Excel 'reader' like openpyxl or similar in combination with pandas for this; pandas' to_excel function is write-only, so it will not care what is inside the file when you open it.

Python script to parse a big workbook

I have an extra-large Excel file and I need to automate a task I do every day: add rows to the bottom with the day's date, save a new workbook, crop the old ones, and save as a new file with the day's date.
An example is today only having rows with date 04-10-2016, and the filename would be [sheetname]04102016H12, or [sheetname]04102016H16 if it has passed 12 pm.
I've tried xlrd, doing this in VBA, and so on, but I can't get along with VBA and it is slow. So I'd rather use Python here - lightweight, does the job, and so on.
Anyway, so far, I have done the following:
import xlsxwriter, datetime, xlrd
import pandas as pd

# Parsing main excel sheet to save the correct
with xlrd.open_workbook(r'D:/path/to/file/file.xlsx', on_demand=True) as xls:
    for sheet in xls.parse(xls.sheet_names([0])):
        dfs = pd.read_excel(xls, sheet, header=1)
        now = datetime.date.today()
        df[df['Data'] != now]
        if datetime.time() < datetime.time(11, 0, 0, 0):
            df.to_excel(r'W:\path\I\need' + str(sheet) + now + 'H12.xlsx', index=False)
        else:
            df.to_excel(r'W:\path\I\need' + str(sheet) + now + 'H16.xlsx', index=False)
Unfortunately, this does not split the main file into as many files as there are worksheets in the workbook. It outputs TypeError: 'list' object is not callable, pointing at xls.parse(xls.sheet_names([0])).
Based on comments below I am updating my answer. Just do:
xls.sheet_names()[0]
However, if you want to loop through the sheets, then you may want all sheet names instead of just the first one.
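For reference, here is a sketch of the full loop using pandas alone (no xlrd), keeping the question's filter and naming scheme; the 'Data' column and the 12 pm cut-off are taken from the question, and the paths are placeholders:

import datetime
import pandas as pd

xls = pd.ExcelFile(r'D:/path/to/file/file.xlsx')
now = datetime.date.today()
# Cut-off from the question: before 12 pm -> H12, otherwise H16
suffix = 'H12' if datetime.datetime.now().time() < datetime.time(12, 0) else 'H16'

for sheet in xls.sheet_names:
    df = xls.parse(sheet, header=1)
    # Same filter as in the question, but assigned back so it takes effect;
    # assumes the 'Data' column holds dates
    df = df[pd.to_datetime(df['Data']).dt.date != now]
    out = r'W:\path\I\need' + str(sheet) + now.strftime('%d%m%Y') + suffix + '.xlsx'
    df.to_excel(out, index=False)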
