Open all Excel files in one folder and add sheets - python

I am trying to open all Excel (.xlsx) files in one folder and add three sheets (named M1, M2 and M3). Then save it in the existing Excel file (not mandatory). Can you please help? Thanks.
from openpyxl import load_workbook
wb2 = load_workbook(r'C:\Users\alex\mob\830.xlsx')
wb2.create_sheet('M1')
wb2.create_sheet('M2')
wb2.create_sheet('M3')
wb2.save(r'C:\Users\alex\mob\830.xlsx')
This works for each Excel file, but I want to iterate/loop/do it for all files in one folder. All files are .xlsx

You can use glob to list all files ending with specific extensions in a given directory:
import glob
from openpyxl import load_workbook
files = glob.glob("/some/random/location/*.xlsx")
for file in files:
    wb2 = load_workbook(file)
    wb2.create_sheet('M1')
    wb2.create_sheet('M2')
    wb2.create_sheet('M3')
    wb2.save(file)
glob.glob() returns a list of all paths matching a pattern. Here the pattern selects every file with the .xlsx extension.
Note that this only looks at the extension, not the contents of the file, so if you rename a plain test.txt file to test.xlsx, load_workbook will fail on it and your program will likely crash.
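One way to guard against renamed files is to check the container before handing the path to openpyxl. The sketch below relies on the fact that .xlsx workbooks are zip archives; the function name find_real_xlsx is made up for illustration, and passing the check only means the file is a zip archive, not that it is a valid workbook.

```python
import glob
import os
import zipfile


def find_real_xlsx(folder):
    """Return paths in `folder` that end in .xlsx and are zip archives."""
    pattern = os.path.join(folder, "*.xlsx")
    # A text file renamed to .xlsx is not a zip archive and is filtered out here.
    return sorted(p for p in glob.glob(pattern) if zipfile.is_zipfile(p))
```

The returned list could replace files in the loop above, so that only plausible workbooks reach load_workbook.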

This answered my question! Thank you very much.
In order for my computer to read the folder, I changed the directory to:
files = glob.glob(r'C:\Users\alex\mob\*.xlsx')

Related

Merge 200 + Excel Files into 1

I have a folder with 200+ excel files. I have the respective path and sheet names for each file in the folder. Is it possible to merge all of these files into one or a couple large excel file via python? If so, what libraries would be good for me to start reading up on for this type of script?
I am trying to condense the files into 1-8 excel files in total not 200+ excel files.
Thank you!
For example, suppose there are a.xlsx, b.xlsx and c.xlsx.
Using os (import os) and the str.endswith method, you can collect all the .xlsx files in the folder.
Then read each file in a loop with pandas and write it to a new ExcelWriter, as below:
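import os
import pandas as pd
# Collect every .xlsx file in the current directory (adjust the folder as needed).
excel_files = [f for f in os.listdir('.') if f.endswith('.xlsx') and f != 'goal.xlsx']
# Create a Pandas Excel writer using XlsxWriter as the engine.
# The with block saves the file on exit (writer.save() was removed in pandas 2.0).
with pd.ExcelWriter('goal.xlsx', engine='xlsxwriter') as writer:
    for sheet_number, excel_file_path in enumerate(excel_files, start=1):
        df = pd.read_excel(excel_file_path)
        # Write each dataframe to a different worksheet.
        df.to_excel(writer, sheet_name='Sheet{}'.format(sheet_number))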
Go to the directory where all the CSV files are located (e.g. cd C:\Temp).
In File Explorer, click the address bar and type "cmd"; this will open a command prompt at that location.
Type "copy *.csv combine.csv"
(Replace "csv" with the extension of your files. This will probably only work well with CSV files, because .xlsx workbooks are archives that cannot be merged by simple concatenation.)
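The copy command joins the files as they are, so every file's header row ends up in the combined output. If that is a problem, the same merge can be sketched in Python with the standard csv module. This assumes all files share the same header; the function name combine_csv and its parameters are illustrative only.

```python
import csv
import glob
import os


def combine_csv(folder, output_name="combine.csv"):
    """Concatenate every CSV in `folder` into one file, keeping a single header."""
    output_path = os.path.join(folder, output_name)
    # Exclude a combined file left over from an earlier run.
    sources = sorted(
        p for p in glob.glob(os.path.join(folder, "*.csv"))
        if os.path.abspath(p) != os.path.abspath(output_path)
    )
    header_written = False
    with open(output_path, "w", newline="") as out:
        writer = csv.writer(out)
        for path in sources:
            with open(path, newline="") as src:
                reader = csv.reader(src)
                header = next(reader, None)
                if header is None:
                    continue  # skip empty files
                if not header_written:
                    writer.writerow(header)
                    header_written = True
                # Copy the remaining data rows, without the header.
                writer.writerows(reader)
    return output_path
```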

Trying to read a directory of .xlsm files in pandas

I (a noob) am currently trying to read a directory of .xlsm files into a pandas dataframe, with the intention of merging them all together into one big file. I've done similar tasks in the past with .csv files and had no problems, but this has me at a loss.
I'm currently running this:
import pandas as pd
import glob
import openpyxl
df = [pd.read_excel(filename,engine="openpyxl") for filename in glob.glob(r'\\data\Designer\BI_Development\BI_2022_Objective\BIDataLake\MTT\Automation\TimeTrackingSheets_Automation\TimeTrackingSheets_Automation\TM_TimeTrackingSheets\*.xlsm')]
This solution has worked for me in the past. But here, when I run the above code, I get the following error:
zipfile.BadZipFile: File is not a zip file
Which is confusing me, because the file that I'm trying to access is not a zip file. Granted, there is a zip file with that same name in the same directory, but when I rename the file I'm referencing in my program to distinguish it from the zip file, I get the same error.
Anyone have any ideas? I've lurked for a long time and this is my first question, so apologies if it's not formatted in the proper way. Happy to provide more information as necessary. Thank you in advance!
UPDATE
This was fixed by excluding hidden files in the script, something I was unaware was happening.
import glob
import os
import pandas as pd
path = r'\\data\Designer\BI_Development\BI_2022_Objective\BIDataLake\MTT\Automation\TimeTrackingSheets_Automation\TimeTrackingSheets_Automation\TM_TimeTrackingSheets'
# read all .xlsm files, skipping names that start with "~" (hidden temporary files)
filenames = glob.glob(os.path.join(path, "[!~]*.xlsm"))
# print('File names:', filenames)
# collect one data frame per workbook
frames = []
# for loop to iterate all excel files
for file in filenames:
    # read_excel() with a list of sheet names returns a dict of data frames;
    # concat flattens that dict into a single frame
    df = pd.concat(pd.read_excel(file, ["BW_TimeSheet"]), ignore_index=True, sort=False)
    df['Username'] = os.path.basename(file)
    frames.append(df)
# merge the data of all excel files (DataFrame.append was removed in pandas 2.0)
outputxlsx = pd.concat(frames, ignore_index=True) if frames else pd.DataFrame()
print('Final Excel sheet now generated at the same location:')
outputxlsx.to_excel(os.path.join(path, "Output.xlsx"), index=False)
Thanks everyone for your help!
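For anyone wondering what the [!~] part of the pattern does: it matches any first character except a tilde. Excel keeps a hidden owner file named ~$<workbook name> next to a workbook while it is open, and those are presumably the hidden files that caused the error here. A small sketch with the standard fnmatch module shows the filter in isolation (the file names and the function name visible_workbooks are made up):

```python
from fnmatch import fnmatchcase


def visible_workbooks(names, pattern="[!~]*.xlsm"):
    """Keep only names matching `pattern`; [!~] rejects a leading tilde."""
    return [name for name in names if fnmatchcase(name, pattern)]


# Hypothetical directory listing: one workbook, its lock file, one unrelated file.
listing = ["Report.xlsm", "~$Report.xlsm", "notes.txt"]
kept = visible_workbooks(listing)
```

Here kept contains only "Report.xlsm".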
Please remove the encryption (password protection) from the file.
engine="openpyxl"
This engine does not support reading encrypted files.
I refer to this issue.
This problem is related to Excel and openpyxl. The most reliable workaround is to read and write CSV instead.

Find an Excel file in a directory, compress it and send it to another folder

I have an Excel file WK6 that is downloaded in the below folder:
C:\Users\kj\Scripts\Sh\Result\Wk6
The Python script should first navigate to the above directory, find the Excel file WK6 (the name of the Excel file changes each week), compress it, and then move it to another directory.
Please help me understand how I can find and compress the file in Python.
To zip the file, you can use the following:
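from zipfile import ZipFile, ZIP_DEFLATED
import os
path = r'C:\Users\kj\Scripts\Sh\Result\Wk6'
os.chdir(path)
# The with block closes the archive so it is fully written to disk;
# ZIP_DEFLATED compresses the file (the default, ZIP_STORED, does not).
with ZipFile('<new path>/name.zip', 'w', compression=ZIP_DEFLATED) as archive:
    archive.write('Wk6.csv')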
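Since the file name changes every week, the find step can be done with glob instead of a hard-coded name. The following is a minimal sketch, assuming there is one Excel file in the folder and that matching on the "*.xls*" pattern is acceptable; the function name zip_and_move and both directory arguments are placeholders.

```python
import glob
import os
import shutil
import zipfile


def zip_and_move(source_dir, dest_dir, pattern="*.xls*"):
    """Zip the first file in `source_dir` matching `pattern`, then move the archive."""
    matches = sorted(glob.glob(os.path.join(source_dir, pattern)))
    if not matches:
        raise FileNotFoundError(f"no file matching {pattern!r} in {source_dir}")
    workbook = matches[0]
    archive = os.path.splitext(workbook)[0] + ".zip"
    # arcname stores only the file name, not the full directory path.
    with zipfile.ZipFile(archive, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        zf.write(workbook, arcname=os.path.basename(workbook))
    os.makedirs(dest_dir, exist_ok=True)
    # shutil.move returns the final path of the moved archive.
    return shutil.move(archive, dest_dir)
```

The original workbook stays in place; only the archive is moved.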

Opening excels from different folders with python

Hi, I have a folder, and inside that folder I have n subfolders (400).
Each of those folders contains several documents, and one of them is an Excel file with a key name.
Is there any way to open those Excel files as df1, df2, ..., dfn?
Does anyone know how to write a for loop that opens each of those 400 folders?
Thanks!!
This assumes your Excel files have the extension '.xlsx'.
I use os.walk(path) from the os package. os.walk traverses the parent folder and all of its subfolders.
Put the path to the parent folder in the path_to_parentfolder variable.
import os
import pandas as pd
path_to_parentfolder = 'Parent_Folder/'
files = []
# os.walk yields (directory, subdirectories, filenames) for every folder in the tree
for r, d, f in os.walk(path_to_parentfolder):
    for file in f:
        if file.endswith('.xlsx'):  # Enter the extension for your file type
            files.append(os.path.join(r, file))
# Pass the path itself to read_excel; a text-mode handle from open(file) cannot be read as a workbook
df_list = [pd.read_excel(file) for file in files]  # All your data is stored in the list
Read about os.walk in its docs
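If only the one workbook with the key name is wanted from each subfolder, a dictionary keyed by folder name is usually easier to work with than separate df1, df2, ... variables. Here is a rough sketch with pathlib; the function name find_keyed_workbooks and the key argument are assumptions, since the question does not say what the key name looks like.

```python
from pathlib import Path


def find_keyed_workbooks(parent, key):
    """Map each subfolder name to the first .xlsx in it whose name contains `key`."""
    found = {}
    for folder in sorted(p for p in Path(parent).iterdir() if p.is_dir()):
        matches = sorted(folder.glob(f"*{key}*.xlsx"))
        if matches:
            found[folder.name] = matches[0]
    return found
```

With pandas installed, the paths could then be loaded with something like {name: pd.read_excel(path) for name, path in find_keyed_workbooks('Parent_Folder', 'key').items()}.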

how to get the name of an unknown .XLS file into a variable in Python 3.7

I'm using Python 3.7.
I have to download an excel file (.xls) that has a unique filename every time I download it into a specific downloads folder location.
Then with Python and Pandas, I then have to open the excel file and read/convert it to a dataframe.
I want to automate the process, but I'm having trouble telling Python to get the full name of the XLS file as a variable, which will then be used by pandas:
# add dependencies and set location for downloads folder
import os
import glob
import pandas as pd
download_dir = '/Users/Aaron/Downloads/'
# change working directory to download directory
os.chdir(download_dir)
# get filename of excel file to read into pandas
excel_files = glob.glob('*.xls')
blah = str(excel_files)
blah
So then for example, the output for "blah" is:
"['63676532355861.xls']"
I have also tried just using "blah = print(excel_files)" for the above block, instead of the "str" method, and assigning that to a variable, which still doesn't work.
And then the rest of the process would do the following:
# open excel (XLS) file with unknown filename in pandas as a dataframe
data_df = pd.read_excel('WHATEVER.xls', sheet_name=None)
And then after I convert it to a data frame, I want to DELETE the excel file.
So far, I have spent a lot of time reading about fnames, io, open, os.path, and other libraries.
I still don't know how to get the name of the unknown .XLS file into a variable, and then later deleting that file.
Any suggestions would be greatly appreciated.
This code looks for .xls files in the current working directory, reads each one into pandas, and then deletes it. If the directory contains more than one .xls file, data_df ends up holding the last one read. You can perform whatever operation you want if you find more than one .xls file.
import os
import pandas as pd
for filename in os.listdir(os.getcwd()):
    if filename.endswith(".xls"):
        print(filename)
        # do your operation
        data_df = pd.read_excel(filename, sheet_name=None)
        os.remove(filename)
Check this:
lst = os.listdir()
matching = [s for s in lst if s.endswith('.xls')]
matching will hold the names of all the .xls files.
As you have only one Excel file, you can save its name in a variable: file_name = matching[0]
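The reason str(excel_files) did not work in the question is that glob.glob() returns a list, so the file name has to be taken out of it by index rather than converted to text. If older downloads may still be lying around, picking the most recently modified file is a bit safer than taking index 0. A minimal sketch, where the function name newest_xls is made up:

```python
from pathlib import Path


def newest_xls(download_dir):
    """Return the most recently modified .xls file in `download_dir`, or None."""
    candidates = list(Path(download_dir).glob("*.xls"))
    if not candidates:
        return None
    # Compare by modification time so the latest download wins.
    return max(candidates, key=lambda p: p.stat().st_mtime)
```

The returned path can be passed straight to pd.read_excel(path, sheet_name=None), and path.unlink() deletes the file afterwards.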
