I have a folder with 200+ Excel files. I have the respective path and sheet names for each file in the folder. Is it possible to merge all of these files into one or a couple of large Excel files via Python? If so, what libraries would be good for me to start reading up on for this type of script?
I am trying to condense the files into 1-8 Excel files in total, not 200+.
Thank you!
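A minimal sketch of one way to do this with pandas, assuming the known paths and sheet names are collected as a list of (path, sheet name) pairs (the file names and variable names here are hypothetical):
import pandas as pd

# Hypothetical list of (path, sheet name) pairs for the source workbooks.
sources = [
    ('data/a.xlsx', 'Sheet1'),
    ('data/b.xlsx', 'Totals'),
    # ... 200+ entries
]

# Read every listed sheet and stack them into one big frame.
frames = [pd.read_excel(path, sheet_name=sheet) for path, sheet in sources]
merged = pd.concat(frames, ignore_index=True)

# Write the combined data to a single output workbook.
merged.to_excel('merged.xlsx', index=False)
If the combined data exceeds Excel's limit of 1,048,576 rows per sheet, the same pattern can be run over chunks of the list to produce a handful of output files instead of one.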
For example, suppose there are a.xlsx, b.xlsx, c.xlsx.
Using os (import os) and the endswith method, you can collect all the .xlsx files (you will easily find how to do it).
Then read the .xlsx files in a loop with pandas and add each one to a new ExcelWriter, like below.
e.g.
import os
import pandas as pd

# Create a Pandas Excel writer using XlsxWriter as the engine.
with pd.ExcelWriter('goal.xlsx', engine='xlsxwriter') as writer:
    sheet_number = 1
    for filename in os.listdir('.'):
        if filename.endswith('.xlsx') and filename != 'goal.xlsx':
            df = pd.read_excel(filename)
            # Write each dataframe to a different worksheet.
            df.to_excel(writer, sheet_name='Sheet{}'.format(sheet_number))
            sheet_number += 1
Go to the directory where all the CSV files are located (e.g. cd C:\Temp).
Click on the folder's address bar and type "cmd"; this will open a command prompt at the file location.
Type "copy *.csv combine.csv"
(Replace "csv" with the type of Excel file you have; however, this will probably only work well with CSV files.)
I am trying to open all Excel (.xlsx) files in one folder and add three sheets (named M1, M2 and M3) to each, then save them back into the existing Excel files (not mandatory). Can you please help? Thanks.
from openpyxl import load_workbook
wb2 = load_workbook(r'C:\Users\alex\mob\830.xlsx')
wb2.create_sheet('M1')
wb2.create_sheet('M2')
wb2.create_sheet('M3')
wb2.save(r'C:\Users\alex\mob\830.xlsx')
This works for each Excel file, but I want to iterate/loop/do it for all files in one folder. All files are .xlsx
You can use glob to list all files ending with specific extensions in a given directory:
import glob
from openpyxl import load_workbook
files = glob.glob("/some/random/location/*.xlsx")
for file in files:
    wb2 = load_workbook(file)
    wb2.create_sheet('M1')
    wb2.create_sheet('M2')
    wb2.create_sheet('M3')
    wb2.save(file)
What glob.glob() does is return a list of all files matching a specific pattern. In the current pattern, you are looking for all files with the .xlsx extension.
Note that this only looks at the extension, not the contents of the file, so if you have a simple test.txt file and rename it to test.xlsx, your program will likely crash.
This answered my question! Thank you very much.
In order for my computer to read the folder, I changed the directory to:
files = glob.glob(r'C:\Users\alex\mob\*.xlsx')
I have a bunch of files in a folder; it contains various file types like xlsx, pdf, csv, etc. What I am trying to do is export the PDF file names to an Excel sheet. How can it be done using pandas? Please help.
I used for file in glob.glob("*"): print(file)
I got the output; what I want is to export that output to an Excel sheet.
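A minimal sketch of one way to do this with glob and pandas, assuming the PDFs sit in the current folder (the output file name here is just an example):
import glob
import pandas as pd

# Collect only the PDF file names in the current folder.
pdf_files = glob.glob('*.pdf')

# Put them in a single-column dataframe and export it to Excel.
df = pd.DataFrame({'pdf_file_name': pdf_files})
df.to_excel('pdf_file_names.xlsx', index=False)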
I (a noob) am currently trying to read a directory of .xlsm files into a pandas dataframe, with the intention of merging them all together into one big file. I've done similar tasks in the past with .csv files and had no problems, but this has me at a loss.
I'm currently running this:
import pandas as pd
import glob
import openpyxl
df = [pd.read_excel(filename,engine="openpyxl") for filename in glob.glob(r'\\data\Designer\BI_Development\BI_2022_Objective\BIDataLake\MTT\Automation\TimeTrackingSheets_Automation\TimeTrackingSheets_Automation\TM_TimeTrackingSheets\*.xlsm')]
This solution has worked for me in the past, but here, when I run the above code, I get the following error:
zipfile.BadZipFile: File is not a zip file
Which is confusing me, because the file that I'm trying to access is not a zip file. Granted, there is a zip file with that same name in the same directory, but when I rename the file I'm referencing in my program to distinguish it from the zip file, I get the same error.
Anyone have any ideas? I've lurked for a long time and this is my first question, so apologies if it's not formatted in the proper way. Happy to provide more information as necessary. Thank you in advance!
UPDATE
This was fixed by excluding hidden files in the script, something I was unaware was happening.
import os

path = r'\\data\Designer\BI_Development\BI_2022_Objective\BIDataLake\MTT\Automation\TimeTrackingSheets_Automation\TimeTrackingSheets_Automation\TM_TimeTrackingSheets'

# Read all the .xlsm files, skipping hidden/temporary files whose names start with "~"
filenames = glob.glob(path + r"\[!~]*.xlsm")
# print('File names:', filenames)

# Empty data frame for the new output Excel file with the merged Excel files
outputxlsx = pd.DataFrame()

# Iterate over all the Excel files
for file in filenames:
    # Read the BW_TimeSheet sheet from each workbook and flatten it into one dataframe
    df = pd.concat(pd.read_excel(file, sheet_name=["BW_TimeSheet"]), ignore_index=True, sort=False)
    # Record which workbook each row came from
    df['Username'] = os.path.basename(file)
    # Append the data of this Excel file to the output
    outputxlsx = outputxlsx.append(df, ignore_index=True)

print('Final Excel sheet now generated at the same location:')
outputxlsx.to_excel(path + "/Output.xlsx", index=False)
Thanks everyone for your help!
Please remove the encryption from the file.
engine="openpyxl"
does not support reading encrypted files.
I refer to this issue.
This problem is related to Excel and openpyxl. The best workaround is reading and writing CSV instead.
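If the data can be re-saved as CSV from Excel itself (since openpyxl cannot open the encrypted workbook), the rest of the pipeline can stay in pandas. A minimal sketch, with hypothetical file names:
import pandas as pd

# Hypothetical CSV exported manually from the encrypted workbook.
df = pd.read_csv('exported.csv')

# Write it back out as a plain, unencrypted .xlsx copy.
df.to_excel('unencrypted_copy.xlsx', index=False)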
I'm using Python 3.7.
I have to download an Excel file (.xls) that has a unique filename every time I download it into a specific downloads folder location.
With Python and pandas, I then have to open the Excel file and read/convert it to a dataframe.
I want to automate the process, but I'm having trouble telling Python to get the full name of the XLS file as a variable, which will then be used by pandas:
# add dependencies and set location for downloads folder
import os
import glob
import pandas as pd
download_dir = '/Users/Aaron/Downloads/'
# change working directory to download directory
os.chdir(download_dir)
# get filename of excel file to read into pandas
excel_files = glob.glob('*.xls')
blah = str(excel_files)
blah
So then for example, the output for "blah" is:
"['63676532355861.xls']"
I have also tried just using "blah = print(excel_files)" for the above block, instead of the "str" method, and assigning that to a variable, which still doesn't work.
And then the rest of the process would do the following:
# open excel (XLS) file with unknown filename in pandas as a dataframe
data_df = pd.read_excel('WHATEVER.xls', sheet_name=None)
And then after I convert it to a data frame, I want to DELETE the excel file.
So far, I have spent a lot of time reading about fnames, io, open, os.path, and other libraries.
I still don't know how to get the name of the unknown .XLS file into a variable, and then later deleting that file.
Any suggestions would be greatly appreciated.
This code finds an .xls file in your specified path, reads it, and then deletes it. If your directory contains more than one .xls file, it reads the last one; you can perform whatever operation you want if you find more than one .xls file.
import os
import pandas as pd

for filename in os.listdir(os.getcwd()):
    if filename.endswith(".xls"):
        print(filename)
        # Do your operation, e.g. read the file into a dataframe
        data_df = pd.read_excel(filename, sheet_name=None)
        # Delete the file once it has been read
        os.remove(filename)
Check this,
lst = os.listdir()
matching = [s for s in lst if '.xls' in s]
matching will contain the list of all Excel files.
Since you have only one Excel file, you can save it in a variable like file_name = matching[0]
I have essentially the converse of the problem answered in Python - Win32com - Open Workbook & Create a New Excel File for Each Tab. I need to iterate recursively through a set of folders and copy the single tab containing data in a bunch of individual xls files into a single target xls, renaming the tabs appropriately as I go.
I have no formatting or formulae to copy, so nothing fancy needed here. Just first sheet in the individual xls files appended into the target xls.
Thanks in advance for any ideas/snippets.
Suppose there is a folder containing two files and a folder:
xls (folder containing the Excel files)
all.xlsx (the aggregated Excel file that you are going to copy values to)
copy.py (Python script to do the work)
For each Excel file in the xls folder, only cells A1:C3 in Sheet1 contain values. The Python script will copy the values in cells A1:C3 of each individual Excel file into the aggregated Excel file in a separate sheet and rename that sheet.
import os
from xlwings import Workbook, Sheet, Range
#Get the full paths of the excel files
full_paths = [os.path.abspath(os.path.join('xls', filename)) for filename in os.listdir('xls')]
#Full path of the aggregated excel file
aggregated_path = os.path.abspath(os.path.join(os.path.dirname(__file__), 'all.xlsx'))
for number, path in enumerate(full_paths):
    #Open the individual excel file
    wb = Workbook(path)
    #Get the values from the individual file
    data = Range('A1').table.value
    wb.close()
    #Open the aggregated excel file
    wb = Workbook(aggregated_path)
    #Put the values into the aggregated excel file
    Range('Sheet' + str(number+1), 'A1').value = data
    #Rename the sheet
    Sheet('Sheet' + str(number+1)).name = str(number+1)
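Note that the snippet above uses the older xlwings API. Since there is no formatting or formulae to preserve, the same result can also be reached with pandas alone. A minimal sketch, assuming the source workbooks sit under a hypothetical source_dir, only their first sheet matters, and the xlrd package is installed for reading .xls files:
import glob
import os
import pandas as pd

# Hypothetical folder tree holding the individual .xls files.
source_dir = 'xls'

with pd.ExcelWriter('all.xlsx') as writer:
    # Walk the folder recursively and pick up every .xls workbook.
    for path in glob.glob(os.path.join(source_dir, '**', '*.xls'), recursive=True):
        # Read only the first sheet of each workbook.
        df = pd.read_excel(path, sheet_name=0)
        # Use the source file name (truncated to Excel's 31-character limit) as the tab name.
        sheet_name = os.path.splitext(os.path.basename(path))[0][:31]
        df.to_excel(writer, sheet_name=sheet_name, index=False)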