Python Iterating program

I am a beginner programmer working on my first project. I am trying to create a script that unzips two files and extracts a folder containing .csv files to a temp directory. I then want to import those .csv files and format them with the XlsxWriter library. My code handles the first part: it unzips all the .csv files correctly.
I need some pointers on how to iterate over all the .csv files in the temp folder and copy the data from each one into an Excel spreadsheet. Note that each .csv file has only one row with 5 columns of data. Here is what I have:
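import glob
import os
import tempfile
import zipfile

tempDir = tempfile.mkdtemp()  # assumption: tempDir was defined like this elsewhere
# Extract each outer .zip into the temp directory
for zfiles in glob.glob("*.zip"):
    with zipfile.ZipFile(zfiles, 'r') as myS:
        myS.extractall(tempDir)
os.chdir(tempDir)  # only after all outer archives have been extracted
for z in glob.glob("*.zip"):
    with zipfile.ZipFile(z, 'r') as mySecondS:
        mySecondS.extractall()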
Thank you!
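A minimal sketch of the CSV-to-Excel step, assuming the extracted .csv files sit directly in tempDir and that XlsxWriter is installed. The helper names collect_csv_rows and write_rows_to_xlsx are hypothetical, and putting the source file name in column A is an addition of mine, not something the question asks for:

```python
import csv
import glob
import os


def collect_csv_rows(folder):
    """Return (file name, row) pairs for every .csv file in folder, sorted by name."""
    rows = []
    for path in sorted(glob.glob(os.path.join(folder, "*.csv"))):
        with open(path, newline="") as f:
            for row in csv.reader(f):
                if row:  # skip blank lines
                    rows.append((os.path.basename(path), row))
    return rows


def write_rows_to_xlsx(rows, out_path):
    """Write one worksheet row per CSV row: file name in column A, data from column B on."""
    # Imported here so collect_csv_rows can be used without XlsxWriter installed.
    import xlsxwriter

    # strings_to_numbers stores numeric-looking text such as "12" as real numbers.
    workbook = xlsxwriter.Workbook(out_path, {"strings_to_numbers": True})
    worksheet = workbook.add_worksheet()
    for row_idx, (name, values) in enumerate(rows):
        worksheet.write(row_idx, 0, name)
        worksheet.write_row(row_idx, 1, values)  # values fill consecutive columns
    workbook.close()
```

After the extraction loops, calling write_rows_to_xlsx(collect_csv_rows(tempDir), "combined.xlsx") would give one spreadsheet row per .csv file. Because the script has changed into tempDir, pass a full path for the output if it should be saved somewhere else.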

Related

Merge 200+ Excel Files into 1

I have a folder with 200+ Excel files. I have the path and sheet names for each file in the folder. Is it possible to merge all of these files into one or a few large Excel files with Python? If so, which libraries should I start reading up on for this type of script?
I am trying to condense the files into 1-8 Excel files in total, not 200+.
Thank you!
For example, suppose there are a.xlsx, b.xlsx, c.xlsx.
Using os (import os) and the str.endswith method, you can collect all the .xlsx files in the folder.
Then read each file in a loop with pandas and write it to its own sheet through a single ExcelWriter, for example:
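import os
import pandas as pd

# Collect every .xlsx file in the current folder (excluding the output file)
files = [f for f in os.listdir('.') if f.endswith('.xlsx') and f != 'goal.xlsx']
# Create a Pandas Excel writer using XlsxWriter as the engine;
# the with-block saves and closes the file when it ends.
with pd.ExcelWriter('goal.xlsx', engine='xlsxwriter') as writer:
    for sheet_number, excel_file_path in enumerate(files, start=1):
        df = pd.read_excel(excel_file_path)
        # Write each dataframe to a different worksheet.
        df.to_excel(writer, sheet_name='Sheet{}'.format(sheet_number))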
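The loop above writes every file to a separate sheet of one workbook. Since the asker wants 1-8 workbooks, the file list can first be split into groups. A minimal sketch of that planning step; plan_workbooks is a hypothetical helper, and the name cleanup reflects Excel's 31-character limit on sheet names and its ban on the characters [ ] : * ? / \ in them:

```python
import math
import os
import re


def plan_workbooks(paths, n_workbooks):
    """Split paths into at most n_workbooks groups of near-equal size.

    Each group is a dict mapping a worksheet name to a source file path.
    Sheet names are cleaned, truncated to 31 characters and de-duplicated.
    """
    paths = sorted(paths)
    if not paths:
        return []
    size = math.ceil(len(paths) / n_workbooks)
    plans = []
    for start in range(0, len(paths), size):
        sheets, used = {}, set()
        for path in paths[start:start + size]:
            stem = os.path.splitext(os.path.basename(path))[0]
            base = re.sub(r"[\[\]:*?/\\]", "_", stem)[:31]
            name, n = base, 1
            while name.lower() in used:  # Excel compares sheet names case-insensitively
                suffix = "_{}".format(n)
                name = base[:31 - len(suffix)] + suffix
                n += 1
            used.add(name.lower())
            sheets[name] = path
        plans.append(sheets)
    return plans
```

Each returned dict could then be written to its own workbook (merged_1.xlsx, merged_2.xlsx, ...) with the same ExcelWriter pattern, passing the dict key as sheet_name. With 200 files and n_workbooks=8, this gives 8 workbooks of 25 sheets each.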
Go to the directory where all the .csv files are located (e.g. cd C:\Temp).
Alternatively, click in the folder's address bar in File Explorer and type "cmd"; this opens a command prompt at that location.
Type "copy *.csv combine.csv".
(This only works for plain-text files such as CSV: binary .xls/.xlsx workbooks cannot be combined by concatenating them. Note also that each file's header row is copied along with its data.)
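For CSV files, the same concatenation can be done in Python while keeping only the first file's header row. A minimal sketch, assuming every .csv in the folder shares the same header. combine_csvs is a hypothetical name, and the output file is excluded from the inputs so the script can be re-run safely:

```python
import csv
import glob
import os


def combine_csvs(folder, out_name="combine.csv", has_header=True):
    """Concatenate every .csv in folder into out_name, keeping the first header only."""
    out_path = os.path.join(folder, out_name)
    inputs = sorted(p for p in glob.glob(os.path.join(folder, "*.csv"))
                    if os.path.abspath(p) != os.path.abspath(out_path))
    with open(out_path, "w", newline="") as out:
        writer = csv.writer(out)
        for i, path in enumerate(inputs):
            with open(path, newline="") as f:
                reader = csv.reader(f)
                if has_header and i > 0:
                    next(reader, None)  # skip the repeated header line
                writer.writerows(reader)
    return out_path
```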

Trying to read a directory of .xlsm files in pandas

I (a noob) am currently trying to read a directory of .xlsm files into a pandas dataframe, with the intention of merging them all together into one big file. I've done similar tasks in the past with .csv files and had no problems, but this has me at a loss.
I'm currently running this:
import pandas as pd
import glob
import openpyxl
df = [pd.read_excel(filename,engine="openpyxl") for filename in glob.glob(r'\\data\Designer\BI_Development\BI_2022_Objective\BIDataLake\MTT\Automation\TimeTrackingSheets_Automation\TimeTrackingSheets_Automation\TM_TimeTrackingSheets\*.xlsm')]
This solution has worked for me in the past. But here, when I run the above code, I get the following error:
zipfile.BadZipFile: File is not a zip file
Which is confusing me, because the file that I'm trying to access is not a zip file. Granted, there is a zip file with that same name in the same directory, but when I rename the file I'm referencing in my program to distinguish it from the zip file, I get the same error.
Anyone have any ideas? I've lurked for a long time and this is my first question, so apologies if it's not formatted in the proper way. Happy to provide more information as necessary. Thank you in advance!
UPDATE
This was fixed by excluding hidden files in the script: the glob was also picking up Excel's hidden "~$" lock files, which I had not realized.
import glob
import os
import pandas as pd

path = r'\\data\Designer\BI_Development\BI_2022_Objective\BIDataLake\MTT\Automation\TimeTrackingSheets_Automation\TimeTrackingSheets_Automation\TM_TimeTrackingSheets'
# read all the files with extension .xlsm, skipping Excel's hidden "~$" lock files
filenames = glob.glob(os.path.join(path, "[!~]*.xlsm"))
# print('File names:', filenames)
# list collecting one DataFrame per excel file
frames = []
# for loop to iterate all excel files
for file in filenames:
    # read_excel with a list of sheet names returns a dict of DataFrames,
    # which pd.concat stacks into a single DataFrame
    df = pd.concat(pd.read_excel(file, ["BW_TimeSheet"]), ignore_index=True, sort=False)
    df['Username'] = os.path.basename(file)
    frames.append(df)
# combine the data of all excel files in one step
# (DataFrame.append was removed in pandas 2.0)
outputxlsx = pd.concat(frames, ignore_index=True)
outputxlsx.to_excel(os.path.join(path, "Output.xlsx"), index=False)
print('Final Excel sheet now generated at the same location:')
Thanks everyone for your help!
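Please remove the encryption (password protection) from the file.
engine="openpyxl"
does not support reading encrypted files. An encrypted workbook is not stored as a zip package, which is why zipfile.BadZipFile is raised.
I am referring to this issue.
The problem lies with Excel and openpyxl; the most reliable workaround is to read and write CSV files instead.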
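The update's fix works because Excel keeps a hidden lock file, named "~$" followed by the workbook name, next to any workbook that is open, and that file also matches *.xlsm. Since .xlsx and .xlsm workbooks are zip packages internally, zipfile.is_zipfile can show up front which files would trigger zipfile.BadZipFile (lock files, encrypted workbooks). A minimal sketch; find_unreadable_workbooks is a hypothetical name:

```python
import glob
import os
import zipfile


def find_unreadable_workbooks(folder, pattern="*.xlsm"):
    """Return names of files matching pattern that are not zip archives.

    openpyxl raises zipfile.BadZipFile for such files; typical culprits are
    Excel's "~$" lock files and password-encrypted workbooks.
    """
    bad = []
    for path in sorted(glob.glob(os.path.join(folder, pattern))):
        if not zipfile.is_zipfile(path):
            bad.append(os.path.basename(path))
    return bad
```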

How to combine excel files with different folders

I am new to Python and have been given the task of combining Excel files. I have 100 folders; each folder has 2 subfolders, and each subfolder has 24 Excel files. I need to find the maximum and minimum values of each of the 24 files and concatenate those values with the parent Excel file (this has to be done for all 24 files). Then I have to concatenate all 24 files and write them to the first column of an Excel file. This should be repeated for all 100 folders, so that in the end I get a single Excel file with 100 columns.
Presently I am doing this manually for every file, and the overwriting is becoming complicated and time-consuming. Could someone help me get away from this manual method?
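import pandas as pd

data12 = pd.read_excel(r'C:\Users\Videos\1.xlsx')
# max(data12) would compare column labels; DataFrame.max()/min() work per column.
# to_frame().T turns each result into a one-row DataFrame.
A = data12.max().to_frame().T
C = data12.min().to_frame().T
frame_data = [data12, A, C]
result = pd.concat(frame_data, ignore_index=True)  # max and min rows appended below the data
result.to_excel("output1.xlsx", sheet_name='modify_data', index=False)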
You can use Python's glob module to go through all the files in all the folders. Just pass the path of the master folder with a "**" pattern and recursive=True, then use a single loop to read the files one by one.
Link for reference: Python glob multiple filetypes
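A minimal sketch of that recursive glob loop, assuming each file holds numeric data. It builds one summary row per file (folder, file, overall max, overall min) rather than the exact 100-column layout, which the question leaves underspecified. summarize_folder_tree is a hypothetical name, and the reader parameter is my addition so the same code works with pd.read_csv:

```python
import glob
import os

import pandas as pd


def summarize_folder_tree(master_folder, pattern="*.xlsx", reader=pd.read_excel):
    """Return one row per file under master_folder with its overall max and min.

    Non-numeric columns are ignored when computing the max and min.
    """
    records = []
    files = glob.glob(os.path.join(master_folder, "**", pattern), recursive=True)
    for path in sorted(files):
        numeric = reader(path).select_dtypes("number")
        records.append({
            "folder": os.path.relpath(os.path.dirname(path), master_folder),
            "file": os.path.basename(path),
            "max": numeric.max().max(),
            "min": numeric.min().min(),
        })
    return pd.DataFrame(records)
```

The result can be saved with to_excel. To get one column per folder, something like summary.pivot(index="file", columns="folder", values="max") would reshape it, assuming the same file names recur across folders.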

Converting .xls to .csv before recombining multiple files into a .xls

I am working on a web scraper tool that downloads Excel files from a website. Unfortunately, those .xls files are actually just renamed .csv files, which prevents me from simply combining the .xls files. Instead, I need to convert them all to .csv, then use pyexcel's pyexcel.merge_csv_to_a_book(filelist, outfilename='merged.xls') function to create an Excel book from these .csv files.
Here is what I tried:
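import glob
import os
import shutil
import pyexcel

def concatenate_excel_files():
    csv_file_list = []
    # The downloaded .xls reports are really CSV text, so copying under a .csv name is enough
    for indexer, file in enumerate(glob.glob(os.path.join(os.getcwd(), 'Reports', '*.xls'))):
        csv_name = str(indexer) + '.csv'
        shutil.copyfile(file, csv_name)
        # append grows the list; assigning to list[indexer] on an empty list raised IndexError
        csv_file_list.append(csv_name)
    pyexcel.merge_csv_to_a_book(csv_file_list, outfilename='merged.xls')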
This does not even get through converting the files to .csv; it fails with IndexError: list index out of range.
Any help rewriting this would be appreciated.
Answer by chfw:
For pyexcel to work properly, it needs to know the file extension, but in your case the extension is missing. It would also be more helpful if you showed the full stack trace.
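Building on that point, one option is to check each downloaded file's first bytes and give it the extension that matches its actual content before merging. A minimal sketch, assuming the only formats involved are legacy .xls (an OLE2 compound file), .xlsx (a zip package) and CSV text; real_extension and copy_with_real_extension are hypothetical helper names:

```python
import os
import shutil

# Leading bytes ("magic numbers") of the real Excel formats.
OLE2_MAGIC = b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1"  # legacy .xls (compound file)
ZIP_MAGIC = b"PK\x03\x04"                           # .xlsx / .xlsm (zip package)


def real_extension(path):
    """Guess the true extension of a downloaded 'Excel' file from its first bytes."""
    with open(path, "rb") as f:
        head = f.read(8)
    if head.startswith(OLE2_MAGIC):
        return ".xls"
    if head.startswith(ZIP_MAGIC):
        return ".xlsx"
    return ".csv"  # assumption: anything else is plain-text CSV, as in the question


def copy_with_real_extension(path, dest_folder):
    """Copy path into dest_folder, renamed so its extension matches its content."""
    stem = os.path.splitext(os.path.basename(path))[0]
    dest = os.path.join(dest_folder, stem + real_extension(path))
    shutil.copyfile(path, dest)
    return dest
```

The returned paths ending in .csv could then be collected in a list and passed to pyexcel.merge_csv_to_a_book as in the question.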

Renaming and Saving Excel Files With Python

I have a fairly simple task, but I haven't done much with Excel in Python and I'm not sure how to go about it.
What I need to do:
Look at many Excel files within subfolders, rename them according to information within each file, and store them all in one folder somewhere else.
The data is structured like this:
Main Folder
    Subfolder1
        File1
        File2
        File3
        ...
There are about a hundred subfolders, with several files in each.
From here, I want to pull the company name, part number, and date from within the file and use those to rename the excel file. Not sure how to rename the file.
Then save it somewhere else. I'm having trouble finding all these functions, any advice?
Check the os and os.path modules for listing folder contents (walk, listdir) and working with path names (abspath, basename, etc.).
Also, shutil has some interesting functions for copying stuff. Check out copyfile and specify the dst parameter based on the data you read from the excel file.
This page can help you getting at the Excel data: http://www.python-excel.org/
You probably want some high-level code like this:
import os
import shutil

# MAIN_FOLDER and NEW_MAIN_FOLDER are the source and destination folders
for subfolder_name in os.listdir(MAIN_FOLDER):
    # exercise left to reader: filter out non-folders
    subfolder_path = os.path.join(MAIN_FOLDER, subfolder_name)
    # copyfile fails if the destination folder does not exist yet
    os.makedirs(os.path.join(NEW_MAIN_FOLDER, subfolder_name), exist_ok=True)
    for excel_file_name in os.listdir(subfolder_path):
        # exercise left to reader: filter out non-excel-files
        excel_file_path = os.path.join(subfolder_path, excel_file_name)
        new_excel_file_name = extract_filename_from_excel_file(excel_file_path)
        new_excel_file_path = os.path.join(NEW_MAIN_FOLDER, subfolder_name,
                                           new_excel_file_name)
        shutil.copyfile(excel_file_path, new_excel_file_path)
You'll have to provide extract_filename_from_excel_file yourself, using the xlrd module from the site I mentioned (note that xlrd 2.0 and later read only legacy .xls files; for .xlsx files use openpyxl).
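A minimal sketch of extract_filename_from_excel_file, using openpyxl instead of xlrd so it also handles .xlsx files. The cell addresses B1, B2 and B3 for company name, part number and date are hypothetical, since the question does not say where those values live. build_filename and unique_path are hypothetical helpers: the first strips characters Windows forbids in file names, the second avoids overwriting when files from different subfolders end up with the same name in one folder:

```python
import datetime
import os
import re

# Characters that Windows does not allow in file names.
INVALID_CHARS = r'[<>:"/\\|?*]'


def build_filename(company, part_number, date, ext=".xlsx"):
    """Build 'Company_Part_YYYY-MM-DD.xlsx' from values read out of a workbook."""
    if isinstance(date, datetime.date):  # also covers datetime.datetime
        date = date.strftime("%Y-%m-%d")
    parts = [re.sub(INVALID_CHARS, "-", str(p)).strip() for p in (company, part_number, date)]
    return "_".join(parts) + ext


def extract_filename_from_excel_file(path, cells=("B1", "B2", "B3")):
    """Read company, part number and date from (hypothetical) cells of the active sheet."""
    import openpyxl  # only needed when reading real .xlsx files

    wb = openpyxl.load_workbook(path, data_only=True)
    ws = wb.active
    company, part_number, date = (ws[c].value for c in cells)
    return build_filename(company, part_number, date, os.path.splitext(path)[1])


def unique_path(folder, name):
    """Return folder/name, adding _1, _2, ... if that file already exists."""
    stem, ext = os.path.splitext(name)
    candidate, n = os.path.join(folder, name), 1
    while os.path.exists(candidate):
        candidate = os.path.join(folder, "{}_{}{}".format(stem, n, ext))
        n += 1
    return candidate
```

To store everything in a single folder, as the question asks, the destination in the loop could be unique_path(NEW_MAIN_FOLDER, new_excel_file_name) instead of a per-subfolder path.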
