Merge all excel files into one file with multiple sheets - python

I would like some help.
I have multiple Excel files, and each file has only one sheet.
I would like to combine all of them into a single file with multiple sheets, one sheet per Excel file, keeping the original sheet names.
This is what I have so far:
import pandas as pd
from glob import glob
import os
excelWriter = pd.ExcelWriter("multiple_sheets.xlsx",engine='xlsxwriter')
for file in glob('*.xlsx'):
    df = pd.read_excel(file)
    df.to_excel(excelWriter,sheet_name=file,index=False)
excelWriter.save()
All the Excel files look like this:
https://iili.io/HfiJRHl.png
(Sorry, I could not upload the image here, so I pasted the link.)
All the Excel files have exactly the same columns and rows and just one sheet; the only difference is the sheet name.
Thanks in advance.

import pandas as pd
import os
output_excel = r'/home/bera/Desktop/all_excels.xlsx'
# List all Excel files in the folder
excel_folder = r'/home/bera/Desktop/GIStest/excelfiles/'
excel_files = [os.path.join(root, file) for root, folder, files in os.walk(excel_folder) for file in files if file.endswith(".xlsx")]
with pd.ExcelWriter(output_excel) as writer:
    for excel in excel_files: # For each Excel file
        sheet_name = pd.ExcelFile(excel).sheet_names[0] # Find the sheet name
        df = pd.read_excel(excel) # Create a dataframe
        df.to_excel(writer, sheet_name=sheet_name, index=False) # Write it to a sheet in the output file
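One thing the loop above does not handle is two workbooks that happen to share a sheet name. Excel limits sheet names to 31 characters and compares them case-insensitively, so a small helper could make each name unique before writing. This is only a sketch: `unique_sheet_name` is a hypothetical helper (not part of pandas), and the sample names are made up.

```python
def unique_sheet_name(name, used, max_len=31):
    """Return `name` cut to Excel's 31-character limit, with a numeric
    suffix added if the name is already taken. `used` is updated in place."""
    candidate = name[:max_len]
    counter = 2
    # Excel compares sheet names case-insensitively, so track lowercase names
    while candidate.lower() in used:
        suffix = f"_{counter}"
        candidate = name[:max_len - len(suffix)] + suffix
        counter += 1
    used.add(candidate.lower())
    return candidate

used = set()
names = [unique_sheet_name(n, used) for n in ["Data", "Data", "data", "Other"]]
print(names)  # ['Data', 'Data_2', 'data_3', 'Other']
```

Inside the loop, this would be called as `sheet_name = unique_sheet_name(sheet_name, used)` just before `df.to_excel(...)`.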

Related

How to use Python to extract Excel Sheet data

I have many folders, and each folder contains one Excel file with a name like 1Aug2022, 2Aug2022...
I want Python to read through all the folders and open only the Excel file with a name like 19AUG2022. That file has many sheets, with names like IP-1*****, IP-2*****, IP-3*****. I then want to go to the sheet named IP-2***** and extract two columns of data.
How can I do this in Python?
You can use the pandas package: https://pandas.pydata.org/
For example:
import pandas as pd
your_excel_path = "your/path/to/the/excel/file"
data = pd.read_excel(your_excel_path, sheet_name = "19AUG2022") # Read one specific sheet by name
data = pd.read_excel(your_excel_path, sheet_name = None) # Read all sheets; returns a dict of DataFrames keyed by sheet name
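As Fergus said, use pandas.
Code to search all directories might look like this:
import os
import pandas as pd
directory_to_search = "./"
sheet_name = "IP-2*****" # Must be the exact sheet name; pandas does not expand wildcards
for root, dirs, files in os.walk(directory_to_search):
    for file in files:
        # file includes its extension, so compare the name without it (ignoring case)
        if os.path.splitext(file)[0].upper() == "19AUG2022":
            df = pd.read_excel(io=os.path.join(root, file), sheet_name=sheet_name)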
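If only the start of the sheet name is known (the question writes it as IP-2 followed by asterisks), one option is to list the workbook's sheet names first and match them against a pattern. A minimal sketch, assuming shell-style wildcards are what is wanted; `matching_sheets` and the sample sheet names are hypothetical:

```python
from fnmatch import fnmatchcase

def matching_sheets(sheet_names, pattern):
    """Return the sheet names that match a shell-style pattern (case-sensitive)."""
    return [name for name in sheet_names if fnmatchcase(name, pattern)]

# Hypothetical names, standing in for pd.ExcelFile(path).sheet_names
sheet_names = ["IP-1-north", "IP-2-north", "IP-2-south", "IP-3-east"]
selected = matching_sheets(sheet_names, "IP-2*")
print(selected)  # ['IP-2-north', 'IP-2-south']
```

In practice the names would come from `pd.ExcelFile(path).sheet_names`, and each match could then be passed to `pd.read_excel(path, sheet_name=name, usecols=...)` to pull just the two columns.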

How to convert multiple sheets in an excel workbook to csv files in python

I have an Excel workbook that contains 8 sheets with different alphabetical names. I want to create a CSV file for each of these sheets and store them in a folder using Python. Currently I can do this for a single sheet from the workbook, but I am struggling to build a workflow that converts multiple sheets and stores them as CSV files in a single folder. Here is my code so far:
import pandas as pd
my_csv=r'C:\Users\C\arcgis\New\NO.csv'
data_xls = pd.read_excel(r"C:\Users\C\Desktop\plots_data1.xlsx", "NO", index_col=0)
p=data_xls.to_csv(my_csv, encoding='utf-8')
If you want to get all of the sheets, you can pass sheet_name=None to the read_excel() call. This will then return a dictionary containing each sheet name as a key, with the value being the dataframe. With this you can iterate over each and create separate CSV files.
The following example uses a base filename with the sheet name appended, e.g. output_sheet1.csv, output_sheet2.csv:
import pandas as pd
for sheet_name, df in pd.read_excel(r"input.xlsx", index_col=0, sheet_name=None).items():
    # index_col=0 turns the first column into the index, so keep the index when writing
    df.to_csv(f'output_{sheet_name}.csv', encoding='utf-8')
It assumes that all of your sheetnames are suitable for being used as filenames.
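If that assumption might not hold, the sheet name could be cleaned up before it goes into the filename. The following is a rough sketch rather than a complete rule set: `safe_filename` is a hypothetical helper, and the character list is the set Windows rejects in filenames.

```python
import re

def safe_filename(sheet_name, replacement="_"):
    """Replace characters that are not allowed in Windows filenames and
    strip trailing dots and spaces. Falls back to 'sheet' if nothing is left."""
    cleaned = re.sub(r'[<>:"/\\|?*\x00-\x1f]', replacement, sheet_name)
    cleaned = cleaned.strip(" .")
    return cleaned or "sheet"

print(safe_filename('A|B "draft"'))  # A_B _draft_
print(safe_filename("Totals. "))     # Totals
```

It would be used as `df.to_csv(f'output_{safe_filename(sheet_name)}.csv', ...)`. Two different sheet names can still map to the same filename after cleaning, which this sketch does not check for.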

Python/Pandas: Filter out files with specific keyword

I am splitting an xlsm file (with multiple sheets) into CSV files, with each sheet saved as a separate CSV file. I want to save only the sheets whose names contain the keyword "Robot" or "Auto". How can I do it? Currently it saves all sheets to CSV files. Here is the code I am using:
import pandas as pd
xl = pd.ExcelFile('Sample_File.xlsm')
for sheet in xl.sheet_names:
    df = pd.read_excel(xl,sheet_name=sheet)
    df.to_csv(f"{sheet}.csv",index=False)
Can you try this?
import pandas as pd
import re
xl = pd.ExcelFile('Sample_File.xlsm')
for sheet in xl.sheet_names:
    if re.search('Robot|Auto', sheet):
        df = pd.read_excel(xl,sheet_name=sheet)
        df.to_csv(f"{sheet}.csv",index=False)
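Note that `re.search('Robot|Auto', sheet)` is case-sensitive, so a sheet called "auto-run" would be skipped. If that matters, a plain substring check that ignores case is one alternative. This is a sketch only; `has_keyword` and the sample sheet names are made up for illustration.

```python
def has_keyword(sheet_name, keywords=("Robot", "Auto")):
    """Return True if any keyword occurs in the sheet name, ignoring case."""
    lowered = sheet_name.lower()
    return any(keyword.lower() in lowered for keyword in keywords)

# Hypothetical names, standing in for xl.sheet_names
sheet_names = ["Robot_1", "auto-run", "Manual", "Summary"]
kept = [name for name in sheet_names if has_keyword(name)]
print(kept)  # ['Robot_1', 'auto-run']
```

Because this is a substring match, a name like "Automobile" would also be kept; a stricter rule would need word boundaries.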

Python: how to make a loop to copy data from different Excel files into a new one in an iterative way with pandas

I need to copy data from different Excel files into a new one. I would like to tell the program to take all the files in a specific folder and copy two columns from each of them into a new Excel file. I tried a for loop, but it overwrites the data coming from the different files, and I get a new Excel file with just one sheet containing the data from the last file read by the program. Could you help me, please?
Here is my code:
import os.path
import pandas as pd
folder=r'C:\\Users\\PycharmProjects\\excelfile\\'
for fn in os.listdir(folder):
    fx = pd.read_excel(os.path.join(folder, fn), usecols='H,E')
    with pd.ExcelWriter('Output.xlsx') as writer:
        ws = os.path.splitext(fn)[0]
        fx.to_excel(writer, sheet_name=ws)
You should open the output file in append mode, like so (append mode needs the openpyxl engine, and Output.xlsx must already exist):
    with pd.ExcelWriter("Output.xlsx", engine='openpyxl', mode='a') as writer:
        ws = os.path.splitext(fn)[0]
        fx.to_excel(writer, sheet_name=ws)
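Another way to avoid the overwrite is to open the writer once, outside the loop, so every sheet goes into the same workbook and append mode is not needed. The sketch below assumes an Excel engine such as openpyxl is installed; `collect_columns` is a hypothetical helper, and the demo workbooks are made up so the example can run on its own.

```python
import os
import tempfile
import pandas as pd

def collect_columns(folder, output_path, usecols="E,H"):
    """Copy the chosen columns of every .xlsx file in `folder` into
    `output_path`, one sheet per input file. Returns the sheet names written."""
    output_abs = os.path.abspath(output_path)
    # List the inputs before the writer creates the output file,
    # and skip the output itself if it lives in the same folder
    inputs = [
        fn for fn in sorted(os.listdir(folder))
        if fn.endswith(".xlsx")
        and os.path.abspath(os.path.join(folder, fn)) != output_abs
    ]
    if not inputs:
        return []
    written = []
    # Open the writer once so earlier sheets are not overwritten
    with pd.ExcelWriter(output_path) as writer:
        for fn in inputs:
            frame = pd.read_excel(os.path.join(folder, fn), usecols=usecols)
            sheet_name = os.path.splitext(fn)[0]
            frame.to_excel(writer, sheet_name=sheet_name, index=False)
            written.append(sheet_name)
    return written

# Demo with two made-up workbooks in a temporary folder
with tempfile.TemporaryDirectory() as tmp:
    source = os.path.join(tmp, "input")
    os.makedirs(source)
    sample = pd.DataFrame([[1, 2, 3, 4, 5, 6, 7, 8]], columns=list("ABCDEFGH"))
    for name in ("a", "b"):
        sample.to_excel(os.path.join(source, name + ".xlsx"), index=False)
    sheet_names = collect_columns(source, os.path.join(tmp, "Output.xlsx"))
    print(sheet_names)
```

With the folder from the question, the call would be `collect_columns(folder, 'Output.xlsx')`.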

How can I extract the same excel sheet from multiple files?

I have a directory of similar Excel files and want to extract the first sheet from each file and save it as a .csv file. I currently have code that works to extract and save the sheet from an individual file:
import glob
import pandas as pd
f = glob.glob('filename.xlsx') # assume the path
for excel in f:
    out = excel.split('.')[0]+'.csv'
    df = pd.read_excel(excel) # if only the first sheet is needed.
    df.to_csv(out)
You can get all your files into a list using glob, then loop over them:
import glob
import os
import pandas as pd
files_to_be_read = glob.glob("*.xlsx") # Assumes the script runs in the folder where the Excel files are saved
for i in files_to_be_read:
    df_in = pd.read_excel(i) # Pass the path; pd.read_excel reads the first sheet by default
    df_in.to_csv(os.path.splitext(i)[0] + '.csv') # Save under the same name, but with a .csv extension
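When the files sit in folders whose names contain dots, splitting the path on '.' cuts it in the wrong place. One way to build the output name is to swap only the final extension with pathlib. A small sketch; `csv_path_for` is a hypothetical helper and the path is made up:

```python
from pathlib import Path

def csv_path_for(excel_path):
    """Return the same path with its final extension replaced by .csv."""
    return Path(excel_path).with_suffix(".csv")

example = "reports/2022.08/sales.xlsx"
print(example.split(".")[0])             # reports/2022
print(csv_path_for(example).as_posix())  # reports/2022.08/sales.csv
```

The result can be passed straight to `df.to_csv(...)`, which accepts path objects as well as strings.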
