I have two separate Excel files (xlsx format):
Excel 1 - Has 2 separate tabs.
Tab 1 has summary information linked to Tab 2, and
Tab 2 holds the data to be taken from Excel 2.
Excel 2 - Has the relevant info (which needs to be copied to Tab 2 of Excel 1).
Samples of the 2 files are shared at the link below:
https://drive.google.com/drive/folders/1inrofeT6v9P0ISEcmbswvpxMMCq5TaV0?usp=sharing
The file names are the same in both sets. Basically, I want to copy the information from Excel 2 and paste it into Excel 1 (which has a summary sheet that provides the summary information).
I tried the code below:
# importing openpyxl module
import openpyxl as xl
# opening the source excel file
filename ="D:\\1. Python Extracts\\KA-AVRB-Feb22-4.xlsx"
wb1 = xl.load_workbook(filename)
ws1 = wb1.worksheets[0]
# opening the destination excel file
filename1 ="D:\\2. Summary shees\\KA-AVRB-Feb22-4.xlsx"
wb2 = xl.load_workbook(filename1)
ws2 = wb2.worksheets[1]
# calculate total number of rows and columns in source excel file
mr = ws1.max_row
mc = ws1.max_column
# copying the cell values from source excel file to destination excel file
for i in range(1, mr + 1):
    for j in range(1, mc + 1):
        # reading cell value from source excel file
        c = ws1.cell(row=i, column=j)
        # writing the read value to destination excel file
        ws2.cell(row=i + 1, column=j).value = c.value
# saving the destination excel file
wb2.save(str(filename1))
The above code works with individual files. However, I have 2 sets of 140 Excel files (i.e. 140 summary workbooks and 140 workbooks holding the data), and I need to copy the data from each data file into its matching summary file as explained above.
I understand I can wrap this in a for loop, but after much trial I haven't been able to get it to work.
Help would be highly appreciated!
Keeping source files in a subfolder named sourceFiles, and summaries in a subfolder named summary, we can iterate over all source files and run your function over them to make the summaries.
import os
# importing openpyxl module
import openpyxl as xl

def makeSummary(filename):
    # opening the source excel file
    #filename ="D:\\1. Python Extracts\\KA-AVRB-Feb22-4.xlsx"
    wb1 = xl.load_workbook(filename)
    ws1 = wb1.worksheets[0]
    # opening the destination excel file (same file name, in the summary subfolder)
    filename1 = os.path.join("summary", os.path.basename(filename))
    wb2 = xl.load_workbook(filename1)
    ws2 = wb2.worksheets[1]
    # calculate total number of rows and columns in source excel file
    mr = ws1.max_row
    mc = ws1.max_column
    # copying the cell values from source excel file to destination excel file
    for i in range(1, mr + 1):
        for j in range(1, mc + 1):
            # reading cell value from source excel file
            c = ws1.cell(row=i, column=j)
            # writing the read value to destination excel file
            ws2.cell(row=i + 1, column=j).value = c.value
    # saving the destination excel file
    wb2.save(filename1)

# os.walk yields (folder, subfolders, file names); call the function once per workbook
for root, _, files in os.walk("sourceFiles", topdown=False):
    for name in files:
        if name.endswith(".xlsx"):
            makeSummary(os.path.join(root, name))
PS: I haven't run this code on your files, and I'm unsure about the path separators since I last used this ages ago, so please debug the paths if the iteration doesn't work. To see how Python's os.walk() works, refer to its documentation.
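Since the file names match between the two sets, one way to drive the loop is to build the (source, summary) pairs first and skip any file that has no counterpart. The following is only a sketch using pathlib: the helper name pair_files and the folder names in the usage note are assumptions for illustration, and the helper only compares names without opening any workbook.

```python
from pathlib import Path


def pair_files(src_dir, dst_dir, pattern="*.xlsx"):
    """Return (source, destination) path pairs for files present in both folders.

    pair_files is a hypothetical helper: it matches files by name only.
    """
    src_dir, dst_dir = Path(src_dir), Path(dst_dir)
    pairs = []
    for src in sorted(src_dir.glob(pattern)):
        # Excel creates "~$name.xlsx" lock files while a workbook is open; skip them
        if src.name.startswith("~$"):
            continue
        dst = dst_dir / src.name
        if dst.is_file():
            pairs.append((src, dst))
        else:
            print(f"No summary file for {src.name}, skipping")
    return pairs
```

Each pair returned by something like pair_files("sourceFiles", "summary") could then be handed to whatever function copies the cells, so a missing summary file shows up as a printed message rather than an exception halfway through the 140 files.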
Related
I am trying to copy and paste data from one Excel file to another using openpyxl. I found this script online, but when I run it, the data from the source file is copied over as I want, while the data already in the destination file becomes blank cells. Any help would be greatly appreciated.
#Copying Data from Source File and Pasting in Destination File
import openpyxl as xl
#Opening the source excel file
filename ="path_to_file"
wb1 = xl.load_workbook(filename)
ws1 = wb1.worksheets[0]
#Opening the destination excel file
filename1 ="path_to_file"
wb2 = xl.load_workbook(filename1)
ws2 = wb2.active
#Calculating total number of rows and columns in source excel file
mr = ws1.max_row
mc = ws1.max_column
#Copying the cell values from source excel file to destination excel file
for i in range(1, mr + 1):
    for j in range(1, mc + 1):
        #Reading cell value from source excel file
        c = ws1.cell(row=i, column=j)
        #Writing the read value to destination excel file
        ws2.cell(row=i, column=j).value = c.value
#Saving the destination excel file
wb2.save(str(filename1))
I have 40 Excel workbooks that are source files and 40 corresponding Excel workbooks that are the destination files. Every week I open the 40 source files, manually copy the data from a specific worksheet in each file, and paste it into the corresponding destination file. I want to automate this task with Python and openpyxl.
Source files:
Destination files:
So far, I am able to copy data from one excel workbook and paste it into another one but I don't know how to expand it to cover copying from multiple input files and pasting in multiple destination files.
import openpyxl
# opening the source excel file
wbo = openpyxl.load_workbook('ABC_Export.xlsx')
#attach the ranges to the sheet
wso = wbo["Report Data"]["A9":"B100000"]
# opening the destination excel file
wbd = openpyxl.load_workbook("ABC_2023.xlsm", keep_vba=True)
#attach the ranges to the sheet
wsd = wbd["Sheet1"]["A2":"B100000"]
#step1 : pair the rows
for row1, row2 in zip(wso, wsd):
    #within the row pair, pair the cells
    for cell1, cell2 in zip(row1, row2):
        cell2.value = cell1.value
#save document
wbd.save('ABC_2023.xlsm')
This is an example of what I want to copy from a source file:
and where to paste it into the corresponding destination file:
I don't think your code is working but maybe this can guide you a bit:
import os
from glob import glob
from openpyxl import load_workbook

def copy_data(src_file: str, dst_file: str) -> None:
    # open files
    ws_src = load_workbook(src_file)["Report Data"]
    wb_dst = load_workbook(dst_file, keep_vba=True)
    ws_dst = wb_dst["Sheet1"]
    # configuration (start rows taken from the ranges in the question)
    start_row_src = 9  # read from A9 downwards
    start_row_dst = 2  # write from A2 downwards
    rows2copy = 100000
    # copy data from src_file to dst_file
    input_offset = start_row_src - start_row_dst
    for i in range(start_row_dst, rows2copy):
        ws_dst[f"A{i}"].value = ws_src[f"A{i + input_offset}"].value
        ws_dst[f"B{i}"].value = ws_src[f"B{i + input_offset}"].value
    # save the modifications
    wb_dst.save(dst_file)

# files directories
src_dir_path = "your/source/files/directory"
dst_dir_path = "your/destination/files/directory"
# iterate over all excel files found in source path
workbooks = glob(f"{src_dir_path}/*.xlsx")
for src in workbooks:
    dst = dst_dir_path + '/' + os.path.basename(src).replace("_Report.", "_2023.")
    copy_data(src, dst)
The idea is to scan for all input files and then call the copy_data function for each one. You will have to tweak it a bit to your needs.
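Looping up to a fixed row such as 100000 touches far more cells than most sheets contain. A possible alternative, sketched here with in-memory workbooks so it runs without any files, is to let iter_rows stop at the last used row of the source. The function name copy_columns and its default arguments are assumptions, with the defaults taken from the ranges in the question (source from row 9, destination from row 2, columns A and B).

```python
from openpyxl import Workbook


def copy_columns(ws_src, ws_dst, src_start_row=9, dst_start_row=2, n_cols=2):
    """Copy the first n_cols columns from ws_src to ws_dst, values only.

    Hypothetical helper: reads from src_start_row down to the last used row
    of the source and writes from dst_start_row downwards.
    Returns the number of rows copied.
    """
    copied = 0
    rows = ws_src.iter_rows(min_row=src_start_row, max_col=n_cols, values_only=True)
    for offset, row in enumerate(rows):
        for col, value in enumerate(row, start=1):
            ws_dst.cell(row=dst_start_row + offset, column=col, value=value)
        copied += 1
    return copied


# Small demonstration with in-memory workbooks instead of real files
src_wb, dst_wb = Workbook(), Workbook()
src_ws, dst_ws = src_wb.active, dst_wb.active
src_ws["A9"], src_ws["B9"] = "x", 1
src_ws["A10"], src_ws["B10"] = "y", 2
n = copy_columns(src_ws, dst_ws)
print(n, dst_ws["A2"].value, dst_ws["B3"].value)
```

With real files, the two worksheets would come from load_workbook(...) as in the answer, and the destination workbook would still need to be saved afterwards.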
Dear community,
I was struggling with a piece of Python code that reads data from an Excel worksheet and then creates a new sheet with that data.
It's not just a copy of the file, because it lets you do something with the data along the way before saving it to a new file.
I was reading a file, storing the data in an intermediate list, and then trying to save it to the new Excel file.
It didn't work because the data types weren't compatible with each other, and I got stuck.
I saw the code below from Python Engineering by Michael Zippo, and it helped me.
# importing openpyxl module
import openpyxl as xl
# opening the source excel file
filename ="C:\\Users\\Admin\\Desktop\\trading.xlsx"
wb1 = xl.load_workbook(filename)
ws1 = wb1.worksheets[0]
# opening the destination excel file
filename1 ="C:\\Users\\Admin\\Desktop\\test.xlsx"
wb2 = xl.load_workbook(filename1)
ws2 = wb2.active
# calculate total number of rows and
# columns in source excel file
mr = ws1.max_row
mc = ws1.max_column
# copying the cell values from source
# excel file to destination excel file
for i in range(1, mr + 1):
    for j in range(1, mc + 1):
        # reading cell value from source excel file
        c = ws1.cell(row=i, column=j)
        # writing the read value to destination excel file
        ws2.cell(row=i, column=j).value = c.value
# saving the destination excel file
wb2.save(str(filename1))
After reading more of Michael Zippo's material (https://python.engineering/python-how-to-copy-data-from-one-excel-sheet-to-another/),
I found a way to improve the read-write for loop above:
from openpyxl import Workbook, load_workbook
wb1 = load_workbook('bank_statement.xlsx')
wb2 = Workbook()
sh1 = wb1.active
sh2 = wb2.active
for r in sh1.iter_rows():
    for c in r:
        sh2[c.coordinate] = c.value
wb2.save('bank_stat_improved.xlsx')
In the middle of the loop you can do something with the data, which makes this a very useful piece of code.
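As an illustration of doing something with the data inside the loop, the sketch below copies a sheet while rounding every float to two decimals. The rounding rule, the function names copy_with_transform and round_numbers, and the sample values are all assumptions chosen for the example; any other per-cell transformation could take their place.

```python
from openpyxl import Workbook


def copy_with_transform(sh_src, sh_dst, transform):
    """Copy every cell of sh_src to the same coordinate in sh_dst,
    passing each value through transform first (hypothetical helper)."""
    for row in sh_src.iter_rows():
        for cell in row:
            sh_dst[cell.coordinate] = transform(cell.value)


def round_numbers(value):
    # Leave text, integers, dates and empty cells untouched; round floats only
    return round(value, 2) if isinstance(value, float) else value


# In-memory demonstration; real code would use load_workbook(...) and wb.save(...)
wb_src, wb_dst = Workbook(), Workbook()
sh_src, sh_dst = wb_src.active, wb_dst.active
sh_src.append(["Description", "Amount"])
sh_src.append(["Coffee", 3.14159])
copy_with_transform(sh_src, sh_dst, round_numbers)
print(sh_dst["A2"].value, sh_dst["B2"].value)
```

Passing the transformation in as a function keeps the copy loop unchanged when the rule changes.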
I'm trying to create a script that runs through the Excel files in a folder and copies their contents into one workbook. The aim is to copy the contents of each file into different columns with a fixed spacing between them, i.e. columns A, D (A+3) and G (D+3). For my example I am running my code with 3 base datasets.
When I run the code, the final workbook ends up with the last Excel file copied 3 times across the specified columns, instead of the 3 unique files copied to the specified columns.
What I want: A B C
What I get: C C C
Code:
import os
import openpyxl
from openpyxl import Workbook, load_workbook
import string
for file in os.listdir(file_path):
    if file.endswith('.xlsx'):
        print(f'Loading file {file}...')
        wb = load_workbook(file_path+file)
        ws = wb.worksheets[0]
        wb1 = load_workbook(new_path+'data.xlsx')
        ws1 = wb1.active
        #calculate max rows and columns in source dataset
        mr = ws.max_row
        mc = ws.max_column
        m = [0,3,6]
        #copying data to new sheet
        for i in range(1,mr+1):
            for j in range(1,mc+1):
                for y in range(0,3):
                    #reading cell value from source
                    c = ws.cell(row = i, column = j)
                    #writing read value to destination
                    ws1.cell(row = i, column = j+int(m[y])).value = c.value
        wb1.save(new_path+'data.xlsx')
Thank you for your help.
Edit:
The data is all in the same format and looks like:https://ibb.co/TMStH9j Current output: https://ibb.co/dmcbSJ1 Desired output: https://ibb.co/C1nqKJv
You need to move the creation and saving of the new workbook out of the for loop so that it is not overwritten each time a new file is looped over.
Also you need a way to count how many files you have looped over, so that you can increment the columns where the new data is copied to in the new workbook. Please see below:
Edit:
To get your expected output, I also removed the inner-most for loop and m list to rather use a single variable to space the columns of each new excel data apart.
import os
import openpyxl
from openpyxl import Workbook, load_workbook
import string
# Create new workbook outside of for loop so that it is not overwritten each loop
wb1 = Workbook()
ws1 = wb1.active
# count variable so each loop increments the column where the data is posted
count = 0
# how many columns to space data apart
col_spacing = 2
for file in os.listdir(file_path):
    if file.endswith(".xlsx"):
        print(f"Loading file {file}...")
        wb = load_workbook(file_path + file)
        ws = wb.worksheets[0]
        # calculate max rows and columns in source dataset
        mr = ws.max_row
        mc = ws.max_column
        # copying data to new sheet
        for i in range(1, mr + 1):
            for j in range(1, mc + 1):
                # reading cell value from source
                c = ws.cell(row=i, column=j)
                # writing read value to destination
                ws1.cell(row=i, column=count + j + (count * col_spacing)).value = c.value
        # increment column count
        count += 1
# save new workbook after all files have been looped through
wb1.save(new_path + "data.xlsx")
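The column arithmetic can be checked on its own, without Excel. Assuming each file contributes width columns and the blocks should be separated by gap empty columns (both names are introduced here for illustration), the first column of file number n, counting from 0, is 1 + n * (width + gap). For single-column files and a gap of 2 this gives the same columns as the expression used in the code above.

```python
def start_column(file_index, width=1, gap=2):
    """1-based column where the block for file number file_index (0-based) starts."""
    return 1 + file_index * (width + gap)


def column_letter(index):
    """Convert a 1-based column index to its Excel letter (1 -> A, 27 -> AA)."""
    letters = ""
    while index > 0:
        index, remainder = divmod(index - 1, 26)
        letters = chr(ord("A") + remainder) + letters
    return letters


starts = [start_column(n) for n in range(3)]
print(starts, [column_letter(c) for c in starts])
```

The letter conversion is written out only to keep the sketch dependency-free; openpyxl.utils.get_column_letter does the same job.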
I'm trying to merge multiple files into one Excel file using openpyxl in Python.
I know there is a way to do this with pandas, but my files have a problem: there are always 2 empty rows at the beginning of each Excel file.
So to avoid that, I'm using openpyxl the old way:
just open all the files and copy the specific rows and columns into a new one.
For the first step, I figured out how to copy the specific rows and columns into the new xlsx file,
but I haven't found a way to add the next file (only the values, not the header) under the first one.
This is my code.
So far it just copies the first file (the header and the values).
import openpyxl as xl
from openpyxl import Workbook
import os
def find_xlsx_files():
    # the current path
    dir_path = os.path.dirname(os.path.abspath(__file__))
    # list to store files
    res = []
    # Iterate directory
    for file in os.listdir(dir_path):
        # check only xlsx files
        if file.endswith('.xlsx'):
            res.append(file)
    return res
wb1 = xl.load_workbook (find_xlsx_files()[0])
ws1 = wb1.worksheets [0]
# open target Excel file
wb2 = Workbook()
ws = wb2.active
ws.title = "Changed Sheet"
wb2.save(filename = 'sample_book.xlsx')
ws2 = wb2.active
# calculate the total rows and
# columns in the Excel source file
mr = ws1.max_row
mc = ws1.max_column
# copy cell values from source
# Excel file to target Excel file
for i in range(3, mr + 1):
    for j in range(2, mc + 1):
        # read cell value from Excel source file
        c = ws1.cell(row=i, column=j)
        # writing the read value to the target Excel file
        ws2.cell(row=i, column=j).value = c.value
# save target Excel file
wb2.save ( str ('sample_book.xlsx'))
What you are doing is creating a list of the Excel files in the script's directory and then opening only the first file, '[0]', in the list with the line:
wb1 = xl.load_workbook(find_xlsx_files()[0])
This will never attempt to access any other Excel file in the list. Generating the list inside the load_workbook call isn't a good idea either: you don't want to rebuild the list of available Excel files each time you process a file. The function find_xlsx_files() should be called once.
The easiest fix to your code is to get your list of Excel files and then iterate over that list for processing.
excel_files = find_xlsx_files()
for xl_file in excel_files:
    wb1 = xl.load_workbook(xl_file)
    ...
Also, it should not be necessary to save the workbook until you have finished writing all the data.
The function can be simplified using glob instead, if you prefer.
import glob
import os
from openpyxl import Workbook, load_workbook

dir_path = os.path.dirname(os.path.abspath(__file__))
# [!~] skips Excel's temporary "~$..." lock files; also skip the output workbook itself
excel_files = [f for f in glob.glob(dir_path + "/[!~]*.xlsx")
               if os.path.basename(f) != 'sample_book.xlsx']

# open target Excel file once, before looping over the source files
wb2 = Workbook()
ws2 = wb2.active
ws2.title = "Changed Sheet"
# next row to write in the target (first file keeps its original position)
next_row = 3

for n, xl_file in enumerate(excel_files):
    wb1 = load_workbook(xl_file)
    ws1 = wb1.worksheets[0]
    # calculate the total rows and
    # columns in the Excel source file
    mr = ws1.max_row
    mc = ws1.max_column
    # copy cell values from source Excel file to target Excel file;
    # assumes row 3 is the header, which is copied from the first file only
    for i in range(3 if n == 0 else 4, mr + 1):
        for j in range(2, mc + 1):
            # read cell value from Excel source file
            c = ws1.cell(row=i, column=j)
            # writing the read value to the target Excel file
            ws2.cell(row=next_row, column=j).value = c.value
        next_row += 1

# save target Excel file once, after all files have been processed
wb2.save('sample_book.xlsx')
This also assumes there is only one sheet you want to process in each Excel file, since you're only opening the first sheet:
ws1 = wb1.worksheets[0]