Copying an entire column using OpenPyXL in Python 3

I'm trying to copy an entire column over using OpenPyXL. Google seems to offer a lot of examples using ranges, but not for an entire column.
I have a workbook with a single worksheet with a load of dates in column A and column JX (A contains monthly dates, JX contains quarterly dates). I want the monthly dates column (in A:A) to be copied over to each worksheet ending in 'M' in my target workbook, and the quarterly dates column (in JX:JX) to the worksheets ending in 'Q'.
However, for some reason the last nested for loop, for src, dst in zip(ws_base[monthRange], ws_target['A:A']): is only copying the first cell, and nothing else. It looks like I'm identifying the correct column with my monthRange and quarterRange strings, but Python isn't looping through the whole column despite the fact that I've got two ranges defined.
Does anyone have any ideas?
# Load the target workbook
targetwb = openpyxl.load_workbook('pythonOutput.xlsx')
# Load the source workbook
wb_base = openpyxl.load_workbook('Baseline_IFRS9_' + reportingMonth+'.xlsx')
# Go to row 9 and find "Geography:" to identify the relevant
# month and quarter date columns
sentinel = u"Geography:"
ws_base = wb_base.active
found = 0
dateColumns = []
for column in ws_base:
    for cell in column:
        if cell.value == sentinel:
            dateColumns.append(cell.column)
            found += 1
            if found == 2:
                break
ColumnM = dateColumns[0]
ColumnQ = dateColumns[1]
print('Monthly col is ' + ColumnM)
print('Quarterly col is ' + ColumnQ)
IndexM = int(openpyxl.utils.column_index_from_string(str(ColumnM)))
IndexQ = int(openpyxl.utils.column_index_from_string(str(ColumnQ)))
print('Monthly col index is ' + str(IndexM))
print('Quarterly col index is ' + str(IndexQ))
print('Proceeding to paste into our new workbook...')
sheetLoop = targetwb.get_sheet_names()
for sheets in sheetLoop:
    if sheets.endswith('Q'):
        ws_target = targetwb[sheets]
        quarterRange = ColumnQ + ':' + ColumnQ
        print('Copying and pasting quarterly dates into: ' + sheets)
        for src, dst in zip(ws_base[quarterRange], ws_target['A:A']):
            dst.value = src.value
    elif sheets.endswith('M'):
        ws_target = targetwb[sheets]
        monthRange = ColumnM + ':' + ColumnM
        print('Copying and pasting monthly dates into: ' + sheets)
        for src, dst in zip(ws_base[monthRange], ws_target['A:A']):
            dst.value = src.value
targetwb.save('pythonOutput.xlsx')
Here's a simpler form of my problem.
import openpyxl
wb1 = openpyxl.load_workbook('pythonInput.xlsx')
ws1 = wb1.active
wb2 = openpyxl.load_workbook('pythonOutput.xlsx')
ws2 = wb2.active
for src, dst in zip(ws1['A:A'], ws2['B:B']):
    print('Printing from ' + str(src.column) + str(src.row) + ' to ' + str(dst.column) + str(dst.row))
    dst.value = src.value
wb2.save('test.xlsx')
So the problem here is that the for loop only prints from A1 to B1. Shouldn't it be looping down across rows..?

When you open a new XLSX file in a spreadsheet editor, you see a grid full of empty cells. However, these empty cells are omitted from the file itself; a cell is only written once it has a non-empty value. You can see this for yourself: an XLSX file is essentially a ZIP archive of XML files, which any archive manager can open.
In a similar fashion, new cells in OpenPyXL are only created when you access them. Here the ws2['B:B'] range contains only one cell, B1, and zip stops as soon as the shortest iterable is exhausted.
With this in mind, you can iterate through the source range and use explicit coordinates to save the values in correct cells:
import openpyxl
wb1 = openpyxl.load_workbook('pythonInput.xlsx')
ws1 = wb1.active
wb2 = openpyxl.load_workbook('pythonOutput.xlsx')
ws2 = wb2.active
for cell in ws1['A:A']:
    print('Printing from ' + str(cell.column) + str(cell.row))
    ws2.cell(row=cell.row, column=2, value=cell.value)
wb2.save('test.xlsx')
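To see the truncation in isolation, here is a minimal sketch that builds both workbooks in memory instead of loading files (the sample values are made up for illustration). It pairs the two ranges with zip to show that only one pair comes out, then copies the column by iterating the source alone and addressing the target by explicit row and column.

```python
from openpyxl import Workbook

# Source sheet with three values in column A (sample data, assumed).
src_wb = Workbook()
src_ws = src_wb.active
for row, value in enumerate(["Jan", "Feb", "Mar"], start=1):
    src_ws.cell(row=row, column=1, value=value)

# Empty target sheet: no cells exist in it yet.
dst_wb = Workbook()
dst_ws = dst_wb.active

# zip() stops at the shorter range, so the empty target limits the pairing.
pairs = list(zip(src_ws["A:A"], dst_ws["B:B"]))

# Iterate the source only and create target cells by coordinate.
for cell in src_ws["A:A"]:
    dst_ws.cell(row=cell.row, column=2, value=cell.value)

copied = [dst_ws.cell(row=r, column=2).value for r in range(1, 4)]
print(len(pairs), copied)
```

The same pattern applies to the original script: loop over ws_base[monthRange] and write with ws_target.cell(row=cell.row, column=1, value=cell.value).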

Related

Copying data from multiple excel files to specified columns using python

I'm trying to create a script which would run through excel files in a folder and copy the contents to a workbook. The aim is to copy the contents of each file onto different columns where the spacing between the columns is a set difference, ie. columns: A, D(A+3) & G(D+3). For my example I am running my code with 3 base datasets.
When I run the code, the final dataset ends up copying across the final excel document 3 times across the specified columns, instead of copying the 3 unique documents to the specified columns.
What I want: A B C
What I get: C C C
Code:
import os
import openpyxl
from openpyxl import Workbook, load_workbook
import string
for file in os.listdir(file_path):
    if file.endswith('.xlsx'):
        print(f'Loading file {file}...')
        wb = load_workbook(file_path+file)
        ws = wb.worksheets[0]
        wb1 = load_workbook(new_path+'data.xlsx')
        ws1 = wb1.active
        #calculate max rows and columns in source dataset
        mr = ws.max_row
        mc = ws.max_column
        m = [0,3,6]
        #copying data to new sheet
        for i in range(1,mr+1):
            for j in range(1,mc+1):
                for y in range(0,3):
                    #reading cell value from source
                    c = ws.cell(row = i, column = j)
                    #writing read value to destination
                    ws1.cell(row = i, column = j+int(m[y])).value = c.value
        wb1.save(new_path+'data.xlsx')
Thank you for your help.
Edit:
The data is all in the same format and looks like:https://ibb.co/TMStH9j Current output: https://ibb.co/dmcbSJ1 Desired output: https://ibb.co/C1nqKJv
You need to move the creation and saving of the new workbook out of the for loop so that it is not overwritten each time a new file is looped over.
Also you need a way to count how many files you have looped over, so that you can increment the columns where the new data is copied to in the new workbook. Please see below:
Edit:
To get your expected output, I also removed the innermost for loop and the m list, and instead use a single variable, col_spacing, to space each file's data apart.
import os
import openpyxl
from openpyxl import Workbook, load_workbook
import string
# Create new workbook outside of for loop so that it is not overwritten each loop
wb1 = Workbook()
ws1 = wb1.active
# count variable so each loop increments the column where the data is posted
count = 0
# how many columns to space data apart
col_spacing = 2
for file in os.listdir(file_path):
    if file.endswith(".xlsx"):
        print(f"Loading file {file}...")
        wb = load_workbook(file_path + file)
        ws = wb.worksheets[0]
        # calculate max rows and columns in source dataset
        mr = ws.max_row
        mc = ws.max_column
        # copying data to new sheet
        for i in range(1, mr + 1):
            for j in range(1, mc + 1):
                # reading cell value from source
                c = ws.cell(row=i, column=j)
                # writing read value to destination
                ws1.cell(row=i, column=count + j + (count * col_spacing)).value = c.value
        # increment column count
        count += 1
# save new workbook after all files have been looped through
wb1.save(new_path + "data.xlsx")
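The column arithmetic in that loop can be checked without any Excel files. The sketch below reproduces the expression count + j + (count * col_spacing) in a hypothetical helper, dest_column, and confirms that single-column sources land in A, D and G. Note that with this expression, sources wider than one column would overlap; block_start is a hypothetical generalization (not part of the answer) that offsets each file by the source width plus a gap.

```python
def dest_column(count, j, col_spacing=2):
    """Destination column for source column j of file number count (formula from the loop)."""
    return count + j + count * col_spacing


def block_start(count, width, gap):
    """Hypothetical generalization: first destination column for file number count."""
    return 1 + count * (width + gap)


def column_letter(index):
    """Convert a 1-based column index to an Excel letter (A, ..., Z, AA, ...)."""
    letters = ""
    while index > 0:
        index, remainder = divmod(index - 1, 26)
        letters = chr(65 + remainder) + letters
    return letters


# Three files, one source column each.
single = [column_letter(dest_column(count, 1)) for count in range(3)]
# Three files, three source columns each, two empty columns between blocks.
wide = [column_letter(block_start(count, width=3, gap=2)) for count in range(3)]
print(single, wide)
```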

Data only not working with openpyxl in python

I'm a beginner in Python and I'm developing a program that takes some data from one .xlsx file and puts it into another.
To do so, I decided to use openpyxl. Here is the beginning of my code:
path1 = "sourceFile.xlsx"
path2 = "targetFile.xlsx"
sheet1 = openpyxl.load_workbook(path1, data_only=True)
sheet2 = openpyxl.load_workbook(path2)
As you can see, I use data_only=True to read only the values from my source file. My problem is that with this approach, None is returned for a few cells of the source file. When I remove the data_only=True parameter, the formula is returned instead, "=B28" in this case. That's not what I want, because cell B28 of the target file does not hold the same value as cell B28 of the source file.
I have already searched for solutions but, surprisingly, found nothing. Any ideas are welcome!
If B28's value in the original file is different than the output file, then the issue is likely with the code you're using to copy the cells. When asked how you're extracting the cells, you gave code for extracting the value of a single cell. How are you extracting ALL the cells? For-loop? If you shared that code, we can further analyze this problem.
I'm including code which copies values from one file to another, you should be able to tweak this to your needs.
from openpyxl import load_workbook, Workbook
## VERSION 1: Output will have formulas from WB1
WB1 = load_workbook('int_column.xlsx')
WB1_WS1 = WB1['Sheet']
WB2 = Workbook()
WB2_WS1 = WB2.active # get the active sheet, so you don't need to create then delete one
# copy rows
for x, row in enumerate(WB1_WS1.rows):
    if x < 100: # only copy first 100 rows
        num_cells_in_row = len(row)
        for y in range(num_cells_in_row):
            WB2_WS1.cell(row=x + 1, column=y + 1).value = WB1_WS1.cell(row=x + 1, column=y + 1).value
WB2.save('copied.xlsx')
## VERSION 2: Output will have value displayed in cells in WB1
WB1 = load_workbook('int_column.xlsx', data_only=True)
WB1_WS1 = WB1['Sheet']
WB2 = Workbook()
WB2_WS1 = WB2.active # get the active sheet, so you don't need to create then delete one
# copy rows
for x, row in enumerate(WB1_WS1.rows):
    if x < 100: # only copy first 100 rows
        num_cells_in_row = len(row)
        for y in range(num_cells_in_row):
            WB2_WS1.cell(row=x + 1, column=y + 1).value = WB1_WS1.cell(row=x + 1, column=y + 1).value
WB2.save('copied.xlsx')
Please post more code if you need further assistance.
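One possible cause of the None values is worth ruling out: openpyxl does not evaluate formulas. With data_only=True it returns the result that a spreadsheet application cached in the file when it last saved it; a file written by openpyxl itself carries no cached results, so its formula cells read back as None. A minimal sketch, using an in-memory file and made-up cell contents:

```python
from io import BytesIO

from openpyxl import Workbook, load_workbook

# Build a workbook with one number and one formula (sample contents, assumed).
wb = Workbook()
ws = wb.active
ws["B28"] = 42
ws["A1"] = "=B28"

# Save to memory; openpyxl stores the formula but no computed result.
buffer = BytesIO()
wb.save(buffer)

# Default load: the formula text comes back.
buffer.seek(0)
formula = load_workbook(buffer).active["A1"].value

# data_only load: there is no cached result to return.
buffer.seek(0)
cached = load_workbook(buffer, data_only=True).active["A1"].value

print(repr(formula), repr(cached))
```

If this is what is happening, opening the source file in Excel (or another application that recalculates) and saving it once should populate the cached values.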

Excel parser stuck on one row

So I was making a quick script to loop through a bunch of sheets in an excel file (22 to be exact) and what I wanted to do was the following:
Open the excel sheet and open the sheet named "All" which contained a list of names and then loop through each name and do the following
To loop through all the other 22 sheets in the same workbook and look through each one for the name, which I knew was in the 'B' column.
If the name were to be found, I wanted to take all the columns in that row containing the data for that name and these columns were from A-H
Then copy and paste them next to the original name (same row) in the 'All sheet' while leaving a bit of a space between the original name and the others.
I wanted to do this for all 22 sheets and for the 200+ names listed in the 'All' sheet, my code is as follows:
import openpyxl, pprint
columns = ['A','B','C','D','E','F','G','H']
k = 10
x = 0
def colnum_string(n):
    string = ""
    while n > 0:
        n, remainder = divmod(n - 1, 26)
        string = chr(65 + remainder) + string
    return string
print("Opening Workbook...")
wb = openpyxl.load_workbook('FileName.xlsx')
sheet_complete = wb.get_sheet_by_name("All")
row_count_all = sheet_complete.max_row
for row in range(4, row_count_all+1):
    k = 10
    cell = 'B' + str(row)
    print(cell)
    name = sheet_complete[cell].value
    for i in range(2, 23):
        sheet = wb.get_sheet_by_name(str(1995 + i))
        row_count = sheet.max_row
        for row2 in range(2, row_count+1):
            cell2 = 'B' + str(row2)
            name2 = sheet[cell].value
            if name == name2:
                x = x + 1
                for z in range(0,len(columns)):
                    k = k + 1
                    cell_data = sheet[columns[z] + str(row2)].value
                    cell_target = colnum_string(k) + str(row)
                    sheet_complete[cell_target] = cell_data
                wb.save('Scimago Country Ranking.xlsx')
                print("Completed " + str(x) + " Task(s)")
                break
The problem is that it keeps looping with the first name only, so it goes through all the names but when it comes to copying and pasting the data, it just redoes the first name so in the end, I end up with all the names in the 'All' sheet and next to each one is the data for the first name repeated over and over. I can't see what's wrong with my code but forgive me if it's a silly mistake as I'm kind of a beginner in these excel parsing scripts. print statements were for testing reasons.
P.S I know I'm using a deprecated function and I will change that, I was just too lazy to do it since it seems to still work fine and if that's the problem then please let me know.
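One detail in the code above may explain the symptom: cell2 is computed in the inner loop but never used, and name2 is read from sheet[cell], which uses the row index of the 'All' sheet, so the comparison never actually looks at row2. Below is a reduced sketch of the lookup with that one change, using an in-memory workbook whose sheet names and values are invented for illustration; the constant TARGET_START_COL is also an assumption.

```python
from openpyxl import Workbook

# "All" sheet with names in column B starting at row 4 (sample data).
wb = Workbook()
all_ws = wb.active
all_ws.title = "All"
all_ws["B4"] = "Chile"
all_ws["B5"] = "Peru"

# One yearly sheet with a header row and names in column B.
year_ws = wb.create_sheet("1997")
year_ws.append(["rank", "name", "score"])
year_ws.append([1, "Peru", 10])
year_ws.append([2, "Chile", 20])

TARGET_START_COL = 11  # column K, leaving a gap after the names

for row in range(4, all_ws.max_row + 1):
    name = all_ws.cell(row=row, column=2).value
    for row2 in range(2, year_ws.max_row + 1):
        # Compare against row2 of the yearly sheet, not row of "All".
        if year_ws.cell(row=row2, column=2).value == name:
            for offset, source in enumerate(year_ws[row2]):
                all_ws.cell(row=row, column=TARGET_START_COL + offset,
                            value=source.value)
            break

chile = [all_ws.cell(row=4, column=c).value for c in range(11, 14)]
peru = [all_ws.cell(row=5, column=c).value for c in range(11, 14)]
print(chile, peru)
```

In the original script, the equivalent change would be name2 = sheet[cell2].value.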

I want to update the last row in an excel spreadsheet, daily. OpenPyXl

The following is an excerpt from a function (whose remaining body has been excluded; nothing to do with this issue and has already been tested to make sure has no faults).
Objective: Get "val1a" (a dollar value acquired from another part of the function) and "t" to update daily to an excel spreadsheet.
Right now, I have them mapped to the A2 and B2 cells, respectively. I can't figure out how to make them populate the latest row, whenever the function is run. (A2:B2, A3:B3, and so on...)
t = date.today()
ts = datetime.time(datetime.now())
wb = load_workbook('val1a.xlsx')
sheet = wb.worksheets[0]
# grab the active worksheet
ws = wb.active
ws['A1'] = 'PRICE'
ws['B1'] = 'DATE'
ws['C1'] = 'FED'
ws['D1'] = 'CTD'
ws['A2'] = val1a
ws['B2'] = t
# Save the file
wb.save('a1 ' + str(t) + ".xlsx")
# how to read values in excel
read1 = ws['A2'].value
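ws.append() always writes its values to the row after the last one in use, so calling ws.append([val1a, t]) instead of assigning to A2 and B2 adds a new row each time the function runs.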
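A minimal sketch of that behaviour on an in-memory workbook; the prices are placeholders, and each loop iteration stands in for one daily run of the function.

```python
from datetime import date

from openpyxl import Workbook

wb = Workbook()
ws = wb.active
ws.append(["PRICE", "DATE"])  # header goes into row 1

# Each append lands in the first row below the existing data.
for price in (101.5, 102.25):
    ws.append([price, date.today()])

last_row = ws.max_row
latest_price = ws.cell(row=last_row, column=1).value
print(last_row, latest_price)
```

For the rows to accumulate across days, this assumes the workbook is saved back to the same file that the next run loads.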

openpyxl - "copy/paste" range of cells

I'm new to Python and I'm trying to adapt some of my VBA code to it using the openpyxl library. On this particular case, I'm trying to copy 468 rows in a single column from a workbook according to the string in the header and to paste them in another workbook in a particular column that has another specific string as a header. I can't simply select the range of cells I want to copy because this is part of a report automation and the headers change positions from file to file.
What's the function I need to use to copy each of the 468 cells from one workbook into the 468 cells of the second workbook? Or alternatively how can I copy a range of cells and then paste them in another workbook? Here is my code and I know exactly what's wrong: I'm copying one cell (the last from the first workbook) repeatedly into the 468 cells of the second workbook.
#!/usr/bin/python3
import pdb
import openpyxl
from openpyxl.utils import column_index_from_string
wb1 = openpyxl.load_workbook('.../Extraction.xlsx')
wb2 = openpyxl.load_workbook('.../Template.xlsx')
ws1 = wb1.active
first_row1 = list(ws1.rows)[0] #to select the first row (header)
for cell in first_row1:
    if cell.value == "email":
        x = cell.column #to get the column
        y = column_index_from_string(x) #to get the column's index
        for i in range(2, 469):
            cell_range1 = ws1.cell(i, y) #the wrong part
ws2 = wb2.active
first_row2 = list(ws2.rows)[0]
for cell in first_row2:
    if cell.value == "emailAddress":
        w = cell.column
        z = column_index_from_string(w)
        for o in range(2, 469):
            cell_range2 = ws2.cell(o, z)
            cell_range2.value = cell_range1.value
path = '.../Test.xlsx'
wb2.save(path)
It is actually quite easy to create such a function:
from openpyxl.utils import rows_from_range
def copy_range(range_str, src, dst):
    for row in rows_from_range(range_str):
        for cell in row:
            dst[cell].value = src[cell].value
Note that range_str is a regular string such as "A1:B2", and src and dst both have to be valid sheet objects. However, if you are copying large ranges, this might take a while, as the reads and writes seem to be rather time-consuming.
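A quick way to try the function is on two in-memory sheets, as in the sketch below (the sample values are invented). It also shows what rows_from_range yields: tuples of coordinate strings, one tuple per row, which is why each item can be used directly as a key on both sheets.

```python
from openpyxl import Workbook
from openpyxl.utils import rows_from_range


def copy_range(range_str, src, dst):
    """Copy every cell value in range_str from sheet src to sheet dst."""
    for row in rows_from_range(range_str):
        for coordinate in row:
            dst[coordinate].value = src[coordinate].value


src_wb, dst_wb = Workbook(), Workbook()
src, dst = src_wb.active, dst_wb.active
src.append([1, 2])
src.append([3, 4])

coordinates = list(rows_from_range("A1:B2"))
copy_range("A1:B2", src, dst)
copied = [[cell.value for cell in row] for row in dst["A1:B2"]]
print(coordinates, copied)
```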
Check the argument order of .cell(): positionally it is .cell(row, column). Using keywords avoids any confusion: .cell(row=o, column=z)
You need a dynamic index for both of the row iterators, while keeping the column indices where you found them:
for o in range(2, 469):
    #note the common o for both, could also be o+1 for one if there is an offset
    ws2.cell(o, z).value = ws1.cell(o, y).value
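Putting the pieces together, here is a hedged end-to-end sketch on in-memory sheets: look up each header's column index in the first row, then copy row by row with a shared row index. The helper find_header_column and the sample data are hypothetical; the header names "email" and "emailAddress" come from the question.

```python
from openpyxl import Workbook


def find_header_column(ws, header):
    """Return the 1-based index of the first-row cell whose value equals header."""
    for index, cell in enumerate(ws[1], start=1):
        if cell.value == header:
            return index
    raise ValueError(f"header {header!r} not found")


# Source sheet: the wanted column sits second (sample data, assumed).
src_wb = Workbook()
src_ws = src_wb.active
src_ws.append(["id", "email"])
src_ws.append([1, "a@example.com"])
src_ws.append([2, "b@example.com"])

# Target sheet: the matching column sits third.
dst_wb = Workbook()
dst_ws = dst_wb.active
dst_ws.append(["name", "notes", "emailAddress"])

y = find_header_column(src_ws, "email")
z = find_header_column(dst_ws, "emailAddress")

# Same row index on both sides, separate column indices.
for row in range(2, src_ws.max_row + 1):
    dst_ws.cell(row=row, column=z).value = src_ws.cell(row=row, column=y).value

result = [dst_ws.cell(row=r, column=z).value for r in range(2, 4)]
print(y, z, result)
```

Enumerating the header row avoids relying on cell.column, whose type (letter or number) has differed between openpyxl versions.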
