Python Excel: subtract cell values between 2 worksheets

Is it possible to write a Python script that automatically subtracts the cell values of two worksheets in one Excel file?
I have checked some documentation, and it seems that pandas or openpyxl can do this, but I couldn't get it to work. Do you have any suggestions? Many thanks.
Script:
from datetime import datetime
import pandas as pd
import openpyxl as xl
currDateTime = datetime.now()
Sheet1 = "C:\\Users\\peter\\Downloads\\" + currDateTime.strftime('%Y%m%d') + "\\5250A" + "\\5250A.xlsx"
wb3 = xl.load_workbook(Sheet1)
ws3 = wb3.worksheets[0]
wb4 = xl.load_workbook(Sheet1)
ws4 = wb4.worksheets[1]
wb5 = xl.load_workbook(Sheet1)
ws5 = wb5.create_sheet("Done")
wb4.subtract(wb3)  # fails: openpyxl workbooks have no subtract() method
wb5.save(str(Sheet1))
Expected Result:

Doing this in Excel itself could be much easier, I think, and there may be a smarter way to write this code.
[NOTE] I just do the subtraction cell by cell, so any mismatch, such as the same row holding a different dept.id or the same column holding a different item, will cause errors. If your data may have this problem, you'll have to change the following code somewhat.
import openpyxl as xl

def get_row_values(worksheet):
    """
    Return the cell values as a list of rows:
    [
        [A1, B1, C1, ...],
        [A2, B2, C2, ...],
        ...
    ]
    """
    result = []
    for i in worksheet.rows:
        row_data = []
        for j in i:
            row_data.append(j.value)
        result.append(row_data)
    return result

if __name__ == '__main__':
    # load excel file
    wb = xl.load_workbook('test1.xlsx')
    ws1 = wb.worksheets[0]
    ws2 = wb.worksheets[1]
    # get data from the first 2 worksheets
    ws1_rows = get_row_values(ws1)
    ws2_rows = get_row_values(ws2)
    # calculate and make a new sheet
    ws_new = wb.create_sheet('Done')
    # insert header
    ws_new.append(ws1_rows[0])
    for row in range(1, len(ws1_rows)):
        # do the subtraction cell by cell
        row_data = []
        for column, value in enumerate(ws1_rows[row]):
            if column == 0:
                # insert first column
                row_data.append(value)
            else:
                if ws1_rows[row][0] == ws2_rows[row][0]:
                    # process only when the first column matches
                    row_data.append(value - ws2_rows[row][column])
        ws_new.append(row_data)
    wb.save('test2.xlsx')
Here's my sample Excel file (screenshots of the first sheet, the second sheet, and the generated sheet).
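Since the question mentions pandas, here is a minimal pandas sketch (not the answerer's code) that avoids the mismatch problem from the note by matching rows on an ID column rather than on row position. It assumes both sheets have a header row, share the same ID column, and hold only numeric values in the other columns; the function names and the `dept_id`/`item1`/`item2` columns are hypothetical.

```python
import pandas as pd


def subtract_sheets(df1, df2, key):
    """Return df1 - df2 with rows matched on the `key` column, not on row position."""
    left = df1.set_index(key)
    right = df2.set_index(key)
    # subtract() aligns on both the index (the key) and the column names;
    # a key or column found in only one sheet yields NaN instead of a wrong number
    result = left.subtract(right)
    result.index.name = key  # make sure reset_index() restores the key column name
    return result.reset_index()


def subtract_workbook_sheets(path, key, new_sheet="Done"):
    """Subtract sheet 2 from sheet 1 of `path` and add the result as `new_sheet`."""
    sheets = pd.read_excel(path, sheet_name=[0, 1])  # {0: first sheet, 1: second sheet}
    result = subtract_sheets(sheets[0], sheets[1], key)
    # mode="a" adds a sheet to the existing file; it needs the openpyxl engine
    with pd.ExcelWriter(path, engine="openpyxl", mode="a") as writer:
        result.to_excel(writer, sheet_name=new_sheet, index=False)
    return result


# small in-memory example; the second sheet lists the departments in a different order
first = pd.DataFrame({"dept_id": [1, 2], "item1": [10, 20], "item2": [5, 7]})
second = pd.DataFrame({"dept_id": [2, 1], "item1": [4, 3], "item2": [1, 2]})
print(subtract_sheets(first, second, "dept_id"))
```

With a real file you would call something like `subtract_workbook_sheets("test1.xlsx", "dept_id")`. A department present in only one sheet shows up as a row of NaN rather than being subtracted from the wrong row.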

Related

How to find a cell in a column that contains a string and return the value of a cell in another column of the same row?

If a cell in column C contains "external", copy the "good" value from column D into column E of the rows where column A contains 003.
Below are two images of the sheet in Excel (before and after).
I tried to write a script, but it did not work out. I don't know what to put for "row" and "column" where I wrote "???":
import openpyxl
from openpyxl import load_workbook
wb_source = openpyxl.load_workbook("path/file.xlsx")
sheet = wb_source['Sheet1']
x = sheet.max_row
y = sheet.max_column
for r in range(1, x+1):
    for j in range(1, y+1):
        copy(sheet.cell(row=???, column=???)
        if str(copy.value) == "external":
            sheet.??
            break
wb_source.save("path/file2.xlsx")
What should I put for the row and column?
1. Read the entire sheet.
2. Create a dictionary mapping the column A value to the column D value for the "external" rows.
3. Write the values back to Excel.
Try:
import openpyxl
wb = openpyxl.load_workbook("file1.xlsx")
ws = wb['Sheet1']
# read the entire sheet into a list of row values
data = [[cell.value for cell in row] for row in ws.iter_rows()]
# map column A (ID) -> column D value for the rows whose column C is "external"
mapper = {row[0]: row[3] for row in data if row[2] == "external"}
# write the mapped value into column E of every row whose ID is in the mapper
for r, row in enumerate(data, start=1):
    if row[0] in mapper:
        ws.cell(row=r, column=5).value = mapper[row[0]]
wb.save("file2.xlsx")
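The three steps above can be seen in isolation in the following sketch, which builds a small sheet in memory instead of reading file1.xlsx. The helper name `fill_from_external` and the sample rows are hypothetical; the column positions (A = ID, C = flag, D = value, E = target) follow the question.

```python
from openpyxl import Workbook


def fill_from_external(ws, flag="external", key_col=1, flag_col=3, value_col=4, target_col=5):
    """For rows flagged in column C, copy column D into column E of every row with the same column-A ID."""
    # 1) read the entire sheet once
    rows = list(ws.iter_rows(values_only=True))
    # 2) dictionary: ID in column A -> value in column D, for the flagged rows only
    mapper = {row[key_col - 1]: row[value_col - 1]
              for row in rows if row[flag_col - 1] == flag}
    # 3) write back: every row whose ID is in the dictionary gets the value in column E
    for r, row in enumerate(rows, start=1):
        if row[key_col - 1] in mapper:
            ws.cell(row=r, column=target_col, value=mapper[row[key_col - 1]])
    return mapper


# hypothetical sample data shaped like the question's sheet
wb = Workbook()
ws = wb.active
for values in [("id", "name", "type", "status"),
               ("003", "pump", "external", "good"),
               ("003", "valve", "internal", None),
               ("004", "motor", "internal", None)]:
    ws.append(values)

fill_from_external(ws)
print([ws.cell(row=r, column=5).value for r in range(1, 5)])  # [None, 'good', 'good', None]
```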

How to move a copied worksheet to the first position?

I would like to copy an Excel worksheet in Python using openpyxl. However, copy_worksheet() places the copy at the end, and I want it at the front. Unlike create_sheet(), copy_worksheet() doesn't allow specifying the position. I'd rather not have to recreate the template.
I've considered sorting the sheets, but I'm not sure how to implement that.
Suppose I have a file called number.xlsx with an existing worksheet titled "blank" that I want to copy.
from openpyxl import load_workbook
from datetime import datetime
n = float(input("Number: "))
today = datetime.now()
m = today.month
d = today.day
y = str(today.year)
wb = load_workbook('number.xlsx')
if y in wb.sheetnames:
    ws = wb[y]
    ws.cell(row = 2 + d, column = 1 + m).value = n
    wb.save('number.xlsx')
else:
    ws = wb.copy_worksheet(wb["blank"])  # I want the copied sheet at the front, not the back
    ws.title = y
    ws.cell(row = 2 + d, column = 1 + m).value = n
    wb.save('number.xlsx')
You can use the move_sheet(sheet, offset=0) method for this. The sheet's new position is calculated as its current index plus offset. copy_worksheet() adds the sheet at the end of the workbook, so you need to pass a negative offset to move it to index 0.
from openpyxl import load_workbook
wb = load_workbook("text.xlsx")
ws = wb.copy_worksheet(wb["sample"])
ws.title = "NewNameForCopiedSheet"
wb.move_sheet("NewNameForCopiedSheet", -(len(wb.sheetnames)-1))
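Applied to the question's own scenario (a "blank" template copied to a sheet named after the year), the offset can be computed from the copy's current index, so it works however many sheets the workbook has. This is a sketch with a hypothetical helper `copy_sheet_to_front` and an in-memory workbook standing in for number.xlsx.

```python
from openpyxl import Workbook


def copy_sheet_to_front(wb, source_title, new_title):
    """Copy the sheet `source_title`, rename the copy, and move it to position 0."""
    # copy_worksheet() always appends the copy at the end of the workbook
    ws = wb.copy_worksheet(wb[source_title])
    ws.title = new_title
    # move_sheet() moves the sheet by `offset` positions, so an offset of
    # minus its current index puts it first
    wb.move_sheet(ws, offset=-wb.worksheets.index(ws))
    return ws


wb = Workbook()
wb.active.title = "blank"
wb.create_sheet("2023")
copy_sheet_to_front(wb, "blank", "2024")
print(wb.sheetnames)  # ['2024', 'blank', '2023']
```

In the question's script, the `else:` branch could then use `ws = copy_sheet_to_front(wb, "blank", y)` in place of the `copy_worksheet()` line and the `ws.title = y` assignment.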
wb._sheets (a private attribute of the workbook) is what controls the order of the tabs/sheets.
Get the position of the sheet you want to move, then modify the list of sheets with its new position. Here is an example:
from openpyxl import Workbook
wb = Workbook()  # a new workbook already contains one sheet named "Sheet"
wb.create_sheet("Sheet2")
wb.create_sheet("Sheet3")
wb.create_sheet("SheetA")
wb.create_sheet("ASheet")
wb.create_sheet("blank")
wb.save('book_original.xlsx')
blank_sheet_position = wb.worksheets.index(wb['blank'])  # get position of the sheet to move
blank_sheet_new_position = 0  # position where you want to move it
sheets = wb._sheets.copy()
sheets.insert(blank_sheet_new_position, sheets.pop(blank_sheet_position))  # modify the list of sheets
wb._sheets = sheets
wb.save('book_myorder.xlsx')
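The question also mentions sorting the sheets. As an alternative to replacing the private `_sheets` list, the public `move_sheet()` method can place sheets in any desired order, one at a time. A minimal sketch; `reorder_sheets` is a hypothetical helper, and sheets missing from `order` simply end up after the listed ones.

```python
from openpyxl import Workbook


def reorder_sheets(wb, order):
    """Move the sheets named in `order` to positions 0, 1, 2, ... in that order."""
    for target, title in enumerate(order):
        ws = wb[title]
        # offset is relative to the sheet's current index
        wb.move_sheet(ws, offset=target - wb.worksheets.index(ws))


wb = Workbook()  # starts with one sheet named "Sheet"
for title in ["Sheet2", "Sheet3", "SheetA", "ASheet", "blank"]:
    wb.create_sheet(title)

reorder_sheets(wb, ["blank"])
print(wb.sheetnames)  # ['blank', 'Sheet', 'Sheet2', 'Sheet3', 'SheetA', 'ASheet']

reorder_sheets(wb, sorted(wb.sheetnames))
print(wb.sheetnames)  # ['ASheet', 'Sheet', 'Sheet2', 'Sheet3', 'SheetA', 'blank']
```

Note that `sorted()` compares strings by code point, so uppercase titles sort before lowercase ones; pass `key=str.lower` to `sorted()` for a case-insensitive order.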

How can I concatenate multiple rows of excel data into one?

I'm currently facing an issue where I need to bring all of the data shown in the images below into one line only.
Using Python and openpyxl, I tried to write a parsing script that reads each line and copies only the values that are non-null and not duplicates into a new workbook.
I get out-of-range errors, and the code does not keep just the data I want. I've spent multiple hours on it, so I thought I would ask here to see if I can get unstuck.
I've read some documentation on openpyxl and on making lists in Python, and tried a couple of YouTube videos, but none of them did exactly what I was trying to achieve.
import openpyxl
from openpyxl import Workbook
path = "sample.xlsx"
wb = openpyxl.load_workbook(path)
ws = wb.active
path2 = "output.xlsx"
wb2 = Workbook()
ws2 = wb2.active
listab = []
rows = ws.max_row
columns = ws.max_column
for i in range(1, rows+1):
    listab.append([])
cellValue = " "
prevCell = " "
for c in range(1, rows+1):
    for r in range(1, columns+1):
        cellValue = ws.cell(row=r, column=c).value
        if cellValue == prevCell:
            listab[r-1].append(prevCell)
        elif cellValue == "NULL":
            listab[r-1].append(prevCell)
        elif cellValue != prevCell:
            listab[r-1].append(cellValue)
        prevCell = cellValue
for r in range(1, rows+1):
    for c in range(1, columns+1):
        j = ws2.cell(row=r, column=c)
        j.value = listab[r-1][c-1]
print(listab)
wb2.save("output.xlsx")
There should be one line with the following information:
ods_service_id | service_name | service_plan_name | CPU | RAM | NIC | DRIVE
Personally I would go with pandas.
import pandas as pd
# Load the sheet into pandas
df_data = pd.read_excel('sample.xlsx')
df_data.fillna("NO DATA", inplace=True)  # replace NaN values with "NO DATA"
unique_ids = df_data.ods_service_id.unique()
# Store the rows as a list of dicts
records_list = df_data.to_dict('records')
keys_to_check = ['service_name', 'service_plan_name', 'CPU', 'RAM', 'NIC', 'DRIVE']
processed = {}
# Go through the unique ids
for key in unique_ids:
    processed[key] = {'ods_service_id': key}
    # Get the related records
    matching_records = [y for y in records_list if y['ods_service_id'] == key]
    # Loop through the records
    for record in matching_records:
        # For each key to check, save the value in the dict if it is not null
        for detail_key in keys_to_check:
            if record[detail_key] != "NO DATA":
                processed[key][detail_key] = record[detail_key]
                # Note: doesn't handle duplicate values for different keys so far
# Put the records back in a list
output_data = list(processed.values())
# -> to pandas
df = pd.DataFrame(output_data)[['ods_service_id'] + keys_to_check]
# Export to Excel
df.to_excel("output.xlsx", sheet_name='Sheet_name_1', index=False)
The above should work, but I wasn't sure how you want to store duplicated values for the same id. Do you want to store them as DRIVE_0, DRIVE_1, DRIVE_2?
EDIT:
The DataFrame can be exported in a different way. I replaced the export code with the following:
df.to_excel("output.xlsx", sheet_name='Sheet_name_1')
EDIT 2:
With no input data it was hard to spot any flaws. I corrected the code above after testing it with fake data.
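For comparison, pandas can collapse the rows per service with a single `groupby`, because `GroupBy.first()` returns the first non-null value of each column within a group. A sketch under these assumptions: the ID column is called `ods_service_id`, missing values are either empty cells or the literal text "NULL", and, like the answer above, only the first value is kept when a service has several (for example, several drives). The helper name and sample values are hypothetical.

```python
import numpy as np
import pandas as pd


def merge_service_rows(df, key="ods_service_id"):
    """Collapse all rows of each service into one row of first non-missing values."""
    # treat literal "NULL" strings as missing values
    cleaned = df.replace("NULL", np.nan)
    # GroupBy.first() takes the first non-null value of every column in each group;
    # sort=False keeps the services in their original order
    return cleaned.groupby(key, sort=False).first().reset_index()


df = pd.DataFrame({
    "ods_service_id": [101, 101, 101, 102],
    "service_name": ["web", "NULL", "NULL", "db"],
    "CPU": ["NULL", "4", "NULL", "2"],
    "RAM": ["NULL", "NULL", "8GB", "4GB"],
})
print(merge_service_rows(df))
```

With the real file this would look like `merge_service_rows(pd.read_excel("sample.xlsx")).to_excel("output.xlsx", index=False)`; `read_excel` already treats "NULL" cells as missing by default.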
To be honest, I think you've managed to get confused by data structures and come up with something far more complicated than you need.
One approach that would suit is to use a Python dictionary for each service and update it row by row.
from openpyxl import Workbook, load_workbook
wb = load_workbook("sample.xlsx")
ws = wb.active
objs = {}
headers = next(ws.iter_rows(min_row=1, max_row=1, values_only=True))
for row in ws.iter_rows(min_row=2, values_only=True):
    if row[0] not in objs:
        # first row for this service: store all of its values
        obj = {key: value for key, value in zip(headers, row)}
        objs[obj['ods_service_id']] = obj
    else:  # update the service's dict with non-empty values
        extra = {key: value for key, value in zip(headers[3:], row[3:]) if value not in (None, "NULL")}
        objs[row[0]].update(extra)
# write to new workbook
wb2 = Workbook()
ws2 = wb2.active
ws2.append(headers)
for obj in objs.values():  # do they need sorting?
    ws2.append([obj[key] for key in headers])
wb2.save("output.xlsx")
Note how you can do everything without using counters.

How to filter column data using openpyxl

I am trying to apply a filter to an existing Excel file and export the result to another Excel file. I would like to extract only the rows that contain the value 16, then export the table to another Excel file (as shown in the picture below).
I have tried reading the openpyxl documentation multiple times and googling for solutions, but I still can't make my code work. I have also attached the code and files below.
import openpyxl
# Used to create a reference to the Excel workbooks
wb1 = openpyxl.load_workbook('test_data.xlsx')
wb2 = openpyxl.load_workbook('test_data_2.xlsx')
# Reference the worksheets in the workbooks
sh1 = wb1["data_set_1"]
sh2 = wb2["Sheet1"]
sh1.auto_filter.ref = "A:A"
sh1.auto_filter.add_filter_column(0, ["16"])
sh1.auto_filter.add_sort_condition("B2:D6")
sh1_row_number = sh1.max_row
sh1_col_number = sh1.max_column
rangeSelected = []
for i in range(1, sh1_row_number+1, 1):
    rowSelected = []
    for j in range(1, sh1_col_number+1, 1):
        rowSelected.append(sh1.cell(row = i, column = j))
    rangeSelected.append(rowSelected)
    del rowSelected
for i in range(1, sh1_row_number+1, 1):
    for j in range(1, sh1_col_number+1, 1):
        sh2.cell(row = i, column = j).value = rangeSelected[i-1][j-1].value
wb1.save("test_data.xlsx")
wb2.save("test_data_2.xlsx")
The picture shows the desired result.
The auto filter doesn't actually filter the data; it only stores filter settings that Excel uses for display.
You probably want to filter while looping through the worksheet instead. Please note that this code assumes the table headers are already in the second workbook. It does not overwrite the data; it appends to the table.
import openpyxl
# Used to create a reference to the Excel workbooks
wb1 = openpyxl.load_workbook('test_data.xlsx')
wb2 = openpyxl.load_workbook('test_data_2.xlsx')
# Reference the worksheets in the workbooks
sh1 = wb1["data_set_1"]
sh2 = wb2["data_set_1"]  # use same sheet name, different workbook
for row in sh1.iter_rows():
    if row[0].value == 16:  # filter on first column with value 16
        sh2.append([cell.value for cell in row])
wb1.save("test_data.xlsx")
wb2.save("test_data_2.xlsx")
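Here is the same row-by-row filter wrapped in a small function that also copies the header row, so the second workbook does not need to contain it beforehand. It is a sketch using in-memory workbooks in place of test_data.xlsx and test_data_2.xlsx; `copy_matching_rows` is a hypothetical name. The comparison is exact, so if the 16s are stored as text in Excel you would compare against "16" instead.

```python
from openpyxl import Workbook


def copy_matching_rows(src_ws, dst_ws, column=1, value=16, header=True):
    """Append every row of src_ws whose cell in `column` (1-based) equals `value` to dst_ws."""
    rows = src_ws.iter_rows(values_only=True)
    if header:
        first_row = next(rows, None)
        if first_row is not None:
            dst_ws.append(first_row)  # copy the header row unchanged
    for row in rows:
        if row[column - 1] == value:  # exact match: 16 and "16" are not equal
            dst_ws.append(row)


# in-memory stand-ins for test_data.xlsx and test_data_2.xlsx
src = Workbook().active
for values in [("id", "name"), (16, "a"), (3, "b"), (16, "c")]:
    src.append(values)
dst = Workbook().active

copy_matching_rows(src, dst)
print(list(dst.iter_rows(values_only=True)))  # [('id', 'name'), (16, 'a'), (16, 'c')]
```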

Python-Excel: How to write lines of a row to an existing excel file?

I have searched the site but I could not find anything related to the following question.
I have an existing spreadsheet that I am going to pull data from on a daily basis, the information in the spreadsheet will change everyday.
What I want to do is create a file that tracks certain information from this spreadsheet: it should pull the data from the spreadsheet and write it to another spreadsheet. Adding the data to the new spreadsheet should not overwrite the existing data. I would really appreciate help with this. See the code below:
import os
import openpyxl
import xlrd
wb = openpyxl.load_workbook('Test_shorts.xlsx', 'r')
sheet = wb.active
rows = sheet.max_row
col = sheet.max_column
rows = rows + 1
print(rows)
new = []
for x in range(2, 3):
    for y in range(1, 10):
        z = sheet.cell(row=x, column=y).value
        new.append(z)
print(new)
If you want to copy the whole worksheet, you can use the copy_worksheet() method directly. It creates a copy of your active worksheet.
I don't know your data, but I am sure you can finish it yourself. Hope this helps.
from openpyxl import load_workbook
file_name = "Test_shorts.xlsx"
wb = load_workbook(file_name)
sheet = wb.active
new = wb.copy_worksheet(sheet)  # copy_worksheet() returns the copied sheet
# you can code to append new data here
for x in range(2, 3):
    row_values = [sheet.cell(row=x, column=y).value for y in range(1, 10)]
    new.append(row_values)  # append() expects a whole row (an iterable), not a single value
wb.save(file_name)
As mentioned in the comments, a loop over the cells is required, so I altered your code a little.
from openpyxl import load_workbook
file_name = "Test_shorts.xlsx"
wb = load_workbook(file_name)
current_sheet = wb.active
new_sheet = wb.create_sheet("New", 1)
for row in current_sheet.rows:
    col = 0  # reset the column counter at the start of each row
    for cell in row:
        col += 1  # older openpyxl versions return a letter ('A', 'B', ...) from cell.column, so count columns manually
        new_sheet.cell(cell.row, col, cell.value)
wb.save(file_name)
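For the daily tracking described in the question, `Worksheet.append()` already guarantees that existing rows are not overwritten, since it always writes below the last used row. A minimal sketch, assuming you want one row per day containing the date followed by the values of row 2, columns 1 to 9 (the range used in the question's loop); `append_snapshot` and the sample values are hypothetical, and in-memory workbooks stand in for the real files.

```python
from datetime import date
from openpyxl import Workbook


def append_snapshot(src_ws, log_ws, src_row=2, first_col=1, last_col=9, day=None):
    """Append one row (the date, then the values of src_row) to the end of log_ws."""
    values = [src_ws.cell(row=src_row, column=c).value for c in range(first_col, last_col + 1)]
    # append() writes below the last used row, so earlier rows are never overwritten
    log_ws.append([day or date.today()] + values)


src = Workbook().active
src.append(["name", "qty"])
src.append(["ABC", 100])

log = Workbook().active
log.append(["date", "name", "qty"])

append_snapshot(src, log, last_col=2, day=date(2024, 1, 1))
src.cell(row=2, column=2).value = 120  # the source changes the next day
append_snapshot(src, log, last_col=2, day=date(2024, 1, 2))
print(log.max_row)  # 3
```

With real files you would load the tracking workbook with `load_workbook()`, call `append_snapshot()` once per day, and then `save()` the tracking workbook.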
