In my project I am opening an Excel file with multiple sheets. I want to manipulate "sheet2" in Python (which works fine) and after that overwrite the old "sheet2" with the new one but KEEP the formatting.. so something like this:
import pandas as pd
update_sheet2 = pd.read_excel(newest_isaac_file, sheet_name='sheet2')
#do stuff with the sheet
with pd.ExcelWriter(filepath, engine='openpyxl', if_sheet_exists='replace', mode='a',
KEEP_FORMATTING = True) as writer:
df.to_excel(writer, sheet_name=sheetname, index=index)
In other words: Is there a way to get the formatting from an existing Excel sheet?
I could not find anything about that. I know I can manually set the formatting in Python but the formatting of the existing sheet is really complicated and has to stay the same.
thanks for your help!
As per your comment, try this code. It will open a file (Sample.xlsx), go to a sheet (Sheet1), insert new row at 15, copy the text and formatting from row 22 and paste it in the empty row (row 15). Code and final screen shot attached.
import openpyxl
from copy import copy
wb=openpyxl.load_workbook('Sample.xlsx') #Load workbook
ws=wb['Sheet1'] #Open sheet
ws.insert_rows(15, 1) #Insert one row at 15 and move everything one row downwards
for row in ws.iter_rows(min_row=22, max_row=22, min_col=1, max_col=ws.max_column): # Read values from row 22
for cell in row:
ws.cell(row=15, column=cell.column).value = cell.value #Update value to row 22 to new row 15
ws.cell(row=15, column=cell.column)._style = copy(cell._style) #Copy formatting
wb.save('Sample.xlsx')
How excel looks after running the code
Related
I need some help with editing the sheets within my Excel workbook in python, before I stack the data using pd.concat(). Each sheet (~100) within my Excel workbook is structured identically, with the unique identifier for each sheet being a 6-digit code that is found in line 1 of the worksheet.
I've already done the following steps to import the file, unmerge rows 1-4, and insert a new column 'C':
import openpyxl
import pandas as pd
wb = openpyxl.load_workbook('data_sheets.xlsx')
for sheet in wb.worksheets:
sheet.merged_cells
for merge in list(sheet.merged_cells):
sheet.unmerge_cells(range_string=str(merge))
sheet.insert_cols(3, 1)
print(sheet)
wb.save('workbook_test.xlsx')
#concat once worksheets have been edited
df= pd.concat(pd.read_excel('workbook_test.xlsx, sheet_name= None), ignore_index= True)
Before stacking the data however, I would like to make the following additonal (sequential) changes to every sheet:
Extract from row 1 the right 8 characters (in excel the equivalent of this would be =RIGHT(A1, 8) - this is to pull the unique code off of each sheet, which will look like '(000000)'.
Populate column C from rows 6-282 with the unique code.
Delete rows 1-5
The end result would make each sheet within the workbook look like this:
Is this possible to do with openpyxl, and if so, how? Any direction or assistance with this would be much appreciated - thank you!
Here is a 100% openpyxl approach to achieve what you're looking for :
from openpyxl import load_workbook
wb = load_workbook("workbook_test.xlsx")
for ws in wb:
ws.unmerge_cells("A1:O1") #unmerge first row till O
ws_uid = ws.cell(row=1, column=1).value[-8:] #get the sheet's UID
for num_row in range(6, 282):
ws.cell(row=num_row, column=3).value = '="{}"'.format(ws_uid) #write UID in Column C
ws.delete_rows(1, 5) #delete first 5 rows
wb.save("workbook_test.xlsx")
NB : This assume there is already an empty column (C).
I had a problem with my xlsxwriter code, it works perfectly fine but I need to figure out how to select a block of cells. For example - from A1 to J10 as it depicted on screenshot.
Does xlsxwriter have such a function? I've searched several formats, such as:
worksheet.write('A1:J1', '...')
But it write only on A1. So for example - how can I fill all highlighted area with one word without writing code for all of 10 rows?
Here is how I write my output from a pandas dataframe to an excel template.
Please note that if data is already present in the cells where you are trying to write the dataframe, it will not be overwritten and the dataframe will be written to a new sheet which is my i have included a step to clear existing data from the template. I have not tried to write output on merged cells so that might throw an error.
Setup
from openpyxl import load_workbook
from openpyxl.utils.dataframe import dataframe_to_rows
file_path='Template.xlsx'
book=load_workbook(file_path)
writer = pd.ExcelWriter(file_path, engine='openpyxl')
writer.book = book
sheet_name="Template 1"
sheet=book[sheet_name]
Set first row and first column in the excel template where output is to be pasted.
If my output is to be pasted starting in cell N2, row_start will be 2 and col_start will be 14
row_start=2
col_start=14
Clear existing data in excel template
for c_idx, col in enumerate(df.columns,col_start):
for r_idx in range(row_start,10001):
sheet.cell(row=r_idx, column=c_idx, value="")
Write dataframe to excel template
rows=dataframe_to_rows(df,index=False)
for r_idx, row in enumerate(rows,row_start):
for c_idx, col in enumerate(row,col_start):
sheet.cell(row=r_idx, column=c_idx, value=col)
writer.save()
writer.close()
I am trying to automate a process that basically reads in values from text files into certain excel cells. I have a template in excel that will read data from various sheets under certain names. For example, the template will read in data from "Video scores". Video scores is a .txt file that I copy and paste into excel. There are 5 different text files used in each project so it gets tedious after a while and when there are a lot of projects to complete.
How can I import or copy and paste these .txt files into excel to a specified sheet? I have been using openpyxl for the other parts of this project, but I am open to using another library if it can't be done with openpxl.
I've also tried opening and reading a file, but I couldn't figure out how to do what I want with that either. I have found a list of all the files I need, its just a matter of getting them into excel.
Thanks in advance for anyone who helps.
First, import the TXT file into a list in python, i'm asumming the TXT file is like this
1
2
3
4
....
with open(path_txt, "r") as e:
list1 = [i for i in e]
then, we paste the values of the list on the worksheet you need
from openpyxl import load_workbook
wb = load_workbook(path_xlsx)
ws = wb[sheet_name]
ws["A1"] = "values" #just a header
row = 2 #represent the 2 row of the sheet
column = 1 #represent the column "A" of the sheet
for i in list1:
ws.cell(row=row, column=column).value = i #getting the current cell, and writing the value of the list
row += 1 #just setting the current to the next
wb.save(path_xlsx)
Hope this works for you.
Pandas would do the trick!
Approach:
Have a sheet containing path to your files, separator, the corresponding target sheet names
Now read this excel sheet using pandas and iterate over each row for each file details, read the data, write it to new excel sheet of same workbook.
import pandas as pd
file_details_path = r"/Users/path for xl sheet/file details/File2XlDetails.xlsx"
target_sheet_path = r"/Users/path to target xl sheet/File samples/FiletoXl.xlsx"
# create a writer to save the file content in excel
writer = pd.ExcelWriter(target_sheet_path, engine='xlsxwriter')
file_details = pd.read_excel(file_details_path,
dtype = str,
index_col = False
)
def write_to_excel(file, trg_sheet_name):
# writes it to excel
file.to_excel(writer,
sheet_name = trg_sheet_name,
index = False,
)
# loop through each file record
for index, file_dtl in file_details.iterrows():
# you can print and check the row content for reference
print(file_dtl['File_path'])
print(file_dtl['Separator'])
print(file_dtl['Target_sheet_name'])
# reads file
file = pd.read_csv(file_dtl['File_path'],
sep = file_dtl['Separator'],
dtype = str,
index_col = False,
)
write_to_excel(file, file_dtl['Target_sheet_name'])
writer.save()
Hope this helps! Let me know if you run into any issues...
I am extracting data from one workbook's column and need to copy the data to another existing workbook.
This is how I extract the data (works fine):
wb2 = load_workbook('C:\\folder\\AllSitesOpen2.xlsx')
ws2 = wb2['report1570826222449']
#Extract column A from Open Sites
DateColumnA = []
for row in ws2.iter_rows(min_row=16, max_row=None, min_col=1, max_col=1):
for cell in row:
DateColumnA.append(cell.value)
DateColumnA
The above code successfully outputs the cell values in each row of the first column to DateColumnA
I'd like to paste the values stored in DateColumnA to this existing destination workbook:
#file to be pasted into
wb3 = load_workbook('C:\\folder\\output.xlsx')
ws3 = wb3['Sheet1']
But I am missing a piece conceptually here. I can't connect the dots. Can someone advise how I can get this data from my source workbook to the new destination workbook?
Lets say you want to copy the column starting in cell 'A1' of 'Sheet1' in wb3:
wb3 = load_workbook('C:\\folder\\output.xlsx')
ws3 = wb3['Sheet1']
for counter in range(len(DateColumnA)):
cell_id = 'A' + str(counter + 1)
ws3[cell_id] = DateColumnA[counter]
wb3.save('C:\\folder\\output.xlsx')
I ended up getting this to write the list to another pre-existing spreadsheet:
for x, rows in enumerate(DateColumnA):
ws3.cell(row=x+1, column=1).value = rows
#print(rows)
wb3.save('C:\\folder\\output.xlsx')
Works great but now I need to determine how to write the data to output.xlsx starting at row 16 instead of row 1 so I don't overwrite the first 16 existing header rows in output.xlsx. Any ideas appreciated.
I figured out a more concise way to write the source data to a different starting row on destination sheet in a different workbook. I do not need to dump the values in to a list as I did above. iter_rows does all the work and openpyxl nicely passes it to a different workbook and worksheet:
row_offset=5
for rows in ws2.iter_rows(min_row=2, max_row=None, min_col=1, max_col=1):
for cell in rows:
ws3.cell(row=cell.row + row_offset, column=1, value=cell.value)
wb3.save('C:\\folder\\DestFile.xlsx')
I'll try to explain my problem with an example:
Let's say I have an Excel file test.xlsx which has five tabs (aka worksheets): Sheet1, Sheet2, Sheet3, Sheet4 and sheet5. I am interested to read and modify data in sheet2.
My sheet2 has some columns whose cells are dropdowns and those dropdown values are defined in sheet4 and sheet5. I don't want to touch sheet4 and sheet5. (I mean sheet4 & sheet5 have some references to cells on Sheet2).
I know that I can read all the sheets in excel file using pd.read_excel('test.xlsx', sheetnames=None) which basically gives all sheets as a dictionary(OrderedDict) of DataFrames.
Now I want to modify my sheet2 and save it without disturbing others. So is it posibble to do this using Python Pandas library.
[UPDATE - 4/1/2019]
I am using Pandas read_excel to read whatever sheet I need from my excel file, validating the data with the data in database and updating the status column in the excelfile.
So for writing back the status column in excel I am using openpyxl as shown in the below pseudo code.
import pandas as pd
import openpyxl
df = pd.read_excel(input_file, sheetname=my_sheet_name)
df = df.where((pd.notnull(df)), None)
write_data = {}
# Doing some validations with the data and building my write_data with key
# as (row_number, column_number) and value as actual value to put in that
# cell.
at the end my write_data looks something like this:
{(2,1): 'Hi', (2,2): 'Hello'}
Now I have defined a seperate class named WriteData for writing data using openpyxl
# WriteData(input_file, sheet_name, write_data)
book = openpyxl.load_workbook(input_file, data_only=True, keep_vba=True)
sheet = book.get_sheet_by_name(sheet_name)
for k, v in write_data.items():
row_num, col_num = k
sheet.cell(row=row_num, column=col_num).value = v
book.save(input_file)
Now when I am doing this operation it is removing all the formulas and diagrams. I am using openpyxl 2.6.2
Please correct me if I am doing anything wrong! Is there any better way to do?
Any help on this will be greatly appreciated :)
To modify a single sheet at a time, you can use pandas excel writer:
sheet2 = pd.read_excel("test.xlsx", sheet = "sheet2")
##modify sheet2 as needed.. then to save it back:
with pd.ExcelWriter("test.xlsx") as writer:
sheet2.to_excel(writer, sheet_name="sheet2")