Copy Python list variable into existing xlsx excel file - python

I'm new to Python, so please excuse my ignorance. I have tried several different bits of code using openpyxl and pandas, but I can't get anything to work.
What I need is to copy the text of an existing list-variable in Python (which is an array of file paths), and paste this into an existing xlsx worksheet at a given cell.
For example, given the list-variable in Python ["apple", "orange", "grape"], I need cells A2, A3, and A4 of Sheet 1 to read the same.
Any help is much appreciated!
import pandas as pd
import os
folder = "C:\\Users\\user\\Documents\\temp"
x = []
for path in os.listdir(folder):
    if path.endswith(".png"):
        full_path = os.path.join(folder, path)
        x.append(full_path)
fn = r"C:\Users\user\Documents\test.xlsx"
df = pd.read_excel(fn, header = None)
df2 = pd.DataFrame(x)
writer = pd.ExcelWriter(fn)
df2.to_excel(writer, startcol=0, startrow=1, header=None, index=False)
writer.save()
What this gets me is the correct information but seemingly overwrites any existing data.
All other cell contents and sheets are now gone and the newly added data is in an otherwise blank Sheet1. I need to maintain the existing spreadsheet as-is, and only add this info starting at a given cell on a given sheet.
I have also tried the following with xlwings, which imports the correct data into the existing sheet while maintaining all other data, but puts a 0 in the first cell. How do I get rid of this extra 0 from the dataframe?
filename = "C:\\Users\\user\\Documents\\test.xlsx"
df = pd.DataFrame(x)
wb = xw.Book(filename)
ws = wb.sheets('Sheet1')
ws.range('A2').options(index=False).value = df
wb = xw.Book(filename)
EDIT: The above xlwings code appears to work if I replace DataFrame with Series.
SOLUTION:
import pandas as pd
import xlwings as xw
filename = "C:\\Users\\user\\Documents\\test.xlsx"
df = pd.Series(x)  # x is the list of file paths built above
wb = xw.Book(filename)
ws = wb.sheets('Sheet1')
ws.range('A2').options(index=False).value = df
wb.save()  # write the change back to the file

I need to maintain the existing spreadsheet as-is
Open the Excel file in append mode; take a look at this answer for an example (other answers in the same thread are useful as well).
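As a rough sketch of what append mode can look like with pandas: the snippet below assumes pandas 1.4 or later (for `if_sheet_exists="overlay"`) with openpyxl installed, and it builds a throwaway workbook in a temp folder to stand in for the real file. The file name, sheet names and cell contents are made up for illustration.

```python
import os
import tempfile

import pandas as pd
from openpyxl import Workbook, load_workbook

# Build a small workbook to stand in for the existing file (hypothetical data).
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "test.xlsx")
wb = Workbook()
ws = wb.active
ws.title = "Sheet1"
ws["A1"] = "Fruit"
ws["C5"] = "keep me"
wb.create_sheet("Other")["A1"] = "untouched"
wb.save(path)

items = ["apple", "orange", "grape"]

# mode="a" opens the existing workbook instead of starting a new one;
# if_sheet_exists="overlay" writes into the existing sheet rather than replacing it.
with pd.ExcelWriter(path, mode="a", engine="openpyxl",
                    if_sheet_exists="overlay") as writer:
    pd.Series(items).to_excel(writer, sheet_name="Sheet1",
                              startrow=1, startcol=0,  # 0-based, so this is cell A2
                              header=False, index=False)

# Read the file back to confirm what ended up where.
check = load_workbook(path)
result = [check["Sheet1"][f"A{r}"].value for r in range(1, 5)]
print(result)
```

Cells outside the written range, and the other sheet, should be left as they were. Anything openpyxl cannot round-trip (charts or images, for example) may still be lost on save.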

If you want to reference cells by their Excel address (e.g. "A1"), you can use openpyxl:
from openpyxl import load_workbook
workbook = load_workbook(filename="sample.xlsx")
ll = ["apple", "orange", "grape"]
sheet = workbook.active
sheet["A2"] = ll[0]
sheet["A3"] = ll[1]
sheet["A4"] = ll[2]
workbook.save(filename="sample.xlsx")
I think this solves your problem.
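For a list of unknown length, the same idea can be written as a loop. This is only a sketch: `write_column` is a hypothetical helper name, and the demo runs against a throwaway file in a temp folder rather than a real spreadsheet.

```python
import os
import tempfile

from openpyxl import Workbook, load_workbook


def write_column(path, values, sheet_name, start_row=2, column=1):
    """Write values down one column of an existing sheet, leaving other cells alone."""
    workbook = load_workbook(path)
    sheet = workbook[sheet_name]
    for offset, value in enumerate(values):
        # row/column are 1-based: start_row=2, column=1 is cell A2
        sheet.cell(row=start_row + offset, column=column, value=value)
    workbook.save(path)


# Demo on a throwaway file that already has some content.
demo_path = os.path.join(tempfile.mkdtemp(), "sample.xlsx")
wb = Workbook()
wb.active.title = "Sheet1"
wb.active["B2"] = "existing"
wb.save(demo_path)

write_column(demo_path, ["apple", "orange", "grape"], "Sheet1")

sheet = load_workbook(demo_path)["Sheet1"]
written = [sheet.cell(row=r, column=1).value for r in range(2, 5)]
print(written)
```

Naming the sheet explicitly avoids depending on whichever sheet happens to be active when the file was last saved.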

Depending on how your data is formatted, this may work if you are happy to append to the next empty column in the sheet. With that proviso, the pandas library is really good for handling tabular data and can work with Excel files (using the openpyxl library).
It looks like you've installed pandas already but, for the benefit of others reading, make sure you've installed pandas and openpyxl first:
pip install pandas openpyxl
Then try:
import pandas as pd
df = pd.read_excel('filename.xlsx')
df['new_column_name'] = listname
df.to_excel('output_spreadsheet.xlsx')
I personally find that more readable than using openpyxl directly.

Related

Removing the Indexed Column when Merging 2 Excel Spreadsheets into a new Sheet in an existing Excel Spreadsheet using Pandas

I wanted to automate comparing two Excel spreadsheets and updating old data (call this spreadsheet Old_Data.xlsx) with new data (from a different Excel document, called New_Data.xlsx) and placing the updated data into a different sheet in Old_Data.xlsx.
I am able to successfully create the new sheet in Old_Data.xlsx and see the changes between the two data sets, however, in the new sheet an index appears labeling the rows of data from 0-n. I've tried hiding this index so the information on each sheet in Old_Data.xlsx appears the same, however, I cannot successfully seem to get rid of the addition of the index. See the code below:
from openpyxl import load_workbook
# import xlwings as xl
import pandas as pd
import jinja2
# Load the workbook that is going to be updated with new information.
wb = load_workbook('OldData.xlsx')
# Define the file path for all of the old and new data.
old_path = 'OldData.xlsx'
new_path = 'NewData.xlsx'
# Load the data frames for each Spreadsheet.
df_old = pd.read_excel(old_path)
print(df_old)
df_new = pd.read_excel(new_path)
print(df_new)
# Keep all original information while showing the differences in information and write
# to a new sheet in the workbook.
difference = pd.merge(df_old, df_new, how='right')
difference = difference.style.format.hide()
print(difference)
# Append the difference to an existing Excel File
with pd.ExcelWriter('OldData.xlsx', mode='a', engine='openpyxl', if_sheet_exists='replace') as writer:
    difference.to_excel(writer, sheet_name="1-25-2023")
This is an image of the table on the second sheet that I am creating. (https://i.stack.imgur.com/7Amdf.jpg)
I've tried adding the code:
difference = difference.style.format.hide
To get rid of the row, but I have not succeeded.
Pass index=False as an argument in the last line of your code. It should look something like this:
with pd.ExcelWriter('OldData.xlsx', mode='a', engine='openpyxl', if_sheet_exists='replace') as writer:
    difference.to_excel(writer, sheet_name="1-25-2023", index=False)
I think this should solve your problem.

Trouble writing to Excel

I am new to Python and trying to write into a merged cell within Excel. I can see the data that is already stored within this cell/row, so I know it's there. However, when I try to overwrite it, nothing happens.
I have tried messing with the index and header as well but nothing seems to work.
import pandas as pd
from openpyxl import load_workbook
# Read the Excel file into a pandas DataFrame
df = pd.read_excel('file here', sheet_name='Sheet1')
print(df.iloc[8, 2])
# Make the changes to the DataFrame
df.iloc[8, 2] = "Bob Smith"
# Load the workbook
book = load_workbook('file here')
writer = pd.ExcelWriter('file here', engine='openpyxl')
writer.book = book
# Write the DataFrame to the first sheet
df.to_excel(writer, index=False)
# Save the changes to the Excel file
writer.save()
import pandas as pd
from openpyxl import *
file="C:/Users/OneDrive/Bureau/draftExcel.xlsx"
df = pd.read_excel(file,sheet_name='sheet1')
df.iat[5,0]='cell is updated'
print(df) # to check first in the terminal if the content of the cell is updated
book=load_workbook(file)
writer=pd.ExcelWriter(file, engine='openpyxl')
df.to_excel(writer,sheet_name='sheet1',index=False)
writer.close()
I tried to make an example from what you explained because you didn't show your code, so I hope it was helpful.
Instead of using .iloc I used .iat, which reads or sets a single cell in your DataFrame by integer row and column position rather than by column label (.iloc is positional as well; .iat is the scalar-only accessor).
Remember that the Excel file you are working on must be closed while you are editing data with Python; if it is open, you will get an error.

Pandas dataframe to specific sheet in a excel file without losing formatting

I have a dataframe like the one shown below
Date,cust,region,Abr,Number,
12/01/2010,Company_Name,Somecity,Chi,36,
12/02/2010,Company_Name,Someothercity,Nyc,156,
df = pd.read_clipboard(sep=',')
I would like to write this dataframe to a specific sheet (called temp_data) in the file output.xlsx
Therefore I tried the following
import pandas
from openpyxl import load_workbook
book = load_workbook('output.xlsx')
writer = pandas.ExcelWriter('output.xlsx', engine='openpyxl')
writer.book = book
writer.sheets = dict((ws.title, ws) for ws in book.worksheets)
I also tried the below
path = 'output.xlsx'
with pd.ExcelWriter(path) as writer:
    writer.book = openpyxl.load_workbook(path)
    final_df.to_excel(writer, sheet_name='temp_data',startrow=10)
writer.save()
But I am not sure whether I am overcomplicating it. I get the error shown below, even though I verified in Task Manager that no Excel file/task is running
BadZipFile: File is not a zip file
Moreover, I also lose the formatting of the output.xlsx file when I manage to write the file based on the suggestions below. I already have a neatly formatted file (fonts, colors, etc.) and just need to put the data inside.
Is there any way to write the pandas dataframe to a specific sheet in an existing Excel file WITHOUT LOSING THE FORMATTING OF THE DESTINATION FILE?
You need to just use to_excel from pandas dataframe.
Try below snippet:
df1.to_excel("output.xlsx",sheet_name='Sheet_name')
If there is existing data please try below snippet:
reader = pd.read_excel('output.xlsx')  # assumption: `reader` holds the rows already in the sheet
writer = pd.ExcelWriter('output.xlsx', engine='openpyxl')
# try to open an existing workbook
writer.book = load_workbook('output.xlsx')
df.to_excel(writer,index=False,header=False,startrow=len(reader)+1)
writer.save()
writer.close()
Are you restricted to using pandas or openpyxl?
Because if you're comfortable using other libraries, the easiest way is probably using win32com to puppet excel as if you were a user manually copying and pasting the information over.
import pandas as pd
import io
import win32com.client as win32
import os
csv_text = """Date,cust,region,Abr,Number
12/01/2010,Company_Name,Somecity,Chi,36
12/02/2010,Company_Name,Someothercity,Nyc,156"""
df = pd.read_csv(io.StringIO(csv_text),sep = ',')
temp_path = r"C:\Users\[User]\Desktop\temp.xlsx" #temporary location where to write this dataframe
df.to_excel(temp_path,index = False) #temporarily write this file to excel, change the output path as needed
excel = win32.Dispatch("Excel.Application")
excel.Visible = True #Switch these attributes to False if you'd prefer Excel to be invisible while executing this script
excel.ScreenUpdating = True
temp_wb = excel.Workbooks.Open(temp_path)
temp_ws = temp_wb.Sheets("Sheet1")
output_path = r"C:\Users\[User]\Desktop\output.xlsx" #Path to your output excel file
output_wb = excel.Workbooks.Open(output_path)
output_ws = output_wb.Sheets("Output_sheet")
temp_ws.Range('A1').CurrentRegion.Copy(Destination = output_ws.Range('A1')) # Feel free to modify the Cell where you'd like the data to be copied to
input('Check that output looks like you expected\n') # Added pause here to make sure script doesn't overwrite your file before you've looked at the output
temp_wb.Close()
output_wb.Close(True) #Close output workbook and save changes
excel.Quit() #Close excel
os.remove(temp_path) #Delete temporary excel file
Let me know if this achieves what you were after.
I spent all day on this (and a co-worker of mine spent even longer). Thankfully, it seems to work for my purposes: pasting a dataframe into an Excel sheet without changing any of the Excel source formatting. It requires the pywin32 package, which "drives" Excel as if it were a user, using VBA.
import pandas as pd
from win32com import client
# Grab your source data any way you please - I'm defining it manually here:
df = pd.DataFrame([
    ['LOOK','','','','','','','',''],
    ['','MA!','','','','','','',''],
    ['','','I pasted','','','','','',''],
    ['','','','into','','','','',''],
    ['','','','','Excel','','','',''],
    ['','','','','','without','','',''],
    ['','','','','','','breaking','',''],
    ['','','','','','','','all the',''],
    ['','','','','','','','','FORMATTING!']
])
# Copy the df to clipboard, so we can later paste it as text.
df.to_clipboard(index=False, header=False)
excel_app = client.gencache.EnsureDispatch("Excel.Application") # Initialize instance
wb = excel_app.Workbooks.Open("Template.xlsx") # Load your (formatted) template workbook
ws = wb.Worksheets(1) # First worksheet becomes active - you could also refer to a sheet by name
ws.Range("A3").Select() # Only select a single cell using Excel nomenclature, otherwise this breaks
ws.PasteSpecial(Format='Unicode Text') # Paste as text
wb.SaveAs("Updated Template.xlsx") # Save our work
excel_app.Quit() # End the Excel instance
In general, when using the win32com approach, it's helpful to record yourself (with a macro) doing what you want to accomplish in Excel, then reading the generated macro code. Often this will give you excellent clues as to what commands you could invoke.
The solution to your problem exists here: How to save a new sheet in an existing excel file, using Pandas?
To add a new sheet from a df:
import pandas as pd
from openpyxl import load_workbook
import os
import numpy as np
os.chdir(r'C:\workdir')
path = 'output.xlsx'
book = load_workbook(path)
writer = pd.ExcelWriter(path, engine = 'openpyxl')
writer.book = book
### replace with your df ###
x = np.random.randn(100, 2)
df = pd.DataFrame(x)
df.to_excel(writer, sheet_name = 'x')
writer.save()
writer.close()
You can try xltpl.
Create a template file based on your output.xlsx file.
Render a file with your data.
from xltpl.writerx import BookWriterx
writer = BookWriterx('template.xlsx')
d = {'rows': df.values}
d['tpl_name'] = 'tpl_sheet'
d['sheet_name'] = 'temp_data'
writer.render_sheet(d)
d['tpl_name'] = 'other_sheet'
d['sheet_name'] = 'other'
writer.render_sheet(d)
writer.save('out.xls')
See examples.

How to append dataframe to excel

I have a dataframe which I save into an Excel file at a certain location.
Currently I do it this way:
df.to_excel(r'C:\Users\user_name\Downloads\test.xlsx')
The issue I am facing is that when I write a new dataframe, it overwrites the old one. I want to append the new data. I tried several SOF answers but nothing seems to be working.
You can first read_excel, append and then write back to_excel:
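import pandas as pd
filename = r'C:\Users\user_name\Downloads\test.xlsx'
existing = pd.read_excel(filename)  # read_excel is a pandas function, not a DataFrame method
output = pd.concat([existing, df])  # DataFrame.append was removed in pandas 2.0
output.to_excel(filename)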
To check if the file exists before reading, you can use:
import os
import pandas as pd
filename = r'C:\Users\user_name\Downloads\test.xlsx'
if os.path.exists(filename):
    existing = pd.read_excel(filename)
    output = pd.concat([existing, df])
else:
    output = df
output.to_excel(filename)
One way to handle it is to read what is already in the Excel file, combine it with your data frame, and then overwrite (in effect regenerate) the Excel file.
Here's a similar question where a solution uses ExcelWriter. Instead of overwriting the existing data, it sets startrow from the existing file so that the new rows land after the old ones.
The last row (and hence the start row) can be found with writer.sheets[sheetname].max_row
append dataframe to excel with pandas
import pandas
from openpyxl import load_workbook
book = load_workbook('test.xlsx')
writer = pandas.ExcelWriter('test.xlsx', engine='openpyxl')
writer.book = book
writer.sheets = {ws.title: ws for ws in book.worksheets}
for sheetname in writer.sheets:
    df1.to_excel(writer,sheet_name=sheetname, startrow=writer.sheets[sheetname].max_row, index = False,header= False)
writer.save()
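On newer pandas versions the same start-row idea can be sketched without assigning `writer.book` by hand. The example below assumes pandas 1.4 or later with openpyxl installed; the file, sheet name and column values are invented for the demo.

```python
import os
import tempfile

import pandas as pd

path = os.path.join(tempfile.mkdtemp(), "test.xlsx")

# First write: a header row plus two data rows.
pd.DataFrame({"a": [1, 2], "b": [3, 4]}).to_excel(path, sheet_name="Sheet1", index=False)

new_rows = pd.DataFrame({"a": [5, 6], "b": [7, 8]})

with pd.ExcelWriter(path, mode="a", engine="openpyxl",
                    if_sheet_exists="overlay") as writer:
    # max_row is the last used row (1-based); startrow is 0-based, so passing
    # max_row directly starts writing on the first empty row.
    start = writer.sheets["Sheet1"].max_row
    new_rows.to_excel(writer, sheet_name="Sheet1", startrow=start,
                      index=False, header=False)

combined = pd.read_excel(path, sheet_name="Sheet1")
print(combined)
```

Writing with `header=False` keeps the second batch from repeating the column names in the middle of the sheet.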

How to get the value from merged cells in xlsx file using python?

I am trying to get the value from cell with row = 11 and column B and C. See screenshot for more clarification.
I tried the following code using the xlrd package, but it does not print anything.
import xlrd
path = "C:/myfilepath/data.xlsx"
workbook = xlrd.open_workbook(path)
sheet = workbook.sheet_by_index(0)
sheet.cell_value(10,1)
sheet.cell_value(10,2)
I am not able to output the value from particular merged cells using the xlrd package in Python.
The above code should print the cell value, i.e. PCHGFT001KS
I don't know how xlrd works, but I do know how the lovely openpyxl works. You should use openpyxl! It's a robust tool for working with xlsx files (NOT xls).
import openpyxl
wb = openpyxl.load_workbook(excel)
ws = wb[wb.sheetnames[0]]  # sheetnames replaces the deprecated get_sheet_names()
print(ws['B11'].value)
Extra:
If you want to unmerge those blocks you can do the following.
for items in list(ws.merged_cells.ranges):  # copy first: unmerging changes the collection
    ws.unmerge_cells(str(items))
wb.save(excel)
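One thing that may explain an empty result: in openpyxl only the top-left cell of a merged range holds the value, so reading C11 of a B11:C11 merge gives None. A small sketch of resolving any cell to its merged range's anchor follows; `merged_value` is a hypothetical helper, and the workbook is built in memory for the demo.

```python
from openpyxl import Workbook


def merged_value(ws, coord):
    """Return the value shown for coord, following merged ranges to their top-left cell."""
    cell = ws[coord]
    for rng in ws.merged_cells.ranges:
        if (rng.min_row <= cell.row <= rng.max_row
                and rng.min_col <= cell.column <= rng.max_col):
            return ws.cell(row=rng.min_row, column=rng.min_col).value
    return cell.value  # not part of any merged range


# In-memory workbook mimicking the question's layout.
wb = Workbook()
ws = wb.active
ws["B11"] = "PCHGFT001KS"
ws.merge_cells("B11:C11")

direct = ws["C11"].value            # C11 is not the top-left cell of the merge
resolved = merged_value(ws, "C11")  # looks up B11 instead
print(direct, resolved)
```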
