I'm trying to copy a DataFrame to the first empty rows of an existing Excel file.
I already know how to copy the DataFrame to a new file, but I want to keep adding information to the same file, each time in the first empty rows available.
I've tried something like this, but it hasn't worked (I'm still getting used to working with pandas):
with pd.ExcelWriter('test.xlsx', mode='a') as writer:
    df.to_excel(writer, sheet_name='Sheet1')
The Excel file is in the working folder of my IDE (Spyder).
Thanks in advance!
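You could read in the existing data:
df = pd.read_excel('test.xlsx')
add any rows to this DataFrame, and then write it all back out, overwriting what was previously stored (index=False stops pandas from adding an extra index column on every round trip):
df.to_excel('test.xlsx', index=False)
From the documentation (https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_excel.html), calling to_excel with a file path writes a brand-new file, so the previous contents are always overwritten and nothing gets duplicated. Be aware that any other sheets in the file are lost as well.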
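A minimal sketch of that read-append-rewrite cycle, assuming all the data lives on one sheet whose first row is the header. The helper name append_rows is hypothetical. Because it writes to a path, the file is recreated, so it only suits workbooks that contain nothing but this one sheet.

```python
import pandas as pd


def append_rows(path, new_rows, sheet_name="Sheet1"):
    """Append new_rows below the existing data and rewrite the file.

    Hypothetical helper: assumes a single sheet whose first row is the
    header. Writing to a path recreates the file, so other sheets are lost.
    """
    existing = pd.read_excel(path, sheet_name=sheet_name)
    # ignore_index renumbers the rows 0..n-1 after stacking
    combined = pd.concat([existing, new_rows], ignore_index=True)
    # index=False so no "Unnamed: 0" column accumulates on each round trip
    combined.to_excel(path, sheet_name=sheet_name, index=False)
    return combined
```

For example, append_rows('test.xlsx', df) adds the rows of df under whatever is already in Sheet1.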
Related
I have an Excel file, 'file.xlsx', with:
- a sheet containing a named Excel table that sits somewhere in the middle of the sheet, say at C3;
- a bunch of charts, etc. that use this table as their source.
I have a PySpark DataFrame that I want to write to this table, so that all the charts update and I get an Excel report.
I know how to do this using loops to set Cell.value one cell at a time. I'm hoping to find something less tedious.
Unsolved problems:
- Update an existing table in an existing xlsx file.
- Don't lose or delete anything else in the xlsx file being updated.
- (Preferably) avoid iterating over the input tabular data and updating Excel cell by cell.
Things that didn't work for me:
pyspark.pandas.DataFrame.to_excel(). Problem: this overwrites the whole 'file.xlsx', so all other sheets, charts, etc. are lost.
df.toPandas().to_excel('file.xlsx', sheet_name=sheet_name, engine='openpyxl', index=False,
                       startcol=3, startrow=3)
openpyxl.utils.dataframe.dataframe_to_rows(). Problem: it starts pasting data at A1, and I don't know how to move the active cell or current row so that append() starts at B3 instead of A1.
ws: Worksheet = openpyxl.open('file.xlsx').create_sheet(title=sheet_name)
for r in dataframe_to_rows(df.toPandas(), index=False, header=True):
    ws.append(r)
My current solution, iter_rows() / iter_cols() / cell_range() / Worksheet.cell(). Problem: it loops cell by cell.
I've read these and some more:
Appending data to existing tables in openpyxl
Manipulate existing excel table using openpyxl
Writing to row using openpyxl?
Write to an existing excel file using Openpyxl starting in existing sheet starting at a specific column and row
Openpyxl/Pandas - Convert CSV to XLSX
openpyxl convert CSV to EXCEL
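One possible approach, sketched under assumptions: openpyxl 3.x, where Worksheet.tables is a dict-like of the sheet's tables keyed by name, and a DataFrame with the same columns, in the same order and with the same header text, as the existing table. The helper name write_frame_to_table is hypothetical; for the PySpark case, pass spark_df.toPandas() as df. It still sets values cell by cell internally, but it hides the loop and resizes the table's ref (and its autofilter) so that charts built on the table can pick up the new range. openpyxl does not round-trip every Excel feature, so check that your charts survive the load/save cycle, for instance by saving to a copy via out_path first.

```python
from openpyxl import load_workbook
from openpyxl.utils.cell import get_column_letter, range_boundaries


def write_frame_to_table(path, sheet_name, table_name, df, out_path=None):
    """Overwrite an existing named table with df and resize the table.

    Hypothetical helper. Assumes df has the same columns as the table,
    at least one row, and that the cells the table grows into are free.
    """
    wb = load_workbook(path)
    ws = wb[sheet_name]
    table = ws.tables[table_name]
    min_col, min_row, max_col, max_row = range_boundaries(table.ref)

    # Clear the old body rows so a shorter df leaves no stale values behind.
    for row in ws.iter_rows(min_row=min_row + 1, max_row=max_row,
                            min_col=min_col, max_col=max_col):
        for cell in row:
            cell.value = None

    # Header row first, then data rows, anchored at the table's top-left cell.
    rows = [list(df.columns)] + df.values.tolist()
    for r_off, values in enumerate(rows):
        for c_off, value in enumerate(values):
            ws.cell(row=min_row + r_off, column=min_col + c_off, value=value)

    # Resize the table (and its autofilter) to header row + len(df) data rows.
    last_col = min_col + len(df.columns) - 1
    last_row = min_row + len(df)
    new_ref = (f"{get_column_letter(min_col)}{min_row}:"
               f"{get_column_letter(last_col)}{last_row}")
    table.ref = new_ref
    if table.autoFilter is not None:
        table.autoFilter.ref = new_ref

    wb.save(out_path or path)
```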
I initially created an empty Excel file with four named sheets, each with five column headers.
When I tried to write data (one scalar value at a time, say 5) to a sheet using ExcelWriter and to_excel in pandas, it deleted the previous data as well as the other sheets.
I don't want to collect the data in a variable and write it all at once, because this is part of a time-consuming experiment and I want to save data regularly.
If the same can be done with plain Python (without pandas), please suggest how.
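Following the pandas documentation, you need to create an ExcelWriter that opens the Excel file in append mode (append mode requires the openpyxl engine):
import pandas as pd
with pd.ExcelWriter('path_to_file.xlsx', mode='a', engine='openpyxl') as writer:
    df.to_excel(writer, sheet_name='sheet_name')
If a sheet with that name already exists, this raises a ValueError unless you also pass if_sheet_exists ('new' and 'replace' were added in pandas 1.3, 'overlay' in 1.4).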
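To append below the existing rows of a sheet rather than replace it, a sketch assuming pandas 1.4 or newer (for if_sheet_exists="overlay") with openpyxl installed. The helper name append_to_sheet is hypothetical. It reads the sheet's last used row from the openpyxl workbook behind the writer and starts writing on the next row, leaving the other sheets alone. Note that openpyxl reports max_row as 1 even for a completely empty sheet, so on a sheet with no header the first write lands on row 2.

```python
import pandas as pd


def append_to_sheet(path, df, sheet_name):
    """Write df below the last used row of an existing sheet.

    Hypothetical helper; assumes pandas >= 1.4 and the openpyxl engine.
    Other sheets in the workbook are kept.
    """
    with pd.ExcelWriter(path, mode="a", engine="openpyxl",
                        if_sheet_exists="overlay") as writer:
        # max_row is 1-based and counts the header row; startrow is 0-based,
        # so startrow=max_row points at the first empty row.
        start = writer.book[sheet_name].max_row
        df.to_excel(writer, sheet_name=sheet_name, startrow=start,
                    header=False, index=False)
```

For the experiment described above, each new measurement could be saved with something like append_to_sheet('path_to_file.xlsx', pd.DataFrame([[5]]), 'sheet_name').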
I'm having trouble with something that I believe should be relatively easy.
I have a template Excel file with a few sheets and some visualizations on them. I want to write a script that loads the template, inserts the rows of an existing DataFrame into specific cells on each sheet, and saves the result as a new Excel file.
The template already has all the cells and visualizations designed, so I only want to insert the data without changing the design.
I tried several packages and none of them seemed to work for me.
Thanks for your help! :-)
I have written a package for inserting Pandas DataFrames to Excel sheets (specific rows/cells/columns), it's called pyxcelframe:
https://pypi.org/project/pyxcelframe/
It has very short and simple documentation, and the function you need is insert_frame.
Say we have a pandas DataFrame called df that must be inserted into the sheet "MySheet" of the Excel file "MyWorkbook.xlsx", starting at cell B5. We can use insert_frame as follows:
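from pyxcelframe import insert_frame
from openpyxl import load_workbook
workbook = load_workbook("MyWorkbook.xlsx")
worksheet = workbook["MySheet"]
insert_frame(worksheet=worksheet,
             dataframe=df,
             row_range=(5, 0),
             col_range=(2, 0))
# The changes live in the in-memory openpyxl workbook; save it to write them to disk
workbook.save("MyWorkbook.xlsx")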
A value of 0 for the second element of row_range or col_range means that no end row or column is specified; if you need a specific end row or column, replace 0 with it.
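If you would rather not add a dependency, a similar effect can be had with plain openpyxl. This is a sketch, not what pyxcelframe does internally; the helper name write_frame_at is hypothetical. It only sets cell values, so formatting already applied to those cells in the template stays in place, and it saves under a new name so the template itself is untouched. Header and index are not written.

```python
from openpyxl import load_workbook


def write_frame_at(template_path, out_path, sheet_name, df, first_row, first_col):
    """Write df's values into a template sheet starting at (first_row, first_col).

    Hypothetical helper using plain openpyxl. Only values are set, so the
    cells keep the styles defined in the template.
    """
    wb = load_workbook(template_path)
    ws = wb[sheet_name]
    for r_off, row in enumerate(df.itertuples(index=False)):
        for c_off, value in enumerate(row):
            ws.cell(row=first_row + r_off, column=first_col + c_off, value=value)
    wb.save(out_path)  # a new file, so the template stays unchanged
```

Starting at B5 corresponds to first_row=5, first_col=2; call it once per sheet you need to fill.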
Sounds like a job for xlwings. You didn't post any test data, but modifying the code below to suit your needs should be quite straightforward.
import xlwings as xw
wb = xw.Book('your_excel_template.xlsx')
wb.sheets['Sheet1'].range('A1').value = df[your_selected_rows]
wb.save('new_file.xlsx')
wb.close()
I am having a lot of trouble reading in a large Excel file (.xlsx) and writing some of its tabs/sheets to a smaller Excel file.
In one class, I return a dict of DataFrames. Each key is the name (a string) of the sheet/tab the DataFrame came from; each value is that DataFrame, with all of its original columns. In this class, I extract certain DataFrames from the original Excel file.
I am able to print out my key:value pairs after extracting the DataFrames of my choice, and it all looks fine. However, I believe my real problem is writing the data to a single Excel file: I only get the first DataFrame, without the name of the sheet it came from (it becomes the default 'Sheet1'), and nothing else.
Here is the code that writes my dict to an excel file:
def frames_to_excel(self, df_dict, path):
    """Write a dictionary of DataFrames to separate sheets within one file."""
    # The context manager saves and closes the file on exit
    # (ExcelWriter.save() was removed in pandas 2.0).
    with pd.ExcelWriter(path, engine='xlsxwriter') as writer:
        for tab_name, dframe in df_dict.items():
            dframe.to_excel(writer, sheet_name=tab_name)
- "path" is the chosen output path the whole dict is written to, as an .xlsx file.
- "df_dict" is the dict of DataFrames.
I am very sorry for the confusion. My bug was not at all in the code I posted, or any of the classes that parse the data from the original excel file. The problem was this line of code:
excel_path = re.sub(r"(?i)original|_original", "_custom", os.path.basename(excel_path))
Because I applied os.path.basename(), I was using only the file name instead of the full path, so the following wrote the file relative to the current working directory:
writer = pd.ExcelWriter(excel_path, engine='xlsxwriter')
Therefore, I was not writing the data to the intended full path, and I was looking at stale output my program had produced about five days earlier. Thanks for everyone's help.
The fix (use the proper full path that you expect):
excel_path = re.sub(r"(?i)original|_original", "_custom", excel_path)
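The fix above applies the substitution to the whole path, so a folder whose name contains "original" would be renamed in the output path too. A small sketch that keeps the directory but rewrites only the file name (the helper name custom_output_path is hypothetical; the regex is the one from the post):

```python
import os
import re


def custom_output_path(excel_path):
    """Return the output path next to the input file.

    Hypothetical helper: the substitution touches only the file name, and
    the directory part that os.path.basename() would drop is kept.
    """
    directory, filename = os.path.split(excel_path)
    new_name = re.sub(r"(?i)original|_original", "_custom", filename)
    return os.path.join(directory, new_name)
```

Then pass custom_output_path(excel_path) to pd.ExcelWriter.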
I have been working on this for too long now. I have an Excel file with one sheet (sheet name 'abc') containing images, and I want a Python script that writes a DataFrame to a second, separate sheet (sheet name 'def') in the same file. Can anybody provide some example code? Every time I try to write the DataFrame, the first sheet with the images gets emptied.
This is what I tried:
import numpy as np
import pandas as pd
from openpyxl import load_workbook
book = load_workbook('filename_of_file_with_pictures_in_it.xlsx')
writer = pd.ExcelWriter('filename_of_file_with_pictures_in_it.xlsx', engine = 'openpyxl')
writer.book = book
x1 = np.random.randn(100, 2)
df = pd.DataFrame(x1)
df.to_excel(writer, sheet_name = 'def')
writer.save()
book.close()
It saves the random numbers in the sheet with the name 'def', but the first sheet 'abc' now becomes empty.
What goes wrong here? Hopefully somebody can help me with this.
Interesting question! With openpyxl you can easily add values and keep the formulas, but you cannot retain the graphs; even with the latest version (2.5.4), graphs do not survive. So I decided to address the issue with xlwings:
import numpy as np
import xlwings as xw
wb = xw.Book(r"filename_of_file_with_pictures_in_it.xlsx")
sht=wb.sheets.add('SheetMod')
sht.range('A1').value = np.random.randn(100, 2)
wb.save(r"path_new_file.xlsx")
With this snippet I managed to insert the random set of values and save a new copy of the modified xlsx. When you run the code, the Excel file opens automatically and shows the new sheet, without changing the existing ones (graphs and formulas included). Make sure all the dependencies are installed so xlwings can run on your system (xlwings drives an installed copy of Excel). Hope this helps!
You'll need to use an Excel "reader" such as openpyxl in combination with pandas for this: pandas' to_excel only writes, so when given a file path it creates the file from scratch and ignores whatever the existing file contained.
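In recent pandas versions, the writer.book = book assignment from the question no longer works (book became a read-only property), and append mode does the loading for you. A sketch assuming pandas 1.3 or newer (for if_sheet_exists) with the openpyxl engine; the helper name add_sheet is hypothetical. openpyxl still loads and re-saves the whole workbook, so check that the images on the other sheet survive the round trip.

```python
import pandas as pd


def add_sheet(path, df, sheet_name):
    """Add (or replace) one sheet in an existing workbook, keeping the others.

    Hypothetical helper; assumes pandas >= 1.3 and the openpyxl engine.
    """
    with pd.ExcelWriter(path, mode="a", engine="openpyxl",
                        if_sheet_exists="replace") as writer:
        df.to_excel(writer, sheet_name=sheet_name)
```

For the question above, that would be add_sheet('filename_of_file_with_pictures_in_it.xlsx', pd.DataFrame(np.random.randn(100, 2)), 'def').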