Openpyxl column number formatting - python

I'm using openpyxl to append formatted DataFrame rows to an existing Excel file (or to create a new file) with the following code:
import os
import openpyxl
import pandas as pd
from openpyxl.utils.dataframe import dataframe_to_rows
if os.path.isfile(transformed_file):  # if the file exists, load it and append
    workbook = openpyxl.load_workbook(transformed_file)
    sheet = workbook['Sheet1']
    for row in dataframe_to_rows(df, header=False, index=False):
        sheet.append(row)
    workbook.save(transformed_file)
    workbook.close()
else:  # create the Excel file if it doesn't already exist
    with pd.ExcelWriter(path=transformed_file, engine='openpyxl') as writer:
        df.to_excel(writer, index=False, sheet_name='Sheet1')
I need to format column 'G' as a plain number ('0'); at the moment, when I open the Excel file, the values are displayed as '1.23E+10'.
How could this be achieved for the sample above? Thank you!

Hello, try the following code and see if it works for you:
from openpyxl import Workbook
wb = Workbook()
ws = wb.active
ws['A1'] = 123455656565464563302589013
ws['B1'] = 123455656565464563302589013
ws['A1'].number_format = '0' # Number formatting
ws['B1'].number_format = '0.00E+00' # Scientific formatting
wb.save("formating_test.xlsx")

Found the solution that worked for me. I realized from the documentation that the number format has to be set on each cell, so you iterate through the column:
for cell in sheet['D']:  # use the letter of the column you need, e.g. 'G'
    cell.number_format = '0'
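For the code in the question, the loop just has to run after the rows are written and before the workbook is saved. A minimal sketch, assuming the long numbers are in column G; unlike the question, the new-file branch here uses openpyxl.Workbook instead of pd.ExcelWriter so both branches can share the formatting step. append_and_format is a hypothetical helper name, not an openpyxl function:

```python
import os

import openpyxl
from openpyxl.utils.dataframe import dataframe_to_rows


def append_and_format(df, path, sheet_name="Sheet1", column="G"):
    """Append df to path (creating the file if needed), then show `column` as plain integers.

    Hypothetical helper: combines the question's append/create branches with the
    per-cell number_format loop from the answer above.
    """
    if os.path.isfile(path):
        workbook = openpyxl.load_workbook(path)
        sheet = workbook[sheet_name]
        # existing file: append data rows only, no header
        for row in dataframe_to_rows(df, header=False, index=False):
            sheet.append(row)
    else:
        workbook = openpyxl.Workbook()
        sheet = workbook.active
        sheet.title = sheet_name
        # new file: write the header row first, then the data rows
        for row in dataframe_to_rows(df, header=True, index=False):
            sheet.append(row)

    # number_format is a per-cell property, so walk every cell in the column
    for cell in sheet[column]:
        cell.number_format = "0"

    workbook.save(path)
    return path
```

Because the format is stored per cell, rows appended later by some other code get the default "General" format and will typically show in scientific notation again until the loop is re-run.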

Related

How to keep the thousand separator when writing a pandas df to Excel

I have a question about writing a pandas DataFrame to Excel. My numbers use "." as the thousand separator, but after writing to Excel it changes to ",". How can I write my data without changing the separator?
This is how it looks in Jupyter notebook:
And here is how it looks in Excel:
import pandas as pd
from openpyxl import load_workbook
def pivot_to_excel(book, excelfilename, PivotTable, s_name, writer):
    # assigning writer.sheets only works with older pandas versions
    writer.sheets = {ws.title: ws for ws in book.worksheets}
    for sheetname in writer.sheets:
        if sheetname == s_name:
            PivotTable.to_excel(writer, sheet_name=sheetname, startrow=writer.sheets[sheetname].max_row, index=False)
wb = load_workbook(filename)
sheet = wb[s_name]
writer = pd.ExcelWriter(filename, engine='openpyxl')
pivot_to_excel(wb, filename, prepared_data, s_name, writer)
UPD:
If my data is mixed like this, it looks correct after writing to Excel:
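As far as I know, the number itself is stored in the .xlsx file without any separator; what Excel shows is decided by the cell's number format, and in a format code the "," is a placeholder that Excel renders with the thousands separator of the viewer's regional settings. So one approach, sketched here under the assumption that the numeric columns are known by their letters, is to write the frame with pandas and then give those cells the "#,##0" format with openpyxl. write_with_thousands is a hypothetical helper name:

```python
import openpyxl
import pandas as pd


def write_with_thousands(df, path, columns, sheet_name="Sheet1"):
    """Write df to path and give the listed columns a thousands-separator format.

    Hypothetical helper. '#,##0' is an Excel format code: the ',' is a placeholder
    that Excel displays using the thousands separator of the machine's locale.
    """
    df.to_excel(path, sheet_name=sheet_name, index=False, engine="openpyxl")
    workbook = openpyxl.load_workbook(path)
    sheet = workbook[sheet_name]
    for column in columns:
        # skip the header cell in row 1 and format only the data cells
        for cell in sheet[column][1:]:
            cell.number_format = "#,##0"
    workbook.save(path)
```

With this, the same file should show "1.234.567" on a machine whose regional settings use "." for thousands and "1,234,567" on one that uses ",".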

Activate hyphenation formatting while generating Excel document with Python

I'm using Python to generate an excel document from a Pandas DataFrame.
I can set column width and text wrap with workbook.add_format({"text_wrap": True}) and worksheet.set_column(f"{cols[idx]}:{cols[idx]}", 30, format), but I don't know how to activate "Hyphenation active". I can't find it in the docs: https://xlsxwriter.readthedocs.io/format.html
Here is a sample of my code:
import string
import pandas as pd
df = get_pandas_dataframe()
writer = pd.ExcelWriter(path, engine="xlsxwriter")
sheet_name = "abc"
df.to_excel(writer, index=False, sheet_name=sheet_name)
workbook = writer.book
worksheet = writer.sheets[sheet_name]
max_row, max_col = df.shape
wrap_format = workbook.add_format({"text_wrap": True})
cols = dict(zip(range(26), list(string.ascii_uppercase)))
for idx, col in enumerate(df):
    worksheet.set_column(f"{cols[idx]}:{cols[idx]}", 30, wrap_format)
writer.close()  # writer.save() was removed in pandas 2.0; close() saves the file
Any idea?
"Hyphenation Active" isn't an Excel option and hence it isn't supported by XlsxWriter.
You can verify that yourself in LibreOffice by saving an xlsx file with that option in a cell, closing the file, and then re-opening it. The option will no longer be there.

Insert worksheet at specified index in existing Excel file using Pandas

Is there a way to insert a worksheet at a specified index using Pandas? With the code below, when adding a DataFrame as a new worksheet, it gets added after the last sheet in the existing Excel file. What if I want to insert it at, say, index 1?
import pandas as pd
f = 'existing_file.xlsx'
df = pd.DataFrame({'cat': ['A', 'B'], 'word': ['C', 'D']})
# mode='a' keeps the existing sheets; the new sheet ends up after the last one
with pd.ExcelWriter(f, engine='openpyxl', mode='a') as writer:
    df.to_excel(writer, sheet_name='sheet')
Thank you.
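As far as I know, pandas' append mode always creates new sheets at the end, but openpyxl's Workbook.create_sheet accepts an index argument. A minimal sketch under that approach, writing the DataFrame with openpyxl's dataframe_to_rows instead of to_excel (so, unlike the question, the pandas index column is not written and the header cells get no pandas styling); insert_df_as_sheet is a hypothetical helper name:

```python
import openpyxl
import pandas as pd
from openpyxl.utils.dataframe import dataframe_to_rows


def insert_df_as_sheet(path, df, sheet_name, index):
    """Add df as a new worksheet at position `index` (0-based) of an existing workbook."""
    workbook = openpyxl.load_workbook(path)
    # create_sheet takes the position where the new sheet should be placed
    sheet = workbook.create_sheet(title=sheet_name, index=index)
    # header row first, then one row per DataFrame row
    for row in dataframe_to_rows(df, index=False, header=True):
        sheet.append(row)
    workbook.save(path)
```

Usage with the question's data would be insert_df_as_sheet('existing_file.xlsx', df, 'sheet', 1).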

append dataframe to excel with pandas

I want to append a DataFrame to an Excel file.
This code almost works as intended, but it does not append. When I run it, it writes the DataFrame to Excel, but each subsequent run does not append to it. I have also heard that openpyxl is CPU-intensive, but I haven't heard of many workarounds.
import pandas
from openpyxl import load_workbook
book = load_workbook('C:\\OCC.xlsx')
writer = pandas.ExcelWriter('C:\\OCC.xlsx', engine='openpyxl')
writer.book = book
writer.sheets = dict((ws.title, ws) for ws in book.worksheets)
df1.to_excel(writer, index = False)
writer.save()
I want the data to append each time I run it, this is not happening.
The output looks like the original data:
A B C
H H H
After running it a second time, I want:
A B C
H H H
H H H
Apologies if this is obvious; I'm new to Python, and the examples I practised with did not work as I wanted.
My question is: how can I append data each time I run it? I tried switching to xlsxwriter but got AttributeError: 'Workbook' object has no attribute 'add_format'.
First of all, this post is the first piece of the solution, where you should specify startrow=:
Append existing excel sheet with new dataframe using python pandas
You might also consider header=False.
So it should look like:
df1.to_excel(writer, startrow=2, index=False, header=False)
If you want it to automatically go to the end of the sheet and append your df, then use:
startrow = writer.sheets['Sheet1'].max_row
And if you want it to go over all of the sheets in the workbook:
for sheetname in writer.sheets:
    df1.to_excel(writer, sheet_name=sheetname, startrow=writer.sheets[sheetname].max_row, index=False, header=False)
BTW: for writer.sheets you could use a dictionary comprehension (I think it's cleaner, but that's up to you; it produces the same output):
writer.sheets = {ws.title: ws for ws in book.worksheets}
So the full code will be:
import pandas
from openpyxl import load_workbook
# Note: relies on older pandas versions, where writer.book and writer.sheets can be assigned
book = load_workbook('test.xlsx')
writer = pandas.ExcelWriter('test.xlsx', engine='openpyxl')
writer.book = book
writer.sheets = {ws.title: ws for ws in book.worksheets}
for sheetname in writer.sheets:
    df1.to_excel(writer, sheet_name=sheetname, startrow=writer.sheets[sheetname].max_row, index=False, header=False)
writer.save()
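On recent pandas versions the code above fails, since writer.book and writer.sheets are no longer assignable and writer.save() was removed in pandas 2.0. A sketch of the same idea (start at max_row, no header, no index), assuming pandas 1.4 or newer, where ExcelWriter can open the file in append mode and write into an existing sheet with if_sheet_exists="overlay"; append_rows is a hypothetical helper name:

```python
import pandas as pd


def append_rows(path, df, sheet_name="Sheet1"):
    """Append df below the last used row of sheet_name in an existing .xlsx file.

    Assumes pandas >= 1.4 (needed for if_sheet_exists="overlay").
    """
    with pd.ExcelWriter(path, engine="openpyxl", mode="a",
                        if_sheet_exists="overlay") as writer:
        # max_row is the last used row (1-based), which is exactly the 0-based startrow we need
        startrow = writer.sheets[sheet_name].max_row
        df.to_excel(writer, sheet_name=sheet_name, startrow=startrow,
                    index=False, header=False)
```

Running append_rows('C:\\OCC.xlsx', df1) once per run should give the "H H H" rows stacking up under the original header, as asked.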
You can use the append_df_to_excel() helper function, which is defined in this answer:
Usage examples:
filename = r'C:\OCC.xlsx'
append_df_to_excel(filename, df)
append_df_to_excel(filename, df, header=None, index=False)
append_df_to_excel(filename, df, sheet_name='Sheet2', index=False)
append_df_to_excel(filename, df, sheet_name='Sheet2', index=False, startrow=25)
All examples here are quite complicated.
In the documentation, it is much easier:
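import pandas as pd
def append_to_excel(fpath, df, sheet_name):
    # append mode needs the openpyxl engine; df is added as a new sheet
    with pd.ExcelWriter(fpath, mode="a", engine="openpyxl") as f:
        df.to_excel(f, sheet_name=sheet_name)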
append_to_excel(<your_excel_path>, <new_df>, <new_sheet_name>)
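Note that this documented pattern adds df as a separate sheet rather than appending rows, and the sheet name has to be new: by default (pandas 1.3 and later) ExcelWriter raises a ValueError if the sheet already exists. A sketch that can be re-run, assuming replacing the whole sheet is what you want, passes if_sheet_exists="replace"; append mode also needs the openpyxl engine, since xlsxwriter cannot open existing files. write_sheet is a hypothetical helper name:

```python
import pandas as pd


def write_sheet(fpath, df, sheet_name):
    """Add or overwrite a whole sheet in an existing .xlsx file.

    if_sheet_exists="replace" (pandas >= 1.3) lets the function be re-run with
    the same sheet name instead of raising a ValueError.
    """
    with pd.ExcelWriter(fpath, engine="openpyxl", mode="a",
                        if_sheet_exists="replace") as writer:
        df.to_excel(writer, sheet_name=sheet_name, index=False)
```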
When using this on LibreOffice/OpenOffice Excel files, I get the error:
KeyError: "There is no item named 'xl/drawings/drawing1.xml' in the archive"
which is a bug in openpyxl as mentioned here.
I read the existing Excel file into a DataFrame and then concatenated it with the new DataFrame. It worked for me.
import pandas as pd
def append_df_to_excel(df, excel_path):
    df_excel = pd.read_excel(excel_path)
    result = pd.concat([df_excel, df], ignore_index=True)
    result.to_excel(excel_path, index=False)
df = pd.DataFrame({"a": [11, 22, 33], "b": [55, 66, 77]})
append_df_to_excel(df, r"<path_to_dir>\<out_name>.xlsx")
If someone needs it, I found an easier way:
Convert the DataFrame to a list of rows:
rows = your_df.values.tolist()
Load your workbook:
workbook = load_workbook(filename=your_excel)
Pick your sheet:
sheet = workbook[your_sheet]
Iterate over the rows and append each one:
for row in rows:
    sheet.append(row)
Save the workbook when done:
workbook.save(filename=your_excel)
Putting it all together:
from openpyxl import load_workbook
rows = your_df.values.tolist()
workbook = load_workbook(filename=your_excel)
sheet = workbook[your_sheet]
for row in rows:
    sheet.append(row)
workbook.save(filename=your_excel)
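import os
import pandas as pd
def append_to_excel(fpath, df):
    if os.path.exists(fpath):
        x = pd.read_excel(fpath)
    else:
        x = pd.DataFrame()
    # existing rows first, then the new ones, so df ends up at the bottom
    dfNew = pd.concat([x, df], ignore_index=True)
    dfNew.to_excel(fpath, index=False)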

How to write on existing excel files without losing previous information using python?

I need to write a program to scrape daily quotes from a certain web page and collect them into a single Excel file. I wrote something that finds the next empty row and starts writing new quotes there, but it deletes the previous rows too:
import openpyxl
import pandas as pd
wb = openpyxl.load_workbook('gold_quote.xlsx')
sheet = wb['Sheet1']
.
.
.
z = 1
x = sheet['A{}'.format(z)].value
while x is not None:
    x = sheet['A{}'.format(z)].value
    z += 1
writer = pd.ExcelWriter('quote.xlsx')
df.to_excel(writer, sheet_name='Sheet1', na_rep='', float_format=None, columns=['Date', 'Time', 'Price'], header=True, index=False, index_label=None, startrow=z-1, startcol=0, merge_cells=True, inf_rep='inf', freeze_panes=None)
writer.close()
Question: How to write on existing excel files without losing previous information
openpyxl's append writes after the last used row:
import openpyxl
wb = openpyxl.load_workbook('gold_quote.xlsx')
sheet = wb['Sheet1']
rowData = ['2017-08-01', '16:31', 1.23]
sheet.append(rowData)
wb.save('gold_quote.xlsx')
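Because openpyxl keeps track of the last used row itself (after loading, append continues below sheet.max_row), the while loop and the pandas writer from the question are not needed. A minimal sketch that appends one quote row and returns the row number it landed on, assuming the sheet is called Sheet1; add_quote is a hypothetical helper name:

```python
import openpyxl


def add_quote(path, row_data, sheet_name="Sheet1"):
    """Append one row after the last used row and return the row number it landed on."""
    workbook = openpyxl.load_workbook(path)
    sheet = workbook[sheet_name]
    sheet.append(row_data)  # no need to search for the first empty row manually
    new_row = sheet.max_row
    workbook.save(path)
    return new_row
```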
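If you still want to write with pandas, attach the loaded workbook to the writer (this only works with older pandas versions, where these attributes can be assigned):
writer.book = wb
writer.sheets = dict((ws.title, ws) for ws in wb.worksheets)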
I figured it out: first read the existing data of the Excel file, then concatenate it with the newly scraped data and drop duplicates; otherwise every run of the program adds many duplicate rows. Then write the previous and new data together:
import pandas as pd
# read every sheet into memory first (dict: sheet name -> DataFrame), before the writer overwrites the file
existing = pd.read_excel('gold_quote.xlsx', sheet_name=None)
to_update = {"Sheet1": df}  # df holds the newly scraped quotes
with pd.ExcelWriter('gold_quote.xlsx') as excel_writer:
    for sheet, sheet_df in existing.items():
        append_df = to_update.get(sheet)
        if append_df is not None:
            sheet_df = pd.concat([sheet_df, append_df]).drop_duplicates()
        sheet_df.to_excel(excel_writer, sheet_name=sheet, index=False)
