writing and appending to excel from pandas dataframe not working - python

I am trying to write a pandas dataframe (all_agents) to an Excel sheet. If the file does not exist, the code should create a new file; if the file already exists, it should append the data at the end. Below is my code:
try:
    output_file = "all_agents_file.xlsx"
    # try to open an existing workbook
    book = load_workbook(output_file)
    writer = pd.ExcelWriter(output_file)
    writer.book = book
    # copy existing sheets
    writer.sheets = dict((ws.title, ws) for ws in writer.book.worksheets)
    # read existing file
    previous_data = pd.read_excel(output_file)
    # write out the new sheet
    if not all_agents.empty:
        all_agents.index = np.arange(len(previous_data) + 1,
                                     len(previous_data) + len(all_agents) + 1)
        all_agents.to_excel(writer, index=True, header=False,
                            startrow=len(previous_data) + 1)
    writer.close()
except Exception as e:
    all_agents.index = np.arange(1, len(all_agents) + 1)
    all_agents.to_excel(output_file)
    print("File Created and Data Written in it...:", e)
The issue is that it raises an exception saying the file is not a recognised Excel file. If I specify engine="openpyxl" while reading, the exception is "File is not a zip file"; if I give the engine as "xlsxwriter", the exception is "unknown engine xlsxwriter". Versions: pandas==1.2.5, openpyxl==3.0.7.
My machine runs Ubuntu, and the same code works in a Jupyter notebook on this machine, but it does not work when I run it from the terminal.
Any help would be appreciated.
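An .xlsx file is a ZIP archive, so "File is not a zip file" means the bytes on disk were not a valid archive at the moment they were read. One possible reason (an assumption, not confirmed in the question) is that the file was empty or truncated at that point, for example because a writer had already been opened on the same path before pd.read_excel was called. The sketch below only diagnoses the state of the file; classify_workbook is a hypothetical helper name, not part of pandas or openpyxl.

```python
import os
import zipfile


def classify_workbook(path):
    """Return 'missing', 'invalid', or 'ok' for the file at path."""
    if not os.path.exists(path):
        # Nothing on disk yet: safe to create a new workbook.
        return "missing"
    if not zipfile.is_zipfile(path):
        # Empty, truncated, or not really an .xlsx file.
        return "invalid"
    # Looks like a ZIP container, so appending can be attempted.
    return "ok"
```

Calling classify_workbook("all_agents_file.xlsx") before load_workbook, and again right after the writer is created, would show whether the file is already unreadable before pd.read_excel runs.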

Related

How do I write multiple dataframes to excel using python and download the results using streamlit?

I have a Python script using streamlit that lets the user upload certain Excel files and then automatically runs my analysis on them. I then want the user to download the results in xlsx format using the streamlit download button. I know how to let them download one dataframe as a csv, but not how to offer an xlsx file through the streamlit download button, which is what I want to do.
Here's what I've tried so far, and this is after my analysis where I'm just trying to create the download button for the user to download the results that are stored in 3 different dataframes:
import pandas as pd
import streamlit as st
# arrived_clean, booked_grouped, and arrived_grouped are all dataframes that I want to export to an excel file as results for the user to download.
def convert_df():
    writer = pd.ExcelWriter('test_data.xlsx', engine='xlsxwriter')
    arrived_clean.to_excel(writer, sheet_name='Cleaned', startrow=0, startcol=0, index=False)
    booked_grouped.to_excel(writer, sheet_name='Output', startrow=0, startcol=0, index=False)
    arrived_grouped.to_excel(writer, sheet_name='Output', startrow=0, startcol=20, index=False)
    writer.save()
csv = convert_df()
st.download_button(
    label="Download data",
    data=csv,
    file_name='test_data.xlsx',
    mime='text/xlsx',
)
When I first run the streamlit app locally I get this error:
"NameError: name 'booked_grouped' is not defined"
I get it because I haven't uploaded any files yet. After I upload my files the error message goes away and everything runs normally. However, I get this error and I don't see the download button to download my new dataframes:
RuntimeError: Invalid binary data format: <class 'NoneType'> line 313,
in marshall_file raise RuntimeError("Invalid binary data format: %s" %
type(data))
Can someone tell me what I'm doing wrong? It's the last piece I have to figure out.
The Pluviophile's answer is correct, but you should use output in pd.ExcelWriter instead of file_name:
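from io import BytesIO
import pandas as pd
def dfs_tabs(df_list, sheet_list, file_name):
    # write the workbook into an in-memory buffer instead of a file on disk
    output = BytesIO()
    writer = pd.ExcelWriter(output, engine='xlsxwriter')
    for dataframe, sheet in zip(df_list, sheet_list):
        dataframe.to_excel(writer, sheet_name=sheet, startrow=0 , startcol=0)
    # writer.save() was removed in pandas 2.0; use writer.close() there
    writer.save()
    processed_data = output.getvalue()
    return processed_data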
When I first run the streamlit app locally I get this error:
"NameError: name 'booked_grouped' is not defined"
Assuming your code is
booked_grouped = st.file_uploader('Something.....')
You can use the below method to skip the error
if booked_grouped:
    # All your code inside this indentation
To Download excel
Convert all dataframes to one single excel
# Function to save all dataframes to one single excel
def dfs_tabs(df_list, sheet_list, file_name):
    output = BytesIO()
    writer = pd.ExcelWriter(file_name,engine='xlsxwriter')
    for dataframe, sheet in zip(df_list, sheet_list):
        dataframe.to_excel(writer, sheet_name=sheet, startrow=0 , startcol=0)
    writer.save()
    processed_data = output.getvalue()
    return processed_data
# list of dataframes
dfs = [df, df1, df2]
# list of sheet names
sheets = ['df','df1','df2']
Note that the data to be downloaded is stored in memory while the user is connected, so it's a good idea to keep file sizes under a couple of hundred megabytes to conserve memory.
df_xlsx = dfs_tabs(dfs, sheets, 'multi-test.xlsx')
st.download_button(label='📥 Download Current Result',
                   data=df_xlsx ,
                   file_name= 'df_test.xlsx')

Unable to update a column in an Excel file that is checked out from Perforce

I have code that reads an Excel file from Perforce and stores it locally.
It then does some other work:
-- reads all sheets
-- searches for particular columns and extracts their data
-- uses that data to fetch further information from JIRA
Up to this point it works fine. Once we have all the data, we create a dataframe and look for the column "STATUS". If it exists, we update it with that data; otherwise we create the column in the same sheet and write the data to it.
Code:
import os
import pandas as pd
from jira import JIRA
from pandas import ExcelWriter
from openpyxl import load_workbook
def getStatus(issueID):
    jiraURL='http://in-jira-test:0000' #Test server
    options = {'server': jiraURL}
    jira = JIRA(options, basic_auth=(userName, password))
    """Getting the status for the particular issueID"""
    issue = jira.issue(issueID)
    status = issue.fields.status
    return status
def getFileFromPerforce():
    """
    Getting the file from perforce
    """
    p4File = ' "//depot/Planning/Configurations.xlsx" '
    p4Localfile = "C:/depot/Planning/Configurations.xlsx"
    global p4runcmd
    p4runcmd = p4Cmd + " sync -f " + p4File
    stream = os.popen(p4runcmd)
    output = stream.read()
    print(output)
    return p4File, p4Localfile
def excelReader():
    # function call to get the filepath
    p4FileLocation, filePath = getFileFromPerforce()
    xls=pd.ExcelFile(filePath)
    # gets the all sheets names in a list
    sheetNameList = xls.sheet_names
    for sheets in sheetNameList:
        data=pd.read_excel(filePath,sheet_name=sheets)
        # Checking the Jira column availability in all sheets
        if any("Jira" in columnName for columnName in data.columns):
            Value = data['Jira']
            colValue=Value.to_frame()
            # Getting the status of particular jira issue and updating to the dataframe
            for row,rowlen in zip(colValue.iterrows(), range(len(colValue))):
                stringData=row[1].to_string()
                # getting the issueID from the jira issue url
                issueID = stringData.partition('/')[2].rsplit('/')[3]
                status = getStatus(issueID)
                # data.set_value(k, 'Status', status) #---> deprecated
                data.at[rowlen, "Status"]=status
            # writing the data to the same excel sheet
            print("filePath-",filePath)
            excelBook = load_workbook(filePath)
            with ExcelWriter(filePath, engine='openpyxl') as writer:
                # Save the file workbook as base
                writer.book = excelBook
                writer.sheets = dict((ws.title, ws) for ws in excelBook.worksheets)
                # Creating the new column Status and writing to the sheet which having jira column
                data.to_excel(writer, sheets, index=False)
                # Save the file
                writer.save()
        else:
            continue
if __name__ == '__main__':
    # read userName and password from account file
    f = open("account.txt", "r")
    lines = f.readlines()
    userName = str(lines[0].rstrip())
    password = str(lines[1].rstrip())
    AdminUser = str(lines[2].rstrip())
    AdminPassword = str(lines[3].rstrip())
    p4Cmd = 'p4 -c snehil_tool -p indperforce:1444 -u %s -P %s '%(AdminUser,AdminPassword)
    f.close()
    excelReader()
With this code I am not able to write the data to the file that I checked out from Perforce. I get this error:
Traceback (most recent call last):
File "C:/Users/snsingh/PycharmProjects/DemoProgram/JiraStatusUpdate/updateStatusInOpticalFile.py", line 105, in <module>
excelReader()
File "C:/Users/snsingh/PycharmProjects/DemoProgram/JiraStatusUpdate/updateStatusInOpticalFile.py", line 88, in excelReader
writer.save()
File "C:\Users\snsingh\AppData\Local\Programs\Python\Python37\lib\site-packages\pandas\io\excel\_base.py", line 779, in __exit__
self.close()
File "C:\Users\snsingh\AppData\Local\Programs\Python\Python37\lib\site-packages\pandas\io\excel\_base.py", line 783, in close
return self.save()
File "C:\Users\snsingh\AppData\Local\Programs\Python\Python37\lib\site-packages\pandas\io\excel\_openpyxl.py", line 44, in save
return self.book.save(self.path)
File "C:\Users\snsingh\AppData\Local\Programs\Python\Python37\lib\site-packages\openpyxl\workbook\workbook.py", line 392, in save
save_workbook(self, filename)
File "C:\Users\snsingh\AppData\Local\Programs\Python\Python37\lib\site-packages\openpyxl\writer\excel.py", line 291, in save_workbook
archive = ZipFile(filename, 'w', ZIP_DEFLATED, allowZip64=True)
File "C:\Users\snsingh\AppData\Local\Programs\Python\Python37\lib\zipfile.py", line 1204, in __init__
self.fp = io.open(file, filemode)
PermissionError: [Errno 13] Permission denied: 'C:/depot/Planning/Configurations.xlsx'
This is the part of the code above that fails:
# writing the data to the same excel sheet
print("filePath-",filePath)
excelBook = load_workbook(filePath)
with ExcelWriter(filePath, engine='openpyxl') as writer:
    # Save the file workbook as base
    writer.book = excelBook
    writer.sheets = dict((ws.title, ws) for ws in excelBook.worksheets)
    # Creating the new column Status and writing to the sheet which having jira column
    data.to_excel(writer, sheets, index=False)
    # Save the file
    writer.save()
NOTE:
This code works fine with a local file containing the same data and writes to it perfectly. The error only happens when I read the file from Perforce.
I have even granted all permissions on the folder and tried a different folder path, but I get the same error. Please tell me where I am going wrong. Any help would be great, and if you have any questions please feel free to ask in the comments.
Thanks
Three things:
When you get the file from Perforce, use p4 sync instead of p4 sync -f.
After you p4 sync the file, p4 edit it. That makes it writable so that you can edit it.
After you save your edits to the file, p4 submit it. That puts your changes in the depot.
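The three steps above could be scripted roughly as follows. This is a sketch under assumptions: the helper names p4_commands, checkout and submit are hypothetical, the change description is a placeholder, and the connection options (-c, -p, -u, -P) used in the question are assumed to come from the environment rather than being passed on each call.

```python
import subprocess


def p4_commands(depot_path, description):
    """Build the sync / edit / submit command lines in the suggested order."""
    return [
        ["p4", "sync", depot_path],                       # plain sync, no -f
        ["p4", "edit", depot_path],                       # make the local file writable
        ["p4", "submit", "-d", description, depot_path],  # put the change in the depot
    ]


def checkout(depot_path):
    """Run sync and edit; call this before writing to the local file."""
    for cmd in p4_commands(depot_path, "")[:2]:
        subprocess.run(cmd, check=True)


def submit(depot_path, description):
    """Run submit; call this after the Excel file has been saved."""
    subprocess.run(p4_commands(depot_path, description)[2], check=True)
```

In the question's script, checkout would replace the "sync -f" call in getFileFromPerforce, and submit would run once excelReader has finished writing.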

How to manually open a file that Python is writing in simultaneously?

I have a python code which performs some calculations and then writes the output to an excel sheet as follows-
# import the necessary packages
import numpy as np
import argparse
import time
import cv2
import os
from openpyxl import load_workbook
import pandas as pd
from datetime import date, datetime
filename = r'PathToDirectory\Data.xlsx'
def append_df_to_excel(filename, df, sheet_name='Sheet1', startrow=None,
                       truncate_sheet=False,
                       **to_excel_kwargs):
    # ignore [engine] parameter if it was passed
    if 'engine' in to_excel_kwargs:
        to_excel_kwargs.pop('engine')
    writer = pd.ExcelWriter(filename, engine='openpyxl')
    # Python 2.x: define [FileNotFoundError] exception if it doesn't exist
    try:
        FileNotFoundError
    except NameError:
        FileNotFoundError = IOError
    try:
        # try to open an existing workbook
        writer.book = load_workbook(filename)
        # get the last row in the existing Excel sheet
        # if it was not specified explicitly
        if startrow is None and sheet_name in writer.book.sheetnames:
            startrow = writer.book[sheet_name].max_row
        # truncate sheet
        if truncate_sheet and sheet_name in writer.book.sheetnames:
            # index of [sheet_name] sheet
            idx = writer.book.sheetnames.index(sheet_name)
            # remove [sheet_name]
            writer.book.remove(writer.book.worksheets[idx])
            # create an empty sheet [sheet_name] using old index
            writer.book.create_sheet(sheet_name, idx)
        # copy existing sheets
        writer.sheets = {ws.title:ws for ws in writer.book.worksheets}
    except FileNotFoundError:
        # file does not exist yet, we will create it
        pass
    if startrow is None:
        startrow = 0
    # write out the new sheet
    df.to_excel(writer, sheet_name, startrow=startrow, **to_excel_kwargs, header = False, index=False)
    # save the workbook
    writer.save()
while(True):
    # --------------------------------------------------------------
    # -----------REST OF THE CODE AND COMPUTATION HERE--------------
    # --------------------------------------------------------------
    today = date.today()
    today = today.strftime("%d/%m/%Y")
    now = datetime.now()
    now = now.strftime("%H:%M:%S")
    rec_classes = list(set(classIDs))
    for counting in range(len(rec_classes)):
        my_label = LABELS[rec_classes[counting]]
        my_count = classIDs.count(rec_classes[counting])
        data_dict = ({'today_date' : [today], 'now_time' : [now], 'animal_class' : [my_label], 'animal_count' : my_count})
        df = pd.DataFrame(data_dict)
        append_df_to_excel(filename, df)
The code works fine if I want to write to an excel sheet, and after the code has run, I can open the file and all the content appears perfectly.
The problem is that I want to open the excel file while it is running. I want to see the rows being added and data being appended as the code runs. However, whenever I open the excel file while the code is running, I get a 'Permission Denied' error, and the code stops.
I have tried solving it with except OSError: pass, but it did not help.
Is there anything that can be done?
I suggest two possible approaches:
Print the data you are inserting to the terminal, or
Use Python's logging library, which can log the data to a file and give you a complete view of what's going on. You can set the format, the logging level, handlers... it's an awesome tool, also for other uses.
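As a rough illustration of the logging suggestion, the sketch below writes one comma-separated line per detection to a text file. The function name make_row_logger, the logger name "detections" and the line format are assumptions chosen to mirror the columns in the question; they are not taken from the original code.

```python
import logging


def make_row_logger(log_path, name="detections"):
    """Return a logger that appends 'date,time,message' lines to log_path."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    if not logger.handlers:  # avoid adding a second handler on repeated calls
        handler = logging.FileHandler(log_path, encoding="utf-8")
        handler.setFormatter(
            logging.Formatter("%(asctime)s,%(message)s",
                              datefmt="%d/%m/%Y,%H:%M:%S")
        )
        logger.addHandler(handler)
    return logger
```

Inside the loop, a call such as logger.info("%s,%d", my_label, my_count) would add one line per class. A plain text log like this can be followed while the script runs (for example with tail -f on Linux), and the Excel file can still be written at the end.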
No, there is no straightforward way to have Excel dynamically update its display of a changing file. It's loaded from disk into memory once, and then Excel ignores the disk file.
(If you wrote an Excel macro which revisited the file periodically, maybe you could pull this off. But friends don't let friends use Excel anyway.)
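None of this makes Excel show live updates, but the script can at least be kept from stopping when the workbook is locked. The sketch below is a workaround that is not part of the answers above: rows are held in a buffer and written only when the file can be opened. flush_pending, pending and write_rows are hypothetical names; write_rows stands for any callable that writes a list of rows, for example a thin wrapper around append_df_to_excel. It assumes that a failed write leaves the file unchanged.

```python
def flush_pending(pending, write_rows):
    """Try to write all buffered rows; keep them if the file is locked.

    Returns True if the buffer is empty afterwards, False if the write
    failed with PermissionError and the rows were kept for a later attempt.
    """
    if not pending:
        return True
    try:
        write_rows(list(pending))
    except PermissionError:
        # File is locked (e.g. open in Excel); try again on the next pass.
        return False
    pending.clear()
    return True
```

Each loop iteration would append its new rows to pending and then call flush_pending, so rows collected while the workbook is open elsewhere are written on a later iteration instead of being dropped, which is what a bare except OSError: pass would do.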

Excel file can not be opened anymore after writing with pd.ExcelWriter

I am writing a dataframe to a cell range on a particular sheet of an existing Excel file, but after saving, the Excel file can no longer be opened. Could anyone suggest a solution?
import openpyxl as pyx
df3_xmax= df3.iloc[0]
wb = pyx.load_workbook(dst)
xl_writer = pd.ExcelWriter(dst, engine='openpyxl')
xl_writer.book = wb
xl_writer.sheets = {ws.title:ws for ws in wb.worksheets}
df3_xmax.to_excel(xl_writer, 'shname', index=False, header=False, startcol=3, startrow=7)
xl_writer.save()

Saving excel work book not working in python

for sheet_name in book.sheet_names():
    for index in range(len(tabs)):
        tab = tabs[index]
        if sheet_name == tab:
            dump_file_name = dump_files[index]
            dump_file_name = file_prefix+dump_file_name
            sheet = book.sheet_by_name(sheet_name)
            new_book = Workbook()
            sheet1 = new_book.add_sheet("Sheet 1")
            for row in range(sheet.nrows):
                values = []
                for col in range(sheet.ncols):
                    sheet1.write(row,col,sheet.cell(row,col).value)
            xlsx_file_name = dirname+"/"+dump_file_name+".xlsx"
            sheet1.title = xlsx_file_name
            new_book.save(xlsx_file_name)
The file is created and the data is there, but if I open it in openoffice.org and click the save button, it asks for a new name.
The file cannot be read by PHP either. If I open it and save it under a new name, it works perfectly. I think we have to add something to the code so that the file can be used by PHP.
I googled and found the solution here:
http://xlsxwriter.readthedocs.org/getting_started.html
This is exactly what I wanted:
creating and saving files in xlsx format.
Now it is working perfectly.
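The original code appears to use the xlwt-style API (Workbook().add_sheet), which writes the legacy .xls format even when the file name ends in .xlsx; that would explain why other programs reject the file. A minimal sketch of the same row-by-row copy with XlsxWriter follows. The helper name write_rows_to_xlsx and the rows argument are hypothetical; in the question, the rows would come from sheet.cell(row, col).value.

```python
import xlsxwriter


def write_rows_to_xlsx(path, rows, sheet_name="Sheet 1"):
    """Write a list of rows (each a list of cell values) to a real .xlsx file."""
    workbook = xlsxwriter.Workbook(path)
    worksheet = workbook.add_worksheet(sheet_name)
    for row_idx, row in enumerate(rows):
        for col_idx, value in enumerate(row):
            worksheet.write(row_idx, col_idx, value)
    # The file is only written to disk when the workbook is closed.
    workbook.close()
```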
original source
How to save Xlsxwriter file in certain path?
important link:
https://pypi.python.org/pypi/PyExcelerate
