how to open password protected excel file and save in dataframe - python

I know this question has been asked many times. I have read through the answers to the previous questions, but I still can't work out how to open the file.
I have some data in df that I want to save into the existing Excel sheet df2, which is password protected.
df2 = pd.read_excel(r'C:\Users\RTambe00000\Desktop\python basics\web scraping\IEDriverServer_Win32_4.0.0\Data Miner Data.xlsx', sheet_name='data (1)')
df2 = df2.merge(df, left_on='Created', right_on ='Preferred Call Time')
I'm getting the error below:
XLRDError: Can't find workbook in OLE2 compound document
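One possible reading of this error, offered as a guess rather than a diagnosis: Excel stores a password-protected .xlsx inside an OLE2 compound container instead of a plain ZIP archive, so a reader that looks for a legacy .xls workbook stream in it finds none. A quick look at the first bytes shows which container a file uses. The helper name container_kind is made up for illustration.

```python
# Sketch: report which container format a file uses by reading its signature.
OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")  # compound file: legacy .xls, encrypted .xlsx
ZIP_MAGIC = b"PK\x03\x04"                        # an ordinary .xlsx is a ZIP archive


def container_kind(path):
    """Return 'ole2', 'zip', or 'unknown' based on the file's first bytes."""
    with open(path, "rb") as handle:
        head = handle.read(8)
    if head.startswith(OLE2_MAGIC):
        return "ole2"
    if head.startswith(ZIP_MAGIC):
        return "zip"
    return "unknown"
```

If a file with an .xlsx extension comes back as 'ole2', it is probably either an old-format workbook with the wrong extension or an encrypted one.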

I have solved my issue and was able to open the password-protected Excel file. Below is my code:
import time
import win32com.client
xl = win32com.client.gencache.EnsureDispatch('Excel.Application')
# Arguments: FileName, UpdateLinks, ReadOnly, Format, Password, WriteResPassword
wb = xl.Workbooks.Open('File path', False, False, None, 'Read password', 'Edit password')
xl.Visible = 1
time.sleep(5)
# to refresh the file
wb.RefreshAll()
time.sleep(5)
wb.Save()
xl.Quit()
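The code above opens and refreshes the workbook but does not load it into a dataframe. If Excel itself is not needed, one alternative is the third-party msoffcrypto-tool package, which can decrypt the file into memory so pandas can read it. This is only a sketch: the helper name decrypt_workbook, the path and the password are placeholders, and it assumes the file uses a password to open (not just sheet protection).

```python
import io


def decrypt_workbook(path, password):
    """Return an in-memory, decrypted copy of a password-protected workbook."""
    # Third-party package (pip install msoffcrypto-tool), imported here so the
    # helper can be defined even where the package is not installed.
    import msoffcrypto

    decrypted = io.BytesIO()
    with open(path, "rb") as handle:
        office_file = msoffcrypto.OfficeFile(handle)
        office_file.load_key(password=password)  # the password that opens the file
        office_file.decrypt(decrypted)           # writes the plain workbook bytes
    decrypted.seek(0)
    return decrypted


# Hypothetical usage (path and password are placeholders):
#   import pandas as pd
#   df2 = pd.read_excel(decrypt_workbook("Data Miner Data.xlsx", "secret"),
#                       sheet_name="data (1)")
```

The decrypted copy lives only in memory, so nothing unprotected is written to disk.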

Related

How do I write multiple dataframes to excel using python and download the results using streamlit?

I have a Python script using streamlit that lets the user upload certain Excel files and automatically runs my analysis on them. I then want the user to download the results in xlsx format using the streamlit download button. I know how to offer one dataframe as a csv download, but not how to offer an xlsx file, which is what I want to do.
Here's what I've tried so far, and this is after my analysis where I'm just trying to create the download button for the user to download the results that are stored in 3 different dataframes:
import pandas as pd
import streamlit as st
# arrived_clean, booked_grouped, and arrived_grouped are all dataframes that I want to export to an excel file as results for the user to download.
def convert_df():
    writer = pd.ExcelWriter('test_data.xlsx', engine='xlsxwriter')
    arrived_clean.to_excel(writer, sheet_name='Cleaned', startrow=0, startcol=0, index=False)
    booked_grouped.to_excel(writer, sheet_name='Output', startrow=0, startcol=0, index=False)
    arrived_grouped.to_excel(writer, sheet_name='Output', startrow=0, startcol=20, index=False)
    writer.save()
csv = convert_df()
st.download_button(
    label="Download data",
    data=csv,
    file_name='test_data.xlsx',
    mime='text/xlsx',
)
When I first run the streamlit app locally I get this error:
"NameError: name 'booked_grouped' is not defined"
I get it because I haven't uploaded any files yet. After I upload my files the error message goes away and everything runs normally. However, I get this error and I don't see the download button to download my new dataframes:
RuntimeError: Invalid binary data format: <class 'NoneType'> line 313,
in marshall_file raise RuntimeError("Invalid binary data format: %s" %
type(data))
Can someone tell me what I'm doing wrong? It's the last piece I have to figure out.
The Pluviophile's answer is correct, but you should use output in pd.ExcelWriter instead of file_name:
from io import BytesIO
import pandas as pd
def dfs_tabs(df_list, sheet_list, file_name):
    output = BytesIO()
    writer = pd.ExcelWriter(output, engine='xlsxwriter')
    for dataframe, sheet in zip(df_list, sheet_list):
        dataframe.to_excel(writer, sheet_name=sheet, startrow=0 , startcol=0)
    # close() saves the workbook; writer.save() was removed in pandas 2.0
    writer.close()
    processed_data = output.getvalue()
    return processed_data
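For what it's worth, the same idea can be written with the writer as a context manager, which saves the workbook on exit. This is a sketch under a few assumptions: the function name frames_to_xlsx_bytes and the dict-of-frames argument are mine, and no engine is named, so pandas uses whichever Excel writer is installed.

```python
from io import BytesIO

import pandas as pd


def frames_to_xlsx_bytes(frames):
    """Write each DataFrame in `frames` (sheet name -> frame) to one workbook in memory."""
    buffer = BytesIO()
    # Leaving the `with` block saves and closes the writer. No engine is named,
    # so pandas picks an installed one (xlsxwriter or openpyxl).
    with pd.ExcelWriter(buffer) as writer:
        for sheet_name, frame in frames.items():
            frame.to_excel(writer, sheet_name=sheet_name, index=False)
    # Return bytes rather than None, so the download button has data to send.
    return buffer.getvalue()
```

The returned bytes can be passed as data= to st.download_button. The mime value in the question, 'text/xlsx', is not a registered type; 'application/vnd.openxmlformats-officedocument.spreadsheetml.sheet' is the usual one for xlsx.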
When I first run the streamlit app locally I get this error:
"NameError: name 'booked_grouped' is not defined"
Assuming your code is
booked_grouped = st.file_uploader('Something.....')
you can use the method below to skip the error:
if booked_grouped:
    # All your code inside this indentation
To download the Excel file
Convert all dataframes into one single Excel file
# Function to save all dataframes to one single excel
def dfs_tabs(df_list, sheet_list, file_name):
    output = BytesIO()
    # As written, the workbook goes to file_name on disk and `output` stays empty
    writer = pd.ExcelWriter(file_name,engine='xlsxwriter')
    for dataframe, sheet in zip(df_list, sheet_list):
        dataframe.to_excel(writer, sheet_name=sheet, startrow=0 , startcol=0)
    # close() replaces writer.save(), which was removed in pandas 2.0
    writer.close()
    processed_data = output.getvalue()
    return processed_data
# list of dataframes
dfs = [df, df1, df2]
# list of sheet names
sheets = ['df','df1','df2']
Note that the data to be downloaded is stored in memory while the user is connected, so it's a good idea to keep file sizes under a couple of hundred megabytes to conserve memory.
df_xlsx = dfs_tabs(dfs, sheets, 'multi-test.xlsx')
st.download_button(label='📥 Download Current Result',
                   data=df_xlsx,
                   file_name='df_test.xlsx')

Read XLS file with Pandas & xlrd returns error; xlrd opens file on its own

I am writing some automated scripts to process Excel files in Python, some of which are in XLS format. Here's a code snippet of my attempt to do so with Pandas:
df = pd.read_excel(contents, engine='xlrd', skiprows=5, names=['some', 'column', 'headers'])
contents is the file contents pulled from an AWS S3 bucket. When this line runs I get [ERROR] ValueError: File is not a recognized excel file.
In troubleshooting this, I have tried to access the spreadsheet using xlrd directly:
book = xlrd.open_workbook(file_contents=contents)
print("Number of worksheets is {}".format(book.nsheets))
print("Worksheet names: {}".format(book.sheet_names()))
This works without errors so xlrd seems to recognize it as an Excel file, just not when asked to do so by Pandas.
Anyone know why Pandas won't read the file with xlrd as the engine? Or can someone help me take the sheet from xlrd and convert it into a Pandas dataframe?
"Or can someone help me take the sheet from xlrd and convert it into a Pandas dataframe?"
pd.read_excel can take a book...
import pandas as pd
import xlrd
book = xlrd.open_workbook(filename='./file_check/file.xls')
df = pd.read_excel(book, skiprows=5)
print(df)
   some   column  headers
0     1     some      foo
1     2  strings      bar
2     3     here      yes
3     4      too       no
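If passing the book to pd.read_excel doesn't work in your pandas version, the rows can also be pulled out of the xlrd sheet by hand. A rough sketch, assuming only the sheet's nrows attribute and row_values method; the function name sheet_to_frame is made up. When names is given, every remaining row is treated as data.

```python
import pandas as pd


def sheet_to_frame(sheet, skiprows=0, names=None):
    """Build a DataFrame from an xlrd sheet, skipping the first `skiprows` rows."""
    rows = [sheet.row_values(i) for i in range(skiprows, sheet.nrows)]
    if not rows:
        return pd.DataFrame(columns=names)
    if names is None:
        # No names supplied: use the first remaining row as the header.
        header, rows = rows[0], rows[1:]
    else:
        header = names
    return pd.DataFrame(rows, columns=header)


# Hypothetical usage with the book opened from the S3 contents:
#   book = xlrd.open_workbook(file_contents=contents)
#   df = sheet_to_frame(book.sheet_by_index(0), skiprows=5)
```

Note that xlrd returns numeric cells as floats, so a column of 1, 2, 3 comes back as 1.0, 2.0, 3.0 unless you convert it.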
I'll include the code below that may help if you want to check/handle Excel file types. Maybe you can adapt it for your needs.
The code loops through a local folder and shows the file and extension but then uses python-magic to drill into it. It also has a column showing guessing from mimetypes but that isn't as good. Do zoom into the image of the frame and see that some .xls are not what the extension says. Also a .txt is actually an Excel file.
import pandas as pd
import glob
import mimetypes
import os
# https://pypi.org/project/python-magic/
import magic
path = r'./file_check' # use your path
all_files = glob.glob(path + "/*.*")
data = []
for file in all_files:
    name, extension = os.path.splitext(file)
    data.append([file, extension, magic.from_file(file, mime=True), mimetypes.guess_type(file)[0]])
df = pd.DataFrame(data, columns=['Path', 'Extension', 'magic.from_file(file, mime=True)', 'mimetypes.guess_type'])
# del df['magic.from_file(file, mime=True)']
df
From there you could filter files based on their type:
xlsx_file_format = 'application/vnd.openxmlformats-officedocument.spreadsheetml.sheet'
xls_file_format = 'application/vnd.ms-excel'
for file in all_files:
    if magic.from_file(file, mime=True) == xlsx_file_format:
        print('xlsx')
        # DO SOMETHING SPECIAL WITH XLSX FILES
    elif magic.from_file(file, mime=True) == xls_file_format:
        print('xls')
        # DO SOMETHING SPECIAL WITH XLS FILES
    else:
        continue
dfs = []
for file in all_files:
    if (magic.from_file(file, mime=True) == xlsx_file_format) or \
            (magic.from_file(file, mime=True) == xls_file_format):
        # who cares, it all works with this for the demo...
        df = pd.read_excel(file, skiprows=5, names=['some', 'column', 'headers'])
        dfs.append(df)
print('\nHow many frames did we get from seven files? ', len(dfs))
Output:
xlsx
xls
xls
xlsx
How many frames did we get from seven files? 4
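python-magic needs libmagic installed, which isn't always an option. Where it isn't, a much rougher standard-library check may be enough to choose a pandas engine: an .xlsx workbook is a ZIP archive, while a legacy .xls is not. This is a sketch with a made-up function name, and it assumes every non-ZIP file it is given really is an .xls workbook, so filter the folder first.

```python
import zipfile


def guess_excel_engine(path):
    """Pick a pandas engine name from the file's content rather than its extension."""
    # .xlsx workbooks are ZIP archives and are read with openpyxl;
    # xlrd 2.x reads only legacy .xls files.
    return "openpyxl" if zipfile.is_zipfile(path) else "xlrd"


# Hypothetical usage:
#   df = pd.read_excel(file, engine=guess_excel_engine(file), skiprows=5)
```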

Unable to read Excel file: list index out of range error, can't find sheets

I am trying to read an Excel (.xlsx) file and convert it to a dataframe. I used pandas.ExcelFile, pandas.read_excel, openpyxl load_workbook and even io file reading methods, but I am unable to read the sheet of this file. Every time I get a list index out of range error, or no sheet names in the case of openpyxl. I also tried the xlrd method.
temp_df = pd.read_excel("v2s.xlsx", sheet_name = 0)
or
temp_df = pd.read_excel("v2s.xlsx", sheet_name = "Sheet1")
or
from openpyxl import load_workbook
workbook = load_workbook(filename="v2s.xlsx",read_only = True, data_only = True)
workbook.sheetnames
Link to excel file
According to this ticket, the file is saved in a "slightly defective" format.
The user posted that he used Save As to change the type of document back to a normal Excel spreadsheet file.
Your file is this type:
You need to save it as:
Then running your code
from openpyxl import load_workbook
workbook = load_workbook(filename="v2s_0.xlsx",read_only = True, data_only = True)
print(workbook.sheetnames)
Outputs:
['Sheet1']
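To compare the file before and after re-saving, one option is to open the .xlsx as the ZIP archive it is and list its worksheet parts. A small sketch, assuming the conventional xl/worksheets/ layout (a writer is not strictly required to use those names); the function name is made up.

```python
import zipfile


def list_worksheet_parts(path):
    """Return the worksheet XML parts stored inside an .xlsx archive."""
    with zipfile.ZipFile(path) as archive:
        return [name for name in archive.namelist()
                if name.startswith("xl/worksheets/") and name.endswith(".xml")]


# Hypothetical usage:
#   print(list_worksheet_parts("v2s.xlsx"))
#   print(list_worksheet_parts("v2s_0.xlsx"))
```

If the parts are there in both files but openpyxl still reports no sheets for the first one, the problem is more likely in how the workbook describes its sheets than in missing data.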

How to manually open a file that Python is writing in simultaneously?

I have a python code which performs some calculations and then writes the output to an excel sheet as follows-
# import the necessary packages
import numpy as np
import argparse
import time
import cv2
import os
from openpyxl import load_workbook
import pandas as pd
from datetime import date, datetime
filename = r'PathToDirectory\Data.xlsx'
def append_df_to_excel(filename, df, sheet_name='Sheet1', startrow=None,
                       truncate_sheet=False,
                       **to_excel_kwargs):
    # ignore [engine] parameter if it was passed
    if 'engine' in to_excel_kwargs:
        to_excel_kwargs.pop('engine')
    writer = pd.ExcelWriter(filename, engine='openpyxl')
    # Python 2.x: define [FileNotFoundError] exception if it doesn't exist
    try:
        FileNotFoundError
    except NameError:
        FileNotFoundError = IOError
    try:
        # try to open an existing workbook
        writer.book = load_workbook(filename)
        # get the last row in the existing Excel sheet
        # if it was not specified explicitly
        if startrow is None and sheet_name in writer.book.sheetnames:
            startrow = writer.book[sheet_name].max_row
        # truncate sheet
        if truncate_sheet and sheet_name in writer.book.sheetnames:
            # index of [sheet_name] sheet
            idx = writer.book.sheetnames.index(sheet_name)
            # remove [sheet_name]
            writer.book.remove(writer.book.worksheets[idx])
            # create an empty sheet [sheet_name] using old index
            writer.book.create_sheet(sheet_name, idx)
        # copy existing sheets
        writer.sheets = {ws.title:ws for ws in writer.book.worksheets}
    except FileNotFoundError:
        # file does not exist yet, we will create it
        pass
    if startrow is None:
        startrow = 0
    # write out the new sheet
    df.to_excel(writer, sheet_name, startrow=startrow, **to_excel_kwargs, header = False, index=False)
    # save the workbook
    writer.save()
while(True):
    # --------------------------------------------------------------
    # -----------REST OF THE CODE AND COMPUTATION HERE--------------
    # --------------------------------------------------------------
    today = date.today()
    today = today.strftime("%d/%m/%Y")
    now = datetime.now()
    now = now.strftime("%H:%M:%S")
    rec_classes = list(set(classIDs))
    for counting in range(len(rec_classes)):
        my_label = LABELS[rec_classes[counting]]
        my_count = classIDs.count(rec_classes[counting])
        data_dict = ({'today_date' : [today], 'now_time' : [now], 'animal_class' : [my_label], 'animal_count' : my_count})
        df = pd.DataFrame(data_dict)
        append_df_to_excel(filename, df)
The code works fine if I want to write to an excel sheet, and after the code has run, I can open the file and all the content appears perfectly.
The problem is that I want to open the excel file while it is running. I want to see the rows being added and data being appended as the code runs. However, whenever I open the excel file while the code is running, I get a 'Permission Denied' error, and the code stops.
I have tried solving it using except OSError pass but it did not help.
Is there anything that can be done?
I suggest two possible ways:
Print the data you are inserting to the terminal, or
Use Python's logging library, which can log the data to a file and give you a complete view of what's going on. You can set the format, the logging level, handlers... It's an awesome tool, for other uses too.
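The logging suggestion could look something like this. It's only a sketch: the logger name "detections", the helper name and the log format are my own choices, not anything from the question.

```python
import logging


def make_row_logger(log_path):
    """Return a logger that appends one timestamped line per record to `log_path`."""
    logger = logging.getLogger("detections")
    logger.setLevel(logging.INFO)
    if not logger.handlers:  # avoid attaching a second handler on repeated calls
        handler = logging.FileHandler(log_path)
        handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
        logger.addHandler(handler)
    return logger


# Hypothetical usage inside the loop, next to append_df_to_excel(filename, df):
#   logger = make_row_logger("detections.log")
#   logger.info("animal_class=%s animal_count=%d", my_label, my_count)
```

A plain text log like this can usually be opened in an editor or followed in a terminal while the script keeps writing, which is not the case for the workbook.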
No, there is no straightforward way to have Excel dynamically update its display of a changing file. It's loaded from disk into memory once, and then Excel ignores the disk file.
(If you wrote an Excel macro which revisited the file periodically, maybe you could pull this off. But friends don't let friends use Excel anyway.)

Excel asks for save confirmation after saving workbook in openpyxl

wb_write = openpyxl.load_workbook(file_path)
first_sheet = wb_write.get_sheet_names()[0]
ws = wb_write.get_sheet_by_name(first_sheet)
#here row=1 ,column= 5
ws.cell(row=i, column=mt_mac_id_column).value = MT_MAC
wb_write.save(file_path)
After writing data into the xlsx file, I saved it to the same workbook.
When I open the xlsx file manually in Windows, the changes are reflected, but when I close it Excel asks for confirmation: "Do you want to save the changes you made to file?"
Why is it asking for confirmation? How do I avoid this warning?
