Open Excel file from zipfolder in openpyxl - python

I am trying the following code:
from zipfile import ZipFile
from openpyxl import load_workbook
from io import BytesIO
zip_path = r"path/to/zipfile.zip"
with ZipFile(zip_path) as myzip:
    with myzip.open(myzip.namelist()[0]) as myfile:
        wb = load_workbook(filename=BytesIO(myfile.read()))
data_sheet = wb.worksheets[1]
for row in data_sheet.iter_rows(min_row=3, min_col=3):
    print(row[0].value)
It shows:
ValueError: stat: path too long for Windows
Is this possible?
I am following the approach from "Using openpyxl to read file from memory".

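With xlrd, the following code works fine. Note that xlrd 2.0 and later read only .xls files, so for .xlsx this applies to older xlrd versions.
from zipfile import ZipFile
import xlrd
with ZipFile(zip_path) as myzip:
    with myzip.open(myzip.namelist()[0]) as myfile:
        book = xlrd.open_workbook(file_contents=myfile.read())
sh = book.sheet_by_index(0)
# your code here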
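For the openpyxl side, one hedged sketch is to copy the archive member into a seekable in-memory buffer before handing it over. The helper name `first_xlsx_from_zip` and the rule of picking the first `.xlsx` member are assumptions made for illustration, not part of the question. It uses only the standard library.

```python
from io import BytesIO
from zipfile import ZipFile


def first_xlsx_from_zip(zip_source):
    """Return the first .xlsx member of a zip archive as a seekable BytesIO.

    zip_source may be a path or a file-like object (hypothetical helper).
    """
    with ZipFile(zip_source) as archive:
        for name in archive.namelist():
            if name.lower().endswith(".xlsx"):
                # Read the whole member into memory; BytesIO is seekable,
                # which a reader of the zip-based .xlsx format needs.
                return BytesIO(archive.read(name))
    raise FileNotFoundError("no .xlsx member found in the archive")
```

Since `load_workbook` accepts a file-like object, `wb = load_workbook(first_xlsx_from_zip(zip_path))` would then load the workbook without extracting it to disk.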

Related

How to copy data from txt file and paste to XLSX as value with Python?

Text file: simple.txt, which contains date, name, qty, order id.
I need to copy the data from the txt file and paste it into an xlsx file as values.
How is this possible? Which package could handle this in Python?
openpyxl? pandas? Could you please give example code?
My code, which does not paste and save the data as values:
import csv
import openpyxl
# Raw strings keep the backslashes in Windows paths from being read as escapes.
input_file = r'C:\Users\mike\Documents\rep\LX02.txt'
output_file = r'C:\Users\mike\Documents\rep\LX02.xlsx'
wb = openpyxl.Workbook()
ws = wb.worksheets[0]
with open(input_file, 'r') as data:
    reader = csv.reader(data, delimiter='\t')
    for row in reader:
        ws.append(row)
wb.save(output_file)
In pandas, by combining pandas.read_csv and pandas.DataFrame.to_excel, you can store the content of a comma-delimited .txt file in an .xlsx spreadsheet by running the code below. For a tab-delimited file, pass sep='\t' to read_csv.
# pip install pandas
import pandas as pd
input_file = r'C:\Users\mbalog\Documents\FGI\LX02.txt'
output_file = r'C:\Users\mbalog\Documents\FGI\LX02.xlsx'
pd.read_csv(input_file).to_excel(output_file, index=False)
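If "as VALUE" means that numbers should arrive in the sheet as numeric cells rather than as text (an assumption about the question), the strings produced by `csv.reader` can be converted before each row is appended. The helper names `coerce` and `coerce_row` below are hypothetical, and the sketch uses only built-ins.

```python
def coerce(text):
    """Convert a CSV field to int or float when possible, else keep the string."""
    for convert in (int, float):
        try:
            return convert(text)
        except ValueError:
            continue
    return text


def coerce_row(row):
    """Apply coerce to every field of one csv.reader row."""
    return [coerce(field) for field in row]
```

In the question's loop, `ws.append(coerce_row(row))` would take the place of `ws.append(row)`. Fields such as order IDs with leading zeros would lose those zeros under this conversion, so it may need to be limited to specific columns.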

Moving data from multiple csv files to xlsx files

I have a folder that contains 2 more folders. Inside each folder is a csv and an xlsx file.
Ex:
test (folder 1)
    test.csv
    test.xlsx
test2 (folder 2)
    test2.csv
    test2.xlsx
I have a working script that moves data from a csv file to an xlsx file.
Say ‘test.csv’ contains the following data:
A            B
test.com     yes
test.com/dl  no
1.1.1.1      yes
The code below will move that data into test.xlsx:
from openpyxl import load_workbook
import csv
wb = load_workbook("D:\\local\\test\\test\\test.xlsx")
ws = wb.active
with open("D:\\local\\test\\test\\test.csv", 'r') as f:
    for row in csv.reader(f):
        ws.append(row)
wb.save("D:\\local\\test\\test\\test.xlsx")
Is there an easy way to move all data from ‘test.csv’ to ‘test.xlsx’ and ‘test2.csv’ to ‘test2.xlsx’ at once? The names of the csv and xlsx files will not always be the same but the location will.
I have tried the following but it returns a traceback error:
from openpyxl import load_workbook
import csv
wb = load_workbook("D:\\local\\test\\{}\\{}.xlsx")
ws = wb.active
with open("D:\\local\\test\\{}\\{}.csv", 'r') as f:
    for row in csv.reader(f):
        ws.append(row)
wb.save("D:\\local\\test\\{}\\{}.xlsx")
Thanks!
Assuming that the .xlsx files already exist and are empty, you can use the code below to copy the content of multiple .csv files to those .xlsx files (that have the same stem/filename).
import os
from pathlib import Path
import pandas as pd
from openpyxl import load_workbook
from openpyxl.utils.dataframe import dataframe_to_rows
directory = r'D:\local\test'
# Visit every .csv file one folder level below the directory.
for file in Path(directory).glob('*/*.csv'):
    df = pd.read_csv(file, encoding='utf-8-sig')
    # Same folder and stem as the .csv, with an .xlsx extension.
    excel_path = os.path.splitext(file)[0]+'.xlsx'
    wb = load_workbook(excel_path)
    ws = wb.active
    for r in dataframe_to_rows(df, index=False, header=True):
        ws.append(r)
    wb.save(excel_path)
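If the csv and xlsx names inside a folder can differ from each other (one possible reading of the question), pairing the files by folder rather than by stem is an option. The sketch below uses only pathlib; `pair_files` is a hypothetical helper, and it assumes each subfolder holds exactly one `.csv` and one `.xlsx`.

```python
from pathlib import Path


def pair_files(root):
    """Yield (csv_path, xlsx_path) for each subfolder of root.

    Assumes exactly one .csv and one .xlsx per subfolder;
    folders that do not match that shape are skipped.
    """
    for folder in sorted(Path(root).iterdir()):
        if not folder.is_dir():
            continue
        csv_files = sorted(folder.glob("*.csv"))
        xlsx_files = sorted(folder.glob("*.xlsx"))
        if len(csv_files) == 1 and len(xlsx_files) == 1:
            yield csv_files[0], xlsx_files[0]
```

Each yielded pair could then be fed to the same load, append, and save steps used for a single file.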

Issue writing to file with pyinstaller

An update: my compile issue was that I needed to convert my notebook to a .py file, and choosing "Save as" doesn't do that, so I had to run a separate script to convert the notebook. Part of my exe issue was that I was using the fopen command, which apparently isn't usable once compiled into an exe, so I redid the code as shown in the path-altered version below. But now I get a write error when trying to run the script. I cannot find anything on write functions with os; is there somewhere else I should look?
Original code:
import requests
import json
import pandas as pd
import csv
from pathlib import Path
response = requests.get('url', headers={'CERT': 'cert'}, stream=True).json()
json2 = json.dumps(response)
f = open('data.json', 'r+')
f.write(json2)
f.close()
Path altered code:
import os
import requests
import json
import pandas as pd
import csv
from pathlib import Path
response = requests.get('url', headers={'CERT': 'cert'}, stream=True).json()
json2 = json.dumps(response)
filename = 'data.json'
if '_MEIPASS2' in os.environ:
    filename = os.path.join(os.environ['_MEIPASS2'], filename)
fd = open(filename, 'r+')
fd.write(json2)
fd.close()
The changes to the code allowed me to get past the fopen issue but created a write issue. Any ideas?
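If you want to write to a file, you have to open it in a writable mode. 'r+' only works when the file already exists and does not truncate it; 'w' creates the file or overwrites it.
fd = open(filename, 'w')
Use text mode ('w') rather than binary ('wb') here, because json.dumps returns a str.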
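A minimal sketch of the write step, assuming the goal is to save the JSON next to the executable. PyInstaller sets `sys.frozen` in a bundled app; the fallback to the current working directory and the names `output_dir` and `save_json` are choices made for this illustration.

```python
import json
import sys
from pathlib import Path


def output_dir():
    """Folder to write into: beside the executable when frozen, else the cwd."""
    if getattr(sys, "frozen", False):
        return Path(sys.executable).parent
    return Path.cwd()


def save_json(data, path):
    """Write data as JSON text; mode 'w' creates the file or truncates it."""
    with open(path, "w", encoding="utf-8") as fd:
        json.dump(data, fd)
    return Path(path)
```

With the question's variables, this would be called as `save_json(response, output_dir() / 'data.json')`. Writing beside the executable, rather than into the unpacked bundle folder, keeps the file around after a one-file build exits.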

how to load workbook using tempfile using openpyxl

In my flask web app, I am writing data from excel to a temporary file which I then parse in memory. This method works fine with xlrd but it does not with openpyxl.
Here is how I am writing to a temporary file which I then parse with xlrd.
xls_str = request.json.get('file')
try:
    xls_str = xls_str.split('base64,')[1]
    xls_data = b64decode(xls_str)
except IndexError:
    return 'Invalid form data', 406
save_path = os.path.join(tempfile.gettempdir(), random_alphanum(10))
with open(save_path, 'wb') as f:
    f.write(xls_data)
try:
    bundle = parse(save_path, current_user)
except UnsupportedFileException:
    return 'Unsupported file format', 406
except IncompatibleExcelException as ex:
    return str(ex), 406
finally:
    os.remove(save_path)
When I use openpyxl with the code above, it complains about an unsupported file type. That is because the temporary file I parse has no ".xlsx" extension, and even if I added one, it would not work because it is not an Excel file after all.
openpyxl.utils.exceptions.InvalidFileException: openpyxl does not support file format,
please check you can open it with Excel first. Supported formats are: .xlsx,.xlsm,.xltx,.xltm
What should I do?
Why not create a temporary Excel file with openpyxl instead? Give this example a try; I did something similar in the past.
from io import BytesIO
from flask import Flask, send_file
from openpyxl import Workbook
# Note: save_virtual_workbook is not available in openpyxl 3.1 and later.
from openpyxl.writer.excel import save_virtual_workbook
app = Flask(__name__)
def create_xlsx():
    wb = Workbook()
    ws = wb.active
    row = ('Hello', 'Boosted_d16')
    ws.append(row)
    return wb
@app.route('/', methods=['GET'])
def main():
    xlsx = create_xlsx()
    filename = BytesIO(save_virtual_workbook(xlsx))
    return send_file(
        filename,
        attachment_filename='test.xlsx',
        as_attachment=True
    )
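For the reading direction the question asks about, a hedged alternative is to skip the temporary file and wrap the decoded bytes in a `BytesIO`, which `load_workbook` accepts as a file-like object. `decode_upload` is a hypothetical helper; the `'base64,'` split mirrors the question's code.

```python
from base64 import b64decode
from io import BytesIO


def decode_upload(data_url):
    """Turn a 'data:...;base64,<payload>' string into a seekable BytesIO.

    Raises ValueError when the 'base64,' marker is missing.
    """
    _, marker, payload = data_url.partition("base64,")
    if not marker:
        raise ValueError("Invalid form data")
    return BytesIO(b64decode(payload))
```

`wb = load_workbook(decode_upload(xls_str))` would then parse the upload in memory. This only helps if the payload really is an .xlsx or .xlsm file; openpyxl does not read the older .xls format.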

Processing files stored on cloud (S3 or Spaces)

I've set up a script to process Excel files uploaded by a user. The script works fine when the file is stored on the local disk.
from openpyxl import load_workbook
wb = load_workbook("file_path.xlsx") # Load workbook from disk works fine
ws = wb.worksheets[0]
I've then set up django-storages to allow user-uploaded files to be stored on DigitalOcean Spaces.
My problem now is how to access and process the cloud-stored file. For the record, if I pass the file URL to load_workbook, it fails with the error No such file or directory: file_url.
Do I have to download the file using requests and then process it as a local file? That feels inefficient. What options do I have?
You can get the byte content of the file, wrap it in ContentFile, and pass it to openpyxl. Assuming your model is FileContainer and the field name is file:
from django.core.files.base import ContentFile
from openpyxl import load_workbook
fc = FileContainer.objects.first()
bytefile = fc.file.read()
wb = load_workbook(ContentFile(bytefile))
ws = wb.worksheets[0]
I checked it with S3 and it works just fine.
If you want to actually save the file locally, you can try this:
from django.core.files.base import ContentFile
from django.core.files.storage import FileSystemStorage
from openpyxl import load_workbook
fc = FileContainer.objects.first()
local_storage = FileSystemStorage()
bytefile = fc.file.read()
newfile = ContentFile(bytefile)
relative_path = local_storage.save(fc.file.name, newfile)
wb = load_workbook(local_storage.path(relative_path))
ws = wb.worksheets[0]
