Parsing xlsx sheet from HTTP response using openpyxl library


I am writing a test case that checks Excel sheet parsing.
I tried to parse response.content into a list of objects using openpyxl.
I extracted the filename from the response header and tried to turn it into a file-like object, but load_workbook() does not accept it.
def test_export_timesheet(self):
    change_url = '/admin/core/timesheet/'
    # response contains the generated file using openpyxl
    response = self.client.post(change_url, {'action': 'export_xlsx', '_selected_action': [x.id for x in timesheets]})
    content = response._headers.get('content-disposition')[1]
    start = content.find('=') + 1
    end = content.find('.xlsx')
    content_path = (content[start:end]+'.xlsx')
    # Passing file like object
    wb = load_workbook(BytesIO(filename="'"+content_path+"'"))
    ws = wb.get_sheet_by_name(name="'" + content[start:end] + "'")
    for row in ws.iter_rows():
        for cell in row:
            print cell.value
Basically I am trying to validate the contents of the file in my testcase.
Is there a way to do this?

# response contains the generated file using openpyxl
response = self.client.post(change_url, ...)
The response.content you get back above is a bytes object, so you can wrap it in a BytesIO buffer and hand that to openpyxl. Continuing from the code above:
from io import BytesIO
from openpyxl import load_workbook  # if not already imported
file_like_object = BytesIO(response.content)
wb = load_workbook(file_like_object)
You can now use wb for ordinary openpyxl operations, such as iterating over the rows of a sheet.

Related

How can I update an excel file in sharepoint after read?

Originally I wanted to use data from one Excel file to update another Excel file in SharePoint, which I split into three steps:
Read the Excel file from the SharePoint site (implemented).
Write changes back to the Excel file on the SharePoint site (in progress).
Read data from one Excel file and use it to update another (not in the code below).
I know I should use the Office365 API to read the Excel file in SharePoint. When I call wb.save() with openpyxl, I get the error OSError: [Errno 22] Invalid argument. I don't know how to pass an absolute web URL to save(); this is different from saving a workbook to a local drive. Any help is appreciated.
SP_SITE_URL ='https://companyname.sharepoint.com/sites/SiteName'
relative_url = "/sites/SiteName/Shared Documents/FolderName"
# 1. Create a ClientContext object and use the user’s credentials for authentication
ctx = ClientContext(SP_SITE_URL).with_user_credentials(USERNAME, PASSWORD)
ClientFolder = ctx.web.get_folder_by_server_relative_path(relative_url)
ctx.load(ClientFolder)
ctx.execute_query()
#if you want to get the files in the folder
files = ClientFolder.files
print(files)
ctx.load(files)
ctx.execute_query()
newest_file_url = ''
for myfile in files:
    if myfile.properties["Name"] == 'Filename.xlsx':
        newest_file_url = myfile.properties["ServerRelativeUrl"]
# Get Excel File by newest_file_url identified above
response= File.open_binary(ctx, newest_file_url)
# save data to BytesIO stream
bytes_file_obj = io.BytesIO()
bytes_file_obj.write(response.content)
bytes_file_obj.seek(0) # set file object to start
# load Excel file from BytesIO stream
wb = openpyxl.load_workbook(bytes_file_obj)
worksheet= wb['Sheet1']
# updates
row_count = worksheet.max_row
col_count = worksheet.max_column
for i in range(2, row_count + 1):
    for j in range(4, col_count + 1):
        cellref = worksheet.cell(i, j)
        cellref.value = datetime.today().strftime('%Y-%m-%d')
# save update to the file
wb.save('https://companyname.sharepoint.com/:x:/r/sites/SiteName/Shared%20Documents/FolderName/Filename.xlsx?d=xxxxx&csf=1&web=1&e=xxx')

how to load workbook using tempfile using openpyxl

In my flask web app, I am writing data from excel to a temporary file which I then parse in memory. This method works fine with xlrd but it does not with openpyxl.
Here is how I am writing to a temporary file which I then parse with xlrd.
xls_str = request.json.get('file')
try:
    xls_str = xls_str.split('base64,')[1]
    xls_data = b64decode(xls_str)
except IndexError:
    return 'Invalid form data', 406
save_path = os.path.join(tempfile.gettempdir(), random_alphanum(10))
with open(save_path, 'wb') as f:
    f.write(xls_data)
try:
    bundle = parse(save_path, current_user)
except UnsupportedFileException:
    return 'Unsupported file format', 406
except IncompatibleExcelException as ex:
    return str(ex), 406
finally:
    os.remove(save_path)
When I use openpyxl with the code above, it complains about an unsupported file type. I assume that is because the temporary file has no ".xlsx" extension, and I suspect that even with the extension added it would not work, because the temporary file is not a real Excel file.
openpyxl.utils.exceptions.InvalidFileException: openpyxl does not support file format,
please check you can open it with Excel first. Supported formats are: .xlsx,.xlsm,.xltx,.xltm
What should I do?

Why not create a temporary Excel file with openpyxl instead? Give this example a try; I did something similar in the past.
from io import BytesIO
from flask import Flask, send_file
from openpyxl import Workbook
# Note: save_virtual_workbook is not available in recent openpyxl releases;
# there, call wb.save(buffer) on a BytesIO buffer instead.
from openpyxl.writer.excel import save_virtual_workbook

app = Flask(__name__)

def create_xlsx():
    wb = Workbook()
    ws = wb.active
    row = ('Hello', 'Boosted_d16')
    ws.append(row)
    return wb

@app.route('/', methods=['GET'])
def main():
    xlsx = create_xlsx()
    filename = BytesIO(save_virtual_workbook(xlsx))
    return send_file(
        filename,
        attachment_filename='test.xlsx',  # renamed to download_name in Flask 2.0
        as_attachment=True
    )
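As an alternative sketch for the original question, the temporary file can be skipped altogether: the base64 payload can be decoded straight into an in-memory buffer, and openpyxl's load_workbook accepts a file-like object in place of a path. The helper name buffer_from_data_url and the sample payload below are made up for illustration; only the standard library is used, and the final load_workbook call is left as a comment.

```python
from base64 import b64decode, b64encode
from io import BytesIO


def buffer_from_data_url(data_url):
    """Decode the base64 payload of a data URL into a seekable in-memory file."""
    try:
        payload = data_url.split("base64,")[1]
    except IndexError:
        raise ValueError("no base64 payload found") from None
    return BytesIO(b64decode(payload))


# Hypothetical payload standing in for the uploaded spreadsheet.
sample = "data:application/octet-stream;base64," + b64encode(b"PK\x03\x04demo").decode("ascii")
buffer = buffer_from_data_url(sample)
print(len(buffer.getvalue()))

# With a real upload, the buffer could then be handed to openpyxl:
# wb = openpyxl.load_workbook(buffer)
```

Because no file is written to disk, there is no extension for openpyxl to reject and nothing to clean up in a finally block.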

Testing the data sent by Flask's send_file()

I have a Flask view that generates an Excel file (using openpyxl) from some data and it's returned to the user using send_file(). A very simplified version:
import io
from flask import send_file
from openpyxl.workbook import Workbook
@app.route("/download/<int:id>")
def file_download(id):
    wb = Workbook()
    # Add sheets and data to the workbook here.
    file = io.BytesIO()
    wb.save(file)
    file.seek(0)
    return send_file(file, attachment_filename=f"{id}.xlsx", as_attachment=True)
This works fine -- the file downloads and is a valid Excel file. But I'm not sure how to test the file download. So far I have something like this (using pytest):
def test_file_download(test_client):
    response = test_client.get("/download/123")
    assert response.status_code == 200
    assert response.content_type == "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet"
This passes, but I'd like to test that (a) the filename used is as expected and (b) the file... exists? Is an Excel file?
I can access response.get_data(), which is a bytes object, but I'm not sure what to do with it.

To check that the filename used is as expected you could check that the Content-Disposition header is as expected. For example:
assert response.headers['Content-Disposition'] == 'attachment; filename=123.xlsx'
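If comparing the whole header string feels brittle (for instance when the filename is quoted), a rough alternative is to parse the header and compare only the filename. The sketch below uses the standard library's email.message.Message for the parsing; the helper name filename_from_content_disposition is made up for illustration.

```python
from email.message import Message


def filename_from_content_disposition(header_value):
    """Return the filename parameter of a Content-Disposition header, or None."""
    msg = Message()
    msg["Content-Disposition"] = header_value
    return msg.get_filename()


print(filename_from_content_disposition("attachment; filename=123.xlsx"))
print(filename_from_content_disposition('attachment; filename="time sheet.xlsx"'))
```

In a test this would be used as filename_from_content_disposition(response.headers['Content-Disposition']) == '123.xlsx'.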
To check "the existence of the file", you could, for example, check that for some test data its size lies within an expected range. For example:
assert 3000 <= response.content_length <= 5000
assert 3000 <= len(response.data) <= 5000
Another level of verifying that the Excel file works would be attempting to load the data back into openpyxl and checking if it reports any problems. For example:
from io import BytesIO
from openpyxl import load_workbook
load_workbook(filename=BytesIO(response.data))
Here you risk running into an exception such as:
zipfile.BadZipFile: File is not a zip file
which would indicate that the file contents are not a valid Excel file.
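The BadZipFile error appears because an .xlsx file is a zip archive. A lighter structural check that uses only the standard library can be sketched as follows. It assumes the workbook part lives at xl/workbook.xml, which is the usual layout but not guaranteed for every producer; the names looks_like_xlsx and make_fake_xlsx are hypothetical, and the fake archive merely stands in for response.data.

```python
import zipfile
from io import BytesIO


def looks_like_xlsx(data):
    """Cheap structural check: a zip archive holding the usual workbook parts."""
    buffer = BytesIO(data)
    if not zipfile.is_zipfile(buffer):
        return False
    with zipfile.ZipFile(buffer) as archive:
        names = archive.namelist()
    return "[Content_Types].xml" in names and "xl/workbook.xml" in names


def make_fake_xlsx():
    """Build a stand-in archive with the two expected member names (not a real workbook)."""
    buffer = BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("[Content_Types].xml", "<Types/>")
        archive.writestr("xl/workbook.xml", "<workbook/>")
    return buffer.getvalue()


print(looks_like_xlsx(make_fake_xlsx()))
print(looks_like_xlsx(b"<html>not a workbook</html>"))
```

This only confirms the container format; loading the data with openpyxl remains the stronger check of the actual contents.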

Read csv from url one line at the time in Python 3.X

I have to read an online csv-file into a postgres database, and in that context I have some problems reading the online csv-file properly.
If I just import the file, it is read as bytes, so I have to decode it. During decoding, however, the entire file seems to be turned into one long string.
# Libraries
import csv
import urllib.request
# Function for importing csv from url
def csv_import(url):
    url_open = urllib.request.urlopen(url)
    csvfile = csv.reader(url_open.read().decode('utf-8'), delimiter=',')
    return csvfile
# Reading file
p_pladser = csv_import("http://wfs-kbhkort.kk.dk/k101/ows?service=WFS&version=1.0.0&request=GetFeature&typeName=k101:p_pladser&outputFormat=csv&SRSNAME=EPSG:4326")
When I try to read the imported file line by line, it only reads one character at a time.
for row in p_pladser:
    print(row)
    break
['F']
Can you help me identify where it goes wrong? I am using Python 3.6.
EDIT: Per request my solution in R
# Loading library
library(RPostgreSQL)
# Reading dataframe
p_pladser = read.csv("http://wfs-kbhkort.kk.dk/k101/ows?service=WFS&version=1.0.0&request=GetFeature&typeName=k101:p_pladser&outputFormat=csv&SRSNAME=EPSG:4326", encoding = "UTF-8", stringsAsFactors = FALSE)
# Creating database connection
drv <- dbDriver("PostgreSQL")
con <- dbConnect(drv, dbname = "secretdatabase", host = "secrethost", user = "secretuser", password = "secretpassword")
# Uploading dataframe to postgres database
dbWriteTable(con, "p_pladser", p_pladser , append = TRUE, row.names = FALSE, encoding = "UTF-8")
I have to upload several tables of 10,000 to 100,000 rows, and in R it takes 1-2 seconds in total to upload them all.

csv.reader expects an iterable of lines, such as a file-like object. A plain string is iterated character by character, which is why you get one character per row. You have two options here:
Either read the data into a string (as you currently do) and then use io.StringIO to build a file-like object around that string:
import io
def csv_import(url):
    url_open = urllib.request.urlopen(url)
    csvfile = csv.reader(io.StringIO(url_open.read().decode('utf-8')), delimiter=',')
    return csvfile
Or wrap the binary stream provided by urllib.request in an io.TextIOWrapper:
import io
def csv_import(url):
    url_open = urllib.request.urlopen(url)
    csvfile = csv.reader(io.TextIOWrapper(url_open, encoding='utf-8'), delimiter=',')
    return csvfile
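The difference between the approaches can be seen without any network access. In the sketch below the CSV text is made-up sample data, and an io.BytesIO object stands in for the binary stream that urlopen would return.

```python
import csv
import io

# Made-up sample data standing in for the downloaded file.
text = "id,name\n1,Frederiksberg\n2,Valby\n"

# Passing the string itself: csv.reader iterates it character by character.
first_from_string = next(csv.reader(text))

# Wrapping the string in StringIO makes csv.reader see whole lines.
first_from_buffer = next(csv.reader(io.StringIO(text)))

# Wrapping a binary stream in TextIOWrapper decodes it lazily, line by line.
binary_stream = io.BytesIO(text.encode("utf-8"))
wrapper = io.TextIOWrapper(binary_stream, encoding="utf-8", newline="")
rows_from_stream = list(csv.reader(wrapper))

print(first_from_string)
print(first_from_buffer)
print(rows_from_stream)
```

The TextIOWrapper variant avoids holding the whole decoded file in memory as a single string, which may matter for the larger tables mentioned in the question.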

How about loading the CSV with pandas?
import pandas as pd
df = pd.read_csv("http://wfs-kbhkort.kk.dk/k101/ows?service=WFS&version=1.0.0&request=GetFeature&typeName=k101:p_pladser&outputFormat=csv&SRSNAME=EPSG:4326")
print(df.columns)
Or, if you have already downloaded the CSV to your machine, read it directly:
df = pd.read_csv("<path_to_csv>")
You may also want to pass the delimiter and quotechar arguments to csv.reader, because the CSV contains quoted fields. Something like this:
import csv
with open('p_pladser.csv', newline='') as f:
    rows = csv.reader(f, delimiter=',', quotechar='"')
    for row in rows:
        print(row)

How to export data in python with excel format?

views.py
def export_to_excel(request):
    lists = MyModel.objects.all()
    # your excel html format
    template_name = "sample_excel_format.html"
    response = render_to_response(template_name, {'lists': lists})
    # this is the output file
    filename = "model.csv"
    response['Content-Disposition'] = 'attachment; filename='+filename
    response['Content-Type'] = 'application/vnd.ms-excel; charset=utf-16'
    return response
urls.py
from django.conf.urls.defaults import *
urlpatterns = patterns('app_name.views',
    url(r'^export/$', 'export_to_excel', name='export_to_excel'),
)
Finally, create a button or link in your page that points to the export view.
page.html
Export
I get no download prompt and no error, but I can see the full result in the log, so the view itself seems to run fine.

I think the way to export Excel files is:
if 'excel' in request.POST:
    response = HttpResponse(content_type='application/vnd.ms-excel')
    response['Content-Disposition'] = 'attachment; filename=Report.xlsx'
    xlsx_data = WriteToExcel(weather_period, town)
    response.write(xlsx_data)
    return response
In this example the library used for exporting is xlsxWriter.
Here is a very complete and practical solution for this, and many others: http://assist-software.net/blog/how-export-excel-files-python-django-application .

In addition to the options shown in the other answers you can also use XlsxWriter to create Excel files.
See this example.

It seems that you are trying to generate an Excel workbook with HTML content. I don't know whether Excel (or LibreOffice) can open such a file, but I don't think it is the right approach.
You should first generate a real spreadsheet file: you can use csv for CSV, xlwt for xls, and openpyxl for xlsx.
The content of the file can then be passed to the HttpResponse.
For example, if you work with xlwt:
import xlwt
from django.http import HttpResponse
wb = xlwt.Workbook()
# use xlwt to fill the workbook, for example:
# ws = wb.add_sheet("sheet")
# ws.write(0, 0, "something")
# content_type replaces the older mimetype argument, which Django 1.7 removed
response = HttpResponse(content_type='application/vnd.ms-excel')
response['Content-Disposition'] = 'attachment; filename=the-file.xls'
wb.save(response)
return response
You can also look at
django-excel-response which does all the work for you. (I think it doesn't support xlsx format)
django-excel-export
I hope it helps

This seems to be based on the practice of tricking excel into opening an HTML table by changing the file name and MIME type. In order to make this work, the HTML file has to assemble an HTML table, and this is likely to trigger a warning that the real content of the file is different from the declared content.
IMHO it is a crude hack and should be avoided. Instead you can create a real excel file using the xlwt module, or you can create a real CSV file using the csv module.
[update]
After looking at the blog post you referred to, I see it recommends another bad practice: producing CSV files with anything but the csv module is dangerous, because if the data contains the delimiter character, quotes, or line breaks, you may end up with a malformed CSV.
The csv module takes care of all these corner cases and produces properly formatted output.
I've seen people use a Django template, naming the file "something.xls" and using HTML tables instead of the CSV format, but this has some corner cases as well.
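To illustrate the corner cases mentioned above, here is a small sketch comparing the csv module with naive string joining. The rows are made-up sample data containing a delimiter, quotes, and a line break inside fields.

```python
import csv
import io

# Made-up rows with an embedded comma, embedded quotes, and an embedded line break.
rows = [
    ["name", "comment"],
    ["Alice", 'said "hi", then left'],
    ["Bob", "line one\nline two"],
]

# csv.writer quotes and escapes fields as needed.
buffer = io.StringIO(newline="")
csv.writer(buffer).writerows(rows)

# Naive joining writes the special characters through unescaped.
naive = "\n".join(",".join(row) for row in rows)

round_tripped = list(csv.reader(io.StringIO(buffer.getvalue(), newline="")))
naive_parsed = list(csv.reader(io.StringIO(naive, newline="")))

print(round_tripped == rows)
print(naive_parsed == rows)
```

The output written by csv.writer reads back as the original rows, while the naively joined text is split at the embedded comma and line break and no longer matches.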

Export Data to XLS File
Use this if you really need to export to an .xls file. You will be able to add formatting such as bold font, font size, column widths, etc.
First of all, install the xlwt module. The easiest way is to use pip.
pip install xlwt
views.py
import xlwt
from django.http import HttpResponse
from django.contrib.auth.models import User
def export_users_xls(request):
    response = HttpResponse(content_type='application/ms-excel')
    response['Content-Disposition'] = 'attachment; filename="users.xls"'
    wb = xlwt.Workbook(encoding='utf-8')
    ws = wb.add_sheet('Users')
    # Sheet header, first row
    row_num = 0
    font_style = xlwt.XFStyle()
    font_style.font.bold = True
    columns = ['Username', 'First name', 'Last name', 'Email address', ]
    for col_num in range(len(columns)):
        ws.write(row_num, col_num, columns[col_num], font_style)
    # Sheet body, remaining rows
    font_style = xlwt.XFStyle()
    rows = User.objects.all().values_list('username', 'first_name', 'last_name', 'email')
    for row in rows:
        row_num += 1
        for col_num in range(len(row)):
            ws.write(row_num, col_num, row[col_num], font_style)
    wb.save(response)
    return response
urls.py
import views
urlpatterns = [
    ...
    url(r'^export/xls/$', views.export_users_xls, name='export_users_xls'),
]
template.html
Export all users
Learn more about the xlwt module by reading its official documentation. https://simpleisbetterthancomplex.com/tutorial/2016/07/29/how-to-export-to-excel.html
