Django/Python: Save an HTML table to Excel - python

I have an HTML table that I'd like to be able to export to an Excel file. I already have an option to export the table to an IQY file, but I'd prefer something that doesn't allow the user to refresh the data via Excel. I just want a feature that takes a snapshot of the table at the time the user clicks the link/button.
I'd prefer the feature to be a link/button on the HTML page that lets the user save the query results displayed in the table. It would also be nice if the HTML/CSS formatting could be retained. Is there a way to do this at all? Or is there something I can modify in the IQY?
I can try to provide more details if needed. Thanks in advance.

You can use the excellent xlwt module.
It is very easy to use, and creates files in xls format (Excel 2003).
Here is an (untested!) example of use for a Django view:
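from django.http import HttpResponse
import xlwt

def excel_view(request):
    # Cell style: Verdana font
    normal_style = xlwt.easyxf("font: name Verdana")
    # Newer Django versions use content_type (the old mimetype argument was removed)
    response = HttpResponse(content_type='application/ms-excel')
    # Ask the browser to download the file instead of displaying it
    response['Content-Disposition'] = 'attachment; filename="table.xls"'
    wb = xlwt.Workbook()
    ws0 = wb.add_sheet('Worksheet')
    ws0.write(0, 0, "something", normal_style)
    # HttpResponse is file-like, so the workbook can be saved straight into it
    wb.save(response)
    return response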

Use CSV. There's a module in Python ("csv") to generate it, and Excel can read it natively.
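A minimal sketch of the CSV approach using only the standard library. The helper name `write_table_csv` and the sample data are hypothetical. The helper writes to any object with a `write()` method, so in a Django view you could pass an `HttpResponse` created with `content_type='text/csv'`, which is the pattern Django's own CSV documentation uses.

```python
import csv
import io


def write_table_csv(fileobj, header, rows):
    """Write a header row plus data rows as CSV to a file-like object.

    Hypothetical helper: `fileobj` only needs a write() method, so an
    io.StringIO or a Django HttpResponse would both work.
    """
    writer = csv.writer(fileobj)
    writer.writerow(header)
    writer.writerows(rows)
    return fileobj


# Example with an in-memory buffer instead of an HTTP response
buf = write_table_csv(io.StringIO(), ["Name", "Qty"], [["apple", 3], ["pear", 5]])
print(buf.getvalue())
```

In a view you would create `response = HttpResponse(content_type='text/csv')`, set `response['Content-Disposition'] = 'attachment; filename="table.csv"'`, call `write_table_csv(response, header, rows)` and return the response. Note that CSV keeps only the data, not the HTML/CSS formatting.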

Excel supports opening an HTML file containing a table as a spreadsheet (even with CSS formatting).
You basically have to serve that HTML content from a Django view with the content type application/ms-excel, as Roberto said.
Or if you feel adventurous, you could use something like Downloadify to prepare the file to be downloaded on the client side.
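A rough sketch of the HTML route: build the table markup (with inline CSS) and serve it with an Excel content type and an .xls filename. The function `render_excel_html` and its inline styles are hypothetical; cell values are escaped with `html.escape`. Newer Excel versions may warn that the file's format does not match its extension before opening it.

```python
import html


def render_excel_html(header, rows):
    """Return an HTML document containing one styled <table>."""
    # Header cells get an inline style so Excel can pick up the formatting
    head = "".join(
        f'<th style="background:#ddd;font-weight:bold">{html.escape(str(h))}</th>'
        for h in header
    )
    # One <tr> per data row, every value escaped
    body = "".join(
        "<tr>" + "".join(f"<td>{html.escape(str(c))}</td>" for c in row) + "</tr>"
        for row in rows
    )
    return (
        '<html><head><meta charset="utf-8"></head><body>'
        f'<table border="1"><tr>{head}</tr>{body}</table>'
        "</body></html>"
    )


page = render_excel_html(["Item", "Price"], [["Tea & cake", 4.5]])
```

In a Django view this string would go into `HttpResponse(page, content_type='application/ms-excel')` together with a `Content-Disposition: attachment; filename="table.xls"` header. In practice you might instead render the same template that produces the on-page table, so the snapshot matches what the user sees.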

Related

How to update Page header using python odfdo module?

I am a complete beginner in Python. For a project I am writing a Python script to update a template OpenDocument file using the odfdo module. I am having a hard time understanding how to update the page header. I have looked into the odfdo documentation and found the 'get_page_headers' and 'set_page_headers' functions, but have not succeeded in using them. Could someone help me with it?
Thanks
This works for LibreOffice 6.4:
Get the master-page style. With that style loaded, you can simply modify the page header.
from odfdo import Document
# testdoc and moddoc are the paths of the input and output .odt files
doc = Document(testdoc)
# its master-page style holds the page header & footer (returns a one-element list)
mpstyle = doc.get_styles('master-page')[0]
# get the page header; you can take a look at its content
print(mpstyle.get_page_header().serialize())
# now change the page header
mpstyle.set_page_header('New text')
# save your .odt file
doc.save(moddoc, pretty=True)
Regards,
Robert

How can I Export Pandas DataFrame to Google Sheets using Python?

I managed to read data from a Google Sheet file using this method:
import pandas as pd
# ACCESS GOOGLE SHEET
googleSheetId = 'myGoogleSheetId'
workSheetName = 'mySheetName'
URL = 'https://docs.google.com/spreadsheets/d/{0}/gviz/tq?tqx=out:csv&sheet={1}'.format(
googleSheetId,
workSheetName
)
df = pd.read_csv(URL)
However, after generating a pd.DataFrame that fetches info from the web using selenium, I need to append that data to the Google Sheet.
Question: Do you know a way to export that DataFrame to Google Sheets?
Yes, there is a module called "gspread". Just install it with pip and import it into your script.
Here you can find the documentation:
https://gspread.readthedocs.io/en/latest/
In particular, see their section on examples of gspread with pandas:
worksheet.update([dataframe.columns.values.tolist()] + dataframe.values.tolist())
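The list passed to `worksheet.update` is simply a header row followed by the data rows. One pitfall worth guarding against (an assumption about your data): gspread sends the values as JSON, and NaN is not valid JSON, so a DataFrame with missing values may be rejected. A stdlib-only sketch; `to_sheet_values` is a hypothetical helper:

```python
import json
import math


def to_sheet_values(columns, rows):
    """Build [header] + rows, replacing float NaN with '' so the result is JSON-safe."""
    def clean(value):
        # numpy.float64 is a float subclass, so this also catches pandas NaNs
        if isinstance(value, float) and math.isnan(value):
            return ""
        return value

    values = [list(columns)]
    values.extend([clean(v) for v in row] for row in rows)
    return values


values = to_sheet_values(["a", "b"], [[1, float("nan")], [2, 3.5]])
# allow_nan=False raises ValueError if any NaN slipped through
payload = json.dumps(values, allow_nan=False)
```

With pandas you would call `to_sheet_values(dataframe.columns.tolist(), dataframe.values.tolist())` (or use `dataframe.fillna('')` first). Since the question asks to append rather than overwrite, gspread's `worksheet.append_rows(values[1:])` adds the data rows after the sheet's existing content instead of writing from A1.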
This might be a late answer for the original author, but it may help others. The following utility function writes any pandas DataFrame to a Google Sheet.
import pygsheets

def write_to_gsheet(service_file_path, spreadsheet_id, sheet_name, data_df):
    """
    Write data_df to the worksheet sheet_name of the spreadsheet spreadsheet_id,
    authenticating with the service-account JSON file at service_file_path.
    """
    gc = pygsheets.authorize(service_file=service_file_path)
    sh = gc.open_by_key(spreadsheet_id)
    try:
        sh.add_worksheet(sheet_name)
    except Exception:
        # the worksheet most likely exists already
        pass
    wks_write = sh.worksheet_by_title(sheet_name)
    # clear every field ('*') of the cells, starting at A1
    wks_write.clear('A1', None, '*')
    wks_write.set_dataframe(data_df, (1, 1), encoding='utf-8', fit=True)
    # keep the header row visible while scrolling
    wks_write.frozen_rows = 1
Steps to get service_file_path, spreadsheet_id and sheet_name:
1. Click Sheets API | Google Developers.
2. Create a new project under Dashboard (provide a relevant project name and other required information).
3. Go to Credentials.
4. Click “Create Credentials” and choose “Service Account”. Fill in all required information, e.g. service account name, ID and description.
5. Go through steps 2 and 3 of that form and click “Done”.
6. Click on your service account and go to “Keys”.
7. Click “Add Key”, choose “Create New Key” and select “JSON”. Your service JSON file will be downloaded. Put it in your repo folder; the path to this file is your service_file_path.
8. In that JSON, you can find the “client_email” key.
9. Create a new Google spreadsheet and note its URL.
10. Give Editor access to the spreadsheet to the "client_email" (step 8), and keep this service JSON file available while running your Python code.
Note: always add the JSON file to .gitignore.
11. From the URL (e.g. https://docs.google.com/spreadsheets/d/1E5gTTkuLTs4rhkZAB8vvGMx7MH008HjW7YOjIOvKYJ1/) extract the part between /d/ and / (e.g. 1E5gTTkuLTs4rhkZAB8vvGMx7MH008HjW7YOjIOvKYJ1 in this case); that is your spreadsheet_id.
12. sheet_name is the name of the tab in the Google spreadsheet. By default it is "Sheet1" (unless you have renamed it).
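Extracting the spreadsheet_id from the URL can also be done in code. A small sketch with a regular expression that takes the characters after /spreadsheets/d/ up to the next slash; `spreadsheet_id_from_url` is a hypothetical helper name:

```python
import re

# IDs consist of letters, digits, '_' and '-'; the match stops at the next '/'
_SHEET_ID_RE = re.compile(r"/spreadsheets/d/([a-zA-Z0-9_-]+)")


def spreadsheet_id_from_url(url):
    """Return the spreadsheet ID between /d/ and the next slash, or raise ValueError."""
    match = _SHEET_ID_RE.search(url)
    if match is None:
        raise ValueError(f"no spreadsheet ID found in {url!r}")
    return match.group(1)


sheet_id = spreadsheet_id_from_url(
    "https://docs.google.com/spreadsheets/d/1E5gTTkuLTs4rhkZAB8vvGMx7MH008HjW7YOjIOvKYJ1/"
)
```

This also works for URLs copied from the browser that end in `/edit#gid=0`.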
Google Sheets has a nice API you can use from Python (see the Google Sheets API documentation), which allows you to append single rows or apply entire batch updates to a sheet.
Another way to do it without that API would be to export the data to a CSV file with the Python csv library and then import that CSV file into a Google Sheet.

How to access data from pdf forms with python?

I need to access data from pdf form fields. I tried the package PyPDF2 with this code:
import PyPDF2
reader = PyPDF2.PdfReader('formular.pdf')
print(reader.pages[0].extract_text())
But this gives me only the text of the normal pdf data, not the form fields.
Does anyone know how to read text from the form fields?
You can use the get_form_text_fields() method, which returns a dictionary of the form's text fields (in older PyPDF2 versions it was PdfFileReader.getFormTextFields(); see https://pythonhosted.org/PyPDF2/PdfFileReader.html). Use the dictionary keys (the field names) to access the values (the field values). The following example might help:
from PyPDF2 import PdfReader
infile = "myInputPdf.pdf"
pdf_reader = PdfReader(infile)
dictionary = pdf_reader.get_form_text_fields()  # returns a Python dictionary
my_field_value = str(dictionary['my_field_name'])  # use the field name (dictionary key) to access the field value (dictionary value)
There are Python libraries through which you can access PDF data. PDF is not raw data like CSV, TXT or TSV, so Python can't read the data inside PDF files directly.
One such library is slate (see the Slate documentation). Read that documentation; I hope you will find the answer to your question there.

Open BytesIO (xlsx) with xlrd

I'm working with Django and need to read the sheets and cells of an uploaded xlsx file. It should be possible with xlrd, but because the file has to stay in memory and must not be saved to disk, I'm not sure how to continue.
The starting point in this case is a web page with an upload input and a submit button. When submitted, the file is caught with request.FILES['xlsx_file'].file and sent to a processing class that has to extract all the important data for further processing.
The type of request.FILES['xlsx_file'].file is BytesIO, and xlrd is not able to read that type because it has no __getitem__ method.
After converting the BytesIO to StringIO, the error message stays the same: '_io.StringIO' object has no attribute '__getitem__'
file_enc = chardet.detect(xlsx_file.read(8))['encoding']
xlsx_file.seek(0)
sio = io.StringIO(xlsx_file.read().decode(encoding=file_enc, errors='replace'))
workbook = xlrd.open_workbook(file_contents=sio)
I'm moving my comment into an answer of its own. It relates to the example code (which includes decoding) given in the updated question:
OK, thanks for your pointers. I downloaded xlrd and tested it locally. It seems the best way to go here is to pass it a string, i.e. open_workbook(file_contents=xlsx_file.read().decode(encoding=file_enc, errors='replace')). I misunderstood the docs, but I'm positive that file_contents= will work with a string.
Try xlrd.open_workbook(file_contents=request.FILES['xlsx_file'].read())
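The key point is that an .xlsx file is binary (a ZIP archive), so it should be passed on as raw bytes; decoding it to text, as in the question's code, can corrupt it. Also note that xlrd 2.0 and later only read the legacy .xls format; for .xlsx, `openpyxl.load_workbook` accepts a file-like object such as `io.BytesIO`. A stdlib sketch that reads the upload as bytes and sniffs the format from its leading signature bytes (`sniff_spreadsheet` is a hypothetical helper):

```python
import io

ZIP_MAGIC = b"PK\x03\x04"                          # any ZIP: .xlsx, but also .docx etc.
OLE2_MAGIC = b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1"   # legacy .xls (OLE2 compound file)


def sniff_spreadsheet(fileobj):
    """Return (kind, data): kind is 'xlsx', 'xls' or 'unknown', data is raw bytes."""
    fileobj.seek(0)        # the upload may already have been partly read
    data = fileobj.read()  # bytes, deliberately not decoded
    if data.startswith(ZIP_MAGIC):
        return "xlsx", data
    if data.startswith(OLE2_MAGIC):
        return "xls", data
    return "unknown", data


kind, data = sniff_spreadsheet(io.BytesIO(ZIP_MAGIC + b"rest of the archive"))
```

With a real upload you could call `sniff_spreadsheet(request.FILES['xlsx_file'])`, then use `xlrd.open_workbook(file_contents=data)` for 'xls' or `openpyxl.load_workbook(io.BytesIO(data))` for 'xlsx'. A ZIP signature only shows the file is a ZIP; openpyxl will still raise if it is not a valid workbook.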
I had a similar problem, but in my case I needed to unit test a Django app that lets the user download an xls file.
The basic code using StringIO (io.BytesIO in Python 3) worked for me.
import io
import xlrd
from django.test import TestCase

class MyTest(TestCase):
    def test_download(self):
        response = self.client.get('...')
        f = io.BytesIO(response.content)
        book = xlrd.open_workbook(file_contents=f.getvalue())
        ...
        # unit tests here

How to write a hyperlink to a local picture into a cell with openpyxl?

I use Python 2.7.3.
I need to write a hyperlink to a local picture into a cell using the openpyxl library.
When I need to add a hyperlink to a web site, I write something like this:
from openpyxl import Workbook
wb = Workbook()
dest_filename = r'empty_book.xlsx'
ws = wb.worksheets[0]
ws.title = 'Name'
# hyperlink to a web site
ws.cell('B1').hyperlink = ('http://pythonhosted.org/openpyxl/api.html')
# hyperlink to a local picture
ws.cell('B2').hyperlink = ('1.png') # It doesn't work!
wb.save(filename = dest_filename)
I have 3 questions:
1. How can we write a hyperlink like VBA's HYPERLINK function, with both the link and its display name?
ActiveCell.FormulaR1C1 = _
"=HYPERLINK(""http://stackoverflow.com/questions/ask"",""site"")"
2. How can we write a hyperlink to a local image?
ws.cell('B2').hyperlink = ('1.png') # It doesn't work! And I don't know what to do.
Please help.
3. Can we use Unicode hyperlinks to an image? For example, when I use
ws.cell('B1').hyperlink = (u'http://pythonhosted.org/openpyxl/api.html') it fails with an error!
For example, we have a picture 'russian_language_name.png' and we can create a hyperlink to it in Excel without any problem. We click the cell and type
'=Hyperlink("http://stackoverflow.com/questions/ask";"site_by_russian_language")
then save the document and unzip it. In its directory we go to xl->worksheets->sheet1.xml and see the header
<?xml version="1.0" encoding="UTF-8" standalone="true"?>
and then ...
<row r="2" x14ac:dyDescent="0.25" spans="2:6"><c r="B2" t="str" s="1"><f>HYPERLINK("http://stackoverflow.com/questions/ask","site_by_russian_language")</f><v>site_by_russian_language</v></c>
Everything is OK: Excel supports Unicode. But what about the Python library openpyxl? Does it support Unicode in hyperlinks?
As the files inside an .xlsx file are XML files with UTF-8 encoding, Unicode hyperlinks are not a problem.
As for question 2, I think you need to include the full path of the linked file.
If you still cannot open the linked file from your Excel file, it is Excel's security policy that prohibits such actions.
I answered a similar question. Hope this helps.
Well, I arrived at this. While there is no direct way to build such a hyperlink, in your case we can do it this way. I was able to build a hyperlink to an existing file using the code below.
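import openpyxl

wb = openpyxl.Workbook()
s = wb['Sheet']  # get_sheet_by_name() is deprecated in newer openpyxl versions
s['B4'].value = '=HYPERLINK("C:\\Users\\Manoj.Waghmare\\Desktop\\script.txt", "newfile")'
s['B4'].style = 'Hyperlink'  # built-in named style that makes the cell look like a link
wb.save('trial.xlsx')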
Setting the style attribute to 'Hyperlink' is the key; the rest of the code may not matter much to you. Otherwise the style attribute would have the value 'Normal'. Strangely, the hyperlink worked even without the style attribute; it just lacked the hyperlink styling. Though strange, I have seen stranger things. Hope this helps.
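For question 1, the formula string can be built with a small helper instead of by hand. A minimal sketch, assuming the formula is written into a cell the same way as in the answer above; `hyperlink_formula` is a hypothetical helper. Excel text literals are wrapped in double quotes with embedded quotes doubled, and Excel limits text literals in formulas to 255 characters. The sheet1.xml shown in the question also suggests the file stores the argument separator as a comma, even when Excel's UI shows a semicolon.

```python
def hyperlink_formula(target, label):
    """Build an Excel =HYPERLINK(target, label) formula string for a cell."""
    def quote(text):
        # Excel string literal: "...", with any embedded " doubled
        if len(text) > 255:
            raise ValueError("Excel text literals are limited to 255 characters")
        return '"' + text.replace('"', '""') + '"'

    return f"=HYPERLINK({quote(target)}, {quote(label)})"


formula = hyperlink_formula(r"C:\Users\me\Pictures\1.png", "picture 1")
```

With openpyxl this would be used as `s['B4'].value = formula` followed by `s['B4'].style = 'Hyperlink'`. In Python 3 both arguments are ordinary `str`, so Cyrillic file names and labels need no special handling in the formula itself.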
