One of our page templates is made up of a bunch of macros. These items are a bunch of html tables.
Now, I want to use a couple of these tables in a Python script to create a PDF. Is there a way to call a macro from a Python script and get back the HTML that it produces?
If so, can you explain?
Thanks
Eric
Maybe you could create a new template including (use-macro) just the macros you want to access from python and then use z3c.pt.pagetemplate.PageTemplateFile() to render it?
Actually, it might be possible (and certainly easier) to use chameleon.zpt.template.PageTemplate('<div tal:use-macro="<your-macro-here>" />'), but I've never done this myself.
I'd probably use urllib.urlopen(url) (urllib.request.urlopen(url) on Python 3) to pull the page into Python, use BeautifulSoup to pull the table(s) out of the HTML, and then render that to PDF with XHTML2PDF (ho.pisa).
There might be a simpler way but for me, this would be the least stressful approach.
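As a rough illustration of the "pull the tables out of the rendered HTML" step, here is a minimal sketch that uses the standard library's html.parser instead of BeautifulSoup, so it runs without extra packages. The names TableExtractor and extract_tables are made up for this example, and fetching the page and rendering the PDF are left out.

```python
from html.parser import HTMLParser


class TableExtractor(HTMLParser):
    """Collect the HTML of every top-level <table> in a page (hypothetical helper)."""

    def __init__(self):
        # Keep entities such as &amp; untouched so they can be copied through as-is.
        super().__init__(convert_charrefs=False)
        self.tables = []   # finished tables, as HTML strings
        self._depth = 0    # how many <table> elements we are currently inside
        self._buf = []     # pieces of the table being collected

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self._depth += 1
        if self._depth:
            self._buf.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> are copied through unchanged.
        if self._depth:
            self._buf.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if self._depth:
            self._buf.append(f"</{tag}>")
        if tag == "table" and self._depth:
            self._depth -= 1
            if self._depth == 0:
                self.tables.append("".join(self._buf))
                self._buf = []

    def handle_data(self, data):
        if self._depth:
            self._buf.append(data)

    def handle_entityref(self, name):
        if self._depth:
            self._buf.append(f"&{name};")

    def handle_charref(self, name):
        if self._depth:
            self._buf.append(f"&#{name};")


def extract_tables(page_html):
    """Return a list with the HTML of each top-level table in page_html."""
    parser = TableExtractor()
    parser.feed(page_html)
    parser.close()
    return parser.tables
```

Nested tables stay inside their outer table, and each returned string could then be handed to whichever HTML-to-PDF tool you settle on.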
Related
I have to generate a large PDF from a web app. The PDF is generated from a large data set of email content for clients. Right now it is written in PHP: I loop over every item in the data set, create an individual HTML page for each client, and then add those pages one by one to wkhtmltopdf via the add page option.
This is obviously not very elegant, and PHP dies when the input is very big, for example 1000 clients. The idea behind this PDF is that we regularly have to send physical mail to our clients, so we want to create one big file, print it, put each letter in its own envelope, and mail them.
I'm now redoing this in Python instead of PHP. I'm also not sure what coding practice I should follow to make sure the PDF is generated as quickly and efficiently as possible.
Here are a couple of options I thought about:
Create one big variable
I'm wondering whether I can build a single big variable, write its entire contents to an HTML file in one go, and then create the PDF from that file with wkhtmltopdf. However, this would be one really big variable and the RAM might go nuts.
Write to only one file
I'm not sure how I would implement this, but maybe instead of creating a bunch of HTML files, I should create just one file and keep appending to the bottom of it?
Stick with current concept?
Maybe the exact same programming design/concept will magically work well with Python
...?
Any or all of the options I have thought of may be completely wrong and flawed, though.
EDIT: Writing to one file cannot work. Since these mails have to be sent physically, I need to make sure the content for each client starts on a new page, and if I write a single big file, I see no way to do that.
As far as wkhtmltopdf is concerned, page breaking depends a lot on your content and your requirements. I need specific page breaks but I don't have a "one item must always fit on exactly one page" limitation - if you do, don't bother with it. Also, if you have very specific styling rules, it might be difficult depending on what the styles are. The largest PDFs I've done with wkhtmltopdf are only 100 pages or so, so I can't comment on larger sizes.
What I would do with wkhtmltopdf is format the content like this:
<head>
<style>
.pb { page-break-before: always; }
</style>
</head>
<body>
<div id="mail1" class="pb">...</div>
<div id="mail2" class="pb">...</div>
<!-- etc etc -->
</body>
This ensures that each email starts from a new page. Then feed that output to wkhtmltopdf using the desired styles and cli options and check if everything worked out as planned. This test should be very quick to do.
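Since the question is about doing this from Python without holding everything in memory, here is a minimal sketch of that layout written out incrementally. The function names write_mail_html and render_pdf are made up, and it assumes each mail body is already an HTML fragment and that the wkhtmltopdf binary is on the PATH.

```python
import subprocess
from typing import Iterable

HEAD = """<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<style>
.pb { page-break-before: always; }
</style>
</head>
<body>
"""
TAIL = "</body>\n</html>\n"


def write_mail_html(bodies: Iterable[str], path: str) -> int:
    """Stream one <div class="pb"> per mail into a single HTML file.

    bodies can be a generator, so only one mail is held in memory at a time.
    Returns the number of mails written.
    """
    count = 0
    with open(path, "w", encoding="utf-8") as out:
        out.write(HEAD)
        for count, body in enumerate(bodies, start=1):
            out.write(f'<div id="mail{count}" class="pb">{body}</div>\n')
        out.write(TAIL)
    return count


def render_pdf(html_path: str, pdf_path: str) -> None:
    """Run wkhtmltopdf on the combined file (assumes the binary is installed)."""
    subprocess.run(["wkhtmltopdf", html_path, pdf_path], check=True)
```

A call such as write_mail_html((row.html for row in rows), "all_mails.html") followed by render_pdf("all_mails.html", "all_mails.pdf") would cover the whole run, where rows stands in for however you iterate over your data set. Whether wkhtmltopdf copes with 1000 clients in one document is something only a test will tell.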
Additionally, if the HTML is extremely simple and you can always rely on it being in a specific format (you could validate it with a simple XML schema), you could try iTextSharp and transform the HTML manually. I haven't done it and it sounds horrible, but it might work for you - iTextSharp is quite fast.
I want to migrate data from an old Tomcat/Jetty website to a new one which runs on Python & Django. Ideally I would like to populate the new website by reading the data directly from the old database and storing it in the new one.
Problem is that the database I was given comes in the form of a bunch of WEB-INF/data/*.dbx and I didn't find any way to read them. So, I have a few questions.
Which format do the WEB-INF/data/*.dbx use?
Is there a python module for directly reading from the WEB-INF/data/*.dbx files?
Is there some external tool for dumping the WEB-INF/data/*.dbx files to an ASCII format that can be parsed by Python?
If someone has attempted a similar data migration, how does it compare against scraping the data from the old website? (assuming that all important data can be scraped)
Thanks!
The ".dbx" suffix has been used by various pieces of software over the years, so it could be almost anything. The only way to know what you really have here is to browse the source code of the legacy Java app (or read the relevant docs, or ask the author, etc.).
As for scraping, it's probably going to be a lot of pain for not much result, depending on the app.
I've looked through the current related questions but have not managed to find anything similar to my needs.
I'm in the process of creating an affiliate store using zencart. One of the issues is that zencart is not designed for redirects and affiliate stores, but it can be done. I will be changing the store so it acts like a showcase store showing prices.
There is a mod called easy populate which allows me to upload data feeds. This is all well and good, but my affiliate link will not be in each product. I can add it manually after uploading the data feed by going to each product and adding it as an image with a redirect link. However, with over 500 items that is going to be a long, repetitive and time-consuming job.
I have been told that I can add the links to the data feed before uploading it to zencart, and that this should be done using Python. I've been reading about Python for several days now and feel I'm looking for the wrong things. I was wondering if someone could please advise the simplest way for me to get this done.
I hope the question makes sense
thanks
abs
You could craft a Python script using the csv module like this:
>>> import csv
>>> # Python 3: text mode with newline=''; on Python 2 open the file with mode 'wb' instead
>>> cartWriter = csv.writer(open('yourcart.csv', 'w', newline=''))
>>> cartWriter.writerow(['Product', 'yourinfo', 'yourlink'])
You need to know how the link should be formatted; hopefully it can be composed from the other fields already present in the CSV file.
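To make the "compose the link from the other fields" idea concrete, here is a minimal sketch that reads a data feed and writes a copy with one extra link column. The column names product_id and affiliate_link, the function name add_affiliate_links and the example URL are all assumptions; use whatever your feed and your affiliate programme actually require.

```python
import csv


def add_affiliate_links(src, dst, link_template,
                        id_field="product_id", link_field="affiliate_link"):
    """Copy a CSV feed from src to dst, appending one affiliate link per row.

    src and dst are open text files (or StringIO objects).
    link_template is a string such as "https://example.com/go?id={id}";
    {id} is replaced by the value found in the id_field column.
    Returns the number of product rows written.
    """
    reader = csv.DictReader(src)
    fieldnames = list(reader.fieldnames or []) + [link_field]
    writer = csv.DictWriter(dst, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    rows = 0
    for row in reader:
        row[link_field] = link_template.format(id=row[id_field])
        writer.writerow(row)
        rows += 1
    return rows
```

With real files you would open both with newline='' (as the csv module documentation recommends), for example open('feed.csv', newline='') for reading and open('feed_with_links.csv', 'w', newline='') for writing, and then upload the second file through easy populate.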
First, use the csv module as systempuntoout told you. Second, you will want to change your headers to:
mimetype='text/csv'
Content-Disposition = 'attachment; filename=name_of_your_file.csv'
The way to do it depends very much on your website implementation. In pure Python you would probably do that with an HttpResponse object. In Django as well, but there are some shortcuts.
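As an illustration of those two headers, here is a minimal sketch written as a plain WSGI application using only the standard library, rather than Django's HttpResponse. The function name csv_download_app is made up and the row content is just the placeholder from the earlier answer.

```python
import csv
import io


def csv_download_app(environ, start_response):
    """Minimal WSGI app that serves a generated CSV file as a download."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["Product", "yourinfo", "yourlink"])
    body = buf.getvalue().encode("utf-8")

    headers = [
        # Tells the browser what kind of content this is.
        ("Content-Type", "text/csv; charset=utf-8"),
        # Tells the browser to save it as a file with this name.
        ("Content-Disposition", 'attachment; filename="name_of_your_file.csv"'),
        ("Content-Length", str(len(body))),
    ]
    start_response("200 OK", headers)
    return [body]
```

To try it locally, something like wsgiref.simple_server.make_server("", 8000, csv_download_app).serve_forever() should be enough; in a framework the same two headers are set on that framework's response object.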
You can find a video demonstrating how to create CSV files with Python on showmedo. It's not free, however.
Now, how to provide a link to download the CSV depends on your website. What is the technology behind it: pure Python, Django, Pylons, TurboGears?
If you can't answer that question, you should ask your boss for training on your infrastructure before trying to make changes to it.
I want to automatically generate booking confirmation PDF files in Python. Most of the content will be static (i.e. logos, booking terms, phone numbers), with a few dynamic bits (dates, costs, etc).
From the user side, the simplest way to do this would be to start with a PDF file with the static content, and then using python to just add the dynamic parts. Is this a simple process?
From a bit of searching, it seems that I can use reportlab for creating content and pyPdf for merging PDFs together. Is this the best approach? Or is there a really funky way that I haven't come across yet?
Thanks!
From the user side, the simplest way to do this would be to start with a PDF file with the static content, and then using python to just add the dynamic parts. Is this a simple process?
Unfortunately no. There are several tools that are good at producing PDFs from scratch (most commonly for Python, ReportLab), but they don't generally load existing PDFs. You would have to include generating code for any boilerplate text, lines, blocks, shapes and images, rather than this being freely editable by the user.
On the other side there's pyPdf which can load PDFs, collate the pages, and extract some of the information, but can't really add new content. You can ‘merge’ pages into one, but you'd still have to create the extra information overlay as a page in ReportLab first.
Look into docutils and reStructuredText. You could quickly write out your PDF document in reST and then compile the PDF using rst2pdf.py.
I've used this; it creates very beautiful documents and the markup is extensible! Later you could take the same source and run it through rst2html to create a website out of it!
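For the booking confirmations in the question, the reST route could look roughly like the sketch below: the static text lives in a template and only the dynamic fields are filled in. The template wording, the field names and the helper names render_booking_rst and build_pdf are invented for illustration, and build_pdf assumes the rst2pdf command is installed.

```python
import subprocess
from string import Template

# Static content with $placeholders for the dynamic parts (hypothetical wording).
BOOKING_RST = Template("""\
Booking confirmation
====================

:Guest: $guest
:Arrival: $arrival
:Departure: $departure
:Total cost: $cost

Booking terms
-------------

$terms
""")


def render_booking_rst(**fields):
    """Fill in the dynamic fields; raises KeyError if one is missing."""
    return BOOKING_RST.substitute(**fields)


def build_pdf(rst_path, pdf_path):
    """Compile a reST file to PDF by calling the rst2pdf command-line tool."""
    subprocess.run(["rst2pdf", rst_path, "-o", pdf_path], check=True)
```

You would write the result of render_booking_rst(...) to a .rst file and then call build_pdf on it. Logos and finer layout control would go through rst2pdf stylesheets, which this sketch does not cover.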
Take a look here:
http://docutils.sourceforge.net/docs/user/rst/quickref.html
http://code.google.com/p/rst2pdf/
Good luck
You could generate a document through, for example, TeX, or OpenOffice, or whatever gives you the most comfortable bindings and then print the document with a pdf printer.
This allows you not to have to figure out where to put fields precisely or figure out what to do if your content overflows the space allocated for it.
I use CVS to maintain all my Python snippets, notes, and C and C++ code. As the hosting provider also provides a public web server, I was thinking that I should automatically convert the CVS repository into a programming snippets website.
cvsweb is not what I mean.
doxygen is meant for a complete project, for browsing self-referencing code online. I think doxygen is more like a web-based ctags.
I tried rest2web, but it requires that I write /restweb headers and that the files be .txt files, and that interferes with the programming language syntax.
An approach I have thought of is:
1) Run source-highlight and create .html pages for all the scripts.
2) Write a script to index those .html pages and create an index web page.
3) Create the website from those pages.
Before proceeding, I thought I would discuss it here, in case the members have any suggestions.
What do you do when you want to maintain your snippets and notes in CVS and also auto-generate a good website from them? I like rest2web for converting notes to HTML.
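The three steps above can be sketched as one small script. This version does not call source-highlight; it only escapes each file into a plain <pre> block with the standard library, so colorization is left out. The function name build_site, the extension list and the page naming scheme are assumptions.

```python
import html
from pathlib import Path

# File types to publish (an assumption; adjust to your checkout).
EXTENSIONS = {".py", ".c", ".cpp", ".h", ".txt"}

PAGE = ("<html><head><title>{title}</title></head>"
        "<body><h1>{title}</h1>{content}</body></html>\n")


def build_site(src_dir, out_dir):
    """Write one HTML page per source file plus an index.html linking to them.

    Returns the number of snippet pages written.
    """
    src, out = Path(src_dir), Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    links = []
    for path in sorted(src.rglob("*")):
        rel = path.relative_to(src)
        # Skip directories, unknown file types and CVS bookkeeping folders.
        if not path.is_file() or path.suffix not in EXTENSIONS or "CVS" in rel.parts:
            continue
        page_name = "_".join(rel.parts) + ".html"
        title = html.escape(rel.as_posix())
        code = html.escape(path.read_text(encoding="utf-8", errors="replace"))
        (out / page_name).write_text(
            PAGE.format(title=title, content=f"<pre>{code}</pre>"),
            encoding="utf-8")
        links.append(f'<li><a href="{html.escape(page_name)}">{title}</a></li>')
    index_body = "<ul>\n" + "\n".join(links) + "\n</ul>"
    (out / "index.html").write_text(
        PAGE.format(title="Snippets", content=index_body), encoding="utf-8")
    return len(links)
```

Running build_site on a fresh checkout and pointing the web server at the output directory would give a browsable, if plain, snippet site; a highlighter can later replace the <pre> step.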
Run Trac on the server linked to the (svn) repository. The Trac wiki can conveniently refer to files and changesets. You get TODO tickets, too.
enscript or pygmentize (part of pygments) can be used to convert code to HTML. You can use a custom header or footer to link to the actual code for download.
I finally settled on rest2web. I had to do the following:
Use a separate Python script to recursively copy the files in the CVS checkout to a separate directory.
Add extra files, index.txt and template.txt, to all the directories which I wanted to appear on the web page.
The best thing about rest2web is that it supports Python scripting within template.txt, so I just ran a loop over the contents and indexed them in the page.
There is still a lot more to do to automate the entire process, e.g. inline viewing of programs and colorization, which I think can be done with some more experimenting.
I have the completed website here. It is called uthcode.