pypdftk Error 32 - Python

I am back at coding above my head again, producing a program that will automatically fill out some partner PDFs with our employees' information. Currently, we have a process that involves a PDF of over 40 pages, which fills itself in automatically once you make it through the first couple of pages.
What we want is a UI that lets an employee type their info once, pumps it through all 40 pages to fill in the key form fields, then splits the PDF up appropriately and files the pieces to the correct folders for compliance.
90% of this I have experience coding, from the UI to splitting up a completed file, but my problem is working with the PDF to fill in forms. I have experience using tools such as PDFMiner and PDFQuery to scrape a PDF, but I am stuck on writing data into one.
I am currently attempting to use pypdftk, but when setting it up via their example I can't even clear the first step: the temp file the code appears to be trying to access is not accessible. See the basic example code:
import pypdftk

datas = {
    'firstname': 'Julien',
    'company': 'revolunet',
    'price': 42
}
generated_pdf = pypdftk.fill_form('main.pdf', datas)
It keeps producing an error 32 and I can't figure out why. Is this the best option, and if so, how can I remedy this?
Thank you,
Andy.

pypdftk author here; can you please paste the traceback?
Generally this means the pdftk binary was not found; you can change the path it uses with the PDFTK_PATH environment variable. See https://github.com/revolunet/pypdftk/blob/master/README.md#pdftk-path
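For example, here is a minimal sketch of pointing pypdftk at a specific binary (the Windows path below is only an example; use wherever pdftk actually lives on your machine):

import os

# PDFTK_PATH must be set before pypdftk is imported, because pypdftk
# resolves the binary path when the module loads; example path only
os.environ['PDFTK_PATH'] = r'C:\Program Files (x86)\PDFtk\bin\pdftk.exe'

import pypdftk

datas = {'firstname': 'Julien', 'company': 'revolunet', 'price': 42}
generated_pdf = pypdftk.fill_form('main.pdf', datas)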


Batch printing customized HTML pages in Python

Let me provide a little background.
An organization I am volunteering for delivers meals to people who are unable to come pick them up during the holidays.
They currently have a SQL Server DB that stores the information of all their clients, along with the meal information for each year.
Currently a Java desktop application connects to the SQL Server DB and supports several functions,
i.e. add a client, add meals, remove clients, print delivery sheets.
I am hoping to use Python Flask to rewrite the application as a web-based application. The one function I am interested in at the moment is the print delivery sheets function.
The way this works is that there is a setting for the current year. When you click the "print deliveries for year" button, it batch prints a document for each customer onto an 8.5" x 11.5" sheet. The sheet is split in two with the exact same information on each side. This information includes the customer name, address, number of meals, and so forth.
What I am wondering is how, and with what, I could best set up this template so that I can batch print it using Python. I was thinking of creating an HTML template for the page, but I am not sure how that would work.
Again, I need to pass every customer within that year into the template and batch print to an 8.5" x 11.5" sheet.
What I am asking is:
How could I create a template for the print that I can pass every customer to?
How would I batch print that template for every customer?
I was hoping to do this all in Python if possible.
Thank you for your time and help.
If you are already deploying this as a web app, it will probably be easier to design and generate a PDF. You can use an HTML-to-PDF converter, of which there are several on PyPI, and there are plenty of resources online, such as:
How to convert webpage into PDF by using Python
https://www.smallsurething.com/how-to-generate-pdf-reports-with-jinja2-and-pyqt/
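For example, here is a minimal sketch of the Jinja2-plus-converter approach, assuming pdfkit (just one of the PyPI converters; it needs the wkhtmltopdf binary installed). The template and field names are made up for illustration:

import pdfkit
from jinja2 import Template

# hypothetical two-up delivery sheet; a real template would repeat the
# customer block twice and use CSS @page rules for the paper size
TEMPLATE = """
<html><body>
  <h1>{{ name }}</h1>
  <p>{{ address }} - {{ meals }} meals</p>
</body></html>
"""

customers = [{'name': 'Jane Doe', 'address': '1 Main St', 'meals': 3}]

for c in customers:
    html = Template(TEMPLATE).render(**c)
    # requires the wkhtmltopdf binary on the PATH
    pdfkit.from_string(html, 'delivery_{}.pdf'.format(c['name']))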
Once you have found a way to generate PDFs, you can then just use them like any other PDF, and either have the user download them or print them from the browser (this may require a little bit of JavaScript, but that shouldn't be hard since it should pretty much just be a window.open call).
For instance, you can add a button:
<button onclick="getPDF()">Download PDF</button>
which will then call a function getPDF() that you define, which finds a way to generate the PDF:
function getPDF() {
// Find the uri for the pdf by some method
var urlToPdf = getUrlToPdf();
// Open PDF in new window
window.open(urlToPdf, "_blank");
}
Note: since you are using Flask, you can include the URL for the PDF in the source for the page, even in the JavaScript, using the {{ }} syntax. The PDFs are then only generated when someone requests that route.
This way you will not have to worry about connecting to a printer yourself at all; the browser handles those kinds of tasks.
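As a rough sketch of the Flask side (the route names, template name, and on-the-fly generation strategy are placeholders, not a prescription):

from flask import Flask, render_template, send_file, url_for

app = Flask(__name__)

@app.route('/deliveries/<int:year>.pdf')
def deliveries_pdf(year):
    # generate (or fetch a cached copy of) the PDF for this year,
    # then stream it back; the browser handles download/printing
    return send_file('deliveries_{}.pdf'.format(year),
                     mimetype='application/pdf')

@app.route('/deliveries/<int:year>')
def deliveries_page(year):
    # the template (and its JavaScript) can embed this URL as {{ pdf_url }}
    return render_template('deliveries.html',
                           pdf_url=url_for('deliveries_pdf', year=year))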

Accessing hover text with HTML

I am trying to access hover text found on graph points at this site (bottom):
http://matchhistory.na.leagueoflegends.com/en/#match-details/TRLH1/1002200043?gameHash=b98e62c1bcc887e4&tab=overview
I have the full site HTML, but I am unable to find the values displayed in the hover text. All that can be seen when inspecting a point are x and y values that are transformed versions of these values. The mapping can be determined with manual input taken from the hover text, but this defeats the purpose of looking at the HTML. Additionally, the mapping changes with each match history, so it is not feasible to do this for a large number of games.
Is there any way around this?
Thank you.
Explanation
Nearly everything on this webpage is loaded via JSON through JavaScript. We don't even have to request the original page. You will, however, have to piece the page back together via the IDs of items, masteries, etc., which won't be too hard, because you can request masteries similarly to how we fetch items below.
So, I went through the Network tab in the inspector and noticed that the page loads the following JSON-formatted URL:
https://acs.leagueoflegends.com/v1/stats/game/TRLH1/1002200043?gameHash=b98e62c1bcc887e4
If you notice, there is a gameHash and the ID (similar to those in the link you sent). This page contains everything you need to rebuild the match page, provided you fetch all the JSON files it relies on.
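For example, a quick sketch of pulling that stats JSON down with requests (the URL is the one from the Network tab above):

import requests

url = ('https://acs.leagueoflegends.com/v1/stats/game/TRLH1/1002200043'
       '?gameHash=b98e62c1bcc887e4')
game = requests.get(url).json()  # the full match data as a Python dict
print(list(game.keys()))         # see what is available to rebuild the page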
Dealing with JSON
You can use json.loads in Python to load it, but a great tool I would recommend is:
https://jsonformatter.curiousconcept.com/
You copy and paste JSON in there and it will help you understand the data structure.
Fetching items
The webpage loads all this information via a JSON file:
https://ddragon.leagueoflegends.com/cdn/7.10.1/data/en_US/item.json
It contains all of the information and tooltips about each item in the game. You can access your desired item via theirJson['data']['1001']; each item image's file name on the page is that item's ID (1001 in this example).
For instance, for 'Boots of Speed':
import requests, json

# fetch the full item catalogue once, then index it by item id
itemJson = json.loads(requests.get('https://ddragon.leagueoflegends.com/cdn/7.10.1/data/en_US/item.json').text)
print(itemJson['data']['1001'])
An alternative: Selenium
Selenium could be used for this; you should look it up. It has been ported to several programming languages, one being Python. It may work as you want it to here, but I sincerely think that the JSON method (described above), although a little more convoluted, will perform faster (since speed, based on your post, seems to be an important factor).

Send PDF file from Django website to LogicalDOC

I've been developing my Django website for about two months and I'm beginning to get good overall results with my own functions.
But now I have to start a part that looks very hard to me, and I need some advice and ideas before doing it.
My Django website creates some PDF files from HTML templates with Django variables. Up to now, I've been saving the PDF files directly to my desktop (in a specific folder), but that is completely insecure.
So I installed another web application, named LogicalDoc, in order to save the PDF files directly in that application. The PDF files are created and sent to LogicalDoc.
LogicalDoc has two APIs, SOAP and REST (http://wiki.logicaldoc.com/rest/#/), and I know that Django can communicate via the REST method.
I'm also reading this part of the Django documentation to understand how I can proceed: https://docs.djangoproject.com/en/dev/topics/http/file-uploads/
I made a diagram to illustrate what I'm describing:
Then I wrote a script which does a few things:
When the PDF file is created, I create a folder inside LogicalDoc which takes, for example, the following name: lastname_firstname_birthday
Two possibilities: if the folder already exists, I don't create a new one; otherwise I create it.
Once that's done, I send the PDF file directly into the folder, matching the PDF name against the folder name to do so.
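In rough pseudocode, the flow I have in mind looks like this (the folder endpoints are placeholders I still need to check against the LogicalDoc REST docs):

import requests

BASE = 'http://logicaldoc.example/services/rest'  # placeholder base URL
lastname, firstname, birthday = 'Doe', 'John', '1980-01-01'
folder_name = '{}_{}_{}'.format(lastname, firstname, birthday)

# placeholder endpoints: the real paths need to be checked in the docs
r = requests.get(BASE + '/folder/find', params={'name': folder_name})
if not r.json():  # folder does not exist yet, so create it
    requests.post(BASE + '/folder/create', json={'name': folder_name})
# then send the PDF into that folder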
I have some questions about this process :
Firstly, is it possible to do this kind of thing?
Is it hard to do?
What kind of advice could you give me?
Thank you so much !
PS: If you need some part of my script, mainly the PDF-creating part, I can post it just after my question ;)
The idea is pretty simple; however, it always requires some practice.
I strongly advise you to use the REST API and forget about SOAP, as the only thing SOAP can bring you is 'pain' :)
If we check the documentation for document/create, it gives us the following information:
The endpoint we have to communicate with: [protocol]://[server]:[port]/document/create
The HTTP method to use: POST
The list of parameters to provide with your request: body, document, content
Even better, you can test the API by clicking the "Try it out" button and checking the requests in the "Network" tab of your browser (with Developer Tools open).
I am not sure what kind of metadata you have to provide in the 'document' parameter, but you can easily get an idea of what should be done by testing it and putting XML or JSON data into the 'document' parameter.
Content is an array of bytes transferred to the server (which would be your file).
To sum up, a request to the 'document/create' URI will be as simple as:
import json
import requests

body = {'headers': {}, 'object': {}}
document = "<note>data</note>"
content = open('report.xls', 'rb')  # r - reading, b - binary
# requests sends form fields via data= and file payloads via files=
r = requests.post('http://logicaldoc/document/create',
                  data={'body': json.dumps(body), 'document': document},
                  files={'content': content})
Please keep in mind that file-transfer requests take time, and you may sometimes get a timeout exception. Your code will stop and wait for the response, so it may be a good idea to get some practice with asyncio or celery. Just keep those kinds of possible issues in mind.
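For instance, a simple guard against slow uploads without going full asyncio/celery (the timeout value is arbitrary):

import requests

try:
    # fail fast instead of hanging forever on a slow server
    r = requests.post('http://logicaldoc/document/create',
                      files={'content': open('report.xls', 'rb')},
                      timeout=30)
except requests.exceptions.Timeout:
    print('Upload timed out; retry or queue the job instead')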

Scrape with Python, commanded by Excel VBA

I asked a previous question, but it was posted with VBA tags and title, etc. So I'll try again with the proper tags and title, since hopefully I've gained a bit of knowledge by now.
The problem:
I need to find ~1000 dates in a database with plant variety data, which is probably behind a login, so here is a screenshot. Now I could of course fill out this form ~1000 times, but there must be a smarter way to do it. If it were a plain HTML site I would know what to do and have VBA just pull in the results. I have been reading all morning about these JavaScript pages and AJAX libraries, but it is above my level. So hopefully someone can help me out a bit. I also used Firebug to see what is going on when I press search:
These are the request parameters from the screenshot, reproduced as text for easier reading and copying:
f.cc.facet.limit = -1
f.cc.facet.mincount = 1
f.end_date.facet.date.end = 2030-01-01T00:00:00Z
f.end_date.facet.date.gap = +5YEARS
f.end_date.facet.date.other = all
f.end_date.facet.date.start = 1945-01-01T00:00:00Z
f.end_type.facet.limit = 20
f.end_type.facet.mincount = 1
f.grant_start_date.facet.date.end = NOW/YEAR
f.grant_start_date.facet.date.gap = +5YEARS
f.grant_start_date.facet.date.other = all
f.grant_start_date.facet.date.start = 1900-01-01T00:00:00Z
f.status.facet.limit = 20
f.status.facet.mincount = 1
f.type.facet.limit = 20
f.type.facet.mincount = 1
facet = true
facet.date = grant_start_date
facet.date = end_date
facet.field = cc
facet.field = type
facet.field = status
facet.field = end_type
fl = uc,cc,type,latin_name,common_name,common_name_en,common_name_others,app_num,app_date,grant_start_date,den_info,den_final,id
hl = true
hl.fl = cc,latin_name,den_info,den_final
hl.fragsize = 5000
hl.requireFieldMatch = false
json.nl = map
q = cc:IT AND latin_name:(Zea Mays) AND den_info:Antilles
qi = 3-9BgbCWwYBd7aIWPU1/onjQ==
rows = 25
sort = uc asc,score desc
start = 0
type = upov
wt = json
Source (the raw URL-encoded query string):
fl=uc%2Ccc%2Ctype%2Clatin_name%2Ccommon_name%2Ccommon_name_en%2Ccommon_name_others%2Capp_num%2Capp_date
%2Cgrant_start_date%2Cden_info%2Cden_final%2Cid&hl=true&hl.fragsize=5000&hl.requireFieldMatch=false&json
.nl=map&wt=json&type=upov&sort=uc%20asc%2Cscore%20desc&rows=25&start=0&qi=3-9BgbCWwYBd7aIWPU1%2FonjQ
%3D%3D&hl.fl=cc%2Clatin_name%2Cden_info%2Cden_final&q=cc%3AIT%20AND%20latin_name%3A(Zea%20Mays)%20AND
%20den_info%3AAntilles&facet=true&f.cc.facet.limit=-1&f.cc.facet.mincount=1&f.type.facet.limit=20&f.type
.facet.mincount=1&f.status.facet.limit=20&f.status.facet.mincount=1&f.end_type.facet.limit=20&f.end_type
.facet.mincount=1&f.grant_start_date.facet.date.start=1900-01-01T00%3A00%3A00Z&f.grant_start_date.facet
.date.end=NOW%2FYEAR&f.grant_start_date.facet.date.gap=%2B5YEARS&f.grant_start_date.facet.date.other
=all&f.end_date.facet.date.start=1945-01-01T00%3A00%3A00Z&f.end_date.facet.date.end=2030-01-01T00%3A00
%3A00Z&f.end_date.facet.date.gap=%2B5YEARS&f.end_date.facet.date.other=all&facet.field=cc&facet.field
=type&facet.field=status&facet.field=end_type&facet.date=grant_start_date&facet.date=end_date
And this is what the response looks like, at least according to Firebug:
{"response":{"start":0,"docs":[{"id":"6751513","grant_start_date":"1999-02-04T22:59:59Z","den_final":"Antilles","app_num":"005642_A 005642","latin_name":"Zea mays L.","common_name_others":["MAIS"],"uc":"ZEAAA_MAY","type":"NLI","app_date":"1997-01-10T22:59:59Z","cc":"IT"}],"numFound":1},"qi":"3-9BgbCWwYBd7aIWPU1/onjQ==","facet_counts":{"facet_queries":{},"facet_ranges":{},"facet_dates":{"end_date":{"after":0,"start":"1945-01-01T00:00:00Z","before":0,"2010-01-01T00:00:00Z":1,"between":1,"end":"2030-01-01T00:00:00Z","gap":"+5YEARS"},"grant_start_date":{"after":0,"1995-01-01T00:00:00Z":1,"start":"1900-01-01T00:00:00Z","before":0,"between":1,"end":"2015-01-01T00:00:00Z","gap":"+5YEARS"}},"facet_intervals":{},"facet_fields":{"status":{"approved":1},"end_type":{"ter":1},"type":{"nli":1},"cc":{"it":1}}},"sv":"bswa1.wipo.int","lastUpdated":1435987857572,"highlighting":{"6751513":{"den_final":["Antilles<\/em>"],"latin_name":["Zea<\/em> mays<\/em> L."],"cc":["IT<\/em>"]}}}
Edit:
It uses the GET method and XMLHttpRequest, as can be seen in this screenshot:
I already found out how to make Python run from Excel VBA here in this topic.
I also downloaded Beautiful Soup, but Python is not my kind of language, so any help would be greatly appreciated.
Image referred to in a comment on Will's answer:
1) Use Excel to store your search parameters.
2) Run a few manual searches to find out which parameters you need to change on each request.
3) Invoke an HTTP GET request to the URL that you found in Firebug/Fiddler (the URL the page calls when you click "search" manually); see the sketch just after this list. For making the request, see urllib3: https://urllib3.readthedocs.org/en/latest/
4) Look at jsonpickle to help you deal with the JSON response, saving (serializing) it to a file.
5) Reading and writing data involves IO libraries. Google is your friend. (It is possibly easier to save your Excel file as a CSV and then just read the CSV file for your search parameters.)
6) Download PyCharm for your Python development - it's really good.
Hope this helps.
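As a sketch of step 3 with urllib3 (the endpoint is the select.jsp URL the asker later found with Fiddler, and only a few of the query parameters from the question are shown):

import json
import urllib3

http = urllib3.PoolManager()
r = http.request('GET', 'https://www3.wipo.int/pluto/user/jsp/select.jsp',
                 fields={
                     'wt': 'json',
                     'rows': '25',
                     'q': 'cc:IT AND latin_name:(Zea Mays) AND den_info:Antilles',
                 })
data = json.loads(r.data.decode('utf-8'))
print(data['response']['numFound'])  # number of matching varieties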
I finally figured it out. I don't need to use Python; I can just use a URL and then import the content into Excel. I found out with Fiddler that the URL should become https://www3.wipo.int/pluto/user/jsp/select.jsp? and the query string from my question then goes behind that.
The rest of my solution can be found in another question I asked. It uses no Python, only VBA, which commands IE to open the website and copy its content.

Properly watch websites for updates

I wrote a script that I'm using to push updates to Pushbullet channels whenever a new Nexus factory image is released. A separate channel exists for each of the first 11 devices on that page, and I'm using a rather convoluted script to watch for updates. The full setup is here (specifically this script), but I'll briefly summarize the script below. My question is this: This is clearly not the correct way to be doing this, as it's very susceptible to multiple points of failure. What would be a better method of doing this? I would prefer to stick with Python, but I'm open to other languages if they would be simpler/better.
(This question is prompted by the fact that I updated my Apache 2.4 config tonight, and it apparently triggered a slight change in the output of the local files that are watched by urlwatch, so ALL 11 channels got an erroneous update pushed to them.)
Basic script functionality (some nonessential parts are not included):
Create dictionary of each device codename associated with its full model name
Get existing Nexus Factory Images page using Requests
Make bs4 object from source code
For each of the 11 devices in the dictionary (loop), do the following:
Open/create page in public web directory for the device
Write source to that page, filtered using bs4: str(soup.select("h2#" + dev + " ~ table")[0])
Call urlwatch on the page to check for updates, save output to temp file
If temp file size is > 0 then the page has changed, so push update to the appropriate channel
Remove webpage and temp file
A thought that I had while typing this question: would a possible solution be to save each current version string (for example, 5.1.0 (LMY47I)) as a pickled variable? Then, if urlwatch detects a difference, it would compare the new version string to the pickled one and only push if they differ. I would throw in regex matching as well, to ensure that the new format matches the old format and just has updated data. Could this at least be a good temporary measure to prevent future false alarms?
Scraping is inherently fragile, but as long as they don't change the source format it should be pretty straightforward in this case. You should parse the webpage into a data structure; bs4 is fine for this. The end result should be a Python dictionary:
{
    'mantaray': {
        '4.2.2 (JDQ39)': {'link': 'https://...'},
        '4.3 (JWR66Y)': {'link': 'https://...'},
    },
    ...
}
Save this structure with json.dumps. Now every time you parse the page you can generate a similar data structure and compare it to the one you have on disk (update the saved one each time after you are done).
Then the only part left is comparing the data structures. You can iterate over all models and check that each version in the current version of the page exists in the previous version. If one does not, you have a new version.
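A sketch of that comparison step ('images.json' and parse_page() are placeholder names; parse_page() stands for your bs4 step that produces the dictionary shown above):

import json

def find_new_versions(old, new):
    # return {model: [version, ...]} for versions not seen before
    added = {}
    for model, versions in new.items():
        known = old.get(model, {})
        fresh = [v for v in versions if v not in known]
        if fresh:
            added[model] = fresh
    return added

with open('images.json') as f:
    old = json.load(f)
new = parse_page()  # your bs4 parsing step
for model, versions in find_new_versions(old, new).items():
    print(model, versions)  # push to the matching Pushbullet channel here
with open('images.json', 'w') as f:
    json.dump(new, f)  # update the saved copy once done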
You can also potentially generate an easy to use API for this using https://www.kimonolabs.com/ instead of doing the parsing yourself.
