Automating Acrobat to Create PDFs - python

I would like to use Python to open a PDF, fill it with some preset data, and then save it. I have a repetitive PDF form I have to generate about 40 times a month, and most of the time only one data point changes. I'm looking for the libraries that would work best and a general overview. I have a few ideas, but my experience is fairly limited, so any experience or wisdom on this topic is greatly appreciated.
I am considering using OOP and pulling the data out of an object to fill the form. That way I can just manipulate a few variables and let the hierarchy take over. The biggest challenge is going to be getting Acrobat to play along.
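As a rough sketch of that OOP idea: if the PDF is a fillable AcroForm, a third-party library such as pypdf (an assumption — the question does not name a library) can fill the fields without driving Acrobat at all. The field names, file names, and values below are all illustrative.

```python
from dataclasses import dataclass, asdict

@dataclass
class FormData:
    # Preset values that rarely change month to month (hypothetical fields)
    company: str = "ACME Corp"
    department: str = "Maintenance"
    work_order: str = ""  # the one data point that changes each run

def fill_form(data: FormData, template: str = "template.pdf",
              out: str = "filled.pdf") -> None:
    """Copy the template and fill its AcroForm fields from the dataclass."""
    from pypdf import PdfReader, PdfWriter  # pip install pypdf (assumed)
    writer = PdfWriter()
    writer.append(PdfReader(template))
    for page in writer.pages:
        writer.update_page_form_field_values(page, asdict(data))
    with open(out, "wb") as f:
        writer.write(f)

# Each month, only the changing field needs to be set:
record = FormData(work_order="WO-2024-0042")
# fill_form(record)  # uncomment once template.pdf and pypdf are in place
```

The dataclass is the "object hierarchy" part: all the preset data lives in the defaults, and each run only overrides the field that changed.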

Related

MES system for manufacturing plant

I am working on creating an app that accesses a PLC to gather data and then displays it on screen, with the option of viewing the data as an interactive graph. There has been some work done in Python that collects data, stores it in Excel, and at the end processes the data to get some meaningful data points and visualization using matplotlib.
They have used tkinter as the interface to get data and display the current values on 2 screens. I have plans to update the program to store the data in a database and query it as needed using Python.
What would be the ideal GUI package and data visualization tools I can use to make the app interactive and easy to use on a PC?
As for why I chose Python: I feel comfortable using Python.
Thank you guys in advance!
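For the database part of that plan, a minimal sketch with the standard-library sqlite3 module (my suggestion, not something from the original post) might look like this; the table layout and sample readings are illustrative assumptions:

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # use a file path for a persistent log
conn.execute("""CREATE TABLE readings (
                    ts    TEXT DEFAULT CURRENT_TIMESTAMP,
                    tag   TEXT,
                    value REAL)""")

# Simulated PLC samples; the real app would insert as it polls the PLC
conn.executemany("INSERT INTO readings (tag, value) VALUES (?, ?)",
                 [("temp", 71.5), ("temp", 72.1), ("pressure", 3.2)])
conn.commit()

# Query back one tag in insertion order, ready to hand to matplotlib
temps = [v for (v,) in conn.execute(
    "SELECT value FROM readings WHERE tag = ? ORDER BY rowid", ("temp",))]
```

Because sqlite3 ships with Python, this avoids adding a database server while you evaluate GUI options.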
It's not a free solution, but you might want to look at Ignition [link].
I have been using it for several years now and it has some great depth and flexibility for capturing data from most brands of PLCs and then displaying the data on an HMI, a webpage, uploading to SQL, and so on.
The scripting language is Jython, so you should feel very comfortable there. You can also try it for free for two hours at a time, and reset the trial as many times as you like.

Best way to link excel workbooks feeding into each other for memory efficiency. Python?

I am building a tool which displays and compares data for a non-specialist audience. I have to automate the whole procedure as much as possible.
I am extracting select data from several large data sets, processing it into a format that is useful, and then displaying it in a variety of ways. The problem I foresee is in the updating of the model.
I don't really want the user to have to do anything more than download the relevant files from the relevant database, rename them, and save them to a location; the spreadsheet should do the rest. Then the user will be able to look at the data in a variety of ways, perform a few different analytical functions depending on what they are looking at, output some graphs, etc.
Although some database-exported files won't be that large, other data will be pulled from very large XML or CSV files (500,000 x 50 cells), and there are several arrays working on the pulled data once it has been chopped down to the minimum possible. So it will be necessary to open and update several files in order, so that the data in the user control panel is up to date, and not all at once, which would freeze the user's machine.
At the moment I am building all of this just using excel formulas.
My question is how best to do the updating and feeding. Perhaps some kind of controller program built with Python? I don't know Python, but I have other reasons to learn it.
Any advice would be very welcome.
Thanks
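A controller script along those lines could stream each large file row by row, so the full 500,000 x 50 sheet never sits in memory at once. This is a minimal sketch with the standard-library csv module; the file names and column choices are assumptions for illustration:

```python
import csv
import os
import tempfile

def trim_csv(src: str, dst: str, keep_columns: list) -> None:
    """Stream src one row at a time, keeping only the named columns."""
    with open(src, newline="") as fin, open(dst, "w", newline="") as fout:
        reader = csv.DictReader(fin)
        writer = csv.DictWriter(fout, fieldnames=keep_columns)
        writer.writeheader()
        for row in reader:  # one row in memory at a time
            writer.writerow({c: row[c] for c in keep_columns})

# Demo with a small stand-in for the downloaded export:
workdir = tempfile.mkdtemp()
raw = os.path.join(workdir, "export.csv")
trimmed = os.path.join(workdir, "trimmed.csv")
with open(raw, "w", newline="") as f:
    w = csv.writer(f)
    w.writerow(["id", "value", "noise"])
    w.writerows([[1, 10, "x"], [2, 20, "y"]])

trim_csv(raw, trimmed, ["id", "value"])
with open(trimmed, newline="") as f:
    rows = list(csv.DictReader(f))
```

The controller would run one `trim_csv`-style stage per source file, in order, writing each trimmed result to the location the spreadsheet reads from.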

Converting PDF to any parse-able format

I have a PDF file which consists of tables which can spread across various pages and may have text in between. An example of it can be found here.
I am able to convert the PDF to any format but the output files are not in any way parse-able i.e. I cannot extract data out of it as they are scattered. Here are the links to the output files which I created using pdftotext and pdftohtml.
Is there a way to extract data in a more suitable way?
Thanks in advance.
The general answer is no. PDF is a format intended for visual presentation and printing, and there is no guarantee that the contents will be in any particular order, let alone structured as a table, in any way other than how they appear when the PDF is rendered onto paper or a screen. Sometimes there is even deliberate obfuscation to prevent anyone doing what you are attempting.
In this case it appears to be possible to cut and paste the contents of each table element. For a small number of similar files that is almost certainly the quickest thing to do: open the PDF on the left-hand side of your screen and a spreadsheet or data-entry program on the right, then cut and paste. For a medium number (tens, hundreds?) it's probably cheapest to hire a temp to do the donkey-work. For a large number (thousands?) it would be possible to create a program to automate this process, but definitely not easy. I might think about using human input via the mouse to identify the corners of the table and the horizontal/vertical divisions, then generating cut-and-paste operations via control of the human-interface devices. Don't ask me how; I'd have to find out if I had to do this, and I'd much rather not. It's a WOMBAT.
Whatever form of analysis you did on the pdf contents would certainly not generalize to other pdfs created by different organisations using different software, and possibly not even by the same organisation using the same process but merely a later release of the same software.
Following on from #nigel222's answer: how easily you can get the data out in some useful way really depends on the PDF.
It is best if the PDF is structured (i.e. has a document structure, created when the PDF was written). In that case you can access the structure, and you are all set.
As structure is a fundamental requirement of an accessible PDF, you may try to "massage" the document by applying one of the various "make accessible" utilities floating around; definitely something worth pursuing.
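If the PDF does carry structure, a table-extraction library such as pdfplumber (an assumption — it is not mentioned above) can often recover the tables, and a small helper can stitch together a table that spans several pages by dropping the repeated header rows:

```python
def stitch_tables(tables: list) -> list:
    """Concatenate per-page tables, dropping repeated header rows."""
    if not tables:
        return []
    merged = list(tables[0])
    header = tables[0][0]
    for t in tables[1:]:
        # Skip the header if this page's table repeats it
        rows = t[1:] if t and t[0] == header else t
        merged.extend(rows)
    return merged

def extract_tables(path: str) -> list:
    """Pull every table from every page and stitch them into one."""
    import pdfplumber  # pip install pdfplumber (assumed library)
    with pdfplumber.open(path) as pdf:
        per_page = [t for page in pdf.pages for t in page.extract_tables()]
    return stitch_tables(per_page)

# Demo of the stitching step with hand-made per-page tables:
per_page_demo = [[["Name", "Qty"], ["a", "1"]],
                 [["Name", "Qty"], ["b", "2"]]]
merged = stitch_tables(per_page_demo)
```

Whether `extract_tables` works on a given file still depends on the PDF, as the answers above say; the stitching step is the part that is safe regardless.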

Python and creating Excel-like tables of data

I am relatively new to (Python) programming and would like some direction for my current project.
I have been learning to web scrape and parse data. I am now sitting on a program that can create lots of information in the form of lists and dictionaries.
I now need a way to create formatted tables and output to some sort of web-based system. I am looking at tables of about 10-40 rows and 20 columns of alphanumeric (names and numbers) data. If I can produce basic bar/line charts, that would be useful. It also needs to be entirely automated: the program will run once a day and download information. I need it to output seamlessly in report form to something like Dropbox that I can access on the go. The table template will always be the same and will be heavily formatted (colouring mostly, like Excel conditional formatting).
I am also keen to learn to create web apps and I'm wondering if this is something I can do with this project? I'm not sure what I'll need to do and I'd like some direction. I'm pretty new to programming and the jargon is tough to wade through. Is it possible to create a website that takes automated input and creates good-looking data tables? Is this efficient? What are the available tools for this? If not efficient what are the other available options?
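One low-dependency way to get a heavily formatted, fully automated report is to render the lists/dicts straight to an HTML file and drop that into the Dropbox folder. This is a sketch using only the standard library; the column names, threshold, and colours are illustrative assumptions:

```python
from html import escape

def to_html_table(rows: list, highlight_col: str, threshold: float) -> str:
    """Render rows (a list of dicts) as an HTML table, shading cells in
    highlight_col whose value exceeds threshold (conditional formatting)."""
    cols = list(rows[0])
    out = ["<table border='1'>",
           "<tr>" + "".join(f"<th>{escape(c)}</th>" for c in cols) + "</tr>"]
    for row in rows:
        cells = []
        for c in cols:
            style = ""
            if c == highlight_col and float(row[c]) > threshold:
                style = " style='background:#fdd'"  # hypothetical colour rule
            cells.append(f"<td{style}>{escape(str(row[c]))}</td>")
        out.append("<tr>" + "".join(cells) + "</tr>")
    out.append("</table>")
    return "\n".join(out)

# Demo rows standing in for the scraped data:
html = to_html_table([{"name": "a", "score": 5}, {"name": "b", "score": 95}],
                     "score", 50)
```

Writing the string to an `.html` file in a synced folder gives a viewable, formatted report with no web server; a proper web app (Flask, Django, etc.) is the natural next step once this works.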

want to add url links to .csv datafeed using python

I've looked through the current related questions but have not managed to find anything similar to my needs.
I'm in the process of creating an affiliate store using Zen Cart. Now, one of the issues is that Zen Cart is not designed for redirects and affiliate stores, but it can be done. I will be changing the store so it acts like a showcase store showing prices.
There is a mod called Easy Populate which allows me to upload datafeeds. This is all well and good; however, my affiliate link will not be in each product. I can add it manually after uploading the datafeed by going to each product and adding it as an image with a redirect link. However, when there are over 500 items it's going to be a long, repetitive, and time-consuming job.
I have been told that I can add the links to the datafeed before uploading it to Zen Cart, and that this should be done using Python. I've been reading about Python for several days now and feel I'm looking for the wrong things. I was wondering if someone could please advise the simplest way for me to get this done.
I hope the question makes sense.
Thanks
abs
You could craft a Python script using the csv module like this:
>>> import csv
>>> with open('yourcart.csv', 'w', newline='') as f:  # 'w' with newline='', not 'wb', on Python 3
...     cartWriter = csv.writer(f)
...     cartWriter.writerow(['Product', 'yourinfo', 'yourlink'])
You need to know how the link should be formatted, in the hope that it can be composed from the other fields already present in the CSV file.
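A fuller sketch of the same idea: read the existing datafeed and write a copy with the affiliate link appended to every product row. The column names and link format below are assumptions — adjust them to your actual feed:

```python
import csv
import io

def add_affiliate_links(fin, fout, link_template: str) -> None:
    """Copy a CSV datafeed, appending an affiliate_link column built from
    each row's product id (column name 'product_id' is an assumption)."""
    reader = csv.DictReader(fin)
    fieldnames = reader.fieldnames + ["affiliate_link"]
    writer = csv.DictWriter(fout, fieldnames=fieldnames)
    writer.writeheader()
    for row in reader:
        row["affiliate_link"] = link_template.format(id=row["product_id"])
        writer.writerow(row)

# Demo with an in-memory stand-in for the downloaded feed:
src = io.StringIO("product_id,name\n101,Widget\n102,Gadget\n")
dst = io.StringIO()
add_affiliate_links(src, dst, "https://example.com/go?pid={id}")
```

In practice you would pass real file objects instead of `StringIO`, then upload the resulting file through Easy Populate.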
First, use the csv module as systempuntoout told you. Second, you will want to change your headers to:
mimetype='text/csv'
Content-Disposition = 'attachment; filename=name_of_your_file.csv'
The way to do it depends very much on your website implementation. In pure Python you would probably do that with an HttpResponse object. In Django as well, but there are some shortcuts.
You can find a video demonstrating how to create CSV files with Python on showmedo. It's not free however.
Now, to provide a link to download the CSV: this depends on your website. What is the technology behind it: pure Python, Django, Pylons, TurboGears?
If you can't answer that question, you should ask your boss for training on your infrastructure before trying to make changes to it.
