Creating an infographic with multiple PDFs in Python

I've created multiple charts using Matplotlib and saved them as PDFs. I need to combine up to 5 PDFs into one PDF, and because this will be done many times, the task needs to be automated with Python. The reason I'm combining PDFs instead of .jpg or .png files is that PDF scales best and doesn't result in a fuzzy image. I've tried using code from "Is there a matplotlib flowable for ReportLab?", but I don't understand how to control the image placement. ReportLab has a function, .drawImage(file, x-coord, y-coord), which allows specific placement of the image on the page. Unfortunately, this function only takes .jpg or .png files, which are too low quality. If anyone has any suggestions on how to combine PDFs, it would be greatly appreciated!

If anyone stumbles upon this: I've found that the best way to actually do this is in LaTeX. There is a Python package called PyLaTeX, but there is no documentation, so instead I will create a LaTeX template and then use subprocess.call in my Python script to create the infographic.
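
A minimal sketch of that template-plus-subprocess approach, for anyone who wants a starting point. It is not the original poster's actual script: the template, the two-per-row grid, the `@@BODY@@` placeholder, and the function names `build_tex` and `compile_infographic` are all assumptions made for illustration. It also assumes `pdflatex` is installed and on the PATH. Under pdflatex, `\includegraphics` embeds PDF charts as vector graphics, so they stay sharp.

```python
import subprocess
from pathlib import Path

# Hypothetical template: one page, small margins, charts placed in a grid.
# @@BODY@@ is a placeholder invented for this sketch.
TEMPLATE = r"""\documentclass{article}
\usepackage[margin=1cm]{geometry}
\usepackage{graphicx}
\pagestyle{empty}
\begin{document}
\noindent
@@BODY@@
\end{document}
"""


def build_tex(chart_paths, per_row=2):
    """Return LaTeX source that places the given PDF charts in rows."""
    width = 0.96 / per_row  # fraction of the line width used by each chart
    rows = []
    for start in range(0, len(chart_paths), per_row):
        row = chart_paths[start:start + per_row]
        images = [
            r"\includegraphics[width=%.2f\linewidth]{%s}"
            % (width, Path(p).as_posix())
            for p in row
        ]
        # \hfill pushes the charts of one row apart horizontally.
        rows.append("\\hfill\n".join(images))
    # A blank line starts a new paragraph, i.e. a new row of charts.
    body = "\n\n\\medskip\\noindent\n".join(rows)
    return TEMPLATE.replace("@@BODY@@", body)


def compile_infographic(chart_paths, tex_path="infographic.tex"):
    """Write the .tex file and run pdflatex on it (assumed to be on PATH)."""
    Path(tex_path).write_text(build_tex(chart_paths), encoding="utf-8")
    return subprocess.call(["pdflatex", "-interaction=nonstopmode", tex_path])
```

Calling `compile_infographic(["chart1.pdf", "chart2.pdf", "chart3.pdf"])` (hypothetical file names) would write `infographic.tex` and, if pdflatex succeeds, produce `infographic.pdf` in the working directory. Exact placement can be tuned in the template itself, which is the main advantage over positioning each chart from Python.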

Related

I am generating a PDF file using wkhtmltopdf in python, but the images overlap

I am generating a PDF file using wkhtmltopdf in python, but the images overlap.
As the attached screenshot shows, the images overlap. I went through the wkhtmltopdf manual and every suggestion I could find on Stack Overflow, but could not fix the overlap.
I hope you can suggest a good solution.

Must it be Python?
Could you have a look at (pdf)LaTeX?
With LaTeX, you can easily produce PDF files for all types of documents: books, articles, ...

Is there a way on Python to extract a time series from an image?

I am trying to extract a time series dataset from an image of a graph (with an x-axis and a y-axis). Is there a quick way to do so in Python?
To be more precise, this is my graph:
HEL Share Price
and I am trying to get daily data.
Any help?
Thanks! :)

I know of a web app that can do this: WebPlotDigitizer.
Looking at alternativeto.net, I found Engauge Digitizer, which "accepts image files (like PNG, JPEG and TIFF) containing graphs, and recovers the data points from those graphs", and a recent version "adds python support". I have never used Engauge, but it sounds like what you want.
Keep in mind that it is not that easy to automate such a task: the correct axis labels have to be found, and a label such as "49,28" might even overlap the graph sometimes.

In Python, you could try this Python3 utility. It says it can extract raw data from plot images.
But you can more easily extract data from graph images using GUI-friendly tools, like plotdigitizer.com or automeris.io. I prefer the former over the latter. You can find the entire list of such programs over here.

Is it possible to generate vector based pdf using wordcloud

I am using wordcloud in python to generate word clouds.
I was able to reproduce this example on my machine, and then tried changing the last line from plt.show() to plt.savefig('image.pdf') to get PDF output.
I got a PDF with the same result; however, the PDF seems to be pixel-based instead of vector-based. When I zoom in on a particular point in the PDF, it becomes a very low-quality picture.
Is there any way to produce vector-based pdf using wordcloud? If not, is there any other library that I can produce vector-based (pdf) wordclouds in Python?

If wordcloud can generate any sort of vector output such as ps or svg, inkscape can usually convert it to a PDF without rasterizing it. You can even do this headless, e.g. inkscape my.svg -A my.pdf.
Hmm, looking at wordcloud, it looks like it uses PIL. I don't think that PIL can produce vector images. But if you could use the logic in wordcloud and separate it from PIL, you can get vector fonts onto PDFs by drawing onto a reportlab canvas.
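
As a rough illustration of "separating the layout logic from PIL", the sketch below writes word positions out as SVG text elements using only the standard library. The layout tuples (word, font size, position, color) are a hypothetical format invented for this example, not wordcloud's actual internal data structure, and `layout_to_svg` is likewise a made-up name. The resulting SVG could then be converted to PDF with Inkscape, as described above.

```python
from xml.sax.saxutils import escape


def layout_to_svg(layout, width, height):
    """Render (word, font_size, (x, y), color) tuples as SVG <text> elements.

    The tuple format is an assumption made for this sketch.
    """
    parts = [
        '<svg xmlns="http://www.w3.org/2000/svg" width="%d" height="%d">'
        % (width, height)
    ]
    for word, size, (x, y), color in layout:
        # Text stays text in the SVG, so it scales without pixelation.
        parts.append(
            '<text x="%d" y="%d" font-size="%d" fill="%s">%s</text>'
            % (x, y, size, color, escape(word))
        )
    parts.append("</svg>")
    return "\n".join(parts)


# Hypothetical layout data for two words.
example_layout = [
    ("python", 40, (10, 50), "#336699"),
    ("R&D", 20, (10, 80), "black"),
]
svg_text = layout_to_svg(example_layout, 200, 100)
```

Writing `svg_text` to a file such as `my.svg` gives an input for the Inkscape conversion mentioned in this answer.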

You can save the images in a vector format so that they will be scalable without quality loss. Such formats are PDF and EPS. Just change the extension to .pdf or .eps and matplotlib will write the correct image format.
plt.savefig('destination_path.eps', format='eps')
plt.savefig('destination_path.pdf', format='pdf')
I have found that eps/pdf files work best.

text layout recognition with python

I'm trying to sort through several thousand scanned files and sort them into folders based on type (ie: if one of the files is a scanned copy of formA, then it should go in the formA folder, if it's a scanned copy of formB, then it should go in the formB folder, etc...). I feel like the best way to match the files and types is based on their text outlines, but am totally new to image processing, so if there's a better solution, then I'm all ears.
I'm working in python. Any ideas of a best way to do this? PIL? OpenCV? imageMagick?
Thanks in advance...

This library is probably of interest to you -
http://code.google.com/p/ocropus/
It's made by Googlers and lets you do OCR and layout analysis from Python.
I had some trouble installing it, but that was quite a while back, so things may have gotten fixed by now.

I don't know in what format you've got the scanned documents, but pdfminer can do layout analysis for pdf. I guess it would fit the bill for your purpose, provided you get the documents in somewhat decent pdf format (if you've just got "pure images", it won't do you any good)

Recommendations for a simple 2D graphics python library that can output to screen and pdf?

I'm looking for an easy-to-use graphics lib for python that can output to screen as well as pdf. So, I would use code to draw some stuff (simple prims like ovals, rectangles, lines and points) to screen and then when things look good, have it output to pdf.

If you use Tkinter, you can draw on a Canvas widget, then use its .postscript method to save the contents as a PostScript file, which you can convert to PDF using ps2pdf.
postscript(self, cnf={}, **kw)
Print the contents of the canvas to a postscript
file. Valid options: colormap, colormode, file, fontmap,
height, pageanchor, pageheight, pagewidth, pagex, pagey,
rotate, width, x, y.
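
A small sketch of that workflow, written for Python 3 (where the module is named `tkinter`). The helper names `save_postscript`, `ps2pdf_command`, and `demo` are invented for this example, and the conversion step assumes Ghostscript's `ps2pdf` is installed and on the PATH.

```python
import subprocess


def save_postscript(canvas, ps_path):
    """Dump the current contents of a Tkinter Canvas to a PostScript file."""
    canvas.update()  # make sure pending drawing operations are rendered
    canvas.postscript(file=ps_path, colormode="color")


def ps2pdf_command(ps_path, pdf_path):
    """Build the command line for ps2pdf (assumed to be on PATH)."""
    return ["ps2pdf", ps_path, pdf_path]


def demo():
    """Draw a few primitives on screen, then export them to PDF."""
    import tkinter as tk  # imported here so the helpers work without a display

    root = tk.Tk()
    canvas = tk.Canvas(root, width=300, height=200, bg="white")
    canvas.pack()
    canvas.create_oval(20, 20, 120, 100, outline="blue")
    canvas.create_rectangle(150, 30, 280, 110, outline="red")
    canvas.create_line(20, 150, 280, 180)
    save_postscript(canvas, "drawing.ps")
    subprocess.call(ps2pdf_command("drawing.ps", "drawing.pdf"))
    root.mainloop()
```

Running `demo()` opens a window with the shapes and writes `drawing.ps` and `drawing.pdf` next to the script.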

Matplotlib should be able to do it. See event handling here: http://matplotlib.sourceforge.net/examples/event_handling/index.html

You can use the Python Imaging Library for drawing images which can easily be displayed in various UIs, e.g. by displaying a jpg. Then, use ReportLab. Here's an example which shows how to use ReportLab with an image.
I'm not sure what you mean by drawing to "screen", i.e. if you're working with a specific UI toolkit. But if it's acceptable to draw and display PDFs without using an intermediate image (jpg, etc), then you might consider the PyX library, which makes it quite simple to do graphics with PDFs.

You could look into matplotlib, which is mainly for plotting, but you could probably do some basic drawing with it.
Then there is pygame. I'm not sure whether it can generate a PDF, but you can do 2D graphics with it.
There is also something called ReportLab that can generate PDFs. Here is a bunch of tutorials using it.

This is a tricky question, because there are so many libraries available, and there is a trade-off between beauty and ease of use.
What I've done, and it works great, is to produce the PostScript directly. It is not difficult at all, and you can preview it using Ghostview; converting to PDF is trivial (ps2pdf). Learning how to tell PostScript to create lines and circles is extremely simple.
If you want more extensibility, then go with Matplotlib, but beware of the many times when it will "decide for you what looks best" even if you don't like it.
Good luck.
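
To give an idea of how little is needed for the direct-PostScript route described in this answer, here is a minimal sketch that builds a one-page PostScript document from Python strings. The helper names (`line`, `circle`, `rectangle`, `postscript_document`) and the page size are assumptions made for this example; the PostScript operators used (`moveto`, `lineto`, `rlineto`, `arc`, `stroke`, `showpage`) are standard.

```python
def line(x1, y1, x2, y2):
    """PostScript for a straight line between two points."""
    return f"newpath {x1} {y1} moveto {x2} {y2} lineto stroke"


def circle(cx, cy, r):
    """PostScript for a circle outline: a full 0-360 degree arc."""
    return f"newpath {cx} {cy} {r} 0 360 arc stroke"


def rectangle(x, y, w, h):
    """PostScript for a rectangle outline with its lower-left corner at (x, y)."""
    return (
        f"newpath {x} {y} moveto {w} 0 rlineto 0 {h} rlineto "
        f"{-w} 0 rlineto closepath stroke"
    )


def postscript_document(commands, width=300, height=200):
    """Wrap drawing commands in a minimal single-page PostScript document."""
    header = ["%!PS-Adobe-3.0", f"%%BoundingBox: 0 0 {width} {height}"]
    return "\n".join(header + list(commands) + ["showpage", ""])


doc = postscript_document([
    line(10, 10, 100, 100),
    circle(150, 100, 40),
    rectangle(200, 20, 80, 50),
])
```

Writing `doc` to a file such as `drawing.ps` lets you preview it in Ghostview and convert it with `ps2pdf drawing.ps drawing.pdf`. Note that PostScript coordinates start at the bottom-left corner of the page, unlike most screen toolkits.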

Creating PDFs is always a pain; it doesn't make sense unless you are aiming to lose your sanity.
With that said, you are aiming to do two completely different things: when you draw to the screen you draw into a raster bitmap, while PDFs are mostly dynamic, like HTML (although, unlike HTML, they are more likely to look the same across platforms, but that's beside the point).
If you really want to do that, the solution might be finding something that outputs PDFs, and then showing the generated PDF on screen at every step.
I guess that's the only way to have WYSIWYG results.
