Printing Graphics in Python - python

I need to print "Wheel Tags" from python. Wheel tags will have images, lines, and text.
The Python tutorial has two paragraphs about creating PostScript files with the imaging library. After reading it, I still do not know how to lay out the data. I was hoping someone might have samples showing how to lay out the images, text, and lines.
Thanks for any help.

See http://effbot.org/imagingbook/psdraw.htm
Note that:
the PSDraw module does not appear to have been actively maintained since 2005; I would guess that most of the effort has been redirected into supporting the PDF format instead. You might be happier using pypdf instead;
it has comments like '# FIXME: incomplete' and 'NOT YET IMPLEMENTED' in the source
it does not appear to have any method of setting the page size - which as I recall means it defaults to A4 (8.27 x 11.69 inches)
all measurements are in points, at 72 points per inch.
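As a quick check of the point arithmetic, here is a minimal sketch. The helper names are illustrative and are not part of PSDraw; the only facts assumed are 72 points per inch and 25.4 mm per inch.

```python
# Hypothetical helpers (not part of PSDraw): convert physical units to
# PostScript points, where 1 inch = 72 points and 1 inch = 25.4 mm.
POINTS_PER_INCH = 72.0
MM_PER_INCH = 25.4


def inches_to_points(inches):
    """Convert inches to points."""
    return inches * POINTS_PER_INCH


def mm_to_points(mm):
    """Convert millimetres to points by way of inches."""
    return mm / MM_PER_INCH * POINTS_PER_INCH


# A4 is 210 mm x 297 mm, which comes to roughly 595 x 842 points.
a4_width = mm_to_points(210)
a4_height = mm_to_points(297)
print(round(a4_width), round(a4_height))
```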
You will need to do something like:
from PIL import Image, PSDraw  # with the original PIL: import Image, PSDraw
# fns for measurement conversion
PTS = lambda x: 1.00 * x # points
INS = lambda x: 72.00 * x # inches-to-points
CMS = lambda x: 28.35 * x # centimeters-to-points
outputFile = 'myfilename.ps'
outputFileTitle = 'Wheel Tag 36147'
myf = open(outputFile, 'wb')  # Pillow's PSDraw writes bytes; the original PIL used 'w'
ps = PSDraw.PSDraw(myf)
ps.begin_document(outputFileTitle)
ps is now a PSDraw object which will write PostScript to the specified file, and the document header has been written - you are ready to start drawing stuff.
To add an image:
im = Image.open("myimage.jpg")
box = ( # bounding-box for positioning on page
    INS(1), # left
    INS(1), # bottom (PostScript's origin is the bottom-left corner)
    INS(3), # right
    INS(3) # top
)
dpi = 300 # desired on-page resolution
ps.image(box, im, dpi)
To add text:
ps.setfont("Helvetica", PTS(12)) # PostScript fonts only -
# must be one which your printer has available
loc = ( # where to put the text?
    INS(1), # horizontal value - I do not know whether it is left- or middle-aligned
    INS(3.25) # vertical value - I do not know whether it is top- or bottom-aligned
)
ps.text(loc, "Here is some text")
To add a line:
lineFrom = ( INS(4), INS(1) )
lineTo = ( INS(4), INS(9) )
ps.line( lineFrom, lineTo )
... and I don't see any options for changing stroke weight.
When you are finished, you have to close the file off like:
ps.end_document()
myf.close()
Edit: I was doing a bit of reading on setting stroke weights, and I ran across a different module, psfile: http://seehuhn.de/pages/psfile#sec:2.0.0 The module itself looks pretty minimal - he's writing a lot of raw postscript - but it should give you a better idea of what's going on behind the scenes.
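To give an idea of what "raw PostScript" looks like, here is a minimal sketch that builds a one-page program as a string, including a stroke weight set with the PostScript setlinewidth operator. The function names are made up for this illustration and do not come from PSDraw or psfile. The PostScript `show` operator starts the text at the current point, on the baseline.

```python
def ps_escape(text):
    """Escape the characters that are special inside a PostScript string."""
    return text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")


def build_tag_postscript(text, line_width=2.0):
    """Return a one-page PostScript program with one text run and one line.

    Hypothetical helper for illustration only.
    """
    inch = 72  # PostScript user space defaults to 72 units per inch
    commands = [
        "%!PS-Adobe-3.0",
        "/Helvetica findfont 12 scalefont setfont",
        # The origin is the bottom-left corner of the page.
        f"{1 * inch} {3 * inch} moveto ({ps_escape(text)}) show",
        # Stroke weight, in points, for every path stroked after this.
        f"{line_width} setlinewidth",
        f"newpath {4 * inch} {1 * inch} moveto {4 * inch} {9 * inch} lineto stroke",
        "showpage",
    ]
    return "\n".join(commands) + "\n"


program = build_tag_postscript("Wheel Tag (36147)", line_width=3)
print(program)
```

Writing `program` to a .ps file should give something a PostScript printer or viewer can open, assuming the Helvetica font is available there.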

I would recommend the open source library Reportlab for this sort of task.
It is very simple to use and outputs directly to PDF format.
A very simple example from the official documentation:
from reportlab.pdfgen import canvas
def hello(c):
    c.drawString(100,100,"Hello World")
c = canvas.Canvas("hello.pdf")
hello(c)
c.showPage()
c.save()
As long as PIL is installed, it is also very easy to add images to your page:
canvas.drawImage(self, image, x,y, width=None,height=None,mask=None)
where "image" is either a PIL Image object, or the filename of the image you wish to use.
Plenty of examples in the documentation also.

Related

How to trim (crop) bottom whitespace of a PDF document, in memory

I am using wkhtmltopdf to render a (Django-templated) HTML document to a single-page PDF file. I would like to either render it immediately with the correct height (which I've failed to do so far) or render it incorrectly and trim it. I'm using Python.
Attempt type 1:
wkhtmltopdf render to a very, very long single-page PDF with a lot of extra space using --page-height
Use pdfCropMargins to trim: crop(["-p4", "100", "0", "100", "100", "-a4", "0", "-28", "0", "0", "input.pdf"])
The PDF is rendered perfectly with 28 units of margin at the bottom, but I had to use the filesystem to execute the crop command. It seems that the tool expects an input file and output file, and also creates temporary files midway through. So I can't use it.
Attempt type 2:
wkhtmltopdf render to multi-page PDF with default parameters
Use PyPDF4 (or PyPDF2) to read the file and combine pages into a long, single page
The PDF is rendered fine-ish in most cases, however, sometimes a lot of extra white space can be seen on the bottom if by chance the last PDF page had very little content.
Ideal scenario:
The ideal scenario would involve a function that takes HTML and renders it into a single-page PDF with the expected amount of white space at the bottom. I would be happy with rendering the PDF using wkhtmltopdf, since it returns bytes, and later processing these bytes to remove any extra white space. But I don't want to involve the file system in this, as instead, I want to perform all operations in memory. Perhaps I can somehow inspect the PDF directly and remove the white space manually, or do some HTML magic to determine the render height before-hand?
What am I doing now:
Note that pdfkit is a wkhtmltopdf wrapper
# This is not a valid HTML (includes Django-specific stuff)
template: Template = get_template("some-django-template.html")
# This is now valid HTML
rendered = template.render({
    "foo": "bar",
})
# This first renders PDF from HTML normally (multiple pages)
# Then counts how many pages were created and determines the required single-page height
# Then renders a single-page PDF from HTML using the page height and width arguments
return pdfkit.from_string(rendered, options={
    "page-height": f"{297 * PdfFileReader(BytesIO(pdfkit.from_string(rendered))).getNumPages()}mm",
    "page-width": "210mm"
})
It's equivalent to Attempt type 2, except I don't use PyPDF4 here to stitch the pages together, but instead render again with wkhtmltopdf using a precomputed page height.
There might be better ways to do this, but this at least works.
I'm assuming that you are able to crop the PDF yourself, and all I'm doing here is determining how far down on the last page you still have content. If that assumption is wrong, I could probably figure out how to crop the PDF. Or otherwise, just crop the image (easy in Pillow) and then convert that to PDF?
Also, if you have one big PDF, you might need to figure out how far down the whole PDF the text ends. I'm just finding out how far down the last page the content ends, but converting from one to the other is simple arithmetic.
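That arithmetic could look like the sketch below. It assumes every page has the same height (297 mm, as in the question's A4 rendering); the function name and parameters are made up for illustration.

```python
def content_height_mm(page_count, last_page_percentage, page_height_mm=297.0):
    """Height of the content measured from the top of the first page.

    page_count: number of pages in the multi-page rendering.
    last_page_percentage: how far down the last page the content ends (0-100).
    Assumes all pages share the same height.
    """
    full_pages = page_count - 1
    return full_pages * page_height_mm + page_height_mm * last_page_percentage / 100.0


# Three A4 pages, with content ending 40% of the way down the last one.
print(content_height_mm(3, 40.0))
```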
Tested code:
import pdfkit
from PyPDF2 import PdfFileReader
from io import BytesIO
# This library isn't named fitz on pypi,
# obtain this library with `pip install PyMuPDF==1.19.4`
import fitz
# `pip install Pillow==8.3.1`
from PIL import Image
import numpy as np
# However you arrive at valid HTML, it makes no difference to the solution.
rendered = "<html><head></head><body><h3>Hello World</h3><p>hello</p></body></html>"
# This first renders PDF from HTML normally (multiple pages)
# Then counts how many pages were created and determines the required single-page height
# Then renders a single-page PDF from HTML using the page height and width arguments
pdf_bytes = pdfkit.from_string(rendered, options={
"page-height": f"{297 * PdfFileReader(BytesIO(pdfkit.from_string(rendered))).getNumPages()}mm",
"page-width": "210mm"
})
# convert the pdf into an image.
pdf = fitz.open(stream=pdf_bytes, filetype="pdf")
last_page = pdf[pdf.page_count - 1]  # pageCount is the older, deprecated spelling
matrix = fitz.Matrix(1, 1)
image_pixels = last_page.get_pixmap(matrix=matrix, colorspace="GRAY")
image = Image.frombytes("L", [image_pixels.width, image_pixels.height], image_pixels.samples)
#Uncomment if you want to see.
#image.show()
# Now figure out where the end of the text is:
# First binarize. This might not be the most efficient way to do this.
# But it's how I do it.
THRESHOLD = 100
# I wrote this code ages ago and don't remember the details but
# basically, we treat every pixel > 100 as a white pixel,
# We convert the result to a true/false matrix
# And then invert that.
# The upshot is that, at the end, a value of "True"
# in the matrix will represent a black pixel in that location.
binary_matrix = np.logical_not(image.point( lambda p: 255 if p > THRESHOLD else 0 ).convert("1"))
# Now find the lowest row that contains a black pixel, scanning up from the bottom
row_count, column_count = binary_matrix.shape
last_row = 0
for i, row in enumerate(reversed(binary_matrix)):
    if any(row):
        last_row = i
        break
    else:
        continue
percentage_from_top = (1 - last_row / row_count) * 100
print(percentage_from_top)
# Now you know where the page ends.
# Go back and crop the PDF accordingly.
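The crop rectangle itself can be worked out from that percentage. The sketch below is only the arithmetic, with a made-up function name; it assumes a top-left origin and keeps 28 units of bottom margin, as in the question. How the rectangle is applied depends on the PDF library, so check the crop-box method of whichever one you use.

```python
def crop_rect(page_width, page_height, percentage_from_top, bottom_margin=28.0):
    """Return (x0, y0, x1, y1) for the region to keep, with a top-left origin.

    Hypothetical helper: the bottom edge sits bottom_margin units below the
    last row of content, but never below the original page edge.
    """
    content_bottom = page_height * percentage_from_top / 100.0
    new_bottom = min(float(page_height), content_bottom + bottom_margin)
    return (0.0, 0.0, float(page_width), new_bottom)


# An A4-sized page in points, with content ending halfway down.
print(crop_rect(595, 842, 50.0))
```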

Matplotlib scaling an image when saving a converted dxf as PDF

I am currently converting a dxf drawing that I made, to a pdf drawing using the function described here: Python converting DXF files to PDF or PNG or JPEG. (I am also putting the code below)
The problem is, that when I convert to the pdf, the code automatically scales the drawing to make it fit to a certain size. Now I need to either turn this off, or have a way of knowing what the scaling factor that it used was.
The complete code is as follows:
import matplotlib.pyplot as plt
import ezdxf
from ezdxf.addons.drawing import RenderContext, Frontend
from ezdxf.addons.drawing.matplotlib import MatplotlibBackend
# import wx
import glob
import re
class DXF2IMG(object):
    default_img_format = '.png'
    default_img_res = 300
    def convert_dxf2img(self, names, img_format=default_img_format, img_res=default_img_res):
        for name in names:
            doc = ezdxf.readfile(name)
            msp = doc.modelspace()
            # Recommended: audit & repair DXF document before rendering
            auditor = doc.audit()
            # The auditor.errors attribute stores severe errors,
            # which *may* raise exceptions when rendering.
            if len(auditor.errors) != 0:
                raise Exception("The DXF document is damaged and can't be converted!")
            else :
                fig = plt.figure()
                ax = fig.add_axes([0, 0, 1, 1])
                ctx = RenderContext(doc)
                ctx.set_current_layout(msp)
                ctx.current_layout.set_colors(bg='#FFFFFF')
                out = MatplotlibBackend(ax)
                Frontend(ctx, out).draw_layout(msp, finalize=True)
                img_name = re.findall(r"(\S+)\.",name) # select the image name that is the same as the dxf file name
                first_param = ''.join(img_name) + img_format #concatenate list and string
                fig.savefig(first_param, dpi=img_res)
if __name__ == '__main__':
    first = DXF2IMG()
    first.convert_dxf2img(['test.DXF'],img_format='.pdf')
From the github discussion thread: https://github.com/mozman/ezdxf/discussions/357
This can be solved in a way that isn't specific to ezdxf by carefully setting the figure size before saving. Matplotlib is quite complex when it comes to measurements. I have a solution which seems to work well but there may be slight inaccuracies since the calculations are done using floating point numbers but at the end of the day pixels are a discrete measurement so it's probably possible to be at least 1 pixel off. There are probably lots of other things like line widths which have an effect. ... you can calculate the desired figure size by specifying a desired units_to_pixels conversion factor and scaling the figure size so that the data spans the correct number of pixels. This assumes that the figure aspect ratio is already correct as my solution uses the same scale factor for width and height.
There's an extended workaround at the page I linked. Rather than copy-paste it here, I think the whole response is worth reading.
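The core of the figure-size calculation described in that quote can be sketched as follows. This is not the code from the linked thread, only the arithmetic as I read it; the function name is made up, and it assumes the axes fill the whole figure (as with fig.add_axes([0, 0, 1, 1])) and that the axis limits match the drawing's extent.

```python
def figure_size_inches(data_width, data_height, units_to_pixels, dpi):
    """Figure size (in inches) at which one drawing unit spans units_to_pixels pixels.

    Hypothetical helper: the same scale factor is used for width and height,
    so the figure keeps the aspect ratio of the data.
    """
    width_px = data_width * units_to_pixels
    height_px = data_height * units_to_pixels
    return width_px / dpi, height_px / dpi


# A 420 x 297 unit drawing, 10 pixels per unit, saved at 300 dpi.
size = figure_size_inches(420, 297, units_to_pixels=10, dpi=300)
print(size)
```

The result would be passed to fig.set_size_inches(*size) before calling fig.savefig with the same dpi.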

Programming a picture maker template in Python possible?

I'm looking for a library that enables to "create pictures" (or even videos) with the following functions:
Accepting picture inputs
Resizing said inputs to fit given template / scheme
Positioning the pictures in pre-set up layers or coordinates
A rather schematic approach to look at this:
whereas the red spots are supposed to represent e.g. text, picture (or if possible video) elements.
The end goal would be to give the .py script multiple input pictures and the .py creating a finished version like mentioned above.
Solutions I tried were looking into Python PIL, but I wasn't able to find what I was looking for.
Yes, it is possible to do this with Python.
The library you are looking for is OpenCV (https://opencv.org/).
Some basic OpenCV python tutorials (https://docs.opencv.org/master/d9/df8/tutorial_root.html).
1) You can use imread() function to read images from files.
2) You can use resize() function to resize the images.
3) You can create an empty master NumPy array matching the size and color depth of the black rectangle in the figure you have shown, resize your image, and copy its contents into the empty array starting from the position you want.
Below is sample code that does something close to what you might need; you can modify it to suit your actual needs. (Since your requirements are not clear, I have written the code so that it can at least guide you.)
import numpy as np
import cv2
import matplotlib.pyplot as plt
# You can store most of these values in another file and load them.
# You can modify this to set the dimensions of the background image.
BG_IMAGE_WIDTH = 100
BG_IMAGE_HEIGHT = 100
BG_IMAGE_COLOR_DEPTH = 3
# This will act as the black bounding box you have shown in your figure.
# You can also load another image instead of creating empty background image.
empty_background_image = np.zeros(
    (BG_IMAGE_HEIGHT, BG_IMAGE_WIDTH, BG_IMAGE_COLOR_DEPTH),
    dtype=np.uint8  # np.int was removed from NumPy; uint8 matches what cv2.imread returns
)
# Loading an image.
# This will be copied later into one of those red boxes you have shown.
IMAGE_PATH = "./image1.jpg"
foreground_image = cv2.imread(IMAGE_PATH)
# Setting the resize target and top left position with respect to bg image.
X_POS = 4
Y_POS = 10
RESIZE_TARGET_WIDTH = 30
RESIZE_TARGET_HEIGHT = 30
# Resizing
foreground_image = cv2.resize(
    src=foreground_image,
    dsize=(RESIZE_TARGET_WIDTH, RESIZE_TARGET_HEIGHT),
)
# Copying this into background image
empty_background_image[
    Y_POS: Y_POS + RESIZE_TARGET_HEIGHT,
    X_POS: X_POS + RESIZE_TARGET_WIDTH
] = foreground_image
# cv2.imread returns BGR; reverse the channel axis so matplotlib shows RGB
plt.imshow(empty_background_image[..., ::-1])
plt.show()
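The code above stretches the image to exactly 30 x 30. If the picture should instead keep its proportions inside a template slot, the target size and offset can be computed first. This is a sketch with a made-up helper name, not an OpenCV function; the resulting width and height would be passed to cv2.resize as dsize.

```python
def fit_within(src_width, src_height, slot_width, slot_height):
    """Largest size that fits the slot while keeping the aspect ratio.

    Returns (new_width, new_height, x_offset, y_offset), where the offsets
    center the resized picture inside the slot. Hypothetical helper.
    """
    scale = min(slot_width / src_width, slot_height / src_height)
    new_width = max(1, round(src_width * scale))
    new_height = max(1, round(src_height * scale))
    x_offset = (slot_width - new_width) // 2
    y_offset = (slot_height - new_height) // 2
    return new_width, new_height, x_offset, y_offset


# A 400 x 200 picture placed into a 30 x 30 slot.
print(fit_within(400, 200, 30, 30))
```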

Which is the best way to make a report in PDF with more than 100 plots with Python?

I need to have a report in PDF with a lot of plots. Most of them will be created with matplotlib within a loop, but I would need also to include pandas plots and dataframes (the whole view) and seaborn plots. Right now I have explored the following solutions:
PythonTex. I have already used it for other projects, but it would consume a lot of time because you have to write \pythontexprint for each plot you want to display.
Use savefig command in every iteration of the loop and save all the plots as image for inserting all in Latex later. That would be very time consuming choice too. Other option is with that command save the plots as pdf and then merge all the pdfs. That would create an ugly report since the plots are not going to fit the whole page.
Use RStudio with reticulate for creating a Markdown report. The problem here is that I would need to learn reticulate functionality, thus spending time.
As far as I know, PyPDF does not fit my needs.
Create a jupyter notebook and then try to export it to a PDF. Once again, I do not know how to use jupyter notebook and I read that I would have to convert first to html and then to pdf.
Solutions from here: Generating Reports with Python: PDF or HTML to PDF. However, that question is from three years ago and there might be better options nowadays.
So my question is the following: is there any easy and quick way of getting all those plots (if it is along the code which generates them even better) in a PDF with a decent aspect?
My recommendation would be to use matplotlib's savefig to write each plot to a BytesIO buffer (or collect the buffers in a list or similar data structure for 100 plots). Then you can use those image buffers to insert the images into a PDF using a library like reportlab. I regularly use this approach to create PowerPoint documents with the python-pptx library, and I have also verified it for PDF with reportlab. The reportlab library is very powerful and a bit "low level", so there might be a little learning curve getting started, but it surely meets your needs. reportlab is BSD licensed and available via pip and conda.
Anyway, my code snippet is below.
Sorry it's a bit long, but it includes some helper functions to print text and dummy images. You should be able to copy/paste it directly.
The code will yield a PDF with a heading, two plots, and a line of explanatory text under each plot.
import io
from reportlab.lib.pagesizes import letter
from reportlab.platypus import SimpleDocTemplate, Paragraph, Spacer, Image
from reportlab.lib.styles import getSampleStyleSheet
from reportlab.lib.units import inch
import numpy as np
import matplotlib.pyplot as plt
def plot_hist():
    """ Create a sample histogram plot and return a bytesio buffer with plot
    Returns
    -------
    BytesIO : in memory buffer with plot image, can be passed to reportlab or elsewhere
    """
    # from https://matplotlib.org/gallery/lines_bars_and_markers/scatter_masked.html#sphx-glr-gallery-lines-bars-and-markers-scatter-masked-py
    plt.figure(figsize=(7, 2.25))
    N = 100
    r0 = 0.6
    x = 0.9 * np.random.rand(N)
    y = 0.9 * np.random.rand(N)
    area = (20 * np.random.rand(N))**2 # 0 to 10 point radii
    c = np.sqrt(area)
    r = np.sqrt(x * x + y * y)
    area1 = np.ma.masked_where(r < r0, area)
    area2 = np.ma.masked_where(r >= r0, area)
    plt.scatter(x, y, s=area1, marker='^', c=c)
    plt.scatter(x, y, s=area2, marker='o', c=c)
    # Show the boundary between the regions:
    theta = np.arange(0, np.pi / 2, 0.01)
    plt.plot(r0 * np.cos(theta), r0 * np.sin(theta))
    # create buffer and save image to buffer
    # dpi should match the dpi of your PDF; I think 300 is typical, otherwise it won't look good
    buf = io.BytesIO()
    plt.savefig(buf, format='png', dpi=300)
    buf.seek(0)
    # you'll want to close the figure once it's saved to the buffer
    plt.close()
    return buf
def add_text(text, style="Normal", fontsize=12):
    """ Adds text with some spacing around it to PDF report
    Parameters
    ----------
    text : str
        The string to print to PDF
    style : str
        The reportlab style
    fontsize : int
        The fontsize for the text
    """
    Story.append(Spacer(1, 12))
    ptext = "<font size={}>{}</font>".format(fontsize, text)
    Story.append(Paragraph(ptext, styles[style]))
    Story.append(Spacer(1, 12))
# Use basic styles and the SimpleDocTemplate to get started with reportlab
styles=getSampleStyleSheet()
doc = SimpleDocTemplate("form_letter.pdf",pagesize=letter,
rightMargin=inch/2,leftMargin=inch/2,
topMargin=72,bottomMargin=18)
# The "story" just holds "instructions" on how to build the PDF
Story=[]
add_text("My Report", style="Heading1", fontsize=24)
# See plot_hist for information on how to get BytesIO object of matplotlib plot
# This code uses the reportlab Image function to add any valid PIL input to the report
image_buffer1 = plot_hist()
im = Image(image_buffer1, 7*inch, 2.25*inch)
Story.append(im)
add_text("This text explains something about the chart.")
image_buffer2 = plot_hist()
im = Image(image_buffer2, 7*inch, 2.25*inch)
Story.append(im)
add_text("This text explains something else about another chart.")
# This command will actually build the PDF
doc.build(Story)
# should close open buffers, can use a "with" statement in python to do this for you
# if that works better
image_buffer1.close()
image_buffer2.close()
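With 100 plots, closing each buffer by hand gets tedious. One way to do the "with" variant mentioned in the comment above is contextlib.ExitStack, sketched below. Here render_plot is a hypothetical stand-in for plot_hist() that writes placeholder bytes, so the example runs without matplotlib or reportlab.

```python
import io
from contextlib import ExitStack


def render_plot(index):
    """Hypothetical stand-in for plot_hist(): returns a BytesIO with image bytes."""
    buf = io.BytesIO()
    buf.write(f"fake image {index}".encode("ascii"))
    buf.seek(0)
    return buf


with ExitStack() as stack:
    # enter_context() registers each buffer so it is closed when the block exits.
    buffers = [stack.enter_context(render_plot(i)) for i in range(100)]
    # Build the report here, while the buffers are still open.
    sizes = [len(buf.getvalue()) for buf in buffers]

all_closed = all(buf.closed for buf in buffers)
print(len(buffers), all_closed)
```

In the real report, the doc.build(Story) call would go inside the with block, since the images are read from the buffers while the PDF is being built.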

ghostscript or python : how to combine pdf of different page sizes into a pdf of same page sizes?

I searched Stack Overflow for this problem. The nearest questions are:
How to set custom page size with Ghostscript
How to convert multiple, different-sized PostScript files to a single PDF?
But these did NOT solve my problem.
The question is plain and simple.
How can we combine multiple PDFs (with different page sizes) into a single PDF in which all pages have the same size?
Example:
two input pdfs are:
hw1.pdf with single page of size 5.43x3.26 inch (found from adobe reader)
hw6.pdf with single page of size 5.43x6.51 inch
The pdfs can be found here:
https://github.com/bhishanpdl/Questions
The code is:
gs -sDEVICE=pdfwrite -r720 -g2347x3909 -dPDFFitPage -o homeworks.pdf hw1.pdf hw6.pdf
PROBLEM: First pdf is portrait, and second page is landscape.
QUESTION: How can we make both pages portrait ?
NOTE:
-r720 is pixels/inch.
The size -g2347x3909 is found using this Python script:
import numpy as np
wd = int(np.floor(720 * 5.43))
ht = int(np.floor(720 * 3.26))
gsize = '-g' + str(ht) + 'x' + str(wd) + ' '
# this gives: gsize = -g2347x3909
Another Attempt
import subprocess
commands = 'gs -o homeworks.pdf -sDEVICE=pdfwrite -dDEVICEWIDTHPOINTS=674 ' +\
           ' -dDEVICEHEIGHTPOINTS=912 -dPDFFitPage ' +\
           'hw1.pdf hw6.pdf'
subprocess.call(commands, shell=True)
This makes both pages portrait, but they do not have the same size.
The first page is smaller in size, and the second is full size, when I open the output in Adobe Reader.
In general, how can we make all the pages the same size?
The reason (in the first example) that one of the pages is rotated is because it fits better that way round. Because Ghostscript is primarily intended as print software, the assumption is that you want to print the input. If the output is to fixed media size, page fitting is requested, and the requested media size fits better (ie with less scaling) when rotated, then the content will be rotated.
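The "fits better when rotated" idea can be illustrated with a simplified model. This is not Ghostscript's actual procedure, only a sketch of the comparison: compute the uniform scale needed to fit the page into the media both ways and prefer the orientation that shrinks the content less. The function names are made up for this example.

```python
def fit_scale(page_w, page_h, media_w, media_h):
    """Uniform scale factor that makes the page fit inside the media."""
    return min(media_w / page_w, media_h / page_h)


def prefers_rotation(page_w, page_h, media_w, media_h):
    """True if the page needs less shrinking when turned by 90 degrees.

    Simplified illustration only; the real logic lives in pdf_main.ps.
    """
    unrotated = fit_scale(page_w, page_h, media_w, media_h)
    rotated = fit_scale(page_h, page_w, media_w, media_h)
    return rotated > unrotated


# -g2347x3909 at -r720 is a 3.26 x 5.43 inch medium.
media = (3.26, 5.43)
hw1_rotates = prefers_rotation(5.43, 3.26, *media)  # 5.43 x 3.26 inch page
hw6_rotates = prefers_rotation(5.43, 6.51, *media)  # 5.43 x 6.51 inch page
print(hw1_rotates, hw6_rotates)
```

Under this model the 5.43 x 3.26 inch page fits the medium exactly once rotated, while the 5.43 x 6.51 inch page fits better as it is, so only one of the two gets turned.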
In order to prevent that, you would need to rewrite the FitPage procedure, which is defined in /ghostpdl/Resource/Init/pdf_main.ps in the procedure pdf_PDF2PS_matrix. You can modify that procedure so that it does not rotate the page for a better fit.
In the second case you haven't set -dFIXEDMEDIA (-g implies -dFIXEDMEDIA, -dDEVICE...POINTS does not), so the media size requests in the PDF files will override the media size you set on the command line. Which is why the pages are not resized. Since the media is then the size requested by the PDF file, the page will fit without modification, thus -dPDFFitPage will do nothing. So you need to set -dFIXEDMEDIA if you use -dDEVICE...POINTS and any of the FitPage switches.
You would be better advised (as in your second attempt) to use -dDEVICEWIDTHPOINTS and -dDEVICEHEIGHTPOINTS to set the media size, since these are not dependent on the resolution (unlike -g), which can be overridden by PostScript input programs. You should not meddle with the resolution without a good reason, so don't set -r720.
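Put together, the advice above amounts to a command like the one built below. This is a sketch that only assembles and prints the argument list; the helper name is made up, and the 674 x 912 point media size is simply the one from the question's second attempt. It uses only switches already mentioned in this thread.

```python
import shlex


def build_gs_command(output, inputs, width_points, height_points):
    """Assemble a Ghostscript command that fits every page to one fixed media size.

    Hypothetical helper: -dFIXEDMEDIA stops the input files' own page sizes
    from overriding the size given on the command line.
    """
    return [
        "gs",
        "-o", output,
        "-sDEVICE=pdfwrite",
        f"-dDEVICEWIDTHPOINTS={width_points}",
        f"-dDEVICEHEIGHTPOINTS={height_points}",
        "-dFIXEDMEDIA",
        "-dPDFFitPage",
        *inputs,
    ]


command = build_gs_command("homeworks.pdf", ["hw1.pdf", "hw6.pdf"], 674, 912)
print(shlex.join(command))
```

Passing the list to subprocess.run(command, check=True) avoids the need for shell=True. This covers the page size only; the automatic rotation is a separate matter.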
Please be aware that this process does not 'merge', 'combine' or anything else which implies that the content of the input is unchanged in the output. You should read the documentation on the subject and understand the process before attempting to use this procedure.
You have tagged this question "ghostscript" but I assume by your use of subprocess.call() that you are not averse to using Python.
The pagemerge canvas of the pdfrw Python library can do this. There are some examples of dealing with different sized pages in the examples directory and at the source of pagemerge.py. The fancy_watermark.py shows an example of dealing with different page sizes, in the context of applying watermarks.
pdfrw can rotate, scale, or simply position source pages on the output. If you want rotation or scaling, you can look in the examples directory. (Since this is for homework, for extra credit you can control the scaling and rotation by looking at the various page sizes. :) But if all you want is the second page to be extended to be as long as the first, you could do that with this bit of code:
from pdfrw import PdfReader, PdfWriter, PageMerge
pages = PdfReader('hw1.pdf').pages + PdfReader('hw6.pdf').pages
output = PdfWriter()
rects = [[float(num) for num in page.MediaBox] for page in pages]
height = max(x[3] - x[1] for x in rects)
width = max(x[2] - x[0] for x in rects)
mbox = [0, 0, width, height]
for page in pages:
    newpage = PageMerge()
    newpage.mbox = mbox # Set boundaries of output page
    newpage.add(page) # Add one old page to new page
    image = newpage[0] # Get image of old page (first item)
    image.x = (width - image.w) / 2 # Center old page left/right
    image.y = (height - image.h) # Move old page to top of output page
    output.addpage(newpage.render())
output.write('homeworks.pdf')
(Disclaimer: I am the primary pdfrw author.)
