I'm writing a CAD application that outputs PDF files using the Cairo graphics library. A lot of the unit testing does not require actually generating the PDF files, such as computing the expected bounding boxes of the objects. However, I want to make sure that the generated PDF files "look" correct after I change the code. Is there an automated way to do this? How can I automate as much as possible? Do I need to visually inspect each generated PDF? How can I solve this problem without pulling my hair out?
(See also update below!)
I'm doing the same thing using a shell script on Linux that wraps
ImageMagick's compare command
the pdftk utility
Ghostscript (optionally)
(It would be rather easy to port this to a .bat Batch file for DOS/Windows.)
I have a few reference PDFs created by my application which are "known good". Newly generated PDFs after code changes are compared to these reference PDFs. The comparison is done pixel by pixel and is saved as a new PDF. In this PDF, all unchanged pixels are painted in white, while all differing pixels are painted in red.
Here are the building blocks:
pdftk
Use this command to split multipage PDF files into multiple singlepage PDFs:
pdftk reference.pdf burst output somewhere/reference_page_%03d.pdf
pdftk comparison.pdf burst output somewhere/comparison_page_%03d.pdf
compare
Use this command to create a "diff" PDF page for each of the pages:
compare \
-verbose \
-debug coder -log "%u %m:%l %e" \
somewhere/reference_page_001.pdf \
somewhere/comparison_page_001.pdf \
-compose src \
somewhereelse/reference_diff_page_001.pdf
Ghostscript
Because of automatically inserted meta data (such as the current date+time), PDF output is not working well for MD5hash-based file comparisons.
If you want to automatically discover all cases which consist of purely white pages, you could also convert to a meta-data free bitmap format using the bmp256 output device. You can do that for the original PDFs (reference and comparison), or for the diff-PDF pages:
gs \
-o reference_diff_page_001.bmp \
-r72 \
-g595x842 \
-sDEVICE=bmp256 \
reference_diff_page_001.pdf
md5sum reference_diff_page_001.bmp
If the MD5sum is what you expect for an all-white page of 595x842 PostScript points, then your unit test passed.
Update:
I don't know why I didn't previously think of generating a histogram output from the ImageMagick compare...
The following is a command pipeline chaining 2 different commands:
the first one is the same as the above compare which generates the 'white pixels are equal, red pixels are differences'-format, only it outputs the ImageMagick internal miff format. It doesn't write to a file, but to stdout.
the second one uses convert to read stdin, generate a histogram and output the result in text form. There will be two lines:
one indicating the number of white pixels
the other one indicating the number of red pixels.
Here it goes:
compare \
reference.pdf \
current.pdf \
-compose src \
miff:- \
| \
convert \
- \
-define histogram:unique-colors=true \
-format %c \
histogram:info:-
Sample output:
56934: (61937, 0, 7710,52428) #F1F100001E1ECCCC srgba(241,0,30,0.8)
444056: (65535,65535,65535,52428) #FFFFFFFFFFFFCCCC srgba(255,255,255,0.8)
(Sample output was generated by using these reference.pdf and current.pdf files.)
I think this type of output is really well suited for automatic unit testing. If you evaluate the two numbers, you can easily compute the "red pixel" percentage and you could even decide to return PASSED or FAILED based on a certain threshold (if you don't necessarily need "zero red" for some reason).
You could capture the PDF as a bitmap (or at least a losslessly-compressed) image, and then compare the image generated by each test with a reference image of what it's supposed to look like. Any differences would be flagged as an error for the test.
The first idea that pops in my head is to use a diff utility. These are generally used to compare texts of documents but they might also compare the layout of the PDF. Using it, you can compare the expected output with the output supplied.
The first result google gives me is this. Altough it is commercial, there might be other free/open source alternatives.
I would try this using xpresser - (https://wiki.ubuntu.com/Xpresser ) You can try to match images to similar images not exact copies - which is the problem in these cases.
I don't know if xpresser is being ctively developed, or if it can be used with stand alone image files (I think so) -- anyway it takes its ideas from teh Sikuli project (which is Java with a Jython front end, while xpresser is Python).
I wrote a tool in Python to validate PDFs for my employer's documentation. It has the capability to compare individual pages to master images. I used a library I found called swftools to export the page to PNG, then used the Python Imaging Library to compare it with the master.
The relevant code looks something like this (this won't run as there are some dependencies on other parts of the script, but you should get the idea):
# exporting
gfxpdf = gfx.open("pdf", self.pdfpath)
if os.path.isfile(pngPath):
os.remove(pngPath)
page = gfxpdf.getPage(pagenum)
img = gfx.ImageList()
img.startpage(page.width, page.height)
page.render(img)
img.endpage()
img.save(pngPath)
return os.path.isfile(pngPath)
# comparing
outPng = os.path.join(outpath, pngname)
masterPng = os.path.join(outpath, "_master", pngname)
if os.path.isfile(masterPng):
output = Image.open(outPng).convert("RGB") # discard alpha channel, if any
master = Image.open(masterPng).convert("RGB")
mismatch = any(x[1] for x in ImageChops.difference(output, master).getextrema())
Related
I recently started learning Laravel, and currently watching an online course. Online courses are fine, but I like to have local copies, so I'm trying to download/merge segmented audio from Laracasts: Laravel 8 From Scratch series.
I've written some scripts (in Python) that does the following:
Download the master.json
Read master.json and download audio segments
Merge the segments into a single file (the file is not playable yet)
Process the audio file via ffmpeg (now it's playable, but has issues)
I think there's a problem with the step 3 and/or 4.
In step/script 3, I create a new file, and add the contents of the segments to the file in binary.
Then (step/script 4), run a ffmpeg command in python: ffmpeg -i merged-file.mp4 -c copy processed-file.mp4
However, the final file doesn't work/play as expected. There's a delay in the beginning, and some parts seem to be cut off/skipped.
There are three possibilities:
Segment files are problematic (not likely?)
I'm doing the merging wrong
I'm doing the ffmpeg processing wrong
Can someone guide me here?
The issues/colored parts in the ffmpeg output are:
...
[mov,mp4,m4a,3gp,3g2,mj2 # 000001cfbc0de780] could not find corresponding track id 2
[mov,mp4,m4a,3gp,3g2,mj2 # 000001cfbc0de780] could not find corresponding trex (id 2)
...
[aac # 000001cfbc0f0380] Number of bands (31) exceeds limit (6).
...
[mp4 # 000001cfbc20ecc0] track 0: codec frame size is not set
...
[mp4 # 000001cfbc20ecc0] Non-monotonous DTS in output stream 0:0; previous: 318318, current: 286286; changing to 318319. This may result in incorrect timestamps in the output file.
...
Everything required for a test case is located in GitHub (akinuri/dump/m4s-segments/). Screenshot of the contents:
Note: there are two types/formats of audio in the master.json: mp42 and dash. dash works as expected, and seem to be used in limited videos/courses. On the other hand, mp42 appears more. So I need a way to make mp42 work.
I have to acomplish the following task: Two square, equally sized png images have to be put together side by side and exported as a combined image. This has to be done to hundreds of pairs in a folder with endings "_1" and "_2"
I think this can be done in Gimp with Pytho-Fu, but trying to understand the fundamentals of scripting for Gimp is a bit overwhelming on a tight schedule and I really just need a solution for this single task. I would really appreciate you to point me in the right direction with this.
(If there is a simpler solution than using Gimp, please let me know. It should run on Linux and ideally be able to be executed from bash.
After xenoid's recommendation:
I found ImageMagick's syntax and documentation to be a horrible mess less than optimal, so I'll share how I got it done:
with Ubuntu 18.04.04:
montage -tile x1 -geometry +0+0 input1.png input2.png output.png
The whole thing (probably not interesting for anyone else...)
#! /bin/bash
input="./Input/"
output="./Output/"
# add to output filename
prefix="CIF_"
postfix="_2"
# get file list
readarray -d '' RRA < <(find $input -regextype posix-egrep -regex '(.*)?_1_cr\.png$' -print0)
echo "Merging ${#RRA[#]} images.."
# remove directory from filename
RRA=( "${RRA[#]##*/}" )
# strip last part of filename: "_1_cr.png"
RRA=( "${RRA[#]/%_1_cr\.png/}" )
# merge images
for fall in "${RRA[#]}";do
# check if there are two images to merge for current case
if test -f "$input${fall}_2_cr.png"; then
echo "${fall}"
montage -tile x1 -tile-offset +10 -geometry +0+0 -border +20+20 -bordercolor white $input${fall}_1_cr.png $input${fall}_2_cr.png $output$prefix${fall}$postfix.png
else
echo "${fall} - no second image found"
fi
done
With ImageMagick, you can loop over each pair and just do:
convert image_1.png image_2.png +append image_1_2.png
See
https://imagemagick.org/Usage/layers/#append
If you want space between them, then use +smush X, where X is the amount of space that you want. If you want them overlapped, then use a negative value for X. You can set the color of the space using -background color.
See
https://imagemagick.org/script/command-line-options.php#smush
I have a vectorized floorplan image. I want to identify the objects in the image through the vector data in the SVG file of that image. The SVG code does not have any close points(z) in between them. So I am unable to understand when does the point moves to the other object? Can somebody help me, please?
I have very little knowledge about these SVG files and using them in Tkinter. So please somebody help me or suggest me what can I do?
This is the vector data of the image.
vector data of the image
use in conjunction with SO floorplan question.
Jump to z_final_floorplan.svg for final file.
A
Create 4 files:
w_original_floorplan.svg
x_rough_static_floorplan.svg
y_rough_live_floorplan.svg
z_final_floorplan.svg
w_original_floorplan.svg and x_rough_static_floorplan.svg are identical apart from filename.
y_rough_live_floorplan.svg and z_final_floorplan.svg are empty; to be populated.
Copy x_rough_static_floorplan.svg to y_rough_live_floorplan.svg.
Open y_rough_live_floorplan.svg on browser using server.
x_rough_static_floorplan.svg find all M and replace with two newlines / symbol M (case sensitive). shift + enter shift + enter /M
B
[this section takes the time]
Take away 1st '/' in path in y_rough_live_floorplan.svg [shows blackout_floorplan]
Label x_rough_static_floorplan.svg code section blackout_floorplan where code is.
(this file is used as rough-work, so being xml / svg valid is irrelevant)
In y_rough_live_floorplan.svg find next '/' and delete it [shows floorplan_top_left_whiteout]
Label x_rough_static_floorplan.svg code section floorplan_top_left_whiteout where code is.
Have x_rough_static_floorplan.svg and y_rough_live_floorplan.svg open in 2 windows, will be going back and forth to each of them. Keep repeating until at end.
(hint: find tool seems to be on switching from files in vscode, so you can use find / and next one cmd + g easily) Maybe handy to have a paper printout of original svg as reference and label the names of objects you create e.g.bath, sink, table, as you go along (don’t be fooled by this, one table is 'table'. Is 2nd chair chair2, chair_2, chair_two etc.?) etc..
C
Reorder the whole labels and corresponding code in path x_rough_static_floorplan.svg so the labels are ordered next to each other, but in the order they are found in the path:
e.g.
…
floorplan
bath
sink
table_chairs
sofa
…
Use the 'find' tool here. This process, itself will require a temp file to copy and paste to rather than reorder within the file working on. And rewrite temp to file working on. Might be good idea to create checklist of objects and cross-off as done.
E.g. floorplan, bath, table_chairs, sink…
D
Create path elements from your grouped objects, putting each id as id=“floorplan_main”, id=“bath”, id=“sink” etc.. etc..
Bear in mind, the data of how this is drawn is really, really bad. Really they should be drawn with rect elements for a rectangle when possible and a lot of the path data is very unnecessary, but that’s obviously how the application generates the svg.
I want to make some operationi automatic.But I met some trouble in export the image after I bake it.At first I try to use "bpy.ops.object.bake_image()" to bake the image.But the result image can not be active in uv editor.
The bake was success,but the result image didn't appear in the uv editor.It need selected so that I could export the file.
So I search the document , and found the other command "bpy.ops.object.bake()".It have a parameter "save_mode",but I still met some obstacle in using this command.It always point me out that " RuntimeError: error: No active image found in material "material" (0) for object "1.001" ".
Here is the official document about this two command:
https://docs.blender.org/api/blender_python_api_2_78a_release/bpy.ops.object.html?highlight=bake#bpy.ops.object.bake
Can anyone try to give me some solution or some advice that how can I make this thing right.
Many of blenders operators require a certain context to be right before they will work, for bpy.ops.image.save() that includes the UV/Image editor having an active image. While there are ways to override the current context to make them work, it can often be easier to use other methods.
The Image object can save() itself. If it is a new image you will first need to set it's filepath, you may also want to set it's file_format.
img = bpy.data.images['imagename']
img.filepath = '/path/to/save/imagename.png'
img.file_format = 'PNG'
img.save()
I am generating SVG image in python (pure, no external libs yet). I want to know what will be a text element size, before I place it properly. Any good idea how to make it? I checked pysvg library but I saw nothing like getTextSize()
This can be be pretty complicated. To start with, you'll have to familiarize yourself with chapter on text of the SVG specification. Assuming you want to get the width of plain text elements, and not textpath elements, at a minimum you'd have to:
Parse the font selection properties, spacing properties and read the xml:space attibute, as well as the writing-mode property (can also be top-bottom instead of just left-to-right and right-to-left).
Based on the above, open the correct font, and read the glyph data and extract the widths and heights of the glyphs in your text string. Alone finding the font can be a big task, seeing the multiple places where font files can hide.
(optionally) Look through the string for possible ligatures (depending on the language), and replace them with the correct glyph if it exists in the font.
Add the widths for all the characters and spaces, the latter depending on the spacing properties and (optionally) possible kerning pairs.
A possible solution would be to use the pango library. You can find python bindings for it in py-gtk. Unfortunately, except from some examples, the documentation for the python bindings is pretty scarce. But it would take care of the details of font loading and determining the extents of a Layout.
Another way is to study the SVG renderer in your browser. But e.g. the support for SVG text in Firefox is limited.
Also instructive is to study how TeX does it, especially the concept (pdf) of boxes (for letters) and glue (for spacing).
I had this exact same problem, but I had a variable width font. I solved it by taking the text element (correct font and content) I wanted, wrote it to a svg file, and I used Inkscape installed on my PC to render the drawing to a temporary png file. I then read back the dimensions of the png file (extracted from the header), removed the temp svg and png files and used the result to place the text where I wanted and elements around it.
I found that rendering to a drawing, using a DPI of 90 seemed to give me the exact numbers I needed, or the native numbers used in svgwrite as a whole. -D is the flag to use so that only the drawable element, i.e. the text, is rendered.
os.cmd(/cygdrive/c/Program\ Files\ \(x86\)/Inkscape/inkscape.exe -f work_temp.svg -e work_temp.png -d 90 -D)
I used these functions to extract the png numbers, found at this link, note mine is corrected slightly for python3 (still working in python2)
def is_png(data):
return (data[:8] == b'\x89PNG\r\n\x1a\n'and (data[12:16] == b'IHDR'))
def get_image_info(data):
if is_png(data):
w, h = struct.unpack('>LL', data[16:24])
width = int(w)
height = int(h)
else:
raise Exception('not a png image')
return width, height
if __name__ == '__main__':
with open('foo.png', 'rb') as f:
data = f.read()
print is_png(data)
print get_image_info(data)
It's clunky, but it worked