Pdf overlaying not working - python

I have been looking for a solution to this problem: I have two landscape-oriented A3 PDFs containing images, and I want to overlay them so that the resulting PDF shows both images merged into one, as if one of them were a watermark but at full density. Think of it as printing two different PDFs onto the same A3 sheet of paper; I want exactly that effect.
In other words (a phrasing I just came up with): I would like to overlay two PDFs and, for the upper layer, make all the "white" area transparent.
Basically, I just followed the steps in the solutions to this question:
overlay one pdf or ps file on top of another
pdftk didn't work in my case: the resulting PDF showed only the page that was on the top layer, and the bottom layer was not visible. So I moved on to a programmatic solution and downloaded pyPdf.
The code from the site was exactly an implementation of the desired solution:
from pyPdf import PdfFileReader, PdfFileWriter

output = PdfFileWriter()

# b.pdf provides the base page
input1 = PdfFileReader(open("b.pdf", "rb"))
page1 = input1.getPage(0)

# a.pdf is merged on top of it, like a stamp
watermark = PdfFileReader(open("a.pdf", "rb"))
page1.mergePage(watermark.getPage(0))

output.addPage(page1)
outputStream = open("c.pdf", "wb")
output.write(outputStream)
outputStream.close()
However, the result was the same as after using pdftk.
What am I doing wrong? Maybe this is not PDF merging, multimerging, stamping, overlaying, etc., but something else? If so, what is it called?

White in a PDF can arise from two fundamentally different situations: either nothing is drawn there, or something is drawn there using an effectively white color. PDFs of the first type can be given a background using those page-merge methods; PDFs of the second type can't.
The content stream of the page of your sample file a.pdf starts like this:
1 0 0 -1 0 841 cm
0.45 0 0 0.45 0 0 cm
1 0 0 1 0 0 cm
0 0 m 2646 0 l 2646 1870 l 0 1870 l h
q
1 1 1 rg f
Q
The first three lines change the coordinate system for the operations to come so that its origin is in the upper left corner, coordinate values increase rightwards and downwards, and one unit is 1/160 inch (the default unit of 1/72 inch scaled by 0.45).
The fourth line constructs a rectangle covering the whole page (actually even slightly more), and the sixth line fills that rectangle with white. (The fifth and seventh lines merely save and restore the graphics state.)
By overlaying this PDF over a page of another one, therefore, this PDF first of all covers all the existing content of that page with a white rectangle.
Thus, your PDF cannot be given a background by simply adding the page content to the content of a background PDF page. You have to
either remove lines 4 and 6 from that content first (maybe there is a checkbox in Lucidchart that lets you switch this white background rectangle on or off)
or use a different watermarking procedure (like doing it the other way around, overlaying your PDF page with the watermark PDF page using transparency); a sketch of the reversed merge follows below.
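A minimal pyPdf sketch of that reversed merge, assuming b.pdf does not itself paint a white background rectangle: a.pdf becomes the bottom layer, so its white rectangle cannot cover anything.

from pyPdf import PdfFileReader, PdfFileWriter

output = PdfFileWriter()

# a.pdf (the file with the white background rectangle) is now the base
base = PdfFileReader(open("a.pdf", "rb"))
page = base.getPage(0)

# b.pdf's content is appended to the content stream, so it is drawn on top
overlay = PdfFileReader(open("b.pdf", "rb"))
page.mergePage(overlay.getPage(0))

output.addPage(page)
outputStream = open("c.pdf", "wb")
output.write(outputStream)
outputStream.close()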
PS: Strictly speaking, those content lines are already erroneous: as soon as you start constructing a path (which in the sample above happens with 0 0 m, i.e. a move to position 0, 0), you may only use path construction (or path clipping) operators until you finally use a path painting operator (f, i.e. fill, in your sample). Cf. this answer for a reference.
Thus, the color setting 1 1 1 rg (i.e. set the fill color to RGB 100%, 100%, 100%) and the graphics state manipulation q (save graphics state) are not allowed here. Depending on the PDF viewer, different things may therefore happen when displaying that page: the fill operation might be ignored completely, or merely the color setting operation might be, in which case the current fill color (black?) would be used instead. One cannot count on all PDF viewers handling this error the way Adobe Reader does.
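A conforming version of that snippet would set the fill color and save the graphics state before the path construction starts, e.g.:

q
1 1 1 rg
0 0 m 2646 0 l 2646 1870 l 0 1870 l h f
Q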
Maybe Lucidchart has already fixed that issue and an update suffices. Otherwise you should ask Lucidchart to start generating their PDF charts correctly.

Related

Tesseract OCR fails to detect varying font size and letters that are not horizontally aligned

I am trying to detect the text on these price labels, which is always cleanly preprocessed. Although Tesseract can easily read the text written above the price, it fails to detect the price values. I am using the Python bindings (pytesseract), but it also fails when run from the CLI. Most of the time it recognizes the part containing the price as just one or two characters.
Sample 1:
tesseract D:\tesseract\tesseract_test_images\test.png output
And the output of the sample image is this.
je Beutel
13
However, if I crop and stretch the price so the characters look separated and have the same font size, the output is just fine.
Processed image (cropped and stretched price):
je Beutel
1,89
How do I get Tesseract OCR to work as intended, given that I will be processing a lot of similar images?
Edit: Added more price tags:
sample5 sample6 sample7
The problem is that the image you are using is small. When Tesseract processes it, it treats '8', '9' and ',' as a single letter and so predicts '3', or it may treat '8' and ',' as one letter and '9' as another, producing wrong output. The image shown below illustrates this.
A simple solution is to increase the image size by a factor of 2 or 3 (or more, depending on the size of your original image) before passing it to Tesseract, so that each letter is detected individually, as shown below. (Here I increased the size by a factor of 2.)
Below is a simple Python script that should do what you need:
import cv2
import pytesseract

# upscale by a factor of 2 in each direction before running OCR
img = cv2.imread('dKC6k.png')
img = cv2.resize(img, None, fx=2, fy=2)

data = pytesseract.image_to_string(img)
print(data)
Detected text:
je Beutel
89
1.
Now you can simply extract the required data from the text and format it as per your requirement.
# collapse the blank lines, then split the OCR output into lines
data = data.replace('\n\n', '\n')
data = data.split('\n')

# the third line holds the whole units ('1.'), the second line the cents ('89')
dollars = data[2].strip(',').strip('.')
cents = data[1]
print('{}.{}'.format(dollars, cents))
Desired Format:
1.89
The problem is that the Tesseract engine was not trained to read this kind of text topology.
You can:
train your own model; in particular, you'll need to provide images with variations in topology (the position of the characters). You can actually use the same image and shuffle the positions of the characters.
reorganize the image into clusters of text and then use Tesseract. In particular, I would take the cents part and move it to the right of the comma, in which case you can use Tesseract out of the box. A few relevant criteria would be the height of the clusters (to differentiate cents from whole units) and the position of the clusters (read from left to right); see the sketch after this list.
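A minimal sketch of that clustering route, assuming OpenCV, a hypothetical input file price_tag.png, and a height cut-off that would need tuning on real labels:

import cv2

img = cv2.imread('price_tag.png', cv2.IMREAD_GRAYSCALE)
_, thresh = cv2.threshold(img, 0, 255, cv2.THRESH_BINARY_INV | cv2.THRESH_OTSU)

# findContours returns (contours, hierarchy) in OpenCV 4 and
# (image, contours, hierarchy) in OpenCV 3; [-2] works for both
contours = cv2.findContours(thresh, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)[-2]

# one bounding box per character cluster, sorted left to right
boxes = sorted(cv2.boundingRect(c) for c in contours)  # (x, y, w, h)

# clusters much shorter than the tallest one are taken to be the cents
max_h = max(h for x, y, w, h in boxes)
units = [b for b in boxes if b[3] >= 0.8 * max_h]
cents = [b for b in boxes if b[3] < 0.8 * max_h]

Each group can then be cropped, rescaled to a common height, and pasted side by side before handing the composite to Tesseract.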
In general, computer vision algorithms (including CNNs) give you tools to build a higher-level representation of an image (features or descriptors), but they fail to create the logic or algorithm for processing intermediate results in a particular way.
In your case that would be:
"if the height of those letters are smaller, it's cents",
"if the height, and vertical position is the same, it's about the
same number, either on left of coma, or on the right of coma".
The thing is that it's difficult to reach that through training, while at the same time it's extremely simple for a human to write it as an algorithm. Sorry for not giving you an actual implementation; the text above is the pseudocode.
TrainingTesseract2
TrainingTesseract4
Joint Unsupervised Learning of Deep Representations and Image Clusters

How to properly configure coloring for mandelbrot set using python?

Looking for practice with Python, I decided to draw the Mandelbrot set with a script. Drawing it wasn't too complicated, so I decided to use color, and I discovered the smooth coloring algorithm. Using this question I was able to render something really beautiful and similar to this one.
To achieve that I set up a gradient color palette in three "steps": from dark blue to light blue, then light blue to yellow, and finally yellow to dark brown. The overall image is perfect.
The problem comes when I try to zoom in. Let's take the example of this area. At this level of zoom, my script doesn't draw dark blue anymore. I think I miscoded something, because wherever you see dark blue on the Wikipedia image, I have dark brown (a color near the end of my palette). When I first thought about this, I told myself that if the pattern comes back to the original one, it should use the same colors, because the escape time should be the same.
So, was this coloring configured in the palette, or is there something about escape time that I didn't understand?
Here is the code I use for the coloring:

import math

def color_pixel(n, z):
    # fractional escape count: n plus a smoothing term derived from |z|
    smoothcolor = n + 1 - math.log(math.log(abs(z))) / math.log(2)
    f = smoothcolor / iterate_max
    i = int(f * 500)
    color = palette[i]
    return color
500 is the number of colors in my palette (len(palette) - 1).
z is the value of z when its magnitude escaped past 10.
I use 100 as the maximum number of iterations, but I get the same results with a higher value.
Thanks!
My colouring method is to use a rotating palette in three sections. First blue cross-fades to green without using red, then green to red without using blue, and finally red to (almost) blue with no green, where the next iteration level wraps back to pure blue at the bottom of the array by taking the iteration count modulo the palette size.
However, when I made a supposedly smooth realtime zoom (by storing the data at a doubling scale, then in-betweening 16 frames by interpolation for playback), I found that in the neighbourhood of the M-set, where the contours look chaotic, the effect was messy, as the colours tend to dance around. There I used a different scheme, bending the colours towards a gray scale.
My final colouring method was to use the rotating palette for pixels having one or more neighbours of the same depth, but tending towards mid-gray depending on how many neighbours were different; a sketch of that blending follows below. Bear in mind, though, that the requirements for a moving image are different from those for a static image, and sharp detail is not necessarily desirable.
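A minimal sketch of that blending, assuming depths is a 2D NumPy array of iteration counts and palette is a list of RGB tuples (both names are illustrative):

import numpy as np

def blended_colour(depths, palette, x, y):
    # fraction of the neighbours whose depth differs from this pixel's
    d = depths[y, x]
    window = depths[max(y - 1, 0):y + 2, max(x - 1, 0):x + 2]
    frac = np.count_nonzero(window != d) / float(window.size - 1)

    # blend the rotating-palette colour towards mid-gray by that fraction
    base = np.asarray(palette[d % len(palette)], dtype=float)
    gray = np.full(3, 128.0)
    return tuple((1 - frac) * base + frac * gray)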
At deep zooms the number of iterations needed to extract the detail can be 1000 or more. I solved the problem laterally. I do not brute-force the map calculations. I developed a curve-stitching method that follows the contour of an iteration level, and then fills the region. In the smoothly changing areas that means large areas do not have to be iterated. Similarly for the M-Set itself where the function has not escaped - I avoid iterating there as far as possible by again trying to follow round its edge and then filling. This method can suffer from nipping off some detail, but the speed gain is enormous. In the chaotic region near the edge of the M-Set my method was to just iterate at every pixel.
I'm also looking into this (the coloring scheme). Since the image was made using Ultra Fractal 3, I poked around in that program and finally found the details, which are slightly different from what you and the wiki are doing. It's written in a custom scripting language, but hopefully you can understand what it's doing. Here's the code:
Smooth(OUTSIDE) {
;
; This coloring method provides smooth iteration
; colors for Mandelbrot and other z^2 formula types
; (Phoenix, Julia). Results on other types may be
; unpredictable, but might be interesting.
;
; Thanks to F. Slijkerman for some tweaks.
; Thanks to Linas Vepstas for the math.
;
; Written by Damien M. Jones
;
init:
  complex il = 1/log(#power)     ; Inverse log (power).
  float lp = log(log(#bailout))  ; log(log bailout).
final:
  #index = 0.05 * real(#numiter + il*lp - il*log(log(cabs(#z))))
default:
  title = "Smooth (Mandelbrot)"
  helpfile = "Uf*.chm"
  helptopic = "Html/coloring/standard/smooth.html"
$IFDEF VER50
  rating = recommended
$ENDIF
  param power
    caption = "Exponent"
    default = (2,0)
    hint = "This should be set to match the exponent of the \
            formula you are using. For Mandelbrot, this is usually 2."
  endparam
  param bailout
    caption = "Bail-out value"
    default = 128.0
    min = 1
    hint = "This should be set to match the bail-out value in \
            the Formula tab. This formula works best with bail-out \
            values higher than 100."
  endparam
}
My math isn't good enough to know how to compute the log of a complex number, so I'm stuck for the moment on going further with this, but I thought I'd share what I've found on the topic.
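For what it's worth, the log of a complex number w is log|w| + i*arg(w), which Python's cmath.log computes directly. A minimal sketch translating the formula above to Python, assuming n is the iteration count and z the value at escape:

import cmath
import math

def smooth_index(n, z, power=2, bailout=128.0):
    il = 1 / cmath.log(power)         # inverse log of the exponent (complex)
    lp = math.log(math.log(bailout))  # log(log(bailout))
    # real(...) in the Ultra Fractal code corresponds to .real here,
    # and cabs(#z) is abs(z)
    return 0.05 * (n + il * lp - il * math.log(math.log(abs(z)))).real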

Python - print image bit data to a ESC/POS printer with python

I've been looking for a long time for an example of how to format and print BMPs on my receipt printer (so I can add logos), so I doubt this is a duplicate post, considering the others were for Java or other languages. Usually I'm pretty good at understanding instructions, but all I can find are the same old instructions that I can never fully understand.
I am using Python 2.7, and I have a function pI(x) which uses win32print to send data to the printer, where x is the data as a string, using "\x??" escapes for hex bytes (as for formatting text). It seems to work well.
The programmer manual that came with my printer says (for downloading bit image, GS *) for syntax:
Hex 1D 2A x y d1...dk
and:
d=1 for printing the corresponding dot and d=0 for not printing the corresponding dot.
Here is my questions about these instructions:
Does this mean that x, y, and d1...dk are all given in hex (or "\x??")? I think so.
What do x and y represent? I read a while ago on some site (maybe this one) that x + y*255 = image width, and I assume that follows the usual order of operations. Is this correct?
The instructions for my generic printer also state that x and y must each be between 1 and 48, totaling no more than 1500, unlike some manuals which say x must be between 0 and 3 and y between 1 and 128 (I think), with x + y*255 = width, totaling about 2000. They also say k = x*y*8, so that the example would be 8*8*8 = 512 * "\x01"; where does the third 8 come from, and how do I code that in the string? Does x = width and y = height? If so, how do I get an image width of the maximum 384 dots?
Does this mean I have to enter "\x00" or "\x01" for each dot, so that one instance of GS * (a small 8x8 black block) would be 64 * "\x01"?
Do I have to issue GS * for each group 8 dots tall, or for each line of 8 dots, or would that overwrite the previously downloaded data?
I'd like to later add to my program a way to easily create logos with a Tkinter canvas widget and save them to a text file for future printing via pI(), so I really need to know how to 'download' image data directly to the printer; a third-party module probably won't work, since I want to keep using my pI() function. Yes, it's ambitious, and I'm probably doing it the hard way, but I'm afraid that if I start incorporating too much new stuff I'm not familiar with, I'll get too confused.
Basically, what string should I send to pI() to download to the printer an image of a solid 8x8-dot black box with a 2-dot-wide white line down the center?
Here's an example of what I would like the printer to print, so I can see a working code string; my current guess follows below.
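My current guess, under my reading of the manual: x and y are counted in bytes, so the image is x*8 dots wide and y*8 dots tall; the third 8 in k = x*y*8 is the 8 dots packed into each byte ((x*8) * (y*8) dots / 8 bits = x*y*8 bytes); and each data byte is assumed to be a column of 8 vertical dots. All of this needs checking against the manual:

# hedged guess: GS * with x = y = 1 defines an 8x8-dot image
# using k = 1*1*8 = 8 data bytes, one byte per 8-dot column
black_col = "\xff"  # a column of 8 black dots
white_col = "\x00"  # a column of 8 white dots

# solid black box with a 2-dot-wide white line down the center
data = black_col * 3 + white_col * 2 + black_col * 3

pI("\x1d\x2a" + "\x01\x01" + data)  # GS * x y d1...d8 (define the image)
pI("\x1d\x2f\x00")                  # GS / 0 (print the downloaded image)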

ReportLab Two-Column TOC?

I have a PDF I am generating using ReportLab. I am using the standard TableOfContents flowable, but am trying to split it into two columns so it will all fit on the first page. The content will only ever be on one level, so I am not worried about odd-looking indentation.
Right now I have the PageTemplate using 2 Frames to create 2 columns on the first page. I get a
LayoutError: Flowable <TableOfContents at 0x.... frame=RightCol>...(200.5 x 720) too large on page 1 in frame 'RightCol'(200.5 x 708.0*)
Any ideas?
Well, color me embarrassed.
For anyone else having this problem, check your DocTemplate for allowSplitting. The default is 1, but I had changed mine to 0 and that was the reason.
*facepalm*
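For anyone who wants a reference, here is a minimal sketch of a working setup (the frame geometry is illustrative): two frames on the page template, with allowSplitting left at its default of 1 so the TableOfContents can flow from the left frame into the right one.

from reportlab.lib.pagesizes import letter
from reportlab.platypus import BaseDocTemplate, Frame, PageTemplate

# allowSplitting must stay 1 (the default) or the TOC flowable
# cannot break across the two column frames
doc = BaseDocTemplate("toc.pdf", pagesize=letter, allowSplitting=1)

w, h = letter
margin, gutter = 36, 12
col_w = (w - 2 * margin - gutter) / 2
left = Frame(margin, margin, col_w, h - 2 * margin, id='LeftCol')
right = Frame(margin + col_w + gutter, margin, col_w, h - 2 * margin, id='RightCol')
doc.addPageTemplates([PageTemplate(id='TwoCol', frames=[left, right])])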

Simple object recognition

===SOLVED===
Thanks for your suggestions and comments. By working from the flood_fill algorithm given in the book Beginning Python Visualization (Chapter 9, Image Processing), I have implemented what I wanted. I can count the objects, get an enclosing rectangle for each object (and therefore its height and width), and finally construct a NumPy array or matrix for each of them.
Although it is not an optimized approach, it does what I want. The source code (lab2.py) and the PNG file (lab2-particles.png) that I use have been put under http://code.google.com/p/ccnworks/source/browse/#svn/trunk/AtSc450.
You need NumPy and PIL installed, and matplotlib to see the histogram. The core of the code lies in the objfind function, where the main recursive object search happens.
One further update:
SciPy's ndimage.label() does exactly what I want, too.
Cheers to David Warde-Farley and Zachary Pincus from the NumPy and SciPy mailing lists for pointing this right into my eyes :)
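For anyone landing here, a minimal sketch of the ndimage.label() route (the array values are illustrative):

import numpy as np
from scipy import ndimage

img = np.array([[1, 1, 1, 0, 0],
                [1, 1, 1, 0, 1],
                [0, 0, 1, 0, 1]])

# label 4-connected regions (the default structuring element)
labeled, n_objects = ndimage.label(img)
print(n_objects)                        # 2 for the array above
slices = ndimage.find_objects(labeled)  # one enclosing rectangle per object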
=============
Hello,
I have an image that contains the shadows of ice particles measured by a particle spectrometer. I want to be able to identify each object, so that I can later classify and use them further in my calculations.
In essence, what I want to do is implement a kind of fuzzy selection tool with which I can simply select each entity.
How could I easily solve this problem? (Preferably using Python)
Thanks.
NOTE: In my question I refer to each group of connected pixels as an object or entity. My intention is to extract them and create NumPy array representations as shown below. (Here I am using the top-left object: where a pixel exists, use 1; where not, use 0. This object's shape is 3 by 3, corresponding to 3 pixels high by 3 pixels wide. These are projections of real ice particles onto a 2D domain, under the assumption that they are spherical with equivalent radius (height + width)/2; later, some scalings from pixels to actual sizes, and volume calculations, will follow.)
>>> import numpy as np
>>> np.array([[1, 1, 1], [1, 1, 1], [0, 0, 1]])
array([[1, 1, 1],
       [1, 1, 1],
       [0, 0, 1]])
Here is a section from the image which I am going to use.
screenshot http://img43.imageshack.us/img43/2327/particles.png
1. Scan every square (e.g. from the top left, left-to-right, top-to-bottom).
2. When you hit a blue square:
   a. Record this square as the location of a new object.
   b. Find all the other contiguous blue squares (e.g. by looking at the neighbours of this square, and the neighbours of those neighbours, etc.) and mark them as part of the same object.
3. Continue to scan.
4. When you find another blue square, test whether it's part of a known object before going to step 2; alternatively, in step 2b, erase each square once you've associated it with an object.
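A minimal sketch of that scan in Python, assuming img is a 2D array of 0/1 pixels (the flood fill is iterative, so deep recursion is not a concern):

from collections import deque

def find_objects(img):
    h, w = len(img), len(img[0])
    seen = [[False] * w for _ in range(h)]
    objects = []
    for y in range(h):
        for x in range(w):
            if img[y][x] and not seen[y][x]:
                # new object: flood-fill its 4-connected pixels
                queue = deque([(y, x)])
                seen[y][x] = True
                pixels = []
                while queue:
                    cy, cx = queue.popleft()
                    pixels.append((cy, cx))
                    for ny, nx in ((cy-1, cx), (cy+1, cx), (cy, cx-1), (cy, cx+1)):
                        if 0 <= ny < h and 0 <= nx < w and img[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
                objects.append(pixels)
    return objects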
I used to do this kind of analysis on micrographs and eventually put everything I needed into an image processing and analysis package written in C, driven via Tcl. (It worked with 512 x 512 images only, which explains why 512 crops up so often. There were images with pixels of various sizes allocated, but most of the work was done with 8-bit pixels, which explains the business of 0xff and the maximum meaningful count of 254 per image.)
Briefly, the 'zz' at the beginning of the Tcl commands sends the remainder of the line to the package's parser, which calls the appropriate C routine with the given arguments. Right after the 'zz' is an argument indicating the input and output of the command. (There can be multiple inputs but only a single output.) 'r' indicates a 512 x 512 x 8-bit image. The third word is the name of the command to be invoked; 'graphs' marks up an image as described in the text below. So, 'zz rr graphs' means 'Call the ZZ parser; input an r image to the graphs command and get back an r image.' The rest of the Tcl command line specifies which of the pre-allocated images to use. (The 'g' image is an ROI, i.e., region-of-interest, image; almost all ZZ ops are done under ROI control.) So, 'r1 r1 g8' means 'Use r1 as input, use r1 as output (that is, mark up the input image itself), and do the operation wherever the corresponding pixel on image g8 --- that is, r8, used as an ROI --- is >0.'
I don't think it is available online anywhere, but if you want to pick through the source code or even compile the whole shebang, I'll be happy to send it to you. Here's an excerpt from the manual (but I think I see some errors in the manual at this late date --- that's embarrassing ...):
Example 6. Counting features.
Problem
Counting is a common task. The items counted are called “features”, and it is usually necessary to prepare images carefully so that features correspond one-to-one with the real objects to be counted. Here, however, we ignore image preparation and consider, instead, the mechanics of counting. The first counting exercise is to find out how many features are on the images in the directory ./cells.
Approach
First, let us define “feature”. A feature is the largest group of “set” (non-zero) pixels, all of which can be reached by travelling from one set pixel to another along north-south-east-west (up-down-right-left) routes, starting from a given set pixel. The zz command that detects and marks such features on an image is “zz rr graphs R:src R:dest G:ROI”, so called because the mathematical term for such a feature is a “graph”. If all the pixels on an image are set, then there is only a single graph on the image, but it contains 262144 pixels (512 * 512). If pixels are set and clear (equal to zero) in a checkerboard pattern, then there will be 131072 (512 * 512 / 2) graphs, but each will contain only one pixel.
Briefly explained, “zz rr graphs” starts in the upper-left corner of an image and scans each succeeding row left to right until it finds a set pixel, then finds all the set pixels attached to that one through north, south, east, or west borders (“4-connected”). It then sets all pixels in that graph to 1 (0x01). After finding and marking graph 1, it starts scanning again at the pixel after the one where it first discovered graph 1, this time ignoring any pixels that already belong to a graph. The first 254 graphs it finds will be marked uniquely; all graphs found after that will be marked with the value 255 (0xff) and so cannot be distinguished from each other. The key to being able to count any number of graphs accurately is to process each image in stages: find the number of graphs on an image and, if the number is greater than 254, erase the 254 graphs just found, repeating the process until 254 or fewer graphs are found. The Tcl language provides the means to control this operation.
Let us begin to build the commands needed. Before the processing loop, we declare and zero a variable to hold the total number of features in the image series. Within the processing loop, we begin by reading the image file into an R image and detecting and marking the graphs.
zz ur to $inDir/$img r1
zz rr graphs r1 r1 g8
Next, we zero some variables to keep track of the counts, then use the “ra max” command to find out whether more than 254 graphs were detected.
set nGraphs [ zz ra max r1 a1 g1 ]
If nGraphs does equal 255, then the 254 accurately counted graphs should be added to the total, the graphs from 1 through 254 should be erased, and the count repeated for as many times as it takes to reduce the number of graphs below 255.
while {$nGraphs == 255} {
    incr sumGraphs 254
    zz rbr lt r1 155 r1 g1 0 255
    zz rr graphs r1 r1 g8
    set nGraphs [ zz ra max r1 a1 g8 ]
}
When the “while” loop exits, the variable nGraphs must hold a number less than 255, that is, a number of accurately counted graphs; this is added to the rising total of the number of features in the image series.
incr sumGraphs $nGraphs
After the processing loop, print out the total number of features found in the series.
puts "Total number of features in $inDir \
    images $beginImg through $endImg is $sumGraphs."
Looking at the image you provided, all you need to do next is to apply a simple region growing algorithm.
If I were using MATLAB, I would use the bwlabel/bwboundaries functions. I believe there's an equivalent function somewhere in NumPy, or you could use OpenCV with Python wrappers, as suggested by @kwatford.
OpenCV has a Python interface that you might find useful.
Connected component analysis may be what you are looking for.
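A minimal sketch of that OpenCV route (connectedComponentsWithStats requires OpenCV >= 3.0; the filename is illustrative):

import cv2

img = cv2.imread('particles.png', cv2.IMREAD_GRAYSCALE)

# binarise so that the dark particles become non-zero and the background zero
_, binary = cv2.threshold(img, 0, 255, cv2.THRESH_BINARY_INV | cv2.THRESH_OTSU)

# label 8-connected components; label 0 is the background
n, labels, stats, centroids = cv2.connectedComponentsWithStats(binary)
for i in range(1, n):
    x, y, w, h, area = stats[i]
    print('object %d: %dx%d pixels at (%d, %d)' % (i, w, h, x, y))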
