pdf2image wrong font and crop text


I am converting my PDF into an image in Python with convert_from_path from the pdf2image library.
This is the original PDF:
This is the generated image:
As you can see, the font in the image is not the right one, and some text is missing (the address at the bottom).
So:
Why is my text cropped?
How can I add the font to pdf2image?
EDIT: Link to the PDF (download it to your computer in order to see the right font, which is Mistral)

On opening the file, the appearance should look like this, with fields not highlighted. Some of the text is stored as " ", and on digging deeper the fields seem to need NeedAppearances set to true before they display visually.
Other tools may attempt to put something in the field positions and struggle with the multiline entry, as that is not normal behaviour for a PDF, where single-line printer-style blocks of text are the norm. A good simple font test in MS Edge: for a well-placed font, can the text be selected and read out loud? That is not the case here, so something is wrong with the inserted text. Later we see they are FDF (i.e. plain text) entries.
When using fonts outside the 14 base fonts, it is essential that they are fully embedded or, worse, subset; in both cases there can be restrictions in the font licenses, which should also be chequed :-) pun.
The fonts are perhaps not embedded well, so some viewers may find that NOTHING is searchable other than , however the file says the base font in use is BaseFont/BCDEEE+Calibri & FontName/BCDEEE+Calibri (presumably for all those blank texts). The embedded font includes the license for use: © 2018 Microsoft ... for ... Biblical Hebrew ... is Open Source Software under the MIT License ... you may use this font to create ... the Microsoft ... content ... Any other use is prohibited. Producer(DocHub v5.0.7, build 9d3cd43) (from MS Office 365).
One other font mentioned in connection with the fields is /Font << /FThcmByOND, later given as /BaseFont/Helvetica, and presumably that font was the one intended to be used with the self-adjusting fields. Adobe also reports that there is a MyriadPro-Regular embedded somewhere as OpenType (I could not see that license as easily, so it is probably excluded or encoded).
However, fonts will most likely default to Arial on Windows if the embedded characters are not applied, as seen in the Xchange Editor window.
Looking internally, we can see that all the text on the left is described as " ", thus nothing to be shown, and whilst the file declares it may use Calibri overall, the font name here is defaulting to invisible Arial.
Thus there are many conflicts in behaviour, which result in no font being considered usable. The visible text comes from form fields and, depending on how those are defined, their appearance will need to be altered, which some viewers do not allow for, hence the initial blank cheque entries.
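If you want to do the same kind of digging from Python before running pdf2image, a rough first pass is possible with the standard library alone. This is only a sketch: the function name inspect_pdf_bytes is made up for illustration, and it assumes the relevant keys are stored uncompressed, so anything inside a compressed object stream will be missed.

```python
import re


def inspect_pdf_bytes(data: bytes) -> dict:
    """Rough scan of raw PDF bytes for form and font markers.

    Assumption: the keys appear uncompressed in the file. A real PDF
    parser is needed for anything inside compressed object streams.
    """
    need = re.search(rb"/NeedAppearances\s*(true|false)", data)
    # Collect every /BaseFont name, e.g. /BaseFont/BCDEEE+Calibri
    found = re.findall(rb"/BaseFont\s*/([^\s/<>\[\]()]+)", data)
    base_fonts = sorted({name.decode("latin-1") for name in found})
    return {
        # An /AcroForm entry means the document carries form fields
        "has_acroform": b"/AcroForm" in data,
        "need_appearances": need.group(1).decode() if need else None,
        "base_fonts": base_fonts,
        # Subset fonts carry a six-capital-letter prefix followed by "+"
        "subset_fonts": [f for f in base_fonts if re.match(r"[A-Z]{6}\+", f)],
    }
```

Reading the file with open(path, "rb").read() and passing the bytes in should show whether form fields are present and which base fonts are declared, which is a hint (not proof) about why a renderer falls back to another font.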

Related

How can I programmatically duplicate a Tkinter font, replacing all the lowercase glyphs with uppercase glyphs?

I want to be able to show text in a Tkinter/ttk Text widget in all uppercase without losing the original case info or having to manage/store it. The easiest way I can think of is by reconfiguring a Text tag with an all-uppercase font version of the original font. My Python app is cross-platform, running on Windows and OS X. I prefer fixed-width fonts.
I would prefer a programmatic means to generate such an all-caps font on the fly based on some (standard) input font. What's the best way to do this?
Effectively, I want to be able to click a button to toggle certain selected text strings between the original case and all uppercase. The Text widget holds all the data. There is no other container for the data. A simple upper() call will not suffice for what I'm trying to accomplish.
Although not ideal, a potential workaround would be to find a pair of matching fixed-width fonts, one with mixed case and another essentially the same but with only uppercase letters.

Different behavior in powerpoint and libreoffice by ppt generated using python-pptx

I am trying to create a file using python-pptx on a Flask server. All of this is working, and even the download is working, but the problem comes when I try to use text_frame.auto_size in my code. LibreOffice Impress displays the text perfectly, but MS PowerPoint does not display the text properly.
Here are the images explaining the issue -
LibreOffice -
Powerpoint -
Also, here is the code that I am using -
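from pptx.enum.text import MSO_AUTO_SIZE  # needed for the auto_size constant

# slide, left, top, width and height are defined earlier in the app
text_box = slide.shapes.add_textbox(left, top, width, height)
text_frame = text_box.text_frame
text_frame.word_wrap = True
text_frame.auto_size = MSO_AUTO_SIZE.TEXT_TO_FIT_SHAPE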
Any idea what I am doing wrong here?

This is unfortunately a limitation of PowerPoint and an unusual (in my experience) place where LibreOffice actually does better.
You'll notice that if you click in the PowerPoint text, insert a space and then delete it, the text will automatically autofit. This may not fix the problem, but it points to the cause.
In the XML behind the slide, the current "auto-fitted" font size of the textbox is cached. LibreOffice automatically recalculates this cached figure when the presentation opens; PowerPoint does not.
Calculating the "auto-fitted" font size is the job of the rendering engine, which has access to font sizes, line/word breaks, etc. python-pptx does not include a rendering engine nor does it have access to one (none exists for Python as far as I know). So the best it can do is estimate it and it prefers not to do that, since that's getting into rendering.
However, there is an experimental feature in the form of the .fit_text() method that may get you most of the way there. Basically, that capability was so wanted that someone was willing to sponsor a "best-efforts" solution, which is what that method represents. The documentation at that link explains how to use it and its limitations.
Note that this method is experimental, meaning it won't be considered a bug if it doesn't work the way you need it to. You're free to elaborate on it if you can do better.
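A minimal sketch of how a call to .fit_text() might look, assuming python-pptx is installed. The helper name fit_text_best_effort and its defaults are illustrative and not part of the library; font_family, max_size and font_file are parameters of fit_text() itself. As far as I know, python-pptx only searches the system font directories on Windows and macOS, so on a Linux server you would probably need to pass font_file explicitly.

```python
def fit_text_best_effort(text_frame, text, font_family="Calibri",
                         max_size=18, font_file=None):
    """Set the text, then ask python-pptx to estimate a fitting font size.

    text_frame is expected to be a python-pptx TextFrame, for example
    slide.shapes.add_textbox(left, top, width, height).text_frame.
    """
    text_frame.word_wrap = True
    text_frame.text = text
    # fit_text() writes a fixed point size onto the runs, so PowerPoint
    # does not have to recalculate anything when the file is opened.
    text_frame.fit_text(
        font_family=font_family,
        max_size=max_size,
        font_file=font_file,  # path to a .ttf file; None means "look it up"
    )
```

Usage would be something like fit_text_best_effort(text_box.text_frame, "Some long text", max_size=24, font_file="/path/to/font.ttf"), where the font path is a placeholder. This replaces the auto_size assignment rather than complementing it, since the size is computed once when the file is generated.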

How do I configure Python IDLE's text font?

I've searched all over and can't seem to find a solution. Python's IDLE just looks terrible on my laptop (Lenovo Yoga 2 Pro) with a 3200x1800 display running Windows 8.1.
I've attached a screenshot so you can see what I'm talking about. Has anyone figured out how to configure this? Thanks for the help!

Add a file like %USERPROFILE%\.idlerc\config-main.cfg
Add the following lines:
[EditorWindow]
font-size = 14
font = monaco
Or pick your favorite font and size.
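If you would rather script it than edit the file by hand, something like the following should work. It is a sketch using only the standard library; the function name write_idle_font and the config_dir parameter are my own, and it assumes IDLE reads its user settings from the .idlerc directory under your home directory, as described above.

```python
import configparser
from pathlib import Path


def write_idle_font(font="monaco", size=14, config_dir=None):
    """Write the [EditorWindow] font settings into config-main.cfg."""
    config_dir = Path(config_dir) if config_dir else Path.home() / ".idlerc"
    config_dir.mkdir(parents=True, exist_ok=True)
    cfg_path = config_dir / "config-main.cfg"

    parser = configparser.ConfigParser(interpolation=None)
    parser.read(cfg_path)  # keeps existing settings; a missing file is ignored
    if not parser.has_section("EditorWindow"):
        parser.add_section("EditorWindow")
    parser.set("EditorWindow", "font", font)
    parser.set("EditorWindow", "font-size", str(size))

    with cfg_path.open("w") as fh:
        parser.write(fh)
    return cfg_path
```

Call write_idle_font("monaco", 14) and restart IDLE for the change to be picked up.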

I presume 'terrible looking' applies to the old Courier (typewriter) font. IDLE's tk Text windows (Shell, Editor, and Output) default to using 10 point 'TkFixedFont'. On Windows (and only on Windows), that resolves to 'Courier'. I believe that this is, or at least used to be, standard on Windows.
The best solution is to select Options on the top menu bar and then Configure IDLE. One is then presented with a dialog with the Fonts/Tabs tab selected and the current font selected in the Base Editor Font box and the current size next to Size. There is an example box showing some text in the current font and size. Change either the font or size and the example is updated. Select OK or Apply and current text windows are updated.
Choices other than default are written to config-main.cfg in directory %USERPROFILE%/.idlerc/. The directory and file are created if necessary.
I personally use the fixed-pitch Lucida Console. I occasionally use variable-pitch Lucida Sans Unicode when using strings with non-Latin characters. One can tell from the example in the box whether a font is fixed or variable in character width by whether characters line up neatly in columns or not. (One of my goals is to add examples from several other scripts so one can also see the Unicode coverage offered by a font.)

Reportlab printing defaults

I am using reportlab to create a PDF with some portrait and some landscape pages. The PDF looks great on screen, but when printing it the default print settings are to shrink the pages rather than rotate them. At first I thought that this was just something to do with the settings, but a few other people have commented on it, using a variety of PDF readers and printers. After a bit of investigating, it seems that this is something to do with an option set inside the PDF itself, recommending those print settings. Does anyone know of a way to change this when the PDF is generated?

writing a font viewer - getting font properties, loading ttf dynamically

I'm trying to write a font viewer for TrueType / OpenType fonts with VB6 / VB5 code (under Windows).
It is surprisingly difficult:
1) In VB / WinAPI, I did not find how to extract the font's name, or font properties in general.
2) I can install the font (using the AddFontResource API function), but then have to uninstall it. However, while AddFontResource expects a pathname, removing the font requires the font's name, which is unknown to me.
Is there a way to use a non-installed font (ttf)?
Is there a way to extract a font's properties using VB6?
(I could write the program in wxPython, but I know even less about fonts in Python than in VB.)

You could use the FreeType library.

It indeed is. I have faced the same problem myself (see my question). I ended up writing my own parser, though, because I needed to detect whether the font was corrupt. There is an AddFontMemResourceEx function, whose documentation says:
When the function succeeds, the caller of this function can free the memory pointed to by pbFont because the system has made its own copy of the memory. To remove the fonts that were installed, call RemoveFontMemResourceEx. However, when the process goes away, the system will unload the fonts even if the process did not call RemoveFontMemResource.
Also, you can use the Font and Text Functions to get the font metrics.
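Since wxPython was mentioned as an alternative, here is roughly what a hand-written parser for the font's name looks like in Python. It is a minimal sketch, not the parser I actually wrote: the function name read_font_names is made up, it only reads the 'name' table of a single TrueType / OpenType file, it ignores font collections (.ttc), and its corruption checks are limited to a missing table or a string that runs past the end of the file.

```python
import struct


def read_font_names(data: bytes) -> dict:
    """Return {nameID: string} from the 'name' table of an sfnt font.

    nameID 1 is the family name and nameID 4 is the full font name.
    """
    # Offset table: sfntVersion (4 bytes), numTables (2 bytes), 6 more bytes
    num_tables = struct.unpack_from(">H", data, 4)[0]
    for i in range(num_tables):
        tag, _checksum, offset, _length = struct.unpack_from(
            ">4sIII", data, 12 + 16 * i)
        if tag == b"name":
            break
    else:
        raise ValueError("no 'name' table found")

    _format, count, string_offset = struct.unpack_from(">HHH", data, offset)
    names = {}
    for i in range(count):
        platform_id, _enc, _lang, name_id, length, str_off = struct.unpack_from(
            ">6H", data, offset + 6 + 12 * i)
        start = offset + string_offset + str_off
        if start + length > len(data):
            raise ValueError("name record points past the end of the file")
        # Unicode (0) and Windows (3) platforms store UTF-16BE strings;
        # anything else is treated as Mac Roman here, a simplification.
        encoding = "utf-16-be" if platform_id in (0, 3) else "mac_roman"
        raw = data[start:start + length]
        names.setdefault(name_id, raw.decode(encoding, errors="replace"))
    return names
```

Reading the file with open(path, "rb").read() and looking up names.get(1) and names.get(4) gives the family and full names without installing the font.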
