Does Pillow want my text parameters to be unicode or strings? - python

The function in question:
PIL.ImageDraw.Draw.text(xy, text, fill=None, font=None, anchor=None)
The issue is pretty standard: gibberish.
Right now, I'm passing a UTF-8 encoded string to the draw text function above, but it produces all those weird characters. However, if I just print it, the characters show up fine.
Should I pass a Unicode object instead?

This works correctly on Python 3.6.2 and 2.7.13 with pillow-4.2.1 (strings are Unicode by default in Python 3.x). Chinese didn't display with the default font, but Arial Unicode MS (ARIALUNI.TTF) worked.
#coding:utf8
from PIL import Image,ImageDraw,ImageFont
im = Image.new('1',(100,100))
draw = ImageDraw.Draw(im)
font = ImageFont.truetype(font='ARIALUNI.TTF',size=20)
draw.text((0,0),u'马克','white',font=font)
im.show()
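As a sketch of the point above, assuming Python 3 and a reasonably recent Pillow: decode any UTF-8 bytes to `str` before handing them to `draw.text()`, and load a TrueType font that actually contains the glyphs. In Python 2 the equivalent is a `unicode` object such as `u'马克'`, as in the answer. The helper names `as_text` and `draw_label` and the font path are illustrative, not part of Pillow.

```python
from PIL import Image, ImageDraw, ImageFont


def as_text(value, encoding="utf-8"):
    """Return a str, decoding raw bytes (e.g. UTF-8 read from a file) first."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


def draw_label(value, font_path, size=20):
    """Draw value in white on a small black grayscale image.

    font_path is an assumption: it must point to a font that covers the
    characters (e.g. ARIALUNI.TTF for Chinese). A font without the glyphs
    still renders boxes even when the text is a proper str.
    """
    im = Image.new("L", (200, 40), 0)
    draw = ImageDraw.Draw(im)
    font = ImageFont.truetype(font_path, size=size)
    draw.text((0, 0), as_text(value), fill=255, font=font)
    return im
```

Usage would look like `draw_label("马克".encode("utf-8"), "ARIALUNI.TTF").show()`; decoding happens in one place, so callers can pass either bytes or text.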

Related

Bengali words printing out all wrong in manim

I had been trying to animate Bengali characters using Manim. I used this method to use PC fonts in Manim. Everything seemed to be working well until I saw the output. For instance, if I write বাংলা লেখা, I get the output (look closely) বাংলা লখো. Most of the time it spits out absolutely meaningless words.
The code used was:
class test_3(Scene):
    def construct(self):
        text1 = Text('বাংলা লেখা', font='Akaash')
        text2 = Text('english text', font='Arial').move_to(DOWN)
        self.play(Write(text1), Write(text2))
        self.wait()
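One way to see what "scrambled" means here is to list the code points. Bengali stores the vowel sign ে (U+09C7) after its consonant even though it is drawn to its left, so the font and text shaper have to reorder it; and ো (U+09CB) is canonically the pair ে + া. If ে is not moved in front of খ, it ends up next to া, which is consistent with the reported "লখো". A sketch using only the standard library `unicodedata` module (the helper name `describe` is illustrative):

```python
import unicodedata


def describe(text):
    """Return (code point, Unicode name) pairs for every character in text."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>")) for ch in text]


# The word লেখা from the question, written with escapes to avoid ambiguity.
word = "\u09b2\u09c7\u0996\u09be"

for code_point, name in describe(word):
    print(code_point, name)

# BENGALI VOWEL SIGN O decomposes canonically into VOWEL SIGN E + VOWEL SIGN AA.
print(unicodedata.normalize("NFD", "\u09cb") == "\u09c7\u09be")
```

The stored order (LA, E, KHA, AA) differs from the visual order, which is why a font or renderer without proper Bengali shaping can produce scrambled words.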
Bangla texts can be displayed properly just by specifying a Bangla font in Text() or MarkupText().
For example, to display the Bangla text আইনস্টাইনের সমীকরণ in the Kalpurush font, you can write:
from manim import *
class bangla(Scene):
    def construct(self):
        text = Text("আইনস্টাইনের সমীকরণ", font="Kalpurush")
        self.play(Write(text))
Here, the font is locally installed. Many fonts can also be used directly from online sources via the Python package manim-fonts.
If you want to nicely show Bangla texts/sentences that contain inline-maths, you can use the LaTeX package latexbangla.
Here's an example code:
from manim import *
class bangla(Scene):
    def construct(self):
        myTemplate = TexTemplate(tex_compiler="xelatex", output_format=".pdf", preamble=r"\usepackage[banglamainfont=Kalpurush, banglattfont=Kalpurush]{latexbangla}")
        tex = Tex(r"আইনস্টাইনের সমীকরণ, $E^2=(mc^2)^2+(pc)^2$", tex_template=myTemplate)
        self.play(Write(tex))
N.B. The issue was also discussed on the GitHub repositories ManimCommunity/manim and 3b1b/manim.
This is because of the font type.
You should use a Bangla font. Try any font from here.
Try to use ANSI Bangla fonts like "SutonnyMJ". If you are using the Avro keyboard, you can use its "Output as ANSI" option.
Then, if you have chosen a font such as "SutonnyMJ", your code should look like this:
class test_3(Scene):
    def construct(self):
        text1 = Text('evsjv †jKAv', font='SutonnyMJ')
        text2 = Text('english text', font='Arial').move_to(DOWN)
        self.play(Write(text1), Write(text2))
        self.wait()
Here I've replaced বাংলা লেখা with evsjv †jKAv (the ANSI form of the same Unicode text), which renders as বাংলা লেখা because the font is now ANSI. I hope that Manim will support Unicode fonts soon.
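To make the ANSI/Unicode distinction concrete: a legacy "ANSI" font such as SutonnyMJ draws Bengali shapes for ordinary Windows-1252 characters, so the string you pass contains Latin letters and symbols and only looks right with that particular font. Real Unicode Bangla text lives in the Bengali block, U+0980 to U+09FF. A small sketch (the helper name is illustrative):

```python
BENGALI_BLOCK = range(0x0980, 0x0A00)  # Unicode Bengali block, U+0980..U+09FF


def has_bengali_code_points(text):
    """True if any character of text lies in the Unicode Bengali block."""
    return any(ord(ch) in BENGALI_BLOCK for ch in text)


unicode_text = "\u09ac\u09be\u0982\u09b2\u09be"  # বাংলা as real Unicode
ansi_text = "evsjv \u2020jKAv"  # the SutonnyMJ "ANSI" string from the answer

print(has_bengali_code_points(unicode_text))  # True
print(has_bengali_code_points(ansi_text))  # False
# The ANSI string is plain Windows-1252 text; the dagger is byte 0x86 there.
print(ansi_text.encode("cp1252"))
```

This is why the ANSI workaround breaks as soon as a different font is used, whereas Unicode text works with any font that covers the Bengali block.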
EDIT
I've found Bengali Unicode fonts to be working on Manim now. (24 March, 2021). I did this with Kalpurush font.
The code is
class FirstScene(Scene):
    def construct(self):
        text = Text("বাংলা অক্ষরে লেখা", font="Kalpurush")
        text2 = Text("Another text")
        self.play(Write(text), run_time=1)
        self.wait(3)
        self.remove(text)
        self.play(Write(text2))

MoviePy not displaying non-English characters

I'm trying to display non-English characters in MoviePy, but it's not displaying the actual characters I typed. The language I'm trying is Telugu. What is the problem in displaying the characters?
This is the code I'm using:
# -*- coding: utf-8 -*-
from moviepy.editor import *
# create clip from image
clip = ImageClip('img/1.jpg').on_color((1920, 1080))
clip = clip.set_duration(2)
# add annotation to clip
txtclip = TextClip('n+<ý² yûTq¿£yû', fontsize=50, color='red', font="Deepika")
cvc = CompositeVideoClip([ clip, txtclip.set_pos(('center', 'bottom'))])
cvc = cvc.set_duration(2)
# write video to file
cvc.write_videofile("text.mp4", fps=24)
The characters in the code look weird because I copied the text from the original file, where it is displayed as different (Telugu) characters. The same text displays correctly in a PySide QLabel.
It's just displaying boxes instead of the characters.
Can anyone help me with this issue? For reference, I've added an image of how the text in the code looks in that language.
I had a similar problem. I printed Japanese characters, but they were not displayed in video.
The problem was that the font did not support these characters.
Thus this produced an empty result:
import moviepy.editor as mp
my_text = mp.TextClip("すみません。お先に失礼します",
                      font="Amiri-regular", color="white", fontsize=34)
The solution was to download a font that supports these characters and pass it by file name, as in this example:
my_text2 = mp.TextClip("すみません。お先に失礼します",
                       font="wqy-microhei.ttc", color="white", fontsize=34)
I downloaded this font from GitHub; it can simply be placed in the same directory as the Python source code and referenced by file name.
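Since the root cause in both cases is a font without the needed glyphs, it can help to check coverage before rendering. A minimal sketch, assuming the third-party fontTools package (`pip install fonttools`) is available for reading a font's character map; `missing_chars` and `font_cmap` are illustrative helper names, not MoviePy API.

```python
def missing_chars(text, cmap):
    """Return the non-whitespace characters of text that are absent from cmap.

    cmap maps Unicode code points (ints) to glyph names, as returned by
    fontTools' TTFont.getBestCmap().
    """
    return sorted({ch for ch in text if not ch.isspace() and ord(ch) not in cmap})


def font_cmap(path):
    """Load the best Unicode cmap of a .ttf/.otf/.ttc file (needs fontTools)."""
    from fontTools.ttLib import TTFont  # imported lazily: third-party dependency

    # fontNumber=0 selects the first face of a .ttc collection;
    # it is ignored for single-font files.
    font = TTFont(path, fontNumber=0)
    return font.getBestCmap() or {}
```

For example, `missing_chars("すみません。お先に失礼します", font_cmap("wqy-microhei.ttc"))` should return an empty list if the font covers every character; a non-empty result lists exactly the characters that would render as boxes or vanish.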

Pytesseract foreign language extraction using python

I am using Python 2.7, Pytesseract-0.1.7 and Tesseract-ocr 3.05.01 on a Windows machine.
I tried to extract text in Korean and Russian, and I am confident that the extraction worked.
Now I need to compare a known string with the string extracted from the image.
The comparison never gives the correct result; it just prints "Not Match".
Here is my code:
# -*- coding: utf-8 -*-
from PIL import Image
import pytesseract
import argparse
ap = argparse.ArgumentParser()
ap.add_argument("-i", "--input", required=True, help="path to the image")
args = vars(ap.parse_args())
img = Image.open(args["input"])
img.load()
text = pytesseract.image_to_string(img)
print(text)
text = text.encode('ascii')
print(text)
i = 'Сред. Скорость'
print i
if text == i:
    print "Match"
else:
    print "Not Match"
The image used to extract text is attached.
Now I need a way to match them. I also need to know whether the string extracted by pytesseract will be Unicode or something else, and whether there is a way to convert it to Unicode (like the option in WordPad for converting characters to Unicode).
You are using Tesseract with a language other than English, so first of all, make sure that you have the trained data for your language installed, as shown here (Linux instructions only).
Secondly, I strongly suggest that you switch to Python 3 if you are working with non-ASCII languages (as I do, being Slovenian). Python 3 works with Unicode out of the box, so it really saves you tons of pain with encoding and decoding strings...
# python3 obligatory !!!
from PIL import Image
import pytesseract
img = Image.open("T9esw.png")
img.load()
text = pytesseract.image_to_string(img, lang="rus") #Specify language to look after!
print(text)
i = 'Сред. Скорость'
print(i)
if text == i:
    print("Match")
else:
    print("Not Match")
Which outputs:
Фред скорасть
Сред. Скорость
Not Match
This means the words didn't quite match, but considering the minimal coding effort and the awful quality of the input image, I think the performance is quite amazing. Anyway, the example shows that encoding and decoding should no longer be a problem.
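Because OCR output rarely matches a reference string exactly, a looser comparison can help. ASCII has no Cyrillic letters, so the question's `.encode('ascii')` step cannot work for this text; in Python 3, keep everything as `str` and decode any bytes as UTF-8 first. The sketch below normalizes both strings (NFC, collapsed whitespace, casefolding) and then uses `difflib.SequenceMatcher` for a similarity score. These normalization steps and the idea of a threshold are my own choices, not something pytesseract provides.

```python
import difflib
import unicodedata


def normalize_for_match(s):
    """NFC-normalize, collapse whitespace (incl. a trailing newline/form feed), casefold."""
    s = unicodedata.normalize("NFC", s)
    return " ".join(s.split()).casefold()


def similarity(a, b):
    """Similarity ratio in [0, 1] of the two normalized strings."""
    return difflib.SequenceMatcher(None, normalize_for_match(a), normalize_for_match(b)).ratio()


expected = "Сред. Скорость"
ocr_text = "Фред скорасть\n"  # the OCR output shown above, plus a trailing newline

print(normalize_for_match(ocr_text) == normalize_for_match(expected))  # False
print(round(similarity(ocr_text, expected), 2))

# Bytes (e.g. read from a file) must be decoded before comparing with str.
raw = expected.encode("utf-8")
print(raw.decode("utf-8") == expected)  # True
```

Whether a given score counts as a match depends on your images; pick a threshold by testing on real samples.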

ReportLab Django Not Rendering Chinese Characters

I'm having difficulty making ReportLab render Chinese Characters. From everything I've looked up people are saying that it is probably a font problem but I've used a lot of different fonts and it doesn't even seem to be using them at all. The Chinese characters always just come out as black squares. Below is some sample code of what I have.
# -*- coding: utf8 -*-
from io import BytesIO
from reportlab.lib.pagesizes import letter
from reportlab.lib.styles import getSampleStyleSheet, ParagraphStyle
from reportlab.lib.units import inch
from reportlab.pdfbase import pdfmetrics
from reportlab.pdfbase.ttfonts import TTFont
from reportlab.platypus import SimpleDocTemplate, Paragraph
pdfmetrics.registerFont(TTFont('Arial', 'arial.ttf', 'UTF-8'))
def build_pdf():  # enclosing function (its original name was not shown)
    buffer = BytesIO()
    doc = SimpleDocTemplate(buffer,
                            rightMargin=inch*0.5,  # 1/2 Inch
                            leftMargin=inch*0.5,  # 1/2 Inch
                            bottomMargin=0,
                            topMargin=inch*0.375,  # 3/8 Inch
                            pagesize=letter)
    # Get Styles
    styles = getSampleStyleSheet()
    # Custom Style
    styles.add(ParagraphStyle(name='Address', font='Arial', fontSize=8))
    elements = []
    elements.append(Paragraph(u'6905\u897f\u963f\u79d1\u8857\uff0c\u5927\u53a6\uff03\u5927', styles['Address']))
    doc.build(elements)
    # Get the value of the BytesIO buffer and write it to the response.
    pdf = buffer.getvalue()
    buffer.close()
    return pdf
I'm using an arial.ttf font found in the fonts folder of my Ubuntu 12.04 installation. I have also tried other fonts installed on this machine, and the output looks exactly the same with all of them, even for the digits; none of the Chinese characters come out as anything other than black squares.
Am I registering fonts wrong if even the numbers at the beginning aren't printing correctly? What could be causing the black squares?
Solved it. It turns out that in the ParagraphStyle it needs to be fontName="Arial", not font="Arial". I also learned some other ways of getting it to work, described below.
styles.add(ParagraphStyle(name='Address', fontName='Arial'))
After doing some digging, I've learned a few things that I hope help someone else in this situation. If you wrap the Unicode text inside your Paragraph in a <font> tag that sets the font explicitly, it works as well:
elements.append(Paragraph(u'<font name="Arial">6905\u897f\u963f\u79d1\u8857\uff0c\u5927\u53a6\uff03\u5927</font>', styles['Address']))
This also fixes the problem, at least for Paragraphs that use various fonts.
Choose a font that supports Chinese characters.
On Ubuntu, for example, I chose "AR PL UMing CN".
My code snippets:
# -*- coding: utf-8 -*-
...
pdfmetrics.registerFont(TTFont('AR PL UMing CN', 'uming.ttc'))
styles = getSampleStyleSheet()
...
styles.add(ParagraphStyle(name='Chinese', fontName='AR PL UMing CN', fontSize=20))
elements=[]
elements.append(Paragraph("成", styles['Chinese']))
doc.build(elements)
...
I can even switch to a Chinese input method and type the character directly. Hope this helps.

How to use unicode characters with PIL?

I would like to add Russian text to an image. I use PIL 1.1.7 and Python 2.7 on a Windows machine. Since my PIL build was compiled without the libfreetype library, I use the following on the development server:
font_text = ImageFont.load('helvR24.pil')
draw.text((0, 0), 'Текст на русском', font=font_text)
(helvR24.pil is taken from http://effbot.org/media/downloads/pilfonts.zip)
In the production environment I do the following:
font_text = ImageFont.truetype('HelveticaRegular.ttf', 24, encoding="utf-8")
draw.text((0, 0), 'Текст на русском', font=font_text)
(I also tried unic and cp-1251 instead of utf-8.)
In both cases it doesn't display Russian characters (squares or dummy characters are displayed instead). I think it doesn't work in the development environment because helvR24.pil most probably doesn't contain Russian characters (I don't know how to check that). But HelveticaRegular.ttf surely has them. I also checked that my .py file has utf-8 encoding. And it doesn't display Russian characters even with the default font:
draw.text((0, 0), 'Текст на русском', font=ImageFont.load_default())
What else should I try or verify? I've looked through https://stackoverflow.com/a/18729512/604388, but it doesn't help.
I had a similar issue and solved it.
There are a couple of things you have to be careful about:
1. Ensure that your strings are interpreted as Unicode, either by importing unicode_literals from __future__ or by prefixing your strings with u.
2. Ensure you are using a Unicode font. There are some free ones here: open-source Unicode typefaces. I suggest DejaVu.
Here is the code:
#!/usr/bin/python
# -*- coding: utf-8 -*-
from PIL import Image, ImageDraw, ImageFont
# configuration
font_size = 36
width = 500
height = 100
back_ground_color = (255, 255, 255)
font_color = (0, 0, 0)
unicode_text = u"\u2605" + u"\u2606" + u"Текст на русском"
im = Image.new("RGB", (width, height), back_ground_color)
draw = ImageDraw.Draw(im)
unicode_font = ImageFont.truetype("DejaVuSans.ttf", font_size)
draw.text((10, 10), unicode_text, font=unicode_font, fill=font_color)
im.save("text.jpg")
Can you examine your TTF file? I suspect that it doesn't support the characters you want to draw.
On my computer (Ubuntu 13.04), this sequence produces the correct image:
from PIL import Image, ImageDraw, ImageFont
ttf = ImageFont.truetype('/usr/share/fonts/truetype/msttcorefonts/Arial.ttf', 16)
im = Image.new("RGB", (512, 512), "white")
ImageDraw.Draw(im).text((0, 0), u'Текст на русском', fill='black', font=ttf)
im.show()
N.b. When I didn't specify unicode (u'...'), the result was mojibake.
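That mojibake is what you get when UTF-8 bytes are treated as one character per byte. The Python 3 sketch below simulates it with the latin-1 codec (an approximation of what happens to a Python 2 byte string, not PIL's exact internals) and shows the fix: decode the bytes as UTF-8 so the text is a real Unicode string before drawing.

```python
text = "Текст на русском"

utf8_bytes = text.encode("utf-8")
# Misreading each byte as its own character reproduces the garbled output.
mojibake = utf8_bytes.decode("latin-1")
print(mojibake)

# Decoding the same bytes with the correct codec recovers the original text.
print(utf8_bytes.decode("utf-8") == text)  # True
# Each Cyrillic letter takes two bytes in UTF-8, so the byte count is larger.
print(len(text), len(utf8_bytes))
```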
