I'm trying to convert a CAD font file to ttf for use with HTML using Python and Fontforge.
The program reads the font file data:
data = f.read(4)
glyph['offset'] = f.tell()
glyph['glyphname'] = data[1]*256 + data[0]                        # little-endian 16-bit value
glyph['pathsize'] = ((data[3]<<8)&0xff00) + (data[2]&0xff)        # the same kind of read, written another way
(Forgive the weird manipulation of the data bytes: I have been trying various ways of inputting the data in case there's something I'm doing wrong).
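For what it's worth, both expressions decode a little-endian 16-bit integer, so on Python 3 (where indexing a bytes object yields ints) the same reads can be written more directly. A sketch, assuming data is a bytes object of length 4:
glyph['glyphname'] = int.from_bytes(data[0:2], 'little')
glyph['pathsize'] = int.from_bytes(data[2:4], 'little')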
I then define the glyph by creating my character:
uniname = glyph['glyphname']
char = font.createChar(uniname)
pen = font[uniname].glyphPen()
This works fine until I get to the unicode character 260, when pdb tells me that there is a TypeError: Index out of bounds.
The funny thing is that, if I run the following instead:
for i in range(253, 280):
    uniname = i
    print(uniname)
    char = font.createChar(uniname)
    pen = font[uniname].glyphPen()
Then it happily accepts all the values without complaint.
I'm baffled.
I finally got this to work.
Instead of doing this:
char=font.createChar(uniname)
pen=font[uniname].glyphPen()
I did this:
char=font.createChar(uniname)
pen=char.glyphPen()
In the first example, the glyph is created for the uniname character using font.createChar() and the pen is then fetched from the font's list of characters with font[uniname].
In the second example, the glyph is created as before, but the pen is taken directly from the glyph object that createChar() returned, and I no longer get the 'index out of bounds' error.
I have no idea why this works, but hope it will help someone else with similar issues.
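One plausible explanation (unverified): in FontForge's Python API, indexing a font with an integer looks up an encoding slot rather than a Unicode code point, so font[260] can fall outside the current encoding even though createChar(260) succeeded. Taking the pen from the glyph object returned by createChar() sidesteps that lookup entirely. A minimal sketch of the working pattern:
import fontforge

font = fontforge.font()
for code_point in range(253, 280):
    char = font.createChar(code_point)  # returns the glyph object directly
    pen = char.glyphPen()               # no font[...] lookup involved
    # ... draw the outline with the pen ...
    pen = None                          # release the pen when finished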
I'm trying to input() a string containing a large paste of JSON.
(Why I'm pasting a large blob of json is outside the scope of my question, but please believe me when I say I have a not-completely-idiotic reason!)
However, input() only grabs the first 4095 characters of the paste, for reasons described in this answer.
My code looks roughly like this:
import json
foo = input()
json.loads(foo)
When I paste a blob of JSON that's longer than 4095 characters, json.loads(foo) raises an error. (The error varies based on the specifics of how the JSON gets cut off, but it invariably fails one way or another because it's missing the final }.)
I looked at the documentation for input(), and it made no mention of anything that looked useful for this issue. No flags to input in non-canonical mode, no alternate input()-style functions to handle larger inputs, etc.
Is there a way to be able to paste large inputs successfully? This would make my tool's workflow way less janky than having to paste into a file, save it somewhere, and then pass the file's location into the script.
Python has to follow the terminal's rules, but you can make a system call from Python to change the terminal behaviour and change it back afterwards (Linux):
import subprocess, json

subprocess.check_call(["stty", "-icanon"])     # stop the terminal limiting a line to 4095 chars
try:
    result = json.loads(input())
finally:
    subprocess.check_call(["stty", "icanon"])  # restore canonical mode even if parsing fails
Alternatively, consider trying to get an indented JSON dump from your provider that you can read line by line from standard input, then decode:
data = "".join(sys.stdin.readlines())
result = json.loads(data)
I'm trying to read some information from an Excel file using the xlrd module. This works fine most of the time, but when the script encounters any Scandinavian letters it stops. I've read several posts about Unicode and encoding, but I must admit I'm not familiar with it.
The cell I'm reading contains text (a string) and is read as unicode (as normal with xlrd). One example of a value that fails is Glørmestervej, which xlrd reads as u'Gl\xf8rmestervej'. If I try to print the variable, the script stops. I've had most success encoding the value with latin1:
print cellValue.encode("latin1")
which gives the result Glormestervej, but with a KeyError.
How do I get the variable to become a string with ø instead of \xf8? The reason is that I need to use it as input to another service, and that does not seem to work with unicode.
Regards, Torbjørn
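For reference, the mechanics of the conversion itself are straightforward in Python 2; which target encoding is right depends on what the downstream service expects (latin-1 here is an assumption):
value = u'Gl\xf8rmestervej'          # unicode object, as read by xlrd
as_latin1 = value.encode('latin-1')  # byte string 'Gl\xf8rmestervej' (one byte per character)
as_utf8 = value.encode('utf-8')      # byte string 'Gl\xc3\xb8rmestervej'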
I'm happy to say the problem has been solved; in fact, there was no error after all. There were some permission issues with the user that I used for calling the service in which the variable was used. Thank you for your response!
I'm generating file names from a list pulled out of a Postgres DB with Python 2.7.9. The list contains words with special characters. Normally I use ''.join() to build the name and hand it to my loader, but there is one name that won't be recognized. The .py file is set for UTF-8 encoding, but the words are in Portuguese, which I think is Latin-1 encoded.
from pydub import AudioSegment
from pydub.playback import play

templist = ['+ Orégano', '- Búfala', '+ Rúcola']
count_ins = len(templist) - 1
while count_ins >= 0:
    kot_istructions = AudioSegment.from_ogg('/home/effe/voice_orders/Voz/' + "".join(templist[count_ins]) + '.ogg')
    count_ins -= 1
    play(kot_istructions)
The first two files are loaded:
/home/effe/voice_orders/Voz/+ Orégano.ogg
/home/effe/voice_orders/Voz/- Búfala.ogg
The third should be:
/home/effe/voice_orders/Voz/+ Rúcola.ogg
But Python is trying to load:
/home/effe/voice_orders/Voz/+ R\xc3\xbacola.ogg
Why just this one? I've tried using normalize() to remove the accent, but since this is a byte string the method didn't work.
print works well, as does the DB update; just the file-name creation doesn't work as expected.
Suggestions?
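One note on the normalize() attempt: unicodedata.normalize() operates on unicode objects, not byte strings, which may be why it appeared to do nothing. A Python 2 sketch of stripping accents after decoding first:
import unicodedata

name = 'R\xc3\xbacola'.decode('utf-8')         # u'R\xfacola'
nfkd = unicodedata.normalize('NFKD', name)     # u'Ru\u0301cola' (accent split off)
ascii_only = nfkd.encode('ascii', 'ignore')    # 'Rucola'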
It seems the root cause might be that the encoding of these names is inconsistent within your database.
If you run:
>>> 'R\xc3\xbacola'.decode('utf-8')
You get
u'R\xfacola'
which is in fact a Python unicode object, correctly representing the name. So, what should you do? Although it's a really unclean programming style, you could play .encode()/.decode() whack-a-mole, where you try to decode the raw string from your db using utf-8 and, failing that, latin-1. It would look something like this:
try:
    clean_unicode = dirty_string.decode('utf-8')
except UnicodeDecodeError:
    clean_unicode = dirty_string.decode('latin-1')
As a general rule, always work with clean unicode objects within your own source, and only convert to an encoding on saving it out. Also, don't let people insert data into a database without specifying the encoding, as that will stop you from having this problem in the first place.
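To illustrate that rule with the file names from the question (db_row is a hypothetical record): decode as early as possible, encode as late as possible.
raw = db_row['name']             # byte string straight from the database (hypothetical)
name = raw.decode('utf-8')       # work with a unicode object internally
path = u'/home/effe/voice_orders/Voz/%s.ogg' % name
filename = path.encode('utf-8')  # encode only at the filesystem boundary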
Hope that helps!
Solved: it was a problem with the file. Deleting it and building it again did the job.
Specifically, I am receiving a stream of bytes from a TCP socket that looks something like this:
inc_tcp_data = b'\x02hello\x1cthisisthedata'
The stream uses hex values to delimit the different parts of the incoming data. However, I want to use inc_tcp_data in the following format:
converted_data = '\x02hello\x1cthisisthedata'
Essentially, I want to get rid of the b prefix and just spit out literally what came in.
I've tried various struct.unpack methods as well as .decode("encoding"). I could not get the former to work at all, and the latter would strip out the hex values when there was no visible way to encode them, or convert them to characters when it could. Any ideas?
Update:
I was able to get my desired result with the following code:
inc_tcp_data = b'\x02hello\x3Fthisisthedata'.decode("ascii")
d = repr(inc_tcp_data)
print(d)
print(len(d))
print(len(inc_tcp_data))
the output is:
'\x02hello?thisisthedata'
25
20
however, this still doesn't help me, because I do actually need the regular expression that follows to see \x02 as a single byte and not as a four-character string.
what am I doing wrong?
UPDATE
I've solved this issue by not solving it. The reason I wanted the hex characters to remain unchanged was so that a regular expression would be able to detect it further down the road. However what I should have done (and did) was simply change the regular expression to analyze the bytes without decoding it. Once I had separated out all the parts via regular expression, I decoded the parts with .decode("ascii") and everything worked out great.
I'm just updating this if it happens to help someone else.
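A sketch of that approach (Python 3), matching the bytes directly with a bytes pattern and decoding the captured parts afterwards; the frame layout here (STX, header, 0x1C separator, payload) is made up for illustration:
import re

inc_tcp_data = b'\x02hello\x1cthisisthedata'

match = re.match(rb'\x02(?P<header>[^\x1c]+)\x1c(?P<payload>.+)', inc_tcp_data)
if match:
    header = match.group('header').decode('ascii')    # 'hello'
    payload = match.group('payload').decode('ascii')  # 'thisisthedata'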
Assuming you are using Python 3:
>>> inc_tcp_data.decode('ascii')
'\x02hello\x1cthisisthedata'
I have a task where I needed to generate a portable version of a data dictionary, with some extra fields inserted. I ended up building a somewhat large Python dictionary, which I then wanted to convert to JSON. However, when I attempt this conversion...
import json

with open('CPS14_data_dict.json', 'w') as f:
    json.dump(data_dict, f, indent=4, encoding='utf-8')
I get smacked with an exception:
UnicodeDecodeError: 'utf8' codec can't decode byte 0x92 in position 17: invalid start byte
I know this is a common error, but I cannot, for the life of me, find an instance of how one locates the problematic portion in the dictionary. The only plausible thing I have seen is to convert the dictionary to a string and run ord() on each character:
for i, c in enumerate(str(data_dict)):
    if ord(c) > 128:
        print i, '|', c
The problem is, this operation returns nothing at all. Am I missing something about how ord() works? Alternatively, a position (17) is reported, but it's not clear to me what this refers to. There do not appear to be any problems at the 17th character, row, or entry.
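For what it's worth, the reported position in a UnicodeDecodeError is a byte offset within whichever individual string failed to decode, not a position in the dictionary as a whole. The ord() scan likely finds nothing because str(data_dict) runs repr() on the contained strings, which escapes the offending byte into the four ASCII characters \x92. A Python 2 sketch of a recursive walk (find_bad_strings is a hypothetical helper) that tries to decode every byte string and reports the ones that fail:
def find_bad_strings(obj, path='root'):
    if isinstance(obj, dict):
        for k, v in obj.items():
            find_bad_strings(v, '%s[%r]' % (path, k))
    elif isinstance(obj, list):
        for i, v in enumerate(obj):
            find_bad_strings(v, '%s[%d]' % (path, i))
    elif isinstance(obj, str):
        try:
            obj.decode('utf-8')
        except UnicodeDecodeError as e:
            print path, '->', repr(obj), e

find_bad_strings(data_dict)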
I should say that I know about the ensure_ascii=False option. Indeed, it will write to disk (and it's beautiful). This approach, however, seems to just kick the can down the road. I get the same encoding error when I try to read the file back in. Since I will want to use this file for multiple purposes (converted back to a dictionary), this is an issue.
It would also be helpful to note that this is my work computer with Windows 7, so I don't have my shell tools to explore the file (and my VM is on the fritz).
Any help would be greatly appreciated.