Converting to Precomposed Unicode String using Python-AppKit-ObjectiveC


This document by Apple, Technical Q&A QA1235, describes a way to convert Unicode strings from a decomposed to a precomposed form. Since I have a problem with file names containing some characters (e.g. a grave accent), I'd like to try the conversion function
void CFStringNormalize(CFMutableStringRef theString,
CFStringNormalizationForm theForm);
I am using this with Python and the AppKit library. If I pass a Python string as an argument, I get:
CoreFoundation.CFStringNormalize("abc",0)
2009-04-27 21:00:54.314 Python[4519:613] * -[OC_PythonString _cfNormalize:]: unrecognized selector sent to instance 0x1f02510
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: NSInvalidArgumentException - * -[OC_PythonString _cfNormalize:]: unrecognized selector sent to instance 0x1f02510
I suppose this is because a CFMutableStringRef is needed as an argument. How do I convert a Python String to CFMutableStringRef?

OC_PythonString (which is what Python strings are bridged to) is an NSString subclass, so you could get an NSMutableString with:
mutableString = NSMutableString.alloc().initWithString_("abc")
then use mutableString as the argument to CFStringNormalize.
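As an alternative that avoids the Cocoa bridge entirely, Python's standard-library unicodedata module can perform the same normalization. Note that in CoreFoundation's CFStringNormalizationForm enumeration, the value 0 passed in the question is kCFStringNormalizationFormD (canonical decomposition); precomposition corresponds to kCFStringNormalizationFormC. The sketch below is a swapped-in technique (unicodedata.normalize with "NFC"), not the CFStringNormalize call from the answer, and the helper name to_precomposed is hypothetical.

```python
import unicodedata


def to_precomposed(name):
    """Return `name` in Unicode Normalization Form C (precomposed).

    Hypothetical helper: a stdlib alternative to calling CFStringNormalize
    with kCFStringNormalizationFormC through PyObjC.
    """
    return unicodedata.normalize("NFC", name)


# 'e' followed by U+0300 COMBINING GRAVE ACCENT: the decomposed (NFD) form,
# similar to what HFS+ uses for file names.
decomposed = "e\u0300"
precomposed = to_precomposed(decomposed)
print(len(decomposed), len(precomposed))  # 2 1
```

Going the other way, unicodedata.normalize("NFD", ...) produces the decomposed form.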

Related

Expressions with Python 3.6 'f' strings not compatible with `.format` or `.format_map`

I discovered that .format_map and .format are not compatible with Python 3.6 f-strings: the native f prefix allows complex expressions (such as slices and function calls), but .format_map does not. For example:
>>> version = '1.13.8.10'
>>> f'example-{".".join(version.split(".")[:3])}'
'example-1.13.8'
>>> 'example-{".".join(version.split(".")[:3])}'.format_map(dict(version=version))
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
KeyError: '"'
>>> 'example-{".".join(version.split(".")[:3])}'.format(version=version)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
KeyError: '"'
I really want to be able to expose the full capability of Python f strings via a configuration file, where the user supplies a string which may contain non-trivial {...} segments which reference the same file and optionally do some basic data manipulation, e.g. within a YAML file I post-process a subset of keys using .format so they can reference variables from within the same config file.
This works fine for simple variables, e.g. {version} where the YAML file is a dictionary with a version key and I pass the dict in as an argument to .format_map, but throws KeyError with more complicated expressions (as shown above).
There must be a way to get the same functionality as f strings... I thought .format_map was it... but it doesn't offer complex expressions...

You are looking for eval
>>> def format_map_eval(string, mapping):
...     return eval(f'f{string!r}', mapping)
...
>>> version = '1.13.8.10'
>>> some_config_string = 'example-{".".join(version.split(".")[:3])}'
>>> format_map_eval(some_config_string, dict(version=version))
'example-1.13.8'
This at least makes it explicit that you are providing this capability.
The key feature of f-strings is that they evaluate arbitrary expressions inside their formatting brackets. If you want a function call that does this, you are asking for eval.
Now that I think about it, I'm not sure this is safe or portable across implementations, because as far as I know there is no guarantee about the repr of str.
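A slightly more defensive sketch of the same eval-based idea, assuming a plain dict stands in for the parsed YAML config (no YAML library is used here, and the keys are hypothetical). It copies the mapping before handing it to eval(), because eval() inserts a __builtins__ key into the globals dictionary it receives when that key is missing. Like the answer's version, it executes any code the template contains, so it is only suitable for trusted configuration files.

```python
def format_map_eval(template, mapping):
    """Evaluate `template` as if it were an f-string, looking names up in `mapping`.

    WARNING: this executes arbitrary code embedded in `template`; only use it
    on trusted input.
    """
    # eval() adds a '__builtins__' entry to the globals dict it receives,
    # so pass a copy to avoid mutating the caller's mapping.
    namespace = dict(mapping)
    return eval("f" + repr(template), namespace)


# A plain dict standing in for a loaded YAML config (hypothetical keys).
config = {
    "version": "1.13.8.10",
    "name": 'example-{".".join(version.split(".")[:3])}',
}
config["name"] = format_map_eval(config["name"], config)
print(config["name"])  # example-1.13.8
```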

f-string format specifier with None throws TypeError

Using plain f-strings with a NoneType object works:
>>> a = None
>>> f'{a}'
'None'
However, when using a format specifier, it breaks, as does str.format():
>>> f'{a:>6}'
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: unsupported format string passed to NoneType.__format__
>>> '{:>6}'.format(a)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: unsupported format string passed to NoneType.__format__
Unexpectedly (for me, at least), the old C-style string formatting works:
>>> '%10s' % a
'      None'
What is going on here? I don't understand why f'{a:>6}' doesn't evaluate to '  None'. Why should a format specifier break it?
Is this a bug in python? If it is a bug, how would I fix it?

None is not a string, so f'{None:>6}' makes no sense. You can convert it to a string with f'{None!s:>6}'. !a, !s, and !r call ascii(), str(), and repr() respectively on an object.

None doesn't support format specifiers. It's up to each object type to determine how it wants to handle format specifiers, and the default is to reject them:
The __format__ method of object itself raises a TypeError if passed any non-empty string.
None inherits this default.
You seem to be expecting None to handle format specifiers the same way strings do, where '{:>6}'.format('None') == '  None'. It kind of sounds like you expect all types to handle format specifiers the way strings do, or you expect the string behavior to be the default. The way strings handle format specifiers is specific to strings; other types have their own handling.
You might be thinking, hey, why doesn't %10s fail too? First, the s requests that the argument be converted to a string by str before any further processing. Second, all conversion specifier handling in printf-style string formatting is performed by str.__mod__; it never delegates to the arguments to figure out what a conversion specifier means.
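The default behavior described above can be checked directly. This sketch (the Money class is a hypothetical example, not from the question) shows object.__format__ accepting only an empty spec, the !s conversion turning None into a str before the spec is applied, and a type opting in to format specs by defining its own __format__.

```python
class Money:
    """Hypothetical type that defines its own format-spec handling."""

    def __init__(self, amount):
        self.amount = amount

    def __format__(self, spec):
        # Apply the spec to the underlying number, then add a currency prefix.
        return "$" + format(self.amount, spec)


# object.__format__ (inherited by None) accepts only an empty spec...
print(repr(format(None, "")))  # 'None'
try:
    format(None, ">6")
except TypeError as exc:
    print("TypeError:", exc)

# ...but !s converts to str first, so str's format handling applies.
print(repr(f"{None!s:>6}"))  # '  None'

# A type with its own __format__ decides what the spec means.
print(f"{Money(3.14159):.2f}")  # $3.14
```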

The accepted answer above explains why. A solution that I have used effectively is something along the lines of:
f"{myvalue:.2f}" if myvalue is not None else ""
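The same conditional can be wrapped in a small helper so it stays readable inside larger f-strings. format_optional is a hypothetical name; it calls the built-in format() only when the value is not None and otherwise returns a placeholder.

```python
def format_optional(value, spec, default=""):
    """Format `value` with `spec`, or return `default` when value is None."""
    if value is None:
        return default
    return format(value, spec)


price = None
print(f"[{format_optional(price, '.2f', 'n/a'):>6}]")  # [   n/a]
price = 3.14159
print(f"[{format_optional(price, '.2f'):>6}]")  # [  3.14]
```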

Why does ord() fail when porting from Python 2 to Python 3?

I am trying to port a Python library called heroprotocol from Python 2 to Python 3. This library is used to parse replay files from an online game called Heroes of the Storm, in order to get data from the file (e.g. who played against whom, when each player died, when the game ended, who won, etc.).
It seems that this library was created for Python 2, and since I am using Python 3 (specifically Anaconda, Jupyter notebook) I would like to convert it to Python 3.
The specific issue I am having is that when I run
header = protocol.decode_replay_header(mpq.header['user_data_header']['content'])
which should get some basic data about the replay file, I get this error:
TypeError: ord() expected string of length 1, but int found
I googled the ord() function and found a few posts about the usage of ord() in Python 3, but none of them solved the issue I am having. I also tried posting in the "Issues" section on Github, but I got no response yet.
Why am I seeing this error?

According to the issue you raised, the exception occurs on line 69 of decoders.py:
self._next = ord(self._data[self._used])
The obvious reason this would succeed in Python 2 but fail in Python 3 is that self._data is a bytestring. In Python 2, bytestrings are the "standard" string objects, so that indexing into one returns the character at that position (itself a string) …
# Python 2.7
>>> b'whatever'[3]
't'
… and calling ord() on the result behaves as expected:
>>> ord(b'whatever'[3])
116
However, in Python 3, everything is different: the standard string object is a Unicode string, and bytestrings are instead sequences of integers. Because of this, indexing into a bytestring returns the relevant integer directly …
# Python 3.6
>>> b'whatever'[3]
116
… so calling ord() on that integer makes no sense:
>>> ord(b'whatever'[3])
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: ord() expected string of length 1, but int found
So, you ought to be able to prevent the specific exception you're asking about here by simply removing the call to ord() on that and similar lines:
self._next = self._data[self._used]
… although of course it's likely that further problems (out of scope for this question) will be revealed as a result.
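If the decoder still needs to run on both Python 2 and Python 3 during the port, one option (a sketch, not heroprotocol's actual code; byte_at is a hypothetical helper) is to slice instead of index: a one-element slice of a bytestring is a length-1 bytes object on Python 3 and a length-1 str on Python 2, and ord() accepts both.

```python
def byte_at(data, index):
    """Return the integer value of data[index] on both Python 2 and 3."""
    # A one-element slice is always a length-1 bytestring, which ord()
    # accepts on either version. index is assumed to be non-negative and in
    # range: otherwise the slice is empty and ord() raises TypeError.
    return ord(data[index:index + 1])


data = b'whatever'
print(data[3])  # 116 on Python 3 (indexing yields an int)
print(byte_at(data, 3))  # 116 on Python 2 and 3
```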

Python formatted string literals for objects

One very cool new feature of Python 3.6 is formatted string literals (https://docs.python.org/3.6/whatsnew/3.6.html#whatsnew36-pep498).
Unfortunately, they do not behave like the well-known format() function:
>>> a = "abcd"
>>> print(f"{a[:2]}")
ab
As you can see, slicing works (in fact, any Python expression on the string does).
But format() does not work with slicing:
>>> print("{a[:2]}".format(a="abcd"))
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: string indices must be integers
Is there a way to get the functionality of the new formatted string literals on string objects?
>>> string_object = "{a[:2]}"  # may also be coming from a file
>>> # some way to get the result 'ab' with 'string_object'

The str.format syntax does not and will not support the full range of expressions that f-strings do. You'll have to evaluate the slice expression yourself, outside of the string, and supply the result to the format function instead:
a = "abcd"
string_object = "{a}".format(a=a[:2])
It should also be noted there are subtle differences between the syntax allowed by f-strings and str.format, so that the former is not strictly a superset of the latter.

Nope. When an index inside a replacement field is not a plain integer, str.format uses it as a string key, so "{a[:2]}" looks up a[':2'] rather than taking a slice. That's why you get that error; it tries to index the string with a str index:
>>> a = "abcd"
>>> a[':2']
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: string indices must be integers
It really isn't meant for cases like that; "{a[::]}".format(a=a) would likewise be evaluated as a['::'].
This is one of the reasons f-strings were introduced: to allow arbitrary Python expressions to be formatted.
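To see what the str.format field syntax does and does not support, this sketch contrasts a plain integer index, a string key, and the slice from the question, then applies the precompute-the-slice workaround from the first answer.

```python
a = "abcd"

# A decimal index is converted to an int: this is a[1].
print("{a[1]}".format(a=a))  # b

# Any other index is used as a str key: this is d["key"].
print("{d[key]}".format(d={"key": 42}))  # 42

# So "{a[:2]}" becomes a[":2"], which raises TypeError for a str.
try:
    "{a[:2]}".format(a=a)
except TypeError as exc:
    print("TypeError:", exc)

# Workaround: evaluate the slice yourself and pass the result in.
print("{prefix}".format(prefix=a[:2]))  # ab
```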

base64.encodestring failing in python 3

The following piece of code runs successfully on a python 2 machine:
base64_str = base64.encodestring('%s:%s' % (username,password)).replace('\n', '')
I am trying to port it over to Python 3 but when I do so I encounter the following error:
>>> a = base64.encodestring('{0}:{1}'.format(username,password)).replace('\n','')
Traceback (most recent call last):
File "/auto/pysw/cel55/python/3.4.1/lib/python3.4/base64.py", line 519, in _input_type_check
m = memoryview(s)
TypeError: memoryview: str object does not have the buffer interface
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/auto/pysw/cel55/python/3.4.1/lib/python3.4/base64.py", line 548, in encodestring
return encodebytes(s)
File "/auto/pysw/cel55/python/3.4.1/lib/python3.4/base64.py", line 536, in encodebytes
_input_type_check(s)
File "/auto/pysw/cel55/python/3.4.1/lib/python3.4/base64.py", line 522, in _input_type_check
raise TypeError(msg) from err
TypeError: expected bytes-like object, not str
I tried searching for examples of encodestring usage but was not able to find good documentation. Am I missing something obvious? I am running this on RHEL 2.6.18-371.11.1.el5

You can encode() the string (to convert it to a byte string) before passing it to base64.encodestring. Example:
base64_str = base64.encodestring(('%s:%s' % (username,password)).encode()).decode().strip()

To expand on Anand's answer (which is quite correct), Python 2 made little distinction between "Here's a string which I want to treat like text" and "Here's a string which I want to treat like a sequence of 8-bit byte values". Python 3 firmly distinguishes the two, and doesn't let you mix them up: the former is the str type, and the latter is the bytes type.
When you Base64 encode a string, you're not actually treating the string as text; you're treating it as a series of 8-bit byte values. That's why you're getting an error from base64.encodestring() in Python 3: it is an operation that deals with the string's characters as 8-bit bytes, so you should pass it a parameter of type bytes rather than a parameter of type str.
Therefore, to convert your str object into a bytes object, you have to call its encode() method to turn it into a sequence of 8-bit byte values, in whatever Unicode encoding you have chosen to use. (This should be UTF-8 unless you have a very specific reason to choose something else, but that's another topic.)
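For the credentials case in the question, a minimal sketch (basic_auth_token is a hypothetical name) is to encode the str to UTF-8 bytes, Base64-encode them, and decode the ASCII result back to str. It uses base64.b64encode rather than encodestring: encodestring is only a deprecated alias of encodebytes and was removed in Python 3.9, and b64encode, unlike encodebytes, never inserts line breaks, so no .replace('\n', '') is needed.

```python
import base64


def basic_auth_token(username, password):
    """Return base64("username:password") as a str, with no line breaks."""
    raw = "{}:{}".format(username, password).encode("utf-8")  # str -> bytes
    return base64.b64encode(raw).decode("ascii")  # bytes -> str


token = basic_auth_token("user", "secret")
print(token)
print(base64.b64decode(token))  # b'user:secret'

# encodebytes() wraps its output every 76 characters; b64encode() does not.
long_input = b"x" * 100
print(base64.encodebytes(long_input).count(b"\n"))  # 2
print(base64.b64encode(long_input).count(b"\n"))  # 0
```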

In Python 3, the source of encodestring (in base64.py) reads:
def encodestring(s):
    """Legacy alias of encodebytes()."""
    import warnings
    warnings.warn("encodestring() is a deprecated alias, use encodebytes()", DeprecationWarning, 2)
    return encodebytes(s)
Here is working code for Python 3.5.1; it also shows how to URL-encode:
import base64
import urllib.parse

def _encodeBase64(consumer_key, consumer_secret):
    """
    :type consumer_key: str
    :type consumer_secret: str
    :rtype: str
    """
    # 1. URL encode the consumer key and the consumer secret according to RFC 1738.
    # urlencode() returns 'bla=<encoded value>'; slicing off 'bla=' keeps just the value.
    dummy_param_name = 'bla'
    key_url_encoded = urllib.parse.urlencode({dummy_param_name: consumer_key})[len(dummy_param_name) + 1:]
    secret_url_encoded = urllib.parse.urlencode({dummy_param_name: consumer_secret})[len(dummy_param_name) + 1:]
    # 2. Concatenate the encoded consumer key, a colon character ":", and the encoded consumer secret into a single string.
    credentials = '{}:{}'.format(key_url_encoded, secret_url_encoded)
    # 3. Base64 encode the string from the previous step.
    bytes_base64_encoded_credentials = base64.encodebytes(credentials.encode('utf-8'))
    # encodebytes() adds line breaks, so strip them from the result.
    return bytes_base64_encoded_credentials.decode('utf-8').replace('\n', '')
(I am sure it could be more concise; I am new to Python...)
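The dummy-parameter slicing works because urlencode() quotes each value with urllib.parse.quote_plus by default. Calling quote_plus directly gives the same encoded value, so a shorter sketch of the same three steps (encode_credentials is a hypothetical name) could look like this; it also swaps in b64encode, which does not insert line breaks.

```python
import base64
import urllib.parse


def encode_credentials(consumer_key, consumer_secret):
    """URL-encode key and secret, join them with ':', and Base64-encode the result."""
    # quote_plus() is what urlencode() applies to each value by default.
    key = urllib.parse.quote_plus(consumer_key)
    secret = urllib.parse.quote_plus(consumer_secret)
    credentials = "{}:{}".format(key, secret).encode("utf-8")
    # b64encode() never adds newlines, unlike encodebytes().
    return base64.b64encode(credentials).decode("ascii")


print(encode_credentials("my key", "s3cr/t"))
```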
Also see: http://pythoncentral.io/encoding-and-decoding-strings-in-python-3-x/
