Python string "b" prefix (byte literals) - python

I am looking through some unit testing code and I found this:
self.assertIn(b'Hello', res.body)
I know that this means bytes in Python 3 which returns a byte array, as I found here. I believe that the code was written for Python 3.3 and am trying to figure out how it works in other versions (in my case 2.7) The related question that I found had a poorly-written accepted answer with contradictory comments that confused me.
Questions:
In what versions of python does b'myString' "work"?
How does it behave in python 2.x?
How does it behave in python 3.x?
Does that have something to do with the byte literal change?

This is all described in the document you linked.
In what versions of python does b'myString' "work"?: 2.6+.
How does it behave in python 2.x? It creates a bytes literal—which is the exact same thing as a str literal in 2.x.
How does it behave in python 3.x? It creates a bytes literal—which is not the same thing as a str literal in 3.x.
Does that have something to do with the byte literal change? Yes. That's the whole point; it lets you write "future compatible" code—or code that works in both 2.6+ and 3.0+ without 2to3.
Quoting from the first paragraph in the section you linked:
For future compatibility, Python 2.6 adds bytes as a synonym for the str type, and it also supports the b'' notation.
Note that, as it says a few lines down, Python 2.x bytes/str is not exactly the same type as Python 3.x bytes: "most notably, the constructor is completely different". But bytes literals are the same, except in the edge case where you're putting Unicode characters into a bytes literal (which has no defined meaning in 2.x, but does something arbitrary that may sometimes happen to be what you'd hoped, while in 3.x it's a guaranteed SyntaxError).

Related

Odd placeholder or pre-statement on string declaration [duplicate]

Like in:
u'Hello'
My guess is that it indicates "Unicode", is that correct?
If so, since when has it been available?
You're right, see 3.1.3. Unicode Strings.
It's been the syntax since Python 2.0.
Python 3 made them redundant, as the default string type is Unicode. Versions 3.0 through 3.2 removed them, but they were re-added in 3.3+ for compatibility with Python 2 to aide the 2 to 3 transition.
The u in u'Some String' means that your string is a Unicode string.
Q: I'm in a terrible, awful hurry and I landed here from Google Search. I'm trying to write this data to a file, I'm getting an error, and I need the dead simplest, probably flawed, solution this second.
A: You should really read Joel's Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!) essay on character sets.
Q: sry no time code pls
A: Fine. try str('Some String') or 'Some String'.encode('ascii', 'ignore'). But you should really read some of the answers and discussion on Converting a Unicode string and this excellent, excellent, primer on character encoding.
My guess is that it indicates "Unicode", is it correct?
Yes.
If so, since when is it available?
Python 2.x.
In Python 3.x the strings use Unicode by default and there's no need for the u prefix. Note: in Python 3.0-3.2, the u is a syntax error. In Python 3.3+ it's legal again to make it easier to write 2/3 compatible apps.
I came here because I had funny-char-syndrome on my requests output. I thought response.text would give me a properly decoded string, but in the output I found funny double-chars where German umlauts should have been.
Turns out response.encoding was empty somehow and so response did not know how to properly decode the content and just treated it as ASCII (I guess).
My solution was to get the raw bytes with 'response.content' and manually apply decode('utf_8') to it. The result was schöne Umlaute.
The correctly decoded
für
vs. the improperly decoded
fĂźr
All strings meant for humans should use u"".
I found that the following mindset helps a lot when dealing with Python strings: All Python manifest strings should use the u"" syntax. The "" syntax is for byte arrays, only.
Before the bashing begins, let me explain. Most Python programs start out with using "" for strings. But then they need to support documentation off the Internet, so they start using "".decode and all of a sudden they are getting exceptions everywhere about decoding this and that - all because of the use of "" for strings. In this case, Unicode does act like a virus and will wreak havoc.
But, if you follow my rule, you won't have this infection (because you will already be infected).

SQL Alchemy : what is the 'u' in the selection query [duplicate]

Like in:
u'Hello'
My guess is that it indicates "Unicode", is that correct?
If so, since when has it been available?
You're right, see 3.1.3. Unicode Strings.
It's been the syntax since Python 2.0.
Python 3 made them redundant, as the default string type is Unicode. Versions 3.0 through 3.2 removed them, but they were re-added in 3.3+ for compatibility with Python 2 to aide the 2 to 3 transition.
The u in u'Some String' means that your string is a Unicode string.
Q: I'm in a terrible, awful hurry and I landed here from Google Search. I'm trying to write this data to a file, I'm getting an error, and I need the dead simplest, probably flawed, solution this second.
A: You should really read Joel's Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!) essay on character sets.
Q: sry no time code pls
A: Fine. try str('Some String') or 'Some String'.encode('ascii', 'ignore'). But you should really read some of the answers and discussion on Converting a Unicode string and this excellent, excellent, primer on character encoding.
My guess is that it indicates "Unicode", is it correct?
Yes.
If so, since when is it available?
Python 2.x.
In Python 3.x the strings use Unicode by default and there's no need for the u prefix. Note: in Python 3.0-3.2, the u is a syntax error. In Python 3.3+ it's legal again to make it easier to write 2/3 compatible apps.
I came here because I had funny-char-syndrome on my requests output. I thought response.text would give me a properly decoded string, but in the output I found funny double-chars where German umlauts should have been.
Turns out response.encoding was empty somehow and so response did not know how to properly decode the content and just treated it as ASCII (I guess).
My solution was to get the raw bytes with 'response.content' and manually apply decode('utf_8') to it. The result was schöne Umlaute.
The correctly decoded
für
vs. the improperly decoded
fĂźr
All strings meant for humans should use u"".
I found that the following mindset helps a lot when dealing with Python strings: All Python manifest strings should use the u"" syntax. The "" syntax is for byte arrays, only.
Before the bashing begins, let me explain. Most Python programs start out with using "" for strings. But then they need to support documentation off the Internet, so they start using "".decode and all of a sudden they are getting exceptions everywhere about decoding this and that - all because of the use of "" for strings. In this case, Unicode does act like a virus and will wreak havoc.
But, if you follow my rule, you won't have this infection (because you will already be infected).

GhPython vs Python: TypeErrorException from GH but not from Shell

I am using a library called py_openshowvar to communicate with a Kuka robot arm from grasshopper.
Everything works fine if I run the program from a command line python shell. However, when I run the same thing from GhPython, I get the following:
Not sure why I get the exceptions with GhPython but not when I run it outside the GH environment. The program still connects to the server and retrieves/sends the info I need, but I want to make sure and address this exception.
Thanks!
It is hard to tell you how to fix the error as you do not provide the code that triggers it, but in substance it comes from the fact that GHPython is IronPython (an implementation of Python based on .Net) whereas Python Shell is an implementation written in C.
The two implementations are similar but you sometimes hit a difference.
In your case the script is expecting a string or tuple but gets an IronPython.Runtime.Bytes.
Hmm, got bytes when expecting str looks like a unicode string vs byte string problem. You do no describe what are the versions of your CPython and GHPython, but you should know that Python 2 strings are byte strings while Python 3 ones are unicode strings.
If in Python 2 you can force a litteral string to be unicode by prepending it with u: u"foo" is a unicode string. You can also decode a byte string to its unicode version: b'ae\xe9\xe8'.decode('Latin1') is the unicode string u'aeéè'

how to convert Python 2 unicode() function into correct Python 3.x syntax

I enabled the compatibility check in my Python IDE and now I realize that the inherited Python 2.7 code has a lot of calls to unicode() which are not allowed in Python 3.x.
I looked at the docs of Python2 and found no hint how to upgrade:
I don't want to switch to Python3 now, but maybe in the future.
The code contains about 500 calls to unicode()
How to proceed?
Update
The comment of user vaultah to read the pyporting guide has received several upvotes.
My current solution is this (thanks to Peter Brittain):
from builtins import str
... I could not find this hint in the pyporting docs.....
As has already been pointed out in the comments, there is already advice on porting from 2 to 3.
Having recently had to port some of my own code from 2 to 3 and maintain compatibility for each for now, I wholeheartedly recommend using python-future, which provides a great tool to help update your code (futurize) as well as clear guidance for how to write cross-compatible code.
In your specific case, I would simply convert all calls to unicode to use str and then import str from builtins. Any IDE worth its salt these days will do that global search and replace in one operation.
Of course, that's the sort of thing futurize should catch too, if you just want to use automatic conversion (and to look for other potential issues in your code).
You can test whether there is such a function as unicode() in the version of Python that you're running. If not, you can create a unicode() alias for the str() function, which does in Python 3 what unicode() did in Python 2, as all strings are unicode in Python 3.
# Python 3 compatibility hack
try:
unicode('')
except NameError:
unicode = str
Note that a more complete port is probably a better idea; see the porting guide for details.
Short answer: Replace all unicode calls with str calls.
Long answer: In Python 3, Unicode was replaced with strings because of its abundance. The following solution should work if you are only using Python 3:
unicode = str
# the rest of your goes goes here
If you are using it with both Python 2 or Python 3, use this instead:
import sys
if sys.version_info.major == 3:
unicode = str
# the rest of your code goes here
The other way: run this in the command line
$ 2to3 package -w
First, as a strategy, I would take a small part of your program and try to port it. The number of unicode calls you are describing suggest to me that your application cares about string representations more than most and each use-case is often different.
The important consideration is that all strings are unicode in Python 3. If you are using the str type to store "bytes" (for example, if they are read from a file), then you should be aware that those will not be bytes in Python3 but will be unicode characters to begin with.
Let's look at a few cases.
First, if you do not have any non-ASCII characters at all and really are not using the Unicode character set, it is easy. Chances are you can simply change the unicode() function to str(). That will assure that any object passed as an argument is properly converted. However, it is wishful thinking to assume it's that easy.
Most likely, you'll need to look at the argument to unicode() to see what it is, and determine how to treat it.
For example, if you are reading UTF-8 characters from a file in Python 2 and converting them to Unicode your code would look like this:
data = open('somefile', 'r').read()
udata = unicode(data)
However, in Python3, read() returns Unicode data to begin with, and the unicode decoding must be specified when opening the file:
udata = open('somefile', 'r', encoding='UTF-8').read()
As you can see, transforming unicode() simply when porting may depend heavily on how and why the application is doing Unicode conversions, where the data has come from, and where it is going to.
Python3 brings greater clarity to string representations, which is welcome, but can make porting daunting. For example, Python3 has a proper bytes type, and you convert byte-data to unicode like this:
udata = bytedata.decode('UTF-8')
or convert Unicode data to character form using the opposite transform.
bytedata = udata.encode('UTF-8')
I hope this at least helps determine a strategy.
You can use six library which have text_type function (unicode in py2, str in py3):
from six import text_type

What's the u prefix in a Python string?

Like in:
u'Hello'
My guess is that it indicates "Unicode", is that correct?
If so, since when has it been available?
You're right, see 3.1.3. Unicode Strings.
It's been the syntax since Python 2.0.
Python 3 made them redundant, as the default string type is Unicode. Versions 3.0 through 3.2 removed them, but they were re-added in 3.3+ for compatibility with Python 2 to aide the 2 to 3 transition.
The u in u'Some String' means that your string is a Unicode string.
Q: I'm in a terrible, awful hurry and I landed here from Google Search. I'm trying to write this data to a file, I'm getting an error, and I need the dead simplest, probably flawed, solution this second.
A: You should really read Joel's Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!) essay on character sets.
Q: sry no time code pls
A: Fine. try str('Some String') or 'Some String'.encode('ascii', 'ignore'). But you should really read some of the answers and discussion on Converting a Unicode string and this excellent, excellent, primer on character encoding.
My guess is that it indicates "Unicode", is it correct?
Yes.
If so, since when is it available?
Python 2.x.
In Python 3.x the strings use Unicode by default and there's no need for the u prefix. Note: in Python 3.0-3.2, the u is a syntax error. In Python 3.3+ it's legal again to make it easier to write 2/3 compatible apps.
I came here because I had funny-char-syndrome on my requests output. I thought response.text would give me a properly decoded string, but in the output I found funny double-chars where German umlauts should have been.
Turns out response.encoding was empty somehow and so response did not know how to properly decode the content and just treated it as ASCII (I guess).
My solution was to get the raw bytes with 'response.content' and manually apply decode('utf_8') to it. The result was schöne Umlaute.
The correctly decoded
für
vs. the improperly decoded
fĂźr
All strings meant for humans should use u"".
I found that the following mindset helps a lot when dealing with Python strings: All Python manifest strings should use the u"" syntax. The "" syntax is for byte arrays, only.
Before the bashing begins, let me explain. Most Python programs start out with using "" for strings. But then they need to support documentation off the Internet, so they start using "".decode and all of a sudden they are getting exceptions everywhere about decoding this and that - all because of the use of "" for strings. In this case, Unicode does act like a virus and will wreak havoc.
But, if you follow my rule, you won't have this infection (because you will already be infected).

Categories