So I am looking at code that I wrote perhaps as much as 4 years ago, and I know it ran correctly at the time. But now I am trying to run it on a different computer, and I am getting an error. (Tried it today on both Windows 10 and Ubuntu.)
I am using Python 2.7, as I was then. I am using the struct module to unpack C types from files; specifically, I am trying to unpack 4-byte long values.
Here is the documentation for the 2.7 struct library. https://docs.python.org/2/library/struct.html
If you scroll down to the "Format Characters" section, you can see a table of the C types.
Here is my code:
bps = int(unpack('L', fmap[o+10:o+14])[0])
And here is the error I get.
error: unpack requires a string argument of length 8
The part that is confusing me is the "length 8" part. If I change the C type to "I", the code executes fine. But the documentation seems clear that "L" is 4 bytes as well, and it worked in the past. I think I can use the "I" type for my purposes, but I am curious whether anyone else has seen this.
The standard size of L is 4 bytes, but the standard sizes only apply if you make the byte order explicit by beginning the format string with >, <, !, or =. Otherwise, the platform-dependent native size for your machine is used (in this case, 8 bytes).
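For example (a sketch; the sizes shown are what a typical 64-bit Linux build of CPython reports, and bps, fmap, and o are the names from the question):

from struct import unpack, calcsize

print(calcsize('L'))   # native size: 8 on a typical 64-bit Linux build
print(calcsize('<L'))  # standard size: always 4

# so the original line becomes:
bps = int(unpack('<L', fmap[o+10:o+14])[0])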
I am using a library called py_openshowvar to communicate with a KUKA robot arm from Grasshopper.
Everything works fine if I run the program from a command-line Python shell. However, when I run the same thing from GhPython, I get the following:
Not sure why I get the exceptions with GhPython but not when I run it outside the GH environment. The program still connects to the server and retrieves/sends the info I need, but I want to make sure I address this exception.
Thanks!
It is hard to tell you how to fix the error, as you do not provide the code that triggers it, but in substance it comes from the fact that GhPython is IronPython (an implementation of Python built on .NET) whereas the command-line shell runs CPython (the reference implementation, written in C).
The two implementations are similar but you sometimes hit a difference.
In your case the script is expecting a string or tuple but gets an IronPython.Runtime.Bytes.
Hmm, getting bytes when expecting str looks like a unicode-string vs byte-string problem. You do not say which versions of CPython and GhPython you are using, but you should know that Python 2 strings are byte strings while Python 3 strings are unicode strings.
In Python 2 you can force a literal string to be unicode by prepending it with u: u"foo" is a unicode string. You can also decode a byte string to its unicode version: b'ae\xe9\xe8'.decode('Latin1') is the unicode string u'aeéè'.
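For instance, a minimal Python 2 sketch of the round trip between the two string types:

# -*- coding: utf-8 -*-
raw = b'ae\xe9\xe8'            # a byte string (Latin-1 encoded)
text = raw.decode('latin-1')   # the unicode string u'aeéè'
back = text.encode('latin-1')  # encode back to the original bytes
assert back == raw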
I enabled the compatibility check in my Python IDE and now I realize that the inherited Python 2.7 code has a lot of calls to unicode() which are not allowed in Python 3.x.
I looked at the Python 2 docs and found no hint on how to upgrade:
I don't want to switch to Python3 now, but maybe in the future.
The code contains about 500 calls to unicode()
How to proceed?
Update
The comment of user vaultah to read the pyporting guide has received several upvotes.
My current solution is this (thanks to Peter Brittain):
from builtins import str
...I could not find this hint in the pyporting docs.
As has already been pointed out in the comments, there is already advice on porting from 2 to 3.
Having recently had to port some of my own code from 2 to 3 and maintain compatibility for each for now, I wholeheartedly recommend using python-future, which provides a great tool to help update your code (futurize) as well as clear guidance for how to write cross-compatible code.
In your specific case, I would simply convert all calls to unicode to use str and then import str from builtins. Any IDE worth its salt these days will do that global search and replace in one operation.
Of course, that's the sort of thing futurize should catch too, if you just want to use automatic conversion (and to look for other potential issues in your code).
You can test whether there is such a function as unicode() in the version of Python that you're running. If not, you can create a unicode() alias for the str() function, which does in Python 3 what unicode() did in Python 2, as all strings are unicode in Python 3.
# Python 3 compatibility hack
try:
    unicode('')
except NameError:
    unicode = str
Note that a more complete port is probably a better idea; see the porting guide for details.
Short answer: Replace all unicode calls with str calls.
Long answer: In Python 3, the unicode type was renamed to str, since all strings in Python 3 are Unicode. The following solution should work if you are only using Python 3:
unicode = str
# the rest of your code goes here
If you need the code to work with both Python 2 and Python 3, use this instead:
import sys

if sys.version_info.major == 3:
    unicode = str

# the rest of your code goes here
Another way: run 2to3 from the command line:
$ 2to3 package -w
First, as a strategy, I would take a small part of your program and try to port it. The number of unicode calls you are describing suggests to me that your application cares about string representations more than most, and each use case is often different.
The important consideration is that all strings are unicode in Python 3. If you are using the str type to store "bytes" (for example, if they are read from a file), then you should be aware that those will not be bytes in Python3 but will be unicode characters to begin with.
Let's look at a few cases.
First, if you do not have any non-ASCII characters at all and really are not using the Unicode character set, it is easy. Chances are you can simply change the unicode() function to str(). That will ensure that any object passed as an argument is properly converted. However, it is wishful thinking to assume it's that easy.
Most likely, you'll need to look at the argument to unicode() to see what it is, and determine how to treat it.
For example, if you are reading UTF-8 characters from a file in Python 2 and converting them to Unicode, your code would look like this:
data = open('somefile', 'r').read()
udata = unicode(data, 'UTF-8')
However, in Python 3, read() returns Unicode data to begin with, and the decoding must be specified when opening the file:
udata = open('somefile', 'r', encoding='UTF-8').read()
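A cross-version variant is also possible; this sketch uses io.open, which behaves like Python 3's open even under Python 2:

import io

# returns decoded unicode text under both Python 2 and Python 3
udata = io.open('somefile', 'r', encoding='UTF-8').read()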
As you can see, how simply unicode() can be transformed when porting depends heavily on how and why the application does Unicode conversions, where the data comes from, and where it is going.
Python 3 brings greater clarity to string representations, which is welcome, but it can make porting daunting. For example, Python 3 has a proper bytes type, and you convert byte data to unicode like this:
udata = bytedata.decode('UTF-8')
or convert Unicode data back to bytes using the opposite transform:
bytedata = udata.encode('UTF-8')
I hope this at least helps determine a strategy.
You can use the six library, which has text_type (unicode in Python 2, str in Python 3):
from six import text_type
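A minimal sketch of how it can stand in for the old unicode() calls:

from six import text_type

label = text_type(42)  # u'42' on Python 2, '42' on Python 3
assert isinstance(label, text_type)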
I have a Python2 codebase that makes extensive use of str to store raw binary data. I want to support both Python2 and Python3.
The bytes type in Python 2 (merely an alias of str) and bytes in Python 3 are completely different. They take different constructor arguments, index to different types, and have different str and repr results.
What's the best way of unifying the code for both Python versions, using a single type to store raw data?
The python-future package has a backport of the Python3 bytes type.
>>> from builtins import bytes # in py2, this picks up the backport
>>> b = bytes(b'ABCD')
This provides the Python 3 interface in both Python 2 and Python 3. In Python 3, it is the builtin bytes type. In Python 2, it is a compatibility layer on top of the str type.
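For instance, element access then behaves the Python 3 way on both interpreters (a sketch, assuming the backport is installed on Python 2):

>>> b[0]    # indexing yields an int, as in Python 3
65
>>> b[0:1]  # slicing still yields bytes
b'A'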
I don't know which parts you want to work with as bytes; I almost always work with bytearrays, and this is how I do it when reading from a file:
with open(file, 'rb') as imageFile:
    f = imageFile.read()
    b = bytearray(f)
I took that right out of a project I am working on, and it works in both 2 and 3. Maybe something for you to look at?
If your project is small and simple, use six.
Otherwise I suggest having two independent codebases: one for Python 2 and one for Python 3. Initially it may sound like a lot of unnecessary work, but eventually it's actually a lot easier to maintain.
As an example of what your project may become if you decide to support both Pythons in a single codebase, take a look at Google's protobuf: lots of often counterintuitive branching all round the code, and abstractions that were modified just to allow hacks. And as your project evolves it won't get better: deadlines play against code quality.
With two separate codebases you will simply apply almost identical patches, which isn't a lot of work compared to what is ahead of you if you want a single codebase. And it will be easier to migrate to Python 3 completely once the number of Python 2 users of your package drops.
Assuming you only need to support Python 2.6 and newer, you can simply use bytes for, well, bytes. Use b literals to create bytes objects, such as b'\x0a\x0b\x00'. When working with files, make sure the mode includes a b (as in open('file.bin', 'rb')).
Beware that iteration and element access are different, though. In those cases, you can write your code to use slices: instead of b[0] == 0 (Python 3) or b[0] == b'\x00' (Python 2), write b[0:1] == b'\x00'. Other options are using bytearray (when the bytes are mutable) or helper functions.
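For instance, one possible helper (a sketch; the name iter_byte_chunks is illustrative):

def iter_byte_chunks(data):
    # yields each byte as a length-1 bytes object on Python 2 and 3 alike
    for i in range(len(data)):
        yield data[i:i + 1]

# list(iter_byte_chunks(b'\x00\x01')) == [b'\x00', b'\x01'] in both versions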
Strings of characters should be unicode in Python 2, independent of Python 3 porting; otherwise the code would likely be wrong when encountering non-ASCII characters anyway. The equivalent is str in Python 3.
Either use u literals to create character strings (such as u'Düsseldorf') and/or make sure to start every file with from __future__ import unicode_literals. Declare file encodings when necessary by starting files with # encoding: utf-8.
Use io.open to read character strings from files. For network code, fetch bytes and call decode on them to get a character string.
If you need to support Python 2.5 or 3.2, have a look at six to convert literals.
Add plenty of assertions to make sure that functions which operate on character strings don't get bytes, and vice versa. As usual, a good test suite with 100% coverage helps a lot.
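For instance, one way to write such an assertion (text_type here is a local helper name, not a library import):

import sys

text_type = str if sys.version_info[0] >= 3 else unicode

def shout(message):
    assert isinstance(message, text_type), 'expected a character string'
    return message.upper()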
I have a Python units package. I want Å and angstrom to both be aliases for angstroms, so that people can use whichever one they prefer (Å is easier to read but angstrom is easier to type). Since unicode identifiers are forbidden in Python 2, the Å option will obviously only be available in Python 3. My question is: is there any way to have a single source file that works in both Python 2 and Python 3, and has this variable defined in Python 3 only?
The naive solution if sys.version_info >= (3,): Å = angstrom does not work, because Python 2 raises a syntax error.
Normally, I don't encourage anything other than ASCII characters for variable names. However, this is an interesting idea, since the angstrom symbol actually has its standard meaning in this context, so I guess I'm cool with it this time :-).
I think you should be able to accomplish this with:
globals()['Å'] = angstrom
and this will "work" on both Python 2.x and Python 3.x. Of course, your Python 2.x users won't be able to reference it in their code without resorting to weird hacks like getattr(units, 'Å'), but it won't throw an error either, which is the point of the question (I think).
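Put together, a hypothetical units module might look like this (the value of angstrom is illustrative only):

# -*- coding: utf-8 -*-
angstrom = 1e-10      # illustrative value, in metres
angstroms = angstrom  # plain-ASCII alias

# legal as a statement in both versions, since the name only appears
# inside a string literal, never as an identifier
globals()['Å'] = angstrom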
I have a C pipe client (pulled directly from the CallNamedPipe example found here) that, if given the string "first", sends the following message to my Python pipeserver:
b'f\x00i\x00r\x00s\x00t\x00\x00\x00'
The struct documentation gives examples where both the packing and the unpacking are done in Python. In that case I know the format because I explicitly specified it when I called struct.pack.
Is there some way for me to either a) infer the format from the above output or b) set the format in C the same way I do in Python?
Here's the relevant client code:
LPTSTR lpszPipename = TEXT("\\\\.\\pipe\\testpipe");
LPTSTR lpszWrite = TEXT("first");
fSuccess = CallNamedPipe(
    lpszPipename,   // pipe name
    lpszWrite,      // message to server
    ...
>>> b'f\x00i\x00r\x00s\x00t\x00\x00\x00'.decode('utf-16le')
u'first\x00'
"The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)"
Your C code is not writing a struct to the pipe; it is writing a null-terminated string encoded as little-endian UTF-16 text, which is what the TEXT() macro produces when you compile your Windows program in Unicode mode for an Intel (little-endian) CPU. Python knows how to decode these strings without using the struct module. Try this:
null_terminated_unicode_string = data.decode('utf-16le')
unicode_string = null_terminated_unicode_string[:-1]
You can use decode('utf-16') if your Python code is running on the same CPU architecture as the C program that writes the data. You might want to read up on Python's unicode codecs.
EDIT: You can infer the type of that data by knowing how UTF-16 and Windows string macros work, but Python cannot infer it. You could set the string encoding in C the same way you would in Python if you wanted to write some code to do so, but it's probably not worth your time.