I need to unpack data from a C structure in Python, which I am doing with the following code:
struct.unpack_from('>I', file.read(4))[0]
and afterwards, writing changed values back:
new_value = struct.pack('>I', 008200)
file.write(new_value)
A few examples:
008200 raises a SyntaxError: invalid token.
000010 is written as: 8
000017 is written as: 15
000018 raises a SyntaxError as well.
I have no idea what kind of conversion that is.
Any kind of help would be great.
This is invalid Python code and is not related to the struct module. In Python 2, integer literals starting with a zero are octal (base 8), so Python tries to parse 008200 as octal, but '8' is not a valid octal digit. Assuming you wanted decimal, use 8200. If you wanted hex, use 0x8200.
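For example, the corrected write would look like this (a minimal sketch using the file object from the question, and assuming the value was meant as decimal 8200):
import struct

new_value = struct.pack('>I', 8200)       # decimal
# new_value = struct.pack('>I', 0x8200)   # if the value was meant as hex
file.write(new_value)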
I thought this would be simple, but I spent quite some time trying to figure it out.
I want to convert an integer into a byte string and display it in hex format, but I seem to get the ASCII representation instead. Specifically, for the int value 122:
from struct import *
pack("B",122) #this returns b'z', what i need is 'b\x7A'
pack("B",255) #this returns b'\xff', which is fine.
I know that in Python 2.x you can use something like chr(), but not in Python 3, which is what I have. Ideally the solution would work in both.
You can use the codecs module (which works on both Python 2 and 3) or, on Python 2 only, string encoding:
import codecs
codecs.encode(pack("B",122),"hex")
or (Python 2 only):
a = pack("B",122)
a.encode("hex")
I think you are getting the results you desire, and that whatever you are using to look at your results is causing the confusion. Try running this code:
from struct import *
x = pack("B",122)
assert 123 == x[0] + 1
You will discover that it works as expected and the assertion does not fail.
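The point is that b'z' and b'\x7a' are the same value; Python simply displays printable bytes as ASCII characters. A quick check (my own illustration, not part of the original answer):
>>> from struct import pack
>>> pack("B", 122) == b'\x7a'
True
>>> pack("B", 122) == b'z'
True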
I have been working on a program in which I am trying to convert a big binary string and pack it into a file. I have tried for days to make this work. Here is the code I wrote to pack the large binary string:
binaryRecieved="11001010101....(Shortened)"
f=open(fileName,'wb')
m=long(binaryRecieved,2)
struct.pack('i',m)
f.write(struct.pack('i',m))
f.close()
quit()
I am left with the error
struct.pack('i',x)
struct.error: integer out of range for 'i' format code
My integer is out of range, so I was wondering if there is a different way of going about with this.
Thanks
Convert your bit string to a byte string: see for example this question, Converting bits to bytes in Python. Then pack the bytes with struct: note that 'c' packs a single byte, so for a whole byte string use the 's' format code with a count (or just write the bytes directly).
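A minimal sketch of that approach (my own illustration; the bit string and file name are made up):
import struct

bits = "0110100001101001"   # "hi"; length must be a multiple of 8
data = bytearray(int(bits[i:i+8], 2) for i in range(0, len(bits), 8))
packed = struct.pack('%ds' % len(data), bytes(data))   # 's' takes a whole byte string
with open('out.bin', 'wb') as f:
    f.write(packed)   # f.write(data) would work just as well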
For encoding m in big-endian order (like "ten" being written as "10" in normal decimal use) use:
def as_big_endian_bytes(i):
    out = bytearray()
    while i:
        out.append(i & 0xff)   # grab the lowest byte
        i = i >> 8             # shift the next byte down
    out.reverse()              # most significant byte first
    return out
For encoding m in little-endian order (like "ten" being written as "01" in normal decimal use) use:
def as_little_endian_bytes(i):
    out = bytearray()
    while i:
        out.append(i & 0xff)   # lowest byte first
        i = i >> 8
    return out
Both functions work on numbers - like you do in your question - so the returned bytearray may be shorter than expected, because leading zeroes do not matter for numbers.
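For example (my own check): as_big_endian_bytes(258) returns bytearray(b'\x01\x02') and as_little_endian_bytes(258) returns bytearray(b'\x02\x01'), while as_big_endian_bytes(int('000010', 2)) returns just bytearray(b'\x02'), however many leading zeroes the original string had.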
For an exact representation of a binary-digit string (which is only possible if its length is divisible by 8) you would have to do:
def as_bytes(s):
    assert len(s) % 8 == 0          # only whole bytes can be represented exactly
    out = bytearray()
    for i in range(0, len(s), 8):   # take the string 8 bits at a time
        out.append(int(s[i:i+8], 2))
    return out
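On Python 3 the same number-to-bytes conversion is built in as int.to_bytes (my addition, not part of the original answer):
m = int(binaryRecieved, 2)
data = m.to_bytes((m.bit_length() + 7) // 8, 'big')   # big-endian bytes; leading zero bits are dropped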
In struct.pack you have used 'i', which represents a 4-byte integer and is therefore limited in range. As your error shows, your long value does not fit, so you may want to use 'q' (an 8-byte signed integer) instead of 'i'. Note that 'd' packs a floating-point double, which would silently lose precision for large integers, and even 'q' only holds 64 bits, so an arbitrarily long bit string needs the byte-by-byte approach above.
See Python struct for more information.
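To illustrate the size limits (my own example):
>>> import struct
>>> struct.pack('>i', 2**31 - 1)   # the largest value 'i' (4 bytes, signed) can hold
b'\x7f\xff\xff\xff'
>>> struct.pack('>q', 2**40)       # 'q' is 8 bytes, so this fits
b'\x00\x00\x01\x00\x00\x00\x00\x00'
>>> struct.pack('>i', 2**40)       # raises struct.error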
Just starting out with Python, so this is probably my mistake, but...
I'm trying out Python. I like to use it as a calculator, and I'm slowly working through some tutorials.
I ran into something weird today. I wanted to find out 2013*2013, but I mistyped and wrote 2013*013, and got this:
>>> 2013*013
22143
I checked with my calculator, and 22143 is the wrong answer! 2013 * 13 is supposed to be 26169.
Why is Python giving me a wrong answer? My old Casio calculator doesn't do this...
Because of octal arithmetic, 013 is actually the integer 11.
>>> 013
11
With a leading zero, 013 is interpreted as a base-8 number: 1×8¹ + 3×8⁰ = 11.
Note: this behaviour was changed in Python 3. Here is a particularly appropriate quote from PEP 3127:
The default octal representation of integers is silently confusing to
people unfamiliar with C-like languages. It is extremely easy to
inadvertently create an integer object with the wrong value, because
'013' means 'decimal 11', not 'decimal 13', to the Python language
itself, which is not the meaning that most humans would assign to this
literal.
013 is an octal integer literal (equivalent to the decimal integer literal 11), due to the leading 0.
>>> 2013*013
22143
>>> 2013*11
22143
>>> 2013*13
26169
It is very common (certainly in most of the languages I'm familiar with) to have octal integer literals start with 0 and hexadecimal integer literals start with 0x. Due to the exact confusion you experienced, Python 3 raises a SyntaxError:
>>> 2013*013
File "<stdin>", line 1
2013*013
^
SyntaxError: invalid token
and requires either 0o or 0O instead:
>>> 2013*0o13
22143
>>> 2013*0O13
22143
Python's 'leading zero' syntax for octal literals is a common gotcha:
Python 2.7.3
>>> 010
8
The syntax was changed in Python 3.x: http://docs.python.org/3.0/whatsnew/3.0.html#integers
This is mostly just expanding on @Wim's answer a bit, but Python indicates the base of integer literals using certain prefixes. Without a prefix, integers are interpreted as base-10. With a "0x" prefix, the integer is interpreted as hexadecimal. The full grammar specification is here, though it's a bit tricky to understand if you're not familiar with formal grammars: http://docs.python.org/2/reference/lexical_analysis.html#integers
The grammar essentially says that if you want a long value (i.e. one that exceeds the capacity of a normal int), write the number followed by the letter "L" or "l"; if you want your number interpreted as decimal, write it normally (with no leading 0); if you want it interpreted as octal, prefix it with "0", "0o", or "0O"; if you want it in hex, prefix it with "0x"; and if you want it in binary, prefix it with "0b" or "0B".
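A quick Python 2 session showing each of these prefixes (my own illustration):
>>> 10        # decimal
10
>>> 012       # old-style octal (Python 2 only)
10
>>> 0o12      # octal
10
>>> 0x0a      # hexadecimal
10
>>> 0b1010    # binary
10
>>> 10L       # long (Python 2 only)
10L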
I know this looks embarrassingly easy, and I guess the problem is that I just don't have a clear understanding of all this bytes-str-unicode (and encoding-decoding, speaking frankly) stuff yet.
I've been trying to get my working code to run on Python 3. The part I'm stuck with is when I parse an XML with lxml and decode a base64 string that is in that XML.
The code now works in the following manner:
I retrieve the binary data with an XPath query '.../binary/text()'. This produces a one-element list containing an lxml.etree._ElementUnicodeResult object. Then, with Python 2, I was able to do:
decoded = source.decode('base64')
and finally
output = numpy.frombuffer(decoded)
However, on Python 3 I get an error message saying
AttributeError: 'lxml.etree._ElementUnicodeResult' object has no attribute 'decode'
This is not so surprising, because lxml.etree._ElementUnicodeResult is a subclass of str.
Another way would be to get a real str with the same data in it with
binary = tree.xpath('//binary')[0]
binary_string = binary.text
That would be essentially the same. So what do I do to decode it from base64? I've looked at the base64 module, but it takes a bytes object as an argument, and I can't think of a way to present a str as bytes, because if I try to construct a bytes object, Python will try to encode the string, which I don't need.
Googling further, I came across the binascii module (which is invoked indirectly from base64 anyway, if I'm not mistaken), but calling binascii.b2a_base64() on my string produces
TypeError: 'str' does not support the buffer interface
P.S. I've even found an answered question on how to decode a hex string in Python 3, but this is done with a dedicated method bytes.fromhex() so I don't see how it would be helpful.
Could someone please tell me what I'm missing? I'm afraid most of the post is irrelevant and only aggravates my shame, but at least you guys know what I tried.
OK, I think I'm going to summarize my current understanding of things (feel free to correct me). Hopefully it will help someone else out there as confused as I've been.
The credit totally goes to thebjorn and delnan, of course.
So, starting with the most common things:
there's Unicode, and it's a global standard that assigns codes (or code points) to all the exotic characters you can imagine. Those codes are just integer numbers. As of Unicode 6.1 there are 109,975 graphic characters, says Wikipedia.
Then there are encodings that define how to designate Unicode characters with byte codes. One byte isn't enough to designate an arbitrary Unicode char. Although, if you only take a small subset of them (English alphabet, digits, punctuation, some control characters), you can do with one byte per character (or even 7 bits; see ASCII).
To pass a Unicode string anywhere, one needs to encode it in bytes, then it can be decoded on the other end.
In Python 2, str is actually bytes, and unicode is Unicode, but Python 2 will do implicit encoding/decoding for you when needed. It will try to use ASCII encoding.
In Python 3, str is always a Unicode string, and bytes is a new data type for actual bytes. No implicit conversion is ever done by Python 3, you always need to do it yourself and specify the encoding. That means that your program won't work until you understand what's going on, which totally happened to me.
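A quick Python 3 session illustrating the distinction (my own example):
>>> s = 'héllo'            # str: Unicode text
>>> b = s.encode('utf-8')  # bytes: the encoded representation
>>> b
b'h\xc3\xa9llo'
>>> b.decode('utf-8') == s
True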
Now, that being more or less clear, let's move on to base64 encoding, which is also an encoding of sorts, but has a slightly different meaning.
Suppose you have some binary data (i.e. bytes) that may mean anything (in my case it's a bunch of floats). Now you want to represent this binary array with a string. That's what base64 encoding means: you have your bytes represented as an ASCII string.
Base64 means 6 bits, so a single character in a base64-encoded string stands for 6 bits of your data. Since 4 characters carry 24 bits, exactly 3 bytes, base64-encoded strings need a length that is a multiple of 4: otherwise the number of bytes encoded would not be a whole number.
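For example (my own check):
>>> import base64
>>> base64.b64encode(b'abc')   # 3 bytes (24 bits) become exactly 4 characters
b'YWJj'
>>> base64.b64encode(b'ab')    # '=' padding keeps the length a multiple of 4
b'YWI='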
Finally, to decode from base64 you need an ASCII string; a Unicode string won't do, since there can only be characters from the base64 alphabet. The base64 module does the job in Python. The base64.b64decode() function takes a byte string as its argument, which in Python 2 means str and in Python 3 means bytes. So if you have a str, such as
>>> s = 'U3RhY2sgT3ZlcmZsb3c='
In Python 2 you could just do
>>> s.decode('base64')
because s is already in ASCII.
In Python 3, you need to encode it in ASCII first, so you'll have to do:
>>> base64.b64decode(s.encode('ascii'))
And by the way, this will return a bytes object, so it's really up to you how to treat those bytes then. In my case they're floats, but maybe yours should be decoded as ASCII :)
In Python 2 however it will be just a str. Anyway, have a look at struct for the tools to unpack your data from those bytes.
So if you need the code to work on both Python 2 and 3, go with the last one. To make sure you have Unicode in the end (if you are decoding text from base64), you'll have to decode it:
>>> base64.b64decode(s.encode('ascii')).decode('ascii')
On Python 2, encode('ascii') effectively changes nothing, because it's applied to a str: Python first does an implicit decode to unicode (using ASCII), and then encodes back to ASCII as requested. decode('ascii') will return a unicode object on Python 2.
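Putting it together for the original problem, a minimal Python 3 sketch (the tree variable and the elided XPath are from the question; note that recent Python 3 versions of base64.b64decode also accept an ASCII-only str directly):
import base64
import numpy

source = tree.xpath('.../binary/text()')[0]         # an lxml _ElementUnicodeResult, a str subclass
decoded = base64.b64decode(source.encode('ascii'))  # bytes in, bytes out
output = numpy.frombuffer(decoded)                  # default dtype is float64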
I don't have Python 3 installed, but it sounds like you need to convert the Unicode returned from lxml to bytes, perhaps by calling .encode('ascii') ?
I'm trying to convert a 2.5 program to 3.
Is there a way in python 3 to change a byte string, such as b'\x01\x02' to a python 2.5 style string, such as '\x01\x02', so that string and byte-by-byte comparisons work similarly to 2.5? I'm reading the string from a binary file.
I have a 2.5 program that reads bytes from a file, then compares or processes each byte or combination of bytes with specified constants. To run the program under 3, I'd like to avoid changing all my constants to bytes and byte strings ('\x01' to b'\x01'), then dealing with issues in 3 such as:
a = b'\x01'
b = b'\x02'
results in
(a+b)[0] != a
even though similar operations work in 2.5. I have to do (a+b)[0] == ord(a), while a+b == b'\x01\x02' works fine. (By the way, what do I do to (a+b)[0] so it equals a?)
Unpacking structures is also an issue.
Am I missing something simple?
bytes is an immutable sequence of integers (each in the range 0 <= x < 256), therefore when you access (a+b)[0] you get back an integer, exactly the same one you'd get by accessing a[0]. So when you compare the sequence a to the integer (a+b)[0], they are naturally different.
Using slice notation, however, you can get a sequence back:
>>> (a+b)[:1] == a # 1 == len(a) ;)
True
because slicing returns a bytes object.
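So, to answer the parenthetical "what do I do to (a+b)[0] so it equals a?": compare like with like (my own illustration):
>>> a = b'\x01'
>>> b = b'\x02'
>>> (a + b)[0] == a[0]   # int compared to int
True
>>> (a + b)[:1] == a     # bytes compared to bytes
True
>>> (a + b)[0] == a      # int compared to bytes
False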
I would also advise running the 2to3 utility (it needs to be run with py2k) to convert some code automatically. It won't solve all your problems, but it'll help a lot.