How to create zero string of defined length in Python 2.x

I want to create a string of 1024 zero bytes in Python 2.7.
I know that in Python 3.x I can just do data = bytes(1024), but in Python 2.7 bytes is just an alias for str, so the same call creates the string '1024'.
This is part of a system that generates a data file on the fly via Django, and the header needs to include an area of zero padding. Our server is stuck on Python 2.7.

You can probably use a bytearray instead:
data = bytearray(1024)
but if you need a bytes object (you probably don't), you can convert to bytes:
data = bytes(bytearray(1024))
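As a further sketch (not taken from the answer above), repeating a one-byte literal should also give a zero-filled byte string on both Python 2.7 and 3.x. The name HEADER_PAD_SIZE is purely illustrative.

```python
# Build 1024 zero bytes in a way that behaves the same on Python 2.7 and 3.x.
HEADER_PAD_SIZE = 1024  # hypothetical name for the padding length

# Repeating a one-byte literal gives str on Python 2 and bytes on Python 3.
padding = b'\x00' * HEADER_PAD_SIZE

# bytearray(n) is zero-filled on both versions; bytes() makes an immutable copy.
padding_from_bytearray = bytes(bytearray(HEADER_PAD_SIZE))

print(len(padding))
print(padding == padding_from_bytearray)
```

Either value can be written straight into a file opened in binary mode.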

Related

Python: How to send a hexadecimal string through a socket in Python 3 without encoding it?

I had socket communication working in Python 2, and now I have to make it work in Python 3. I have tried str.encode() with many encodings, but the other side of the network can't recognize what I send. The only thing I know is that the Python 3 str type is Unicode, encoded as UTF-8 by default, and I'm pretty sure the critical question here is what the format of the Python 2 str type is. I have to send exactly the same thing as what was stored in the Python 2 str. The tricky thing is that a Python 3 socket only sends encoded bytes or other objects supporting the buffer interface, rather than a str holding raw data as in Python 2. An example follows:
In Python 2:
fulldata = 'AA060100B155'
datasplit = [fulldata[i:i+2] for i in range(0, len(fulldata), 2)]
senddata = ''
for item in datasplit:
    itemdec = chr(int(item, 16))
    senddata += itemdec
print(senddata)
# '\xaa\x06\x01\x00\xb1U', which is the data I need
In Python 3, it seems I can only send the encoded bytes using senddata.encode(), but that is not the format I want. You can try:
print(senddata.encode('latin-1'))
#b'\xaa\x06\x01\x01\xb2U'
to see the difference between the two senddata values. Interestingly, the result is encoded incorrectly when using utf-8.
The data stored in the Python 3 str is what I need, but my question is: how do I send the data of that string without encoding it? Or how do I get the behavior of the Python 2 str type in Python 3?
Can anyone help me with this?

I had socket communication working in Python 2, and now I have to make it work in Python 3. I have tried str.encode() with many encodings, but the other side of the network can't recognize what I send.
You have to make sure that whatever you send can be decoded by the other side. The first step is to find out which encoding that network/file/socket uses. If, for instance, you send your data encoded as UTF-8 and the client expects ASCII, this will work as long as the text contains only ASCII characters, because UTF-8 encodes those identically. But if, say, cp500 is your client's encoding and you send the string encoded as UTF-8, this won't work. It's better to pass the name of your desired encoding explicitly to functions, because the default encoding of your platform is not necessarily UTF-8. You can always check the default encoding by calling sys.getdefaultencoding().
The only thing I know is that the Python 3 str type is Unicode, encoded as UTF-8 by default, and I'm pretty sure the critical question here is what the format of the Python 2 str type is. I have to send exactly the same thing as what was stored in the Python 2 str. The tricky thing is that a Python 3 socket only sends encoded bytes or other objects supporting the buffer interface, rather than a str holding raw data as in Python 2.
Yes, Python 3.X uses UTF-8 as the default encoding, but this is not guaranteed: in some cases the default encoding could be changed, so it's better to pass the name of the desired encoding explicitly. Note, though, that str in Python 3.X covers the text role of both unicode and str in 2.X, whereas str in 2.X supports only 8-bit (1-byte, 0-255) characters.
On one hand, your problem seems to lie with 3.X and its type distinction between str and bytes. APIs that expect bytes won't accept str in 3.X. This is unlike 2.X, where you can mix unicode and str freely. The distinction in 3.X makes sense, given that str represents decoded strings and is used for textual data, whereas bytes represents encoded strings as raw bytes with absolute byte values.
On the other hand, you have a problem choosing the right encoding in 3.X for the text that you need to pass to the client. First, check which encoding your client uses. Second, encode the string with that same encoding so your client can decode it properly: str.encode('same-encoding-as-client').
Because passing your data as str in 2.X works, I suspect your client most likely uses an 8-bit character encoding; something like Latin-1 might be the one.
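To make the Latin-1 suggestion concrete, here is a minimal Python 3 sketch. It relies on the fact that Latin-1 maps code points 0-255 one-to-one onto byte values 0-255. Whether the client in question really expects these bytes is an assumption. The variable names text, raw and utf8 are illustrative.

```python
# Python 3 sketch: compare Latin-1 and UTF-8 encodings of a chr()-built string.
fulldata = 'AA060100B155'

# Build the text string the same way the question does, one chr() per hex pair.
text = ''.join(chr(int(fulldata[i:i + 2], 16))
               for i in range(0, len(fulldata), 2))

raw = text.encode('latin-1')  # one byte per character, values unchanged
utf8 = text.encode('utf-8')   # characters above 127 become two bytes each

print(raw)
print(len(raw), len(utf8))
```

The length difference is why UTF-8 output looks "wrong" to a receiver that expects one byte per value.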

You can convert the whole string to an integer, then use the integer method to_bytes to convert it into a bytes object:
fulldata = 'AA060100B155'
senddata = int(fulldata, 16).to_bytes(len(fulldata)//2, byteorder='big')
print(senddata)
# b'\xaa\x06\x01\x00\xb1U'
The first parameter of to_bytes is the number of bytes; the second is the byte order, which is required before Python 3.11 (from 3.11 on it defaults to 'big').
See int.to_bytes in the official documentation for reference.
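If only Python 3 needs to be supported, the built-in bytes.fromhex classmethod is another option. The sketch below cross-checks it against the int.to_bytes approach; the name via_int is illustrative.

```python
# Python 3 only: bytes.fromhex parses pairs of hex digits directly.
fulldata = 'AA060100B155'
senddata = bytes.fromhex(fulldata)

# Cross-check against the int.to_bytes approach described above.
via_int = int(fulldata, 16).to_bytes(len(fulldata) // 2, byteorder='big')

print(senddata == via_int)
```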

There are various ways to do this. Here's one that works in both Python 2 and Python 3.
from binascii import unhexlify
fulldata = 'AA060100B155'
senddata = unhexlify(fulldata)
print(repr(senddata))
Python 2 output
'\xaa\x06\x01\x00\xb1U'
Python 3 output
b'\xaa\x06\x01\x00\xb1U'

The following is Python 2/3 compatible. The unhexlify function converts hexadecimal notation to bytes. Use a byte string and you don't have to deal with Unicode strings. Python 2 string literals are byte strings by default, but Python 2 also recognizes the b'' syntax that Python 3 requires for a byte string.
from binascii import unhexlify
fulldata = b'AA060100B155'
print(repr(unhexlify(fulldata)))
Python 2 output:
'\xaa\x06\x01\x00\xb1U'
Python 3 output:
b'\xaa\x06\x01\x00\xb1U'

Supporting python 2 and 3: str, bytes or alternative

I have a Python2 codebase that makes extensive use of str to store raw binary data. I want to support both Python2 and Python3.
The bytes type in Python2 (an alias of str) and bytes in Python3 are completely different. They take different arguments to construct, index to different types and have different str and repr.
What's the best way of unifying the code for both Python versions, using a single type to store raw data?

The python-future package has a backport of the Python3 bytes type.
>>> from builtins import bytes # in py2, this picks up the backport
>>> b = bytes(b'ABCD')
This provides the Python 3 interface in both Python 2 and Python 3. In Python 3, it is the builtin bytes type. In Python 2, it is a compatibility layer on top of the str type.

I don't know which parts you want to work with bytes in. I almost always work with bytearrays, and this is how I do it when reading from a file:
with open(file, 'rb') as imageFile:
    f = imageFile.read()
    b = bytearray(f)
I took that right out of a project I am working on, and it works in both 2 and 3. Maybe something for you to look at?

If your project is small and simple, use six.
Otherwise I suggest having two independent codebases: one for Python 2 and one for Python 3. Initially it may sound like a lot of unnecessary work, but eventually it's actually a lot easier to maintain.
As an example of what your project may become if you decide to support both Pythons in a single codebase, take a look at Google's protobuf: lots of often counterintuitive branching all around the code, and abstractions that were modified just to allow hacks. And as your project evolves, it won't get better: deadlines work against code quality.
With two separate codebases you will simply apply almost identical patches, which isn't a lot of work compared to what is ahead of you if you want a single codebase. And it will be easier to migrate to Python 3 completely once the number of Python 2 users of your package drops.

Assuming you only need to support Python 2.6 and newer, you can simply use bytes for, well, bytes. Use b literals to create bytes objects, such as b'\x0a\x0b\x00'. When working with files, make sure the mode includes a b (as in open('file.bin', 'rb')).
Beware that iteration and element access are different, though. In these cases, you can write your code to use chunks. Instead of b[0] == 0 (Python 3) or b[0] == b'\x00' (Python 2), write b[0:1] == b'\x00'. Other options are using bytearray (when the bytes need to be mutable) or helper functions.
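A small sketch of the two portable access patterns just mentioned, slicing and bytearray. The sample bytes and variable names are made up for illustration.

```python
# Byte-string access patterns that behave the same on Python 2.6+ and 3.x.
data = b'\x00\x0a\x0b'

# Slicing always yields a byte string, so this comparison is portable,
# unlike data[0], which is an int on Python 3 and a 1-char str on Python 2.
first_is_zero = data[0:1] == b'\x00'

# A bytearray yields ints on indexing and iteration in both versions.
values = list(bytearray(data))

print(first_is_zero)
print(values)
```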
Strings of characters should be unicode in Python 2, independent from Python 3 porting; otherwise the code would likely be wrong when encountering non-ASCII characters anyways. The equivalent is str in Python 3.
Either use u literals to create character strings (such as u'Düsseldorf') and/or make sure to start every file with from __future__ import unicode_literals. Declare file encodings when necessary by starting files with # encoding: utf-8.
Use io.open to read character strings from files. For network code, fetch bytes and call decode on them to get a character string.
If you need to support Python 2.5 or 3.2, have a look at six to convert literals.
Add plenty of assertions to make sure that functions which operate on character strings don't get bytes, and vice versa. As usual, a good test suite with 100% coverage helps a lot.

Write a single byte to a file in python 3.x

In a previous Python 2 program, I used the following line for writing a single byte to a binary file:
self.output.write(chr(self.StartElementNew))
But since Python 3, you can't write strings and chars to a stream without encoding them to bytes first (which makes sense for proper multibyte char support)
Is there something such as byte(self.StartElementNew) now? And if possible, with Python 2 compatibility?

For values in the range 0-127, the following line will always produce the right type in Python 2 (str) and 3 (bytes):
chr(self.StartElementNew).encode('ascii')
This doesn't work for values in the range 128-255 because in Python 2, the str.encode() call includes an implicit str.decode() using ASCII as the codec, which will fail.
For bytes in the range 0-255, I'd define a separate function:
import sys
if sys.version_info.major >= 3:
    as_byte = lambda value: bytes([value])
else:
    as_byte = chr
then use that when writing single bytes:
self.output.write(as_byte(self.StartElementNew))
Alternatively, use the six library. It has a six.int2byte() function, and the library does the Python version test for you to provide a suitable version of the function:
self.output.write(six.int2byte(self.StartElementNew))

Another alternative, which works with Python 2 and 3, is to use struct:
import struct
self.output.write(struct.pack('B', self.StartElementNew))
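A self-contained sketch of the struct approach follows. Here output and start_element are hypothetical stand-ins for self.output and self.StartElementNew from the question, and an in-memory io.BytesIO replaces the real file.

```python
import io
import struct

# Hypothetical stand-ins for self.output and self.StartElementNew.
output = io.BytesIO()
start_element = 0xB1

# 'B' packs one unsigned byte (0-255) on both Python 2 and 3.
output.write(struct.pack('B', start_element))

print(repr(output.getvalue()))
```

Values outside 0-255 make struct.pack raise struct.error, which can be a useful guard compared with chr.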

binascii.unhexlify working differently in Python 3.2 and Python 3.4?

I used to work on Linux Mint, where the latest bundled version of Python 3 is Python 3.4. My program takes a hex string as input, decodes it and creates a bytearray so I can decode several pieces of information using struct.unpack. For example:
import binascii
hex_str = "00000E0C180E180FEABF070030313564336332363338303431653039004A62004A62006A62406A622E636F6D00"
s = binascii.unhexlify(hex_str)
print(s) # Would print b'\x00\x00\x0e\x0c\x18\x0e\x18\x0f\xea\xbf\x07\x00015d3c2638041e09\x00Jb\x00Jb\x00jb@jb.com\x00'
data = bytearray(s)
date_data = data[:9]
form_date = get_date(date_data) # Get the date using a bunch of struct.unpack
print(form_date) # Would print '2014-12-24 14:24:15'
Last week my computer crashed, so I had to build a new machine and decided to give Debian Wheezy a try. However, I discovered that the only Python version it ships is Python 2.7. I installed Python 3 using apt-get, but the version installed is only Python 3.2. When I run the exact same code as above, I get a TypeError on the binascii.unhexlify line:
hex_str = "00000E0C180E180FEABF070030313564336332363338303431653039004A62004A62006A62406A622E636F6D00"
s = binascii.unhexlify(hex_str)
# TypeError: 'NavigableString' does not support the buffer interface
I don't understand this error, what does it mean?
I checked on Google but couldn't find anything: have there been any changes on binascii.unhexlify between the two versions? Do I have to change something in 3.2?
I really don't see how to solve this... Maybe there is a better way to achieve that?
Thanks.
PS: I could go back to Linux Mint, or install Python 3.4 on Debian, but I think my production server is a fresh install of Debian, so with Python 3.2... so I'd better target that version (and I am glad I discovered it now!).

Yes, there was a change in behavior between versions. From the binascii module documentation:
Note: a2b_* functions accept Unicode strings containing only ASCII characters. Other functions only accept bytes-like objects (such as bytes, bytearray and other objects that support the buffer protocol).
Changed in version 3.3: ASCII-only unicode strings are now accepted by the a2b_* functions.
So if you want to target Python <3.3, you need to pass in either bytes or bytearray objects instead of strings.
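A minimal sketch of that advice: encode the hex string to ASCII bytes before calling unhexlify. The hex_str below is only a shortened prefix of the string in the question, used for illustration.

```python
import binascii

# Shortened prefix of the question's hex string, for illustration only.
hex_str = "00000E0C180E180FEABF0700"

# Encoding to ASCII first hands unhexlify a bytes object, which it accepts
# on Python 3 releases before 3.3 as well as on newer ones.
s = binascii.unhexlify(hex_str.encode('ascii'))
data = bytearray(s)

print(len(data))
```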

How can I read a byte array from a socket in Python

I am using Bluetooth to send a 16-byte byte array to a Python server. Basically, what I would like to achieve is to read the byte array as it is. How can I do that in Python?
What I am doing right now is reading a string, since that is the only way I know to read data from a socket. This is my code for the socket in Python:
data = client_sock.recv(1024)
Where data is the string. Any ideas?

You're already doing exactly what you asked.
data is the bytes received from the socket, as-is.
In Python 3.x, it's a bytes object, which is just an immutable version of bytearray. In Python 2.x, it's a str object, since str and bytes are the same type. But either way, that type is just a string of bytes.
If you want to access those bytes as numbers rather than characters: In Python 3.x, just indexing or iterating the bytes will do that, but in Python 2.x, you have to call ord on each character. That's easy.
Or, in both versions, you can just call data = bytearray(data), which makes a mutable bytearray copy of the data, which gives you numbers rather than characters when you index or iterate it.
So, for example, let's say we want to write the decimal value of each byte on a separate line to a text file (a silly thing to do, but it demonstrates the ideas) in Python 2.7:
data = client_sock.recv(1024)
with open('textfile.txt', 'a') as f:
    for ch in data:
        f.write('{}\n'.format(ord(ch)))
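The bytearray route mentioned above avoids the ord() call and any version check. In this sketch the payload is made up and stands in for the result of client_sock.recv(1024). The lines are collected in a list rather than written to a file so the example is self-contained.

```python
# Stand-in for data = client_sock.recv(1024); the payload here is made up.
data = b'\x01\x7f\xff'

# bytearray yields integers when iterated on both Python 2 and 3,
# so no ord() call or version check is needed.
lines = ['{}\n'.format(value) for value in bytearray(data)]

print(''.join(lines))
```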

What you want is the struct module, specifically struct.unpack().
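As a rough sketch of how struct.unpack could be applied to a 16-byte payload: the data below is made up, and the '>4I' layout is just one assumed interpretation, since the question does not say what the bytes mean.

```python
import struct

# Made-up 16-byte payload standing in for what client_sock.recv() returns.
data = bytes(bytearray(range(16)))

# '16B' unpacks sixteen unsigned bytes into a tuple of ints.
values = struct.unpack('16B', data)

# The same buffer can be read with other layouts, e.g. four big-endian
# unsigned 32-bit integers (an assumed layout, for illustration only).
words = struct.unpack('>4I', data)

print(values)
print(words)
```

The format string must describe exactly len(data) bytes, otherwise struct.unpack raises struct.error.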
