Writing hex value into file - Python

What I am really doing is creating a BMP file from a JPEG using Python. The BMP format has some header data which contains info like the size, height and width of the image, so basically I want to read a JPEG file, get its width and height, calculate the new size of the BMP file and store it in the header.
Let's say the new size of the BMP file is 40000 bytes, whose hex value is 0x9c40. As there are 4 bytes of space to store this in the header, we can write it as 0x00009c40. In BMP header data the LSB is written first and then the MSB, so I have to write 0x409c0000 in the file.
My Problems:-
I was able to do this in C, but I am totally lost on how to do it in Python.
For example, if I have i=40000, then by using str=hex(i)[2:] I get the hex value, and with some extra code I was able to add the leading zeros and reverse the byte order. Now how do I write this '409c0000' data to the file as hex?
The header size is 54 bytes for a BMP file, so is there another way to just store the data in a string like str='00ffcf4f...' (up to 54 bytes), convert the whole string to hex at once and write it to the file?
My friend told me to use unhexlify from binascii.
By doing unhexlify('fffcff') I get '\xff\xfc\xff', which is what I want, but when I try unhexlify('3000') I get '0\x00', which is not what I want. It is the same for any value containing 3, 4, 5, 6 or 7. Is this the right way to do this?

You are not writing hex, you are writing binary data. Hexadecimal is a helpful notation when dealing with binary data, but don't confuse the notation with the value.
Use the struct module to pack integer data into binary structures, the same way C would.
binascii.unhexlify is also a good choice, provided you already have the data as a string in hex notation. The output is correct, but the binary representation only uses hex escapes for bytes outside the printable ASCII range.
Thus fffcff does correctly become \xff\xfc\xff, representing 3 bytes in hex escape notation, and 3000 is \x30\x00, but \x30 is the '0' character in ASCII, so the Python representation of that byte simply uses that ASCII character, as that is the most common way to interpret bytes.
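A quick check in the interpreter (the result is the same in Python 2 and 3) shows that the two notations really are the same bytes:
>>> from binascii import unhexlify
>>> unhexlify('3000') == b'\x30\x00'  # '0' and '\x30' are the same byte, only the repr differs
True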
Packing the integer value 40000 using struct.pack() as an unsigned integer (little endian) then becomes:
>>> import struct
>>> struct.pack('<I', 40000)
'@\x9c\x00\x00'
where the 0x40 byte is represented by the ASCII character for that byte, the @ glyph.
If this is confusing, you can always go the other way and use the binascii.hexlify() function (https://docs.python.org/2/library/binascii.html#binascii.hexlify) to create a hexadecimal representation for yourself, just to debug the output:
>>> import binascii
>>> binascii.hexlify(struct.pack('<I', 40000))
'409c0000'
and you'll see that the @ byte still has the right hex value, 40.
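To actually get those bytes into the file, open it in binary mode and write the packed values directly. A minimal sketch, assuming the output name out.bmp and showing only the first two header fields (the rest of the 54-byte header is packed the same way):
import struct

file_size = 40000  # example: total size of the BMP file in bytes

with open('out.bmp', 'wb') as f:            # binary mode: bytes are written as-is
    f.write(b'BM')                          # BMP signature, header bytes 0-1
    f.write(struct.pack('<I', file_size))   # header bytes 2-5: file size, little-endian (40 9c 00 00)
    # ... pack and write the remaining header fields, then the pixel data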

Related

How to write integers with different sizes to a file

In my code, I write integers and a string to the same line of an output text file as follows:
f.write("%d\t%d\t%d\t%s"%(int_2b_var, int_4b_var, int_8b_var, string_val))
where int_2b_var needs to be an integer of 2 bytes, int_4b_var an int of 4 bytes and int_8b_var an int of 8 bytes.
How can I manage to write the int variables with the desired length into my txt file?
You need to write bytes (and open the file in binary mode); as written, your code writes a textual representation of the numbers, and e.g. the textual representation of a 1-byte integer can be up to 4 characters ("-128").
Use the struct module, which is made for packing data into binary form; see the format characters for how to specify the details of your packing, e.g. byte order.
You'll also need to encode string_val and decide how exactly you want to serialize it, e.g. the s format needs a hard-coded number of bytes (so you might need to write the string length as well), p is limited to 255 bytes, ...
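A minimal sketch of that approach, assuming little-endian byte order and a simple length-prefixed string (the variable names are taken from the question; the values and file name are made up):
import struct

int_2b_var, int_4b_var, int_8b_var = 1000, 70000, 5000000000
string_val = 'hello'

encoded = string_val.encode('utf-8')
with open('out.bin', 'wb') as f:                                       # binary mode is required
    f.write(struct.pack('<hiq', int_2b_var, int_4b_var, int_8b_var))   # 2 + 4 + 8 bytes
    f.write(struct.pack('<I', len(encoded)))                           # 4-byte string length
    f.write(encoded)                                                   # raw string bytes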

How can I extract mixed binary and ascii values from a bytes string like I did in 2.x?

The following represents a binary image extracted from a file (spaces inserted between bytes to make reading easier). File is opened with 'rb' mode.
01 77 33 9F 41 42 43 44 00 11 11 11
In Python 2.7, I read it as a character string, use ord() to extract the binary values, and then I can extract or even search the string for a specific text value (such as the "ABCD" in bytes 4-7). The binary bytes can be anything from 0x00 to 0xFF. I've been putting off conversion to Python 3 partly because of this.
I need to be able, in Python 3, to treat a string of bytes as a mixture of binary and ascii (not unicode) values. The format is not fixed, it consists of data structures. For example, the 33 in byte 2 might be a record length that tells me where the start of the next record is. In other words, I can't just say that I know the text string is always in location 4.
I don't write the file, I just use it, so changing it is not an option.
I've seen lots of examples of using b' and other things to convert fixed strings but I need a way to intermix these values, extracting bytes, 2-byte to 8-byte values as 16-bit to 64-bit words, and extracting/searching for ASCII strings within the larger string.
The byte/character separation in Python 3 seems somewhat inflexible for what I need. I'm sure there's a way to do this I just haven't found an example or an answered question that seems to cover this case.
This is a simplified example, I can't provide real data (it's proprietary) but this illustrates the problem. The real files may be short (<1K) or huge (>100K), containing multiple records of different sizes.
Is there an easy, straightforward way to essentially replicate the functionality I have in Python 2.7?
This is on Windows.
Thanks
I need to be able, in Python 3, to treat a string of bytes as a mixture of binary and ascii (not unicode) values. The format is not fixed, it consists of data structures. For example, the 33 in byte 2 might be a record length that tells me where the start of the next record is. In other words, I can't just say that I know the text string is always in location 4.
Read the file in binary mode, as you are doing. This produces a bytes object, which in 3.x is not the same as a str (as it would be in 2.x).
Interpret the bytes as bytes, as needed, to figure out the general structure of the data. Slicing the bytes produces another bytes as before; indexing produces an int with the numeric value of that single byte (not as before) - no ord required.
When you have determined a subset of the bytes that represent a string (let's say for convenience that you have sliced it out), convert to string using the appropriate encoding: e.g. str(my_bytes, 'ascii'). Note that ASCII will not handle byte values 0x80 through 0xFF; especially with binary-ish legacy file formats, there's a good chance your data is actually something like Latin-1: str(my_bytes, 'iso-8859-1').
search the string for a specific text value
You can search at either the text or the byte level - bytes objects support the in operator, searching for either a subsequence of bytes or a single integer value. Whether it makes more sense to search before or after string conversion will depend on what you are doing.
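Using the example bytes from the question, here is a sketch of slicing, decoding and searching in Python 3 (the interpretation of byte 2 as a record length is, as in the question, just an example):
data = bytes.fromhex('0177339F4142434400111111')   # the example data from the question

record_length = data[2]              # indexing yields an int: 0x33 == 51
text = data[4:8].decode('ascii')     # slice out the bytes, then decode: 'ABCD'
print(record_length, text)

print(b'ABCD' in data)               # True  - searching at the byte level
print(data.index(b'ABCD'))           # 4     - offset of the match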
using b' and other things to convert fixed strings
b'' is just the syntax for a literal bytes object. It's what you'll see if you ask for the repr of what you read from the file. Prefixing a b onto an existing string literal in your code isn't really "converting" anything, but replacing it with the value you should have had in the first place.
2-byte to 8-byte values as 16-bit to 64-bit words
The documentation says it at least as well as I could:
>>> help(int.from_bytes)
Help on built-in function from_bytes:
from_bytes(...) method of builtins.type instance
int.from_bytes(bytes, byteorder, *, signed=False) -> int
Return the integer represented by the given array of bytes.
The bytes argument must be a bytes-like object (e.g. bytes or bytearray).
The byteorder argument determines the byte order used to represent the
integer. If byteorder is 'big', the most significant byte is at the
beginning of the byte array. If byteorder is 'little', the most
significant byte is at the end of the byte array. To request the native
byte order of the host system, use `sys.byteorder' as the byte order value.
The signed keyword-only argument indicates whether two's complement is
used to represent the integer.
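For example, pulling a 16-bit and a 32-bit word out of the same example data (which fields have which widths is, again, invented for illustration):
data = bytes.fromhex('0177339F4142434400111111')

word16 = int.from_bytes(data[0:2], 'big')      # 0x0177 == 375
word32 = int.from_bytes(data[8:12], 'little')  # bytes 00 11 11 11 -> 0x11111100
print(word16, word32)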

Bytearray shows non-hexadecimal digits - Python

I'm working with an array of bytes extracted from a UDP packet in Python.
The data is represented like this:
data = [0x00,0x01,0x23,0x84,0xa6]
And when I use bytearray(data) and print its contents, the prompt shows me something that is not a hexadecimal digit, like \x01#, or with other data contents the # is replaced by a \n. I don't really know why this happens.
The complete code example
data = [0x00,0x01,0x23,0x84,0xa6]
data1 = bytearray(data)
print(data)
print(data1)
And the print shows
[0, 1, 35, 132, 166]
bytearray(b'\x00\x01#\x84\xa6')
Using bytes(data) the problem is the same.
Your bytearray is represented as a string. When a string is displayed for human eyes, the characters are shown according to the current encoding (ASCII, UTF-8, etc.). In that representation, the character with the value 0x23 is a hash symbol (#). Only the bytes which do not have a printable character representation (0x00, etc.) are displayed as hex escapes (e.g. \x00).
So what you see is absolutely correct because you asked (maybe without knowing) for a string representation of your byte array.
If you want to see a hex value for each byte, use data1.hex(). This creates a hex representation of each byte and concatenates them all. The result is a string containing only hex digits (0-9 and a-f). This is useful for printing; in most cases it is not useful for further processing.
In Python 3, consider using bytes([0x00, 0x01, ...]) instead. That produces a bytes object, which is more native to the language (e.g. many functions like write(), send(), etc. will accept it as input). It also has a hex() method as described above.
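For example, with the data from the question (the hex() method is available on bytes and bytearray since Python 3.5):
>>> data = [0x00, 0x01, 0x23, 0x84, 0xa6]
>>> bytes(data).hex()
'00012384a6'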

Python: Converting special characters into operable integers?

I am currently working on a really simple encryption project, an algorithm to show a basic understanding of how encryption works, and my encryption algorithm basically just uses the ord() function for converting standard ASCII characters into integers that the algorithm can work on.
The problem I have run into is that I also need my program to be capable of encrypting, for example, the contents of a Windows executable (EXE) file. To do so, I need to convert all sorts of special characters (not ASCII) into integers that I can operate on.
I don't know a whole lot about encoding, but from what I understand, ord() only works because there is an ASCII character map that has a corresponding number for each character. I couldn't figure out how to convert the special characters of an EXE file straight to integers, so I tried converting to bytes, which seems a little more universal to me (please correct me if I am wrong).
At this point, I am just looking for a solution to be able to read an EXE file, and convert each character into a number specific to that character (for encryption/ decryption purposes).
You are confusing the meaning assigned to bytes (like the ASCII standard) with the bytes themselves. ord() just gives you the numerical value for a given byte. That Python interprets those bytes and shows you ASCII codepoints is neither here nor there.
In other words, ord() doesn't have to consult an ASCII table and can handle any byte value. All it has to do is take the already known byte value and give you a Python int object for it.
Read your data as binary (open the file with b added to the file mode), and use ord(). In Python 2, that'll result in str objects, and each character in such an object is really a byte value in the range 0 - 255.
Note that if you are using Python 3, reading from a file in binary mode results in a bytes object that makes it clearer still that these are integer values in a range:
>>> b'abc'
b'abc'
>>> b'abc'[0]
97
Indexing into an individual position in a bytes object produces the integer value directly, and no call to ord() is required.
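A small sketch of that in Python 3 (the file name is just an example):
with open('program.exe', 'rb') as f:   # 'rb': read raw bytes, no text decoding
    data = f.read()

numbers = list(data)                   # each byte becomes an int in the range 0-255
print(numbers[:10])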

Writing binary data to a file in Python

I am trying to write data (text, floating point data) to a file in binary, which is to be read by another program later. The problem is that this program (in Fortran 95) is incredibly particular; each byte has to be in exactly the right place in order for the file to be read correctly. I've tried using bytes objects and .encode() to write, but haven't had much luck (I can tell from the file size that it is writing extra bytes of data). Some code I've tried:
mgcnmbr='42'
bts=bytes(mgcnmbr)
test_file=open('PATH_HERE/test_file.dat','ab')
test_file.write(bts)
test_file.close()
I've also tried:
mgcnmbr='42'
bts=mgcnmbr.encode('utf_32_le')
test_file=open('PATH_HERE/test_file.dat','ab')
test_file.write(bts)
test_file.close()
To clarify, what I need is the integer value 42, written as 4-byte binary. Next, I would write the numbers 1 and 0 in 4-byte binary. At that point, I should have exactly 12 bytes, each value a 4-byte signed integer written in binary. I'm pretty new to Python and can't seem to get it to work out. Any suggestions? Something like this? I need complete control over how many bytes each integer (and later, each 4-byte floating point value) is.
Thanks
You need the struct module.
import struct
fout = open('test.dat', 'wb')
fout.write(struct.pack('>i', 42))
fout.write(struct.pack('>f', 2.71828182846))
fout.close()
The first argument in struct.pack is the format string.
The first character in the format string dictates the byte order or endianness of the data (Is the most significant or least significant byte stored first - big-endian or little-endian). Endianness varies from system to system. If ">" doesn't work try "<".
The second character in the format string is the data type. Unsurprisingly the "i" stands for integer and the "f" stands for float. The number of bytes is determined by the type. Shorts or "h's" for example are two bytes long. There are also codes for unsigned types. "H" corresponds to an unsigned short for instance.
The second argument in struct.pack is of course the value to be packed into the bytes object.
Here's the part where I tell you that I lied about a couple of things. First I said that the number of bytes is determined by the type. This is only partially true. The size of a given type is technically platform dependent as the C/C++ standard (which the struct module is based on) merely specifies minimum sizes. This leads me to the second lie. The first character in the format string also encodes whether the standard (minimum) number of bytes or the native (platform dependent) number of bytes is to be used. (Both ">" and "<" guarantee that the standard, minimum number of bytes is used which is in fact four in the case of an integer "i" or float "f".) It additionally encodes the alignment of the data.
The documentation on the struct module has tables for the format string parameters.
You can also pack multiple primitives into a single bytes object and realize the same result.
import struct
fout = open('test.dat', 'wb')
fout.write(struct.pack('>if', 42, 2.71828182846))
fout.close()
And you can of course parse binary data with struct.unpack.
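For completeness, a short sketch of reading those values back with struct.unpack (assuming the file written by the previous snippet):
import struct

with open('test.dat', 'rb') as fin:
    value, number = struct.unpack('>if', fin.read(8))   # 4-byte int followed by 4-byte float
print(value, number)   # 42 2.7182817... (precision is limited by the 4-byte float)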
Assuming that you want it in little-endian, you could do something like this to write 42 as a four-byte binary value.
test_file=open('PATH_HERE/test_file.dat','ab')
test_file.write(b'\x2A\0\0\0')
test_file.close()
2A is 42 in hexadecimal, and the bytes b'\x2A\0\0\0' make the first byte equal to 42, followed by three zero bytes. This code writes the bytes 42, 0, 0, 0.
Your code writes the bytes that represent the character '4' in UTF-32 and the bytes that represent the character '2' in UTF-32. This means it writes the bytes 52, 0, 0, 0, 50, 0, 0, 0, because each character is four bytes when encoded in UTF-32 (little-endian).
Also, having a hex editor for debugging could be useful; then you could see the bytes that your program is outputting, not just the size.
In my question, Write binary string in binary file Python 3.4, I did it like this:
file.write(bytes(chr(int(mgcnmbr)), 'iso8859-1'))
