Python write to file - python

I've got a little problem here.
I'm converting binary to ascii, in order to compress data.
Everything seems to work fine, but when I convert '11011011' to ascii and try to write it to a file, I keep getting this error:
UnicodeEncodeError: 'charmap' codec can't encode character '\xdb' in position 0: character maps to <undefined>
Here's my code:
byte = ""
handleR = open(self.getInput())
handleW = open(self.getOutput(), 'w')
file = handleR.readlines()
for line in file:
    for a in range(0, len(line)):
        chunk = result[ord(line[a])]
        for b in chunk:
            if (len(byte) < 8):
                byte+=str(chunk[b])
            else:
                char = chr(eval('0b'+byte))
                print(byte, char)
                handleW.write(char)
                byte = ""
handleR.close()
handleW.close()
Any help appreciated,
Thank You

I think you want:
handleR = open(self.getInput(), 'rb')
handleW = open(self.getOutput(), 'wb')
That will ensure you're reading and writing byte streams. Also, you can parse binary strings without eval:
char = chr(int(byte, 2))
And of course, it would be faster to use bit manipulation. Instead of appending to a string, you can use << (left shift) and | (bitwise or).
EDIT: For the actual writing, you can use:
handleW.write(bytes([char]))
This builds a bytes object from a list containing a single integer and writes it.
EDIT 2: Correction, it should be:
handleW.write(bytes([int(byte, 2)]))
There is no need to use chr.
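The shift-and-or approach mentioned above could look roughly like the following. This is a minimal sketch, not the asker's compression code: the function name pack_bits is hypothetical, the input is assumed to be a string of '0' and '1' characters, and padding the last partial byte with zero bits is an assumption the original post does not specify.

```python
def pack_bits(bits):
    """Pack a string of '0'/'1' characters into a bytes object."""
    out = bytearray()
    acc = 0    # bits collected so far for the current byte
    count = 0  # how many bits acc currently holds
    for bit in bits:
        acc = (acc << 1) | int(bit)  # shift left, then OR in the new bit
        count += 1
        if count == 8:
            out.append(acc)
            acc = 0
            count = 0
    if count:
        # Assumption: pad an incomplete final byte with zero bits on the right.
        out.append(acc << (8 - count))
    return bytes(out)


packed = pack_bits("11011011")
print(packed)  # b'\xdb'
```

The result can be passed straight to handleW.write(), provided the file was opened in 'wb' mode.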

Related

Python: write and read blocks of binary data to a file

I am working on a script that breaks another Python script down into blocks and uses pycrypto to encrypt the blocks (all of which I have done successfully so far). Now I am storing the encrypted blocks in a file so that the decrypter can read it and execute each block. The final result of the encryption is a list of binary outputs (something like blocks=[b'\xa1\r\xa594\x92z\xf8\x16\xaa',b'xfbI\xfdqx|\xcd\xdb\x1b\xb3',etc...]).
When I write the output to a file, the blocks all end up on one giant line, so when I read the file all the bytes come back as one giant line instead of as the separate items of the original list. I also tried converting the bytes to a string and adding a '\n' at the end of each one, but I still need the bytes, and I can't figure out how to convert the string back into the original bytes.
To summarize, I am looking to do one of two things: write each binary item to a separate line in a file so I can easily read the data and use it in the decryption, or translate the data to a string and, in the decryption, convert the string back into the original binary data.
Here is the code for writing to the file:
new_file = open('C:/Python34/testfile.txt','wb')
for byte_item in byte_list:
    # For the string variant, I replaced 'wb' with 'w' and
    # byte_item with ascii(byte_item) + '\n'
    new_file.write(byte_item)
new_file.close()
and for reading the file:
# Or 'r' instead of 'rb' if using string method
byte_list = open('C:/Python34/testfile.txt','rb').readlines()
A file is a stream of bytes without any implied structure. If you want to load a list of binary blobs, you should store some additional metadata so that the structure can be restored. For example, you could use the netstring format:
#!/usr/bin/env python
blocks = [b'\xa1\r\xa594\x92z\xf8\x16\xaa', b'xfbI\xfdqx|\xcd\xdb\x1b\xb3']
# save blocks
with open('blocks.netstring', 'wb') as output_file:
    for blob in blocks:
        # [len]":"[string]","
        output_file.write(str(len(blob)).encode())
        output_file.write(b":")
        output_file.write(blob)
        output_file.write(b",")
Read them back:
#!/usr/bin/env python3
import re
from mmap import ACCESS_READ, mmap
blocks = []
match_size = re.compile(br'(\d+):').match
with open('blocks.netstring', 'rb') as file, \
     mmap(file.fileno(), 0, access=ACCESS_READ) as mm:
    position = 0
    for m in iter(lambda: match_size(mm, position), None):
        i, size = m.end(), int(m.group(1))
        blocks.append(mm[i:i + size])
        position = i + size + 1 # shift to the next netstring
print(blocks)
As an alternative, you could consider BSON format for your data or ascii armor format.
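For the ASCII-armor route, one possible sketch is to base64-encode each block and write one encoded block per line, which also gives the "one item per line" layout the question asks for. The function names save_blocks and load_blocks, the sample data, and the file name are all hypothetical; base64 is just one armor encoding that could be used here.

```python
import base64
import os
import tempfile


def save_blocks(path, blocks):
    """Write each binary block as one base64-encoded line."""
    with open(path, "wb") as f:
        for blob in blocks:
            # base64 output never contains a newline, so b"\n" is a safe separator
            f.write(base64.b64encode(blob) + b"\n")


def load_blocks(path):
    """Read the lines back and decode each one into the original bytes."""
    with open(path, "rb") as f:
        return [base64.b64decode(line.rstrip(b"\n")) for line in f]


# Hypothetical sample data; the second block deliberately contains a newline byte.
blocks = [b"\xa1\r\xa594\x92z\xf8\x16\xaa", b"\x00\n\xff"]
path = os.path.join(tempfile.mkdtemp(), "blocks.b64")
save_blocks(path, blocks)
restored = load_blocks(path)
```

The encoded file is about a third larger than the raw data, which is the usual price of a text-safe format.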
I think what you're looking for is byte_list=open('C:/Python34/testfile.txt','rb').read()
If you know how many bytes each item is, you can use read(number_of_bytes) to process one item at a time.
read() will read the entire file, but then it is up to you to decode that entire list of bytes into their respective items.
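When every item has the same known size, the read(number_of_bytes) idea might be sketched like this. The function name read_fixed_blocks, the 16-byte block size, and the file name are assumptions made for illustration only.

```python
import os
import tempfile


def read_fixed_blocks(path, block_size):
    """Return the file's contents as a list of block_size-byte chunks."""
    blocks = []
    with open(path, "rb") as f:
        while True:
            chunk = f.read(block_size)
            if not chunk:  # an empty bytes object signals end of file
                break
            blocks.append(chunk)
    return blocks


# Hypothetical demo: three 16-byte items written back to back.
items = [bytes([n]) * 16 for n in (1, 2, 3)]
path = os.path.join(tempfile.mkdtemp(), "fixed.bin")
with open(path, "wb") as f:
    for item in items:
        f.write(item)
restored = read_fixed_blocks(path, 16)
```

If the file length is not a multiple of the block size, the last chunk is simply shorter, so the caller may want to check for that.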
In general, since you're using Python 3, you will be working with bytes objects (which are immutable) and/or bytearray objects (which are mutable).
Example:
b1 = bytearray('hello', 'utf-8')
print(b1)
b1 += bytearray(' goodbye', 'utf-8')
print(b1)
with open('temp.bin', 'wb') as f:
    f.write(b1)
#------
with open('temp.bin', 'rb') as f:
    b2 = f.read()
print(b2)
Output:
bytearray(b'hello')
bytearray(b'hello goodbye')
b'hello goodbye'

Extract first 20 bytes in a binary header

I'm trying to learn how to do this in Python. I played around with the pseudocode below but couldn't come up with anything worth a penny:
with open(file, "rb") as f:
    byte = f.read(20) # read the first 20 bytes?
    while byte != "":
        print f.read(1)
In the end I'd like to end up with code capable of the following: https://stackoverflow.com/a/2538034/2080223
But I'm of course interested in learning how to get there, so any pointers would be much appreciated!
Very close.
with open(file, "rb") as f:
    byte = f.read(20) # read the first 20 bytes? *Yes*
will indeed read the first 20 bytes.
But
while byte != "":
    print f.read(1) # print a single byte?
will (as you expect) read a single byte and print it, but it will do so forever, because byte is never reassigned inside the loop and the condition stays true.
It's not clear what you want to do here, but if you just want to print a single byte, removing the while loop will do that:
print f.read(1)
If you want to print single bytes until the end of file, consider:
while True:
    byte = f.read(1)
    if byte == "": break
    print byte
Alternatively, if you're looking for specific bytes within the first 20 you read into byte, you can index into it:
with open(file, "rb") as f:
    byte = f.read(20)
    print byte[0] # First byte of the 20 bytes / first byte of the file
    print byte[1] # Second byte of the 20 bytes / ...
    # ...
Or, as Lucas suggests in the comments, you could iterate over the string byte (read() returns a string, by the way):
with open(file, "rb") as f:
    byte = f.read(20)
    for b in byte:
        print b
You may also be interested in the position of each byte and its hexadecimal value (for values like 0x0a, 0x0d, etc.). Each b is a one-character string here, so ord() is needed to get its integer value:
with open(file, "rb") as f:
    byte = f.read(20)
    for i,b in enumerate(byte):
        print "%02d: %02x" % (i,ord(b))
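The snippets in this answer are Python 2. For comparison, here is a rough Python 3 sketch of the same header dump. The function name dump_header and the sample file are hypothetical. In Python 3, read() on a file opened in 'rb' mode returns a bytes object, and iterating over it yields integers, so ord() is not needed.

```python
import os
import tempfile


def dump_header(path, count=20):
    """Return 'offset: hex value' strings for the first count bytes of a file."""
    with open(path, "rb") as f:
        header = f.read(count)  # may be shorter than count for small files
    return ["%02d: %02x" % (i, b) for i, b in enumerate(header)]


# Hypothetical sample file: 4 header-like bytes followed by padding.
path = os.path.join(tempfile.mkdtemp(), "sample.bin")
with open(path, "wb") as f:
    f.write(b"\x0a\x0dAB" + b"\x00" * 30)

for entry in dump_header(path):
    print(entry)
```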

How to Reverse Hebrew String in Python?

I am trying to reverse Hebrew string in Python:
line = 'אבגד'
reversed = line[::-1]
print reversed
but I get:
UnicodeDecodeError: 'ascii' codec can't decode byte 0x93 in position 0: ordinal not in range(128)
Care to explain what I'm doing wrong?
EDIT:
I'm also trying to save the string into a file using:
w1 = open('~/fileName', 'w')
w1.write(reverseLine)
but now I get:
return codecs.charmap_encode(input,errors,encoding_table)
UnicodeEncodeError: 'charmap' codec can't encode characters in position 1-3: character maps to <undefined>
Any ideas how to fix that, too?
You need more than a plain string reversal to display Hebrew backwards, because numbers and other non-Hebrew runs keep their own order.
The algorithm is much more complicated.
All the answers on this page (to this date) will most likely scramble your numbers and non-Hebrew text.
For most cases you should use
from bidi.algorithm import get_display
print get_display(text)
Adding u in front of the hebrew string works for me:
In [1]: line = u'אבגד'
In [2]: reversed = line[::-1]
In [3]: print reversed
דגבא
To your second question, you can use:
import codecs
w1 = codecs.open("~/fileName", "w", "utf-8")
w1.write(reversed)
This writes the unicode string to the file fileName.
Alternatively, without using codecs, you will need to encode the reversed string as UTF-8 when writing to the file:
with open('~/fileName', 'w') as f:
    f.write(reversed.encode('utf-8'))
You need to use a unicode string constant:
line = u'אבגד'
reversed = line[::-1]
print reversed
In Python 2, a plain string literal is a byte string and is decoded as ascii when needed. Use u'' for unicode:
line = u'אבגד'
reversed = line[::-1]
print reversed
Make sure you're using unicode objects
line = unicode('אבגד', 'utf-8')
reversed = line[::-1]
print reversed
Found how to write to file:
w1 = codecs.open('~/fileName', 'w', encoding='utf-8')
w1.write(reverseLine)
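The answers above are written for Python 2. In Python 3, str is already Unicode and the built-in open() accepts an encoding argument, so a rough equivalent might look like the sketch below. The function name write_reversed and the temporary file are hypothetical, and the slice only reverses code points; it does not apply the bidi algorithm mentioned earlier.

```python
import os
import tempfile


def write_reversed(text, path):
    """Reverse text code point by code point and save it as UTF-8."""
    reversed_text = text[::-1]
    # An explicit encoding avoids depending on the platform default
    # (for example a Windows 'charmap' code page).
    with open(path, "w", encoding="utf-8") as f:
        f.write(reversed_text)
    return reversed_text


path = os.path.join(tempfile.mkdtemp(), "fileName")
result = write_reversed("\u05d0\u05d1\u05d2\u05d3", path)  # the letters alef, bet, gimel, dalet
```

Note that open() does not expand '~' by itself; os.path.expanduser('~/fileName') would be needed to write into the home directory.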

Error accessing binary data from a python list

I'm pretty new to python, using python 2.7. I have to read in a binary file, and then concatenate some of the bytes together. So I tried
f = open("filename", "rb")
j=0
infile = []
try:
    byte = f.read(1)
    while byte != "":
        infile.append(byte)
        byte = f.read(1)
finally:
    f.close()
blerg = (bin(infile[8])<<8 | bin(infile[9]))
print type
where I realize that the recast as binary is probably unnecessary, but this is one of my later attempts.
The error I'm getting is TypeError: 'str' object cannot be interpreted as index.
This is news to me, since I'm not using a string anywhere. What the !##% am I doing wrong?
EDIT: Full traceback
file binaryExtractor.py, line 25, in
blerg = (bin(infile[8])<<8 | bin(infile[9]))
TypeError: 'str' object cannot be interpreted as index
You should be using struct whenever possible instead of writing your own code for this.
>>> import struct
>>> struct.unpack('<H', '\x12\x34')
(13330,)
You want to use the ord function which returns an integer from a single character string, not bin which returns a string representation of a binary number.
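Both suggestions can be compared side by side. The sketch below is written for Python 3, where indexing a bytes object already yields an integer; in the asker's Python 2.7 code the equivalent would be ord(infile[8]) << 8 | ord(infile[9]). The function name combine_big_endian and the sample data are hypothetical, and big-endian byte order is assumed because the question shifts the first byte left.

```python
import struct


def combine_big_endian(data, offset):
    """Combine two consecutive bytes into one 16-bit integer (high byte first)."""
    high, low = data[offset], data[offset + 1]
    return (high << 8) | low


data = bytes(range(16))  # hypothetical stand-in for the file contents
value = combine_big_endian(data, 8)
same_value = struct.unpack(">H", data[8:10])[0]  # struct gives the same result
print(value, same_value)  # 2057 2057
```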

Why is struct.unpack() throwing an exception when my strings are the correct length?

I have a file where some lines are metadata I can ignore and some lines are the printed results of struct.pack calls. Say that f.txt is:
key: 3175
\x00\x00\x00\x00\x00\x00\x00\x00
key: 3266
\x00\x00\x00\x00\x00\x00\x00\x00
In this case, the lines starting with "key" are the metadata and the byte strings are the values I want to extract. Also in this case, the two byte string lines were produced with struct.pack('d', 0). The following code is what I would like to do:
import struct
for line in open('f.txt', 'r'):
    # if not metadata, remove newline character and unpack
    if line[0:3] != 'key':
        val = struct.unpack('d', line[0:-1])
        appendToList(val) # do something else with val
With this, I get: "struct.error: unpack requires a string argument of length 8".
If we modify the code slightly:
import struct
for line in open('f.txt', 'r'):
    # if not metadata, remove newline character and unpack
    if line[0:3] != 'key': print line[:-1]
then the output is as expected:
\x00\x00\x00\x00\x00\x00\x00\x00
\x00\x00\x00\x00\x00\x00\x00\x00
If I put the byte string directly into the unpack call then, I have success:
import struct
# successful unpacking
struct.unpack('d', '\x00\x00\x00\x00\x00\x00\x00\x00')
I have tried passing the following variations of line to unpack, all of which give the same result:
str(line)
repr(line)
b"%s" % line
The actual bytes in your text file are the string-escaped bytes you see at a python console, not the binary bytes they represent.
For example, your text file contains \x00 (four bytes long), not the null byte (one byte long).
You need to unescape this text (convert it to binary form) before struct can work on it.
(NOTE that your file format is not very good because you could conceivably have a line that is a number but starts with 'key:'! E.g. 'key: \x00\x00\x00' is a valid number 6.8388560679e-313! If you alternate between metadata and value every other line, you should just keep track of what line number you are on and parse accordingly.)
There is a much simpler solution than the others here.
Python 2 has a built-in codec called string_escape that will convert Python escape codes into the binary bytes they represent:
for line in thefile:
    if line[0:3] != 'key':
        binaryline = line[:-1].decode('string_escape')
        val = struct.unpack('d', binaryline)
If you have a big list of these double values and want to store them efficiently in an array structure, consider using the array module instead of struct:
import array
vals = array.array('d')
for line in thefile:
    if line[0:3] != 'key':
        binaryline = line[:-1].decode('string_escape')
        # appends binaryline to vals array, interpreting as a double
        vals.fromstring(binaryline)
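The string_escape codec exists only in Python 2. For readers on Python 3, one possible equivalent is sketched below, assuming the file was read in text mode and each value line contains only \xNN escapes. The function name unescape_to_bytes and the sample line are hypothetical, and '<d' is used instead of 'd' so the result does not depend on the machine's byte order.

```python
import struct


def unescape_to_bytes(text):
    r"""Turn text such as '\x00\x3f' (literal backslashes) into real bytes."""
    # unicode_escape interprets the backslash escapes; latin-1 then maps
    # code points 0-255 one-to-one onto single bytes.
    return text.encode("latin-1").decode("unicode_escape").encode("latin-1")


# Hypothetical value line: the escaped form of a little-endian double.
line = r"\x00\x00\x00\x00\x00\x00\xf8\x3f" + "\n"
raw = unescape_to_bytes(line.rstrip("\n"))
value = struct.unpack("<d", raw)[0]
print(len(raw), value)  # 8 1.5
```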
Your string in the text file is:
\x00\x00\x00\x00\x00\x00\x00\x00
which Python actually reads as:
\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00
So you need to parse this string and convert it. For your sample, the following code gets what you want ('B' packs each value as an unsigned byte, so values above 0x7f also work):
s = line.strip().split('\\x')
r = ''
for v in s:
    if len(v) > 0:
        print v
        r += struct.pack('B', int(v, 16))
val = struct.unpack('d', r)[0]
print val
